5-Simpson’s Paradox

This is the fifth and last in a sequence of five posts on Simpson’s Paradox. In previous posts, we discussed the paradox in the context of college admissions and batting averages. In this post we discuss how Simpson’s Paradox works when evaluating the effectiveness of drugs in treating diseases. The paradox takes the form that the drug seems to work for the population as a whole: recovery rates are higher for those who take the drug than for those who do not. HOWEVER, when we divide the population into subpopulations, we may find that the drug is bad in ALL subpopulations. For example, with males and females as the subpopulations, we may find that the recovery rate of females who take the drug is lower than that of females who do not, and that the drug lowers the recovery rate for males as well. So the paradox is: the drug is BAD for females, and BAD for males, but GOOD for the general public (without reference to gender). How can this be? Understanding this paradox requires working through the causal structures underlying the data.

Simpson’s Paradox for Drug Recovery Rates.

We now present another example of Simpson’s Paradox which brings out some other kinds of causal chains. Suppose a new drug is being tested as a treatment for a disease. One group of patients, known as the “Treatment Group”, is given the drug. A second group of patients, known as the “Control Group”, is not given the drug. We find that the recovery rate from the disease is 56% in the Treatment Group, and only 44% in the Control Group. Thus it seems that the drug is beneficial; it increases the recovery rate from 44% to 56%. However, when we break down the Treatment and Control Groups by gender, we reach rather different conclusions. Using the figures quoted throughout this post, the breakdown is:

Treatment Group (drug): Males 60% recovered (1800 patients); Females 20% recovered (200 patients); overall 56%.
Control Group (no drug): Males 80% recovered (200 patients); Females 40% recovered (1800 patients); overall 44%.

This drug seems to be good for the population as a whole: it increases recovery rates from 44% in the Control Group, which did not take the drug, to 56% in the Treatment Group, which did. But when we look at Males separately, we find that the recovery rate was 60% in the Treatment Group and 80% in the Control Group. Taking the drug REDUCED the recovery rate among males from 80% to 60%, causing significant harm. Similarly, for Females, taking the drug REDUCES the recovery rate from 40% to 20%. This is Simpson’s Paradox. The Drug is GOOD for the population as a whole, but it is BAD for males, and it is BAD for females! How can this be? A Causal Diagram can help us to understand this paradox.
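The arithmetic behind the paradox can be checked with a short Python sketch. The recovered counts below are not taken from a published table; they are reconstructed from the percentages above and the group sizes quoted later in this post (1800 men and 200 women in the Treatment Group, 200 men and 1800 women in the Control Group), and the names `counts` and `rate` are purely illustrative.

```python
# Assumed counts: (recovered, total) per group and gender, reconstructed
# from the percentages and group sizes quoted in the post.
counts = {
    ("treatment", "male"): (1080, 1800),   # 60% of 1800
    ("treatment", "female"): (40, 200),    # 20% of 200
    ("control", "male"): (160, 200),       # 80% of 200
    ("control", "female"): (720, 1800),    # 40% of 1800
}


def rate(group, gender=None):
    """Recovery rate for a group, optionally restricted to one gender."""
    cells = [
        cell
        for (g, s), cell in counts.items()
        if g == group and gender in (None, s)
    ]
    recovered = sum(r for r, _ in cells)
    total = sum(n for _, n in cells)
    return recovered / total


# Compare treatment and control, first pooled and then within each gender.
for gender in (None, "male", "female"):
    label = gender or "everyone"
    print(
        f"{label:>8}: treatment {rate('treatment', gender):.0%}, "
        f"control {rate('control', gender):.0%}"
    )
```

With these counts the pooled comparison favours the drug (56% against 44%), while each gender-specific comparison goes the other way, which is exactly the reversal described above.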


Taking the Drug or Not Taking the Drug is a causal factor for Recovery, as the arrow shows. But Gender is ALSO a causal factor for recovery. Being Female leads to POOR chances of recovery (20% with the drug, 40% without). Being Male leads to BETTER chances of recovery (60% with the drug, 80% without). In the population as a whole, recovery rates in each group are affected by TWO factors, Gender and Drug. Taking the drug LOWERS the recovery rate, but the high proportion of males RAISES the recovery rate in the Treatment Group. In the Control Group, the high proportion of females LOWERS the recovery rate, so that it seems that the Treatment is beneficial. Actually, the drug is harmful, but this harm is concealed by the high proportion of males, which increases the recovery rate in the Treatment Group.

This is a classic case of CONFOUNDING. GENDER is a confounding variable. It is EXOGENOUS: it is not determined either by taking the drug or by recovery. It affects both whether or not the Drug is taken and the recovery rates. Women are very likely NOT to take the drug (the Control Group has 1800 women and 200 men), while Men are very likely to take it (the Treatment Group has 1800 men and 200 women). The standard REMEDY for confounding is to CONDITION on the confounding factor, that is, to hold it constant so that it cannot distort the comparison of recovery rates. Once we hold gender constant, we find the effect of the DRUG ONLY (purged of gender effects) on the recovery rates. We then find that the drug LOWERS the recovery rate in both males and females and is therefore harmful for everybody. The apparent beneficial effect comes from the GENDER effect on recovery: putting a lot of males into the Treatment Group makes it seem as if the drug is having a beneficial effect. In fact, males have good recovery rates relative to females, so having more men is the cause of the higher recovery rate in the Treatment Group.
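One common way to condition on a confounder is to compute a gender-adjusted (standardized) recovery rate for each arm. The sketch below is only an illustration under stated assumptions: it weights each gender by its share of all 4000 patients (2000 men and 2000 women across the two groups), and the names `recovery`, `gender_share`, and `adjusted_rate` are hypothetical.

```python
# Gender-specific recovery rates quoted in the post.
recovery = {
    ("drug", "male"): 0.60,
    ("drug", "female"): 0.20,
    ("no drug", "male"): 0.80,
    ("no drug", "female"): 0.40,
}

# Assumption: each gender is weighted by its share of all patients,
# which is 2000 of 4000 for both men and women in this data set.
gender_share = {"male": 0.5, "female": 0.5}


def adjusted_rate(arm):
    """Recovery rate the arm would show if its gender mix matched the
    whole population (holding gender constant across arms)."""
    return sum(share * recovery[(arm, g)] for g, share in gender_share.items())


effect = adjusted_rate("drug") - adjusted_rate("no drug")
print(f"adjusted drug rate:    {adjusted_rate('drug'):.0%}")
print(f"adjusted no-drug rate: {adjusted_rate('no drug'):.0%}")
print(f"gender-adjusted effect of the drug: {effect:+.0%}")
```

Once both arms are given the same gender mix, the drug arm recovers at about 40% and the no-drug arm at about 60%, so the adjusted effect of the drug is negative, in line with the within-gender comparisons.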

To see how changing the causal sequencing can completely change the analysis, we consider the same data set for drug treatment, but replace GENDER by Blood Pressure. While Gender is obviously exogenous and cannot be affected by taking drugs, blood pressure CAN be affected, and so it is not necessarily exogenous. Now consider the same numbers as in the previous example, with Low Blood Pressure in place of Male and Normal Blood Pressure in place of Female.

Here we have a situation where the drug increases recovery rates in the overall population from 44% to 56%. However, if we split the population into two types, those with normal blood pressure and those with low blood pressure, then a different picture emerges. In the subpopulation with low blood pressure, recovery rates are high without the drug, and taking the drug REDUCES the recovery rate. The same is true of the normal blood pressure population. This is a case of Simpson’s Paradox, but the causal sequencing is very different, and therefore the data analysis is very different. Whereas gender is exogenous in the previous example, because gender cannot be affected by drugs, Blood Pressure is endogenous: it is affected by the drug. Thus the causal diagram is now the following:


Because Blood Pressure is not an exogenous variable, it is NO LONGER a CONFOUNDER. Instead, the drug action is MEDIATED by blood pressure. That is, the drug changes blood pressure, and part of its effect on recovery operates through that change. To understand the causal picture correctly, it is useful to consider a SIMPLER example, where the drug acts SOLELY through blood pressure and has no direct effect on recovery at all. Suppose that in people with normal blood pressure, the disease is deadly, with a recovery rate of only 40%. However, among the population with LOW blood pressure, the recovery rate is very high at 80%. Low Blood Pressure provides strong protection against this disease. Noting this, suppose that doctors recommend a drug which lowers blood pressure (but has NO OTHER effect). The causal picture for this setup would be:


In the untreated population, 90% of people have normal blood pressure and 10% have low BP. The recovery rate is 40% among those with normal BP and 80% among those with low BP, so the total recovery rate is 90% x 40% + 10% x 80% = 36% + 8% = 44%. The drug lowers BP in most people with normal BP, so that if everyone takes the drug, 90% have low BP, while only 10% are not affected by the drug and keep normal BP. After the drug is given, the recovery rate becomes 90% x 80% + 10% x 40% = 72% + 4% = 76%. So the drug, which has no direct effect on the disease, works by lowering BP and is highly effective. The general population recovery rate of 44% is increased to 76%.
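The weighted averages in this simpler example can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic only; the function name `overall_recovery` and its parameters are illustrative choices, not part of any library.

```python
def overall_recovery(share_low_bp, rate_low_bp, rate_normal_bp):
    """Population recovery rate as a weighted average over the two
    blood pressure groups."""
    return share_low_bp * rate_low_bp + (1 - share_low_bp) * rate_normal_bp


# Recovery rates by blood pressure; the drug does not change them here,
# because in this simpler example it acts only through blood pressure.
RATE_LOW_BP = 0.80
RATE_NORMAL_BP = 0.40

without_drug = overall_recovery(0.10, RATE_LOW_BP, RATE_NORMAL_BP)  # 10% low BP
with_drug = overall_recovery(0.90, RATE_LOW_BP, RATE_NORMAL_BP)     # 90% low BP

print(f"recovery without drug: {without_drug:.0%}")
print(f"recovery with drug:    {with_drug:.0%}")
```

The only thing the drug changes in this calculation is the share of people with low blood pressure, which is what it means for the effect to be fully mediated.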

To match the numbers of our table, and to explain the WHY of Simpson’s Paradox, we need to consider a more complicated situation. Suppose the drug lowers Blood Pressure, which helps to increase the recovery rate, as before. BUT suppose the drug itself has a toxic side effect. The drug reduces recovery rates from 40% to 20% among the normal blood pressure population. It also reduces recovery rates from 80% to 60% in the low BP population. Now our table matches the earlier causal diagram in which the drug affects recovery both directly and through blood pressure, and has the following interpretation. The drug has a harmful direct effect on recovery. However, the drug lowers Blood Pressure, and this has a highly beneficial effect on recovery. With the drug, 90% of patients have low BP and 10% have normal BP, so the recovery rate is 90% x 60% + 10% x 20% = 54% + 2% = 56%. The sum of the two effects is positive, so that recovery rates after drug treatment go up from 44% to 56%.
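The two opposing effects can be separated numerically. The sketch below is an illustration, not a general mediation analysis: it introduces a hypothetical intermediate scenario in which the drug's toxicity applies but blood pressure is left unchanged, and it relies on the fact that in these numbers the toxic effect is the same 20 points in both blood pressure groups. All names are illustrative.

```python
# Recovery rates by blood pressure level, as quoted in the post.
RATE_NO_DRUG = {"low": 0.80, "normal": 0.40}
RATE_DRUG = {"low": 0.60, "normal": 0.20}  # toxic side effect: 20 points lower

# Share of patients with low blood pressure, without and with the drug.
LOW_SHARE_NO_DRUG = 0.10
LOW_SHARE_DRUG = 0.90


def overall(rates, low_share):
    """Population recovery rate for given per-group rates and BP mix."""
    return low_share * rates["low"] + (1 - low_share) * rates["normal"]


baseline = overall(RATE_NO_DRUG, LOW_SHARE_NO_DRUG)  # nobody takes the drug
treated = overall(RATE_DRUG, LOW_SHARE_DRUG)         # everybody takes the drug

# Hypothetical scenario (an assumption made for illustration): the drug's
# toxicity applies, but blood pressure stays as it was without the drug.
toxicity_only = overall(RATE_DRUG, LOW_SHARE_NO_DRUG)

direct_effect = toxicity_only - baseline    # harm from the drug itself
indirect_effect = treated - toxicity_only   # benefit via lower blood pressure
total_effect = treated - baseline

print(f"direct effect:   {direct_effect:+.0%}")
print(f"indirect effect: {indirect_effect:+.0%}")
print(f"total effect:    {total_effect:+.0%}")
```

Under these assumptions the direct effect is about -20 percentage points, the indirect effect through blood pressure is about +32 points, and they add up to the +12 point total (44% to 56%) seen in the pooled data.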

Note the dramatic difference in analysis between the two cases. When GENDER is the confounder, then the right result is obtained by CONDITIONING on the confounder, and considering the rates separately in the two subpopulations of males and females. Then we come to the conclusion that the drug is harmful. When BP is a CHANNEL for the action of the drug, then BP is no longer a confounder, and we come to the conclusion that the total effect in the general population is the right measure, and so the drug is actually beneficial overall.

It is of great importance to realize that the numbers stay the same throughout all of these analyses. It is the STORY behind the data, the HIDDEN real-world structures which generate the data, that changes. The causal path diagrams ARE maps of these hidden real-world structures. Central econometric concepts of exogeneity and endogeneity, as well as confounding and whether or not to condition on a variable, all depend on the hidden causal structures. Conventional econometric analysis ignores these causal structures and hence generally comes to the wrong conclusions based on superficial analyses.

This is the end of our sequence of posts on Simpson’s Paradox. The goal of these posts was to explain how hidden real-world causal structures, which are not captured in observable data, can nonetheless dramatically affect the data analysis. Exactly the same data set can convey radically different messages depending on differences in the causal structures which generated the data. The message is that we must rebuild econometrics from the ground up. We must FIRST explicitly introduce causal structures, and SECOND do data analysis conditional on the causality assumptions. It is impossible to do reliable data analysis without having a clear picture of the causal sequences underlying the data. Econometricians have avoided doing this because of the positivist prohibition on investigating, or even talking about, unobservable structures. Causality is fundamentally unobservable, as was already noted by Hume. Nonetheless, despite being intrinsically unobservable, it has strong implications for our data analysis, and cannot be ignored.

2 thoughts on “5-Simpson’s Paradox”

  1. Besides my comments posted just now on the previous post, I’d like to add that this series of posts stresses the importance of theory, favoring Einstein’s idea. Yes, conceiving theories indoors is sometimes the easier or more productive way to success in research. The idea is quite consistent with the ideas of the a priori or transcendentalism, rather than contrary to Kant. Economics currently suffers from two contrary illnesses. One is extreme theorization, as in Neoclassical economics, which wrongly thinks that the world can be totally theorized, leaving no space for empirical research. The other is extreme empiricism (as the author criticized), as in econometrics, which wants to “regress” or “evidence” anything while theories are forgotten. More ridiculous still, the two contrary illnesses usually infect the same paper, but superficially. The resolution should be a synthesis of them, i.e. the Algorithm Framework Theory as a theory of how a person “boundedly” thinks. Thanks! https://goingdigital2019.weaconferences.net/papers/how-could-the-cognitive-revolution-happen-to-economics-an-introduction-to-the-algorithm-framework-theory/

  2. Asad, thanks for all of these, 1-5. A bit long-winded but still lessons I consider very important for anyone working with statistics or any other quantitative techniques. It’s so easy sometimes to assume one set of numbers from one way of looking at the data is “the answer.” At the same time, however, we must also teach how to stop searching for different stories about causal patterns. We must teach plausibility. When do the stories we consider become implausible? Where is the stopping point? Otherwise our research provides no usable results. Provides no help for people.
