# 2-Simpson’s Paradox

In the previous post (1-Simpson’s Paradox), we constructed an example where both the Humanities and Engineering Departments favor women, giving them substantially higher admission percentages than men. However, overall admit ratios for the university favor men, who receive a higher admission percentage than women. This reversal is called Simpson’s Paradox. Once the causal basis of the paradox is understood, it becomes very simple; however, standard econometric and statistical analysis completely ignores causality, and continues to be puzzled. In a nutshell, the explanation is as follows. There are TWO causal factors which influence admit rates. One is gender – being a woman is a plus. The second is department – applying to Engineering is a plus. When women apply more to Humanities and men apply more to Engineering, these two factors work at cross purposes, and the adverse effect of Humanities can overwhelm the favorable effect of being a woman. However, causal structures can be more complex, and call for different types of analyses, as we now proceed to discuss.

CONTINUING from previous post (1-Simpson’s Paradox) —

Berk’s idea that apparent discrimination against women (displayed in the lower overall admit ratio for women) is due to women choosing to apply to the more difficult department (Humanities) is only a conjecture. Many other possible causal factors could change this conclusion. For example, suppose that admissions are gender-blind, and based solely on grade point averages. Suppose that in Engineering the admit percentages are (A:90%, B:70%, C:50%) while in Humanities the admit percentages are (A:50%, B:30%, C:10%). In this case, if the female applicant pool is divided into 60% A’s, 30% B’s and 10% C’s, while the male applicant pool has a significantly lower grade profile of 10% A’s, 30% B’s, and 60% C’s, then exactly the same admissions patterns would be observed, but there would be no discrimination by gender. Then the causal question would become – why do better male candidates not apply to Berkeley, while better female candidates do? The answer could be (for example) that there is a better all-male university which is preferred by males but not available to females. These are the “unobserved, real, structural” factors which are not present in the numbers being analyzed.
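The grade-based scenario can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary and function names are made up for this example, and it assumes that each gender's grade mix is the same among applicants to both departments, which the text does not state explicitly.

```python
# Gender-blind admit probabilities by grade, per department (from the text).
ADMIT_BY_GRADE = {
    "Engineering": {"A": 0.90, "B": 0.70, "C": 0.50},
    "Humanities": {"A": 0.50, "B": 0.30, "C": 0.10},
}

# Grade mix of each applicant pool (from the text). Assumption: the same mix
# applies to both departments within a gender.
GRADE_MIX = {
    "Female": {"A": 0.60, "B": 0.30, "C": 0.10},
    "Male": {"A": 0.10, "B": 0.30, "C": 0.60},
}


def admit_rate(gender, department):
    """Average the gender-blind grade-level rates over a pool's grade mix."""
    mix = GRADE_MIX[gender]
    rates = ADMIT_BY_GRADE[department]
    return sum(mix[grade] * rates[grade] for grade in mix)


for gender in GRADE_MIX:
    for department in ADMIT_BY_GRADE:
        rate = admit_rate(gender, department)
        print(f"{gender:6s} {department:11s} {rate:.0%}")
```

Under that assumption, the department-level rates come out to 80% and 40% for females and 60% and 20% for males, even though no admission decision in this sketch looks at gender.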

The main point here is that the numbers, by themselves, do not contain causal information. However, understanding the meaning of the numbers requires understanding the causal factors. Econometric methods currently in use are deeply flawed because they provide us with no way of inputting relevant causal information and deriving results which vary according to the causal sequencing. There is an implicit assumption in econometrics that causality does not matter, and Simpson’s Paradox shows that this assumption is wrong. Causality is of essential importance in understanding numbers. For the above data set, the causal sequence hypothesized by Berk can be graphed as follows.

Gender affects choice of department, and ALSO affects the admit ratio. The Department chosen also has an effect on the admit ratio. In this causal diagram, if we want to study the effect of department choice on the admit ratio, then Gender is a confounding variable. It affects both choice of department and the admit ratio. The SOLUTION to the problem of confounding here lies in CONDITIONING on the confounder. That is, hold gender constant, and do calculations separately for females and for males. With this conditioning, we can correctly calculate the effects of department choice on admit rates, for each gender separately.

If the goal is to calculate the effect of Gender on Admit_Ratio, then Department is NOT a confounding variable. Rather, Gender acts on Admit_Ratio in TWO DIFFERENT ways. One is a direct effect, and the second is an indirect effect through choice of department. Both the direct and indirect effects must be taken into consideration, and we CANNOT condition on department. If John and Jane apply to Berkeley, and we do not know which department they have applied to, then we should use the overall rates of 56% and 44%, respectively, for their chances of admission. This can be seen from the following tree diagrams, which trace the probabilities of the various outcomes and paths. Gender affects both choice of department and admit ratios, so the decision tree diagram is drawn separately for males and females. For females, the decision tree looks like this:

By tracing the branches, we can calculate the probabilities of all the outcomes. Only 10% of Females choose Engineering, but those who do have an 80% chance of admission. The 90% of Females who apply to Humanities have a 40% chance of admission. Overall, Female admission probability is 90% x 40% + 10% x 80% = 36% + 8% = 44%.

Because Gender affects both choice of department AND admit probabilities, we must condition on gender and draw separate diagrams for males and females. The same diagram for males looks like the following:

Males have a 90% chance of applying to Engineering (or 90% of males choose to apply to Engineering), and a 60% chance of admission in this department. For the 10% of Males who apply to Humanities, the chance of admission is only 20%. Note that BOTH departmental admission probabilities are LOWER than those for females. However, overall Male admission probability is 10% x 20% + 90% x 60% = 2% + 54% = 56%, which is higher than the 44% admission probability of Females. This is Simpson’s Paradox, where it seems that all departments within the university favor females, but the university as a whole favors males. The paradox arises because gender affects admissions through two channels – one is the direct channel of the admit rate, which is better for females, and the other is the indirect channel of choice of department, which in turn affects the admit rate.
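The two tree calculations can be reproduced in a short script. This is a minimal sketch using the probabilities quoted above; the `TREE` layout and the function name are illustrative choices, not part of the original analysis.

```python
# Each branch holds (P(apply to department | gender),
#                   P(admit | gender, department)), as quoted in the post.
TREE = {
    "Female": {"Engineering": (0.10, 0.80), "Humanities": (0.90, 0.40)},
    "Male": {"Engineering": (0.90, 0.60), "Humanities": (0.10, 0.20)},
}


def overall_admit_probability(gender):
    """Sum P(department | gender) * P(admit | gender, department) over branches."""
    return sum(p_apply * p_admit for p_apply, p_admit in TREE[gender].values())


for gender, branches in TREE.items():
    for department, (p_apply, p_admit) in branches.items():
        path = p_apply * p_admit  # probability of applying here AND being admitted
        print(f"{gender:6s} {department:11s} {p_apply:.0%} x {p_admit:.0%} = {path:.0%}")
    print(f"{gender:6s} overall: {overall_admit_probability(gender):.0%}")
```

Each overall figure is a weighted average of the departmental rates, with the weights given by that gender's choice of department. Because the weights differ so sharply between the two trees, the ordering of the overall rates (44% for females, 56% for males) can be the reverse of the ordering within every department.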

EXERCISE: To understand this better, modify the numbers so that admissions are gender-blind in both departments, Humanities and Engineering. Men and Women are admitted in exactly the same proportions. However, make Humanities tougher, with lower admit ratios, and Engineering easier, with higher admit ratios. Show that if a greater percentage of women apply to Humanities while a greater percentage of men apply to Engineering, we will see apparent discrimination against women in the overall admit ratios: women will have a lower overall admit ratio, while men will have a higher overall admit ratio.
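One possible set of numbers for the exercise is sketched below. All of the figures are hypothetical, chosen only to satisfy the conditions of the exercise (identical admit rates for both genders within each department, Humanities tougher than Engineering); many other choices would work equally well.

```python
# Hypothetical gender-blind admit rates: identical for women and men.
ADMIT_RATE = {"Humanities": 0.30, "Engineering": 0.70}

# Hypothetical application shares: women lean toward Humanities,
# men lean toward Engineering.
APPLY_SHARE = {
    "Women": {"Humanities": 0.90, "Engineering": 0.10},
    "Men": {"Humanities": 0.10, "Engineering": 0.90},
}


def overall_rate(group):
    """Overall admit rate: departmental rates weighted by application shares."""
    return sum(share * ADMIT_RATE[dept] for dept, share in APPLY_SHARE[group].items())


for group in APPLY_SHARE:
    print(f"{group:5s} overall admit rate: {overall_rate(group):.0%}")
```

With these made-up numbers, women end up with an overall rate of 34% and men with 66%, although neither department treats the two groups differently. The entire gap comes from the choice of department.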

In the causal diagram of Berk, Gender is a confounding factor which impacts both Department Choice and Admission Rates. If we want to study Admission Rates by Department, then we must condition on Gender, holding it constant. However, in this same diagram, Department is NOT a confounding variable when it comes to assessing the impact of gender on admission rates. This is because choice of department is PART of the gender effect. Women CHOOSE the more difficult departments, so the choice cannot be separated from the gender. This point is a bit subtle and complex, and failure to understand it creates massive confusion in discussions of confounding. Deeper understanding can be developed by studying a variety of causal structures which can generate the same patterns of admissions, and noting how they lead to entirely different conclusions about the cause of, and the remedies for, discrimination. We proceed to analyze different causal structures for this pattern of admissions. The DEEPER point we are pursuing here is the NEED to look beneath the surface of the data. The causal structures are NOT PRESENT in the observable statistics, and yet these unobserved real structures MUST be taken into account for a sound data analysis.
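The role of Gender as a confounder for the effect of Department can be illustrated numerically. The sketch below uses the application shares and admit rates from the example, and ASSUMES equal numbers of male and female applicants (1000 each, a hypothetical figure that the post does not give). It compares the Engineering-versus-Humanities gap when gender is ignored with the gap when gender is held constant.

```python
# Hypothetical applicant counts (assumption: equal pools, not given in the post).
APPLICANTS = {"Female": 1000, "Male": 1000}

# Application shares and admit rates from the post's example.
APPLY_SHARE = {
    "Female": {"Engineering": 0.10, "Humanities": 0.90},
    "Male": {"Engineering": 0.90, "Humanities": 0.10},
}
ADMIT_RATE = {
    "Female": {"Engineering": 0.80, "Humanities": 0.40},
    "Male": {"Engineering": 0.60, "Humanities": 0.20},
}


def pooled_rate(department):
    """Admit rate in a department when gender is ignored."""
    applied = sum(APPLICANTS[g] * APPLY_SHARE[g][department] for g in APPLICANTS)
    admitted = sum(
        APPLICANTS[g] * APPLY_SHARE[g][department] * ADMIT_RATE[g][department]
        for g in APPLICANTS
    )
    return admitted / applied


def conditional_gap(gender):
    """Engineering minus Humanities admit rate, holding gender fixed."""
    return ADMIT_RATE[gender]["Engineering"] - ADMIT_RATE[gender]["Humanities"]


pooled_gap = pooled_rate("Engineering") - pooled_rate("Humanities")
print(f"Gap ignoring gender: {pooled_gap:.2f}")
for gender in APPLICANTS:
    print(f"Gap for {gender}: {conditional_gap(gender):.2f}")
```

Under the equal-pool assumption, the pooled gap between departments is 0.24, while the gap within each gender is 0.40. Ignoring the confounder understates the advantage of applying to Engineering, because the Engineering pool is dominated by males, who have the lower admit rates in both departments.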

NEXT POST: Alternative Causal Structures for Admissions (3-Simpson’s Paradox)