EE 342 | Simpson's Paradox

Simpson’s Paradox

The official description of Simpson’s paradox is: “a phenomenon where associations between variables in a population emerges, disappears or reverses when the population is divided into subpopulations.”

It is best illustrated from a study of gender bias in grad school admissions to UC Berkeley in 1970s. This is a famous story, and often there is a claim of a lawsuit for gender discrimination. In reality, there never was a lawsuit, but it is a striking example nonetheless.

UC Berkeley admissions data

Here is some data I dug up from Wikipedia:

All applicants	Admitted	Male applicants	Admitted	Female applicants	Admitted
12763	41%	8442	44%	4321	35%

So here, it appears that women were admitted at a lower rate than male applicants. Yet, let us break down the data into individual departments. You can see the full dataset here, as well as a brief video describing the same. The following is a snapshot of three departments:

Department	All apps	Admitted	Male apps	Admitted	Female apps	Admitted
A	933	64%	825	62%	108	82%
B	585	63%	560	63%	25	68%
C	918	35%	325	37%	593	34%
Total	2436	53%	1710	58%	726	42 %

When you see the total numbers, (58% acceptance for men, 42% for women), there is a strong indication of gender discrimination against women. Yet, split into departments, there isn’t such a damning story. In the full UC Berkeley database, a similar story reveals itself.

You can even recreate an example where every department treated women applicants more favorably, but the overall totals indicate a higher acceptance rate for men. Can you tweak the numbers in the table above to do so?

What is happening here is that the male applicants seem to be applying to departments with higher acceptance rates, while female applicants are applying to departments with lower acceptance rates. So even if each department seems to be treating the applicants equitably, it appears from aggregation that males are being treated favorably.

Delta Variant Covid-19 cases in UK

In this dataset, you have a comparison of death rates for 286,188 infected patients from UK. When you look at the overall death rates, vaccinated patients have a higher death rate than unvaccinated death rates. But in each age category, vaccinated patients have a lower death rate than unvaccinated patients. The reason for the paradox, of course, is that the vaccinated patients in this data set come from a high risk, older age category, while the unvaccinated patients come from a lower risk, younger age category.

In philosophy

In fact, this phenomenon is not really uncommon. It just indicates that a correlation may happen without a causal effect (or the “obvious” causal effect). When you think about it, the bigger question seems to be why we think this is a paradox or unintuitive. Here is a more philosophical take on the Simpson’s Paradox.