As an introduction, a summary of the history behind this blog post:
- On September 21st, Romy van der Lee and Naomi Ellemers published a paper in PNAS in which they claim to have found compelling evidence of gender bias against women in the allocation of NWO Veni-grants in the period 2010-2012.
- The day after, I published a blog post in Dutch criticising this study (and, the day after that, an abridged version in English). In these posts, I explained how the significance of the result is due to Simpson’s paradox, and is thus a statistical artefact rather than true evidence of gender bias. This blog post sparked a level of public interest that was new to me. I normally publish on linear algebra, (minor) improvements to statistical procedures and other topics that are generally regarded as boring. This time, I’ve been interviewed by Nature, Science and various Dutch academic newspapers. (Great evidence of how post-publication peer review and blog posts are Science 2.0, but that’s another topic.)
- Last week, an abridged and updated version of my blog post appeared as a peer-reviewed letter in PNAS.
- Independently, Beate Volker and Wouter Steenbeek had their letter published in PNAS a few days later.
- Van der Lee and Ellemers responded to both letters (response 1 and 2). In their responses they misinterpret the consequences of Simpson’s paradox. I wasn’t planning on responding again, as my time is limited, but since they repeat this incorrect interpretation in multiple responses as well as in the newspaper, I find it important to outline why their statistical reasoning is flawed.
In this blog post I will show that a correct interpretation of Simpson’s paradox renders many of the p-values insignificant, not just the one I focussed on in my criticism. In their response to my letter, Van der Lee and Ellemers wrote:
“Further, Simpson’s paradox cannot explain that fewer women than men are selected for the next phase in each step of the review procedure”.
In their response to Volker and Steenbeek, they phrased this as:
“Simpson’s paradox also cannot account for the observation that in every step of the review procedure women are less likely than men to be prioritized.”
With this response, they refer to Figure 1:
It is clear from this figure that the gender bias seems to increase in each step of the process. It is true that I, in my letter, focussed on gender bias in the final step – the number of awarded grants. This, however, was due to the word count limit that PNAS imposes and not because the other steps cannot be explained by Simpson’s paradox as well: they can.
It is easier to show this through a constructed example than through the true NWO data. Suppose that the setting is as follows. The funding agency has two research disciplines, A and B. Both receive 100 applications, and in three stages (pre-selection, interviews, awards) it is decided who gets funded. Gender bias is present in neither field A nor field B: gender plays no role in this example. However, the percentage of applications by women differs per field, and so does the number of applications that receive funding.
Field A receives 100 applications: 75 by men and 25 by women. In the end, 40 applications will be funded. So 60 applicants receive bad news, which is spread equally over the three steps: in each step, 20 scientists will be disappointed. In the total absence of gender bias (and of chance fluctuations), this leads to the following table:
| Field A | # M | # F | % M | % F |
|---|---|---|---|---|
| Step 0: Applications | 75 | 25 | 75% | 25% |
| Step 1: Pre-selection | 60 | 20 | 75% | 25% |
| Step 2: Interviews | 45 | 15 | 75% | 25% |
| Step 3: Funding | 30 | 10 | 75% | 25% |
As you can see, in each step the gender ratio is 75%-25%. No gender bias at all.
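The Field A table can be reproduced with a few lines of Python. This is only a minimal sketch of the reasoning above: the helper name `unbiased_stages` and its parameters are my own, and "no gender bias" is modelled as removing applicants in exact proportion to the current gender split.

```python
def unbiased_stages(men, women, dropped_per_step, steps=3):
    """Return [(men, women), ...] for step 0..steps.

    Hypothetical helper: in every step `dropped_per_step` applicants are
    removed in proportion to the current gender split, i.e. without bias.
    """
    rows = [(men, women)]
    for _ in range(steps):
        total = men + women
        remaining = total - dropped_per_step
        # Both groups shrink by the same factor, so the ratio is preserved.
        men, women = men * remaining / total, women * remaining / total
        rows.append((men, women))
    return rows


field_a = unbiased_stages(men=75, women=25, dropped_per_step=20)
for step, (m, f) in enumerate(field_a):
    print(f"Step {step}: M={m:.0f} F={f:.0f} share of women={f / (m + f):.1%}")
```

Every step prints a 25.0% share of women, matching the table.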
Field B also receives 100 applications: 50 by men and 50 by women. Out of these 100, only 10 will be funded: in each step 30 applications lose out. This leads to the following table:
| Field B | # M | # F | % M | % F |
|---|---|---|---|---|
| Step 0: Applications | 50 | 50 | 50% | 50% |
| Step 1: Pre-selection | 35 | 35 | 50% | 50% |
| Step 2: Interviews | 20 | 20 | 50% | 50% |
| Step 3: Funding | 5 | 5 | 50% | 50% |
Thus also no gender bias in Field B. If we combine the tables for fields A and B (by simply adding up the frequencies for each cell), we obtain:
| Field A + B combined | # M | # F | % M | % F |
|---|---|---|---|---|
| Step 0: Applications | 125 | 75 | 62.5% | 37.5% |
| Step 1: Pre-selection | 95 | 55 | 63.3% | 36.7% |
| Step 2: Interviews | 65 | 35 | 65.0% | 35.0% |
| Step 3: Funding | 35 | 15 | 70.0% | 30.0% |
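For readers who want to check the arithmetic, the sketch below recomputes the combined table from the two per-field tables. The counts are copied from the tables above; the function names are only illustrative. It also prints the overall success rates (funded divided by applied), which show the paradox most directly.

```python
# (men, women) per step, copied from the Field A and Field B tables.
field_a = [(75, 25), (60, 20), (45, 15), (30, 10)]
field_b = [(50, 50), (35, 35), (20, 20), (5, 5)]


def share_of_women(rows):
    """Percentage of women among the remaining applicants, per step."""
    return [round(100 * f / (m + f), 1) for m, f in rows]


def success_rates(rows):
    """(men, women) success rate: funded in the last step / applied in step 0."""
    (m0, f0), (m_end, f_end) = rows[0], rows[-1]
    return m_end / m0, f_end / f0


# Combine the fields by adding up the frequencies cell by cell.
combined = [(ma + mb, fa + fb) for (ma, fa), (mb, fb) in zip(field_a, field_b)]

for name, rows in [("A", field_a), ("B", field_b), ("A + B", combined)]:
    men_rate, women_rate = success_rates(rows)
    print(f"Field {name}: share of women per step {share_of_women(rows)}, "
          f"success rate M={men_rate:.0%} F={women_rate:.0%}")
```

Within each field, men and women have the same success rate (40% in A, 10% in B). In the combined data it is 28% for men against 20% for women, and the share of women falls from 37.5% to 30.0% across the steps.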
Converting these percentages into a graph similar to Van der Lee and Ellemers’ Figure 1 provides:
The pattern in the table and figure is very clear: in each step of the process men seem to be favoured at the cost of women. Although the percentages in this example obviously differ from those in the NWO data, the type of pattern is the same. Since in my example there is no gender bias whatsoever, Van der Lee and Ellemers’ claim that “Simpson’s paradox also cannot account for the observation that in every step of the review procedure women are less likely than men to be prioritized” is evidently false. The power of paradoxes should not be underestimated.
As a final note: as outlined above, the significant results claimed by Van der Lee and Ellemers are lost once correct statistical reasoning is applied. It is important, though, to realise that the absence of significant gender bias does not imply that there is no gender bias. There could be, and it is important to find out whether – and where! – this is the case. To conclude, I quote Volker and Steenbeek, who write:
More in-depth analyses with statistical techniques that overcome the above-mentioned issues are needed before jumping to conclusions about gender inequality in grant awards.
5 thoughts on “One more time: NWO, Gender Bias and Simpson’s Paradox”
The truth of this can also be seen from first principles.
If Simpson’s paradox is in effect at T3 (“awards time”), why should it not be in effect at any earlier time point (T0=applications, T1=pre-selection, T2=interviews)? After all, there is nothing special about “making an award” versus “selecting people for an interview” that would change things.
However, this only works (and your simulation only gets the same relative slope as the NWO data) when the field that has a relatively higher percentage of female applicants is the one that has the highest rejection rate *at each stage*. For example, if the second field accepts almost everyone for an interview, but doesn’t actually end up handing out many grants, it might look as if women are being invited to interviews in a non-discriminatory way (or, if you want to be very political about it, as if the field is trying extra-hard to *appear* to treat women fairly), but that discrimination then sets in after the interviews.
Indeed, in my example the drop-out in each of the three steps was the same. I chose this for simplicity and because it’s not too far from current practice, but I could have emphasised better that this is a choice. If you would invite *all* applicants to an interview, then obviously Simpson’s paradox wouldn’t appear in that step; and if you would invite *nearly all*, it would appear to a much lesser extent.
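To make this concrete, here is a small sketch that compares the equal drop-out from the post with a variant in which field B lets everyone through the first step. The second schedule (100, 100, 40, 10) is a made-up illustration and not taken from the NWO data or the post; the function name is also hypothetical. Selection within each field is assumed to be gender-neutral, so the share of women within a field stays fixed.

```python
def women_share_by_step(fields):
    """Combined percentage of women per step.

    `fields` is a list of (share_of_women, remaining), where `remaining`
    lists the number of applicants left in that field at step 0, 1, ...
    Within a field the share of women is constant (no gender bias).
    """
    n_steps = len(fields[0][1])
    shares = []
    for step in range(n_steps):
        women = sum(p * remaining[step] for p, remaining in fields)
        total = sum(remaining[step] for _, remaining in fields)
        shares.append(round(100 * women / total, 1))
    return shares


# Schedule from the post: A drops 20 per step, B drops 30 per step.
equal_dropout = [(0.25, [100, 80, 60, 40]), (0.50, [100, 70, 40, 10])]
# Hypothetical variant: B invites everyone to step 1, then rejects heavily.
late_dropout = [(0.25, [100, 80, 60, 40]), (0.50, [100, 100, 40, 10])]

print(women_share_by_step(equal_dropout))  # [37.5, 36.7, 35.0, 30.0]
print(women_share_by_step(late_dropout))   # [37.5, 38.9, 35.0, 30.0]
```

In the variant the combined share of women even rises after the first step (because only field A rejects anyone there) and then falls in the later steps, although neither field distinguishes by gender at any point.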