In the past weeks, the following graphic poster has been shared frequently on social media:
The small print is pretty small; it reads: “if England get beaten, so will she. Domestic violence increases 26% when England play, 38% if they lose”.
Tomas van Dijk, a journalist working for the Dutch newspaper De Volkskrant, contacted me this week. He’s writing a piece fact-checking these numbers and asked me for my opinion. As these numbers, 26% and 38%, are taken from a paper from 2013, he suggested looking at this year’s data as well. Furthermore, he noted that there is a known relation between temperature and domestic violence (more violence when it is too warm), so he asked if I could have a look at the role of temperature as well.
The 2013 study
The numbers mentioned on the poster are taken from this study by profs. Kirby, Francis and O’Flaherty from the University of Lancaster, published in the Journal of Research in Crime and Delinquency in 2014. Should the paper be behind a paywall, then there might be some science hub where you can download it (you might think that this is an explicit invitation to visit sci-hub.tw, I couldn’t possibly comment).
In their paper, Kirby et al. use data from the Lancashire police department, with daily data on the number of reported domestic violence incidents. They look at the data during the World Cups of 2002, 2006 and 2010. As the numbers are count data, they employ the so-called negative-binomial regression model to model the influence of whether and how England played on the number of incidents reported.
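For readers who want to see the model structure, a negative-binomial regression of this kind can be written roughly as below. This is a sketch under my own notation, not a formula copied from the paper: $Y_t$ is the number of incidents reported on day $t$, the covariate names are shorthand for 0/1 indicators, the dots stand for the remaining covariates (such as day of the week), and the variance is written in the parameterisation used by R’s glm.nb.

```latex
\begin{align*}
Y_t &\sim \mathrm{NegBin}(\mu_t, \theta), \qquad \operatorname{Var}(Y_t) = \mu_t + \frac{\mu_t^2}{\theta}, \\
\log \mu_t &= \beta_0 + \beta_1\,\mathrm{winDraw}_t + \beta_2\,\mathrm{lose}_t + \beta_3\,\mathrm{dayAfter}_t + \dots, \\
\mathrm{RR}_j &= e^{\beta_j}.
\end{align*}
```

Because the model is linear on the log scale, each exponentiated coefficient is a multiplicative factor on the expected number of incidents, which is why the results are reported as relative risks.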
I like the paper by Kirby et al. a lot. Not only is the methodology sound, they also are aware of the limitations of their work. For instance, the dataset contains only 14 match days (and only 3 days where England lost), and the violence data are from only one county in England. As they write in their abstract: “Although this is a relatively small study, it has significant ramifications due to the global nature of televised football (soccer) tournaments. If replicated, it presents significant opportunities to identify and reduce incidents of domestic abuse associated with televised soccer games.” Remembering that the paper was published in 2014, the public call for replications is impressive.
The main findings of this study are as follows:
| Variable | Relative Risk | 95% Confidence Interval |
|---|---|---|
| Match day; win/draw | 1.256 | (1.128, 1.411) |
| Match day; lose | 1.382 | (1.150, 1.661) |
| The day after | 1.107 | (1.004, 1.228) |
The interpretation of these numbers is as follows. If England plays and either wins or draws, the model predicts 25.6% more incidents than on a match-free day; if England plays and loses, the model predicts 38.2% more incidents; and the day after a match, the risk goes up by 10.7%. Furthermore, there is considerably more domestic violence at weekends than on weekdays (the risks are relative to the values on Thursdays, the least violent day of the week).
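The step from relative risk to “percent more incidents” is simple arithmetic: subtract 1 and multiply by 100. A minimal Python sketch, using only the numbers from the table above (the function name `pct_increase` is my own):

```python
# Relative risks and 95% confidence intervals as reported in the table above:
# (estimate, lower bound, upper bound).
reported = {
    "Match day; win/draw": (1.256, 1.128, 1.411),
    "Match day; lose": (1.382, 1.150, 1.661),
    "The day after": (1.107, 1.004, 1.228),
}


def pct_increase(relative_risk):
    """Percentage increase implied by a relative risk, e.g. 1.256 -> 25.6."""
    return round((relative_risk - 1.0) * 100.0, 1)


for label, (rr, low, high) in reported.items():
    print(
        f"{label}: +{pct_increase(rr)}% "
        f"(95% CI +{pct_increase(low)}% to +{pct_increase(high)}%)"
    )
```

Applying the same conversion to the interval bounds shows how wide the uncertainty is: for a lost match, anything between roughly 15% and 66% more incidents is compatible with the data.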
It is important to keep in mind that England only lost 3 matches over these three World Cups, and won/drew 11 matches, so the estimates are based on pretty small sample sizes. This is reflected by the wide confidence intervals. Please also note that the model only talks about correlations, and by no means about causation.
Including the 2018 World Cup
Thus, the main thing that Tomas wanted to know (and after he told me about his idea, I wanted to know too) is how the model would fit the data when the current World Cup is included. (It would’ve been interesting to include the 2014 World Cup too, even though England only played three matches there. However, we didn’t have the time to include those data.)
Tomas put in a Freedom of Information Act request with the Lancashire Constabulary for the recent domestic violence statistics, and within 24 hours the constabulary responded with the data from June 1st up to July 11th. Thus, the data include everything up to the day England lost the semi-final to Croatia. (They don’t include the data for Saturday’s match against Belgium, but as a match for 3rd place is a strange thing in sports anyway, we didn’t want to wait with our request until after the weekend.) The football data for the English team this tournament are as follows:
- 18 June Tunisia 1 – England 2
- 24 June England 6 – Panama 1
- 28 June England 0 – Belgium 1
- 3 July Colombia 1 – England 1* (*England won on penalties)
- 7 July Sweden 0 – England 2
- 11 July Croatia 2 – England 1
After adding the football and violence data to the dataset, I ran the same model that Kirby et al. ran. The main features of the resulting model were:
- The match day effect win/draw was still clearly present, but slightly smaller: the relative risk was 1.194, 95% CI (1.1094, 1.302).
- The match day effect after losing also became smaller: RR 1.274, 95% CI (1.110, 1.463).
- The other effects were more comparable to those found by Kirby et al.
Including temperature data
As a next step, I looked for appropriate sources for temperature data. The Centre for Environmental Data Analysis has daily climate data for every 5 km-by-5 km square in the U.K., called the UKCP09 data set. From the grid, I selected the following square, on the north side of Preston, to be representative for Lancashire.
From this data set, I took all values of the maximum daily temperature. As the 2018 data is not yet part of this data set, I used another source for that.
Adding the maximum daily temperature to the model gave the following output:
The interpretation of the 1.024 value for temperature is that for each additional degree (Celsius), the risk grows by 2.4%. Just as in the 2014 paper, the increase in violence risk is very clear for match days. However, the estimates are smaller. Most notably, the estimate for what happens when England loses has become much smaller and is now about the same as for when England wins or draws. The reason is that England just happened to lose some games on exceptionally warm days. When they lost (after penalties) to Portugal on 1 July 2006, it was 27 degrees Celsius, and on 27 June 2010, the day of the 4-1 defeat against the Germans, it was 26 degrees. (Note that these are the temperatures in Lancashire, not the temperatures at the location of the World Cup.) The average temperature is about 20 degrees, so this additional “extra violent when England loses” effect could be due to temperature effects.
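To get a feeling for how much those warm days matter, the per-degree effect can be compounded over the difference from an average day. The sketch below is my own back-of-the-envelope illustration, not part of the fitted model output: it assumes the effect multiplies per degree (as it does in a log-linear model) and uses the rough figure of 20 degrees as the baseline.

```python
TEMP_RR_PER_DEGREE = 1.024  # relative risk per extra degree Celsius (from the model)
AVERAGE_TEMP = 20           # assumption: the "about 20 degrees" average from the text


def temperature_multiplier(temp_celsius, baseline=AVERAGE_TEMP,
                           rr_per_degree=TEMP_RR_PER_DEGREE):
    """Relative risk of a day at temp_celsius compared with a day at baseline."""
    return rr_per_degree ** (temp_celsius - baseline)


# The two warm days on which England lost, with the Lancashire temperatures above.
for label, temp in [("1 July 2006 (Portugal)", 27), ("27 June 2010 (Germany)", 26)]:
    multiplier = temperature_multiplier(temp)
    print(f"{label}: {temp} degrees -> risk x {multiplier:.3f}")
```

Under these assumptions, temperature alone accounts for roughly 18% more incidents on the 27-degree day and roughly 15% more on the 26-degree day. That is the same order of magnitude as the gap between the “lose” and “win/draw” estimates in the model without temperature.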
A more parsimonious model
Another observation from the table is that there is a lot going on there. Now that the difference between winning/drawing and losing is negligible, we can make the model more parsimonious by simply measuring whether England plays or not, without looking at who wins. Furthermore, we can simplify the day-of-week effect by only looking at whether it is weekend or not: the differences between, e.g., Wednesday and Thursday are too small to be of interest.
Therefore, I reduced the model and obtained the following, more parsimonious output:
(For the stats nerds: the overdispersion parameter theta was 116.1. Note that I used R’s glm.nb function; other software would’ve estimated theta as 1/116.1. Furthermore, the AIC was 914.5, lower than the value of 919.6 for the previous model, which indicates that this parsimonious model is the better of the two.)
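As an illustration of what this simplification means for the data, the two reduced covariates can be coded as 0/1 indicators per day. The sketch below is in Python rather than the R I used for the analysis, and it rests on assumptions: the names `england_plays` and `weekend` are made up for this example, and “weekend” is taken to mean Saturday or Sunday. Only the 2018 match dates listed earlier are included.

```python
from datetime import date, timedelta

# England's 2018 match dates, taken from the list earlier in this post.
ENGLAND_MATCH_DAYS_2018 = {
    date(2018, 6, 18), date(2018, 6, 24), date(2018, 6, 28),
    date(2018, 7, 3), date(2018, 7, 7), date(2018, 7, 11),
}


def reduced_covariates(day, match_days=ENGLAND_MATCH_DAYS_2018):
    """Return the two 0/1 indicators of the reduced model for one calendar day."""
    return {
        "england_plays": int(day in match_days),
        # Assumption: "weekend" means Saturday or Sunday (weekday() is 5 or 6).
        "weekend": int(day.weekday() >= 5),
    }


# One row per day for the period covered by the 2018 data (1 June to 11 July).
start, end = date(2018, 6, 1), date(2018, 7, 11)
rows = []
day = start
while day <= end:
    rows.append({"date": day, **reduced_covariates(day)})
    day += timedelta(days=1)

print(
    sum(r["england_plays"] for r in rows), "match days and",
    sum(r["weekend"] for r in rows), "weekend days out of", len(rows),
)
```

The win/draw/lose distinction and the six separate day-of-week levels collapse into these two columns, which is where the reduction in the number of parameters comes from.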
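For the same stats nerds, here is a small sketch of what these two numbers mean. In the glm.nb parameterisation the variance is mean + mean²/theta, so a large theta means the counts are only mildly overdispersed relative to a Poisson model; software that reports alpha uses alpha = 1/theta. The mean of 50 incidents per day used below is a purely hypothetical value for illustration, not a number from the data.

```python
THETA = 116.1        # overdispersion parameter as reported by R's glm.nb
AIC_FULL = 919.6     # AIC of the previous, larger model
AIC_REDUCED = 914.5  # AIC of the parsimonious model


def alpha_from_theta(theta):
    """Convert glm.nb's theta to the alpha = 1/theta used by other software."""
    return 1.0 / theta


def nb_variance(mean, theta):
    """Negative-binomial variance in the glm.nb parameterisation."""
    return mean + mean ** 2 / theta


alpha = alpha_from_theta(THETA)
delta_aic = AIC_FULL - AIC_REDUCED

# Hypothetical daily mean, chosen only to illustrate the size of the overdispersion.
HYPOTHETICAL_MEAN = 50.0
variance_ratio = nb_variance(HYPOTHETICAL_MEAN, THETA) / HYPOTHETICAL_MEAN

print(f"alpha = 1/theta = {alpha:.4f}")
print(f"AIC difference (full minus reduced) = {delta_aic:.1f}")
print(f"variance / mean at a mean of {HYPOTHETICAL_MEAN:.0f}: {variance_ratio:.2f}")
```

A positive AIC difference favours the reduced model: it describes the data about as well while using fewer parameters.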
The main conclusions of the great paper by Kirby, Francis and O’Flaherty do replicate after taking the current World Cup into account. However, (i) the difference between the effect of winning and that of losing vanishes when you take temperature into account, and (ii) the increased risk of domestic violence on England match days seems to be somewhat smaller than the values of 26% to 38% reported by Kirby et al.: my estimate is an increased risk of 19% (with the 95% confidence interval ranging from 10% to 28%). I have no idea why the effect was stronger in 2002/2006/2010 than in 2018; that’s something the data don’t tell you.