In the last week, a Danish study on the effectiveness of face masks has gotten a lot of attention in the media. The study was *designed* so that it *should* have shown a *statistically significant* effect if *wearing* a face mask reduces the risk of COVID-19 infection by *50% or more*.
Note that there are a lot of words in italics in the previous sentence. All of these are very important to understand what the outcome of the study really means - and what it does not mean. There have been many articles and posts that explain some of the shortcomings of the study, but many of these miss some very important points. Let's have a closer look, starting with the results.
Study results: Face masks reduce PCR-confirmed infections by 100%, and doctor-confirmed infections by 50%
When looking at a scientific study, the first thing to do is to look at the data. The important results are given in Table 2 of the study:
Let us start with the last two lines of the table (we'll spend plenty of time on the first lines later!). The second-to-last line shows how many study participants had a positive PCR test for the COVID-19 virus. This is the "gold standard" for diagnosis. A positive PCR test is required to be counted as a "confirmed case" in the US and most other countries. The study showed that zero people in the "Face Mask Group" had a positive PCR test for COVID-19. In the control group that did not use face masks, there were 5 confirmed COVID-19 cases.
So, based on the "gold standard" test, the use of face mask prevented 100% of COVID-19 infections in the study!
If we go on to the last line, which shows the number of participants that have been diagnosed with COVID-19 by a health care provider, the picture changes a bit: 5 participants who wore face masks were diagnosed with COVID-19, compared to 10 participants in the "no mask" control group. This means:
Judging by the actual diagnosis from health care providers, face mask use reduced COVID-19 by about 50%.
But that's not what the study claimed, some astute readers may point out. And this brings us to the first lines in the table which we have ignored so far: they describe the results of antibody tests and the "primary composite end point", which warrants some explanation.
Reading through the paper and the 88-page supplementary material carefully, we learn that the study heavily relied on "dip stick"-type antibody tests that the participants did at home. The tests work pretty much like pregnancy tests, except that instead of peeing on the stick, you have to put a couple of drops of blood on it; and instead of a "+" sign, a positive test gives two lines, as opposed to a single line for a negative test.
The study also sent all participants two swab kits, and instructed them to take a sample and send it to a lab for PCR testing if they developed any COVID-19 symptoms. In addition, participants with symptoms were instructed to seek medical help.
The "primary composite endpoint" now takes the combination of PCR test results, antibody test results, and confirmed medical diagnoses. Any participant who is positive in any of these three results counts towards the "composite endpoint". Participants with a positive antibody result at the beginning of the study were excluded from the analysis.
Looking back at the results table, we see that the antibody results dominate the overall results. In the face mask group, the number of positive antibody results is more than 6-fold higher than the number of confirmed diagnoses. This raises an immediate red flag. One potential reason for this discrepancy is that some participants had asymptomatic infections. However, asymptomatic infections typically account for about 50% of all COVID-19 infections, and all symptomatic patients should have received a confirmation by PCR or from health care providers. Therefore, the number of positive antibody tests should only have been about 2-fold higher. This is a clear indication that the antibody results are possibly very wrong.
Scientists familiar with COVID-19 antibody tests will immediately think about false-positive test results. According to the study, the manufacturer indicated that 0.8% of tests will give a false-positive result; for about 2,500 participants in each group, that would be about 20 false positives. But let's ignore false positives for the time being, and look at a different issue: timing.
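As a quick sanity check, the expected number of false positives follows directly from the manufacturer's figure (a minimal sketch using the 0.8% rate and the approximate group size quoted above):

```python
# Expected false-positive antibody results per study group,
# based on the manufacturer's quoted false-positive rate.
false_positive_rate = 0.008      # 0.8%, as stated by the manufacturer
participants_per_group = 2500    # approximate group size in the study

expected_false_positives = false_positive_rate * participants_per_group
print(expected_false_positives)  # 20.0
```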
Timing is crucial
Yes, timing is crucial in more ways than one for this study. Let me explain.
In the study, participants did a COVID-19 antibody test at the beginning of the study, and then again at the end of the study about 30 days later. Anyone who tested negative at the start, and positive at the end, must have gotten infected during the study period, right? Wrong! Very wrong!
As plausible as the "trivial" conclusion seems, it completely ignores what we know about how long it takes to develop antibodies. A brief visit to the CDC web page about antibody testing for COVID-19 shows that "Antibodies most commonly become detectable 1–3 weeks after symptom onset". Let me illustrate this with a figure from a blog post on this topic:
In this study, it took about 10 days after the first COVID-19 symptoms before half of the antibody tests gave positive results, and about 2 weeks before close to 100% of the patients had antibodies. Another study showed slightly shorter times, but still found that it took at least 4-5 days after symptom onset before antibodies were detectable. We also know that it usually takes about 5 days after infection before the first symptoms appear, and in some cases up to 2 weeks. This means that it will take more than 10 days after infection before antibody tests are positive.
In the context of the Danish study, this means that any participant who got infected within about 10 days before the study start would test negative in the first test, but most of them would test positive in the second test.
Things get worse when we look at a second timing effect: the change in COVID-19 infections in Denmark before and during the study. This is where it gets a bit more complicated. The reported number of confirmed cases peaked on April 9, just before the study started around April 15, and then decreased quickly. But testing increased rapidly after April 19, more than tripling by April 30. One way to eliminate the effect of testing availability and changes is to calculate the actual number of infections from reported COVID-19 deaths. This is shown in the next graph:
The study was done in 2 separate groups starting 2 weeks apart, shown by the blue and red shaded areas. The graph shows a very rapid drop in daily infections from about 2,000 per day to about 500 during the first week of the study, and further drops later. This means that at the start of the first study period, there was a relatively large number of Danes that had been infected in the preceding 10 days. They had not yet developed antibodies to COVID-19 when tested at the start of the study, but would test positive at the second test a month later. This means that the test would wrongly count many infections that occurred before the study began!
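The back-calculation from deaths can be sketched as follows; the infection fatality rate (0.5%) and the 21-day lag from infection to death used here are illustrative assumptions, not values from the study:

```python
# Back-calculate daily infections from reported COVID-19 deaths:
# deaths on day d reflect infections around day d - lag, scaled by 1/IFR.
# The IFR (0.5%) and 21-day lag below are illustrative assumptions.
def infections_from_deaths(daily_deaths, ifr=0.005, lag_days=21):
    """daily_deaths: list of (day, deaths); returns (day, estimated infections)."""
    return [(day - lag_days, round(deaths / ifr)) for day, deaths in daily_deaths]

# Hypothetical example: 10 deaths reported on day 30, 5 deaths on day 40
print(infections_from_deaths([(30, 10), (40, 5)]))  # [(9, 2000), (19, 1000)]
```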
We can estimate the actual number of infections that happened in the 10 days before each of the 2 study periods, and compare it to the number of infections during the study period:
The numbers show that there were almost as many infections (22,886) in the 10 days before the study periods as there were during the 30-day study periods (24,943). Since the pre-existing infections could not be affected by face mask wearing, this created a major distortion, increasing the reported infections in the face mask group significantly.
The numbers shown above are calculated for the entire Danish population of about 5.8 million. The groups in the study were about 2,500 participants each, which we can use to calculate the expected number of cases for the face mask group and the control ("no mask") group in the study:
- For the "no mask" group, the expected number of cases is 20, consisting of about 10 cases infected in the 10 days before the study period, and 10 cases infected during the study period.
- For the face mask group, we also expect 10 cases from the 10 days before the study, but the number of cases during the study would be reduced by 50% to 5 cases, so we would expect a total of 15 cases in the face mask group.
Given the design of the study, we would expect to see just 5 fewer cases in the face mask group than in the control group even if face masks reduce the infection of wearers by 50%. The observed number of cases would be 15 in the face mask group, and 20 in the control group. The observed reduction would be smaller than the expected 50% reduction due to 2 effects:
- the large drop of infections at the start of the study period
- the fact that only antibody tests, but not PCR tests, were done at the start of the study period (PCR tests give positive results much earlier than antibody tests, often within 2-4 days after infection)
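These expected numbers can be reproduced from the national estimates above (a sketch; the 50% protection for mask wearers is the assumed effect size, not a measured one):

```python
# Scale the national infection estimates down to a study group of ~2,500
# out of ~5.8 million Danes, then combine pre-study and in-study infections.
population = 5_800_000
group_size = 2_500
pre_study_infections = 22_886   # est. infections in the ~10 days before the study
in_study_infections = 24_943    # est. infections during the study periods
mask_reduction = 0.5            # assumed 50% protection for mask wearers

scale = group_size / population
pre_study_cases = pre_study_infections * scale   # about 10 per group
in_study_cases = in_study_infections * scale     # about 11 per group

control_total = pre_study_cases + in_study_cases                      # about 20
mask_total = pre_study_cases + in_study_cases * (1 - mask_reduction)  # about 15
print(round(control_total, 1), round(mask_total, 1))  # 20.6 15.2
```

Note that the observed reduction, 1 - mask_total / control_total, comes out to only about 26% - far less than the assumed 50% protection, because the pre-study infections dilute the effect.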
Comparing actual and expected results
The calculations above show that we would expect 15-20 positives in the two groups, with just 5 cases difference between the groups. Note that we had to make some assumptions, for example about the fatality rates, and that the estimates may be off by a factor of 2 - but not much more.
In the study, the authors reported 10 cases of COVID-19 in the control group that were confirmed by health care providers, and 5 cases in the face mask group. Diagnoses are generally only made for symptomatic cases, which are typically estimated to be about 25-50% of total infections; thus, there is very good agreement between the expected and reported numbers.
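A rough consistency check, assuming the 25-50% symptomatic fraction quoted above and the expected totals of about 20 and 15 infections per group:

```python
# Expected range of diagnosed (symptomatic) cases per group,
# compared to the counts reported in the study.
expected_infections = {"control": 20, "face mask": 15}
reported_diagnosed = {"control": 10, "face mask": 5}
symptomatic_range = (0.25, 0.50)   # typical estimate of the symptomatic fraction

for group, total in expected_infections.items():
    low, high = symptomatic_range[0] * total, symptomatic_range[1] * total
    in_range = low <= reported_diagnosed[group] <= high
    print(f"{group}: expected {low}-{high} diagnoses, "
          f"reported {reported_diagnosed[group]} (consistent: {in_range})")
```

Both reported counts fall inside the expected ranges (5-10 for the control group, 3.75-7.5 for the face mask group).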
The number of positive antibody tests reported is slightly higher, between 31 and 37 for IgM and IgG. The difference between the face mask group and the control group is less pronounced than for the diagnosed cases, which is exactly what we would expect from including participants who had been infected before the start of the study but had not yet developed a detectable antibody response. The antibody numbers are roughly 2-fold higher than the expected numbers. Two issues that may have contributed to this difference are false-positive antibody results, and a higher infection rate among the study participants, who spent on average 4.5 hours outside their home each day, relative to the general population.
But while we see good agreement between the expected and the reported numbers, the agreement is only qualitative. Due to the relatively small number of cases, and the "contamination" from infections before the start of the study period, it is unlikely that the results reach the typically required levels of statistical significance. For that, the study would have had to be substantially larger, and ideally also have included a PCR test at the start of the study period.
What went wrong
When designing the study, the authors determined the number of study participants they needed based on an estimated infection rate of 2%, which was reasonable at that time. The authors were also looking for a relatively large reduction effect of 50%; to see a smaller effect, a larger study would have been necessary.
However, by the time the study started, the interventions initiated by the Danish government had taken effect, and reduced the number of daily infections by more than 2-fold, with an additional 10-fold reduction by the time the study ended. The overall calculated infection rate for the combined study periods was less than 0.5% - roughly four-fold lower than what had been expected. This resulted in case numbers that are too low to give a statistically meaningful result. To some extent, the study became a "victim" of the success that Denmark had in containing the COVID-19 pandemic in the spring.
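The fourfold shortfall follows directly from the numbers above (a sketch using the in-study infection estimate for Denmark):

```python
# Calculated infection rate during the study versus the 2% rate
# assumed when the study was sized.
in_study_infections = 24_943   # est. infections during the study periods
population = 5_800_000
planned_rate = 0.02            # 2% infection rate assumed in the power calculation

actual_rate = in_study_infections / population
print(f"{actual_rate:.2%}")                  # 0.43%
print(round(planned_rate / actual_rate, 1))  # 4.7, i.e. roughly four-fold lower
```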
A second factor that contributed to the lack of a "clear signal" from the study was that the authors apparently did not consider the lag time between infection and the onset of a detectable antibody response. In their defense, the data describing the timing of the antibody response were probably not available when the study was designed. Furthermore, the effect would have been significantly smaller if the infection numbers had still been increasing, or at least stable. Nevertheless, the lack of any discussion of this "antibody delay" effect in the publication is somewhat disappointing. With proper consideration of this effect, the data produced by the study are not only compatible with a 50% protection from wearing face masks - they actually are in agreement, even if they may not be "statistically significant" due to the factors discussed herein.