Thursday, November 26, 2020

Why COVID-19 Is So Hard To Fight

In this post, I will use one graph to explain why COVID-19 is so hard to understand, and therefore to fight. It shows the daily confirmed COVID-19 cases in the US, and the 5-day change in daily cases, since the end of April:


The blue curve at the bottom shows daily confirmed COVID-19 cases in the US. It shows an initial drop to about 20,000 cases, then the "summer rise" to about 70,000 cases per day in July, another drop to about 40,000 in September, and then a rise to 170,000 cases per day in November.

The red curve, which uses the y-axis on the right, shows an indication of the growth (or drop) in cases: the number of cases on a given day divided by the number of cases 5 days earlier. When this ratio is below 1 (in the green section), case numbers are going down; when it is above 1 (in the light red section), the number of cases is increasing.

The numbers are based on 5-day periods because that is roughly the average time between getting infected and passing the infection on to someone else. In scientific jargon, this is often called the "generation interval" or the "serial interval". One way to understand it is to remember that it typically takes about 5 days after infection for symptoms to start, and that the chance of infecting others is largest just before and just after the first symptoms.

This means that the 5-day ratio is also very close to the "reproductive number", often called R. In practical terms, R indicates how many others, on average, each infected person infects. If each person infects more than one other person, the number of new infections per day grows; if each person infects less than one other person, the number of daily infections goes down. This can be easily seen in the graph, where the red curve is in the red area whenever the number of daily cases (the blue curve) goes up.
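To make this concrete, here is a small Python sketch of how such a 5-day ratio could be computed from a series of daily case counts. The case numbers in it are made up purely for illustration, and a real analysis would first smooth the data (for example with a 7-day average) to remove weekday reporting effects.

# Compute the 5-day case ratio, a rough proxy for the reproductive
# number R, from daily confirmed cases. The numbers below are
# invented for illustration only.
daily_cases = [40000, 42000, 44000, 47000, 50000,
               53000, 56000, 60000, 64000, 68000]

GENERATION_INTERVAL = 5  # days, roughly the average for COVID-19

for day in range(GENERATION_INTERVAL, len(daily_cases)):
    ratio = daily_cases[day] / daily_cases[day - GENERATION_INTERVAL]
    trend = "growing" if ratio > 1 else "shrinking"
    print(f"Day {day}: 5-day ratio = {ratio:.2f} ({trend})")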

Now take a close look at the values we get for R. In May and August, when case numbers were dropping, R was between 0.85 and 1. In the summer and fall periods when case numbers were increasing, R was above 1, but never higher than 1.3. Typical values were around 0.9 in "dropping" periods and around 1.2 in "rising" periods. The difference is quite small - and therein lies the problem! To understand why, we need to look at this from two angles: the "personal risk" perspective, and the "public health" perspective.

Personal risk: A small increase means very little

When deciding what to do in a public health crisis, the first question most people will ask is "What is the risk to me?" Depending on the answer, and on their personal tolerance for risk, they may be more or less inclined to change their behavior. But regardless of the exact answer, everyone will have to accept a certain level of personal risk in the end.

After a few weeks or months of "being good" and, for example, staying away from restaurants and bars, the desire to go back to normal becomes stronger and stronger, and we start doing things again that are slightly more risky. That might be going to restaurants again; meeting with friends; going shopping; not wearing that face mask; or something else. But we'll generally decide that a bit more risk has to be taken. If we are young or healthy, we may well conclude that a slight increase in risk still means a very low risk of getting seriously sick. Unless you're a statistician, you probably won't quantify the risk, but just about anybody would agree that a relative risk increase from 0.9 to 1.2 is so small that it's worth taking, if it means we can go back to the gym, the hairdresser, shopping, restaurants, or whatever strikes our fancy. If my personal risk was small to begin with, then even a 2-fold or higher increase in risk may well be worth it.

On a personal level, taking a bit more risk is a perfectly reasonable decision. This is also true if we include others in our risk assessment - kids we send to school, other family members, or friends we meet.

Public health: Small risk increases have disastrous consequences

But what happens if everyone decides that taking a bit more risk is perfectly reasonable, and changes their behavior a bit? Say, for example, in a way that increases the risk of getting COVID-19 by just one third. What happens?

Let's assume we were in a period where new infections were dropping by 10% every 5 days, corresponding to R = 0.9. With 1/3 more infections now, R increases to 1.2: instead of a steady drop, we now have a rapid rise in new infections: 20% more daily infections after 5 days, and 44% more daily infections after 10 days (1.2 x 1.2). After a month of R staying at 1.2, the number of daily infections has grown 3-fold: just about what we saw in the US from October to November. A very small change on the individual level has caused a huge increase on the population level.
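To make the arithmetic explicit, here is a short Python sketch that compares R = 0.9 and R = 1.2 over six 5-day generations (about a month), starting from the same illustrative baseline:

# Daily infections after each 5-day generation for R = 0.9 vs. R = 1.2,
# starting from an illustrative baseline of 50,000 infections per day.
baseline = 50_000
for r in (0.9, 1.2):
    cases = baseline
    for generation in range(1, 7):       # 6 generations = 30 days
        cases *= r
        print(f"R = {r}: day {5 * generation:2d}, {cases:,.0f} daily infections")

With R = 0.9, daily infections fall to about 27,000 after 30 days; with R = 1.2, they grow to about 150,000 - the roughly 3-fold increase described above.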

What is a perfectly reasonable decision on a personal level becomes a public health disaster.

Small things are "driving the pandemic"

Currently, the US is just one of many countries that are failing to control the resurgence of COVID-19 infections. A common theme is that many regions try to contain COVID-19 with a minimal set of measures, for example limited restaurant hours instead of full closures. An often-heard argument against many measures is that "X is not driving the pandemic". Various regions have used this argument to leave schools and colleges open, keep restaurants operating with minimal or no restrictions, and so on.

Taken literally, the arguments are correct insofar as each individual "infection place" like schools or restaurants is not causing the majority of new infections. But even measures that eliminate just a small percentage of new infections can make a huge difference, and a few in combination can make the difference between a controlled epidemic with dropping infection numbers and a rapidly growing, out-of-control epidemic. Therefore, relaxing a few such "minor impact" measures may well end up "driving" the epidemic from a "dropping" phase into a "rapid growth" phase. This problem is only made worse by halfhearted interventions, which push R only just below 1.0: case numbers then drop only very slowly, and rapid growth resumes quickly after any relaxation.
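To illustrate how several "minor impact" measures combine, here is a small sketch. The individual reduction percentages are invented for illustration - they are not taken from any study - but they show how a handful of small effects can multiply to move R from rapid growth to a slow decline:

# How several small, hypothetical transmission reductions combine.
# The percentages are invented for illustration only.
r = 1.2  # starting point: rapid growth
measures = {
    "limit restaurant hours": 0.05,
    "mask mandate in shops": 0.10,
    "move college classes online": 0.08,
    "cap private gatherings": 0.07,
}
for name, reduction in measures.items():
    r *= (1 - reduction)
    print(f"after '{name}': R = {r:.2f}")
# Together, the four measures push R from 1.2 to about 0.88.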

Over the past six months, I have read several hundred scientific publications about COVID-19. Of all these, one of the publications that stuck in my mind the most was published by scientists from New Zealand. Apparently, it formed the basis of New Zealand's successful elimination of COVID-19 in the country. It listed a large number of interventions that were to be used in groups, depending on the current level of infections:

We can only hope that a similar rational approach will be used to control COVID-19 in the US and Europe over the next several months, until vaccines become widely available. Otherwise, we will see hundreds of thousands of additional avoidable COVID-19 deaths.


Monday, November 23, 2020

A Close Look At The Danish Face Mask Study

In the last week, a Danish research study that tried to measure the effectiveness of face masks has gotten a lot of attention in the media. The study had been designed so that it should have shown a statistically significant effect if wearing a face mask reduces the risk of COVID-19 infection by 50% or more.

Note that there are a lot of words in italics in the previous sentence. All of these are very important to understand what the outcome of the study really means - and what it does not mean. There have been many articles and posts that explain some of the shortcomings of the study, but many of these miss some very important points. Let's have a closer look, starting with the results.

Study results: Face masks reduce PCR-confirmed infections by 100%, and doctor-confirmed infections by 50%

When looking at a scientific study, the first thing to do is to look at the data. The important results are given in Table 2 of the study:

Let us start with the last two lines of the table (we'll spend plenty of time on the first lines later!). The second-to-last line shows how many study participants had a positive PCR test for the COVID-19 virus. This is the "gold standard" for diagnosis. A positive PCR test is required to be counted as a "confirmed case" in the US and most other countries. The study showed that zero people in the "Face Mask Group" had a positive PCR test for COVID-19. In the control group that did not use face masks, there were 5 confirmed COVID-19 cases.

So, based on the "gold standard" test, the use of face masks prevented 100% of COVID-19 infections in the study!

If we go on to the last line, which shows the number of participants who were diagnosed with COVID-19 by a health care provider, the picture changes a bit: 5 participants who wore face masks were diagnosed with COVID-19, compared to 10 participants in the "no mask" control group. This means:

Judging by the actual diagnosis from health care providers, face mask use reduced COVID-19 by about 50%.

But that's not what the study claimed, some astute readers may point out. And this brings us to the first lines in the table, which we have ignored so far: they describe the results of antibody tests and the "primary composite endpoint", which warrants some explanation.

Reading through the paper and the 88-page supplementary material carefully, we learn that the study heavily relied on "dip stick"-type antibody tests that the participants did at home. The tests work pretty much like pregnancy tests, except that instead of peeing on the stick, you have to put a couple of drops of blood on the stick; and instead of a "+" sign, a positive test gives two lines, as opposed to a single line for a negative test.

The study also sent all participants two swab kits for PCR testing, and instructed them to use a kit and send the sample to a lab for PCR testing if they should develop any COVID-19 symptoms. In addition, participants with symptoms were instructed to seek medical help.

The "primary composite endpoint" now takes the combination of PCR test results, antibody test results, and confirmed medical diagnoses. Any participant who is positive in any of these three results counts towards the "composite endpoint". Participants with a positive antibody result at the beginning of the study were excluded from the analysis.

Looking back at the results table, we see that the antibody results dominate the overall results. In the face mask group, the number of positive antibody results is more than 6-fold higher than the number of confirmed diagnoses. This raises an immediate red flag. One potential reason for this discrepancy is that some participants had asymptomatic infections. However, asymptomatic infections typically account for about 50% of all COVID-19 infections, and all symptomatic participants should have received a confirmation by PCR or from health care providers. Therefore, the number of positive antibody tests should only have been about 2-fold higher than the number of diagnoses. This is a strong indication that the antibody results may be seriously off.

Scientists familiar with COVID-19 antibody tests will immediately think about false-positive test results. According to the study, the manufacturer indicated that 0.8% of tests will give a false-positive result; for about 2,500 participants in each group, that would be about 20 false positives. But let's ignore false positives for the time being, and look at a different issue: timing.

Timing is crucial. Timing is crucial.

Yes, timing is crucial in more ways than one for this study. Let me explain.

In the study, participants did a COVID-19 antibody test at the beginning of the study, and then again at the end of the study about 30 days later. Anyone who tested negative at the start, and positive at the end, must have gotten infected during the study period, right? Wrong! Very wrong!

As plausible as the "trivial" conclusion seems, it completely ignores what we know about how long it takes to develop antibodies. A brief visit to the CDC web page about antibody testing for COVID-19 shows that "Antibodies most commonly become detectable 1–3 weeks after symptom onset".  Let me illustrate this with a figure from a blog post on this topic:

In the study shown in the figure, it took about 10 days after the first COVID-19 symptoms before half of the antibody tests gave positive results, and about 2 weeks before close to 100% of the patients had antibodies. Another study showed slightly shorter times, but also showed that it took more than 4-5 days after symptom onset before antibodies were detectable. We also know that it usually takes about 5 days from infection before the first symptoms appear, and in some cases up to 2 weeks. Taken together, this means that it will take more than 10 days after infection before antibody tests turn positive.

In the context of the Danish study, this means that any participants who got infected within about 10 days before the study start gave a negative result in the first test, but most of them had a positive result in the second test. 

Things get worse when we look at a second timing effect: the change in COVID-19 infections in Denmark before and during the study. This is where it gets a bit more complicated. The reported number of confirmed cases peaked on April 9, just before the study started around April 15, and then decreased quickly. But testing increased rapidly after April 19, more than tripling by April 30. One way to eliminate the effect of testing availability and changes is to calculate the actual number of infections from reported COVID-19 deaths. This is shown in the next graph:

The study was done in 2 separate groups starting 2 weeks apart, shown by the blue and red shaded areas. The graph shows a very rapid drop in daily infections from about 2,000 per day to about 500 during the first week of the study, and further drops later. This means that at the start of the first study period, there was a relatively large number of Danes who had been infected in the preceding 10 days. They had not yet developed antibodies to COVID-19 when tested at the start of the study, but would test positive at the second test a month later. In other words, the test would wrongly count many infections that occurred before the study began!
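For readers who want to reproduce this kind of back-calculation, here is a rough Python sketch. The infection fatality rate and the infection-to-death delay in it are assumptions I picked for illustration; the graph above may be based on somewhat different values.

# Rough back-calculation of daily infections from reported deaths:
# shift deaths back by an assumed infection-to-death delay and divide
# by an assumed infection fatality rate (IFR). Both values are
# assumptions for illustration.
ASSUMED_IFR = 0.007       # ~0.7% of infections are fatal
ASSUMED_DELAY = 21        # ~3 weeks from infection to death

def estimate_infections(daily_deaths):
    """Return (day of infection, estimated infections) pairs."""
    estimates = []
    for day, deaths in enumerate(daily_deaths):
        infection_day = day - ASSUMED_DELAY
        if infection_day >= 0:
            estimates.append((infection_day, deaths / ASSUMED_IFR))
    return estimates

# Example with made-up death counts:
for day, infections in estimate_infections([14] * 25 + [4] * 10)[:3]:
    print(f"day {day}: roughly {infections:,.0f} new infections")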

We can estimate the actual number of infections that happened in the 10 days before each of the 2 study periods, and compare it to the number of infections during the study period:


The numbers show that there were almost as many infections (22,886) in the 10 days before the study periods as there were during the 30-day study periods (24,943). Since these pre-existing infections could not be affected by face mask wearing, they created a major distortion: they add cases to both groups and thereby dilute the apparent effect of face masks.

The numbers shown above are calculated for the entire Danish population of about 5.8 million. The groups in the study had about 2,500 participants each, which we can use to calculate the expected number of cases for the face mask group and the control ("no mask") group in the study (a rough version of this calculation is sketched in code after the lists below):

  • For the "no mask" group, the expected number of cases is 20, consisting of about 10 cases infected in the 10 days before the study period, and 10 cases infected during the study period.
  • For the face mask group, we also expect 10 cases from the 10 days before the study, but the number of cases during the study would be reduced by 50% to 5 cases, so we would expect a total of 15 cases in the face mask group.

Given the design of the study, we would expect to see just 5 fewer cases in the face mask group than in the control group, even if face masks reduce the infection risk of wearers by 50%. The observed number of cases would be 15 in the face mask group, and 20 in the control group. The observed reduction would be smaller than the expected 50% reduction due to 2 effects:

  • the large drop in infections at the start of the study period
  • the fact that only antibody tests, but not PCR tests, were done at the start of the study period (PCR tests give positive results much earlier than antibody tests, often within 2-4 days after infection)
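The following Python sketch walks through the expected-case calculation described above. It scales the nationwide infection estimates down to a study group of about 2,500 participants and applies an assumed 50% protective effect of masks, which by construction can only act on infections acquired during the study itself:

# Back-of-the-envelope expected case counts per study group. The 50%
# mask effect is the assumption being tested, not a measured value.
DANISH_POPULATION = 5_800_000
GROUP_SIZE = 2_500
INFECTIONS_BEFORE = 22_886   # est. infections in the 10 days before the study
INFECTIONS_DURING = 24_943   # est. infections during the 30-day study periods
ASSUMED_MASK_EFFECT = 0.5    # 50% reduction of infections during the study

def expected_cases(mask_effect=0.0):
    scale = GROUP_SIZE / DANISH_POPULATION
    pre_study = INFECTIONS_BEFORE * scale            # masks cannot prevent these
    in_study = INFECTIONS_DURING * scale * (1 - mask_effect)
    return pre_study + in_study

print(f"control group:   ~{expected_cases():.0f} expected cases")
print(f"face mask group: ~{expected_cases(ASSUMED_MASK_EFFECT):.0f} expected cases")

This comes out to about 21 and 15 expected cases, which matches the rounded 20 versus 15 cases in the list above.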

Comparing actual and expected results

The calculations above show that we would expect 15-20 positives in the two groups, with just 5 cases difference between the groups. Note that we had to make some assumptions, for example about the fatality rates, and that the estimates may be off by a factor of 2 - but not much more.

In the study, the authors reported 10 cases of COVID-19 in the control group that were confirmed by health care providers, and 5 cases in the face mask group. Diagnoses are generally only made for symptomatic cases, which are typically estimated to be about 25-50% of total infections; thus, there is very good agreement between the expected and reported numbers.

The number of positive antibody tests reported is somewhat higher, between 31 and 37 for IgM and IgG. The difference between the face mask group and the control group is less pronounced than for the diagnosed cases, which is exactly what we would expect from including participants who had been infected before the start of the study but had not yet developed a detectable antibody response. The antibody numbers are roughly 2-fold higher than the expected numbers. Two issues that may have contributed to this difference are false-positive antibody results, and a higher infection rate among the study participants, who spent on average 4.5 hours outside their home each day, relative to the general population.

But while the expected and the reported numbers agree well, the agreement is only qualitative. Due to the relatively small number of cases, and the "contamination" from infections before the start of the study period, it is unlikely that the results reach the typically required levels of statistical significance. For that, the study would have had to be substantially larger, and ideally also have included a PCR test at the start of the study period.

What went wrong

When designing the study, the authors determined the number of study participants they needed based on an estimated infection rate of 2%, which was reasonable at that time. The authors were also looking for a relatively large reduction effect of 50%; to see a smaller effect, a larger study would have been necessary.

However, by the time the study started, the interventions initiated by the Danish government had taken effect and reduced the number of daily infections by more than 2-fold, with an additional 10-fold reduction by the time the study ended. The overall calculated infection rate for the combined study periods was less than 0.5% - roughly four-fold lower than what had been expected. This resulted in case numbers that are too low to give a statistically meaningful result. To some extent, the study became a "victim" of the success that Denmark had in containing the COVID-19 pandemic in the spring.
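To see how much the lower infection rate hurt the study, here is a small comparison of the case counts one would expect per group under the planned 2% infection rate and under the roughly 0.5% rate that actually occurred. This is not the study's formal power calculation, just an illustration of why the observed counts were too small:

# Expected cases per study arm under the planned vs. the actual
# infection rate, assuming the 50% mask effect the study was designed
# to detect. Illustration only, not a formal power calculation.
GROUP_SIZE = 2_500
MASK_EFFECT = 0.5

for label, rate in (("planned", 0.02), ("actual", 0.005)):
    control = GROUP_SIZE * rate
    masked = control * (1 - MASK_EFFECT)
    print(f"{label}: infection rate {rate:.1%} -> "
          f"~{control:.0f} vs. ~{masked:.0f} cases "
          f"(difference of ~{control - masked:.0f})")

At the planned rate, the study would have had to detect a difference of about 25 cases between the groups; at the actual rate, that difference shrinks to about 6 cases, which is far too small to distinguish from chance.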

A second factor that contributed to the lack of a "clear signal" from the study was that the authors apparently did not consider the lag time between infection and the beginning of a detectable antibody response. In their defense, the data that describe the timing of the antibody response were probably not available when the study was designed. Furthermore, the effect would have been much smaller if the infection numbers had still been increasing, or at least stable. Nevertheless, the lack of any discussion of this "antibody delay" effect in the publication is somewhat disappointing. With proper consideration of this effect, the data produced by the study are not only compatible with a 50% protection from wearing face masks - they actually are in agreement with it, even if they may not be "statistically significant" due to the factors discussed here.