Friday, July 3, 2020

Has COVID-19 Become A Lot Less Deadly?

Short answer

No. This graph shows the main reason why cases have increased recently, but deaths have not yet:
Figure 1:
Typical COVID-19 time periods for New York (April) and Florida (now)

Long answer

It's easy to come to the wrong conclusions when looking at graphs like this one:
Figure 2:
COVID-19 cases and deaths in Florida

Since the beginning of June, the number of confirmed COVID-19 cases in Florida has risen about tenfold, but the number of COVID-19 deaths has remained roughly the same. So it seems obvious that something else has happened - perhaps the higher case numbers are only due to testing? Or the virus has mutated and is less deadly now? Both answers seem logical, but they are wrong. Let me explain.

To start with, let's have a look at the cases and deaths curves from earlier in the pandemic. We'll start with New York:
Figure 3:
Cases and deaths in New York in March - April

We can see that the death curve followed the case curve with a delay of about 1 week. That's true for when cases and deaths were rising at the end of March, and it's also true for the peaks in early to mid April.

But things looked quite different in Germany:
Figure 4:
Cases and deaths in Germany in March-April

For Germany, the offset between cases and deaths was much longer - about two weeks instead of just one week. Let's also look at Spain:
Figure 5:
Cases and deaths in Spain

For Spain, the delay between cases and deaths was even shorter than for New York - only about 2-3 days.

We know with absolute certainty that the observed differences were not due to changes in the corona virus that causes COVID-19. Multiple virus isolates from New York, Germany, and Spain have been sequenced and compared to each other, and while there are small differences between almost all isolates, those are mostly "silent" mutations that have no biological consequence.

However, we do know what did cause the observed differences in time lags between confirmed cases and deaths: the availability of COVID-19 tests. Germany had sufficient tests available so that most people with COVID-19 symptoms or exposure to COVID-19 patients could get tested, and test results were generally reported within a couple of days. Therefore, the time difference of two weeks between tests and deaths is close to the 16-20 days that typically pass between first symptoms and death for COVID-19.

The situation was very different in New York in March and early April. Test capacity was extremely limited, so that testing was mostly limited to patients with very severe symptoms, often patients who needed hospital care. At the same time, hospital capacity in New York City was fully used, which led to very strict criteria for hospital admissions. As a result, patients were tested much later after the initial infection: not when the first symptoms developed after about 5 days, but only after symptoms got a lot worse, which often took another week or longer.

In addition, test providers were severely backlogged, so that getting test results back often took up to two weeks. Together with the delayed ordering of tests, this reduced the typical time between test results and deaths to a week. In Spain, test availability early in the epidemic was even more restricted than in New York, which reduced the test-to-death time even further.

What about Florida?

Since April, COVID-19 testing capacity in the US has increased significantly. As a result, COVID-19 tests have often been available to anyone with symptoms, and even to people without symptoms who (for example) had been in contact with confirmed COVID-19 cases. This means that on average, anyone infected with COVID-19 can get tested about a week earlier than in New York in April. Furthermore, test results are usually available within a day or two. Together, these two factors extend the time between test results and deaths by almost 2 weeks, as shown in Figure 1 above. There are also indications that the reporting of confirmed COVID-19 deaths in Florida is slower than in New York, probably by several days.

Therefore, the expected delay between the rise in confirmed COVID-19 cases in Florida and the corresponding rise in COVID-19 deaths is more than three weeks. The rapid rise in cases started about three weeks ago, so the corresponding rise in deaths would be expected to start within the next week or so.

If we look at the cases and deaths for Arizona, where the rise in infections started about a week or two earlier than in Florida, we can indeed see that deaths are starting to increase:

Figure 6:
COVID-19 cases and deaths in Arizona

The number of confirmed cases in Arizona started to rise at the beginning of June; about 3 weeks later, the number of COVID-19 deaths started to rise from about 20 to almost 40.

The effect of younger people being infected

Many news reports have detailed that the current COVID-19 infection wave in the south differs from the initial infection wave: a much larger percentage of young people is infected now. To some extent, this is likely to be a distortion linked to testing. A young person with COVID-19 is much less likely to have severe symptoms, require hospital care, or die from COVID-19 than an older person; this is a well-known fact that was already described for the initial epicenter in Wuhan. When testing was limited in the US, and therefore mostly restricted to patients with severe symptoms, the likelihood that a young infected person would get tested was significantly lower than it is now, with much more testing capacity available.

However, while this may distort the picture somewhat, it is nevertheless true that younger people are now driving the wave of infections. To some extent, this is due to younger people being less concerned about COVID-19, and therefore less likely to adhere to social distancing and face mask guidelines. But independently of that, younger people tend to have a much higher number of social interactions than older people, and are therefore more likely to be infected when restrictions are lifted.

Over time, however, younger people interact with older people - their parents, grandparents, coworkers, and others. As a result, the infection wave spreads to older population groups, albeit with a noticeable delay. A study by Dr. Jeffrey Harris, an economics professor at MIT, found this to be the case in infection time lines for Florida. Here is a graph from this study:
Figure 7:
COVID-19 infections by age group in Florida (from Harris, 2020)

The figure shows that the growth in the older (60+) age group trails the growth in the younger (20-39) age groups by about 1-2 weeks, but then increases at about the same pace. The effect of the age distribution and timing on COVID-19 deaths amounts to an additional delay of 1-2 weeks between confirmed infections and deaths.

As a result of all the factors discussed above, the overall delay between the rise in confirmed COVID-19 cases in Florida and a corresponding rise in deaths is likely to be approximately one month.

But the CFR is down!

Another argument made by "partially informed" people that "proves" that the corona virus is getting less harmful is that the case fatality ratio (CFR) is going down. The case fatality ratio is easy to calculate: just divide the number of COVID-19 deaths by the number of confirmed cases. Do this for New York on May 1, and you get 7.6%. Do this for Florida on July 2, and you get 2.1%. Quod erat demonstrandum? Not so fast!

The biggest problem with the CFR is that it uses "cases". Increase testing, and the number of confirmed cases goes up. But the number of deaths does not change (or changes only minimally, assuming most severe cases still get tested). So do more testing, and your CFR goes down! That's exactly what we are seeing - Florida has done a lot more testing than New York. But testing has changed nothing about how deadly the virus is. More testing only warns us that we have a problem earlier, giving us more time to do something to reduce transmissions.
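The dependence of the CFR on testing can be made concrete with a small sketch. The infection count, death count, and detection fractions below are invented purely for illustration; they are not data from New York or Florida, and the function names are hypothetical.

```python
def case_fatality_ratio(deaths: int, confirmed_cases: int) -> float:
    """CFR = deaths divided by confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


# Hypothetical outbreak: the true infections and deaths are identical
# in both scenarios; only the share of infections found by testing differs.
TRUE_INFECTIONS = 100_000  # assumed for illustration
DEATHS = 1_000             # assumed for illustration (a true IFR of 1%)


def cfr_for_detection(detected_fraction: float) -> float:
    """CFR that would be reported if this fraction of infections is confirmed."""
    confirmed = round(TRUE_INFECTIONS * detected_fraction)
    return case_fatality_ratio(DEATHS, confirmed)


limited_testing = cfr_for_detection(0.15)  # only the sickest get tested
broad_testing = cfr_for_detection(0.50)    # tests widely available

print(f"CFR with limited testing: {limited_testing:.1%}")
print(f"CFR with broad testing:   {broad_testing:.1%}")
```

In this made-up scenario the virus is equally deadly in both cases, yet the reported CFR drops by more than a factor of three just because more infections are confirmed.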

The really relevant number is not the CFR, but the IFR: the infection fatality ratio. But to calculate that, we need to know the actual number of infections - something we usually do not know. There are multiple ways scientists can try to estimate the true number of infections, and all of them must take age distribution into account. For different countries and regions, the studies have returned numbers in the range of 0.4% to 1.4%; these numbers have not really changed much since the first thorough estimates based on Wuhan data in February and March. One of the higher infection fatality rates of 1.45% was estimated for New York City. For the age group from 25-44 years, the estimated IFR was 0.12%; for the oldest age group (75+), the infection fatality rate was 17%.

One likely reason why the IFR in New York City was relatively high was the overloading of hospitals and ICUs. Failure to understand the delays between the rise in reported COVID-19 cases and the corresponding deaths has already led to delayed actions in several affected states, and will likely cause similar hospital capacity problems in many areas in these states - and similarly high fatality rates.

Herd immunity still means more than a million COVID-19 deaths in the US

Some individuals who looked at case curves and death curves and then wrongly concluded that COVID-19 had mutated to a less deadly form (which it has not) have also advocated going for "herd immunity". To reach this point where new COVID-19 transmissions would stop "naturally", at least 60-70% of the population would have to be infected: more than 200 million Americans. With the fatality rate seen in New York, this would lead to almost 3 million COVID-19 deaths. Even with a fatality rate at the low end of the estimates, 0.5%, "herd immunity" would still mean more than a million deaths from COVID-19.
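The arithmetic behind these numbers is simple enough to check in a few lines. The sketch below uses the round figure of 200 million infections and the two fatality rates mentioned above (1.45% for New York City, 0.5% as the low-end estimate); the function name is just an illustrative choice.

```python
def expected_deaths(infected: int, ifr: float) -> int:
    """Expected deaths for a number of infections and an infection fatality ratio."""
    return round(infected * ifr)


INFECTED_FOR_HERD_IMMUNITY = 200_000_000  # "more than 200 million Americans"

nyc_rate_deaths = expected_deaths(INFECTED_FOR_HERD_IMMUNITY, 0.0145)
low_rate_deaths = expected_deaths(INFECTED_FOR_HERD_IMMUNITY, 0.005)

print(f"At the New York City IFR of 1.45%: {nyc_rate_deaths:,} deaths")
print(f"At a low-end IFR of 0.5%:          {low_rate_deaths:,} deaths")
```

Since 200 million is the lower bound for the number of infections, both results are lower bounds as well.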

The vast majority of Americans still consider a million deaths absolutely unacceptable. But some people value their "freedom" to party and not wear face masks more highly. Often, they hide their real opinions, instead downplaying how dangerous COVID-19 is. But the science is clear, and it is not "just another opinion". Don't be fooled.

Is the more infectious mutant D614G more deadly?

A couple of hours after writing this post, I found a couple of interesting publications that describe a mutant of the corona virus called "D614G". One study by a large group of researchers from Los Alamos, Duke, Harvard, WUSTL, and the UK looked at 28,576 sequences from corona virus isolates, and tracked the changes over time. The study found solid evidence that the original strain, D614, has been largely replaced by a mutant strain, D614G, in many different countries and continents. The likely reason for this observation is that this strain is more infectious, which is supported by the observation that the mutant virus appears to be present in higher concentration in the upper respiratory tract than the original D614 strain. Such a higher concentration of viral particles would explain a higher infectivity, and it could also cause more severe disease symptoms.

A second study looked at how common the original and mutant virus strains were in different countries, and correlated this to the reported CFRs. The study concluded that the mutant D614G strain (called G614 in the study) was linked to higher fatality rates, and therefore more pathogenic - in other words, more deadly. However, as discussed above, the CFR depends on both fatalities and testing, and the larger changes in observed rates are linked to testing differences. The testing rates vary dramatically between the countries included in the study, so any conclusion about the mutant being more pathogenic is, at best, tentative. The study did note that the isolates from New York had a higher percentage of the D614G strain; if this strain is indeed more pathogenic than other strains, then this could explain the observed higher infection fatality ratio, possibly in combination with, and in addition to, other factors like hospital overloading.

Additional studies will be needed to clarify whether or not the D614G strain is indeed more pathogenic than the original D614 strain. At this point in time, we only know that this mutant strain that has become dominant in most countries is more infectious, and can only speculate that it may be more deadly. Still, the scientific evidence we have today points towards more, not less, deadly corona virus variants.

Tuesday, June 30, 2020

Corpus Christi Takes Off

My wife and I love spending our winters in Corpus Christi, Texas, so we're following what's going on there somewhat closely. The most recent data are shocking, as this graph illustrates:
The graph is from the TAMUCC-COVID19 task force report, which is worth a look. 

Note that the positive rate for the tests has gone up very rapidly, and now is above 30%. That indicates that not enough tests are available, which is confirmed by a news report (note the word "tests" is missing after "nucleic acid"). 

The city of Corpus Christi reports 274 new cases today (6/30). How does the situation compare to New York City in March?

The county that Corpus Christi belongs to, Nueces County, has a population of about 360,000. The NYC population of 8.4 million is about 23.3 times higher; adjusting the Corpus Christi numbers to the NYC population gives about 6,400 cases per day. The highest number of daily new cases that NYC reported was 6,376 cases on April 6. For the same day, NYC reported a positive rate of 57%.
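For anyone who wants to redo this comparison with newer numbers, here is a minimal sketch of the population adjustment. It uses the rounded population figures from the paragraph above; the function name is only illustrative.

```python
def scale_to_population(count: float, source_population: int, target_population: int) -> float:
    """Rescale a count from one population to what it would be in another."""
    return count * target_population / source_population


NUECES_POPULATION = 360_000   # approximate, as given above
NYC_POPULATION = 8_400_000    # approximate, as given above

population_ratio = NYC_POPULATION / NUECES_POPULATION
scaled_cases = scale_to_population(274, NUECES_POPULATION, NYC_POPULATION)

print(f"Population ratio: {population_ratio:.1f}")                 # 23.3
print(f"Corpus Christi cases at NYC scale: {scaled_cases:,.0f}")   # 6,393
print(f"NYC peak on April 6: {6_376:,}")
```

The scaled value of roughly 6,400 is essentially the same as the NYC peak of 6,376.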

A big difference between NYC and Corpus Christi is the age distribution of the people testing positive for COVID-19. In NYC, the 75+ group had the highest number of reported cases, about twice as many as the 18-44 group.  In Nueces County, the groups of 20-39 year olds had the highest per capita rate. Part of this difference may be due to extremely limited testing in NYC, where testing was largely limited to patients with severe symptoms; but it is likely that a more important reason for the observed difference is linked to behavioral differences.

In view of the lower positive rate and the differences in the average age of persons with a positive test result, Nueces County is not yet in as critical a situation as New York City was at the beginning of April. But it must be noted that the peak in infections in NYC was observed almost two weeks after a strict stay-at-home order had been issued, which was enforced by police. In contrast, the order issued for Texas is just a small rollback from near-complete reopening; for example, restaurants are allowed to remain open at reduced capacity, and public meetings up to 100 people are allowed, with no limits on meetings for religious purposes, sporting events, and many other exceptions. The Nueces County judge has issued a "Facial Coverings Required" order, but it applies only to some government buildings and most stores with the qualifier "when in a space that will necessarily involve close contact (areas where six (6) feet of separation is not feasible) with others". But effectively, this is only a recommendation, since paragraph 10 of the order states:
"Consistent with Abbott's Order, no civil or criminal penalty will be imposed on individuals for failure to wear a face covering."
In view of strong opposition that many Texans have displayed against using face masks, it remains to be seen how effective the order will be. If it is ignored by a significant part of the population, it is likely that the infection rate in Corpus Christi will reach or exceed New York City levels. Initially, the rate for deaths from COVID-19 in Corpus Christi is likely to be lower, since mostly younger people are currently infected. But over time, the infections will likely spread to all age groups more evenly, which will result in a corresponding large increase in COVID-19 deaths.

Sunday, June 28, 2020

COVID-19 Deaths: A Closer Look

How many deaths does COVID-19 really cause? As I explained in a previous post, "excess mortality analysis" is a good way to answer this question that avoids problems like missing tests, wrong test results, and incorrect classifications of death. This is an updated look at the excess deaths in the US, and how the number of excess deaths compares to the number of officially reported COVID-19 deaths. I'll start this post with results; details about how the results were calculated are given further below in the post.

Excess deaths in different US states

The graph above shows the excess deaths in the states that were affected strongly by COVID-19 early in the pandemic, and later showed decreases in reported COVID-19 deaths, for the weeks ending 3/7 to 5/16/2020. The numbers shown are the percent of excess deaths, above and relative to the expected deaths for a given week, based on averages from the previous 5 years. For example, New Jersey reported a total of 4,735 deaths for the week ending 4/11, compared to the typical average of 1,441 deaths. The excess of 3,294 deaths is 229% of the expected death count. Other states shown in this graph had between about 35% (Colorado) and 150% (New York) additional deaths in the worst weeks.
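As a quick check of how the percentages in the graph are derived, this sketch reproduces the New Jersey example from the numbers given above. The function name is an illustrative choice, not something from the CDC files.

```python
def excess_percent(observed: int, expected: int) -> float:
    """Excess deaths as a percentage of the expected deaths for the week."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * (observed - expected) / expected


# New Jersey, week ending 4/11/2020
observed_deaths = 4_735
expected_deaths_avg = 1_441

excess = observed_deaths - expected_deaths_avg
percent = excess_percent(observed_deaths, expected_deaths_avg)

print(f"Excess deaths: {excess:,}")            # 3,294
print(f"Percent of expected: {percent:.0f}%")  # 229%
```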

For comparison, here is a look at "late" states that generally showed a later increase in excess deaths, and no or minimal declines:
There are more states in this list, but the relative increases in deaths are smaller: between 10% and 60%.

The graphs above only extend to 5/16 since too many death reports for recent weeks have not yet been submitted to the CDC. For the weeks shown, most states have reported more than about 90% of the actual deaths, and the CDC spreadsheet tries to estimate the number of missing reports based on historical reporting patterns. However, the CDC estimates are "lowball" estimates that under-predict final numbers, so they cannot be used for the most recent weeks.

Overall, 26 states shown in the graphs above have reported increases in overall mortality between 10% and more than 200% for several weeks in the analyzed time period. Note that the graphs above omit most smaller states because the week-to-week variation in deaths is much larger.

States differ in "COVID-19 death reporting ratios"

In a "theoretically perfect" world, everyone who dies of COVID-19 would be tested for the virus in time, have a positive test result, and therefore have COVID-19 listed on the death certificate; at the same time, nobody who died of other causes would have COVID-19 listed. But tests can fail; people can die alone at home, without ever being tested for COVID-19; doctors and medical examiners can make errors; and other things can go wrong. Looking at the ratio between reported COVID-19 deaths and the number of excess deaths can give a quick impression which of these factors dominate. Here's a graph for all states that reported more than 500 COVID-19 deaths for the 10 weeks  3/14 to 5/16/2020:
Three states have reported more COVID-19 deaths than excess deaths: Connecticut, Washington, and North Carolina. For Connecticut and North Carolina, this is due to very slow reporting of death certificates (see the "Methods" section below).  Washington reported 999 COVID-19 deaths and 12,349 total deaths for the 10-week period, which amounts to 882 excess deaths. It is likely that Washington's death reporting is currently at most 99% complete; if so, at least 123 additional deaths will be reported to the CDC over the next months, which will drop the reporting rate to less than 100%.
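The Washington example can be worked through as follows. This is a rough sketch that follows the reasoning above: it approximates the missing reports as 1% of the deaths reported so far, and it assumes that none of the late-arriving death certificates list COVID-19, so the COVID-19 count stays at 999.

```python
covid_deaths = 999        # reported COVID-19 deaths, 10-week period
total_reported = 12_349   # all deaths reported so far
excess_reported = 882     # excess deaths based on reports so far

# Current ratio of COVID-19 deaths to excess deaths (above 100%)
ratio_now = covid_deaths / excess_reported

# If reporting is about 99% complete, roughly 1% of deaths are still missing.
missing_deaths = round(0.01 * total_reported)

# Assumption: the late reports add to excess deaths but not to COVID-19 deaths.
ratio_later = covid_deaths / (excess_reported + missing_deaths)

print(f"Missing deaths: about {missing_deaths}")
print(f"Reporting ratio now:   {ratio_now:.1%}")
print(f"Reporting ratio later: {ratio_later:.1%}")
```

If some of the late reports do list COVID-19, the later ratio would be somewhat higher than this sketch suggests.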

A closer look at Texas

At the other end of the reporting rate graph is Texas, with 2,973 excess deaths for the 10-week period ending 5/16. The Texas Department of State Health Services data report a total of 1,305 COVID-19 deaths for 5/16, roughly the same as the 1,318 deaths reported by Johns Hopkins. The "Weekly Counts of Deaths by State and Selected Causes" spreadsheet from the CDC shows a total of 1,497 COVID-19 related deaths for this period. This number was reached in the Texas DSHS data on 5/23; this means that COVID-19 death totals reported by Texas are one week delayed (which is similar to other states).
However, even considering the delay, only about half of the excess deaths in Texas were attributed to COVID-19.

Of the categories in the CDC "Weekly Counts of Deaths by State and Selected Causes" spreadsheet, only one other category besides COVID-19 shows a large increase over 2019 for weeks 11-20: "Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)". This includes the R99 classification, which is typically used for death certificates with a "Pending" cause of death that remains to be determined. For Texas, roughly half of these "pending" death certificates appear to get a final classification within two months, and the number of deaths in this category drops to about 1% of total deaths within 4 months.

So the question arises: how many of the "pending" (R99) death certificates will later be classified as COVID-19 deaths?

If most of the pending death certificates were later classified as "COVID-19", then the reporting rate for Texas would increase substantially, and be in the same range as New York's reporting rate. But does that happen?

We can see what is happening to the "pending" R99 death numbers over time by looking at the CDC spreadsheets from different weeks. In the graph below, the numbers as shown in the spreadsheet from 5/29/2020 are shown, along with the changes in the most recent spreadsheet that was 4 weeks newer:
The change in total deaths illustrates that reporting for the last 3 included weeks is only about 60-90% complete. The observed changes show that only a few, if any, of the "pending" death certificates were classified as COVID-19, with a possible exception for the most recent weeks. If we look at the week of 4/11, for example, 74 death certificates were removed from "pending", but the number of COVID-19 cases increased only by 3. Clearly, the vast majority of "pending" deaths were classified in a different category, and not as COVID-19.

Based on this analysis, it appears that Texas attributes less than half of its excess deaths to COVID-19. In contrast, COVID-19 is listed as cause of death for a larger fraction of excess deaths in most other states. However, in almost all states, the officially reported COVID-19 deaths underestimate the additional deaths significantly - typically by about one third.

For the entire US, the officially reported number of COVID-19 deaths on 5/16/2020 was 89,084; the number of excess deaths during the weeks ending 3/14/2020 to 5/16/2020 was 124,219. Due to incomplete death certificate reporting that is only partly compensated for, the actual number of excess deaths is, in all probability, even higher.

Likely causes of low COVID-19 death reporting rates

There are multiple reasons why the reporting rates for COVID-19 are significantly lower than 100%. They fall into several categories:
  • No COVID-19 test results
  • False negative test results
  • Reporting errors
An investigation by USA Today reports that
"many medical examiners and coroners refuse to attribute a death to COVID-19 without a positive test before the person died"
Some medical examiners order COVID-19 tests post mortem, but it is unclear how often this happens. To some extent, such testing can also be subject to political pressure. Even when tests are done, they sometimes can return false-negative results, which makes it less likely that COVID-19 is listed as a cause of death.

For Texas in particular, there are several factors that can cause people to avoid COVID-19 tests. Texas is home to about 1.6 million unauthorized immigrants who may avoid COVID-19 tests out of fear of deportation. Texas also has the highest number of people without health insurance in the nation - 17.7% of the population in 2018, about 5 million people. While federal funds have been made available to test and treat uninsured COVID-19 patients, physicians could register for such funds only after April 27, according to the Texas Medical Association. Many uninsured Texans may have been hesitant to seek tests and treatment anyway, because they may be responsible for medical costs, for example if no COVID-19 test is ordered. Even political orientation may contribute to avoiding COVID-19 tests: since President Trump has repeatedly stated that "testing makes the US look bad", and that he has "ordered his people to do less testing", some of his avid fans may regard this as a personal instruction to not get tested. If masks have been politicized and are often seen as a sign of "weakness", why would ardent Trump fans think differently of tests? Combine the politicized attitudes towards a public health issue with the sudden worsening that is typical for COVID-19, and many deaths caused by COVID-19 never get reported correctly. Delays in reporting further compound the problem.

Delayed deaths create a false sense of security

At the height of the COVID-19 epidemic in New York, increases in reported COVID-19 cases were quickly followed by increases in deaths:
The delay between the two curves was only about one week.

But the recent increases in cases in the US have shown a different picture - here is a graph for Texas:
Cases have gone up rapidly for about two weeks, but deaths have remained more or less constant. Why?

There are two factors that contribute to the observed differences:
  1. Longer delays between test results and death reports.
  2. The current growth of infections is driven by young people.
The primary reason for a longer delay between test results and reports of deaths is increased testing capacities. In New York, testing was so limited that only people with severe COVID-19 symptoms were tested. This means that tests were not done until 10 or more days after the infection. Furthermore, testing was backlogged, and getting results often took several days to two weeks. So on average, test results became available about 17 days after infection. Deaths from COVID-19 happen, on average, about 24 days after infection, and were reported quickly in New York City. This resulted in the observed 7-day offset.

Since then, testing capacities have increased substantially, and backlogs have been eliminated. People can get tested with mild symptoms, and often even before developing any symptoms. On average, test results now are available within about 7 days after infection - 10 days earlier. In addition, the reporting of COVID-19 deaths in Texas appears to be slower by at least several days. Together, this adds about two weeks of delay, so that we'd expect deaths to rise about 3 weeks after infections started to rise.
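The two timelines can be compared with a small sketch. The 17, 24, and 7 day values come from the text above; the 4-day reporting lag for Texas is only a placeholder for "at least several days", and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    """Average number of days after infection for each event."""
    infection_to_test_result: int
    infection_to_death: int
    death_reporting_lag: int = 0  # extra days until a death shows up in reports

    @property
    def case_to_death_offset(self) -> int:
        """Days between a case being confirmed and the death being reported."""
        return (self.infection_to_death + self.death_reporting_lag
                - self.infection_to_test_result)


new_york_april = Timeline(infection_to_test_result=17, infection_to_death=24)

# Placeholder assumption: 4 days stands in for "at least several days".
texas_now = Timeline(infection_to_test_result=7, infection_to_death=24,
                     death_reporting_lag=4)

print(f"New York (April): {new_york_april.case_to_death_offset} days")
print(f"Texas (now):      {texas_now.case_to_death_offset} days")
```

With these inputs, the offset grows from one week to about three weeks, without any change in how the disease itself progresses.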

The second difference is that the current wave of infections is driven by young people. COVID-19 is much less deadly for younger people, so they were the first to take advantage of states reopening, often pushing the boundaries. As a result, a much higher percentage of young people is now infected, compared to infections that happened in March and April. This also means that for a given number of infected patients, we will see fewer deaths, so the death curve is expected to rise more slowly.

This, however, is only temporary. 20-somethings may be the first to get infected now, but they will pass the infection to their parents, grandparents, coworkers, and others, so that the age distribution will change and more closely resemble the age distribution of the population. As this happens, the death rate will go up - with an additional delay that reflects the "age normalization" of the infection. When COVID-19 deaths shoot up a few weeks from now, it will not really be a surprise - it's perfectly predictable.


Methods

Data files were downloaded from the CDC "Excess Deaths Associated with COVID-19" web page.
Excess death graphs used the "Predicted (weighted)", "All data" data sets. The latest file version downloaded on 6/24/2020 was used unless otherwise noted (data files are updated weekly by the CDC).

For North Carolina and Connecticut, data for the weeks of 5/9 and 5/16 were incomplete or missing, and adjusted to the values of the "Average expected count" column plus the number of COVID-19 cases reported for the week. The total correction was 3,515 additional deaths (2,321 for North Carolina, 1,194 for Connecticut).

To calculate the total excess deaths, the "observed number" values for the 10 weeks ending 3/14/2020 to 5/16/2020 were added, and the total "average expected count" for these weeks was subtracted (after applying the corrections for Connecticut and North Carolina described above). The total number for the US was determined by adding the respective values for all states.

Reported numbers for COVID-19 cases and deaths are based on Johns Hopkins data files downloaded from GitHub. COVID-19 death reporting rates were calculated by dividing the total number of COVID-19 deaths reported on 5/16/2020 by the number of excess deaths for the 10 weeks ending 3/14/2020 to 5/16/2020.
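The reporting rate calculation described above can be sketched in a few lines. The inputs are the US and Texas totals quoted earlier in this post (Johns Hopkins deaths on 5/16 and excess deaths for the 10-week period); the function name is an illustrative choice.

```python
def reporting_rate(covid_deaths: int, excess_deaths: int) -> float:
    """Reported COVID-19 deaths as a fraction of excess deaths."""
    if excess_deaths <= 0:
        raise ValueError("excess_deaths must be positive")
    return covid_deaths / excess_deaths


us_rate = reporting_rate(89_084, 124_219)   # entire US
texas_rate = reporting_rate(1_318, 2_973)   # Texas

print(f"US reporting rate:    {us_rate:.0%}")
print(f"Texas reporting rate: {texas_rate:.0%}")
print(f"US excess deaths not reported as COVID-19: {1 - us_rate:.0%}")
```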

Monday, June 22, 2020

Why Face Masks Work

While I started advocating the use of face masks and other measures to contain COVID-19 a while ago, I must admit that I was surprised by more recent evidence that face masks work very well to stop new COVID-19 infections.

So I have been thinking about why face masks that may only capture 50% to 80% of virus particles could have such a large effect. One often-heard argument is that fabric and self-made "face masks don't work", because they do not capture very small droplets, and because they often do not fit well, so that a lot of air breathed in or out goes around the mask, rather than through the mask.

But a reflection about what scientists have learned over the last few months provides a good explanation for why face masks can have a huge impact on the COVID-19 pandemic, even if they "capture" only 50% of the virus particles in exhaled droplets.

In recent weeks, it has become increasingly obvious that transmission through airborne virus particles that are emitted when talking, singing, and breathing plays a very important role in COVID-19 transmission. Many "superspreader" events where one person infected dozens of others in a short time frame can only be explained through aerosol transmission; choir practice events, where often more than half of the attending singers got infected, are one example.

A game of chance: the "Independent Action Hypothesis"

To understand what is going on, we need to look at the biology underlying COVID-19. When someone breathes in air that contains small, virus-containing droplets suspended in the air, the virus gets deposited on mucous membranes in the nose, throat, and lungs. From then on, it's a race: the virus needs to find a cell that it can infect; infect the cell and multiply; and then be excreted from the cell in large numbers, to find new cells and infect them. Repeating this cycle, the virus eventually is present in the infected body in billions of copies.

But there are many things that can go wrong. At body temperature, the virus is not very stable, and loses its ability to infect after some time. We do not know exactly what this time is, but the data we have suggest that it's somewhere in the range between a few minutes and perhaps a couple of hours. Furthermore, the human body is not defenseless: it has many different molecules and cells participating in the "innate immune response" that can "kill" the virus (the "kill" is in quotes because a virus does not meet the formal definition of being alive - but it's easy to understand).

So when a single viable virus particle enters the body, there is a chance that this will lead to a full-blown COVID-19 infection - but that chance is probably very small, perhaps 1 in 1,000 (we do not actually know this number, but 1 in 1,000 is a common guess). But as more virus particles enter the body, the chances of establishing a successful infection increase. If it's 10 particles, chances of "success" go up to (about) 10 in 1,000, or 1 in 100; if it's 100 particles, they rise to about 100 in 1,000, or 10%. For larger numbers, the chances no longer simply add up, because each particle is an independent attempt: with 1,000 virus particles, the chance of success is 1 - 0.999^1000, or about 63%, and it takes several thousand particles to get close to 100%.

What I described above is called the "Independent Action Hypothesis". We actually do not know for a fact that it applies to COVID-19, but many scientists believe that it applies because it is the most plausible hypothesis.
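As a minimal sketch of the arithmetic behind this hypothesis, the short Python snippet below treats each virus particle as an independent trial. The per-particle chance of 1 in 1,000 is the illustrative guess from above, not a measured value, and the function names are made up for this example. The chance that at least one of n particles succeeds is 1 - (1 - p)^n; the simple "n in 1,000" shortcut is its low-dose approximation.

```python
def infection_probability(particles: int, p_single: float = 0.001) -> float:
    """Chance that at least one of `particles` independent particles
    establishes an infection (p_single is an assumed, illustrative value)."""
    return 1.0 - (1.0 - p_single) ** particles


def linear_estimate(particles: int, p_single: float = 0.001) -> float:
    """The 'n in 1,000' shortcut, capped at 100%."""
    return min(1.0, particles * p_single)


for n in (1, 10, 100, 200, 1000):
    exact = infection_probability(n)
    shortcut = linear_estimate(n)
    print(f"{n:>5} particles: exact {exact:.1%}, linear shortcut {shortcut:.1%}")
```

At low doses the two columns nearly agree, which is why risk can be treated as roughly proportional to dose in that range; the shortcut increasingly overstates the chance as the dose grows.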

"Attack rates" within the same household and at "superspreader events"

Next, we need to look at two important questions:
  1. What percentage of people in a household get infected if one person has COVID-19?
  2. Can everybody get infected?
A number of different studies have looked at how likely it is that someone in the same household gets infected if one family member had COVID-19. The actual results vary, but usually fall somewhere in the range between 20% and 60%. In most, if not all, studies, not everybody in the household got infected. Which brings us to the second question: can everybody get infected? Household studies do not give a good answer to this question, because once a person begins to show COVID-19 symptoms, others in the household are likely to be more careful about keeping their distance, washing hands, and so on, to avoid getting infected.

Instead, we can look at "superspreader" events, where a well-defined group was exposed to one or more COVID-19 patients. Cruise ships are one example that caught a lot of attention early in the epidemic, but may lead to false conclusions: as soon as the first likely cases on cruise ships were diagnosed, the passengers typically were isolated in their cabins, which significantly reduced further transmissions.

But there are several other events in the database of COVID-19 clusters, which includes many superspreader events, that provide a better answer. On a French navy ship, 61.9% of soldiers ended up with COVID-19 - a total of 1,081 cases. A choir practice in Washington led to attack rates of 75-80%, and other choir practice events led to similar infection rates. From such events, we have evidence that at least 60% to 80% of people can get infected with COVID-19.

If we put these two bits of information together, we can conclude that exposures to the corona virus often happen at levels where an infection can happen, but only sometimes happen at levels that are so high that just about everyone who can get infected does get infected. For example, if a typical household exposure consisted of 200 virus particles, then we'd expect an infection rate of about 20% - roughly what was reported in some studies.

How even "bad" masks reduce COVID-19 transmissions by 50% to 75%

Let's do a little Gedankenexperiment, where we have three groups of 100 people each that get exposed to COVID-19. In all groups, each person is exposed to an infected person for exactly the time it would take to transmit 200 virus particles.

In group 1, neither the infector nor the infectee wears a face mask, so the infectee receives 200 virus particles. This results in 20 new infections.

In group 2, only the infector wears a mask. The mask is pretty bad and lets 50% of the virus particles through. Therefore, each infectee receives 100 virus particles. This results in 10 new infections - a 50% reduction.

In group 3, both the infector and the infectee wear a "50% reduction" mask. Each infectee receives 50 virus particles, resulting in a total of 5 new infections. The overall reduction of new COVID-19 cases is 75%.
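The three groups can be worked through in a few lines of Python. This is only a sketch of the example's arithmetic: it assumes the linear dose-response used above (1 in 1,000 per particle), and all names and constants are illustrative values taken from the example, not measurements.

```python
P_SINGLE = 0.001          # assumed chance of infection per particle (a guess)
DOSE_NO_MASK = 200        # particles received when nobody wears a mask
MASK_PASS_THROUGH = 0.5   # a "bad" mask lets 50% of the particles through
GROUP_SIZE = 100


def expected_infections(masks_worn: int) -> float:
    """Expected new infections in a group when 0, 1, or 2 masks sit between
    infector and infectee, using the linear dose-response approximation."""
    dose = DOSE_NO_MASK * MASK_PASS_THROUGH ** masks_worn
    return GROUP_SIZE * dose * P_SINGLE


baseline = expected_infections(0)
for masks in (0, 1, 2):
    infections = expected_infections(masks)
    reduction = 1 - infections / baseline
    print(f"{masks} mask(s): {infections:.0f} infections, {reduction:.0%} reduction")
```

Because each mask multiplies the dose by the same factor, the effect of two masks compounds: 50% of 50% leaves 25% of the original dose.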

The numbers above are just intended as examples. The scenario is also simplified; reality would typically include different exposure levels for different people, and other variables. But even with further refinements, it is plausible that a large number of COVID-19 infections happen at levels where the likelihood of infection is directly proportional to the number of virus particles a person is exposed to, and that reducing the number of virus particles by using "bad" face masks can still have a large effect.

In the context of an epidemic, what is basically a linear effect on an individual level can be a much larger effect on the growth rate of the epidemic. For example, if the effective growth rate R is 2.0 without masks, but mask use reduces transmissions by 50%, that would convert the growth from exponential to stationary with R = 1.0. A reduction by 75%, as in the "group 3" example above, would lead to R = 0.5, and a rapidly shrinking number of new transmissions.
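To see how strongly R shapes the course of an epidemic, here is a small Python sketch that follows the number of new cases over a few transmission "generations". The starting value of 100 cases and the function name are arbitrary choices for illustration, and the model ignores everything except R.

```python
def new_cases_by_generation(r: float, initial_cases: float = 100, generations: int = 5):
    """New cases per generation when every case causes r new cases on average."""
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * r)
    return cases


R_NO_MASKS = 2.0  # example value from the text
for reduction in (0.0, 0.5, 0.75):
    r = R_NO_MASKS * (1 - reduction)
    series = ", ".join(f"{c:.0f}" for c in new_cases_by_generation(r))
    print(f"transmission reduced by {reduction:.0%} -> R = {r:.2f}: {series}")
```

With R = 2.0 the numbers double every generation, with R = 1.0 they stay flat, and with R = 0.5 they halve each time.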

But ...

isn't that too simple? Indeed, I made a number of simplifications above. Also, the analysis often relies on assumptions where we do not have actual data. But the assumptions I made are more reasonable than most, if not all, alternative assumptions (for example a "threshold level" hypothesis instead of the "Independent Action" hypothesis). Moreover, the general conclusion that many transmissions happen in the "linear dose-response range" matches many observations made in recent months about COVID-19 transmissions, and what we have learned about the underlying mechanics. If you'd like a more formal analysis, check out the publication titled "To mask or not to mask: Modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic" that came to very similar conclusions.
Note added 6/29/2020:
In the week since I wrote this post, several new studies have described empirical evidence that face masks are effective at reducing COVID-19 transmissions.

One study, titled "Data-driven estimation of change points reveal correlation between face mask use and accelerated curtailing of the COVID-19 epidemic in Italy", showed that the number of new COVID-19 infections declined faster after masks became mandatory, and concluded that "widespread use of face masks and other protective means has contributed substantially to keeping the number of new Italian COVID-19 cases under control in spite of society turning towards a new normality".

A second study titled "Face Masks Considerably Reduce Covid-19 Cases in Germany" looked at different regions in Germany, where face mask use became mandatory at different dates. It concluded "face masks reduce the daily growth rate of reported infections by around 40%".

Faced with a mounting body of scientific evidence that face masks work, and very rapid growth of new COVID-19 infections in many southern states, even "leading Republicans are publicly embracing expert-recommended face masks as a means to slowing the spread of the deadly coronavirus", according to an NPR article, leaving President Trump and Vice President Pence increasingly isolated in their opposition against wearing face masks.

Wednesday, June 17, 2020

Multi-state Evidence That Facemasks Stop COVID-19

The number of new COVID-19 cases in the US has been relatively stable overall, but huge differences between states exist: some are showing continuing declines, while others are reporting rapid increases in case numbers. This graph illustrates the dichotomy:
For the last three weeks, Alabama, Arizona, Arkansas, and California have shown increases in new COVID-19 cases, while Colorado, Connecticut, Delaware, and DC have shown decreases. Only a small fraction of the increases in most states can be explained by increased testing, as I explained in a previous post. Changes in testing do not explain the observed differences to states with decreasing case numbers at all, since testing has increased even in the states with lower reported case numbers.

So - how can we explain that many states show increases in COVID-19 transmissions, while many other states show decreases? One possible explanation is differences in when and how states "re-opened" by removing stay-at-home orders and social distancing restrictions. But while this may indeed explain some of the differences, it is unlikely to be the determining factor, given that most states started re-opening several weeks ago, and that some states like California, where many restrictions remain in place in large population centers, nevertheless show increases. But a recent article in the journal "Health Affairs" provides another possible explanation: that differences in new COVID-19 cases are caused by differences in face mask wearing. So I decided to look into this.

According to the article, 15 states and the District of Columbia have issued mandates for face mask use in public, and the list of these states is given in the supplementary materials. As an indicator of recent changes in COVID-19 cases, I used 15-day trend lines from my "data trend model", which are based on 7-day averages of new case numbers from Johns Hopkins data. The slopes of the (log-based) trend lines give a good indication of where case numbers are going, with positive slopes indicating increases and negative slopes indicating decreases. I excluded the two states that had fewer than 10 new cases in the most recent 7-day averages, Montana and Hawaii. The data for the 15 states with the highest and lowest slopes are shown in the next graph:
The 15 states at the top of the graph (Michigan - DC) have decreasing COVID-19 case numbers. 11 of these 15 states have face mask mandates. The 15 states at the bottom of the graph (Utah - Oregon) have shown increasing COVID-19 case numbers in the last 2 weeks; only one of these 15 states, Utah, was listed in the article as having a face mask mandate. However, the actual wording of the executive order issued on 4/10/2020 uses the word "directives", not "mandate" or "order", and does not mention any enforcement provisions or penalties. In an updated order from 5/27, the "directive" is explicitly changed to an "order", but only for certain business employees and health care settings; for the general public, only a "strong recommendation" to use face masks is issued.
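For readers who want to reproduce this kind of trend slope, here is a rough Python sketch: smooth the daily counts with a 7-day trailing average, take the logarithm, and fit a straight line to the last 15 points. It is not the actual "data trend model" code; the choice of base-10 logarithms, the function names, and the made-up input series are all assumptions for illustration.

```python
import math


def trailing_average(values, window=7):
    """Average of each day and the preceding window-1 days (full windows only)."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def log_trend_slope(daily_cases, window=7, trend_days=15):
    """Least-squares slope of log10(smoothed cases) over the last trend_days days."""
    smoothed = trailing_average(daily_cases, window)[-trend_days:]
    if any(v <= 0 for v in smoothed):
        raise ValueError("log scale needs positive case numbers")
    ys = [math.log10(v) for v in smoothed]
    xs = list(range(len(ys)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical series: one growing 5% per day, one shrinking 3% per day.
growing = [100 * 1.05 ** day for day in range(30)]
shrinking = [1000 * 0.97 ** day for day in range(30)]
print(f"growing:   {log_trend_slope(growing):+.4f} per day")
print(f"shrinking: {log_trend_slope(shrinking):+.4f} per day")
```

A positive slope means case numbers are rising and a negative slope means they are falling; on a log scale, the slope is independent of how large a state's case numbers are, which makes states of very different sizes comparable.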

Of the states shown as not requiring face mask orders in the declining groups, both Colorado and New Hampshire have issued strong recommendations to use face masks in public. Virginia's governor issued an executive order that requires face masks for "patrons" in most businesses, including retail and restaurants, and in public transportation.

The attitude towards face masks for states in the lower half of the graph, which report increasing COVID-19 cases, tends to be different. For example, the governor of Texas has issued an executive order that "bans local governments from imposing fines or criminal penalties on people who don't wear masks in public".

Even when orders or recommendations to use face masks exist, how many people follow the recommendations can vary significantly. Multiple surveys have shown that Democrats are much more likely to wear masks than Republicans, with reported mask wearing percentages of 75% vs. 53% according to one survey of 2,400 Americans, and 92% for Democrats versus 53% for Republicans in Minnesota according to another survey. Of the 15 states with declining daily new COVID-19 cases shown in the graph above, only 3 (20%) were won by Donald Trump in the 2016 election; of the 15 states with increasing case numbers, 12 (80%) were won by Trump. Since Trump has steadfastly refused to wear face masks in public, many of his fans also refuse to wear masks.

The research study I mentioned above is just one of many studies that confirm that face masks are effective at reducing COVID-19 transmissions. Another study that used computer simulations showed that face masks can reduce COVID-19 infections and deaths significantly even if just a subset of the population wears face masks, with such reductions being higher if a higher percentage of the population wears masks. A meta-analysis that looked at data from 172 observational studies with a total of 26,697 patients concluded that "face mask use could result in a large reduction in risk of infection", with a reported "adjusted odds ratio" of 0.15.

Many researchers had long suspected that face masks can be an important tool for containing COVID-19. This was based on the comparison of Asian countries where face mask use is common, and COVID-19 was quickly contained, to European countries where face mask use was initially strongly discouraged, and the COVID-19 epidemics reached much higher levels. Given what we have learned recently, it appears that face masks, when combined with basic social distancing, are more effective than even optimists thought. One indicator of how well face masks work can be found in the statistics for European countries like Austria and Germany that have re-opened their economies without seeing an increase in new COVID-19 cases. A common factor in these countries is that face mask use in public is mandatory if social distancing cannot be maintained.

Ironically, the widespread use of face masks seems to be a critical element that allows re-opening the economy while keeping COVID-19 under control - but the president of the US, who has clearly stated that re-opening is his absolute priority, has consistently refused to use face masks.

Monday, June 15, 2020

Does More Testing Explain The Rise In COVID-19 Cases?

The number of new daily COVID-19 cases in the US has been rising for the last few days, and several states have set new records. Is the rise in case numbers mostly due to more testing, as several governors in states with rising numbers have claimed? Or do the rising numbers indicate a "true" increase in transmissions due to state re-openings?

Here is a look at the recent trends for several states that have shown increases in COVID-19 cases since the beginning of May, using Johns Hopkins data:
For comparison, here is a graph of states that have seen a consistent drop in COVID-19 cases:
Note that the second graph uses a logarithmic scale, since some of the drops are quite large. The drops in cases are relatively constant on the log scale, whereas the growth in cases shown in the first graph is restricted to the last 2-3 weeks for most states. Let's have a look at a couple of states in detail, comparing the number of tests to the number of COVID-19 cases (data are from the COVID tracking project). First, Arizona:

Data are smoothed using 7-day trailing averages to remove most of the day-to-day variation. For Arizona, we can see that both test numbers and positive cases increased in the period shown. However, the number of tests (in red) increased only by about 50%, while the number of COVID-19 cases increased by about 300%. So while the increase in testing contributed to the increase in confirmed cases, the primary cause of the observed increase in cases was an increase in transmissions; increased testing alone would only have resulted in a rise of daily cases to about 200, not to more than 400.
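The reasoning behind this comparison can be sketched in a few lines of Python. The starting and ending case numbers below are hypothetical round values chosen for illustration, not Arizona's actual data; only the 50% increase in tests is taken from the text. The sketch makes the generous assumption that confirmed cases grow in direct proportion to the number of tests.

```python
def cases_if_only_testing_changed(baseline_cases: float, test_increase: float) -> float:
    """Generous estimate: confirmed cases scale in proportion to tests,
    i.e. the positive rate stays unchanged."""
    return baseline_cases * (1 + test_increase)


baseline_cases = 130   # hypothetical daily cases at the start of the period
observed_cases = 400   # hypothetical daily cases at the end of the period
test_increase = 0.50   # tests per day up by about 50%

expected = cases_if_only_testing_changed(baseline_cases, test_increase)
unexplained = observed_cases - expected
print(f"expected from testing alone: {expected:.0f} cases per day")
print(f"not explained by testing:    {unexplained:.0f} cases per day")
```

Whatever the exact starting value, a 50% increase in tests can account for at most a 50% increase in cases under this assumption; anything beyond that has to come from more transmissions.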

Next, Florida:
The number of daily new COVID-19 cases in Florida more than doubled over the last 2 weeks. While there also was a small increase in testing between 6/3 and 6/10, the steepest increase in cases happened after 6/10, when the number of tests dropped. Furthermore, the number of tests per day remained lower than it had been around 5/25. Clearly, the simple equation "more tests = more cases" does not explain the observed trend in Florida.

For comparison, let's have a look at New York:
For New York, the number of tests per day almost doubled to more than 60,000 between 5/20 and 6/14, but the number of confirmed COVID-19 cases dropped from about 2,000 to about 800. Clearly, the effect of continued social distancing and similar government restrictions by far outweighed any increase from more testing.

The results for many other states are similar: numbers decreased in many northeastern and central states despite more testing, while the increases in many southern and western states were much higher than what can be explained by increased testing.

One factor that comes into play here is that increased test availability changes who can get a test. When tests are in short supply, stringent criteria are used to limit testing to those most likely to be positive; typical restrictions include the presence of symptoms and contact with a confirmed case. During the height of the COVID-19 epidemic in New York City, the positive rates exceeded 50%, and fewer than 10% of infected persons were tested. As tests become more widely available, restrictions on who can get tested are relaxed, and the positive rate drops. When convenient, cost-free tests that do not require doctor's referrals or the presence of symptoms are available, some people get a test "just to be safe", or because they had unspecific symptoms so long ago that the chances of a positive test are very small. As a result, a sudden doubling in test numbers typically does not lead to a doubling in confirmed cases, but rather to a significantly smaller increase - which is exactly what we are seeing.

Looking at the graph of daily cases in the US on Worldometers, very little seems to have changed since the beginning of May - aside from small fluctuations, mostly within each week, the number of daily cases now seems very similar to the number a month ago. What the graph does not show, however, are fundamental differences between the states, with many states showing consistent decreases while others show rapid increases. If we look at the states shown in the first two graphs, but omit the two most heavily hit states, New York and New Jersey, then the drop in one group is pretty much canceled out by the rise in the other group:

What we are seeing now is that the case numbers in the states that used to have by far the highest numbers are so low that further decreases do not change the overall total much; at the same time, the numbers for the "second riser" states like Arizona and Florida have reached levels that do make a difference. Worse, the growth in these states appears to be accelerating, with observed doubling times of two weeks or less. While this growth is not as rapid as during the early growth phase in March, it is fast enough to lead to significant underestimates of the severity of the epidemic.
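To get a feeling for what a doubling time of two weeks means, here is a small Python sketch that converts a doubling time into a daily growth rate and projects case numbers forward. It assumes that growth simply continues at the same pace, which is an assumption and not a forecast; the starting value of 1,000 daily cases and the function names are made up for illustration.

```python
def daily_growth_rate(doubling_days: float) -> float:
    """Fractional increase per day that doubles the numbers in doubling_days."""
    return 2 ** (1 / doubling_days) - 1


def projected_cases(current_cases: float, doubling_days: float, days_ahead: float) -> float:
    """Cases after days_ahead days if the doubling time stays constant."""
    return current_cases * 2 ** (days_ahead / doubling_days)


print(f"daily growth at 14-day doubling: {daily_growth_rate(14):.1%}")
for weeks in (2, 4, 8):
    cases = projected_cases(1000, 14, weeks * 7)
    print(f"after {weeks} weeks: {cases:,.0f} daily cases")
```

A growth rate of about 5% per day looks small from one day to the next, which is part of why it is easy to underestimate.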

The states that show rapid COVID-19 growth have several things in common, which include earlier re-opening and higher average temperatures, compared to the states that show declining case numbers. As described in this article, this dashes the hopes that the virus will "disappear" due to higher temperatures; instead, the higher temperatures cause people to spend more time indoors, where the risk of transmission is higher than outdoors. Right now, states like Arizona show only one fourth the number of COVID-19 cases per million population that New York or New Jersey have, and the actual factor may be even larger since testing was in short supply during the height of the epidemic in the Northeast. But with observed doubling times of two weeks and governors who clearly indicated that they will not re-institute restrictions before hospitals overflow, these states may "catch up" before the summer is over.

Thursday, June 11, 2020

No Mystery - "Pending"

Update: Mystery means "Pending"

This post was originally titled "The Cheating Is Getting Worse", but based on what I learned since writing it, the title was misleading. While the analysis of "excess deaths" provides strong evidence that many COVID-19 related deaths are not reported as COVID-19 deaths in the death certificates, and that the officially reported numbers of COVID-19 deaths are significantly lower than the actual deaths, the classification of deaths in the "mystery" (R00-R99) category is not a sign of cheating; as a reader pointed out, this range includes the R99 category where the cause of death is "pending", with further investigation needed. Over time, most deaths initially classified in the R99 category are assigned a defined cause of death.

So the "mystery deaths" are mostly just temporary mysteries which in all likelihood will be solved over time. Some of these mystery deaths may indeed be caused by COVID-19, as the note at the bottom of this page illustrates; even if they are, they may not be classified as such if no positive COVID-19 test exists. However, some of the "mystery" deaths are likely to be due to other causes, for example accidents and drug overdoses, which can take many months to be resolved.

Details about how death certificates are handled are defined by each state, and thus differ from state to state. Similarly, states differ on how and when they report death statistics to the CDC. Many states change most "Pending" R99 death certificates within a couple of months or so, but others, including Florida, may take up to a year to change them. This can be seen in data from before the COVID-19 epidemic, and has been previously described by the CDC.

While the death reporting patterns may be peculiar, and it may be hard to understand why states like Florida often need half a year or longer to determine a final cause of death, there is currently no evidence that indicates that the R99 category is used on purpose to hide COVID-19 related deaths.

The purpose of excess death analysis is to get a clear picture of the impact of a disease, regardless of any testing issues. It is not to support conspiracy theories based on poorly understood or incomplete facts. I therefore deleted the original post. I plan to re-examine excess deaths and COVID-19 death reporting for different states in a future post.
Note added 6/28/2020: A look at excess deaths in different states, with a close look at Texas, is now available at