Friday, July 3, 2020

Has COVID-19 Become A Lot Less Deadly?

Short answer

No. This graph shows the main reason why cases have increased recently, but deaths have not yet:
Figure 1:
Typical COVID-19 time periods for New York (April) and Florida (now)

Long answer

It's easy to come to the wrong conclusions when looking at graphs like this one:
Figure 2:
COVID-19 cases and deaths in Florida

Since the beginning of June, the number of confirmed COVID-19 cases in Florida has risen about tenfold, but the number of COVID-19 deaths has remained roughly the same. So it seems obvious that something else has happened - perhaps the higher case numbers are only due to testing? Or the virus has mutated and is less deadly now? Both answers seem logical, but they are wrong. Let me explain.

To start with, let's have a look at the cases and deaths curves from earlier in the pandemic. We'll start with New York:
Figure 3:
Cases and deaths in New York in March - April

We can see that the death curve followed the case curve with a delay of about 1 week. That's true for when cases and deaths were rising at the end of March, and it's also true for the peaks in early to mid April.

But things looked quite different in Germany:
Figure 4:
Cases and deaths in Germany in March-April

For Germany, the offset between cases and deaths was much longer - about two weeks instead of just one week. Let's also look at Spain:
Figure 5:
Cases and deaths in Spain

For Spain, the delay between cases and deaths was even shorter than for New York - only about 2-3 days.

We know with absolute certainty that the observed differences were not due to changes in the corona virus that causes COVID-19. Multiple virus isolates from New York, Germany, and Spain have been sequenced and compared to each other, and while there are small differences between almost all isolates, those are mostly "silent" mutations that have no biological consequence.

However, we do know what did cause the observed differences in time lags between confirmed cases and deaths: the availability of COVID-19 tests. Germany had sufficient tests available so that most people with COVID-19 symptoms or exposure to COVID-19 patients could get tested, and test results were generally reported within a couple of days. Therefore, the time difference of two weeks between tests and deaths is close to the 16-20 days that typically pass between first symptoms and death for COVID-19.

The situation was very different in New York in March and early April. Test capacity was extremely limited, so that testing was mostly limited to patients with very severe symptoms, often patients who needed hospital care. At the same time, hospital capacity in New York City was fully used, which led to very strict criteria for hospital admissions. As a result, patients were tested much later after the initial infection: not when the first symptoms developed after about 5 days, but only after symptoms got a lot worse, which often took another week or longer.

In addition, test providers were severely backlogged, so that getting test results back often took up to two weeks. Together with the delayed ordering of tests, this reduced the typical time between test results and deaths to a week. In Spain, test availability early in the epidemic was even more restricted than in New York, which reduced the test-to-death time even further.

What about Florida?

Since April, COVID-19 testing capacity in the US has increased significantly. As a result, COVID-19 tests have often been available to anyone with symptoms, and even to people without symptoms who (for example) had been in contact with confirmed COVID-19 cases. This means that on average, anyone infected with COVID-19 can get tested about a week earlier than in New York in April. Furthermore, test results are usually available within a day or two. Together, these two factors extend the time between test results and deaths by almost 2 weeks, as shown in Figure 1 above. There are also indications that the reporting of confirmed COVID-19 deaths in Florida is slower than in New York, probably by several days.

Therefore, the expected delay between the rise in confirmed COVID-19 cases in Florida and the corresponding rise in COVID-19 deaths is more than three weeks. The rapid rise in cases started about three weeks ago, so the corresponding rise in deaths would be expected to start within the next week or so.

If we look at the cases and deaths for Arizona, where the rise in infections started about a week or two earlier than in Florida, we can indeed see that deaths are starting to increase:

Figure 6:
COVID-19 cases and deaths in Arizona

The number of confirmed cases in Arizona started to rise at the beginning of June; about 3 weeks later, the number of COVID-19 deaths started to rise from about 20 to almost 40.

The effect of younger people being infected

Many news reports have detailed that the current COVID-19 infection wave in the south differs from the initial infection wave: a much larger percentage of young people is infected now. To some extent, this is likely to be a distortion linked to testing. A young person with COVID-19 is much less likely to have severe symptoms, require hospital care, or die from COVID-19 than an older person; this is a well-known fact that has been described since the initial epicenter in Wuhan. When testing was limited in the US, and therefore mostly restricted to patients with severe symptoms, the likelihood that a young infected person would get tested was significantly lower than it is now, with much more testing capacity available.

However, while this may distort the picture somewhat, it is nevertheless true that younger people are now driving the wave of infections. To some extent, this is due to younger people being less concerned about COVID-19, and therefore less likely to adhere to social distancing and face mask guidelines. But independently of that, younger people tend to have a much higher number of social interactions than older people, and are therefore more likely to be infected when restrictions are lifted.

Over time, however, younger people interact with older people - their parents, grandparents, coworkers, and others. As a result, the infection wave spreads to older population groups, albeit with a noticeable delay. A study by Dr. Jeffrey Harris, an economics professor at MIT, found this to be the case in infection time lines for Florida. Here is a graph from this study:
Figure 7:
COVID-19 infections by age group in Florida (from Harris, 2020)

The figure shows that the growth in the older (60+) age group trails the growth in the younger (20-39) age groups by about 1-2 weeks, but then increases at about the same pace. The effect of the age distribution and timing on COVID-19 deaths amounts to an additional delay of 1-2 weeks between confirmed infections and deaths.

As a result of all the factors discussed above, the overall delay between the rise in confirmed COVID-19 cases in Florida and a corresponding rise in deaths is likely to be approximately one month.

But the CFR is down!

Another argument made by "partially informed" people that "proves" that the corona virus is getting less harmful is that the case fatality ratio (CFR) is going down. The case fatality ratio is easy to calculate: just divide the number of COVID-19 deaths by the number of confirmed cases. Do this for New York on May 1, and you get 7.6%. Do this for Florida on July 2, and you get 2.1%. Quod erat demonstrandum? Not so fast!

The biggest problem with the CFR is that it uses "cases". Increase testing, and the number of confirmed cases goes up. But the number of deaths does not change (or changes only minimally, assuming most severe cases still get tested). So do more testing, and your CFR goes down! That's exactly what we are seeing - Florida has done a lot more testing than New York. But testing has changed nothing about how deadly the virus is. More testing only warns us that we have a problem earlier, giving us more time to do something to reduce transmissions.
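The testing effect on the CFR can be sketched with a few lines of Python. The outbreak below is entirely hypothetical (100,000 infections, 1,000 deaths, and two assumed testing levels); it is only meant to show that the same outbreak yields very different CFR values depending on how many infections are confirmed.

```python
def case_fatality_ratio(deaths, confirmed_cases):
    """CFR: deaths divided by confirmed cases."""
    return deaths / confirmed_cases

# Hypothetical outbreak, not real data: the virus is equally deadly in both scenarios.
deaths = 1_000
infections = 100_000

# Assumption: limited testing confirms 10% of infections, broad testing confirms 50%.
cfr_limited = case_fatality_ratio(deaths, confirmed_cases=10_000)
cfr_broad = case_fatality_ratio(deaths, confirmed_cases=50_000)

# The infection fatality ratio does not depend on testing at all.
ifr = deaths / infections

print(f"CFR with limited testing: {cfr_limited:.1%}")  # 10.0%
print(f"CFR with broad testing:   {cfr_broad:.1%}")    # 2.0%
print(f"IFR in both scenarios:    {ifr:.1%}")          # 1.0%
```

In this sketch, a fivefold increase in confirmed cases cuts the CFR by a factor of five, while the number of deaths per infection stays exactly the same.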

The really relevant number is not the CFR, but the IFR: the infection fatality ratio. But to calculate that, we need to know the actual number of infections - something we usually do not know. There are multiple ways scientists can try to estimate the true number of infections, and all of them must take age distribution into account. For different countries and regions, the studies have returned numbers in the range of 0.4% to 1.4%; these numbers have not really changed much since the first thorough estimates based on Wuhan data in February and March. One of the higher infection fatality rates of 1.45% was estimated for New York City. For the age group from 25-44 years, the estimated IFR was 0.12%; for the oldest age group (75+), the infection fatality rate was 17%.

One likely reason why the IFR in New York City was relatively high was the overloading of hospitals and ICUs. Failure to understand the delays between the rise in reported COVID-19 cases and the corresponding deaths has already led to delayed actions in several affected states, and will likely cause similar hospital capacity problems in many areas in these states - and similar high fatality rates.

Herd immunity still means more than a million COVID-19 deaths in the US

Some individuals who looked at case curves and deaths curves and then wrongly concluded that COVID-19 had mutated to a less deadly form (which it has not) have also advocated to go for "herd immunity".  To reach this point where new COVID-19 transmissions would stop "naturally", at least 60-70% of the population would have to be infected: more than 200 million Americans.  With the fatality rate seen in New York, this would lead to almost 3 million COVID-19 deaths. Even with a fatality rate at the low end of the estimates, 0.5%, "herd immunity" would still mean more than a million deaths from COVID-19.
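The arithmetic behind these numbers is simple enough to check in a few lines. The sketch below uses 200 million infections (the lower bound mentioned above) and the two fatality rates from the text; the function name is just for illustration.

```python
def expected_deaths(infected, ifr):
    """Expected deaths for a given number of infections and infection fatality ratio."""
    return round(infected * ifr)

# "More than 200 million" infections: using 200 million gives a lower bound.
infected = 200_000_000

scenarios = [
    ("New York City estimate (1.45%)", 0.0145),
    ("low-end estimate (0.5%)", 0.005),
]

for label, ifr in scenarios:
    print(f"{label}: {expected_deaths(infected, ifr):,} deaths")
# New York City estimate (1.45%): 2,900,000 deaths
# low-end estimate (0.5%): 1,000,000 deaths
```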

The vast majority of Americans still considers a million deaths absolutely unacceptable. But some people value their "freedom" to party and not wear face masks higher. Often, they hide their real opinions, instead downplaying how dangerous COVID-19 is. But the science is clear, and it is not "just another opinion". Don't be fooled.

Is the more infectious mutant D614G more deadly?

A couple of hours after writing this post, I found a couple of interesting publications that describe a mutant of the corona virus called "D614G". One study by a large group of researchers from Los Alamos, Duke, Harvard, WUSTL, and the UK looked at 28,576 sequences from corona virus isolates, and tracked the changes over time. The study found solid evidence that the original strain, D614, has been largely replaced by a mutant strain, D614G, in many different countries and continents. The likely reason for this observation is that this strain is more infectious, which is supported by the observation that the mutant virus appears to be present in higher concentration in the upper respiratory tract than the original D614 strain. Such a higher concentration of viral particles would explain a higher infectivity, and it could also cause more severe disease symptoms.

A second study looked at how common the original and mutant virus strains were in different countries, and correlated this to the reported CFRs. The study concluded that the mutant D614G strain (called G614 in the study) was linked to higher fatality rates, and therefore more pathogenic - in other words, more deadly. However, as discussed above, the CFR depends on both fatalities and testing, and the larger changes in observed rates are linked to testing differences. The testing rates vary dramatically between the countries included in the study, so any conclusion about the mutant being more pathogenic is, at best, tentative. The study did note that the isolates from New York had a higher percentage of the D614G strain; if this strain is indeed more pathogenic than other strains, then this could explain the observed higher infection fatality ratio, possibly in combination with, and in addition to, other factors like hospital overloading.

Additional studies will be needed to clarify whether or not the D614G strain is indeed more pathogenic than the original D614 strain. At this point in time, we only know that this mutant strain that has become dominant in most countries is more infectious, and can only speculate that it may be more deadly. Still, the scientific evidence we have today points towards more, not less, deadly corona virus variants.



Tuesday, June 30, 2020

Corpus Christi Takes Off

My wife and I love spending our winters in Corpus Christi, Texas, so we're following what's going on there somewhat closely. The most recent data are shocking, as this graph illustrates:
The graph is from the TAMUCC-COVID19 task force report, which is worth a look. 

Note that the positive rate for the tests has gone up very rapidly, and now is above 30%. That indicates that not enough tests are available, which is confirmed by a news report (note the word "tests" is missing after "nucleic acid"). 

The city of Corpus Christi reports 274 new cases today (6/30). How does the situation compare to New York City in March?

The county that Corpus Christi belongs to, Nueces County, has a population of about 360,000. The NYC population of 8.4 million is about 23.3 times higher; adjusting the Corpus Christi numbers to the NYC population gives about 6,400 cases per day. The highest number of daily new cases that NYC reported was 6,376 cases on April 6. For the same day, NYC reported a positive rate of 57%.
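For readers who want to redo the population adjustment, here is a minimal sketch using the rounded population figures from the paragraph above; the function name is made up for this example.

```python
NUECES_COUNTY_POPULATION = 360_000   # rounded, as used in the text
NYC_POPULATION = 8_400_000           # rounded, as used in the text

def scale_to_population(count, from_population, to_population):
    """Scale a case count to a population of a different size."""
    return count * to_population / from_population

ratio = NYC_POPULATION / NUECES_COUNTY_POPULATION
scaled_cases = scale_to_population(274, NUECES_COUNTY_POPULATION, NYC_POPULATION)

print(f"Population ratio: {ratio:.1f}")              # 23.3
print(f"NYC-equivalent cases: {scaled_cases:.0f}")   # 6393
```

The result of roughly 6,400 is very close to the 6,376 cases NYC reported on its worst day.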

A big difference between NYC and Corpus Christi is the age distribution of the people testing positive for COVID-19. In NYC, the 75+ group had the highest number of reported cases, about twice as many as the 18-44 group.  In Nueces County, the groups of 20-39 year olds had the highest per capita rate. Part of this difference may be due to extremely limited testing in NYC, where testing was largely limited to patients with severe symptoms; but it is likely that a more important reason for the observed difference is linked to behavioral differences.

In view of the lower positive rate and the differences in the average age of persons with a positive test result, Nueces County is not yet in as critical a situation as New York City was at the beginning of April. But it must be noted that the peak in infections in NYC was observed almost two weeks after a strict stay-at-home order had been issued, which was enforced by police. In contrast, the order issued for Texas is just a small rollback from near-complete reopening; for example, restaurants are allowed to remain open at reduced capacity, and public meetings up to 100 people are allowed, with no limits on meetings for religious purposes, sporting events, and many other exceptions. The Nueces County judge has issued a "Facial Coverings Required" order, but it applies only to some government buildings and most stores with the qualifier "when in a space that will necessarily involve close contact (areas where six (6) feet of separation is not feasible) with others". But effectively, this is only a recommendation, since paragraph 10 of the order states:
"Consistent with Abbott's Order, no civil or criminal penalty will be imposed on individuals for failure to wear a face covering."
In view of strong opposition that many Texans have displayed against using face masks, it remains to be seen how effective the order will be. If it is ignored by a significant part of the population, it is likely that the infection rate in Corpus Christi will reach or exceed New York City levels. Initially, the rate for deaths from COVID-19 in Corpus Christi is likely to be lower, since mostly younger people are currently infected. But over time, the infections will likely spread to all age groups more evenly, which will result in a corresponding large increase in COVID-19 deaths.

Sunday, June 28, 2020

COVID-19 Deaths: A Closer Look

How many deaths does COVID-19 really cause? As I explained in a previous post, "excess mortality analysis" is a good way to answer this question that avoids problems like missing tests, wrong test results, and incorrect classifications of death. This is an updated look at the excess deaths in the US, and how the number of excess deaths compares to the number of officially reported COVID-19 deaths. I'll start this post with results; details about how the results were calculated are given further below in the post.

Excess deaths in different US states


The graph above shows the excess deaths in the states that were affected strongly by COVID-19 early in the pandemic, and later showed decreases in reported COVID-19 deaths, for the weeks ending 3/7 to 5/16/2020. The numbers shown are the percent of excess deaths, above and relative to the expected deaths for a given week, based on averages from the previous 5 years. For example, New Jersey reported a total of 4,735 deaths for the week ending 4/11, compared to the typical average of 1,441 deaths. The excess of 3,294 deaths is 229% of the expected deaths count. Other states shown in this graph had between about 35% (Colorado) and 150% (New York) additional deaths in the worst weeks.
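The percentage calculation for the New Jersey example can be reproduced as follows; this is just a sketch of the formula described above, with a function name chosen for illustration.

```python
def excess_percent(observed, expected):
    """Excess deaths as a percentage of the expected deaths."""
    return 100 * (observed - expected) / expected

# New Jersey, week ending 4/11 (numbers from the text)
observed, expected = 4_735, 1_441

print(observed - expected)                        # 3294
print(round(excess_percent(observed, expected)))  # 229
```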

For comparison, here is a look at "late" states that generally showed a later increase in excess deaths, and no or minimal declines:
There are more states in this list, but the relative increases in deaths are smaller: between 10% and 60%.

The graphs above only extend to 5/16 since too many death report data for recent weeks have not yet been submitted to the CDC. For the weeks shown, most states have reported more than ~90% of the actual deaths, and the CDC spreadsheet tries to estimate the number of missing reports based on historical reporting patterns. However, the CDC estimates are "lowball" estimates that under-predict final numbers, so they cannot be used for the most recent weeks.

Overall, 26 states shown in the graphs above have reported increases in overall mortality between 10% and more than 200% for several weeks in the analyzed time period. Note that the graphs above omit most smaller states because the week-to-week variation in deaths is much larger.

States differ in "COVID-19 death reporting ratios"

In a "theoretically perfect" world, everyone who dies of COVID-19 would be tested for the virus in time, have a positive test result, and therefore have COVID-19 listed on the death certificate; at the same time, nobody who died of other causes would have COVID-19 listed. But tests can fail; people can die alone at home, without ever being tested for COVID-19; doctors and medical examiners can make errors; and other things can go wrong. Looking at the ratio between reported COVID-19 deaths and the number of excess deaths can give a quick impression which of these factors dominate. Here's a graph for all states that reported more than 500 COVID-19 deaths for the 10 weeks  3/14 to 5/16/2020:
Three states have reported more COVID-19 deaths than excess deaths: Connecticut, Washington, and North Carolina. For Connecticut and North Carolina, this is due to very slow reporting of death certificates (see the "Methods" section below).  Washington reported 999 COVID-19 deaths and 12,349 total deaths for the 10-week period, which amounts to 882 excess deaths. It is likely that Washington's death reporting is currently at most 99% complete; if so, at least 123 additional deaths will be reported to the CDC over the next months, which will drop the reporting rate to less than 100%.
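The Washington numbers can be checked with a short sketch. It assumes, as the paragraph above implies, that the 123 late-reported deaths would all add to the excess deaths, since the expected count for the period does not change.

```python
def reporting_ratio(covid_deaths, excess_deaths):
    """Reported COVID-19 deaths divided by excess deaths."""
    return covid_deaths / excess_deaths

covid_deaths = 999
excess_deaths = 882
late_reports = 123  # about 1% of the 12,349 total deaths, as estimated in the text

current = reporting_ratio(covid_deaths, excess_deaths)
adjusted = reporting_ratio(covid_deaths, excess_deaths + late_reports)

print(f"Current ratio:  {current:.0%}")   # 113%
print(f"Adjusted ratio: {adjusted:.1%}")  # 99.4%
```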

A closer look at Texas

At the other end of the reporting rate graph is Texas, with 2,973 excess deaths for the 10-week period ending 5/16. The Texas Department of State Health Services data report a total of 1,305 COVID-19 deaths for 5/16, roughly the same as the 1,318 deaths reported by Johns Hopkins. The "Weekly Counts of Deaths by State and Selected Causes" spreadsheet from the CDC shows a total of 1,497 COVID-19 related deaths for this period. This number was reached in the Texas DSHS data on 5/23; this means that COVID-19 death totals reported by Texas are one week delayed (which is similar to other states).
However, even considering the delay, only about half of the excess deaths in Texas were attributed to COVID-19.

Of the categories in the CDC "Weekly Counts of Deaths by State and Selected Causes" spreadsheet, only one other category besides COVID-19 shows a large increase over 2019 for weeks 11-20: "Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99)". This includes the R99 classification, which is typically used for death certificates with a "Pending" cause of death that remains to be determined. For Texas, roughly half of these "pending" death certificates appear to get a final classification within two months, and the number of deaths in this category drops to about 1% of total deaths within 4 months.

So the question arises: how many of the "pending" (R99) death certificates will later be classified as COVID-19 deaths?

If most of the pending death certificates were later classified as "COVID-19", then the reporting rate for Texas would increase substantially, and be in the same range as New York's reporting rate. But does that happen?

We can see what is happening to the "pending" R99 death numbers over time by looking at the CDC spreadsheets from different weeks. In the graph below, the numbers as shown in the spreadsheet from 5/29/2020 are shown, along with the changes in the most recent spreadsheet that was 4 weeks newer:
The change in total deaths illustrates that reporting for the last 3 included weeks is only about 60-90% complete. The observed changes show that only a few, if any, of the "pending" death certificates were classified as COVID-19, with a possible exception for the most recent weeks. If we look at the week of 4/11, for example, 74 death certificates were removed from "pending", but the number of COVID-19 deaths increased only by 3. Clearly, the vast majority of "pending" deaths were classified in a different category, and not as COVID-19.
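The comparison of two spreadsheet versions boils down to a simple difference between snapshots. In the sketch below, the absolute counts are hypothetical placeholders; only the changes for the week of 4/11 (74 fewer "pending" certificates, 3 more COVID-19 deaths) come from the text.

```python
# Hypothetical absolute counts for one week in two spreadsheet versions.
# Only the differences (-74 pending, +3 COVID-19) are taken from the text.
snapshot_earlier = {"pending_r99": 200, "covid19": 150}
snapshot_later = {"pending_r99": 126, "covid19": 153}

changes = {key: snapshot_later[key] - snapshot_earlier[key] for key in snapshot_earlier}

resolved_pending = -changes["pending_r99"]
# Upper bound: even if every new COVID-19 death came from the "pending" pool
max_share_to_covid = changes["covid19"] / resolved_pending

print(changes)                       # {'pending_r99': -74, 'covid19': 3}
print(f"{max_share_to_covid:.0%}")   # 4%
```

Even in the most generous reading, no more than about 4% of the resolved "pending" certificates for that week could have ended up as COVID-19 deaths.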

Based on this analysis, it appears that Texas attributes less than half of its excess deaths during this period to COVID-19. In contrast, COVID-19 is listed as cause of death for a larger fraction of excess deaths in most other states. However, in almost all states, the officially reported COVID-19 deaths underestimate the additional deaths significantly - typically by about one third.

For the entire US, the officially reported number of COVID-19 deaths on 5/16/2020 was 89,084; the number of excess deaths during the weeks ending 3/14/2020 to 5/16/2020 was 124,219. Due to incomplete death certificate reporting that is only partly compensated for, the actual number of excess deaths is, in all probability, even higher.

Likely causes of low COVID-19 death reporting rates

There are multiple reasons why the reporting rates for COVID-19 are significantly lower than 100%. They fall into several categories:
  • No COVID-19 test results
  • False negative test results
  • Reporting errors
An investigation by USA Today reports that
"many medical examiners and coroners refuse to attribute a death to COVID-19 without a positive test before the person died"
Some medical examiners order COVID-19 tests post mortem, but it is unclear how often this happens. To some extent, such testing can also be subject to political pressure. Even when tests are done, they sometimes can return false-negative results, which makes it less likely that COVID-19 is listed as a cause of death.

For Texas in particular, there are several factors that can cause people to avoid COVID-19 tests. Texas is home to about 1.6 million unauthorized immigrants who may avoid COVID-19 tests out of fear of deportation. Texas also has the highest number of people without health insurance in the nation - 17.7% of the population in 2018, about 5 million people. While federal funds have been made available to test and treat uninsured COVID-19 patients, physicians could register for such funds only after April 27, according to the Texas Medical Association. Many uninsured Texans may have been hesitant to seek tests and treatment, anyway, because they may be responsible for medical costs, for example if no COVID-19 test is ordered. Even political orientation may contribute to avoiding COVID-19 tests: since President Trump has repeatedly stated that "testing makes the US look bad", and that he has "ordered his people to do less testing", some of his avid fans may regard this as a personal instruction to not get tested. If masks have been politicized and are often seen as a sign of "weakness", why would ardent Trump fans think differently of tests? Combine the politicized attitudes towards a public health issue with the sudden worsening that is typical for COVID-19, and many deaths caused by COVID-19 never get reported correctly. Delays in reporting further compound the problem.

Delayed deaths create a false sense of security

At the height of the COVID-19 epidemic in New York, increases in reported COVID-19 cases were quickly followed by increases in deaths:
The delay between the two curves was only about one week.

But the recent increases in cases in the US have shown a different picture - here is a graph for Texas:
Cases have gone up rapidly for about two weeks, but deaths have remained more or less constant. Why?

There are two factors that contribute to the observed differences:
  1. Longer delays between test results and death reports.
  2. The current growth of infections is driven by young people.
The primary reason for a longer delay between test results and reports of deaths is increased testing capacities. In New York, testing was so limited that only people with severe COVID-19 symptoms were tested. This means that tests were not done until 10 or more days after the infection. Furthermore, testing was backlogged, and getting results often took several days to two weeks. So on average, test results became available about 17 days after infection. Deaths from COVID-19 happen, on average, about 24 days after infection, and were reported quickly in New York City. This resulted in the observed 7-day offset.

Since then, testing capacities have increased substantially, and backlogs have been eliminated. People can get tested with mild symptoms, and often even before developing any symptoms. On average, test results now are available within about 7 days after infection - 10 days earlier. In addition, the reporting of COVID-19 deaths in Texas appears to be slower by at least several days. Together, this adds about two weeks of delay, so that we'd expect deaths to rise about 3 weeks after infections started to rise.
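The timeline arithmetic from the last two paragraphs can be written down as a small sketch. The day counts are the rough averages quoted above; the 4-day reporting delay for Texas is an assumed stand-in for "at least several days", not a measured value.

```python
AVERAGE_DEATH_DAY = 24  # average days from infection to death, as quoted in the text

def case_to_death_lag(test_result_day, death_report_delay=0):
    """Days between a case and the matching death appearing in the reported numbers."""
    return (AVERAGE_DEATH_DAY + death_report_delay) - test_result_day

# New York in April: results about 17 days after infection, deaths reported quickly
new_york_lag = case_to_death_lag(test_result_day=17)

# Texas now: results about 7 days after infection; 4-day reporting delay is an assumption
texas_lag = case_to_death_lag(test_result_day=7, death_report_delay=4)

print(new_york_lag)  # 7
print(texas_lag)     # 21
```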

The second difference is that the current wave of infections is driven by young people. COVID-19 is much less deadly for younger people, so they were the first to take advantage of states reopening, often pushing the boundaries. As a result, a much higher percentage of young people is now infected, compared to infections that happened in March and April. This also means that for a given number of infected patients, we will see fewer deaths, so the death curve is expected to rise more slowly.

This, however, is only temporary. 20-somethings may be the first to get infected now, but they will pass the infection to their parents, grandparents, coworkers, and others, so that the age distribution will change and more closely resemble the age distribution of the population. As this happens, the death rate will go up - with an additional delay that reflects the "age normalization" of the infection. When COVID-19 deaths shoot up a few weeks from now, it will not really be a surprise - it's perfectly predictable.

Methods

Data files were downloaded from the CDC "Excess Deaths Associated with COVID-19" web page.
Excess death graphs used the "Predicted (weighted)", "All data" data sets. The latest file version downloaded on 6/24/2020 was used unless otherwise noted (data files are updated weekly by the CDC).

For North Carolina and Connecticut, data for the weeks of 5/9 and 5/16 were incomplete or missing, and adjusted to the values of the "Average expected count" column plus the number of COVID-19 cases reported for the week. The total correction was 3,515 additional deaths (2,321 for North Carolina, 1,194 for Connecticut).

To calculate the total excess deaths, the "observed number" values for the 10 weeks ending 3/14/2020 to 5/16/2020 were added, and the total "average expected count" for these weeks was subtracted (after applying the corrections for Connecticut and North Carolina described above). The total number for the US was determined by adding the respective values for all states.

Reported numbers for COVID-19 cases and deaths used are based on Johns Hopkins data files downloaded from Github. COVID-19 death reporting rates were calculated by dividing the number of total COVID-19 deaths reported on 5/16/2020 by the number of excess deaths for the 10 weeks ending 3/14/2020 to 5/16/2020.
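The calculation described in this section can be sketched in a few lines of Python. The weekly rows below are made-up placeholders (the real values come from the CDC files), and the function names are chosen for this example only; the US totals at the end are the ones given earlier in this post.

```python
def total_excess_deaths(weekly_rows):
    """Sum of observed deaths minus sum of average expected deaths.

    weekly_rows: list of (observed_number, average_expected_count) pairs, one per week.
    """
    observed_total = sum(observed for observed, _ in weekly_rows)
    expected_total = sum(expected for _, expected in weekly_rows)
    return observed_total - expected_total

def reporting_rate(covid_deaths, excess_deaths):
    """Reported COVID-19 deaths divided by excess deaths for the same period."""
    return covid_deaths / excess_deaths

# Hypothetical weekly rows for one state, for illustration only
example_weeks = [(1_500, 1_400), (1_900, 1_410), (2_300, 1_420)]
print(total_excess_deaths(example_weeks))  # 1470

# US totals from the text: 89,084 reported COVID-19 deaths, 124,219 excess deaths
us_rate = reporting_rate(89_084, 124_219)
print(f"{us_rate:.1%}")  # 71.7%
```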

Monday, June 22, 2020

Why Face Masks Work

While I started advocating the use of face masks and other measures to contain COVID-19 a while ago, I must admit that I was surprised with more recent evidence that face masks work very well to stop new COVID-19 infections.

So I have been thinking about why face masks that may only capture 50% to 80% of virus particles could have such a large effect. One often-heard argument is that fabric and self-made "face masks don't work", because they do not capture very small droplets, and because they often do not fit well, so that a lot of air breathed in or out goes around the mask, rather than through the mask.

But a reflection about what scientists have learned over the last few months provides a good explanation of why face masks can have a huge impact on the COVID-19 pandemic, even if they "capture" only 50% of the virus particles in exhaled droplets.

In recent weeks, it has become increasingly obvious that transmission through airborne virus particles that are emitted when talking, singing, and breathing plays a very important role in COVID-19 transmission. Many "superspreader" events where one person infected dozens of others in a short time frame can only be explained through aerosol transmission; choir practice events, where often more than half of the attending singers got infected, are one example.

A game of chance: the "Independent Action Hypothesis"

To understand what is going on, we need to look at the biology underlying COVID-19. When someone breathes in air that contains small, virus-containing droplets suspended in the air, the virus gets deposited on mucous membranes in the nose, throat, and lungs. From then on, it's a race: the virus needs to find a cell that it can infect; infect the cell and multiply; and then be excreted from the cell in large numbers, to find new cells and infect them. Repeating this cycle, the virus eventually is present in the infected body in billions of copies.

But there are many things that can go wrong. At body temperature, the virus is not very stable, and loses its ability to infect after some time. We do not know exactly what this time is, but the data we have suggest that it's somewhere in the range between a few minutes and perhaps a couple of hours. Furthermore, the human body is not defenseless: it has many different molecules and cells participating in the "innate immune response" that can "kill" the virus (the "kill" is in quotes because a virus does not meet the formal definition of being alive - but it's easy to understand).

So when a single viable virus particle enters the body, there is a chance that this will lead to a full-blown COVID-19 infection - but that chance is probably very small, perhaps 1 in 1,000 (we do not actually know this number, but 1 in 1,000 is a common guess). But as more virus particles enter the body, the chances of establishing a successful infection increase. If it's 10 particles, chances of "success" go up to (about) 10 in 1,000, or 1 in 100; if it's 100 particles, they rise to about 100 in 1,000, or 10%. If 1,000 virus particles enter the body, the simple multiplication no longer works, because the chance cannot exceed 100%: if you look at the probability statistics, the chance of success is about 63%, and it takes several thousand particles to get close to 100%.

What I described above is called the "Independent Action Hypothesis". We actually do not know for a fact that it applies to COVID-19, but many scientists believe that it applies because it is the most plausible hypothesis.
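The arithmetic behind this hypothesis can be sketched in a few lines of Python. This is only an illustration: the per-particle chance of 1 in 1,000 is the guess used above, not a measured value, and the function name is made up for this example. If every particle succeeds or fails on its own, the chance that at least one of n particles succeeds is 1 - (1 - p)^n.

```python
def infection_probability(particles, p_single=0.001):
    """Chance that at least one of `particles` independent virus particles
    establishes an infection, if each succeeds with probability p_single.

    p_single = 0.001 is the "1 in 1,000" guess from the text, not a measurement.
    """
    return 1.0 - (1.0 - p_single) ** particles


for n in (1, 10, 100, 200, 1000, 5000):
    linear = min(1.0, n * 0.001)          # the simple "n in 1,000" rule of thumb
    exact = infection_probability(n)
    print(f"{n:>5} particles: rule of thumb {linear:6.1%}, exact {exact:6.1%}")
```

For small doses, the rule of thumb and the exact formula are nearly identical, which is why the dose-response relation is close to linear there. At higher doses the exact value falls below the rule of thumb (about 63% at 1,000 particles in this sketch).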

"Attack rates" within the same household and at "superspreader events"

Next, we need to look at two important questions:
  1. What percentage of people in a household get infected if one person has COVID-19?
  2. Can everybody get infected?
A number of different studies have looked at how likely it is that someone in the same household gets infected if one family member had COVID-19. The actual results vary, but usually fall somewhere in the range between 20% and 60%. In most, if not all, studies, not everybody in the household got infected. Which brings us to the second question: can everybody get infected? Household studies do not give a good answer to this question, because once a person begins to show COVID-19 symptoms, others in the household are likely to be more careful about keeping their distance, washing hands, and so on, to avoid getting infected.

Instead, we can look at "superspreader" events, where a well-defined group was exposed to one or more COVID-19 patients. Cruise ships are one example that caught a lot of attention early in the epidemic, but may lead to false conclusions: as soon as the first likely cases on cruise ships were diagnosed, the passengers typically were isolated in their cabins, which significantly reduced further transmissions.

But there are several other events in the database of COVID-19 clusters, which includes many superspreader events, that provide a better answer. On a French navy ship 61.9% of soldiers ended up with COVID-19 - a total of 1,081 cases. A choir practice in Washington led to attack rates of 75-80%, and other choir practice events led to similar infection rates. From such events, we have evidence that at least 60% to 80% of people can get infected with COVID-19.

If we put these two bits of information together, we can conclude that exposures to the corona virus often happen at levels where an infection can happen, but only sometimes happen at levels that are so high that just about everyone who can get infected does get infected. For example, if a typical household exposure would consist of 200 virus particles, then we'd expect an infection rate of about 20% - roughly what was reported in some studies.

How even "bad" masks reduce COVID-19 transmissions by 50% to 75%

Let's do a little Gedankenexperiment, where we have three groups of 100 people each that get exposed to COVID-19. In all groups, each person is exposed to an infected person for exactly the time it would take to transmit 200 virus particles.

In group 1, neither the infector nor the infectee wears a face mask, so the infectee receives 200 virus particles. This results in 20 new infections.

In group 2, only the infector wears a mask. The mask is pretty bad and lets 50% of the virus particles through. Therefore, each infectee receives 100 virus particles. This results in 10 new infections - a 50% reduction.

In group 3, both the infector and the infectee wear a "50% reduction" mask. Each infectee receives 50 virus particles, resulting in a total of 5 new infections. The overall reduction of new COVID-19 cases is 75%.

The numbers above are just intended as examples. The scenario is also simplified; reality would typically include different exposure levels for different people, and other variables. But even with further refinements, it is plausible that a large number of COVID-19 infections happen at levels where the likelihood of infection is directly proportional to the number of virus particles a person is exposed to, and that reducing the number of virus particles by using "bad" face masks can still have a large effect.
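The three groups above can be recomputed with a small Python sketch. It uses the same illustrative assumptions as the text (200 particles, a 1 in 1,000 chance per particle, masks that let 50% through) and the simple linear rule; the function and parameter names are invented for this example. The exact probability formula would give slightly lower counts, but the same relative reductions to within a few percent.

```python
def expected_infections(group_size, particles_emitted, mask_pass_fractions, p_single=0.001):
    """Expected number of new infections in a group, in the linear dose-response range.

    mask_pass_fractions: fraction of particles that each mask in the path lets
    through (empty tuple = nobody wears a mask). All numbers are illustrative.
    """
    dose = particles_emitted
    for fraction in mask_pass_fractions:
        dose *= fraction                      # each mask removes part of the dose
    risk = min(1.0, dose * p_single)          # linear approximation, capped at 100%
    return group_size * risk


baseline = expected_infections(100, 200, ())
for label, masks in [("group 1", ()), ("group 2", (0.5,)), ("group 3", (0.5, 0.5))]:
    cases = expected_infections(100, 200, masks)
    print(f"{label}: {cases:.0f} infections, {1 - cases / baseline:.0%} reduction")
```

Changing the pass fraction (for example to 0.3 for a better mask) shows how quickly the reduction grows when both sides wear one, since the two fractions multiply.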

In the context of an epidemic, what is basically a linear effect on an individual level can be a much larger effect on the growth rate of the epidemic. For example, if the effective growth rate R is 2.0 without masks, but mask use reduces transmissions by 50%, that would convert the growth from exponential to stationary with R = 1.0. A reduction by 75%, as in the "group 3" example above, would lead to R = 0.5, and a rapidly shrinking number of new transmissions.
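To see how much difference this makes over time, here is a minimal Python sketch that follows new cases over a few "generations" of infection. The starting value of 1,000 cases and the function names are arbitrary choices for illustration; the R values are the ones used in the example above.

```python
def effective_r(r_without_masks, transmission_reduction):
    """Reproduction number after cutting transmissions by the given fraction."""
    return r_without_masks * (1.0 - transmission_reduction)


def new_cases_by_generation(initial_cases, r, generations):
    """New cases in each successive generation of infections."""
    return [initial_cases * r ** g for g in range(generations + 1)]


for reduction in (0.0, 0.5, 0.75):
    r = effective_r(2.0, reduction)
    print(f"reduction {reduction:.0%}: R = {r}, cases per generation:",
          new_cases_by_generation(1000, r, 5))
```

After five generations, the unmasked scenario has grown 32-fold, the 50% scenario is unchanged, and the 75% scenario has shrunk to about 3% of where it started.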

But ...

isn't that too simple? Indeed, I made a number of simplifications above. Also, the analysis often relies on assumptions where we do not have actual data. But the assumptions I made are more reasonable than most, if not all, alternative assumptions (for example a "threshold level" hypothesis instead of the "Independent Action" hypothesis). However, the general conclusion that many transmissions happen in the "linear dose-response range" matches many observations made in recent months about COVID-19 transmissions, and what we have learned about the underlying mechanics. If you'd like a more formal analysis, check out the publication titled "To mask or not to mask: Modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic" that came to very similar conclusions.
--
Note added 6/29/2020:
In the week since I wrote this post, several new studies have described empirical evidence that face masks are effective at reducing COVID-19 transmissions.

One study, titled "Data-driven estimation of change points reveal correlation between face mask use and accelerated curtailing of the COVID-19 epidemic in Italy", showed that the number of new COVID-19 infections declined faster after masks became mandatory, and concluded that "widespread use of face masks and other protective means has contributed substantially to keeping the number of new Italian COVID-19 cases under control in spite of society turning towards a new normality".

A second study titled "Face Masks Considerably Reduce Covid-19 Cases in Germany" looked at different regions in Germany, where face mask use became mandatory at different dates. It concluded "face masks reduce the daily growth rate of reported infections by around 40%".

Faced with a mounting body of scientific evidence that face masks work, and very rapid growth of new COVID-19 infections in many southern states, even "leading Republicans are publicly embracing expert-recommended face masks as a means to slowing the spread of the deadly coronavirus", according to an NPR article, leaving President Trump and Vice President Pence increasingly isolated in their opposition against wearing face masks.

Wednesday, June 17, 2020

Multi-state Evidence That Facemasks Stop COVID-19

The number of new COVID-19 cases in the US has been relatively stable overall, but huge differences between states exist: some are showing continuing declines, while others are reporting rapid increases in case numbers. This graph illustrates the dichotomy:
For the last three weeks, Alabama, Arizona, Arkansas, and California have shown increases in new COVID-19 cases, while Colorado, Connecticut, Delaware, and DC have shown decreases. Only a small fraction of the increases in most states can be explained by increased testing, as I explained in a previous post. Changes in testing do not explain the observed differences to states with decreasing case numbers at all, since testing has increased even in the states with lower reported case numbers.

So - how can we explain that many states show increases in COVID-19 transmissions, while many other states show decreases? One possible explanation is differences in when and how states "re-opened" by removing stay-at-home orders and social distancing restrictions. But while this may indeed explain some of the differences, it is unlikely to be the determining factor, given that most states started re-opening several weeks ago, and that some states like California, where many restrictions remain in place in large population centers, nevertheless show increases. But a recent article in the journal "Health Affairs" provides another possible explanation: that differences in new COVID-19 cases are caused by differences in face mask wearing. So I decided to look into this.

According to the article, 15 states and the District of Columbia have issued mandates for face mask use in public, and the list of these states is given in the supplementary materials. As an indicator of recent changes in COVID-19 cases, I used 15-day trend lines from my "data trend model", which are based on 7-day averages of new case numbers from Johns Hopkins data. The slopes of the (log-based) trend lines give a good indication of where case numbers are going, with positive slopes indicating increases and negative slopes indicating decreases. I excluded the two states that had fewer than 10 new cases in the most recent 7-day averages, Montana and Hawaii. The data for the 15 states with the highest and lowest slopes are shown in the next graph:
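For readers who want to try this kind of trend calculation themselves, here is a rough Python sketch. It is not the actual "data trend model": it assumes a trailing 7-day average, a base-10 logarithm, and an ordinary least-squares line over the last 15 averaged days, and it runs on synthetic numbers rather than Johns Hopkins data. All function names are made up for this example.

```python
import math


def trailing_average(values, window=7):
    """Trailing averages; the first window-1 days have no average yet."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def log_trend_slope(daily_cases, window=7, trend_days=15):
    """Least-squares slope of log10(7-day average) over the last trend_days days.

    Positive slope = rising case numbers, negative slope = falling case numbers.
    """
    averages = trailing_average(daily_cases, window)[-trend_days:]
    ys = [math.log10(a) for a in averages]    # assumes every average is > 0
    xs = list(range(len(ys)))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Synthetic examples: cases doubling, or halving, every 14 days for 4 weeks
rising = [100 * 2 ** (day / 14) for day in range(28)]
falling = [100 * 0.5 ** (day / 14) for day in range(28)]
print(log_trend_slope(rising), log_trend_slope(falling))
```

With these assumptions, a slope of about +0.0215 per day corresponds to a doubling every two weeks, and the same negative value to a halving every two weeks.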
The 15 states at the top of the graph (Michigan - DC) have decreasing COVID-19 case numbers. 11 of the 15 states have face mask mandates. The 15 states at the bottom of the graph (Utah - Oregon) have shown increasing COVID-19 case numbers in the last 2 weeks; only one of the 15 states, Utah, was listed in the article as having a face mask mandate. However, the actual wording of the executive order issued on 4/10/2020 uses the word "directives", not "mandate" or "order", and does not mention any enforcement provisions or penalties. In an updated order from 5/27, the "directive" is explicitly changed to an "order", but only for certain business employees and health care settings; for the general public, only a "strong recommendation" to use face masks is issued.

Of the states shown as not requiring face mask orders in the declining groups, both Colorado and New Hampshire have issued strong recommendations to use face masks in public. Virginia's governor issued an executive order that requires face masks for "patrons" in most businesses, including retail and restaurants, and in public transportation.

The attitude towards face masks for states in the lower half of the graph, which report increasing COVID-19 cases, tends to be different. For example, the governor of Texas has issued an executive order that "bans local governments from imposing fines or criminal penalties on people who don't wear masks in public".

Even when orders or recommendations to use face masks exist, how many people follow the recommendations can vary significantly. Multiple surveys have shown that Democrats are much more likely to wear masks than Republicans, with reported mask wearing percentages of 75% vs. 53% according to one survey of 2,400 Americans, and 92% for Democrats versus 53% for Republicans in Minnesota according to another survey. Of the 15 states with declining daily new COVID-19 cases shown in the graph above, only 3 (20%) were won by Donald Trump in the 2016 election; of the 15 states with increasing case numbers, 12 (80%) were won by Trump. Since Trump has steadfastly refused to wear face masks in public, many of his fans also refuse to wear masks.

The research study I mentioned above is just one of many studies that confirm that face masks are effective at reducing COVID-19 transmissions. Another study that used computer simulations showed that face masks can reduce COVID-19 infections and deaths significantly even if just a subset of the population wears face masks, with such reductions being higher if a higher percentage of the population wears masks. A meta-analysis that looked at data from 172 observational studies with a total of 26,697 patients concluded that "face mask use could result in a large reduction in risk of infection", with a reported "adjusted odds ratio" of 0.15.

Many researchers had long suspected that face masks can be an important tool for containing COVID-19. This was based on the comparison of Asian countries where face mask use is common, and COVID-19 was quickly contained, to European countries where face mask use was initially strongly discouraged, and the COVID-19 epidemics reached much higher levels. Given what we have learned recently, it appears that face masks, when combined with basic social distancing, are even more effective than even optimists thought. One indicator of how well face masks work can be found in the statistics for European countries like Austria and Germany that have re-opened their economies without seeing an increase in new COVID-19 cases. A common factor in these countries is that face mask use in public is mandatory if social distancing cannot be maintained.

Ironically, the widespread use of face masks seems to be a critical element that allows re-opening the economy while keeping COVID-19 under control - but the president of the US, who has clearly stated that re-opening is his absolute priority, has consistently refused to use face masks.

Monday, June 15, 2020

Does More Testing Explain The Rise In COVID-19 Cases?

The number of new daily COVID-19 cases in the US has been rising for the last few days, and several states have set new records. Is the rise in case numbers mostly due to more testing, as several governors in states with rising numbers have claimed? Or do the rising numbers indicate a "true" increase in transmissions due to state re-openings?

Here is a look at the recent trends for several states that have shown increases in COVID-19 cases since the beginning of May, using Johns Hopkins data:
For comparison, here is a graph of states that have seen a consistent drop in COVID-19 cases:
Note that the second graph uses a logarithmic scale, since some of the drops are quite large. The drops in cases are relatively constant on the log scale, whereas the growth in cases shown in the first graph is restricted to the last 2-3 weeks for most states. Let's have a look at a couple of states in detail, comparing the number of tests to the number of COVID-19 cases (data are from the COVID tracking project). First, Arizona:

Data are smoothed using 7-day trailing averages to remove most of the day-to-day variation. For Arizona, we can see that both test numbers and positive cases increased in the period shown. However, the number of tests (in red) increased only by about 50%, while the number of COVID-19 cases increased by about 300%. So while the increase in testing contributed to the increase in confirmed cases, the primary cause of the observed increase in cases was an increase in transmissions; increased testing alone would only have resulted in a rise of daily cases to about 200, not to more than 400.
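One simple way to separate the two effects is to look at how the share of positive tests changed. The Python sketch below does this using only the approximate percentage changes quoted above; the function name is invented for this example, and the calculation ignores the fact that broader testing tends to lower the positive rate by itself.

```python
def positivity_change(test_increase, case_increase):
    """Factor by which the share of positive tests changed.

    Both arguments are fractional changes, e.g. 0.5 for +50% or -0.6 for -60%.
    """
    return (1.0 + case_increase) / (1.0 + test_increase)


# Arizona figures quoted above: tests up about 50%, cases up about 300%
print(round(positivity_change(0.5, 3.0), 2))
```

A factor well above 1, as here, means that cases grew much faster than tests; a factor of exactly 1 would be what "more tests = more cases" predicts.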

Next, Florida:
The number of daily new COVID-19 cases in Florida more than doubled over the last 2 weeks. While there also was a small increase in testing between 6/3 and 6/10, the steepest increase in cases happened after 6/10, when the number of tests dropped. Furthermore, the number of tests per day remained lower than it had been around 5/25. Clearly, the simple equation "more tests = more cases" does not explain the observed trend in Florida.

For comparison, let's have a look at New York:
For New York, the number of tests per day almost doubled to more than 60,000 between 5/20 and 6/14, but the number of confirmed COVID-19 cases dropped from about 2,000 to about 800. Clearly, the effect of continued social distancing and similar government restrictions by far outweighed any increase from more testing.

The results for many other states are similar: numbers decreased in many northeastern and central states despite more testing, while the increases in many southern and western states were much higher than what can be explained by increased testing.

One factor that comes into play here is that increased test availability changes who can get a test. When tests are in short supply, stringent criteria are used to limit testing to those most likely to be positive; typical restrictions include the presence of symptoms and contact with a confirmed case. During the height of the COVID-19 epidemic in New York City, the positive rates exceeded 50%, and fewer than 10% of infected persons were tested. As tests become more widely available, restrictions on who can get tested are relaxed, and the positive rate drops. When convenient, cost-free tests that do not require doctor's referrals or the presence of symptoms are available, some people get a test "just to be safe", or because they had unspecific symptoms so long ago that the chances of a positive test are very small. As a result, a sudden doubling in test numbers does typically not lead to a doubling in confirmed cases, but rather to a significantly smaller increase - which is exactly what we are seeing.

Looking at the graph of daily cases in the US on Worldometers, very little seems to have changed since the beginning of May - aside from small fluctuations, mostly within each week, the number of daily cases now seems very similar to the number a month ago. What the graph does not show, however, are fundamental differences between the states, with many states showing consistent decreases while others show rapid increases. If we look at the states shown in the first two graphs, but omit the two most heavily hit states, New York and New Jersey, then the drop in one group is pretty much canceled out by the rise in the other group:

What we are seeing now is that the case numbers in the states that used to have by far the highest numbers are so low that further decreases do not change the overall total much; at the same time, the numbers for the "second riser" states like Arizona and Florida have reached levels that do make a difference. Worse, the growth in these states appears to be accelerating, with observed doubling times of two weeks or less. While this growth is not as rapid as during the early growth phase in March, it is fast enough to lead to significant underestimates of the severity of the epidemic.
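To get a feeling for what a doubling time of two weeks means, here is a small Python sketch. It assumes steady exponential growth of daily case numbers, which real epidemics only follow for limited periods; the function names are made up for this example.

```python
import math


def daily_growth_rate(doubling_time_days):
    """Fractional day-to-day increase implied by a given doubling time."""
    return 2 ** (1.0 / doubling_time_days) - 1.0


def days_to_grow(factor, doubling_time_days):
    """Days needed for daily case numbers to grow by `factor`."""
    return doubling_time_days * math.log2(factor)


print(f"{daily_growth_rate(14):.1%} per day")
print(days_to_grow(4, 14), "days to quadruple")
print(days_to_grow(10, 14), "days for a tenfold rise")
```

A two-week doubling time corresponds to only about 5% more cases each day, which is easy to overlook in noisy daily numbers, yet it quadruples the daily case count in four weeks.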

The states that show rapid COVID-19 growth have several things in common, which include earlier re-opening and higher average temperatures, compared to the states that show declining case numbers. As described in this article, this dashes the hopes that the virus will "disappear" due to higher temperatures; instead, the higher temperatures cause people to spend more time indoors, where the risk of transmission is higher than outdoors. Right now, states like Arizona show only one fourth the number of COVID-19 cases per million population that New York or New Jersey have, and the actual factor may be even larger since testing was in short supply during the height of the epidemic in the Northeast. But with observed doubling times of two weeks and governors who clearly indicated that they will not re-institute restrictions before hospitals overflow, these states may "catch up" before the summer is over.

Thursday, June 11, 2020

No Mystery - "Pending"

Update: Mystery means "Pending"

This post was originally titled "The Cheating Is Getting Worse", but based on what I learned since writing it, the title was misleading. While the analysis of "excess deaths" provides strong evidence that many COVID-19 related deaths are not reported as COVID-19 deaths in the death certificates, and that the officially reported numbers of COVID-19 deaths are significantly lower than the actual deaths, the classification of deaths in the "mystery" (R00-R99) category is not a sign of cheating; as a reader pointed out, this range includes the R99 category where the cause of death is "pending", with further investigation needed. Over time, most deaths initially classified in the R99 category are assigned a defined cause of death.

So the "mystery deaths" are mostly just temporary mysteries which in all likelihood will be solved over time. Some of these mystery deaths may indeed be caused by COVID-19, as the note at the bottom of this page illustrates; even if they are, they may not be classified as such if no positive COVID-19 test exists. However, some of the "mystery" deaths are likely to be due to other causes, for example accidents and drug overdoses, which can take many months to be resolved.

Details about how death certificates are handled are defined by each state, and thus differ from state to state. Similarly, states differ on how and when they report death statistics to the CDC. Many states change most "Pending" R99 death certificates within a couple of months or so, but others, including Florida, may take up to a year to change them. This can be seen in data from before the COVID-19 epidemic, and has been previously described by the CDC.

While the death reporting patterns may be peculiar, and it may be hard to understand why states like Florida often need half a year or longer to determine a final cause of death, there is currently no evidence that indicates that the R99 category is used on purpose to hide COVID-19 related deaths.

The purpose of excess death analysis is to get a clear picture of the impact of a disease, regardless of any testing issues. It is not to support conspiracy theories based on poorly understood or incomplete facts. I therefore deleted the original post. I plan to re-examine excess deaths and COVID-19 death reporting for different states in a future post.
--
Note added 6/28/2020: A look at excess deaths in different states, with a close look at Texas, is now available at https://covid19science.blogspot.com/2020/06/covid-19-deaths-closer-look.html

Friday, June 5, 2020

Painting a Rosy Picture: Why Many COVID-19 Tests Fail

Some widely used COVID-19 tests can have very low sensitivity, missing half or more of infections. That's the conclusion from a new study published yesterday from researchers at Harvard and the Beth Israel Deaconess Medical Center in Boston, combined with information that companies have submitted to the FDA about their tests. Unfortunately, one of the least sensitive tests has become very popular in many states.

Let's start with a graph from the study:
Very sensitive tests, like the Abbott PCR M2000 test, will give positive results if the viral load is at least 100 genome copies per milliliter, and therefore detect about 85% of all infections. The other tests in the graph are less sensitive, and therefore detect fewer infections. In other words, they have a higher false-negative rate.

The curve above is based on the analysis of quantitative PCR test results for 4,774 patients with a positive COVID-19 test, which showed a very wide variation in viral loads:

Some patients had as few as 10 copies of the viral genome per ml, while others had 1 billion copies per ml. Between about 100 copies per ml and 100 million copies per ml, the distribution is quite even - the number of people in each category is about the same. This distribution is what creates the solid black line in the first graph.

There are many reasons why the number of virus particles can vary so drastically. One is the timing of the test: the viral load increases from initial infection to the onset of symptoms, and then usually starts to decrease. Researchers from countries with extensive contact tracing and sufficient testing capacities have published many studies that show tests that were initially negative, turned positive after a few days, and later reverted back to negative. Additional variations can come from how exactly the swab is done; where in the body the virus replicates most successfully; the number of virus particles that caused the initial infection; and differences in the innate and adaptive immune response between individuals. Individuals with the highest viral loads may be more likely to be "superspreaders" that can infect dozens of others, but even many of those with low viral loads are likely to be contagious.

For the sensitivity figure above, the authors simply used the documentation that had been submitted to the FDA by the different companies. They point out that the description of the "limit of detection" is not standardized - some companies use the number of genomes per milliliter, others use TCID50, and so on. Neither is there a single way to determine the detection limit. Some companies start with swabs that they add a known amount of viral RNA to, and then go through the entire detection protocol, mimicking "real world" situations as closely as possible. But others add the known RNA sample much later in the process, after isolating the viral RNA. To really compare the claimed sensitivities, it is necessary to inspect the protocols closely.

TestUtah, TestNebraska, TestIowa, and Co-Diagnostics

In my previous post, I had talked about a Utah company that has won big testing contracts in Utah, Iowa, and Nebraska, where it has been criticized for what appears to be an extremely low rate of positive test results. The company uses tests from another Utah company, Co-Diagnostics, which stated in March that it can produce 50,000 COVID-19 tests per day. A month later, Co-Diagnostics announced a collaboration with the life sciences company Promega to produce more test kits.
In the documents submitted to the FDA, Co-Diagnostics claims a limit of detection of 4,290 copies per ml. Using the sensitivity curve above, this would result in a detection rate of about 60% - but a closer inspection of the document indicates that this is overly optimistic.

Co-Diagnostics describes that it used sputum samples for the sensitivity experiment. That is highly unusual, since sputum samples are rarely tested in the US, where nose or throat swabs are typically used. The genomic RNA was added only after the RNA purification step, directly before the PCR reaction. This avoided potential losses in the elution step, and potential degradation, which could have reduced the reported sensitivity further. But the bigger difference is elsewhere: when swabs are used, they are typically put into a test tube with 2-3 ml of saline or viral storage medium. The purification column used for the Co-Diagnostics kit, however, is designed for a volume of only 140 microliters. This effectively adds a 14- to 20-fold dilution step. For swab samples stored in the standard 2 ml of medium, therefore, the detection limit would be about 61,000 copies of viral RNA. According to the sensitivity graph above, this drops the detection rate to 50%. After accounting for losses during transport, storage, and RNA isolation, a false-negative rate of more than 50% is likely - which is exactly what was observed in Utah.
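The dilution arithmetic can be checked with a short Python sketch. It simply follows the reasoning above (claimed limit multiplied by the ratio of storage volume to column volume); the function and parameter names are made up for this example, and real-world losses are not modeled.

```python
def effective_detection_limit(claimed_limit, swab_medium_ml, column_volume_ml=0.140):
    """Detection limit per swab when only part of the storage medium is processed.

    claimed_limit: copies per ml claimed for the assay
    swab_medium_ml: volume of saline or storage medium the swab is placed in
    column_volume_ml: volume the purification column accepts (140 microliters here)
    """
    dilution = swab_medium_ml / column_volume_ml
    return claimed_limit * dilution


for medium_ml in (2.0, 3.0):
    limit = effective_detection_limit(4290, medium_ml)
    print(f"{medium_ml} ml medium: {medium_ml / 0.140:.1f}-fold dilution, "
          f"about {limit:,.0f} copies")
```

For 2 ml of medium this gives a dilution of about 14-fold and a limit of roughly 61,000 copies; for 3 ml, the dilution is a bit more than 20-fold and the limit correspondingly higher.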

Abbott's ID NOW and "User Error"

Another test that has been shown to have a high percentage of false negative results is Abbott's ID Now COVID-19 test. One study showed false negative rates up to 45% when using diluted RNA samples. Abbott was quick to go on a counter-offensive and blame "user error" for high reported false negative rates, a statement that was repeated by Health Secretary Alex Azar. However, Abbott's own data showed false negative rates between 8.7% and 16.7% when compared to sensitive PCR assays, and concluded that higher false negative rates are linked to lower viral loads. The results are similar to two independent studies which found false negative rates of 12.3% and 26.1%, while more accurate PCR tests had false-negative rates between 1% and 5%. All studies agree that the Abbott test is less accurate at low viral loads; the highest sensitivities were seen in settings where viral loads were likely to be highest: in symptomatic patients relatively shortly after onset of symptoms.

Both the Co-Diagnostics and the Abbott tests show high false negative rates at low viral loads, which means they are not suited for "open" testing (like state-wide drive-through testing without requiring COVID-19 symptoms) or testing done as part of contact tracing, since infected persons that do not (yet) show symptoms have lower viral loads, and are therefore much more likely to give false negative results.

Lessons from China and New York

Looking at the documentation test companies provided to the FDA, it is normal to see claims of 99-100% detection rates. In view of variations in viral load and even results from company-sponsored studies, such numbers are extremely unrealistic. A number of scientific studies from Asian countries report actual PCR detection rates around 60-80%. In China, symptomatic patients were routinely tested by PCR and chest CT scans, and positive chest CT scans were viewed as sufficient to diagnose COVID-19 even if the PCR results were negative.

New York City also provides clues about false-negative test results. During the height of the epidemic in NYC, COVID-19 testing capacity was insufficient, and testing was largely restricted to symptomatic patients admitted to hospitals. At the same time, overloaded hospitals meant that only patients with very severe symptoms were admitted; newspapers reported that ambulances refused to transport patients to hospitals unless their blood oxygen levels were dangerously low. Nevertheless, the positive test rate in New York City never got much higher than 50-60%. Even the death numbers for NYC reflect that many who died from COVID-19 in NYC either did not get tested, or had negative test results: about 22% of COVID-19 deaths (4,727 of 21,782) did not have a positive PCR result. In all likelihood, this number is too low: death certificate analysis for NYC shows 24,480 excess deaths in NYC between 3/15/2020 and 5/23/2020 compared to last year. This puts the actual rate of COVID-19 related deaths in NYC without a positive PCR result at 30%. Unfortunately, information about which exact tests were used is not published, but the timing of deaths in NYC means that most tests must have been done with tests that have higher sensitivity than the Abbott and Co-Diagnostics tests.
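The two percentages for NYC follow directly from the numbers quoted above, as this small Python sketch shows. The second function rests on the assumption that all excess deaths in this period were related to COVID-19; the function names are invented for this example.

```python
def share_without_positive_test(total_deaths, deaths_without_positive_pcr):
    """Fraction of counted COVID-19 deaths that lacked a positive PCR result."""
    return deaths_without_positive_pcr / total_deaths


def share_using_excess_deaths(excess_deaths, total_deaths, deaths_without_positive_pcr):
    """Same fraction if all excess deaths are attributed to COVID-19 (an assumption)."""
    confirmed = total_deaths - deaths_without_positive_pcr
    return (excess_deaths - confirmed) / excess_deaths


print(f"{share_without_positive_test(21782, 4727):.1%}")
print(f"{share_using_excess_deaths(24480, 21782, 4727):.1%}")
```

The first line reproduces the roughly 22% from the official counts; the second gives the roughly 30% that results when the 17,055 deaths with a positive PCR result are compared to all 24,480 excess deaths.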



Thursday, June 4, 2020

COVID-19 Variability In US States

Here's a look at confirmed COVID-19 cases and deaths in the US, using 7-day averages to smooth out day-to-day variations:
For comparison, here is the graph for Germany:
There is a remarkable difference in the offset between the case and death curves. In Germany, deaths rise and fall with about a 2-week delay, which is close to the about 19 days between symptom onset and death. In the US, the delay is only about one week, mostly due to longer wait times for tests and test results.

But the bigger differences are in how far the curves dropped after the peak in deaths around April 18. In Germany, the number of daily COVID-19 cases dropped by a factor of about 15, and the number of daily deaths by a factor of 10. In the US, cases dropped only by about one third, while the number of deaths dropped by a factor of 2.3.

In general, the number of deaths and cases should fall by the same factor, but that's not what we are seeing. The higher drop factor for cases in Germany is due to the delay between diagnosis and death; if we look at the case numbers from 2 weeks ago, the "case drop factor" is also about 10. In contrast, the US shows a larger drop for deaths than for cases. This is partly due to insufficient test capacity during the peak of the epidemic. In many regions, but especially in New York and New Jersey, test capacity was so severely limited at the beginning of April that some hospital patients could not get tested, and wait times for test results sometimes exceeded a week or two. In addition, the ongoing ramp up of testing in late April and May also led to an increase in reported cases.

But when looking at the COVID-19 numbers for the US, we have to keep in mind that there were dramatic differences between states in case numbers and government interventions, which is reflected in the graphs for different states. The states that were hit hardest responded with the most stringent stay-at-home orders and other measures to contain COVID-19, which drove new infections and deaths down - here is New York as an example:
Daily deaths dropped by a factor of about 10, similar to what we saw for Germany. But other states saw only relatively small drops, followed by relatively steady numbers for both cases and deaths. Georgia is a typical example:
While most states issued "stay-at-home" orders, there were large differences between how strict orders were; whether or not they were enforced; and how long they remained in place. While most states ordered "non-essential" businesses to close, they had very different definitions of what exactly constitutes an "essential" business. Whereas New York closed down just about everything except groceries and hospitals, some states considered all construction-related businesses as essential and allowed them to continue to operate. Similar differences existed in closing public spaces like parks and beaches. Perhaps even more importantly, significant variation existed in adhering to local restrictions, and to suggestions or mandates to wear face masks.

A number of states with more "business-friendly" restrictions showed a relatively constant increase in new cases. One example is North Carolina:
The relatively steady number of deaths between April 20 and May 16 indicates that some of the steady increase in cases is due to increased testing. However, the number of daily COVID-19 deaths only dropped briefly, and has since risen to new records; at the same time, the number of confirmed cases shows a steep upwards trend, indicating that the number of deaths will continue to grow in the next weeks.

Another interesting case is Utah:
The daily confirmed cases show a slight upward trend from the middle of April to the middle of May, which is also seen in the daily deaths. For the last week, however, the number of cases is rising rapidly, while deaths seem to be on a downward trend. But a closer look at the CDC death certificate data and the analysis of "mystery deaths" indicates that Utah appears to be cheating:

Since April 11, Utah has had an increasing number of deaths that are in the "not elsewhere classified" category - "mystery deaths". Typically, Utah should have about 4-5 deaths per week in this category, corresponding to the historical average of 1.25% (the CDC spreadsheet masks all numbers between 1 and 9, so we don't know the exact counts for many weeks - but it will be close to 5). But the number of deaths in the "mystery" category has increased at the same time as COVID-19 cases, and actually to higher levels than the number of "official" COVID-19 deaths. For the week of 5/16, about 10% of all deaths in Utah were "mystery deaths". Considering that reporting of death certificates for recent weeks is typically incomplete, and looking at last year's numbers, it appears that COVID-19 is currently increasing the death rate in Utah by 10%. But most of these deaths have been "moved" to the "mystery death" category, and do not get reported as COVID-19 deaths.
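As a rough sketch of the arithmetic behind these figures: the weekly total of 360 deaths below is a hypothetical round number, picked only because 1.25% of it falls in the 4-5 range, and the function names are illustrative.

```python
BASELINE_SHARE = 0.0125  # historical share of "mystery" deaths (1.25%)


def expected_mystery_deaths(total_deaths, baseline=BASELINE_SHARE):
    """Mystery deaths expected in a week with `total_deaths` certificates."""
    return total_deaths * baseline


def excess_mystery_share(mystery_share, baseline=BASELINE_SHARE):
    """Share of all deaths in the mystery category beyond the baseline."""
    return max(mystery_share - baseline, 0.0)


# Hypothetical weekly total, chosen so the baseline gives 4-5 deaths.
weekly_total = 360
baseline_deaths = expected_mystery_deaths(weekly_total)

# About 10% of all deaths were "mystery deaths" in the week of 5/16.
excess_share = excess_mystery_share(0.10)

print(f"expected at baseline: {baseline_deaths:.1f} deaths per week")
print(f"share above baseline: {excess_share:.2%}")
```

At a 10% mystery share, roughly 8.75 percentage points sit above the historical 1.25% baseline.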

There is one plausible explanation for COVID-19 deaths being classified in the "mystery death" category: false negative test results. For properly performed PCR tests, the false negative rate should generally be less than 5%, but it can reach almost 50% for tests done using Abbott's machine (for which the company blames "user error"). Even that would not explain the ratios we are seeing for Utah, though. However, one company that performs tests in Utah, called "TestUtah.com", has been criticized for what looks like an extremely high percentage of test failures: while other labs in Utah report positive rates of 5%, TestUtah reports rates of only 2%. TestUtah has declined to join other state labs in a test to confirm accuracy. The same company has also received a no-bid contract in Nebraska under the name "TestNebraska", where it reported a 3% test positive rate, 6-fold lower than the 8% positive rate other labs reported. The company also failed to meet the contractual 48-hour turnaround time and test numbers; all this prompted state lawmakers to call for a termination of the contract in Nebraska. Similar problems have been reported from Iowa, where the company operates under the "TestIowa" name.

On the other hand, companies like TestUtah / TestNebraska / TestIowa may be quite welcome to governors and politicians who regard "re-opening" as more important than preventing COVID-19 transmissions and deaths. The Utah government has declared that "Utah’s social distancing efforts to slow the spread of COVID-19 have been working", and declared most of the state as "low risk" areas where all businesses are allowed to operate, with only 3 counties remaining in the "moderate risk" category, and no county in the "high risk" category. Between inaccurate COVID-19 tests and mis-diagnosed COVID-19 tests, it seems Utah is doing what it can to make COVID-19 "just disappear". Unfortunately, other states appear to follow a similar approach, as I described in my last post.


Sunday, May 31, 2020

How To Not Count COVID-19 Death

In yesterday's post, I described how the analysis of data from the CDC web page titled "Excess Deaths Associated with COVID-19" showed that up to 5,310 excess deaths per week in the US are not attributed to COVID-19 (in addition to up to 16,203 deaths per week where COVID-19 was listed as a cause of death). That prompted me to have a closer look at the detailed data that are available for download from the CDC web site. Here's a graph that shows one of the things I found:
The graph shows the weekly number of "mystery deaths" for six different states. Every death certificate must list a cause of death, and 98.75% of death certificates have a clearly defined cause. But in about 1 of 80 cases, the medical examiner or doctor who fills out the death certificate cannot clearly identify the cause (or causes) of death. In these cases, the death is described by a code for
"symptoms, signs, abnormal results of clinical or other investigative procedures, and ill-defined conditions regarding which no diagnosis classifiable elsewhere is recorded"
This includes a range of codes (R00-R99) that describe symptoms or abnormal lab findings, with code R99 reserved for "Ill-defined and unknown cause of mortality". Note added 6/15/2020: Apparently, the R99 code is also used for "Pending", and some deaths may take several months before the "Pending" classification is changed to a final classification, often in a different category.

The graph above shows that in 2019, these codes were used more or less the same every week of the year, with some week-to-week variations. But for 5 of the 6 states shown, the number of weekly "mystery deaths" increased markedly sometime in February to March, reaching levels that were more than 2 to 5 times higher than in 2019. Here is a look at the raw data for March to May 2020, and the same period in 2019, for the entire USA:
For the US, the "mystery death" numbers more than doubled at the beginning of March, and kept increasing to about 5-fold of the historical level in May (note that data for recent weeks are incomplete, and are likely to increase as more death certificates are submitted to the CDC). Overall, the deaths listed in the "mystery" category R00-R99 account for roughly half of the excess deaths that do not list COVID-19 as a cause of death.

Mystery illness or undiagnosed COVID-19?

The increase in "mystery deaths" is seen in the weeks where the total number of deaths is higher than expected, so all the "mystery deaths" are in addition to "normal" deaths - they are excess deaths. There are several possible explanations for this:
  1. The deaths were caused by COVID-19, but without a positive COVID-19 test. Many of the deaths may have occurred outside of hospitals, or in regions where COVID-19 is only listed as a cause of death if a positive test result has been obtained before death.
  2. The additional "mystery" deaths were not caused by COVID, but instead by one (or possibly more than one) unknown deadly disease.
  3. Added 6/15/2020: The "mystery" deaths are classified as "Pending" and require further investigation. In cases of injury or drug overdose, the R99 code is often used, and changed to the actual cause of death in the following weeks or months. Some, but most likely not all, deaths may eventually list COVID-19 as the cause.
It appears quite unlikely that an unknown disease would cause the deaths of more than 2,000 people per week, distributed over many states, without being detected, and that the disease would cause deaths in the same weeks as COVID-19. So the second explanation is very unlikely.

There are, however, multiple factors that support the idea that the "mystery" deaths are indeed undiagnosed COVID-19 deaths:
  • Tests for COVID-19 can give false-negative results, with some machines reported to give incorrect negative results for up to 45% of tests. A negative test result would reduce the likelihood that COVID-19 is listed on the death certificate even if a patient has shown typical symptoms before death.
  • Quite often, COVID-19 patients first experience somewhat milder, unspecific symptoms which get better, before a very sudden turn to severe symptoms, for example a "cytokine storm". The sudden severe symptoms can make it impossible for patients to get medical help in time, especially if they are living alone. Such a death would be likely to be classified as a "mystery death".
Overall, it is very likely that the vast majority of excess deaths classified as "mystery deaths" were caused by COVID-19. Historically, about 1.25% of all deaths were "mystery deaths", so that numbers above 1.25% indicate "hidden" COVID-19 deaths. So let's have a look at what percentage of deaths were classified in the R00-R99 "mystery death" category for the US states for the four week period ending 5/16/2020:
38 states, as well as New York City and the District of Columbia, show significantly increased "mystery death" rates. Four states (Vermont, South Dakota, Rhode Island, and Connecticut) show 0% mystery death rates; all of these states except Connecticut reported no mystery deaths in 2019, either.
The number of COVID-19 cases varies a lot between states. States with a high number of COVID-19 cases would be expected to have a higher number of undiagnosed "mystery" deaths from COVID-19. The graph below shows the ratio of "mystery" deaths to deaths where COVID-19 was reported on the death certificate:
In this graph, I have also color-coded the bars depending on how the state voted in recent elections, using red for Republican states, blue for Democrat states, and purple for states that have senators from both parties, or voted for presidents from different parties in recent elections. There is a strong, but not uniform, bias: Republican states are more likely to have more "mystery" deaths than COVID-19 deaths, while most Democrat-leaning states have more COVID-19 deaths than mystery deaths. Let's have a closer look at some states.

Hawaii

Hawaii shows the highest percentage of mystery deaths. In the CDC data, there was no COVID-19 death listed for Hawaii; however, the state has reported 17 COVID-19 deaths on its website. Hawaii is unusual in having a very low positive rate in COVID-19 tests: never higher than 3%, and around 0.1-0.3% for the last several weeks, with 55,336 tests performed and 652 confirmed cases. Nevertheless, the relatively high number of "mystery" deaths raises red flags, and deserves an explanation.
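For reference, the cumulative positive rate implied by these totals can be checked with a short calculation. This is a minimal sketch that assumes each confirmed case corresponds to one counted test, which may not match how Hawaii reports its numbers; the function name is illustrative.

```python
def positive_rate(confirmed_cases, tests_performed):
    """Fraction of performed tests that came back positive."""
    if tests_performed <= 0:
        raise ValueError("tests_performed must be positive")
    return confirmed_cases / tests_performed


# Totals quoted above for Hawaii: 652 confirmed cases, 55,336 tests.
hawaii_rate = positive_rate(652, 55_336)
print(f"cumulative positive rate: {hawaii_rate:.2%}")
```

This works out to about 1.2% over the whole period, which sits between the early peak of up to 3% and the recent 0.1-0.3%.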

Tennessee

Tennessee is near the top in both graphs. Here's a detailed look at the reported data from 2020 and 2019:
Relative to 2019, the number of "mystery" deaths more than doubled in April, and showed a 7-fold increase in the second May week. It appears that Tennessee has never reported even half of the COVID-19 deaths as such on the death certificates, and that this fraction has gone down further in recent weeks, while the number of "mystery" deaths has increased.
Tennessee is one of the states that were very eager to "re-open", and removed many of the initial restrictions on May 22, including 50% capacity restrictions on restaurants and retail stores.

Florida

Like Tennessee, Florida has reported that "mystery" deaths more than doubled in March relative to last year, and then kept increasing into May:

The reported numbers for COVID-19 deaths in Florida started declining in May, but the number of "mystery deaths" increased at the same time. Florida's governor has been very hesitant to issue a "stay-at-home" order, and has been very eager to re-open the state. There have been multiple reports about the Florida Department of Health trying to suppress information about COVID-19, including new restrictions about what data medical examiners were allowed to release to the public, the omission of "snowbird" cases from published numbers, and the firing of the scientist responsible for Florida's COVID-19 database because she refused to manually change data. Florida has relatively high COVID-19 test numbers, but test availability varies widely, with testing being less available in poorer neighborhoods - where infection rates tend to be much higher.

Motive, means, and opportunity to "twist" the data to support the rapid re-opening of Florida all were present. The price, according to the death certificates? About 500 deaths per week. Perhaps the "not elsewhere classified" category is indeed appropriate: none of the other categories lists "sacrifice for the economy" as cause of death.