Tuesday, March 24, 2020

How Deadly is the Corona Virus?

How dangerous is the new corona virus really? There's a lot of conflicting information out there. In this post, I will analyze some recent scientific studies that came to different results. I'll try to explain what the studies did, and why some of them are much more likely to be accurate than others.

Let's start with a diagram from one of the studies that illustrates some of the problems:
What we want to know is: if you get infected with the virus,

  • how likely is it that you will die from the infection?
  • how likely is it that you will require hospital care?
We have a pretty good idea how many people have died from the virus (5,798 people as of 3/14/2020). All the other numbers are harder to get - not all hospitalizations get reported; not everyone who becomes sick is counted; and an unknown number of infections cause only very mild symptoms or none at all.

Severe cases: three weeks in the hospital

"Severe" cases of COVID-19, as the disease caused by the new corona virus is called, require hospital care. Typically, that includes oxygen, and often intensive care and mechanical breathing support (ventilators). The typical time in the hospital for COVID-19 is 22 days. Lengthy rehabilitation periods may be required afterwards, especially when a ventilator was needed.

How likely it is that an infection turns into severe disease strongly depends on age. Here is a table from one analysis from a large team at leading universities in the UK:

  • In the 20-29 year old group, about 1 in 90 infected persons required hospitalization.
  • In the 40-49 year old group, about 1 in 23 infected persons required hospitalization. 
  • In the 70-79 year old group, about 1 in 6 infected persons required hospitalization. 
Another table from the same publication looks at death rates by age:

(Note that the percentage numbers are calculated with different adjustments in the table shown; check the original publication for details.)

This table also shows a dramatic increase in death rates by age. Younger patients have a better chance to recover from an infection even if they require hospital care; older patients are much more likely to die. 

One very important thing to consider is hospital capacity. Hospitals have only a limited number of intensive care beds and ventilators (in the range of 100,000 to 200,000 for the US). In an area with a large number of cases, oxygen, ICU beds, and ventilators are not available for many patients. This dramatically increases the death rates, and has happened in several areas worldwide, including Wuhan (before an intensive government response that included sending 40,000 health care professionals to the city and building new temporary hospitals) and Italy.  Therefore, a very important objective for any infected country or region is to slow down the rate of new infections, so that hospitals are not overwhelmed.

Hidden infections, CFR, and IFR

While we can easily get good numbers for the top of the "severity pyramid" in the picture above, getting accurate numbers for the lower parts is more challenging. Some people get infected with the virus, but do not show any symptoms, or only have very mild symptoms that are similar to a regular cold. The commonly mentioned "fatality rates" simply divide the number of deaths by the number of confirmed cases. This gives numbers around 2-4% for most countries (2.2% for the US, and 6.8% for Italy, as of 3/14/2020). This number is called the "crude case fatality ratio" (CFR). This number is easy to calculate, and gives some important information; calling it "False" is bloody nonsense.

However, what we really want is a number for the chance of dying if we get infected - the ratio of deaths to infections. This number is called the "infection fatality ratio", or IFR for short. To get from the crude CFR to the IFR, we need to know a few more things:
  • How many deaths are underreported?
  • What are the time delays between infection, confirmation of the infection, and death?
  • How many infections are not counted?
The under-reporting of death can have several reasons. A death caused by the corona virus may look very much like a death by pneumonia or other causes. Some countries, like Italy, test for the corona virus even after death; others, like Germany, do not. In countries where the government controls the press or the health agency reporting, reported numbers may be lower than actual deaths for political reasons. 

The time delay between infection, confirmation of the infection, and death is very important in the case of COVID-19. This is because the number of infections is growing very rapidly - in many countries, it doubles every 4-7 days, but even 10-fold increases within a week or 10 days have been seen. Typically, it takes about 5 days after the infection for the first symptoms to appear, and a few more days for symptoms to get severe. Let's look at some numbers, making some assumptions that are reasonable (but not necessarily accurate):
  1. The number of infections doubles every 5 days.
  2. The delay between infection and confirmation (for anyone getting tested) is 5 days.
  3. Death occurs 15 days after infection.
  4. The death rate is 1 death per 50 infections (2%).
  5. We start with 1,000 infections on day 0.
Now let's look at the time line and the numbers we get:
  1. After 5 days, we have 2,000 infections.
  2. We have 1,000 confirmed infections on day 5 (due to the 5-day delay).
  3. By day 15, we have had two more doublings, so we now have 8,000 infections. But due to the 5-day testing delay, we know of only about 4,000.
  4. On day 15, 20 of our 1,000 initially infected people are dead.
The calculated death rate on day 15 in this example is 20 / 4,000 = 0.5% - this is 4-fold lower than the actual death rate! 
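The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch of the toy example, using the five assumptions listed above as default values; the function name and parameter names are made up for illustration, and it assumes that everyone who is infected eventually gets confirmed.

```python
def naive_death_rate(day, initial=1000, doubling_days=5,
                     confirm_delay=5, death_delay=15, true_rate=0.02):
    """Deaths so far divided by confirmed infections so far on a given day."""

    def infections(t):
        # Cumulative infections at time t; nobody is infected before day 0.
        return initial * 2 ** (t / doubling_days) if t >= 0 else 0.0

    # Infections are only known after the confirmation delay.
    confirmed = infections(day - confirm_delay)
    # Deaths today come from the people infected death_delay days ago.
    deaths = true_rate * infections(day - death_delay)
    if confirmed == 0:
        raise ValueError("no confirmed infections yet on this day")
    return deaths / confirmed


print(f"day 15: {naive_death_rate(15):.2%}")  # 20 deaths / 4,000 confirmed
print(f"day 30: {naive_death_rate(30):.2%}")
```

In this simplified model, the calculated rate does not catch up with the actual 2% as time goes on: as long as infections keep doubling every 5 days, confirmed cases stay two doublings ahead of deaths.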

I gave the numbers above only for illustrative purposes. In reality, this gets more complex, since not everyone gets infected the same day, and there's a lot of variation in the other numbers, too - some people may die 10 days after infection, others may die after 30 days. To get reasonably accurate numbers, some serious statistical modeling is required. But before we look into that, let's look at one more thing first: what happens to the numbers if many infections remain hidden, for example because symptoms are very mild so that the person never gets tested for the corona virus?

Let us assume that 9 out of 10 cases remain undetected, so only 10% of all infections get diagnosed. Now we get:
  1. After 5 days, we have 2,000 infections, but 1,800 remain undiagnosed.
  2. We have 100 confirmed infections on day 5; this includes most of the
  3. By day 15, we have had two more doublings, so we now have 8,000 infections. But due to the 5-day testing delay and only 10% being diagnosed, we know of only about 400.
  4. On day 15, 20 of our 1,000 initially infected people are dead.
The calculated death rate on day 15 in this example is 20 / 400 = 5% - 2.5 times higher than the actual death rate!
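To see how the share of detected infections changes the result, here is a small Python sketch of the same toy example. The function and parameter names are invented for illustration; it assumes that every death is counted and that a fixed fraction of infections gets diagnosed after the 5-day delay.

```python
def naive_rate_with_hidden_cases(detected_fraction, day=15, initial=1000,
                                 doubling_days=5, confirm_delay=5,
                                 death_delay=15, true_rate=0.02):
    """Calculated death rate when only a fraction of infections is diagnosed."""

    def infections(t):
        # Cumulative infections at time t; nobody is infected before day 0.
        return initial * 2 ** (t / doubling_days) if t >= 0 else 0.0

    # Only detected_fraction of the delayed infections shows up as confirmed.
    confirmed = detected_fraction * infections(day - confirm_delay)
    # Assumption: all deaths are counted, even among undiagnosed infections.
    deaths = true_rate * infections(day - death_delay)
    return deaths / confirmed


for fraction in (1.0, 0.25, 0.10):
    rate = naive_rate_with_hidden_cases(fraction)
    print(f"{fraction:.0%} detected -> calculated death rate {rate:.1%}")
```

With these particular numbers, the two errors happen to cancel when 25% of infections are detected; with less testing the calculated rate is too high, and with more testing it is too low.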

The two examples above show that it is very important to have accurate numbers for our calculations, and to take the timing of infections into account. In the next section, we'll examine a few studies that tried to do that, but came to dramatically different results.

So, what is the "correct" fatality rate?

The answer to this question depends a lot on the amount of testing that is being done, which varies widely from country to country. Without any testing, there will be no confirmed corona virus infections! 
Let's start with two studies that both looked at China, but came to very different results: one study calculated an infection fatality rate of 0.657%; the second gave numbers between 0.04% and 0.12%. Unfortunately, the second study is deeply flawed, but likely to nevertheless be picked up as "proof" that the corona virus is comparable to influenza. Not true - the best current estimate is that the corona virus is about five times more deadly!

The first study, from which the tables and figure above are taken, was performed by 33 scientists, mathematicians, and statisticians at colleges in London and Oxford. It contains detailed tables and links to the data used, as well as an in-depth explanation of the statistical methods used. To get an estimate of the underreporting of infections, it used data from "repatriated expatriates returning to their home countries" on 6 flights (689 persons tested, 6 confirmed infections). Several key aspects of this study were:
  • A detailed look at the age distribution for infections and death rates.
  • Separation between cases in Wuhan and the rest of China.
  • Age-specific estimates for underreporting of infections for the two regions.
Basically, the reported number of infections was too low, compared to the observed infection rates in foreigners living in Wuhan who were flown back to their home countries. This group is used because it is the only clearly defined group where every single person has been tested. In some countries, the "rescued" repatriates were quarantined for 2 weeks, and/or allowed to go home only if two successive tests came back negative, and no other signs of infection were present. In this group, the infection rate was about 0.8% to 1.5% (depending on which countries and flights are included). It should be noted that some flights had more than one infected person, and that some of those infected tested negative first, but positive later; therefore, it cannot be excluded that some of the expatriates infected others after leaving China. For comparison, the reported infection rate in Wuhan in the middle of February was near 0.2% (about 20,000 confirmed infections out of a population of 11 million). The exact rate varies a bit with the exact date; however, reports that up to 5 million people left Wuhan before the Chinese government imposed travel restrictions indicate that the population number may also have been lower. As an order-of-magnitude estimate, the number of actual infections was probably about 5 times higher than the number of reported infections. The study concludes that the death-to-infection ratio is about 0.66%, about 5-fold lower than the raw CFR, but also about 5-fold higher than for influenza.
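The order-of-magnitude estimate can be checked with the round numbers quoted above. This is only a rough sketch, not the study's actual method (which uses age-specific estimates and proper statistical modeling); the variable names are mine.

```python
# Numbers quoted in the text above.
evacuees_tested = 689
evacuees_positive = 6
wuhan_confirmed = 20_000        # approximate, mid-February
wuhan_population = 11_000_000   # may have been lower, see text

# Infection rate in the one group where everybody was tested.
evacuee_rate = evacuees_positive / evacuees_tested
# Infection rate according to the officially confirmed cases.
reported_rate = wuhan_confirmed / wuhan_population
# How many actual infections there were per reported infection,
# assuming the evacuees are representative of Wuhan as a whole.
underreporting_factor = evacuee_rate / reported_rate

print(f"evacuee infection rate:  {evacuee_rate:.2%}")
print(f"reported infection rate: {reported_rate:.2%}")
print(f"underreporting factor:   {underreporting_factor:.1f}")
```

The factor comes out a little below 5. It shifts if a smaller population number or a different set of flights is used, which is why it is only an order-of-magnitude estimate.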

While the first study was well written, the same cannot be said for the second study, which was authored by a graduate student or post-doc and his two advisors in Japan and at Georgia State University. Even though this study is a lot shorter, it took me several times longer to understand what was going on. At first, I got quite excited about the study, since the result was stated clearly and sounded great:
"We also found that most recent crude infection.. adjusted IFR is estimated to be 0.12% ..., which is several orders of magnitude smaller than the crude CFR estimated at 4.19%."
But the authors also claim that this is based on "epidemiological data of Japanese evacuees from Wuhan City" - that's where the first red flags went up. A closer inspection leads to this sentence:
Other parameter estimates for the probability of occurrence and reporting rate are 0.97 (95% CrI: 0.84–1.00) and 0.010 (95% CrI: 0.007–0.014), respectively.
Note that the authors talk about a "parameter estimate" value of 0.010 for the "reporting rate". This means that just 1% of all infections actually would be reported. Simply based on this value, they then conclude that the actual number of infected people in Wuhan was 100-fold higher than reported: 1.9 million.

Let's have a closer look at what they did. Basically, they ran computer simulations where they tried to match the calculated results to the observed results. As the technique for the simulations, they used "a Monte Carlo Markov Chain (MCMC) method" - which is really just a fancy term for letting the computer roll the dice. The model requires a number of parameters, which they had to estimate, and then claimed to have "95% credibility" because they ran the models 100,000 times, and evaluated the results using "potential scale reduction statistic" in a "Bayesian framework".  They further claim:
"We collected information on the timing of the evacuee fights that left Wuhan City as well as the number of passengers that tested positive for COVID-19 in order to calibrate our model"
Well, this all sounds very scientific and confusing, right? But let us have a quick look at the last claim (and assume they meant "flights", not "fights" as written). The table (which is not included in the preprint, but instead requires a separate download) shows 12 infections among 763 evacuees, giving an infection rate of 1.57%. The underlying assumption is that the inhabitants of Wuhan had the same infection rate; with 10 million inhabitants, that would be 157,000 infections. With a reporting rate of 0.010, only 1572 infections should have been reported. But the article itself states:
"As of February 11th, 2020, a total of 19559 confirmed cases including 820 deaths were reported in Wuhan City."
There is a 12-fold discrepancy! Another way of looking at this is to start with a different claim the authors made:
"Our results indicate that the total number of infections (i.e. cumulative infections) is 1905526"
That's 1.9 million! If this rate is correct, then about 19% of the evacuees should have been infected: about 145 cases would have been expected, not the observed 12! Some correction would need to be applied for the flights that left in January, but the last flight alone, which left February 7 with 198 passengers, should have had more than 35 infected passengers - not just one!
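Both consistency checks are simple enough to redo in a few lines of Python. The sketch below only uses numbers quoted in this post; the variable names are mine, and it assumes (like the calculation above) a Wuhan population of 10 million and the same infection rate for evacuees and inhabitants.

```python
# Numbers quoted from the preprint and its evacuee table.
evacuees = 763
evacuees_positive = 12
last_flight_passengers = 198
wuhan_population = 10_000_000    # round number used in this post
reporting_rate = 0.010           # the preprint's "parameter estimate"
reported_cases = 19_559          # confirmed cases as of February 11
claimed_infections = 1_905_526   # the preprint's cumulative infections

# Check 1: start from the evacuee data and predict the reported cases.
evacuee_rate = evacuees_positive / evacuees
implied_infections = evacuee_rate * wuhan_population
expected_reported = implied_infections * reporting_rate
discrepancy = reported_cases / expected_reported

# Check 2: start from the claimed infections and predict infected evacuees.
claimed_rate = claimed_infections / wuhan_population
expected_positive = claimed_rate * evacuees
expected_last_flight = claimed_rate * last_flight_passengers

print(f"expected reported cases:       {expected_reported:.0f}")
print(f"discrepancy to reported cases: {discrepancy:.1f}-fold")
print(f"expected infected evacuees:    {expected_positive:.0f}")
print(f"expected on the last flight:   {expected_last_flight:.0f}")
```

Whichever direction the calculation is run, the preprint's reporting rate and its evacuee data disagree by roughly a factor of 12.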

This shows a fatal flaw in this "research". The authors only achieve the very low death rates because they use a "reporting" rate that is an order of magnitude too low. Their claim that the rate was "calibrated" using evacuee data is obviously wrong. If this single error is corrected, then the actual death rate is somewhere between 0.5% and 1.5% - in line with what the other study concluded.

The second study is an example of very bad science. Please note that this is not a regular research paper that has undergone the usual "peer review" process, but a preprint. It is very unlikely that this paper would have been accepted by any reputable journal, since most referees would have spotted the problems I outlined (and likely others) very quickly, and therefore rejected the paper.

As the paper is written, with limited explanations, multiple typos, and very questionable science, I cannot shake the suspicion that it was "goal oriented pseudo-research": someone set out to prove that the novel corona virus has a death rate comparable to influenza, and succeeded. I fear that this will be picked up by news outlets, which is a real shame. This is not science - this is somebody playing with a computer who either has no clue, or no morals!

The bottom line

Yes, the reported "raw CFR" rates of 2% to 4% do not include all infections; a significant number of infections are missing from these numbers. But the best estimates of the death-to-infection rate, taking all infections into account, indicate that the overall death rate is at least about 5-fold higher than for influenza. And that is for the general population, and with a limited case load that does not overload hospitals. For people with pre-existing health conditions and the elderly, or if the hospitals are overloaded because the infections spread too quickly, the chance of dying from COVID-19 can exceed 5%.

So please, do your part in slowing down the spread of the virus. Avoid any large-scale meetings and situations where you are close to strangers (note that the official limits of 500 or 1000 people are just arbitrary numbers - infections spread in smaller groups, too!). Limit social interactions. Work from home if you can, and do not go to work if you feel ill in any way! The disease is most infectious right at the onset of mild symptoms! Check the WHO website and other official government websites for additional information (and don't believe everything someone posts on Facebook).

This post was originally posted on 3/10/2020 at boardsurfr.blogspot.com/2020/03/how-deadly-is-corona-virus.html
