Corona Virus Science: How Bad Is the COVID-19 Epidemic in the USA?

How many people are really infected with the corona virus in the US? The number of confirmed cases as I write this (3/23/2020, 9 pm EDT) is 43,734, according to worldometers.info. The number has been going up like crazy in the last few days as more testing capacity came online; just 5 days ago, there were only 9,259 cases. But even now, not everyone with COVID-19 is getting tested, for one of several reasons:

1. Not everyone can get a test: tests are still restricted at many places, for example requiring a doctor's note and/or the presence of disease symptoms.
2. Not everyone wants to get a test: some people with disease symptoms prefer not to get tested for various reasons - for example because they assume their case will be mild.
3. Some infections have no symptoms, or very mild symptoms. Symptoms can also sometimes be confused with a cold or a mild case of influenza.
4. Many infections are too new to be tested: the time between infection and the first symptoms is about 5-6 days, and tests are usually only done when symptoms start. A delay of 6 days may not seem like much, but remember that the number of reported infections has grown more than 4-fold in the last 5 days!
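
To put that growth into numbers, here is a minimal Java sketch that turns the two case counts quoted above into a growth factor and a doubling time. The class and method names are made up for illustration, and the doubling time assumes steady exponential growth over the 5 days, which is only a rough approximation.

```java
// Illustrative sketch only; names are hypothetical and not taken from the model code.
final class CaseGrowth {
    /** Factor by which the case count grew between two reports. */
    static double growthFactor(double earlier, double later) {
        return later / earlier;
    }

    /** Doubling time in days, ASSUMING steady exponential growth between the two reports. */
    static double doublingTimeDays(double earlier, double later, double daysBetween) {
        return daysBetween * Math.log(2.0) / Math.log(later / earlier);
    }

    public static void main(String[] args) {
        // Confirmed cases quoted in the text: 9,259 five days ago, 43,734 now.
        double factor = growthFactor(9_259, 43_734);
        double doubling = doublingTimeDays(9_259, 43_734, 5);
        System.out.printf("growth factor: %.2f, doubling time: %.1f days%n", factor, doubling);
    }
}
```

With the numbers from the text, this gives a factor of about 4.7 and a doubling time of a bit over 2 days. Part of that growth reflects more testing rather than more infections.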

We do not know how important each of the factors above is - so how can we know how many people really are infected? What if we "go backwards", starting with the number of deaths? Deaths from COVID-19 are much more likely to be reported than mild cases. From many studies in China and other countries, we have a pretty good idea how long it takes to die from COVID-19: typically about 18 days from the onset of symptoms. If we add a 6-day incubation period, we can assume that people who died today were infected about 24 days ago.

The other number we need to know is how many infected people die from COVID-19. The initially reported numbers of 3.4% or so do not help us, because they were based on the confirmed cases, not total infections. However, a number of different studies have looked at various ways to correct for this and to come up with numbers for the "Infection Fatality Ratio", or IFR. The results vary a bit by study, but around 0.6% seems to be close to a consensus.
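
The "go backwards" idea can be sketched in a few lines of Java. This is a rough illustration, not the model itself: it treats the 24-day delay and the 0.6% IFR as fixed values, and the class and method names are made up. The 500 deaths in the example is the round figure used further down for aligning the model.

```java
import java.time.LocalDate;

// Illustrative sketch only; names are hypothetical and not taken from the model code.
final class BackCalculation {
    static final int INCUBATION_DAYS = 6;          // from the text
    static final int DAYS_SYMPTOMS_TO_DEATH = 18;  // from the text
    static final double IFR = 0.006;               // infection fatality ratio of 0.6%

    /** Total infections needed to produce the given number of deaths at the given IFR. */
    static double infectionsImpliedByDeaths(long cumulativeDeaths, double ifr) {
        return cumulativeDeaths / ifr;
    }

    /** Approximate infection date for people who died on the given date. */
    static LocalDate infectionDateFor(LocalDate deathDate) {
        return deathDate.minusDays(INCUBATION_DAYS + DAYS_SYMPTOMS_TO_DEATH);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2020, 3, 23);
        System.out.printf("About %.0f people were infected by %s%n",
                infectionsImpliedByDeaths(500, IFR), infectionDateFor(today));
    }
}
```

In this simplified view, about 500 deaths by March 23 point to roughly 83,000 infections that had already happened by February 28.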

Well, that's the start. We can plug all this into a computer simulation (more about that below), and then adjust some parameters so that the numbers the computer model predicts match the reported numbers of deaths from COVID-19. I did this for the cases in the US. Here is a figure that compares the predicted and actual total number of deaths:

For the last 14 days, the prediction is pretty good - not perfect, but reasonably close. Let us look at the model and some more results in more detail.

I started the model with 10 infections, and then calculated the development of new infections based on what is known. For example, an infected person is most likely to transmit the infection right when the symptoms start (and possibly even a day before). Each infected patient spreads the infection to about 2 to 5 others, depending on the region and the study you look at. For my model, I used the number 3.7, which together with an "infectivity" distribution centered around days 5 and 6 gave a good agreement with the observed numbers. I then aligned the model curve to the observed data up to today by looking where the model had about 500 total deaths. This happened on day 57, which puts the start of the model run (with 10 infections) on January 26, 2020. Since the first COVID-19 case in the US was reported on January 21, that makes sense - the model seems reasonable.
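
The kind of calculation described above can be sketched as follows. This is not the code from the model file mentioned at the end of this post, just a minimal illustration of the approach: each day's new infections are computed from the infections of the previous days, weighted by how infective people are on each day after infection. The infectivity weights and the population of roughly 330 million are assumptions chosen for illustration; only the 10 starting infections and the 3.7 come from the text.

```java
// Illustrative sketch only; NOT the actual model code. Names and weights are hypothetical.
final class RenewalSketch {
    // ASSUMPTION: made-up infectivity weights by day since infection (index = day),
    // centered on days 5 and 6 and summing to 1.
    static final double[] INFECTIVITY = {0, 0, 0, 0.05, 0.15, 0.30, 0.30, 0.15, 0.05};

    /** Daily new infections; index 0 is the start of the run with the seed infections. */
    static double[] dailyNewInfections(int days, double seed, double r0, double population) {
        double[] newInfections = new double[days];
        newInfections[0] = seed;
        double total = seed;
        for (int t = 1; t < days; t++) {
            // Sum up how much infectivity earlier infections contribute on day t.
            double pressure = 0;
            for (int age = 1; age < INFECTIVITY.length && age <= t; age++) {
                pressure += INFECTIVITY[age] * newInfections[t - age];
            }
            // Only people who are not yet infected can become new infections.
            double susceptibleFraction = Math.max(0, 1 - total / population);
            newInfections[t] = r0 * pressure * susceptibleFraction;
            total += newInfections[t];
        }
        return newInfections;
    }

    public static void main(String[] args) {
        double[] daily = dailyNewInfections(58, 10, 3.7, 330e6);
        double total = 0;
        for (double d : daily) total += d;
        System.out.printf("total infections after day 57: %.0f%n", total);
    }
}
```

Because the weights here are invented, the output will not match the numbers in this post exactly; the point is only to show the structure of such a calculation.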

Now we can look at the number of infections the model predicts, and compare them to the number of reported COVID-19 cases:

For the last two weeks, the number of infections predicted by the model is way higher than the number of confirmed cases. At the beginning of the two-week period, only 0.65% of all infections were confirmed by testing - 1 out of 153 infections. At the end of the period, 2.83% of infections had been confirmed by tests - 1 in 35 infections. In other words, the actual number of infections on 3/23/2020 was likely to be 35 times higher than the reported number of confirmed cases!
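
The conversion between "percent confirmed" and "1 in N" is simple arithmetic; the following small Java sketch (with made-up names) shows it, together with the number of infections implied by today's confirmed cases.

```java
// Illustrative sketch only; names are hypothetical.
final class ConfirmedShare {
    /** Converts "x percent of infections confirmed" into "1 out of N infections". */
    static double oneInN(double confirmedPercent) {
        return 100.0 / confirmedPercent;
    }

    /** Infections implied by a confirmed case count and the share that is confirmed. */
    static double impliedInfections(long confirmedCases, double confirmedPercent) {
        return confirmedCases * 100.0 / confirmedPercent;
    }

    public static void main(String[] args) {
        System.out.printf("0.65%% confirmed = 1 in %.1f%n", oneInN(0.65));
        System.out.printf("2.83%% confirmed = 1 in %.1f%n", oneInN(2.83));
        System.out.printf("implied infections on 3/23/2020: %.0f%n", impliedInfections(43_734, 2.83));
    }
}
```

With 43,734 confirmed cases and 2.83% confirmed, this works out to roughly 1.5 million infections.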

Note that many infections were not confirmed because they were too new (item #4 in the list above). Tests are pretty much limited to people with symptoms who are past the incubation period, which is about 5 days. Even after symptoms develop, it will take some time to decide to get a test, to actually get the test, and to have the test results included in the daily reports. This creates an overall testing delay of about 7 days from the initial infection.

If we take this into account, we can compare today's confirmed cases to the predicted infections 7 days ago. This gives us an "undertesting" factor of 12: just 1 out of 12 infections that are old enough to have developed symptoms was tested.
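
As a sketch of this delayed comparison: the factor is the number of infections the model predicts for 7 days earlier, divided by the confirmed cases on the day in question. The Java below uses made-up names and made-up toy numbers, only to show the indexing.

```java
// Illustrative sketch only; names are hypothetical and the sample data is made up.
final class UndertestingFactor {
    /**
     * Ratio of predicted total infections lagDays earlier to confirmed cases on the given day.
     * Both arrays are indexed by day of the model run.
     */
    static double laggedFactor(double[] predictedInfections, double[] confirmedCases,
                               int day, int lagDays) {
        if (lagDays < 0 || day < lagDays || day >= confirmedCases.length
                || day - lagDays >= predictedInfections.length) {
            throw new IllegalArgumentException("day or lag outside the available data");
        }
        if (confirmedCases[day] <= 0) {
            throw new IllegalArgumentException("no confirmed cases on that day");
        }
        return predictedInfections[day - lagDays] / confirmedCases[day];
    }

    public static void main(String[] args) {
        // Toy data, NOT from the model: predicted infections and confirmed cases per day.
        double[] predicted = {100, 200, 400, 800};
        double[] confirmed = {1, 5, 20, 50};
        // Compare confirmed cases on day 3 with predicted infections 2 days earlier.
        System.out.println(laggedFactor(predicted, confirmed, 3, 2));
    }
}
```

Applied to the numbers in this post, a factor of 12 with 43,734 confirmed cases corresponds to about 525,000 predicted infections 7 days earlier.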

If you look at the last column in the table above, you will notice that the "undertesting factor" was much higher 2 weeks ago, and then dropped from 153 to 35 over the course of these two weeks. This is shown in the following graph:

This reflects that more testing capacity has come "online" during the last two weeks, so that more people who need a test can actually get one. The same thing is reflected in the very rapid rise of case numbers in the last few days.

Once we have a computer model, we can use it to calculate "what-if" scenarios. The numbers above already take into account that social-distancing measures and other "non-pharmaceutical interventions" are in place in many parts of the US, and/or recommended for the entire USA. Specifically, the model assumed a 25% drop in transmissions after March 16, and a 50% drop in transmissions after March 21. These two effects show up as brief drops in the daily new infections:

How would the epidemic proceed after that? How would it go without any intervention (like social distancing), or if the interventions are more effective? And finally - what would happen if the interventions are effective, but are relaxed later? The following graph shows the predictions for those 4 scenarios, looking at the total number of infections over time:

The results for the "no intervention" model (the blue curve) are shocking: if the epidemic proceeds at the rate it appears to be growing, then almost every person in the US would be infected before the end of April! But remember that the onset of symptoms is delayed by about a week, and deaths from COVID-19 are delayed another 3 weeks, so the largest number of deaths would not happen until May.

The red curve shows what would happen if the current measures (which started after 51 days, on March 16) succeed in dropping the transmission rate by 50%. The infections pile up a bit slower and reach a lower maximum; still, more than two thirds of the US population would be infected before the end of May.

The yellow curve shows what happens if the current measures reduce transmissions by 80%. This would mean that each infected person infects, on average, less than one new person, and the epidemic would come to a stop.
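
The arithmetic behind "less than one new person" is short enough to write down. The sketch below assumes that a reduction in transmissions simply scales the 3.7 used in my model; the names are made up for illustration.

```java
// Illustrative sketch only; names are hypothetical.
final class TransmissionReduction {
    /** Average number of people each infected person infects after a given reduction (0..1). */
    static double effectiveR(double r0, double reduction) {
        return r0 * (1.0 - reduction);
    }

    /** Smallest reduction (0..1) that brings the number down to exactly 1. */
    static double criticalReduction(double r0) {
        return 1.0 - 1.0 / r0;
    }

    public static void main(String[] args) {
        double r0 = 3.7; // value used in the post
        for (double reduction : new double[] {0.25, 0.50, 0.80}) {
            System.out.printf("%.0f%% reduction -> %.2f new infections per infected person%n",
                    reduction * 100, effectiveR(r0, reduction));
        }
        System.out.printf("reduction needed to stop growth: %.0f%%%n", criticalReduction(r0) * 100);
    }
}
```

With 3.7 as the starting point, a 50% reduction still leaves 1.85 new infections per infected person, while 80% leaves 0.74. Anything above roughly 73% would push the number below 1.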

The green curve also starts with an 80% reduction, but following a drop of new infections, it assumes that restrictions are relaxed at day 100 of the simulation (about 50 days after they were put in place), and the transmission rate goes up to 50% of the initial rate (the rate remains lower because some measures persist). At this point, the epidemic would rebound, and closely follow the red curve, but with a delay of about 50 days.
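
One way to express such a scenario in code is a transmission multiplier that depends on the day of the simulation. The Java sketch below is an assumption about how this could be written, not the actual model code; in particular, it assumes that the measures apply from day 51 onward.

```java
// Illustrative sketch only; NOT the actual model code. Names are hypothetical.
final class GreenScenarioSchedule {
    static final double R0 = 3.7;             // value used in the post
    static final int MEASURES_START_DAY = 51; // ASSUMPTION: measures apply from day 51 onward
    static final int RELAXATION_DAY = 100;    // restrictions relaxed at day 100

    /** Fraction of the initial transmission rate that remains on the given day. */
    static double transmissionMultiplier(int day) {
        if (day < MEASURES_START_DAY) return 1.0; // no measures yet
        if (day < RELAXATION_DAY) return 0.2;     // 80% reduction
        return 0.5;                               // relaxed: 50% of the initial rate
    }

    public static void main(String[] args) {
        for (int day : new int[] {50, 51, 99, 100}) {
            System.out.printf("day %d: multiplier %.1f, new infections per infected person %.2f%n",
                    day, transmissionMultiplier(day), R0 * transmissionMultiplier(day));
        }
    }
}
```

A simulation loop would multiply each day's transmissions by this value. From day 100 on, the multiplier is the same 0.5 as in the red scenario, which is why the green curve ends up following the red one with a delay.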

Let's have a look at the number of new infections per day:

Again, the results for the blue "no intervention" model are shocking: the maximum number of new daily infections would be more than 24 million before the end of April! After that, the number drops rapidly, simply because the majority of the US residents would be infected by then.

If the current measures lead to a 50% reduction in transmissions, the maximum number of daily new infections would be reduced to 7.5 million, and delayed towards the beginning of May. Here are the numbers for the four scenarios:

The only number that looks manageable is the one for an 80% reduction in transmissions, with 1.5 million total infections, a per-day maximum close to 100,000 infections, and about 9,000 total deaths. In all scenarios, the calculated number of deaths is based on an infection fatality ratio of 0.6%; however, this number is likely too low if hospital or ICU capacities are exceeded, which would definitely happen in the other three scenarios.

So, will the current measures be successful in reducing the transmission rate by 80% or more? We do not know. Due to the delayed onset of symptoms, a hesitant and variable adoption of the policies, and insufficient testing capacity that distorts the numbers, it will be several weeks before we even get an idea. As the green curve in the models above shows, lifting restrictions too early is dangerous, since the epidemic will flare up again quickly.

A number of studies have shown that reducing transmission rates by the required amounts is not easily accomplished. I will leave the details to other posts, but let's have a quick look at an example or two to illustrate this.

Let's assume that 80% of the population change their behavior so that they successfully avoid infection completely, for example by staying at home, while the remaining 20% mostly ignore the social distancing and "shelter at home" directives. Would that not be enough to reduce transmissions by 80%? Unfortunately, it would not. Shortly after the "good 80%" started staying home, they would indeed see a reduction in infections, likely with some delay from in-household infections. This would reduce the infections briefly. However, the remaining 20% would mostly be in the company of similar-minded individuals, and experience little change in social interactions. Therefore, the epidemic would continue to grow largely unchanged in this subset of the population, and continue to do so until the majority of the "bad 20%" is infected. If the "good" and "bad" groups remained perfectly separated, then the final effect would indeed be a reduction of numbers by 80%. However, some mixing between the groups is likely, and definitely occurs when the restrictions are eventually lifted or relaxed; then, the final outcome would be the same, with merely a delay of a few weeks until peak numbers are reached.
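
The "perfectly separated" case can be checked with a rough calculation. The sketch below does not use my simulation; it swaps in the standard final-size relation from simple epidemic models, z = 1 - exp(-R0 * z), where z is the fraction of a freely mixing group that is eventually infected. It assumes that the non-compliant 20% mix only among themselves with the unchanged value of 3.7, and that nobody in the other group gets infected. All names are made up for illustration.

```java
// Illustrative sketch only; uses the textbook final-size relation, not the post's model.
final class SeparatedGroups {
    /** Fraction of a freely mixing group that is eventually infected, by fixed-point iteration. */
    static double finalAttackRate(double r0) {
        double z = 0.5; // starting guess
        for (int i = 0; i < 1_000; i++) {
            z = 1.0 - Math.exp(-r0 * z);
        }
        return z;
    }

    /** Share of the whole population infected if only one isolated group takes part in the epidemic. */
    static double overallShareInfected(double groupShare, double r0) {
        return groupShare * finalAttackRate(r0);
    }

    public static void main(String[] args) {
        System.out.printf("infected within the non-compliant group: %.0f%%%n",
                finalAttackRate(3.7) * 100);
        System.out.printf("infected in the whole population: %.1f%%%n",
                overallShareInfected(0.20, 3.7) * 100);
    }
}
```

Under these assumptions, nearly everyone in the non-compliant group ends up infected, which comes to a bit under 20% of the whole population. That is the 80% reduction mentioned above, and it only holds as long as the two groups never mix.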

In another Gedankenexperiment, let us assume that shopping is a large contributor to transmissions. An infected person, who may not (yet) have symptoms but already is infective, will leave large numbers of virus particles when he touches handles, carts, or merchandise while shopping. The virus can remain stable for days on many surfaces, and likely even longer on cooled surfaces. Anyone who touches the same surface within the next day or two will pick up virus particles. If the second person then touches his mouth, nose, or eyes before the next time he washes hands, he will likely be infected. Since humans tend to touch their face every couple of minutes, and this habit is very hard to change, things are pretty much over once a virus-contaminated surface was touched.
Now what effect does it have if we reduce our shopping by 80%? Very little. With the "bad" group getting infected at increasing rates and mostly ignoring warnings, many surfaces will still be virus-contaminated. The next time a "good" person goes shopping, he is just as likely to be infected, should he forget to avoid touching his face until after the next hand washing or disinfection.

The two examples above illustrate that multiple measures are needed to effectively reduce transmissions to a tolerable level. Countries that have been successful in containing the virus all have strict rules about wearing face masks in public that are adhered to by the vast majority of the population; disinfectant stations nearly everywhere; and multiple other measures like frequent testing and intensive contact tracing. In our shopping example above, face masks would help in several ways. Since they cover mouth and nose, they keep people from "automatically" touching there. They also reduce the amount of virus an infected person "places" when coughing or just breathing, and reduce the likelihood of infection from "breath sharing" if you happen to be close to another person. Similar arguments can be made for disinfection stations; these seem to be getting less common in the US now because the supply of gel-based disinfectants cannot keep up with the vastly increased demand.

Well, this is a long post, so I thank you for your patience in reading it. I developed my model in Java, since that was the easiest for me. I kept it very simple and straightforward on purpose - it's just one file that you can modify to run your own simulations. It is available for downloading at github.com/prichterich/covid19modeling.

Tuesday, March 24, 2020
