Corona Virus Science: How Fast Will The COVID-19 Pandemic End?

In this post, I will look at ways to predict how fast COVID-19 cases will go down, based on actual data about COVID-19 cases and deaths. I'll explain all the steps so that anyone who is getting bored while staying at home can reproduce the results, or do a similar analysis for the country or region they live in.

Let's start by downloading two data files for the COVID-19 pandemic from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series. We want the two "global" .csv files, one for "confirmed" cases and one for deaths.

Open the downloaded files in your favorite spreadsheet program - I used LibreOffice (tip: click the "Detect special numbers" checkbox when opening the file so the dates are read correctly). Find the lines for Germany, and copy them to a new spreadsheet. I also copied the header line, and then used "Cut" and "Paste special" to transpose the lines, so that I now have all the numbers in three columns. The files contain cumulative totals, so after deleting a few rows and editing the header line, I added two columns where I calculated the new cases and the new deaths for each day (each day's total minus the previous day's total) - here's a screen shot:

To start, I want a graph that shows the new cases from March 1 on. I used the "Hide columns" and "Hide rows" functions after selecting columns B and C, and then rows 2-40. Then I selected the "Date" and "Cases" columns and used the graph wizard to create a line graph:

New "confirmed case" numbers for Germany
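For readers who prefer a script to a spreadsheet, the steps so far can be sketched in Python. This is only a rough equivalent of what I did in LibreOffice: the sample numbers are made up for illustration, the function names are my own, and the only thing taken from the real files is the column layout (four descriptive columns, then one cumulative count per date).

```python
import csv
import io

# Made-up sample in the layout of the "global" time-series files:
# Province/State, Country/Region, Lat, Long, then one cumulative count per date.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20,3/4/20
,Germany,51.0,9.0,130,159,196,262
,Italy,43.0,12.0,1694,2036,2502,3089
"""

def country_series(csv_text, country):
    """Return (dates, cumulative counts) for the first row matching `country`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    for row in rows[1:]:
        if row[1] == country:
            # Columns 0-3 are descriptive; the rest are the daily totals.
            return header[4:], [int(float(v)) for v in row[4:]]
    raise ValueError(f"no row for {country!r}")

def daily_new(cumulative):
    """New cases per day = today's cumulative total minus yesterday's."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

dates, cases = country_series(SAMPLE_CSV, "Germany")
new_cases = daily_new(cases)
for day, n in zip(dates[1:], new_cases):
    print(day, n)
```

To use real data, read the downloaded file's text and pass it in place of SAMPLE_CSV. Countries that are split into several province rows would need their rows added up first; Germany has a single row.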

There are a lot of spikes in the graph, so the first thing we'll do is smooth the data to get a better picture. I simply created a new column where each number is the average over 7 days, and plotted that. I also added the daily number of new deaths, also averaged over 7 days, to the graph:

Smoothed case numbers and deaths for Germany
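The 7-day smoothing is easy to sketch in code as well. I am assuming a trailing window here (each day averaged with the 6 days before it); a centered window would shift the curve by a few days but look much the same. The input numbers are illustrative only.

```python
def moving_average(values, window=7):
    """Trailing moving average; days without a full window get None."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough earlier days yet
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed

daily = [10, 30, 20, 60, 40, 90, 30, 80, 100, 50]  # illustrative values only
smoothed = moving_average(daily)
print(smoothed)
```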

The smoothed data make it much easier to see that the case numbers in Germany peaked about 2 weeks ago, and have been declining since then. The number of deaths is a lot smaller, which makes it hard to compare the curves. To be able to compare the curves better, we can multiply the number of cases by the "Case Fatality Rate", or CFR. If we assume a CFR of 3.7 percent, and multiply the case numbers by that, we get the two curves to be about the same height:

Daily case numbers for Germany multiplied with the assumed CFR of 3.7% to match death rates

Now, we can see that the death curve looks very similar to the cases curve, but is delayed by a number of days. One thing worth noting is that I did not "assume" a CFR of 3.7%, but rather tried a bunch of different values until I got the curves to match well. This should work for any epidemic that has plateaued for a long enough time, but not for an epidemic that is still in the exponential growth phase.
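I matched the curves by eye, but the same trial-and-error can be automated. The sketch below swaps in a least-squares criterion, which is not what I did in the spreadsheet: for every candidate delay it computes the CFR that brings the scaled case curve closest to the death curve, and keeps the delay with the smallest error. The data are synthetic (a bump-shaped case curve, with deaths set to 4% of the cases 10 days earlier), so the function only has to recover values that were put in on purpose.

```python
import math

def best_cfr_for_lag(cases, deaths, lag):
    """Least-squares CFR so that cfr * cases[t] is close to deaths[t + lag]."""
    pairs = [(cases[t], deaths[t + lag]) for t in range(len(cases) - lag)]
    cfr = sum(c * d for c, d in pairs) / sum(c * c for c, _ in pairs)
    error = sum((cfr * c - d) ** 2 for c, d in pairs) / len(pairs)
    return cfr, error

def match_curves(cases, deaths, max_lag=21):
    """Try every delay from 0 to max_lag days and keep the best match."""
    best = None
    for lag in range(max_lag + 1):
        cfr, error = best_cfr_for_lag(cases, deaths, lag)
        if best is None or error < best[2]:
            best = (lag, cfr, error)
    return best

# Synthetic example: deaths are 4% of the cases reported 10 days earlier.
cases = [1000 * math.exp(-((t - 25) / 8) ** 2) for t in range(60)]
deaths = [0.0] * 10 + [0.04 * c for c in cases[:-10]]

lag, cfr, error = match_curves(cases, deaths)
print(lag, round(cfr, 3))
```

As with the by-eye method, this only makes sense once the curves have a clear shape to match, not while the epidemic is still growing exponentially.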

By now, you probably have heard that the initial growth of the epidemic is exponential, and seen many plots that used a logarithmic scale for the y-axis. That's easy to do in LibreOffice, too. I then copied the graph into Preview and added a few lines:

Data from graph 3 using a log scale. Lines indicate region of steady exponential growth.

The gray line shows that there was a period where the case growth was linear when plotted on a logarithmic scale - that means it was exponential. The green lines show the approximate dates for the linear growth in cases: from about 3/6 to 3/18. After 3/18, the growth of new cases slowed down gradually until the end of March; in April, it looks like the curve has started to turn into a straight line again, but this time with a downward slope, since we get fewer and fewer cases.

Note that we can get exactly the same graph if we plot the log of the case numbers with a linear y-axis; the only thing that will change is the numbering of the y-axis:

Next, we'll let the spreadsheet program estimate the slopes of the lines during the exponential growth phase and during the decline phase. To do that, we separate the days we want to analyze together into separate columns: one from 3/6 to 3/18; and one from 4/9 to 4/15. Then, we can have LibreOffice insert "trend lines":

Estimating growth rates for the COVID-19 epidemic in Germany

We can use the slopes of the linear functions to describe the growth and decline rates. A change of 1.0 units on the log10 scale corresponds to a 10-fold increase or decrease in case numbers. In early to mid-March, the number of new cases grew 10-fold in about 9 days (1 / 0.108). In the middle of April, the number of new cases shrank at a rate that would result in 10-fold fewer new cases per day within a bit more than one month (1 / 0.0301, or about 33 days). However, the downward slope of the curve may not yet be stable, so the actual decrease in cases may be faster. One way to get an idea about the uncertainty is to draw different lines through the data, and see how many days are needed to drop the log of the counts by 1.0. When I did that, I got numbers somewhere between 15 days and 40 days. Either way, that's great for Germany - the number of new cases is dropping quickly.
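The slope arithmetic can be checked with a few lines of Python. The fit below is an ordinary least-squares line through the log10 of the daily counts, which is presumably what the spreadsheet's linear trend line does as well. The function names are my own, and the input is synthetic data built to grow with a slope of 0.108 per day, so the fit should simply give that slope back.

```python
import math

def fit_log10_line(daily_counts):
    """Least-squares line through (day index, log10(count)); returns (slope, intercept)."""
    xs = list(range(len(daily_counts)))
    ys = [math.log10(c) for c in daily_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_per_tenfold(slope):
    """Days needed for a 10-fold change: one unit on the log10 axis."""
    return 1.0 / abs(slope)

# Synthetic counts for 13 days (like 3/6 to 3/18), growing with slope 0.108.
growth = [50 * 10 ** (0.108 * day) for day in range(13)]
slope, intercept = fit_log10_line(growth)

print(round(slope, 3), round(days_per_tenfold(slope), 1))  # growth phase
print(round(days_per_tenfold(-0.0301), 1))                 # decline phase
```

Running the fit on different date windows is a quick way to repeat the "draw different lines" uncertainty check.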

There's a bunch more fun to be had with the graphs. For example, you can extrapolate the initial growth line down to 10 cases or 1 case. You'll end up somewhere around February 20-26: the week after Karneval, which has been linked to one of the biggest outbreaks in Germany near Heinsberg. Other cities also had big parties in mid to late February, which apparently were a huge driver of very rapid exponential growth. The timing of the slowdown also matches the implementation of social distancing and other measures in Germany, which were introduced between March 10 and March 22, if a delay of 7-10 days between infection and reporting of results is taken into account.
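The backwards extrapolation is just the trend line solved for the day on which it reaches a given level. Here is a minimal sketch, with one loud caveat: the starting level of 100 new cases per day on March 6 is a made-up placeholder, not a number read from my graph, so the printed dates only illustrate the method. Only the slope of 0.108 comes from the trend line above.

```python
import math
from datetime import date, timedelta

def date_at_level(slope, intercept, start, target):
    """Date on which the line log10(y) = intercept + slope * day reaches `target`.

    `day` counts days from `start`; negative values lie before it.
    """
    day = (math.log10(target) - intercept) / slope
    return start + timedelta(days=round(day))

slope = 0.108                # growth-phase slope from the trend line
start = date(2020, 3, 6)     # first day of the growth-phase window
intercept = math.log10(100)  # HYPOTHETICAL level on the start date

for target in (10, 1):
    print(target, date_at_level(slope, intercept, start, target))
```

Because the slope is fixed, going from 10 cases down to 1 case always moves the date back by the same 9 days or so, whatever the starting level is.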

How about the US?

Let's look at the graphs for the US:

Smoothed (7-day average) daily deaths and case numbers (multiplied with assumed CFR of 6%) for the US

This graph is the equivalent of graph 3 for Germany. A few things to note:

The leveling and drop in cases per day is later and less pronounced than for Germany.
Matching the curves required a CFR of 6%, about 65% higher than for Germany. The most likely cause is less testing in the US.
The delay between case report and death is shorter in the US.

Following the same approach outlined above, we get this graph:

With the currently available data, the downward slope for the US is much shallower than for Germany. If this trend were to continue, the number of cases in the US would drop a lot more slowly than the number of cases in Germany (using relative drops, not absolute numbers). In 30 days, the number of new cases per day would still be about 18,000; in two months, about 11,000. Over the course of the next 4 months, this slow decline would lead to a total of about 2.1 million confirmed cases, and about 130,000 deaths. Note that this assumes that the current level of social distancing measures would remain in effect for the entire period.

However, it must be noted that there is a lot of uncertainty in this prediction. Even slight changes in the slope of the decline line would lead to large changes. The current trend in the case numbers seems to be more downward than the 7-day averages indicate, which would lead to a faster decline. On the other hand, there is a growing movement in many US states to stop the "stay-at-home" measures. This movement will lead to an uptick in new infections if some states lift the restrictions within the next 120 days.

For comparison, let us look at two other scenarios.

The first scenario is "leveling out": the new cases remain constant at the current level (for example because some states relax social distancing regulations). In this case, the predicted number of confirmed cases 4 months from now is about 4 million, leading to about 250,000 deaths.
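The scenario numbers come from extending a straight line on the log10 plot forward and adding up the daily values. The sketch below shows that arithmetic under stated assumptions: the starting level of 30,000 new cases per day is a round placeholder rather than a value from my spreadsheet, and the function name is my own. A slope of 0 gives the leveling-out case; for contrast, the second run uses the decline slope of -0.0301 per day that the trend line gave for Germany. The 6% CFR is the value used to match the US curves.

```python
def project(daily_now, slope, days):
    """Daily new cases for each of the next `days` days on a straight log10 trend."""
    return [daily_now * 10 ** (slope * day) for day in range(1, days + 1)]

DAILY_NOW = 30_000  # HYPOTHETICAL current level of new cases per day
CFR = 0.06          # case fatality rate used to match the US curves

scenarios = [("leveling out", 0.0), ("decline at Germany's rate", -0.0301)]
for label, slope in scenarios:
    path = project(DAILY_NOW, slope, 120)
    new_cases = sum(path)
    print(label)
    print("  new cases per day after 30 days:", round(path[29]))
    print("  new cases per day after 60 days:", round(path[59]))
    print("  new cases over 120 days:", round(new_cases))
    print("  resulting deaths at the assumed CFR:", round(CFR * new_cases))
```

Note that the sketch only counts cases from today on; the totals quoted in the text also include the cases that have already been confirmed.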

In the second scenario, the US would quickly reach daily drops in new cases at the same rate as Germany. This would lead to a drop in daily new cases to less than 4,000 within a month, and less than 500 within two months. The total number of confirmed cases would be around 1 million, and the total number of COVID-19 deaths about 62,000. Reaching the drop in cases in this scenario would require regulations that are as efficient in stopping COVID-19 transmissions as the regulations in Germany are. While the regulations in Germany may appear similar to regulations in the US, there are many important differences. These include:

Test, track, quarantine: Test rates in Germany are significantly higher, and delays to get test results are lower. Tracking of contacts of infected persons is generally done, and anyone with contact has to self-quarantine. Breaking quarantine rules is subject to substantial penalties.
All public meetings are prohibited, and the rule is enforced by police. In contrast, most US states follow the federal guideline to allow meetings of up to 10 persons. Some US states have exceptions for religious and other purposes, and/or higher allowed numbers.
The list of essential businesses that are allowed to continue operating is significantly stricter in Germany than in most, if not all, US states.

Each of these items contributes to reducing COVID-19 transmissions, and thereby reducing the total number of COVID-19 cases and deaths. To keep the total number of COVID-19 deaths below 100,000, these and other measures would have to be implemented in most or all states very soon.
--
This post was inspired by a study that used an "interrupted time-series analysis" to look at the effects of social distancing measures in the US. The biggest differences are that my approach above is much less scientific, but pretty easy to reproduce by anyone who knows how to use spreadsheet programs; and that I specifically allowed for a "transition period" where the interventions are starting to take effect.

Thursday, April 16, 2020
