Sunday, April 19, 2020

Data Trend Modelling COVID-19 Cases and Deaths

In a recent post, I explained how COVID-19 projections can be based on trends observed from actual data.  I have started to expand on that idea by writing a program that uses the most current data on COVID-19 cases to predict expected cases and deaths in the near future. In this post, I'll compare some preliminary results to results obtained with the IHME model that the White House currently favors. I will use Italy as an example since Italy implemented a country-wide lockdown earlier than other western countries, so we have more data to look at.

Here is a graph that compares actual data with the predictions of two different models: the IHME model and my model, which I named the "Data Trend" model:
Solid lines represent actual data; dotted lines show projections. Daily confirmed cases are shown in blue and use the left y-axis. All other colors represent daily deaths and use the right y-axis. The use of two axes allows for an easier comparison of the case and death curves.

The model simulations use data from before 4/4/2020. The red dotted line shows the IHME model results from 4/5/2020, downloaded from the IHME site. The IHME model predicts a very rapid drop in deaths. For April 18, this IHME model run predicted about 100 deaths; the actual number was closer to 600. Clearly, the IHME model was far too optimistic about how fast the death rate would drop.

The dotted blue line shows the predictions of daily new cases by the current Data Trend model. Basically, this model estimates new case numbers by extrapolating the observed trend in new cases during the previous 2-3 weeks. In the model run shown, the model used only data until 4/4/2020. The predicted drop in new cases is much slower than the drop predicted by the IHME model. For the time until April 18, the Data Trend model predictions are much closer to the actual case numbers that Italy has reported.
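To make the idea of trend extrapolation concrete, here is a minimal sketch in Python. It is not necessarily how the Data Trend program works internally: it assumes that daily new cases change at a roughly constant percentage rate over the fitting window, so it fits a straight line to the logarithm of the daily counts and extends that line forward. The function names and the 21-day window in the example are illustrative assumptions.

```python
import math

def fit_log_linear(daily_cases):
    """Least-squares fit of log(cases) = a + b * day; returns (a, b).

    Assumes every daily count is positive (log of zero is undefined).
    """
    n = len(daily_cases)
    xs = list(range(n))
    ys = [math.log(c) for c in daily_cases]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx              # daily change of log(cases)
    a = y_mean - b * x_mean    # fitted log(cases) on day 0 of the window
    return a, b

def extrapolate_cases(daily_cases, days_ahead):
    """Project daily cases for the days following the fitting window."""
    a, b = fit_log_linear(daily_cases)
    n = len(daily_cases)
    return [math.exp(a + b * (n + k)) for k in range(days_ahead)]

# Synthetic example (not real data): cases falling 5% per day for 3 weeks.
recent_cases = [1000 * 0.95 ** day for day in range(21)]
projection = extrapolate_cases(recent_cases, days_ahead=14)
```

With real data, one would pass in the reported daily cases for the last 2-3 weeks. Reported counts are noisy, so the fitted rate depends on the chosen window.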

The black dotted line represents the estimates of daily deaths, which are based on case numbers. The number of COVID-19 deaths is proportional to the number of cases; since death occurs some time after testing, the death curve is delayed by a number of days. In Italy, about every seventh person who tested positive for COVID-19 died (primarily because testing capacity was limited, so testing was restricted to the most severe cases). This is reflected by using a separate y-axis with a roughly 7-fold "zoomed in" scale. Looking at the solid lines, we can see that the death curve followed the case curve closely, with a delay of about 5-6 days.
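The relationship described above can be written as a small function: daily deaths on a given day are the daily cases from a few days earlier, divided by a fixed ratio. The sketch below uses the rough values from the Italy graph (a delay of 6 days and 7 cases per death) as defaults; the function and parameter names are my own illustrative choices.

```python
def predict_daily_deaths(daily_cases, delay_days=6, cases_per_death=7.0):
    """Estimate daily deaths as delayed, scaled daily cases.

    The result is delay_days longer than the input: cases reported on
    day t produce the death estimate for day t + delay_days. The first
    delay_days entries are None because no earlier case data is given.
    """
    predicted = [None] * delay_days
    predicted.extend(c / cases_per_death for c in daily_cases)
    return predicted

# Made-up numbers for illustration: three days of case counts, 2-day delay.
example = predict_daily_deaths([700, 1400, 2100], delay_days=2)
# example is [None, None, 100.0, 200.0, 300.0]
```

Because the output extends past the last day of case data, actual cases alone already give a death estimate for the next few days; projected cases extend it further.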

In the graph above, I used a spreadsheet program to estimate the scaling and offset of the death curve, and to predict daily deaths from actual and estimated daily cases. In the future, I'll add this to my program so that these estimations are done automatically (and with a bit more accuracy).
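One possible way to automate this estimation, shown here only as a sketch under my own assumptions: try every delay up to some maximum, compute the best-fitting scale factor for each delay by least squares, and keep the delay with the smallest error. The function name, the `max_delay` limit, and the use of mean squared error are hypothetical choices, not a description of the spreadsheet method.

```python
def fit_delay_and_scale(daily_cases, daily_deaths, max_delay=14):
    """Find the delay and scale that best map cases onto deaths.

    Both lists must cover the same dates, one entry per day.
    Returns (delay, scale, mse), where deaths[t] ~ scale * cases[t - delay],
    or None if there is too little data to fit anything.
    """
    best = None
    for delay in range(max_delay + 1):
        # Pair each day's deaths with the cases reported `delay` days earlier.
        pairs = [(daily_cases[t - delay], daily_deaths[t])
                 for t in range(delay, len(daily_deaths))]
        if len(pairs) < 2:
            break
        sum_cc = sum(c * c for c, _ in pairs)
        if sum_cc == 0:
            continue
        # Least-squares scale for a line through the origin.
        scale = sum(c * d for c, d in pairs) / sum_cc
        mse = sum((d - scale * c) ** 2 for c, d in pairs) / len(pairs)
        if best is None or mse < best[2]:
            best = (delay, scale, mse)
    return best

# Synthetic check: deaths are cases from 5 days earlier, divided by 8.
cases = [1000 - 2 * (t - 12) ** 2 for t in range(30)]
deaths = [0] * 5 + [cases[t - 5] / 8 for t in range(5, 30)]
delay, scale, mse = fit_delay_and_scale(cases, deaths)
```

One caveat: if cases change at an exactly constant percentage rate, every delay fits equally well with a different scale, so the delay can only be pinned down when the curve bends, for example around the peak.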

To understand why the two different models give such vastly different projections, we have to look at the underlying assumptions the models make. I'll rephrase these assumptions as questions in plain English:
  • IHME model: What will happen if deaths drop as quickly as they did in Wuhan, China, after "social distancing" measures were implemented?
  • Data Trend model: What will happen if cases and deaths keep dropping as quickly as they did in the last 2-3 weeks?
The assumptions of the IHME model are completely unrealistic for the US, Italy, and most western countries, because they ignore the large impact that the Chinese measures had on COVID-19 transmissions. The initial measures taken in China were much stricter than the measures in the US and the initial measures in Italy; China later intensified these measures twice. To mention just two of many important differences:
  •  China tested and quarantined anyone who had contact with a confirmed case; this "caught" many infected individuals even before they had symptoms, as well as asymptomatic individuals. Some estimates are that at least 30% of all transmissions happen before symptom onset, or from asymptomatic individuals; very extensive track, test, and quarantine efforts are essential to stop these transmissions.
  • In the final stages, door-to-door controls were done in China, where every person with a fever or other COVID-19 symptoms was tested and quarantined. 
Without such extreme measures, hoping that the case and death rates will drop as quickly as in China, which is a basic and important assumption in the IHME model, is nothing but wishful thinking.

In contrast, the Data Trend model makes no such assumptions; it only assumes that current trends in case numbers will continue. This means that the model cannot give accurate predictions when large changes are made, for example when new "stay-at-home" orders are issued, or such orders are lifted to "restart the economy". However, the model can still be useful in such cases to illustrate the effect of such changes, which will be reflected in the differences between predictions and actual data. Note that when changes happen, their effect on the case reports will be delayed by at least several days; when testing is done only for severe cases, the delay can be more than 10 days. Similarly, slow adaptation to changes can cause additional delays.

In the Italy example, no changes in "social distancing" policies and other measures were made during the period analyzed, so the model results match the actual data reasonably well. This is also likely to be the case for the US, since most states implemented social distancing measures more than 2-3 weeks ago. I will report on model results for the US in the near future; but there is absolutely no doubt that the predicted number of COVID-19 deaths will be substantially higher than the numbers predicted by the IHME model.
