Tuesday, March 31, 2020

Subgroup Effects in Social Distancing

The US relies primarily on "social distancing" measures to fight the COVID-19 epidemic. To some extent, this is based on computer models that predict that such measures will be successful. However, experience in multiple countries has shown that partial and voluntary social distancing measures are not very effective. Many countries in Europe have tightened their measures multiple times, with increasingly strict rules up to "stay home" orders and closures of non-essential businesses for entire regions.

One of the major limitations of computer models for epidemics is that they are often based on a simple infection model that does not seem to apply to COVID-19. Briefly, the models assume that the chance of getting infected is directly proportional to the number of close contacts an uninfected person has. Models generally link this number of contacts directly to the reproduction number R. The goal is to drop R below 1, which means each infected person will, on average, infect fewer than one other person. Once this state is reached, the epidemic will eventually stop.

In many studies, it is assumed that reducing contacts will directly reduce R. In COVID-19 studies, the initial reproduction number (called R0, R zero) is often assumed to be close to 2. The (faulty!) logic is that if contacts can be reduced by a bit more than 50 percent, then R should drop below 1, and the epidemic should be contained. For example, if contacts are reduced by 75% because many people stay home, and those who cannot stay home keep their distance from others, the assumption is that this should be sufficient.

However, there are multiple reasons why this approach is overly optimistic, at least for COVID-19. In this post, I will focus on just one of these issues: the effect of sub-groups.

The source of the problem is that different people have very different perceptions of risk from COVID-19. Since many regulations are either voluntary or not enforced, those who perceive the risk as low often do not change their behavior more than absolutely necessary. Such behavior is often concentrated in groups that tend to interact socially, as this recent picture from Bondi Beach in Australia illustrates:
It is reasonable to assume that in such subgroups, the number of social contacts is virtually unchanged; therefore, the risk of infection for this subgroup is not adequately reflected in simplistic computer models.

Another related problem is infections that occur at work. While some companies can let their employees work from home, or have temporarily halted operations, many people continue to go to work every day. For them, the risk of infection from work interactions remains; for some of them, like grocery store workers, medical professionals, and first responders, the risk can actually be substantially increased.

I have modified my simple computer model to take such sub-grouping into account. I could give it fancy names like "Group-specific Markov-chain iterative implementation of SIRS models", but it's really pretty simple. I just let my group of infected people grow, generally using parameters that are reasonably well known, and tuned them so the predictions matched the reported COVID-19 deaths in the USA. On the day that the federal "15 day guidelines" to stop COVID-19 were announced, the model split the population into 2 or 4 groups with different risk characteristics. Let's have a look at some results:
Predictions of daily death for different uniform and mixed group scenarios
Let's start with the blue line. It shows the predicted number of deaths if no interventions had taken place, and infections had continued to grow as they did until March 16. The maximum number of deaths is about 140,000 per day. This does not take the higher mortality from ICU overloading into account, but is nevertheless higher than what many other models predict. But do not focus on the absolute numbers; the issue at hand here is the relative effect of interventions and subgroups.
The red line shows the effect of a 20% reduction in contacts. The effect is small, limited to a slight delay and a small reduction in maximum deaths per day. A 50% reduction (yellow line) in transmissions per infection leads to a more pronounced delay and a visible flattening of the curve. At an 80% reduction (green curve), the maximum number of deaths per day drops below 400, a dramatic reduction. My model uses an R0 around 3.7, so the 80% reduction reduces the reproduction number to less than one. This also was assumed to happen right after the announcement on 3/16, when the number of confirmed cases was about 30-fold lower than today. All these curves are very similar to what other models predict; the main difference is that models which use a lower R0 will show a dramatic drop at a lower percentage reduction.
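The basic arithmetic behind these curves can be sketched in a few lines of Python. This is a minimal sketch, not the model used for the graph: it assumes homogeneous mixing and a uniform cut in transmissions, so that the effective reproduction number is simply R0 times the fraction of transmissions that remain. It also shows why the threshold depends so strongly on R0: with R0 = 2, anything above a 50% reduction pushes R below 1, while R0 = 3.7 requires about 73%.

```python
# Minimal sketch, assuming homogeneous mixing and a uniform reduction
# in transmissions per infection (not the group model behind the graph).
R0 = 3.7  # value used in the model described above


def effective_r(r0: float, reduction: float) -> float:
    """Reproduction number after cutting transmissions by the fraction `reduction`."""
    return r0 * (1.0 - reduction)


def threshold_reduction(r0: float) -> float:
    """Uniform reduction at which R drops to exactly 1."""
    return 1.0 - 1.0 / r0


for reduction in (0.2, 0.5, 0.8):
    print(f"{reduction:.0%} reduction -> R = {effective_r(R0, reduction):.2f}")

for r0 in (2.0, 3.7):
    print(f"R0 = {r0}: need more than {threshold_reduction(r0):.0%} reduction")
```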

The violet curve was generated using a model that assumed two groups of even size after the announcement: one group reduced contacts a lot, enough to have 80% fewer transmissions in the group; but the other group pretty much ignored the guideline, and transmissions in this group went down 20%. The result for the two groups together is that the curve was mostly scaled down 2-fold, which reflects a rapid spread of the infection in the "ignorer" group.
This separation is obviously simplified, so let's look at the light blue curve. Here, we separated the population into 4 subgroups:
  • "Party animals": 10% of the population does not believe that COVID-19 is dangerous, and decides to party on, with no reduction of transmissions in this group. This group also includes first responders and grocery store workers who have no or limited access to protective equipment.
  • "Non-believers": 20% of the population is not very concerned, but measures still affect them and drop transmission rates by 40%
  • "Workers": 20% of the population that has to keep going to work but experiences a 60% drop in transmissions
  • "The Fearful": 50% of the population who mostly stay at home, and have a 10-fold reduced transmission risk.
This "almost realistic" mix of groups sees a maximum death rate of about 20,000 per day, seven-fold lower than without interventions. While this is a notable drop, it still results in more weekly new infections than there are ICU beds in the US, and ICU overloading is at least possible.
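To see why the four-group mix behaves so differently from a uniform reduction, it helps to compute a reproduction number for each group separately. The sketch below makes a strong simplifying assumption that is not part of the model above: transmission happens only within each group, with no mixing between groups. Under that assumption, three of the four groups still have R above 1, even though the "Fearful" half of the population is well below it; a naive population-weighted average (about 1.3) hides how fast the epidemic keeps spreading among the "Party animals".

```python
# Sketch of per-group reproduction numbers for the four-group scenario.
# Hypothetical simplification: no transmission between groups.
R0 = 3.7

# (name, share of population, fraction of transmissions that remain)
GROUPS = [
    ("Party animals", 0.10, 1.0),  # no reduction
    ("Non-believers", 0.20, 0.6),  # 40% drop
    ("Workers", 0.20, 0.4),        # 60% drop
    ("The Fearful", 0.50, 0.1),    # 10-fold reduction
]


def group_r(r0, groups):
    """Reproduction number within each group."""
    return {name: r0 * remaining for name, _, remaining in groups}


def weighted_mean_r(r0, groups):
    """Naive population-weighted average of the group reproduction numbers."""
    return sum(share * r0 * remaining for _, share, remaining in groups)


per_group = group_r(R0, GROUPS)
for name, r in per_group.items():
    status = "grows" if r > 1 else "shrinks"
    print(f"{name:14s} R = {r:.2f} ({status})")
print(f"Population-weighted mean R = {weighted_mean_r(R0, GROUPS):.2f}")
```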

Let us have a look at the total cumulative number of deaths predicted for each scenario:
Without interventions, the total number of deaths is above 2 million, very similar to what the study by the ICRF group in London predicted. A 20 percent reduction in transmission rates changes little, and even a 50% reduction still leads to about 1.6 million deaths. Our "almost realistic" mix of four groups (the light blue line) shows a slow and steady rise of fatalities from COVID-19, even though 90% of the population experiences reduced transmission rates, with half of the population seeing a 10-fold reduction in transmissions, which may be hard to reach in the real world. Note that the model assumes that all containment measures remain in place for the entire duration of the model run; relaxing them would cause a secondary flare-up.

The results above clearly illustrate that "social distancing" measures will only have the desired effect when they are observed by virtually everyone. If substantial groups of "non-believers" continue to have social contacts within similar-thinking subgroups, the effects of such measures are much reduced. Similarly, the effectiveness of social distancing measures can be dramatically reduced if large subgroups of the population continue to work in group settings, or if groups like grocery store workers, medical professionals, and first responders do not have adequate access to protective gear and other protective measures.

These computer model results clearly show some major limitations of voluntary social distancing. COVID-19 containment strategies that primarily rely on voluntary social distancing measures are likely to fail, as was observed in Italy and Spain. Once widespread community transmission and high case numbers are seen, as is the case for the US and many European countries, border closures and checks will have only very limited effect.

To contain the COVID-19 epidemic until vaccines become available, multiple complementary and mandatory measures are needed - social distancing is not enough. All countries and regions that have had success in containing the epidemic, including Taiwan, Singapore, Hong Kong, and South Korea, applied a large number of complementary containment measures. Trying to rely on a minimal and voluntary set of measures is not optimistic - it is deadly.

Sunday, March 29, 2020

Can Face Masks Prevent Corona Virus Infections?

Liz showing her self-made facemask

The use of facemasks by the general population is generally discouraged in the US and Europe, with officials often stating that "they do not help" and claiming scientific evidence for this. In stark contrast, the widespread use of facemasks has been part of the measures in all Asian countries and regions that have contained the infection, including Taiwan, Hong Kong, South Korea, and Japan. This post analyses some of the relevant science and health agency recommendations.

In the US, both the director of the CDC and the Surgeon General advised against using facemasks. The "How To Protect Yourself" page on the CDC website states that "if you are not sick, you do not need to wear a facemask":
From https://www.cdc.gov/coronavirus/2019-ncov/prevent-getting-sick/prevention.html,
Captured 3/29/2020. Highlighting added for this post.
However, the highlighted region explains why this advice is given: facemasks are in short supply. To a large extent, this was caused by offshoring production - 95% of surgical masks and 70% of N95 respirators are made overseas.

Prima facie evidence indicates that facemasks work to prevent infections: health care personnel generally have to wear facemasks around infectious patients, and the (cheaper) surgical masks derive their name from the fact that they are worn in surgeries, primarily to prevent infection of the patient. But let's look at the science.

There are some studies that show that facemasks used by the general population can reduce infections, and other studies that do not see any positive effect. It is important to look at the studies carefully, and see what exactly they measured, and how they measured it. I'll talk about just one study as an example, which looked at whether wearing facemasks would protect other household members when a child had a respiratory illness. The study found no effect, but it also found that only 21% of the participants used the masks often or always. If you don't use the mask, it does not work! They also went on to look specifically at the subgroup of people who did wear the mask most or all of the time, and did find a protective effect.

Since surgical masks and respirators are in short supply in the US and Europe, the very limited supplies should absolutely be left to where they are needed most: hospitals and doctors' offices. But what about self-made masks? Do they work? Some doctors and officials will simply state "they don't work", but what they actually mean is "they do not work nearly as well as properly fitted and used surgical masks and respirators". But it turns out that even the CDC recommends using self-made masks if nothing else is available:
Advice from the CDC website to doctors and hospitals
when no masks are available
That is for good reasons: multiple studies have shown that self-made masks work to some extent - not as well as "proper" masks, but better than nothing. Here are data from a study that looked at different materials:
For an easier-to-read article, check this web page.

Many of these studies look at how well masks filter out small particles like bacteria or viruses when breathing in. But that's just one way masks work. For the COVID-19 epidemic, the more important aspect is that masks keep infected people from spreading the virus. Instead of explaining, just look at this image, and consider that each drop can contain many millions of virus particles:
If you stay 6 feet away from this sneezing or coughing person, you will not get directly hit by most of the droplets, since they fall down. But they then land on the ground, or perhaps the grocery shelf in front of him! There, the millions of viruses sit and wait patiently for someone else to touch the shelf and pick them up. The SARS-CoV-2 virus that causes COVID-19 can be patient - it can "live" for many hours on various surfaces.

Now imagine the picture above with a mask in front of the face. It is really not hard to imagine the difference! For the droplets that are easily visible on a photograph, it is really not that important that the mask captures the tiniest particles.

Let's look at one more aspect: what happens after you touch one of the droplets that the maskless man above deposited on the supermarket shelves. Now, the virus is on your finger. No big deal, since it cannot infect you through your skin, you think? Think again! Here's a picture from a study of how often we touch our face:
That's 3 touches to the nose, 4 to the mouth, and 3 to the eyes every hour on average. Touch any of these areas, and the virus gets transferred right into your body where it starts multiplying! We humans touch these areas once every 6 minutes, on average. We've done it all our life, never thinking about it, so that's a habit that is very hard to change.

If we are wearing a facemask, we cannot touch the nose or mouth directly. Sure, we could move the mask to do so, but that is not an old habit, and we can probably stop ourselves in time. That should reduce the touches to the eyes only, about one touch every 20 minutes. Our chances of making it out of the supermarket without touching a "transfer area" just increased a lot! Disinfectant at the checkouts helps further, and disinfectant at various places throughout the store is even better!
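The benefit of fewer face touches can be put into rough numbers. The sketch below assumes, purely for illustration, that face touches happen at random and independently of each other (a Poisson process) at the rates quoted above, and that a shopping trip lasts 30 minutes; both assumptions are hypothetical. Under these assumptions, the chance of getting through the trip without a single touch to nose, mouth, or eyes is below 1% without a mask, but above 20% when only the eyes can be touched.

```python
import math

# Hypothetical model: face touches are a Poisson process (random, independent events).
TOUCHES_PER_HOUR_NO_MASK = 3 + 4 + 3  # nose + mouth + eyes, rates from the study above
TOUCHES_PER_HOUR_MASK = 3             # eyes only, assuming the mask blocks nose and mouth
TRIP_MINUTES = 30                     # assumed length of a shopping trip


def p_no_touch(rate_per_hour: float, minutes: float) -> float:
    """Probability of zero touches in `minutes` for a Poisson process with the given rate."""
    return math.exp(-rate_per_hour * minutes / 60.0)


print(f"No mask: {p_no_touch(TOUCHES_PER_HOUR_NO_MASK, TRIP_MINUTES):.1%}")
print(f"Mask:    {p_no_touch(TOUCHES_PER_HOUR_MASK, TRIP_MINUTES):.1%}")
```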

The US surgeon general said "we don't have data for that". Well, now, it's obvious, is it not? He did, however, point out one thing: you should not adjust your facemask constantly, something people who do not use masks a lot may do. But every nurse and doctor has learned that, so others can, too.

So, while we are waiting for the governments to do something that will drastically increase mask production (something Taiwan started doing in January!), we'll have to make our own, or perhaps use a dust mask we have around for sanding or painting. When you use the mask, be careful where you touch it: after a while, there may be a lot of virus-laden droplets caught on the outside, so don't touch the outside!

There are plenty of instructions for various self-made masks on YouTube and other sites. Here's one example that looks simple enough:

So, if you really have to leave your house, do what all people in the countries that have stopped the COVID-19 epidemic early do: wear a facemask to protect yourself and others!

 Added April 3, 2020:

Since I wrote this post, things are changing in Europe and the US. Austria and other European countries have made mask use mandatory in public. Germany has not, but there is a strong movement under the #maskeauf tag to wear masks. In the US, the CDC is expected to suggest soon that everyone wear masks in public. Here's a great video in German about wearing masks:

The only thing I have to add is that studies have shown that the virus is also distributed by normal breathing, not just by coughing and sneezing. But masks reduce how much virus "escapes" in normal breathing, too (and how far it gets away). So make your mask and wear it!

No, It Does NOT Die At 26/27 degrees

At least once a day, I get asked about posts containing the statement that "the corona virus dies at 26/27 degrees Celsius". Sometimes it's part of a private letter from a professor (which is probably where it originated), but it now has also been made into a fancy video.

This is not true. It is based on an extrapolation from other corona viruses that cause the common cold, and "live" primarily in the nose. Nasal tissues are much colder than the rest of the body during the winter. The new corona virus (SARS-CoV-2) infects the throat and the lungs, where it multiplies without any problems at 38 C (100 F).

There is also a study that looked at how long the corona virus survives at different temperatures; here's a snapshot of some results:
This means the virus is
  • very stable at refrigerator temperature
  • stable for days at room temperature
  • stable for 6 hours at body temperature
  • stable for 5 minutes at 56 C (133 F)
Stable here means that a drop containing the virus would be as infectious as at the start. Even small drops contain way more virus than is needed for an infection.
At room temperature, the virus starts to "die" off noticeably within a couple of days. I usually put my mail in the entry room (in a draft-free area) and leave it there for 2-3 days before looking at it. Don't forget to wash hands right away after getting mail, and again after opening it! 
After writing this post, I found an article from 2011 that looked at another coronavirus: SARS-CoV, the virus that caused the SARS epidemic in 2002-2004. This article looked at how stable SARS-CoV is in solution and in dried form. It found that SARS-CoV was quite stable even in dried form at 33 C (91 F), but not at 38 C (100 F) if the humidity levels were above 95%.
For some other false myths about the coronavirus and weather or temperature, check the WHO website.

Friday, March 27, 2020

Hidden Cases: Tests Capture Only 1 of 41 Infections

A team of researchers from Stanford University and UC Berkeley has published an interesting study. Since testing for the corona virus is well known to underestimate the number of infections, they looked at hospitalizations to get a better idea of how many cases there really are. The study describes in detail how they did it, and is well written and easy to understand, but I'll give a brief summary here.
He said: "No, Houston, it is small - we've got nothing to worry about", but could not understand why they did not believe him.

The basic idea is that COVID-19 patients whose symptoms are so severe that they need to be treated in a hospital are much more likely to be tested than someone with mild symptoms. If we then take into account what percentage of patients end up in the hospital, we can estimate the number of infections. To do so, we also need to take into account that several days elapse between the infection, the onset of symptoms, and the admission to the hospital. A lot of studies have tried to determine the exact numbers for each of these variables, so it is possible to have good estimates. These can then be fed into a computer model that calculates how many infections correspond to the observed number of hospitalizations. The authors used numbers from Santa Clara County, a region in California that was hit relatively hard by the COVID-19 epidemic.

With the most likely set of parameters, the study estimated a total of 6,500 infections for the almost 2 million people in the county on March 17. With a very optimistic set of parameters, the number is reduced to 1,400 cases; with a more pessimistic set, the number increases to 26,000.

A look at the COVID-19 tracking site shows that for all of California, there were 483 confirmed cases on 17 Mar 2020. The internet archive site shows that 155 of these cases were in Santa Clara county. So only 155 out of 6,500 cases were reflected in the "official" numbers. In other words, the tests captured just 1 out of every 41 infections.
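The same division can be done for the optimistic and pessimistic parameter sets from the study. This short sketch uses only the numbers quoted above: depending on the parameters, the hidden-case factor ranges from about 9 to about 168, with the most likely estimate at about 41 to 42.

```python
# Hidden-case factor = estimated infections / confirmed cases
# (Santa Clara County, March 17; numbers as quoted above).
CONFIRMED = 155
ESTIMATES = {"optimistic": 1_400, "most likely": 6_500, "pessimistic": 26_000}

for scenario, infections in ESTIMATES.items():
    factor = infections / CONFIRMED   # infections per confirmed case
    share = CONFIRMED / infections    # fraction of infections captured by tests
    print(f"{scenario:12s} factor = {factor:5.1f}  (tests capture {share:.1%})")
```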

This "hidden case" factor of 41 is similar to the factor of 35 reported before for a similar study that looked at reported COVID-19 deaths in the US. There is a 6-day time difference between those two numbers, during which the more widespread availability of tests reduced the "hidden case" factor for the US; on March 17, the factor from the death-based study was 92. A likely source of the discrepancy is that Santa Clara county had more test capacity available than is typical for the US; the county has reported the ability to do 100 tests per day. Even with this test capacity, which looks reasonable when compared to reported case numbers, the county lab limited testing to hospital patients and members of high-risk groups.

Where does the large "hidden case" factor come from?

Multiple issues contribute to the problem, but we can estimate at least one of them: the "infection-to-test delay factor". If every person were tested every single day, that factor would not exist (or, mathematically, would be 1.0). But this cannot be done anywhere in the world, nor would it make sense. Instead, the minimum wait is until first symptoms appear. This is about 5 days for COVID-19. With a doubling time of 5 days, the size of the epidemic has doubled during that time. In other words, for an epidemic with a doubling time and time to onset of symptoms of 5 days:
  • If every single infected person would be tested on the first day of symptoms, the "infection-to-test delay factor" would be 2.0 
Clearly, this is still too optimistic. Initial symptoms are generally mild, and get worse as the disease progresses. Furthermore, it may take a day or two before the test is actually performed, and another day or two before the result is included in the officially reported numbers. Together, this makes a delay of about 10 days more likely: 5 days incubation period, 2 days for symptoms to increase, 2 days to get the test, and one day for reporting. During these 10 days, the number of infected people doubles twice, so we get:
  • A more realistic estimate of the "infection-to-test delay factor" is 4.0 
 Note that this factor actually will be higher if the epidemic grows more rapidly. But let's ignore that, and ask:
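The delay factor follows directly from exponential growth: during a delay of d days, an epidemic with doubling time T grows by a factor of 2^(d/T). The sketch below assumes steady exponential growth; the 3-day doubling time in the last line is a hypothetical example of faster growth, which raises the factor to about 10.

```python
def delay_factor(delay_days: float, doubling_days: float) -> float:
    """Growth of the epidemic during the delay between infection and reporting,
    assuming steady exponential growth with the given doubling time."""
    return 2.0 ** (delay_days / doubling_days)


print(delay_factor(5, 5))              # tested on the first day of symptoms: 2.0
print(delay_factor(10, 5))             # 5 + 2 + 2 + 1 days: 4.0
print(round(delay_factor(10, 3), 1))   # hypothetical faster growth, 3-day doubling
```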

What else contributes to the "hidden case" factor?

Like in many other countries, access to tests in the US has been limited. Just having symptoms alone was not a sufficient reason to qualify for a test; instead, factors like exposure to a confirmed COVID-19 case, travel history, or being a medical professional were required - often together with symptoms. Without other factors, COVID-19 testing was generally limited to patients with severe symptoms. A typical estimate is that 80% of cases have only mild symptoms (or no symptoms at all). This adds a "light symptoms factor" of 5.

There are other reasons why an infected person may not get tested, or not be included in the official case numbers. Some people prefer not to get tested, even with moderate symptoms; some may also have a cold or the flu that obscures the COVID-19 infection; some may not be able to get to a testing site, or convince their doctor to give them the necessary note; testing may fail and return a false-negative result; and so on. We'll group all those together into one factor, which we will call the "other reasons factor". Let's assume this factor is smaller than the others, and give it a value of 2.

We can view the inverses of the factors as probabilities. The chance that any given infection is "old enough" is 1 in 4, or 25%. The chance that an infection will be severe enough to warrant testing is 1 in 5, or 20%. The chance that none of the "other reasons" prevents a report is 1 in 2, or 50%. Assuming these are independent probabilities, we can just multiply them to see what the chances are that an infected person will be tested and included in the official numbers:
  • p(report) = 0.25 * 0.2 * 0.5 = 0.025 = 1/40
That means only one in 40 infected persons will get tested - just what the simulations predicted! In other words, we can explain quite well why the "official" case numbers understate the true number of infections.
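The multiplication can be checked with exact fractions. This sketch simply encodes the three factors estimated above and, like the text, assumes they are independent.

```python
from fractions import Fraction

# The inverse of each factor is the probability that this obstacle does NOT prevent a report.
FACTORS = {
    "infection-to-test delay": 4,
    "light symptoms": 5,
    "other reasons": 2,
}

p_report = Fraction(1)
for name, factor in FACTORS.items():
    p_report *= Fraction(1, factor)  # multiplying assumes the factors are independent

print(p_report, float(p_report))            # 1/40 0.025
print("hidden-case factor:", 1 / p_report)  # 40
```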

Bottom line: A large "hidden case factor" can easily be explained

Yes, the official numbers are indeed likely to understate the number of infections by a factor of 40. This can not only be explained, but is actually what we would expect from what we know about the testing and the disease:
  • An "infection-to-test delay factor" of 4 is caused by the incubation time, and delays between onset of symptoms and reporting.
  • A "light symptoms factor" of 5 is the result of limiting testing to cases with more severe symptoms.
  • Multiple other factors together are less important, and can be summarized in an "other reasons factor" of 2.0.
In other words, the things we know for sure cause a 20-fold under-reporting; other factors that we know about, but cannot easily quantify, only cause a 2-fold under-reporting.

To improve this situation, testing guidelines must be relaxed. For example, the "light symptoms factor" of 5 could easily be reduced by testing everyone with light symptoms; the testing guidelines in several countries now allow for all suspected cases to be included.

Even if tests are made much more available, though, the "infection-to-test delay factor" remains as long as the epidemic is in an exponential growth phase. The only way to get a true number of current infections would be to test an entire population (or a subset) regardless of symptoms. One step that at least goes in this direction, but is easier to implement, is contact tracing with complete testing of all identified contacts.

Understanding the "hidden case factor" is very important for dealing with the pandemic. Humans have trouble grasping how quickly a rapidly growing epidemic spreads; the "hidden case factor" only compounds this problem. People tend to take a reported number at face value, and conclude the danger is minimal if the number is small. But let's look at what 100 confirmed cases really mean for an epidemic that doubles every 5 days if no effective interventions are taken:
  • Now:
    • 100 confirmed cases
    • 4,000 actual infections 
  • 15 days later (three doublings):
    • 800 confirmed cases
    • 32,000 actual infections (of which 160 will die if the IFR is 0.5%)
  • 2 months later:
    • 16,000,000 infections in 2 months without interventions
    • 80,000 deaths within 3 months even if no additional infections happen after 2 months
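The numbers in this list follow from a doubling time of 5 days, a hidden-case factor of 40, and an infection fatality rate (IFR) of 0.5%: the 8-fold growth corresponds to three doublings (15 days), and two months are twelve doublings. This sketch reproduces them under those assumptions; the two-month values come out at about 16.4 million infections and 82,000 deaths, which the list rounds down. Note that it projects eventual deaths for the infections present on a given day, not deaths that have already occurred.

```python
DOUBLING_DAYS = 5   # assumed doubling time without effective interventions
HIDDEN_FACTOR = 40  # infections per confirmed case, as estimated above
IFR = 0.005         # infection fatality rate of 0.5%, as assumed above


def project(confirmed_now: int, days: int):
    """Return (confirmed cases, actual infections, eventual deaths) after `days`."""
    growth = 2 ** (days / DOUBLING_DAYS)
    infections = confirmed_now * HIDDEN_FACTOR * growth
    return confirmed_now * growth, infections, infections * IFR


for days in (0, 15, 60):
    confirmed, infections, deaths = project(100, days)
    print(f"day {days:2d}: {confirmed:>9,.0f} confirmed, {infections:>12,.0f} infections, "
          f"{deaths:>8,.0f} eventual deaths")
```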
This is a hypothetical model. In reality, the epidemic has spread faster in the US and many other countries. The number of reported deaths in the US up to March 26 (1,295) is about the same as the number of reported cases on March 11 (1,301). In other words, the number of reported cases matched the total number of deaths just 15 days later.

Did you like the iceberg picture at the top of the post? An iceberg is actually a pretty bad analogy here. For icebergs, about 10% is above the water, much more than the 2.5% of COVID-19 infections that are reported. And icebergs barely move, and do not grow rapidly, let alone exponentially.

Thursday, March 26, 2020

Death Rates and Testing Rates - Why Fewer Germans Die of COVID-19

Yesterday, a couple of friends approached me with the same question: why does it look like COVID-19 is less deadly in Germany than in many other countries? So I had a closer look at the numbers at Worldometers. This graph illustrates the issue:
Deaths relative to reported COVID-19 cases on 3/25/2020. Note that the y-axis is logarithmic.
The graph shows the "raw case fatality rate" (CFR) for nine countries. The raw CFR is calculated by simply dividing the number of reported deaths by the number of reported cases; higher numbers indicate a more deadly epidemic. The nine countries in the graph fall into three categories:
  • ~ 0.5% CFR (Australia, Germany, Austria) in green
  • ~ 1.5% CFR (Switzerland, US, Denmark) in blue
  • ~ 7-10% CFR (Spain, Iran, Italy) in red
It is well understood why the third group has a much higher rate. In all three countries, the large number of cases has overloaded the hospital system, and only a small fraction of patients that need a ventilator can get one; most of the other patients who cannot get a needed ventilator die, thereby increasing the death rate dramatically. But what is the reason for the differences between the first two groups?

Better Healthcare?

Theoretically, it could be that better health care systems manage to keep more patients alive. But looking at the countries in the first two groups, this can be excluded: Switzerland and Denmark both have universal health care that is at least as good as health care in Germany, Austria, and Australia.

More testing?

Another potential reason for the differences between the first two groups is that some countries perform more tests than others. Let's look at how many tests some of these countries have done; since the countries are vastly different in size, the numbers are normalized to tests per one million inhabitants:
This chart shows that the three countries with the lower CFR rates (Australia, Germany, and Austria) have done substantially more testing than two of the countries with the intermediate CFR rates (Switzerland and United States). This is a strong indicator that the observed higher CFR rates are caused by lower levels of testing. But let's look at some of the differences in more detail.

Limited testing in the USA and Switzerland

The US in particular had problems getting sufficient test capacity online; the COVID tracking project shows that fewer than 1,000 tests were performed in the US until March 4. Testing capacity has increased in the last 10 days, which resulted in a dramatic rise in the confirmed case numbers. The US now has the highest number of active COVID-19 cases (72,702 on 3/26/2020), and the highest number of new cases per day. If the trend from the last 3 days continues, the US will have the highest number of total confirmed cases by tomorrow, passing both China and Italy.
Switzerland has chosen a very restrictive testing policy that excluded anyone with mild COVID-19 symptoms, unless they had additional risk factors like pre-existing medical conditions. The restrictions even excluded people with symptoms who had been traveling to high-risk regions like northern Italy.

Extensive testing in Germany, Austria, and Australia

Germany has done 120,000 tests per week in the last month. The testing guidelines require testing for anyone with symptoms who has been in contact with a confirmed COVID-19 case, works in health care, or has medical risk factors, but make testing for all other people with disease symptoms optional, depending on available capacity.
Austria reported that 3,138 tests had been completed by March 4. That's about four times more tests than the US, but Austria's population is 37 times smaller than the US population. Austria has announced an ambitious program to vastly increase testing capabilities.
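Because the two countries differ so much in size, the raw ratio understates the difference. This rough sketch combines the numbers above; it treats the US count of "fewer than 1,000" tests as an upper bound and uses the approximate population ratio of 37, so the result is a lower bound: by March 4, Austria had done more than 100 times as many tests per inhabitant as the US.

```python
# Rough per-capita comparison of testing by March 4, using the numbers above.
AUSTRIA_TESTS = 3_138
US_TESTS_UPPER_BOUND = 1_000  # "fewer than 1,000" tests, so this is an upper bound
POPULATION_RATIO = 37         # US population / Austrian population (approximate)

raw_ratio = AUSTRIA_TESTS / US_TESTS_UPPER_BOUND  # lower bound on the raw ratio
per_capita_ratio = raw_ratio * POPULATION_RATIO   # lower bound on the per-capita ratio
print(f"Raw ratio: at least {raw_ratio:.1f}x")
print(f"Per-capita ratio: at least {per_capita_ratio:.0f}x")
```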
Australia has performed a very high number of tests, both relative to the number of citizens and relative to the number of confirmed cases. The Australian testing guidelines are similar to those of other countries in that they require testing anyone with symptoms who has been in close contact with another COVID-19 infection; but while Germany limits this to confirmed cases, Australia also includes probable cases, even if not yet confirmed. Like Germany, Australia allows tests of symptomatic patients without additional factors if capacity is available.

The special case: Denmark

Denmark differs from the other five countries in groups 1 and 2 in that it has a raw CFR near 1.5%, like the US and Switzerland, but has done a similar number of tests (relative to population size) as Germany, Austria, and Australia. The reason for this is that Denmark has been able to slow down the exponential growth of the epidemic, as this graph of new confirmed cases indicates:
Daily new cases in Denmark show effectiveness of local restrictions
Denmark managed to stop the daily increase in new infections, reducing the number of new cases from 252 on March 11 to 132 two weeks later. For comparison, look at the daily new cases in the Netherlands:
On March 11, the Netherlands had 121 new cases. In the next two weeks, the number grew about seven-fold, to 852 cases.
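These two-week changes can be converted into doubling and halving times, assuming that daily new cases changed at a steady exponential rate over the period (real daily counts are noisier than that). This sketch uses only the four numbers above: the Dutch numbers correspond to a doubling time of about 5 days, while the Danish numbers correspond to a halving time of about 15 days.

```python
import math


def doubling_time(start: float, end: float, days: float) -> float:
    """Doubling time (positive) or halving time (negative) implied by a steady
    exponential change from `start` to `end` over `days` days."""
    return days * math.log(2) / math.log(end / start)


growth_nl = 852 / 121  # Netherlands, March 11 to March 25
print(f"Netherlands: {growth_nl:.1f}-fold, doubling every "
      f"{doubling_time(121, 852, 14):.1f} days")
print(f"Denmark: halving every {-doubling_time(252, 132, 14):.1f} days")
```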
One effect of controlling the COVID-19 epidemic is that the raw case fatality rate (CFR) increases. This is due to the time lag between the diagnosis (the positive test) and the death. This can be a bit difficult to understand, so we'll look at it in the next section.

Time lag effect on fatality rates

To illustrate the time lag effect on estimated fatality rates, let us imagine a small epidemic that starts with 2 cases. Every week, the number of infections doubles, so that we have 4 cases in week 2, 8 cases in week 3, 16 cases in week 4, and 32 cases in week 5.
We'll further assume that every second person dies from the disease, but that it is a slow death that happens two weeks after diagnosis. So nobody will die in weeks 1 and 2; 1 person will die in week 3; and so on. For each week after the first death, we calculate the raw case fatality rate CFR by dividing the total number of deaths by the total number of cases. Here's a little table that shows the numbers for the first few weeks:
CFR calculation example 1:
Doubling every week, 50% fatality, 2 weeks between diagnosis and death
We can see right away that the calculated raw CFR is much lower than the actual fatality rate of 50%: the raw CFR is just 12.5%! This is typical for an epidemic in an exponential growth phase, as long as there is a significant time between diagnosis and death. The lower raw CFR reflects the growth of the epidemic between diagnosis and death. In our example, the number of cases grows 4-fold in the 2 weeks between diagnosis and death, so the raw CFR is 4-fold too low.
Another way of looking at this is to say that most of the infections are too young to die at any given point. In week 4, for example, we have a total of 16 cases, but 8 of them are new, and 4 of them are from the previous week. Only 4 cases are at least 2 weeks old, and only this group has had deaths by week 4: 50% of these 4 people.
We can use this knowledge to calculate a time-corrected case fatality rate (we'll call it tCFR). One way to do this is to look back at the earlier total case numbers, and use these for the CFR calculation. So to calculate the tCFR in week 4, we divide the total number of deaths in week 4 by the total number of cases 2 weeks earlier: 2 divided by 4 gives the correct fatality rate of 50%.

So what happens when an epidemic starts to slow down? Let's look at the extreme case, where we somehow stop all future infections after week 6:
CFR calculation example 2:
New infections drop to zero in week 7
We see that the raw CFR picks up in week 7, and increases to the expected 50% in week 8. At this point in time, all cases have had enough time to die. If an epidemic is just slowed down rather than stopped completely, the effect will be somewhere between examples 1 and 2 above: the raw CFR will be closer to the correct CFR, but still somewhat lower. Over time, as the number of new cases becomes smaller and smaller relative to the total cases, it will get closer and closer to the real CFR.
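Both toy examples can be reproduced in a few lines of Python. This is a minimal sketch, not code from this blog: `cfr_table` and its parameters are hypothetical names, and the function simply encodes the assumptions stated above (2 cases in week 1, total cases doubling every week, 50% fatality, every death exactly 2 weeks after diagnosis). Passing `stop_after=6` gives example 2.

```python
def cfr_table(weeks, death_fraction=0.5, lag_weeks=2, stop_after=None):
    """Toy epidemic: 2 cases in week 1, total cases double every week (no new
    cases after week `stop_after`, if given), and `death_fraction` of cases
    die exactly `lag_weeks` after diagnosis.

    Returns rows of (week, total cases, total deaths, raw CFR, time-corrected CFR).
    """
    totals = []
    cases = 2
    for week in range(1, weeks + 1):
        if week > 1 and (stop_after is None or week <= stop_after):
            cases *= 2
        totals.append(cases)

    rows = []
    for i, total in enumerate(totals):
        # Only cases diagnosed at least lag_weeks ago can have died by now.
        old_cases = totals[i - lag_weeks] if i >= lag_weeks else 0
        deaths = death_fraction * old_cases
        raw_cfr = deaths / total
        tcfr = deaths / old_cases if old_cases else None  # undefined before first possible death
        rows.append((i + 1, total, deaths, raw_cfr, tcfr))
    return rows


if __name__ == "__main__":
    for row in cfr_table(8, stop_after=6):  # example 2; drop stop_after for example 1
        print(row)
```

The time-corrected column divides deaths by the total cases `lag_weeks` earlier, as described for example 1. It stays at 50% in both examples, while the raw CFR only catches up once new cases stop.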

Now let us get back to COVID-19, and calculate some time-corrected CFRs. Studies have shown that the average time between onset of symptoms and death is about 19 days for COVID-19. If we assume that it takes 4 days between onset of symptoms and diagnosis, we can look at the confirmed case numbers 19-4 = 15 days in the past to get a time-corrected CFR. Here are the numbers that we get for six countries when we divide reported deaths on 3/25/2020 by the number of reported cases on 3/10/2020:
Raw CFR for 3/25/2020, and time corrected CFR using reported cases from 3/10/2020
The numbers for the time-corrected CFR are very large, between 9% and 215%. This reflects the very rapid growth of reported cases in the selected countries between 3/10 and 3/25, which ranged from 6.6-fold for Denmark to 68-fold for the US. As discussed, part of the observed increase is due to increased testing.

What does it mean that the time-corrected CFRs for the US and Spain are above 100%? Well, it means that more people have died than had been diagnosed about two weeks earlier. The obvious cause is that the number of confirmed cases on 3/10 was substantially lower than the number of actual infections. This is actually true for all countries in the list: we know that the infection fatality ratio (IFR) is somewhere in the range of 0.5% to 1%, at least as long as the health care system is not overloaded.

If we divide the time-corrected CFR by the IFR, what we get is the ratio of actual infections to confirmed cases. If we assume an IFR of 1%, then the reported number of cases in Australia reflects only 1 out of 9 actual infections. That sounds incredibly high at first glance, but we must remember that this includes all cases that are "too young" to be diagnosed. Around March 10, the apparent doubling time in Australia was about 3 days. If we allow 6 days between infection and first symptoms, and another three days until the COVID-19 test was done and reported, we have 9 days. This is 3 doubling times, and therefore an 8-fold increase in infections! In other words, the number of actual infections in Australia on 3/10 was 9-fold higher than the reported case number primarily because most infections were pre-symptomatic ("too new").
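The same reasoning can be written as a small calculation for the Australian example: a time-corrected CFR of about 9% (what "1 out of 9" implies at a 1% IFR) is split into a growth factor for infections that were too new to be tested, and a remainder. This is a back-of-the-envelope sketch; the function name is hypothetical, and the delay and doubling time are only the rough values from the paragraph above.

```python
def underreporting(tcfr, ifr, delay_days, doubling_days):
    """Split infections-per-confirmed-case into a 'too new' part and the rest."""
    total_factor = tcfr / ifr                            # actual infections per confirmed case
    too_new_factor = 2 ** (delay_days / doubling_days)   # growth while cases were too new to test
    other_factor = total_factor / too_new_factor         # missed for all other reasons
    return total_factor, too_new_factor, other_factor


if __name__ == "__main__":
    # Australia, rough values from the text: tCFR ~9%, IFR 1%,
    # 6 + 3 = 9 days from infection to reported test, 3-day doubling time.
    print(underreporting(0.09, 0.01, 9, 3))
```

With only 6 days between infection and the reported result (2 doublings), the "too new" factor drops to 4 and the remaining factor rises to 2.25, meaning more than half of the infections old enough to be tested would have gone undetected.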

This kind of analysis really just gives us "ballpark" estimates, since we do not have exact data about the delay between symptom onset, testing, and reporting. If all tests were done and reported within a day of symptom onset, then we'd look at just 2 doubling times, and more than half of the infections that were old enough to be tested would have escaped detection. Other uncertainties also remain, for example the time between diagnosis and death, and the "true" IFR, which may be different from the numbers seen in other countries. Indeed, the most credible IFR calculations have produced numbers closer to 0.5-0.6%.

But it appears likely that Australia did indeed test about a quarter to half of all infections, and possibly more. For Germany and Denmark, the number is slightly lower. Since Denmark slowed the spread of the epidemic two weeks ago, its raw CFR of 2% can also be used to estimate test coverage; if we assume a true IFR of 0.5%, it indicates that about 25% of all infections were tested (0.5% divided by 2%).

For the other countries, the corrected CFR numbers are much higher, indicating a larger number of infections that are not reflected in the official case numbers. For Switzerland, the overall factor is about 60, which includes both pre-symptomatic infections and cases that do not meet the stringent test requirements. For the United States and Spain, the actual number of infections on 3/10/2020 was probably more than 100-fold higher than the reported numbers (949 cases for the US; 1,695 cases for Spain).

Take home lesson

The COVID-19 testing practices in many countries are restrictive and have missed a substantial fraction of infections. When tests are limited to symptomatic patients only, the reported case numbers understate actual infections by a factor of at least 2, and possibly 8, because they exclude "young" infections, even if no further restrictions are in place. However, many countries have additional testing restrictions, for example by requiring severe symptoms or contact to a confirmed case. In countries with low test capacity and/or stringent test requirements, time correction analysis reveals under-reporting factors of 100 or higher.

Differences in testing rates can quickly be identified by comparing raw case fatality rates; low case fatality rates, like those seen for Germany, Austria, and Australia, indicate higher testing rates. This analysis is supported by analyzing local testing restrictions, capacities, and number of tests performed.

Understanding that "confirmed case" numbers are likely to under-estimate the actual size of the epidemic is essential to choose sufficiently strict containment measures. Taking reported case numbers at face value is one of the reasons why so many countries all over the world have not enacted effective measures in time.
I understand that many of my readers will find the claim of 100-fold under-reporting hard to believe, but some epidemiologists have issued similar statements. Empirical confirmation is rare, but can be found in Italy, where testing requirements differ between regions. One study, in which an entire town of 3,000 people was tested after the first confirmed case, found 89 infections in the first round of tests and 6 infections in a second round. This corresponded to an infection rate of 3%, which was 200 times higher than the reported infection rate for Italy at that time. By quarantining the infected individuals, the epidemic was stopped in this town without any additional infections.

Wednesday, March 25, 2020

Why Stopping COVID-19 Is So Hard

From the new case numbers for COVID-19 infections on the worldometers web site, it is clear that many countries are struggling to contain the COVID-19 epidemic:

Many countries show 15% to 20% new cases per day, relative to total cases. This indicates that the growth of the epidemic there is still exponential, with a doubling of cases every 3 to 5 days. If such growth continues for 30 to 50 days, then the number of cases will grow a thousand-fold in each country! But even though all countries in the list above have implemented social distancing measures, the numbers keep growing. In this post, I will look at some of the reasons why this is happening. Some of the ideas are based on this post by a disease epidemiologist that is definitely worth reading.
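Converting a daily growth rate into a doubling time takes one logarithm. A minimal sketch, assuming the daily growth rate stays constant; the function names are hypothetical.

```python
import math


def doubling_time(daily_growth):
    """Days for cases to double at a constant daily growth rate (0.15 = 15% per day)."""
    return math.log(2) / math.log(1 + daily_growth)


def days_to_grow(factor, doubling_days):
    """Days until cases grow by `factor` at a constant doubling time."""
    return doubling_days * math.log2(factor)


if __name__ == "__main__":
    for g in (0.15, 0.20):
        print(f"{g:.0%} per day -> doubling every {doubling_time(g):.1f} days")
    for d in (3, 5):
        print(f"1000-fold growth at a {d}-day doubling time: {days_to_grow(1000, d):.0f} days")
```

Growth of 15% and 20% per day corresponds to doubling times of about 5.0 and 3.8 days, and a 1000-fold increase (about 10 doublings) takes roughly 30 days at a 3-day doubling time and 50 days at a 5-day doubling time.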

Problem 1: Hidden infections spread the disease

A lot of advice on the web is similar to what the CDC still advises today:
Yes, you definitely should stay home if you are sick. But this message can also be read as "if I am not sick, I do not need to stay home". That is wrong! Multiple studies have shown that corona virus transmissions can occur before symptoms start. Some people never develop symptoms after infection, but can still spread the infection!

Problem 2: COVID-19 seems too harmless

Compared to other epidemics like SARS, MERS, or Ebola, COVID-19 seems almost harmless. It has often been compared to the flu, which is very misleading (because it is 5-10 times more deadly, and for other important reasons). COVID-19 kills mostly elderly people and people with medical problems. This gives healthy, younger people the feeling that they have nothing to be afraid of. Together with problem 1 above, they conclude it is perfectly fine for them to ignore social distancing rules, and "party on". This has been seen all over the world - at spring break in the US, beaches in Sydney, the English Garden in Munich, and just about everywhere else.
But young people can still get infected, and often get infected at higher rates due to close contacts in large groups. They then transmit the disease to others, including older people who will die from the infection.

Problem 3: This seems too easy to understand

Many people know about the first two issues. The next set of problems describes issues that are more complex or obscure, but at least as important. But humans have a strong tendency to take simple explanations and view them as sufficient, ignoring more complex issues.
Take, for example, social distancing: if we need to reduce transmissions by 80%, then it should be enough if 80% of the people stay home, right? That seems very intuitive, and some lovely animations on the web even seem to "prove" it - but it is quite wrong.
Similarly, if you can transmit the disease for 14 days, then we all should be fine if we stay home for 14 days, right? Again, wrong!
The "80%" and "14 day" issues are so intuitive that everybody understands them. How about combining them? Would it not be enough if 80% of the people stay home for 14 days? Absolutely not!
But since this seems so intuitive, lots of people think they fully understand this, and ignore the warnings of experts.

Problem 4: Subgroup issues

Epidemiologists sometimes model the spread of diseases like COVID-19 in a population treated as a single, homogeneous group. That is done to keep things simple and can provide some useful insights, but also has limitations.
One example is the different behavior of subgroups.
Let's say we have one large group (80% of the population) that stays at home, while the remaining 20% do not change their behavior. Let's call them "the Fearful" and "the Springbreakers". With far fewer people out in public, we could theorize that the Springbreakers have less contact with others, and that this would cause transmission rates to drop. But in reality, the Springbreakers have always been around other Springbreakers, and mixed little with the Fearful group. So within their group, COVID-19 transmission rates are largely unchanged, and the epidemic will spread until most people in the group are infected (unless they change their behavior and switch to the Fearful group).
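A toy two-group model makes the Springbreaker effect concrete. This is a hypothetical sketch, not the model used elsewhere on this blog: it assumes a simple SIR model with 1-day steps, R0 = 3.7, a 7-day infectious period, that the Fearful cut their contacts by 90% while the Springbreakers keep theirs, and that only 2% of each group's contacts are with the other group. Averaged over the population (80% × 90%), contacts drop by 72%, yet the epidemic among the Springbreakers runs almost unchecked.

```python
def two_group_sir(r0=3.7, days=365, infectious_days=7.0,
                  activity=(0.1, 1.0), mixing=0.02, seed=1e-4):
    """Toy two-group SIR model with 1-day steps.

    Group 0 = "the Fearful", group 1 = "the Springbreakers".
    activity: each group's contact level relative to normal (0.1 = 90% fewer)
    mixing:   share of each group's contacts that are with the other group
    seed:     initially infected fraction in each group
    Returns the fraction of each group infected by the end of the run.
    """
    gamma = 1.0 / infectious_days   # daily recovery rate
    beta = r0 * gamma               # daily transmission rate at normal contact levels
    s = [1.0 - seed, 1.0 - seed]    # susceptible fraction within each group
    inf = [seed, seed]              # infectious fraction within each group
    for _ in range(days):
        new = []
        for g in (0, 1):
            h = 1 - g
            # Group g meets mostly its own members and a few of the other group,
            # scaled by its own activity level.
            force = beta * activity[g] * ((1 - mixing) * inf[g] + mixing * inf[h])
            new.append(force * s[g])
        for g in (0, 1):
            s[g] -= new[g]
            inf[g] += new[g] - gamma * inf[g]
    return [1.0 - x for x in s]


SIZES = (0.8, 0.2)  # population shares: Fearful, Springbreakers

if __name__ == "__main__":
    fearful, spring = two_group_sir()
    overall = SIZES[0] * fearful + SIZES[1] * spring
    print(f"Fearful {fearful:.1%}, Springbreakers {spring:.1%}, overall {overall:.1%}")
```

For comparison, `two_group_sir(activity=(0.28, 0.28))` spreads the same 72% average cut evenly over everyone; under these assumptions, the overall share infected after a year stays much lower than in the subgroup case.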

The next thing to look at is what happens in the Fearful group. If they all retreated to their own rooms and stayed 100% isolated all the time, they would not infect anyone else. Once the current infections ran their course, the group would be COVID-19 free. That would take about a month - most infections are over after 14 days, but some take longer, especially more serious cases.

In reality, though, most of the Fearful will be living with others - families, boyfriends and girlfriends, or roommates - and stay in close contact with them. Being together 24/7, anyone infected will likely infect others in the same household. In the US, the average household size is 2.6 people, so we would expect the number of infections in the Fearful group to grow more than 2-fold even if they stay at home 100% of the time, without any contact to the outside world.

Problem 5: Multiple infections

Another mental trap that we easily fall into is about getting infected. If we change our behavior so that the number of times we might get infected is reduced by 80% (for example by going shopping every 10 days instead of every 2 days), does that mean we reduce the overall chance of getting infected by 80%? No, absolutely not! If shopping gets us infected, then the only thing we have changed is how long it takes to get infected.
One way to think about this is that everyone can get infected multiple times. Usually, we only care about the first time. But if we want to avoid infection completely, we have to avoid all chances to get infected.
The effect of this is that a given reduction of social contacts will not result in a proportional reduction in transmissions. To reduce transmissions by 80%, we need to reduce chances of getting infected by a much larger percentage - possibly as much as 99%!
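A minimal probability sketch shows why. It assumes every shopping trip is an independent chance of infection with the same per-trip risk p (a hypothetical value, for illustration only), so the chance of being infected at least once in n trips is 1 - (1 - p)^n. Over 100 days, shopping every 2 days means 50 trips and shopping every 10 days means 10 trips.

```python
def infection_probability(p_per_trip, trips):
    """Chance of at least one infection in `trips` independent exposures."""
    return 1.0 - (1.0 - p_per_trip) ** trips


def per_trip_cut_needed(p_per_trip, trips, target_cut):
    """Relative cut in per-trip risk (at an unchanged number of trips) that
    lowers the overall chance of infection by `target_cut` (0.8 = 80%)."""
    target = (1.0 - target_cut) * infection_probability(p_per_trip, trips)
    p_new = 1.0 - (1.0 - target) ** (1.0 / trips)
    return 1.0 - p_new / p_per_trip


if __name__ == "__main__":
    p = 0.2  # hypothetical per-trip infection risk, for illustration only
    print(f"50 trips: {infection_probability(p, 50):.1%}")
    print(f"10 trips: {infection_probability(p, 10):.1%}")
    print(f"per-trip cut needed for 80% less risk: {per_trip_cut_needed(p, 50, 0.8):.1%}")
```

With p = 0.2, cutting trips by 80% only lowers the overall risk from nearly 100% to about 89%, while an 80% lower overall risk at 50 trips would require cutting the per-trip risk by about 98%.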

Problem 6: Fast infections, slow deaths

COVID-19 is a very sneaky disease: it spreads very quickly, but symptoms develop slowly, and deaths come after a relatively long time. Some studies have estimated that the virus starts to spread just one or two days after infection. Typically, the number of infected people doubles every 3-5 days. First symptoms show up after 5-6 days on average; death typically occurs about 18 days after symptoms start. In countries like the US, where the doubling time is 3 days and testing is mostly limited to people with disease symptoms, this means that confirmed cases will always be at least 4-fold lower than the number of infections (6 days are 2 doublings). The 24-day delay between infection and death corresponds to eight successive doublings - that is a roughly 250-fold increase (2^8 = 256)! In other words, during the early phases the epidemic is likely to appear orders of magnitude less severe than it really is when judged by deaths, and many times less severe during the entire growth phase.
This is a very hard concept to grasp. For politicians, it is easy to get a false sense of security, and it may seem hard to justify the drastic actions that would be needed to stop the epidemic early. Hence, it is no surprise that the countries that had the most success in containing the COVID-19 epidemic are countries that were hit hardest by the SARS and MERS epidemics, and/or geographically close to the epicenters: Hong Kong, Taiwan, Singapore, and Macau. Almost all other countries were fooled by the initial low numbers of confirmed cases and deaths, and reacted too slowly.

The Hammer and Dance

I believe that all the problems above explain why many countries all over the world have had limited, if any, success in containing the COVID-19 epidemic. The natural tendency to respond with denial to difficult issues only aggravates the problems.

To contain the COVID-19 epidemic, drastic actions are needed now, followed by continuous measures that prevent a second flare-up before a vaccine is available. This is beautifully explained in this article: The Hammer and the Dance. I strongly suggest that you read it. Just a few comments on the article:
  • It was written a week ago, which is a long time in COVID-19 terms. During this week, the number of confirmed COVID-19 cases in the US has grown from 9,197 to more than 65,000. 
  • The rapid increase in confirmed cases indicates that the reproduction number R is likely to be higher than the one used in the article, so the measures taken during the "Hammer" period need to be more drastic.
  • The number of infections that the article talks about is lower than the current number of confirmed cases, and much lower than the true number of infections (which is probably 35 times higher than the official number of confirmed cases). This means the "Hammer" period has to be longer.
With every week that passes by without drastic measures, the number of infections in the US grows by a factor somewhere between 2 and 4. A few states have taken measures to contain the spread, but there are many more that have, at best, taken halfhearted measures, often because the local reported numbers are still low. They continue to fall into the traps I described above, and delay action when it could be most effective: while the number of infections is still low. Often, the lack of action is justified with the fear of economic consequences. But the longer the containment measures are delayed, the longer they will have to remain in effect, and the more severe the consequences will be.

Tuesday, March 24, 2020

How Bad Is the COVID-19 Epidemic in the USA?

How many people are really infected with the corona virus in the US? The number of confirmed cases as I write this (3/23/2020, 9 pm EDT) is 43,734, according to worldometers.info. The number has been going up like crazy in the last few days as more testing capacity came online; just 5 days ago, there were only 9,259 cases. But even now, not everyone with COVID-19 is getting tested, for one of many reasons:
  1. Not everyone can get a test: tests are still restricted at many places, for example requiring a doctor's note and/or the presence of disease symptoms
  2. Not everyone wants to get a test: some people with disease symptoms prefer not to get tested for various reasons - for example because they assume their case will be mild
  3. Some infections have no symptoms, or very mild symptoms. Symptoms can also sometimes be confused with a cold or a mild case of influenza. 
  4. Many infections are too new to be tested: the time between infection and the first symptoms is about 5-6 days, and tests are usually only done when symptoms start. A delay of 6 days may not seem like much, but remember that the number of reported infections has grown more than 4-fold in the last 5 days!
We do not know how important each of the factors above is - so how can we know how many people really are infected? What if we "go backwards", starting with the number of deaths? Deaths from COVID-19 are much more likely to be reported than mild cases. From many studies in China and other countries, we have a pretty good idea how long it takes to die from COVID-19: typically about 18 days from the onset of symptoms. If we add a 6-day incubation period, we can assume that people who died today were infected about 24 days ago.

The other number we need to know is how many infected people die from COVID-19. The initially reported numbers of 3.4% or so do not help us, because they were based on the confirmed cases, not total infections. However, a number of different studies have looked at various ways to correct for this and to come up with numbers for the "Infection Fatality Ratio", or IFR. The results vary a bit by study, but around 0.6% seems to be close to a consensus.
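Before turning to a full simulation, the "go backwards" idea can be written as a two-line estimate. This is a crude sketch under strong assumptions: every death occurs exactly 24 days after infection, the IFR is 0.6%, and infections have kept doubling every 3 days since then. The function name is hypothetical, and the simulation described next is the more careful approach.

```python
def infections_from_deaths(deaths_today, ifr=0.006, delay_days=24, doubling_days=3.0):
    """Crude back-calculation from reported deaths.

    Cumulative deaths today / IFR ~= cumulative infections `delay_days` ago.
    If infections kept doubling every `doubling_days` since then, today's
    number is larger by a factor of 2 ** (delay_days / doubling_days).
    """
    infected_then = deaths_today / ifr
    infected_now = infected_then * 2 ** (delay_days / doubling_days)
    return infected_then, infected_now


if __name__ == "__main__":
    then, now = infections_from_deaths(60)  # 60 deaths: a made-up input, not data
    print(f"infected 24 days ago: {then:,.0f}; infected today: {now:,.0f}")
```

For instance, a hypothetical 60 deaths would point to about 10,000 infections 24 days earlier and, at a 3-day doubling time, about 2.56 million today (10,000 × 2^8).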

Well, that's the start. We can plug all this into a computer simulation (more about that below), and then adjust some parameters so that the numbers that the computer model predicts match the reported numbers of deaths from COVID-19. I did this for the cases in the US. Here is a figure that compares the predicted and actual total number of deaths:
For the last 14 days, the prediction is pretty good - not perfect, but reasonably close. Let us look at the model and some more results in more detail. 

I started the model with 10 infections, and then calculated the development of new infections based on what is known. For example, an infected person is most likely to transmit the infection right when the symptoms start (and possibly even a day before). Each infected patient spreads the infection to about 2 to 5 others, depending on the region and the study you look at. For my model, I used the number 3.7, which together with an "infectivity" distribution centered around days 5 and 6 gave a good agreement with the observed numbers. I then aligned the model curve to the observed data up to today by looking where the model had about 500 total deaths. This happened on day 57, which puts the start of the model run (with 10 infections) on January 26, 2020. Since the first COVID-19 case in the US was reported on January 21, that makes sense - the model seems reasonable.
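For readers who want to experiment, here is a generic discrete "renewal" model in the same spirit. It is a hypothetical sketch, not the author's Java program: each day's new infections are 3.7 times a weighted sum of recent new infections, using an assumed infectivity profile that peaks on days 5 and 6 after infection. It further assumes a US population of 330 million, that 0.6% of infections die exactly 24 days after infection, and that an optional `reduction(day)` function cuts transmissions on a given day.

```python
# Assumed infectivity profile: share of a case's onward transmissions that
# happen k days after that case was infected (weights sum to 1).
INFECTIVITY = {3: 0.10, 4: 0.20, 5: 0.25, 6: 0.25, 7: 0.15, 8: 0.05}


def renewal_model(days, r0=3.7, initial=10.0, population=330e6,
                  ifr=0.006, death_delay=24, reduction=lambda day: 0.0):
    """Return (daily new infections, cumulative deaths), indexed by day."""
    new = [0.0] * days
    new[0] = initial
    total = initial
    for t in range(1, days):
        # Infection pressure from people infected 3-8 days earlier.
        pressure = sum(w * new[t - k] for k, w in INFECTIVITY.items() if t >= k)
        susceptible_share = 1.0 - total / population
        expected = r0 * (1.0 - reduction(t)) * pressure * susceptible_share
        new[t] = min(expected, population - total)  # never exceed who is left
        total += new[t]
    deaths = []
    cumulative = 0.0
    for t in range(days):
        if t >= death_delay:
            cumulative += ifr * new[t - death_delay]
        deaths.append(cumulative)
    return new, deaths


if __name__ == "__main__":
    # Example: transmissions drop by 80% from day 50 onward.
    new, deaths = renewal_model(150, reduction=lambda day: 0.8 if day >= 50 else 0.0)
    print(f"peak daily infections: {max(new):,.0f}; deaths by day 150: {deaths[-1]:,.0f}")
```

The shape of the infectivity profile, and the simplification that all deaths happen on one fixed day, are choices made for brevity; a fit to real data would also need the date alignment step described above.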

Now we can look at the number of infections the model predicts, and compare them to the number of reported COVID-19 cases:
For the last two weeks, the number of infections predicted by the model is way higher than the number of confirmed cases. At the beginning of the two-week period, only 0.65% of all infections were confirmed by testing - 1 out of 153 infections. At the end of the period, 2.83% of infections had been confirmed by tests - 1 in 35 infections. In other words, the actual number of infections on 3/23/2020 was likely to be 35 times higher than the reported number of confirmed cases!

Note that many infections were not confirmed because they were too new (item #4 in the list above). Tests are pretty much limited to people with symptoms who are past the incubation period, which is about 5 days. Even after symptoms develop, it will take some time to decide to get a test, to actually get the test, and to have the test results included in the daily reports. This creates an overall testing delay of about 7 days from the initial infection.

If we take this into account, we can compare today's confirmed cases to the predicted infections 7 days ago. This gives us an "undertesting" factor of 12: just 1 out of 12 infections that were old enough to develop symptoms was tested.

If you look at the last column in the table above, you will notice that the "undertesting factor" was much higher 2 weeks ago, and then dropped from 153 to 35 over the course of these two weeks. This is shown in the following graph:

This reflects that more testing capacity has come "online" during the last two weeks, so that more people who need a test can actually get one. The same thing is reflected in the very rapid rise of case numbers in the last few days.

Once we have a computer model, we can use it to calculate "what-if" scenarios. The numbers above already take into account that social-distancing measures and other "non-pharmaceutical interventions" are in place in many parts of the US, and/or recommended for the entire USA. Specifically, the model assumed a 25% drop in transmissions after March 16, and a 50% drop in transmissions after March 21. These two effects show up as brief drops in the daily new infections:
How would the epidemic proceed after that? How would it go without any intervention (like social distancing), or if the interventions were more effective? And finally, what would happen if the interventions were effective, but relaxed later? The following graph shows the predictions for those 4 scenarios, looking at the total number of infections over time:
The results for the "no intervention" model (the blue curve) are shocking: if the epidemic proceeds at the rate it appears to be growing, then almost every person in the US would be infected before the end of April! But remember that the onset of symptoms is delayed by about a week, and deaths from COVID-19 are delayed another 3 weeks, so the largest number of deaths would not happen until May.

The red curve shows what would happen if the current measures (which started after 51 days, on March 16) succeed in dropping the transmission rate by 50%. The infections pile up a bit slower and reach a lower maximum; still, more than two thirds of the US population would be infected before the end of May.

The yellow curve shows what happens if the current measures reduce transmissions by 80%. This would mean that each infected person infects, on average, less than one new person, and the epidemic comes to a stop.
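The threshold behind the yellow curve is simple arithmetic. A minimal sketch, assuming the reduction applies uniformly to all transmissions and ignoring immunity from earlier infections; the function names are hypothetical.

```python
R0 = 3.7  # initial reproduction number used in the model described above


def effective_r(reduction, r0=R0):
    """Reproduction number after cutting transmissions by `reduction` (0.8 = 80%)."""
    return r0 * (1.0 - reduction)


def required_reduction(r0=R0):
    """Smallest cut in transmissions that pushes R below 1."""
    return 1.0 - 1.0 / r0


if __name__ == "__main__":
    for cut in (0.5, 0.8):
        print(f"{cut:.0%} cut -> R = {effective_r(cut):.2f}")
    print(f"R drops below 1 once transmissions fall by more than {required_reduction():.0%}")
```

With R0 = 3.7, a 50% cut leaves R at 1.85, so the epidemic keeps growing (the red curve, and the green curve after relaxation), while an 80% cut gives 0.74; the break-even point is a reduction of about 73%.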

The green curve also starts with an 80% reduction, but after new infections have dropped, it assumes that restrictions are relaxed at day 100 of the simulation (about 50 days after they were put in place), and the transmission rate goes up to 50% of the initial rate (the rate remains lower because some measures persist). At this point, the epidemic would rebound, and closely follow the red curve, but with a delay of about 50 days.

Let's have a look at the number of new infections per day:
Again, the results for the blue "no intervention" model are shocking: the maximum number of new daily infections would be more than 24 million before the end of April! After that, the number drops rapidly, simply because the majority of the US residents would be infected by then.

If the current measures led to a 50% reduction in transmissions, the maximum number of daily new infections would be reduced to 7.5 million, and delayed towards the beginning of May. Here are the numbers for the four scenarios:

The only number that looks manageable is the one for an 80% reduction in transmissions, with 1.5 million total infections, a per-day maximum close to 100,000 infections, and about 9,000 total deaths. In all scenarios, the calculated number of deaths is based on an infection fatality ratio of 0.6%; however, this number is likely too low if hospital or ICU capacities are exceeded, which would definitely happen in the other three scenarios.

So, will the current measures be successful in reducing the transmission rate by 80% or more? We do not know. Because symptoms start late, the policies are being adopted hesitantly and unevenly, and insufficient test capacity distorts the numbers, it will be several weeks before we even get an idea. As the green curve in the models above shows, lifting restrictions too early is dangerous, since the epidemic will flare up again quickly.

A number of studies have shown that reducing transmission rates by the required amounts is not easily accomplished. I will leave the details to other posts, but let's have a quick look at an example or two to illustrate this.

Let's assume that 80% of the population change their behavior so that they successfully avoid infection completely, for example by staying at home, while the remaining 20% mostly ignore the social distancing and "shelter at home" directives. Would that not be enough to reduce transmissions by 80%? Unfortunately, it would not. Shortly after the "good 80%" started staying home, they would indeed see a reduction in infections, likely with some delay from in-household infections. This would reduce the infections briefly. However, the remaining 20% would mostly be in the company of like-minded individuals, and experience little change in social interactions. Therefore, the epidemic would continue to grow largely unchanged in this subset of the population until the majority of the "bad 20%" is infected. If the "good" and "bad" groups remained perfectly separated, then the final effect would indeed be a reduction of numbers by 80%. However, some mixing between the groups is likely, and definitely occurs when the restrictions are eventually relaxed or lifted; then, the final outcome would be the same as without the measures, merely delayed by a few weeks until peak numbers are reached.

In another Gedankenexperiment, let us assume that shopping is a large contributor to transmissions. An infected person, who may not (yet) have symptoms but is already infectious, will leave large numbers of virus particles when he touches handles, carts, or merchandise while shopping. The virus can remain stable for days on many surfaces, and likely even longer on cooled surfaces. Anyone who touches the same surface within the next day or two will pick up virus particles. If the second person then touches his mouth, nose, or eyes before the next time he washes his hands, he will likely be infected. Since humans tend to touch their face every couple of minutes, and this habit is very hard to change, things are pretty much over once a contaminated surface has been touched.
Now what effect does it have if we reduce our shopping by 80%? Very little. With the "bad" group getting infected at increasing rates and mostly ignoring warnings, many surfaces will still be contaminated. The next time a "good" person goes shopping, he is just as likely to be infected, should he forget to avoid touching his face until after the next hand washing or disinfection.

The two examples above illustrate that multiple measures are needed to effectively reduce transmissions to a tolerable level. Countries that have been successful in containing the virus all have strict rules about wearing face masks in public that are adhered to by the vast majority of the population; disinfectant stations nearly everywhere; and multiple other measures like frequent testing and intensive contact tracing. In our shopping example above, face masks would help in several ways. Since they cover mouth and nose, they keep people from "automatically" touching there. They also reduce the amount of virus an infected person "places" when coughing or just breathing, and reduce the likelihood of infection from "breath sharing" if you happen to be close to another person. Similar arguments can be made for disinfection stations; these seem to be getting less common in the US now because the supply of gel-based disinfectants cannot keep up with the vastly increased demand.

Well, this is a long post, so I thank you for your patience in reading it. I developed my model in Java, since that was the easiest for me. I kept it very simple and straightforward on purpose - it's just one file that you can modify to run your own simulations. It is available for download at github.com/prichterich/covid19modeling.

Phases of Grieving

People often react to major events in predictable patterns. For the COVID-19 epidemic, I think the stages of grieving concept can help to understand these reactions. The stages (or phases) of grieving are:
  1. Denial.
  2. Anger.
  3. Bargaining.
  4. Depression.
  5. Acceptance.
Clearly, the corona virus epidemic is a grievous event. By the time it is over, most of us will know someone who suffered severely from the COVID-19 disease, and someone who died.

To be able to deal with grief, people go through the phases above. Sometimes, someone gets "stuck" in an early phase, and never reaches the "Acceptance" level. This usually has terrible consequences for the person, who remains in anger or depression for long periods.

When coming to terms with the COVID-19 pandemic, keep the phases of grieving in mind. Denial is natural. Some people are part of groups that encourage staying in the denial phase. Regarding them as (insert your favorite derogatory term here) helps nobody. But perhaps you can help them move forward.

Anger is the next phase after denial. Again, this is natural. We want to find someone responsible for bad things, and be angry at them. If there is not really someone we can blame, we still try to find someone, however tenuous the connection may be.

Some politicians try to use this phase to their advantage by directing the anger. Do not let them. This applies to both sides. Someone who lies all the time will not change now. Expose the lies. Remember the lies. But keep your anger in check. It is holding you back.

The bargaining phase comes next. This is a sign of progress. If the event is something final like the death of a close friend, this phase will have little impact. But for the COVID-19 epidemic, this is when we start to think productively: what can I do to not get infected? What can I do to not infect others? This can lead to positive changes.

But soon, we realize that what we can do is limited. Perhaps we can find a way to not get infected, but there is little we can do to affect the overall outcome. Depression sets in. Again, this is natural. Expect it. Recognize it. And move on.

The final phase is acceptance. We know that the event is real. We realize that anger will not help us right now, even if there are reasons to be angry. We have seen the limits of bargaining. And we are emerging from depression. Now, we can finally go on with our lives. Hopefully, you will reach the acceptance phase long before the COVID-19 pandemic is over. It will get worse than it is now, and may affect us for years to come. But only when we reach acceptance can we contribute whatever we can to containing it, without being hindered by our natural emotions.
I am just starting to enter the acceptance phase. I was in denial through most of January and February ("this won't affect us"). I was angry at necessary things not happening in the US (but then I saw that many other countries are also struggling). When my friend Craig told me to focus on facts rather than anger, he helped me enter the bargaining phase - maybe I can help a little by reading all the research coming out, and blogging about it! Depression came soon after - what will I really change? Fortunately, I had plenty of wind, sunshine, and a loving wife to help me through the depression phase, so it never got really bad.

If you see others acting in ways that seem irrational to you, keep the phases of grieving in mind. Maybe you can also help them along, and help us all to contain the pandemic.

This post was originally posted on 3/16/2020 at boardsurfr.blogspot.com/2020/03/phases-of-grieving.html

Corona Virus Infections Are Much Higher Than Reported

Currently (3/16/2020), there are about 4,000 confirmed coronavirus cases in the US, but the real number of infections is probably a lot higher: between 50,000 and half a million. For other countries with a similar approach to corona testing and contact tracing, the under-reporting is similar: roughly, just 1 or 2 out of every 100 infections are reflected in the "confirmed cases" number. This post explains how I came to this conclusion, in a way that you can run the numbers yourself if you want to check them.

Only very limited corona virus testing is available in the US, and many suspected cases could not get tested, even if the treating doctor suggested a test. Some guidelines severely limited testing, for example to health care workers and elderly patients with symptoms, but excluded others even if they showed symptoms typical for COVID-19.

But when a friend posted an estimate by a Johns Hopkins professor that up to 500,000 Americans are infected, I quickly ran some numbers in my head, and concluded "no, that number is too high". Boy, was I wrong! Today, I spent a couple of hours setting up a spreadsheet for a more careful calculation, and discovered that the professor probably was right. My estimates show that there are probably about 200,000 to 300,000 infected people in the US right now (with a range from 50,000 to one million). That's smack in the middle of Prof. Makary's estimate.

To estimate the number of current infections, I started with the number of deaths from COVID-19 in the US. As of today, 77 corona virus deaths were reported in the US, 53 of them during the last 7 days. From that, we can work backwards, using numbers from studies and the actual reported cases in the US (the exact number seems to go up every time I reload the web pages reporting the statistics - it is now 86).

Let's start with a simplified approach. We know that about 5-6 days elapse between infection and the first onset of symptoms. The average time between onset of symptoms and death is about 18 days. That means someone who died today probably got infected about 23 or 24 days ago, so the number of deaths today roughly reflects the number of infections back then - more than 3 weeks ago. So, how many people were infected back then? We cannot know the exact number, but a good estimate of the ratio between deaths and infections (the "IFR", or infection fatality ratio) is 0.67%. For a first estimate, we can assume that all 53 deaths reported this week were infected 23 days ago, so if we divide 53 by 0.67%, we get our first estimate of infections: 7,910.

The number of 7,910 is the estimated number of infections 23 days ago - but how many infections do we have today? One thing we can do is to compare the confirmed number of cases 23 days ago and today. On 2/22/2020, there were 29 confirmed infections in the US; today, there are 4,597 (as of 3/16/2020, 6:07 pm CST). That's a 158-fold increase in confirmed infections. If we multiply our estimate of 7,910 infections 23 days ago by that factor, we get about 1.25 million infections now.

However, since the US had problems getting corona virus testing to work, the initial number of reported cases may be artificially low, which would inflate the estimate of the current number. So instead, we can plug in the typical increases in infections that were reported from places where testing was available. Most of these studies show a doubling time between 4.5 and 7 days. If infections double every 5 days, then the number of infections after 23 days would be about 24-fold higher. That would give us an estimate of about 190,000 infections today.
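The steps above fit in a few lines of code. Here is a minimal sketch in Java (the language the author used for the model). The class name `InfectionEstimate` is hypothetical and is not part of the author's model on GitHub. It uses only the numbers from the text: 53 deaths, an IFR of 0.67%, a 23-day lag from infection to death, and either the growth in confirmed cases or an assumed 5-day doubling time.

```java
/**
 * Back-of-the-envelope estimate of current US infections, following the
 * steps in the text: deaths -> infections ~23 days ago -> infections today.
 * Hypothetical illustration class; all inputs are taken from the text.
 */
public class InfectionEstimate {

    /** Infections that must have existed when the people who died were infected. */
    static double infectionsAtInfectionTime(double deaths, double ifr) {
        return deaths / ifr;
    }

    /** Growth factor after the given number of days for a fixed doubling time. */
    static double growthFactor(double days, double doublingTimeDays) {
        return Math.pow(2.0, days / doublingTimeDays);
    }

    public static void main(String[] args) {
        double deathsLastWeek = 53;  // US deaths reported in the 7 days before 3/16/2020
        double ifr = 0.0067;         // assumed infection fatality ratio (0.67%)
        double lagDays = 23;         // assumed days from infection to death

        double infectionsThen = infectionsAtInfectionTime(deathsLastWeek, ifr);
        System.out.printf("Infections ~23 days ago: %,.0f%n", infectionsThen);

        // Option 1: scale by the growth in confirmed cases (29 on 2/22, 4,597 on 3/16).
        double confirmedGrowth = 4597.0 / 29.0;
        System.out.printf("Estimate via confirmed-case growth: %,.0f%n",
                infectionsThen * confirmedGrowth);

        // Option 2: scale by an assumed 5-day doubling time over the 23-day lag.
        System.out.printf("Estimate via 5-day doubling: %,.0f%n",
                infectionsThen * growthFactor(lagDays, 5));
    }
}
```

Running it should print values close to the ones in the text: about 7,910 infections 23 days ago, about 1.25 million via the growth in confirmed cases, and about 190,000 via a 5-day doubling time.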

There are a bunch of additional refinements that we can make. For example, some of the 53 deaths happened a few days ago, which reduces the multiplier we have to use. Another factor to consider is that the number of reported deaths is likely to be lower than the actual number of deaths, since some deaths may have been diagnosed incorrectly, or have occurred outside of the health care system. We also don't know the exact doubling time. If we use shorter doubling times that are closer to the observed growth in the US, we get a higher estimate; if we use longer doubling times, it is lower. The range I observed for what I deemed as reasonable numbers was between about 50,000 and 1 million. The "most likely" numbers resulted in an estimate of about 300,000 infections as of today (3/16/2020). This number is about 75-fold higher than the "official" number of confirmed COVID-19 cases.
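To see how much the doubling time alone moves the estimate, one can sweep it across the 4.5 to 7 day range reported by the studies mentioned above. This is a sketch with a hypothetical class name; it keeps the 53 deaths, the 0.67% IFR and the 23-day lag fixed and varies only the doubling time, so it does not reproduce the full 50,000 to 1 million range, which also reflects the other refinements (timing of deaths, under-reported deaths).

```java
/**
 * Sensitivity of the back-calculated infection estimate to the assumed
 * doubling time. Hypothetical illustration class; inputs from the text.
 */
public class DoublingSensitivity {

    /** Current infections implied by deaths, IFR, infection-to-death lag and doubling time. */
    static double estimate(double deaths, double ifr, double lagDays, double doublingDays) {
        double infectionsThen = deaths / ifr;                           // infections at the time of infection
        return infectionsThen * Math.pow(2.0, lagDays / doublingDays);  // grown forward to today
    }

    public static void main(String[] args) {
        double[] doublingTimes = {4.5, 5.0, 6.0, 7.0};  // days; range reported in the text
        for (double d : doublingTimes) {
            System.out.printf("doubling every %.1f days -> about %,.0f infections%n",
                    d, estimate(53, 0.0067, 23, d));
        }
    }
}
```

With these inputs the estimate falls from roughly 270,000 (4.5-day doubling) to roughly 77,000 (7-day doubling), which shows why shorter doubling times push the estimate up so strongly.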

There are a number of simplifications in these calculations that would affect the outcome somewhat; however, I believe that their effect would be smaller than the effect of reasonable changes in the parameters, and that the overall "bottom line" is solid.

The bottom line

The reported numbers of "confirmed" COVID-19 cases in the US understate the actual number of corona virus infections by about 50- to 100-fold. The best estimate for the current number of infections in the US is about 300,000, with a range from 50,000 to 1,000,000.


P.S.: The day after I wrote this post, I found a study published in the journal Science which concluded that 86% of the infections in Wuhan remained undetected prior to January 23rd - only about one in seven infections was detected. If the same factor were applied to the reported case numbers in the US, there would have been about 32,000 infections on 3/16/2020. However, testing in the US has been unavailable for many patients, even with symptoms and doctors' notes, which means that the actual underreporting factor in the US is likely to be much larger than 7.

This post was originally posted on 3/16/2020 at boardsurfr.blogspot.com/2020/03/corona-virus-infections-are-much-higher

How Deadly is the Corona Virus?

How dangerous is the new corona virus really? There's a lot of conflicting information out there. In this post, I will analyze some recent scientific studies that came to different results. I'll try to explain what the studies did, and why some of them are much more likely to be accurate than others.

Let's start with a diagram from one of the studies that illustrates some of the problems:
What we want to know is: if you get infected with the virus,

  • how likely is it that you will die from the infection?
  • how likely is it that you will require hospital care?
We have a pretty good idea how many people have died from the virus (5,798 people as of 3/14/2020). All the other numbers are harder to get: not all hospitalizations get reported; not everyone who becomes sick is counted; and an unknown number of infections cause very mild or no symptoms.

Severe cases: three weeks in the hospital

"Severe" cases of COVID-19, as the disease caused by the new corona virus is called, require hospital care. Typically, that includes oxygen, and often intensive care and mechanical breathing support (ventilators). The typical time in the hospital for COVID-19 is 22 days. Lengthy rehabilitation periods may be required afterwards, especially when a ventilator was needed.

How likely it is that an infection turns into severe disease strongly depends on age. Here is a table from one analysis from a large team at leading universities in the UK:

  • In the 20-29 year old group, about 1 in 90 infected persons required hospitalization.
  • In the 40-49 year old group, about 1 in 23 infected persons required hospitalization.
  • In the 70-79 year old group, about 1 in 6 infected persons required hospitalization.
Another table from the same publication looks at death rates by age:

(Note that the percentage numbers are calculated with different adjustments in the table shown; check the original publication for details.)

This table also shows a dramatic increase in death rates by age. Younger patients have a better chance to recover from an infection even if they require hospital care; older patients are much more likely to die. 

One very important thing to consider is hospital capacity. Hospitals have only a limited number of intensive care beds and ventilators (in the range of 100,000 to 200,000 for the US). In an area with a large number of cases, oxygen, ICU beds, and ventilators are not available for many patients. This dramatically increases the death rates, and has happened in several areas worldwide, including Wuhan (before an intensive government response that included sending 40,000 health care professionals to the city and building new temporary hospitals) and Italy.  Therefore, a very important objective for any infected country or region is to slow down the rate of new infections, so that hospitals are not overwhelmed.

Hidden infections, CFR, and IFR

While we can easily get good numbers for the top of the "severity pyramid" in the picture above, getting accurate numbers for the lower parts is more challenging. Some people get infected with the virus, but do not show any symptoms, or only have very mild symptoms that are similar to a regular cold. The commonly mentioned "fatality rates" simply divide the number of deaths by the number of confirmed cases. This gives numbers around 2-4% for most countries (2.2% for the US, and 6.8% for Italy, as of 3/14/2020). This number is called the "crude case fatality ratio" (CFR). This number is easy to calculate, and gives some important information; calling it "False" is bloody nonsense.

However, what we really want is a number for the chance of dying if we get infected - the ratio of deaths to infections. This number is called the "infection fatality ratio", or IFR for short. Relative to the crude CFR, we need to know a few more things:
  • How many deaths are underreported?
  • What are the time delays between infection, confirmation of the infection, and death?
  • How many infections are not counted?
The under-reporting of deaths can have several reasons. A death caused by the corona virus may look very much like a death by pneumonia or other causes. Some countries, like Italy, test for the corona virus even after death; others, like Germany, do not. In countries where the government controls the press or the health agency reporting, reported numbers may be lower than actual deaths for political reasons.

The time delay between infection, confirmation of the infection, and death is very important in case of COVID-19. This is because the number of infections is growing very rapidly - in many countries, it doubles every 4-7 days, but even 10-fold increases within a week or 10 days have been seen. Typically, it takes about 5 days after the infection for the first symptoms to appear, and a few more days for symptoms to get severe. Let's look at some numbers, making some assumptions that are reasonable (but not necessarily accurate):
  1. The number of infections doubles every 5 days.
  2. The delay between infection and confirmation (for anyone getting tested) is 5 days.
  3. Death occurs 15 days after infection.
  4. The death rate is 1 death per 50 infections (2%).
  5. We start with 1,000 infections on day 0.
Now let's look at the time line and the numbers we get:
  1. After 5 days, we have 2,000 infections.
  2. We have 1,000 confirmed infections on day 5 (due to the 5-day delay).
  3. Until day 15, we have two more doublings, so we have now 8,000 infections. But due to the 5 day testing delay, we know only about 4,000.
  4. On day 15, 20 of our 1,000 initially infected people are dead.
The calculated death rate on day 15 in this example is 20 / 4,000 = 0.5% - this is 4-fold lower than the actual death rate! 

I gave the numbers above only for illustrative purposes. In reality, this gets more complex, since not everyone gets infected the same day, and there's a lot of variation in the other numbers, too - some people may die 10 days after infection, others may die after 30 days. To get reasonably accurate numbers, some serious statistical modeling is required. But before we look into that, let's look at one more thing first: what happens to the numbers if many infections remain hidden, for example because symptoms are very mild so that the person never gets tested for the corona virus?

Let us assume that 9 out of 10 cases remain undetected, so that only 10% of all infections get diagnosed. Now we get:
  1. After 5 days, we have 2,000 infections, but 1,800 remain undiagnosed.
  2. We have 100 confirmed infections on day 5 (10% of the 1,000 people infected on day 0, due to the 5-day delay).
  3. Until day 15, we have two more doublings, so we have now 8,000 infections. But due to the 5 day testing delay and only 10% being diagnosed, we know only about 400.
  4. On day 15, 20 of our 1,000 initially infected people are dead.
The calculated death rate on day 15 in this example is 20 / 400 = 5% - 2.5-fold higher than the actual death rate!
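Both hand calculations can be reproduced with a small simulation. This is a minimal sketch in Java with a hypothetical class name, using exactly the simplifying assumptions listed above (5-day doubling, 5-day detection delay, death 15 days after infection, a true fatality rate of 2%, 1,000 infections on day 0) and, like the hand calculation, treating everyone in a cohort as infected on the same day. The fraction of infections that ever gets diagnosed is a parameter, so one method covers both examples.

```java
/**
 * Apparent ("crude") fatality rate under the simplified timeline from the text.
 * Hypothetical illustration class; all constants are the assumptions listed above.
 */
public class ApparentCfr {

    static final double INITIAL_INFECTIONS = 1000;  // infections on day 0
    static final double DOUBLING_DAYS = 5;          // infections double every 5 days
    static final double DETECTION_DELAY_DAYS = 5;   // infection -> confirmation
    static final double DEATH_DELAY_DAYS = 15;      // infection -> death
    static final double TRUE_IFR = 0.02;            // 1 death per 50 infections

    /** Cumulative infections on a given day (meaningful for day >= 0). */
    static double infections(double day) {
        return INITIAL_INFECTIONS * Math.pow(2.0, day / DOUBLING_DAYS);
    }

    /** Deaths divided by confirmed cases on a given day (day >= 15 here). */
    static double apparentCfr(double day, double detectedFraction) {
        // Confirmed cases today reflect infections from 5 days ago, thinned by detection.
        double confirmed = detectedFraction * infections(day - DETECTION_DELAY_DAYS);
        // Deaths today reflect infections from 15 days ago.
        double deaths = TRUE_IFR * infections(day - DEATH_DELAY_DAYS);
        return deaths / confirmed;
    }

    public static void main(String[] args) {
        double day = 15;
        System.out.printf("all infections diagnosed: %.1f%%%n", 100 * apparentCfr(day, 1.0));
        System.out.printf("10%% of infections diagnosed: %.1f%%%n", 100 * apparentCfr(day, 0.1));
        System.out.printf("true fatality rate: %.1f%%%n", 100 * TRUE_IFR);
    }
}
```

This prints 0.5%, 5.0% and 2.0%, matching the two examples. Because growth is a clean exponential here, the distortion does not depend on the day: the apparent rate is the true rate times 2^-((15 - 5)/5) = 1/4, divided by the diagnosed fraction. Real data, with infections spread over time and variable delays, needs the kind of statistical modeling mentioned above.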

The two examples above show that it is very important to have accurate numbers for our calculations, and to take the timing of infections into account. In the next section, we'll examine a few studies that tried to do that, but came to dramatically different results.

So, what is the "correct" fatality rate?

The answer to this question depends a lot on the amount of testing that is being done, which varies widely from country to country. Without any testing, there will be no confirmed corona virus infections! 
Let's start with two studies that both looked at China, but came to very different results: one study calculated an infection fatality rate of 0.657%; the second gave numbers between 0.04% and 0.12%. Unfortunately, the second study is deeply flawed, but likely to nevertheless be picked up as "proof" that the corona virus is comparable to influenza. Not true - the best current estimate is that the corona virus is about five times more deadly!

The first study, which is the source of the tables and figure above, was performed by 33 scientists, mathematicians, and statisticians at colleges in London and Oxford. It contains detailed tables and links to the data used, as well as an in-depth explanation of the statistical methods used. To get an estimate of the underreporting of infections, it used data from "repatriated expatriates returning to their home countries" on 6 flights (689 persons tested, 6 confirmed infections). Several key aspects of this study were:
  • A detailed look at the age distribution for infections and death rates.
  • Separation between cases in Wuhan and the rest of China.
  • Age-specific estimates for underreporting of infections for the two regions.
Basically, the reported number of infections was too low compared to the observed infection rates in foreigners living in Wuhan who were flown back to their home countries. This group is used because it is the only clearly defined group where every single person has been tested. In some countries, the "rescued" repatriates were quarantined for 2 weeks, and/or allowed to go home only if two successive tests came back negative and no other signs of infection were present. In these groups, the infection rate was about 0.8% to 1.5% (depending on which countries and flights are included). It should be noted that some flights had more than one infected person, and that some of those infected tested negative first, but positive later; therefore, it cannot be excluded that some of the expatriates infected others after leaving China. For comparison, the reported infection rate in Wuhan in the middle of February was near 0.2% (about 20,000 confirmed infections out of a population of 11 million). The exact rate varies a bit with the exact date; however, reports that up to 5 million people left Wuhan before the Chinese government imposed travel restrictions indicate that the population number may also have been lower. As an order-of-magnitude estimate, the number of actual infections was probably about 5 times higher than the number of reported infections. The study concludes that the death-to-infection ratio is about 0.66%, about 5-fold lower than the raw CFR, but also about 5-fold higher than for influenza.
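The order-of-magnitude comparison in the previous paragraph is a simple ratio of two infection rates. Here is a sketch in Java with a hypothetical class name, using the 6 infections among 689 tested repatriates and the roughly 20,000 reported cases in 11 million people. Note that this is only the back-of-the-envelope check described above; the study itself used age-specific statistical modeling, not this one-line ratio.

```java
/**
 * Rough under-reporting factor for Wuhan: infection rate among fully tested
 * repatriates divided by the reported infection rate in the city.
 * Hypothetical illustration class; numbers from the text.
 */
public class WuhanUnderreporting {

    static double underreportingFactor(double repatriateCases, double repatriatesTested,
                                       double reportedCases, double population) {
        double repatriateRate = repatriateCases / repatriatesTested;  // ~0.87% for 6 of 689
        double reportedRate = reportedCases / population;             // ~0.18% for 20,000 of 11 million
        return repatriateRate / reportedRate;
    }

    public static void main(String[] args) {
        System.out.printf("under-reporting factor: %.1f%n",
                underreportingFactor(6, 689, 20_000, 11_000_000));
        // Upper end of the 0.8%-1.5% repatriate range, written as 15 per 1,000 for illustration.
        System.out.printf("with a 1.5%% repatriate rate: %.1f%n",
                underreportingFactor(15, 1000, 20_000, 11_000_000));
    }
}
```

The first line gives a factor of about 4.8, in line with "about 5 times higher"; the upper end of the repatriate range gives a factor of about 8. A smaller Wuhan population (if many people had left) would raise the reported rate and therefore lower the factor.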

While the first study was well written, the same cannot be said for the second study, which was authored by a graduate student or post-doc and his two advisors in Japan and at Georgia State University. Even though this study is a lot shorter, it took me several times longer to understand what was going on. At first, I got quite excited about the study, since the result was stated clearly and sounded great:
"We also found that most recent crude infection.. adjusted IFR is estimated to be 0.12% ..., which is several orders of magnitude smaller than the crude CFR estimated at 4.19%."
But the authors also claim that this is based on "epidemiological data of Japanese evacuees from Wuhan City" - that's where the first red flags went up. A closer inspection leads to this sentence:
Other parameter estimates for the probability of occurrence and reporting rate are 0.97 (95% CrI: 0.84–1.00) and 0.010 (95% CrI: 0.007–0.014), respectively.
Note that the authors talk about a "parameter estimate" value of 0.010 for the "reporting rate". This means that just 1% of all infections actually would be reported. Simply based on this value, they then conclude that the actual number of infected people in Wuhan was 100-fold higher than reported: 1.9 million.

Let's have a closer look at what they did. Basically, they ran computer simulations where they tried to match the calculated results to the observed results. As the technique for the simulations, they used "a Monte Carlo Markov Chain (MCMC) method" - which is really just a fancy term for letting the computer roll the dice. The model requires a number of parameters, which they had to estimate, and then claimed to have "95% credibility" because they ran the models 100,000 times, and evaluated the results using "potential scale reduction statistic" in a "Bayesian framework".  They further claim:
"We collected information on the timing of the evacuee fights that left Wuhan City as well as the number of passengers that tested positive for COVID-19 in order to calibrate our model"
Well, this all sounds very scientific and confusing, right? But let us have a quick look at the last claim (and assume they meant "flights", not "fights" as written). The table (which is not included in the preprint, but instead requires a separate download) shows 12 infections among 763 evacuees, giving an infection rate of 1.57%. The underlying assumption is that the inhabitants of Wuhan had the same infection rate; with 10 million inhabitants, that would be about 157,000 infections. With a reporting rate of 0.010, only about 1,570 infections should have been reported. But the article itself states:
"As of February 11th, 2020, a total of 19559 confirmed cases including 820 deaths were reported in Wuhan City."
There is a 12-fold discrepancy! Another way of looking at this is to start with a different claim the authors made:
"Our results indicate that the total number of infections (i.e. cumulative infections) is 1905526"
That's 1.9 million! If this number is correct, then about 19% of the evacuees should have been infected: about 145 cases would have been expected, not the observed 12! Some correction would need to be applied for the flights that left in January, but the last flight alone, which left February 7 with 198 passengers, should have had more than 35 infected passengers - not just one!

This shows a fatal flaw in this "research". The authors only achieve the very low death rates because they use a "reporting" rate that is an order of magnitude too low. Their claim that the rate was "calibrated" using evacuee data is obviously wrong. If this single error is corrected, then the actual death rate is somewhere between 0.5% and 1.5% - in line with what the other study concluded.
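The consistency checks above are simple arithmetic, so they are easy to rerun. Here is a sketch in Java with a hypothetical class name, using the 10 million population assumed in the text. The last step, scaling the preprint's IFR range (0.04% to 0.12%) by the reporting discrepancy, is the rough correction argued above (if reporting was about 12 times better than assumed, there were about 12 times fewer infections); it is not a re-run of the authors' MCMC model.

```java
/**
 * Arithmetic checks of the preprint's evacuee "calibration".
 * Hypothetical illustration class; all numbers are taken from the text.
 */
public class EvacueeConsistencyCheck {

    /** Cases that should have been reported if the evacuee infection rate held for the whole city. */
    static double impliedReportedCases(double infectedEvacuees, double evacuees,
                                       double population, double reportingRate) {
        return infectedEvacuees / evacuees * population * reportingRate;
    }

    /** Infected passengers expected on a flight if the claimed city-wide infection total were right. */
    static double expectedInfectedPassengers(double claimedInfections, double population,
                                             double passengers) {
        return claimedInfections / population * passengers;
    }

    public static void main(String[] args) {
        double population = 10_000_000;  // Wuhan population assumed in the text
        double reported = 19_559;        // confirmed cases in Wuhan as of Feb 11, 2020

        double implied = impliedReportedCases(12, 763, population, 0.010);
        double discrepancy = reported / implied;
        System.out.printf("implied reported cases: %.0f, discrepancy: %.1f-fold%n",
                implied, discrepancy);

        System.out.printf("expected infected evacuees: %.0f (observed: 12)%n",
                expectedInfectedPassengers(1_905_526, population, 763));
        System.out.printf("expected on the last flight: %.0f (observed: 1)%n",
                expectedInfectedPassengers(1_905_526, population, 198));

        // Rough correction: fewer infections by the discrepancy factor -> higher IFR by the same factor.
        System.out.printf("corrected IFR range: %.2f%% to %.2f%%%n",
                0.04 * discrepancy, 0.12 * discrepancy);
    }
}
```

It should report a discrepancy of about 12-fold, about 145 expected infected evacuees and about 38 on the last flight, and a corrected IFR range of roughly 0.5% to 1.5%.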

The second study is an example of very bad science. Please note that this is not a regular research paper that has undergone the usual "peer review" process, but a preprint. It is very unlikely that this paper would have been accepted by any reputable journal, since most referees would have spotted the problems I outlined (and likely others) very quickly, and therefore rejected the paper.

As the paper is written, with limited explanations, multiple typos, and very questionable science, I cannot shake the suspicion that it was "goal oriented pseudo-research": someone set out to prove that the novel corona virus has a death rate comparable to influenza, and succeeded. I fear that this will be picked up by news outlets, which is a real shame. This is not science - this is somebody playing with a computer who either has no clue, or no morals!

The bottom line

Yes, the reported "raw CFR" rates of 2% to 4% do not include all infections, since a significant number of infections are never confirmed. But the best estimates of the death-to-infection rate, taking all infections into account, indicate that the overall death rate is about 5-fold higher than for influenza, or more. And that is for the general population, and with a limited case load that does not overload hospitals. For the elderly and people with pre-existing health conditions, or if hospitals are overloaded because the infections spread too quickly, the chance of dying from COVID-19 can exceed 5%.

So please, do your part in slowing down the spread of the virus. Avoid any large scale meetings and situations where you are close to strangers (note that the official limits of 500 or 1000 people are just arbitrary numbers - infections spread in smaller groups, too!). Limit social interactions. Work from home if you can, and do not go to work if you feel ill in any way! The disease is most infectious right at the onset of mild symptoms! Check the WHO website and other official government websites for additional information (and don't believe everything someone posts on Facebook).

This post was originally posted on 3/10/2020 at boardsurfr.blogspot.com/2020/03/how-deadly-is-corona-virus.html