Friday, March 27, 2020

Hidden Cases: Tests Capture Only 1 of 41 Infections

A team of researchers from Stanford University and UC Berkeley has published an interesting study. Since testing for the coronavirus is well known to underestimate the number of infections, they looked at hospitalizations to get a better idea of how many cases there really are. The study describes in detail how they did it and is well written and easy to understand, but I'll give a brief summary here.
[Image: iceberg cartoon. Caption: He said: "No, Houston, it is small - we've got nothing to worry about", but could not understand why they did not believe him.]

The basic idea is that COVID-19 patients whose symptoms are severe enough to require treatment in a hospital are much more likely to be tested than someone with mild symptoms. If we then take into account what percentage of patients end up in the hospital, we can estimate the number of infections. To do so, we also need to account for the several days that elapse between infection, onset of symptoms, and admission to the hospital. Many studies have tried to determine each of these variables, so reasonably good estimates are available. These can then be fed into a computer model that calculates how many infections correspond to the observed number of hospitalizations. The authors used data from Santa Clara County, a region in California that was hit relatively hard by the COVID-19 epidemic.
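The core idea can be illustrated with a much simpler back-calculation than the study's actual model. The sketch below is a toy version, not the authors' method: it assumes a fixed hospitalization rate, a single fixed delay between infection and hospitalization, and steady exponential growth. The function name `estimate_infections` and all input values are hypothetical.

```python
def estimate_infections(hospitalized: int, hosp_rate: float,
                        delay_days: float, doubling_days: float) -> float:
    """Toy back-calculation of current cumulative infections from hospitalizations.

    Hypothetical assumptions (not the study's model):
      - a fixed fraction `hosp_rate` of all infections ends up in the hospital,
      - hospitalization happens `delay_days` after infection,
      - infections grow exponentially with doubling time `doubling_days`.
    """
    if not 0 < hosp_rate <= 1:
        raise ValueError("hosp_rate must be in (0, 1]")
    # Infections that had already occurred `delay_days` ago and explain
    # today's hospitalizations.
    infections_then = hospitalized / hosp_rate
    # Grow that number forward to today at the assumed exponential rate.
    growth = 2 ** (delay_days / doubling_days)
    return infections_then * growth
```

For example, with made-up inputs of 20 hospitalizations, a 5% hospitalization rate, a 10-day delay, and a 5-day doubling time, `estimate_infections(20, 0.05, 10, 5)` gives about 1,600 infections: 400 infections ten days ago, grown by a factor of 4. Because each input is uncertain, the study reports results for optimistic and pessimistic parameter sets as well.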

With the most likely set of parameters, the study estimated a total of 6,500 infections for the almost 2 million people in the county on March 17. With a very optimistic set of parameters, the number is reduced to 1,400 cases; with a more pessimistic set, the number increases to 26,000.

A look at the COVID-19 tracking site shows that for all of California, there were 483 confirmed cases on March 17, 2020. The Internet Archive shows that 155 of these cases were in Santa Clara County. So only 155 of the estimated 6,500 infections were reflected in the "official" numbers. In other words, the tests captured only about 1 out of every 41 infections.
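The arithmetic behind the "hidden case" factor is a single division. The short Python sketch below applies it to all three of the study's estimates for March 17, using the 155 confirmed cases in Santa Clara County; the helper name `hidden_case_factor` is just an illustrative choice.

```python
# Study estimates for Santa Clara County on March 17, 2020
estimated_infections = {"optimistic": 1_400, "most likely": 6_500, "pessimistic": 26_000}
confirmed_cases = 155  # confirmed cases in Santa Clara County on March 17


def hidden_case_factor(infections: float, confirmed: float) -> float:
    """Number of estimated infections per confirmed case."""
    return infections / confirmed


for scenario, infections in estimated_infections.items():
    factor = hidden_case_factor(infections, confirmed_cases)
    print(f"{scenario:>12}: {factor:5.1f} infections per confirmed case")
```

The most likely scenario gives a factor of just under 42 (6,500 / 155 ≈ 41.9), the "1 in 41" quoted above, while the optimistic and pessimistic parameter sets put it at roughly 9 and 168.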

This "hidden case" factor of 41 is similar to the factor of 35 reported earlier for a similar study that looked at reported COVID-19 deaths in the US. There is a 6-day time difference between those two numbers, during which the more widespread availability of tests reduced the "hidden case" factor for the US; for March 17, the factor from the death-based study was 92. A likely source of the discrepancy is that Santa Clara County had more test capacity than is typical for the US; the county has reported the ability to do 100 tests per day. Even with this test capacity, which looks reasonable compared to the reported case numbers, the county lab limited testing to hospital patients and members of high-risk groups.

Where does the large "hidden case" factor come from?

Several issues contribute to the problem, but we can estimate at least one of them: the "infection-to-test delay factor". If every person were tested every single day, this factor would not exist (or, mathematically, would be 1.0). But this cannot be done anywhere in the world, nor would it make sense. Instead, the minimum wait is until the first symptoms appear, which takes about 5 days for COVID-19. If the epidemic doubles every 5 days, it has doubled in size during that time. In other words, for an epidemic with a doubling time of 5 days and a time to onset of symptoms of 5 days:
  • If every single infected person were tested on the first day of symptoms, the "infection-to-test delay factor" would be 2.0
Clearly, this is still too optimistic. Initial symptoms are generally mild and get worse as the disease progresses. Furthermore, it may take a day or two before the test is actually performed, and another day or two before the result is included in the officially reported numbers. Together, this makes a delay of about 10 days more likely: 5 days of incubation, 2 days for symptoms to worsen, 2 days to get the test, and 1 day for reporting. During these 10 days, the number of infected people doubles twice, so we get:
  • A more realistic estimate of the "infection-to-test delay factor" is 4.0 
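The "infection-to-test delay factor" is simply how much an exponentially growing epidemic grows during the delay: 2 raised to the power of (delay / doubling time). A minimal Python sketch, assuming a constant doubling time; `delay_factor` is a hypothetical helper name:

```python
def delay_factor(delay_days: float, doubling_days: float) -> float:
    """Growth of an exponentially growing epidemic during the infection-to-test delay."""
    return 2 ** (delay_days / doubling_days)


# Delays from the text: 5 days incubation, 2 days for symptoms to worsen,
# 2 days to get tested, 1 day until the result is reported.
delay = 5 + 2 + 2 + 1

print(delay_factor(5, 5))      # tested on the first day of symptoms -> 2.0
print(delay_factor(delay, 5))  # more realistic 10-day delay -> 4.0
print(delay_factor(delay, 4))  # faster growth (4-day doubling) -> about 5.7
```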
Note that this factor will be even higher if the epidemic grows more rapidly. But let's set that aside and ask:

What else contributes to the "hidden case" factor?

As in many other countries, access to tests in the US has been limited. Having symptoms alone was not sufficient to qualify for a test; instead, factors like exposure to a confirmed COVID-19 case, travel history, or being a medical professional were required, often together with symptoms. Without such other factors, COVID-19 testing was generally limited to patients with severe symptoms. A typical estimate is that 80% of cases have only mild symptoms (or no symptoms at all). If only the remaining 20% qualify for a test, this adds a "light symptoms factor" of 1 / 0.2 = 5.

There are other reasons why an infected person may not get tested, or may not be included in the official case numbers. Some people prefer not to get tested, even with moderate symptoms; some may also have a cold or the flu that obscures the COVID-19 infection; some may not be able to get to a testing site, or to convince their doctor to give them the necessary note; tests may fail and return a false-negative result; and so on. We'll group all of these together into one factor, which we will call the "other reasons factor". Let's assume this factor is smaller than the others and give it a value of 2.

We can view the inverses of the factors as probabilities. The chance that any given infection is "old enough" to be tested is 1 in 4, or 25%. The chance that an infection will be severe enough to warrant testing is 1 in 5, or 20%. The chance that none of the "other reasons" prevents testing and reporting is 1 in 2, or 50%. Assuming these probabilities are independent, we can simply multiply them to get the chance that an infected person will be tested and included in the official numbers:
  • p(report) = 0.25 * 0.2 * 0.5 = 0.025 = 1/40
That means only one in 40 infected persons will get tested and counted, very close to the factor of 41 from the study's simulations! In other words, we can explain quite well why the "official" case numbers understate the true number of infections.
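The same multiplication can be written out in a few lines of Python. This is a sketch of the back-of-the-envelope reasoning above, assuming the three factors are independent; the values 4, 5, and 2 are the estimates from the text, not measured quantities.

```python
# Factors from the text; each is the inverse of a probability.
factors = {
    "infection-to-test delay": 4.0,  # 10-day delay, 5-day doubling time
    "light symptoms": 5.0,           # only the ~20% with severe symptoms get tested
    "other reasons": 2.0,            # assumed value for everything else
}

# Assuming independence, the probabilities multiply ...
p_report = 1.0
for factor in factors.values():
    p_report *= 1.0 / factor

# ... so the overall hidden case factor is the product of the individual factors.
hidden_factor = 1.0 / p_report
print(f"p(report) = {p_report:.3f}, hidden case factor = {hidden_factor:.0f}")
```

This prints `p(report) = 0.025, hidden case factor = 40`.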

Bottom line: A large "hidden case factor" can easily be explained

Yes, the official numbers are indeed likely to understate the number of infections by a factor of about 40. This can not only be explained; it would actually be expected from what we know about the testing and the disease:
  • An "infection-to-test delay factor" of 4 is caused by the incubation time, and delays between onset of symptoms and reporting.
  • A "light symptoms factor" of 5 is the result of limiting testing to cases with more severe symptoms.
  • Multiple other factors together are less important, and can be summarized in an "other reasons factor" of 2.0.
In other words, the things we know fairly well cause a 20-fold under-reporting (4 × 5); other factors that we know about, but cannot easily quantify, add only another 2-fold under-reporting.

To improve this situation, testing guidelines must be relaxed. For example, the "light symptoms factor" of 5 could easily be reduced by testing everyone with light symptoms; the testing guidelines in several countries now allow all suspected cases to be tested.

Even if tests become much more widely available, though, the "infection-to-test delay factor" remains as long as the epidemic is in an exponential growth phase. The only way to get a true number of current infections would be to test an entire population (or a representative subset) regardless of symptoms. A step in this direction that is easier to implement is contact tracing, combined with testing of all identified contacts.

Understanding the "hidden case factor" is very important for dealing with the pandemic. Humans have trouble grasping how quickly a rapidly growing epidemic spreads; the "hidden case factor" only compounds this problem. People tend to take a reported number at face value and conclude that the danger is minimal if the number is small. But let's look at what 100 confirmed cases really mean for an epidemic that doubles every 5 days if no effective interventions are taken:
  • Now:
    • 100 confirmed cases
    • 4,000 actual infections 
  • About 2 weeks (15 days) later:
    • 800 confirmed cases
    • 32,000 actual infections (of which 160 will die if the IFR is 0.5%)
  • 2 months later:
    • About 16,000,000 infections
    • About 80,000 deaths within 3 months, even if no additional infections happen after 2 months
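These numbers follow from three assumptions stated in the text: a doubling time of 5 days, a hidden case factor of 40, and an IFR of 0.5%. The Python sketch below recomputes them under exactly those assumptions; `project` and the constant names are illustrative, and the model deliberately ignores interventions, immunity, and the lag between infection and death. Note that going from 100 to 800 confirmed cases takes three doublings, i.e. 15 days.

```python
DOUBLING_DAYS = 5   # doubling time assumed in the text
HIDDEN_FACTOR = 40  # infections per confirmed case
IFR = 0.005         # infection fatality rate of 0.5%


def project(confirmed_now: float, days: float) -> tuple[float, float, float]:
    """Return (confirmed cases, actual infections, eventual deaths) after `days` of unchecked growth."""
    growth = 2 ** (days / DOUBLING_DAYS)
    confirmed = confirmed_now * growth
    infections = confirmed * HIDDEN_FACTOR
    return confirmed, infections, infections * IFR


for days in (0, 15, 60):
    confirmed, infections, deaths = project(100, days)
    print(f"day {days:>2}: {confirmed:>10,.0f} confirmed, "
          f"{infections:>12,.0f} infections, {deaths:>8,.0f} eventual deaths")
```

After 15 days this gives 800 confirmed cases, 32,000 infections, and 160 eventual deaths; after 60 days (twelve doublings) it gives about 16.4 million infections and about 82,000 eventual deaths, which the list rounds to 16 million and 80,000.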
This is a hypothetical model. In reality, the epidemic has spread even faster in the US and many other countries. The number of reported deaths in the US up to March 26 (1,295) is about the same as the number of reported cases on March 11 (1,301). In other words, the number of reported cases foreshadowed the total number of deaths just 15 days later.

Did you like the iceberg picture at the top of the post? An iceberg is actually a pretty bad analogy here. For icebergs, about 10% is above the water, much more than the 2.5% of COVID-19 infections that show up in the official numbers. Also, icebergs barely move, and they certainly do not grow rapidly and exponentially.
