Corona Virus Science: 2020

Monday, December 7, 2020

Cool Science and The Importance of Staying Distant

This post looks into several important new insights into COVID-19 that were described in recent publications. I will also look at what this means for us - why it is very important to stick to social distancing and related prevention measures. It's long, so I'll give you a

Short version

If you get exposed to 500 COVID-19 virus particles, you'll probably get infected. That is one "infectious dose".
If you're infected, you probably don't know it. You'll be most likely to infect others before you have any symptoms.
When talking, an infected person emits about 500 infectious doses per hour, which would theoretically be enough to infect 500 others. Talking loudly, singing, and yelling increase this to about 2,500 infectious doses per hour. There are many examples of one person infecting dozens of others.
The virus particles are in aerosol droplets that can remain airborne for up to several hours, and therefore accumulate indoors.
Do the right thing.

Long version

Let's start with a graph that tells us a lot about COVID-19 infections:

This is from an in-depth study of infections in Austria. The scientists used a combination of contact tracing and viral genome sequencing to study how exactly the COVID-19 virus was passed from one patient to the next. To get this information, they looked at new mutations that are always present in a subset of virus particles in each infected person, and compared these to the mutations found in others a patient had infected. This allowed the scientists to estimate how many virus particles (or virions) had been passed from one patient to the next - the "bottleneck size" in the graph above.

The results indicate that in many cases, a couple of thousand virions were passed on during an infection. However, a similar number of patients had been infected by a lower number of virus particles, between 20 and 200. In one of the infections studied, the transmission involved fewer than 10 virions - possibly as few as two.

The science behind these results is, in my opinion, pretty cool. But perhaps I am biased, since it is closely related to the type of research for which I have developed commercial software for the last 20 years. Anyway, the study was a collaboration of more than 30 scientists from several top-level labs in Vienna and at Harvard, MIT, and the Dana-Farber Cancer Institute.

The results show what many virologists had so far only suspected: that COVID-19 infections are usually caused by several hundred to a few thousand virus particles that are somehow transferred from an infected person to another person. Infections can also happen at a lower number of virus particles, but that appears to happen only rarely.

The data indicate that the "independent action hypothesis" of viral infections applies to COVID-19. It basically states that each virus particle that enters your body has a small chance of establishing an infection and causing disease (or, for COVID-19, an asymptomatic infection that can nevertheless infect others). The more virus particles enter your body, the higher the chance that you're getting sick. Here is a graph that illustrates this relation:

An important concept here is the "infectious dose". That is the number of virus particles that has a 50% chance of creating a "successful" infection. For COVID-19, that number seems to be somewhere around 500. We don't know the precise number - it could be 200, or it could be 1000, but it is very unlikely that it is below, say, 100 or above, say, 5000. Note that below this dose, the chance of infection is not zero, but rather declines in a near-linear fashion.
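To make the relation concrete, here is a minimal Python sketch of the independent action idea. It assumes the simplest form of that model, in which every virion has the same small, independent chance of starting an infection, and it uses the rough estimate of 500 virions for the infectious dose from above. The function name and the example doses are my own choices for illustration, not something taken from the studies.

```python
ID50 = 500  # assumed infectious dose (50% infection chance); rough estimate from the text


def infection_probability(virions, id50=ID50):
    """Chance of infection if each virion acts independently.

    Assumption: the chance of escaping infection halves with every
    additional infectious dose, i.e. P = 1 - 0.5 ** (virions / id50).
    """
    return 1.0 - 0.5 ** (virions / id50)


# Print the infection chance for a range of illustrative doses.
for dose in (10, 50, 100, 500, 1000, 5000):
    print(f"{dose:>5} virions: {infection_probability(dose):.1%}")
```

Under this assumption, the chance is exactly 50% at one infectious dose and 75% at two, and at small doses it falls off almost in proportion to the dose, which is the near-linear decline mentioned above.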

A bit simplified, the "infectious dose" (ID) means this: if you are exposed to this many virus particles, the chance that you'll get COVID-19 is about the same as the chance that the COVID-19 virus fails to establish itself in your body. Get more than the ID, and you'll most likely get COVID-19; get less, and your chances of walking away healthy are better. That raises the question:

What are the chances of receiving an "infectious dose" of the COVID-19 virus?

To answer this question, we first need to look at how someone with an active COVID-19 infection passes the virus on to someone else. We know that saliva and lung fluids of an infected person can contain a lot of virus particles - a typical number is 100 million virions per milliliter. That's about 200 thousand infectious doses (or IDs, for short) per milliliter! Some of this fluid, and the virus particles in it, is emitted as droplets of various sizes when a person coughs, sneezes, speaks, sings, or just breathes - but the number and size of these droplets depend a lot on what exactly a person is doing. Coughing and sneezing can produce the largest droplets - perhaps the word "gross" fits (especially if you speak German). But we now know that most COVID-19 transmissions happen before symptoms appear, when coughing and sneezing cannot play a role. If you're interested in the details, I suggest you check this preprint and the references in it. I'll just summarize the results from a different study that combined knowledge about particle sizes, viral loads, and actual transmissions in five super-spreader events for different activities:

Breathing: ~10 IDs / hour (5-10-fold higher during hard exercise)
Talking: ~500 IDs / hour
Singing: ~2,500 IDs / hour

Yelling is similar to singing, and loud speaking falls in between talking and singing. Note that these particles are in aerosols that can remain airborne for up to several hours.

Now remember that you want to avoid getting anywhere close to just one ID (that is 500 virus particles). If you are sitting really close to someone, you end up "sharing breath" with that person - a significant percentage of the air he breathes out you breathe in. If you're close enough to someone in an area with little ventilation, you could "collect" one ID within a couple of hours, even if he is just breathing! Fortunately, "breath sharing" drops very quickly as the distance between two persons increases, and a distance of four feet or more is usually sufficient to reduce your "collection" to a small fraction of one ID per hour. But remember that this only reduces your risk of infection - it does not eliminate it!

But as you can see from the list above, things get a lot worse when talking or singing. When breathing, the droplets you emit are very small; but when talking, yelling, or singing, the droplets are a lot bigger. A 10-fold larger droplet has 1000-fold more volume, so a given number of droplets can contain 1000-fold more virus particles. But most of the "large" droplets generated by talking are still so small that the water in them evaporates very quickly in normal room air, shrinking the particles to sizes that can remain airborne for a long time ... just waiting for you to breathe them in.

With this in mind, let's consider going out to dinner with a couple of friends we have not seen in a while. We'll go to a restaurant and share a nice booth between four people, not knowing that one of them had gotten infected with COVID-19 a few days ago, and is now shedding the virus at a high rate. Someone is talking all the time, but we're nice and take turns, so the infected person is talking just one fourth of the time. After an hour, though, he has emitted about 500 / 4 = 125 "infectious doses". Let's say the booth is about 2 x 2 x 2.5 meters, or about 10 cubic meters. Without any air exchange, the concentration of infectious doses in the booth air would be 125 IDs / 10 cubic meters, or 12.5 IDs / cubic meter. We breathe about half a cubic meter of air per hour - so everyone at the table would have been exposed to about 6 infectious doses of the virus, making it very likely they'd get infected.

The unrealistic assumption in our example is that there is no air exchange. In reality, ventilation in a typical building exchanges the air about 4-10 times per hour. That would drop the virus concentration in our booth by a comparable amount. But it would still leave each of the three uninfected people at the table "collecting" about 1/2 to 1.5 infectious doses, meaning it would be almost certain that at least one person would get infected.
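The booth arithmetic can be written down as a short Python sketch. It follows the same simplifications as the example: the air in the booth is perfectly mixed, the stay lasts one hour, and ventilation is treated as simply dividing the concentration by the number of air changes per hour. The function and parameter names are made up for this illustration, and a real ventilation model would be more involved.

```python
def inhaled_doses(emission_ids_per_hour, talk_fraction, room_volume_m3,
                  breathing_m3_per_hour, air_changes_per_hour=None):
    """Infectious doses (IDs) inhaled by one person during a one-hour stay.

    Assumptions: perfectly mixed air, everything emitted during the hour
    counts at once, and ventilation simply divides the concentration by
    the number of air changes per hour.
    """
    emitted = emission_ids_per_hour * talk_fraction   # IDs released in one hour
    concentration = emitted / room_volume_m3          # IDs per cubic meter
    if air_changes_per_hour:
        concentration /= air_changes_per_hour
    return concentration * breathing_m3_per_hour      # IDs breathed in


# Values from the restaurant example: talking emits ~500 IDs/hour, the
# infected person talks a quarter of the time, the booth holds ~10 cubic
# meters, and we breathe ~0.5 cubic meters of air per hour.
booth = dict(emission_ids_per_hour=500, talk_fraction=0.25,
             room_volume_m3=10, breathing_m3_per_hour=0.5)

print(inhaled_doses(**booth))                            # no air exchange
print(inhaled_doses(**booth, air_changes_per_hour=4))    # 4 air changes per hour
print(inhaled_doses(**booth, air_changes_per_hour=10))   # 10 air changes per hour
```

Changing the talk fraction, the booth size, or the emission rate (for example 2,500 IDs per hour for singing) shows quickly how sensitive the result is to each of these assumptions.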

So far, we have looked at two parts of the "infection equation": how many virus particles are needed to infect someone with COVID-19, and how many virus particles (or infectious doses) an infected person "emits". But we also need to look at a third part: when an infected person emits virus particles, relative to when they were infected and to when they experience symptoms.

Wrong person, place, and time

While there are still plenty of uncertainties and gaps in our knowledge of COVID-19, scientists from all over the world have collected and published a lot of data that can guide our efforts. In some important aspects, COVID-19 is very different from other infections like influenza. Two of these aspects are the highly variable incubation time and the high variability of symptoms, which includes the frequent complete absence of symptoms. Other diseases progress in a well-defined fashion, for example a symptom-free incubation period of 3-4 days, followed by a symptomatic period, with the highest infectivity a few days after first symptoms.

But for COVID-19, the incubation period can vary between 2 and 14 days, and multiple studies have shown that many transmissions occur before a patient has any symptoms, and from persons who never develop any symptoms, or only light, non-specific symptoms. In many super-spreader events, where one infected person infected dozens of others, the "super-spreader" had no or only mild symptoms. Based on these and other data, scientists now think that an infected person is most likely to infect others in just a relatively short period; in symptomatic patients, this is just before symptoms appear.

If an infected person has close contact with many others during this short period of "maximum infectivity", he can infect many of his contacts in a short time - often within an hour or two. Being at the wrong place at the wrong time, and sharing space with the wrong person, makes it extremely likely to get infected. With many contacts, many get infected, as the graph below (from this publication) shows:

What does that mean? Let's go back to our "restaurant with friends" example from above. Since many COVID-19 transmissions happen before symptoms appear, it means very little that our friends do not have any COVID-19 symptoms. The high variability in incubation times means that they could have been infected just a couple of days ago, or two weeks ago, and just now reach their maximum infectivity period, during which we will get infected if we are close to them for an hour. But given the high shedding of infectious virus particles when talking, it might not even be our friends who infect us - it could be someone sitting a couple of tables away, or someone who had our booth before us. Sure, the risk of infection is highest when you are close to an infected person, so you should keep at least a 6 ft distance from others whenever possible. But aerosol particles can quickly travel much farther than 6 ft, and any air flow from ventilation can help distribute them. With incubation times as short as 2 days, even someone who was tested for COVID-19 just two days ago may be infectious now.

Break the infection chain. It's not just about you - it's about the ones you infect, and the ones they infect, and so on.

Keep your distance.

Wear your face mask. They work. We know why they work. The better it fits, the better it works - especially at protecting you. Perhaps a 20 or 50% lower chance of getting infected does not mean much to you - but if everyone wears masks (and follows other guidelines), it will drop transmissions by 50%, and the epidemic will "just go away". It's not political. It's public health, consideration for others - and the best way to get the economy back on track. Just ask Australia.

Thursday, November 26, 2020

Why COVID-19 Is So Hard To Fight

In this post, I will use one graph to explain why COVID-19 is so hard to understand, and therefore to fight. It shows the daily confirmed COVID-19 cases in the US, and the 5-day change in daily cases, since the end of April:

The blue curve at the bottom shows daily confirmed COVID-19 cases in the US. It shows an initial drop to about 20,000 cases, then the "summer rise" to about 70,000 cases per day in July, another drop to about 40,000 in September, and then a rise to 170,000 cases per day in November.

The red curve, which uses the y-axis on the right, shows an indication of the growth (or drop) in cases: the ratio of cases on a given day, divided by the number of cases 5 days earlier. When this ratio is below 1 (in the green section), case numbers are going down; when it is above 1 (in the light red section), the number of cases is increasing.
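The ratio behind the red curve is simple to compute from a list of daily case counts. The following Python sketch shows one way to do it; the function name and the example numbers are invented for illustration and are not actual US case counts.

```python
def five_day_ratio(daily_cases, lag=5):
    """Ratio of each day's cases to the cases `lag` days earlier.

    The result is aligned with daily_cases[lag:]. Days whose reference
    value is zero yield None, since the ratio is undefined there.
    """
    ratios = []
    for today, earlier in zip(daily_cases[lag:], daily_cases):
        ratios.append(today / earlier if earlier else None)
    return ratios


# Made-up numbers for illustration only.
example = [100, 104, 108, 112, 116, 120, 125, 130, 134, 139]
print([round(r, 2) for r in five_day_ratio(example)])
```

Real daily counts fluctuate a lot with the day of the week, so in practice one would usually smooth them (for example with a 7-day average) before taking the ratio.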

The numbers are based on 5-day periods because that is roughly the average time between getting infected, and passing the infection on to someone else. In scientific jargon, this is often called the "generation interval" or the "serial interval". One way to understand it is to remember that it typically takes about 5 days after infection for symptoms to start, and that the chance of infecting others is largest just before and just after first symptoms.

This means that the 5-day ratio is also very close to the "reproductive number", often called R. In practical terms, R indicates how many others, on average, each infected person infects. If each person infects more than one other person, the number of new infections per day grows; if each person infects less than one other person, the number of daily infections goes down. This can be easily seen in the graph, where the red curve is in the red area whenever the number of daily cases (the blue curve) goes up.

Now take a close look at the values we get for R. In May and August when case numbers were dropping, R was between 0.85 and 1. In the summer and fall periods where case numbers were increasing, R was above 1, but never higher than 1.3. Typical values were around 0.9 in "dropping" periods, and around 1.2 in "rising" periods. The difference is quite small - and therein lies the problem! To understand why, we need to look at this from two angles: the "personal risk" perspective, and the "public health" perspective.

Personal risk: A small increase means very little

When deciding what to do in a public health crisis, the first question most people will ask is "What is the risk to me?" Depending on the answer, and on personal tolerance for risks, they may be inclined to change their behavior more or less. But regardless of what exactly the answer is, everyone will have to accept a certain level of personal risk in the end.

After a few weeks or months of "being good" and, for example, staying away from restaurants and bars, the desire to go back to normal becomes stronger and stronger, and we start doing things again that are slightly more risky. That might be going to restaurants again; meeting with friends; going shopping; not wearing that face mask; or something else. But we'll generally decide that a bit more risk has to be taken. If we are young or healthy, we may well conclude that a slight increase in risk still means a very low risk of getting seriously sick. Unless you're a statistician, you probably won't quantify the risk, but just about anybody would agree that a relative risk increase from 0.9 to 1.2 is so small that it's worth taking, if it means we can go back to the gym, the hairdresser, shopping, restaurants, or whatever strikes our fancy. If my personal risk was small to begin with, then even a 2-fold or higher increase in risk may well be worth it.

On a personal level, taking a bit more risk is a perfectly reasonable decision. This also is true if we consider others in our risk assessment, too - kids we send to school, other family members, or friends we meet.

Public health: Small risk increases have disastrous consequences

But what happens if everyone decides that taking a bit more risk is perfectly reasonable, and changes their behavior a bit? Say, for example, in a way that increases the risk of getting COVID-19 by just one third. What happens?

Let's assume we were in a period where new infections were dropping by 10% every 5 days, corresponding to R = 0.9. With 1/3 more infections now, R increases to 1.2: instead of a steady drop, we now have a rapid rise in new infections: 20% more daily infections after 5 days, and 44% more daily infections after 10 days (1.2 x 1.2). After a month of R staying at 1.2, the number of daily infections has grown 3-fold: just about what we saw in the US from October to November. A very small change on the individual level has caused a huge increase on the population level.
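The compounding in this example is easy to check with a few lines of Python. This is a minimal sketch that assumes R stays constant and that one "generation" takes exactly 5 days, as in the text; the function name is my own.

```python
def fold_change(r, days, interval=5):
    """Fold change in daily infections after `days`, if the ratio per
    `interval`-day generation stays constant at `r`."""
    return r ** (days / interval)


# Compare a "dropping" period (R = 0.9) with a "rising" period (R = 1.2).
for r in (0.9, 1.2):
    for days in (5, 10, 30):
        print(f"R = {r}: after {days:>2} days, "
              f"daily infections x {fold_change(r, days):.2f}")
```

With R = 1.2, six 5-day intervals give about a 3-fold rise, while at R = 0.9 the same month would have cut daily infections roughly in half.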

What is a perfectly reasonable decision on a personal level becomes a public health disaster.

Small things are "driving the pandemic"

Currently, the US is just one of many countries that are failing to control the resurgence of COVID-19 infections. A common theme here is that many regions try to contain COVID-19 with a minimal set of measures, for example limited restaurant hours instead of full closures. Against many measures, an often-heard argument is that "X is not driving the pandemic". Various regions have used this argument to leave schools and colleges open, have restaurants operating with minimal or no restrictions, and so on.

Taken literally, the arguments are correct insofar as each individual "infection place" like schools or restaurants is not causing the majority of new infections. But even measures that eliminate just a small percentage of new infections can make a huge difference, and a few in combination can make the difference between a controlled epidemic with dropping infection numbers, and a rapidly growing, out-of-control epidemic. Therefore, relaxing a few such "minor impact" measures may well end up "driving" the epidemic from a "dropping" phase into a "rapid growth" phase. This problem is only made worse by halfhearted interventions, which drop R only just below 1.0. This means that case numbers will drop only very slowly, and rapid growth resumes quickly again after any relaxation.

Over the past six months, I have read several hundred scientific publications about COVID-19. Of all these, one of the publications that stuck in my mind the most was published by scientists from New Zealand. Apparently, it formed the basis of New Zealand's successful complete elimination of COVID-19 cases in the country. It listed a large number of interventions which were used in groups, depending on the current level of infections:

We can only hope that a similar rational approach will be used to control COVID-19 in the US and Europe over the next several months, until vaccines become widely available. Otherwise, we will see hundreds of thousands of additional avoidable COVID-19 deaths.

Monday, November 23, 2020

A Close Look At The Danish Face Mask Study

In the last week, a Danish research study that tried to study the effectiveness of face masks has gotten a lot of attention in the media. The study had been designed so that it should have shown a statistically significant effect if wearing a face mask reduces the risk of COVID-19 infection by 50% or more.

Note that there are a lot of words in italics in the previous sentence. All of these are very important to understand what the outcome of the study really means - and what it does not mean. There have been many articles and posts that explain some of the shortcomings of the study, but many of these miss some very important points. Let's have a closer look, starting with the results.

Study results: Face masks reduce PCR-confirmed infections by 100%, and doctor-confirmed infections by 50%

When looking at a scientific study, the first thing to do is to look at the data. The important results are given in Table 2 of the study:

Let us start with the last two lines of the table (we'll spend plenty of time on the first lines later!). The second-to-last line shows how many study participants had a positive PCR test for the COVID-19 virus. This is the "gold standard" for diagnosis. A positive PCR test is required to be counted as a "confirmed case" in the US and most other countries. The study showed that zero people in the "Face Mask Group" had a positive PCR test for COVID-19. In the control group that did not use face masks, there were 5 confirmed COVID-19 cases.

So, based on the "gold standard" test, the use of face masks prevented 100% of COVID-19 infections in the study!

If we go on to the last line, which shows the number of participants that have been diagnosed with COVID-19 by a health care provider, the picture changes a bit: 5 participants who wore face masks were diagnosed with COVID-19, compared to 10 participants in the "no mask" control group. This means:

Judging by the actual diagnosis from health care providers, face mask use reduced COVID-19 by about 50%.
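Both percentages come from the same simple calculation, sketched here in Python. It assumes the two groups were about the same size, so that the raw case counts can be compared directly; the function name is invented for this illustration.

```python
def relative_reduction(cases_control, cases_masked):
    """Relative reduction in cases versus the control group.

    Assumes both groups are about the same size, so raw counts
    can be compared directly.
    """
    return 1.0 - cases_masked / cases_control


# Case counts as quoted above from the last two lines of the results table.
print(f"PCR-confirmed:        {relative_reduction(5, 0):.0%}")
print(f"Health care provider: {relative_reduction(10, 5):.0%}")
```

With counts this small, the uncertainty around these percentages is large, so they should be read as point estimates only.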

But that's not what the study claimed, some astute readers may point out. And this brings us to the first lines in the table which we have ignored so far, which describe the results of antibody tests and the "primary composite end point". Which warrants some explanation.

Reading through the paper and the 88-page long supplementary material carefully, we learn that the study heavily relied on "dip stick"-type antibody tests that the participants did at home. The tests work pretty much like pregnancy tests, except that instead of peeing on the stick, you have to put a couple of drops of blood on the stick; and instead of a "+" sign, a positive test gives two lines, as opposed to a single line for a negative test.

The study also sent all participants two swab kits for PCR testing, and instructed them to use the kit and send the sample to a lab for PCR testing if they should develop any COVID-19 symptoms. In addition, participants with symptoms were instructed to seek medical help.

The "primary composite endpoint" now takes the combination of PCR test results, antibody test results, and confirmed medical diagnoses. Any participant who is positive in any of these three results counts towards the "composite endpoint". Participants with a positive antibody result at the beginning of the study were excluded from the analysis.

Looking back at the results table, we see that the antibody results dominate the overall results. In the face mask group, the number of positive antibody results is more than 6-fold higher than the number of confirmed diagnoses. This raises an immediate red flag. One potential reason for this discrepancy is that some participants had asymptomatic infections. However, asymptomatic infections typically account for about 50% of all COVID-19 infections, and all symptomatic patients should have received a confirmation by PCR or from health care providers. Therefore, the number of positive antibody tests should only have been about 2-fold higher. This is a clear indication that the antibody results are possibly very wrong.

Scientists familiar with COVID-19 antibody tests will immediately think about false-positive test results. According to the study, the manufacturer indicated that 0.8% of tests will give a false-positive result; for about 2,500 participants in each group, that would be about 20 false positives. But let's ignore false positives for the time being, and look at a different issue: timing.

Timing is crucial. Timing is crucial.

Yes, timing is crucial in more ways than one for this study. Let me explain.

In the study, participants did a COVID-19 antibody test at the beginning of the study, and then again at the end of the study about 30 days later. Anyone who tested negative at the start, and positive at the end, must have gotten infected during the study period, right? Wrong! Very wrong!

As plausible as the "trivial" conclusion seems, it completely ignores what we know about how long it takes to develop antibodies. A brief visit to the CDC web page about antibody testing for COVID-19 shows that "Antibodies most commonly become detectable 1–3 weeks after symptom onset". Let me illustrate this with a figure from a blog post on this topic:

In this study, it took about 10 days after the first COVID-19 symptoms before half of the antibody tests gave positive results, and about 2 weeks before close to 100% of the patients had antibodies. Another study showed slightly shorter times, but also showed that it took more than 4-5 days after symptom onset before antibodies were detectable. We also know that it usually takes another 5 days after infection before the first symptoms appear, and in some cases up to 2 weeks. This means that it will take more than 10 days after infection before antibody tests are positive.

In the context of the Danish study, this means that any participants who got infected within about 10 days before the study start gave a negative result in the first test, but most of them had a positive result in the second test.

Things get worse when we look at a second timing effect: the change in COVID-19 infections in Denmark before and during the study. This is where it gets a bit more complicated. The reported number of confirmed cases peaked on April 9, just before the study started around April 15, and then decreased quickly. But testing increased rapidly after April 19, more than tripling by April 30. One way to eliminate the effect of testing availability and changes is to calculate the actual number of infections from reported COVID-19 deaths. This is shown in the next graph:

The study was done in 2 separate groups starting 2 weeks apart, shown by the blue and red shaded areas. The graph shows a very rapid drop in daily infections from about 2,000 per day to about 500 during the first week of the study, and further drops later. This means that at the start of the first study period, there was a relatively large number of Danes that had been infected in the preceding 10 days. They had not yet developed antibodies to COVID-19 when tested at the start of the study, but would test positive at the second test a month later. This means that the test would wrongly count many infections that occurred before the study began!

We can estimate the actual number of infections that happened in the 10 days before each of the 2 study periods, and compare it to the number of infections during the study period:

The numbers show that there were almost as many infections (22,886) in the 10 days before the study periods as there were during the 30-day study periods (24,943). Since the pre-existing infections could not be affected by face mask wearing, this created a major distortion, increasing the reported infections in the face mask group significantly.

The numbers shown above are calculated for the entire Danish population of about 5.8 million. The groups in the study were about 2,500 participants each, which we can use to calculate the expected number of cases for the face mask group and the control ("no mask") group in the study:

For the "no mask" group, the expected number of cases is 20, consisting of about 10 cases infected in the 10 days before the study period, and 10 cases infected during the study period.
For the face mask group, we also expect 10 cases from the 10 days before the study, but the number of cases during the study would be reduced by 50% to 5 cases, so we would expect a total of 15 cases in the face mask group.
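These expected case numbers can be reproduced with a short Python sketch. It uses the national estimates quoted above and assumes that study participants had the same infection risk as the general Danish population, and that face masks cut infections during the study by 50%. The variable and function names are my own.

```python
POPULATION = 5_800_000       # Danish population, from the text
GROUP_SIZE = 2_500           # approximate participants per study group
INFECTIONS_BEFORE = 22_886   # estimated infections in the 10 days before the study periods
INFECTIONS_DURING = 24_943   # estimated infections during the 30-day study periods
MASK_EFFECT = 0.5            # assumed 50% protection for mask wearers


def expected_in_group(national_infections):
    """Scale a national infection count down to one study group."""
    return national_infections * GROUP_SIZE / POPULATION


before = expected_in_group(INFECTIONS_BEFORE)
during = expected_in_group(INFECTIONS_DURING)

# Masks can only affect infections that happen during the study period.
control = before + during
masked = before + during * (1 - MASK_EFFECT)

print(f"before: {before:.1f}, during: {during:.1f}")
print(f"control group: {control:.1f}, face mask group: {masked:.1f}")
print(f"apparent reduction: {1 - masked / control:.0%}")
```

The unrounded values are close to the rounded figures used above, and the apparent reduction comes out at roughly a quarter, about half of the true 50% effect that was assumed.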

Given the design of the study, we would expect to see just 5 fewer cases in the face mask group than in the control group even if face masks reduce the infection of wearers by 50%. The observed number of cases would be 15 in the face mask group, and 20 in the control group. The observed reduction would be smaller than the expected 50% reduction due to 2 effects:

the large drop of infections at the start of the study period
the fact that only antibody-tests, but not PCR tests, were done at the start of the study period (PCR tests give positive results much earlier than antibody tests, often within 2-4 days after infection)

Comparing actual and expected results

The calculations above show that we would expect 15-20 positives in the two groups, with just 5 cases difference between the groups. Note that we had to make some assumptions, for example about the fatality rates, and that the estimates may be off by a factor of 2 - but not much more.

In the study, the authors reported 10 cases of COVID-19 in the control group that were confirmed by health care providers, and 5 cases in the face mask group. Diagnoses are generally only made for symptomatic cases, which are typically estimated to be about 25-50% of total infections; thus, there is very good agreement between the expected and reported numbers.

The number of positive antibody tests reported is slightly higher, between 31 and 37 for IgM and IgG. The difference between the face mask group and the control group is less pronounced than for the diagnosed cases, which is exactly what we would expect as the effect of including participants that had been infected before the start of the study, but who had not yet progressed to a detectable antibody response. The antibody numbers are roughly 2-fold higher than the expected numbers. Two issues that may have contributed to this difference are false-positive antibody results, and a higher infection rate in the study participants, who spent on average 4.5 hours outside their home each day, relative to the general population.

But while we are seeing good agreement between the expected and the reported numbers, the agreement we see is just qualitative. Due to the relatively small number of cases, and the "contamination" from infection before the start of the study period, it is unlikely that the results reach the typically required levels of statistical significance. For that, the study would have had to be substantially larger, and ideally also have included a PCR test at the start of the study period.

What went wrong

When designing the study, the authors determined the number of study participants they needed based on an estimated infection rate of 2%, which was reasonable at that time. The authors also were looking for a relatively large reduction effect of 50%; to see a smaller effect, a larger study would have been necessary.

However, by the time the study started, the interventions initiated by the Danish government had taken effect, and reduced the number of daily infections by more than 2-fold, with an additional 10-fold reduction by the time the study ended. The overall calculated infection rate for the combined study periods was less than 0.5% - roughly four-fold lower than what had been expected. This resulted in lower case numbers - numbers that are too low to give a statistically meaningful result. To some extent, the study became a "victim" of the success that Denmark had in containing the COVID-19 pandemic in the spring.

A second factor that contributed to the lack of a "clear signal" from the study was that the authors apparently did not consider the lag time between infection and the beginning of a detectable antibody response. In their defense, the data that describe the timing of the antibody response were probably not available when the study was designed. Furthermore, the effect would have been significantly lower if the infection numbers had still been increasing, or at least stable. Nevertheless, the lack of any discussion of the "antibody delay" effect on the study results in the publication is somewhat disappointing. With proper consideration of this effect, the data produced by the study are not only compatible with a 50% protection from wearing face masks - they actually are in agreement, even if they may not be "statistically significant" due to the factors discussed herein.

Friday, October 30, 2020

New Evidence That Face Masks Work

In the last couple of weeks, several new scientific studies were published that provide solid evidence that face masks can reduce COVID-19 transmissions significantly. In this post, I'll briefly describe several studies, as well as other evidence from trends in the US.

Face mask mandates reduced COVID-19 growth in Missouri counties

One study that was published as a preprint compared the COVID-19 growth in 5 metropolitan regions in Missouri. Two of the regions, St. Louis City and St. Louis County, implemented mask mandates in July, while the three other regions did not. Comparing the COVID-19 growth rates between the "mask mandate" regions and the "maskless" regions, the authors found significantly slower growth of COVID-19 cases in the mask mandate regions: 1.36% per day, almost 2-fold lower than the 2.42% observed in the maskless regions. This difference was significantly larger than the difference in growth seen in the weeks before the mask mandates were issued.
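Daily growth rates are easier to grasp as doubling times. The following small sketch converts the two rates reported in the preprint, assuming (for illustration only) that each rate stays constant over time:

```python
from math import log

def doubling_time_days(daily_growth_percent):
    """Days for case counts to double at a constant daily growth rate."""
    return log(2) / log(1 + daily_growth_percent / 100)

mask_mandate = doubling_time_days(1.36)  # rate reported for the mask mandate regions
no_mandate = doubling_time_days(2.42)    # rate reported for the maskless regions
print(f"with mandate: {mask_mandate:.0f} days, without: {no_mandate:.0f} days")
```

At these rates, cases double in roughly 51 days with a mask mandate, compared to roughly 29 days without one.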

To see if this trend persisted after the time analyzed in the study, I downloaded per-county level data from Johns Hopkins University, and looked at the growth in COVID-19 cases from July 1 to October 27:

The counties with a mask mandate (in blue) had about 3 to 4-fold more total COVID-19 cases at the end of the period than at the beginning; the counties without mask mandates showed about 10 to 12-fold growths. This indicates that the mask mandates cut COVID-19 infections to about one third.
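The "one third" figure follows directly from the ratio of the two fold-increases. Here is the arithmetic as a short sketch; the midpoints 3.5 and 11 are stand-ins I picked for the "3 to 4-fold" and "10 to 12-fold" ranges, not exact values from the data:

```python
from datetime import date

# Length of the period examined, July 1 to October 27
n_days = (date(2020, 10, 27) - date(2020, 7, 1)).days

def average_daily_growth_percent(fold_increase, days):
    """Constant daily growth rate that produces the given fold increase."""
    return (fold_increase ** (1 / days) - 1) * 100

mandate_fold = 3.5      # assumed midpoint of "3 to 4-fold"
no_mandate_fold = 11.0  # assumed midpoint of "10 to 12-fold"

ratio = mandate_fold / no_mandate_fold
print(f"{n_days} days, ratio of fold increases: {ratio:.2f}")
print(f"mandate: {average_daily_growth_percent(mandate_fold, n_days):.2f}% per day")
print(f"no mandate: {average_daily_growth_percent(no_mandate_fold, n_days):.2f}% per day")
```

The ratio comes out near 0.32, and the implied average daily growth rates (about 1.1% and 2.1%) are in the same range as the rates reported in the preprint.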

Face masks reduce in-flight transmission of COVID-19 dramatically

A second recent study examined how well face masks work in flights. It reviewed a number of previous studies, including one where a single passenger in business class had infected 12 other passengers in the business class cabin:

The originally infected passenger ("patient 0") was on seat 5K, shown in red. The passengers she infected during the flight (shown in orange) were mostly seated behind her, and/or to the side. Several of the infected passengers were more than 6 feet away from patient 0. The flight happened early in the epidemic, and the infected passengers did not wear face masks. The review also lists several other flights with in-flight transmission before the use of face masks on flights became common.

In stark contrast to this "superspreader event" flight is a series of flights with Emirates that arrived in Hong Kong in June and July. Overall, 8 flights transported 58 passengers who were COVID-19 infected during the flight. The flights were 8 hours long, and had a total of 1500 to 2000 passengers, who were quarantined and repeatedly tested for COVID-19 in Hong Kong. The testing showed that not a single transmission happened on those flights. Emirates had a strict face mask policy in place at the time of the flights, which was enforced during the flights by flight attendants.

The review also describes a number of other flights, both with and without mandatory face masks, which show that the use of face masks during flights dramatically reduces the number of COVID-19 transmissions on flights.

Evidence from face mask mandates in Germany

While face mask mandates were common in Germany from the early phase of the COVID-19 epidemic on, the dates when mask mandates were issued varied by location. Two separate studies examined the resulting differences in COVID-19 transmissions, with a focus on Jena, a city that implemented mask mandates early. One study presents qualitative evidence that face mask mandates reduced transmissions in Jena. The second study used "synthetic control methods" and data from 401 German regions to derive a quantitative estimate, concluding that face mask mandates reduced the daily growth in COVID-19 infections by about 40%.

More evidence from US regions and states

A number of studies have focused on US states or regions, and come to similar conclusions as the studies above. Here is a figure from one of these studies that shows a drop in COVID-19 infections after the introduction of mask mandates:

The effect shown is not as pronounced as in some of the other studies, which may be due to varying levels of adherence to the mandates, especially since enforcement of mask mandates in many US regions is lax or non-existent. Sadly, the use of face masks has become politicized in the US. Many Republican governors have refused to issue face mask mandates; Republican law enforcement has been reluctant to actually enforce mask laws; and many Republicans refuse to wear face masks.

The effect of the "Republican refusal" can be seen when comparing which US states have the most COVID-19 cases over time. Here are a couple of screen shots from an illustrative animation:

In early June, the most affected states were a roughly even mix of blue and red states. But by late October, the picture had changed dramatically:

Now, Republican states dominate the distribution. Of course, other factors like early re-openings and resistance against new containment measures are also likely to play a role, but negative attitudes towards face mask wearing and face mask mandates are certainly a big factor in this development.

The evidence is very clear - face masks reduce COVID-19 transmissions and save lives. Scientists have a pretty good idea why and how face masks work. I have discussed why misguided "herd immunity" strategies won't work, using Texas as an example. For the US as a whole, just "letting it take its natural course" would cost more than a million additional lives; when taking into account that death rates increase significantly when hospitals are overloaded, the number of additional deaths in the US would more likely exceed 2 million. Anyone who still believes that COVID-19 deaths are overstated in the US needs to have a close look at excess death calculations, which show that the official COVID-19 numbers represent only 2 out of 3 COVID-19 linked deaths.

So, please, if you go to an indoor space where other people are, or if you are outdoors in a crowd, or closer than 6 feet to someone else who does not live with you: wear a mask!

Tuesday, October 27, 2020

Misleading COVID-19 Information in Florida

This page describes a systematic pattern of misrepresenting information about COVID-19 by officials in Florida. These officials include the governor Ron DeSantis, the governor's spokesman Fred Piccolo Jr., Florida's Surgeon General, Dr. Scott A. Rivkees, and Republicans in the Florida House.

A Red Flag: Is COVID-19 Becoming More Deadly in Florida?

What started my investigation was a strange observation: based on reported COVID-19 confirmed case numbers and deaths, it appeared that COVID-19 is becoming more deadly in Florida than it has been during the summer peak. One way of looking at this is by looking at the relation between reported case numbers and reported death rates; since deaths are typically delayed by several weeks relative to test results, I am comparing death rates to case rates two weeks earlier, using 2-week averages for both deaths and cases:

While the time-adjusted case fatality ratio (CFR) for the US remained almost constant between July and October, it increased from about 1.3% to about 4% for Florida. This peculiar increase prompted me to look for possible explanations.
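For readers who want to reproduce this kind of comparison, here is a minimal sketch of the time-adjusted CFR as described above: a 2-week average of deaths divided by the 2-week average of cases from two weeks earlier. The function names and the input numbers are made up to show the mechanics; they are not the data behind the figure.

```python
def trailing_mean(values, end, window):
    """Mean of the `window` values ending at index `end` (inclusive)."""
    chunk = values[end - window + 1 : end + 1]
    return sum(chunk) / len(chunk)

def time_adjusted_cfr(daily_cases, daily_deaths, day, window=14, lag=14):
    """Average deaths over `window` days divided by average cases `lag` days earlier."""
    if day - lag - window + 1 < 0:
        raise ValueError("not enough history for this day")
    deaths = trailing_mean(daily_deaths, day, window)
    cases = trailing_mean(daily_cases, day - lag, window)
    return deaths / cases

# Made-up example: a steady 3,000 cases and 60 deaths per day
cases = [3000] * 60
deaths = [60] * 60
print(f"{time_adjusted_cfr(cases, deaths, day=59):.1%}")  # 2.0%
```

Shifting the cases by the lag matters most when case numbers change quickly; with steady numbers, as in this toy example, the lag has no effect.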

The Florida COVID-19 Dashboard: How to Understate COVID-19 Deaths

One of the first stops was Florida's official COVID-19 dashboard. The graphs on the right side that depict cases and deaths are interesting:

The top graph shows the new cases, which show an increase over the last month. The bottom graph shows COVID-19 deaths, and the immediate impression is that things must be getting a lot better - the graph shows a clear downward trend in deaths! Wonderful - but in direct contradiction to the increasing fatality rates we had seen in the previous figure. What gives?

The first hint comes from the title "Resident Deaths by Date of Death". That seems reasonable enough - until you read the fine print: "The Deaths by Day chart shows the total number of Florida residents with confirmed COVID-19 that died on each calendar day (12:00 AM - 11:59 PM). Death data often has significant delays in reporting, so data within the past two weeks will be updated frequently."

The key here is that "death data often has significant delays in reporting". That means that the numbers for the last several weeks understate the actual deaths substantially; the numbers for the last few days show only a small fraction of the deaths that actually occurred. But rather than stating this clearly, the fine print states that data "will be updated frequently". Perhaps understating the actual death toll may be a bad thing, but updating frequently must be a good thing, right?

But the Florida government had a reason to choose the "death by day" reporting: it will always show a positive trend in deaths, since there will always be fewer reported deaths for the last few days. Anyone who looks at the graph without reading and understanding the fine print will always conclude that the COVID-19 situation in Florida is improving. Always. And who reads the fine print?

For an example, we can use the screen shots of the Florida COVID-19 dashboard that the COVID Tracking Project has captured. Here is what the death graph looked like on 8/2/2020:

Death by day 8/2/20

For comparison, here is what the graph looks like when plotting the number of new deaths reported:

That's a very different picture for the last two weeks of July! If we look at the screenshot of the Florida dashboard from 8/15, it gives a very different picture for these weeks:

Florida dashboard as of 8/15

Note that the deaths around 7/20 now hover around 160 per day, instead of the 120 per day as reported on 8/2. For the beginning of August, we now see around 140 deaths per day; two weeks later, this increases to 180 per day.

The bottom line is that the "By day of death" graph on Florida's COVID-19 dashboard will never show an accurate picture of the actual trends in recent weeks. It will always understate deaths for the last 2 weeks substantially, and show a decline of deaths in the most recent days. Given the observed reporting delays, the only apparent purpose of the death graph on Florida's COVID-19 dashboard is to mislead.
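The effect is easy to demonstrate with a toy simulation. In the sketch below, the true number of deaths is constant, and the only thing that changes is how much of each day's deaths has been reported by the time the chart is drawn. The daily death toll, the 21-day reporting window, and the linear fill-in are all hypothetical assumptions for illustration, not Florida's actual reporting pattern.

```python
TRUE_DEATHS_PER_DAY = 150   # hypothetical constant daily death toll
FULLY_REPORTED_AFTER = 21   # hypothetical: days until a day's deaths are all reported

def fraction_reported(days_since_death):
    """Share of a day's deaths already in the data (assumed to fill in linearly)."""
    return min(1.0, days_since_death / FULLY_REPORTED_AFTER)

def chart_by_date_of_death(days_shown=28):
    """Deaths shown per date of death, oldest day first, as seen on the snapshot day."""
    return [
        round(TRUE_DEATHS_PER_DAY * fraction_reported(age))
        for age in range(days_shown, 0, -1)
    ]

chart = chart_by_date_of_death()
print(chart[:3], "...", chart[-3:])  # [150, 150, 150] ... [21, 14, 7]
```

Even though nothing changes in the simulated epidemic, the chart ends in a steep decline, and it would look the same on any day the snapshot is taken.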

Even worse, the graph creates an incentive to delay the reporting of COVID-19 deaths. Early in the COVID-19 epidemic, Florida's board of medical examiners published data about COVID-19 deaths directly. However, when the government noticed that the numbers reported by the medical examiners were higher than the numbers reported by the state, the health department stopped the release of the medical examiners' list. Afterwards, only numbers released by the Florida Department of Health were available, whenever the department chose to include a death. When deaths were added with a 2-week delay, as was typical in the summer, it helped to create the impression that the worst problems were in the past. If a death was added more than 30 days after it happened, it would never show in the death graph on Florida's COVID-19 dashboard.

This created a strong incentive to delay death reports in Florida for anyone who wanted to downplay the severity of the COVID-19 epidemic. As a result, the reporting delays increased substantially since the summer:

But while the delayed reporting was welcome when it reduced the number of reported COVID-19 deaths in the summer, it is now creating a problem: eventually, the deaths have to be reported!

Killing Two Birds With One Stone: "Investigate All COVID-19 Deaths!"

On October 21, Florida's Surgeon General, who had remained surprisingly quiet during the COVID-19 epidemic up to this point, issued a press release stating that all COVID-19 fatalities reported to the state will be subject to a "thorough review". In addition to criticizing that some reports were more than 30 days late, he focused on 5 cases where more than three months had elapsed between the COVID-19 diagnosis by PCR test and the eventual deaths.

The issue was quickly picked up by governor DeSantis' spokesman Fred Piccolo Jr., who stated:

"What is different about the deaths, is that the health department was finding people who were admitted as positive as far back as March or April and who passed away in August or September or October. Is that a COVID death?"

Looking at the data in the Surgeon General's press release shows that Piccolo is stretching the truth beyond the breaking point. Questioning if someone who was diagnosed in March and died in October really died of COVID-19 seems reasonable, right? But the earliest test date listed by the Surgeon General was from June, not March or April - three months later. The longest elapsed time between test report and death was 111 days. While this is still a long time, it is shorter than times that have been reported for people who recovered from COVID-19, as a quick Google search shows:

A patient in North Carolina was released after 137 days in the hospital. Her complications which were directly caused by COVID-19 included a heart attack and kidney and lung failure.
Two men in Georgia were in the hospital for COVID-19 for more than 4 months. One of the two was released, the other is still in the hospital.
A 35-year-old woman in the UK was treated for 141 days in the hospital, which included 105 days on the ventilator.

The last case is interesting because the treatment happened in a hospital run by UK's National Health Service - a public health system that Republicans typically describe as "socialist".

Those are just some random samples from a quick internet search, and all of the listed patients survived. Scientific studies show that survivors typically spend less time in hospitals than patients who die; other studies report very long hospital stays, for example three patients with more than 50 days in a hospital in one early study from China (as well as two more patients who still were in the hospital after 37 days). Other studies show that hospital stays in the US tend to be longer than in China, and that a significant fraction of patients stay in hospital care for more than 40 days. Some patients get admitted to the hospital for COVID-19 multiple times. In one case in Belgium, DNA sequencing proved that a patient had been infected on two separate occasions from different people; this patient died from the second infection.

These examples show that there is plenty of both anecdotal and scientific evidence of patients who require hospital treatments for several months, and that a small number of cases with a large time between diagnosis and death is therefore not suspicious. It is very likely that the investigations will come to the same conclusions, although it is extremely unlikely that the Florida government would announce such conclusions.

The Pattern: Create Doubt About COVID-19 Deaths

The Surgeon General's press release discussed above is just one of many examples where Republican politicians in Florida try to create doubt about the true number of COVID-19 deaths. A recent example is a "Florida House report" commissioned by Republican House Speaker Jose Oliva. The report says that "60% of death certificates issued for state residents whose deaths were attributed to COVID-19 had reporting errors and most were filed by medical examiners". It speculates that this "may be inflating the COVID-19 death toll by 10%".

Phrased differently, the results could be summarized as "a close investigation looking for problems has found that 90% of the reported deaths are definitely due to COVID-19, with the remaining 10% possibly being due to COVID-19 or some other cause". But instead, the House Speaker, who has no medical background, talks about "compromised data".

Note that the reporting about the issue starts with casting doubt on 60% of the death certificates. It is likely that many readers will remember this particular number, and few will remember that in reality, at most 10% of the death certificates are questionable with respect to COVID-19.

Another example of the "cast doubt" strategy is governor DeSantis' mention of a motorcyclist who had tested positive and died in an accident, and who was initially included on Florida's list of COVID-19 related deaths. However, even before governor DeSantis made the statement in an interview on July 20, this case had already been removed from the reported death counts. Nevertheless, this example is very "sticky", and comes up frequently in conversations with COVID-19 deniers.

The Reality: Florida Reports Less Than 3 Out Of 4 COVID-19 Deaths

There is a simple number that really determines how deadly the COVID-19 epidemic is: the number of people who die in addition to the number who would die in a typical year without COVID-19. This number, called "excess deaths", can easily be looked up based on death certificate data that all states submit to the CDC, and which the CDC publishes on its web site.

Based on spreadsheets last updated on 10/21/2020, and looking at actually submitted death certificates from the weeks ending between 3/7 and 9/19/2020, we can compare the excess deaths to the number of death certificates that listed COVID-19 as a cause of death:

During these roughly 6 months, the number of excess deaths in Florida was 21,263 (note that this number will go up slightly in the next few months, since some death certificates are submitted with delays of up to a year). Of these, 14,795 death certificates listed COVID-19 as a cause of death. This is about 69.6% of the excess deaths. The graph above shows that excess deaths and COVID-19 deaths follow the same pattern, which strongly indicates that the vast majority of excess deaths is most likely caused by COVID-19, and not some other cause like violence or suicide.
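The percentage is simple to check. This short sketch uses only the two Florida numbers quoted above:

```python
excess_deaths = 21_263   # Florida excess deaths, weeks ending 3/7 to 9/19/2020
covid_listed = 14_795    # death certificates listing COVID-19 as a cause

share_listed = covid_listed / excess_deaths
not_listed = excess_deaths - covid_listed

print(f"listed as COVID-19: {share_listed:.1%}")   # 69.6%
print(f"excess deaths not listed: {not_listed:,}")  # 6,468
```

In other words, roughly 3 out of every 10 excess deaths in this period were not attributed to COVID-19 on the death certificate.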

The data for excess death calculations are readily available. Excess death analyses have been published on many web sites, including the Financial Times and Our World In Data. Several scientific studies have analyzed excess mortality in the US, including a study recently published by the CDC. There is world-wide agreement on using excess mortality analysis to determine the impact of epidemics.

The result of excess death analysis for Florida is clear: the current process fails to correctly identify COVID-19 as a cause of death in 3 out of 10 cases. The COVID-19 reporting problem that Florida has is one of underreporting, not of overreporting. This could be addressed by requiring COVID-19 tests and, if necessary, autopsies for any deaths where COVID-19 cannot be excluded by clear evidence.

Just don't wait for the governor or state Republicans to suggest that.

Friday, October 23, 2020

New Case Record in the US

The US has set a new record for daily COVID-19 cases today, with 81,210 new cases according to Worldometers.info. The COVID Tracking Project reports an even higher number of 83,010 cases.

In July, the increase of cases was primarily driven by rapid rises in Texas, Florida, California, and Arizona. These states reacted with measures that partially rolled back the "re-opening", for example bar closures and local mask mandates.

In contrast, the current rise in COVID-19 infections is the result of rising case numbers in the vast majority of US states:

Source and full table: npr.org

A total of 22 states show an increase in average daily cases and report more than 1,000 cases per day. The top 3 states currently account for about 20% of new cases; during the "second peak" in the summer, this number was closer to 50%.

To stop the current growth in new COVID-19 infections, most of these states would have to increase the stringency of restrictions. But whereas several countries in Europe, where COVID-19 has also been rising rapidly, have announced plans for new lockdowns and other severe measures, such actions seem extremely unlikely in most US states, at least in the near future.

Another important trend in COVID-19 infections is that infections have shifted from metropolitan areas to smaller towns and rural areas:

From https://covid.cdc.gov/covid-data-tracker/

Resistance against anti-COVID measures like mandatory face masks is a lot stronger in most rural areas. At the same time, medical support is often worse, with hospitals mostly located in larger cities, where the frequent sound of ambulances transporting COVID-19 patients alerted every resident about how serious the situation was.

In view of these (and other) factors, it is likely that we will see many more records of COVID-19 infections in the US in the coming weeks and months. Reported COVID-19 deaths have just started to increase from about 700 daily deaths (7-day average) a week ago to more than 800 today. Since reported COVID-19 deaths rise with a delay of at least 2-3 weeks after confirmed case numbers rise, additional increases are inevitable.

Some people in the US will doubtlessly interpret the rising number of COVID-19 cases as a positive sign, hoping that it will help the US to achieve "herd immunity". But we have to look no further than to the rising case numbers in New York, where an estimated 25% to 35% of the population has been infected by COVID-19, to see how far away we are from herd immunity. Close to 9 million confirmed cases in the US may seem like a large number, but still represent less than 3% of the population. Epidemiologists estimate that herd immunity would require at least 60% to 70% of the population to be immune to COVID-19. Even after taking into account that only about one out of 5 COVID-19 infections is reflected in the "confirmed case" numbers, reaching herd immunity through infection would lead to at least one million additional deaths in the US. Considering what we know about re-infection from other corona viruses, which appears to be common a year or less after the initial infection, it is very questionable if reaching herd immunity "the natural way" is possible at all. But effective vaccines, which could let us reach herd immunity without additional deaths, will not be available in sufficient quantities until the summer of 2021 at the earliest.

Tuesday, October 20, 2020

CDC Reports 299,000 Additional Deaths

Today, the CDC published a study that looked at the excess mortality linked to COVID-19 in the US. As I explained in previous posts, "confirmed cases" and even official COVID-19 death numbers paint a sometimes misleading picture, since such numbers are distorted by many factors, including test availability, the willingness of people to take tests, and test accuracy. But the one number that is not subject to any of these problems is the number of people who have died, compared to previous years.

Based on the analysis of death certificates submitted to the CDC's "National Vital Statistics System", the CDC reports that 299,000 more people have died this year than in previous years - in other words, the US had almost 300 thousand additional deaths. Only 198,081 of the death certificates listed COVID-19 as a cause of death. I explained possible causes for this discrepancy in a previous post. The predominant cause for additional deaths that do not list COVID-19 as the cause of death is a missing positive COVID-19 test. One clear indicator is that excess deaths are very closely linked to COVID-19 deaths.

The official COVID-19 death counts include only 2 out of 3 actual deaths linked to COVID-19 in the US. To get a realistic picture of the deaths that COVID-19 has caused in the US, take the official reported numbers, and add 50%. For example, the 220,000 deaths reported yesterday (10/19) by Johns Hopkins University correspond to a total of 330,000 excess deaths due to COVID-19 in the US. Spreadsheets with detailed numbers that are updated weekly are available on the CDC web site.
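Here is where the "2 out of 3" and "add 50%" rules of thumb come from, as a short sketch based on the two CDC numbers above. The 1.5 factor is the rounded rule of thumb, not an exact value.

```python
cdc_excess_deaths = 299_000   # additional deaths reported by the CDC
cdc_covid_listed = 198_081    # death certificates listing COVID-19

share_listed = cdc_covid_listed / cdc_excess_deaths   # close to 2 out of 3
multiplier = cdc_excess_deaths / cdc_covid_listed     # close to 1.5

reported_jhu = 220_000                    # Johns Hopkins count on 10/19
estimated_excess = reported_jhu * 1.5     # rule of thumb: add 50%

print(f"share listed: {share_listed:.2f}, multiplier: {multiplier:.2f}")
print(f"estimated excess deaths: {estimated_excess:,.0f}")  # 330,000
```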

Multiple countries report low excess death rates

Excess death analysis is also a very useful tool to compare the severity of the COVID-19 pandemic in different countries, which often have very different testing policies, capacities, and COVID-19 reporting rules. A study that was published last week in Nature Medicine looked at the death data from 21 different countries for the first phase of the pandemic, from mid-February to May. Here is a graph from the study that summarizes the results:

The countries that had the highest excess mortality were Spain and England & Wales, where deaths increased by about 35-40% during the period studied. The next group of countries includes Italy, Scotland, and Belgium, closely followed by Sweden and the Netherlands, with roughly 20-25% additional deaths.

There are seven countries that did not show any significant increase in deaths: Bulgaria, New Zealand, Slovakia, Hungary, Czechia, Australia, and Poland. With the exception of Poland, all of these countries actually reported fewer deaths than expected. The study authors mention the reduction of work-related injuries during lockdowns in these countries as one likely reason for the observed reduction in death.

Three countries (Norway, Finland, and Denmark) had excess death rates below 5%; overall, almost half of the countries studied (10 of 21) reported either no excess deaths, or an increase below 5%.

How does the US compare?

I downloaded the latest data from the CDC web site to calculate comparable excess death numbers for the US, using the weeks ending 2/15/2020 to 5/30/2020. For the entire US, the excess death rate for this period was about 15.3% when using the numbers for reported death certificates only, or 16.3% if using the "weighted" data set that tries to compensate for late submission of death certificates. This number is roughly the same as what France reported.

However, the US is significantly larger than any of the countries in the study above, and closer in size to all of the countries combined. COVID-19 spread in the US in a regional pattern, with the northeastern states reporting the highest numbers from March to May:

The five northeastern states shown (NY, NJ, MA, CT, and RI) had a combined excess death rate of 48.4% - significantly higher than any of the 21 countries from the study above.

After the end of May, COVID-19 infections and deaths in most European countries and in New Zealand were at least 10-fold lower than during the March to May period. In the US, the COVID-19 infections "moved" to southern and western states, which now experienced a large number of additional deaths:

The excess death rates in Arizona and Texas during the summer months were similar to the rates seen in the worst European countries, Spain and England & Wales, in the spring. This is somewhat astonishing, considering that these states had several more months to prepare, and that some improvements in treating COVID-19 had been made in the meantime. Louisiana stands out because it reported excess death rates near 25% both for the spring and the summer periods.

In conclusion, the comparison of excess deaths linked to COVID-19 showed that the affected regions in the US have done as poorly as, or worse than, the European countries that had the highest relative increase in deaths. The primary causes are well known: lack of testing early in the epidemic; false statements by leading politicians about the severity of COVID-19; and an irresponsible push to re-open, ignoring even the minimal re-opening guidelines the administration had released.

In recent weeks, the state of the pandemic has diverged dramatically between the countries. While New Zealand has remained completely free of domestic COVID-19 infections and fully re-opened the economy, many European countries have seen a very rapid rise in COVID-19 infections. Most countries are issuing new restrictions, and the worst-hit areas have initiated a new round of full lockdowns.

In the US, COVID-19 case numbers have increased more slowly, from about 36,000 in the middle of September to almost 61,000 today (7-day averages). Given a weekly growth rate of 17%, the US will set new records for daily COVID-19 cases later this week or next week. If future increases in the US follow the patterns seen in Europe, the increases will likely accelerate further. It is quite possible that the US will see more than 100,000 new confirmed COVID-19 cases per day before election day (November 3). Reported COVID-19 deaths lag behind by about 2-4 weeks, but 7-day averages have started to increase over the last three days, from 704 deaths per day to 753.
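As a reference point, this sketch calculates how long the 7-day average would take to reach 100,000 if the weekly growth rate simply stayed at 17%. A constant growth rate is an assumption made only for this illustration.

```python
from math import log

def weeks_to_reach(current, target, weekly_growth_percent):
    """Weeks until `current` grows to `target` at a constant weekly growth rate."""
    return log(target / current) / log(1 + weekly_growth_percent / 100)

weeks = weeks_to_reach(61_000, 100_000, 17)
print(f"about {weeks:.1f} weeks")  # about 3.1 weeks
```

At a constant 17%, the 7-day average would need a little more than three weeks, so reaching 100,000 before election day depends on the acceleration mentioned above, or on single-day counts that run above the average.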

Thursday, September 10, 2020

400,000 COVID-19 Deaths in the US in 2020?

Recently, media headlines reported that an "important" computer model now predicts more than 400,000 COVID-19 deaths in the US by the end of the year. I have described in earlier posts why I am not a fan of this particular computer model, so I'll just ignore the model. However, I'll have a closer look at the question if we may indeed see 400,000 deaths linked to COVID-19 in the US before the end of the year.

At first glance, the number of 400,000 seems too high. So far, the official COVID-19 deaths numbers in the US are below 200,000, and many people think even this number is an overstatement. So let's look at a graph to get started:

The graph shows the weekly number of COVID-19 deaths, based on death certificates submitted to the National Center of Health Statistics, as a dotted line. It also shows the number of "excess deaths" as a black line. Here's a short section from the data file that I downloaded from the CDC web site to generate the graph that explains the "excess deaths":

All lines show data for the week that ended on 8/15/2020. In the first line, the numbers show data only for the death certificates that have already been reported to the NCHS. Some states and counties are very slow to submit their data; typically, it takes 6-8 weeks for 99% of all death certificates to be submitted. When the data file was generated on 9/9/2020, a total of 55,145 deaths had been reported for the week. From previous years, the CDC has a pretty good idea how many deaths would have typically occurred in this week: 51,639. For example, the number of deaths reported for the same week in 2019 was 51,128; the expected number is based on data from the last 4 or 5 years, which eliminates most week-to-week fluctuations.

So for the week of 8/15, the number of death certificates received so far by the NCHS was 3,506 higher than expected - in other words, there were 3,506 reported "excess deaths". But the CDC knows that the data are incomplete, and can estimate how many "late" death certificates will arrive in the next months. This number is shown in the second line, where the Type is listed as "Predicted (weighted)". The numbers indicate that the CDC expects to receive almost 5,000 additional death certificates, which will result in more than 8,000 excess deaths for the week. When the same extrapolation is applied to all recent weeks, the calculated number of excess deaths for 2020 (up to 8/29/2020) increases to 252,307.

In the third line above, the number of death certificates that list COVID-19 as a cause of death has been subtracted in the excess deaths columns. This results in 2,027 excess deaths that are not counted as COVID-19 deaths for the week, and a total of 81,012 excess deaths for the year. So roughly, for every 2 reported COVID-19 deaths, there is one additional excess death that is not reported as COVID-19.
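The arithmetic behind these three lines can be retraced with the numbers quoted in the text. In this sketch, the number of COVID-19 death certificates for the year is not read from the data file; it is derived by subtraction, which assumes that the third line is exactly "all excess deaths minus COVID-19 deaths".

```python
# Week ending 8/15/2020, death certificates received by 9/9/2020
reported_deaths = 55_145
expected_deaths = 51_639
excess_reported = reported_deaths - expected_deaths
print(f"{excess_reported:,} excess deaths, {excess_reported / expected_deaths:.1%} above expected")
# 3,506 excess deaths, 6.8% above expected

# Year to date (up to 8/29/2020), using the weighted predictions
total_excess = 252_307
excess_not_covid = 81_012
covid_listed = total_excess - excess_not_covid   # derived, not taken from the file

extra_per_covid_death = excess_not_covid / covid_listed
print(f"{covid_listed:,} COVID-19 deaths, {extra_per_covid_death:.2f} additional excess deaths for each")
```

The last ratio comes out just under 0.5, which is the "one additional death for every 2 reported" rule of thumb.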

In the graph above, the number of excess deaths very closely follows the number of reported COVID-19 deaths. This is also true when we look at the data for individual states and regions, for example New York City and Texas:

Again, we see that the number of excess deaths is very closely linked to the number of COVID-19 deaths, but always higher. In states that had very low COVID-19 cases and deaths, for example Hawaii and Alaska, the number of excess deaths also remained very low (and sometimes negative), regardless of whether or not the states issued stay-at-home orders.

This is strong evidence that the actual number of COVID-19 deaths is significantly higher than the reported number of COVID-19 deaths - in fact, about 45% higher. One likely reason for this under-counting is that many COVID-19 victims have died without ever being tested for COVID-19, and often outside of hospitals. False-negative tests for COVID-19, which can easily happen if the PCR tests are administered too early or too late in an infection, are another possible reason.
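The "about 45% higher" figure can be checked from the two yearly totals quoted above. This is a rough sketch; it assumes that the difference between the two totals is the number of deaths that list COVID-19 on the death certificate. With these inputs the ratio comes out at roughly 47%, a little above the rounded 45% used in the text.

```python
# Yearly totals quoted in this post (data file generated 9/9/2020)
total_excess = 252_307      # predicted (weighted) excess deaths, all causes
non_covid_excess = 81_012   # excess deaths that do not list COVID-19

# Assumption: the remainder are the deaths that do list COVID-19
covid_listed = total_excess - non_covid_excess

# Additional excess deaths relative to the COVID-19 deaths on certificates
undercount = non_covid_excess / covid_listed

print(f"Deaths listing COVID-19: {covid_listed:,}")
print(f"Additional excess deaths: {undercount:.0%} of that number")
```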

But regardless of the reasons, it is a proven fact that the number of deaths linked to COVID-19 in the US is significantly larger than the reported number of deaths. According to death certificate data, more than 250,000 people had died in addition to the "usual" number of expected deaths by the end of August. These are additional deaths, not people "who would have died anyway".

So, now back to the question of whether the US will reach 400,000 COVID-19 deaths before the end of the year. Rather than running complicated computer models, I'll just give you a "back of the envelope" calculation. As of today, the 7-day average of confirmed COVID-19 deaths is slightly above 700, or about 5,000 a week. This is down from almost 1,200 deaths per day in early August. The drop closely mimics the drop in confirmed cases, which reached almost 70,000 per day in mid-July, but seemed stable in the low forty thousands before the Labor Day weekend.

If we assume that the number of new infections and the number of COVID-19 deaths remain stable at the current level, we would see 5,000 additional confirmed COVID-19 deaths per week. With 16 weeks remaining in the year, that's an additional 80,000 confirmed deaths. As I write this, Worldometers shows a total of 196,183 COVID-19 deaths for the US; thus, we'd have about 276,000 COVID-19 deaths by the end of the year. Since the COVID-19 linked excess mortality is 45% higher, this predicts that the US will see about 400,000 COVID-19-linked excess deaths before the end of the year.
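The same back-of-the-envelope calculation, written out as a short Python sketch. It only restates the assumptions made above (a constant 5,000 confirmed deaths per week, 16 weeks left in the year, and excess mortality 45% above confirmed deaths); the variable names are illustrative.

```python
weekly_confirmed_deaths = 5_000   # assumed to stay constant
weeks_remaining = 16
confirmed_to_date = 196_183       # Worldometers total at the time of writing
excess_factor = 1.45              # excess mortality is about 45% higher

# Confirmed COVID-19 deaths projected to the end of the year
projected_confirmed = confirmed_to_date + weekly_confirmed_deaths * weeks_remaining

# COVID-19-linked excess deaths implied by that projection
projected_excess = projected_confirmed * excess_factor

print(f"Projected confirmed deaths: {projected_confirmed:,}")
print(f"Projected excess deaths:    {projected_excess:,.0f}")
```

Changing `weekly_confirmed_deaths` shows how sensitive the year-end number is to the assumption that deaths stay at the current level.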

If you read the media articles that I mentioned at the beginning of this post, you may have noticed that I have not considered many factors that may increase COVID-19 transmissions and deaths before the end of the year. Some of these include:

Secondary transmissions after school and college re-openings.
Re-starting of sporting events and other large gatherings.
Seasonality: it is quite possible that transmissions increase during fall and winter.

Initial evidence from college re-openings has shown clearly that many students will be infected when living on or near campus. This is to a large extent because they know that their personal risk of dying of COVID-19 is extremely low; however, infected students will cause secondary infections in college and school personnel, parents, and other community members, who are generally older and at higher risk for severe or deadly COVID-19 symptoms.

Perhaps the most worrying factor is the potential that infections will rise during the colder seasons. We do not know if this will happen, but there are multiple potential reasons why it could - from spending more time indoors, where transmissions are higher, to lower vitamin D levels from reduced sun exposure, which have been linked to more severe disease. But perhaps the most troubling indicator is that the list of countries with the highest number of COVID-19 cases now includes many countries from South America, which is just emerging from winter; some of these countries report extremely high case loads and COVID-19 death numbers despite strong government interventions to reduce transmissions.

If COVID-19 transmissions in the US rise again from the current levels, regardless of whether that is due to one of the factors listed above or to other reasons, then it is likely that the number of COVID-19 deaths will increase significantly, quite possibly exceeding half a million "excess" deaths.

We'll know for sure sometime in early spring next year, when most death certificates for 2020 have been submitted to the CDC.

Thursday, August 27, 2020

Separate Realities

Being trained as a scientist, I have always believed that there is such a thing as truth and a single reality. But people in the USA seem to be living in completely different realities, as a recent poll shows. It asked Americans if they found the current number of deaths from COVID-19 in the US acceptable. The results vary dramatically by political orientation:

The majority of Republicans (57%) think that the current number of deaths is acceptable, but only 1 out of 10 Democrats thinks this way.

A key to understanding these differences is that "About two-thirds of Republicans (64%) think the number of US fatalities from coronavirus is actually lower than what is being reported". In sharp contrast, only 12% of Democrats think that the number of COVID-19 deaths is lower. But in both parties, the percentage of people who believe that COVID-19 deaths are overstated closely mimics the percentage of people who deem the current fatalities acceptable. This raises the question:

Are the reported COVID-19 death numbers accurate, too low, or too high?

This is a reasonable question. There is no doubt that not all COVID-19 deaths are reported accurately. If someone dies of COVID-19 without ever being in a hospital and without a positive COVID-19 test, the death will often not be reported as a COVID-19 death, leading to under-reporting. But at other times, COVID-19 may be listed on the death certificate even if the death clearly was not caused by COVID-19, causing over-reporting of deaths. Many people who believe that actual deaths are lower than reported numbers will have stories of someone without COVID-19 symptoms who died in a motorcycle accident or similar. Some believe that hospitals cheat by listing COVID-19 as the cause of death so they get higher reimbursements.

There is no doubt that both under- and over-reporting of COVID-19 deaths happen. Theoretically, both could happen at a similar rate so that over- and under-reporting cancel each other out, but this seems unlikely. So how can we get an idea of how many more deaths are really caused by the coronavirus epidemic in the US?

The most straightforward way is to compare the number of people who died since the start of the epidemic to the number of people who died during the same time period in previous years. If COVID-19 has caused a large number of additional deaths, that should show up in the number of reported deaths. This approach, named "excess death analysis", is standard when trying to estimate the impact of epidemics; for example, it is generally used to estimate how many people die of influenza.

Before we get started, let's have a quick look at the number of COVID-19 deaths reported in the US as of today (August 27, 2020, 8:15 pm):

The CDC web site reports 178,998 COVID-19 deaths.
Worldometers.info reports 184,764 COVID-19 deaths.
Johns Hopkins University reports 180,527 COVID-19 deaths.

Exact numbers differ a bit depending on when and how data are collected, but we can say the reported number is close to 180,000.

Next, we can look at the data from the National Center for Health Statistics, where all states send their death reports. The web page provides a download link for "National and State Estimates of Excess Deaths", so you can download a file in .csv format that you can import into Excel or OpenOffice.

The file contains state-by-state data for weekly reported and expected deaths since 2017, and totals for the entire US. There are three data sets in the table, and we'll look at each in turn.

The first data set we'll examine is the "unweighted, all causes" set. These are the numbers for the death reports that the CDC had received by the time the file was generated. For recent weeks, these numbers will be incomplete, since not all states and counties have reported their numbers yet; typically, it takes about 8 weeks until about 99% of the death certificates have been submitted. Therefore, this data set will give an underestimate of the number of excess deaths.

The latest data included are for the week that ended 8/15/2020. The "unweighted" (incomplete) data set reports an excess of 203,840 deaths for 2020. This number is significantly higher than the number of reported COVID-19 deaths. This is a clear indication that the reported COVID-19 deaths understate the actual number of deaths caused by the epidemic.

Since many recent death certificates are missing from the "unweighted" data set, we need to look at the other data sets. The next one we can look at is the "weighted, all causes" data set. For this set, the CDC has estimated how complete the submission of death certificates was for each week and jurisdiction, and adjusted the totals to account for missing reports. The "weighted" (predicted) data set reports an excess of 245,305 deaths for 2020.

The third data set in the file ("weighted, all causes excluding COVID-19") calculates excess deaths after subtracting reported COVID-19 deaths. This gives the number of excess deaths that are not classified as COVID-19 on the death certificate. This results in 82,049 excess deaths, in addition to 163,256 COVID-19 deaths. In other words, for every 2 reported COVID-19 deaths, there is one additional death that does not list COVID-19 as the cause of death.

One way of interpreting these results is that only about two thirds of COVID-19 linked deaths are reported. In other words, the actual death toll from the corona virus epidemic in the US is about 50% higher than reported in the official death counts.
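Both statements ("about two thirds are reported" and "about 50% higher") follow from the same two numbers in the file. A minimal sketch, with illustrative variable names:

```python
covid_listed = 163_256       # deaths that list COVID-19 on the certificate
non_covid_excess = 82_049    # excess deaths that do not list COVID-19

total_excess = covid_listed + non_covid_excess   # weighted, all causes

# Share of the excess deaths that is reported as COVID-19
share_reported = covid_listed / total_excess

# How much higher the total is than the reported COVID-19 deaths
underreporting = non_covid_excess / covid_listed

print(f"Total excess deaths:   {total_excess:,}")      # 245,305
print(f"Reported as COVID-19:  {share_reported:.1%}")
print(f"Under-reporting factor: {underreporting:.1%}")  # 50.3%
```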

There are a few different ways to calculate these numbers, but they all end up with pretty similar results: actual excess deaths are about 40-50% higher than reported deaths.

The final exercise is to provide a "best estimate" of the death toll as of today. The CDC spreadsheet only contains data until 8/15/2020; since then, an additional 10,536 deaths have been reported. Applying the under-reporting factor of 50.3% from the numbers above (82,049 divided by 163,256) means we expect more than 5,000 additional excess deaths, for a total of 261,136 excess deaths in the US linked to the COVID-19 epidemic.
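Spelled out as a short Python sketch, the "best estimate" is the weighted excess through 8/15, plus the deaths reported since then, plus the unreported share of those recent deaths. The variable names are illustrative; the inputs are the figures quoted in this post.

```python
excess_through_aug_15 = 245_305   # weighted, all causes
reported_since_aug_15 = 10_536    # deaths reported after 8/15/2020

# Under-reporting factor from the CDC file (about 50.3%)
underreporting = 82_049 / 163_256

# Deaths expected on top of the recently reported ones
unreported_recent = reported_since_aug_15 * underreporting

best_estimate = excess_through_aug_15 + reported_since_aug_15 + unreported_recent

print(f"Expected unreported deaths: {unreported_recent:,.0f}")
print(f"Best estimate as of today:  {best_estimate:,.0f}")  # 261,136
```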

The analysis is based on data submitted by the states to the CDC, a government organization that has been under the control of the current administration for the last 3 1/2 years. The data are publicly available, and anyone can download them and do their own analysis. But even the incomplete "unweighted" data, which do not include many deaths from the most recent weeks, clearly show: