Friday, June 5, 2020

Painting a Rosy Picture: Why Many COVID-19 Tests Fail

Some widely used COVID-19 tests can have very low sensitivity, missing half or more of infections. That's the conclusion from a new study published yesterday by researchers at Harvard and the Beth Israel Deaconess Medical Center in Boston, combined with information that companies have submitted to the FDA about their tests. Unfortunately, one of the least sensitive tests has become very popular in many states.

Let's start with a graph from the study:
Very sensitive tests, like the Abbott PCR M2000 test, will give positive results if the viral load is at least 100 genome copies per milliliter, and therefore detect about 85% of all infections. The other tests in the graph are less sensitive, and therefore detect fewer infections. In other words, they have a higher false-negative rate.

The curve above is based on the analysis of quantitative PCR test results for 4,774 patients with a positive COVID-19 test, which showed a very wide variation in viral loads:

Some patients had as few as 10 copies of the viral genome per ml, while others had 1 billion copies per ml. Between about 100 copies per ml and 100 million copies per ml, the distribution is quite even - the number of people in each category is about the same. This distribution is what creates the solid black line in the first graph.
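The link between the viral-load distribution and the sensitivity curve can be sketched in a few lines of Python. This is only an illustration: the function name and the five viral loads are made up for the example and are not data from the study. The idea is that a test with a given limit of detection (LoD) catches the fraction of infected patients whose viral load is at or above that limit.

```python
def detection_rate(viral_loads, limit_of_detection):
    """Fraction of infected patients whose viral load (copies/ml)
    is at or above the test's limit of detection."""
    if not viral_loads:
        raise ValueError("need at least one viral load")
    detected = sum(1 for load in viral_loads if load >= limit_of_detection)
    return detected / len(viral_loads)


# Hypothetical viral loads in copies/ml (NOT data from the study),
# spread evenly on a log scale to mimic the wide range described above.
example_loads = [10, 1_000, 100_000, 10_000_000, 1_000_000_000]

# A higher limit of detection can only detect the same or fewer patients.
for lod in (100, 4_290, 61_000):
    rate = detection_rate(example_loads, lod)
    print(f"LoD {lod:>6} copies/ml -> detects {rate:.0%} of this sample")
```

With the real 4,774 viral loads in place of the toy list, evaluating this function over a range of detection limits would trace out a curve of the same kind as the one in the first graph.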

There are many reasons why the number of virus particles can vary so drastically. One is the timing of the test: the viral load increases from initial infection to the onset of symptoms, and then usually starts to decrease. Researchers from countries with extensive contact tracing and sufficient testing capacities have published many studies describing patients whose tests were initially negative, turned positive after a few days, and later reverted to negative. Additional variations can come from how exactly the swab is done; where in the body the virus replicates most successfully; the number of virus particles that caused the initial infection; and differences in the innate and adaptive immune response between individuals. Individuals with the highest viral loads may be more likely to be "superspreaders" who can infect dozens of others, but even many of those with low viral loads are likely to be contagious.

For the sensitivity figure above, the authors simply used the documentation that had been submitted to the FDA by the different companies. They point out that the description of the "limit of detection" is not standardized - some companies use the number of genomes per milliliter, others use TCID50, and so on. Neither is there a single way to determine the detection limit. Some companies start with swabs that they add a known amount of viral RNA to, and then go through the entire detection protocol, mimicking "real world" situations as closely as possible. But others add the known RNA sample much later in the process, after isolating the viral RNA. To really compare the claimed sensitivities, it is necessary to inspect the protocols closely.

TestUtah, TestNebraska, TestIowa, and Co-Diagnostics

In my previous post, I talked about a Utah company that has won big testing contracts in Utah, Iowa, and Nebraska, where it has been criticized for what appears to be an extremely low rate of positive test results. The company uses tests from another Utah company, Co-Diagnostics, which stated in March that it can produce 50,000 COVID-19 tests per day. A month later, Co-Diagnostics announced a collaboration with the life sciences company Promega to produce more test kits.
In the documents submitted to the FDA, Co-Diagnostics claims a limit of detection of 4,290 copies per ml. Using the sensitivity curve above, this would result in a detection rate of about 60% - but a closer inspection of the document indicates that this is overly optimistic.

Co-Diagnostics states that it used sputum samples for the sensitivity experiment. That is highly unusual, since sputum samples are rarely tested in the US, where nose or throat swabs are typically used. The genomic RNA was added only after the RNA purification step, directly before the PCR reaction. This avoided potential losses in the elution step, and potential degradation, which could have reduced the reported sensitivity further. But the bigger difference is elsewhere: when swabs are used, they are typically put into a test tube with 2-3 ml of saline or viral storage medium. The purification column used for the Co-Diagnostics kit, however, is designed for a volume of only 140 microliters. This effectively adds a 14- to 21-fold dilution step. For swab samples stored in the standard 2 ml of medium, therefore, the detection limit would be about 61,000 copies of viral RNA per ml. According to the sensitivity graph above, this drops the detection rate to 50%. After accounting for losses during transport, storage, and RNA isolation, a false-negative rate of more than 50% is likely - which is exactly what was observed in Utah.
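The dilution arithmetic behind these numbers can be checked with a short Python sketch. The function names are made up for this illustration; the inputs are the figures quoted in this post (a claimed limit of 4,290 copies per ml, 2-3 ml of storage medium, a 140 microliter column), and the code simply follows the reasoning above that the ratio of medium volume to column volume acts as a dilution factor.

```python
def dilution_factor(medium_volume_ml, column_volume_ul=140):
    """Ratio of swab storage medium volume to purification column volume."""
    return (medium_volume_ml * 1000) / column_volume_ul


def effective_lod(claimed_lod, medium_volume_ml, column_volume_ul=140):
    """Claimed limit of detection scaled by the dilution factor
    (assumption: the detection limit scales linearly with dilution)."""
    return claimed_lod * dilution_factor(medium_volume_ml, column_volume_ul)


CLAIMED_LOD = 4_290  # copies per ml, from the FDA submission cited above

for medium_ml in (2, 3):
    factor = dilution_factor(medium_ml)
    lod = effective_lod(CLAIMED_LOD, medium_ml)
    print(f"{medium_ml} ml medium: {factor:.1f}-fold dilution, "
          f"effective limit about {lod:,.0f} copies")
```

For 2 ml of medium, the factor is about 14.3 and the effective limit about 61,000 copies, matching the estimate above; 3 ml of medium would push the limit higher still.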

Abbott's ID NOW and "User Error"

Another test that has been shown to have a high percentage of false negative results is Abbott's ID NOW COVID-19 test. One study showed false negative rates up to 45% when using diluted RNA samples. Abbott was quick to go on a counter-offensive and blame "user error" for high reported false negative rates, a statement that was repeated by Health Secretary Alex Azar. However, Abbott's own data showed false negative rates between 8.7% and 16.7% when compared to sensitive PCR assays, and indicated that higher false negative rates are linked to lower viral loads. The results are similar to those of two independent studies, which found false negative rates of 12.3% and 26.1%, while more accurate PCR tests had false negative rates between 1% and 5%. All studies agree that the Abbott test is less accurate at low viral loads; the highest sensitivities were seen in settings where viral loads were likely to be highest: in symptomatic patients relatively shortly after onset of symptoms.

Both the Co-Diagnostics and the Abbott tests show high false negative rates at low viral loads, which means they are not suited for "open" testing (like state-wide drive-through testing without requiring COVID-19 symptoms) or testing done as part of contact tracing, since infected persons who do not (yet) show symptoms have lower viral loads, and are therefore much more likely to give false negative results.

Lessons from China and New York

Looking at the documentation test companies provided to the FDA, it is normal to see claims of 99-100% detection rates. In view of variations in viral load and even results from company-sponsored studies, such numbers are extremely unrealistic. A number of scientific studies from Asian countries report actual PCR detection rates around 60-80%. In China, symptomatic patients were routinely tested by PCR and chest CT scans, and positive chest CT scans were viewed as sufficient to diagnose COVID-19 even if the PCR results were negative.

New York City also provides clues about false-negative test results. During the height of the epidemic in NYC, COVID-19 testing capacity was insufficient, and testing was largely restricted to symptomatic patients admitted to hospitals. At the same time, overloaded hospitals meant that only patients with very severe symptoms were admitted; newspapers reported that ambulances refused to transport patients to hospitals unless their blood oxygen levels were dangerously low. Nevertheless, the positive test rate in New York City never got much higher than 50-60%. Even the death numbers for NYC indicate that many who died from COVID-19 either did not get tested or had negative test results: about 22% of COVID-19 deaths (4,727 of 21,782) did not have a positive PCR result. In all likelihood, this number is too low: death certificate analysis for NYC shows 24,480 excess deaths between 3/15/2020 and 5/23/2020 compared to last year. This puts the actual rate of COVID-19 related deaths in NYC without a positive PCR result at about 30%. Unfortunately, information about which exact tests were used is not published, but the timing of deaths in NYC means that most testing must have been done with tests that have higher sensitivity than the Abbott and Co-Diagnostics tests.
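The two percentages in the paragraph above follow from simple arithmetic, sketched here in Python using only the numbers quoted in this post. The variable names are made up for the illustration, and the second calculation rests on the same assumption as the text: that all excess deaths in this period were COVID-19 related.

```python
# Figures quoted above for New York City
reported_covid_deaths = 21_782      # all reported COVID-19 deaths
deaths_without_positive_pcr = 4_727 # reported deaths lacking a positive PCR
excess_deaths = 24_480              # 3/15/2020 to 5/23/2020 vs. last year

# Reported deaths that did have a positive PCR result
pcr_confirmed_deaths = reported_covid_deaths - deaths_without_positive_pcr

# Share of reported COVID-19 deaths without a positive PCR result
share_of_reported = deaths_without_positive_pcr / reported_covid_deaths

# Assumption: every excess death was COVID-19 related, so any excess death
# that is not PCR-confirmed counts as a death without a positive PCR result.
share_of_excess = (excess_deaths - pcr_confirmed_deaths) / excess_deaths

print(f"PCR-confirmed deaths: {pcr_confirmed_deaths:,}")
print(f"Without positive PCR, of reported deaths: {share_of_reported:.1%}")
print(f"Without positive PCR, of excess deaths:   {share_of_excess:.1%}")
```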

1 comment:

  1. I had a TestIowa test at about Day 40 of symptoms-negative. They swabbed each nostril for 10 seconds. No sputum.

