COVID-19: Those Frightening Graphs on Infection Rates and Mortality Are Not Exactly What They Seem to Be (Updated with Reader Comments)
Synopsis: the scary charts showing exponential rises in COVID-19 infections and mortality speak only to those people tested and confirmed to have COVID-19. Those not tested are not included, which makes the meaning of those graphs and statistics far more limited than it might seem, scary as they are.
I’m going to speak only of the USA because I don’t know what is going on elsewhere as far as statistically valid randomized testing of the populace (if that is being done anywhere).
Reader Staale A writes with a link to a useful web site, OurWorldInData.org, which among other things does a great job of explaining the CFR (Case Fatality Rate) and the IFR (Infection Fatality Rate), the latter being a total unknown, as I discuss further below. That page and pages like it should be required reading for every member of the media.
Unknown numbers making a misleading statistic
To obtain a mortality rate or similar statistic, we need to know two things: (1) the number of deaths, as the numerator, and (2) the total cohort (how many were infected), as the denominator.
Mortality rate = deaths / totalNumberInfected
(this is the IFR, as explained at OurWorldInData.org; the CFR instead divides deaths by the number of confirmed cases)
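A minimal Python sketch of how far apart the two statistics can drift. Every count here is hypothetical, and `undetected_per_confirmed` is an assumed stand-in for the unknown number of infected people who were never tested; no one knows its real value.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """CFR: deaths divided by confirmed (tested-positive) cases."""
    return deaths / confirmed_cases


def infection_fatality_rate(deaths: int, confirmed_cases: int,
                            undetected_per_confirmed: float) -> float:
    """IFR under an ASSUMED number of untested infections per confirmed case.

    total infected = confirmed + confirmed * undetected_per_confirmed
    """
    total_infected = confirmed_cases * (1.0 + undetected_per_confirmed)
    return deaths / total_infected


# Hypothetical numbers for illustration only, not real data.
deaths = 50
confirmed = 1000

print(f"CFR: {case_fatality_rate(deaths, confirmed):.2%}")
for ratio in (0.0, 1.0, 2.0, 3.0):
    ifr = infection_fatality_rate(deaths, confirmed, ratio)
    print(f"IFR with {ratio} untested infections per confirmed case: {ifr:.2%}")
```

With these made-up inputs the CFR is 5.00% while the IFR falls to 2.50%, 1.67%, and 1.25% as the assumed multiplier grows. The sketch ignores deaths among untested people and the lag between diagnosis and death, both of which would shift the numbers.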
What is the totalNumberInfected? No one knows, because we are not testing people who are not showing symptoms (asymptomatic), so anyone asymptomatic is unlikely to be tested. And due to limited testing kits and facilities, many if not most of those with mild symptoms have not been tested either. Furthermore, a negative test is only a snapshot: if you are tested today, you would need to be tested again a week from now to rule out infection with certainty!
The official line is that most people have mild symptoms and recover without significant issues. Most of these people do not get tested (if you don’t feel sick, why would you?). So they are never included in the denominator of the statistic, which makes it invalid as a measure of deaths per infection. And do the tests even detect those who have been infected and have already recovered?
To be tested at this point, you must have symptoms, a doctor must order the test, a test must be available, and you have to go get it done. And that assumes the test is reliable, with very low false positive and false negative rates. Here in the USA, only a tiny fraction of the population has been tested (I’d guess at most 0.1%).
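Test reliability matters most when true infections are rare. This sketch uses the standard relationship between true prevalence, sensitivity (the share of infected people who test positive), and specificity (the share of uninfected people who test negative), along with the Rogan-Gladen correction that inverts it. The sensitivity and specificity values are assumptions for illustration, not figures for any actual COVID-19 test, and the correction only estimates prevalence within the group that was tested.

```python
def apparent_positive_rate(prevalence: float, sensitivity: float,
                           specificity: float) -> float:
    """Expected share of tests that come back positive.

    True positives:  sensitivity * prevalence
    False positives: (1 - specificity) * (1 - prevalence)
    """
    return sensitivity * prevalence + (1.0 - specificity) * (1.0 - prevalence)


def rogan_gladen(apparent_rate: float, sensitivity: float,
                 specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence from an observed positive rate.

    Clipped to [0, 1] because sampling noise can push the raw estimate outside it.
    """
    estimate = (apparent_rate + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, estimate))


# ASSUMED test characteristics, for illustration only.
SENS, SPEC = 0.90, 0.99

for true_prev in (0.001, 0.01, 0.10):
    seen = apparent_positive_rate(true_prev, SENS, SPEC)
    corrected = rogan_gladen(seen, SENS, SPEC)
    print(f"true {true_prev:.1%} -> observed {seen:.2%} -> corrected {corrected:.2%}")
```

With these assumed values, a group in which only 0.1% are infected would show roughly 1.09% positives, most of them false. That is why the false positive rate has to be known before any count can be interpreted.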
That means we don’t know how many people are or were infected (and recovered), and thus we don’t really know how bad COVID-19 is, except within the relatively small group of people who have tested positive, a tiny minority (in the US) at this point.
There is a potential silver lining here: if the actual (unknown) number of infected people greatly exceeds those who have tested positive, it might be that we are further along on our way to “herd immunity” than we think. For example, it could be (no one knows) that for every person testing positive, there are 2 or 3 or 0.738 other people who are infected but who have no symptoms or very mild symptoms, who will recover and gain immunity and not shed virus once recovered. There is a nasty flip side, however: those people could unwittingly infect vulnerable people for a time, not knowing of their own infection. So everyone should assume they might be infected at any time, and behave accordingly.
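A rough sketch of what those multipliers would imply. The population and case counts are hypothetical, and the comparison uses the textbook threshold 1 - 1/R0 from simple homogeneous-mixing models with an assumed R0; real-world immunity thresholds are more complicated and uncertain.

```python
def implied_infected(confirmed: int, undetected_per_confirmed: float) -> float:
    """Confirmed cases plus an ASSUMED number of untested infections per case."""
    return confirmed * (1.0 + undetected_per_confirmed)


def herd_immunity_threshold(r0: float) -> float:
    """Classic simple-model threshold 1 - 1/R0 (assumes homogeneous mixing)."""
    return 1.0 - 1.0 / r0


# Hypothetical inputs, not real counts.
population = 1_000_000
confirmed = 2_000
assumed_r0 = 2.5  # ASSUMPTION for illustration; the true value is uncertain

threshold = herd_immunity_threshold(assumed_r0)
for ratio in (0.738, 2.0, 3.0):
    share = implied_infected(confirmed, ratio) / population
    print(f"{ratio} untested per confirmed: {share:.2%} infected "
          f"(vs. {threshold:.0%} threshold at R0={assumed_r0})")
```

Under these made-up inputs even the largest multiplier leaves the infected share far below the threshold. With real data, the conclusion hinges entirely on the unknown multiplier.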
- Some statistically valid random testing ought to be done every week or so in order to get a handle on the true status of COVID-19: its infection rate and rate of growth. But to date that is just not feasible, even if the number of test kits were adequate.
- The charts showing rapidly rising infection rates may be more a reflection of the number of tests performed than of the actual number of infections! One month ago, very few tests were conducted (here in the US). Maybe there are 3X as many infected people as we think. The true number is just not known.
- As the number of tests increases, we can expect the infection count to rise rapidly. But that conflates the number of tests performed with the actual infection count and rate of growth. If we could test everyone in the country every week, then we’d know the actual number of infections and rate of growth, but that is impossible. Conversely, if we wanted to keep the known infection count down, tests could be limited to 1000 per day. That would be a crazy and bad idea, but the point is that testing is expanding rapidly, so we must expect many more cases to be detected that would not have been detected before, which is precisely what is happening now.
- It is a certainty that there are far more infections than officially stated, since not everyone is tested and authorities have already stated that many people can be asymptomatic (and thus go untested).
- The actual mortality rate per infection is unknown. All we know is the mortality rate for those who tested positive, leaving out all infected persons who were never tested and recovered.
- Knowledge still seems to be limited on specifically why COVID-19 hits some people hard while others shrug it off. Age and existing conditions play a role, but genetics, gender, diet, and other factors may also be involved.
- Only when geographically widespread randomized testing is done (or many months pass with a large percentage of the population tested) can anything meaningful be said about how many people are or were infected (and does the test even detect those who were infected but have recovered?).
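Two of the points above can be sketched in a few lines of Python. The first part holds a hypothetical 10% positive share constant while testing ramps up; the second estimates prevalence from a hypothetical random sample using the Wilson score interval, a standard interval for a proportion chosen here for illustration. All numbers are invented.

```python
import math


def expected_positives(tests: int, positive_share: float) -> float:
    """Positives expected if the share of positive tests stays constant."""
    return tests * positive_share


def wilson_interval(positives: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (about 95% when z = 1.96).

    Only meaningful as a population estimate if the n people were randomly sampled.
    """
    p_hat = positives / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half, center + half


# 1) Constant (hypothetical) 10% positive share while daily testing ramps up.
for tests_per_day in (1_000, 2_000, 4_000, 8_000):
    positives = expected_positives(tests_per_day, 0.10)
    print(f"{tests_per_day:>5} tests/day -> {positives:.0f} positives")

# 2) Hypothetical random sample: 2,000 people, 30 positives.
low, high = wilson_interval(30, 2_000)
print(f"estimated prevalence 1.50% (95% CI {low:.2%} to {high:.2%})")
```

In the first part the daily count of positives rises eightfold while the underlying positive share never changes. The second part shows how much uncertainty remains even with a properly randomized sample of 2,000 people.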
Tony K writes:
Here in Alberta, the health authorities post the number of tests performed, and that is 2-3,000 per day. Fortunately, the positive results are just in the hundreds as of yesterday (542). And they have full understanding of the nature of the recent uptick of new cases. There were only 9 yesterday here in Edmonton, the lowest in two weeks. Before that, the increases in diagnosed cases were in the 10s and 20s, and early this week there was a 50% increase in new cases in one day.
DIGLLOYD: if and only if the number of tests per day is the same, and the number of positive results (CV19 infections) is 50% higher for that same number of tests than in the prior time period*, and the tests have very low false positive and false negative rates, and the testees were randomly sampled from the population... then that number would mean something of broader applicability. Why health authorities do not specify these essential facts is baffling: the data cannot be interpreted properly or used for intelligent public policy with such questions left unaddressed.
Numbers quoted like that without stating the key variables have no scientific, statistical or logical merit whatsoever with respect to the population as a whole, since the test subjects are self-selecting (showing signs of illness and therefore tested).
OTOH, I have little doubt that the disease is spreading fast.
* More precisely: what is the rate of positive results per person tested? (And that assumes an error-free test.)
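A small sketch of why a "50% increase in new cases" says little without the test count. The counts are hypothetical, not Alberta's actual figures: the same jump from 40 to 60 positives can mean a rising, flat, or falling positive rate depending on how many tests were run.

```python
def positive_rate(positives: int, tests: int) -> float:
    """Share of tests that came back positive (assumes an error-free test)."""
    return positives / tests


baseline = positive_rate(40, 2_000)  # hypothetical prior day: 2.0%

# The same "50% more positive results" under three hypothetical test volumes.
for tests_today in (2_000, 3_000, 4_000):
    today = positive_rate(60, tests_today)
    change = today / baseline - 1.0
    print(f"60 positives of {tests_today} tests: {today:.1%} "
          f"({change:+.0%} vs. {baseline:.1%})")
```

Only the first case, with an unchanged number of tests, corresponds to a genuine 50% rise in the positive rate; with 3,000 tests the rate is flat, and with 4,000 it falls by a quarter.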
Jonathan G writes:
Iceland is the only country thus far to do population testing, which, as you pointed out, is essential to getting this data (which is in turn absolutely essential to planning a response against an epidemic of any contagious disease). Without this data, any response plan is based on imaginary epidemiology. Not great news for herd immunity, unfortunately. It looks like the 1/20 or 1/10 numbers being speculated on were wishful thinking. Still unknown: 1) Do asymptomatic individuals spread the disease? Yes, no, or at a reduced rate? 2) Does catching the disease make you immune? A lot of people assume this is the case, but reality doesn't care about our assumptions. Individuals in Italy and China appear to have caught it again.
DIGLLOYD: Jonathan G’s phrase “population testing” does not tell me what was actually done. Does it mean randomized sampling across geographic areas? If so, that is highly significant.
However, the official Icelandic site does not say anything about the testing approach (at least not that I can find). I even downloaded the data; it is a trivial spreadsheet with nothing detailing methodology at all. There are numerous problems with the data as presented:
- Failure to state whether random sampling was done (highly unlikely; almost certainly only sick people were tested).
- Failure to state whether the rates are on a per-person-tested basis (e.g., the number of infections found for every 1000 tests performed). For example, if 1000 people are tested and 150 test positive (15%), and a week later 5000 people are tested and 300 test positive (twice as many), that is very good news, since the positive rate would be only 6% (300/5000), way down from 15%.
- Failure to call out the fact that the infections are highly localized (clustered), and thus form self-selecting samples with no known relevance to the population as a whole.
- In the infections-by-age graph, failure to compare the age distribution of the infected persons with the age distribution of the total population. I can guarantee that there is a ZERO infection rate for those aged 130 to 150 years, for example.
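The per-test and per-age points in the list above can be checked with a short Python sketch. The weekly figures are the worked example from the list; the age bands, case counts, and resident counts are entirely hypothetical and chosen only to show why a raw by-age chart needs population context.

```python
def positive_rate(positives: int, tests: int) -> float:
    """Share of tests that came back positive."""
    return positives / tests


def rates_by_age(bands: list[tuple[str, int, int]]) -> dict[str, tuple[float, float]]:
    """Map each band to (share of all cases, cases per 100,000 residents)."""
    total_cases = sum(cases for _, cases, _ in bands)
    return {
        band: (cases / total_cases, cases / residents * 100_000)
        for band, cases, residents in bands
    }


# Worked example from the list: positives double, the rate falls.
week1 = positive_rate(150, 1_000)
week2 = positive_rate(300, 5_000)
print(f"week 1: {week1:.0%}, week 2: {week2:.0%}")  # week 1: 15%, week 2: 6%

# HYPOTHETICAL age data: (age band, confirmed cases, residents in that band).
age_bands = [
    ("0-29", 120, 400_000),
    ("30-59", 180, 350_000),
    ("60+", 100, 150_000),
]

for band, (share, per_100k) in rates_by_age(age_bands).items():
    print(f"{band:>6}: {share:.0%} of cases, {per_100k:.1f} per 100,000 residents")
```

With these made-up numbers, the 60+ band accounts for the smallest share of cases (25%) yet has the highest rate per resident (about 66.7 per 100,000), which is exactly what a chart of raw counts by age hides.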
I can find no notes or footnotes or explanations of any kind as to these key details. I consider this extremely slipshod work/reporting.