This is meant as a comment on how statistics can be very easily moulded to reinforce any point that we want to make. That’s because statistics are a tool and as such you can use them for whatever purpose you want, be it good or evil, to educate or mislead. But the analogy goes even further because, like any other tool, a user who is not familiar with it can end up using it wrongly and produce unintended, and perhaps detrimental, effects.
In the case of statistics, this happens disturbingly often when they are simply given to journalists without a clear explanation of what they mean.
And now, to the point we’re trying to make: over the past weeks (months already!) we’ve been hearing every day about the latest figures for infections and deaths. The latter is just over 305,000 at the time of writing, Friday 15 May 2020. But there are some things that most people don’t understand about how extremely unreliable these figures are:
- This is a completely new disease, so there are no international standards whatsoever on how to deal with it. Even the diagnostic tests are still being developed, and every professional involved is having a hard enough time trying to deal with the tsunami on a day-to-day basis. So forget taking the time to keep and check an accurate tally of positives, recoveries and deaths.
- As a result, the figures reported by public agencies vary enormously, since each one follows whatever guidelines it has with whatever resources it has available. For example, some countries like the UK were only counting COVID deaths when patients had tested positive and died in hospital, thus leaving out all those in care homes, a very significant percentage. Consequently, the UK appeared to have a far lower mortality rate than countries like Belgium, which were counting the death of any COVID-positive patient, even if diagnosed over the phone.
- Of course, this difference in reporting can be down to countries following different guidelines, but also to a deliberate effort to downplay the number of cases and victims. Knowing how some governments control their population through media censorship and propaganda, the reliability of their reported figures should be taken with a big dose of scepticism. In fact, since statistics are called that way because they come from the state, here’s a map of the countries where they should be called sceptistics instead.
- Even with the best intentions, these figures depend on the sampling on which they are based. Unless every person in a country has been tested, we can’t know the exact number of infections. The next best thing is testing as many people as possible, because the more people we test, the more accurately we can model the spread. But even here there are pitfalls; it is not the same to test 100 people who come into a hospital as 100 individuals at random. And of course, some places are having their scarce resources completely swamped by the situation, to the point that they are having to use cardboard coffins and dig mass graves, so widespread testing is the last of their concerns.
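To make the sampling pitfall in the last point concrete, here is a minimal sketch in Python. Every number in it is invented for illustration: a population of 100,000, a true infection rate of 5%, and the assumption that an infected person is 20 times more likely to turn up at a hospital than a healthy one. The function name `simulate_testing` and its parameters are hypothetical, not taken from any real study.

```python
import random


def simulate_testing(population_size=100_000, prevalence=0.05,
                     hospital_bias=20, sample_size=100, seed=0):
    """Compare a random sample with a hospital-style (biased) sample.

    All defaults are made-up values chosen only to illustrate the effect.
    Returns (share infected in random sample, share infected in hospital sample).
    """
    rng = random.Random(seed)
    n_infected = int(population_size * prevalence)
    # True marks an infected person, False a healthy one.
    population = [True] * n_infected + [False] * (population_size - n_infected)

    # 100 individuals picked at random, without replacement.
    random_sample = rng.sample(population, sample_size)

    # 100 hospital visitors: infected people are `hospital_bias` times more
    # likely to show up. Drawn with replacement, which is fine for a sketch.
    weights = [hospital_bias if infected else 1 for infected in population]
    hospital_sample = rng.choices(population, weights=weights, k=sample_size)

    return sum(random_sample) / sample_size, sum(hospital_sample) / sample_size


random_rate, hospital_rate = simulate_testing()
print(f"Random sample:   {random_rate:.0%} infected")
print(f"Hospital sample: {hospital_rate:.0%} infected")
```

Under these assumptions the hospital sample should suggest an infection rate many times higher than the random one, even though both come from exactly the same population.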
All of this leaves us with the question: if each country is testing at a different rate, interpreting results differently and reporting figures with different levels of transparency, how can we ever know the truth?!?
The only way to avoid being told a lie, be it deliberate or involuntary, is not to ask. Instead, we must go and look for ourselves. The most reliable method (which is not to say 100% accurate) to understand the impact of the pandemic is to take the average number of deaths in previous years and compare it to this year’s figure. Luckily, some smart journalists have done this (BBC, New York Times, The Economist, among others) and the picture that emerges is a bit disheartening. For example, let’s look at the difference between deaths above the average figures and those attributed to COVID in Europe, one of the regions in the world with the best combination of good healthcare systems and reliable data.
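For anyone who wants to try the comparison themselves, the calculation can be sketched in a few lines of Python. The figures below are placeholders invented for the example, not real data for any country, and the function names `excess_deaths` and `undercount_ratio` are made up for this illustration.

```python
from statistics import mean


def excess_deaths(observed_this_year, deaths_previous_years):
    """Deaths above the average of previous years for the same period."""
    return observed_this_year - mean(deaths_previous_years)


def undercount_ratio(excess, official_covid_deaths):
    """How many excess deaths there are per officially reported COVID death."""
    return excess / official_covid_deaths


# Hypothetical figures for one country over the same weeks of each year.
previous_years = [10_000, 10_400, 9_600]   # average: 10,000
observed = 13_000                          # all deaths this year
official_covid = 2_000                     # deaths officially attributed to COVID

excess = excess_deaths(observed, previous_years)
print(excess)                                    # 3000
print(undercount_ratio(excess, official_covid))  # 1.5
```

A ratio above 1 means more people died than the official COVID count can explain. Keep in mind that this treats every excess death as related to COVID, which is an assumption and not a measurement.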
Each source considers slightly different time periods, but let’s assume that all the excess deaths are related to COVID, either directly due to the disease or indirectly to lack of healthcare resources (understanding that any assumption will be inaccurate but not completely off the mark). In that case, it seems that, except for the Swedish (who else?) and the Belgians (didn’t see that one coming, but I suspect there’s also an issue with how they count positive cases) the rest of Europe is not doing a very good job of measuring the true impact of COVID.
What this means is that, if the rest of the world were doing the same job as Europe, the official figure of deaths due to COVID should be about 50% higher. So those 305,000 deaths mentioned earlier would actually be over 457,000.
But again, this is what we are seeing in Europe, where everyone can go to hospital if they need to and there is transparency in the figures reported, so we can safely assume that the picture for the rest of the world is even worse. In fact, the same sources also tell us this:
In this scenario, those 305,000 deaths would be just one tenth of the real figure and we would have over three million victims of COVID.
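The arithmetic behind the two scenarios is simple enough to check in Python. The multipliers come straight from the text above (half again higher, and ten times higher); the function name `adjusted_deaths` is just a label made up for this sketch.

```python
OFFICIAL_DEATHS = 305_000  # official global figure on Friday 15 May 2020


def adjusted_deaths(official, multiplier):
    """Scale the official figure by an assumed under-reporting multiplier."""
    return official * multiplier


europe_like = adjusted_deaths(OFFICIAL_DEATHS, 1.5)  # rest of world counts like Europe
worst_case = adjusted_deaths(OFFICIAL_DEATHS, 10)    # official figure is one tenth of reality

print(f"{europe_like:,.0f}")  # 457,500
print(f"{worst_case:,.0f}")   # 3,050,000
```

Both results are only as good as the multipliers fed in, which is precisely the point.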
This is what we see:
But the reality is somewhere between these extremes:
Why is this important? Because as simple human beings we like to have a clear number to which we can anchor ourselves. Conversations, opinions and important decisions are based on official figures. But, as we have seen, the ONLY thing we know for certain about those official figures is that they are NOT real!