Statistics of Medical Tests
Inspired by a recent video by 3Blue1Brown, I want to talk about the mathematics of probability, particularly in relation to medical tests. If you have seen the video in question, feel free to skip this post. If you prefer Grant Sanderson’s visual presentation over my writing, head over to his YouTube channel and give the original video a watch: The medical test paradox: Can redesigning Bayes rule help?. In this post I’m going to explain the same concept in my own words and provide some thoughts on its real-life application.
Imagine that there is a disease which affects 1% of the population. There is a test out there to check whether or not someone is afflicted with the disease, and it has 90% accuracy. This means that, of the people who have the disease, 90% will correctly receive positive results (they’re sick and the test says they are sick) and 10% will incorrectly receive negative results (they’re sick but the test says they are not sick). Of the people who don’t have the disease, 90% will correctly receive negative results (they’re not sick and the test says they are not sick), and 10% will incorrectly receive positive results (they are not sick but the test says they are sick). We decide to conduct the test on the entirety of the population, regardless of things like symptoms, previous history with the disease, etc. Let’s say that you’re one of the people in said population, and you receive a positive test. Given the numbers above, what is the probability that you’re sick?
The gut reaction might be to say 90%. After all, the test is 90% accurate, so if my test came back positive, then there’s a 90% chance it’s correct and a 10% chance it’s wrong. But let’s approach it mathematically first. Let’s say that we have a population of 1,000. With 1% affected, that means 10 people have the disease and 990 do not. With 90% accuracy, 9 sick people will receive positive results and 1 sick person will receive a false negative result. Of the healthy people, 891 (90% of 990) will receive negative results, and 99 will receive false positive results. In total, 108 people received a positive result, but only 9 of them are actually sick. So in fact, your chance of being sick is 9 in 108, or around 8.3%. That may sound disappointingly small, but it is a significant improvement over the 1% probability we started off with, isn’t it? Doing the same calculations will also tell you that, with a negative test result, your chances drop from 1% to just 0.1%.
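The counting argument above is easy to check in code. Here is a minimal sketch in Python (the variable names are my own, chosen for illustration):

```python
# Count the outcomes in the 1,000-person example from the text.
population = 1000
prevalence = 0.01  # 1% of people have the disease
accuracy = 0.9     # 90% of test results are correct

sick = population * prevalence               # 10 people
healthy = population - sick                  # 990 people

true_positives = sick * accuracy             # 9 sick people test positive
false_negatives = sick * (1 - accuracy)      # 1 sick person tests negative
true_negatives = healthy * accuracy          # 891 healthy people test negative
false_positives = healthy * (1 - accuracy)   # 99 healthy people test positive

all_positives = true_positives + false_positives  # 108 positive results
print(true_positives / all_positives)             # ≈ 0.083, i.e. 9 in 108
```

Note that the population size only matters for the headcounts; the final ratio is a pure proportion.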
The mathematics used to arrive at these numbers is as follows. We start with three variables:

A // Population size
B // Prevalence of the disease in said population
C // Accuracy of the test

In the case above, A is 1,000, B is 1% or 0.01, and C is 90% or 0.9. Then we have the following two expressions:

A * B * C
(A - B*A) * (1 - C)

The first one is the 90% of the 1% of the population, i.e. the true positives. The second is the 10% of the remaining 99% of the population, i.e. the false positives. You add the two numbers together and divide the first one by the sum. In the example above, that means doing the following:

A * B * C = 1000 * 0.01 * 0.9 = 9
(A - B*A) * (1 - C) = (1000 - 0.01*1000) * (1 - 0.9) = 990 * 0.1 = 99
9 + 99 = 108
9/108 = 1/12 ≈ 8.3%

If you instead wanted to calculate the probability given a negative test result, the same expressions apply with the roles swapped: the true negatives are the 90% of the healthy 99%, and the false negatives are the 10% of the sick 1%:

1000 * 0.99 * 0.9 = 891
(1000 - 0.99*1000) * (1 - 0.9) = 10 * 0.1 = 1
891 + 1 = 892
891/892 ≈ 99.9%

That is the chance that a negative result is a correct diagnosis, so the chance of being sick despite a negative result is 1 - 0.999 = 0.001 = 0.1%.
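The positive and negative cases can be wrapped into two small functions using the same formula; this is a sketch, with function names of my own invention:

```python
def posterior_positive(prevalence, accuracy):
    """Chance of being sick given a positive result."""
    true_pos = prevalence * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)

def posterior_negative(prevalence, accuracy):
    """Chance of being sick despite a negative result."""
    false_neg = prevalence * (1 - accuracy)
    true_neg = (1 - prevalence) * accuracy
    return false_neg / (false_neg + true_neg)

print(posterior_positive(0.01, 0.9))  # ≈ 0.083, the 9/108 from above
print(posterior_negative(0.01, 0.9))  # ≈ 0.001, the 1/892 from above
```

Dividing by A on both sides of the fraction shows why the population size cancels out: only the prevalence and the accuracy matter.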
A good way to think about it is through geometry. We can imagine the entire population as a blue square, with the sick 1% as a red section of that square. Asking for the probability of being sick is then the same as asking what percentage of the blue square is red.
The above image, 500x200 pixels in size, represents our population of 1,000 people. Every 10x10 area is a single person. The red area, at 10x100 pixels, is 1% of the total area, just as 10 people are 1% of a population of 1,000.
The area inside the green square represents all the people who tested positive. It includes 90% of the sick people and 10% of the healthy people. Zooming in, you will see that red pixels now make up 1/12th of the area, precisely as we calculated. The reason I bring up the geometric approach is that it makes repeated tests simpler to visualise. Let’s say that, after one test that came back positive, you do a second test to validate the result of the first.
Again, 90% of the red area is covered, as well as 10% of the blue area. The new area is just 1,800 pixels (or 18 people), of which 810 pixels are red, or 45%. So with just two tests, we increased our probability from 1% to 8.3% and then to 45%. Especially interesting is what’s left over. The remaining 9,000 pixels are the people who, after getting a positive test, got a negative result on the second try. Calculating the area, it turns out that red space is just 90 pixels, or exactly 1% of it. In other words, despite initially getting a positive result, their chances are now right back at the population average: with equal accuracy on sick and healthy people, the negative result exactly cancels out the earlier positive one.
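A sketch of the repeated-test idea: each result updates the current probability, which then serves as the prior for the next test. The exact arithmetic gives 45% after two positives, and a positive followed by a negative lands back at exactly 1%, since with equal accuracy on sick and healthy people the two updates cancel (pixel counts read off an image will round slightly differently).

```python
def update(prior, accuracy, positive=True):
    """One Bayesian update of the chance of being sick, given one test result."""
    if positive:
        p_sick = prior * accuracy              # sick and tested positive
        p_healthy = (1 - prior) * (1 - accuracy)  # healthy but tested positive
    else:
        p_sick = prior * (1 - accuracy)        # sick but tested negative
        p_healthy = (1 - prior) * accuracy     # healthy and tested negative
    return p_sick / (p_sick + p_healthy)

p = update(0.01, 0.9, positive=True)      # ≈ 0.083 after one positive test
print(update(p, 0.9, positive=True))      # ≈ 0.45 after two positive tests
print(update(p, 0.9, positive=False))     # ≈ 0.01: positive, then negative
```

The same function can be chained for any sequence of results, which is exactly what the nested squares in the images depict.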
Now, some of the things I’ve said above are imprecise. For example, I said that a false negative test “says they are not sick”, despite this whole post being about how a test result only shifts probabilities rather than delivering a verdict. I wonder if you noticed that. Either way, another area of interest is how exactly the results differ as the variables change. The population size actually makes no difference. In the examples above, a 10x10 pixel area denoted one person, but it could just as well have denoted a million people, and the percentage results would have been the same. Whether the population is 1,000 or 1,000,000, the percentages remain the same; it’s just that 10% of 1,000 is 100, while 10% of 1,000,000 is 100,000. Playing around with the other two variables, we can see that the more prevalent a disease is, the less valuable the test becomes. At a 1% disease rate, a positive result from a 90% accurate test increases the chances to 8.3%, over eightfold. At a 10% disease rate, the same test gives you a 50% chance. The absolute value is much larger, but relative to the previous chances, the test increased them only fivefold. At a 50% disease rate, your new chance will be 90%, not even a twofold increase. Below, the graph shows the relation between disease rate, x, and the resulting probability of being sick, y.
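The prevalence comparison above can be reproduced directly from the formula (a sketch; the function name is mine):

```python
def posterior_positive(prevalence, accuracy=0.9):
    """Chance of being sick given a positive result from a 90% accurate test."""
    true_pos = prevalence * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)

for prev in (0.01, 0.10, 0.50):
    post = posterior_positive(prev)
    print(f"{prev:.0%} disease rate -> {post:.1%} ({post / prev:.1f}x increase)")
# 1% disease rate -> 8.3% (8.3x increase)
# 10% disease rate -> 50.0% (5.0x increase)
# 50% disease rate -> 90.0% (1.8x increase)
```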
If instead we keep the disease rate constant and change the accuracy, we’ll see, among other things, that a test that’s accurate only 50% of the time is essentially worthless: no matter how many tests you do, the proportion of sick to healthy people among the positives remains constant. The value of the test increases as its accuracy increases, which is rather obvious, but the increase is not linear, so at very high values even a slight improvement in accuracy leads to much better value. At a 1% disease rate, an 80% accurate test increases the chances to 3.9%, while an 81% accurate test reaches 4.1%. Meanwhile, a 98% accurate test increases the chances to 33%, while a 99% accurate test gives a whopping 50% chance. Here’s a graph showing how it changes (x is the test accuracy, y is the resulting probability of actually being sick):
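The accuracy sweep can be checked the same way, holding the disease rate at 1% (again a sketch using the formula from the text):

```python
def posterior_positive(prevalence, accuracy):
    """Chance of being sick given a positive test result."""
    true_pos = prevalence * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)

for acc in (0.50, 0.80, 0.81, 0.98, 0.99):
    print(f"{acc:.0%} accurate -> {posterior_positive(0.01, acc):.1%}")
# 50% accurate -> 1.0%   (no information gained: still the base rate)
# 80% accurate -> 3.9%
# 81% accurate -> 4.1%
# 98% accurate -> 33.1%
# 99% accurate -> 50.0%
```

Note the jump at the top of the range: one extra point of accuracy, from 98% to 99%, gains more than the entire step from 80% to 98%.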
Real life, sadly, is much more complicated than a simple mathematical equation. For starters, let’s look at the test accuracy. I’ve simplified it to a single number, which assumes that the false negative rate and the false positive rate are the same. In practice, there’s no reason for that to be the case. A test could have, for example, 80% accuracy on sick patients and 95% accuracy on healthy patients (in medical terminology, its sensitivity and specificity). Furthermore, individual variation can come into play, with tests reacting unpredictably to certain patients. How a test is conducted and what kind of test it is both influence these numbers. The disease rate is just as problematic. It assumes that everyone has an equal chance, which is obviously not the case. An immunocompromised person is much more likely to contract the disease than one with a fully functional immune system. Once the disease is in the system, an 80-year-old is much more likely to die from it than a 20-year-old. Precautions taken matter as well, and most of all, symptoms. A person with a runny nose and a cough is far more likely to have the flu than a person with neither.
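Splitting the single accuracy number into separate rates for sick and healthy patients barely changes the formula. A sketch, using the hypothetical 80%/95% figures from the paragraph above:

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """Chance of being sick given a positive result.

    sensitivity: accuracy on sick patients (true positive rate)
    specificity: accuracy on healthy patients (true negative rate)
    """
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# 1% prevalence, 80% accuracy on the sick, 95% on the healthy
print(posterior_positive(0.01, sensitivity=0.80, specificity=0.95))  # ≈ 0.14
```

Interestingly, this asymmetric test beats the symmetric 90% one (14% vs. 8.3% after a positive result), because false positives from the huge healthy majority are what dilute the result, and here the healthy side is tested more accurately.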
Nothing I said above is revolutionary. Nearly all of it is common knowledge. The trick is how to apply that knowledge to the mathematical formulas. If the average rate of a certain disease across the entire population is 1%, how much does it change when someone shows a symptom or two? If we are talking about heart disease, how much do pre-existing heart conditions influence it? How big a role does diet play? Are there significant variations due to genes? At the time of writing this, Germany has had 1.8 million confirmed cases of Covid-19 among a population of 83 million. Put into this very simplified formula, that’s a disease rate of about 2.2%, with the real percentage likely higher due to delays in reporting. A positive result on a hypothetical Covid test with, say, 80% accuracy would then raise a person’s chances to around 8%. Is that enough to send them to a hospital, or force them to self-isolate? What do these chances look like among older people, or among smokers? How effective are masks, hand hygiene, etc.? These are all very difficult questions, as they require extensive studies and expertise, and the people who are forced to make decisions based on them have to shoulder the burden of many deaths no matter which path they choose.
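Recomputing the back-of-the-envelope Covid numbers from the raw counts (the 80% accuracy figure is hypothetical, as in the text): 1.8 million out of 83 million is roughly a 2.2% rate, and a positive result on such a test would raise the chance to roughly 8%.

```python
cases = 1_800_000        # confirmed cases at the time of writing
population = 83_000_000  # population of Germany
prevalence = cases / population
print(f"{prevalence:.1%}")  # 2.2%

accuracy = 0.8  # hypothetical test accuracy
true_pos = prevalence * accuracy
false_pos = (1 - prevalence) * (1 - accuracy)
print(f"{true_pos / (true_pos + false_pos):.1%}")  # 8.1%
```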