Testing, testing ... coronavirus and a dangerous drug called Fiction
A lot of people are being tested for the novel coronavirus that causes COVID-19. (Not enough in my neck of the planet, thanks to our communicable-disease-in-chief, but that's another story.) But something is missing from the endless commentary on the crisis: good information - or any, as far as I can tell - on how accurate the tests are. So far, in fact, I have not even seen the usual kind of totally misleading claim to the effect that the tests are "95% accurate!" - or "90%!" or whatever. "Dogs can detect lung cancer by smell alone in 87.3% of cases." "New AI can tell which mugshots are of gay people with 91.4% accuracy." You've read the stories.
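The base rate fallacy is really, really important. It's the mistake of judging what a test result means while ignoring how common (or rare) the thing being tested for is in the first place. I like to introduce it to my students with this story: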
(1) There's a dangerous new drug going around called Fiction. Kids who take this stuff go glassy-eyed, don't sleep enough, even forget to eat meals sometimes, and spend hours and hours in a chair staring into a book only to start muttering to themselves, or jabbering to friends and family, about people, places and events that aren't even real. Scary - and research indicates that about 5% of the population has been sucked in by this evil stuff.
(2) I'm the Admissions Officer for Ridiculously Expensive University. I'm all about reality-based education, and I'm putting my foot down this year: any little schmuck who's been doing Fiction in the high school janitor's closet, their application goes straight out the window.
(3) Luckily I've developed a test that will tell me whether you've been snorting J.K. Rowling's product, or whatever. (She's one of the world's most successful pushers of Fiction, apparently, and has already ruined the lives of millions.)
(4) You are one of 1,000 applicants to REU this year; I've tested every last applicant - and your test came back positive!!!
(5) Now, let's say we've established independently that the test is 90% accurate. In studies involving known Jane Austen junkies currently incarcerated (people caught red-handed with a flashlight, a lace bonnet, and a copy of Pride and Prejudice), it correctly identifies them nine times out of ten. And it's just as good the other way round: nine times out of ten, it correctly clears people who have never touched the stuff.
(6) So: What's the probability that you, with the positive test, are in fact a user?
Doctors have been presented with similar cases and have proved - just like most people - hopelessly unable to get the answer right, or even to sketch roughly what steps would lead to it. Common responses to the probability question: "Er, nine out of ten, obviously!" "Er, about fifty-fifty?" "Er... fairly high?"
The right answer - and especially the way to it - is worth noting; it will help clarify things when you next come across one of those stories with a lot of exclamation marks about AI's ability to identify the mugshots of gay dogs, or whatever.
The 1,000 applicants fall into four groups:
(1) About 50 (5% of 1,000) were using Fiction. Of them, 45 (90%) will be fingered by the test. That's 45 true positives ...
(2) ... but the other 5 out of that 50 - who also took Fiction - will be missed by the test: these are the 5 false negatives.
(3) 950 applicants (95% of 1,000) never indulged in this terrible habit. And the test ("90% accurate!!") will say so for 855 of them: the true negatives.
(4) But the remaining 10% of the 950, a group of 95 innocents, will get a false positive.
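The four-group bookkeeping above is mechanical enough to write down in a few lines of Python. This is a minimal sketch for illustration only: `expected_counts` and its parameter names are made up here, and it computes expected group sizes rather than simulating anyone's real test. It lets you set the hit rate on users (sensitivity) and the clear rate on non-users (specificity) separately; the Fiction story uses 90% for both.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    true_pos: float   # users the test flags
    false_neg: float  # users the test misses
    true_neg: float   # non-users the test clears
    false_pos: float  # non-users the test flags anyway


def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected size of each of the four groups (hypothetical helper)."""
    users = population * prevalence      # e.g. 5% of 1,000 applicants
    non_users = population - users       # everyone else
    true_pos = users * sensitivity       # users the test catches
    false_neg = users - true_pos         # users the test misses
    true_neg = non_users * specificity   # innocents correctly cleared
    false_pos = non_users - true_neg     # innocents flagged anyway
    return Counts(true_pos, false_neg, true_neg, false_pos)


if __name__ == "__main__":
    # The numbers from the Fiction story: 1,000 applicants, 5% users, 90%/90% test.
    print(expected_counts(1000, 0.05, 0.90, 0.90))
```

Plugging in the story's numbers gives back the four groups listed above: 45, 5, 855 and 95.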
The probability that your test actually tells us what it looks like it tells us ("Take a hike, you miserable plot-head!") is simply the probability that a positive result is a true positive. Which is the true positives divided by all the positives, or TP / (TP + FP). Which is 45 / (45 + 95). Which is 45/140.
Which is not quite 33%. In other words, about two out of three applicants who test positive (95 of the 140) have never touched the stuff.
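The same answer falls out of Bayes' theorem if you work with probabilities instead of head counts. Here the 0.10 in the denominator is the chance that a non-user tests positive anyway, i.e. 1 minus the 90% rate at which the test clears non-users:

```latex
P(\text{user} \mid +)
  = \frac{P(+ \mid \text{user})\,P(\text{user})}
         {P(+ \mid \text{user})\,P(\text{user}) + P(+ \mid \text{non-user})\,P(\text{non-user})}
  = \frac{0.90 \times 0.05}{0.90 \times 0.05 + 0.10 \times 0.95}
  = \frac{0.045}{0.045 + 0.095}
  = \frac{0.045}{0.140}
  \approx 0.321
```

Multiply the numerator and denominator by 1,000 and you are back to 45/140.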
How many people really have COVID-19 but tested negative and are walking around free? How many are being quarantined even though they don't have the disease they tested positive for? No idea! And the sad thing is, I can't find a single journalist even asking these questions.
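To see why those missing numbers matter, here is a rough sketch that keeps the hypothetical 90%/90% Fiction test and varies only how common the condition is. The prevalences below are purely illustrative, not estimates for COVID-19, and real tests have their own (different) sensitivity and specificity; the function names are made up for this example.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(has the condition | positive test), via Bayes' theorem."""
    true_pos = sensitivity * prevalence              # share of all tested who are true positives
    false_pos = (1 - specificity) * (1 - prevalence) # share of all tested who are false positives
    return true_pos / (true_pos + false_pos)


def false_negatives_per_1000(prevalence, sensitivity):
    """Expected number of people per 1,000 tested who have it but test negative."""
    return 1000 * prevalence * (1 - sensitivity)


if __name__ == "__main__":
    # Illustrative prevalences only; the 90%/90% test is the one from the story.
    for prevalence in (0.01, 0.05, 0.20, 0.50):
        ppv = positive_predictive_value(prevalence, 0.90, 0.90)
        missed = false_negatives_per_1000(prevalence, 0.90)
        print(f"prevalence {prevalence:>4.0%}: P(has it | positive) = {ppv:.0%}, "
              f"missed per 1,000 tested = {missed:.0f}")
```

At 5% prevalence this reproduces the story's 45/140; at 1% the chance that a positive is real drops to about 8%, while at 50% it climbs to 90%. Same test, very different meaning, which is exactly why "90% accurate!" on its own tells you so little.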