In Part 1 of this series I provided an overview of the novel coronavirus, coming at us at exponential speed, and of how and why healthcare systems are being overwhelmed.
Here in Part 2 we’ll look at how advanced analytics and artificial intelligence are being used to augment current efforts to track and prevent further infection. But there is a lot of AI hype, too.
Part 3 will be devoted to the politics of pandemics, how politicians are belatedly realizing that, as health systems buckle and deaths mount, they will have to weather the storm. We examine how America, despite its wealth and the excellence of its medical science, has squandered its chance to prepare for the pandemic, and how China’s president, Xi Jinping, celebrated a precipitous fall in cases with a victory lap in Wuhan.
In Part 4 the focus will be on our “socioeconomic design” principles (companies do profits; people do themselves). The rapid spread of COVID-19 imposed massive disruption on international travel networks, integrated economies and globalized supply chains. It highlights the risks that accompany the benefits we all enjoy from the free movement and connectivity we take for granted today, as well as the extraordinary degree to which Western countries now depend on China for critical commodities. And irony. The U.S. imports most of its surgical masks and more than 90 per cent of its pharmaceuticals from China.
What systemic risks will be magnified? Will early warning systems be changed? Early warning systems can be technological, such as systems that health systems already use to share anomalous illnesses globally. And what effect on supply chains, and scientific and medical research? Nicolas Granatino and Julia Belluz (among others) have begun analysing all of this.
And Part 5, conclusions and the long view. Taking a tip from media mate Patrick Wyman who creates marvellous history podcasts, I’ve been thinking about the end of the Roman Empire and what it must have felt like to live through that, in all its various manifestations – the collapse of political authority, spreading pandemics, economic crises. Today’s societal meltdown … on-so-many-levels … makes a lot more sense. I’ll include some bits and pieces from Exponential View, the brilliant newsletter published each weekend (with mid-week supplements) by Azeem Azhar, one of those chaps who bridges the gap between two cultures – that of technology on one end, and humanities on the other – to provide a holistic understanding of our near future.
15 March 2020 (Sliema, Malta) – I find myself in the enviable position … if that is the correct term … as COVID-19 rages all around me. I split my time between homes in Greece and Malta, plus Brussels where my wife works, and Rome (where my video crew/production facilities are based) so I have been receiving coronavirus feedback from multiple sources across Europe. Add to that a lot of time to read just about everything I can and to call multiple sources and voila … I can consolidate a lot of information into what I hope will be a concise and helpful look at this pandemic.
In today’s piece I will attempt to provide some perspective on the data science being employed to help us wrangle with this horror. But to do this correctly, you need to understand the data, and you need to understand the medical science. As I noted in Part 1 of this series, in China they only began to understand the exponential speed of the spread of the virus when they looked at the numbers in retrospect. The testing had been so sparse that none of the available numbers meant much; they were simply too unreliable.
Note: this is happening in the U.S. There are interpretations of the available data that draw a much milder picture of the spread of the virus, but these can hardly be convincing yet. The testing has been minimal. This is why the numbers in Italy spiked. As they widened testing, they realized they had completely underestimated the extent of infections.
In the UK, a bigger issue: the government is not sharing the evidence, data, and models on which it is basing its “COVID-19 solution” policies. Without complete openness the public’s trust will be lost, solutions questioned.
Those of us who are data junkies know this: unless you understand the data, and understand its reliability and unreliability, your “data science” will be meaningless. And applied to the whole pandemic situation, you need to retrofit the conditions to the numbers. Why? As Joshua Stern explained to me:
The causal model that works best has the disease spreading weeks or months earlier than any of the current timelines, and what appears to be disease spread is really data discovery. There may be thousands of deaths before January that were actually COVID-19, but nobody knew that was even a category. So the exponential growth is a growth in discovery, and the disease growth, while perhaps still exponential, is a smaller exponential. But there are half a dozen issues about the testing as well, which uses different tests, different standards, different reporting, etc. What is needed is inference to the best explanation, not just a blind data reduction. And much more testing, 100x more testing. And treatments, and vaccines.
Joshua is a “LinkedIn buddy”, part of my coterie of friends and advisers whom I have consulted on this series. Over the past twenty years, Joshua has split his time between IT and AI, working in IT and separately pursuing foundational work in AI. He is a fascinating chap who has worked mostly on Microsoft platforms including Azure, SQL Server maximum performance and critical enterprise systems, and we have discussed numerous topics based on his own research on the conceptual and foundational aspects of reason, knowledge, and intelligence. The philosophical term Computational Theory of Mind (CTM), an aspect of philosophy of mind, probably captures it best. You’ll meet more of my coterie in this series.
And to do the data science properly, or at least to understand the data science you’ll read about, you need to know the real science, in this case the epidemiology which is the branch of medicine which deals with the incidence, distribution, and possible control of diseases and other factors relating to health. If you do, it makes the data science easier to understand. So before we get into the Big Data analytics and AI applications, I’ll run through a few key medical points.
But first, even though the politics of pandemics is reserved for Part 3, a few words about The Orange Menace.
Donald Trump observes the coronavirus pandemic
The U.S. is a sociology experiment gone horribly wrong. I still have difficulty talking about and writing about this stuff because although I am no longer a U.S. citizen I find myself irrevocably tangled in America’s hopes, arrogance, and despair.
In the United States, the pandemic has devolved into a kind of grotesque caricature of American federalism. The private sector has taken on quasi-state functions at a time when the executive branch of government – drained of scientific expertise, starved of moral vision – has taken on the qualities of a failed state. In a country where many individuals, companies, institutions, and local governments are making hard decisions for the good of the nation, the most important actor of them all – the Trump administration – has been a shambolic bonanza of incompetence.
As I watched The Donald’s various pronouncements these past few days it merely further confirmed what I have written about the last few years: the virus infecting U.S. politics … a decades-long drift into entertainment and triviality … has finally collided in real time with the actual virus that’s paralyzing American communities. The question is whether both might soon burn themselves out.
Watching his last press conference was painful. As Zeynep Tufekci noted on her blog this weekend it felt like
watching the waves recede on a crowded beach, with the tsunami warning blaring but with the resort management that wants to keep selling kitschy souvenirs playing loud music over it. It’s the worst I felt for the U.S. in months of following this.
The biggest negative is that the U.S. is late, and on its own. That is an unnecessary tragedy. I will delve into the politics in more detail in Part 3 but I was simply disgusted by the theatre. Fox News and Trump’s other enablers kept the same mantra: for two days it was “It’s a Democratic hoax, don’t fall for the hysteria” and then they switched to “The Democrats aren’t doing enough to avoid disaster!” Pure propaganda, and always dangerous.
Epidemiology is trying to outrun the wave. Every late day means you have to run even faster – much faster. Don’t self-soothe. Yes, you can mitigate the situation (more on how below) and that effort must be massive. But you need to realistically look at the scenarios because bending this curve requires so, so much more.
And, no. The U.S. cannot do what the Chinese did, and continue to do. As I noted in Part 1 of this series, the Chinese government essentially used a social nuclear weapon in its efforts. China has a collectivist culture and an authoritarian government, so its success fighting COVID-19, while deeply impressive, will not be easy to reproduce elsewhere. Culturally the U.S. cannot do that. But to see how coronavirus testing works in a country that takes the problem seriously, watch this. It’s 6 minutes long but worth your time:
A good place for a shoutout: I am indebted to my videographer, Marco Vallini, who has done yeoman work in assembling/editing/uploading the video clips and other graphics I am using for this series. And to the team at Valossa. They have an incredible full stack AI solution designed for anybody working with video. It has allowed me to search 1000s of hours of video across Twitter, YouTube, and numerous media channels/sites to find the specific clips I want/need. I have been working with them for 2+ years. For my piece on the magic Valossa performs, click here. The most common visual representation of the COVID-19 threat, in video or still image, is a masked figure in a banal setting. Masked on a bus, in a restaurant, at the shops, in the park. Yes, bland normality. The first instinct of the amateur semiotician is to focus on the meaning of the mask. But I wanted to get beyond that and find a mix of (sometimes) wacky and informative clips to press my points. That requires a 5 star search capability.
Now, onto some science.
Epidemiology is the study and analysis of the distribution, patterns and determinants of health and disease conditions in defined populations. It is a cornerstone of public health, and shapes policy decisions and evidence-based practice by identifying risk factors for disease and targets for preventive healthcare. It is epidemiologists who determined the epidemiology of 2019 novel coronavirus disease (COVID-19) in a remote region of China.
You have all seen the phrase “novel coronavirus” which means a new coronavirus strain that has not been previously found in people. Coronaviruses are a large family of viruses that can infect humans or animals. Sometimes an animal coronavirus can change so that it can infect people and become a human coronavirus. There are seven known types of human coronaviruses. I will not delineate all seven but two of them you have certainly read about:
• Severe Acute Respiratory Syndrome (SARS) is a type of coronavirus infection discovered in China in 2002. The virus that causes SARS quickly spread to more than two dozen countries in North America, South America, Europe and Asia before it was controlled. During the 2002-2003 outbreak, nearly 8,100 people became infected. In the United States, eight people with laboratory-confirmed SARS infection were identified and they had traveled to areas where the virus was spreading. Since 2004, no cases of SARS have been reported in the world.
• Another type of coronavirus infection is Middle East respiratory syndrome (MERS). Since it was discovered in 2012, nearly 2,500 people with MERS have been identified. All these cases have been linked to travel to or residence in and near the Arabian Peninsula. Countries in or near the Arabian Peninsula include Bahrain, Iraq, Iran, Israel, the West Bank and Gaza, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, Syria, the United Arab Emirates, and Yemen. Two people in the United States have had MERS and both traveled to Saudi Arabia where they likely became infected.
I had the opportunity to speak with several epidemiologists at Johns Hopkins University in the U.S. Their Department of Epidemiology is the oldest, and among the largest, in the world. Their website has become the “go to” site for current, accurate information on COVID-19.
I wanted a simple explanation of what is going on to satisfy my multi-level reader base so here is a mash-up of several conversations:
-Any infectious disease has a basic reproduction number: the number of people that one infected person will go on to infect, absent any immunity (known as R0)
-Let’s say that number is 3. So if I have the disease, I’ll infect 3 other people on average.
-But if 2 out of 3 of those people become immune, I can’t infect them. So where once I’d have infected 3 people, now I only infect 1. The epidemic can’t grow. And if you can get the effective reproduction number (R0 adjusted for immunity and control measures) below 1, so each infected person infects *fewer* than one person on average, the disease eventually fizzles out. Take measles as an example. We started to get measles outbreaks because idiotic anti-vaxxers tipped the balance so that the effective number rose above 1 and each infected person started infecting more people.
-Now you can get to the magic tipping point by creating herd immunity [note to my readers: I explain that below], either through vaccinating or letting people catch the disease.
-Or you can do it another way. You can take preventative measures (social distancing, hand washing) so I as the carrier simply don’t infect 2 of the 3 people I’d otherwise have infected. This is maybe better thought of as herd protection, rather than immunity.
-The trouble with the herd protection, rather than immunity, is that it only lasts as long as the controls are in place. Once they’re lifted, you’re vulnerable to another epidemic. Longer term, therefore, achieving immunity is arguably a better solution.
-That, at least, is a purely mathematical argument. It doesn’t make value judgements about whether we should accept that some people will die. But there it is.
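The arithmetic in that conversation can be sketched in a few lines of Python. This is only an illustration under the textbook assumption of a well-mixed population; the function names are mine, and R0 = 3 is the made-up number used above, not an estimate for COVID-19.

```python
def effective_r(r0: float, immune_fraction: float) -> float:
    """Average number of people one case infects when a fraction is immune."""
    return r0 * (1.0 - immune_fraction)


def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction at which the effective number falls to exactly 1."""
    return 1.0 - 1.0 / r0


r0 = 3.0  # illustrative value from the text, not a COVID-19 estimate
print(round(effective_r(r0, 2 / 3), 3))        # two of three contacts immune
print(round(herd_immunity_threshold(r0), 3))   # fraction that must be immune
```

The same `effective_r` function covers herd protection: treat `immune_fraction` as the share of would-be infections blocked by distancing and hand washing rather than by immunity.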
“Herd immunity” and “flattening the curve”
Most of you have been following developments in the UK. Boris Johnson and his senior science advisors gave a presentation on Britain’s approach to tackling coronavirus. It was controversial. Rather than a sharp whack to stop the spread of the disease, the idea is to let it roll through the population. Anthony Costello, a former director at the WHO, raises a series of important challenges to that approach. And can we flatten the curve enough, or is it a deadly delusion?
Note: I am indebted to Azeem Azhar, writer of the weekly newsletter “Exponential View”, for the two links in the paragraph above. Azeem is really on top of this. He posted those two links in today’s newsletter. I cannot recommend this enough: subscribe to his newsletter. He even offers you the option to just give it a free look to decide. Just click here.
Those of you who receive my science newsletter will realise that the use of “herd immunity” here is different – fluid rather than static – from what is more usually understood/used. Please read the two links above for a thorough understanding. But if you want a quick scan of what the UK is doing, here is a clip from the blog of Professor Ian Donald. He runs social and environmental research at the University of Liverpool, specializing in behavioural factors in anti-microbial resistance:
-The UK government strategy on the Coronavirus is more refined than those used in other countries and potentially very effective. But it is also riskier and based on a number of assumptions. They need to be correct, and the measures they introduce need to work when they are supposed to.
-This all assumes I’m correct in what I think the government is doing and why. I could be wrong – and wouldn’t be surprised. But it looks to me like a UK starting assumption is that a high number of the population will inevitably get infected whatever is done – up to 80%. As you can’t stop it, it is best to manage it. There are limited health resources so the aim is to manage the flow of the seriously ill to these.
-The Italian model aims to stop infection. The UK’s wants infection BUT of particular categories of people. The aim of the UK is to have as many lower risk people infected as possible. Immune people cannot infect others; the more there are the lower the risk of infection.
-That’s herd immunity. Based on this idea, at the moment the govt wants people to get infected, up until hospitals begin to reach capacity. At that point they want to reduce, but not stop, the infection rate. Ideally they balance it so the number entering hospital = the number leaving.
-That balance is the big risk. All the time people are being treated, other mildly ill people are recovering and the population grows a higher percent of immune people who can’t infect. They can also return to work and keep things going normally – and go to the pubs.
-The risk is being able to accurately manage infection flow relative to health care resources. Data on infection rates needs to be accurate, and the measures they introduce need to work at the time they want them to and to the degree they want, or the system is overwhelmed.
-Schools: Kids generally won’t get very ill, so the govt can use them as a tool to infect others when you want to increase infection. When you need to slow infection, that tap can be turned off – at that point they close the schools. Politically risky for them to say this.
-The same for large scale events – stop them when you want to slow infection rates; turn another tap off. This means schools etc are closed for a shorter period and disruption generally is therefore for a shorter period, AND with a growing immune population. This is sustainable.
-After a while most of the population is immune, the seriously ill have all received treatment and the country is resistant. The more vulnerable are then less at risk. This is the end state the govt is aiming for and could achieve.
-BUT a key issue during this process is protection of those for whom the virus is fatal. It’s not clear what the full measures are to protect those people. It assumes they can measure infection, and that their behavioural expectations are met – people do what they think they will.
-The Italian (and others) strategy is to stop as much infection as possible – or all infection. This is appealing, but then what? The restrictions are not sustainable for months. So they will need to be relaxed. But that will lead to reemergence of infections.
-Rates will then start to climb again. So they will have to reintroduce the restrictions each time infection rates rise. That is not a sustainable model and takes much longer to achieve the goal of a largely immune population with low risk of infection of the vulnerable.
-As the government tries to achieve equilibrium between hospitalisations and infections, more interventions will appear. It’s perhaps why there are at the moment few public information films on staying at home. They are treading a tight path, but possibly a sensible one.
-This is probably the best strategy, but they should explain it more clearly. It relies on a lot of assumptions, so it would be good to know what they are – especially behavioural. Most encouraging, it’s way too clever for Boris Johnson to have had any role in developing.
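To make “flattening the curve” concrete, here is a minimal sketch of the classic SIR (susceptible, infected, recovered) model stepped one day at a time. It is not the model the UK government or anyone quoted here is using, and every parameter is an assumption chosen for illustration: `gamma = 0.1` means a ten-day infectious period, and the two `beta` values give R0 = 3 without controls and R0 = 1.5 with them.

```python
from typing import Tuple


def sir_peak(beta: float, gamma: float = 0.1, days: int = 730,
             initial_infected: float = 1e-5) -> Tuple[float, int]:
    """Return (peak infected fraction, day of peak) for a basic SIR model.

    beta  - infections caused per infected person per day (assumed)
    gamma - recovery rate per day, i.e. 1 / infectious period (assumed)
    """
    s, i = 1.0 - initial_infected, initial_infected
    peak, peak_day = i, 0
    for day in range(1, days + 1):
        new_infections = beta * s * i   # contacts between susceptible and infected
        new_recoveries = gamma * i      # infected people who stop being infectious
        s -= new_infections
        i += new_infections - new_recoveries
        if i > peak:
            peak, peak_day = i, day
    return peak, peak_day


unmitigated = sir_peak(beta=0.30)  # R0 = beta / gamma = 3
distanced = sir_peak(beta=0.15)    # R0 = 1.5 while controls stay in place
print(f"no controls:   peak {unmitigated[0]:.1%} infected on day {unmitigated[1]}")
print(f"with controls: peak {distanced[0]:.1%} infected on day {distanced[1]}")
```

The controlled run peaks much lower and much later, which is the whole point of the exercise. What the sketch leaves out is exactly what the argument above turns on: hospital capacity, different risk groups, and what happens when the controls are lifted.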
Francois Balloux is a computational/system biologist at the UCL Genetics Institute in London, working on infectious diseases and having spent five years in world class “pandemic response modelling”. He summarized what he believes he knows … and doesn’t know. A summary of a recent post:
-After having spent considerable time thinking how to mitigate and manage this pandemic, and analysing the available data, I failed to identify the best course of action. Even worse, I’m not sure there is such a thing as an acceptable solution to the problem we are facing.
-I believe that the covid-19 pandemic is the most serious global public health threat humanity has faced since the 1918/19 influenza pandemic. There are major differences between the two events but I suspect there will also be similarities that may emerge once we look back.
-The most plausible scenario to me is for the covid-19 pandemic to wane in the late spring (in the Northern hemisphere), and come back as a second wave in the winter, which I expect could be even worse than what we’re facing now. The graphic below is what happened in 1918/19:
-Predictions from any model are only as good as the data that parametrised it. There are two major unknowns at this stage. (1) We don’t know to what extent covid-19 transmission will be seasonal. (2) We don’t know if covid-19 infection induces long-lasting immunity.
-Seasonality is difficult to predict without time-series. Comparison between regions for the covid-19 pandemic suggests some seasonality, but likely less than for influenza. This would be roughly in line with other Coronaviridae (common cold and MERS).
-How long immunity lasts for following covid-19 infection is the biggest unknown. Comparison with other Coronaviridae suggests it may be relatively short-lived (i.e. months). If this were to be confirmed, it would add to the challenge of managing the pandemic.
-Short-lived immunisation would defeat both ‘flattening the curve’ and ‘herd immunity’ approaches. Devising an effective strategy would be even more challenging under low seasonal forcing. It would also considerably complicate effective vaccination campaigns.
-The covid-19 pandemic is an extremely challenging problem and there are still many unknowns. There is no simple fix, and poorly thought-out interventions could make the situation even worse, massively so.
-The covid-19 pandemic is not just an epidemiological problem. It is a ‘Global Health’ problem, that can only be tackled with an integrated and global approach. For example, there is no such thing as a choice between managing the pandemic vs. protecting the economy.
-Health and the economy are closely linked. The correlation between per-capita GDP and health (life expectancy) is essentially perfect. If the covid-19 pandemic leads to a global economy collapse, many more lives will be lost than covid-19 would ever be able to claim:
There are, of course, arguments being made that governments should be testing a number of randomly selected people each day – say 0.01% of the population – and then you could extrapolate how many people are infected, proportion showing symptoms, groups most at risk, rate of spread, etc. Without this we’re in the dark. But we are then talking opportunity costs. This was suggested in Australia but it came out to 30,000 tests a day. The cost and logistics made it impossible.
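For the data junkies, here is roughly how that extrapolation would work. The sketch below uses the Wilson score interval, a standard way to put error bars on a sample proportion. All the numbers are hypothetical, and it assumes a truly random sample and a perfectly accurate test, neither of which holds in practice.

```python
import math


def wilson_interval(positives: int, tested: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a sample proportion."""
    p = positives / tested
    denom = 1 + z ** 2 / tested
    centre = (p + z ** 2 / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested
                         + z ** 2 / (4 * tested ** 2)) / denom
    return centre - half, centre + half


tested = 2_500           # hypothetical random sample for one day
positives = 25           # hypothetical positives in that sample (1%)
population = 25_000_000  # hypothetical population size

low, high = wilson_interval(positives, tested)
print(f"prevalence between {low:.2%} and {high:.2%}")
print(f"infected between {low * population:,.0f} and {high * population:,.0f}")
```

Even with a clean random sample the range is wide when prevalence is low, which is part of why the logistics get expensive: narrowing it means testing many more people.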
Using AI to predict and track and “solve” coronavirus … beware the hype
It was an AI system that first saw COVID-19 coming, or so the story goes. On 30 December, an artificial-intelligence company called BlueDot, which uses machine learning to monitor outbreaks of infectious diseases around the world, alerted clients (including various governments, hospitals, and businesses) to an unusual bump in pneumonia cases in Wuhan, China. It would be another nine days before the World Health Organization officially flagged what we’ve all come to know as COVID-19.
BlueDot wasn’t alone. An automated service called HealthMap at Boston Children’s Hospital also caught those first signs. As did a model run by Metabiota, based in San Francisco. That AI could spot an outbreak on the other side of the world is pretty amazing, and early warnings save lives.
Companies like BlueDot and Metabiota use a range of natural-language processing (NLP) algorithms to monitor news outlets and official health-care reports in different languages around the world, flagging whether they mention high-priority diseases, such as coronavirus, or more endemic ones, such as HIV or tuberculosis. Their predictive tools can also draw on air-travel data to assess the risk that transit hubs might see infected people either arriving or departing.
The results are reasonably accurate. For example, Metabiota’s latest public report, on February 25, predicted that on March 3 there would be 127,000 cumulative cases worldwide. It overshot by around 30,000, but Mark Gallivan, the firm’s director of data science, says this is still well within the margin of error. It also listed the countries most likely to report new cases, including China, Italy, Iran, and the US. Again: not bad.
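It is worth doing the arithmetic on that forecast. The snippet below uses only the two figures quoted above; the “implied actual” count is derived from them, not taken from an independent source, and since the overshoot is “around 30,000” the result is approximate.

```python
predicted = 127_000   # Metabiota's forecast for March 3, as quoted above
overshoot = 30_000    # "around 30,000", so everything below is approximate

implied_actual = predicted - overshoot
relative_error = overshoot / implied_actual

print(f"implied actual cases: {implied_actual:,}")
print(f"forecast was high by about {relative_error:.0%}")
```

Whether an overshoot of that size counts as “well within the margin of error” depends entirely on how wide the model’s stated error bars were, which the report summary above does not tell us.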
Yes, while governments across the globe are working in collaboration with local authorities and health-care providers to track, respond to and prevent the spread of disease caused by the coronavirus, health experts are turning to advanced analytics and artificial intelligence to augment current efforts to prevent further infection. There has been a flood of articles on these various efforts.
In the U.S., data and analytics have proved to be useful in combating the spread of disease, and the federal government has access to ample data on the U.S. population’s health and travel as well as the migration of both domestic and wild animals, all of which can be useful in tracking and predicting disease trajectory. Machine learning’s ability to consider large amounts of data and offer insights can lead to deeper knowledge about diseases and enable U.S. health and government officials to make better decisions throughout the entire evolution of an outbreak.
It is in detection that AI has seen its best use. When previously unknown viruses make the jump to humans, time becomes a precious resource. The quicker a disease outbreak is detected, the sooner action can be taken to stop the spread and effectively treat the infected population. In the U.S., the National Biosurveillance Integration Center in the Department of Homeland Security uses ML to mine social media data for indications of unusual flu symptoms. The team also examined near-real-time emergency medical services and ambulance data, using ML to look for anomalies in the medical notes as patients were admitted to hospitals. In these instances, AI not only provided better detection of an abnormal disease event, but did it faster – weeks before traditional disease reporting would indicate a spike in disease.
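To give a flavour of what “looking for anomalies” means, here is a deliberately simple sketch: flag any day whose count sits far above the average of the previous week. This is not the method used by the National Biosurveillance Integration Center or any company named in this piece; the function, the window, the threshold and the data are all made up for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=7, threshold=3.0):
    """Return indices of days whose count is far above the trailing baseline."""
    flagged = []
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip a perfectly flat baseline, where the z-score is undefined.
        if sigma > 0 and (counts[day] - mu) / sigma > threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily ambulance call-outs for respiratory symptoms.
daily_calls = [20, 22, 19, 21, 20, 23, 21, 22, 20, 41, 45]
print(flag_anomalies(daily_calls))  # → [9]
```

Note that day 10 is not flagged even though it is higher still: the spike on day 9 has already inflated the baseline. That is a small taste of why noisy, fast-moving data is so hard on these systems.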
How effective it will be with COVID-19 is an open question. As I noted at the start of this post, so much data at this point is sketchy (nonexistent) or unreliable.
But a few bright spots:
-According to my contacts at Johns Hopkins (noted above) data from chest X-rays of coronavirus patients can serve as input for AI models so physicians can make faster diagnoses. Regarding new treatments, creating vaccines for newly discovered viruses is a difficult and time-consuming process that is laden with trial and error. But AI is helping by doing massive number crunching, examining data from similar viral diseases and then using that data to predict which types of vaccines and medicines are most likely to be effective.
-Artificial intelligence company Infervision has launched a coronavirus AI solution that helps front-line healthcare workers detect and monitor the disease efficiently. Imaging departments in healthcare facilities are being taxed with the increased workload created by the virus. This solution improves CT diagnosis speed. Chinese e-commerce giant Alibaba also built an AI-powered diagnosis system they claim is 96% accurate at diagnosing the virus in seconds.
But we are in very early stages. These diagnostic tools will take time – possibly months – to get into the hands of the health-care workers who need them. The hype outstrips the reality. In fact, the narrative that has appeared in many news reports and breathless press releases – that AI is a powerful new weapon against diseases – is only partly true and risks becoming counterproductive. For example, too much confidence in AI’s capabilities could lead to ill-informed decisions that funnel public money to unproven AI companies at the expense of proven interventions such as drug programs. It’s also bad for the field itself: overblown expectations, once disappointed, have led to a crash of interest in AI, and consequent loss of funding, more than once in the past.
Yes, the work by the companies I noted above is certainly impressive. And it goes to show how far machine learning has advanced in recent years. A few years ago Google tried to predict outbreaks with its ill-fated Google Flu Trends, which was shelved after it badly misjudged the 2013 flu season. What changed? It mostly comes down to the ability of the latest software to listen in on a much wider range of sources.
Unsupervised machine learning is also key. Letting an AI identify its own patterns in the noise, rather than training it on preselected examples, highlights things you might not have thought to look for. When you do prediction, you’re looking for new behavior.
But problems developed, and we data freaks have seen this so many times before. The prediction by BlueDot correctly pinpointed a handful of cities in the virus’s path. This could have let authorities prepare, alerting hospitals and putting containment measures in place. But as the scale of the epidemic grows, predictions become less specific. Metabiota’s warning that certain countries would be affected in the following week might have been correct, but it is hard to know what to do with that information.
What’s more, all these approaches will become less accurate as the epidemic progresses, largely because reliable data of the sort that AI needs to feed on has been hard to get about COVID-19 (bad data; see my earlier notes):
-News sources and official reports offer inconsistent accounts.
-There has been confusion over symptoms and how the virus passes between people. The media played things up; authorities played things down.
-And predicting where a disease may spread from hundreds of sites in dozens of countries is a far more daunting task than making a call on where a single outbreak might spread in its first few days. My contact at Johns Hopkins: “Noise is always the enemy of machine-learning algorithms”. Metabiota acknowledged: daily predictions were easier to make in the first two weeks or so.
And the killer issue, heard all across the UK and U.S., especially: the lack of diagnostic testing. Ideally, we would have a test to detect the novel coronavirus immediately and be testing everyone at least once a day. But you cannot. So nobody really knows what behaviors people are adopting – who is working from home, who is self-quarantining, who is or isn’t washing hands – or what effect it might be having. If you want to predict what’s going to happen next, you need an accurate picture of what’s happening right now.
It’s not clear what’s going on inside hospitals, either. Pactera Edge, a data and AI consultancy that has been focusing on hospitals, has noted on its website and in multiple media interviews that AI prediction tools would be a lot better if public health data wasn’t locked away within government agencies as it is in many countries, including the US. This means an AI must lean more heavily on readily available data like online news. By the time the media picks up on a potentially new medical condition, it is already too late.
But if AI needs much more data from reliable sources to be useful in this area, strategies for getting it can be controversial. Ah, the brutal trade-off: to get better predictions from machine learning, we need to share more of our personal data with companies and governments.
So here’s my reality check: AI will not save us from the coronavirus – certainly not this time. There’s every chance it will play a bigger role in future epidemics, but we’ll need to make some very big socioeconomic changes. And we might not like them all. I will fully explore this later in the series.