Roy Rivenburg, UC Irvine
Can Twitter be used to detect pandemics before they take off?
To find out, researchers at UCI and UCLA are sifting through millions of tweets (and other data) from the months leading up to COVID-19’s big splash, looking for anomalies and patterns that would have provided an early warning of the virus.
“It’s a little like searching for a needle in a haystack,” concedes Andrew Noymer, an associate professor of population health and disease prevention at UC Irvine. “But the stakes are high, so it’s worth trying some different approaches.”
The National Science Foundation agreed and bestowed nearly $1 million on the 10-member UC Irvine-UCLA team under its new Predictive Intelligence for Pandemic Prevention grant program, which funds “high-risk, high-payoff” research that “aims to identify, model, predict, track and mitigate the effects of future pandemics.”
Chen Li, a professor of computer science who’s leading the effort at UC Irvine, likens the project to “weather forecasting, where advances in big data technologies and information analysis have resulted in better forecasts that are further out.” A pandemic early detection system could enable “faster responses in public health, medicine and government,” he says.
Li, Noymer and principal investigator Wei Wang, a UCLA professor of computer science and computational medicine, developed the grant proposal last year. Noting that infectious diseases “are sociobiological phenomena and leave both social and microbiological footprints,” they suggested using artificial intelligence and a panoply of public data to “monitor human society for signs of unusual activities that reflect the emergence of novel pathogens with pandemic potential.”
At the heart of the study is a searchable database of 2.3 billion U.S. Twitter posts, which Li’s lab has been collecting since 2015. Type in the word “cough,” for example, and the results can be narrowed down by location, time period and other variables to help pinpoint trends.
The hard part is figuring out which tweets are meaningful and then training the project’s AI system to recognize them.
Some keywords, such as “fever,” appear in too many non-health-related contexts to be relevant, Li says.
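The kind of search described above, a keyword narrowed by location and time period, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Tweet` record, its fields, the `weekly_counts` helper, the exclusion phrase and the sample posts are all hypothetical, and none of it reflects the team’s actual database or AI system.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Tweet:
    # Hypothetical record layout; the real database schema is not described.
    text: str
    state: str
    posted: date


def weekly_counts(tweets, keyword, state, start, end, exclude=()):
    """Count tweets per ISO (year, week) that mention `keyword` as a whole
    word, limited to one state and an inclusive date range. Tweets that
    contain any lowercase phrase in `exclude` are skipped."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = Counter()
    for t in tweets:
        if t.state != state or not (start <= t.posted <= end):
            continue
        if not pattern.search(t.text):
            continue
        # Crude filter for non-health uses of the keyword, e.g. "Bieber fever".
        lowered = t.text.lower()
        if any(phrase in lowered for phrase in exclude):
            continue
        iso_year, iso_week, _ = t.posted.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return counts


# Made-up sample posts, for illustration only.
sample = [
    Tweet("Home sick with a fever and chills", "CA", date(2019, 12, 2)),
    Tweet("Total Bieber fever at the concert", "CA", date(2019, 12, 3)),
    Tweet("Fever won't break, third day now", "CA", date(2019, 12, 10)),
    Tweet("Running a fever since last night", "NY", date(2019, 12, 10)),
]

counts = weekly_counts(
    sample, "fever", "CA",
    date(2019, 12, 1), date(2019, 12, 31),
    exclude=("bieber fever",),
)
for week, n in sorted(counts.items()):
    print(week, n)
```

A hand-written exclusion list like this would not scale to millions of posts, which is presumably why the researchers talk about training a system to recognize meaningful tweets rather than relying on keywords alone.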
So far, the method has discovered “interesting pandemic signals from March 2020,” Noymer says. “Unfortunately, that’s too late to be useful,” because health officials had already begun issuing warnings more than a month earlier, he notes. The goal of the grant is to uncover tipoffs from late 2019, before the coronavirus was on anyone’s radar.
“We’re hoping to find the horse while it’s loose in the stable but before it bolts out of the barn,” Noymer says.
Toward that end, the researchers are also analyzing news media stories, anonymous student health and absence statistics, biological data and a range of public information resources — and not just for COVID-19 precursors.
One of the study’s limitations is that the coronavirus originated in China, where Twitter is officially blocked. So the team will also search for early monkeypox clues as a test case.
“If we can’t find harbingers for outbreaks of COVID-19 or monkeypox, our concept is sledding uphill,” Noymer says. “And even if we do find them, that doesn’t guarantee they’ll foreshadow the next pandemic. But the potential payoff makes the idea worth investigating.”
By the end of December, Li says, the group hopes to have a pilot AI system ready, a broader collection of data to analyze and additional results.
Other UC Irvine researchers on the team are Carter Butts, Chancellor’s Professor of sociology; Kristin Turney, Dean’s Professor of sociology; and Dominik Wodarz, professor of population health and disease prevention.