With America fixated on the Ebola panic thanks to insistent media coverage, many Americans have forgotten about getting their flu shots. But this isn’t strange: according to the Centers for Disease Control and Prevention (CDC), only 46 percent of the US population received flu vaccinations in 2013, even though influenza can kill up to 50,000 individuals in a bad year.
Well, get your flu shots, people, because this year has been projected to be much worse.
But that prediction got me thinking. How does the CDC determine how bad a flu season is? The CDC's FluView had some answers. The Influenza Division of the CDC produces a weekly report starting in week 40, and collects data on (1) viral surveillance, (2) mortality, (3) hospitalizations, (4) outpatient illness surveillance, and (5) geographic spread of influenza. They analyze these data year-round to produce predictions about the upcoming flu season, with a lag time of 1-2 weeks.
That seems like hard, yet important, work. But what if we wanted to know about impending flu behavior sooner? While searching for other prediction methods, I found an interesting Nature paper by Ginsberg et al. (2008) that used Google search queries to predict influenza epidemic trends (see reference below).
Ginsberg and crew set out to develop a simple model to see if the percentage of influenza-related physician visits in a region could be explained by the probability that a random search query submitted from that same region was related to influenza (across 9 regions). They did this by taking 50 million of the most common search queries in the US between the years 2003 and 2008, and checking each one individually to see which would most accurately model the CDC-reported influenza visit percentage in that region. They narrowed it down to 45 search terms, and found a mean correlation of 0.90! Furthermore, across the 2007-2008 season, they shared their weekly results with the Epidemiology and Prevention Branch of the Influenza Division at the CDC to better understand their prediction timing and accuracy. They determined that their model could correctly estimate the percentage of physician visits for influenza-like illness 1-2 weeks ahead of the CDC surveillance program, with estimates available in as little as 1 day.
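To make the "simple model" idea a bit more concrete, here is a minimal sketch in Python. It assumes a straight-line fit on the log-odds of both quantities, which is my understanding of the form used in the paper. The weekly numbers are made up for illustration (they are not CDC or Google data), and the function names are my own.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Made-up weekly values for one region (illustrative only).
# query_fraction: share of all searches that are flu-related.
# visit_fraction: share of physician visits that are flu-related.
query_fraction = [0.0010, 0.0012, 0.0018, 0.0030, 0.0045, 0.0038, 0.0022, 0.0014]
visit_fraction = [0.011, 0.013, 0.019, 0.031, 0.047, 0.040, 0.024, 0.015]

# Fit logit(visit_fraction) = b0 + b1 * logit(query_fraction).
b0, b1 = fit_line([logit(q) for q in query_fraction],
                  [logit(p) for p in visit_fraction])

def predict_visit_fraction(q):
    """Estimate the flu-related visit fraction from a query fraction."""
    return inv_logit(b0 + b1 * logit(q))

fitted = [predict_visit_fraction(q) for q in query_fraction]
r = pearson(visit_fraction, fitted)
print(f"b0={b0:.3f}, b1={b1:.3f}, correlation={r:.3f}")
```

Working on the log-odds scale keeps every estimate between 0 and 1, which a plain straight-line fit on the raw percentages would not guarantee.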
Some of the terms with the most influence on the model include “influenza complication” and “Cold/Flu Remedy.”
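The term-screening step can be sketched in a similar way: score every candidate query by how closely its weekly volume tracks the flu visit percentage, then keep the best ones. This is a simplified stand-in for the authors' procedure (I use a plain Pearson correlation on a single region), and the weekly counts below are hypothetical numbers I invented for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical weekly flu visit percentages for one region.
visit_pct = [1.1, 1.3, 1.9, 3.1, 4.7, 4.0, 2.4, 1.5]

# Hypothetical weekly search volumes for a few candidate queries.
candidate_queries = {
    "flu remedy": [10, 12, 19, 30, 44, 39, 23, 14],
    "influenza complication": [4, 5, 8, 13, 20, 16, 9, 6],
    "winter coats": [5, 9, 14, 20, 22, 25, 27, 30],
    "pizza delivery": [50, 48, 52, 49, 51, 50, 47, 53],
}

def rank_queries(candidates, target):
    """Return (query, correlation) pairs sorted from best to worst fit."""
    scores = {q: pearson(series, target) for q, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank_queries(candidate_queries, visit_pct)
top_terms = [query for query, _ in ranking[:2]]

for query, score in ranking:
    print(f"{query:25s} {score:+.2f}")
```

A query that simply rises all winter (like the made-up "winter coats" series) scores noticeably lower than the flu-related terms, and one with no seasonal pattern scores near zero. With 50 million real candidates, though, some unrelated queries will track flu season purely by chance, which is one reason a model like this needs checking against surveillance data.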
Because these data are accurate and readily available, models like these could help public health officials prepare for and respond to flu epidemics better. The authors acknowledge, though, that this should certainly not be a replacement for surveillance programs, especially since we cannot predict how internet users will behave in particular scenarios (I wonder how the search terms look now with Ebola; you can check it out yourself at http://www.google.com/trends/).
I think the take-home message here (besides the fact that Google Trends is super cool) is that the modern technological world is information rich and filled with available big data; however, in leveraging these data to our advantage, we need to understand and acknowledge that there is always error involved in our predictions.
And as an ending note, by writing this blog post and searching for influenza-related topics, I probably just contributed to this week’s predictions!
If you’re interested, check out http://www.google.org/flutrends/ for results from a 2008-2009 tracking study.
Reference: Ginsberg, Jeremy, et al. “Detecting influenza epidemics using search engine query data.” Nature 457.7232 (2008): 1012-1014.