Get Your Flu Shot

With America fixated on the Ebola panic amid relentless media coverage, many Americans have forgotten about getting their flu shot. But this isn’t unusual: according to the Centers for Disease Control and Prevention (CDC), only 46 percent of the US population received a flu vaccination in 2013, even though influenza can kill up to 50,000 people in a bad year.

Well, get your flu shots, people, because this year’s season has been projected to be much worse.

But that prediction got me thinking: how does the CDC determine how bad a flu season is? The CDC’s FluView had some answers. Starting in week 40 of each year, the CDC’s Influenza Division produces a weekly report that collects data on (1) viral surveillance, (2) mortality, (3) hospitalizations, (4) outpatient illness surveillance, and (5) geographic spread of influenza. They analyze these data year-round to track the flu season, with a lag time of 1-2 weeks.

That seems like hard, yet important, work. But what if we wanted to know about impending flu behavior sooner? While searching for other prediction methods, I found an interesting Nature paper by Ginsberg et al. (2008) that used Google search queries to predict influenza epidemic trends (see reference below).

Ginsberg and crew set out to develop a simple model to see whether the percentage of influenza-like illness (ILI) physician visits in a region could be explained by the probability that a random search query submitted from that same region was related to influenza (across nine US regions). They did this by taking 50 million of the most common search queries in the US between 2003 and 2008 and testing each one individually to see which would most accurately model the CDC-reported ILI visit percentage in each region. They narrowed the list down to 45 search queries and found a mean correlation of 0.90! Furthermore, across the 2006-2007 season, they shared their weekly results with the Epidemiology and Prevention Branch of the CDC’s Influenza Division to better understand their prediction timing and accuracy, and determined that their model could accurately estimate the percentage of ILI physician visits 1-2 weeks ahead of the CDC surveillance reports, with a reporting lag of only about one day.
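For readers who want to see the shape of the model, here is a minimal Python sketch. As I understand the paper, Ginsberg et al. fit a linear relationship between the log-odds of the ILI visit fraction (P) and the log-odds of the ILI-related query fraction (Q), that is, logit(P) = β0 + β1 × logit(Q). The function names and weekly numbers below are made up for illustration; this is not Google’s code, and it fits a single region with plain ordinary least squares.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_logit_model(query_frac, ili_frac):
    """Ordinary least squares fit of logit(P) = b0 + b1 * logit(Q); returns (b0, b1)."""
    xs = [logit(q) for q in query_frac]
    ys = [logit(p) for p in ili_frac]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def predict_ili(b0, b1, q):
    """Estimated ILI visit fraction for a given ILI-related query fraction q."""
    return inv_logit(b0 + b1 * logit(q))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical weekly data for one region: fraction of searches that are
# ILI-related (Q) and fraction of physician visits that are for ILI (P).
query_frac = [0.0010, 0.0014, 0.0021, 0.0030, 0.0026, 0.0017, 0.0011]
ili_frac = [0.012, 0.016, 0.025, 0.038, 0.031, 0.020, 0.013]

b0, b1 = fit_logit_model(query_frac, ili_frac)
estimates = [predict_ili(b0, b1, q) for q in query_frac]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"correlation with observed ILI: {pearson(estimates, ili_frac):.2f}")
```

Once b0 and b1 are fitted, turning a new week’s query fraction into an ILI estimate is a single call to predict_ili, which helps explain how the estimates could be ready so quickly. A seven-week toy series like this one will also fit far more neatly than real regional data would.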

Some of the query topics with the most influence on the model include “influenza complication” and “cold/flu remedy.”
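The query screening step can be sketched in a similarly simplified way: score each candidate query by how well its weekly search fraction tracks the ILI data, keep the best-scoring ones, and pool them into a single query fraction. This sketch assumes a single region, plain Pearson correlation on logit-transformed values, and a fixed cutoff of two queries; as I read it, the paper’s procedure was more involved (it scored queries across all nine regions and tested how many top-ranked queries to combine). The query names and numbers are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rank_queries(query_series, ili_series):
    """Rank queries by correlation of logit(query fraction) with logit(ILI fraction), best first."""
    target = [logit(p) for p in ili_series]
    scores = {
        name: pearson([logit(q) for q in series], target)
        for name, series in query_series.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def combined_fraction(query_series, names):
    """Pool the selected queries by summing their weekly search fractions into one Q series."""
    n_weeks = len(next(iter(query_series.values())))
    return [sum(query_series[name][week] for name in names) for week in range(n_weeks)]

# Hypothetical weekly data for one region (fractions, not percentages).
ili = [0.012, 0.018, 0.030, 0.041, 0.027, 0.015]
queries = {
    "flu symptoms": [0.0004, 0.0006, 0.0011, 0.0015, 0.0010, 0.0005],
    "cold remedy": [0.0003, 0.0004, 0.0007, 0.0008, 0.0006, 0.0004],
    "football scores": [0.0020, 0.0019, 0.0021, 0.0018, 0.0020, 0.0022],
}

ranked = rank_queries(queries, ili)
top = [name for name, _ in ranked[:2]]  # keep the two best-scoring queries
q_series = combined_fraction(queries, top)
for name, score in ranked:
    print(f"{name}: r = {score:.2f}")
print("Pooled Q:", q_series)
```

In this toy data the unrelated “football scores” series lands at the bottom of the ranking. With millions of real queries, though, anything that happens to be seasonal can correlate with flu activity by chance, so the selection step needs more care than a simple cutoff.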

Because these estimates are timely and closely track the CDC data, models like these could help public health officials prepare for and respond to flu epidemics. The authors acknowledge, though, that such a system is not a replacement for traditional surveillance programs, especially since we cannot predict how internet users will behave in unusual scenarios (I wonder how the search terms look now with Ebola; you can check it out yourself on Google Trends).

I think the take-home message here (besides the fact that Google Trends is super cool) is that the modern technological world is information-rich and filled with available big data; however, in leveraging these data to our advantage, we need to understand and acknowledge that there is always error in our predictions.

And as an ending note, by writing this blog post and searching for influenza-related topics, I probably just contributed to this week’s predictions!

If you’re interested, check out for results from a 2008-2009 tracking study.


Reference: Ginsberg, Jeremy, et al. “Detecting influenza epidemics using search engine query data.” Nature 457.7232 (2008): 1012-1014.


3 Responses to Get Your Flu Shot

  1. tbelford24 says:

    Would searching for influenza data also contribute to the observer effect? If a search for “flu data” has any influence on the model, then attempting to observe the data would change the behavior of the data by increasing the number of searches/observations used by the model. (I’m having a little trouble making my reasoning coherent, so apologies if it doesn’t make any sense.) Also, it’s definitely an interesting model, but has it been just as accurate in flu seasons following 2006-2007? I suppose I’ll look it up, although I hope it doesn’t throw off the data.

  2. justin4301 says:

    This is a really cool way to predict trends like infectious outbreaks. I know that when scientists prepare the vaccine for flu season in North America, they look at the flu season that is occurring in the Southern Hemisphere, which happens opposite ours. They try to predict the 3 or 4 strains to include in the North American vaccine that will be most prevalent in the population based on which flu strains are prominent in South America.

    I wonder if there are ways this query method could be applied to the moving target of predicting which flu strains will become a problem in the future, based on what is happening in another part of the world. This would be longer-term forecasting. Perhaps you could merge data about patient isolates, hospitalizations, and mortality with regional search queries to get a sense of the epidemiology of one influenza strain versus another. We might be able to use this to compare pathogenic variance between different influenza subtypes (e.g., H1N1 vs. H7N3) and draw conclusions about which strains may emerge as the most dangerous ones.

  3. coastalsci says:

    This is a much-belated response on Google Flu Trends; I just wanted to alert everyone to recent developments:

    Google Flu Trends gets it wrong three years running – 13 March 2014 by Hal Hodson
    Google may be a master at data wrangling, but one of its products has been making bogus data-driven predictions. A study of Google’s much-hyped flu tracker has found that it consistently overestimated flu cases in the US for years. It’s a failure that highlights the danger of relying on big data technologies.

    Google Flu Trends is no longer good at predicting flu, scientists find
    Researchers warn of ‘big data hubris’ and the importance of updating analytical models, claiming Google has made inaccurate forecasts for 100 of 108 weeks

    Disruptions: Data Without Context Tells a Misleading Story – By Nick Bilton
    February 24, 2013

    Google Flu Trends’ Failure Shows Good Data > Big Data – by Kaiser Fung
    Harvard Business Review
    March 25, 2014
    In their best-selling 2013 book Big Data: A Revolution That Will Transform How We Live, Work and Think, authors Viktor Mayer-Schönberger and Kenneth Cukier selected Google Flu Trends (GFT) as the lede of chapter one. They explained how Google’s algorithm mined five years of web logs, containing hundreds of billions of searches, and created a predictive model utilizing 45 search terms that “proved to be a more useful and timely indicator [of flu] than government statistics with their natural reporting lags.”
    Unfortunately, no.

    Google’s Flu Project Shows the Failings of Big Data by Bryan Walsh @bryanrwalsh
    Time Magazine – March 13, 2014
    A new study shows that using big data to predict the future isn’t as easy as it looks—and that raises questions about how Internet companies gather and use information
