Data Failure! How Google Flu Trends Fell Way Short

By Amanda L. Chan

By Stephanie Pappas, Senior Writer
Published: 03/13/2014 02:04 PM EDT on LiveScience

An attempt to identify flu outbreaks by tracking people's Google searches about the illness hasn't lived up to its initial promise, a new paper argues.

Google Flu Trends, an attempt to track flu outbreaks based on search terms, dramatically overestimated the number of flu cases in the 2012-2013 season, and the latest data does not look promising, say David Lazer, a computer and political scientist at Northeastern University in Boston, and his colleagues in a policy article about the pitfalls of big data, published Friday (March 14) in the journal Science.

"There's a huge amount of potential there, but there's also a lot of potential to make mistakes," Lazer told Live Science. [6 Superbugs to Watch Out For]

Google's mistakes

It's no surprise that Google Flu Trends doesn't always hit a home run. In February 2013, researchers reported in the journal Nature that the program was estimating about twice the number of flu cases as recorded by the Centers for Disease Control and Prevention (CDC), which tracks actual reported cases.

"When it went off the rails, it really went off the rails," Lazer said.

Google Flu Trends also struggled in 2009, missing a nonseasonal flu outbreak of H1N1 entirely. The mistakes have led the Google team to retool their algorithm, but an early look at the latest flu season suggests these changes have not fixed the problem, according to a preliminary analysis by Lazer and colleagues posted today (March 13) to the social science pre-publication site the Social Science Research Network (SSRN).

The problem is not unique to Google Flu, Lazer said. All social science big data, or the analysis of huge swaths of the population from mobile or social media technology, faces the same challenges the Google Flu team is trying to overcome.

Big data drawbacks

Figuring out what went wrong with Google Flu Trends is not easy, because the company does not disclose what search terms it uses to track flu.

"They get an F on replication," Lazer said, meaning that scientists don't have enough information about the methods to test and reproduce the findings.

But Lazer and his colleagues have a sense of what went wrong. A major problem, he said, is that Google is a business interested in promoting searches, not a scientific team collecting data. The Google algorithm, then, prompts users with related searches: If someone searches "flu symptom," they'll likely be prompted to try a search for "flu vaccines," for example. Thus, the number of flu-related searches can snowball even if flu cases do not. [5 Dangerous Vaccination Myths Debunked]

Another problem, Lazer said, is that the Google flu team had to differentiate between flu-related searches and searches that are correlated with the flu season but not related. To do so, they took more than 50 million search terms and matched them up with about 1,100 data points on flu prevalence from the CDC.
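The risk in that matching step can be illustrated with a small simulation. The sketch below is not Google's method, which has not been disclosed; it is a toy example under stated assumptions. The function names, the bell-shaped "flu" curve, the 20 weekly data points and the 2,000 random "search terms" are all hypothetical and far smaller than the real figures. It correlates pure noise against a seasonal curve and reports how strong the best chance match is.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_match(num_terms, num_weeks, seed=0):
    """Correlate random 'search terms' with a made-up flu curve.

    Returns (largest absolute correlation, count of terms with |r| > 0.5).
    Every term is pure Gaussian noise, so any match is spurious.
    """
    rng = random.Random(seed)
    # Hypothetical flu prevalence: a single winter-style peak (an assumption).
    flu = [math.exp(-((week - num_weeks / 2) ** 2) / 20.0)
           for week in range(num_weeks)]
    best = 0.0
    above_half = 0
    for _ in range(num_terms):
        noise = [rng.gauss(0.0, 1.0) for _ in range(num_weeks)]
        r = abs(pearson(noise, flu))
        best = max(best, r)
        if r > 0.5:
            above_half += 1
    return best, above_half


if __name__ == "__main__":
    best, above_half = strongest_chance_match(num_terms=2000, num_weeks=20)
    print(f"strongest correlation among random terms: {best:.2f}")
    print(f"random terms with |r| > 0.5: {above_half}")
```

Because none of the simulated terms has anything to do with flu, every strong correlation the script finds is an accident of testing many candidates against few data points. Raising `num_terms` while holding `num_weeks` fixed tends to make the best accidental match look even stronger.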

Playing the correlation game with so many terms is bound to return a few weird, nonsensical results, Lazer said, "just like monkeys can type Shakespeare eventually." For example, "high school basketball" peaks as a search term during March, which tends to be the peak of the flu season. Google picked out obviously spurious correlations and removed them, but exactly what terms they removed and the logic of doing so are unclear. Some terms, like "coughs" or "fever," might look flu-related but actually signal other seasonal diseases, Lazer said.

"It was part flu detector and part winter detector," he said.

Problems and potential

The Google team altered their algorithm after both the 2009 and 2013 misses, but made the most recent changes on the assumption that a spike in media coverage of the 2012-2013 flu season caused the problem, Lazer and his colleagues wrote in their SSRN paper. That assumption discounts the major media coverage of the 2009 H1N1 pandemic and fails to explain errors in the 2011-2012 flu season, the researchers argue.

A Google spokeswoman pointed Live Science to a blog post on the Google Flu updates that calls the efforts to improve "an iterative process."

Lazer was quick to point out that he wasn't picking on Google, calling Google Flu Trends "a great idea." The problems facing Google Flu are echoed in other social media datasets, Lazer said. For example, Twitter lets users know what's trending on the site, which boosts those terms further. [The Top 10 Golden Rules of Facebook]

It's important to be aware of the limits of huge datasets collected online, said Scott Golder, a scientist who works with data sets at the company Context Relevant. Samples of people who use social media, for example, aren't a cross section of the population as a whole - they might be younger, richer or more tech-savvy.

"People have to be circumspect in the claims that they make," Golder, who was not involved in Lazer's Google critique, told Live Science.

Keyword choice and a social media platform's algorithms are other concerns, Golder said. A few years ago, he was working on a project studying negativity in social media. The word "ugly" kept spiking in the evenings. It turned out that people weren't having nighttime self-esteem crises. They were chatting about the ABC show "Ugly Betty."

These problems aren't a death knell for big data, however - Lazer himself says big data possibilities are "mind-boggling." Social scientists deal with problems of unstable data all the time, and Google's flu data is fixable, Lazer said.

"My sense, looking at the data and how it went off, is this is something you could rectify without Google tweaking their own business model," he said. "You just have to know [the problem] is there and think about the implications."

Lazer called for more cooperation between big data researchers and traditional social scientists working with small, controlled data sets. Golder agreed that the two approaches can be complementary. Big data can hint at phenomena that need scrutiny with traditional techniques, he said.

"Sometimes small amounts of data, if it's the right data, can be even more informative," Golder said.

Follow Stephanie Pappas on Twitter and Google+. Follow us @livescience, Facebook & Google+. Original article on Live Science.

  • 7 Amazing Places to Visit with Google Street View
  • 12 Strangest Sights on Google Earth
  • 7 Devastating Infectious Diseases

Copyright 2014 LiveScience, a TechMediaNetwork company. All rights reserved. This material may not be published, broadcast, rewritten or redistributed.

Source: http://www.huffingtonpost.com/2014/03/16/google-flu-trends_n_4976372.html?utm_hp_ref=healthy-living & ir = healthy + living
