Letter | Published: 19 February 2009

Detecting influenza epidemics using search engine query data

Nature volume 457, pages 1012–1014 (19 February 2009)

This article has been updated

Abstract

Seasonal influenza epidemics are a major public health concern, causing tens of millions of respiratory illnesses and 250,000 to 500,000 deaths worldwide each year [1]. In addition to seasonal influenza, a new strain of influenza virus against which no previous immunity exists and that demonstrates human-to-human transmission could result in a pandemic with millions of fatalities [2]. Early detection of disease activity, when followed by a rapid response, can reduce the impact of both seasonal and pandemic influenza [3,4]. One way to improve early detection is to monitor health-seeking behaviour in the form of queries to online search engines, which are submitted by millions of users around the world each day. Here we present a method of analysing large numbers of Google search queries to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day. This approach may make it possible to use search queries to detect influenza epidemics in areas with a large population of web search users.
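The abstract's central idea, estimating the share of physician visits involving influenza-like illness (ILI) from the relative frequency of certain queries, can be illustrated with a small sketch. The abstract does not state the form of the model, so the code below assumes a simple linear fit on the log-odds scale. The weekly figures are invented for illustration, and the names `fit_line`, `pearson` and `estimate_ili` are hypothetical rather than taken from the paper.

```python
import math

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Synthetic weekly data for one region (invented, NOT from the paper):
# fraction of all queries that are ILI-related, and fraction of
# physician visits in which the patient presents with ILI symptoms.
query_fraction = [0.0010, 0.0014, 0.0021, 0.0035, 0.0050, 0.0042, 0.0026, 0.0015]
ili_fraction = [0.012, 0.016, 0.023, 0.036, 0.049, 0.042, 0.028, 0.017]

# Fit on the log-odds scale (an assumption of this sketch).
x = [logit(q) for q in query_fraction]
y = [logit(p) for p in ili_fraction]
intercept, slope = fit_line(x, y)
r = pearson(x, y)

def estimate_ili(q):
    """Estimate the ILI visit fraction for a new week's query fraction."""
    return inv_logit(intercept + slope * logit(q))

print(f"correlation on the log-odds scale: {r:.3f}")
print(f"estimated ILI fraction at q = 0.003: {estimate_ili(0.003):.4f}")
```

Because query counts for a given week are available almost immediately, an estimate of this kind could be produced well before clinic-based reports arrive, which is the source of the short reporting lag the abstract describes.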

Change history

  • 19 February 2009

    The advance online publication (AOP) version of this paper, published on 19 November 2008, contained an inaccuracy in the reference list.

References

  1. World Health Organization. Influenza fact sheet. (2003)

  2. World Health Organization. WHO consultation on priority public health interventions before and during an influenza pandemic. (2004)

  3. et al. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature 437, 209–214 (2005)

  4. et al. Containing pandemic influenza at the source. Science 309, 1083–1087 (2005)

  5. Telephone triage: A timely data source for surveillance of influenza-like diseases. AMIA Annu. Symp. Proc. 215–219 (2003)

  6. Evaluation of over-the-counter pharmaceutical sales as a possible early warning indicator of human disease. Johns Hopkins APL Tech. Digest 24, 349–353 (2003)

  7. Online Health Search 2006. Pew Internet & American Life Project (2006)

  8. et al. Analysis of Web access logs for surveillance of influenza. Stud. Health Technol. Inform. 107, 1202–1206 (2004)

  9. Infodemiology: tracking flu-related searches on the web for syndromic surveillance. AMIA Annu. Symp. Proc. 244–248 (2006)

  10. Using internet searches for influenza surveillance. Clin. Infect. Dis. 47, 1443–1448 (2008)

  11. The moments of the z and F distributions. Biometrika 36, 394–403 (1949)

  12. MapReduce: Simplified data processing on large clusters. Sixth Symp. Oper. Syst. Des. Implement. (2004)

Acknowledgements

We thank L. Finelli for providing background knowledge, helping us validate results and commenting on this manuscript. We are grateful to R. Rolfs, L. Wyman and M. Patton for providing ILI data. We thank V. Sahai for his contributions to data collection and processing, and C. Nevill-Manning, A. Roetter and K. Sarvian for their comments on this manuscript.

Author Contributions J.G. and M.H.M. conceived, designed and implemented the system. J.G., M.H.M. and R.S.P. analysed the results and wrote the paper. L.B. contributed data. All authors edited and commented on the paper.

Author information

Affiliations

  1. Google Inc., 1600 Amphitheatre Parkway, Mountain View, California 94043, USA

    • Jeremy Ginsberg
    • Matthew H. Mohebbi
    • Rajan S. Patel
    • Mark S. Smolinski
    • Larry Brilliant

  2. Centers for Disease Control and Prevention, 1600 Clifton Road, NE, Atlanta, Georgia 30333, USA

    • Lynnette Brammer

Authors

  1. Jeremy Ginsberg

  2. Matthew H. Mohebbi

  3. Rajan S. Patel

  4. Lynnette Brammer

  5. Mark S. Smolinski

  6. Larry Brilliant

Corresponding author

Correspondence to Matthew H. Mohebbi.

Supplementary information

PDF files

  1. Supplementary Information 1

    This file contains Supplementary Figures 1–3 and Legends, Supplementary Methods and Supplementary Notes.

Excel files

  1. Supplementary Information 2

    Query fractions for the top 100 search queries, sorted by mean Z-transformed correlation with CDC-provided ILI percentages across the nine regions of the United States.
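
As an illustration of the ranking criterion named in this description, the sketch below averages per-region correlations after a Z-transformation and sorts queries by the result. It assumes that "Z-transformed" refers to the Fisher transformation z = atanh(r). The query labels and correlation values are invented, and `mean_z_correlation` is a hypothetical helper, not a function from the supplementary material.

```python
import math

def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient, -1 < r < 1."""
    return math.atanh(r)

def mean_z_correlation(regional_rs):
    """Mean of the Fisher-transformed correlations across regions."""
    return sum(fisher_z(r) for r in regional_rs) / len(regional_rs)

# Invented correlations between each query's weekly fraction and the
# ILI percentage, one value per region (nine regions per query).
regional_correlations = {
    "query A": [0.92, 0.90, 0.88, 0.91, 0.93, 0.89, 0.90, 0.87, 0.94],
    "query B": [0.61, 0.58, 0.66, 0.60, 0.63, 0.59, 0.62, 0.57, 0.64],
    "query C": [0.20, 0.15, 0.25, 0.18, 0.22, 0.17, 0.21, 0.19, 0.23],
}

# Rank queries from highest to lowest mean Z-transformed correlation.
ranked = sorted(
    ((name, mean_z_correlation(rs)) for name, rs in regional_correlations.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranked:
    # tanh maps the averaged z value back onto the correlation scale.
    print(f"{name}: mean z = {score:.3f}, back-transformed r = {math.tanh(score):.3f}")
```

Averaging on the z scale rather than averaging raw correlations gives more weight to differences among correlations close to 1, where the raw scale is compressed.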

About this article

DOI

https://doi.org/10.1038/nature07634
