Influenza researchers are complaining that the poor sharing of data by the US disease-control agency is hindering their work.

Still reeling from accusations that his administration was unprepared for the hurricane that hit New Orleans last month, President George W. Bush called last week for an international partnership on influenza that would require countries facing an outbreak to immediately share information and samples with the World Health Organization (WHO).

But investigations by Nature have revealed widespread concern that too few of the flu data collected by the US Centers for Disease Control and Prevention (CDC) in Atlanta are made generally available. Experts say research would speed up if the CDC's influenza branch threw open its databases of virus sequences and immunological and epidemiological data.

“Many in the influenza field are displeased with the CDC's practice of refusing to deposit sequences of most of the strains that they sequence,” says Michael Deem, a physicist at Rice University in Houston, who works on predicting flu vaccine efficiency.

Shot in the dark: a lack of data sharing could hold back the design and assessment of flu vaccines. Credit: M. RIETSCHEL/AP

Policy decisions, such as which vaccine to produce ahead of each flu season, are being made without the full data being available to the scientific community, he says. “The quality of their decisions, which can affect millions of people, cannot be checked.”

Deem's criticisms are echoed widely, although most scientists are reluctant to speak on the record. “This is a very delicate issue. It is important to keep a positive working relationship with the CDC, and they do lots of things well,” says one evolutionary ecologist. “But getting data from them has been somewhere between extremely difficult and impossible.”

Researchers say they have no idea which flu sequences the CDC processes, or even how many, but the number is thought to run to thousands each year. Apart from occasional large deposits accompanying published papers, required by journals, data are “coming through an eye dropper”, says one bioinformatician at the US National Institutes of Health (NIH) in Bethesda, Maryland.

Nature's analyses show that, of about 15,000 influenza A sequences in the gene database GenBank and the influenza sequence database at the Los Alamos National Laboratory in New Mexico, fewer than a tenth were deposited by the CDC. A consortium led by the US National Institute of Allergy and Infectious Diseases (NIAID) in Bethesda has deposited more than 2,800 sequences this year alone.

“The advancement of public health and science is generally best served when data are shared in an open, timely and appropriate manner, and the CDC Influenza Branch is committed to accomplishing this objective,” says James LeDuc, director of the CDC's division of viral and rickettsial diseases. But he adds: “This must be balanced against the needs for maintaining high standards for data quality and for protecting sensitive information when the situation warrants.”

LeDuc says that as well as depositing sequences alongside papers, the agency posts summaries of epidemiological data on its website each week, and shares information with the WHO. But “we do not have the capacity to comply with all requests while also meeting our other public-health responsibilities”.

Many flu scientists say that the CDC should try harder. “No other US laboratory receives thousands of influenza samples and sequences from around the globe,” points out one. “They say it's in [their weekly report],” says another. “Give me a break. I want the database.”

The dearth of CDC data was one reason why the NIAID last year created its consortium to sequence thousands of flu strains from humans and birds, according to one scientist close to the project. In one of the team's first papers, published in July (E. C. Holmes et al. PLoS Biology 3, e300; 2005), researchers found that viruses swap genes with each other much more frequently than was thought.

One such swap made the virulent Fujian strain, which hit in 2003–04, and to which the annual vaccine was poorly adapted. “The minute we got our hands on some open data, it jumped out that here was something people were not aware of,” says one NIH scientist. “The CDC didn't know what was going on with the Fujian thing, and by the time they realized, it was too late to use it for a vaccine.”

The threat of a flu pandemic makes it “imperative that our most experienced and brilliant scientists across the globe come together as one team”, says Jill Taylor, a clinical virologist at the Wadsworth Center of the New York state department of health, and a member of the NIAID consortium.

“Open data are better,” agrees William Glezen, a virologist at Baylor College of Medicine in Houston. “There is a lot that we have to learn about influenza.” A key issue, he says, is to match changes in the flu genome with the epidemiology of infections.

He acknowledges that CDC staff are busy with programmes such as the annual vaccine selection, and lack time and resources to share data better. “That's why other investigators need to look at the other parts,” says Glezen.