Long haul: The Lung Cancer Tissue Bank has only 20 of the 500 samples required. Credit: Mainframephotographics.com

When the ambitious Cancer Genome Atlas was announced in December 2005, the project's leaders said they would examine scores of tumors for mutations that promote cancer, findings that could then help researchers develop targeted treatments.

But more than a year into the venture, they have found only a fraction of the tumor samples they need.

The three-year pilot phase of the project, pegged at more than $100 million, aims to analyze 2,000 genes from 1,500 lung, brain and ovarian tumors. The full project is supposed to catalog the genetic changes in 50,000 samples representing more than 100 types of cancer.

That's a lot of samples, and scientists warned that it might be more than the project could find (Nat. Med. 12, 719; 2006). At the same time, many researchers hailed this aspect of the study, saying that mutations associated with cancer often vary between individuals, which makes it difficult to identify the important, recurring changes without many samples.


“The samples in this project are key,” says Daniel Haber, director of the Massachusetts General Hospital Cancer Center, who is not associated with the project.

“Each tumor seems to have mutations in different genes—in other words, not too many recurrent ones,” Haber says. “To find the genetic commonalities you must have a large set of tumor samples.”

So far, at least, things seem more complicated than researchers had hoped.

In a study of 210 tumor samples, for example, scientists at the UK's Wellcome Trust Sanger Institute found nearly 1,000 different mutations, but each occurred at low frequency (Nature, 446, 153–158; 2007).

“The end result is you need a larger number of samples,” says Michael Stratton, one of the lead investigators of that study, which is unrelated to the atlas.

Even finding 210 samples that met the strict criteria was difficult, says Stratton. For example, tumor cells had to make up at least 80% of each sample.

After scouring the country for two years, the US National Institutes of Health (NIH), which oversees the atlas, chose three tissue banks in September 2006, each to supply 500 tumor samples. So far, the banks have gathered about a third of that total.

Of the 500 squamous cell lung tumors requested, for example, the Lung Cancer Tissue Bank of the Cancer and Leukemia Group B has 20 that meet the criteria, says Richard Schilsky, the bank's chairman.

The bank has more tumors, Schilsky says, but they were collected before the genome project began, and patients were not asked for the proper consent. To make those samples eligible, the bank would have to go back to each donor and explain that their genetic information would be entered into a public database. The bank collects about six or seven new samples each month, but at that rate it would take at least six years to meet the goal.

The Gynecologic Oncology Group, one of the three chosen banks, was asked to supply a subset of ovarian tumors and estimates that it has about 500 in its Columbus, Ohio, bank. Most of those are likely to meet the scientific criteria, but about a third lack the proper consent from donors, according to Michael Birrer, the group's vice-chair.

The bank will have to revise consent forms and get them approved by the hospitals or other sources that supplied the tumors. But those sources may not approve the new forms because of the ethical implications of sharing the donors' genetic data, Birrer says.

Fewer samples won't make the genome project impossible, but they will diminish the value of the results. If existing samples aren't enough, tissue banks may begin collecting large numbers of new tumors for the project's next phase.

“I personally think it's not worth doing unless they have large numbers,” says Haber. “The Sanger study looked at hundreds of tumors, so the next logical step is to look at a broader set of genes in thousands of tumors.”