Power tools: microarrays (left, inset) show quickly and easily which genes in a sample are active, but despite automated steps in their manufacture, the chips are still open to significant errors. Credit: SPL/ANTHONY PIDGEON

Some call them DNA chips, others microarrays, but whatever name you prefer, they are one of the hottest tools in biology. A search of the Medline database for papers published in 1999 with 'microarray' in their title yields just 27 results. Try the same search for 2000 and the number jumps to 97 — a crude measure, perhaps, but it is a testament to a revolution that is transforming studies of gene expression. As the genomics revolution begins to make its mark, biologists are turning in growing numbers to a technology that lets them analyse cells or tissues and determine, at a stroke, which genes are active.

DNA microarrays consist of a library of genes immobilized in a grid, usually on a glass slide. Each individual 'spot' in the grid contains DNA from a single gene that will bind to the messenger RNA (mRNA) produced by the gene concerned. So by liquidizing a sample from a given tissue type, tagging its mRNAs with fluorescent dyes and then exposing the sample to the slide, it is possible to obtain an instant visual read-out revealing which genes were active.

Researchers who previously studied the activity of one gene at a time can now analyse the expression of thousands of genes simultaneously. But as aficionados explore the technology's limits, they are turning up errors in DNA chips that could lead unwary biologists towards erroneous conclusions. And experts worry that too few of the researchers rushing to embrace DNA microarrays are aware of the potential pitfalls. “It's going to revolutionize science. But the technology is in its infancy, so there are going to be some growing pains,” says Timothy Zacharewski, a toxicologist at Michigan State University in East Lansing, who makes and uses microarrays. “It's amazing how many people are going forward without a full appreciation of what they are getting into.”

The enormous number of genes that can be studied at one go is the technology's curse, as well as the source of its power. Although microarray production is heavily automated, there are many opportunities for human error. “For any experiment you can mislabel a tube and mess yourself up,” says Joseph DeRisi, a microarray pioneer at the University of California, San Francisco. “But here, the potential for the error to magnify itself is much more drastic. Instead of one tube at a time, you are doing 6,000.”

Send in the clones

One popular type of array was devised by a team led by Patrick Brown at Stanford University in California (ref. 1), and is based on libraries of gene sequences made using mRNA. To store and reproduce these sequences, researchers make 'complementary' DNA (cDNA) copies of the RNA messages and splice them into loops of DNA called plasmids. The plasmids are then inserted into bacteria, which grow in culture and churn out more plasmids from which the cDNAs can be derived for spotting onto microarray slides.

Errors creep in as these bacterial cultures, or the cDNA clones extracted from them, are manipulated. The cultures are often stored in small plastic plates, each typically containing 96 wells, and they are transferred from plate to plate using pipetting robots. But bacteria can easily contaminate other wells, and technicians can make errors such as loading plates into the robots the wrong way round or taking samples from the wrong well for sequencing. As a result, between 1% and 5% of the clones in even the best-maintained sets do not contain the sequence that they are supposed to.
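To see why a single handling slip is so damaging, consider a plate loaded the wrong way round. The sketch below is illustrative only: it assumes the standard 96-well layout of eight rows (A to H) by twelve columns (1 to 12), and the function names are hypothetical rather than taken from any robot's software.

```python
import string

ROWS = string.ascii_uppercase[:8]   # assumed layout: rows A-H
N_COLS = 12                         # assumed layout: columns 1-12

def rotated_well(well: str) -> str:
    """Well actually sampled when `well` is requested from a plate rotated by 180 degrees."""
    row, col = well[0], int(well[1:])
    new_row = ROWS[len(ROWS) - 1 - ROWS.index(row)]  # A<->H, B<->G, ...
    new_col = N_COLS + 1 - col                       # 1<->12, 2<->11, ...
    return f"{new_row}{new_col}"

def mislabelled_fraction() -> float:
    """Fraction of wells whose contents no longer match their label after rotation."""
    wells = [f"{r}{c}" for r in ROWS for c in range(1, N_COLS + 1)]
    wrong = sum(1 for w in wells if rotated_well(w) != w)
    return wrong / len(wells)

print(rotated_well("A1"))       # the robot asks for A1 but samples this well
print(mislabelled_fraction())   # share of the plate that ends up mislabelled
```

Because a plate with an even number of rows has no well at its exact centre, no well maps back onto itself: one reversed plate mislabels all 96 clones at once.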

Until recently, few researchers were aware of the extent to which the errors can multiply as clone sets are copied and transferred from lab to lab. But last year, after hearing anecdotal reports of high error rates in a set of mouse cDNA clones assembled by a group of labs called the IMAGE (Integrated Molecular Analysis of Genomes and their Expression) consortium, Zacharewski decided to investigate further.

The IMAGE consortium has compiled a variety of cDNA clone sets, which are now produced by commercial suppliers. Scientists wanting to use IMAGE clone sets for microarray studies can either buy bacterial cultures or purified cDNAs and make up their own slides, or order pre-manufactured chips.

Teething troubles: Timothy Zacharewski (left) was shocked by the high error rates he found in a set of cDNA clones used to make microarrays. Joseph DeRisi (right) is worried that researchers cannot check for mistakes in commercial chips.

To check the accuracy of commercially available IMAGE mouse cDNA clone sets, Zacharewski and his colleagues purchased a set from one supplier, Research Genetics of Huntsville, Alabama, and sequenced 1,189 cDNAs. Only 62% of the stocks definitely represented a pure sample of the correct clone (ref. 2). Of the remainder, more than half seemed to contain the wrong cDNA, and the rest contained either a mix of different cDNAs or did not yield a readable sequence.
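Those percentages translate into rough clone counts as follows. This is a back-of-envelope sketch only: the 62% figure is rounded, so the counts are approximate and are not taken from the study itself.

```python
TOTAL_SEQUENCED = 1189     # cDNAs sequenced, as reported
FRACTION_CORRECT = 0.62    # rounded figure, so the results below are approximate

correct = round(TOTAL_SEQUENCED * FRACTION_CORRECT)
remainder = TOTAL_SEQUENCED - correct
# "More than half" of the remainder held the wrong cDNA: a lower bound only.
min_wrong = remainder // 2 + 1

print(f"approx. correct clones:   {correct}")
print(f"approx. problem clones:   {remainder}")
print(f"wrong-cDNA clones, at least: {min_wrong}")
```

On these assumptions, roughly 450 of the stocks tested were problematic in some way, and well over 200 appeared to hold the wrong cDNA altogether.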

In some cases, the apparent errors may mean that the sequence for the clone deposited in the public databases is wrong, rather than there being a problem with the clone. But stocks containing more than one cDNA were probably the result of cross-contamination, Zacharewski says. Other problems may reflect handling errors accumulated as different labs managed and distributed the stocks over the years.

Before Zacharewski's study, reagent suppliers had acknowledged the potential for errors and started producing cleaned-up, 'sequence-verified' cDNA clone sets. But even these can be problematical. Indeed, researchers at three major microarray centres told Nature that they have found disturbingly high error rates — up to 30% — in copies of the sequence-verified version of the Research Genetics mouse cDNA clone set studied by Zacharewski's team.

The centres involved — at Vanderbilt University in Nashville, Yale University in New Haven, Connecticut, and Brigham and Women's Hospital in Boston — belong to a biotechnology consortium funded by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) in Bethesda, Maryland. The source of the errors has yet to be pinpointed, and some may have arisen at the centres concerned. Troy Moore of Research Genetics maintains that the company's error rate should not exceed 2%, but adds: “If we identify a problem, we will work to correct it.”

Shawn Levy, who works at the Vanderbilt centre, does not believe the problems he has found within the Research Genetics clone set are the result of local mishandling, as his team has analysed other clone sets and found error rates of less than 5%. But some of the errors might reflect the fact that cDNA clones are usually not sequenced in their entirety — so if the fragments sequenced by the NIDDK consortium do not overlap with the partial sequences deposited in public databases, correct clones may appear to be in error.
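The overlap problem can be made concrete with a toy example. The coordinates, the minimum-overlap threshold and the function names below are all hypothetical; the sketch simply shows that two partial reads from opposite ends of the same clone share no bases and so cannot confirm or refute each other.

```python
def overlap_length(a: tuple, b: tuple) -> int:
    """Number of shared positions between two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def comparable(db_read: tuple, new_read: tuple, min_overlap: int = 50) -> bool:
    """True if the two reads share enough bases for a meaningful comparison.

    The 50-base threshold is an arbitrary illustrative choice.
    """
    return overlap_length(db_read, new_read) >= min_overlap

# Hypothetical 2,000-base clone: the database entry covers one end,
# the new sequencing read covers the other.
database_read = (0, 500)
new_read_far_end = (1500, 2000)
new_read_same_end = (100, 600)

print(comparable(database_read, new_read_far_end))   # no shared bases to compare
print(comparable(database_read, new_read_same_end))  # 400 shared bases
```

A naive comparison of the two non-overlapping reads would find no match and could flag a perfectly good clone as an error.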

While the consortium members compare their sequencing data in an effort to pin down the source of the apparent errors, the Yale centre has posted a notice on its website warning users of the potential for problems. But regardless of the explanation, the lesson is clear: even when care is taken to remove erroneous sequences, it cannot be assumed that microarrays based on cDNA clones are reliable. “I think errors may be inherent to the system,” says Steve Gullans, who heads the centre at the Brigham and Women's Hospital.

As a result, the NIDDK consortium plans to increase its output of microarrays based on a rival technology. In these chips, the grid consists of oligonucleotides, or oligos: short, single-stranded DNA segments built to order by chemical synthesis (ref. 3). This construction process avoids problems with bacterial contamination, and should mean that each sequence is what the researcher orders. On the minus side, oligo-based microarrays are expensive. And ultimately, they are only as good as the information used to direct the oligos' synthesis, as the DNA chip company Affymetrix of Santa Clara, California, recently discovered.

Mistaken identity

Affymetrix can pack up to 400,000 different oligos on a single array — usually representing around 10,000 genes, with 40 oligos for each gene. But in February, Affymetrix announced that up to a third of the sequences on one set of mouse arrays were wrong. The company had used sequences from the public sequence databases that were known to be ambiguous, and which actually corresponded to the wrong strand from the DNA double helix. As a result, the oligos could not detect their target mRNAs.
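Why a wrong-strand oligo is useless can be shown with a toy model. The sketch below assumes the simplified picture used in this article, in which a probe binds its target directly by base pairing; it treats only a perfect reverse-complement match as binding, writes the target in DNA letters, and uses a made-up ten-base sequence. Real hybridization tolerates some mismatches and real protocols add labelling steps, so this is an illustration rather than a model of any commercial chip.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read in the conventional 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def can_hybridize(probe: str, target: str) -> bool:
    """Toy rule: the probe binds only if it is the exact reverse complement of the target."""
    return probe == reverse_complement(target)

target = "ATGGCCTTAG"                      # hypothetical target, written in DNA letters
right_probe = reverse_complement(target)   # designed from the correct strand
wrong_strand_probe = target                # designed from the wrong strand: same sense as the target

print(can_hybridize(right_probe, target))         # complementary bases pair up
print(can_hybridize(wrong_strand_probe, target))  # identical sequences cannot pair
```

A probe built from the wrong strand carries the same sequence as its target rather than the complementary one, so the two have nothing to pair with.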

Affymetrix has promised to replace the arrays. “It's going to be an inconvenience, at most,” says Carrolee Barlow of the Salk Institute for Biological Studies in La Jolla, California, who is using the chips to investigate the genetics of brain disorders. But to DeRisi, the incident highlights the risks inherent in commercial DNA chips. “You are at the mercy of the company,” he says. “That is a tough situation when you are not allowed to proofread what they have done.”

But even perfect arrays do not guarantee good science. Microarray experts say that some new users seem to be so mesmerized by the technology's power that they are forgetting basic principles of experimental design. Ash Alizadeh, a graduate student in Brown's Stanford lab, says he knows of several microarray studies lacking the proper controls and replications needed to ensure that differences in gene expression really are associated with the variable under investigation.

Although such shortcomings should be spotted by journal editors and reviewers, erroneous results caused by faulty chips are harder to detect — and experts are sure that some have entered the literature. They are urging users not to draw firm conclusions about the activity of individual genes without checking the sequence of the spot concerned and verifying the result using alternative methods of monitoring gene expression.

Within a few months, predicts Gullans, journal reviewers will routinely be asking these questions. And then perhaps the focus will be back on the immense power of microarrays, rather than their limitations.
