To the editor:

The Gene Expression Omnibus (GEO) database [1] hosted by the National Center for Biotechnology Information (NCBI) is the largest public archive for microarray data. Many factors contribute to the value and reusability of archived microarray data, including accurate probe annotation. As outlined by others [2,3,4], this issue presents particular challenges, primarily because microarrays are labeled using many different annotation conventions and because probe-to-gene assignments continually evolve.

Chen and colleagues [5] recently stated that GEO is experiencing only linear growth in citations despite exponential data growth, and attributed this imbalance to out-of-date probe annotation. It is not clear, however, that exponential growth in any database should necessarily lead to an exponential increase in citation rates. Should we really expect citations of GenBank to keep pace with the exponential growth of DNA sequence data? Regardless, our numbers do not support the assertions of Chen et al. [5] and instead show very similar rates of growth for GEO data and third-party citations (Supplementary Fig. 1 online).

Chen and colleagues also raised the fair point that, for GEO data to be accurately evaluated, probe annotations should be regularly synchronized with the latest gene mappings. They suggested that this task could be facilitated by implementing standardized column headers in microarray tables. We agree with both points. In fact, a standardized header system has been in place at GEO for several years (Supplementary Table 1 and Supplementary Discussion online) and is the basis of our internal reannotation pipeline. These standard column headers enable us to provide up-to-date annotation for genes within the Entrez GEO Profiles database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=geo). Annotation tables are freely available for download (ftp://ftp.ncbi.nih.gov/pub/geo/DATA/annotation/) and, when possible, include several auxiliary annotation categories, including chromosomal position and gene ontology terms.
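
To illustrate why standardized column headers make reannotation tractable, the following is a minimal sketch, not a description of GEO's actual pipeline. It assumes a tab-delimited platform table whose probe identifier and sequence accession sit under fixed header names (here "ID" and "GB_ACC", used as assumptions since Supplementary Table 1 is not reproduced in this letter), and a hypothetical, separately maintained lookup from accession to current gene symbol. The function name, accessions, and gene symbols are all invented for illustration.

```python
import csv
import io

# Assumed standardized header names; the authoritative list is in
# Supplementary Table 1, which is not reproduced here.
ID_COLUMN = "ID"
ACCESSION_COLUMN = "GB_ACC"


def reannotate(platform_table: str, accession_to_gene: dict) -> dict:
    """Map each probe ID to a current gene symbol via its sequence accession.

    Because the accession always sits under the same header, the same
    function can be rerun on any platform table whenever the
    accession-to-gene lookup is refreshed.
    """
    reader = csv.DictReader(io.StringIO(platform_table), delimiter="\t")
    annotation = {}
    for row in reader:
        accession = (row.get(ACCESSION_COLUMN) or "").strip()
        # Probes with no tracked accession, or with an accession missing
        # from the lookup, are left unannotated (None).
        annotation[row[ID_COLUMN]] = accession_to_gene.get(accession)
    return annotation


# Hypothetical platform table and lookup, for illustration only.
platform_table = "ID\tGB_ACC\nprobe_1\tACC0001\nprobe_2\tACC0002\nprobe_3\t\n"
current_mapping = {"ACC0001": "GENE_A", "ACC0002": "GENE_B"}

result = reannotate(platform_table, current_mapping)
```

In this sketch, refreshing the annotation amounts to rerunning the function with an updated lookup; probes that lack sequence tracking information cannot be reannotated this way, which is one reason such information matters.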

GEO recognizes that accurate probe annotation is fundamental to data reuse, and we thank Chen and colleagues for raising this point. We will continue to make considerable efforts to acquire sufficient probe sequence tracking information from submitters. Our annotation procedures also continue to be refined: we are working to implement a probe sequence mapping procedure, to increase the fraction of curated arrays, and to reannotate more frequently.

Note: Supplementary information is available on the Nature Methods website.