To the Editor:

Following a discussion by the workgroup for Data Standards in Synthetic Biology, which met in June 2010 during the Second Workshop on Biodesign Automation in Anaheim, California, we wish to highlight a problem relating to the reproducibility of the synthetic biology literature. In particular, we have noted the very small number of articles reporting synthetic gene networks that disclose the complete sequence of all the constructs they describe.

To our knowledge, there are only a few examples where full sequences have been released. In 2005, a patent application1 disclosed the sequences of the toggle switches published four years earlier in a paper by Gardner et al.2. The same year, Basu et al.3 deposited their construct sequences for programmed pattern formation into GenBank. Examples of synthetic DNA sequences derived from standardized parts that have been made available in GenBank include the refactored genome of bacteriophage T7 (ref. 4) and a BioBrick-based plasmid5. More recently, the full genome sequence of synthetic Mycoplasma mycoides JCVI-syn1.0 clone sMmYCp235-1 has also been made available in GenBank (accession no. CP002027)6.

In contrast, most publications provide a variety of methods, information and/or partial sequences to explain the constructs used in a paper; for the research community, piecing together the full sequences of constructs is thus laborious, error-prone and sometimes impossible. A paper from your journal provides a recent example; although Kemmer et al.7 provided admirably detailed Supplementary Information on the construction methods for their plasmids, they failed to provide access to the final sequences. Indeed, the gaps between key components are almost never reported, presumably because they are not considered crucial to the report. Yet, synthetic biology relies on the premise that synthetic DNA can be engineered with base-level precision.

Missing sequence information in papers hurts reproducibility, limits reuse of past work and incorrectly assumes that we know fully which sequence segments are important. For example, many synthetic biologists are currently realizing that translation initiation rates are dependent on more than the Shine-Dalgarno sequence8. Sequences upstream of the start codon are crucial for translation rates, yet are underreported. Similarly, it has been demonstrated that intron length can affect the dynamics of genetic oscillators9. Many more such examples are likely to emerge.

Because full sequence disclosure is critical, we wonder why the common requirement by many journals to provide GenBank entries for genomes and natural sequences has not been enforced for synthetic DNA and engineered genetic constructs. In an environment where word count is a constant battle, replacing plasmid construction method sections with references to annotated GenBank entries would be a welcome change. We therefore feel that including a completely annotated sequence of each construct would greatly contribute to the development of our discipline. We hope that in the future you will encourage the authors you publish to submit this information to GenBank or other appropriate databases. In the long term, we hope to establish a minimal information guideline around the Minimal Information about a Biomedical or Biological Investigation (MIBBI) project, and we welcome contributions from the greater community.