The minimum information about a genome sequence (MIGS) specification

Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; dePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; San Gil, Ingio; Wilson, Gareth; Wipat, Anil

doi:10.1038/nbt1360

Perspective
Published: 08 May 2008

Dawn Field¹,
George Garrity²,
Tanya Gray¹,
Norman Morrison^3,4,
Jeremy Selengut⁵,
Peter Sterk⁶,
Tatiana Tatusova⁷,
Nicholas Thomson⁸,
Michael J Allen⁹,
Samuel V Angiuoli^5,10,
Michael Ashburner^5,10,
Nelson Axelrod⁵,
Sandra Baldauf¹¹,
Stuart Ballard¹²,
Jeffrey Boore¹³,
Guy Cochrane⁶,
James Cole²,
Peter Dawyndt¹⁴,
Paul De Vos^15,16,
Claude dePamphilis¹⁷,
Robert Edwards^18,19,
Nadeem Faruque⁶,
Robert Feldman²⁰,
Jack Gilbert⁹,
Paul Gilna²¹,
Frank Oliver Glöckner²²,
Philip Goldstein²³,
Robert Guralnick²³,
Dan Haft⁵,
David Hancock^3,4,
Henning Hermjakob⁶,
Christiane Hertz-Fowler⁸,
Phil Hugenholtz²⁴,
Ian Joint⁹,
Leonid Kagan⁵,
Matthew Kane²⁵,
Jessie Kennedy²⁶,
George Kowalchuk²⁷,
Renzo Kottmann²²,
Eugene Kolker^28,29,30,
Saul Kravitz⁵,
Nikos Kyrpides³¹,
Jim Leebens-Mack³²,
Suzanna E Lewis³³,
Kelvin Li⁵,
Allyson L Lister^34,35,
Phillip Lord³⁴,
Natalia Maltsev¹⁹,
Victor Markowitz³⁶,
Jennifer Martiny³⁷,
Barbara Methe⁵,
Ilene Mizrachi⁷,
Richard Moxon³⁸,
Karen Nelson^5,39,
Julian Parkhill⁸,
Lita Proctor²⁵,
Owen White¹⁰,
Susanna-Assunta Sansone⁶,
Andrew Spiers⁴¹,
Robert Stevens³,
Paul Swift¹,
Chris Taylor⁶,
Yoshio Tateno⁴²,
Adrian Tett¹,
Sarah Turner¹,
David Ussery⁴³,
Bob Vaughan⁶,
Naomi Ward⁴⁴,
Trish Whetzel⁴⁵,
Ingio San Gil⁴⁰,
Gareth Wilson¹ &
Anil Wipat^34,35

Nature Biotechnology volume 26, pages 541–547 (2008)

Abstract

With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.

Main

A wealth of genomic and metagenomic sequences

By the end of next year, there will be complete genome sequences of at least draft quality for more than 1,000 bacteria and archaea and 100 eukaryotes^1,2 and for even larger numbers of viruses, organelles and plasmids. With the rapid pace at which new genome sequences are appearing, the need to consider how best to ensure stewardship of these data for the long term has never been more pressing.

Our genome collection: more than the sum of its parts. The analysis of genomic information is having an impact on every area of the life sciences and beyond. A genome sequence is a prerequisite to understanding the molecular basis of phenotype, how it evolves over time and how we can manipulate it to provide new solutions to critical problems. Such solutions include therapies and cures for disease, industrial products, approaches for biodegradation of xenobiotic compounds and renewable energy sources. With improvements in sequencing technologies, the growing interest in metagenomic approaches and the proven power of comparative analysis of groups of related genomes, we can envision the day when it will be commonplace to sequence tens to hundreds of genomes or more as part of a single study. At current rates of genome sequencing, it has been estimated that >4,000 bacterial genomes will be available soon after 2010 (ref. 1).

Given the importance of the growing genome collection, the capital investment in its creation and the benefits of leveraging its value through diverse comparative analyses, every effort should be made to describe it as accurately and comprehensively as possible. There is an increasing interest from the community in doing so, for three main reasons. The first is the interest in testing hypotheses about the features observed in genomes using comparative evo- and eco-genomic approaches³. The second is the need to supplement the content of a variety of databases with high-level descriptions of genomes that allow useful grouping, sorting and searching of the underlying data. The third is the growth in genome sequence data from environmental isolates and metagenomes—vast data sets of DNA fragments from environmental samples^4,5,6. The data generated by such studies will dwarf current stores of genomic information, making improved descriptions of genomes even more important.

At present, both top-level descriptors and genome descriptions are incomplete for many reasons. First and foremost, only in hindsight do we know the minimum quality and quantity of information that is required to make each description precise, accurate and useful. For example, even for bacterial and archaeal species with validly published names, strain names were not routinely captured in genome annotation documents before the sequencing of large numbers of genomes from the same species⁷, but such information is now considered essential. Through empirical observations, we are expanding our view of the types of information that are important for testing particular hypotheses, exploring new patterns and quantifying inherent sampling biases^3,8.

As the number of habitats and communities sampled using metagenomic approaches increases, we are also being forced to rethink our understanding of the minimum information required to adequately describe a genome sequence. Without adequate description of the environmental context and the experimental methods used, such data sets will be of less value for researchers wishing to conduct comparative genomic studies or link genetic potential with the diversity and abundance of organisms. In fact, given the vast number of uncultivated microbes, it may be that a DNA-centric approach, in which genes are linked to habitats (locations), is more useful than the species-centric view^9,10. Finally, sequencing technology is advancing rapidly, and the adoption of new methods^11,12,13 will force the adoption of additional descriptors (e.g., the depth of sequence coverage, quality and whether any 'finishing' was used) to be able to distinguish among these methods.

Most often, metadata about genome sequences are found only in the primary literature or in reference works, such as Bergey's Manual¹⁴ for bacteria and archaea, rather than in sequence databases. The distributed and patchy nature of this information and the difficulties of curating even a few pieces of information for what are now very large collections of genomes make the vision of a single definitive source of rich genomic descriptions highly desirable.

The need for coordinated efforts

Facilitating and accelerating the process of collecting relevant metadata would clearly reduce ongoing replication of efforts and maximize the ability to share and integrate data within the genomics community. The obvious solution is to develop a consensus-based approach.

The Genomic Standards Consortium. The GSC is an open-membership, international working body formed in September 2005 (ref. 15). Its goal is to promote mechanisms that standardize the description of genomes and the exchange and integration of genomic data. The GSC community brings together (i) evolutionists, ecologists, molecular biologists and other researchers analyzing collections of genomes, (ii) bioinformaticians producing genomic databases, (iii) those who sequence genomes and (iv) computer scientists, ontology experts and members of other standardization initiatives, such as the International Nucleotide Sequence Database Collaboration (INSDC), which is responsible for the DNA Data Bank of Japan (DDBJ), European Molecular Biology Laboratory (EMBL) and GenBank databases (http://www.insdc.org/). The guidance of DDBJ, EMBL and GenBank will be critical to the success of the GSC initiative, both because they are the official stewards of the public collection of genomes and because of their interest in fulfilling community needs.

Minimum information about genomes and metagenomes

The GSC is working to define a set of core descriptors for genomes and metagenomes in the form of a MIGS specification (Fig. 1). MIGS extends the minimum information already captured by the INSDC. The MIGS checklist is given in Box 1, and the most up-to-date version is available from the consortium's website (http://gensc.sf.net). Examples of MIGS-compliant reports are given in Supplementary Table 1 online. The information required to comply with MIGS is routinely included in primary genome publications (or is referenced therein). However, this information needs to be formalized and made available in electronic form to improve its accessibility¹⁶ (Box 2).

Since it was originally proposed¹⁶, the MIGS specification has been simplified and changed by the GSC through an iterative revision process to contain (i) only curated information that cannot be calculated from raw genomic sequence and (ii) core descriptors specific to the major taxonomic groups (eukaryotes, bacteria and archaea¹⁷, plasmids, viruses, organelles) and metagenomes. MIGS is structured as an 'Investigation' composed of a 'Study' and an 'Assay', according to the Reporting Structures for Biological Investigations (RSBI) working group's recommendation for the modularization of checklists^18,19. Under 'Study' are the top-level concepts 'Environment' and 'Nucleic Acid Sequence' and under 'Assay' is a description of the sequencing technology.

MIGS aims to support unencumbered access to genomic reagents (such as strains)²⁰, place the complete (meta)genome collection into geospatial and temporal context (latitude, longitude, altitude or depth, date and time of sampling) and provide essential details of the experimental method used (e.g., sequencing method). MIGS also provides a framework for the capture of extra information deemed 'minimum' to specific communities. Most importantly, the description of metagenomes in MIGS is being extended in the minimum information about a metagenome sequence (MIMS) specification²¹. MIMS enables the capture of further measurements that define habitat (such as temperature, salinity, pH, dissolved organic carbon) and extends the original structure of MIGS for describing a single (meta)genomic experiment to allow the capture of information from pooled samples and more than one independent sampling event (e.g., sampling along a transect⁴).
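The structure described above can be sketched in Python as a small set of nested records: an 'Investigation' holding a 'Study' (environment and source of the nucleic acid sequence) and an 'Assay' (sequencing method). This is only an illustrative sketch. Every class name, field name and validation rule below is hypothetical and is not taken from the MIGS checklist itself, which remains the authoritative list of fields (Box 1).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Environment:
    # Geospatial and temporal context of the sample (hypothetical field names).
    latitude: float                       # decimal degrees, -90..90
    longitude: float                      # decimal degrees, -180..180
    collection_date: date
    altitude_or_depth_m: Optional[float] = None
    habitat: Optional[str] = None


@dataclass
class Study:
    environment: Environment
    organism_or_sample: str               # e.g. a strain name or metagenome label


@dataclass
class Assay:
    sequencing_method: str


@dataclass
class Investigation:
    study: Study
    assay: Assay


def validate(report: Investigation) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    env = report.study.environment
    if not -90.0 <= env.latitude <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= env.longitude <= 180.0:
        problems.append("longitude out of range")
    if env.collection_date > date.today():
        problems.append("collection date is in the future")
    if not report.study.organism_or_sample.strip():
        problems.append("organism or sample name is empty")
    if not report.assay.sequencing_method.strip():
        problems.append("sequencing method is empty")
    return problems
```

A report built this way can be checked before submission by calling validate(report) and inspecting the returned list. The checks shown are examples of the kind of rule a capture tool might apply, not requirements stated by MIGS.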

How genomes and metagenomes are described in public databases has evolved from how short, simple DNA sequences are described, without special attention to information such as the geographical origin of the sequence. Significant efforts are underway by the INSDC to adapt and extend the infrastructure for describing genomes through the Genome Project Metadata initiative²². The INSDC efforts are open to evolution, albeit at a conservative pace²², and it is the GSC's hope that much, if not all, of the MIGS specification will be included in the Genome Project Metadata initiative. A mapping between INSDC features and MIGS has been developed for the purpose of placing MIGS information into INSDC documents and is available on our website. Any fields that are not already formally defined by the INSDC Feature Table Document (http://www.insdc.org/files/documents/feature_table.html) can be represented within a structured comment block in INSDC records²².

A genome catalog

The development of any checklist must be an open and iterative process that involves a balanced group of participants. Moreover, mechanisms for achieving compliance are needed to facilitate widespread adoption of a checklist. Such mechanisms involve an appropriate reporting structure for capturing and exchanging data (file formats), software, databases and appropriate controlled vocabularies and/or ontologies for defining the terms used in the annotations. The GSC is working toward these combined goals and has created an online system for capturing MIGS-compliant reports (http://gensc.sf.net).

In brief, we have implemented the checklist as an XML schema and built a freely available Genome Catalogue system (GCat) (http://gensc.sf.net). GCat is designed to generate forms automatically and 'on the fly' from this schema for the sake of data input. It also allows users to view and search genome descriptions as they accumulate during the process of refining the MIGS checklist. The GCat system is generic and could be applied to the capture of more expressive metadata for subsets of genomes. Indeed, it is flexible enough to support the implementation of any checklist that can be structured as an appropriate XML schema (MIGS.xsd, being developed into the Genomic Contextual Data Markup Language (GCDML)). The GSC is also working in the area of controlled vocabulary and ontology development through the collation of controlled vocabularies already in use in the community and through contributions to the Ontology for Biomedical Investigations (OBI, previously known as the Functional Genomics Investigation Ontology (FuGO)²³) and the Environment Ontology (EnvO) project (http://environmentontology.org). As a part of this process, GCat makes use of existing controlled vocabulary terms and accepts new terms.
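To illustrate the idea of generating input forms from a schema, the following Python sketch reads a small XML schema and lists the form fields it implies. It is not GCat's implementation, and the embedded schema is a hypothetical, heavily simplified stand-in for MIGS.xsd. The element names and the mapping from schema types to form widgets are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, heavily simplified schema; not the real MIGS.xsd.
EXAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="latitude" type="xs:decimal"/>
        <xs:element name="longitude" type="xs:decimal"/>
        <xs:element name="collection_date" type="xs:date"/>
        <xs:element name="habitat" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Illustrative mapping from XSD simple types to generic form widget kinds.
WIDGETS = {"xs:decimal": "number", "xs:date": "date", "xs:string": "text"}


def form_fields(xsd_text):
    """Return (name, widget, required) for every typed element in the schema."""
    root = ET.fromstring(xsd_text)
    fields = []
    for el in root.iter(XS + "element"):
        xsd_type = el.get("type")
        if xsd_type is None:
            # Container element (such as <report>) with no simple type.
            continue
        required = el.get("minOccurs", "1") != "0"
        fields.append((el.get("name"), WIDGETS.get(xsd_type, "text"), required))
    return fields


for name, widget, required in form_fields(EXAMPLE_XSD):
    print(f"{name}: {widget}{' (required)' if required else ''}")
```

Because the field list is derived from the schema at run time, revising the checklist means editing the schema only. That is the property that makes this approach attractive while a checklist is still being refined.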

Improving genomic databases

By design, MIGS contains only primary, curated information. This is because secondary, or derived, information that can be calculated from a genome sequence is subject to frequent change, can be generated using more than one method and should be acquired directly from those producing the calculations. Still, access to computed information (e.g., in the simplest cases, G+C content or total number of predicted proteins) should be made as easy as possible.
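As a small illustration of why derived values are left out of MIGS, the Python sketch below computes G+C content from a sequence. Even for this simplest case the result depends on a choice of method. Here ambiguous bases such as N are excluded from the denominator, which is one convention among several and is an assumption of this example.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total
```

A tool that instead divided by the full sequence length would report a different value for any sequence containing ambiguous bases, so a computed figure is only interpretable alongside a description of how it was produced.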

Genomic sequences and their initial annotations must be submitted to the INSDC (http://www.insdc.org/) (and subsequent high-quality, curated annotations derived from empirical observations to the Third Party Annotation data set²⁴), but there are an ever increasing number of genomic databases containing a wide range of additional computations. Although GSC does not endorse any particular method of analysis or database, it supports increased transparency of such resources for the sake of accurate data interpretation and integration.

The first issue is that of exchanging calculated information. This could be facilitated in part by widespread adoption of a common exchange format, such as the Generic Feature Format Version 3 (GFF3) file format (http://song.sourceforge.net/gff3.shtml). There are many tools that support the reformatting of a variety of file types into GFF3, so database providers would find it straightforward to generate appropriate files. The availability of a wide suite of tools for downstream analyses of files in GFF3 format also means that users could combine the weight of evidence from many sources when examining a particular genome. This could reveal instances of systemic bias and therefore lead to better genomic annotations, as more composite features would be available and conflicting annotations could be highlighted for resolution.
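The kind of cross-source comparison described above can be sketched in Python. The example parses GFF3 text (nine tab-separated columns per feature line, with comment lines starting with '#') and records which sources report each feature. Matching features on identical coordinates is a deliberate simplification for illustration. Real comparisons would also need to handle overlapping or near-identical features.

```python
from collections import defaultdict

GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]


def parse_gff3(text):
    """Yield one dict per feature line, skipping comments and blank lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(parts)}")
        rec = dict(zip(GFF3_COLUMNS, parts))
        rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
        yield rec


def sources_by_feature(*gff3_texts):
    """Map (seqid, type, start, end, strand) to the set of sources reporting it."""
    evidence = defaultdict(set)
    for text in gff3_texts:
        for rec in parse_gff3(text):
            key = (rec["seqid"], rec["type"], rec["start"], rec["end"], rec["strand"])
            evidence[key].add(rec["source"])
    return dict(evidence)
```

Given files from two or more databases, features reported by every source are corroborated, and features reported by only one source are candidates for closer inspection.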

Exchanging data also relies on common standards for computational analyses, and supporting data downloads is not enough, regardless of format. Data resources should also be expected, within reason, to provide clear specifications for how the data are generated (for example, standard operating procedures (SOPs) that describe computations such as gene prediction and operon and ortholog identification). One example of this type of documentation is provided in AboutIMG, a web-based description of the Integrated Microbial Genomes (IMG) system²⁵.

In the future it should be far simpler to combine various genomic features, exact details of how they were generated and enough information about the provenance (origin) of the analyses to be able to transparently share data from different sources. Such interoperability, especially when provided by participating databases in a way that would enable automatic harvesting of the data (e.g., through web service technology), would multiply the individual value of these databases many times over and open up new opportunities to examine genome sequences in unprecedented detail.

Future directions

The effort required to achieve the degree of transparency advocated here is considerable but offers substantial and immediate benefits. We argue that the cost of achieving such standardization is nonetheless small compared with the sums spent generating the data. The capture of MIGS-compliant information will not only facilitate comparative genomic and metagenomic analyses but also enhance the available descriptions of downstream '-omic' experiments based on genomic data. It will also enhance the much larger 'halos' of 16S ribosomal RNA sequences that are now available for many sequenced genomes and metagenomes. For example, the genome sequence of the marine bacterium Silicibacter pomeroyi²⁶ is 'embedded' in a large number of environmental 16S rRNA sequences affiliated with the Roseobacter lineage, which is accompanied by a fairly extensive literature describing the distribution, ecology and other properties of this group²⁷.

Through its ongoing efforts, the GSC hopes to stimulate discussion of the MIGS specification and solicit further feedback from the community. It therefore has an open call for participation and is eager to solicit MIGS-compliant genome reports (including batch uploads) and collect relevant controlled vocabulary terms useful in the description of genomes and metagenomes. GCat identifiers have been implemented and are available for past or future projects, and MIGS-compliant genome reports are starting to become available online (e.g., refs. 28, 29, 30, 31). We expect a production version of MIGS (2.0) to be released by early 2008 with an appropriate set of terms formalized within OBI¹⁹ and other relevant Open Biomedical Ontology (http://obofoundry.org/) ontologies. We would hope that this milestone (release of MIGS 2.0) will be accompanied by recognition by journals and implementation by a variety of databases. Beyond this, the MIGS specification should still remain flexible enough to allow it to be revised in accordance with advances in technology and our biological knowledge. It should also be considered for use in combination with other checklists in the context of the Minimum Information about a Biomedical or Biological Investigation (MIBBI) Foundry (http://mibbi.sf.net), of which the GSC is a founding community¹⁹. The most up-to-date information about GSC activities is available at our website (http://gensc.sf.net).

Note: Supplementary information is available on the Nature Biotechnology website.

Disclaimer

Opinions, findings and conclusions or recommendations expressed in this paper are those of the authors, and do not necessarily reflect the views of the US National Science Foundation.

Box 2: Frequently asked questions about MIGS

Below we answer general questions about MIGS, its development and how to use it.

What is MIGS?

MIGS specifies a formal way to describe genomes and metagenomes in more detail than is captured at present in DDBJ, EMBL and GenBank documents.
The information in MIGS is intended to be used in comparative genomic analysis, provide a better understanding of the source of each genome and enable us to situate genomes and metagenomes in their geospatial and temporal contexts (when relevant) through the specification of geographic location and sampling date.

Do all genomes and metagenomes fall under the scope of MIGS?

Yes. MIGS has elements describing eukaryotic, bacterial and archaeal, plasmid, viral and organellar genomes as well as metagenomes. Some of the core elements overlap between types of records, and some are unique to one or more groups.

Who has driven the development of MIGS?

MIGS has been developed through a series of GSC workshops involving participants from DDBJ, EMBL, the US National Center for Biotechnology Information (NCBI), European Bioinformatics Institute (EBI), Joint Genome Institute (JGI), Sanger Institute, J. Craig Venter Institute (JCVI, formerly TIGR), Max Planck Institute, the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) project and a variety of other research institutions.

Who should complete a MIGS report?

Authors of genome and metagenome publications should submit a report after submitting project information to DDBJ, EMBL or GenBank.

Is MIGS very time-consuming to complete?

MIGS is a short specification compared with most other '-omic' checklists (see http://mibbi.sf.net) for three reasons:
MIGS is an extension of the data already captured by DDBJ, EMBL and GenBank to describe genomes and metagenomes and is designed to be complementary to these authoritative sources of metadata. The INSDC genome project database will contain essential administrative information, taxonomy identifiers (taxids) and a genome project identifier (PID).
MIGS was intentionally designed to be 'minimal' to encourage its adoption.
Genomic sequences, unlike transcriptomes, proteomes or metabolomes, are 'state independent' (a genome sequence is stable with respect to cellular state and environmental factors). In contrast, metagenomic experiments depend on the sampling strategy and the specific habitat of a given microbial community, requiring a further specification (MIMS) to define habitat parameters such as salinity, pH and temperature.

How can I get a unique identifier for my submission for use in my publication?

The Genomes Online Database (http://www.genomesonline.org) is the recognized authority for issuing GCat identifiers for eukaryotes, bacteria and archaea and metagenomes. The Genome Catalogue (GCat) will issue identifiers for other genomes.

Can I submit MIGS-compliant information online?

Yes. The GSC has developed a portal called the 'Genome Catalogue' that has been useful in prototyping the MIGS specification. MIGS-compliant information can be submitted through user-friendly web forms with drop-down menus for the selection of appropriate terms; batch uploading functions are being developed (http://gensc.sf.net).

Are sample reports available?

Yes, the Genome Catalogue contains a collection of MIGS-compliant reports. Examples are given in Supplementary Table 1.

How would I report the existence of MIGS-compliant data in my publication?

MIGS-compliant information could be reported as a supplementary table in a publication. Far more beneficial to the wider community would be to submit this information to the Genome Catalogue and report the GCat identifier and the URL of this database.

How can I get involved in the GSC and provide feedback for the development of MIGS?

The GSC has an open call for participation. Further information can be found at http://gensc.sf.net.

References

Overbeek, R. et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33, 5691–5702 (2005).
Liolios, K., Mavromatis, K., Tavernarakis, N. & Kyrpides, N.C. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 36 (database issue), D475–D479 (2008).
Martiny, J. & Field, D. Ecological perspectives on our complete genome collection. Ecology Letters 8, 1334–1345 (2005).
Rusch, D.B. et al. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. [online] 5, e77 (2007).
Edwards, R.A. et al. Using pyrosequencing to shed light on deep mine microbial ecology under extreme hydrogeologic conditions. BMC Genomics 7, 57 (2006).
Committee on Metagenomics: Challenges and Functional Applications, National Research Council. The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet (National Academies Press, Washington, DC, 2007).
Coenye, T. & Vandamme, P. Bacterial whole-genome sequences: minimal information and strain availability. Microbiology 150, 2017–2018 (2004).
Haft, D.H., Selengut, J.D., Brinkac, L.M., Zafar, N. & White, O. Genome properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics. Bioinformatics 21, 293–306 (2005).
Lombardot, T. et al. Megx.net—database resources for marine ecological genomics. Nucleic Acids Res. 34 (database issue), D390–D393 (2006).
Tautz, D., Arctander, P., Minelli, A., Thomas, E. & Vogler, A.P. A plea for DNA taxonomy. Trends Ecol. Evol. 18, 70–74 (2003).
Zhang, K. et al. Sequencing genomes from single cells by polymerase cloning. Nat. Biotechnol. 24, 680–686 (2006).
Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).
Shendure, J., Mitra, R.D., Varma, C. & Church, G.M. Advanced sequencing technologies: methods and goals. Nat. Rev. Genet. 5, 335–344 (2004).
Garrity, G.M. (ed.) Bergey's Manual of Systematic Bacteriology, 2nd edn., Vol. 1, (Springer, New York, 2001).
Field, D. et al. Meeting report: eGenomics: cataloguing our complete genome collection I. Comp. Funct. Genomics 6, 357–362 (2006).
Field, D. & Hughes, J. Cataloguing our current genome collection. Microbiology 151, 1016–1019 (2005).
Pace, N.R. Time for a change. Nature 441, 289 (2006).
Sansone, S.A. et al. A strategy capitalizing on synergies: the Reporting Structure for Biological Investigation (RSBI) working group. OMICS 10, 164–171 (2006).
Taylor, C. et al. Promoting coherent minimum reporting requirements for biological and biomedical investigations: the MIBBI project. Nat. Biotechnol. (in the press).
Ward, N., Eisen, J., Fraser, C. & Stackebrandt, E. Sequenced strains must be saved from extinction. Nature 414, 148 (2001).
Field, D. et al. Meeting report: eGenomics: cataloguing our complete genome collection III. Comp. Funct. Genomics 2007, 47304 (2007).
Morrison, N. et al. Concept of sample in OMICS technology. OMICS 10, 127–137 (2006).
Whetzel, P.L. et al. Development of FuGO: an ontology for functional genomics investigations. OMICS 10, 199–204 (2006).
Cochrane, G. et al. Evidence standards in experimental and inferential INSDC Third Party Annotation data. OMICS 10, 105–113 (2006).
Markowitz, V.M. et al. IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res. 36 (database issue), D534–D538 (2008).
Moran, M.A. et al. Genome sequence of Silicibacter pomeroyi reveals adaptations to the marine environment. Nature 432, 910–913 (2004).
Buchan, A., Gonzalez, J.M. & Moran, M.A. Overview of the marine roseobacter lineage. Appl. Environ. Microbiol. 71, 5665–5677 (2005).
Angly, F.E. et al. The marine viromes of four oceanic regions. PLoS Biol. 4, e368 (2006).
Bauer, M. et al. Whole genome analysis of the marine Bacteroidetes 'Gramella forsetii' reveals adaptations to degradation of polymeric organic matter. Environ. Microbiol. 8, 2201–2213 (2006).
Glockner, F.O. et al. Complete genome sequence of the marine planctomycete Pirellula sp. strain 1. Proc. Natl. Acad. Sci. USA 100, 8298–8303 (2003).
Rabus, R. et al. The genome of Desulfotalea psychrophila, a sulfate-reducing bacterium from permanently cold Arctic sediments. Environ. Microbiol. 6, 887–902 (2004).
Raes, J., Foerstner, K.U. & Bork, P. Get the most out of your metagenome: computational analysis of environmental sequence data. Curr. Opin. Microbiol. 10, 490–498 (2007).


Acknowledgements

We would like to thank the UK National Institute of Environmental eScience (NIEeS) and the European Bioinformatics Institute (EBI) for hosting GSC workshops and the UK Natural Environment Research Council for providing funds for coordination (NE/D01252X/1) and infrastructure building activities (NE/E007325/1).

Author information

Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK.

Authors and Affiliations

Natural Environment Research Council Centre for Ecology and Hydrology, Oxford, OX1 3SR, UK
Dawn Field, Tanya Gray, Paul Swift, Adrian Tett, Sarah Turner & Gareth Wilson
Michigan State University, East Lansing, 48824, Michigan, USA
George Garrity & James Cole
School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
Norman Morrison, David Hancock & Robert Stevens
NERC Environmental Bioinformatics Centre, Oxford Centre for Ecology and Hydrology, Oxford, OX1 3SR, UK
Norman Morrison & David Hancock
J. Craig Venter Institute (JCVI), 9704 Medical Center Drive, Rockville, 20850, Maryland, USA
Jeremy Selengut, Samuel V Angiuoli, Michael Ashburner, Nelson Axelrod, Dan Haft, Leonid Kagan, Saul Kravitz, Kelvin Li, Barbara Methe & Karen Nelson
European Molecular Biology Laboratory (EMBL) Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
Peter Sterk, Guy Cochrane, Nadeem Faruque, Henning Hermjakob, Susanna-Assunta Sansone, Chris Taylor & Bob Vaughan
National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, 20894, Maryland, USA
Tatiana Tatusova & Ilene Mizrachi
Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, Cambridge, UK
Nicholas Thomson, Christiane Hertz-Fowler & Julian Parkhill
Plymouth Marine Laboratory, Prospect Place, Plymouth, PL1 3DH, UK
Michael J Allen, Jack Gilbert & Ian Joint
Institute for Genome Sciences and Department of Epidemiology and Preventive Medicine, University of Maryland School of Medicine, 20 Penn Street, Baltimore, 21201, Maryland, USA
Samuel V Angiuoli, Michael Ashburner & Owen White
Department of Biology, University of York Box 373, York, YO10 5YW, UK
Sandra Baldauf
Department of Earth Sciences, National Institute of Environmental eScience, University of Cambridge, Downing Street, Cambridge, CB2 3EQ, UK
Stuart Ballard
US Department of Energy (DOE) Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, 94598, California, USA
Jeffrey Boore
Department of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281 S9, Ghent, B-9000, Belgium
Peter Dawyndt
Laboratory of Microbiology, Ghent University, K.L. Ledeganckstraat 35, Ghent, B-9000, Belgium
Paul De Vos
BCCM/LMG Bacteria Collection, Ghent University, K.L. Ledeganckstraat 35, Ghent, B-9000, Belgium
Paul De Vos
Penn State University, 208 Mueller Laboratory, University Park, Pennsylvania, 16802, USA
Claude dePamphilis
Department of Computer Science, 5500 Campanile Drive, San Diego State University, San Diego, 92182, California, USA
Robert Edwards
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, 60439, Illinois, USA
Robert Edwards & Natalia Maltsev
SymBio Corporation, 1455 Adams Drive, Menlo Park, California, 94025, USA
Robert Feldman
California Institute for Telecommunications and Information Technology (Calit2), a University of California San Diego (UCSD)/University of California Irvine partnership, 9500 Gilman Drive, La Jolla, 92093, California, USA
Paul Gilna
Microbial Genomics Group, Max Planck Institute for Marine Microbiology and Jacobs University Bremen, Bremen, 28359, Germany
Frank Oliver Glöckner & Renzo Kottmann
Department of Ecology and Evolutionary Biology and University of Colorado Natural History Museum, 218 UCB, University of Colorado, Boulder, 80309, Colorado, USA
Philip Goldstein & Robert Guralnick
Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Building 400-404, Walnut Creek, 94598, California, USA
Phil Hugenholtz
The National Science Foundation, 4201 Wilson Boulevard, Arlington, 22230, Virginia, USA
Matthew Kane & Lita Proctor
School of Computing, Napier University, Merchiston Campus, 10 Colington Road Edinburgh, Scotland, EH10 5DT, UK
Jessie Kennedy
Department of Terrestrial Microbial Ecology, Netherlands Institute of Ecology, Centre for Terrestrial Ecology, PO Box 40, Heteren, 6666, ZG, Netherlands
George Kowalchuk
BIATECH Institute, 19310 North Creek Parkway South, Suite 115, Bothell, 98011, Washington, USA
Eugene Kolker
Division of Biomedical and Health Informatics, Department of Medical Education and Biomedical Information, University of Washington, Seattle, 91895, Washington, USA
Eugene Kolker
Seattle Children's Hospital Research Institute, 1900 9th Avenue, Seattle, 98101, Washington, USA
Eugene Kolker
Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Building 400-404, Walnut Creek, 94598, California, USA
Nikos Kyrpides
Department of Plant Biology, University of Georgia, Athens, 30602-7271, Georgia, USA
Jim Leebens-Mack
Department of Molecular and Cell Biology, University of California, 539 Life Sciences Addition, Berkeley, 94720-3200, California, USA
Suzanna E Lewis
School of Computing Science, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
Allyson L Lister, Phillip Lord & Anil Wipat
Centre for Integrative Systems Biology of Ageing and Nutrition (CISBAN), Henry Wellcome Laboratory for Biogerontology Research, Newcastle University, Newcastle General Hospital, Newcastle upon Tyne, NE4 6BE, UK
Allyson L Lister & Anil Wipat
Computational Research Division, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, 94720, California, USA
Victor Markowitz
Department of Ecology and Evolutionary Biology, University of California, 455 Steinhaus Hall, Irvine, 92697, California, USA
Jennifer Martiny
Weatherall Institute of Molecular Medicine and University of Oxford Department of Paediatrics, Molecular Infectious Diseases Group, John Radcliffe Hospital, Headington, OX3 9DU, Oxford, UK
Richard Moxon
Department of Biology, Howard University, 415 College Street, NW, DC, 20059, Washington, USA
Karen Nelson
Department of Biology, LTER Network Office, University of New Mexico, Albuquerque, 87171, New Mexico, USA
Ingio San Gil
SIMBIOS Centre, University of Abertay Dundee, Dundee, DD1 1HG, UK
Andrew Spiers
Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Mishima, 411-8540, Shizuoka, Japan
Yoshio Tateno
Center for Biological Sequence Analysis, The Technical University of Denmark, Lyngby, DK-2800 Kgs., Lyngby, Denmark
David Ussery
Department of Molecular Biology, University of Wyoming, Laramie, 82071, Wyoming, USA
Naomi Ward
Center for Bioinformatics and Department of Genetics, University of Pennsylvania School of Medicine, 14th Floor Blockley Hall, 423 Guardian Drive, Philadelphia, 19104, Pennsylvania, USA
Trish Whetzel

Corresponding author

Correspondence to Dawn Field.

Supplementary information

Supplementary Text and Figures

Supplementary Table 1 (DOC 191 kb)

About this article

Cite this article

Field, D., Garrity, G., Gray, T. et al. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 26, 541–547 (2008). https://doi.org/10.1038/nbt1360

Published: 08 May 2008
Issue Date: May 2008
DOI: https://doi.org/10.1038/nbt1360
