A large-scale study has been assessing microbial diversity by analysing DNA sequences from samples submitted by scientists around the globe. The initial results are now being used to create an open-access resource. See Article p.457
A simple, though dauntingly ambitious, idea is the driving force behind the Earth Microbiome Project: to sample microbial genetic diversity across Earth. On page 457, Thompson et al.1 report the results of this experimental tour de force.
The project began life at a meeting in Snowbird, Utah, in 2010, at which a group of scientists from a wide range of disciplines discussed the goals, challenges and practicalities of such an enterprise2. Seven years on, the authors report the microbial compositional profiles of a whopping 27,751 samples from 97 independent studies, providing insights into the diversity of microorganisms from the bacterial and archaeal domains in a wide range of geographic and environmental ecosystems, both terrestrial and aquatic. From these samples, Thompson and colleagues generated 2.2 billion DNA sequence reads of a highly variable region of the 16S rRNA gene, a universally conserved gene that encodes a component of the ribosome (the cell's protein-synthesis machinery).
The remarkable nature of this study lies not only in its scale and in the breadth of the environmental samples analysed (Fig. 1), but also in its methodology. The project involved a massive, global crowdsourcing effort in which scientists raided their collection freezers for samples to share with the project.
The approach was straightforward. A call was made for scientists to contribute well-preserved environmental samples collected during specific research projects, and the Earth Microbiome Project offered to sequence the DNA of the 16S rRNA gene in the microbial samples and to make the data available as open access.
This project is a prime example of a trend in the adoption of scientific approaches involving widespread engagement, in which the ease of electronic communication and the power of social media are harnessed to generate useful resources. In the same spirit, in the Polymath Project, mathematicians collaborate to tackle challenging mathematical problems.
Such approaches to generating crowdsourced experimental data usually work by first getting the project under way and obtaining funding later as the project gathers momentum, perhaps by crowdfunding. Examples of crowdsourced projects include those analysing bacteria in the human gut, such as the Flemish Gut Flora Project3, or the Personalized Nutrition Project4. Such studies contrast with conventional research collaborations that begin once a grant is obtained from a funding agency.
Challenges inevitably arise in the type of work conducted by Thompson et al., particularly from having to handle samples from many collection sites. A common frustration in microbial research is that differences in sampling procedures, storage and transport conditions, and DNA extraction and amplification protocols often result in a 'lab-of-origin' effect that makes it difficult to compare data generated by different research groups. To address this, the Earth Microbiome Project developed a range of protocols5 and standards for sample collection, DNA extraction, transport and the formatting of the associated auxiliary data (such as temperature or location), as well as data-analysis procedures. These protocols were developed for the project itself, but have also been rapidly adopted by the wider research community, and more than 2,000 papers that use them have already been published6. By having a single protocol for all samples, and running all analyses in one laboratory, Thompson and colleagues have tried to remove as many potential technical confounding factors as possible.
The results seem to confirm that they have succeeded, revealing that sample microbial profiles cluster by environment: those from a specific type of environment are more similar to each other than to those from other types of environment, irrespective of the research group that collected the sample. The single-protocol approach also has a drawback, however, because one DNA-extraction protocol cannot be expected to perform equally well across the wide chemical and biological variability of the samples collected in this type of broad survey, and might be less effective than a targeted approach in which extraction protocols are optimized for the environment being sampled. Thompson and colleagues have favoured generalizability over sensitivity, a choice that can surely be defended in these circumstances.
Another limitation of the study is its lack of hypothesis-driven experimental design, because it deliberately positions itself as an exploratory data analysis across different environments and sample types. This places certain constraints on the inferences that can be made, because the environmental data collected for the samples were not always measured in the same way in different environments.
The debate about the relative merits of data-driven and hypothesis-driven experimental approaches is not new, and there are examples of each of these approaches providing scientific insights. This study is an excellent example of the former, even if concessions had to be made regarding the selection of variables that could be used for analyses across all the environments.
Thompson and colleagues made several findings. For example, they investigated whether existing theories about the relationship between species richness (as monitored by the diversity of 16S rRNA sequences) and temperature and pH across environments were consistent with their data. One such model proposes a steady logarithmic rise of microbial richness with increasing temperature7,8. Surprisingly, in contrast to this theory, the authors found that microbial biodiversity peaks within a relatively narrow range of pH and temperature and then drops again.
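To make the idea of a richness peak concrete, the following minimal Python sketch treats richness as the number of distinct 16S rRNA sequences in a sample and averages it within bins of an environmental variable such as pH. It is an illustration only, not the authors' analysis pipeline: the function names, the bin width and the toy samples are all hypothetical, and the sketch omits steps that a real analysis would need, such as normalizing for sequencing depth.

```python
from collections import defaultdict


def richness(reads):
    """Richness of one sample: the number of distinct sequences observed."""
    return len(set(reads))


def mean_richness_by_bin(samples, bin_width):
    """Average richness within bins of an environmental variable.

    samples: list of (value, reads) pairs, where value is, for example,
    the pH of the sample and reads is a list of sequence labels.
    Returns a dict mapping each bin's lower edge to its mean richness.
    """
    bins = defaultdict(list)
    for value, reads in samples:
        lower_edge = (value // bin_width) * bin_width
        bins[lower_edge].append(richness(reads))
    return {edge: sum(r) / len(r) for edge, r in sorted(bins.items())}


# Hypothetical toy data: (pH, sequence labels). Not taken from the study.
toy_samples = [
    (4.2, ["s1", "s1", "s2"]),
    (6.8, ["s1", "s2", "s3", "s4"]),
    (7.1, ["s1", "s2", "s3", "s4", "s5"]),
    (9.5, ["s2", "s2", "s3"]),
]

profile = mean_richness_by_bin(toy_samples, bin_width=2)
peak_bin = max(profile, key=profile.get)  # lower edge of the richest bin
```

In this toy data set, richness is highest in the middle bin and lower on either side, which is the qualitative shape the authors describe; a steady rise would instead place the richest bin at one end of the range.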
The authors also observed an unexpectedly high amount of 'nestedness' among samples from different environments: the microorganisms found in low-biodiversity samples were always subsets of those found in other, high-biodiversity samples, irrespective of the sample origin. Notably, this pattern of nestedness was mostly observed for microbial analyses above the level of genus. When the data were analysed at the level of species, or when different strains of the same species were analysed, a strong decrease in nestedness was observed.
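The notion of nestedness can be illustrated with a small Python sketch. For every pair of samples with unequal richness, it computes the fraction of the poorer sample's taxa that also occur in the richer one, and then averages over pairs. This is a deliberately simplified overlap score chosen for illustration, not the metric used by Thompson and colleagues; the function name, sample names and taxon labels are all hypothetical.

```python
from itertools import combinations


def nestedness(samples):
    """Mean fraction of a poorer sample's taxa also found in a richer sample.

    samples: dict mapping a sample name to the set of taxa it contains.
    Returns a value between 0 and 1; a value of 1.0 means that every poorer
    sample is a subset of every richer one. Pairs of equal richness and
    empty samples are skipped.
    """
    scores = []
    for a, b in combinations(samples.values(), 2):
        if len(a) == len(b):
            continue
        poor, rich = (a, b) if len(a) < len(b) else (b, a)
        if not poor:
            continue
        scores.append(len(poor & rich) / len(poor))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data. At a coarse taxonomic level, the poorer samples
# are perfect subsets of the richer ones.
coarse = {
    "soil": {"A", "B", "C", "D"},
    "sediment": {"A", "B", "C"},
    "saline": {"A", "B"},
}

# The same samples resolved at a finer level: each coarse taxon splits into
# variants (A1, A2, ...), and the samples now share far fewer of them.
fine = {
    "soil": {"A1", "B1", "C1", "D1"},
    "sediment": {"A1", "B2", "C2"},
    "saline": {"A2", "B1"},
}

coarse_score = nestedness(coarse)
fine_score = nestedness(fine)
```

With these toy sets, the score is 1.0 at the coarse level and much lower at the fine level, mirroring the reported drop in nestedness when moving from higher taxonomic ranks to species and strains.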
The value of the Earth Microbiome Project will extend far beyond what is reported in the present paper. The project provides a resource that will keep microbial ecologists and evolutionary biologists busy for years. More than 60 papers have already been published using subsets of the data that had been released previously6. By implementing and fiercely pursuing this open-access model, Thompson and colleagues emphasize the value of collaboration and sharing over competition, which unfortunately remains all too common in the scientific community.