A catalogue of 136 microbial draft genomes from Red Sea metagenomes

Haroon, Mohamed F.; Thompson, Luke R.; Parks, Donovan H.; Hugenholtz, Philip; Stingl, Ulrich

doi:10.1038/sdata.2016.50

Data Descriptor
Open access
Published: 05 July 2016

Mohamed F. Haroon¹, Luke R. Thompson^1,2, Donovan H. Parks³, Philip Hugenholtz^3,4 & Ulrich Stingl¹

Scientific Data volume 3, Article number: 160050 (2016)

Abstract

Earth is expected to continue warming, and the Red Sea is a model environment for understanding the effects of global warming on ocean microbiomes because of its unusually high temperature, salinity and solar irradiance. However, most microbial diversity analyses of the Red Sea have been limited to cultured representatives and single marker gene analyses, hence neglecting the substantial uncultured majority. Here, we report 136 microbial genomes (completion minus contamination ≥50%) assembled from 45 metagenomes from eight stations spanning the Red Sea and taken from multiple depths between 10 and 500 m. Phylogenomic analysis showed that most of the retrieved genomes belong to seven different phyla of known marine microbes, but more than half represent currently uncultured species. The open-access data presented here constitute the largest number of representative Red Sea microbial genomes reported in a single study and will help facilitate future studies of the physiology of these microorganisms and of how they have adapted to the relatively harsh conditions of the Red Sea.

Design Type(s)	observation design • organism identification objective
Measurement Type(s)	DNA sequence data
Technology Type(s)	metagenomics analysis
Factor Type(s)
Sample Characteristic(s)	Red Sea • Gulf of Aden • sea water

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

The Red Sea is an ideal marine environment to study microbial adaptation to physical conditions atypical of global oceans: high temperature, high salinity, and high irradiance. In late summer 2011, we undertook the King Abdullah University of Science and Technology (KAUST) Red Sea Expedition (KRSE2011) in the eastern Red Sea in order to map its diversity along environmental gradients that occur with changes in latitude, longitude, and depth¹. This time of year is not only when temperatures and evaporation (and hence salinity) are highest, but also when a foreign water mass called the Gulf of Aden Intermediate Water (GAIW) intrudes into the Red Sea^1,2 (Fig. 1). The GAIW brings nutrient-rich water to the Red Sea, providing nitrogen, phosphorus, and other elements to this otherwise oligotrophic sea, and is likely to introduce important microbial diversity.

**Figure 1: Experimental workflow for this study.**

Insights into the taxonomic, evolutionary, and functional diversity of the Red Sea have largely been based on studies of pure cultures^3–5 and single marker genes such as the 16S rRNA gene^6,7 or the internal transcribed spacer⁸. Recently, investigations of microbial ecology have shifted towards whole-genome-based, culture-independent methods, notably single-cell genomics and metagenomics^9,10. Single-cell genomics recovers complete and partial single-cell genomes from complex environments, albeit with the need for specialised equipment, high cost and relatively low throughput^11–13. Metagenomics harnesses recent advances in sequencing technology and bioinformatics to recover genomes of individual populations or of groups of closely related organisms^14–16. Application of these methods has resulted in the recovery of numerous genomes of uncultivated microorganisms that have provided surprising insights into the diversity and function of microbial communities^10,14,17–19.

During the KRSE2011, eight stations were sampled along a cruise track from south to north, capturing gradients in temperature, salinity, oxygen, and nutrients, including the unique GAIW water mass (Fig. 1 and Table 1 (available online only)). At each station, samples were collected from the surface to mesopelagic depths (10, 25, 50, 100, 200, and 500 m), except for stations 12 and 34, which had depths shallower than 500 m (Fig. 1 and Table 1 (available online only)), in order to capture a greater variation in environmental parameters and microbial diversity. Here, we successfully reconstructed 136 genomes from 45 individually assembled metagenomes (Figs 1 and 2, Tables 1 and 2 (available online only), Data Citation 1) by differential read coverage and tetranucleotide frequency methods. Of these, 43 were ‘near-complete’ with an estimated completion minus contamination of ≥90%, while the other 93 draft genomes had completion minus contamination of ≥50% (Table 2 (available online only)). To our knowledge, this is the largest number of microbial genomes from the Red Sea to be reported in a single study.
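The two quality tiers described above follow from a single score, completion minus contamination. The following is a minimal sketch of that rule, not the authors' code; the `Bin` class, the bin names and the example percentages are hypothetical, and only the ≥90% and ≥50% thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """A genome bin with CheckM-style estimates (hypothetical container)."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def quality_score(b: Bin) -> float:
    """Completion minus contamination, the score used in the text."""
    return b.completeness - b.contamination


def classify(b: Bin) -> str:
    """Assign the tier named in the text; 'rejected' is an illustrative label."""
    score = quality_score(b)
    if score >= 90:
        return "near-complete"
    if score >= 50:
        return "draft"
    return "rejected"


# Hypothetical example values, not taken from Table 2.
bins = [
    Bin("bin_A", 97.5, 1.2),
    Bin("bin_B", 71.0, 4.0),
    Bin("bin_C", 55.0, 12.0),
]
labels = {b.name: classify(b) for b in bins}
print(labels)
```

Under this rule a bin with high completeness can still be excluded if its contamination is large enough to pull the score below 50%.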

Table 1 Characteristics of the 45 Red Sea metagenomic samples

**Figure 2: Phylogenetic trees for the archaeal (green lines; top left) and bacterial (blue lines; bottom right) domains based on 122 and 120 single-copy marker genes, respectively.**

Table 2 Characteristics of the 136 genomes reported in this study

Phylogenomic analysis based on sets of single-copy marker genes universal to either the bacterial or archaeal domain showed that the 136 genomes encompassed seven phyla across these domains: Thaumarchaeota, Euryarchaeota, Actinobacteria, Cyanobacteria, Bdellovibrionaeota, Proteobacteria, and Marinimicrobia (Fig. 2 and Table 2 (available online only)). As expected, most of the recovered genomes were affiliated with known marine microorganisms such as phototrophic Prochlorococcus^20,21 and Synechococcus^22,23; representatives of clades first discovered in the Sargasso Sea (SAR86, SAR116, SAR324 and SAR406)^24–26; common marine bacteria in tropical biomes such as Alteromonas macleodii²⁷; an ammonia-oxidizing thaumarchaeon from the genus Nitrosopelagicus²⁸; euryarchaeotal Marine Group II organisms reported to be abundant in surface waters²⁹; and members of the Alpha- and Gammaproteobacteria such as Aeromicrobium, Erythrobacter, Maritimibacter, Idiomarina, Marinobacter, Candidatus Thioglobus (SUP05 cluster) and several unclassified Gammaproteobacteria, consistent with the high relative abundance of these two groups in the recent Tara Oceans survey³⁰. Additionally, actinobacterial Acidiimicrobia and Nocardioides genomes, thought to be responsible for secondary metabolite production in marine ecosystems³¹, were recovered from the metagenomes. An important strength of this dataset is the recovery of multiple, closely related genomes from different stations or depths in the Red Sea (Data Citation 2). When these genomes are combined with physicochemical data¹, future studies can investigate how genome plasticity among these organisms confers fitness under varying conditions.

To allow easy access to the genomes, all 136 genomes were functionally annotated and deposited in the National Center for Biotechnology Information (NCBI) and Integrated Microbial Genomes (IMG) databases³². The wealth of metagenomic and genomic data described here greatly expands the repertoire of microbial genomic information from the Red Sea, which might help to better understand the effects of global warming on ocean microbiomes. These datasets will also strengthen studies of the drivers of marine nutrient cycling, aid bioprospecting for novel thermophilic and halophilic enzymes, and allow for a better understanding of microbial adaptation strategies against high temperature, salinity and solar irradiance.

Methods

Metagenomic sequencing and assembly

Seawater samples were collected from eight stations and from different depths (10, 25, 50, 100, 200, and 500 m; locations are shown in Fig. 1) during summer as part of KRSE2011 (ref. 1). Genomic DNA was extracted from the 0.1–1.2 μm size fraction using an established phenol-chloroform extraction protocol^1,33. Paired-end libraries (2×100 bp) were prepared using the Nextera DNA Library Prep Kit (Illumina) and sequenced on a HiSeq 2000 (Illumina). Reads were quality checked and trimmed using PRINSEQ v0.20.4 (ref. 34), generating read lengths of ~93 bp and a total of ~10 million reads per sample, with median insert sizes ranging from 183 to 366 bp¹ (Data Citation 1). Trimmed metagenome reads were individually assembled (Table 1 (available online only)) using IDBA-UD v1.1.1 (ref. 35) with the ‘--pre-correction’ option. To obtain the coverage profile of contigs from each metagenomic assembly, the trimmed reads were mapped back to the contigs using BWA v0.7.12 (ref. 36) with the bwa-mem algorithm.
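To illustrate what a per-contig coverage profile is, the sketch below computes mean coverage as total aligned bases divided by contig length. It is a simplified illustration, not the procedure used in this study: the function name, the input layout (contig name plus aligned base count per read) and the example numbers are assumptions, and in practice these values would be derived from the read alignments produced by the mapper.

```python
from collections import defaultdict


def mean_coverage(contig_lengths, alignments):
    """Return mean read coverage per contig.

    contig_lengths: dict mapping contig name -> length in bp
    alignments: iterable of (contig_name, aligned_bases) pairs, one per read
    """
    aligned = defaultdict(int)
    for contig, n_bases in alignments:
        aligned[contig] += n_bases
    # Mean coverage = total aligned bases / contig length.
    return {name: aligned[name] / length
            for name, length in contig_lengths.items()}


# Hypothetical example: 93 bp reads mapped to two contigs.
contig_lengths = {"contig_1": 1000, "contig_2": 500}
alignments = [("contig_1", 93)] * 20 + [("contig_2", 93)] * 5
coverage = mean_coverage(contig_lengths, alignments)
print(coverage)
```

Contigs that originate from the same population are expected to show similar coverage within a sample, which is why coverage is a useful signal for binning.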

Genome binning, refinement, and annotation

For each metagenome, genome bins were recovered based on tetranucleotide frequencies and read coverage using MetaBAT v0.26.1 (ref. 37) with default parameters. The completeness and contamination of the bins were assessed using CheckM v1.0.3 (ref. 38) with the lineage-specific workflow (Table 2 (available online only)). Bins were further refined using the CheckM ‘merge’ and ‘outliers’ commands: ‘merge’ combines bins with complementary sets of marker genes to improve completeness, while ‘outliers’ removes contigs that appear to be outliers relative to reference GC and tetranucleotide distributions in order to reduce contamination³⁸. The FinishM v0.0.7 (https://github.com/wwood/finishm) ‘roundup’ workflow, which comprises the ‘wander’ and ‘gapfill’ modes, was used to scaffold contigs together and fill gaps within individual bins. The ‘wander’ mode uses a de Bruijn graph (k-mer length of 51 bp and coverage cutoff of 5) to determine which contig ends are connected, while the ‘gapfill’ mode aligns the reads to regions of ambiguous nucleotides and replaces them with the appropriate nucleotides. Genome bins that passed the quality filter of completion minus contamination ≥50% were submitted to IMG/ER³² for gene calling and functional annotation.
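As an illustration of the tetranucleotide signal used in binning, the sketch below computes a 256-element frequency vector for a contig. It is a simplified example and not MetaBAT's implementation: the function name is hypothetical, windows containing ambiguous bases are skipped, and reverse-complement k-mers are not merged, which dedicated binning tools may handle differently.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the relative frequency of each 4-mer in seq.

    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


# Hypothetical toy sequence; real contigs are thousands of bases long.
freqs = tetranucleotide_frequencies("ACGTACGTNN")
print(freqs[KMERS.index("ACGT")])
```

Contigs from the same genome tend to have similar frequency vectors, so the vectors can be compared alongside coverage when grouping contigs into bins.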

Genome tree construction

The archaeal and bacterial genome trees (Fig. 2) were inferred from the concatenation of 122 and 120 proteins, respectively, identified as being present in ≥90% of the genomes in their respective domains and, when present, single-copy in ≥95% of genomes (Supplementary Tables 1 and 2). These marker genes were aligned using HMMER v3.1b1 (ref. 39), and the trees were inferred from the concatenated alignments with FastTree v2.1.7 (ref. 40) under the WAG+GAMMA model (Data Citation 2). Support values were determined using 100 non-parametric bootstrap replicates⁴¹. The archaeal tree was rooted with the DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanohaloarchaeota, and Nanoarchaeota) superphylum in concordance with a recent large-scale phylogenomic study⁹, while the bacterial tree was ‘arbitrarily’ rooted with the phylum Chloroflexi⁴² but should be treated as unrooted. The trees were visualized in ARB⁴³, annotated with iTOL⁴⁴ and edited in Illustrator CC 2014 (Adobe).
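The marker selection criteria above can be sketched as a filter over a table of gene copy numbers. This is an illustrative reading of the stated thresholds, not the authors' script: the function name and input layout are hypothetical, and the single-copy fraction is computed here only over the genomes in which the gene is present, which is one interpretation of "when present".

```python
def select_markers(copy_numbers, n_genomes,
                   min_presence=0.90, min_single_copy=0.95):
    """Select genes that are widespread and predominantly single-copy.

    copy_numbers: dict mapping gene -> {genome: copy count}; genomes lacking
                  the gene may be omitted or given a count of 0
    n_genomes: total number of reference genomes in the domain
    """
    selected = []
    for gene, per_genome in copy_numbers.items():
        present = [c for c in per_genome.values() if c > 0]
        if not present:
            continue
        presence = len(present) / n_genomes
        single_copy = sum(1 for c in present if c == 1) / len(present)
        if presence >= min_presence and single_copy >= min_single_copy:
            selected.append(gene)
    return sorted(selected)


# Hypothetical example with 20 reference genomes.
genomes = [f"g{i}" for i in range(20)]
copy_numbers = {
    "gene_a": {g: 1 for g in genomes},                # present everywhere, single-copy
    "gene_b": {g: 1 for g in genomes[:17]},           # present in only 85%
    "gene_c": {g: (2 if i < 2 else 1)
               for i, g in enumerate(genomes)},       # duplicated in 10%
    "gene_d": {g: 1 for g in genomes[:19]},           # present in 95%, single-copy
}
markers = select_markers(copy_numbers, n_genomes=len(genomes))
print(markers)
```

Restricting the concatenated alignment to such genes limits missing data and reduces the chance that paralogous copies distort the inferred tree.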

Code availability

All versions of third-party software and scripts used in this study are described and referenced accordingly in the Methods sub-sections for ease of access and reproducibility.

Data Records

The raw Illumina sequencing paired-end reads (Table 1 (available online only)), 45 assembled metagenome sequences (Table 1 (available online only)) and 136 assembled genome sequences (Table 2 (available online only)), generated from the KAUST Red Sea Expedition 2011, are available from NCBI databases (Data Citation 1). The genome trees and associated fasta amino acid alignment files are available from Figshare (Data Citation 2).

Technical Validation

To validate the completeness and contamination of the genomes, we assessed the number of marker genes present in all bacterial and archaeal genomes using CheckM³⁸. The genomes were also manually cleaned of vector contamination by comparing them against the UniVec core database (ftp://ftp.ncbi.nlm.nih.gov/pub/UniVec/).

Usage Notes

The annotated genome assemblies can be downloaded and accessed via the Integrated Microbial Genomes (IMG) system (https://img.jgi.doe.gov/cgi-bin/m/main.cgi). The IMG genome IDs are provided in Table 2 (available online only).

Additional Information

How to cite this article: Haroon, M. F. et al. A catalogue of 136 microbial draft genomes from Red Sea metagenomes. Sci. Data 3:160050 doi: 10.1038/sdata.2016.50 (2016).

References

Thompson, L. R. et al. Metagenomic covariation along densely sampled environmental gradients in the Red Sea. bioRxiv doi:10.1101/055012 (2016).
Churchill, J. H., Bower, A. S., McCorkle, D. C. & Abualnaja, Y. The transport of nutrient-rich Indian Ocean water through the Red Sea and into coastal reef systems. Journal of Marine Research 72, 165–181 (2014).
Sagar, S. et al. Cytotoxic and apoptotic evaluations of marine bacteria isolated from brine-seawater interface of the Red Sea. BMC Complementary and Alternative Medicine 13, 1–8 (2013).
Jimenez-Infante, F. et al. Genomic differentiation among two strains of the PS1 clade isolated from geographically separated marine habitats. FEMS microbiology ecology 89, 181–197 (2014).
Zhang, G., Haroon, M. F., Zhang, R., Hikmawan, T. & Stingl, U. Draft Genome Sequence of Pseudoalteromonas sp. Strain XI10 Isolated from the Brine-Seawater Interface of Erba Deep in the Red Sea. Genome Announcements 4, e00109–16 (2016).
Fuller, N. J. et al. Clade-specific 16S ribosomal DNA oligonucleotides reveal the predominance of a single marine Synechococcus clade throughout a stratified water column in the Red Sea. Applied and Environmental Microbiology 69, 2430–2443 (2003).
Qian, P.-Y. et al. Vertical stratification of microbial communities in the Red Sea revealed by 16S rDNA pyrosequencing. The ISME journal 5, 507–518 (2011).
Ngugi, D. K. & Stingl, U. Combined analyses of the ITS loci and the corresponding 16S rRNA genes reveal high micro-and macrodiversity of SAR11 populations in the Red Sea. PLoS ONE 7, e50274 (2012).
Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).
Grötzinger, S. W. et al. Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA). Frontiers in Microbiology 5, 134 (2014).
Clingenpeel, S., Clum, A., Schwientek, P., Rinke, C. & Woyke, T. Reconstructing each cell’s genome within complex microbial communities - dream or reality? Frontiers in Microbiology 5 (2015).
Gawad, C., Koh, W. & Quake, S. R. Single-cell genome sequencing: current state of the science. Nat. Rev. Genet. 17, 175–188 (2016).
Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nature Biotechnology 31, 533–538 (2013).
Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotech. 32, 822–828 (2014).
Sangwan, N., Xia, F. & Gilbert, J. A. Recovering complete and draft population genomes from metagenome datasets. Microbiome 4, 1–11 (2016).
Haroon, M. F. et al. Anaerobic oxidation of methane coupled to nitrate reduction in a novel archaeal lineage. Nature 500, 567–570 (2013).
Soo, R. M. et al. An expanded genomic representation of the phylum Cyanobacteria. Genome biology and evolution 6, 1031–1045 (2014).
Evans, P. N. et al. Methane metabolism in the archaeal phylum Bathyarchaeota revealed by genome-centric metagenomics. Science 350, 434–438 (2015).
Moore, L. R., Rocap, G. & Chisholm, S. W. Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature 393, 464–467 (1998).
Partensky, F., Hess, W. & Vaulot, D. Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiology and molecular biology reviews 63, 106–127 (1999).
Moore, L. R., Goericke, R. & Chisholm, S. W. Comparative physiology of Synechococcus and Prochlorococcus: influence of light and temperature on growth, pigments, fluorescence and absorptive properties. Marine ecology progress series. Oldendorf 116, 259–275 (1995).
Palenik, B. et al. The genome of a motile marine Synechococcus. Nature 424, 1037–1042 (2003).
Giovannoni, S. J., Britschgi, T. B., Moyer, C. L. & Field, K. G. Genetic diversity in Sargasso Sea bacterioplankton. Nature 345, 60–63 (1990).
Britschgi, T. B. & Giovannoni, S. J. Phylogenetic analysis of a natural marine bacterioplankton population by rRNA gene cloning and sequencing. Applied and Environmental Microbiology 57, 1707–1713 (1991).
Haroon, M. F., Thompson, L. R. & Stingl, U. Draft genome sequence of uncultured SAR324 bacterium lautmerah10, binned from a Red Sea metagenome. Genome Announcements 4, e01711–e01715 (2016).
Ivars-Martinez, E. et al. Comparative genomics of two ecotypes of the marine planktonic copiotroph Alteromonas macleodii suggests alternative lifestyles associated with different kinds of particulate organic matter. The ISME journal 2, 1194–1212 (2008).
Santoro, A. E. et al. Genomic and proteomic characterization of ‘Candidatus Nitrosopelagicus brevis’: An ammonia-oxidizing archaeon from the open ocean. Proceedings of the National Academy of Sciences 112, 1173–1178 (2015).
DeLong, E. F. Archaea in coastal marine environments. Proceedings of the National Academy of Sciences 89, 5685–5689 (1992).
Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).
Bull, A. T. & Stach, J. E. Marine actinobacteria: new opportunities for natural product search and discovery. Trends in microbiology 15, 491–499 (2007).
Markowitz, V. M. et al. IMG: the integrated microbial genomes database and comparative analysis system. Nucleic Acids Research 40, D115–D122 (2012).
Rusch, D. et al. The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through eastern tropical Pacific. PLoS Biology 5, e77 (2007).
Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864 (2011).
Peng, Y., Leung, H. C., Yiu, S.-M. & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25, 1043–1055 (2015).
Eddy, S. R. Accelerated Profile HMM Searches. PLoS Computational Biology 7, e1002195 (2011).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—Approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Felsenstein, J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791 (1985).
Dagan, T., Roettger, M., Bryant, D. & Martin, W. Genome Networks Root the Tree of Life between Prokaryotic Domains. Genome Biology and Evolution 2, 379–392 (2010).
Ludwig, W. et al. ARB: a software environment for sequence data. Nucleic Acids Research 32, 1363–1371 (2004).
Letunic, I. & Bork, P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Research 39, W475–W478 (2011).

Data Citations

Haroon, M. F. National Center for Biotechnology Information (NCBI) BioProject database PRJNA289734 (2015)
Haroon, M. F. Figshare https://dx.doi.org/10.6084/m9.figshare.3362899.v1 (2016)

Acknowledgements

We acknowledge the people who were involved in the KAUST Red Sea Expedition 2011 and those who helped to generate the data, including, but not limited to, Matt Cahill, Mamoon Rashid, Vinu Manikandan, David Ngugi and Ahmed Shibl. This work was supported by King Abdullah University of Science and Technology (KAUST), a Saudi Basic Industries Corporation (SABIC) fellowship to L.R.T., and a SABIC presidential chair to U.S.

Author information

Authors and Affiliations

Red Sea Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
Mohamed F. Haroon, Luke R. Thompson & Ulrich Stingl
Department of Pediatrics, University of California, San Diego, 92037, California, USA
Luke R. Thompson
Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Queensland 4072, Australia
Donovan H. Parks & Philip Hugenholtz
Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia
Philip Hugenholtz

Contributions

M.F.H. trimmed and assembled the metagenomes, binned, refined and annotated the genomes, submitted all sequences to databases, made figures and tables, and wrote the manuscript. L.R.T. organized the cruise, collected seawater samples, extracted DNA and helped to write the manuscript. D.H.P. helped write the manuscript and constructed the genome trees. P.H. constructed the genome trees. U.S. planned the study, organized the cruise and helped write the manuscript.

Corresponding authors

Correspondence to Mohamed F. Haroon or Ulrich Stingl.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

ISA-Tab metadata

Supplementary information

Supplementary Tables (DOC 227 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0

Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.

About this article

Cite this article

Haroon, M., Thompson, L., Parks, D. et al. A catalogue of 136 microbial draft genomes from Red Sea metagenomes. Sci Data 3, 160050 (2016). https://doi.org/10.1038/sdata.2016.50

Received: 20 April 2016
Accepted: 25 May 2016
Published: 05 July 2016
DOI: https://doi.org/10.1038/sdata.2016.50