Abstract
Human skin functions as a physical barrier to foreign pathogen invasion and houses numerous commensals. Shifts in the human skin microbiome have been associated with conditions ranging from acne to atopic dermatitis. Previous metagenomic investigations into the role of the skin microbiome in health or disease have found that much of the sequenced data do not match reference genomes, making it difficult to interpret metagenomic datasets. We combined bacterial cultivation and metagenomic sequencing to assemble the Skin Microbial Genome Collection (SMGC), which comprises 622 prokaryotic species derived from 7,535 metagenome-assembled genomes and 251 isolate genomes. The metagenomic datasets that we generated were combined with publicly available skin metagenomic datasets to identify members and functions of the human skin microbiome. The SMGC collection includes 174 newly identified bacterial species and 12 newly identified bacterial genera, including the abundant genus ‘Candidatus Pellibacterium’, which has been newly associated with the skin. The SMGC increases the characterized set of known skin bacteria by 26%. We validated the SMGC metagenome-assembled genomes by comparing them with sequenced isolates obtained from the same samples. We also recovered 12 eukaryotic species and assembled thousands of viral sequences, including newly identified clades of jumbo phages. The SMGC enables classification of a median of 85% of skin metagenomic sequences and provides a comprehensive view of skin microbiome diversity, derived primarily from samples obtained in North America.
Data availability
Metagenome sequence data are publicly available in SRA (study accession SRP002480). The SBCC is maintained and stored at the National Human Genome Research Institute. Some isolates will be available through public repositories. Strains of novel species that are not otherwise available will be provided to researchers upon request. The sequenced genomes of the isolates have been submitted to NCBI under the accession numbers listed in Supplementary Table 1. Metagenome assemblies, isolate and metagenome-assembled genomes from the SMGC and genome annotations are available at http://ftp.ebi.ac.uk/pub/databases/metagenomics/genome_sets/skin_microbiome. Source data are provided with this paper.
Code availability
The SMGCv1.0 pipeline is available at https://github.com/skinmicrobiome/skin_MAGs/releases/tag/v1.0. VIRify v0.2.0 is available at https://github.com/EBI-Metagenomics/emg-viral-pipeline/releases/tag/v0.2.0.
References
Oh, J. et al. Biogeography and individuality shape function in the human skin metagenome. Nature 514, 59–64 (2014).
Byrd, A. L., Belkaid, Y. & Segre, J. A. The human skin microbiome. Nat. Rev. Microbiol. 16, 143–155 (2018).
Oh, J. et al. Temporal stability of the human skin microbiome. Cell 165, 854–866 (2016).
Myles, I. A. et al. A method for culturing Gram− skin microbiota. BMC Microbiol. 16, 60 (2016).
Timm, C. M. et al. Isolation and characterization of diverse microbial representatives from the human skin microbiome. Microbiome 8, 58 (2020).
Jagielski, T. et al. Distribution of Malassezia species on the skin of patients with atopic dermatitis, psoriasis, and healthy volunteers assessed by conventional and molecular identification methods. BMC Dermatol. 14, 3 (2014).
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004 (2018).
Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
Varghese, N. J. et al. Microbial species delineation using whole genome sequences. Nucleic Acids Res. 43, 6761–6771 (2015).
Jégousse, C., Vannier, P., Groben, R., Glöckner, F. O. & Marteinsson, V. A total of 219 metagenome-assembled genomes of microorganisms from Icelandic marine waters. PeerJ 9, e11112 (2021).
Stewart, R. D. et al. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37, 953–961 (2019).
Delmont, T. O. et al. Nitrogen-fixing populations of Planctomycetes and Proteobacteria are abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813 (2018).
Eren, A. M. et al. Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ 3, e1319 (2015).
Quince, C. et al. DESMAN: a new tool for de novo extraction of strains from metagenomes. Genome Biol. 18, 181 (2017).
Orakov, A. et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 22, 178 (2021).
Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
Sangwan, N., Xia, F. & Gilbert, J. A. Recovering complete and draft population genomes from metagenome datasets. Microbiome 4, 8 (2016).
Saheb Kashaf, S., Almeida, A., Segre, J. A. & Finn, R. D. Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat. Protoc. 16, 2520–2541 (2021).
Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662 (2019).
Pallen, M. J., Telatin, A. & Oren, A. The next million names for archaea and bacteria. Trends Microbiol. 29, 289–298 (2021).
Colquhoun, R. M., Hall, M. B., Lima, L. & Roberts, L. W. Nucleotide-resolution bacterial pan-genomics with reference graphs. Preprint at bioRxiv https://doi.org/10.1186/s13059-021-02473-1 (2020).
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Tournu, H., Fiori, A. & Van Dijck, P. Relevance of trehalose in pathogenicity: some general rules, yet many exceptions. PLoS Pathog. 9, e1003447 (2013).
Jo, J.-H., Kennedy, E. A. & Kong, H. H. Topographical and physiological differences of the skin mycobiome in health and disease. Virulence 8, 324–333 (2017).
Nayfach, S. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-00774-7 (2020).
Paez-Espino, D. et al. IMG/VR: a database of cultured and uncultured DNA viruses and retroviruses. Nucleic Acids Res. 45, D457–D465 (2017).
Camarillo-Guerrero, L. F., Almeida, A. & Rangel-Pineros, G. Massive expansion of human gut bacteriophage diversity. Preprint at bioRxiv https://doi.org/10.1016/j.cell.2021.01.029 (2020).
Buttimer, C. et al. Genome sequence of jumbo phage vB_AbaM_ME3 of Acinetobacter baumanni. Genome Announc. https://doi.org/10.1128/genomeA.00431-16 (2016).
Paddison, P. et al. The roles of the bacteriophage T4 r genes in lysis inhibition and fine-structure genetics: a new perspective. Genetics 148, 1539–1550 (1998).
Cole, J. R. et al. The Ribosomal Database Project (RDP-II): introducing myRDP space and quality controlled public data. Nucleic Acids Res. 35, D169–D172 (2007).
McIver, L. J. et al. bioBakery: a meta’omic analysis environment. Bioinformatics 34, 1235–1237 (2018).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Uritskiy G. V., DiRuggiero J. & Taylor J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome https://doi.org/10.1186/s40168-018-0541-1 (2018).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
Kalvari, I. et al. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res. 46, D335–D342 (2018).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Bushnell, B. BBMap https://sourceforge.net/projects/bbmap (2014).
von Meijenfeldt, F. A. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H. & Dutilh, B. E. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol. 20, 217 (2019).
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
Gu, Z. ComplexHeatmap: Make Complex Heatmaps. R package version 1 https://doi.org/10.1093/bioinformatics/btw313 (2015).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics https://doi.org/10.1093/bioinformatics/btz848 (2019).
Parks, D. H. et al. Selection of representative genomes for 24,706 bacterial and archaeal species clusters provide a complete genome-based taxonomy. Preprint at bioRxiv https://doi.org/10.1038/s41587-020-0501-8 (2019).
Letunic, I. & Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44, W242–W245 (2016).
Tonkin-Hill, G. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol. 21, 180 (2020).
Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res. 48, 8883–8900 (2020).
Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
Saary, P., Mitchell, A. L. & Finn, R. D. Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC. Genome Biol. 21, 244 (2020).
Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat. Microbiol. 6, 3–6 (2021).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Jang, H. B. et al. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat. Biotechnol. 37, 632–639 (2019).
Muller, J. et al. eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 38, D190–D195 (2010).
Roux, S., Emerson, J. B., Eloe-Fadrosh, E. A. & Sullivan, M. B. Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. PeerJ 5, e3817 (2017).
Acknowledgements
We thank S. Nurk and S. Conlan for their invaluable feedback regarding this work. S.S.K. is a graduate student supported by the NIH-Oxford-Cambridge Scholars Program. A.A. and R.D.F. are funded by EMBL core funds. This study utilized the computational resources of the NIH HPC Biowulf Cluster (http://hpc.nih.gov), and was supported by the Intramural Research Programs of the National Institutes of Health (NIH) National Institute of Arthritis and Musculoskeletal and Skin Diseases and National Human Genome Research Institute.
Author information
Contributions
S.S.K., H.H.K., J.A.S., A.A. and R.D.F. conceived the study. S.S.K. and A.A. performed the analyses. M.H. contributed to the evaluation and Nextflow implementation of the VIRify pipeline, and provided guidance on the viral analyses. P.S. developed the EukCC tool and provided intellectual input on the eukaryotic analyses. D.M.P. provided intellectual input and contributed to the interpretation of the results. C.D., H.H.K. and M.E.T. performed the sample collection and culturing. J.A.S., A.A. and R.D.F. supervised the work. H.H.K., J.A.S. and R.D.F. provided funding. S.S.K., J.A.S., A.A. and R.D.F. wrote the manuscript. All authors read, edited and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review information
Nature Microbiology thanks David Moyes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Genome statistics of the prokaryotic skin MAGs.
a, The completeness and b, contamination estimates for genomes (Single Run, n = 2,389; Per Sample, n = 1,206; Pool Time, n = 973; Pool Site, n = 1,171; Pool HV, n = 1,054; Other datasets, n = 1,099) recovered from different metagenomic samples, as determined by CheckM. ‘Other datasets’ refers to skin metagenomes excluding the healthy volunteer dataset SRP002480. c, N50 of these MAGs, as determined using BBMap. Significance for a–c was determined using a two-tailed t-test relative to Per Sample, with ns representing not significant. d, The mean proportion of these genomes classified as taxonomically mismatched by comparing the annotation of the bin to the annotation of each contig via the contig annotation tool (CAT). ‘No support’ indicates that no taxonomic annotation was available at the respective rank. In panels a, b and c, box lengths represent the IQR of the data, with whiskers depicting the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively.
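As an illustration of the N50 statistic reported in panel c, the following is a minimal sketch of the standard definition (the contig length at which the cumulative length of the longest contigs first reaches half of the assembly). It is not the BBMap implementation; the function name `n50` and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the summed
    assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp) for a single MAG.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

A higher N50 indicates that more of the genome is contained in long contigs, which is why it is used here alongside completeness and contamination to compare pooling strategies.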
Extended Data Fig. 2 Comparison of MAG and SBCC isolate genomes.
a, Misassembled fraction as a proportion of the total genome length, estimated by QUAST. b, Single-nucleotide mismatches between MAGs and isolates per 100 kbp. c, Percent MAG aligned and d, percent isolate aligned for all pairwise MAG-isolate matches sharing ≥99% average nucleotide identity across different pooling strategies (Single Run, n = 124; Per Sample, n = 91; Pool Time, n = 116; Pool Site, n = 134; Pool HV, n = 115). e, CheckM completeness relative to percent isolate aligned for these MAGs, coloured by pooling strategy. The majority of the points fall below the dashed identity line, indicating that CheckM frequently overestimates genome completeness. f, Dot plot of a novel Corynebacterium MAG obtained through Pool HV and the matching isolate, cultured from the same healthy volunteer. In panels a, b, c and d, box lengths represent the IQR of the data, with whiskers depicting the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively.
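The box plot convention used in these captions (box spanning the IQR, whiskers at the most extreme values within 1.5 times the IQR of the quartiles) can be sketched as follows. This is an illustrative reconstruction, not the plotting code used for the figures; the quartile interpolation method (`method="inclusive"`) and the function name `box_stats` are assumptions, and plotting libraries may compute quartiles slightly differently.

```python
import statistics


def box_stats(values):
    """Return quartiles, whisker ends and outliers for a Tukey-style box plot."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("at least two values are required")
    # Quartiles by linear interpolation (assumed method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observed values inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical values with one extreme observation.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this example the value 100 lies beyond the upper fence, so the upper whisker stops at 9 and 100 would be drawn as an individual outlier point.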
Extended Data Fig. 3 Comparison of the number of species recovered by each sampling strategy.
Venn diagram of the number of species recovered by single run/per sample and pooled approaches (Pool Time, Pool HV, Pool Site) as part of the study accession SRP002480 or by a per sample investigation of other publicly available metagenomic datasets (other studies).
Extended Data Fig. 4 The metabolisms of the prokaryotic SMGC MAGs and isolates.
Annotation of the prokaryotic SMGC using DRAM shows that clades largely represented by uncultured species (outlined in black) are depleted in pathways involved in aerobic respiration, suggesting that the standard skin culture conditions are not able to capture the full diversity of microbes found on human skin.
Extended Data Fig. 5 Gene frequency and metabolic pathway distribution of species from abundant skin genera.
a, Number of genes in relation to the number of near-complete (≥90% completeness) conspecific genomes recovered for Staphylococcus epidermidis. The other species shown in b had similar distributions. b, Genome accumulation curves of the number of genes detected as a function of the number of non-redundant genomes analysed. c, Venn diagram of the number of KEGG pathways shared by the two genera Staphylococcus and Corynebacterium. Bar plot comparing the predominant KEGG pathways unique to the Staphylococcus or the Corynebacterium skin genomes, showing only pathways present in at least 5% of the genomes.
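A genome accumulation curve of the kind shown in panel b can be approximated as below. This is a minimal sketch under stated assumptions: each genome is represented as a set of gene-family identifiers, genomes are added in random order, and the number of distinct genes seen so far is averaged over permutations. The function name, the permutation count and the toy gene sets are hypothetical and do not reproduce the pan-genome pipeline used in the study.

```python
import random


def gene_accumulation_curve(genome_gene_sets, n_permutations=100, seed=0):
    """Mean number of distinct genes after adding 1..N genomes in random order."""
    if n_permutations < 1:
        raise ValueError("n_permutations must be at least 1")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = [0] * len(genome_gene_sets)
    for _ in range(n_permutations):
        order = list(genome_gene_sets)
        rng.shuffle(order)
        seen = set()
        for i, genes in enumerate(order):
            seen |= genes  # add this genome's genes to the running union
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Toy example: three genomes sharing some gene families.
toy_genomes = [{"geneA", "geneB"}, {"geneB", "geneC"}, {"geneC", "geneD"}]
curve = gene_accumulation_curve(toy_genomes)
```

A curve that keeps rising as genomes are added suggests an open pan-genome, whereas a plateau suggests that most of the gene content of the species has been sampled.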
Extended Data Fig. 6 Quality and taxonomic classification of fungal and viral genomes.
a, Genome completeness and b, contamination of the 499 eukaryotic MAGs estimated by EukCC. c, N50 for these MAGs determined via BBMap. The numbers of bins were 81 for Single Run, 123 for Per Sample, 112 for Pool Time, 87 for Pool Site, 65 for Pool HV and 31 for Other datasets. Significance was determined using a two-tailed t-test relative to Per Sample, with ns representing not significant. ‘Other datasets’ refers to skin metagenomes excluding the healthy volunteer dataset, which is a part of the study SRP002480. d, Taxonomic classification of the viral genomes according to DemoVir. In panels a, b and c, box lengths represent the IQR of the data, with whiskers depicting the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively.
Extended Data Fig. 7 The human skin harbours vast viral diversity, of which the sebaceous sites remain stable over time.
a, The number of viral genomes in the SMGC coloured by their assigned CheckV quality. Comparison of the putative viral genomes to IMG/VR and the Gut Phage Database reveals that only a small fraction of the virome has been previously identified. b, The number of viral sequences detected for each SMGC bacterial genus using CRISPR host analysis. c, The stability of the SMGC over time for different body sites as estimated by the theta dissimilarity metric, with a theta dissimilarity of zero indicating high similarity. When calculating the theta dissimilarity, comparisons were made between the same body site of the same healthy volunteer over time. Body sites (Ac, n = 39; Al, n = 36; Ba, n = 33; Ch, n = 35; Ea, n = 35; Fh, n = 34; Hp, n = 35; Ic, n = 34; Id, n = 32; Mb, n = 35; N, n = 42; Oc, n = 36; Pc, n = 35; Ph, n = 36; Ra, n = 41; Tn, n = 32; Tw, n = 35; Vf, n = 38) are defined in Fig. 1a. The Ax was excluded due to limited sampling. Box lengths represent the IQR of the data, with whiskers depicting the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively.
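The caption for panel c does not specify which theta metric was used. Assuming it refers to the Yue-Clayton theta index, the dissimilarity between two samples can be sketched as follows, where zero indicates identical community structure and one indicates no shared taxa. The function name and the example counts are hypothetical.

```python
def yue_clayton_dissimilarity(counts_a, counts_b):
    """Yue-Clayton theta dissimilarity between two taxon-count profiles.

    D = 1 - sum(a_i * b_i) / (sum((a_i - b_i)^2) + sum(a_i * b_i)),
    where a_i and b_i are relative abundances of taxon i in each sample.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both samples must contain at least one count")
    sum_products = 0.0
    sum_squared_diffs = 0.0
    for taxon in set(counts_a) | set(counts_b):
        a = counts_a.get(taxon, 0) / total_a
        b = counts_b.get(taxon, 0) / total_b
        sum_products += a * b
        sum_squared_diffs += (a - b) ** 2
    return 1.0 - sum_products / (sum_squared_diffs + sum_products)


# Hypothetical counts for the same body site at two time points.
time_point_1 = {"Cutibacterium acnes": 80, "Staphylococcus epidermidis": 20}
time_point_2 = {"Cutibacterium acnes": 80, "Staphylococcus epidermidis": 20}
dissimilarity = yue_clayton_dissimilarity(time_point_1, time_point_2)
```

Because the index uses relative abundances, it is insensitive to sequencing depth differences between time points, although it is dominated by the most abundant taxa.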
Extended Data Fig. 8 Characterization of the cluster 5 jumbo phage genome.
Distribution of viral protein families (ViPhOGs) and the GC (%) content along the cluster 5 jumbo phage genome reveals that viral proteins are evenly distributed and GC (%) content is consistent.
Extended Data Fig. 9 The SMGC improves classification of the skin microbiome.
a, Percentage of sequencing reads from different body sites classified by the SMGC as compared to the standard Kraken 2 database and the Pasolli et al. skin prokaryotic MAGs. Box lengths represent the IQR of the data, with whiskers depicting the lowest and highest values within 1.5 times the IQR of the first and third quartiles, respectively. b, The species in the SMGC present at different body sites. Novelty was determined by comparison to both the GTDB database and the Pasolli et al. catalogue. Body sites (Ac, n = 39; Al, n = 36; Ba, n = 33; Ch, n = 35; Ea, n = 35; Fh, n = 34; Hp, n = 35; Ic, n = 34; Id, n = 32; Mb, n = 35; N, n = 42; Oc, n = 36; Pc, n = 35; Ph, n = 36; Ra, n = 41; Tn, n = 32; Tw, n = 35; Vf, n = 38) are defined in Fig. 1a. The Ax was excluded due to limited sampling.
Extended Data Fig. 10 A new multi-kingdom view of the healthy human skin microbiome.
a, Relative abundance of viruses and members of the six most abundant skin genera across the healthy volunteers at the first time point. Body sites are defined in Fig. 1a. b, Mean relative abundance across time of the most abundant species found in the external auditory canal and the nares for each healthy volunteer.
Supplementary information
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 7
Statistical source data.
Source Data Extended Data Fig. 9
Statistical source data.
Source Data Extended Data Fig. 10
Statistical source data.
About this article
Cite this article
Saheb Kashaf, S., Proctor, D.M., Deming, C. et al. Integrating cultivation and metagenomics for a multi-kingdom view of skin microbiome diversity and functions. Nat Microbiol 7, 169–179 (2022). https://doi.org/10.1038/s41564-021-01011-w
DOI: https://doi.org/10.1038/s41564-021-01011-w