The varied topography of human skin offers a unique opportunity to study how the body’s microenvironments influence the functional and taxonomic composition of microbial communities. Phylogenetic marker gene-based studies have identified many bacteria and fungi that colonize distinct skin niches. Here metagenomic analyses of diverse body sites in healthy humans demonstrate that local biogeography and strong individuality define the skin microbiome. We developed a relational analysis of bacterial, fungal and viral communities, which showed not only site specificity but also individual signatures. We further identified strain-level variation of dominant species as heterogeneous and multiphyletic. Reference-free analyses captured the uncharacterized metagenome through the development of a multi-kingdom gene catalogue, which was used to uncover genetic signatures of species lacking reference genomes. This work is foundational for human disease studies investigating inter-kingdom interactions, metabolic changes and strain tracking, and defines the dual influence of biogeography and individuality on microbial composition and function.
At a glance
- The skin microbiome. Nature Rev. Microbiol. 9, 244–253 (2011) &
- Topographical and temporal diversity of the human skin microbiome. Science 324, 1190–1192 (2009) et al.
- Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009) et al.
- Topographic diversity of fungal and bacterial communities in human skin. Nature 498, 367–370 (2013) et al.
- Candida albicans-Staphylococcus aureus polymicrobial peritonitis modulates host innate immunity. Infect. Immun. 81, 2178–2189 (2013) &
- Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011) et al.
- Temporal response of the human virome to immunosuppression and antiviral therapy. Cell 155, 1178–1187 (2013) et al.
- Pathogenic simian immunodeficiency virus infection is associated with expansion of the enteric virome. Cell 151, 253–266 (2012) et al.
- A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012) et al.
- Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013) et al.
- A diversity profile of the human skin microbiota. Genome Res. (2008) et al.
- The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012)
- Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial ‘pan-genome’. Proc. Natl Acad. Sci. USA 102, 13950–13955 (2005) et al.
- Nasal carriage as a source of Staphylococcus aureus bacteremia. N. Engl. J. Med. 344, 11–16 (2001) , , , &
- The altered landscape of the human skin microbiome in patients with primary immunodeficiencies. Genome Res. (2013) et al.
- Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128–1131 (2009) , &
- The shared antibiotic resistome of soil bacteria and human pathogens. Science 337, 1107–1111 (2012) et al.
- Temporal shifts in the skin microbiome associated with disease flares and treatment in children with atopic dermatitis. Genome Res. 22, 850–859 (2012) et al.
- Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE 7, e39315 (2012)
- Tackling soil diversity with the assembly of large, complex metagenomes. Proc. Natl Acad. Sci. USA 111, 4904–4909 (2014) et al.
- Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009) et al.
- pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010) , &
- Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012) &
- Pathoscope: Species identification and strain attribution with unassembled sequencing data. Genome Res. 23, 1721–1729 (2013) et al.
- BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010) &
- Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods 9, 811–814 (2012) et al.
- KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000) &
- Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLOS Comput. Biol. 8, e1002358 (2012) et al.
- Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010)
- ARDB—Antibiotic Resistance Genes Database. Nucleic Acids Res. 37, D443–D447 (2009) &
- Genomic variation landscape of the human gut microbiome. Nature 493, 45–50 (2013) et al.
- New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010) et al.
- Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30, 2478–2483 (2002) , , &
- MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res. 40, e155 (2012) , , &
- Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, e132 (2010) , &
- Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006) , , &
- 281–295 http://link.springer.com/protocol/10.1007/978-1-61779-361-5_15 (Springer, 2012) & in Bacterial Molecular Networks (eds , & ) Methods in Molecular Biology Vol. 804, pp.
- Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995) &
- A similarity measure based on species proportions. Comm. Stat. Theory Methods 34, 2123–2131 (2005) &
- Classification and regression by randomForest. R News 2, 18–22 (2002) &
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: The 18 selected skin sites and their location on the human body. (304 KB)
These sites represent three microenvironments: sebaceous (blue), dry (red), and moist (green). Toenail (black) is a site that does not fall under these major microenvironments and is treated separately. Pie charts represent consensus relative abundance of the kingdoms Bacteria, Eukaryota (Fungi), and virus from multi-kingdom mapping.
- Extended Data Figure 2: Per-sample read statistics. (538 KB)
Additional samples (bacterial and eukaryotic mock communities) are shown. a, Boxplots (line indicates median; boxes represent first and third quartiles) show, for each site, % reads mapping to human hg19 that are discarded before analysis. Sites are coloured by site characteristic. b, Samples are ordered by label. Lines indicate the median value for that statistic; value is in parenthesis. c, Estimate of sequencing coverage. Reads seen is the number of reads in a sample sampled. Reads are then split into 20-mers, compared to a k-mer coverage table and kept only if the median k-mer coverage is below 20×. Curves are grouped by site, coloured by individual as indicated.
- Extended Data Figure 3: Validation of taxonomic classifications. (519 KB)
a, Bacterial sample community diversity as a function of genome coverage for two diversity metrics, the Shannon index that measures the richness and evenness of the community (left), and number of species observed (right). Genome coverage is defined as for each genome hit, the % of genome covered by reads. Boxplots show the range of diversity values for all samples, segregated by microenvironment. Black lines indicate median; boxes represent first and third quartiles. As coverage cut-offs increase, diversity estimates drop sharply. b, Comparisons of bacterial community diversity for Metaphlan-derived classifications versus custom bacterial Pathoscope-derived classifications. Each point represents a different sample, coloured by microenvironment. With no coverage cut-offs (left), Pathoscope may overestimate diversity, which is reduced by setting a minimum 1× coverage requirement. Spearman correlation (ρ) and corresponding P values are shown. Pathoscope-derived relative abundances versus relative abundances derived from c, 16S amplicon sequencing, d, Metaphlan genus-level, e, Metaphlan-species level (ρ and P value are calculated for non-zero abundance taxa), f, Metaphlan, staphylococcal species, g, ITS1 amplicon sequencing, genus (ρ and P value are calculated for non-zero abundance taxa), and h, ITS1 amplicon sequencing, Malassezia species.
- Extended Data Figure 4: Full taxonomic classifications for all healthy volunteers (HV), all sites. (534 KB)
To aid visualization of site- and individual-specific similarities, samples are grouped by site/microenvironment for each individual. Relative abundances of the most abundant skin taxa for each super-kingdom are shown. b, Taxonomic re-classification of major sites sampled by the Human Microbiome Project. Samples are from the anterior nares and retroauricular crease (skin), tongue dorsum and supragingival plaque (oral), stool, and posterior fornix (vaginal). Relative abundances of the most abundant taxa for each kingdom in the skin, for comparison, are shown.
- Extended Data Figure 5: Strain-level classification based on reference genomes show sub-species heterogeneity for dominant skin taxa. (387 KB)
a, Simulations to assess sensitivity of Pathoscope-based mapping to SNPs, non-core regions, or whole genomes. Synthetic communities were created with 6, 12, or 18 genomes per community. Sizes of circles reflect the number of reads sampled from each genome, for example, 50,000, 100,000, or 500,000 reads per genome. 15 random synthetic communities for each genome group were created and mapped to SNPs, non-core regions, or the full genome set. Sensitivity is calculated from the expected versus the observed abundances. b, Full strain-level assignments for samples with relative abundances of closest related Propionibacterium acnes strains, by individual. c, Dendrograms of strain similarity. Trees were generated using core SNPs; genomes were aligned with nucmer to identify core regions, and then SNPs within these core regions were identified by calculating all pairwise differences between genomes. Bar of colours indicates delineations of subtypes where phylogenetically more similar genomes are in similar colours; for example, we defined 12 subtypes for P. acnes.
- Extended Data Figure 6: Strain-level classification for Staphylococcus epidermidis. (737 KB)
a, Full strain-level assignments for samples by microenvironment. b, Description is as in Extended Data Fig. 5c. We defined 14 subtypes for S. epidermidis.
- Extended Data Figure 7: Full version of coreness of different module categories across skin microenvironment. (552 KB)
A module is defined as core if occurring in >2/3 of samples for that class. Major KEGG module descriptors are shown in the different colours. Height of bars reflects the proportion of samples that a module occurs in.
- Extended Data Figure 8: Correlation analysis of module abundance with species abundance to infer a module’s taxonomic origin. (460 KB)
Spearman correlation (ρ) was calculated with corresponding P value for taxa with relative abundance >0.5% and modules with greater than 0.05% relative abundance. Coryn., Corynebacterium. a, Unsupervised clustering of correlation coefficients. Species from the same genera clustering together may suggest a shared contribution of a pathway. b, Most significantly correlated taxa; colours represent broad KEGG classes. Adjusted P < 2 × 10−16.
- Extended Data Figure 9: Antibiotic resistance profiles in the skin. (406 KB)
Reads were mapped to a short marker database consensus created from the ARDB database, which catalogues publicly available resistance genes. Genes are grouped into broad resistance classes; a resistance category is called present (black; absent = white) if at least one gene from its family is present.
- Extended Data Figure 10: Reference-free analysis of skin metagenome with adaptive iterative assembly, gene catalogue, and metagenomic clusters. (509 KB)
a, Tracking unclassified reads. Fraction unmapped reads refers to the fraction of total reads passing quality control that do not map to the major super kingdoms Archaea, Bacteria, Eukaryota, and viruses. Samples are ordered by label and are divided by site. b, Assembly, gene-calling, and clustering workflow. c, Assembly efficacy varies significantly by k-mer depending on the site’s unique features of community complexity and sequencing depth, which is most affected by that site’s human DNA admixture. Assembly statistics are shown for samples pooled by individual, which produced higher quality assemblies than pooling by site. Because of large pool size, khmer digital normalization was used before Velvet assembly. % overall alignment rate indicates the total % of reads that map back to that sample’s assembly for each k-mer. % paired concordant indicates the fraction paired reads (of overall, not of % paired) in which both pairs of a mate map back to an assembly; discordant is where one mate of a pair does not map, or maps to a different contig. Contigs are then assessed by the maximum assembly size, the number of bases that are used in the assembly, and the number of contigs above a threshold of 300 bp. d, Effect of khmer digital normalization on individual sample assembly. Digital normalization + Velvet assembly performs similarly to Velvet assembly alone.
- Supplementary Data (17.9 MB)
This file contains Supplementary Tables 1-18.