Biogeography and individuality shape function in the human skin metagenome

Oh, Julia; Byrd, Allyson L.; Deming, Clay; Conlan, Sean; Kong, Heidi H.; Segre, Julia A.

doi:10.1038/nature13786

Article
Published: 01 October 2014

Biogeography and individuality shape function in the human skin metagenome

Julia Oh¹,
Allyson L. Byrd¹,
Clay Deming¹,
Sean Conlan¹,
NISC Comparative Sequencing Program,
Heidi H. Kong²^na1 &
…
Julia A. Segre¹^na1

Nature volume 514, pages 59–64 (2014)Cite this article

33k Accesses
715 Citations
79 Altmetric
Metrics details

Subjects

Abstract

The varied topography of human skin offers a unique opportunity to study how the body’s microenvironments influence the functional and taxonomic composition of microbial communities. Phylogenetic marker gene-based studies have identified many bacteria and fungi that colonize distinct skin niches. Here metagenomic analyses of diverse body sites in healthy humans demonstrate that local biogeography and strong individuality define the skin microbiome. We developed a relational analysis of bacterial, fungal and viral communities, which showed not only site specificity but also individual signatures. We further identified strain-level variation of dominant species as heterogeneous and multiphyletic. Reference-free analyses captured the uncharacterized metagenome through the development of a multi-kingdom gene catalogue, which was used to uncover genetic signatures of species lacking reference genomes. This work is foundational for human disease studies investigating inter-kingdom interactions, metabolic changes and strain tracking, and defines the dual influence of biogeography and individuality on microbial composition and function.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Figure 1: Multi-kingdom relative abundances are strongly shaped by skin microenvironment.**

**Figure 2: Individual-specific signatures are typically low abundance but shared across most sites.**

**Figure 3: *Propionibacterium acnes* and *Staphylococcus epidermidis* are heterogeneous and multiphyletic at the strain level.**

**Figure 4: Functional capacity varies by microenvironment.**

**Figure 5: Reconstruction of metagenomic dark matter with reference-free methods.**

Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data

Article Open access 12 April 2024

Microbiota in health and diseases

Article Open access 23 April 2022

Best practices for single-cell analysis across modalities

Article 31 March 2023

Accession codes

Primary accessions

BioProject

46333

References

Grice, E. A. & Segre, J. A. The skin microbiome. Nature Rev. Microbiol. 9, 244–253 (2011)
Article CAS Google Scholar
Grice, E. A. et al. Topographical and temporal diversity of the human skin microbiome. Science 324, 1190–1192 (2009)
Article ADS CAS PubMed PubMed Central Google Scholar
Costello, E. K. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009)
Article ADS CAS PubMed PubMed Central Google Scholar
Findley, K. et al. Topographic diversity of fungal and bacterial communities in human skin. Nature 498, 367–370 (2013)
Article ADS CAS PubMed PubMed Central Google Scholar
Peters, B. M. & Noverr, M. C. Candida albicans-Staphylococcus aureus polymicrobial peritonitis modulates host innate immunity. Infect. Immun. 81, 2178–2189 (2013)
Article CAS PubMed PubMed Central Google Scholar
Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011)
Article CAS PubMed PubMed Central Google Scholar
De Vlaminck, I. et al. Temporal response of the human virome to immunosuppression and antiviral therapy. Cell 155, 1178–1187 (2013)
Article CAS PubMed PubMed Central Google Scholar
Handley, S. A. et al. Pathogenic simian immunodeficiency virus infection is associated with expansion of the enteric virome. Cell 151, 253–266 (2012)
Article CAS PubMed PubMed Central Google Scholar
Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012)
Article ADS CAS PubMed Google Scholar
Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013)
Article ADS CAS PubMed Google Scholar
Grice, E. A. et al. A diversity profile of the human skin microbiota. Genome Res. (2008)
The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012)
Tettelin, H. et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial ‘pan-genome’. Proc. Natl Acad. Sci. USA 102, 13950–13955 (2005)
Article ADS CAS PubMed PubMed Central Google Scholar
von Eiff, C., Becker, K., Machka, K., Stammer, H. & Peters, G. Nasal carriage as a source of Staphylococcus aureus bacteremia. N. Engl. J. Med. 344, 11–16 (2001)
Article CAS PubMed Google Scholar
Oh, J. et al. The altered landscape of the human skin microbiome in patients with primary immunodeficiencies. Genome Res. (2013)
Sommer, M. O. A., Dantas, G. & Church, G. M. Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128–1131 (2009)
ADS CAS PubMed PubMed Central Google Scholar
Forsberg, K. J. et al. The shared antibiotic resistome of soil bacteria and human pathogens. Science 337, 1107–1111 (2012)
Article ADS CAS PubMed PubMed Central Google Scholar
Kong, H. H. et al. Temporal shifts in the skin microbiome associated with disease flares and treatment in children with atopic dermatitis. Genome Res. 22, 850–859 (2012)
Article CAS PubMed PubMed Central Google Scholar
Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE 7, e39315 (2012)
Howe, A. C. et al. Tackling soil diversity with the assembly of large, complex metagenomes. Proc. Natl Acad. Sci. USA 111, 4904–4909 (2014)
Article ADS CAS PubMed PubMed Central Google Scholar
Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009)
Article CAS PubMed PubMed Central Google Scholar
Matsen, F. A., Kodner, R. B. & Armbrust, E. V. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010)
Article PubMed PubMed Central Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012)
Article CAS PubMed PubMed Central Google Scholar
Francis, O. E. et al. Pathoscope: Species identification and strain attribution with unassembled sequencing data. Genome Res. 23, 1721–1729 (2013)
Article CAS PubMed PubMed Central Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
Article CAS PubMed PubMed Central Google Scholar
Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods 9, 811–814 (2012)
Article CAS PubMed PubMed Central Google Scholar
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000)
Article CAS PubMed PubMed Central Google Scholar
Abubucker, S. et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLOS Comput. Biol. 8, e1002358 (2012)
Article CAS PubMed PubMed Central Google Scholar
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010)
CAS PubMed Google Scholar
Liu, B. & Pop, M. ARDB—Antibiotic Resistance Genes Database. Nucleic Acids Res. 37, D443–D447 (2009)
Article CAS PubMed Google Scholar
Schloissnig, S. et al. Genomic variation landscape of the human gut microbiome. Nature 493, 45–50 (2013)
Article ADS PubMed Google Scholar
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010)
Article CAS PubMed Google Scholar
Delcher, A. L., Phillippy, A., Carlton, J. & Salzberg, S. L. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30, 2478–2483 (2002)
Article PubMed PubMed Central Google Scholar
Namiki, T., Hachiya, T., Tanaka, H. & Sakakibara, Y. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res. 40, e155 (2012)
Article CAS PubMed PubMed Central Google Scholar
Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, e132 (2010)
Article PubMed PubMed Central Google Scholar
Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006)
Article PubMed PubMed Central Google Scholar
van Dongen, S. & Abreu-Goodger, C. in Bacterial Molecular Networks (eds Helden, J., Toussaint, A. & Thieffry, D. ) Methods in Molecular Biology Vol. 804, pp. 281–295 http://link.springer.com/protocol/10.1007/978-1-61779-361-5_15 (Springer, 2012)
Book Google Scholar
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995)
MathSciNet MATH Google Scholar
Yue, J. C. & Clayton, M. K. A similarity measure based on species proportions. Comm. Stat. Theory Methods 34, 2123–2131 (2005)
Article MathSciNet Google Scholar
Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002)
Google Scholar

Download references

Acknowledgements

We thank D. Schoenfeld, A. Pradhan, M. Park and G. Bouffard for their efforts. We also thank members of the Segre laboratory and M. C. Udey for their discussions. This work was supported by National Institutes of Health (NIH) NHGRI and NCI Intramural Research Programs and in part by 1K99AR059222 (H.H.K.). This study used the high-performance computational capabilities of the NIH Biowulf Linux cluster. Sequencing was funded by grants from the National Institutes of Health (1UH2AR057504-01 and 4UH3AR057504-02).

Author information

Heidi H. Kong and Julia A. Segre: These authors contributed equally to this work.

Authors and Affiliations

Translational and Functional Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, Maryland 20892, USA,
Julia Oh, Allyson L. Byrd, Clay Deming, Sean Conlan & Julia A. Segre
Dermatology Branch, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland 20892, USA,
Heidi H. Kong
NIH Intramural Sequencing Center, National Human Genome Research Institute, Bethesda, Maryland 20852, USA.,
Betty Barnabas, Robert Blakesley, Gerry Bouffard, Shelise Brooks, Holly Coleman, Mila Dekhtyar, Michael Gregory, Xiaobin Guan, Jyoti Gupta, Joel Han, Shi-ling Ho, Richelle Legaspi, Quino Maduro, Cathy Masiello, Baishali Maskeri, Jenny McDowell, Casandra Montemayor, James Mullikin, Morgan Park, Nancy Riebow, Karen Schandler, Brian Schmidt, Christina Sison, Mal Stantripop, James Thomas, Pamela Thomas, Meg Vemulapalli & Alice Young

Authors

Julia Oh
View author publications
You can also search for this author in PubMed Google Scholar
Allyson L. Byrd
View author publications
You can also search for this author in PubMed Google Scholar
Clay Deming
View author publications
You can also search for this author in PubMed Google Scholar
Sean Conlan
View author publications
You can also search for this author in PubMed Google Scholar
Heidi H. Kong
View author publications
You can also search for this author in PubMed Google Scholar
Julia A. Segre
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

NISC Comparative Sequencing Program

Betty Barnabas
, Robert Blakesley
, Gerry Bouffard
, Shelise Brooks
, Holly Coleman
, Mila Dekhtyar
, Michael Gregory
, Xiaobin Guan
, Jyoti Gupta
, Joel Han
, Shi-ling Ho
, Richelle Legaspi
, Quino Maduro
, Cathy Masiello
, Baishali Maskeri
, Jenny McDowell
, Casandra Montemayor
, James Mullikin
, Morgan Park
, Nancy Riebow
, Karen Schandler
, Brian Schmidt
, Christina Sison
, Mal Stantripop
, James Thomas
, Pamela Thomas
, Meg Vemulapalli
& Alice Young

Contributions

J.O., H.H.K. and J.A.S. designed the study. H.H.K. collected patient samples. C.D. prepared the clinical samples for sequencing, which was carried out by the members of the NIH Intramural Sequencing Center Comparative Sequencing program. J.O., A.L.B. and S.C. analysed sequence data. J.O., H.H.K. and J.A.S. drafted the manuscript. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Heidi H. Kong or Julia A. Segre.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Data deposition is with the SRA and all sequences can be accessed under BioProject 46333. Human subject clinical data are deposited with dbGaP phs000266. Analysis workflow is available at https://github.com/julia0h/skinmetagenome.git.

Extended data figures and tables

Extended Data Figure 1 The 18 selected skin sites and their location on the human body.

These sites represent three microenvironments: sebaceous (blue), dry (red), and moist (green). Toenail (black) is a site that does not fall under these major microenvironments and is treated separately. Pie charts represent consensus relative abundance of the kingdoms Bacteria, Eukaryota (Fungi), and virus from multi-kingdom mapping.

Extended Data Figure 2 Per-sample read statistics.

Additional samples (bacterial and eukaryotic mock communities) are shown. a, Boxplots (line indicates median; boxes represent first and third quartiles) show, for each site, % reads mapping to human hg19 that are discarded before analysis. Sites are coloured by site characteristic. b, Samples are ordered by label. Lines indicate the median value for that statistic; value is in parenthesis. c, Estimate of sequencing coverage. Reads seen is the number of reads in a sample sampled. Reads are then split into 20-mers, compared to a k-mer coverage table and kept only if the median k-mer coverage is below 20×. Curves are grouped by site, coloured by individual as indicated.

Extended Data Figure 3 Validation of taxonomic classifications.

a, Bacterial sample community diversity as a function of genome coverage for two diversity metrics, the Shannon index that measures the richness and evenness of the community (left), and number of species observed (right). Genome coverage is defined as for each genome hit, the % of genome covered by reads. Boxplots show the range of diversity values for all samples, segregated by microenvironment. Black lines indicate median; boxes represent first and third quartiles. As coverage cut-offs increase, diversity estimates drop sharply. b, Comparisons of bacterial community diversity for Metaphlan-derived classifications versus custom bacterial Pathoscope-derived classifications. Each point represents a different sample, coloured by microenvironment. With no coverage cut-offs (left), Pathoscope may overestimate diversity, which is reduced by setting a minimum 1× coverage requirement. Spearman correlation (ρ) and corresponding P values are shown. Pathoscope-derived relative abundances versus relative abundances derived from c, 16S amplicon sequencing, d, Metaphlan genus-level, e, Metaphlan-species level (ρ and P value are calculated for non-zero abundance taxa), f, Metaphlan, staphylococcal species, g, ITS1 amplicon sequencing, genus (ρ and P value are calculated for non-zero abundance taxa), and h, ITS1 amplicon sequencing, Malassezia species.

Extended Data Figure 4 Full taxonomic classifications for all healthy volunteers (HV), all sites.

To aid visualization of site- and individual-specific similarities, samples are grouped by site/microenvironment for each individual. Relative abundances of the most abundant skin taxa for each super-kingdom are shown. b, Taxonomic re-classification of major sites sampled by the Human Microbiome Project. Samples are from the anterior nares and retroauricular crease (skin), tongue dorsum and supragingival plaque (oral), stool, and posterior fornix (vaginal). Relative abundances of the most abundant taxa for each kingdom in the skin, for comparison, are shown.

Extended Data Figure 5 Strain-level classification based on reference genomes show sub-species heterogeneity for dominant skin taxa.

a, Simulations to assess sensitivity of Pathoscope-based mapping to SNPs, non-core regions, or whole genomes. Synthetic communities were created with 6, 12, or 18 genomes per community. Sizes of circles reflect the number of reads sampled from each genome, for example, 50,000, 100,000, or 500,000 reads per genome. 15 random synthetic communities for each genome group were created and mapped to SNPs, non-core regions, or the full genome set. Sensitivity is calculated from the expected versus the observed abundances. b, Full strain-level assignments for samples with relative abundances of closest related Propionibacterium acnes strains, by individual. c, Dendrograms of strain similarity. Trees were generated using core SNPs; genomes were aligned with nucmer to identify core regions, and then SNPs within these core regions were identified by calculating all pairwise differences between genomes. Bar of colours indicates delineations of subtypes where phylogenetically more similar genomes are in similar colours; for example, we defined 12 subtypes for P. acnes.

Extended Data Figure 6 Strain-level classification for Staphylococcus epidermidis.

a, Full strain-level assignments for samples by microenvironment. b, Description is as in Extended Data Fig. 5c. We defined 14 subtypes for S. epidermidis.

Extended Data Figure 7 Full version of coreness of different module categories across skin microenvironment.

A module is defined as core if occurring in >2/3 of samples for that class. Major KEGG module descriptors are shown in the different colours. Height of bars reflects the proportion of samples that a module occurs in.

Extended Data Figure 8 Correlation analysis of module abundance with species abundance to infer a module’s taxonomic origin.

Spearman correlation (ρ) was calculated with corresponding P value for taxa with relative abundance >0.5% and modules with greater than 0.05% relative abundance. Coryn., Corynebacterium. a, Unsupervised clustering of correlation coefficients. Species from the same genera clustering together may suggest a shared contribution of a pathway. b, Most significantly correlated taxa; colours represent broad KEGG classes. Adjusted P < 2 × 10⁻¹⁶.

Extended Data Figure 9 Antibiotic resistance profiles in the skin.

Reads were mapped to a short marker database consensus created from the ARDB database, which catalogues publicly available resistance genes. Genes are grouped into broad resistance classes; a resistance category is called present (black; absent = white) if at least one gene from its family is present.

Extended Data Figure 10 Reference-free analysis of skin metagenome with adaptive iterative assembly, gene catalogue, and metagenomic clusters.

a, Tracking unclassified reads. Fraction unmapped reads refers to the fraction of total reads passing quality control that do not map to the major super kingdoms Archaea, Bacteria, Eukaryota, and viruses. Samples are ordered by label and are divided by site. b, Assembly, gene-calling, and clustering workflow. c, Assembly efficacy varies significantly by k-mer depending on the site’s unique features of community complexity and sequencing depth, which is most affected by that site’s human DNA admixture. Assembly statistics are shown for samples pooled by individual, which produced higher quality assemblies than pooling by site. Because of large pool size, khmer digital normalization was used before Velvet assembly. % overall alignment rate indicates the total % of reads that map back to that sample’s assembly for each k-mer. % paired concordant indicates the fraction paired reads (of overall, not of % paired) in which both pairs of a mate map back to an assembly; discordant is where one mate of a pair does not map, or maps to a different contig. Contigs are then assessed by the maximum assembly size, the number of bases that are used in the assembly, and the number of contigs above a threshold of 300 bp. d, Effect of khmer digital normalization on individual sample assembly. Digital normalization + Velvet assembly performs similarly to Velvet assembly alone.

Related audio

Julie Segre and Heidi Kong on sequencing the genomes of microbes inhabiting our skin.

Supplementary information

Supplementary Data

This file contains Supplementary Tables 1-18. (XLSX 18412 kb)

PowerPoint slides

PowerPoint slide for Fig. 1

PowerPoint slide for Fig. 2

PowerPoint slide for Fig. 3

PowerPoint slide for Fig. 4

PowerPoint slide for Fig. 5

Rights and permissions

Reprints and permissions

About this article

Cite this article

Oh, J., Byrd, A., Deming, C. et al. Biogeography and individuality shape function in the human skin metagenome. Nature 514, 59–64 (2014). https://doi.org/10.1038/nature13786

Download citation

Received: 03 June 2014
Accepted: 21 August 2014
Published: 01 October 2014
Issue Date: 02 October 2014
DOI: https://doi.org/10.1038/nature13786

This article is cited by

Exploration of genes encoding KEGG pathway enzymes in rhizospheric microbiome of the wild plant Abutilon fruticosum
- Aala A. Abulfaraj
- Ashwag Y. Shami
- Rewaa S. Jalal
AMB Express (2024)
Mora: abundance aware metagenomic read re-assignment for disentangling similar strains
- Andrew Zheng
- Jim Shaw
- Yun William Yu
BMC Bioinformatics (2024)
The Skin Microbiome and its Significance for Dermatologists
- Cleo Whiting
- Sara Abdel Azim
- Adam Friedman
American Journal of Clinical Dermatology (2024)
The menstrual cycle regularity and skin: irregular menstrual cycle affects skin physiological properties and skin bacterial microbiome in urban Chinese women
- Laiji Ma
- Hong Jiang
- Li Shao
BMC Women's Health (2023)
Unveiling the dynamics of the breast milk microbiome: impact of lactation stage and gestational age
- Parul Singh
- Noora Al Mohannadi
- Souhaila Al Khodor
Journal of Translational Medicine (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.