Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Biogeography and individuality shape function in the human skin metagenome


The varied topography of human skin offers a unique opportunity to study how the body’s microenvironments influence the functional and taxonomic composition of microbial communities. Phylogenetic marker gene-based studies have identified many bacteria and fungi that colonize distinct skin niches. Here metagenomic analyses of diverse body sites in healthy humans demonstrate that local biogeography and strong individuality define the skin microbiome. We developed a relational analysis of bacterial, fungal and viral communities, which showed not only site specificity but also individual signatures. We further identified strain-level variation of dominant species as heterogeneous and multiphyletic. Reference-free analyses captured the uncharacterized metagenome through the development of a multi-kingdom gene catalogue, which was used to uncover genetic signatures of species lacking reference genomes. This work is foundational for human disease studies investigating inter-kingdom interactions, metabolic changes and strain tracking, and defines the dual influence of biogeography and individuality on microbial composition and function.

This is a preview of subscription content, access via your institution

Access options

Rent or buy this article

Prices vary by article type



Prices may be subject to local taxes which are calculated during checkout

Figure 1: Multi-kingdom relative abundances are strongly shaped by skin microenvironment.
Figure 2: Individual-specific signatures are typically low abundance but shared across most sites.
Figure 3: Propionibacterium acnes and Staphylococcus epidermidis are heterogeneous and multiphyletic at the strain level.
Figure 4: Functional capacity varies by microenvironment.
Figure 5: Reconstruction of metagenomic dark matter with reference-free methods.

Accession codes

Primary accessions



  1. Grice, E. A. & Segre, J. A. The skin microbiome. Nature Rev. Microbiol. 9, 244–253 (2011)

    Article  CAS  Google Scholar 

  2. Grice, E. A. et al. Topographical and temporal diversity of the human skin microbiome. Science 324, 1190–1192 (2009)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  3. Costello, E. K. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  4. Findley, K. et al. Topographic diversity of fungal and bacterial communities in human skin. Nature 498, 367–370 (2013)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  5. Peters, B. M. & Noverr, M. C. Candida albicans-Staphylococcus aureus polymicrobial peritonitis modulates host innate immunity. Infect. Immun. 81, 2178–2189 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. De Vlaminck, I. et al. Temporal response of the human virome to immunosuppression and antiviral therapy. Cell 155, 1178–1187 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Handley, S. A. et al. Pathogenic simian immunodeficiency virus infection is associated with expansion of the enteric virome. Cell 151, 253–266 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012)

    Article  ADS  CAS  PubMed  Google Scholar 

  10. Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013)

    Article  ADS  CAS  PubMed  Google Scholar 

  11. Grice, E. A. et al. A diversity profile of the human skin microbiota. Genome Res. (2008)

  12. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012)

  13. Tettelin, H. et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial ‘pan-genome’. Proc. Natl Acad. Sci. USA 102, 13950–13955 (2005)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  14. von Eiff, C., Becker, K., Machka, K., Stammer, H. & Peters, G. Nasal carriage as a source of Staphylococcus aureus bacteremia. N. Engl. J. Med. 344, 11–16 (2001)

    Article  CAS  PubMed  Google Scholar 

  15. Oh, J. et al. The altered landscape of the human skin microbiome in patients with primary immunodeficiencies. Genome Res. (2013)

  16. Sommer, M. O. A., Dantas, G. & Church, G. M. Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128–1131 (2009)

    ADS  CAS  PubMed  PubMed Central  Google Scholar 

  17. Forsberg, K. J. et al. The shared antibiotic resistome of soil bacteria and human pathogens. Science 337, 1107–1111 (2012)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  18. Kong, H. H. et al. Temporal shifts in the skin microbiome associated with disease flares and treatment in children with atopic dermatitis. Genome Res. 22, 850–859 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE 7, e39315 (2012)

  20. Howe, A. C. et al. Tackling soil diversity with the assembly of large, complex metagenomes. Proc. Natl Acad. Sci. USA 111, 4904–4909 (2014)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  21. Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010)

    Article  PubMed  PubMed Central  Google Scholar 

  23. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Francis, O. E. et al. Pathoscope: Species identification and strain attribution with unassembled sequencing data. Genome Res. 23, 1721–1729 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods 9, 811–814 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Abubucker, S. et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLOS Comput. Biol. 8, e1002358 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010)

    CAS  PubMed  Google Scholar 

  30. Liu, B. & Pop, M. ARDB—Antibiotic Resistance Genes Database. Nucleic Acids Res. 37, D443–D447 (2009)

    Article  CAS  PubMed  Google Scholar 

  31. Schloissnig, S. et al. Genomic variation landscape of the human gut microbiome. Nature 493, 45–50 (2013)

    Article  ADS  PubMed  Google Scholar 

  32. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010)

    Article  CAS  PubMed  Google Scholar 

  33. Delcher, A. L., Phillippy, A., Carlton, J. & Salzberg, S. L. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30, 2478–2483 (2002)

    Article  PubMed  PubMed Central  Google Scholar 

  34. Namiki, T., Hachiya, T., Tanaka, H. & Sakakibara, Y. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res. 40, e155 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, e132 (2010)

    Article  PubMed  PubMed Central  Google Scholar 

  36. Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006)

    Article  PubMed  PubMed Central  Google Scholar 

  37. van Dongen, S. & Abreu-Goodger, C. in Bacterial Molecular Networks (eds Helden, J., Toussaint, A. & Thieffry, D. ) Methods in Molecular Biology Vol. 804, pp. 281–295 (Springer, 2012)

    Book  Google Scholar 

  38. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995)

    MathSciNet  MATH  Google Scholar 

  39. Yue, J. C. & Clayton, M. K. A similarity measure based on species proportions. Comm. Stat. Theory Methods 34, 2123–2131 (2005)

    Article  MathSciNet  Google Scholar 

  40. Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002)

    Google Scholar 

Download references


We thank D. Schoenfeld, A. Pradhan, M. Park and G. Bouffard for their efforts. We also thank members of the Segre laboratory and M. C. Udey for their discussions. This work was supported by National Institutes of Health (NIH) NHGRI and NCI Intramural Research Programs and in part by 1K99AR059222 (H.H.K.). This study used the high-performance computational capabilities of the NIH Biowulf Linux cluster. Sequencing was funded by grants from the National Institutes of Health (1UH2AR057504-01 and 4UH3AR057504-02).

Author information

Authors and Affiliations




J.O., H.H.K. and J.A.S. designed the study. H.H.K. collected patient samples. C.D. prepared the clinical samples for sequencing, which was carried out by the members of the NIH Intramural Sequencing Center Comparative Sequencing program. J.O., A.L.B. and S.C. analysed sequence data. J.O., H.H.K. and J.A.S. drafted the manuscript. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Heidi H. Kong or Julia A. Segre.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Data deposition is with the SRA and all sequences can be accessed under BioProject 46333. Human subject clinical data are deposited with dbGaP phs000266. Analysis workflow is available at

Extended data figures and tables

Extended Data Figure 1 The 18 selected skin sites and their location on the human body.

These sites represent three microenvironments: sebaceous (blue), dry (red), and moist (green). Toenail (black) is a site that does not fall under these major microenvironments and is treated separately. Pie charts represent consensus relative abundance of the kingdoms Bacteria, Eukaryota (Fungi), and virus from multi-kingdom mapping.

Extended Data Figure 2 Per-sample read statistics.

Additional samples (bacterial and eukaryotic mock communities) are shown. a, Boxplots (line indicates median; boxes represent first and third quartiles) show, for each site, % reads mapping to human hg19 that are discarded before analysis. Sites are coloured by site characteristic. b, Samples are ordered by label. Lines indicate the median value for that statistic; value is in parenthesis. c, Estimate of sequencing coverage. Reads seen is the number of reads in a sample sampled. Reads are then split into 20-mers, compared to a k-mer coverage table and kept only if the median k-mer coverage is below 20×. Curves are grouped by site, coloured by individual as indicated.

Extended Data Figure 3 Validation of taxonomic classifications.

a, Bacterial sample community diversity as a function of genome coverage for two diversity metrics, the Shannon index that measures the richness and evenness of the community (left), and number of species observed (right). Genome coverage is defined as for each genome hit, the % of genome covered by reads. Boxplots show the range of diversity values for all samples, segregated by microenvironment. Black lines indicate median; boxes represent first and third quartiles. As coverage cut-offs increase, diversity estimates drop sharply. b, Comparisons of bacterial community diversity for Metaphlan-derived classifications versus custom bacterial Pathoscope-derived classifications. Each point represents a different sample, coloured by microenvironment. With no coverage cut-offs (left), Pathoscope may overestimate diversity, which is reduced by setting a minimum 1× coverage requirement. Spearman correlation (ρ) and corresponding P values are shown. Pathoscope-derived relative abundances versus relative abundances derived from c, 16S amplicon sequencing, d, Metaphlan genus-level, e, Metaphlan-species level (ρ and P value are calculated for non-zero abundance taxa), f, Metaphlan, staphylococcal species, g, ITS1 amplicon sequencing, genus (ρ and P value are calculated for non-zero abundance taxa), and h, ITS1 amplicon sequencing, Malassezia species.

Extended Data Figure 4 Full taxonomic classifications for all healthy volunteers (HV), all sites.

To aid visualization of site- and individual-specific similarities, samples are grouped by site/microenvironment for each individual. Relative abundances of the most abundant skin taxa for each super-kingdom are shown. b, Taxonomic re-classification of major sites sampled by the Human Microbiome Project. Samples are from the anterior nares and retroauricular crease (skin), tongue dorsum and supragingival plaque (oral), stool, and posterior fornix (vaginal). Relative abundances of the most abundant taxa for each kingdom in the skin, for comparison, are shown.

Extended Data Figure 5 Strain-level classification based on reference genomes show sub-species heterogeneity for dominant skin taxa.

a, Simulations to assess sensitivity of Pathoscope-based mapping to SNPs, non-core regions, or whole genomes. Synthetic communities were created with 6, 12, or 18 genomes per community. Sizes of circles reflect the number of reads sampled from each genome, for example, 50,000, 100,000, or 500,000 reads per genome. 15 random synthetic communities for each genome group were created and mapped to SNPs, non-core regions, or the full genome set. Sensitivity is calculated from the expected versus the observed abundances. b, Full strain-level assignments for samples with relative abundances of closest related Propionibacterium acnes strains, by individual. c, Dendrograms of strain similarity. Trees were generated using core SNPs; genomes were aligned with nucmer to identify core regions, and then SNPs within these core regions were identified by calculating all pairwise differences between genomes. Bar of colours indicates delineations of subtypes where phylogenetically more similar genomes are in similar colours; for example, we defined 12 subtypes for P. acnes.

Extended Data Figure 6 Strain-level classification for Staphylococcus epidermidis.

a, Full strain-level assignments for samples by microenvironment. b, Description is as in Extended Data Fig. 5c. We defined 14 subtypes for S. epidermidis.

Extended Data Figure 7 Full version of coreness of different module categories across skin microenvironment.

A module is defined as core if occurring in >2/3 of samples for that class. Major KEGG module descriptors are shown in the different colours. Height of bars reflects the proportion of samples that a module occurs in.

Extended Data Figure 8 Correlation analysis of module abundance with species abundance to infer a module’s taxonomic origin.

Spearman correlation (ρ) was calculated with corresponding P value for taxa with relative abundance >0.5% and modules with greater than 0.05% relative abundance. Coryn., Corynebacterium. a, Unsupervised clustering of correlation coefficients. Species from the same genera clustering together may suggest a shared contribution of a pathway. b, Most significantly correlated taxa; colours represent broad KEGG classes. Adjusted P < 2 × 10−16.

Extended Data Figure 9 Antibiotic resistance profiles in the skin.

Reads were mapped to a short marker database consensus created from the ARDB database, which catalogues publicly available resistance genes. Genes are grouped into broad resistance classes; a resistance category is called present (black; absent = white) if at least one gene from its family is present.

Extended Data Figure 10 Reference-free analysis of skin metagenome with adaptive iterative assembly, gene catalogue, and metagenomic clusters.

a, Tracking unclassified reads. Fraction unmapped reads refers to the fraction of total reads passing quality control that do not map to the major super kingdoms Archaea, Bacteria, Eukaryota, and viruses. Samples are ordered by label and are divided by site. b, Assembly, gene-calling, and clustering workflow. c, Assembly efficacy varies significantly by k-mer depending on the site’s unique features of community complexity and sequencing depth, which is most affected by that site’s human DNA admixture. Assembly statistics are shown for samples pooled by individual, which produced higher quality assemblies than pooling by site. Because of large pool size, khmer digital normalization was used before Velvet assembly. % overall alignment rate indicates the total % of reads that map back to that sample’s assembly for each k-mer. % paired concordant indicates the fraction paired reads (of overall, not of % paired) in which both pairs of a mate map back to an assembly; discordant is where one mate of a pair does not map, or maps to a different contig. Contigs are then assessed by the maximum assembly size, the number of bases that are used in the assembly, and the number of contigs above a threshold of 300 bp. d, Effect of khmer digital normalization on individual sample assembly. Digital normalization + Velvet assembly performs similarly to Velvet assembly alone.

Related audio

Supplementary information

Supplementary Data

This file contains Supplementary Tables 1-18. (XLSX 18412 kb)

PowerPoint slides

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Oh, J., Byrd, A., Deming, C. et al. Biogeography and individuality shape function in the human skin metagenome. Nature 514, 59–64 (2014).

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI:

This article is cited by


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing