Hill number as a bacterial diversity measure framework with high-throughput sequence data

Kang, Sanghoon; Rodrigues, Jorge L. M.; Ng, Justin P.; Gentry, Terry J.

doi:10.1038/srep38263

Article
Open access
Published: 30 November 2016

Sanghoon Kang¹, Jorge L. M. Rodrigues², Justin P. Ng³ & Terry J. Gentry³

Scientific Reports volume 6, Article number: 38263 (2016)

Abstract

Bacterial diversity is an important parameter for measuring bacterial contributions to the global ecosystem. However, even the task of describing bacterial diversity is challenging due to biological and technological difficulties. One of the challenges in bacterial diversity estimation is the appropriate measurement of rare taxa, and the uncertainty in the size of the rare biosphere is yet to be experimentally determined. One approach is to use the generalized diversity measure, the Hill number (N_a), to control the variability associated with rare taxa by weighting them differentially. Here, we investigated the Hill number as a framework for measuring microbial diversity using a taxa-accumulation curve (TAC) with soil bacterial community data from two distinct studies obtained by 454 pyrosequencing. Reliable biodiversity estimation was obtained when an increase in Hill number arose as the coverage became stable in TACs for a ≥ 1. In silico analysis also indicated that a certain level of sampling depth was desirable for reliable biodiversity estimation. Thus, for estimating bacterial diversity from second generation sequencing data at a given sequencing depth, the Hill number can be a good diversity framework, at least until technology advances enough to overcome the under- and random-sampling issues of current sequencing approaches.

Introduction

Biodiversity has traditionally been considered to be a consequence of environmental processes, such as niche partitioning, resource distribution, and disturbances. In the last several decades, a new view of biodiversity as the predictor of environmental processes and functions gained interest^1,2,3 and developed into the research field now regarded as biodiversity-ecosystem function (BEF)^4,5,6. Bacteria have an intimately interactive relationship with their surrounding environment and ecosystem, and thus, bacterial diversity has an important role in BEF research^7,8. However, even determining a reasonable description of bacterial diversity is challenging due to the intrinsic properties of bacteria (e.g., debatable species concept, hyperdiversity, variable 16S rRNA gene copy number) and technological difficulties^9,10,11,12. One of the challenges in bacterial diversity estimation is the capture of rare taxa (the rare biosphere), which often account for a large portion of microbial diversity^13,14,15; the uncertainty involved has not yet been determined experimentally. Since 2005, second generation sequencing technologies have drastically advanced the capacity and depth of microbial community sampling by sequencing. However, there is still bias associated with the experimental procedures, and sampling by sequencing is also known to be a less-than-complete representation¹⁶. Thus, reproducible estimation of biodiversity is not yet available¹⁷. One way to overcome this problem is to use statistical and mathematical biodiversity estimations¹⁸. However, most mathematical and statistical approaches to biodiversity estimation were developed for investigating less diverse organisms (e.g., plants and animals), which poses an inherent challenge in applying these tools to the analysis of bacterial communities because of their hyperdiversity. Therefore, a framework accommodating those challenges is needed for a reasonable bacterial diversity estimation using currently available experimental resources.

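The Hill number (N_a)¹⁹ was proposed as a unified diversity concept that defines biodiversity as a reciprocal mean proportional abundance and weights taxa differently based on their abundances, as follows:

$$N_a = \left( \sum_{i=1}^{S} p_i^{a} \right)^{1/(1-a)}$$

where p_i is the proportional abundance of taxon i and S is the number of taxa; N₁ is defined as the limit a → 1.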

The parameter a determines special cases of the Hill number: for example, N₀ is the number of taxa, N₁ the exponential of the Shannon index, and N₂ the reciprocal of the Simpson index¹⁹. Because of its generality and flexibility in controlling the effects of rare taxa on biodiversity measures, the Hill number may be an excellent framework for bacterial diversity studies⁹. Recently, Haegeman et al.²⁰ showed, using a series of sequence data sets, that the uncertainty associated with Hill numbers quickly increases to an uncontrollable range when a < 1.
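As an illustration of the special cases named above, the following is a minimal Python sketch of the Hill number for a vector of taxon counts. It is not the code used in this study (which relied on EstimateS and R); the function name `hill_number` and the example counts are hypothetical, and the a = 1 case is handled through its limit, the exponential of the Shannon index.

```python
import math


def hill_number(counts, a):
    """Hill number N_a for a list of taxon counts (zero counts are ignored)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if a == 1:
        # Limit a -> 1: exponential of the Shannon index
        return math.exp(-sum(p * math.log(p) for p in props))
    # General case: (sum of p_i^a) raised to 1 / (1 - a)
    return sum(p ** a for p in props) ** (1.0 / (1.0 - a))


# Hypothetical OTU counts with a few dominant taxa and several rare ones
counts = [50, 30, 10, 5, 3, 1, 1]
for a in (0, 1, 2):
    print(a, round(hill_number(counts, a), 3))
```

For any community, N₀ ≥ N₁ ≥ N₂, with equality only when all taxa are equally abundant; the gap between the three values reflects how much of the richness is carried by rare taxa.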

The consensus in bacterial diversity studies is that a fully exhaustive census may require an extremely large amount of resources for most natural ecosystems^11,21. We argue that the “unsaturation” or asymptotic result in those rarefaction curves is due to the vast size of the rare biosphere; thus, saturated bacterial diversity may be obtainable with reasonable sequencing effort by using the Hill number as a diversity framework that weights rare taxa differentially. The goal of this study is to investigate the use of the Hill number as a framework for reliable diversity estimation at a given sequencing depth.

Results and Discussion

The taxa-accumulation curves of the Amazon and Texas mine studies (Fig. 1, S1 and S2) show both similarities and differences in their patterns. The richness measures (N₀ and Chao1 index) are far from saturated in both studies, and as the parameter a increased, the degree of diversity coverage increased as well. The degree of coverage, however, was much lower in the Texas mine study; only N₂ provided enough coverage (reached an asymptote). Apparently, the difference is due to the depth of sampling (sequencing), which is discussed further below with the in silico analysis. A higher a makes the measure less sensitive to the contributions of rare taxa to the overall biodiversity (γ diversity) and therefore more robust, with reduced uncertainty²⁰.

This analysis revealed an interesting pattern between soil bacterial communities from two distinct ecosystems measured at very different sequencing depths. The observed taxa richness (N₀) is fairly similar, but the difference becomes greater as a increases (Table S1), in that the Texas mine soil bacterial community is much more diverse than that of the Amazon soil samples. This is at least partly due to the abundant rare taxa, which should have caused the rather low sampling completeness in the Texas mine samples (~32%) compared to the Amazon samples (~65%)²². In the case of the Chao1 index, the large numbers of singletons and doubletons in the Texas mine samples inflate the index, which is defined, in part, as the ratio of the square of the singleton frequency (F₁) to twice the doubleton frequency (F₂) (Fig. 2B). It is impossible to determine how many of those singletons and doubletons are real rare taxa and how many are sequencing artifacts. Because of this uncertainty, the Hill number may be useful in that it allows the contribution of rare taxa to the diversity estimate to be controlled. Significant deviation (D = 0.17, P < 0.001) from a log-normal model also indicates incomplete sampling in the Texas mine microbial communities²³. The large difference in the proportion of rare taxa between the two data sets also resulted in distinctive taxa abundance patterns (Fig. 2 and S3). Since the Texas mine samples were from a reclamation chronosequence, the Zipf model is conceptually fitting²⁴. However, under-sampling of the Texas data set may be contributing to the distinctive taxa abundance patterns as well. To test the relationship between sampling degree and biodiversity coverage in TAC, we randomly subsampled the Amazon data to varying degrees between 25,000 and 400,000 reads (Fig. S4). Sufficient biodiversity coverage in TAC seems to be obtained with ~200,000 reads, resulting in reliable biodiversity measures (N₁ and N₂).
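The sensitivity of the Chao1 index to singletons and doubletons described above can be made concrete with a short Python sketch. It uses the classic form S_obs + F₁²/(2F₂); the fallback to the bias-corrected form S_obs + F₁(F₁ − 1)/2 when there are no doubletons is a common convention assumed here, not something stated in this article. The function name `chao1` and the example counts are hypothetical.

```python
def chao1(counts):
    """Chao1 richness estimate from a list of taxon counts."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                      # observed richness (N0)
    f1 = sum(1 for c in observed if c == 1)    # number of singletons
    f2 = sum(1 for c in observed if c == 2)    # number of doubletons
    if f2 > 0:
        # Classic form: observed richness plus F1^2 / (2 * F2)
        return s_obs + f1 * f1 / (2.0 * f2)
    # Assumed fallback (bias-corrected form) when no doubletons are present
    return s_obs + f1 * (f1 - 1) / 2.0


# Hypothetical counts: 8 observed taxa, 4 singletons, 2 doubletons
print(chao1([10, 5, 2, 2, 1, 1, 1, 1]))
```

Because F₁ enters as a square, a sample dominated by singletons can yield an estimate far above the observed richness, whether those singletons are genuine rare taxa or sequencing artifacts.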

The two data sets used here were suitable because they were prepared using almost identical procedures, while their sequencing depths were vastly different. A recent study using a mock community concluded that microbial composition results are influenced by the primers and sequencing platforms used²⁵; thus, the comparable experimental procedures increase the credibility of the results. The difference in sequencing coverage is also useful because it can show the scale-independence of the analyses and results.

In conclusion, the hyperdiverse nature of microbiota in most ecosystems often results in random- and under-sampling, thus hampering reliable diversity estimation even with the technological advancements made by second generation sequencing technologies. Until significant technological advancements in sampling coverage become available, the Hill number and TAC approach may be a suitable framework for reliable estimation of diversity and for further applications in research areas such as BEF and dimensions of biodiversity.

Methods

We used a smoothed taxa-accumulation curve (TAC), which is often mislabeled as a rarefaction curve, to investigate a reliable approach to estimating bacterial diversity from two 454 pyrosequencing data sets. One data set is from soil samples in a chronosequence of reclaimed surface mine sites in East Texas (Texas study) and the other is from soil samples from an Amazonian rainforest that was converted to agricultural fields (Amazon study). Both data sets were prepared by very similar experimental and analytical procedures. Briefly, both studies used a PowerSoil DNA Isolation kit (MoBio Laboratories) for DNA extraction following the manufacturer’s instructions and a 454 GS FLX Sequencer (454 Life Sciences) for sequencing the V4-V5 region (~350 bp) of the 16S rRNA gene. The quality-processed sequences were analyzed using mothur software (v. 1.23.1)²⁶ with the SILVA and Ribosomal Database Project (RDP) databases for alignment and classification.

The depth of sequencing was quite different between the two studies: ~31,000 reads in the Texas mine samples, which compared mine reclamation techniques (crosspit spreader, CP, and mixed overburden, MO), and ~400,000 reads in the Amazon samples, which compared forest and converted pasture. First, unique taxa (OTU_0.97) richness (N₀), Chao1 index²⁷, exponential Shannon index (N₁), and reciprocal Simpson index (N₂) were calculated and then used in TAC construction using EstimateS 9.1²⁸ and R 3.1.3²⁹. Rank abundance distribution (RAD) plots were prepared using the vegan (2.2-1) and sads (0.2.4) packages.
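The idea behind a smoothed TAC can be sketched in Python as follows: reads are repeatedly subsampled without replacement at increasing depths, and a Hill number is averaged over the replicates at each depth. This is only an illustrative sketch under stated assumptions, not the EstimateS/R procedure used in the study; the function names, the number of replicates, the seed, and the example community are all hypothetical.

```python
import math
import random
from collections import Counter


def hill_number(counts, a):
    """Hill number N_a for a list of taxon counts (zero counts are ignored)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if a == 1:
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** a for p in props) ** (1.0 / (1.0 - a))


def taxa_accumulation_curve(counts, depths, a, n_rep=20, seed=0):
    """Mean Hill number N_a at each subsampling depth (assumed procedure)."""
    rng = random.Random(seed)
    # Expand the count vector into one taxon label per read
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    curve = []
    for depth in depths:
        values = []
        for _ in range(n_rep):
            sub = rng.sample(reads, depth)  # subsample without replacement
            values.append(hill_number(list(Counter(sub).values()), a))
        curve.append(sum(values) / n_rep)   # smooth by averaging replicates
    return curve


# Hypothetical community: a few dominant taxa and a tail of rare ones
community = [200, 100, 50, 20, 10, 5, 2, 1, 1, 1]
depths = [25, 50, 100, 200, 390]
for a in (0, 1, 2):
    print(a, [round(v, 2) for v in taxa_accumulation_curve(community, depths, a)])
```

In a sketch like this, the curve for a = 0 typically keeps rising with depth because each additional rare taxon counts fully, whereas the curves for a = 1 and a = 2 tend to flatten earlier because they are dominated by the abundant taxa.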

Additional Information

Accession codes: Sequence data used for this study are available from the NCBI Sequence Read Archive (SRA) under accession number SRP026369 (Texas mine data) and from FigShare, http://dx.doi.org/10.6084/m9.figshare.1547935 (Amazon data).

How to cite this article: Kang, S. et al. Hill number as a bacterial diversity measure framework with high-throughput sequence data. Sci. Rep. 6, 38263; doi: 10.1038/srep38263 (2016).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Naeem, S. et al. Biodiversity and ecosystem functioning: Maintaining natural life support processes (Ecological Society of America, Washington DC, 1999).
2. Naeem, S., Thomson, L. J., Lawlor, S. P., Lawton, J. H. & Woodfin, R. M. Declining biodiversity can alter the performance of ecosystems. Nature 368, 734–737 (1994).
3. Tilman, D. Biodiversity: population versus ecosystem stability. Ecology 77, 350–363 (1996).
4. Hector, A. & Bagchi, R. Biodiversity and ecosystem multifunctionality. Nature 448, 188–190, doi: 10.1038/nature05947 (2007).
5. Loreau, M. et al. Biodiversity and ecosystem functioning: current knowledge and future challenges. Science 294, 804–808, doi: 10.1126/science.1064088 (2001).
6. Radchuk, V., De Laender, F., Van den Brink, P. J. & Grimm, V. Biodiversity and ecosystem functioning decoupled: invariant ecosystem functioning despite non-random reductions in consumer diversity. Oikos 125, 424–433, doi: 10.1111/oik.02220 (2016).
7. Philippot, L. et al. Loss in microbial diversity affects nitrogen cycling in soil. ISME J. 7, 1609–1619, doi: 10.1038/ismej.2013.34 (2013).
8. van der Heijden, M. G. A., Bardgett, R. D. & van Straalen, N. M. The unseen majority: soil microbes as drivers of plant diversity and productivity in terrestrial ecosystems. Ecol. Lett. 11, 296–310, doi: 10.1111/j.1461-0248.2007.01139.x (2008).
9. Bent, S. J. & Forney, L. J. The tragedy of the uncommon: understanding limitations in the analysis of microbial diversity. ISME J. 2 (2008).
10. Escalas, A. et al. A unifying quantitative framework for exploring the multiple facets of microbial biodiversity across diverse scales. Environ. Microbiol. 15, 2642–2657, doi: 10.1111/1462-2920.12156 (2013).
11. Roesch, L. F. W. et al. Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J. 1, 283–290 (2007).
12. Větrovský, T. & Baldrian, P. The Variability of the 16S rRNA Gene in Bacterial Genomes and Its Consequences for Bacterial Community Analyses. PLOS One 8, e57923, doi: 10.1371/journal.pone.0057923 (2013).
13. Boeken, B. & Shachak, M. Linking community and ecosystem processes: The role of minor species. Ecosystems 9, 119–127 (2006).
14. Sogin, M. L. et al. Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proc. Natl. Acad. Sci. USA 103, 12115–12120 (2006).
15. Lynch, M. D. J. & Neufeld, J. D. Ecology and exploration of the rare biosphere. Nat Rev Micro 13, 217–229, doi: 10.1038/nrmicro3400 (2015).
16. Zhou, J. et al. Random Sampling Process Leads to Overestimation of β-Diversity of Microbial Communities. mBio 4, doi: 10.1128/mBio.00324-13 (2013).
17. Zhan, A. et al. Reproducibility of pyrosequencing data for biodiversity assessment in complex communities. Methods in Ecology and Evolution 5, 881–890, doi: 10.1111/2041-210X.12230 (2014).
18. Hughes, J. B., Hellmann, J. J., Ricketts, T. H. & Bohannan, B. J. M. Counting the uncountable: statistical approaches to estimating microbial diversity. Appl. Environ. Microbiol. 67, 4399–4406 (2001).
19. Hill, M. O. Diversity and evenness: a unifying notation and its consequences. Ecology 54, 427–432 (1973).
20. Haegeman, B. et al. Robust estimation of microbial diversity in theory and in practice. ISME J. 7, 1092–1101, doi: 10.1038/ismej.2013.10 (2013).
21. Quince, C., Curtis, T. P. & Sloan, W. T. The rational exploration of microbial diversity. ISME J. 2, 997–1006 (2008).
22. Coddington, J. A., Agnarsson, I., Miller, J. A., Kuntner, M. & Hormiga, G. Undersampling bias: the null hypothesis for singleton species in tropical arthropod surveys. J. Anim. Ecol. 78, 573–584 (2009).
23. Ulrich, W., Ollik, M. & Ugland, K. I. A meta-analysis of species-abundance distributions. Oikos 119, 1149–1155 (2010).
24. Wilson, J. B. Methods for fitting dominance/diversity curves. J. Veg. Sci. 2, 35–46 (1991).
25. Fouhy, F., Clooney, A. G., Stanton, C., Claesson, M. J. & Cotter, P. D. 16S rRNA gene sequencing of mock microbial populations- impact of DNA extraction method, primer choice and sequencing platform. BMC Microbiol. 16, 1–13, doi: 10.1186/s12866-016-0738-z (2016).
26. Schloss, P. D. et al. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009).
27. Chao, A. Nonparametric estimation of the number of classes in a population. Scand J Statist 11, 265–270 (1984).
28. EstimateS: Statistical estimation of species richness and shared species from samples. Version 9 (2013).
29. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, Vienna, Austria, 2015).

Acknowledgements

The authors would like to recognize and thank Dr. Brendan Bohannan for the valuable comments.

Author information

Authors and Affiliations

Department of Biology, Baylor University, Waco, TX, USA
Sanghoon Kang
Department of Land, Air and Water Resources, University of California, Davis, Davis, CA, USA
Jorge L. M. Rodrigues
Department of Soil & Crop Sciences, Texas A&M University, College Station, TX, USA
Justin P. Ng & Terry J. Gentry

Contributions

S.K. designed the research; J.L.M.R., J.P.N. and T.J.G. conducted the research. S.K. analyzed the data, and S.K. and J.L.M.R. wrote the paper.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 14 July 2016
Accepted: 08 November 2016
Published: 30 November 2016
DOI: https://doi.org/10.1038/srep38263
