Assessment of two statistical approaches for variance genome-wide association studies in plants

Abstract

Genomic loci that control the variance of agronomically important traits are increasingly important due to the profusion of unpredictable environments arising from climate change. The ability to identify such variance-controlling loci in association studies will be critical for future breeding efforts. Two statistical approaches that have already been used in the variance genome-wide association study (vGWAS) paradigm are the Brown–Forsythe test (BFT) and the double generalized linear model (DGLM). To ensure that these approaches are deployed as effectively as possible, it is critical to study the factors that influence their ability to identify variance-controlling loci. We used genome-wide marker data in maize (Zea mays L.) and Arabidopsis thaliana to simulate traits controlled by epistasis, genotype by environment (GxE) interactions, and variance quantitative trait nucleotides (vQTNs). We then quantified true and false positive detection rates of the BFT and DGLM across all simulated traits. We also conducted a vGWAS using both the BFT and DGLM on plant height in a maize diversity panel. The observed true positive detection rates at the maximum sample size considered (N = 2815) suggest that both of these vGWAS approaches are capable of identifying epistasis and GxE for sufficiently large sample sizes. We also noted that the DGLM decisively outperformed the BFT for simulated traits controlled by vQTNs at sample sizes of N = 500. Although we conclude that there are still certain aspects of vGWAS approaches that need further refinement, this study suggests that the BFT and DGLM are capable of identifying variance-controlling loci in current state-of-the-art plant or agronomic data sets.
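As an illustration of the idea behind the BFT, the sketch below simulates one variance-controlling marker and compares its Brown–Forsythe statistic with that of an unrelated marker. It is a minimal, dependency-free sketch and not the authors' pipeline: the two-class genotype coding, the sample size of 2000, the residual standard deviations of 1 and 2, the seed, and the function name `brown_forsythe_stat` are all assumptions made for illustration. Only the test statistic is computed; a p-value would come from comparing it with an F distribution with (k − 1, N − k) degrees of freedom.

```python
import random
import statistics


def brown_forsythe_stat(values, groups):
    """One-way ANOVA F statistic on absolute deviations from group medians."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)

    # Transform each observation to |y - median of its genotype class|.
    deviations = []
    for observations in by_group.values():
        centre = statistics.median(observations)
        deviations.append([abs(y - centre) for y in observations])

    n_total = sum(len(z) for z in deviations)
    k = len(deviations)
    group_means = [sum(z) / len(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total

    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((x - m) ** 2
                 for z, m in zip(deviations, group_means) for x in z)
    return (between / (k - 1)) / (within / (n_total - k))


# Hypothetical simulation settings (assumptions, not values from the article).
rng = random.Random(2022)
n = 2000
vqtn = [rng.choice((0, 1)) for _ in range(n)]         # variance-controlling marker
null_marker = [rng.choice((0, 1)) for _ in range(n)]  # marker unrelated to the trait

# Same mean for both genotype classes, but a larger residual SD for class 1.
trait = [rng.gauss(0.0, 2.0 if g == 1 else 1.0) for g in vqtn]

stat_vqtn = brown_forsythe_stat(trait, vqtn)
stat_null = brown_forsythe_stat(trait, null_marker)
print(f"vQTN statistic: {stat_vqtn:.1f}, null marker statistic: {stat_null:.2f}")
```

Because the two genotype classes share a mean and differ only in spread, a standard mean-based association test would have little to detect here, whereas the statistic on absolute deviations separates the vQTN clearly from the null marker.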

Fig. 1: False positive detection rates for the null setting at a false discovery rate of 0.05.
Fig. 2: True positive detection rates under the “Epistasis” scenario.
Fig. 3: True positive detection rates under the “GxE” scenario.
Fig. 4: True positive detection rates under the “vQTN” scenario.
Fig. 5: Genome-wide association study (GWAS) of plant height in the Goodman maize diversity panel.
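The detection rates in the captions above are reported at a false discovery rate of 0.05. The sketch below shows one way such a threshold can be applied with the Benjamini–Hochberg adjustment and then turned into true and false positive rates. The p-values, the set of causal markers, and the per-marker definitions of the two rates are hypothetical assumptions for illustration; the article's exact definitions are not reproduced in this preview.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-marker p-values and causal marker indices (assumptions).
p_values = [0.001, 0.20, 0.012, 0.90, 0.03, 0.6]
causal = {0, 4}

adjusted = benjamini_hochberg(p_values)
detected = [i for i, q in enumerate(adjusted) if q <= 0.05]

# Assumed definitions: share of causal markers detected, and share of
# non-causal markers declared significant.
true_positive_rate = len(causal.intersection(detected)) / len(causal)
non_causal = [i for i in range(len(p_values)) if i not in causal]
false_positive_rate = sum(i in detected for i in non_causal) / len(non_causal)

print(detected, true_positive_rate, false_positive_rate)
```

In this toy example markers 0 and 2 pass the threshold, so one of the two causal markers is recovered and one of the four non-causal markers is a false positive.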

Data availability

The genotypic data, simulated trait data, ROC curves, and code to simulate traits are available at https://github.com/mdm10-code/vGWAS_arabidopsis_maize.

Acknowledgements

We would like to thank the Associate Editor and four anonymous referees for their helpful suggestions. This research was supported by National Science Foundation project accession numbers 1355406 and 1733606, and by the University of Illinois Urbana-Champaign Department of Crop Science’s J.C. Hackleman and Lawrence E. Schrader and Elfriede Massier Plant Physiology Fellowship Programs.

Author information

Contributions

MDM conducted all simulations and analyses, wrote the computer program that simulates vQTNs, created all figures and tables, and wrote and edited the manuscript. SBF contributed to the design of the simulation settings, edited the computer program to make it more computationally efficient, and edited the manuscript. GM contributed to various aspects of the statistical analysis, including how to use the DGLM in a meaningful manner and how to interpret the results of the simulation study. These contributions significantly guided the direction of our study. AEL oversaw the entire analysis, designed the simulation study, designed the procedure for simulating vQTNs, and wrote and edited the manuscript.

Corresponding author

Correspondence to Alexander E. Lipka.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Associate editor: Yuan-Ming Zhang.

Cite this article

Murphy, M.D., Fernandes, S.B., Morota, G. et al. Assessment of two statistical approaches for variance genome-wide association studies in plants. Heredity 129, 93–102 (2022). https://doi.org/10.1038/s41437-022-00541-1

  • DOI: https://doi.org/10.1038/s41437-022-00541-1
