Hybrid maize displays superior heterosis and contributes over 30% of total worldwide cereal production. However, the molecular mechanisms of heterosis remain obscure. Here we show that structural variants (SVs) between the parental lines have a predominant role underpinning maize heterosis. De novo assembly and analyses of 12 maize founder inbred lines (FILs) reveal abundant genetic variations among these FILs and, through expression quantitative trait loci and association analyses, we identify several SVs contributing to genomic and phenotypic differentiations of various heterotic groups. Using a set of 91 diallel-cross F1 hybrids, we found strong positive correlations between better-parent heterosis of the F1 hybrids and the numbers of SVs between the parental lines, providing concrete genomic support for a prevalent role of genetic complementation underlying heterosis. Further, we document evidence that SVs in both ZAR1 and ZmACO2 contribute to yield heterosis in an overdominance fashion. Our results should promote genomics-based breeding of hybrid maize.
All data sets reported in this study have been deposited in NCBI. The raw reads from 150-bp paired-end Illumina sequencing, PacBio SMRT sequencing and RNA-seq (for gene annotation), together with the sequences of the genome assemblies, have been deposited in the NCBI database under the BioProject accession PRJNA755430. The RNA-seq data of 131 inbred lines have been deposited in the NCBI database under the BioProject accession PRJNA783356. Source data are provided with this paper.
The code for constructing pseudomolecules is available at GitHub (https://github.com/JunpengShi/Maize_pseudomolecule_construction) and Zenodo (https://doi.org/10.5281/zenodo.7407607) (ref. 87).
Lamkey, K. R. & Staub, J. E. (eds) Concepts and Breeding of Heterosis in Crop Plants (CSSA Special Publications, 1998).
Duvick, D. N. The contribution of breeding to yield advances in maize (Zea mays L.). Adv. Agron. 86, 83–145 (2005).
Zhang, R. et al. Patterns of genomic variation in Chinese maize inbred lines and implications for genetic improvement. Theor. Appl. Genet. 131, 1207–1221 (2018).
Mikel, M. A. Genetic composition of contemporary U.S. commercial dent corn germplasm. Crop Sci. 51, 592–599 (2011).
Romay, M. C. et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol. 14, R55 (2013).
Van Heerwaarden, J., Hufford, M. B. & Ross-Ibarra, J. Historical genomics of North American maize. Proc. Natl Acad. Sci. USA 109, 12420–12425 (2012).
Tian, H. et al. New resources for genetic studies in maize (Zea mays L.): a genome-wide Maize6H-60K single nucleotide polymorphism array and its application. Plant J. 105, 1113–1122 (2021).
Li, Y. & Wang, T. Germplasm base of maize breeding in China and formation of foundation parents. J. Maize Sci. 18, 1–8 (2010).
Coffman, S. M., Hufford, M. B., Andorf, C. M. & Lubberstedt, T. Haplotype structure in commercial maize breeding programs in relation to key founder lines. Theor. Appl. Genet. 133, 547–561 (2020).
Hollick, J. B. & Chandler, V. L. Epigenetic allelic states of a maize transcriptional regulatory locus exhibit overdominant gene action. Genetics 150, 891–897 (1998).
Schnell, F. W. & Cockerham, C. C. Multiplicative vs. arbitrary gene action in heterosis. Genetics 131, 461 (1992).
Fu, J. et al. Partial least squares regression, support vector machine regression, and transcriptome-based distances for prediction of maize hybrid performance with gene expression data. Theor. Appl. Genet. 124, 825–833 (2012).
Birchler, J. A., Auger, D. L. & Riddle, N. C. In search of the molecular basis of heterosis. Plant Cell 15, 2236–2239 (2003).
Stupar, R. M. & Springer, N. M. Cis-transcriptional variation in maize inbred lines B73 and Mo17 leads to additive expression patterns in the F1 hybrid. Genetics 173, 2199–2210 (2006).
Springer, N. M. & Stupar, R. M. Allelic variation and heterosis in maize: how do two halves make more than a whole? Genome Res. 17, 264–275 (2007).
Stupar, R. M. et al. Gene expression analyses in maize inbreds and hybrids with varying levels of heterosis. BMC Plant Biol. 8, 33 (2008).
Garcia, A. A., Wang, S., Melchinger, A. E. & Zeng, Z. B. Quantitative trait loci mapping and the genetic basis of heterosis in maize and rice. Genetics 180, 1707–1724 (2008).
Yang, J. et al. Incomplete dominance of deleterious alleles contributes substantially to trait variation and heterosis in maize. PLoS Genet. 13, e1007019 (2017).
Wang, Y. et al. Heterotic loci identified for maize kernel traits in two chromosome segment substitution line test populations. Sci. Rep. 8, 11101 (2018).
Flint-Garcia, S. A., Buckler, E. S., Tiffin, P., Ersoz, E. & Springer, N. M. Heterosis is prevalent for multiple traits in diverse maize germplasm. PLoS ONE 4, e7433 (2009).
Birchler, J. A. & Veitia, R. A. The gene balance hypothesis: implications for gene regulation, quantitative traits and evolution. New Phytol. 186, 54–62 (2010).
Goff, S. A. A unifying theory for general multigenic heterosis: energy efficiency, protein metabolism, and implications for molecular breeding. New Phytol. 189, 923–937 (2011).
Schnable, P. S. & Springer, N. M. Progress toward understanding heterosis in crop plants. Annu. Rev. Plant Biol. 64, 71–88 (2013).
Jiao, Y. et al. Improved maize reference genome with single-molecule technologies. Nature 546, 524–527 (2017).
Sun, S. et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat. Genet. 50, 1289–1295 (2018).
Troyer, A. F. Background of US hybrid corn. Crop Sci. 39, 601–626 (1999).
Wang, F. et al. Development and characterization of a core set of SSR markers for fingerprinting analysis of Chinese maize varieties. Maydica 56, 7–18 (2011).
Lu, F. et al. High-resolution genetic mapping of maize pan-genome sequence anchors. Nat. Commun. 6, 6914 (2015).
Ou, S. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019).
Hufford, M. B. et al. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science 373, 655–662 (2021).
Haberer, G. et al. European maize genomes highlight intraspecies variation in repeat and gene content. Nat. Genet. 52, 950–957 (2020).
Liu, Q. et al. An atypical thioredoxin imparts early resistance to sugarcane mosaic virus in maize. Mol. Plant 10, 483–497 (2017).
Guo, L. et al. Stepwise cis-regulatory changes in ZCN8 contribute to maize flowering-time adaptation. Curr. Biol. 28, 3005–3015 (2018).
Hung, H. Y. et al. ZmCCT and the genetic basis of day-length adaptation underlying the postdomestication spread of maize. Proc. Natl Acad. Sci. USA 109, E1913–E1921 (2012).
Yang, Q. et al. CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proc. Natl Acad. Sci. USA 110, 16969–16974 (2013).
Huang, C. et al. ZmCCT9 enhances maize adaptation to higher latitudes. Proc. Natl Acad. Sci. USA 115, E334–E341 (2018).
Salvi, S. et al. Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize. Proc. Natl Acad. Sci. USA 104, 11376–11381 (2007).
Liu, L. et al. KRN4 controls quantitative variation in maize kernel row number. PLoS Genet. 11, e1005670 (2015).
Jia, H. et al. A serine/threonine protein kinase encoding gene KERNEL NUMBER PER ROW6 regulates maize grain yield. Nat. Commun. 11, 988 (2020).
Tao, Y. et al. Identification and fine-mapping of a QTL, qMrdd1, that confers recessive resistance to maize rough dwarf disease. BMC Plant Biol. 13, 145 (2013).
Liu, Q. et al. A helitron-induced RabGDIα variant causes quantitative recessive resistance to maize rough dwarf disease. Nat. Commun. 11, 495 (2020).
Tao, Y. et al. Combined linkage and association mapping reveals candidates for scmv1, a major locus involved in resistance to sugarcane mosaic virus (SCMV) in maize. BMC Plant Biol. 13, 162 (2013).
Wang, B. et al. Genome-wide selection and genetic improvement during modern maize breeding. Nat. Genet. 52, 565–571 (2020).
Battilani, P. et al. Oxylipins from both pathogen and host antagonize jasmonic acid-mediated defence via the 9-lipoxygenase pathway in Fusarium verticillioides infection of maize. Mol. Plant Pathol. 19, 2162–2176 (2018).
Sun, Y. et al. 3D genome architecture coordinates trans and cis regulation of differentially expressed ear and tassel genes in maize. Genome Biol. 21, 143 (2020).
Duan, C. X. et al. Advances in research on maize resistance to ear rot. Sci. Agricultura Sin. 48, 2152–2164 (2015).
Cheng, Y., Kato, N., Wang, W., Li, J. & Chen, X. Two RNA binding proteins, HEN4 and HUA1, act in the processing of AGAMOUS pre-mRNA in Arabidopsis thaliana. Dev. Cell 4, 53–66 (2003).
Gao, X. Q. et al. OsLIS-L1 encoding a lissencephaly type-1-like protein with WD40 repeats is required for plant height and male gametophyte formation in rice. Planta 235, 713–727 (2012).
Huang, Y. et al. Megabase-scale presence-absence variation with Tripsacum origin was under selection during maize domestication and adaptation. Genome Biol. 22, 237 (2021).
Ning, Q. et al. An ethylene biosynthesis enzyme controls quantitative variation in maize ear length and kernel yield. Nat. Commun. 12, 5832 (2021).
Guo, M. et al. Maize ARGOS1 (ZAR1) transgenic alleles increase hybrid maize yield. J. Exp. Bot. 65, 249–260 (2014).
Liu, H. et al. Genome-wide identification and analysis of heterotic loci in three maize hybrids. Plant Biotechnol. J. 18, 185–194 (2020).
Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv https://doi.org/10.48550/arXiv.1303.3997 (2013).
Marcais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Ghosh, S. & Chan, C. K. Analysis of RNA-seq data using tophat and cufflinks. Methods Mol. Biol. 1374, 339–361 (2016).
Shao, M. & Kingsford, C. Accurate assembly of transcripts through phase-preserving graph decomposition. Nat. Biotechnol. 35, 1167–1169 (2017).
Tang, S., Lomsadze, A. & Borodovsky, M. Identification of protein coding regions in RNA transcripts. Nucleic Acids Res. 43, e78 (2015).
Campbell, M. A., Haas, B. J., Hamilton, J. P., Mount, S. M. & Buell, C. R. Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics 7, 327 (2006).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinf. 19, 189 (2018).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, ii215–ii225 (2003).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Blanco, E., Parra, G. & Guigo, R. Using geneid to identify genes. Curr. Protoc. Bioinformatics https://doi.org/10.1002/0471250953.bi0403s18 (2007).
Korf, I. Gene finding in novel genomes. BMC Bioinf. 5, 59 (2004).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Ou, S. & Jiang, N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mobile DNA 10, 48 (2019).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinf. 9, 18 (2008).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Toronen, P., Medlar, A. & Holm, L. PANNZER2: a rapid functional annotation web server. Nucleic Acids Res. 46, W84–W88 (2018).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Nattestad, M. & Schatz, M. C. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32, 3021–3023 (2016).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
Tian, T. et al. agriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Res. 45, W122–W129 (2017).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Lu, X. et al. Gene-indexed mutations in Maize. Mol. Plant 11, 496–504 (2018).
Piepho, H.-P. Letters in mean comparisons: what they do and don’t mean. Agron. J. 110, 431–434 (2018).
Shi, J. Maize_pseudomolecule_construction: Maize_pseudomolecule_construction_v1.0 (v1.0). Zenodo https://doi.org/10.5281/zenodo.7407607 (2022).
This work is supported by the National Key R&D Program of China (grant 2021YFF1000301 to Haiyang Wang), the National Natural Science Foundation of China (32022065 to B.W., 32130077 to Haiyang Wang, 31871639 to Y.C.), the Major Program of Guangdong Basic and Applied Research (2019B030302006 to Haiyang Wang), the Beijing Scholars Program (BSP041 to J.Z.) and the Agricultural Science and Technology Innovation Program of CAAS to B.W.
The authors declare no competing interests.
Peer review information
Nature Genetics thanks Georg Haberer and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Distribution profile of genes and three types of most abundant transposable elements (Gypsy, Copia and Helitron superfamilies) in the 14 FIL genomes.
The purple rectangles below each chromosome indicate positions of the centromeres.
Extended Data Fig. 2 Pan-gene distribution and global comparison profile of the syntenic blocks between the B73 genome and the genomes of the other 13 FILs.
a, Distribution profile of the core and pan-genes in the B73_v4 genome. b, Profile of whole-genome alignment (WGA) of 13 FIL genomes against the B73_v4 genome. The purple rectangles below each plot of B73 chromosomes indicate positions of the centromeres. c, d, Length (c) and ratio (d) of the B73 genome occupied by syntenic sequences in the pan-WGA map.
a–l, Synteny relationships for 12 high-confidence megabase-scale SVs between B73 and the other 13 FILs. m–o, Synteny relationships of the 8 cross-validated megabase-scale SVs among the 14 FILs. The yellow segments represent runs of Ns between contigs (gaps estimated from the BioNano physical map), while the green segments (very rare here) represent runs of Ns between scaffolds. The red segments represent the putative SVs between the paired FILs.
Extended Data Fig. 4 Structural variations of ZmCCT9, VGT1, ZCN8, KRN4, KNR6 and ZmGDIα in the 14 maize FIL genomes.
a, The 57 kb upstream region of ZmCCT9. b, The 2 kb conserved noncoding region of Vegetative to generative transition 1 (Vgt1). c, The promoter region of ZCN8. d, The ~60 kb downstream region of UNBRANCHED3 (UB3). e, The genomic region of KNR6. The trait values of kernel row number (KRN) or ear length (EL) are shown on the right. f, The resistance of Xu178 and HuangC to maize rough dwarf disease (MRDD) in normal or MRDD epidemic environments. Scale bar, 15 cm. g, QTL mapping for MRDD resistance using a recombinant inbred line (RIL) population derived from Xu178 and HuangC. h, The genomic region of ZmGDIα. Each long horizontal bar represents the genomic sequence of one FIL. The blank regions represent no difference among the FILs. The green, red, black, blue and orange vertical segments represent the A, T, G, C and missing type, respectively. The red box indicates the position of the reported causal variant.
Extended Data Fig. 5 Haplotype frequency profile of the known functional variations for ZmCCT9, VGT1, ZCN8, KRN4, KNR6, ZmGDIα and ZmTrxh in different heterotic groups.
These analyses were based on PCR-genotyping of 350 temperate inbred lines. Sample numbers are shown in brackets next to the names of the heterotic groups.
For each box, the upper and lower boundaries represent the 75th and 25th percentiles, respectively. The middle horizontal lines represent the median. The whiskers represent 1.5× the interquartile range. The dots beyond the whiskers represent outliers. The “compact letter display (CLD)” method is used to present the multiple comparison results (conducted by the least significant difference method). Different letters above the boxes indicate significant differences (P < 0.05, Bonferroni correction) in a pairwise comparison. Sample numbers are shown in brackets on the x axis. DTA: days to anthesis; KRN: kernel row number; EL: ear length; TBN: tassel branch number; EH: ear height; EP: relative ear height (ear height/plant height).
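The box-plot conventions described in this legend can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' plotting code: the function name `box_plot_summary` and the sample values are hypothetical, the quartiles use the "inclusive" interpolation of the standard library, and the whiskers are assumed to follow the common Tukey convention (they end at the most extreme observation within 1.5× the interquartile range of the box).

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the quantities a box plot draws for one group of observations."""
    # Quartiles (25th, 50th and 75th percentiles) with inclusive interpolation.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences at 1.5x the interquartile range beyond the box edges.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the fences is drawn as an individual dot.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical kernel-row-number observations for one group (made-up values).
krn_values = [12, 14, 14, 16, 16, 16, 18, 18, 30]
summary = box_plot_summary(krn_values)
print(summary)
```

Plotting libraries differ in how they interpolate percentiles, so the box edges for small groups may shift slightly depending on the tool used.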
Extended Data Fig. 7 Two SVs in the promoter region and second intron regulate expression variation in ZmLOX3.
a, Association mapping identifies SV151 and SV363 as the SVs affecting ZmLOX3 expression level. The upper panel is the Manhattan plot; the lower panel is the LD heatmap. The y axis of the Manhattan plot represents the −log10 p-value for candidate association analysis using the MLM method. b, Four haplotypes formed by SV151 and SV363 in 131 inbred lines. Hap-151/363 was excluded from further analysis due to its small sample size (n < 10). c, The relative expression level (qRT-PCR) of ZmLOX3 in ovules (prior to pollination) or kernels (5 days after pollination) of 12 FILs. Values shown are mean ± SD. d, Violin plot for ZmLOX3 expression levels (RNA-Seq) in inbred lines of Hap-0/363, Hap-0/0 and Hap-151/0 (n = number of inbred lines, shown in parentheses). For the violin plot, the white dot represents the median, the black box limits indicate the 25th and 75th percentiles, and the whiskers represent 1.5× the interquartile range. p-values for two-sided t-tests are shown. e, Frequency changes of Hap-0/363, Hap-0/0 and Hap-151/0 in 350 inbred lines of different breeding eras in the US and China. f, Frequency profile of Hap-0/363, Hap-0/0 and Hap-151/0 in different heterotic groups.
Extended Data Fig. 8 Correlation analysis between BPH performance and different kinds of SVs between the parents of 91 diallel-cross F1s.
a–f, Correlation analysis between over-parent heterosis of grain yield per plant (GYPP) and different kinds of SVs between the parents of 91 diallel-cross F1s. The p-values (p) and Spearman rank-based correlation coefficients (r) are shown at the top of each plot. The 7 registered varieties in China or the US are shown as colored points, with the corresponding names shown at the bottom.
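The kind of analysis summarized in this legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the conventional definition of better-parent heterosis, BPH = (F1 − better parent) / better parent × 100, which this text does not spell out, and it computes the Spearman coefficient as the Pearson correlation of average ranks. All function names and data values are hypothetical, and the p-value is not computed here.

```python
from statistics import mean


def better_parent_heterosis(f1, parent_a, parent_b):
    """Percent by which the F1 exceeds its better parent (assumed definition)."""
    better = max(parent_a, parent_b)
    return (f1 - better) / better * 100.0


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up hybrids: (SV count between parents, F1 yield, parent A, parent B).
hybrids = [
    (12000, 150.0, 100.0, 90.0),
    (18000, 170.0, 95.0, 110.0),
    (25000, 160.0, 100.0, 105.0),
    (31000, 190.0, 100.0, 98.0),
    (40000, 200.0, 100.0, 100.0),
]
sv_counts = [h[0] for h in hybrids]
bph_values = [better_parent_heterosis(h[1], h[2], h[3]) for h in hybrids]
print(round(spearman_r(sv_counts, bph_values), 3))
```

A rank-based coefficient is a natural choice for this comparison because it only assumes a monotonic relationship between SV number and heterosis, not a linear one.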
Extended Data Fig. 9 Relationship between genetic variations and GYPP_BPH performance in 7 historically released varieties.
a, Comparison of the genetic variations between 7 historically released varieties and other F1s. The p-values (p) for two-sided t-test results are shown on the top of each plot. For each box plot, the upper and lower boundaries represent the 25th and 75th percentile, respectively. The middle horizontal lines represent the median. The whiskers represent 1.5× the interquartile range. The dots beyond the whiskers represent outliers. b, Correlation analysis between GYPP_BPH performance and different types of genetic variants between the parents of the seven registered varieties. The p-values (p) and Spearman rank-based correlation coefficients (r) are shown on the top of each plot.
Distribution profile of genes and three types of most abundant transposable elements (Gypsy, Copia and Helitron superfamilies) in the 14 FIL genomes.
PCR-genotyping of 229 of the 350 temperate inbred lines.
Phenotyping profile of the 350 inbred lines.
GYPP_BPH performance and different kinds of SVs between the parents of the 91 diallel-cross F1s.
About this article
Cite this article
Wang, B., Hou, M., Shi, J. et al. De novo genome assembly and analyses of 12 founder inbred lines provide insights into maize heterosis. Nat Genet (2023). https://doi.org/10.1038/s41588-022-01283-w