A heading date QTL, qHD7.2, from wild rice (Oryza rufipogon) delays flowering and shortens panicle length under long-day conditions

Heading date (HD) and panicle length (PL) are important traits in rice breeding and are often controlled by pleiotropic genes. Some alleles associated with HD and PL in wild relatives may differ from those in cultivated rice. In this study, a main-effect HD quantitative trait locus (QTL) from wild rice, qHD7.2, was identified using a chromosomal segment substitution line (CSSL) population. First, qHD7.2 was located near RM172 on chromosome 7 by association analysis of phenotype data from six environments and 181 polymorphic molecular markers. CSSL39, which flowers latest of all the CSSLs and carries qHD7.2, was selected for further study, and qHD7.2 was narrowed to a 101.1-kb interval using a CSSL39/9311 F2 population. An OsPRR37-homologous gene was found within this region. The wild-type allele delayed flowering and shortened PL under long-day conditions. The transcript level of HD7.2, the candidate gene for qHD7.2, was substantially higher in CSSL39 than in 9311. Our data showed that HD7.2 is likely a novel OsPRR37 allele. Sequence analysis revealed that OsPRR37 in cultivated rice had multiple origins, and that natural variation in the coding domain sequence and promoter region contributes to flowering time diversity in cultivated rice.

As an increasing number of HD genes have been identified, associations between HD and PL have been traced to pleiotropic genes. Both HD and PL are key factors that influence the commercial value of cultivated rice. PL has been reported to co-segregate with, and to be fine mapped to, HD loci, which demonstrates the pleiotropic effects of the underlying genes 29,30. Hd1 29, Ghd7 15, and Ghd8 30 are pleiotropic: they prolong HD and increase PL. However, the genetic factors that affect HD and their effects on PL are not well understood. Moreover, a large proportion of these QTLs have still not been finely mapped or cloned, and more HD genes are needed to elucidate the associated genetic interactions 31,32.
Common wild rice (Oryza rufipogon Griff.) has evolved many exotic genes in response to natural selection in harsh environments and a variety of disasters, and exhibits a late HD and an open panicle architecture compared with cultivated rice 33. Chromosome segment substitution lines (CSSLs) greatly improve the accuracy of gene or QTL mapping by eliminating the influence of genetic background. Because they are ideal material for genetic analysis and fine mapping, CSSLs have been used to fine map and clone many QTLs 14,34-39. A number of QTLs from wild rice species have been reported, and some of their alleles increase PL 40,41.
OsPRR37, the gene underlying the Hd2/qDTH7-2 QTL, was reported to be a major-effect locus that controls photoperiod sensitivity in rice 42. The genes homologous to OsPRR37 in barley, wheat, and sorghum have been well studied with regard to their effects on photoperiod sensitivity 43-45. Natural variants of OsPRR37 have also been reported to have contributed to the expansion of rice cultivation to temperate and cooler regions 21. However, natural variation in the OsPRR37 promoter has remained elusive.
To explore new alleles from wild rice, we constructed a set of wild rice CSSLs in a previous study. Here we report the detection and fine mapping of a QTL for HD and PL, based on QTL analysis using the CSSLs and advanced backcross populations. One QTL from wild rice, qHD7.2, was identified, and a novel allele of OsPRR37 was fine mapped within this QTL. The expression pattern of this gene was compared between wild and cultivated rice, and sequence analysis among 3000 germplasm accessions was performed to identify the origin of the gene and the regions selected during rice domestication. Our study provides a new genetic resource for cultivated rice breeding and new evidence regarding the evolution of flowering genes.

Results
qHD7.2 detection using a CSSL population. In our previous study, a population of 198 CSSLs was developed using common wild rice as the donor parent and an elite O. sativa indica cultivar, 9311, as the recurrent parent 46. HD and PL differed remarkably between the two parents; the wild rice has an open panicle phenotype and does not head under long-day conditions. HD also varied substantially within the CSSL population. Days to heading recorded in six environments (two years, three sites) were tested in our laboratory for association with genotypes at 119 simple sequence repeat (SSR) and 62 insertion/deletion (InDel) polymorphic markers evenly distributed across the 12 rice chromosomes, and several HD QTLs were identified at a cut-off LOD score ⩾ 2.5 (Table 1). Two QTLs, one near RM172 on chromosome (Chr.) 7 and one near RM19 on Chr. 12, were detected in all environments. Both loci increased HD; because they were stably detected, they could improve the accuracy and efficiency of future work. In particular, the QTL near RM172 identified under the Beijing conditions had the highest LOD value (14.6) and explained 27.2% of the variance, which indicated that this QTL (named qHD7.2) is likely a main-effect QTL. One line, CSSL39, which flowers latest of all the CSSLs under natural long-day (NLD) conditions in Beijing (116.4°E/39.9°N, day length > 15 h) and carries qHD7.2, was selected for advanced study. The CSSL39 genotype is shown in Fig. 1. Seven substituted segments from wild rice were detected in the whole CSSL39 genome (Fig. 1), and the HD of the F1 progeny fell between those of the two parents (Fig. 2A,B). Additionally, we investigated basic agronomic traits of CSSL39, including thousand grain weight, grain length, grain width, plant height, flag leaf length, flag leaf width, PL, and tiller number (S-Table 1).
We found that only PL differed significantly from that of 9311: CSSL39 PL was 5.88 cm shorter under NLD conditions and 3.58 cm shorter under natural short-day (NSD) conditions, and the F1 generation exhibited an intermediate PL (Fig. 2C,D). Moreover, we counted the number of mature CSSL39 and 9311 seeds based on seed coat colour; approximately 88.9% of 9311 grains but only 25% of CSSL39 grains reached maturity under NLD conditions. However, there were no significant

Sixteen new InDel and SSR markers located in the substituted interval near RM172 were used to screen the parents, CSSL39 and 9311, of which 10 markers exhibited polymorphism (S-Table 2). First, the substituted segment near RM172 was narrowed to the interval between InDel7-12 and InDel7-13 (Fig. 4A). Together with the eight SSR markers that had been detected in CSSL39, these gave 18 polymorphic markers, which we used to screen the CSSL39/9311 F2 population. HD and PL QTLs were detected in 1024 F2 individuals from Sanya and 846 F2 individuals from Beijing. In total, two HD and three PL QTLs were detected (Table 2). Two HD QTLs (qHD7.1 and qHD7.2), which were stably inherited, were detected under both NLD and NSD conditions. qHD7.1 was mapped to the 522.9-kb interval between RM7601 and RM172 on Chr. 7, whereas qHD7.2 was mapped to the 101.1-kb interval between RM172 and RM22188 (Fig. 4B). Both had the same direction of effect, prolonging HD. Three PL QTLs (qPL7, qPL10.1, and qPL10.2) were detected. qPL7 was mapped to the 101.1-kb interval between RM172 and RM22188, the same position as qHD7.2. qPL10.1 was mapped to the 568-kb interval between RM171 and RM1146 on Chr. 10, and qPL10.2 was mapped to the 69.7-kb interval between RM25723 and RM7300 on Chr. 10. Among these three QTLs, qPL10.1 was detected only under NSD conditions, whereas qPL7 and qPL10.2 were stably inherited under both NLD and NSD conditions. Only qPL10.2 under NSD conditions had a positive effect, elongating the panicle; the others had negative effects, shortening the panicle. Therefore, we predicted that qHD7.2 is likely a major QTL that delays HD and shortens PL.
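The LOD score and the proportion of variance explained that are reported for each QTL can be illustrated with a single-marker test. The sketch below is only a simplified stand-in: it compares a one-mean model with a two-mean model at one marker, which is not the algorithm implemented in QTL IciMapping, and the function name `single_marker_scan` and all genotype and phenotype values are hypothetical.

```python
import math


def single_marker_scan(genotypes, phenotypes):
    """Single-marker test: one-mean model versus two-mean model.

    genotypes  -- 0 for the recurrent-parent allele, 1 for the donor segment
    phenotypes -- trait values (e.g. days to heading), same order as genotypes
    Returns (lod, pve, effect), where effect = donor mean - recurrent mean.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have equal length")
    n = len(phenotypes)

    # Null model: every line shares the same mean.
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    # Marker model: each genotype class has its own mean.
    groups = {0: [], 1: []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    if not groups[0] or not groups[1]:
        raise ValueError("both genotype classes must be present")
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    rss_marker = sum((y - means[g]) ** 2 for g, y in zip(genotypes, phenotypes))

    lod = (n / 2) * math.log10(rss_null / rss_marker)
    pve = 1 - rss_marker / rss_null
    effect = means[1] - means[0]
    return lod, pve, effect


# Hypothetical data: six lines without and four lines with the donor segment.
toy_genotypes = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_days_to_heading = [90, 92, 91, 89, 93, 90, 105, 108, 104, 107]

lod, pve, effect = single_marker_scan(toy_genotypes, toy_days_to_heading)
print(f"LOD = {lod:.2f}, PVE = {pve:.1%}, effect = {effect:.1f} d")
print("QTL declared" if lod >= 2.5 else "no QTL at this marker")
```

A positive `effect` here corresponds to the donor allele delaying heading, which is the direction reported for qHD7.1 and qHD7.2.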
Gene prediction and expression analysis. According to the GRAMENE website (www.gramene.org/) and the Rice Annotation Project database (http://rapdb.dna.affrc.go.jp/) 47, 10 open reading frames (ORFs) were predicted in the qHD7.2 target region between RM172 and RM22188 (Fig. 4C, S-Table 3). Four of the ORFs contain reported functional domains, and the others encode hypothetical proteins. We sequenced each of the 10 annotated genes, and only ORF1 and ORF2 had no polymorphisms in the coding domain between CSSL39 and 9311 (data not shown). Among the 10 genes, ORF7 (LOC_Os07g49460) has functional information related to HD control. LOC_Os07g49460 encodes a protein that contains a response regulator receiver domain and corresponds to the cloned HD gene OsPRR37; PRR37 was reported to confer photoperiod sensitivity and to affect HD under long-day conditions 21. Thus, we predicted that LOC_Os07g49460 could be a candidate gene for qHD7.2, and it is hereafter named HD7.2. Furthermore, we aligned the HD7.2 coding domain sequence (CDS) with OsPRR37 of 9311 and Nipponbare (japonica). There were 10 mutations and an 8-bp deletion in the 9311 CDS compared with CSSL39; the 8-bp deletion causes premature translational termination and thus a non-functional allele (Fig. 4D, S-Fig. 2). There were also five differences in the coding sequence between CSSL39 and Nipponbare, which led to five amino acid changes.
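Why an 8-bp deletion produces premature termination can be shown with a toy example: because 8 is not a multiple of 3, every codon downstream of the deletion is read in a shifted frame, and a stop codon is usually encountered soon afterwards. The sequence below is invented for illustration and is not the OsPRR37 sequence; the names `translate`, `toy_functional_cds`, and `toy_deleted_cds` are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a CDS from its first base until the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented 39-nt coding sequence (12 codons plus a stop codon).
toy_functional_cds = "ATGGCTGAACGTCTGGGTGATAAGTGCATGACCGGATAA"

# Remove 8 bp (positions 6-13, zero-based) to mimic a frameshifting deletion.
toy_deleted_cds = toy_functional_cds[:6] + toy_functional_cds[14:]

print("functional allele:", translate(toy_functional_cds))
print("deletion allele:  ", translate(toy_deleted_cds))
```

In this toy case the shifted frame reaches a stop codon immediately after the deletion, so the predicted protein is truncated, which mirrors the non-functional 9311 allele described in the text.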
The spatial expression pattern of HD7.2 was monitored in CSSL39. HD7.2 expression was measured in the flag leaf, second leaf, leaf cushion, leaf sheaths, stems, column, roots, and panicle of CSSL39 under NLD conditions during the heading period. Transcript levels differed among tissues; expression was highest in the flag leaf and lowest in the roots (Fig. 5A). We then compared HD7.2 expression between CSSL39 and 9311 at three stages (before heading, heading period, and after heading); the transcript level was approximately two-fold higher in CSSL39 than in 9311 (Fig. 5B). Next, we compared the promoter sequences, taken as the 2.0 kb upstream of the initiation codon, between CSSL39 and 9311. There were many mutations and InDels in 9311 compared with CSSL39, and these polymorphic sites changed some cis-acting elements in the promoter region (S-Fig. 3).

The network of the LOC_Os07g49460 promoter sequence was also constructed using the same database; 21 haplotypes that each contained more than 10 cultivated individuals or five wild rice individuals were selected (Fig. 6B). The LOC_Os07g49460 promoter sequence was highly diverse in both cultivated and wild rice. The cultivated rice haplotypes H_1 and H_12 were most closely related to the wild rice haplotypes H_13-15, H_17-18, and H_20. More than 75% of H_1 individuals were japonica, and all H_12 individuals were indica, which indicates that both the indica and japonica promoter regions originated from wild rice and were domesticated in parallel. Other cultivated rice haplotypes, including H_2-3, H_6, and H_7-11, most of which were indica, were closely related to the wild rice haplotypes H_16 and H_19. This finding indicates that the promoter regions of different cultivated rice types originated from different wild rice species, and that the promoter region was also selected during domestication.

Discussion
Previous studies showed that O. rufipogon from southern China is the ancestor of O. sativa, and that many alleles in the wild species were lost during rice domestication 48,49. Exploiting novel alleles from wild rice that were lost in cultivated rice could be very important for rice breeding and evolution studies. Genetic populations play a major role in QTL detection and gene mapping. CSSLs can facilitate the identification of QTLs that are missed in F2 populations because of genetic background noise. In our previous study, we developed a CSSL population in the 9311 genetic background that covers the whole wild rice genome; many QTLs associated with various agronomic traits were identified using this population 46. In this study, we used this CSSL population to identify one HD QTL, qHD7.2, which had the highest LOD value and was stably inherited in all environments (Table 1). One line, CSSL39, which consistently had the latest heading date and carried qHD7.2 but none of the other HD QTLs from the donor parent, was selected for advanced study. CSSL39 had a substantially shorter panicle than 9311 under both NLD and NSD conditions. Because dense SSR markers were used during the development of this CSSL population, CSSL39 was not a near isogenic line, and eight substituted segments from wild rice were detected in its genome. Furthermore, an F2 population from CSSL39 and the recurrent parent 9311 was constructed for fine mapping of qHD7.2. Ten polymorphic markers in the substituted interval around RM172 on Chr. 7 were used for additional analysis of almost 3000 F2 plants. Finally, a pleiotropic QTL responsible for HD and PL was detected between two markers, RM172 and RM22188. In principle, a line isogenic for qHD7.2 and F3 recombinants should be constructed to confirm our results. However, a previously cloned HD gene, Os07g49460 (OsPRR37), was found in this interval, and an 8-bp deletion in the 9311 allele led to a nonfunctional allele.
Although another PL QTL was also detected on Chr. 10, we designed InDel primers for HD7.2 detection in F2 and F3 populations, and all individuals with the wild rice allele showed later HDs and shorter PLs (data not shown). Therefore, we can exclude other intervals, and it can be deduced that HD7.2 is the target gene involved in shaping HD and PL phenotypes. These findings also illustrate that these CSSLs are suitable and efficient for fine mapping.
Pseudo-response regulators (PRRs) have been reported to be important circadian-clock components in Arabidopsis and rice 42,50. The regulatory roles of the PRR37 orthologues in growth and development have diverged among species; for example, Arabidopsis prr7 loss-of-function mutants flower slightly later under inductive long-day conditions, but rice prr37-knockout mutants flower early under non-inductive long-day conditions 21,42. One member of the rice PRR gene family, OsPRR37, is responsible for EH7-2/Hd2, which is the major-effect QTL that controls photoperiod sensitivity in rice. OsPRR37 down-regulates Hd3a expression to suppress flowering under long-day conditions, and natural variation in OsPRR37 regulates HD and contributes to rice cultivation at a wide range of latitudes 21. Some genes homologous to OsPRR37 in barley, wheat, and sorghum have also been well documented to regulate flowering time 43-45. In this study, the PRR37 allele from wild rice also substantially suppressed flowering under NLD conditions. Previous studies reported that PRR37 also affects other important agronomic traits such as plant height and spikelets per panicle 42-45. However, no previous report has shown that PRR37 affects PL, so we predicted that HD7.2 is likely a novel allele of PRR37 that delays flowering and shortens PL under long-day conditions.
Mapping of QTLs for HD and yield component traits in rice has resulted in remarkable progress in elucidating the genetic basis that underlies the natural variation of these traits. Major QTLs for HD and yield component traits have shown a common association between delayed heading and increased yield, such as Ghd7 15,51, DTH8/Ghd8/qHY-8/LH8 52-55, Hd1 29, and Ghd7.1 56. PL is a major grain yield component trait; in this study, HD7.2 delayed HD but shortened PL, potentially because HD7.2 from wild rice controls the physiological process of panicle development. Our data indicated that HD7.2 is likely a novel allele of PRR37 that has a different function. Further studies, such as a transgenic experiment with HD7.2, should be performed to understand the exact function of this gene.
There were many nucleotide changes in both the coding and promoter sequences of HD7.2 between the two parents, and the expression level in CSSL39 was higher than that in 9311 at each stage. In the 9311 coding sequence, an 8-bp deletion produced premature translational termination. In the promoter sequences, the changes led to differences in cis-acting elements between CSSL39 and 9311; these data indicated that the CSSL39 phenotype changes were caused by changes in both HD7.2 transcript level and protein function. Nucleotide diversity and network analysis of PRR37 were carried out using more than 3000 cultivated and wild rice accessions. The data showed that this gene originated from multiple wild rice accessions, which is consistent with previous reports that japonica and indica evolved from multiple ancestral populations 48. Furthermore, our results showed that the promoter region also originated from wild rice, and substantial natural variation was found in rice landraces. Koo et al. 21 reported that natural variation of the PRR37 protein contributed to the wide range of latitudes in which cultivated rice can be grown; usually, japonica, which is distributed in high-latitude regions, has the non-functional allele 21. In this study, some landraces had functional alleles but a low-expression promoter type, so they can still flower normally under long-day conditions. It can be concluded that both the HD7.2 coding and promoter regions were selected during domestication, and that natural variation in the PRR37 promoter region also contributed to the widespread distribution of cultivated rice. Our study provides a novel understanding of the rice OsPRR37 gene and rice flowering regulation networks, as well as additional evidence regarding the evolution of this gene during rice domestication.

Materials and Methods
Plant materials and growth conditions. A set of 198 CSSLs produced from common wild rice (O. rufipogon) as the donor and an elite indica variety, 9311, as the recurrent parent was developed in our laboratory as previously reported 42. Each CSSL was genotyped using 313 polymorphic SSR markers evenly distributed across the 12 rice chromosomes. In this study, the CSSL population was employed for QTL mapping of HD. The genotype of each line was surveyed by SSR analysis; one line, CSSL39, was selected as the starting material for the present study. CSSL39 was backcrossed with 9311, and the resultant F1 was self-pollinated to produce F2 seeds.
The 198 CSSLs and recurrent parent 9311 were grown in six environments (two years × three locations) (S-Table 4).

Phenotype investigation. For the CSSLs and 9311, trait values were taken as the mean of 10 representative plants from the middle of each entry plot. HD was measured on a single-plant basis. Days to heading for each individual were scored when the first panicle had emerged (2 cm long). PL is the length from the panicle neck to the tip of the main panicle, excluding awn length. Seed maturity percentage was determined, based on yellow pigmentation of the seed coat, from two panicles sampled 60 d after 9311 heading under NLD and NSD conditions.

Scientific Reports | (2018) 8:2928 | DOI:10.1038/s41598-018-21330-z

QTL analysis and predicted candidate gene. Initially, a total of 191 polymorphic SSR markers selected from a public database (Rice Genome Research Program 2007) were employed to construct the linkage map. To construct a high-density linkage map for fine mapping of the QTL, new InDel markers that cover the target QTL region were developed. The Nipponbare and 9311 target sequences were obtained from publicly available rice genome sequence data to develop InDel markers. Primers were designed based on InDel sequences using Primer Premier 5.0. All primer pairs flanking SSRs or InDels were designed using the following parameters: 18-25 nucleotides in length, absence of secondary structure, a GC content of approximately 50%, and a melting temperature around 55 °C. SSR and InDel marker primers were synthesised by Shanghai Invitrogen Biotechnology Company (Shanghai, China). Polymorphisms of the SSR and InDel markers between the two parents were tested by PCR. DNA was extracted from fresh leaves at the seedling stage using the CTAB method. PCR amplification consisted of an initial denaturation of 5 min at 95 °C, followed by 33 cycles of 30 s at 94 °C, 30 s at 56 °C, and 30 s at 72 °C, and a final extension of 10 min at 72 °C. Amplification products were separated by 6% denaturing polyacrylamide gel electrophoresis and visualised by silver staining. QTL analysis was conducted by combining the genotypes and phenotypes of the CSSLs and the secondary segregating population using QTL IciMapping 57. A QTL was declared when LOD ⩾ 2.5. Putative genes in the qHD7.2 region were predicted by referring to the Rice Genome Annotation Project (http://rice.plantbiology.msu.edu/). Total RNA was extracted from leaves of the two parents using TRIzol reagent (Invitrogen, CA, USA) and reverse transcribed into cDNA using a Reverse Transcription Kit (TaKaRa, Otsu, Japan). The coding regions of the putative genes were amplified from cDNA using PFU polymerase (TaKaRa, Otsu, Japan) and sequenced by Shanghai Sangon Biotechnology Company (Shanghai, China). DNA sequence comparison between the parents was performed using the BLAST program.
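The primer design parameters listed above (18-25 nucleotides, GC content of approximately 50%, melting temperature around 55 °C) can be expressed as a simple screening check. The sketch below is illustrative only: it does not reproduce Primer Premier 5.0, it estimates melting temperature with the rough Wallace rule, Tm = 2(A+T) + 4(G+C), and the tolerance windows (40-60% GC, 50-60 °C) are assumptions chosen for the example because the text gives only approximate targets. Secondary structure is not checked, and the primer sequences are invented.

```python
def check_primer(seq, gc_range=(40.0, 60.0), tm_range=(50.0, 60.0)):
    """Screen one primer against length, GC content, and approximate Tm.

    gc_range and tm_range are assumed tolerance windows, not values
    taken from the paper.
    """
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, and T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / len(seq)
    tm_wallace = 2 * at + 4 * gc  # Wallace rule, a rough estimate only
    ok = (
        18 <= len(seq) <= 25
        and gc_range[0] <= gc_percent <= gc_range[1]
        and tm_range[0] <= tm_wallace <= tm_range[1]
    )
    return {
        "length": len(seq),
        "gc_percent": gc_percent,
        "tm": tm_wallace,
        "ok": ok,
    }


# Invented primer sequences, for illustration only.
for primer in ("ATGCATGCATGCATGCAT", "ATATATATAT"):
    print(primer, check_primer(primer))
```

A nearest-neighbour thermodynamic model would give more realistic melting temperatures than the Wallace rule; the simpler formula is used here only to keep the example self-contained.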
Gene expression analysis. The CSSL39 plants were grown under NLD conditions for 140 d, which was before heading. The flag leaf, second leaf, leaf cushion, stems, column, roots, panicle, and leaf sheaths were harvested. All samples were harvested from the main culm of each plant. Samples from two or three different individuals were collected as biological replicates. For expression comparison, the two parents were planted in the Chinese Academy of Agricultural Sciences greenhouse. Three pots were each divided in half so that every pot contained both CSSL39 and 9311, to ensure consistent growing conditions. Fresh leaves were collected before heading, at heading, and after heading. RNA was extracted using TRIzol Reagent (Invitrogen, CA, USA) and treated with DNase I (Invitrogen, CA, USA). cDNA was synthesised using SuperScript III Reverse Transcriptase (Invitrogen, CA, USA). Quantitative analysis of gene expression was performed with SYBR Premix Ex Taq (TaKaRa, Otsu, Japan) on an Applied Biosystems 7500 Real-time PCR System. The data were analysed using the relative quantification method.
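The text names only a "relative quantification method". A common choice for real-time PCR data is the comparative Ct (2^-ΔΔCt) calculation, and the sketch below assumes that this is what was meant. The reference gene is not named in the text, so it is left generic, and all Ct values are hypothetical numbers chosen for illustration, not measurements from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    Each Ct is normalised to a reference gene in the same sample, and the
    sample is then expressed relative to the calibrator.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: target gene and reference gene in each parent.
fold_change = relative_expression(
    ct_target_sample=24.0,      # target gene in the test line (e.g. CSSL39)
    ct_ref_sample=18.0,         # reference gene in the test line
    ct_target_calibrator=25.0,  # target gene in the calibrator (e.g. 9311)
    ct_ref_calibrator=18.0,     # reference gene in the calibrator
)
print(f"fold change relative to calibrator: {fold_change:.2f}")
```

The calculation assumes roughly equal amplification efficiencies (close to 100%) for the target and reference genes; if that does not hold, an efficiency-corrected model is more appropriate.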
Network and genetic diversity analyses. We collected SNP and InDel genomic variation data for 2859 rice genomes from the comprehensive SNP and InDel sub-database of the Rice Functional Genomics and Breeding Database (http://www.rmbreeding.cn/snp3k) 58. This sub-database is a global resource that contains tools such as a polymorphism information retrieval function, a genome browser visualization system, and a data export system for specific genomic regions. All SNPs located in the promoter and CDS regions were extracted based on the genome gff3 annotation. Haplotype analysis was performed using Perl scripts, and only non-synonymous SNPs were considered. The number of haplotypes and haplotype diversity were calculated using DnaSP v5 (http://www.ub.edu/dnasp) 59, and the haplotypes were imported into NETWORK 5.0.0.0 (Fluxus Technology Ltd., 2015) to construct haplotype networks.
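The haplotype counting step can be sketched as follows. The authors used Perl scripts and DnaSP; the Python below is only a stand-in that groups accessions by their allele string and computes Nei's haplotype diversity, Hd = n/(n-1) × (1 - Σp_i²). The accession names, SNP alleles, and the helper names `count_haplotypes` and `haplotype_diversity` are hypothetical.

```python
from collections import Counter


def count_haplotypes(snp_calls):
    """Group accessions by their concatenated alleles at the selected SNPs.

    snp_calls -- dict mapping accession name to a string of alleles,
                 one character per SNP site, in a fixed site order.
    Returns a Counter mapping each haplotype to its number of accessions.
    """
    return Counter(snp_calls.values())


def haplotype_diversity(haplotype_counts):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(haplotype_counts.values())
    if n < 2:
        raise ValueError("at least two sequences are required")
    sum_sq = sum((count / n) ** 2 for count in haplotype_counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical accessions with alleles at three SNP sites.
toy_calls = {
    "acc1": "AGT", "acc2": "AGT", "acc3": "AGT",
    "acc4": "AGC", "acc5": "AGC",
    "acc6": "TGC",
}

toy_counts = count_haplotypes(toy_calls)
toy_hd = haplotype_diversity(toy_counts)
print(dict(toy_counts))
print(f"haplotype diversity = {toy_hd:.4f}")

# Keep only haplotypes carried by at least `min_count` accessions, analogous
# to the size thresholds used when selecting haplotypes for the network.
min_count = 2
major = {h: c for h, c in toy_counts.items() if c >= min_count}
print(major)
```

Missing or heterozygous calls are not handled here; in real data they would need to be filtered or imputed before haplotypes are compared.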