New alleles for chlorophyll content and stay-green traits revealed by a genome wide association study in rice (Oryza sativa)

Higher chlorophyll content (CC) and strong stay-green (SG) traits are conducive to improving photosynthetic efficiency in plants. Exploring natural elite alleles for CC and SG, together with highly resolved gene haplotypes, benefits the rational design of breeding for high photosynthetic efficiency. Phenotypic analysis of 368 rice accessions showed no significant correlation between CC and SG, and higher CC and stronger SG in japonica than in indica. Genome-wide association studies of six indices for CC and SG identified a large number of association signals, among which 14 were identified as pleiotropic regions for CC and SG. Twenty-five known genes and the pleiotropic candidate gene OsSG1 accounted for natural variation in CC and SG. Further analysis indicated that 20 large-effect, non-synonymous SNPs within six known genes around GWAS signals and three SNPs in the promoter of OsSG1 could be functional, causing significant phenotypic differences between alleles. Superior haplotypes were identified based on these potentially functional SNPs. Population analyses of 368 cultivated accessions and 446 wild accessions based on SNPs within genes for CC and SG suggested that these genes had been subjected to strong positive selection in japonica in the process of spreading from its subtropical origin to the North China temperate zone. Our studies point to important genes that account for natural variation and provide superior haplotypes of possible functional SNPs that will be beneficial in breeding for high photosynthetic efficiency in rice.


Materials and Methods
Materials and sequencing data. Three hundred and sixty-eight rice accessions from 32 countries were used as materials for identification of CC and SG genes. The sequence data of all accessions were obtained from the 3000 Rice Genome Project (3KRGP) 24,27,28. For phylogenetic analysis, we added 446 wild rice accessions, having publicly available sequencing data from a previous report 20.
Phenotyping. All 368 rice accessions were used in phenotyping CC and SG. Field experiments were performed at the China Agricultural University Shangzhuang Experimental Station in Beijing in the summer of 2014. Two replicates were grown in each of two fields and each accession was transplanted 30 days after sowing in three-row plots with 20 cm between plants and 26 cm between rows. Three central plants from the middle row of each plot were used to assess CC and SG. We measured the CC in the flag leaf, and second and third upper leaves of two tillers of each plant by a SPAD (soil-plant analysis development) meter (SPAD-502 Plus, Konica-Minolta, Japan) at heading and 30 days after heading. Average SPAD values across the two replicates were used for analysis. We adopted six indices to evaluate the CC and SG of all materials. These included SPAD of the flag leaf at heading (SFH), total SPAD for the three upper leaves at heading (TSH), absolute difference value of SPADs of the flag leaf at heading and 30 days post heading (ADSF), relative difference value of SPAD of the flag leaf at heading and 30 days post heading (RDSF), cumulative SPAD of the flag leaf at heading and 30 days post heading (CSF), and total cumulative SPAD for the three upper leaves at heading and 30 days post heading (TCS).
The formulae of ADSF, RDSF, CSF and TCS were: ADSF = SPAD of the flag leaf at heading − SPAD of the flag leaf at 30 days post heading, RDSF = ADSF/SPAD of the flag leaf at heading, CSF = SPAD of the flag leaf at heading + SPAD of the flag leaf at 30 days post heading, and TCS = total SPAD for the three upper leaves at heading + total SPAD for the three upper leaves at 30 days post heading. Among these indices, SFH and TSH were used as CC indices, and ADSF and RDSF indicated the difference and degradation rate of CC at two growth stages. We applied the two indices to assess the SG of each accession. We also considered CSF and TCS as indices to evaluate ability including CC and SG, which to a certain extent, reflect the accumulation of chlorophyll (ACC) during the heading and 30 days post heading stages.
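The six indices defined above can be written out as a short Python sketch. The function name `spad_indices`, its argument layout, and the readings in the usage note are illustrative assumptions, not part of the original analysis; only the formulae themselves come from the text.

```python
def spad_indices(heading, post_heading):
    """Compute the six CC/SG/ACC indices for one accession.

    heading, post_heading: SPAD values of the (flag, second, third) upper
    leaves at heading and 30 days post heading. This argument layout is a
    hypothetical convenience, not the authors' data format.
    """
    flag_h, flag_p = heading[0], post_heading[0]
    sfh = flag_h                      # SPAD of the flag leaf at heading
    tsh = sum(heading)                # total SPAD, three upper leaves at heading
    adsf = flag_h - flag_p            # absolute decline of flag-leaf SPAD
    rdsf = adsf / flag_h              # relative decline of flag-leaf SPAD
    csf = flag_h + flag_p             # cumulative flag-leaf SPAD
    tcs = tsh + sum(post_heading)     # cumulative SPAD, three upper leaves
    return {"SFH": sfh, "TSH": tsh, "ADSF": adsf,
            "RDSF": rdsf, "CSF": csf, "TCS": tcs}
```

For made-up readings of (40, 42, 38) at heading and (30, 32, 28) thirty days later, `spad_indices` gives SFH = 40, TSH = 120, ADSF = 10, RDSF = 0.25, CSF = 70 and TCS = 210. Smaller ADSF and RDSF indicate stronger stay-green.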
Population genetic analysis and GWAS. More than 3.3 million SNPs with minor allele frequencies (MAF) >0.05 and missing rates <0.5 were used in population genetic analysis and GWAS. Principal component (PC) and kinship analyses were performed using GAPIT 29 to evaluate population structure and relative kinship of the 368 rice accessions. The first three PCs were used to construct a PC matrix. To control spurious associations, we performed GWAS on the six indices for CC, SG and ACC using the compressed mixed linear model (CMLM) with PC and kinship matrices, a model that accounts for population structure and identifies the optimal group kinship matrix 30. A significance threshold was calculated using the formula "−log10(1/the effective number of independent SNPs)" as described previously 31, and effective numbers of independent SNPs were determined by PLINK to be 144605, 172233 and 95342 in the full population, and indica and japonica subpopulations, respectively 32. The suggestive P values were 6.9 × 10⁻⁶, 5.8 × 10⁻⁶ and 1.0 × 10⁻⁵, respectively. Finally, the threshold was set at −log(P) = 5 to identify significant association signals. Because genome-wide linkage disequilibrium (LD) decays at about 123 kb in indica and 167 kb in japonica 33, adjacent significant SNPs less than 170 kb apart were merged into single association signals. The SNP with the minimum P value in a signal region was considered to be the lead SNP. In order to identify candidate genes in the signal region, LD heatmaps surrounding peaks in the GWAS were constructed using the R package "LDheatmap" 34.
GO and KEGG pathway enrichment analysis. The Cytoscape plug-in ClueGO v2.3.5 was used to analyse GO and pathway enrichment 35. According to the default parameters, a two-sided hypergeometric test and Bonferroni step-down correction were used to identify enriched GO terms and pathways. Significant enrichment was detected with a corrected P value of <0.05.
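The threshold arithmetic and the signal-merging rule described for the GWAS can be illustrated with a minimal Python sketch. The function names, the `(chrom, pos, p_value)` tuple layout and the dictionary fields are assumptions made for this example; the authors' actual pipeline (GAPIT, PLINK and their own scripts) is not shown in the text.

```python
import math


def suggestive_p(n_effective_snps):
    """Suggestive P value, 1 / (effective number of independent SNPs)."""
    return 1.0 / n_effective_snps


def significant(snps, min_neglog10=5.0):
    """Keep SNPs whose -log10(P) reaches the chosen threshold."""
    return [s for s in snps if -math.log10(s[2]) >= min_neglog10]


def merge_signals(snps, max_gap=170_000):
    """Merge adjacent significant SNPs closer than max_gap into one signal.

    snps: iterable of (chrom, pos, p_value) tuples (hypothetical layout).
    Each signal records its span, SNP count and lead SNP (minimum P value).
    """
    signals = []
    for chrom, pos, p in sorted(snps, key=lambda s: (s[0], s[1])):
        last = signals[-1] if signals else None
        if last and last["chrom"] == chrom and pos - last["end"] < max_gap:
            last["end"] = pos
            last["n_snps"] += 1
            if p < last["lead"][2]:
                last["lead"] = (chrom, pos, p)
        else:
            signals.append({"chrom": chrom, "start": pos, "end": pos,
                            "n_snps": 1, "lead": (chrom, pos, p)})
    return signals
```

With the effective SNP numbers reported above, `suggestive_p` reproduces the suggestive P values of about 6.9 × 10⁻⁶, 5.8 × 10⁻⁶ and 1.0 × 10⁻⁵. In this sketch merging is chained, so a run of SNPs each less than 170 kb from its neighbour forms one signal even if the whole run spans more than 170 kb; whether the original analysis chained in the same way is not stated.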

www.nature.com/scientificreports
Non-synonymous SNPs and haplotype analysis. Based on the coding sequence (CDS) coordinates and transcript information from MSU RGAP 7, we separated non-synonymous SNPs from all SNPs across the 368 accessions using an in-house Perl script. Differences in phenotypic values between alleles of each non-synonymous SNP were examined by Student's t-tests. Sequence alignment of each gene was performed using non-synonymous SNPs associated with CC or SG, and differences in phenotypic values among haplotypes of each gene were calculated by one-way ANOVA or Student's t-tests. Duncan's multiple range tests were conducted to make comparisons if the results of the one-way ANOVA were significant (P < 0.05). A phylogenetic tree for all 368 cultivated and 446 wild accessions was constructed using the neighbor-joining method in TASSEL 5 and MEGA 5 36,37. Nucleotide diversity (π) 38 and Tajima's D 39 were calculated using an in-house Perl script.
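The in-house Perl script for nucleotide diversity and Tajima's D is not reproduced in the text. As a rough illustration only, the two statistics can be computed from aligned sequences with the standard textbook formulae, as in the Python sketch below. The function name is hypothetical, and the sketch assumes equal-length sequences with no gaps or missing calls, which real re-sequencing data would not satisfy without extra filtering.

```python
import math
from collections import Counter


def diversity_and_tajima_d(seqs):
    """Return (pi per site, segregating sites S, Tajima's D).

    seqs: equal-length aligned sequences with no missing data (assumption).
    D is NaN when it is undefined (fewer than 4 sequences or S == 0).
    """
    n, length = len(seqs), len(seqs[0])
    n_pairs = n * (n - 1) / 2
    total_diff, s = 0.0, 0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            s += 1
            # number of sequence pairs that differ at this site
            total_diff += (n * n - sum(c * c for c in counts.values())) / 2
    k = total_diff / n_pairs if n_pairs else 0.0   # mean pairwise differences
    pi = k / length
    if n < 4 or s == 0:
        return pi, s, float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    d = (k - s / a1) / math.sqrt(variance)
    return pi, s, d
```

An excess of rare variants, as expected after a selective sweep, pulls the mean pairwise difference below S/a1 and makes D negative, which is the signal used in the selection analysis.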

Results
Population structure and phenotypic characterization of CC, SG and ACC of cultivated rice. PC and kinship analysis showed that the sampled material could be divided into two subpopulations comprising 199 indica and 169 japonica accessions (Fig. S1). Large variations were observed in the whole population among CC indices SFH and TSH, SG indices ADSF and RDSF, and ACC indices CSF and TCS (Fig. S2). High correlations were detected between paired CC, SG and ACC indices with correlation coefficients of 0.943, 0.968 and 0.912, respectively (Table S1). High correlation coefficients (>0.7) were also detected between the CC and ACC indices, whereas low negative correlations were detected between the SG and ACC indices. A low correlation coefficient (<0.4) between the CC and SG indices (Table S1) suggested that there were distinct genetic architectural differences between CC and SG, and that a higher CC index did not imply enhancement of SG.
Taking into account the large genetic differences between the subspecies 20 , we compared CC and SG between indica and japonica. Two CC indices for indica were significantly lower than those for japonica (Table S2). Clear differences were detected between indica and japonica for two SG indices (Table S2). Phenotypic variation in ADSF and RDSF for indica ranged from 0 to 35 and from 0 to 1, whereas phenotypic variation in ADSF and RDSF for japonica ranged from 0 to 20 and from 1 to 0.4, respectively (Fig. S2). Moreover, higher ACC was detected in japonica than in indica (Fig. S2). These results suggested that japonica rice has higher CC, stronger SG and higher ACC than indica.
Fourteen loci for CC and SG were detected by GWAS. A GWAS was performed to identify associations of SNPs for CC, SG and ACC in the full population, and in the indica and japonica subpopulations under CMLM (Materials and methods). Thirty-five, 15, 13, 12, 28 and 10 significant signals were obtained for SFH, TSH, ADSF, RDSF, CSF and TCS, respectively, in the full population (Figs 1 and S3;  Table S3). The differences in the number of significant signals between the subspecies were due to larger phenotypic variation in indica than that in japonica.
There were 28 common lead SNPs for the separate CC indices in GWAS across the three populations, and most significant signals overlapped, with multiple significant SNPs clustered in regions of less than 170 kb (Fig. 1; Table S3). Six common lead SNPs for SFH were identified using the full population and japonica subpopulation (Fig. 1; Table S3). However, no common lead SNP for CC was detected between the indica and japonica subpopulations (Fig. 1; Table S3). These results indicated an obvious genetic heterogeneity between indica and japonica.
More genetic heterogeneity in SG was detected between indica and japonica by comparing GWAS results for the two subpopulations. Not only was there no common lead SNP in indica and japonica, but only two signals for SG were found in japonica (Fig. 1; Table S3). Considering the fewer significant association signals and narrower phenotypic variation of SG in japonica, we suggest that strong SG and low genetic diversity of related genes may be important characteristics of japonica.
ACC is a complex trait that includes CC and SG. By comparison with GWAS results for CC, fifteen common lead SNPs were associated with CSF and SFH, and three common lead SNPs were associated with TCS and TSH (Fig. 1; Table S3). Thus several genes were responsible for both ACC and CC.
To further examine associations for CC and SG, we compared significant lead SNPs detected in the three populations for CC and SG. Fourteen pleiotropic association regions for CC and SG were identified (Fig. 1; Table 1), and among them, eight were also identified for ACC. These results suggested that there were several pleiotropic genes for CC and SG.

Natural variation in genes responsible for CC and SG. Comprehensive analysis of known genes is conducive to exploration and utilization of loci responsible for natural variation in CC and SG. One hundred and fifty-two known genes associated with CC (leaf color) or SG in rice were selected from the China Rice Data Center (http://www.ricedata.cn/) database and more recent reports 25,40-42. The gene ontology (GO) categories significantly enriched in this protein group were located in chloroplasts (Fig. S6), and mainly involved 'porphyrin-containing compound metabolism' (Fig. S7) by adjusting the activity of various reductases (Fig. S8). We analyzed the metabolic processes associated with the 152 genes. 'Porphyrin and chlorophyll metabolism' was the only significantly enriched metabolic pathway and included 16 known genes (Fig. S9). The combined analysis of GO and pathway of these genes showed that CC and SG were controlled by a complex network, with a large number of proteins for CC and SG being located in chloroplasts and involved in metabolism of porphyrin-containing compounds.
To identify large-effect genes affecting CC and SG in natural rice populations, we performed further comparisons between the 152 known genes and GWAS data obtained in this study. Genes SSG4 (LOC_Os01g08420), CK2β3 (LOC_Os07g31280) and CHR729 (LOC_Os07g31450) were located in the association regions for two CC indices, indicating that these genes could contain important loci involved in natural variation of CC; were located in association regions for two SG indices, suggesting that they could be related to natural variation in SG; , OsNUS1 (LOC_Os03g45400) and RNRS1 (LOC_Os06g14620) were in association regions for two ACC indices, implying that these genes could be involved in natural variation of ACC (Fig. 1; Table S3). Genes NYC1 and OsMTP8.1 encoding chloroplast-localized proteins were in association regions for CC, SG and ACC (Table S3). Additionally, seven genes, RDD1, PAPST1, YGL8, OsWAK25, NOL, OsNUS1 and RNRS1 were in association regions for CC and ACC (Table S3). Thus 25 known genes around GWAS signals probably have roles in natural variation of CC or SG, especially the first nine genes mentioned above.
Elite alleles in six cloned genes for CC and SG. Further study was made to identify alleles of known genes for CC and SG. Eight hundred and eleven non-synonymous SNPs were detected within the above 152 genes, including 306 SNPs in the full association panel with MAF > 0.05. For the subpopulations, there were 605 non-synonymous SNPs, including 185 with MAF > 0.05 in japonica, and 529 non-synonymous SNPs in indica including 232 with MAF > 0.05. Considering the complexity of population structure and genetic background, we performed a statistical analysis of each subpopulation by Student's t-tests. Eight, 7, 2, 1, 6 and 5 non-synonymous SNPs in japonica were significantly (P < 0.05) associated with SFH, TSH, ADSF, RDSF, CSF and TCS, respectively. Eight, 8, 1, 1, 1 and 1 non-synonymous SNPs showed significant associations with SFH, TSH, ADSF, RDSF, CSF and TCS in indica, respectively (Fig. 2).
There were four non-synonymous SNPs (Chr3_25519021, Chr3_25523316, Chr3_25525039 and Chr3_25525141) within the NOL gene, which encodes a chloroplast-localized short-chain dehydrogenase/reductase (SDR) with three transmembrane domains, and mutation in which produced an SG phenotype 18,43. These SNPs showed significant associations with four indices for CC and ACC (SFH_Jap., TSH_Jap., CSF_Jap., TCS_Full and TCS_Jap.) (Fig. 2). Allele C at Chr3_25519021, allele A at Chr3_25523316, allele T at Chr3_25525039, and allele A at Chr3_25525141 represented higher CC and more ACC in japonica (Figs 2 and S10). Six haplotypes, named NOL-1 to NOL-6, were identified based on the four non-synonymous SNPs in wild and cultivated rice (Fig. 3a). NOL-1 and NOL-2 were present in japonica and indica accessions, respectively, and both showed large genetic distances from other haplotypes (Fig. 3b). NOL-4 was predominant and shared across japonica, indica and wild rice. There were highly significant differences between NOL-1 and NOL-4 among SFH, TSH, CSF and TCS in japonica with −log(P) values of 2.35, 3.15, 3.36 and 5.34, respectively (Fig. 3c). In indica, there were clear differences in SFH, TSH, CSF and TCS between NOL-2 and NOL-4 (Fig. 3c). Accessions with the NOL-1 haplotype had higher CC and ACC than accessions having other haplotypes.
Three non-synonymous SNPs (Chr1_4133772, Chr1_4134499 and Chr1_4138234) in known genes around GWAS signals for CC were detected within SSG4, mutation of which affected the size of chloroplasts and amyloplasts and produced a variegated phenotype 44 . These SNPs showed significant associations with two CC indices in japonica (SFH_Jap., TSH_Full and TSH_Jap.) (Figs 2 and S11). Three haplotypes were present in cultivated and wild rice (Fig. 3d,e). Haplotype SSG4-1 was prevalent only in the japonica population, SSG4-2 was mostly present in japonica and wild rice, and SSG4-3 was detected in japonica, indica and wild rice. Highly significant differences were observed in SFH and TSH between SSG4-1 and SSG4-3 in japonica by one-way ANOVA (P < 0.01) (Fig. 3f).
For ACC, two non-synonymous SNPs (Chr3_6134135 and Chr3_6136209) were identified within OsFRDL1. Knockout of this gene resulted in leaf chlorosis 46. Both SNPs were significantly associated with two ACC indices in indica (CSF_Full, CSF_Ind., TCS_Full and TCS_Ind.) (Figs 2 and S13). Three haplotypes were present in cultivated and wild rice, and there was a large genetic difference between japonica and indica (Fig. 3j,k). Haplotype OsFRDL1-1 was present in japonica, and OsFRDL1-2 and OsFRDL1-3 predominated in indica. Significant differences in CSF and TCS were detected between OsFRDL1-2 and OsFRDL1-3 with −log(P) values of 7.08 and 9.12 in indica by Student's t-tests (Fig. 3l).
Two (Chr1_7024543 and Chr1_7027178) and one (Chr11_2514115) non-synonymous SNPs within NYC1 and YGL138(t), respectively, showed significant associations with CC or SG (Fig. 2). The NYC1 mutant is a stay-green mutant in which chlorophyll degradation during senescence is impaired 18, and the YGL138(t) mutant exhibits a distinct yellow-green leaf phenotype throughout development 47. There were two haplotypes within NYC1 and YGL138(t) in each subpopulation due to rare non-synonymous SNPs and obvious differentiation of indica and japonica. In conclusion, the 20 non-synonymous SNPs could be possible functional SNPs within six known genes responsible for natural variation in CC and SG; natural elite alleles/haplotypes were identified for the six known genes with larger effects on CC and SG.

Variation in OsSG1, a new locus for CC, SG and ACC.
We found an association region at 15-17 Mb on chromosome 7, in which lead SNPs at Chr7_15932240, Chr7_15821905, Chr7_16135435, Chr7_16135435, Chr7_16023159 and Chr7_15821905 in indica were associated with SFH, TSH, ADSF, RDSF, CSF and TCS with −log(P) values of 8.66, 7.60, 6.57, 5.14, 5.82 and 5.89, respectively (Figs 4a and S14). Lead SNPs at Chr7_16073851 and Chr7_16135435 in the full population were associated with SFH and ADSF with −log(P) values of 5.99 and 5.76, respectively (Fig. S14). By using pairwise LD correlations (r² > 0.6) 48,49, we estimated a candidate region from 15.8 Mb to 16.3 Mb (Fig. 4a). High r² values were detected among the four lead SNPs in indica (Fig. S15). The results suggested that there could be a single pleiotropic gene regulating CC, SG and ACC within the LD block.
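The pairwise LD criterion used to delimit the candidate region can be sketched in Python as follows. This is an illustration under simplifying assumptions: accessions are treated as inbred lines with biallelic genotypes coded 0 or 1 (with `None` for missing calls), and the function names and the simple "span of all SNPs above the cutoff" rule are hypothetical rather than the authors' exact procedure.

```python
def ld_r2(a, b):
    """r^2 between two biallelic SNPs coded 0/1 per accession (None = missing)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return float("nan")
    pa = sum(x for x, _ in pairs) / n            # frequency of allele 1 at SNP a
    pb = sum(y for _, y in pairs) / n            # frequency of allele 1 at SNP b
    pab = sum(1 for x, y in pairs if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return float("nan")                      # one SNP is monomorphic
    d = pab - pa * pb                            # LD coefficient D
    return d * d / denom


def candidate_region(lead_pos, r2_by_pos, cutoff=0.6):
    """Span covered by the lead SNP and all SNPs with r^2 above the cutoff."""
    linked = [pos for pos, r2 in r2_by_pos.items() if r2 > cutoff]
    return min(linked + [lead_pos]), max(linked + [lead_pos])
```

Two SNPs that always co-occur give r² = 1, and independent SNPs give r² = 0. The default cutoff of 0.6 mirrors the criterion quoted above.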
Stable expression of 20 of 70 annotated genes within the candidate region was detected in rice leaves (Table S4). By GO analysis of the 20 genes, we found candidate gene LOC_Os07g27790 encoding a protein with glutamate-cysteine ligase activity that participated in glutathione biosynthesis. This gene was predicted to be located in plastids. Metabolic pathway analysis using the KEGG system suggested that LOC_Os07g27790 could be involved in glutathione metabolism, together with known genes RNRS2, RNRS1, OsAPX2 and RNRL1 for chlorophyll content. According to these analyses, we suggest that LOC_Os07g27790, named as OsSG1, is an important candidate gene controlling multiple chlorophyll-related traits, including CC, SG and ACC.
To explore possible functional sequences within OsSG1 based on re-sequencing data, we investigated associations between six indices and non-synonymous SNPs as well as SNPs located in the 5′ flanking sequence (≤2 kb upstream of the open reading frame) of OsSG1. Three non-synonymous SNPs were identified but their MAFs were lower than 0.05 in indica (Fig. S16). Considering that these associated signals were detected in the indica subpopulation, we postulated that the three non-synonymous SNPs could not be the cause of the variation affecting CC and SG. Eighteen SNPs with MAF > 0.05 were detected in the promoter of OsSG1 in indica; 14 of them showed significant associations with at least one of the six indices for CC, SG and ACC (Fig. 4b); three adjacent SNPs in the set of 14 (Chr7_16219076, Chr7_16219244, Chr7_16219280) were associated with almost all indices for CC, SG and ACC (Fig. 4b). The haplotypes of OsSG1 were assembled using re-sequencing data of the three SNPs; three haplotypes were detected in indica, and all japonica accessions carried OsSG1-2 (Fig. 4c). Clear differences were observed in all six indices in indica between OsSG1-1 and OsSG1-2 and between OsSG1-1 and OsSG1-3 by one-way ANOVA (Fig. 4d). Varieties carrying OsSG1-1 showed higher CC (SFH = 40.6 and TSH = 122) and ACC (CSF = 69.7 and TCS = 191), but weaker stay-green capacity (ADSF = 11.6 and RDSF = 0.27) than indica varieties carrying OsSG1-2 or OsSG1-3. The results suggested the sequences in OsSG1 for maintaining protein function were highly conserved, and that phenotypic differences between the three haplotypes could be caused by the differences in expression level in indica.
Strong positive selection on genes related to CC and SG in japonica. In order to investigate the domestication history of genes related to CC and SG in indica and japonica, we made a phylogenetic analysis and signature identification of selection using 368 cultivated and 446 wild rice accessions.
According to the phylogenetic tree calculated from SNPs in 152 known genes and OsSG1, there was a distinct differentiation between japonica and indica (Fig. 5a). Japonica accessions were close to the Or-III (japonica-like wild rice) group from southern China, and indica accessions were close to Or-I (indica-like wild rice) (Fig. 5a). Thus the SNPs in japonica could be inherited from Or-III, whereas those in indica were from Or-I. Selective signal scans were performed within the CC and SG genes using the ratio of genetic diversity in wild rice to that in japonica and indica (πW/πJ and πW/πI), respectively (Table S5). Twenty-eight and 55 known genes showed high selective signals in indica (πW/πI > 3) and japonica (πW/πJ > 3), respectively. After considering the values of Tajima's D (Tajima's D < −2) of these genes in their respective subpopulations, we found that nine genes had been strongly positively selected in indica, whereas 43 genes were strongly selected in japonica (Table 2). By comparing the geographical areas of distribution of cultivated and wild rice we found that indica rice and wild rice were mainly distributed in low latitudes with short days and high light intensity, whereas japonica was far from its ancestral progenitor (Or-III), and distributed in areas with long days and low light intensity (Fig. 5b). We therefore suggest that genes controlling CC and SG in japonica rice were positively selected in the process of spreading from a subtropical origin to the temperate zone of North China.
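The two-step selection filter described above (a diversity ratio above 3, then Tajima's D below −2) amounts to a simple screen over per-gene statistics. The Python sketch below shows one way to express it; the record layout, the function name and the gene labels in the usage note are placeholders, not values from Table 2 or Table S5.

```python
import math


def strongly_selected(genes, ratio_cutoff=3.0, tajima_cutoff=-2.0):
    """Return names of genes passing both selection criteria.

    genes: list of dicts with keys 'gene', 'pi_wild', 'pi_cult' and
    'tajima_d' (hypothetical layout). Genes with zero cultivated diversity
    or an undefined Tajima's D are skipped rather than treated as selected.
    """
    selected = []
    for g in genes:
        if g["pi_cult"] <= 0 or math.isnan(g["tajima_d"]):
            continue
        ratio = g["pi_wild"] / g["pi_cult"]      # e.g. pi_W / pi_J
        if ratio > ratio_cutoff and g["tajima_d"] < tajima_cutoff:
            selected.append(g["gene"])
    return selected
```

Applied separately to the indica and japonica statistics, a screen of this kind would produce the two gene lists that are then compared between subspecies. Skipping genes with zero diversity in the cultivated group is a choice made for this sketch; the text does not say how such cases were handled.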

Discussion
Natural variation in 25 candidate genes has important roles in CC and SG. With development of functional genomics, high throughput genotyping and phenotyping technologies, more than 2,200 genes have been cloned and functionally identified in rice by forward or reverse genetic strategies. Based on those studies, molecular knowledge has been increasingly applied to the breeding of high yielding, superior-quality rice. This is considered to be a powerful strategy to meet the challenges of future crop breeding, particularly in pyramiding multiple complex traits 26 . Despite these research results the practice of breeding by molecular design is still difficult and requires more precise genetic dissection of agronomic traits and precisely identified chromosome haplotypes.
High throughput genotyping and GWAS provide strong support for determining the effect of known functional genes in natural populations and exploration of superior natural variation 25,50. In this study, we conducted GWAS using a diverse worldwide population of 368 rice accessions, followed by a comparison of the GWAS results with 152 known genes for CC or SG. Twenty-five known genes were located near GWAS signals, implying that these genes could be involved in genetic variation of CC or SG, and could be used in molecular breeding for high photosynthetic efficiency.
Gene function can be manipulated by alterations in expression level and protein sequence, and polymorphisms causing protein-coding differences are most likely to be important functional SNPs associated with target traits 48 . Based on high-density SNPs from the 3KRGP, we extracted 811 non-synonymous SNPs within known genes for CC or SG. After removing SNPs with MAF < 0.05, 20 non-synonymous SNPs within 6 of 25 genes (SSG4, NYC1, OsFRDL1, NOL, CHR729 and YGL138(t)) were associated with at least one of six indices, implying that the 20 SNPs could be real functional SNPs accounting for natural variation in CC or SG. The results of haplotype analysis using the 20 non-synonymous SNPs can provide guidance for pyramiding desirable alleles associated with CC and SG in molecular design of genotypes with high photosynthetic efficiency.

OsSG1 is a natural variant of CC and SG in indica.
One important finding in our study was that OsSG1 might be a major gene accounting for variation in CC and also controlling SG. In GWAS of indica, strong signals of six indices around OsSG1 suggested that there could be a pleiotropic gene regulating CC, SG and ACC in a single LD block. KEGG pathway analysis showed that OsSG1 was involved in glutathione metabolism, together with four known genes RNRS1, RNRS2, OsAPX2 and RNRL1 for CC or SG. In a previous study 16, mutants of RNRS1 and RNRL1 produced chlorotic leaves in a growth stage-dependent manner under field conditions, and yeast two-hybrid analysis showed that the interacting activities were RNRL1:RNRS1 > RNRL1:rnrs1 > rnrl1:RNRS1 > rnrl1:rnrs1, which correlated with the degree of chlorosis for each genotype 16. The activity of RNRL1 homolog RNRS2 could supplement RNRS1 activity in chloroplast biogenesis in developing leaves 16. OsAPX2 mutants had significantly lower CC than wild-type plants and over-expression increased CC to a level higher than in wild-type plants 51. Since these genes are involved in the regulatory mechanism of CC or SG, we suggest that further investigation of the glutathione metabolic network could help in genetic dissection of CC and SG.
Table 2. Summary of 43 and 9 genes that had undergone positive selection in japonica and indica, respectively. Columns: Chr., Gene, πW/πI, πW/πJ.
Genes for CC and SG have been subjected to positive selection in japonica. Asian cultivated rice is well known for its rich within-species diversity with two major subspecies, indica and japonica, and further subpopulation differentiation. Previous studies and this study show that CC and SG in japonica are significantly higher than in indica 25. However, the pathway of physiological change during domestication of distinct subpopulations remains unclear. Genetic analysis using well-characterized domestication loci indicated that japonica and indica were close to wild rice subpopulations Or-III and Or-I, respectively. Japonica was first domesticated from Or-III in southern China (Fig. 5a). Our phylogenetic tree using SNPs within genes for CC and SG is similar to those of well-characterized domestication loci, implying that higher CC and SG were important domestication traits. Selective signal scans showed that several genes were strongly positively selected in cultivated rice, especially in japonica (Table 2). Given the geographical distributions of japonica, indica and wild rice, higher CC and SG could have enabled japonica to adapt to higher latitudes with longer days and lower light intensities (Fig. 5b). However, the phylogenetic tree for each gene for CC and SG showed a distinct domestication pattern (Fig. 3). Among the NOL and SSG4 genes for chlorophyll content, the NOL-1 and SSG4-1 haplotypes for higher CC levels were detected only in japonica, implying that they were new mutations acquired during domestication of japonica. All haplotypes of CHR729 and OsFRDL1 were detected in wild rice, and CHR729-2 and OsFRDL1-1 were prevalent haplotypes in japonica whereas CHR729-1, OsFRDL1-2 and OsFRDL1-3 predominated in indica. Our results suggest that during domestication of japonica, the planting areas gradually extended from low latitudes to high latitudes along with the changes in light intensity and daylength. During this adaptation new natural mutations for higher CC and SG were preserved, and gradually accumulated along with natural elite variation from wild rice.