New evidence for grain specific C4 photosynthesis in wheat

The C4 photosynthetic pathway evolved to allow efficient CO2 capture by plants where carbon supply may be limiting, as in hot or dry environments, and explains the high growth rates of C4 plants such as maize. Important crops such as wheat and rice are C3 plants, which has motivated efforts to engineer them to use the C4 pathway. Here we show the presence of a C4 photosynthetic pathway in the developing wheat grain that is absent in the leaves. Genes specific for C4 photosynthesis were identified in the wheat genome and found to be preferentially expressed in the photosynthetic pericarp tissue (cross- and tube-cell layers) of the wheat caryopsis. The chloroplasts in these layers exhibit a dimorphism corresponding to that of the mesophyll- and bundle sheath-cell chloroplasts in leaves of classical C4 plants. Breeding to optimize the relative contributions of C3 and C4 photosynthesis may help adapt wheat to climate change, contributing to wheat food security.

One of the key biological innovations was the ability of organisms to use light as an energy source to generate chemical energy (ATP and NAD(P)H) for metabolism 1, a process known as photosynthesis 2. Six bacterial phyla are able to photosynthesize 3; five use anoxygenic photosynthesis with bacteriochlorophyll, and only one, the cyanobacteria, performs oxygenic photosynthesis with chlorophyll 4. Endosymbiosis of cyanobacteria gave eukaryotes the ability to photosynthesize through chloroplasts, a process that Charles Reid Barnes named "photosyntax" or "photosynthesis" in 1893 5. In higher plants, the chemical energy generated from light is used to synthesize organic compounds in the 'dark reactions' 6. Many photosynthetic pathways have been reported in higher plants 7. Four types, C3, C4, CAM (Crassulacean acid metabolism) and C3-C4 intermediates, are widely known, and C4-like (less advanced C4), C3-CAM and C4-CAM intermediates have also been reported. The capacity to use CO2 as a carbon source evolved in cyanobacteria around 3.5 billion years ago 8. The key enzyme of C3 photosynthesis, ribulose diphosphate carboxylase (RuBisCO), was reported to have evolved around the same time as cyanobacteria 9. The C4 pathway originated approximately 30 Mya (million years ago) 10 and was first described 50 years ago 11. It provides enhanced radiation-, water- and nitrogen-use efficiency 12, especially in sub-optimal environments 10,13.
Three classical C4 photosynthesis subtypes, NADP-ME (NADP-dependent malic enzyme), NAD-ME (NAD-dependent malic enzyme) and PEPCK (phosphoenolpyruvate carboxykinase), have been defined on the basis of the decarboxylation reaction involved 14. These pathways explain the high growth rates of C4 plants such as maize. Anatomical, biochemical and molecular evidence has commonly been used to distinguish C4 (sub)types from C3 types 15. Kranz anatomy, with reactions compartmentalized in different cell types, has been considered essential for C4 photosynthesis 16, but spatial compartmentalization within a single cell has been demonstrated more recently 17. The stems and petioles of C3 plants (tobacco and celery) were reported to carry out NAD-ME type C4 photosynthesis in cells surrounding the vascular bundles 18. Photosynthesis in cereal grains is less well defined. Ear photosynthesis in wheat contributes 10% to 44% of grain yield 19, and grain photosynthesis accounts for 33-42% of ear photosynthesis, depending on genotype and environment 20.
Wheat is a major food crop critical to global food security. The current increase in wheat production of around 1% per year is not keeping pace with the yield growth required to double crop production by 2050 21. The likely impact of climate change makes progress in wheat productivity more urgent. Because gains in grain yield from improving harvest index have plateaued 22, increasing total plant biomass through more efficient photosynthetic carbon capture is now crucial for improving wheat productivity. Plants with the C4 pathway are known to contribute 25% of total photosynthesis although they represent just 3% of species 10. Converting C3 crops to C4 could improve yield by 30% through improved water- and nitrogen-use efficiency 23. Engineering C3 food crops such as wheat and rice to use the C4 pathway has long been explored as a way to enhance global food security 24. We now report an analysis of the transcriptome of genes associated with C4 photosynthesis in the developing wheat grain. Transcripts of these genes were located in the genome and their sequences analysed to determine their likely C3 or C4 specificity. This allowed an evaluation of substantial new evidence for C4 photosynthesis in wheat grains.

Results
Remarkably, transcriptome analysis and functional annotation of genes expressed in developing wheat grains revealed the presence and expression of all genes specific to NAD-ME type C4 photosynthesis. Together with earlier evidence scattered through the literature, these findings suggest that a form of C4 photosynthesis functions specifically in the developing wheat grain. The transcriptome of the developing caryopsis from 35 diverse wheat genotypes (31 genotypes at 14 days post-anthesis (dpa) and 32 at 30 dpa, with 28 genotypes in common) was analyzed by RNA-Seq. Annotation of the genes differentially expressed between 14 and 30 dpa indicated the presence of NAD-ME type C4 photosynthesis during wheat grain development. This was unexpected, as wheat is a well-known C3 crop. Wheat genes involved in C4 photosynthesis, the number of copies expressed in developing wheat grains and their C4 specificity (based on cytological and evolutionary evidence) are listed in Table 1.
Molecular evidence. Phosphoenolpyruvate carboxylase (ppc) genes were localized to the long arms of wheat chromosomes 3 and 5. The mean expression of the chromosome 3 ppc across 31 genotypes at 14 dpa was 36.2 RPKM (Fig. 1A; sum of the A, B and D sub-genome copies), compared with only 0.29 RPKM in leaves 25 (Fig. 2A; mean of three growth stages, Z10, Z23 and Z71, with each value the sum of the three sub-genome copies), a 125-fold up-regulation in the developing wheat caryopsis. Conversely, the chromosome 5 ppc was up-regulated in leaves (Fig. 2A). C4 plants are well known to have less RuBisCO protein (reflected in transcript abundance) than C3 plants 26. Mean rbcS expression was 512.3 RPKM in the caryopsis at 14 dpa and 39,166 RPKM in leaves, a 76-fold down-regulation in the developing caryopsis. Taken together, the relative expression of ppc and rbcS differs by an enormous 9500-fold between the developing wheat caryopsis and leaves.
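The fold changes above follow directly from the quoted mean RPKM values. The short Python sketch below reproduces them as plain ratios; treating them as simple ratios with no pseudocount or log transformation is an assumption about how the figures were derived. It also shows that the shift in the ppc:rbcS ratio is the product of the two fold changes, roughly 9,540 before rounding.

```python
# Mean RPKM values quoted in the text (sum over the A, B and D sub-genome copies).
PPC_GRAIN_14DPA = 36.2    # ppc (chromosome 3), caryopsis at 14 dpa
PPC_LEAF = 0.29           # ppc, leaf mean over Z10, Z23 and Z71
RBCS_GRAIN_14DPA = 512.3  # rbcS, caryopsis at 14 dpa
RBCS_LEAF = 39166.0       # rbcS, leaf


def fold_change(numerator: float, denominator: float) -> float:
    """Plain ratio of two expression values (no pseudocount, no log transform)."""
    return numerator / denominator


ppc_up = fold_change(PPC_GRAIN_14DPA, PPC_LEAF)        # grain relative to leaf
rbcs_down = fold_change(RBCS_LEAF, RBCS_GRAIN_14DPA)   # leaf relative to grain

# The ppc:rbcS ratio in grain divided by the same ratio in leaf equals the
# product of the two fold changes above.
ratio_shift = fold_change(PPC_GRAIN_14DPA / RBCS_GRAIN_14DPA, PPC_LEAF / RBCS_LEAF)

print(f"ppc up-regulation in grain:    {ppc_up:.0f}-fold")
print(f"rbcS down-regulation in grain: {rbcs_down:.0f}-fold")
print(f"shift in ppc/rbcS ratio:       {ratio_shift:.0f}-fold")
```

The text's 9500-fold figure is the product of the rounded values (125 x 76).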
Aspartate aminotransferase (aat; also known as got) is the most up-regulated of the six C4 pathway genes in the developing wheat caryopsis. It is also the most up-regulated gene when leaf tissues of C4 and C3 plants are compared 26. Of the six copies of the aat gene in wheat (in each sub-genome), only two are of the C4 type (cytoplasmic aat1 on 3L and mitochondrial aat2 on 7L). RNA-Seq analysis showed that these genes were up-regulated at 14 dpa in the developing caryopsis (Fig. 1B) relative to leaves (Fig. 2B) 25.
Two copies of the malate dehydrogenase (mdh) gene were localized, across the three sub-genomes, to the long arm of chromosome 1 (cytoplasmic, mdh1) and the short arm of chromosome 5 (mitochondrial, mdh2). The chromosome 1 copy was expressed differently (Figs 1C and 2C) from the chromosome 5 copy in both grain and leaf tissues 25. The mitochondrially targeted mdh2 gene on chromosome 5 is likely to be involved in C4 photosynthesis.
Two copies of the gene encoding NAD-dependent malic enzyme (me2), targeted to the chloroplast and the mitochondria, were localized to chromosomes 1 and 2, respectively. The mitochondrially targeted copy (chromosome 2) supports C4 photosynthesis, converting malate into pyruvate and releasing CO2 for refixation through the C3 cycle 15. The mitochondrial isoform was up-regulated in the developing wheat caryopsis (Fig. 1D), whereas the plastidic isoform was up-regulated in leaves (Fig. 2D) 25.
Two alanine aminotransferase (gpt) genes were localized to the short arms of chromosomes 2 and 5 of hexaploid wheat. In the classical NAD-ME type C4 pathway, this cytoplasmic enzyme converts pyruvate to alanine in bundle sheath cells and alanine back to pyruvate in mesophyll cells 14. Both genes were expressed at similar levels in the developing wheat caryopsis at 14 dpa (Figs 1E and 2E), whereas the chromosome 2 gene was more highly expressed in leaves 25.
The pyruvate, orthophosphate dikinase (ppdk) gene was localized to the long arm of chromosome 1 in hexaploid wheat. All four gene copies (although no full-length sequence was available) were used to assess RPKM expression in the developing caryopsis at 14 dpa (Fig. 1F) and in leaf tissue (Fig. 2F) 25. Earlier reports showed that a dual promoter regulates a single ppdk gene copy, giving expression in the chloroplast in the light and in the cytoplasm in the dark, with the second promoter, located in the first intron, driving cytoplasmic expression 27. Aoyagi and co-workers showed the presence of PPDK and RuBisCO in the green pericarp but did not consider the possibility of C4 photosynthesis because developing wheat grains lack Kranz anatomy 28.
Six genes (excluding carbonic anhydrase) are involved in the NAD-ME type C4 pathway: phosphoenolpyruvate (PEP) carboxylase (ppc), aspartate aminotransferase (aat, also known as got), malate dehydrogenase (mdh), NAD-dependent malic enzyme (me2), alanine aminotransferase (gpt), and pyruvate, orthophosphate dikinase (ppdk) 15. Grain-specific expression of all six genes in each of the A, B and D sub-genomes (Fig. 1) points to a possible evolutionary diversification point well before the speciation of the diploid progenitors in the Triticeae tribe. Endosperm and aleurone transcriptomes 29 do not include all of these genes, indicating that the C4 pathway is restricted to the wheat pericarp.
Varied expression pattern between wheat genotypes. The presence of all C4-specific genes in the genome suggests that natural selection may already have explored the options being considered by plant breeders 30. Expression levels of all six NAD-ME type C4 pathway genes at 14 dpa varied across the 31 genotypes (Fig. 3), suggesting potential for genetic selection for this trait in wheat breeding.
C4 specificity of gene sequences. For four of the six genes involved in NAD-ME type C4 photosynthesis (aat, mdh, me2 and ppdk), sub-cellular targeting indicates C4-type specificity 15. For the other two (ppc and gpt), sequence information is required to distinguish the copies specific to the C3 or C4 pathway. Analysis of the gpt genes in wheat suggested that the C3 and C4 forms were expressed at similar levels (Figs 1E and 2E) across photosynthetic and non-photosynthetic tissues. Although the ppc gene copies clearly show different expression patterns in developing grains and leaves (Figs 1A and 2A), only sequence differences can distinguish the C3 and C4 isoforms. Specific amino acid substitutions have been associated with C4 functionality 13: increased tolerance to feedback inhibition by malate involves G884 (glycine) in C4 isoforms, in place of the R884 (arginine) found in C3 isoforms. The translated sequences of the wheat ppc cDNAs (International Wheat Genome Sequencing Consortium, IWGSC, release-23) carry S885 on chromosome 3 and R891 on chromosome 5, indicating that the chromosome 5 copy is C3-type whereas the chromosome 3 copy is non-C3-type. Gene sequences from wheat and related species 31 were then analyzed using the translated amino acid sequence of the chromosome 3 ppc gene (IWGSC cDNA database, release-23). Most members of the Triticeae tribe have five copies of the ppc gene (Table 2), although only two copies (3L and 5L) were found in the hexaploid wheat cDNA database.
Tblastn analysis of the IWGSC cDNA database (release-23) 31 with the translated ppc sequence showed that the copies on chromosomes 3S, 6 and 7 are not in frame, suggesting insertions or deletions in these genes. In every Triticeae member studied, one ppc copy carried S885, a non-C3-type residue, whereas the other four copies carried the C3-type R at the corresponding position (R891) (Table 2). Because S is neither the R of C3 isoforms nor the G of C4 isoforms, we examined species representing diversification points in the evolution of these species and compared them with known C4 types (Table 3). From bryophytes to angiosperms, the C3-type position was invariably conserved as R (Table 3), whereas the C4-type position was S (Panicum and the Triticeae tribe), Q (Alloteropsis, Setaria), G (Alloteropsis, Panicum, Zea and Sorghum) or I (Amaranthus), depending on the species or taxonomic group (Table 3).
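The residue numbers quoted above (884, 885, 891) differ because each sequence has its own numbering, so the same functional site must be located through an alignment. The Python sketch below assumes the sequences have already been aligned by some external tool and shows one way to map a position in the ungapped reference onto an aligned query, then label the residue as in the text (R as C3-type, anything else as non-C3). The function names and toy sequences are hypothetical and are not real PPC sequences.

```python
from typing import Optional, Tuple


def residue_at_reference_position(
    ref_aln: str, query_aln: str, ref_pos: int
) -> Tuple[str, Optional[int]]:
    """Map a 1-based position in the ungapped reference onto an aligned query.

    Returns the query residue in the same alignment column and its 1-based
    position in the ungapped query ("-" and None if the query has a gap there).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    ref_count = 0
    query_count = 0
    for ref_char, query_char in zip(ref_aln, query_aln):
        if query_char != "-":
            query_count += 1
        if ref_char != "-":
            ref_count += 1
            if ref_count == ref_pos:
                if query_char == "-":
                    return "-", None
                return query_char, query_count
    raise ValueError("ref_pos is beyond the end of the reference sequence")


def classify_malate_site(residue: str) -> str:
    """Label the residue at the malate-tolerance site as described in the text."""
    if residue == "R":
        return "C3-type"  # arginine, conserved in C3 isoforms
    if residue == "-":
        return "gap"
    return "non-C3"       # e.g. G, S, Q or I, as seen in known C4 isoforms


# Toy alignment (hypothetical sequences): reference residue 3 (R) sits in a
# column where the query carries S, at query position 4 because of an insertion.
residue, query_pos = residue_at_reference_position("MA-RK", "MAQSK", 3)
print(residue, query_pos, classify_malate_site(residue))  # S 4 non-C3
```

The one-residue insertion in the toy query is what shifts the numbering, the same kind of offset that separates positions 884 and 885 in the text.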

Discussion
Wheat is widely known as a classical C3 plant. However, close examination of the literature reveals many reports of elements of the case for C4 photosynthesis in the grain, especially in early studies. This evidence has been overlooked because C3 photosynthesis was known to operate in the leaves and the possibility of different pathways in different parts of the plant was not appreciated. Indeed, many studies attempted to explain away evidence that did not fit with wheat being a C3 plant. This study has identified, for the first time, a complete set of C4-specific genes in the wheat genome. This finding addresses the apparent anomaly of the Pooideae subfamily of the Poaceae being seen as uniquely lacking C4 photosynthesis. We have also shown for the first time that all the required genes are expressed in the required compartments, specifically in the pericarp, a tissue whose anatomy is suited to supporting a C4 pathway. The possibility of photosynthesis in the pericarp of wheat grains was predicted in the early 1960s 32. Phosphoenolpyruvate carboxylase (PPC) from the pericarp of developing wheat or barley grains was reported to be 50-100 times as active in carbon fixation as ribulose diphosphate carboxylase (RuBisCO) 33. On the basis of malate dehydrogenase, malic enzyme and pyruvate-orthophosphate dikinase activities in the pericarp of developing grains, Duffus and Rosie 33 suggested the possibility of C4 photosynthesis. Shortly afterwards, Wirth et al. 34 studied reproductive parts of wheat and oat (glume, lemma, palea and pericarp) alongside leaves and reported that the pericarp of developing grains seemed to "possess carbon metabolism different to that of the other tissues". They also analyzed and reported the possibility of refixation of CO2 released through respiration or photorespiration.
Assimilation of 14CO2 into malate in wheat ears and into 3-phosphoglyceric acid in the flag leaf, together with higher activities of C4 pathway enzymes in ears and of C3 pathway enzymes in the flag leaf, suggested the possibility of C4 photosynthesis in ears 35. Carbon isotope discrimination (Δ) values have been used to distinguish C3 from C4 plants 36. In wheat, regarded as a C3 plant, Δ values have instead been used to study water-use or transpiration efficiency 37,38. These studies show a clear difference between flag leaf and grain Δ values across wheat genotypes, although the difference is not as distinct as with classical C4 photosynthesis. This might be due either to inefficient, less advanced C4-type photosynthesis or to the fact that grain photosynthesis accounts for only 33-42% of ear photosynthesis 20, with the remainder translocated from leaf or stem tissues with C3-type photosynthesis, reducing the difference in Δ between flag leaf and grain to a marginal level. Conversely, in maize, a C4 plant, the husk leaves were reported to be C3-type, with Δ values marginally higher than those of the leaves 39.
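The dilution argument can be made concrete with the standard definition of carbon isotope discrimination, Δ = (δa - δp) / (1 + δp/1000), with δ values in per mil. The Python sketch below assumes simple linear mixing of δ13C from a C4-like source and a C3 source; the source values (about -27 per mil for C3 carbon, -13 per mil for C4 carbon, -8 per mil for air) are round illustrative numbers, not measurements from this study, and the mixing fractions are hypothetical. The 0.33 and 0.42 shares are borrowed from the ear-photosynthesis range only to show the trend.

```python
def isotope_discrimination(delta_air: float, delta_plant: float) -> float:
    """Carbon isotope discrimination (per mil) from delta13C values (per mil):
    Delta = (delta_a - delta_p) / (1 + delta_p / 1000)."""
    return (delta_air - delta_plant) / (1.0 + delta_plant / 1000.0)


def bulk_discrimination(
    fraction_c4: float,
    delta_c4_source: float,
    delta_c3_source: float,
    delta_air: float = -8.0,  # assumed round value for atmospheric CO2
) -> float:
    """Delta of tissue whose carbon is a linear mix of two sources.

    fraction_c4 is a hypothetical share of carbon fixed by the C4-like route;
    mixing is done on delta13C values before converting to Delta.
    """
    if not 0.0 <= fraction_c4 <= 1.0:
        raise ValueError("fraction_c4 must lie between 0 and 1")
    delta_mix = fraction_c4 * delta_c4_source + (1.0 - fraction_c4) * delta_c3_source
    return isotope_discrimination(delta_air, delta_mix)


# Illustrative (not measured) source values: -13 per mil for C4 carbon and
# -27 per mil for C3 carbon.
for f in (0.0, 0.33, 0.42, 1.0):
    print(f"C4 share {f:.2f}: Delta = {bulk_discrimination(f, -13.0, -27.0):.1f} per mil")
```

Under these assumptions, a minority C4-like contribution moves grain Δ only part of the way from the C3 value toward the C4 value, which is the direction of the effect described above.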
Despite this evidence (enzyme activities, 14CO2 in malate, Δ values), earlier researchers did not explore C4 photosynthesis in wheat grains because Kranz anatomy was thought to be required for C4 photosynthesis 16,28. In 17,18.
In the late 1990s, the C4 pathway was reported to be used selectively at different developmental stages in some plants (Salsola spp. and Haloxylon spp.), with cotyledons and leaves exhibiting C3- and C4-type photosynthesis, respectively 40,41. Similarly, selective use of the C4 pathway has been reported under different environments, such as terrestrial versus submerged conditions 42,43 or high versus low CO2 44. Selective expression of the C3 pathway was reported in the husk leaves (hypsophylls) of maize 45, which is otherwise a C4 plant. Evidence of four-carbon compounds specifically in wheat leaf bases 46 agreed with a much later report of C4 pathways in C3 plants 18. Altered C4 and C3 enzyme activities have recently been reported in wheat ears under water stress 47, but their significance was unclear given the C3 status of wheat. This evidence suggests a diverse range of regulatory patterns for the C4 pathway in plants in response to ontogeny and varied environmental cues. Operating different pathways (C3 or C4) at different growth stages allows wheat a life cycle that extends across seasons with varying environments (cool and wet during vegetative growth; hot and dry during grain filling).
Molecular and cytological evidence. Functional annotation and differential expression of C4-specific copies (Figs 1 and 2) of the NAD-ME type pathway genes specifically in developing wheat grains add evidence for the C4 pathway in wheat grains, as suggested in early reports 34,35. When a genome carries multiple gene copies, species preferentially co-opt the same neo-functionalized gene lineage for C4 photosynthesis 48, although these genes were present, with different anaplerotic functions, well before the C4 pathway evolved 49.
Cross- and tube-cells in the pericarp of developing wheat grains have been reported to be photosynthetic and to contribute to grain weight 50. Thorough re-examination of this report shows numerous mitochondria and dimorphic chloroplasts, with stacked grana in cross-cells and reduced stacking in tube-cells, structurally similar to classical C4 types 51. Numerous mitochondria have also been reported specifically in the bundle sheath cells of the NAD-ME type C4 pathway 52. This cytological evidence, together with our molecular evidence, suggests that NAD-ME type C4 photosynthesis operates in developing wheat grains (Fig. 4), with cross- and tube-cells paralleling the mesophyll and bundle sheath cells of the classical C4 pathway. In contrast to classical C4 photosynthesis, in which mesophyll cells contain few or no starch granules 53, Morrison 50 reported starch granules in the (mesophyll-like) cross-cells. This led us to question whether NAD-ME type C4 photosynthesis operates in wheat grains. However, RuBisCO has been found in both mesophyll and bundle sheath cells of young amaranth leaves 54, suggesting a C3 cycle in both cell types. In some Flaveria spp., RuBisCO in both mesophyll and bundle sheath cells, supporting the C3 and C4 cycles simultaneously, led to their classification as having C4-like type (less advanced) photosynthesis.
type (less advanced) photosynthesis in developing wheat grains through the cross- and tube-cell layers of the pericarp, paralleling the mesophyll and bundle sheath cells of classical C4 photosynthesis (Fig. 4).
Taxonomical and evolutionary evidence. Around 41% of grasses are known to fix carbon through the C4 pathway 56. An overview on an evolutionary scale, linking speciation events with C4 photosynthesis, might therefore shed light on the evolution of the C4-like photosynthetic pathway in developing wheat grains. The Poaceae family is monophyletic and consists of 12 subfamilies: three at the base, followed by the BOP and PACMAD clades with three and six subfamilies, respectively; the Triticeae tribe belongs to the subfamily Pooideae (cool-season grasses) of the BOP clade 56. To date, no species from the Pooideae has been reported to be C4. The Aristidoideae (PACMAD) subfamily has been reported to have at least two independent origins of the C4 pathway 56. In this study, knowledge of specific amino acids in the ppc gene product (G in C4 and R in C3) required for efficient carbon fixation by the C4 pathway 13 was used to show that wheat and all related species examined (including Hordeum and Brachypodium) have five ppc copies in their genomes. Four of these carry R, indicating their C3 nature, while one copy (3L, as in hexaploid wheat) carries S, a non-C3 residue, in place of R in all species except Brachypodium (Table 2). The C3-specific residue (R) has apparently been conserved (Table 3) from bryophytes (around 450 Mya) to angiosperms. The residue associated with C4 specificity appears to have evolved at least four times in the last 30 million years (since the origin of C4), as S (Panicum and the Triticeae tribe), Q (Alloteropsis, Setaria), G (Alloteropsis, Panicum, Zea and Sorghum) or I (Amaranthus). This suggests that different amino acid substitutions at this site might confer differing efficiencies of carbon fixation through the C4 pathway by altering tolerance to feedback inhibition by malate 13. Enzyme kinetic analysis of each of the four C4-specific amino acids individually might help to rank their photosynthetic efficiency.
However, even the weakest of the four will probably be much more efficient in carbon fixation than the C3 type (R). The presence in wheat of an amino acid specific to a known C4 type (S in Panicum laetum and P. miliaceum) is strong evidence when taken together with the grain-specific expression of the C4-specific ppc gene (Fig. 2A). In the tribe Brachypodieae (Brachypodium distachyon), all five copies carry the C3-type residue, whereas members of the tribe Triticeae (Aegilops, Hordeum, Triticum) have one C4-type copy and four C3-type copies (Table 2). This fits the evolutionary timeline of C4 photosynthesis, which originated around 30 Mya 10, with the Brachypodieae, which evolved around 35 Mya 57, having only C3-type genes (Brachypodium). Unfortunately, no diversification points are available between Brachypodium (35 Mya) and Hordeum (11.6 Mya) 57 to establish the exact timing of C4 evolution in the Pooideae subfamily. Derived traits, such as those associated with C4 photosynthesis, appear at later evolutionary stages and are expressed later in plant development 58. This is consistent with the observation of C4 photosynthesis specifically in the grain of wheat and its relatives.
Based on this molecular, cytological, taxonomical and evolutionary evidence, we propose that C4-like type photosynthesis occurs specifically in developing wheat grains. Recognizing both C3 and C4-like photosynthetic pathways in wheat provides a basis for interpreting wheat performance as a crop adapted to maturing in hot, dry environments, and suggests that the plant may rely more on C4 photosynthesis under water stress during grain filling. Photosynthates from the pericarp, glumes and awns are critical 59 when other parts of the plant lose photosynthetic capacity because of terminal drought, which is often experienced in the environments in which wheat evolved. This may be especially important in developing wheat varieties adapted to climate change 60 and the associated temperature extremes. C4 photosynthesis operating specifically in these tissues provides an adaptive advantage to the wheat plant, while C3 photosynthesis is adequate during early vegetative growth under more temperate conditions. Given that the entire pathway is already expressed in the grain, genetic manipulation to extend C4 photosynthesis throughout the wheat plant seems much more realistic. This supports the view that plant species have evolved specific photosynthetic pathways in different organs, at specific developmental stages and in different environments, suggesting that plants cannot be broadly classified as C3, C4 or CAM on the basis of leaf anatomy alone. Research to establish the variation in flux through this pathway in wheat and its progenitors will shed much light on the shares of carbon fixation through the C3 and C4 pathways under varying environmental conditions. This has the potential to suggest new options for developing higher-yielding wheat genotypes.

Methods
58. Developing grains were collected from wheat spikes at 14 and 30 days post-anthesis (dpa) as described elsewhere 58.
RNA isolation, library preparation and NGS sequencing. RNA isolation, cDNA synthesis, library preparation and next-generation sequencing were carried out as described by Furtado et al. 58. Libraries were prepared and sequenced for 31 samples at 14 dpa and 32 samples at 30 dpa, with 28 genotypes in common. Libraries were not prepared for four cultivars (NW-93A, NW-108A, Pelada and Vega) at 14 dpa or for three cultivars (Greece-25, NW-25A and NW-51A) at 30 dpa because of insufficient starting material.
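The sample counts in this design are internally consistent: of 35 genotypes, four were missing at 14 dpa and three different ones at 30 dpa, leaving 31 and 32 genotypes at the two stages, with 28 shared. The Python sketch below checks this with sets; the 28 genotypes not named in this section are represented by hypothetical placeholder IDs.

```python
N_GENOTYPES = 35
missing_14dpa = {"NW-93A", "NW-108A", "Pelada", "Vega"}
missing_30dpa = {"Greece-25", "NW-25A", "NW-51A"}

# Hypothetical placeholder IDs for the remaining genotypes (not named here).
n_placeholders = N_GENOTYPES - len(missing_14dpa | missing_30dpa)
genotypes = {f"genotype_{i:02d}" for i in range(1, n_placeholders + 1)}
genotypes |= missing_14dpa | missing_30dpa

sampled_14dpa = genotypes - missing_14dpa   # libraries prepared at 14 dpa
sampled_30dpa = genotypes - missing_30dpa   # libraries prepared at 30 dpa
shared = sampled_14dpa & sampled_30dpa      # genotypes sampled at both stages

print(len(genotypes), len(sampled_14dpa), len(sampled_30dpa), len(shared))  # 35 31 32 28
```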
Sequencing data processing and analysis. Sequencing data were imported into CLC Genomics Workbench ver. 7.0.4 (CLC Bio-Qiagen, Denmark), and further processing and analysis were done within this environment unless otherwise stated. Quality checking, trimming and RNA-Seq analysis were performed as described in Furtado et al. 58.
Differential transcript and statistical analyses. Transcripts differentially expressed between 14 and 30 dpa were analyzed using the RNA-Seq experimentation tool with default parameters. Statistically significant differentially expressed transcripts were identified using both Gaussian (mean-based) and Empirical analysis of Differential Gene Expression (EDGE, count-based) statistics in CLC Genomics Workbench (CLC Bio-Qiagen, Denmark), with the false discovery rate (FDR)-corrected p-value threshold set at 0.01.
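FDR correction turns a list of raw p-values into adjusted values that are then compared with the 0.01 threshold. The Python sketch below assumes the widely used Benjamini-Hochberg procedure; the exact correction implemented in CLC Genomics Workbench is not stated here and may differ, and the helper names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(p_values: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Flag tests whose FDR-adjusted p-value is at or below alpha."""
    return [q <= alpha for q in benjamini_hochberg(p_values)]


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In the study, a transcript had to pass this kind of threshold under both the Gaussian and the EDGE statistics.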
Functional annotation and data mining. In total, 26,477 transcripts were significant under both the Gaussian and EDGE statistics at an FDR-corrected threshold of 0.01. Of these, 319 and 181 transcripts were unique to 14 dpa and 30 dpa, respectively, while 16,237 and 9,740 transcripts were up-regulated at 14 dpa and 30 dpa, respectively; together these four groups account for all 26,477 transcripts. Transcript sequences for the four groups (unique 14 dpa, unique 30 dpa, differential 14 dpa and differential 30 dpa) were extracted from the reference database (TaGI) and subjected to blastx analysis against the non-redundant protein database. The BLAST results were converted to a BLAST2GO project file and exported as ".dat" files using the plug-in within CLC Genomics Workbench (CLC Bio-Qiagen, Denmark). Functional annotation of the four groups was performed independently using BLAST2GO Pro ver. 3.0.10 with default parameters 62. Annotations were augmented using InterProScan, followed by the Run-annex option. Annotations relevant to plants were retained using the GO (gene ontology)-slim option. Finally, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway maps for the four annotated groups were retrieved using the GO-enzyme code mapping option. The differential 14 dpa group revealed a complete C4 photosynthetic pathway in the developing wheat caryopsis.
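The four-way split described above can be sketched as a simple classification rule. The Python example below is a hypothetical illustration: it keeps only transcripts significant under both statistics and assumes that "unique" means expression detected at one stage only, while "differential" means expressed at both stages and higher at one. The actual criteria applied in CLC Genomics Workbench are not given here, and the record layout and names are assumptions.

```python
from typing import Dict, List, NamedTuple, Optional


class TranscriptResult(NamedTuple):
    q_gaussian: float  # FDR-corrected p-value, mean-based (Gaussian) test
    q_edge: float      # FDR-corrected p-value, count-based EDGE test
    rpkm_14: float     # expression at 14 dpa
    rpkm_30: float     # expression at 30 dpa


def assign_group(r: TranscriptResult, alpha: float = 0.01) -> Optional[str]:
    """Place a transcript in one of the four groups, or None if it is excluded."""
    if r.q_gaussian > alpha or r.q_edge > alpha:
        return None  # must be significant under both statistics
    if r.rpkm_14 == 0 and r.rpkm_30 == 0:
        return None  # not expressed at either stage
    if r.rpkm_30 == 0:
        return "unique_14dpa"
    if r.rpkm_14 == 0:
        return "unique_30dpa"
    # Ties (equal expression) are assigned to 30 dpa in this sketch.
    return "differential_14dpa" if r.rpkm_14 > r.rpkm_30 else "differential_30dpa"


def partition(results: Dict[str, TranscriptResult], alpha: float = 0.01) -> Dict[str, List[str]]:
    """Group transcript IDs by assign_group, dropping excluded transcripts."""
    groups: Dict[str, List[str]] = {}
    for transcript_id, r in results.items():
        group = assign_group(r, alpha)
        if group is not None:
            groups.setdefault(group, []).append(transcript_id)
    return groups


# The published group sizes add up to the number of jointly significant transcripts.
assert 319 + 181 + 16237 + 9740 == 26477
```

Because each transcript receives at most one label, the groups cannot overlap, consistent with the four published counts summing to 26,477.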
Chromosomal localization and IWGSC transcript retrieval. On the basis of enzyme code mapping, TaGI transcript IDs corresponding to the enzyme codes (EC numbers) of genes in the C4 photosynthetic pathway were retrieved from the differential 14 dpa group using CLC Genomics Workbench (CLC Bio-Qiagen, Denmark). In total, 62 transcripts were retrieved for the six genes (phosphoenolpyruvate carboxylase, ppc; aspartate aminotransferase, aat; malate dehydrogenase, mdh; NAD-dependent malic enzyme, me2; alanine aminotransferase, gpt; and pyruvate, orthophosphate dikinase, ppdk). Because the TaGI transcripts are shorter and mostly incomplete, the 62 TaGI transcripts were used in BLAST searches against the IWGSC cDNA database (release-23; 100,717 sequences) 31 to retrieve the corresponding IWGSC transcripts.
Scientific Reports | 6:31721 | DOI: 10.1038/srep31721
Modified reference and RNA-Seq analysis. BLAST analyses with the 62 TaGI transcripts as queries identified 55 transcripts in the IWGSC cDNA database (release-23) 31. Using sub-genome sequence information and sequence alignment, 10 of the 55 transcripts were found to represent only five genes, each pair being two parts of the same transcript, with or without overlap. On the basis of homology and sequence alignment between the sub-genome copies, these 10 transcripts were stitched into five, giving a total of 50 transcripts for the six genes of NAD-ME type C4 photosynthesis. To construct a modified reference, the 10 fragment transcripts were replaced by the five stitched transcripts among the 100,717 sequences of the IWGSC cDNA database (release-23) 31. The resulting database of 100,712 sequences, named "modified IWGSC cDNA (release-23)", was used for RNA-Seq analysis as described above 58 to obtain RPKM values for the 31 genotypes at 14 dpa. Although FPKM is usually reported for paired-end reads, we used RPKM, counting each mapped paired-end read as "two" and each mapped singleton read as "one", to avoid confusion between the FPKM and RPKM terminologies.
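RPKM scales the reads mapped to a transcript by transcript length (per kilobase) and library size (per million mapped reads): RPKM = reads x 10^9 / (length in bp x total mapped reads). The Python sketch below applies the counting rule described above, a mapped pair counting as two reads and a mapped singleton as one; the function names and the example numbers are hypothetical.

```python
def mapped_read_count(mapped_pairs: int, mapped_singletons: int) -> int:
    """Count reads as described: a mapped pair counts as two, a mapped singleton as one."""
    return 2 * mapped_pairs + mapped_singletons


def rpkm(transcript_reads: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return transcript_reads * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 mapped pairs and 100 singletons on a 2,000 bp
# transcript, in a library with 20 million mapped reads.
reads = mapped_read_count(500, 100)
print(reads, rpkm(reads, 2000, 20_000_000))  # 1100 27.5
```

For the values to be comparable across samples, the library total passed as total_mapped_reads must be counted with the same pair-as-two rule.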
RNA-Seq analysis of tissue-specific transcriptome data. Raw reads (100 bp paired-end, Illumina HiSeq2000) from different tissues (leaf and grain) at three growth stages of hexaploid wheat ('Chinese Spring') are available online 25. These raw reads were downloaded and processed in CLC Genomics Workbench (CLC Bio-Qiagen, Denmark). Quality checking, trimming and RNA-Seq analysis against the modified IWGSC cDNA (release-23) reference of 100,712 sequences were performed to obtain RPKM values, which were then presented graphically.
Taxonomical and evolutionary relation for C4 specificity. Specific amino acid positions in PPC (PEPCase) that are functionally related to C3 or C4 specificity were reported recently 13. To identify these positions in wheat and related species (Table 2), whole-genome sequences 63 were downloaded and translated BLAST analysis was performed using CLC Genomics Workbench ver. 8.5.1 (CLC Bio-Qiagen, Denmark).
Similar analyses were performed for species with available genome sequences, including taxa from bryophytes to angiosperms 31,64,65 representing various diversification points along an evolutionary timeline (Table 3). Where whole-genome data were not available for some well-known C4 species, ppc gene sequences from public databases were used to study the evolutionary pattern at the specific amino acid positions functionally related to C3 or C4 specificity (Table 3).