Deposition of lignin in four species of Saccharum

We used primers designed on conserved gene regions of several species to isolate the most expressed genes of the lignin pathway in four Saccharum species. S. officinarum and S. barberi have more sucrose in the culms than S. spontaneum and S. robustum, but fewer polysaccharides and less lignin in the cell wall. S. spontaneum and S. robustum had the lowest S/G ratio and a lower rate of saccharification in mature internodes. Surprisingly, except for CAD, 4CL, and CCoAOMT, for which we found three, two, and two genes, respectively, only one gene was found for each of the other enzymes, and their sequences were highly similar among the species. S. spontaneum had the highest expression for most genes. CCR and CCoAOMT B presented the highest expression; 4CL and F5H showed increased expression in mature tissues; C3H and CCR had higher expression in S. spontaneum, and one of the CADs isolated (CAD B) had higher expression in S. officinarum. The similarity among the most expressed genes isolated from these species was unexpected and indicated that lignin biosynthesis is conserved in Saccharum, including commercial varieties. Thus, the control of lignin biosynthesis in sugarcane may be fully understood only with knowledge of the promoter region of each gene.

Soluble and insoluble lignin. In the four species the highest soluble lignin contents were found in new internodes (Fig. 4A), and S. officinarum had the highest content, which was equal among the others. S. officinarum also showed the highest content in mature culms. On the other hand, insoluble lignin content was higher in mature internodes in the four species (Fig. 4B), and in the two internode stages analyzed the highest contents were found in S. spontaneum and S. robustum.

Saccharification yield. A similar pattern could be observed in the saccharification yield of S. barberi and S. officinarum, and of S. robustum and S. spontaneum, constituting two distinct groups (Fig. 5A). While in the first two species the saccharification yields of the two internode stages were similar, they were quite different in the second group. Saccharification in mature internodes of S. spontaneum and S. robustum was nearly half that of young internodes. In general, the percentage of saccharification in young culms was similar in the four species, around 65%.

S/G ratio of lignin. S/G ratio was higher in new internodes than in mature internodes of S. robustum and S. barberi but did not differ in the other two species (Fig. 5B). When comparing only new internodes, the first two

Profile of soluble lignin oligomers. Soluble lignin monomers and oligomers were identified through comparison with data from a library 15, using retention times, m/z ratio, and MS/MS fragmentation pattern. We found linkage structures belonging to the groups β-aryl ether (8-O-4), phenylcoumaran, and resinol, and it was possible to identify the S aromatic unit involved in each of these linkage structures (Table 1). In the species investigated we identified 11 structures: one sinapyl aldehyde, one monolignol (S), four dimers, and five trimers. Two of the dimers and two of the trimers presented 8-5 linkages, with G monomers, which are more recalcitrant linkages. The other linkages present in the trimers (8-O-4)
Figure 6 shows that, irrespective of internode age, S. spontaneum and S. robustum have the greatest diversity and frequency of oligomers compared with S. officinarum and S. barberi. Sinapyl aldehyde (m/z = 207) was found in both young and mature internodes of all species, whereas S monolignol (m/z = 209) was found only in young internodes of S. officinarum and S. barberi, more frequently in the latter species.
In the species S. robustum and S. spontaneum, lignin dimers tended to be more frequent in mature internodes, contrary to what was found for S. officinarum and S. barberi, where dimers were more frequent in young internodes. Trimers were found preferentially in mature internodes of the four species, and with remarkable frequency in S. spontaneum and S. robustum. Comparing all the oligomers identified, the dimer m/z = 387 [S

Composition of monosaccharides, lignin, and acetyl substituent groups of cell wall xylan. Figure 7A,B show expansions of the 2D-HSQC NMR spectrum (1H (x-axis)/13C (y-axis)) of the lignin aromatic region and anomeric region, respectively, of a stem wall sample, taking as an example one of the Saccharum species. Prominent peaks corresponding to known polysaccharide linkages are tagged 56,57. The compositions of p-hydroxycinnamates, O-acetyl substituent groups in xylan, and monosaccharides are shown in Fig. 7C-E. There was no significant difference in p-coumarate and ferulate (Fig. 7C). Regarding the relative abundance of O-acetyl substituent groups of xylans (Fig. 7D), S. officinarum had a significantly higher percentage of 3-O-Ac substituent groups than the other species. For the 2,3-O-Ac group, there were significant differences between the species, and the highest percentages were found in S. officinarum and S. spontaneum. There were no significant differences between the species in the total relative abundance of the acetylated groups or of the 2-O-Ac substituent group. S. spontaneum and S. robustum showed significantly higher glucose content than S. officinarum and S. barberi. In contrast to glucose, xylose percentage was significantly higher in S. officinarum and S. barberi. S. officinarum presented a significantly greater abundance of mannose than the other species. S. barberi presented the highest percentage of arabinose, whereas S. spontaneum, S. robustum, and S. officinarum showed no significant differences for this monosaccharide (Fig. 7E). β-aryl ether and dibenzodioxocin were the main linkages detected in the four species, while resinol and phenylcoumaran were found in lower amounts (Fig. 7F). S. officinarum showed the highest percentage of β-aryl ether and the lowest of resinol and phenylcoumaran.

with yellow coloration (Fig. 8). In the different stages of development analyzed, there was an increase in tissue lignification in the fifth and seventh internodes, with the second internode, still immature, showing little lignification. In the seventh internode of S. officinarum (Fig. 8E) and S. barberi
Starch grains, stained black, were observed in the chlorophyll parenchyma cells in the peripheral region of the culm of all species analyzed (Fig. 10). However, in the fundamental parenchyma cells the starch grains were observed in abundance only in S. spontaneum (Fig. 10C).
The most marked differences between the species were the thickness of the cell wall of the fibers of the vascular bundles in the peripheral region and the lignification of the parenchyma cells in the central region. In S. officinarum (Fig. 9E,F) and S. barberi (Fig. 9K,L) the vascular bundles near the epidermis presented fibers with thinner cell walls compared with those present in S. spontaneum (Fig. 9Q,R) and S. robustum (Fig. 9W,X). In the peripheral region, parenchyma cells of all species are lignified in the seventh internode. However, in the central region of the S. officinarum culm the parenchyma cells remain non-lignified.

Identification and expression of monolignol biosynthesis genes. Bands taken from the gels and sequenced enabled the identification of 13 unigenes in the four Saccharum species: 1 C4H, 2 4CL, 1 HCT, 1 F5H, 1 C3H, 2 CCoAOMT, 1 CCR, 1 COMT, and 3 CAD. As two genes were isolated for CCoAOMT and 4CL, they were identified as A and B; for CAD they were called A, B, and C. The SAS (Sugarcane Assembled Sequences) of the respective orthologs in sugarcane identified by Bottcher et al. 33 and the abundances of reads observed for each of the genes identified in this study are shown in Supplementary Table S3. The phylogenetic analyses of the sequences of the genes isolated from the Saccharum species of this study and other angiosperms are in Supplementary Figs S1-S9, and the translated protein sequences are in Supplementary Figs S10-S18.

Gene expression profile in S. spontaneum and S. officinarum. Expression of the identified genes was analyzed by qPCR (Fig. 11). In general, most genes were more highly expressed in S. spontaneum, namely: C4H, 4CL A, C3H, CCoAOMT A and B, CCR, and F5H. S. officinarum had higher expression of the HCT, COMT, and CAD B genes. The CAD A gene had varied expression between the tissues, but its highest expression was in young and mature leaves (Fig. 11J). Internodes 3 and 5 showed differences between rind and pith. C4H was equally expressed in pith and rind of S. officinarum and decreased from rind to pith in S. spontaneum (Fig. 11A). 4CL A showed no difference between rind and pith in internode 3 in both species, but decreased from rind to pith in internode 5 for S. officinarum and increased for S. spontaneum (Fig. 11B). HCT had higher expression in rind of internodes 3 and 5, but compared with pith the expression in this tissue was lower (Fig. 11C). C3H had higher expression in all tissues of S. spontaneum compared with S. officinarum. The expression of C3H was higher in rind than in pith of internode 3 (Fig. 11D). CCoAOMT A had higher expression in rind and pith in internode 5 than in internode 3 (Fig. 11E). However, this gene was more expressed in pith (internodes 3 and 5) than in rind in S. officinarum, and the opposite was observed in S. spontaneum. CCoAOMT B maintained its expression in rind of internodes 3 and 5 and increased slightly from pith 3 to pith 5 in S. officinarum; in S. spontaneum this gene was more expressed in tissues of internode 5 than of internode 3 (Fig. 11F). CCR had relatively higher expression in rind and pith in internode 5 in S. spontaneum (Fig. 11G). In S. officinarum the expression was lower in all tissues, and there was higher expression in rind of internode 5 than in rind of internode 3. Among the genes analyzed, CCR was one of the most expressed of the lignin biosynthetic pathway, followed by CCoAOMT B (Fig. 11F-G). For F5H, S. officinarum showed no difference in expression between rind and pith for internodes 3 and 5, whereas S. spontaneum showed higher expression in rind and pith of internode 5 (Fig. 11H). For COMT, a higher expression in pith of internode 5 in both species should be noted (Fig. 11I). CAD A presented a more specific pattern in young and mature leaves, low expression in roots for both species, and, in S. officinarum, higher expression in pith of internodes 3 and 5 compared with the rind of the respective internode (Fig. 11J). No differences were observed in the expression of CAD A between rind and pith for S. spontaneum. CAD B in S. officinarum showed higher expression in rind of internode 5. In S. spontaneum there was higher expression in tissues of internode 5 compared with internode 3 (Fig. 11K). Interestingly, the genes display distinct patterns of expression in the two species, which indicates a complex and species-specific control of the lignin biosynthetic pathway. The CCR and CCoAOMT B genes were the most highly expressed; 4CL and F5H displayed higher expression in more developed tissues, i.e., internode 5; C3H and CCR were more expressed in S. spontaneum; and CAD B in S. officinarum.

Discussion
Sugarcane has the capacity to store soluble, readily fermentable sugars (mostly sucrose) up to 18% of the fresh mass in the stalk 2,58. The large accumulation of sucrose occurs during the maturation of the culms. Energy cane accumulates half or less of the sucrose of sugarcane, and much of the fixed carbon is shuttled to structural polysaccharides such as cellulose and hemicelluloses 59. Comparing the mature internodes of the Saccharum species studied, the lowest values for cellulose, hemicellulose, and pectin were found in S. officinarum, and the highest values were found in S. spontaneum (Fig. 1). The opposite was observed for sucrose, the primary soluble sugar in mature culms (Fig. 2). With some variation, S. barberi had levels closer to those of S. officinarum, while S. robustum was closer to S. spontaneum. This inverse relationship appears to be reflected in the wall monosaccharide composition evaluated by 2D-HSQC NMR spectroscopy. S. officinarum and S. barberi biomass harbor a higher xylose content, while S. spontaneum and S. robustum harbor a higher glucose content (Fig. 7E), reflecting the competing sinks for these carbohydrates, hemicellulose and cellulose, respectively 60,61.
Interestingly, while the cellulose content remained the same in new and mature culms of S. barberi and S. officinarum, it increased in the other two species. This behavior is opposite to that of the sucrose levels, that is, the disaccharide increases with maturation in the culms of S. barberi and S. officinarum, but remains practically the same in S. robustum and S. spontaneum. On the other hand, the comparison of reducing sugar contents in new and mature culms shows a much greater variation for S. barberi and S. officinarum, suggesting that reducing sugars in these species are directed towards sucrose synthesis, whereas in the other two species they are directed towards structural polysaccharides, in particular cellulose 62. Similar to Panicum virgatum 63,64, Brachypodium distachyon 60,65, and Zea mays 66,67, during the development of the internodes in S. spontaneum and S. robustum more carbon accumulated as insoluble polysaccharides (cellulose, hemicellulose, and pectin) in the cell wall than as soluble sucrose in the parenchyma cells.
While the starch content was reduced during the maturation of the culms in S. officinarum, S. robustum, and S. barberi, it increased notably in S. spontaneum, as also visually observed in the histochemical analyses. Starch granules were detected in the fundamental parenchyma of mature internodes of S. spontaneum. The presence of starch in S. spontaneum had been reported previously 59 in a study in which 215 clones related to the genera Saccharum, Erianthus, and Miscanthus were analyzed. While S. robustum had only traces of starch, S. spontaneum had the highest content. It has been suggested that the accumulation of starch in mature internodes of this species could be due to its capacity for tillering and high metabolic activity, and could serve as a strategy to cope with biotic and abiotic stresses 68.
Lignin is the second largest biopolymer present in the cell walls of grasses 69. Although it is essential for plant growth and development, lignin is the main factor responsible for the recalcitrance of plant biomass, including sugarcane, to processing into 2GE 33. Lignin content in the Saccharum species was determined using the Klason method, which distinguishes the soluble and insoluble fractions, together providing a total estimate of lignin 70. Regarding internode age, a negative correlation was observed between these two types of Klason lignin, indicating a greater amount of soluble Klason lignin (monomers and oligomers, precursors of insoluble lignin polymers) in young internodes, and of insoluble lignin in mature internodes. This is not surprising, as lignification of the wall is still underway in young internodes. However, most of the lignin biosynthetic genes analyzed had lower expression in young culms, suggesting that the larger amount of soluble lignin in these tissues is related to the polymerization process rather than to monolignol production.
In the culm, the rind contains a high percentage of densely packed vascular bundles and is a metabolically active region with high peroxidase activity, therefore polymerizing and thus accumulating lignin 31,33. Comparing the insoluble lignin content in mature internodes of the four species, S. spontaneum (20%) and S. robustum (18%) had higher values than S. barberi (16%) and S. officinarum (14.5%). This difference was also observed in the histochemical analyses with phloroglucinol-HCl. Compared with S. officinarum and S. barberi, the rinds of mature internodes of S. spontaneum and S. robustum have a higher density of vascular bundles, and the walls of cellular elements such as hypodermis, epidermis, sclerenchyma, and vascular fibers seem thicker and more lignified, contributing significantly to the higher content of this polymer. A general analysis of the expression of lignin biosynthesis pathway genes in the tissues of the culms shows higher expression in S. spontaneum compared with S. officinarum, and higher expression in tissues (rind and pith) of internode 5 compared with internode 3, supporting the higher insoluble lignin content in S. spontaneum and in mature tissues of the stalk. These gene expression differences, however, varied slightly depending on the species and tissue, for example, C4H in S. spontaneum, C3H in pith of the two internodes, CAD A and CAD B in rind and pith of S. officinarum, CCoAOMT A in rind of S. officinarum, and HCT in pith of S. spontaneum.
The nature of inter-monomeric linkages between lignin oligomers and their modifications can be exploited for the production of more degradable lignins 15,71,72, enabling greater efficiency in fermentation processes that use cell wall sugars for 2GE production. The 8-O-4 (β-aryl ether) linkages are the most common and the most easily cleaved. Lignins rich in G units have more recalcitrant linkages, such as 8-5 (phenylcoumarans), 5-5 (resinols), and 5-O-4, while S lignins are less interlinked and less recalcitrant to hydrolysis 15,73. Overall, the analyses of the profiles of oligomers obtained by UPLC/MS from the four species studied identified 11 structures, comprising aldehydes, monomers, dimers, and trimers (Table 1). The distribution of these structures allowed a clear distinction between the internodes of the Saccharum species, and there was a higher frequency of lignin oligomers in mature internodes than in young internodes. On the other hand, the highest amounts of soluble phenols in all species were found in young culms, with markedly higher quantities in S. robustum and S. spontaneum compared with the other two species. Large quantities of free phenols, such as hydroxycinnamic acids and chlorogenic acids, are found in tissues undergoing lignification 10,16,25. Also, the highest frequency and diversity of lignin oligomers (dimers and trimers) were found in mature internodes of S. robustum and S. spontaneum. Morreel et al. 74 commented that the various lignin oligomers in tissues that undergo extensive lignification are derived from the availability of monolignols that are coupled under oxidative conditions for cell wall lignification, justifying the correlation between lignin content and frequency of oligomers.
(2019) 9:5877 | https://doi.org/10.1038/s41598-019-42350-3 | www.nature.com/scientificreports
The 8-O-4 linkage was the most common type of lignin linkage (Fig. 7F). According to Santos et al. 75, this type of linkage is dominant in lignins of grasses, corresponding to 60% of the total. Other works, such as those of Bottcher et al. 33 and Kiyota et al. 15, also corroborate these results. It was also observed that G units were found more frequently than S units in oligomers of the four Saccharum species. We could not find H units, although the lignin of grasses is characterized by having more of these units than the lignin of dicotyledons 76. The non-detection of H units in the Saccharum species could be explained by the fact that these units occur essentially as free terminal, inert phenolic groups, and their incorporation prevents the growth of the lignin polymer. Due to their high oxidative potential they are insoluble in ethyl acetate, which was the solvent used in the extraction of the oligomers 76.
In sugarcane hybrids it has been observed that the S/G ratio increases with the development of the stalk 33. The same was observed in other grasses such as Festuca arundinacea 78, Zea mays 66, and Panicum virgatum 79. However, such an increase was not observed here, with the S/G ratio being higher in young internodes of S. barberi and S. robustum and equal for the other two species. Local growth conditions may have affected the S/G ratio, but in the case of S. barberi and S. robustum the low values might be due to the amount of one of the monomers being higher in the pith or rind. We did not separate rind and pith for S/G analysis, but based on the histochemical analysis the G amount (stained yellow, see Fig. 8) was elevated in the pith compared with the rind.
Lignin composition (S/G ratio) affects the yield of saccharification 7, since tissues rich in S are more susceptible to hydrolysis than those rich in G 80. We found no significant difference in saccharification in young internodes of the four species studied, which is not unexpected, since the lignification process has not been completed, based on the content of soluble and insoluble lignin, oligomers, and phenols. It is interesting to note, however, that in young tissues there seems to be no relationship between saccharification yield and S/G ratio, since S. barberi and S. robustum have a higher S/G ratio, but saccharification yield is equal. In contrast, mature internodes of S. spontaneum and S. robustum, with lower S/G ratios, resulted in a lower yield of saccharification. Therefore, higher yield of saccharification is related to S/G ratio, but only in tissues that have reached maturity and, thus, where the secondary cell wall formation process has been completed.
F5H and COMT are thought to be the determinant enzymes in defining S unit content in plants 38,81. In P. radiata, the joint action of the two activities led to an increase of S units, with the increase being smaller when only F5H was overexpressed 81. In sugarcane, the reduction in the expression of COMT and F5H using RNAi led to different situations 38. While plants with partially silenced F5H did not show a reduction in lignin content, one of the lines had a reduced S/G ratio with a concomitant increase in saccharification yield. One of the mutants of COMT displayed a reduction in lignin content and improvement in saccharification yield. One of the mutants of COMT exhibited a reduction in the S/G ratio. Our data do not indicate a direct relationship between the expression of COMT and F5H and the S/G ratio. Using S. spontaneum as an example, this species had a similar S/G ratio between young and mature internodes; however, the expression of COMT and F5H was a little higher in pith of mature internodes but equal in the rind of young and mature internodes. On the other hand, the expression of F5H was much higher in mature tissues. A similar situation was also observed in S. officinarum, but with lower expression values. It cannot be ruled out that other hitherto unidentified isoforms of COMT and F5H are involved in lignin biosynthesis in these two species, but it is noteworthy that Bottcher et al. 33 isolated only one COMT and one F5H in sugarcane, and their sequences have high homology with the sequences isolated in the four species studied.
Another factor that has been recognized as negatively affecting plant biomass processing into 2GE is the degree of O-acetylation of cell wall polymers, since acetate, when released during pretreatment, is a powerful inhibitor of fermenting microorganisms 82. O-acetylation of hemicelluloses also reduces enzymatic hydrolysis due to steric hindrance by the acetate 83. Therefore, reducing the content of O-acetyl groups in biomasses with bioenergetic potential is desirable 84. The main hemicelluloses in grasses are xylans 3, and their degree of O-acetylation may vary according to plant species, type of tissue and organ, and state of development 85. Xylan acetylation occurs more frequently in position O-3 (up to 30%) and less frequently in O-2 (up to 25%), but acetylation in both positions has been reported 85. In the Saccharum species studied here, it was found that the total percentage of acetylation (36.9-39.9%) was similar to values found in other grass biomasses 86 22.9%, respectively) and total acetylation (36.9-37.4%, respectively). However, the hypothesis that biomass with a reduced percentage of acetyl esters results in higher saccharification yields 87 could not be supported here. S. officinarum and S. barberi, with a higher degree of acetylation than S. spontaneum and S. robustum, exhibited a higher yield of saccharification. Since it is known that in secondary walls xylans are closely associated with cellulose 88, a lower percentage of acetyl groups in S. spontaneum and S. robustum could lead to an even tighter association of xylan with cellulose, adding to recalcitrance in these species and limiting the yield of saccharification 83.
The strategy used in this study to identify genes involved in the lignin biosynthetic pathway in the four Saccharum species involved the amplification of fragments produced in RT-PCR reactions using primers designed from conserved regions of gene sequences of sugarcane and of several other closely related species. Therefore, such primers are likely to amplify sequences of closely related genes encoding similar enzymatic activities. There is a possibility that not all genes of a gene family were amplified. However, the probability that the isolated genes represent the most highly expressed genes in the tissues is high. Taking into account that the four species studied present distinct genetic characteristics, it was surprising to observe that the isolated sequences are highly similar among the species and very close in sequence to the ones identified by Bottcher et al. 33 in sugarcane. Such similarities could be explained not only by the evolution of the lignin biosynthetic pathway in terrestrial plants but also by the origin of the genus Saccharum and of the commercial cultivars of sugarcane. The parental genomes of S. officinarum (80-90%) and S. spontaneum (10-20%) contributed to sugarcane hybrids, including to some extent recombinant chromosomes 89. Additionally, the lignin biosynthetic pathway is highly conserved between plants, and modifications in this pathway generate similar phenotypes in monocotyledons and dicotyledons. The approaches to manipulate lignin in alfalfa 7 can be transferred to other species such as switchgrass and sugarcane 29,37. Genes related to sugar accumulation in sugarcane culms arose through differential expression of other regulators, suggesting a specific epigenetic control. PAL is highly conserved between plants and seems to precede the divergence of dicotyledons and monocotyledons 90. Genes related to transcriptional activation are highly conserved in grasses 91. An example is the gene SND1, which activates several transcription factors (SND3, MYB46, MYB83, MYB85, and MYB105) and is apparently highly conserved during evolution 91.

Conclusions
The set of data obtained here enabled the association of patterns to better understand the process of lignin deposition in four Saccharum species. The differences between the species studied became evident, whether in relation to structural and non-structural carbohydrates or in the quantity and type of lignin. The data enabled the coherent separation of the two species that have been identified as energy canes, S. spontaneum and S. robustum, which accumulate more fiber, from the other two, which accumulate more sucrose. Moreover, the first two species contain more insoluble lignin, the lowest S/G ratios, a greater abundance of intermonomeric linkages (lignin oligomers), and lower percentages of saccharification. Gene expression analysis of the lignin biosynthesis pathway genes in S. officinarum and S. spontaneum showed that in general the latter species has higher expression in culm tissues, especially in mature culms. Surprisingly, the sequences of the identified genes showed high conservation in the four Saccharum species, including the commercial hybrids. This feature is desirable for the genetic manipulation of energy cane, since knowledge has already been gained with low-lignin commercial varieties of sugarcane 39,44,92,93. It has been shown in other grasses that lignin biosynthesis has a complex regulation by transcription factors, which can activate or repress the expression of several genes of the pathway 93-98. However, to our knowledge, this is the first report describing that lignin genes are highly conserved among species of the same genus and, consequently, the differences in polymer content and composition can be fully understood only after the regulatory regions of each gene, or at least of a set of genes, have been sequenced.
Methods
Plant material and growing conditions. Culms of the species S. spontaneum, S. officinarum, S. robustum, and S. barberi were obtained from the Center of Sugarcane of the Agronomy Institute of Campinas, at Ribeirão Preto, São Paulo State, Brazil. The culms were planted in plastic trays containing vermiculite and kept in a greenhouse, and the resulting seedlings were transplanted to 50 L pots containing commercial organic substrate and kept in the greenhouse for approximately one year. For each species 5 replicates were planted (5 pots). After this period, the substrate of the pots was partially replaced, taking care not to damage the root system, and the pots were transferred out of the greenhouse to the experimental area of our department, under natural sunlight. The pots remained in these conditions for a period of 4 months, with daily irrigation.
Only healthy stems, without any sign of physical injury or disease, were collected. For biochemical analyses internodes 2 + 3 (young stage) and internode 8 (mature stage), counted from the apex, were separated. Histochemical analyses were performed on internodes 2, 5, and 7. Internodes 4 to 10 were used for cell wall characterization by 2D-HSQC NMR spectroscopy. To identify the genes of the lignin biosynthetic pathway we made a composite sample containing a 1/1 (w/w) mixture of young internodes (2 + 3) and mature internodes (8) from five plants. For the expression analyses (quantitative RT-PCR, qPCR) 7 types of tissues were used: young and mature leaves, rinds of internodes 3 and 5, piths of internodes 3 and 5, and roots. A steel blade was used to separate the rind from the pith 31. At sampling, the stems were washed in tap water, chopped into small pieces of 1 cm², frozen in liquid nitrogen, ground, and stored in a freezer at −80 °C. For biochemical analyses the ground tissues were dried in a freeze-dryer.
Histochemical analysis. Internodes 2, 5, and 7 of the stems of the four species were used in these analyses.
Cell wall polysaccharides. The protocol of Chen et al. 78 was followed, and pectin, the hemicellulose fraction, and cellulose were determined. Total sugar content in each fraction was determined with the phenol-sulfuric reagent, using glucose as standard 102.
Non-structural sugars and starch. Samples were extracted three times with 70% ethanol at 60 °C and the supernatants were pooled after centrifugation. Total soluble sugars and sucrose were determined with the phenol-sulfuric assay 102,103, with glucose and sucrose used as standards, respectively. Reducing sugar content was determined according to Nelson 104 using glucose as standard. Starch content was determined according to Amaral et al. 105. The dried, 70% ethanol-extracted samples were treated sequentially with α-amylase from Bacillus licheniformis (code E-ANAAM, MEGAZYME, Ireland) and amyloglucosidase from Aspergillus niger (code E-AMGPU, MEGAZYME, Ireland), and the resulting glucose was determined with the PAP Liquiform glucose kit (Labtest Diagnóstica S.A.), using an ELISA plate reader (model EL307C, Bio-Tek Instruments, Winooski, Vermont) at 490 nm. Glucose was used as standard.
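The colorimetric determinations above all rely on a standard (glucose or sucrose) to convert absorbance into concentration. A minimal sketch of that conversion is given below, assuming a linear standard curve fitted by ordinary least squares. The function names, the linear model, and the example standards are illustrative assumptions and are not taken from the cited protocols.

```python
def fit_standard_curve(concentrations, absorbances):
    """Fit absorbance = slope * concentration + intercept by least squares.

    Both arguments are equal-length sequences of standard readings.
    A linear response over the standard range is an assumption.
    """
    n = len(concentrations)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired standard readings")
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the fitted curve to estimate the sugar concentration of a sample."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (absorbance - intercept) / slope


if __name__ == "__main__":
    # Hypothetical glucose standards (ug/mL) and their absorbance readings.
    standards = [0, 20, 40, 60, 80]
    readings = [0.05, 0.25, 0.45, 0.65, 0.85]
    slope, intercept = fit_standard_curve(standards, readings)
    print(concentration_from_absorbance(0.55, slope, intercept))
```

Sample readings outside the range covered by the standards would normally be diluted and re-read rather than extrapolated.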
Analysis of wall constituents by 2D HSQC NMR spectroscopy. Ball-milled, de-starched, alcohol-insoluble material (25 mg) was dissolved in 0.75 mL of DMSO-d6 and 10 μL of [Emim]OAc-d14 as previously described 56. The dissolved lignocellulosics were subjected to a 2D HSQC NMR experiment acquired on a Bruker AVANCE 600 MHz NMR spectrometer equipped with a 5-mm TXI ¹H/¹³C/¹⁵N cryo-probe using the pulse sequence 'hsqcetgpsisp.2'. The experiments were carried out at 25 °C with the following parameters: spectral width of 12 ppm in the F2 (¹H) dimension with 4096 data points (TD1) and 160 ppm in the F1 (¹³C) dimension with 256 data points (TD2); scan number (SN) of 200; inter-scan delay (D1) of 1 s. The chemical shifts were referenced to the DMSO solvent peak (δC 39.5 ppm, δH 2.5 ppm). The NMR data were quantified as described previously using Bruker's Topspin 3.1 software 56,57. The acetylation of xylan was quantified as described below. In brief, the signals in the aromatic region (H1-C1 signals of 2-O-Ac-Xyl, 3-O-Ac-Xyl, 2,3-O-Ac-Xyl, Xyl (xylan) and reducing ends of xylan (α/β-Xyl-R)) were summed to 100%, and the signals in the aliphatic region were integrated separately to calculate the relative content of each form of O-acetyl-xylan unit. The relative contents of 2-O-acetyl- and 2,3-O-acetyl-xylan units were calculated from the H2-C2 signal, and that of 3-O-acetyl-xylan units from the H3-C3 signal. The monosaccharide composition [glucose (Glu), xylose (Xyl) and mannose (Man)] was quantified from the anomeric integrals as a fraction of 100%. The lignin composition, i.e., S (syringyl), G (guaiacyl), H (p-hydroxyphenyl), FA (ferulate) and pCA (p-coumarate) units, was quantified from the aromatic lignin integrals as a fraction of 100%.
Total soluble phenols. The samples were extracted twice with 80% ethanol, and the extracted phenols were determined with the Folin-Ciocalteu reagent 106. Chlorogenic acid was used as standard.
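The relative quantification of NMR integrals described above, in which the units of one class (for example S, G, H, FA and pCA) are expressed as a fraction of 100%, amounts to a simple normalization. The sketch below is illustrative only: the function name and the integral values are hypothetical, and the integration itself (performed in Topspin in the study) is not reproduced.

```python
def relative_content(integrals):
    """Express each integral as a percentage of the summed integrals of its class."""
    total = sum(integrals.values())
    if total <= 0:
        raise ValueError("the summed integrals must be positive")
    return {unit: 100.0 * value / total for unit, value in integrals.items()}


# Hypothetical aromatic-region integrals for one sample (arbitrary units).
lignin_integrals = {"S": 30.0, "G": 50.0, "H": 5.0, "FA": 5.0, "pCA": 10.0}
lignin_percent = relative_content(lignin_integrals)

# An S/G ratio can be read from the same integrals if desired.
s_to_g_ratio = lignin_integrals["S"] / lignin_integrals["G"]
```

The same function applies to the monosaccharide anomeric integrals (Glu, Xyl, Man), since these are also reported as fractions of 100%.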
Lignin content, S/G ratio, and oligomers. Soluble and insoluble lignin were determined according to the TAPPI UM-250 Protocol 107. Insoluble lignin content was expressed as a percentage of the dry wall residue obtained after sample extraction and hydrolysis. For the determination of soluble lignin, the absorbance of the filtrate of the hydrolysis product was measured at 205 nm and the content was calculated using an extinction coefficient of 110 L g⁻¹ cm⁻¹. To determine the S/G ratio, the samples were treated with NaOH in a heating block at 95 °C for 24 h, neutralized with HCl, and extracted with ethyl acetate. The residue was dried and then dissolved in Milli-Q water, and the hydrolysis products were analyzed by LC-MS using a UHPLC coupled to a triple quadrupole mass spectrometer with an ESI ionization source (model ACQUITY, Waters Corp., Manchester, UK), as described by Mokochinski et al. 108. For the analysis of soluble lignin oligomers, the samples were extracted twice in 80% ethanol under sonication and the extracts were dried in a concentrator (Concentrator plus, Eppendorf). The dried residue was solubilized in acetonitrile/water (1:2, v/v) just before the analyses. The samples were analyzed in an Acquity UPLC coupled to a TQD triple quadrupole mass spectrometer (Micromass-Waters, Manchester, UK), according to Kiyota et al. 15.
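The soluble lignin calculation above (absorbance at 205 nm and an extinction coefficient of 110 L g⁻¹ cm⁻¹) follows the Beer-Lambert law. The sketch below shows one way to carry it out; only the extinction coefficient comes from the text. The dilution factor, filtrate volume, sample mass, and 1 cm path length are hypothetical values chosen for illustration, and the function name is not from the source.

```python
EXTINCTION_COEFFICIENT = 110.0  # L g^-1 cm^-1 at 205 nm, as stated in the text


def soluble_lignin_percent(absorbance_205, dilution_factor, filtrate_volume_l,
                           sample_mass_g, path_length_cm=1.0):
    """Soluble lignin as a percentage of sample dry mass (Beer-Lambert law)."""
    # Concentration in the undiluted filtrate, in g/L.
    concentration = (absorbance_205 * dilution_factor
                     / (EXTINCTION_COEFFICIENT * path_length_cm))
    # Total lignin mass dissolved in the filtrate, in g.
    lignin_mass_g = concentration * filtrate_volume_l
    return 100.0 * lignin_mass_g / sample_mass_g


# Hypothetical reading: A205 = 0.55 on a 10-fold dilution of 0.1 L of filtrate
# obtained from 0.3 g of dry wall residue.
percent = soluble_lignin_percent(0.55, 10.0, 0.1, 0.3)
```

With these illustrative inputs the filtrate contains 0.05 g/L of lignin, i.e., 0.005 g in total, which is about 1.67% of the 0.3 g sample.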
Saccharification. Saccharification was determined as described by Brown and Torget 109 using lyophilized biomass equivalent to 10 mg of cellulose. After addition of sodium citrate buffer (0.1 M, pH 4.8), NaN₃, and Milli-Q water, the mixture was heated to 50 °C, and cellulase (Trichoderma reesei) and cellobiohydrolase (Aspergillus niger) (Sigma-Aldrich) were added at a 1:4 (v/v) ratio. The samples were incubated in a shaker at 160 rpm and 50 °C for 5 days and then centrifuged at 12,000 rpm for 15 min. Glucose was quantified in the supernatant 102.
In silico analysis of databases and synthesis of primers for identification of expressed genes. We studied the genes of the following lignin biosynthesis enzymes: 4- 33 were used as bait for the search for homologues in the NCBI and Phytozome databases. We selected sequences of sorghum (Sorghum bicolor), rice (Oryza sativa), corn (Zea mays), wheat (Triticum aestivum), Lolium perenne, and Arabidopsis thaliana. We used only full-CDS sequences with a low e-value (<10⁻⁶). These sequences were aligned in the BioEdit program 110, and conserved regions were used for the design of primers (Supplementary Table S1) using the Primer3 program, having as parameters Tm 57 °C-60 °C, a difference of only
Total RNA extraction, cDNA synthesis, amplification and sequencing. Total RNA extraction was performed on a 1:1 (w/w) mixture of tissues from young internodes (2 + 3) and mature internodes (8). Total RNA was extracted with Trizol (Tri-Phasis Reagent, BioAgency) and treated with Turbo DNAse-free (Ambion). First-strand cDNA synthesis was performed with SuperScript III (Invitrogen) following the manufacturer's guidelines. RT-PCR reactions were carried out in a thermal cycler (Veriti 96-Well Thermal Cycler, Applied Biosystems) following the parameters of Llerena et al. 112.
The amplification products were separated by electrophoresis in a 1% agarose gel containing ethidium bromide and visualized in a Gel Doc 2000 photo-documentation system (Bio-Rad). Bands with the expected number of bases were recovered from the gel with GeneJET Extraction (Thermo Scientific), inserted into the cloning vector pGEM-T Easy (Promega), and cloned in thermocompetent Escherichia coli DH10β (Novagen). Some colonies were selected, plasmids were purified (PureLink Quick Plasmid Miniprep Kit, Invitrogen), and the presence and size of the insert were confirmed by digesting the plasmid with EcoRI. The inserts were sequenced using M13 primers. Sequencing reactions were performed using the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) and a 3730xl DNA Analyzer (Applied Biosystems). Colonies were sequenced until 25 good-quality sequences (forward and reverse orientation for each sequence) were obtained.
Phylogenetic analyses. The nucleotide sequences obtained were translated in silico to amino acid sequences, and homologous proteins from the NCBI (http://www.ncbi.nlm.nih.gov/), SUCEST (http://sucest-fun.org), and Phytozome (http://www.phytozome.net/) databases were selected for phylogenetic analysis. Multiple alignment of amino acid sequences was performed with the ClustalW program 113. Phylogenetic analyses were performed with the MEGA program version 4.02, and evolutionary relationships were inferred using the neighbor-joining algorithm with 1,000 bootstrap replicates. Gap regions were excluded manually.
Gene expression analysis of the isolated genes. For the gene expression analysis, primers specific for the isolated gene sequences were designed (Supplementary Table S2). The efficiency curve of the primers was determined with the Step One Plus Software v2.3 (Life Technologies). Total RNA extraction and first-strand cDNA production were carried out as described above. cDNAs of seven tissues (new leaf, old leaf, rinds of internodes 3 and 5, piths of internodes 3 and 5, and root) of the species S. officinarum and S. spontaneum were used in the analysis. The reactions were prepared with iTaq™ Universal SYBR® Green Supermix (Bio-Rad) and analyzed in a StepOnePlus™ Real-Time PCR System, following the program of 95 °C for 3 min and 40 cycles of 95 °C for 10 s and 60 °C for 30 s. The specificity of the amplified products was evaluated by analysis of the dissociation curve generated by the equipment. GAPDH (glyceraldehyde 3-phosphate dehydrogenase) was used as housekeeping gene 33 where the first level is the species of Saccharum and the second level is the sugarcane maturation stage, i.e., young internodes (2 + 3) and mature internodes (8). Comparison between means was performed with the Tukey test (α = 0.05). For gene expression analysis we used ANOVA, and for comparison of means the Tukey test (α = 0.05). For the biochemical analyses (soluble sugars, starch, cell wall polysaccharides, total phenols, Klason lignin, and saccharification) we analyzed five biological replicates with three technical replicates each. For the analysis of soluble lignin oligomers and S/G ratio, we analyzed five replicates with one technical replicate each. For the analyses of hydroxycinnamic acids, monosaccharides, and acetylated xylans we analyzed three biological replicates with one technical replicate each. Results of the biochemical analyses were expressed as mean ± standard error.
For gene expression, the results were expressed as the mean of three biological replicates with three technical replicates each.
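The mean ± standard error reported for the replicated measurements above can be computed as in the sketch below. It assumes that technical replicates are first averaged within each biological replicate and that the standard error is then taken across biological replicates; the text does not state how the two levels were combined, so this is an assumption, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(biological_replicates):
    """Return (mean, standard error) across biological replicates.

    Each inner list holds the technical replicates of one biological replicate.
    Technical replicates are averaged first (an assumption, see lead-in).
    """
    per_replicate = [mean(technical) for technical in biological_replicates]
    n = len(per_replicate)
    if n < 2:
        raise ValueError("at least two biological replicates are required")
    # Sample standard deviation divided by the square root of n.
    return mean(per_replicate), stdev(per_replicate) / sqrt(n)


# Hypothetical data: five biological replicates, three technical replicates each.
measurements = [
    [10.0, 11.0, 12.0],
    [12.0, 13.0, 14.0],
    [14.0, 15.0, 16.0],
    [11.0, 12.0, 13.0],
    [13.0, 14.0, 15.0],
]
average, standard_error = mean_and_standard_error(measurements)
```

Averaging technical replicates first avoids treating them as independent observations, which would understate the standard error.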

Data Availability
All data generated or analysed during this study are included in this article (and its Supplementary Information files).