Integrated physiological and genomic analysis reveals structural variations and expression patterns of candidate genes for colored- and green-leaf poplar

Colored-leaf plants are increasingly popular and have been attracting more and more attentions. However, the molecular mechanism of leaf coloration in plants has not been fully understood. In this study, a colored-leaf cultivar of Populus deltoides (Caihong poplar, CHP) and green-leaf cultivar of Populus deltoides L2025 were used to explore the mechanism of leaf coloration through physiological and the whole genome resequencing analysis. The content of anthocyanins, total Chl, and carotenoids in the leaves of CHP and L2025 were evaluated. The ratio of anthocyanins to total Chl in CHP was 25.0 times higher than that in L2025; this could be attributed to the red leaf color of CHP. Based on the whole genome resequencing analysis, 951,421 polymorphic SNPs and 221,907 indels were screened between CHP and L2025. Using qRT-PCR analysis, three structural genes (flavonol synthase 1 family protein, UDP-glucose flavonoid 3-O-glucosyltransferase 3′ and flavonoid 3-O-galactosyl transferase family protein) and six transcription factors (MYB-related protein Myb4, transcription factor GAMYB, PtrMYB179, transcription factor bHLH53, transcription factor bHLH3, VARICOSE family protein) may be involved in the anthocyanin synthesis pathway, which could be used as candidate genes to explore the molecular regulation mechanism of leaf coloration in Populus deltoids, and could be used in molecular breeding in the future.

. The height and basal diameter of CHP and L2025, which were cutting propagation in one year. Each value is the mean ± SE of five replicates. ** and * indicate p < 0.001 and p < 0.05 by Student's t-test, respectively.
total Chl, Chl a, Chl b and carotenoids in CHP leaves compared with concentrations in L2025 leaves (Table 2). However, the concentration of anthocyanin in CHP leaves was much higher than concentrations in L2025 leaves ( Table 2). The anthocyanin to total Chl content ratio in CHP leaves was 25.0 times of these in L2025 leaves; this may be the physiological reason why CHP had bright red leaves and weaker growth. The bright red leaves could be a consequence of the higher anthocyanin to total Chl ratio, and the weaker growth may be due to the lower total chlorophyll content.
Whole genome resequencing of the Populus deltoids genome. To explore the genome variation between CHP and L2025, the whole genome resequencing was conducted using the P. trichocarpa genome as a reference. A total of 106.4 million reads (about 15.96 Gb) for L2025 and 108.9 million reads (16.33 Gb) for CHP were generated compared with the reference genome, which has 37.2× and 38.4× sequencing depth, respectively (Table 3). There was 410.4 Mb consensus sequence in the L2025 genome from 100.7 million reads (94.7% of the total reads), and 410.8 Mb consensus sequence in the CHP genome from 103.5 million reads (95.0% of the total reads). The CHP and L2025 genome coverage were 95.8% and 95.7%, respectively, compared with the P. trichocarpa genome (Table 3). Among 36,071 homozygote polymorphic SNPs, transitions (C/T and A/G) accounted for 65.7%, and transversions (A/T, A/C, G/T, and C/G) accounted for 34.3%, indicating that transitions occurred more easily than transversions (Table S1). There were 951,421 polymorphic SNPs between L2025 and CHP compared with the P. trichocarpa genome. The highest number of SNPs in chromosome 01 was 94,154, and the lowest number of SNPs in chromosome 16 was 32,315 ( Fig. 2, Table S2). Among these polymorphic SNPs, 33.2% of total polymorphic SNPs occurred in genic regions, and the remaining ones occurred in intergenic regions. There were 91,732 polymorphic SNPs in coding sequence regions, and non-synonymous and synonymous SNPs accounted for 53.4% and 46.6%, respectively. The functions of 36,389 genes might be changed due to the polymorphic SNPs occurred in coding sequence regions (Table S2).
There was a similar trend for the frequency and length of deletions and insertions. The most frequent indels were mononucleotide indels, accounting for 49.9% of the total. The next most frequent were dinucleotide indels with 18.1%, followed by trinucleotide indels with 9.7%, and tetranucleotide indels with 7.2% (Fig. S1). Among the 221,907 polymorphic indels, the highest number of polymorphic indels was 21,607 on chromosome 01, and the smallest number of polymorphic indels was 7580 on chromosome 16 (Table S3). Most of the polymorphic indels (71.2%) occurred in intergenic regions. Of the 63,832 polymorphic indels in genic regions, there were 45,596 indels in the intron regions, 13,511 indels in the untranslated regions, and 4725 indels in the coding sequence regions (Table S3).
Gene ontology and KEGG analysis of candidate genes with polymorphic SNPs. Gene Ontology (GO) analysis of candidate genes with polymorphic SNPs was conducted between L2025 and CHP. For the 'molecular function' , the binding contained most genes (13,856), followed by catalytic activity (13,577) and transporter activity (1444). There were 968 genes for nucleic acid binding transcription factor activity, 551 genes for structural molecule activity, and 477 genes for molecular transducer activity. These candidate genes were also classified according to the 'cellular components' as cell part with 10,445 genes, membrane with 5797 genes, organelle with 7992 genes, and cell with 10,409 genes. For the 'biological processes' , the metabolic processes contained 16,657 genes, cellular processes contained 13,573 genes, and single-organism processes contained 11,918 genes ( Fig. 3).   Table 3. The screening of genome variation between CHP and L2025 compared with the reference genome based on the whole genome resequencing.  www.nature.com/scientificreports www.nature.com/scientificreports/ The KEGG pathway enrichment analysis of candidate genes were conducted, and 'Flavonoid biosynthesis' (ko00941) was the enriched pathway associated with anthocyanin biosynthesis. There were 27 genes carrying polymorphic SNPs in this pathway. To further screen the possibly useful candidate genes associated with anthocyanin biosynthesis, genes carrying more than five non-synonymous SNPs in coding regions were considered. If these candidate genes do not work, genes carrying less than five non-synonymous SNPs in coding regions were also considered. In the flavonoid biosynthesis pathway, flavonol synthase 1 family protein (Ptr FLS1, Potri.004G139500) and shikimate O-hydroxycinnamoyl transferase-like (Potri.005G028000), carried more than five non-synonymous SNPs in coding regions, and were considered as candidate genes ( Table 4). The relative expression level of these two candidate genes was evaluated, and there was a significant difference for Ptr FLS1 between CHP and L2025, and no significant difference for Shikimate O-hydroxycinnamoyl transferase-like (Fig. S2). Therefore, Ptr FLS1 was screened as a candidate gene associated with anthocyanin biosynthesis. In addition, UDP-glucosyl transferase family genes with polymorphic SNPs were also counted. Six genes with more than five non-synonymous SNPs in coding regions are shown in Table S4. The relative expression level of these six genes was also evaluated by qRT-PCR, and there were significant differences between UDP-glucose flavonoid 3-O-glucosyltransferase 3 and flavonoid 3-O-galactosyl transferase family protein; there were no significant differences among the other four genes (Fig. 4). Therefore, UDP-glucose flavonoid 3-O-glucosyltransferase 3 and flavonoid 3-O-galactosyl transferase family protein were also considered as candidate genes associated with anthocyanin biosynthesis.
Candidate transcription factors associated with anthocyanin biosynthesis between L2025 and cHp. The anthocyanin biosynthesis was mainly affected by three families of transcription factors: MYB, bHLH, and WD40. There were 157 MYB family genes, 121 bHLH family genes, and 57 WD40 family genes. Ten candidate transcription factors with more than five non-synonymous SNPs in coding regions were identified for MYB family genes, six for bHLH family genes, and four for WD40 family genes (Tables S5-S7). To further screen candidate transcription factors associated with anthocyanin biosynthesis, the relative expression levels of these ten, six, and four candidate transcription factors were conducted. There was much significant difference for 'MYB-related protein Myb4' , 'PtrMYB179' and 'transcription factor transcription factor GAMYB' of leaves between L2025 and CHP, and no significant difference for the left ones (Fig. 5). There was higher relative expression of 'transcription factor bHLH53' and 'transcription factor bHLH3' in CHP leaves compared with that in L2025 leaves, and no significant differences for the other transcription factors (Fig. 6). For the WD40 family genes, there were significant differences for expression of the 'VARICOSE family protein' in CHP leaves compared with L2025 leaves; there were no significant differences for the other transcription factors (Fig. 7). These three MYB family, two bHLH family, and one WD40 family genes might play important roles in determining coloration of poplar, and could provide some references to further explore the molecular mechanisms of anthocyanin biosynthesis in poplar and other woody plants.

Relative expression levels of genes associated with anthocyanin biosynthesis in cHp.
There were three structural genes associate with anthocyanin biosynthesis in our results, which had more than five non-synonymous SNPs in coding regions and significant differences in the relative expression levels between CHP leaves and L2025 leaves. To better discover the regulation mechanism of anthocyanin biosynthesis, the relative expression levels of other structural genes were also evaluated. Our findings showed higher relative expression levels in CHP leaves for the structural genes compared with L2025 leaves (Fig. 8). Compared with the relative expression difference of anthocyanin biosynthesis upstream genes such as 4CL, CHS, and F3H between CHP and L2025, there were much greater differences in the relative expression difference of anthocyanin biosynthesis downstream genes such as UFGT, DFR, and ANS, which could be attributed to the higher content of anthocyanins in CHP leaves.

Discussion
The different colors that appear in various plant tissues are determined by the ratio and distribution of three kinds of pigments (chlorophyll, carotenoids, and flavonoids/anthocyanins). In green-leaved plants, there is a high ratio of chlorophyll to the other two pigments; in yellow-leaved plants, there is a high ratio of carotenoids to the other two pigments; and in red-, purple-, and blue-leaved plants there is a high ratio of anthocyanins to the other two pigments 1 . In the present study, a higher ratio of anthocyanins to the other two pigments was observed in the red leaves of CHP, compared with in the green leaves of L2025 (Table 2). In a previous study, a high ratio of anthocyanins to the other two pigments was observed in the purple leaves of the Quanhong P. deltoides cultivar (QHP) 8 . The red or purple leaf color of many transgenic plants has been associated with the overexpression of genes associated with anthocyanin synthesis 11,22,23 . Therefore, the high ratio of anthocyanins content to the other two pigments in CHP may be the physiological reason for the bright red leaves of CHP.
In addition, the growth of CHP was inhibited compared with the growth of L2025 (Table 1); in previous studies, the growth of purple-leaved QHP was also less than that of L2025 8 . The transgenic tomato plants overexpressing AtPAP2 accumulated much more anthocyanins compared with the wildtype, and were also significantly smaller 31 . Poplars overexpressing PtrMYB119 accumulated more anthocyanins compared with the wildtype; however, growth of transgenic poplars was similar with the wildtype 23 .
the structural genes involved in anthocyanin biosynthesis of leaves in cHp. The genes involved in anthocyanins biosynthesis including early biosynthetic genes (CHS, CHI, F3H, F3′H, and F3′5′H) and late biosynthetic genes (DFR, ANS, and UFGT) have been well studied 13 . In the present study, many genes carrying polymorphic SNPs were found to be involved in the flavonoid biosynthesis pathway, such as naregenin-chalcone synthase family protein, chalcone synthase 1, chalcone isomerase family protein, leucoanthocyanidin reductase (2019) 9:11150 | https://doi.org/10.1038/s41598-019-47681-9 www.nature.com/scientificreports www.nature.com/scientificreports/ family protein, dihydroflavonol reductase, leucoanthocyanidin dioxygenase-like (Tables 4 and S4). There were eight genes with more than five non-synonymous SNPs in the coding regions. Three genes (flavonol synthase 1 family protein, UDP-glucose flavonoid 3-O-glucosyltransferase 3, and flavonoid 3-O-galactosyl transferase family protein) had much greater differences in the relative expression levels between CHP and L2025 leaves and could be considered as candidate genes associated with anthocyanin biosynthesis. There were nine, non-synonymous SNPs in coding regions of flavonol synthase 1 family protein, five in coding regions of UDP-glucose flavonoid 3-O-glucosyltransferase 3, and fifteen in coding regions of flavonoid 3-O-galactosyl transferase family protein (Tables 4 and S4). Such mutations always induced abnormal anthocyanin accumulation, and led to the colored plants. The mutation of ANS or DFR genes led to the formation of pink onions 32,33 and beans with a black seed coat was produced owing to the deletion of the CHS promoter 34,35 . The deletion of DFR can also lead to purple ornamental kale 36 . As flavonol synthase 1 family protein, UDP-glucose flavonoid 3-O-glucosyltransferase 3, and flavonoid 3-O-galactosyl transferase family protein have more than five non-synonymous SNPs in the coding regions and significant differences in relative expression levels, these three genes can be used as candidate genes to further explore the molecular mechanism of leaf coloration in poplar and other colored-leaf plants.

the transcription factors involved in anthocyanin biosynthesis of leaves in cHp.
Three kinds of transcription factors, R2R3-MYB, bHLH, and WD40 transcription factors, are crucial for anthocyanin accumulation; R2R3-MYB transcription factors seem to be especially important 37 . PAP1 and PAP2 in Arabidopsis, belonging to R2R3-MYB transcription factors, can enhance anthocyanin production by up-regulation of structural genes such as LDOX, DFR and CHS 19,37 . Arabidopsis overexpressing PAP1/AtMYB75 promoted anthocyanin biosynthesis by enhancing the expression levels of structural genes 20 . UDP-Glc: flavonoid 3-O-glucosyltransferase (UFGT) gene in grape was induced through the expression of VvMYBA1 and VvMYBA2, and the anthocyanin biosynthesis capability was also enhanced [38][39][40] . Tobacco overexpressing VvMYB5b accumulated much more anthocyanidin and proanthocyanidin in flowers by increasing the expression levels of structural genes, such as F3H, CHI, and ANS 41 . A dual-pigmented sweet potato with higher content of carotenoids and anthocyanins was generated owing to the over-expression of IbMYB1a in a single orange-fleshed sweet potato cultivar 42 . The almost capabilities variation of anthocyanin biosynthesis in Citrus species were the differences of MYB transcription factor activity 43 . MdbHLH3 and MdWD40 can bind MdMYB10 to increase its activity, which further enhance the anthocyanin biosynthesis 44,45 . MdbHLH3 can also interact with the promoter of MdMYB9 and MdMYB11 to promote its transcription activity 22 . bHLH and WD40 can combine with MYB to form MYB-bHLH-WD40  www.nature.com/scientificreports www.nature.com/scientificreports/ ternary complexes to better regulate anthocyanin biosynthesis 46,47 . The MYB involved in anthocyanin regulation in Arabidopsis has been well studied, and PAP1, PAP2, AtMYB113, and AtMYB114 play critical roles during the anthocyanin synthesis. The phylogenetic analysis indicated that PtrMYB179 was clustered into the same group with PAP1 and PAP2 (Fig. S3). In addition, PtrMYB179 in the present study is the same with PtrMYB117 in the study conducted by Cho, and over-expression of PtrMYB119 was reported to enhance anthocyanin accumulation and produce the colored-leaf poplar 23 . Therefore, 'MYB-related protein Myb4' , 'PtrMYB179' and 'transcription factor GAMYB' might play critical roles in the synthesis of anthocyanin in CHP.
Except for the effects of expression levels of MYB family genes on anthocyanin synthesis, the mutations of MYB family genes also lead to the abnormal anthocyanin synthesis. In wheat, there is a 19 bp deletion in the coding sequence of TaMYB10-B, which contributed to the white grain 48 . In wild strawberry, one single nucleotide mutation  www.nature.com/scientificreports www.nature.com/scientificreports/ in FvMYB10 was reported to cause yellow fruit 49 . MrMYB1 is involved in anthocyanin accumulation in the Chinese bayberry fruit, and a nonsense mutation in the MYB1 protein is responsible for no or low expression of MYB1 in the white and red fruit 50 . The deletion of two exons of the gene coding for DFR determined the color difference between yellow and red onions, and active red and inactive yellow DFR alleles have been designated as DFR-AR and DFR-ATRN, respectively 51 . There was a high level of the Riant protein in red flowers of peach, and barely detectable levels of the Riant protein in variegated flowers; this was attributed to small insertions and deletions (indels) in the last exon, leading to a frameshift mutation 2 . In the present study, the expression level of three MYB family genes, two bHLH family genes, and one WD40 family gene was much higher in CHP leaves than in L2025 leaves.
The present study investigated the physiological mechanism of leaf coloration in poplar. The candidate genes associated with anthocyanin biosynthesis were also identified. The genes can be used as target genes to acquire the desired color poplar through the Crispr/Cas 9 system, RNA interference or overexpressing system, and obtain more genes which can change the color of plant leaves. Taken together, our findings provide important information for the designing of molecular markers associated with color traits, and have considerable potential to assist in the breeding of colored-leaf poplar and other woody plants in the future.

Materials and Methods
plant materials. Two Populus deltoids cultivars, CHP with bright red leaf and wild-type L2025 with green leaf were cultivated at the experimental field of Nanjing botanical garden Mem. Sun Yat-Sen (32°3′N, 118°49′E) in April 2017. Both CHP and L2025 were cutting propagation. To explore the mechanism of leaf coloration in poplar, the leaves were collected from CHP and L2025 trees on June 5, 2017 to determine the physiological index,  www.nature.com/scientificreports www.nature.com/scientificreports/ such as the content of anthocyanins and chlorophyll, and the changes of molecular aspects, such as genomic polymorphic SNPs and the expression levels of candidate genes. Five seedlings of CHP and L2025 were selected, and five leaves per plant were collected. The leaves were stored at −80 °C after putting them into liquid nitrogen for fast frozen to conduct the subsequent analysis.
The difference of growth status between CHP and L2025. The heights and stem diameters (about 100 cm above the soil) were determined after their defoliation (15 December, 2017). Duncan's multiple range test were conducted for the statistical analyses in SPSS 17.0 (SPSS Inc., Chicago, IL). the determination of chlorophyll, carotenoid and anthocyanin in the leaves of cHp and L2025. To determine the chlorophyll content of leaves in CHP and L2025, about 0.1 g leaves was grinded, and the powers were put into 95% ethanol (5 mL) for 2 h at 50 °C. The mixture was centrifuged at 5,000 g for 5 min, after that the supernatant was obtained, which was used to measure the absorbance with a spectrophotometer at 470, 649, and 665 nm. The chlorophyll content was computed according to the method described by Lichtenthaler and Wellburn 52 . To determine the total anthocyanin content of leaves in CHP and L2025, about 1.0 g of the fresh leaves was immersed into 1% (v/v) HCl ethanol (10 mL) at 60 °C for 30 min. The mixture was centrifuged at 13,000 g for 5 min, after that the supernatant was obtained, which was used to measure the absorbance with a spectrophotometer at 530, 620 and 650 nm. The content of total anthocyanin was measured based on the method described by Yue 53 .
Construction of DNA library and massively parallel sequencing. Genomic DNA was extracted from the leaves of CHP and L2025 with the modified method from Kim 54 . After that, the whole genome resequencing was conducted by Genepioneer Biotechnologies (Nanjing, China) on an Illumina HiSeq4000 ™ . The standard Illumina protocol was used to construct DNA library and subsequent sequencing. Paired-end sequencing libraries was purified using the QIA quick PCR kit, which has an insertion of about 200 to 400 bp. The clusters were generated after end repair with poly-A on the 3´ ends and the ligated adaptors. The fragments were selected based on the agarose gel electrophoresis, and amplified by the PCR. The fragments with 2 × 150 bp paired-end were sequenced on an Illumina HiSeq4000 platform. After that the data was demultiplexed to obtain the raw reads, which were evaluated for quality using FastQC, and then filtered using in-house made Perl script before alignment to the poplar reference genomes (Phytozome, http://phytozome.jgi.doe.gov/pz/portal.html). As Populus trichocarpa genome has been well sequenced and annotated, it was used as reference genomes. To obtain better results in our study, firstly, the indexed adapter sequences were removed. Secondly, low quality reads with one end of a paired-end read >5% 'N' bases or Phred quality score Q ≤ 30 were filtered. the relative expression levels of genes involved in pigments formation. Total RNA was extracted from finely ground leaves by the method described by Li 57 , the concentration of which was measured by BioPhotometer (Eppendorf). To explore the relative expression levels of genes involved in pigments formation, qRT-PCR was performed on Applied Biosystems 7500. The synthesis of cDNA was conducted with reverse transcription system (Promega). Gene-specific primers were designed based on the sequence of the target genes (Table S8), and ACTIN2 gene was selected as housekeeping gene 23 . The relative expression level of genes was evaluated by 2 −ΔΔCt method 58 , and analyzed by SPSS 17.0 with three biological replicates.
conclusions Compared with leaves of L2025, leaves of CHP had a higher content of anthocyanins and a lower content of total Chl and carotenoids, and the ratio of anthocyanins to total Chl in CHP was more than 10 times of those in L2025, which might be the physiological reason why CHP had red leaf color. Three structural genes (flavonol synthase 1 family protein, UDP-glucose flavonoid 3-O-glucosyltransferase 3′ and flavonoid 3-O-galactosyl transferase family protein) and six transcription factors (MYB-related protein Myb4, ttranscription factor GAMYB, PtrMYB179, transcription factor bHLH53, transcription factor bHLH3, VARICOSE family protein) might be involved in the anthocyanin synthesis pathway. The above results could provide reference to explore the molecular regulation mechanism of leaf coloration in Populus deltoids, and also provided insight into coloration of other woody plants.