Bulked segregant analysis reveals candidate genes responsible for dwarf formation in woody oilseed crop castor bean

Plant dwarfism is a desirable agronomic trait in non-timber trees, but little is known about the physiological and molecular mechanism underlying dwarfism in woody plants. Castor bean (Ricinus communis) is a typical woody oilseed crop. We performed cytological observations within xylem, phloem and cambia tissues, revealing that divergent cell growth in all tissues might play a role in the dwarf phenotype in cultivated castor bean. Based on bulked segregant analyses for a F2 population generated from the crossing of a tall and a dwarf accession, we identified two QTLs associated with plant height, covering 325 candidate genes. One of these, Rc5NG4-1 encoding a putative IAA transport protein localized in the tonoplast was functionally characterized. A non-synonymous SNP (altering the amino acid sequence from Y to C at position 218) differentiated the tall and dwarf plants and we confirmed, through heterologous yeast transformation, that the IAA uptake capacities of Rc5NG4-1Y and Rc5NG4-1C were significantly different. This study provides insights into the physiological and molecular mechanisms of dwarfing in woody non-timber economically important plants, with potential to aid in the genetic breeding of castor bean and other related crops.


Results
Phenotypic variation of plant height in castor bean. The height of cultivated castor varieties usually varies from 1.5 to 2 m. During a broad field survey and castor bean germplasm collection, we collected a dwarf germplasm (CBd) with the average plant height (PH) of 45.67 ± 23.76 cm at maturity. To dissect the potential physiological and molecular mechanism of dwarfing in castor bean, we first compared CBd and a common castor variety (ZB159), called CBt here, with the average height of 236.5 ± 19.68 cm, at maturity (Fig. 1A). We measured several important traits related to height, including vertical height of the secondary branching (VHSB), height of primary raceme (HPR), number of node on main stem (NN), diameter of main stem (DMS), and average length of internodes (ALI). As shown in Fig. 1B and Table S1, the values of VHSB, HPR, DMS and ALI in CBt were greater than that in CBd, but CBd has more nodes on the main stem than CBt. ALI was substantially and significantly difference (p = 1.02e-6) between CBt and CBd, therefore the length of internodes is one of the main differences underlying height difference between two varieties.  www.nature.com/scientificreports/ We conducted cytological observations of cell size, cell number and cell layers within xylem, cambia and phloem tissues from developing stems of CBt and CBd within seven internodes from the bottom of the stem ( Figure S1). Since it is sometime variable and difficult to delimit Internode 1 and Internode 7, we excluded these and examined Internode 2 to Internode 6. Xylem tissue was not developed at the younger internodes (Internodes 5 and 6). The more visible xylem tissues including the width and cell layers were observed in Internodes 2 to 4, with visible in both CBt and CBd (Fig. 1C).
The width and cell layers of xylem tissues at Internodes 2 and 3 exhibited substantial and significant differences between CBt and CBd (Fig. 1D,E), whereas these figures were very similar at Internodes 4-6, suggesting that developmental difference between CBt and CBd might start in Internode 3. Comparing the cell size and cell wall thickness of fibers and vessels in xylem tissues at Internode 3 revealed that cell size, cell wall thickness and longitudinal cell length of fibers and vessels were significantly different between the two accessions ( Fig. 1F-K). The cell size of phloem fibers was greater and the number of cell layers of phloem more numerous in CBt than in CBd at Internode 3 ( Fig. 1L,M). Similarly, the cell size and number of cell layers of cambia tissue were significant greater in CBt compared to CBd at Internode 3 ( Fig. 1N-P). These observations clearly suggest that the difference in cell size across xylem, cambia and phloem as well as difference of meristematic capacity of cambia tissue might all be important factors that cause the variation in plant height.
BSA-seq identified QTL regions associated with plant height. To identify genes potentially responsible for the variation in height in castor bean, CBd (♀) and CBt (♂) were crossed to generate F 1 hybrids. The average height of F 1 hybrids was 123.2 cm. After selfing the F 1 , we obtained an F 2 population comprising 330 individuals. Plant height varied from 34 to 286 cm in F 2 population and exhibited a normal distribution ( Fig. 2A), suggesting polygenic control of plant height in this population. We measured five traits in addition to PH (VHSB, HPR, NN, DMS and ALI) in the F 2 population. ALI (r = 0.8443) and DMS (r = 0.7762) were tightly correlated with plant height, whereas NN was not correlated with plant height (r = 0.0186) (Fig. 2B).
These observations were consistent with our above findings that greater PH is caused in part by the larger cell size in the xylem and phloem tissues, and the stronger meristematic capacity of cambia tissue within internodes.
Based on the segregation of phenotype, we selected the 28 shortest individuals (average height 71.32 ± 20.14 cm) as the dwarf bulk and the 28 tallest individuals (average height 254.54 ± 21.44 cm) as the tall bulk ( Fig. 2A) for genome sequencing to identify loci potentially associated with plant height. The sequencing generated 83,983,462 and 79,741,716 high quality clean reads for the dwarf bulk and tall bulk, respectively, a depth exceeding 32-fold of the genome (Table S2). After aligning the clean reads to the castor reference genome (http:// casto rbean. jcvi. org) and variant calling using GATK 3.4, 1,027,623 genomic SNPs were identified from the parents and two bulks. The physical positions of SNP were aligned and annotated by ANNOVAR according to our previously constructed castor chromosomes 41 . In total, 300,225 SNPs homozygous within but different between the parents were identified, and their ΔSNP index and the average ΔSNP index within each 1 Mb window (with 10 kb step size) were calculated. Based on the 99% statistical confidence intervals (permutation tests under the null hypothesis of no QTLs), two regions, QTL1 (0.94 Mb in length; 27.07-28.01 Mb on chromosome Rc06) and QTL2 (3.79 Mb in length; 14.49-18.28 Mb on chromosome Rc07), were identified as genomic regions associated with height in castor bean ( Fig. 3A-C). We chose to focus on the region on Rc06 (27.93-27.97 Mb) where the difference was greatest between the ΔSNP index (0.463) and the 99% statistical threshold (0.350) based on the recommendation of reference 42 (Fig. 3D). www.nature.com/scientificreports/ Candidate genes associated with plant height were identified. In total, 487 genes were identified in the QTL1 and QTL2 regions (Table 1). After filtering out the 78 genes without ΔSNP index (≥ 0.5) and the 84 genes with SNPs in intergenic regions, we obtained 325 genes with SNPs located within the gene or within 2000 bp of the coding region ( Figure S2A-B). The 325 genes were considered candidate genes associated with castor bean height. Gene ontology (GO) analysis showed 146 of 325 genes were functionally involved in cell component ( Figure S2C). Based on the p value of GO enrichment analysis, we found that these candidate genes were mainly enriched in lignin metabolism, secondary metabolism and phenylalanine metabolism (Figure S2D). KEGG enrichment analysis found that the majority of candidate genes were enriched in hormone signal transduction and secondary metabolism processes ( Figure S2E). www.nature.com/scientificreports/ We identified where in the gene the SNPs were located and found non-synonymous polymorphisms in 118 genes that differentiated CBd and CBt ( Figure S2B). To reduce the potentially spurious associations between SNPs and height as far as possible we applied a strict criterion with a ΔSNP index ≥ 0.6, as suggested by Takagi et al. 42 , to sort out target candidate genes. Based on properties of the amino acids polymorphisms, 29 target candidate genes with non-synonymous SNPs (ΔSNP index ≥ 0.6) were selected for further investigation, including 6 and 23 genes located at QTL1 and QTL2, respectively (Table 2).

Rc5NG4-1 was demonstrated as a candidate gene controlling castor height. To investigate
whether the SNP variants were associated with plant height among different germplasms, we determined the genotypes of five wild castor bean accessions with average height of 539.2 ± 31.76 cm, and five cultivated castor bean accessions with average height of 153.4 ± 13.16 cm (Table S3). Among the 29 candidate genes, based on their putative functional annotation directly or indirectly related to cell wall formation (tightly associated with dwarf phenotype), five (29,822.t000050, 29,846.t000008, 29,846.t000007, 29,701.t000013 and 28,345.t000007) were selected to further investigate the genotypes of non-synonymous SNP with ΔSNP index ≥ 0.6 in this extended germplasm ( Table 2). As shown in Fig. 4, we found only the non-synonymous A/G SNP at position 1133 bp in gene 29822.t000050 to be fixed between the tall and dwarf germplasm. These results strongly suggest that gene 29822.t000050 might be an important candidate associated with height. However, we noted that the ΔSNP index of 29,822.t000050 was not the highest among identified candidate genes based on the statistic analysis (see Table 2).  www.nature.com/scientificreports/ According to the annotation of 29,822.t000050, it is predicted to be an auxin induced protein 5NG4, a member of the MtN21 family 47 . As the first 5NG4-like gene to be reported in castor bean, we named 29,822.t000050 as Rc5NG4-1. The non-synonymous SNP resulted in the substitution of the amino acid 218th from tyrosine (Y) in CBt to cysteine (C) in CBd (Fig. 5A). Previous studies have found that the protein of 5NG4 members have 10 transmembrane (TM) helices that often form several transmembrane secondary loop structures for facilitating IAA transportation. The Arabidopsis orthologue AtWAT1 (At1g75500) functions as an auxin efflux transporter localized on tonoplast, facilitating IAA transportation from vacuole to cytosol 25 . We predicted the protein secondary structure of Rc5NG4-1 using TMHMM Server v. 2.0, found it had 10 TM domains and the amino acid substitution described above was located in TM helix 7, though did not result in a change to the TM helix number or number of hydrophilic loops (Fig. 5A). Inspection of the subcellular localization of Rc5NG4-1 in tobacco epidermal cells and Arabidopsis protoplasts, revealed Rc5NG4-1 was not localized in the plasma membrane ( Fig. 5B) but was instead co-localized with the tonoplast marker (Fig. 5C). Thus, we reasonably infer that the function of Rc5NG4-1 is involved in IAA transportation from vacuole to cytosol in castor bean. Further, while inspecting the expressional profiles of Rc5NG4-1 among different tissues we found Rc5NG4-1 have relatively higher expressions in root and stem, compared to that in leaf and seed ( Figure S3).
Functional confirmation of Rc5NG4-1 involved in IAA uptake. To inspect whether the non-synonymous SNP that resulted in the substitution of amino acid 218 from tyrosine (Y) in CBt to cysteine (C) in CBd gave rise to functional difference in auxin transport, we performed a heterologous expression of Rc5NG4-1 in yeast (Saccharomyces cerevisiae). First, we checked the subcellular localization of Rc5NG4-1Y and Rc5NG4-1C in transformed yeast strains using pDR196-Rc5NG4-1Y-GFP or pDR196-Rc5NG4-1C-GFP vectors. As shown in Fig. 6A, Rc5NG4-1Y and Rc5NG4-1C were localized in the membrane. Second, Rc5NG4-1Y and Rc5NG4-1C were heterologously expressed in yeast cells and expression of Rc5NG4-1Y and Rc5NG4-1C was confirmed using RT-PCR (Fig. 6B). The uptake capacity of free IAA (as 2 H-IAA in the growth medium) in transformed yeast strains was assayed. The uptake rates of 2 H-IAA were higher in yeast transformed with Rc5NG4-1Y than yeast transformed with Rc5NG4-1C from 5 to 60 min of 2 H-IAA supply (Fig. 6C). The uptake rates of 2 H-IAA in transformed Rc5NG4-1C yeast strains was almost identical to the control, suggesting Rc5NG4-1C may have lost  Content of free IAA in CBd and CBt. Content of free IAA in plant tissues is usually determined by intracellular IAA transport between organelle and cytosol, cellular influx and efflux transport, and long distance transport among tissues 23,25,48 . To test whether the differences between the tall and dwarf phenotypes related to free IAA, we measured the content of free IAA in apical buds and Internode 3 tissues in CBt and CBd. As shown in Fig. 7, the content of free IAA did not differ between CBd and CBt in the apical bud, whereas there was significantly lower free IAA in Internode 3 of CBd than in CBt (p value = 0.0013, t-test). This result suggests that the biosynthesis of free IAA in apical bud might be similar between the two accessions, but the transport capability of free IAA from the apical bud to node is weaker in CBd.  www.nature.com/scientificreports/

Discussion
As a key component of tree architecture, plant height is a critical agronomic trait in non-timber trees such as fruit trees and woody oilseed crops. Dwarfism is beneficial for reducing canopy size, allowing increased planting density and greater yields, lowering costs and increasing the orchard's lifespan of non-timber trees. Dwarfism in plants is a complex trait and relatively little is known about the physiological and genetic factors that regulate dwarfing in non-timber trees. As mentioned above, castor bean is an ideal system to dissect the potentially physiological and molecular mechanisms underlying dwarfism in woody plants because of its domestication from a tall wild perennial woody tree to the shorter annual oilseed crop 38 . Based on our phenotypic observations, dwarfism is clearly caused by shorter internodes and thinner stems, implying that stem growth of CBd might be developmentally repressed. Cytological observations revealed that cell growth in xylem and phloem tissues and the cell division of cambia tissue were repressed in CBd, resulting in smaller cells within the xylem and phloem tissues and fewer cell layer in cambia tissue compared to CBt. Similarly, in poplar, larger phloem and vessel tissues lead to greater plant height 33,49 . Since cell division in cambia tissue is an important driver of radial and longitudinal growth in woody plants 50 , we infer that the capability of cell division within cambia tissue is probably a critical factor affecting the radial and longitudinal growth in castor bean internodes.
BSA is an efficient and rapid method to detect plausible candidate genes responsible for a given trait in plants from the F 2 generation. Based on the segregation of plant height in the F 2 generation, we detected two QTLs associated with castor bean height. Within the two identified QTLs, 325 candidate genes associated with the plant height were mainly enriched in cell component, functionally involved in hormone signal transduction and secondary metabolism processes, consistent with the requirement of plant growth and development in woody plants. Based on the non-synonymous polymorphisms and a strict criterion (ΔSNP index ≥ 0.6) we sorted out 29 candidate genes that were likely participated in regulating the formation of plant height in castor bean. Within these identified candidate genes, we sorted out a candidate gene Rc5NG4-1 that was functionally involved in the regulation of cellular auxin transport. However, according to current analyses it is difficult to determine or rule out which candidate genes are the causal genes controlling the formation of plant height in castor bean.
Many studies have revealed that the GA signaling network is a critical factor that regulates plant growth, in some cases underlying dwarf phenotypes 2, 51, 52 . Indeed, the physiological and molecular basis of plant growth regulation and dwarfing are complex and varied across plant species. Our current study demonstrated that Rc5NG4-1 is located in the tonoplast, and is an auxin transporter, likely capable of transporting IAA between vacuole and cytosol. Based on expressing Rc5NG4-1 heterologously in yeast, it is clear that the uptake capacity of free IAA between Rc5NG4-1Y (tall allele) and Rc5NG4-1C (dwarf allele) is significantly different validating the different uptake and transport capacities of Rc5NG4-1Y and Rc5NG4-1C. It is likely that Rc5NG4-1 functions in a similar way to its putative orthologue AtWAT1 encoding an IAA transport protein in Arabidopsis thaliana 25 by transferring IAA between the vacuole and cytosol to maintain the free IAA content in cytosol. It appears that the Rc5NG4-1C allele, whilst being expressed, was non-functional. Using Rc5NG4-1 as a query in a BLAST search against the castor bean genome reference we found several putative paralogues (data not shown), suggesting that other copies of Rc5NG4-like genes might compensate for this loss-of-function, thus partially maintaining the free IAA content in cytosol. Due to the free IAA content in the apical bud being similar between CBt and CBd, and the difference in the internode tissues, we infer that long-distance free IAA transport capability from the apical bud to the internode might vary between two varieties. Since auxin affects cell division in cambia tissue and cell growth in xylem and phloem tissues 53, 54 , we reasonably infer that our observed reduction in IAA in the internode tissues in CBd might be an important physiological difference resulting in the dwarfing of castor bean. Based on the sequence similarity we found the orthologue AtWAT1 of Rc5NG4 in Arabidopsis. The function of AtWAT1 has been verified to control plant height by affecting cellular auxin transport in Arabidopsis 25,27 . Our results indicated that Rc5NG4 is likely a causal gene that gives rise in phenotypic variation of plant height in woody castor bean. However, such study is very limited in other plants. Whether or how the mutant Rc5NG4-1C directly influences cellular influx and efflux of IAA and the long distance transportation of free IAA remain unknown.

Conclusions
In this study, we first found divergent cell growth in xylem and phloem tissues and divergent cell division in cambia tissues, suggesting the potential physiological basis of dwarfing in castor bean. Two QTLs associated with plant height which cover 325 candidate genes were identified using bulked segregant analysis sequencing from an F2 population derived from a tall castor variety CBt and a dwarf castor variety CBd. In particular, we detected a gene Rc5NG4-1 (encoding an IAA transporter protein localized in tonoplast) as a responsible candidate for controlling dwarfism formation in woody oilseed crop castor bean. This study provided novel insights into the physiological and molecular mechanisms of dwarfing in woody non-timber economic plants, facilitating the genetic improvement to create dwarf varieties in castor bean.

Materials and methods
Plant materials and phenotype survey. The castor bean varieties CBd (dwarf) and CBt (tall) were developed by self-pollination controlled for at least three generations, provided by Zibo Academy of Agriculture Sciences, Shandong Province. The varieties CBt and CBd were genetically independent, and were applied as parents for creating the F 2  www.nature.com/scientificreports/ branching (VHSB), height of primary raceme (HPR), number of node on main stem (NN), diameter of main stem (DMS), and average length of internodes on main stem (ALI), were measured after seed ripening for parent lines and each individual of the F 2 population. Only PH was measured for F 1 hybrids. The DMS was the diameter of the middle internode of main stem. ALI was defined as the ratio of HPR to NN in main stem.
The observations of cytology. The CBt and CBd seedlings were grown in a plastic basin (the ratio of nutrient soil, soil and perlite was 3:1:1, v/v) in the greenhouse (30 °C, light/dark was 12 h/12 h) and individuals at the same maturity stage (seven internodes) were used in the anatomical survey, transmission electron microscopy and maceration of fibers and vessels within stem. The stem from CBt and CBd seedlings were split into 1 cm sections manually using a razor blade, immersed in 30% glycerin, infiltrated in a vacuum environment (0.08 MPa) for 30 min, and sustained in 30% glycerin. The 1 cm samples were then cut into blocks (2 mm × 2 mm) and fixed into the chuck with refrigerant. Then, sections of 15-25 μm thickness were sliced continuously using freezing microtome (Leica, CM3050S, Germany), transferred onto microslides and rinsed until the refrigerant was removed completely. After this, the sections were stained with 1% saffron (1 g saffron dissolved in 100 ml 50% ethanol, m/v) for one minute, and the residual dye was removed using 50% ethanol. After rehydration with sterile water, the sections were inspected with a fluorescence microscope (Leica Microsystems, DM5508, Germany) under white light or ultraviolet light to scan the xylem tissue and the phloem fibers 55 . Images were captured using a Leica camera (Leica Microsystems, DFC450C, Germany), and cell area, cell layer and cell wall thickness were measured with Image J software (version 1.46). All cells from each image were measured, and the values from at least five images of at least three plants were measured.
Transmission electron microscopy was used for inspecting cell thickness of xylem in Internode 3. The xylem tissue was shaped to 1 mm × 1 mm blocks, fixed in 2.5% glutaric dialdehyde for at least 24 h and immersed in 1% osmium acid for 12 h under 4 °C. After dehydration in an acetone series (final concentration of 90% acetone) and embedment in resin spur, the 40 nm-thick sections were sliced by microtome (Leica-R, Germany). The sections were dried at room temperature and stained using uranylacetate and lead citrate. The stained sections were inspected under an electron microscope (JEOL Ltd, Tokyo, Japan), and images captured by an Olympus-SIS Megaview camera (Olympus Soft Imaging Solutions GmbH, Münster, Germany). Measurement of the cell wall followed the above method.
The xylem in the Internode 3 of seedlings was macerated to separate fibers and vessels for measuring length. The maceration method was similar to that of Biswal et al. 33 . The tissue was macerated for 10 h at 90 °C with buffer (50% acetate acid and 3% hydrogen peroxide) in 2 mL polypropylene tubes. After washing with sterile water, the fibers and vessels were observed by microscope (Leica Microsystems, DM5508, Germany). Image capturing and trait measurements were followed above protocols.
DNA extraction and BSA analysis. DNA were isolated from young leaves of single plants in parents, the 28 tallest plants and the 28 most dwarf plants using a plant genomic DNA extraction Kit (TIANGEN, Beijing, China). The tall bulk and dwarf bulk were constructed by mixing equivalent DNA amounts from each of 28 tall plants and each of 28 dwarf plants, respectively. Pair-end sequencing libraries were generated for the two parents, the tall bulk and the dwarf bulk and sequenced using Illumina HiSeq™ 2500 (Illumina Inc, San Diego, CA, United States). High quality clean 150 bp reads (HQ clean reads) were obtained by filtering out reads with adapters, with > 10% unidentified nucleotides (N) and with quality scores (Q) ≤ 20 bases accounting for more than 50%. The depth for two parents and two bulks were above 32-fold (Table S2).
HQ clean reads were aligned against the reference genome (http:// casto rbean. jcvi. org) using the Burrows-Wheeler Aligner (BWA) software (settings: mem 4 -k 32 -M). After duplicates marking by picard (1.129), variants (SNPs) were identified using UnifiedGenotyper of GATK (3.4-46) and filtered using VariantFiltration (settings: -Window 4, -filter "QD < 4.0 || FS > 60.0 || MQ < 40.0", -G filter "GQ < 20") of GATK. The physical positions of variants were aligned and annotated by ANNOVAR according to the 10 chromosomes constructed by Yu et al. 41 . The SNP index of the tall bulk and dwarf bulk were calculated using homozygous and biallelic SNPs that differentiate the two parental genomes and were found at depth > 2. ΔSNP index were calculated by subtracting SNP index of dwarf bulk from SNP index of tall bulk, after filtering out SNPs with read depth < 6 and SNP index < 3 in the two bulks. The average SNP index and ΔSNP index were computed using 1 Mb sliding window with 10 kb step size, and the 95% and 99% statistical confidence intervals of ΔSNP index were obtained from 1000 permutation tests of each window under the null hypothesis of no QTLs. Windows with < 10 SNPs were ignored. Nr description, GO term and KEGG pathway were performed to candidate genes.
The apical buds and Internode 3 were sliced from seedlings of CBt and CBd (with seven internodes) at the same maturity stage using a razor. Seedlings were grown in greenhouse as above. Three independent samples of each tissue of CBt and CBd were collected and immersed in liquid nitrogen. The sample was ground into a powder using pestle and mortar. 100 mg of powder from each sample were placed in cryogenic vials.
The 2 H-IAA uptake by yeast cell and free IAA concentrations in apical buds and Internode 3 were detected and quantified with ultra-high performance liquid chromatography (Shimadzu, LC-20AD, Japan) and QQQ-triple quadrupole mass spectrometer (Shimadzu, LCMS-8040, Japan) (LC-MS) using a positive electrospray ionization (ESI) detection system. 5 ng N 5 -IAA was used as the internal standard and 1 ml ethyl acetate was added to each sample. After vortex for 10 min, the sample was centrifuged (13,000 g, 10 min) and the supernatants transferred to new 2 mL polypropylene tubes. Residual pellets were extracted again but with 0.5 mL of ethyl acetate and the supernatants were combined. Next, the supernatant was evaporated on a vacuum concentrator (Eppendorf) until dry, 0.5 mL of 70% methanol was added to resuspended the residue, samples were centrifuged and the www.nature.com/scientificreports/ supernatants transferred to glass vials, for analysis by LC-MS using multiple reaction monitoring (MRM) scan with a windows of 50 s and a target scan time of 3 s. The 2 H-IAA and free IAA concentrations were quantified as 5 ng (the weight of internal standards) multiplied the ratio of their peak area to the peak area of N 5 -IAA.
Informed consent for publication. Informed consent for publication is obtained from the participant for the identifiable image in Fig. 1A. The authors confirm that the Scientific Reports and Springer Nature Press have unrestricted copyright to publish or re-publish the figure (with the photographic portrait) for scientific and educational purposes.