Unraveling cis and trans regulatory evolution during cotton domestication

Cis and trans regulatory divergence underlies phenotypic and evolutionary diversification. Relatively little is understood about the complexity of regulatory evolution accompanying crop domestication, particularly for polyploid plants. Here, we compare the fiber transcriptomes between wild and domesticated cotton (Gossypium hirsutum) and their reciprocal F1 hybrids, revealing genome-wide (~15%) and often compensatory cis and trans regulatory changes under divergence and domestication. The high level of trans evolution (54%–64%) observed is likely enabled by genomic redundancy following polyploidy. Our results reveal that regulatory variation is significantly associated with sequence evolution, inheritance of parental expression patterns, co-expression gene network properties, and genomic loci responsible for domestication traits. With respect to regulatory evolution, the two subgenomes of allotetraploid cotton are often uncoupled. Overall, our work underscores the complexity of regulatory evolution during fiber domestication and may facilitate new approaches for improving cotton and other polyploid plants.

. Corresponding to each combinatorial pattern, expression changes between Maxxa and TX2094 in At/Dt ratio and At+Dt total expression are tabulated at 10 and 20 dpa. The color of each cell shows the magnitude of gene enrichment (blue) and depletion (red) based on residuals of Pearson's Chi-square test of independence (blue indicates a positive residual, i.e., more genes observed than expected, and red indicates fewer genes than expected).

Supplementary Note 1. Sequence variation in association with regulatory evolution.
To understand the genetic basis of cis divergence, we utilized genomic sequences for the two parental lines, Maxxa and TX2094 (SRA accession number SRR617482 and SRR3560138-3560140), to characterize genetic variants within a 2-kb promoter region upstream of the transcription start site. We asked whether cis regulatory divergence is mirrored in sequence variation in the gene promoter regions. A total of 340,642 SNPs (in 53,252 genes) and 88,137 indels (in 40,218 genes) were identified. As shown in Supplementary Figure 4, RD genes with cis divergence contain more SNPs and indels than genes without cis regulatory divergence (cisonly or cis+trans > trans-only; Duncan's multiple range test P < 0.05), and trans-only RD genes showed no significant difference from non-RD genes.
We next examined whether regulatory divergence was also associated with evolutionary rates (dN and dS) in coding regions. Because of the small values of dN (peak at 0.0011) and dS (peak at 0.0026), the calculation of dN/dS is subject to stochastic variance of ratios of small numbers, and was omitted in the following analysis. Both the distributions of dN and dS were significantly different between RD and non-RD genes (Kolmogorov-Smirnov test P < 0.01). Cis-only and trans-only genes tend to display higher substitution rate (dS = 0.0063-0.0070 and dN = 0.0028-0.0037) than those exhibiting cis+trans divergence and non-RD genes (dS = 0.0029-0.0035 and dN = 0.0019-0.0021; Duncan's multiple range test P < 0.05; Supplementary Figure 4). It was expected that cis+trans regulated genes were as conserved as non-RD genes, considering the stabilizing, antagonistic effects of co-existing cis and trans variants to preserve expression levels. However, further study is required to understand how the observed higher substitution rates of cis-only and trans-only genes relate to selection. Overall, however, the data show that RD genes with cis-only variants generally evolve faster with a higher substitution rate in both promoter and coding regions.
When comparing sequence divergence between corresponding At and Dt homoeologs, modest Pearson correlations of 0.15-0.38 resulted, with the highest correlation in dN and the lowest in promoter indels. Comparable amounts of promoter and coding sequence change under domestication were also observed, except that Dt promoters accumulated more SNPs than did At promoters (6.4 vs. 5.7 SNPs on average within 2-kb upstream of the annotated transcription start sites; Student's t-test P < 0.05). This slightly higher number of promoter SNPs is not associated with any particular type of cis and trans regulatory divergence.

Supplementary Note 2. Functional implications of key TF RD genes in Maxxa versus TX2094 GRNs.
Of the 53 RD TFs in Figure 5, 33 have more predicted TGs in Maxxa, as represented by bigger node size, than their counterparts in TX2094. For example, 2,786 and 324 TGs were predicted for node 1 (Gohir.A01G033500, response regulator RR1) in Maxxa and TX2094, respectively, which is a trans-only gene with increased expression following domestication. This regulator gene functions in the cytokinin signaling pathway and auxin biosynthesis, and is critical for root and shoot development in Arabidopsis 1,2 . In vicinity of this hub gene in Maxxa, node 6 (Gohir.D10G215300, a POWERDRESS-like MYB), 7 (Gohir.A05G132500.1, GLABRA2), 12 (Gohir.D10G033800, MIKC_MADS family protein) and 27 (Gohir.D06G209500, homeobox-1) were also predicted to have more TGs in domesticated fibers; in contrast, fewer TGs were predicted for node 45 (Gohir.D05G08900 0, TANDEM ZINC FINGER PROTEIN 9). The regulatory link in the vicinity of node 7 are of particular interest, as GLABRA2 (GL2) is a key regulator downstream of the GL1/GL3/EGL3/TTG1 core complex in the Arabidopsis trichome development pathway 3 . Previously, a cotton GL2-like homolog GhHOX3 has been found to play a central role in controlling cotton fiber elongation, which represents a basal and highly conserved component of the fiber developmental program 4 ; it remains an open question whether and how human-mediated selection has acted on the core regulators of trichome development such as GL2.