Genome-wide identification of membrane-bound fatty acid desaturase genes in Gossypium hirsutum and their expressions during abiotic stress

Membrane-bound fatty acid desaturases (FADs) are of great importance and play multiple roles in plant growth and development. In the present study, 39 full-length FAD genes, based on database searches, were identified in tetraploid upland cotton (Gossypium hirsutum L.) and were phylogenetically clustered into four subfamilies. Genomic localization revealed that 34 genes were mapped on 22 chromosomes, and five genes were positioned on the scaffold sequences. The FAD genes of G. hirsutum in the same subfamily had similar gene structures. The structures of paralogous genes were considerably conserved in exons number and introns length. It was suggested that the FAD gene families in G. hirsutum might be duplicated mainly by segmental duplication. Moreover, the FAD genes were differentially expressed in different G. hirsutum tissues in response to different levels of salt and cold stresses, as determined by qRT-PCR analysis. The identification and functional analysis of FAD genes in G. hirsutum may provide more candidate genes for genetic modification.

To date, several membrane-bound FAD genes from different plant species have been characterized. It is evident by the previous studies that FAD genes are expressed in adverse environmental conditions. For example, in tomato, over-expression of LeFAD3 gene enhanced the tolerance of tomato seedlings against salt stress 17 , whereas low-expression of LeFAD7 increased tolerance of tomato leaves to high-temperature (45 °C) 18 . Similarly, over-expression of either FAD3 or FAD8 increased tolerance of tobacco plants to drought 19 , and over-expression of FAD7 alleviated the damage of cold stress in tobacco plants 20 . Moreover, over-expression of omega-3 desaturases improved tolerance of various transgenic plants to both drought and chilling stresses [20][21][22][23][24] . In addition, FAD2 and FAD6 were induced in seedlings of Arabidopsis under salinity stress 25,26 . In general, the FAD genes regulate the levels of fatty acid unsaturation in membrane lipids thus improving plant tolerance to adverse environments 8 .
Cotton is an important commercial crop grown worldwide to provide generous supply of natural fiber for textile industry and to provide cottonseeds for food, feed, and bio-fuel productions 4 . It is cultivated mainly in the tropical and subtropical regions. Temperatures below 15 °C can affect the plant growth and development causing yield losses mainly due to poor germination and high seedling mortality. Gossypium hirsutum L., an allotetraploid species, is one of the most commonly grown cotton species for commercial production 4 . Originally, it was developed from two ancient diploid cotton species, G. raimondii and G. arboreum, by hybridization and chromosome doubling about 1-2 MYA 3 . It seems to be an interesting model system not only for the research of genome evolution, but also for studying the roles of polyploid in crop development and domestication with high research value. The fatty acid families from G. raimondii and G. arboreum have already been identified and characterized 13 . In G. hirsutum, Δ 12 desaturase (FAD2) and Δ 15 desaturase (FAD3 and FAD7/8) have been studied, mainly focused on cold adaptation at seedling stages [1][2][3]7 . Furthermore, the FAD2-1 was found to be a seed-specific desaturase responsible for the synthesis of polyunsaturated fatty acids in the cottonseeds 27 , while the FAD2-2 had low expression levels throughout the seed development 2 .
In this study, all the FAD genes in G. hirsutum were identified and characterized, and the phylogenetic analysis and structural diversification, as well as the expression profiles of the detected GhFAD genes in response to salt and cold stress regimes across different plant tissues were conducted. This research work may contribute to widen our understanding about the structure and function of FAD gene family in G. hirsutum, which may provide some candidate genes for predictable modification of fatty acid profiles in order to improve the plant vigor and seed nutritional value for cotton breeders 1 .

Results
Identification of FAD genes in G. hirsutum. Based on the completed genome sequences of G. hirsutum, a genome-wide search for FAD genes was performed by BlastP and tBlastN program, using Arabidopsis and rice FAD genes as the query sequences. Subsequently, all pre-identified proteins were subjected to domain analysis using Pfam and SMART databases for further confirmation, in the presence of FAD protein domain (PF00487). Eventually, a total of 39 FAD genes in G. hirsutum, based on their orthologs with reported counterparts in Arabidopsis 28-30 , were predicted and annotated ( Table 1).
The protein sequences, encoded by these 39 FAD genes in G. hirsutum, were detected. They ranged in length from 205 amino acids of GhFAD2.5A to 475 amino acids of GhFAD7A and GhFAD2.3D, with an average of 388 amino acids approximately. Similarly, the predicted molecular weight (Mw) of these deduced proteins varied from 23.76 kDa to 54.27 kDa, and the theoretical isoelectric point (pI) varied from 6.80 to 9.61. Phylogenetic relationship among the FAD genes. To evaluate the evolutionary relationships among the FAD gene members, all the genes from G. hirsutum (39), Arabidopsis (17), and rice (10) were aligned separately by Neighboring-Joining method to generate an un-rooted phylogenetic tree (Fig. 1). For confirmation, a phylogenetic tree was also constructed by Maximum Evolution method and the results were found almost consistent with Neighbor-Joining method, showing a little differences at some branches ( Supplementary Fig. S1). Further, the FAD genes in these three species were divided into four well-defined monophyletic groups, suggesting the existence of a common ancestor before the divergence of monocots and dicots. The First desaturase group encoded by ADS genes was composed of 11 ADS genes, nine from Arabidopsis and two from G. hirsutum. No ADS genes were found in the rice, suggesting some evolutionary role in its genome. The ADS genes were either lost in the rice or were acquired by the dicotyledons after divergence from the last common ancestor. In the Omega desaturase group, there were five genes from Arabidopsis, eight genes from rice, and 22 genes from G. hirsutum. There were 14 SLD genes in the Front-end desaturase group which included 11 gens from G. hirsutum, two from Arabidopsis, and one from rice. The Sphingolipid desaturase subfamily contained only six DSD genes, four DSD genes from G. hirsutum, only one from Arabidopsis and rice each. Interestingly, the G. hirsutum possessed more gene members than Arabidopsis and rice in these three subfamilies except First desaturase in which most of the gene members belonged to Arabidopsis.
Though several pairs of paralogous genes in the terminal branches of the G. hirsutum phylogenetic tree were distributed in each subfamily, several other FAD genes were not clustered together with the FAD genes from Arabidopsis and rice. This suggested that these FAD genes might have merged in G. hirsutum genome after diverging from the last common ancestor or have been lost in Arabidopsis and rice. Gene numbers in First and Sphingolipid desaturase groups were lower than others, which harbored two and four GhFAD genes, respectively. Taken together, the presence and the absence of species-specific FAD genes were due to the functional divergence. The FAD genes in G. hirsutum might have undergone rapid expansion during the course of evolution and exhibited a tendency to cluster into the same subgroup due to the conserved functions. Genomic localization of FAD genes. Genomic localization of FAD genes in G. hirsutum notably revealed that 34 genes were mapped on the 22 chromosomes, except for chromosome A02, A03, D02, and D03 (Fig. 2). The rest five genes, GhFAD7D, GhFAD2.4A, GhFAD2.5A, GhADS5A, and GhSLD5 were positioned on the scaffold sequences. As shown in Fig. 2, chromosome A01, A04, A05, A06, A08, A09, A12, D04, D06, D08, D09, D10, and D12 harbored only a single FAD gene each, while two FAD genes were present on each chromosome of A07, A10, A11, D05, D07, and D11. Moreover, there were three FAD genes on chromosome A13, D01, and D13 each.
Duplication of FAD genes. To date, the mechanism of FAD genes expansion in G. hirsutum remains unclear. We deliberately investigated the relationship between genetic divergence and gene duplication within the FAD gene family of allotetraploid cotton. Total 13 duplicated gene pairs were found in this study, i.e. GhFAD8   Structural organization of FAD genes. In the current experiment, 39 FAD genes were clustered into four subfamilies (Fig. 3). Generally, the FAD genes in the same subfamily had strikingly similar exon/intron structure than other subfamilies. For instance, in the First desaturase subfamily, GhADS5D and GhADS5A harbored five exons with highly conserved length and four introns in closed size. In contrast, all the genes in the Sphingolipid desaturase subfamily possessed only two same-sized exons. Generally, the genes of the Front-end desaturase subfamily possessed a single exon generally, except GhSLD5 which had two exons and one shorter intron. However, there were three exons in GhSLD1A and GhSLD1D. The similar exon-intron structure agreed well with their close phylogenetic relationship. In contrast to these three relatively conserved subfamilies, the FAD genes in the Omega desaturase subfamily showed a complex distribution of exons and introns. For example, the genes of FAD3, FAD7, and FAD8 sections mostly had up to eight exons of relatively conserved length, except for GhFAD7A which possessed seven exons. While all the FAD genes in FAD2 section had a single exon, except for GhFAD2.1D, which harbored three exons. In addition, GhFAD6D and GhFAD6A were composed of ten exons. Among the paralog pairs, uniformity was observed in the gene structures. Since G.hirsutum is an allotetraploid plant, the essential integrated evolutionary relationship was done by the sequences of the diploid G. raimondii and G. arboreum progenitors ( Supplementary Fig. S2). The detailed comparison indicated that the exon-intron structures were identical to the result of phylogenetic relationships and were highly conserved in each subfamily, though parts of the orthologous genes demonstrated certain differences.

Discussion
G. hirsutum is a natural allotetraploid produced by the interspecific hybridization of A-and D-genome diploid progenitor species. Based on the genome scans of several plant genomes, the FAD gene family has been systematically investigated in Arabidopsis (17), soybean (10) 31 , G. raimondii (19), and G. arboreum (20) 13 . In current work, a total of 39 putative FAD genes were identified in the genomes of G. hirsutum, which contained more FAD genes than other plant species such as Arabidopsis and rice. It might be mainly due to the recent polyploidy and segmental duplication events in G. hirsutum evolutionary history. Coincidentally, the number of FAD genes in G. hirsutum (AD 1 allotetraploid) was exactly the sum of those in G. raimondii (19) and G. arboretum (20). Moreover, all these FADs were distinctly classified into four groups as Omega desaturase, First desaturase, Sphingolipid desaturase, and Front-end desaturase consistently. In three subfamilies, Front-end desaturase, Omega desaturase, and Sphingolipid desaturase, G. hirsutum had much more FAD genes than Arabidopsis and rice, which implied that these three subfamilies arose before the divergence of monocots-dicots and might have undergone species specific expansion process. Furthermore, the sizes of the GhFAD proteins varied markedly, as well as the predicted isoelectric points, suggesting that different GhFAD proteins might have functions in different microenvironments.
Gene duplication plays an irreplaceable role in the process of gene family expansion in the genomes. In G. hirsutum, 13 segmental duplicated gene pairs were found, such as five gene pairs in the Front-end desaturase subfamily, two gene pairs in the Sphingolipid desaturase subfamily, and six gene pairs in the Omega desaturase subfamily. These results suggested that the expansion of Sphingolipid desaturase, Front-end desaturase, and Omega desaturase subfamilies were due to the segmental duplication, which was consistent with the previous report 13 , namely the increasing size of FAD genes might be contributed by the segmental duplication events. It was noteworthy that paralogous FAD gene pairs demonstrated very similar exon/intron distribution patterns in terms of exon number and intron length. For instance, the FAD genes in the First desaturase subfamily and the Sphingolipid desaturase subfamily of three cotton species were highly conserved and clustered distinctly. However, their expression patterns were greatly divergent. It might have involved in the mechanism of adaptation to different environmental conditions. In the current study, different expression patterns of FAD genes were observed in each tissue. GhADS5A, GhFAD3.1A, and GhSLD3A were expressed markedly at high levels in the leaves but lower in the other tissues, while GhFAD6D was constitutively expressed at high levels in all the tested tissues. The tissue specific FAD gene expression patterns observed in the present study indicated their functional divergence during plant development and growth, which needs further study to identify their actual functions. In addition, most of the FAD2 subfamily such as GhFAD2.3A, GhFAD2.1D, GhFAD2.4A, GhFAD2.4D, GhFAD2.1A, and GhFAD2.2D showed dramatically high abundance in the process of lipid accumulation. It suggested that these genes were seed-specific responsible for the polyunsaturated fatty acids in the seed oil of cultivated cotton. In addition, the genes which were highly expressed on 20 DPA recruited all types of FAD genes for the biogenesis in different positions or lengths. However, the other development stages involved only three subfamilies, which suggested that 20 DPA might be an important stage for ovule development.
Salinity, resulting mainly due to NaCl, is one of the common environmental stresses that afflicts the growth and yield of crops in many regions all over the world 6,26,[32][33][34] . In previous studies, FAD2 and FAD6 were revealed to be required for salt tolerance in Arabidopsis 26,32 , while LeFAD3 up-regulation could alleviate salt stress in early seedling stage 23 . In addition, antisense AtFAD7 expression resulted in lower salt tolerance in transgenic tobacco plants 6 . In this study, low expression level of GhFAD2.4D was found in all tissues treated by salt stress. GhFAD8.1D and GhADS5D genes were significantly up-regulated in the leaves in response to salt stress, which suggested that these FAD genes might adjust to appropriate levels to protect the cell membrane of cotton plant from salt stress. Low temperature is another serious stress, which significantly affects plant growth and development. The FAD2-3 and FAD2-4 genes from G. hirsutum were found to participate in the membrane adaptation to cold stress and induced significantly higher in a previous study 7 . Similarly, the over-expression of FAD8 could also alleviate the damage of cold stress 35 . In this study, GhFAD8.1A was highly expressed implying its potential role in the plant tolerance to low temperature stress, which is consistent with the previous findings 3, 35,36 .

Conclusion
In the current work, 39 full-length FAD genes, classified into four well-documented groups in conserved exon/ intron structures, have been identified and functionally validated in G. hirsutum. Most GhFAD2 subfamily members showed high abundance in later developmental stages, it might have contributed to the accumulation of oil contents in seeds and can be used as target genes to improve cottonseed quality. qRT-PCR analysis also revealed the corresponding expressions of GhFAD genes under different levels of salt and cold stresses. GhFAD8.1D and GhADS5D exhibited specifically high transcript accumulation in the tested tissues exposed to different levels of cold stress, which suggested that they might be required for salt tolerance. Meanwhile, GhFAD8.1A was likely to enhance the plant tolerance to cold stress according to the significant up-regulation after long periods of treatment. The comprehensive investigation carried out in this study may improve our understanding about the involvement of FAD genes to tolerate adverse environments, and can also provide the candidate genes for functional studies.  Table S1) 31,37 were retrieved from The Arabidopsis Information Resource (TAIR release 10, http://www.arabidopsis.org) and the Rice Genome Annotation Project Database (RGAP release 10, http://riceplantbiology.msu.edu/index.shtml), respectively. BlastP and tBlastN programs were applied to search for the initial identification of the candidate FAD genes of G. hirsutum in default parameters, with the queries of Arabidopsis and rice FAD protein sequences. Then, the Pfam (http://pfam.sanger.ac.uk/search) 38 and SMART databases (http://smart.embl-heidelberg.de/) 39 were used to confirm every putative FADs of G. hirsutum with the domain of Pfam FAD gene family (PF00487). Furthermore, the theoretical molecular weight (Mw) and isoelectric point (pI) of the full-length proteins were predicted by using pI/Mw tool (http://web.expasy.org/protparam/) 40 .
Phylogenetic construction. Clustal X version 2.0 41 was used to carry out the multiple sequence alignment of full length protein sequences from three species, G. hirsutum, Arabidopsis, and rice. As soon as a MAGA file was generated, MEGA 5.2 42 was further applied to construct an unrooted Neighbor-Joining phylogenetic tree with pairwise deletion option and Possion correction model. Bootstrap test was carried out with 1000 replicates for the reliability of interior branches. To confirm the consistency with the Neighbor-Joining method, the phylogenetic tree was also performed with Maximum Evolution method.
Chromosomal locations, gene duplications, and structural analysis. The genes were mapped on G. hirsutum chromosomes using the Circos tool 43 . Gene duplication of FAD genes in G. hirsutum was indicated by (1) shared aligned sequence covering > 80% of the longer gene; (2) similarity of the aligned regions > 80%; and (3) only one duplicated event was counted for tightly linked genes 44,45 . Referring to the different chromosomal locations (Supplementary Table S2), these FAD genes were designated as either tandem duplications or segmental duplications.
Comparing the predicted coding sequences (CDSs) with their corresponding genomic sequences, the exon/ intron organization of the individual GhFAD genes were illustrated using the Gene Structure Display Server (GSDS, http://gsds1.cbi.pku.edu.cn/). Estimating Ka/Ks ratios for duplicated gene pairs. Initially, all the full-length gene sequences of the duplicated FAD gene pairs of G. hirsutum were aligned by Clustal X 2.0 program 41 and then the non-synonymous (Ka) and synonymous substitution rates (Ks) were calculated using DnaSp V5.0 46 software. Finally, the selection pressure of each gene pair was assessed based on the Ka/Ks ratio.
Plant materials and abiotic stress treatment. TM-1, a genetic standard line of G. hirsutum was grown in a temperature-controlled chamber adjusted to 28 °C temperature, 70% relative humidity, and 16/8 hours photoperiod. After five days, half of the germinated cottonseeds were transplanted to black plastic buckets containing MS medium (5 liters). After adaption for ten days in the hydroponics, the MS solutions were adjusted to different salt treatments as 0 (control), 0.3% (slight stress), 0.6% (moderate stress), and 0.9% (severe stress). Three biological replicates were carried out for each treatment. After two weeks of salt treatment, the roots, stems, cotyledons, and leaves were sampled, frozen immediately in the liquid nitrogen, and stored at − 80 °C for further analyses.
For cold stress, 30 days old seedlings were transferred to a temperature-controlled chamber adjusted to 10 °C. The plants were harvested at 0, 6, 12, 18, 24, 36, and 48 hours cold intervals to conduct expression analysis. Three biological replicates of each sample were also implemented. All collected samples were immediately frozen in the liquid nitrogen and then stored at − 80 °C.
In addition, the flowers of TM-1, planted in the fields in Agricultural station of Zhejiang University, Hangzhou, in 2015, were labeled at 0 DPA. The cotton embryos were harvested at 10, 20, 30, and 40 DPA for analysis of gene expression patterns during the seed development. All collected samples were immediately frozen in the liquid nitrogen and stored at − 80 °C.

RNA isolation and qRT-PCR. Total RNA of all collected samples was extracted using the EASYspin Plus
Total RNA Extraction Kit (Aidlab, Beijing, China) according to the manufacturer protocol. The NanoDrop 2000 Spectrophotometer was used to determine the quantity and quality of RNAs. Subsequently, using the PrimeScript TM 1st Strand cDNA Synthesis Kit (TakaRa, Dalian, China) to synthesize first-strand cDNAs. The specific primers were designed according to CDSs (Supplementary Table S3). All melting curves produced by qRT-PCR were shown in Supplementary Fig. S3.
The amplification reactions of qRT-PCR were performed with Lightcycler 96 system (Roche) using SYBR the premix Ex taq (TakaRa) in 20 μ L volume with following parameters: initializing denaturation at 95 °C for 30 seconds, following 45 cycles denaturation at 95 °C for 10 seconds; annealing at 53-54 °C for 10 seconds, and extension at 72 °C for 20 seconds. In addition, the default setting for the melting curve stag was chosen. Three biological replicates were maintained for each collected sample. The relative expression levels were calculated according to the 2 −ΔΔCt method 47 . The heatmap for expression profiles were generated with the Mev 4.0 software 48 with Pearson correction and complete linkage clustering.