Abstract
Although much research has been conducted to characterize microsatellites and develop markers, the distribution of microsatellites remains ambiguous and the use of microsatellite markers in genomic studies and marker-assisted selection is limited. To identify microsatellites for cotton research, we mined 100,290, 83,160 and 56,937 microsatellites with frequencies of 41.2, 49.1 and 74.8 microsatellites per Mb in the recently sequenced Gossypium species: G. hirsutum, G. arboreum, and G. raimondii, respectively. The distributions of microsatellites in their genomes were non-random and were positively and negatively correlated with genes and transposable elements, respectively. Of the 77,996 developed microsatellite markers, 65,498 were physically anchored to the 26 chromosomes of G. hirsutum with an average marker density of 34 markers per Mb. We confirmed 67,880 (87%) universal and 7,705 (9.9%) new genic microsatellite markers. The polymorphism was estimated in above three species by in silico PCR and validated with 505 markers in G. hirsutum. We further predicted 8,825 polymorphic microsatellite markers within G. hirsutum acc. TM-1 and G. barbadense cv. Hai7124. In our study, genome-wide mining and characterization of microsatellites and marker development were very useful for the saturation of the allotetraploid genetic linkage map, genome evolution studies and comparative genome mapping.
Similar content being viewed by others
Introduction
Microsatellites, which consist of a variable number of tandem repeats, are known as simple sequence repeats (SSRs, defined as 1-6 bp) and are characterized by their high frequency, distribution, co-dominance, reproducibility and high polymorphism1,2. Among genetic marker systems such as restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), sequence-related amplified polymorphism (SRAP) and target region amplification polymorphism (TRAP), microsatellites have numerous uses, including linkage map, quantitative trait loci (QTL) mapping, marker-assisted selection, genetic diversity studies and evolutionary studies3,4,5,6.
Cotton (Gossypium SPP.) is the most important fiber crop in the word and is also an important edible oil crop. The genus Gossypium has nine genome types: eight diploid genomes (A, B, C, D, E, F, G and K; 2n = 2x = 26) and one tetraploid genome (AD; 2n = 4x = 52), based on observations of chromosome pairing7,8,9. Two of the four cultivated allotetraploid species, G. hirsutum L. (AD)1 and G. barbadense L. (AD)2, account for 90% and 8% of world cotton production, respectively5. Although G. arboreum (A2) and G. raimondii (D5) are regarded as the progenitors that led to the formation of the tetraploid cotton species 1-2 million years ago (MYA), the exact donor species that no longer exist10.
Much research has been conducted to characterize microsatellites and develop markers from enriched genomic DNA11,12, expressed sequence tags13,14,15, sequences derived from bacterial artificial chromosomes16,17, assembled contigs based on transcriptomic profile18 and sequenced genome19. Current publicly available cotton simple sequence repeat (SSR) markers are described in the CottonDB (http://cottondb.org/) and Cotton Marker Database (CMD) ( http://cottonmarker.org)20. So far, 19,010 SSR markers have been described in the CottonDB. CottonGen supersedes CottonDB and the CMD with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data21. Thanks to global efforts, high-resolution mapping in cotton has been published with segregating populations through interspecific crosses19,22,23,24,25. However, the development of microsatellites in cotton remains limited, due to the paucity of DNA polymorphisms and the lack of complete genome sequences26,27,28. Although a very-high-density whole genome marker map (WGMM) has been constructed for cotton based on the D genome, which has a total of 48,959 loci, about six times the number characterized in the most richly populated of the integrated maps published previously20,29, the pattern of microsatellite distribution and the physical position or product of publicly available cotton SSRs in G. hirsutum remained ambiguous.
Recent developments in high throughput DNA sequencing technologies provide new opportunities to expedite molecular marker development30. In this study, we conducted whole-genome microsatellite characterization and marker development in the sequenced genome of G. hirsutum31, G. raimondii32 and G. arboreum33. We characterized and compared the frequency and distribution of motif length, type and repeat number of microsatellites in the assembled genomic sequences of these three species. In addition, we analyzed the genomic distribution of microsatellites, genes and transposable elements (TEs) in the assembled genome of G. hirsutum. Furthermore, we developed microsatellite markers from assembled genomic sequences and evaluated polymorphism in different species. These markers may facilitate the advancement of many basic and applied genomic studies in cotton, including the development of high-resolution linkage maps, positional gene-cloning and breeding in different cotton species.
Results and discussion
Frequency, distribution and characterization of microsatellite length, type and number
The available 2,433 Mb, 1,694 Mb and 761 Mb genome sequences of G. hirsutum, G. arboreum, and G. raimondii, respectively, were searched for microsatellites with different types of desirable repeat motifs from mono- to hexanucleotide. A total of 100,290, 83,160 and 56,937 microsatellites were identified with an overall frequency of 41.2, 49.1 and 74.8 per Mb, or one every 24.3, 20.4 and 13.4 Kb, in above three assembled genomic sequences, respectively (Table 1). Meanwhile, a total of 50,443 and 41,119 microsatellites with perfect repeats were detected in the At and Dt, respectively. These showed an overall density of 34.2 (At, “t” indicates tetraploid) and 49.5 (Dt) per Mb, or one every 29.3 and 20.2 Kb, respectively (Table 1). The determination of the genomic distribution of 100,290 microsatellites revealed 81,898 microsatellites mapped to the 26 chromosomes of G. hirsutum with an average density of 42.33 per Mb. In our study, physically mapped microsatellites showed a higher density of markers on the Dt (38,622, 49.87/Mb) than the At (43,276, 37.30/Mb), with maximums on A05 (4,248, 46.15/Mb) and D05 (3,553, 57.37/Mb) and minimums on A06 (3,204, 31.06/Mb) and D02 (3091, 45.94/Mb) (Table S1). Our results agree with a previous study which revealed a negative correlation between genome size and microsatellite density34. Cucumber (367 Mb), wheat (3B, 1,000Mb) and maize (1,115 Mb) have microsatellite densities of 551.9 per Mb, 163 per Mb and 120 per Mb, respectively4,35,36. Other microsatellite densities of 78.5 per Mb, 189.4 per Mb and 99.8 per Mb have been identified in Brachypodium, rice and sorghum37.
The distributions of microsatellite length in the assembled genomic sequences of G. hirsutum, G. arboreum and G. raimondii showed that tri-, tetra-, penta- and hexanucleotide repeats accounted for very similar proportions, whereas mono- and dinucleotide repeats were relatively different in proportions among these genomes (Fig.1A). Among the 100,290 microsatellites obtained in G. hirsutum, the hexanucleotide repeats were most abundant (39,506) with a proportion of 39.4%, followed by tri- (22,483, 22.4%), penta- (14970, 14.9%), di- (12,445, 12.4%), tetra- (9,031, 9.0%) and mono-nucleotide (1,855, 1.8%) (Fig.1A, Table S2). This distribution pattern of microsatellite length differed to earlier reports that di-nucleotide repeats (DNR) are abundant in rice and Arabidopsis38 and tri-nucleotide repeats (TNR) are abundant in Brachypodium37, bamboo39, switchgrass40 and Foxtail Millet41. In cotton, penta-nucleotide repeats (PNR) were found to be most abundant in the G. raimondii genome42. Some of these differences may be due to variations in characterization parameters, algorithms and bioinformatics software43,44. In principle, a microsatellite can extend to any length in the absence of selection force38. Generally, shorter repeat lengths were used to define microsatellite in Brassica45, Foxtail Millet41 and G. raimondii42. The relax criteria was also used to identify microsatellite with minimum repeat lengths of 12, 6, 4, 3, 3 and 3; showing that the proportion of mono- to hexanucleotide repeats is similar in G. hirsutum, G. arboreum and G. raimondii (Fig. S1). The different results of these two criteria (minimum microsatellite length 18 vs 12) mostly due to longer reads used in genome assembling of G. raimondii such as Roche 454 data32. Among these 100,290 microsatellites in G. hirsutum, 11,008 (11.0%) were detected in the genic regions. More intergenic microsatellites from mono- to hexanucleotide were identified than that in genic regions (Fig.1B).
The distributions of microsatellite type in the assembled genomic sequences of G. hirsutum, G. arboreum, and G. raimondii are presented in Fig. S2A and Table S2. Specifically, the major motifs were rich in A, AT, AAT/AAG, AAAT, AAAAT and AAAAAT and the minor motifs were mostly riched in C/G (Fig. S2A, Table S2 and S3), which is consistent with previous reports on microsatellites from the assembled genomic sequences of G. raimondii42, Cucumis sativus4 and Brassica napus45. It was noted that the nucleotide composition characteristics of these A/T and C/G motifs corresponded well with their higher A/T than C/G content, such as 34.1% GC content in G. hirsutum31. However, AAT was the most common motif in allotetraploid G. hirsutum, whereas AT was most common in diploid G. arboreum and G. raimondii. More interestingly, we found significantly higher levels of AATCAG in At (2,590) and G. arboreum (2,646) compared to Dt (42) and G. raimondii (43), but much lower number of AACCCT motif were found in At (60) and G. arboreum (73) compared to Dt (332) and G. raimondii (421). In addition, the ACAGG repeat was only detected in At (283) and G. arboreum (224) (Table S2).
The distributions of microsatellite repeat numbers in the assembled genomic sequences of G. hirsutum, G. arboreum and G. raimondii revealed that microsatellite frequency decreased as the number of repeat units increased (Fig. S2B). Interestingly, as the motif repeat number increased, the microsatellite abundances decreased and the rate of change was slowest for dinucleotides, followed by trinucleotide repeats (Fig. 2). These finding are similar to the research on Brassica45. Although we found that the number of mono- and dinucleotide was dramatically reduced in G. hirsutum, high correlations were identified between these genomic sequences based on motif repeat numbers (Table S4).
In conclusion, almost all of the analyzed characteristics of microsatellite distribution in assembled genome of G. hirsutum and its two progenitors were highly similar, which suggests that the pattern of microsatellite distribution is conservative in Gossypium or retained after formation of allotetraploid cotton. This is understandable because allotetraploid cotton is reunited by G. arboreum and G. raimondii approximately 1-2 MYA ago12.
Genomic distribution
Based on the assembled genome of G. hirsutum, the genomic distributions of microsatellites, genes and transposable elements were investigated. We illustrated greater physical densities in distal chromosomal regions than in the central regions; in agreement with a previous study that found a greater marker density in the distal gene-rich ends of the chromosomes20. Specifically, the genomic distribution of microsatellites was positively correlated with genes and negatively correlated with TEs (Fig. 3, Table S5). For both At and Dt, the frequencies of microsatellites in the 1-Mb genomic intervals were significantly positively correlated with genes (r = 0.78 and 0.73) and negatively correlated with TEs (r = −0.34 and −0.14). These results are similar to previous reports that microsatellites are associated with gene sequences in plants34,45. It was interesting that the homoeologous chromosomes A05 and D05 exhibited the highest average frequency of microsatellites on the At and Dt, respectively.
Development of genome-wide microsatellite markers
A total of 77,996 (83.2%), 63,263 (81.46%) and 44,388 (82.8%) identified microsatellite markers were designed from the flanking sequences of 93,736, 77,661 and 53,586 microsatellites from the genomic sequences of G. hirsutum, G. arboreum and G. raimondii, respectively. All of the 77,996 developed microsatellite markers were summarized with motif length, amplification length and polymorphism in Table S6. The remaining microsatellite markers failed to generate specific amplicons mostly due to the limited number of flanking sequences from each side of identified microsatelltes. Similar observations have been reported in the mining of genome microsatellites in other crop plants37,41,46.
The most useful microsatellites are those where the chromosome location is known on the assembled chromosomes. The physical location of 65,498 of the 77,996 microsatellite markers was revealed on the 26 chromosomes of G. hirsutum, with average marker density of 33.86 markers per Mb. The average marker density of 40.64 markers/Mb on the Dt was larger than the 29.33 markers/Mb on At. The maximum density (47.07 markers/Mb) was found on D05, followed by 44.35 markers/Mb on D07 and 42.27 markers/Mb on D09 and the minimum density (24.81 markers/Mb) were found on A06 (Table 2). The physical positions of the developed markers across the 26 chromosomes can provide a high-density microsatellite map that contributes to genome-wide MAS in research of gene mapping and to comparative genome mapping involving G. hirsutum and related crop plants. Only 10,116 (13%) identified microsatellite markers were available in the previous public database (www.cottongen.org) and the remaining 67,880 (87%) were universal.
Although several studies have been conducted to develop genic microsatellite markers from Expressed Sequence Tag (ESTs) transcripts of G. arboreum14, G. raimondii47,48 and G. hirsutum48, the number of publicly available genic microsatellite markers in Gossypium was limited (http://www.cottongen.org). We have identified 10,449 (13.4%) genic microsatellite markers known as “functional markers” that have a high transferability across species. Of them, 7,705 were new genic markers by crosschecking within CottonGen and relative literatures. There were more intergenic microsatellites markers than genic markers as a result of 8.64% genic region in G. hirsutum genome, although microsatellites are preferentially associated with nonrepetitive DNA in plant genomes34. Microsatellites in coding regions can regulate gene expression or function and the mutation rate in coding sequences is lower than in noncoding sequences, therefore, the number of SSRs and polymorphisms is lower in coding regions49. The main advantage of developing genic microsatellite markers is the possibility of finding associations between functional genes and phenotypes50,51.
In silico PCR analysis
To avoid complicated errors in genotyping due to the polyploidy nature of G. hirsutum, all the developed genome-wide microsatellite markers were subjected to in silico PCR analysis based on the genome sequences of G. hirsutum, G. raimondii and G. arboreum (Table 3). As to 77,996 microsatellite markers designed in G. hirtusum, 0 (0.0%), 29,392 (37.7%), 20,911 (26.8%), 5,151 (6.6%) and 22,542 (28.9%) markers generated 0, 1, 2, 3, > 3 in silico PCR products from the de novo sequences of G. hirsutum, respectively (Table 3 and Table S6). Among all these microsatellite markers, an average of forty eight in silico PCR products were identified, because some element (TE) associated markers could generate tens of thousands of in silico products, as reported in Brassica45. We also found 62,326 (79.9%) markers generating ≤10 in silico PCR products with an average of two alleles and four alleles for 68,811 (88.2%) markers generating ≤50 in silico PCR products (Fig. S3).
Through in silico PCR analysis, a set of 20,911 (26.8%) high-quality double-locus microsatellite markers and 29,392 (37.7%) single-locus were established. In diploid species such as barley and rice, most microsatellite markers are single locus amplifying a maximum of two alleles. However, in allotetraploid G. hirsutum, most microsatellite markers are found in multiple loci and amplify multiple alleles from homoeologous loci. This limits their application in genetic and breeding studies52. Microsatellite markers that generate one in silico PCR product are likely to be more useful, as is the case of single locus SSR markers developed by practical PCR amplification in inbred lines53 and genome-wide development45.
Furthermore, as to 77,996 microsatellite markers designed in G. hirtusum, 32,868 (42.1%), 30,588 (39.2%), 2,876 (3.7%), 1,391 (1.8%) and 10,273 (13.2%) markers generated 0, 1, 2, 3 and > 3 in silico PCR product from the genomic sequences of G. raimondii, respectively and 25,026 (32.1%), 33,406 (42.8%), 3,388 (4.3%), 1,501 (1.9%) and 14,675 (18.8%) markers generated 0, 1, 2, 3, and>3 in silico PCR product from the genomic sequences of G. arboreum, respectively.
Application and experimental evaluation of microsatellite markers
A total of 511 from 77,996 microsatellite markers were selected to evaluate the amplification (Table S7). In our result, 505 (98.8%) of the 511 microsatellite markers could produce clear and reproducible amplification products with 996 polymorphic alleles (Table S7). Among these 505 microsatellite markers, 266 (52.7%) amplified the corresponding polymorphic alleles with in silico PCR products. Furthermore, 131 (29.4%) amplified more polymorphic alleles than in silico PCR products, which may be caused by the high proportion of TEs in the Gossypium genome. Therefore, these genomic microsatellite markers would be of enormous of use for various genotyping applications.
To generate microsatellite markers with the potential to enhance the genetic map, we tested the polymorphisms of 77,996 developed microsatellite markers in TM-1 and Hai7124 using re-sequencing data of Hai7124 (111.8 GB, 51,526 SSRs). Approximately 9,001 (11.5%) of these microsatellite markers were polymorphic. Of the 9,001 microsatellite markers, 8,825 were mapped to 26 chromosomes, including 4,446 on the At and 4,379 on the Dt. The largest proportions were found on A12 and D02 (Table S8). Thirty microsatellite markers were randomly selected to amplify polymorphisms between TM-1 and Hai7124, successfully validating the accuracy of developed markers (Fig. S4). The first comprehensive SSR maps were reported in studies using 138 BC1 plants derived from an interspecific cross of (G. hirsutum acc.TM-1 × G. barbadense cv. Hai 7124) × TM-148. In order to construct a high-density genetic map of cotton in our laboratory, we have tried EST-SSRs, SNPs and InDels when no sufficient markers in cotton were available54. Thus, these 8,825 polymorphic microsatellite markers are a useful resource for enhancing the genetic map and improving molecular marker assisted selection breeding.
In the present study, we conducted a genome-wide analysis to identify 100,290 microsatellites in G. hirsutum and developed 77,996 microsatellite markers. Among these markers, 67,880 (87%) were universal and 7,705 were new genic microsatellite markers. These genome-wide microsatellite markers were useful in genotyping applications such as germplasm characterization and high-density microsatellite marker linkage map construction. Importantly, the physical positions of the universal microsatellite markers on 26 chromosomes and identification of polymorphic marker can provide a high-density microsatellite map that contributes to genome-wide microsatellite marker selection in research areas such as gene fine mapping, MAS breeding and comparative genome mapping involving G. hirsutum and related crop plants.
Experimental procedures
Sources of genomic sequences
The high-quality genome sequence of the genetic standard line of Upland cotton, TM-1 (PRJNA248163), is available in http://mascotton.njau.edu.cn. The genomic sequence of G. raimondii32 was downloaded from http://phytozome.netandthegenomicsequenceofG. arboreum33 was downloaded from http://cgp.genomics.org.cn/page/species/index.jsp.
Identification of microsatellites
Genome sequences were searched for perfect microsatellites using PERL5 script MIcroSAtellite (MISA, http://pgrc.ipk-gatersleben.de/misa/) with basic motifs from mono- to hexanucleotide55. Repeats with a minimum of 18, 9, 6, 5, 4 and 3 were defined for the mono- to hexanucleotide, respectively. Compound microsatellites were defined as ≥2 repeats interrupted by ≤100 bp, as previous report48.
Statistical analysis
Each chromosome was divided into 1-Mb for statistical analysis of microsatellites, genes and TEs for the represent practical frequencies and average frequencies. Then, Excel statistical function CHISQ.TEST was used to calculate the significance level (EquationSource math mrow mtable mtr mtd mrow msubsup miP mix mn2 mtd mrow msub mrow mrow mtexttest ) of these two frequencies of microsatellites as well as genes and TEs in 26 chromosomes.
Design of SSR primers
Primer pairs were designed from the flanking sequences of identified microsatellites using PRIMER3 software56 and two perl scripts, p3_in.pl and p3_out.pl served as interface modules between MISA and Primer3 with the primer designing parameters: 18–27 bp in length, 57–63 °C in melting temperature, 30–70% in GC content and 100–280 bp in product size. These two perl scripts were downloaded from MISA ( http://pgrc.ipk-gatersleben.de/misa/). Primer3 was downloaded from http://www-genome.wi.mit.edu/genome_software/other/primer3.html. The p3_in.pl was used to create a primer3 input file which was submitted to Primer3. Then p3_out.pl was used to calculated and merge all information together.
In silico analysis of microsatellite polymorphisms
The primer-pair sequences of previously developed publicly available Gossypium SSR markers were downloaded from the CottonGen website (www.cottongen.org). In our study, all microsatellite markers were aligned to genomic sequences of G. hirsutum, G. arboreum and G. raimondii using an in silico PCR strategy with the following default parameters: 2 bp mismatch, 1 bp gap, 50 bp margin and 50–1000 bp product size, as previous report45,57. The software (e-PCR-2.3.11) used for in silico PCR was downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/schuler/e-PCR/. And only one genome was used at a time.
Evaluation of polymorphisms in G. hirsutum and G. barbadense
Illumina reads of G. barbadense cv. Hai7124 (Biosample: SAMN03002317) were mapped to the genome of G. hirsutum acc. TM-1 using the Burrows-Wheeler Alignment tool (BWA ver 0.6.2)58, with mainly default parameters. SAMtools59 was used to call InDels ≥5 with mapped reads of a minimum mapping quality of 20. If InDels were located in the region of paired primers/microsatellite markers, we regarded the paired primer as a putative polymorphic primer between TM-1 and Hai7124.
Data Section
The detail information of 77,996 developed markers and in silico PCR amplification were deposited in Table S6. These markers will be publically available in http://mascotton.njau.edu.cn and CottonGen database.
Additional Information
How to cite this article: Wang, Q. et al. Genome-Wide Mining, Characterization and Development of Microsatellite Markers in Gossypium Species. Sci. Rep. 5, 10638; doi: 10.1038/srep10638 (2015).
References
Gupta, P. & Varshney, R. The development and use of microsatellite markers for genetic analysis and plant breeding with emphasis on bread wheat. Euphytica 113, 163–185 (2000).
Powell, W., Machray, G. & Provan, J. Polymorphism revealed by simple sequence repeats. Trends Plant Sci. 1, 215–222 (1996).
Agarwal, M., Shrivastava, N. & Padh, H. Advances in molecular marker techniques and their applications in plant sciences. Plant Cell Rep. 27, 617–631 (2008).
Cavagnaro, P. et al. Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.). BMC Genomics 11, 569 (2010).
Zhang, H., Li, Y., Wang, B. & Chee, P. Recent advances in cotton genomics. Int. J. Plant Genom. 2008, 742304 (2008).
Zhu, H. et al. Mining and validation of pyrosequenced simple sequence repeats (SSRs) from American cranberry (Vaccinium macrocarpon Ait.). Theor. Appl. Genet. 124, 87–96 (2012).
Percival, A., Stewart, J. & Wendel, J. Taxonomy and germplasm resources. In: Cotton: Origin, History, Technology and Production ( Smith, C.W. & Cothren, J.T., eds) pp. 33–63. New York: John Wiley & Sons (1999).
Fryxell, P. The natural history of the cotton tribe (Malvaceae, tribe Gossypieae), pp. 245. College Station: Texas A & M University Press (1978).
Endrizzi, J., Turcotte, E. & Kohel R. Qualitative genetics, cytology and cytogenetics. In: Cotton, ( Kohel, R.J. & Lewis, C.F., eds), pp.81–129, Madison: University of Wisconsin Press (1984).
Wendel, J. New World tetraploid cottons contain Old World cytoplasm. Proc. Natl. Acad. Sci. USA. 86, 4132–4136 (1989).
Hoffman, S. et al. Identification of 700 new microsatellite loci from cotton (G. hirsutum L.). J. Cotton Sci. 11, 208–241 (2007).
Lacape, J. M., Dessauw, D., Rajab, M., Noyer, J. L. & Hau, B. Microsatellite diversity in tetraploid Gossypium germplasm: assembling a highly informative genotyping set of cotton SSRs. Mol. Breeding 19, 45–58 (2007).
Edwards, K., Barker, J., Daly, A., Jones, C. & Karp, A. Microsatellite libraries enriched for several microsatellite sequences in plants. BioTechniques 20, 758–760 (1996).
Han, Z. et al. Characteristics, development and mapping of Gossypium hirsutum derived EST-SSRs in allotetraploid cotton. Theor. Appl. Genet. 112, 430–439 (2006).
Park, Y. et al. Genetic mapping of new cotton fiber loci using EST-derived microsatellites in an interspecific recombinant inbred line cotton population. Mol. Genet. Genomics 274, 428–441 (2005).
Frelichowski, J. et al. Cotton genome mapping with new microsatellites from Acala ‘Maxxa’ BAC-ends. Mol. Genet. Genomics 275, 479–491 (2006).
Guo, Y. et al. BAC-derived SSR markers chromosome locations in cotton. Euphytica 161, 361–370 (2008).
Zhang, X. et al. Characterization of the global transcriptome for cotton (Gossypium hirsutum L.) anther and development of SSR marker. Gene 551, 206–213 (2014).
Wang, Z. et al. A Whole-Genome DNA Marker Map for Cotton Based on the D-Genome Sequence of Gossypium raimondii L. G3-Genes Genom. Genet. 3, 1759–1767 (2013).
Blenda, A. et al. CMD: a Cotton Microsatellite Database resource for Gossypium genomics. BMC Genomics 7, 132 (2006).
Yu, J. et al. CottonGen: a genomics, genetics and breeding database for cotton research. Nucleic Acids Res. 42, D1229–D1236 (2014).
Lacape, J. M. et al. A combined RFLP-SSR-AFLP map of tetraploid cotton based on a Gossypium hirsutum x Gossypium barbadense backcross population. Genome 46, 612–26 (2003).
Rong, J. et al. A 3347-locus genetic recombination map of sequence-tagged sites reveals features of genome organization, transmission and evolution of cotton (Gossypium). Genetics 166, 389–417 (2004).
Yu, J. et al. A High-Density Simple Sequence Repeat and Single Nucleotide Polymorphism Genetic Map of the Tetraploid Cotton Genome. G3-Genes Genom. Genet. 2, 43–58 (2012).
Yu, Y. et al. Genome structure of cotton revealed by a genome-wide SSR genetic map constructed from a BC1 population between Gossypium hirsutum and G. barbadense. BMC Genomics 12, 15 (2011).
Chen, L. et al. Genetic linkage map construction and QTL mapping for yield and fiber quality in upland cotton (Gossypium hirsutum L.). Acta Agron. Sin. 7, 015 (2008).
Liu, R., Wang, B., Guo, W., Wang, L. & Zhang, T. Differential gene expression and associated QTL mapping for cotton yield based on a cDNA-AFLP transcriptome map in an immortalized F-2. Theor. Appl. Genet. 123, 439–454 (2011).
Xu, Z. et al. An integrated genetic and physical map of homoeologous chromosomes 12 and 26 in Upland cotton (G. hirsutum L.). BMC Genomics 9, 108 (2008).
Blenda, A. et al. A high density consensus genetic map of tetraploid cotton that integrates multiple component maps through molecular marker redundancy check. PLoS One 7, e45739 (2012).
Lister, R., Gregory, B. & Ecker, J. Next is now: new technologies for sequencing of genomes, transcriptomes and beyond. Curr. Opin. Plant Biol. 12, 107–118 (2009).
Zhang T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fibre improvement. Nat. Biotech. DOI:10.1038/nbt.3207 (2015).
Paterson, A. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).
Li, F. et al. Genome sequence of the cultivated cotton Gossypium arboreum. Nat. Genet. 46, 567–572 (2014).
Morgante, M., Hanafey, M. & Powell, W. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat. Genet. 30, 194–200 (2002).
Huo, N. et al. The nuclear genome of Brachypodium distachyon: analysis of BAC end sequences. Funct. Integr. Genomics, 8, 135–147 (2008).
Weng, Y., Johnson, S., Staub, J. & Huang, S. An extended intervarietal microsatellite linkage map of cucumber, Cucumis sativus L. HortScience 45, 882–886 (2010).
Sonah, H. et al. Genome-wide distribution and organization of microsatellites in plants: an insight into marker development in Brachypodium. PLoS One 6, e21298 (2011).
Sharma, P., Grover, A. & Kahl, G. Mining microsatellites in eukaryotic genomes. Trends Biotechnol. 25, 490–498 (2007).
Yasodha, R. Characterization of microsatellites in the tribe bambusee. Gene Conserv. 10, 51–64 (2011).
Sharma, M. et al. A genome-wide survey of switchgrass genome structure and organization. PLoS One 7, e33892 (2012).
Pandey, G. et al. Genome-Wide Development and Use of Microsatellite Markers for Large-Scale Genotyping Applications in Foxtail Millet [Setaria italica (L.)]. DNA Res. 20, 197–207 (2013).
Zou, C., Lu, C., Zhang, Y. & Song, G. Distribution and characterization of simple sequence repeats in Gossypium raimondii genome. Bioinformation 8, 801–806 (2012).
Kalia, R., Rai, M., Kalia, S., Singh, R. & Dhawan, A. Microsatellite markers: an overview of the recent progress in plants. Euphytica 177, 309–334 (2011).
Parida, S., Kumar, K. A., Dalal, V., Singh, N. & Mohapatra, T. Unigene derived microsatellite markers for the cereal genomes. Theor. Appl. Genet. 112, 808–817 (2006).
Shi, J. et al. Genome-wide microsatellite characterization and marker development in the sequenced Brassica crop species. DNA Res. 21, 53–58 (2013).
Parida, S., Yadava, D. & Mohapatra, T. Microsatellites in Brassica unigenes: relative abundance, marker design and use in comparative physical mapping and genome analysis. Genome 53, 55–67 (2010).
Wang, C., Guo, W., Cai, C. & Zhang, T. Characterization, development and exploitation of EST-derived microsatellites in Gossypium raimondii Ulbrich. Chinese Sci. Bull. 51, 557–561 (2006).
Guo, W. et al. A Microsatellites-Based, Gene-Rich Lingkage Map Reveals Genome Structure, Function and Evolution in Gossypium. Genetics 176, 527–541 (2007).
Blanca, J. et al. Transcriptome characterization and high throughput SSRs and SNPs discovery in Cucurbita pepo (Cucurbitaceae). BMC Genomics 12, 104 (2011).
Li, Y., Korol, A., Fahima, T., Beiles, A. & Nevo, E. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol. Ecol. 11, 2453–2465 (2002).
Varshney, R., Graner, A. & Sorrells, M. Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 23, 48–55 (2005).
Guo, W., Fang, D., Yu, W. & Zhang, T. Sequence divergence of microsatellites and phylogeny analysis in tetraploid cotton species and their putative diploid ancestors. Plant Biol. 47, 1418–1430 (2005).
Li, H. et al. Development of a core set of single-locus SSR markers for allotetraploid rapeseed (Brassica napus L.). Theor. Appl. Genet. 126, 937–947 (2013).
Zhao, L. et al. Toward allotetraploid cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA sequence information. BMC Genomics 13, 539 (2012).
Thiel, T., Michalek, W., Varshney, R. & Graner, A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422 (2003).
Untergasser, A. et al. Primer3--new capabilities and interfaces. Nucleic Acids Res. 40, e115 (2012).
Rotmistrovsky, K., Jang, W. & Schuler, G. A web server for performing electronic PCR. Nucleic Acids Res. 32, W108–112 (2004).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Acknowledgements
This work was financially supported in part by grants from the Major State Basic Research Development Program of China (973 Program, 2011CB109300), the National High Technology Research and Development Program of China (863 Program) (2011AA10A102, 2012AA101108-04-04), National Key Technology Support Program (2015BAD02B00), the Priority Academic Program Development of Jiangsu Higher Education Institutions and the 111 project (B08025).
Author information
Authors and Affiliations
Contributions
T.Z.Z. conceived and designed the experiments. Q.W., L.F. and J.D.C. performed the experiments. Y.H., Z.F.S., S.W. and L.J.C. participated in part of materials and libraries preparation. Q.W. and L. F. analyzed the results. T.Z.Z., L.F. and W.Z.G. drafted the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Electronic supplementary material
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Wang, Q., Fang, L., Chen, J. et al. Genome-Wide Mining, Characterization and Development of Microsatellite Markers in Gossypium Species. Sci Rep 5, 10638 (2015). https://doi.org/10.1038/srep10638
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/srep10638
This article is cited by
-
Analysis of germplasm genetic diversity and construction of a core collection in Camellia oleifera C.Abel by integrating novel simple sequence repeat markers
Genetic Resources and Crop Evolution (2023)
-
An empirical analysis of mtSSRs: could microsatellite distribution patterns explain the evolution of mitogenomes in plants?
Functional & Integrative Genomics (2022)
-
Development of Genome-wide SSR Markers for Physical Map Construction with PCR-based Polymorphic SSRs in Jute (Corchorus Spp.)
Tropical Plant Biology (2022)
-
Analysis of genetic diversity and relationships of Perilla frutescens using novel EST-SSR markers derived from transcriptome between wild-type and mutant Perilla
Molecular Biology Reports (2021)
-
Genome-wide characterization and development of simple sequence repeat markers for genetic studies in pomegranate (Punica granatum L.)
Trees (2020)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.