Simple sequence repeats drive genome plasticity and promote adaptive evolution in penaeid shrimp

Simple sequence repeats (SSRs) are rare (approximately 1%) in most genomes and are generally considered to have no function. However, penaeid shrimp genomes have a high proportion of SSRs (>23%), raising the question of whether these SSRs play important functional and evolutionary roles in these SSR-rich species. Here, we show that SSRs drive genome plasticity and adaptive evolution in two penaeid shrimp species, Fenneropenaeus chinensis and Litopenaeus vannamei. Assembly and comparison of the chromosome-level genomes of these two shrimp species revealed that transposable elements serve as carriers for SSR expansion, which is still occurring. The remarkable genome plasticity identified herein might have been shaped by significant SSR expansions. Multi-omics analyses further showed that SSRs regulate gene expression and are responsible for driving adaptive evolution, such as the variable osmoregulatory capacities of these shrimp under low-salinity stress. These data provide strong evidence that SSRs are an important driver of adaptive evolution in penaeid shrimp.


Supplementary Figure 8. Substitution rate distribution of repeats in the four decapod genomes.
The substitution rates were calculated between the genomic and repeat consensus sequences. The distribution of transposable elements of the two penaeid shrimp species was similar in comparison with the other two decapod species (E. sinensis and P. virginalis).

The genome data of the penaeid shrimp species were downloaded from the NCBI SRA database (Supplementary materials Table S14). The paired-end Illumina sequencing reads were mapped against the F. chinensis and L. vannamei genomes, respectively. The physical coverage of the genome and the paired-read mapping rates were calculated. As the sequencing data of F. indicus, M. ensis, and M. joyneri were too scarce to cover the genome, the physical coverage of these three species was not calculated. * indicates a significant difference with p < 0.05.

Supplementary Figure 11. Comparison of the gene repertoire of 17 arthropod genomes.
"1:1" indicates single-copy genes, "X:X" indicates orthologous genes present in multiple copies in all the ten species, where X means one or more orthologs per species, and "patchy" indicates the existence of other orthologs that are present in at least one genome.

The penaeid shrimp harbored many more dinucleotide SSRs ((AT)n, (AC)n, (AG)n), while P. humanus had more A-rich SSRs ((A)n, (AAT)n, (AAAT)n), and H. robusta had more triplet and tetranucleotide SSRs ((ATAC)n, (ATC)n, (ATC)n, and (AAC)n) (p < 0.05).

Supplementary Figure 15. SSR length distribution of the five crustacean genomes.
A peak of long SSRs could be observed in the length distributions of (AT)n, (AC)n, and (AG)n SSRs in the two penaeid shrimp species, but not in those of other SSR types.

Supplementary Figure 16. Phylogenetic tree of the MMR gene family.
The phylogenetic tree was constructed using the maximum-likelihood (ML) method with the JTT+G substitution model and 1000 bootstrap replicates. Various MMR genes were clustered together, and the penaeid shrimp genomes contained most of them.

Supplementary Figure 17. Comparison of TEs harboring (AT)n and (AAT)n SSRs in the two shrimp genomes.
The percentage of TEs harboring (AT)n and (AAT)n SSRs differed between the two penaeid shrimp genomes for some TE types (hAT-Charlie, Penelope and Gypsy). * indicates p < 0.05 and ** indicates p < 0.01. However, these TEs did not specifically harbor (AT)n or (AAT)n, except for Gypsy. Gypsy specifically harbored (AT)n in L. vannamei and (AAT)n in F. chinensis, whereas hAT-Charlie and Penelope mainly contained (AT)n in both shrimp genomes.

Supplementary Figure 18. SSR length distribution of four crustacean genomes.
Most SSRs were short, with lengths below 25 bp. No peak could be observed around a length of 60 bp (long SSRs).
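
As a rough illustration of how an SSR length distribution such as the one summarized above could be tabulated, the following Python sketch scans a sequence for perfect tandem repeats and sorts them into short and long classes. It is not the pipeline used in this study. The function names `find_ssrs` and `length_classes`, the 1-6 bp motif range, and the 12 bp minimum tract length are assumptions made for illustration; only the 25 bp cutoff is taken from the caption.

```python
import re
from collections import Counter

# Perfect tandem repeats of a 1-6 bp unit. The lazy quantifier makes the
# regex prefer the shortest repeating unit (e.g. "AT" rather than "ATAT").
# Imperfect repeats and motif canonicalization (AT vs. TA) are not handled.
_SSR_RE = re.compile(r"([ACGT]{1,6}?)\1+")

def find_ssrs(seq, min_len=12):
    """Return (start, end, motif) for each perfect SSR of at least min_len bp.

    min_len=12 is an illustrative threshold, not a value from the study.
    Coordinates are 0-based and end-exclusive.
    """
    hits = []
    for m in _SSR_RE.finditer(seq.upper()):
        if m.end() - m.start() >= min_len:
            hits.append((m.start(), m.end(), m.group(1)))
    return hits

def length_classes(ssrs, short_cutoff=25):
    """Count SSRs shorter than short_cutoff bp ("short") versus the rest ("long")."""
    counts = Counter()
    for start, end, _motif in ssrs:
        counts["short" if end - start < short_cutoff else "long"] += 1
    return counts

if __name__ == "__main__":
    # Toy sequence: a 60 bp (AT)n tract and a 15 bp (A)n tract.
    demo = "GGC" + "AT" * 30 + "CCG" + "A" * 15 + "TTGCA"
    ssrs = find_ssrs(demo)
    print(ssrs)
    print(length_classes(ssrs))
```

A genome-scale analysis would normally rely on a dedicated SSR or repeat annotation tool; the regular expression here is only meant to make the short versus long distinction concrete.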

Supplementary Figure 19. The distribution of the numbers of TEs harboring SSRs of various lengths.
Curves of different colours indicate different TEs harboring SSRs. Except for (AAT)n SSRs, a peak of long SSRs could be observed in the (AT)n, (AC)n, and (AG)n SSR distributions in the two penaeid shrimp genomes.

Supplementary Figure 20. The synteny of the two penaeid shrimp genomes.
The synteny of the two shrimp genomes was assessed by MCScanX. Each linking line in the center of the circle indicates a synteny block involving at least 5 collinear gene pairs. Only 293 synteny blocks covering 2149 genes were identified between the two shrimp genomes.

The orthologous genes between F. chinensis and L. vannamei showed high synteny in exons and in some intronic SSRs. However, in all of these orthologous genes, most of the SSRs in the gene body differed significantly between the two shrimp species. SSR elongation and insertion could be detected when comparing the orthologous genes. Even for a gene with no SSR inserted in the gene body (Klf1), the SSRs located upstream or downstream showed some differences.

Supplementary Figure 26. The ATAC-seq mapping depth of the gene body and promoter.
The ATAC-seq read mapping depth was calculated for each gene, and the upstream 3 kb was considered as the promoter region. The bar of the heatmap indicates the average sequencing depth for the corresponding region. Generally, the sequencing depth was relatively higher around the transcription start site (TSS) and transcription end site (TES) than in other regions.

Supplementary Figure 27. The KEGG enrichment of the genes around differential ATAC peaks under low-salinity stress in ATAC-seq analysis of L. vannamei.
The KEGG terms in red colour were related to amino acid or lipid metabolism.

Supplementary Figure 28. The KEGG enrichment of the genes around differential peaks under low-salinity stress in ATAC-seq analysis of F. chinensis.
The KEGG terms in red colour were related to amino acid or lipid metabolism.

Supplementary Figure 29. TE contents in all identified peaks at 3% and 30% salinity and only the differential peaks (3% vs. 30%) identified by ATAC-seq.
The differential peaks were identified according to various p values from differential analyses of ATAC-seq.

Supplementary Table 1