Introduction

The whitefly Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae) is a species complex containing at least 34 cryptic species1,2,3,4. The species complex colonizes more than 600 different species of plants and causes significant damage by transmitting plant viruses and feeding on plant phloem sap3. These cryptic species are morphologically indistinguishable5 and the mitochondrial cytochrome oxidase I (mtCOI) marker has been widely used to delimit different members of the complex2,6. To date, at least 12 distinct genetic groups have been identified from the complex based on mtCOI sequences2,6 and all available mating studies are in favor of the species-level boundaries3. The 12 distinct groups relate to the break in divergence frequencies identified at around 12%. However, it is perhaps more important to consider that 4 major clusters represent the complex: 1) SubSaharan Africa (the ancestral cluster); 2) Asia; 3) New World and 4) North Africa/Middle East/Asia Minor.

During the last twenty years, the Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED) cryptic species of the complex have invaded many countries around the world and their invasions are associated with the displacement of closely related members of the complex7,8. Numerous efforts have been made to reveal the factors responsible for the invasions of MEAM1 and MED whiteflies. However, because the species of the B. tabaci complex are morphologically indistinguishable, the evolution of the complex and the migration and displacement processes during the MEAM1 and MED invasions are hard to trace.

Previously, various genetic markers have been used to study the genetic diversity and structure of different cryptic species of the B. tabaci complex, such as random amplification of polymorphic DNA (RAPD) PCR9, amplified fragment length polymorphisms (AFLP)10, restriction fragment length polymorphism (RFLP)11, mitochondrial DNA6, ribosomal ITS112 and microsatellite markers13,14,15,16. Among these genetic markers, microsatellites, or simple sequence repeats (SSRs), are tandemly repeated motifs of DNA composed of 1–6 base pair (bp) long units17, which can be highly polymorphic among populations and are valuable for linkage mapping, comparative genomics and gene-based association studies18. In addition, microsatellites are indispensable tools for reconstructing invasion histories and colonization routes and for revealing population bottlenecks and regional dispersal patterns19. Owing to these advantages, microsatellites have become increasingly popular for analyses of population genetics and evolutionary mechanisms of pest invasions20.

To date, 54 microsatellite markers are available for B. tabaci14,21,22,23. However, all of these microsatellites were derived from genomic DNA and the connections between these markers and gene functions are completely unknown. Expressed Sequence Tag (EST) and transcriptome sequences contain polymorphic genetic markers and can be used to identify microsatellites24. Compared to genomic DNA-derived microsatellites, EST- and transcriptome-derived microsatellites lack introns and intergenic regions and can correspond to genes with known or predicted functions25. In addition, these microsatellites have fewer null alleles and stutter bands26 and have more potential statistical power in multiple comparisons27. Furthermore, EST- and transcriptome-derived microsatellites have a high degree of transferability across species28 and can be used in closely related species29. Therefore, systematic investigation of EST- and transcriptome-derived microsatellites will facilitate evolutionary and comparative studies in the B. tabaci complex, which is composed of closely related cryptic species27.

Recently, the transcriptomes of two invasive (MEAM1, MED) and one indigenous whitefly species (Asia II 3) have been sequenced30,31. These studies have generated a tremendous amount of data and provided a valuable source for the identification of microsatellite markers in whiteflies. The first objective of this study was to identify microsatellites from the three transcriptomes. Because microsatellites located in different regions of a gene serve various functions32, the distribution of microsatellites within genes was also analyzed. Furthermore, PCR experiments were employed to verify the predicted microsatellites and their cross-species transferability. By comparative analysis of the newly developed microsatellites, the genetic relationships of six B. tabaci species were revealed. This study provides a rich resource of microsatellites for the B. tabaci complex and will facilitate research on whitefly genetic diversity and evolution.

Results

Identification of microsatellites from the B. tabaci transcriptome databases

A total of 27.653 Mbp, 44.937 Mbp and 24.468 Mbp of sequences from the MEAM1, MED and Asia II 3 transcriptomes were used for mining microsatellites with the MISA-Micro Satellite program33 (Table 1). There were 6419, 11711 and 4115 microsatellites in MEAM1, MED and Asia II 3, respectively (Table S1), which corresponds to one microsatellite per 3.837–5.946 Kbp of transcriptome sequence. The total numbers of polynucleotide repeats were 358, 433 and 322 in MEAM1, MED and Asia II 3, respectively. While most microsatellite-containing unigenes have only one microsatellite, there are 362, 299 and 190 unigenes containing multiple microsatellites. In addition, 277, 367 and 193 unigenes contain compound microsatellites in MEAM1, MED and Asia II 3, respectively (Table 1).
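
The reported density range can be checked directly from the transcriptome sizes and microsatellite counts given above. The following is a minimal sketch; the function and variable names are illustrative and not part of the original analysis.

```python
# Transcriptome sizes (Mbp) and microsatellite counts as reported in the text.
datasets = {
    "MEAM1": (27.653, 6419),
    "MED": (44.937, 11711),
    "Asia II 3": (24.468, 4115),
}


def kbp_per_ssr(size_mbp, n_ssr):
    """Average amount of transcript sequence (Kbp) per microsatellite."""
    return size_mbp * 1000.0 / n_ssr


spacing = {name: kbp_per_ssr(size, n) for name, (size, n) in datasets.items()}

for name, value in spacing.items():
    print(f"{name}: one microsatellite per {value:.3f} Kbp")
```

Under these figures MED is the densest and Asia II 3 the sparsest, which gives the two ends of the quoted range.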

Table 1 Frequency and distribution of microsatellites in three species of the B. tabaci complex

Of the characterized microsatellites, mononucleotide repeats were the most common, followed by dinucleotide, trinucleotide and tetranucleotide repeats (Fig. 1A). On a complementary strand, a polyA repeat is the same as a polyT repeat. Similarly, in different reading frames or on a complementary strand, (AC)n is the same as (CA)n, (TG)n and (GT)n, while (AAG)n is the same as (AGA)n, (GAA)n, (CTT)n, (TTC)n and (TCT)n. Thus, mononucleotide, dinucleotide and trinucleotide repeats can be grouped into 2, 4 and 10 unique classes, respectively34. In the three species of the B. tabaci complex, A/T motifs were the most abundant among mononucleotide repeats and the AG class was the most common among dinucleotide repeats (Fig. 1B). However, the usage of trinucleotide repeats differed among species. In MEAM1, the AAG class was the most widespread, followed by the ATG and AAC classes (Fig. 1C). In MED, the AAG class was also the most prevalent, whereas in Asia II 3 the ATG class was the most frequent (Fig. 1C). A total of 45 tetranucleotide microsatellites were identified from both MEAM1 and Asia II 3. Furthermore, two pentanucleotide motifs were found in Asia II 3 and one hexanucleotide motif was found in each of MEAM1 and Asia II 3 (Fig. 1A).
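
The grouping of motifs into 2, 4 and 10 classes follows from treating a motif, its cyclic shifts and the cyclic shifts of its reverse complement as equivalent. The sketch below enumerates these classes; it is an illustration of the counting argument rather than the procedure used in the study, and it labels each class by its alphabetically smallest member (so the class the text calls ATG is labelled ATC here).

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif):
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n)
    )


def canonical_class(motif):
    """Alphabetically smallest rotation of the motif or its reverse complement."""
    return min(rotations(motif) + rotations(reverse_complement(motif)))


def motif_classes(k):
    motifs = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({canonical_class(m) for m in motifs if is_primitive(m)})


for k in (1, 2, 3):
    classes = motif_classes(k)
    print(k, len(classes), classes)
```

For example, `canonical_class("CTT")` and `canonical_class("GAA")` both return `"AAG"`, matching the equivalences listed above.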

Figure 1

Distribution of microsatellites in the three whitefly species.

(A) Distribution of repeat loci. (B) Distribution of different dinucleotide repeats. (C) Distribution of different trinucleotide repeats.

Distribution of microsatellites in 3′UTR, 5′UTR and CDS regions

The distribution of polynucleotide microsatellites in CDS, 5′UTR and 3′UTR regions was investigated. Based on BLASTx homology, the positions of 71, 71 and 40 polynucleotide microsatellites were determined for MEAM1, MED and Asia II 3, respectively (Table 2 & Table S2). In the CDS, 45, 48 and 27 microsatellites with polynucleotide repeats were found in MEAM1, MED and Asia II 3, respectively, which were substantially more than in the UTRs (Table 2). Interestingly, the number of trinucleotide repeats in the CDS was also much higher than that of other types of microsatellites. The characteristics of the amino acids encoded by the trinucleotide repeats in the CDS were then investigated. In MEAM1, MED and Asia II 3, a total of 37, 37 and 20 triplet codons were found and they encoded 10, 15 and 8 different amino acids, respectively (Fig. 2A). Codons encoding aromatic amino acids accounted for the largest share, followed by those encoding aliphatic and heterocyclic amino acids (Fig. 2B). The numbers of codons encoding hydrophilic amino acids were 29, 20 and 17, while those encoding hydrophobic amino acids were 3, 15 and 3 in MEAM1, MED and Asia II 3, respectively.

Table 2 The numbers of polynucleotide microsatellites in 3′UTRs, 5′UTRs and CDS
Figure 2

The characteristics of trinucleotide repeats in CDS region of three species.

(A) Distribution of different amino acid codons in the three species. (B) Distribution of amino acid codons which encoded different types of amino acids in the three species.

Gene Ontology (GO) and KEGG annotation of microsatellite-containing sequences

GO assignments were used to classify the functions of the genes with microsatellites. Based on sequence homology, 272, 456 and 255 microsatellite-containing sequences from MEAM1, MED and Asia II 3, respectively, had GO annotations and could be categorized into 35 functional groups. The 35 functional groups were classified into three main categories (Biological process, Cellular component and Molecular function) (Fig. 3). GO analysis showed that the ‘Metabolic process’, ‘Cellular process’ and ‘Cell part’ terms were dominant. Next, these genes were annotated to different KEGG pathways. A total of 318, 538 and 297 microsatellite-containing sequences of MEAM1, MED and Asia II 3 were mapped to 193, 222 and 210 KEGG pathways, respectively (Table S3). Some of these genes are related to resistance to environmental stresses and insecticides, such as aldehyde oxidase35, cytochrome P45036 and mitogen-stress activated protein kinases 237 (Table S4).

Figure 3

GO classifications of microsatellite-containing sequences from the three species.

The right y-axis represents the number of microsatellite-containing genes in a category. The left y-axis represents the percentage of a specific category of microsatellite-containing genes in that main category.

Characterization of the predicted microsatellite markers

To validate the predicted microsatellites, 88 primer pairs were synthesized (24 for MEAM1, 32 for MED and 32 for Asia II 3) to amplify microsatellites from whitefly DNA. Among the 88 primer pairs, 57 (65%) generated clear PCR products and 31 (35%) did not yield good amplification. The frequencies of microsatellite amplification in MEAM1, MED and Asia II 3 were 70.83% (24 designed, 17 effective), 53.13% (32 designed, 17 effective) and 71.88% (32 designed, 23 effective), respectively. The failure to amplify microsatellites may be caused by non-specific primers, the presence of large introns in the genomic DNA or inappropriate PCR conditions. Among the 57 markers that generated clear PCR products, 42 markers showed polymorphisms; whereas 15 generated a single PCR product (monomorphism). Details (repeat motifs, PCR primers, allele sizes, gene ID, accession number and possible change of amino acids) of the 57 markers are shown in Table S5.
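
The amplification success rates quoted above can be reproduced from the primer counts. The sketch below is illustrative only. It rounds half-up with the `decimal` module because 17/32 and 23/32 fall exactly on a rounding boundary (53.125% and 71.875%), where Python's built-in `round` would round to the nearest even digit instead.

```python
from decimal import Decimal, ROUND_HALF_UP

# Primer pairs designed and primer pairs giving clear PCR products (from the text).
designed = {"MEAM1": 24, "MED": 32, "Asia II 3": 32}
effective = {"MEAM1": 17, "MED": 17, "Asia II 3": 23}


def percent(part, whole):
    """Percentage rounded half-up to two decimals."""
    value = Decimal(100 * part) / Decimal(whole)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


rates = {species: percent(effective[species], designed[species]) for species in designed}
total_designed = sum(designed.values())
total_effective = sum(effective.values())
overall = percent(total_effective, total_designed)

for species, rate in rates.items():
    print(f"{species}: {rate}%")
print(f"Overall: {total_effective}/{total_designed} = {overall}%")
```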

The cross-taxa transferability of microsatellites

Next, the cross-amplification of the 57 primer pairs was investigated by PCR and capillary electrophoresis38. The results showed that 42 (73.68%) amplified fragments from all three whitefly species (Table S6), suggesting that these microsatellites are highly conserved and may act as genetic markers for the B. tabaci complex. In addition, 8 primer pairs amplified fragments from the two invasive whiteflies and 1 pair amplified fragments from both MED and Asia II 3, while the remaining 6 primer pairs amplified fragments from only one of the three species. Next, the functions of these 42 microsatellite-containing genes were determined through BLAST search, of which 33 showed significant similarities to known genes (Table S5). The alleles of these 42 microsatellites were compared among MEAM1, MED and Asia II 3 whiteflies. Of the 42 microsatellites, 26 contained the same alleles in MEAM1 and MED, whereas only 17 markers shared alleles between MEAM1 and Asia II 3 and 16 markers shared alleles between MED and Asia II 3 (Table S6), which is consistent with the fact that MEAM1 and MED are the most closely related of the three cryptic species. Interestingly, the sodium channel gene (Gene ID 99267), which is associated with xenobiotic resistance39, was found to contain different alleles in invasive and native whitefly species (Table S6).

Characteristics of the 13 microsatellites in six species of the B. tabaci complex

Thirteen polymorphic microsatellites (marked in yellow in Table S5) were then employed to assess polymorphism and heterozygosity among six laboratory colonies of the B. tabaci complex (8 individuals for each species) (Table 3). A total of 93 alleles were identified from laboratory colonies of the six species based on the 13 microsatellite markers. When the 13 markers were compared across loci, the number of alleles (NA) ranged from 3 to 15, with an average of 7.2 alleles per locus. The observed (Ho) and expected (HE) heterozygosities ranged from 0.022 to 0.581 and from 0.297 to 0.888, respectively. When the loci were compared across the 6 cryptic species, China 1 exhibited the highest NA (3.077), while MEAM1 and Asia II 7 displayed the lowest NA (2.231) (Table 3). MEAM1 showed the lowest observed (Ho) and expected (HE) heterozygosities. Null alleles were detected at two loci (291416, 27966) in MEAM1, two loci (31541, 22561) in MED, one locus (25869) in Asia II 3, three loci (102573, 29116, 99267) in Asia II 7 and two loci (36306, 102573) in Asia II 6. A Hardy-Weinberg equilibrium test was performed for each locus in each population; 25 of 50 groups deviated significantly from Hardy-Weinberg equilibrium (Table 3). Significant genotypic linkage disequilibrium was not detected. With regard to the polymorphism of the 13 microsatellite-containing genes, the glycine-rich cell wall structural protein precursor (Gene ID: 29116), sodium channel (Gene ID: 99267) and NADH dehydrogenase subunit 4L (Gene ID: 76476) had high polymorphism (Table 3). The CCAAT/enhancer binding protein (Gene ID: 102573) showed relatively low polymorphism (Table 3). The polymorphism information content (PIC) ranged from 0.246 to 0.866, with an average of 0.612 in the six cryptic species, which indicates the effectiveness of microsatellite markers for detecting polymorphism40.

In addition, the genetic diversity of the 13 microsatellite loci was compared among the six whitefly species. The level of gene diversity from the highest to the lowest was Asia II 3, MED, China 1, Asia II 6, MEAM1 and Asia II 7 (Table S7). Fig. 4 displays the gene diversity of the 13 microsatellite loci in every comparison. For example, in the comparison of MED and MEAM1, 7 of the 13 loci in MEAM1 showed a decrease of gene diversity compared to that of MED. To further test these markers, 4 microsatellites were randomly picked for sequencing and the alleles per locus were analyzed among the six cryptic species (Fig. 5). Complex mutational patterns (single base mutation, change of repeat units and indels within the flanking region) were observed in these transcriptome-derived microsatellites (Fig. 5), which is common in insects41.
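
To make the summary statistics in this section concrete, the sketch below computes observed heterozygosity, expected heterozygosity (gene diversity), PIC and a simple inbreeding coefficient for one locus. The genotypes are hypothetical allele sizes for 8 diploid females, not data from this study. The formulas are the textbook uncorrected forms, which is an assumption: POPGENE, PIC-CALC and GENEPOP may apply sample-size corrections or different estimators.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Gene diversity, 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in allele_frequencies(genotypes).values())


def polymorphism_information_content(genotypes):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(allele_frequencies(genotypes).values())
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * x * x * y * y for x, y in combinations(p, 2))
    return 1.0 - homozygosity - correction


def inbreeding_coefficient(genotypes):
    """Simple FIS estimate, 1 - Ho/He (not the Weir-Cockerham estimator)."""
    return 1.0 - observed_heterozygosity(genotypes) / expected_heterozygosity(genotypes)


# Hypothetical genotypes (allele sizes in bp) for 8 individuals at one locus.
genotypes = [
    (180, 180), (180, 183), (183, 183), (180, 186),
    (186, 186), (180, 180), (183, 186), (180, 183),
]

ho = observed_heterozygosity(genotypes)
he = expected_heterozygosity(genotypes)
pic = polymorphism_information_content(genotypes)
fis = inbreeding_coefficient(genotypes)
print(f"Ho={ho:.3f} He={he:.3f} PIC={pic:.3f} FIS={fis:.3f}")
```

A positive FIS, as in this toy example, indicates fewer heterozygotes than expected under Hardy-Weinberg proportions.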

Table 3 The characteristics of the newly developed microsatellite markers for six cryptic species of the B. tabaci complex
Figure 4

Comparison of gene diversity among the six cryptic species of the B. tabaci complex.

Each point represents the gene diversity of one locus and the lines connect points from different species.

Figure 5

Mutational patterns of selected microsatellites in B. tabaci.

The black lines indicate the repeat motifs and the black-dotted lines represent the indels in the flanking region.

The implications of microsatellites in evolutionary analysis

To date, at least 12 distinct genetic groups have been identified from the complex based on mtCOI sequences, and these fall into 4 major clusters: 1) SubSaharan Africa (the ancestral cluster); 2) Asia; 3) New World and 4) North Africa/Middle East/Asia Minor. The 6 species covered here belong to 2 of the 4 major clusters, i.e. Asia and North Africa/Middle East/Asia Minor. The Neighbor Joining method was used for cluster analysis of the six species (Figure 6 A & B). Interestingly, the genetic relationship based on the 13 microsatellite loci is in agreement with the phylogeny of the partial mtCOI sequences2. Principal coordinates analysis (PCA) was performed using GENALEX 6 software42 and the results revealed that China 1, Asia II 3, Asia II 6 and Asia II 7 clustered on the left side of the plot (Figure 6 B & C). When the first and third factors were considered, the results clearly revealed three genetic groups among the 6 cryptic species. The PCA supports the phylogenetic clusters. In conclusion, these microsatellites may be used as markers to describe the genetic diversity of the B. tabaci complex.

Figure 6

Phylogenetic and principal components analysis of the six cryptic species.

(A) Phylogeny of the six cryptic species. POPTREE2 software59 was used to construct the phylogenetic tree and the numbers on the nodes represent bootstrap support. The scale bar represents Nei's standard genetic distance D60. (B) Two-dimensional scatter plot of the first and second factors for the 6 cryptic species. The first and second PCs account for 60.21% and 20.52% of the variation, respectively. (C) Two-dimensional scatter plot of the first and third factors for the 6 cryptic species. The first and third PCs account for 60.21% and 13.75% of the variation, respectively.
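
For readers unfamiliar with the distance measure on the scale bar, Nei's standard genetic distance is D = -ln(I), where I = Jxy / sqrt(Jx Jy) and the J terms are sums over loci of products of allele frequencies. The sketch below is a minimal illustration with hypothetical allele frequencies; it omits the sample-size corrections that programs such as POPTREE2 may apply.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    Each population is a list with one dict per locus, mapping
    allele -> frequency. Loci must be in the same order in both lists.
    """
    jx = jy = jxy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        alleles = set(freq_x) | set(freq_y)
        jx += sum(freq_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(freq_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(freq_x.get(a, 0.0) * freq_y.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Hypothetical allele frequencies at two loci for two populations.
pop_a = [{180: 1.0}, {200: 0.5, 204: 0.5}]
pop_b = [{180: 0.5, 183: 0.5}, {200: 0.5, 204: 0.5}]

print(f"D(a, b) = {nei_standard_distance(pop_a, pop_b):.4f}")
```

A pairwise matrix of such distances is the usual input to a Neighbor Joining tree.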

Discussion

To our knowledge, this is the first investigation of the frequency and distribution of B. tabaci microsatellites using transcriptome data (Table 1). We found that mononucleotide repeats were the most common, followed by dinucleotide, trinucleotide and tetranucleotide repeats in all three species. These results are consistent with the theory that microsatellite abundance decreases as motif repeat number and repeat length increase43. These microsatellites provide a valuable resource for the development of genetic markers in B. tabaci. In addition, these markers offer a chance to classify the functions of the microsatellite-containing genes. Interestingly, some microsatellite-containing genes were found in pathways related to environmental stress responses (Table S4). These markers may open a new avenue for research on pesticide resistance in B. tabaci by allowing the frequencies of alleles in resistance-related genes to be estimated across populations44.

Many studies have demonstrated that microsatellites derived from transcribed sequences show higher transferability because they are more conserved45,46. In this study, among the 57 microsatellite markers, 42 primer pairs (73.68%) amplified fragments from all three species. These results illustrate that the transcriptome-derived markers have high inter-taxon transferability47. Therefore, these markers can be used to amplify microsatellites from other closely related species that do not yet have markers and can be considered as anchor markers for comparative mapping and evolutionary studies across species48.

For the 13 microsatellites tested in colonies of six species (MEAM1, MED, Asia II 3, Asia II 6, China 1 and Asia II 7) of the B. tabaci complex, the number of alleles (NA) ranged from 3 to 15, with an average of 7.2 alleles per locus. Previously, Valle, Lourenção, Zucchi and Pinheiro23, Tsagkarakou and Roditakis22 and De Barro, Scott, Graham, Lange and Schutze21 found that the number of alleles per locus ranged from 1 to 2, 2 to 13 and 6 to 44, respectively. The difference in polymorphism may be attributed to the different population sizes or cryptic species used. Polymorphism and heterozygosity are critical characteristics of microsatellite markers. The high allele numbers and heterozygosity (Table 3) of these transcriptome-derived markers suggest that they can be used to assess the genetic profile of the B. tabaci complex. A total of 25 locus-by-species groups deviated significantly from Hardy-Weinberg equilibrium (Table 3). Possible sources of heterozygote deficiency include recurrent inbreeding, subpopulation structure (Wahlund effect) and/or null alleles. Large differences between Ho and HE were also found among the genomics-based microsatellites developed for B. tabaci by De Barro49. In addition, marked differences between Ho and HE were observed in the genomics-based microsatellites of B. tabaci collected in different environments14,16. Some loci showed no heterozygosity and the FIS values were all positive, suggesting that loss of heterozygosity can occur at some microsatellites. This is probably because our samples were obtained from laboratory populations. Additional experiments with field populations are warranted to reveal the polymorphism and heterozygosity of different species of the B. tabaci complex.

Owing to the lack of distinguishing morphological characteristics, the systematics of the B. tabaci species complex depends largely on molecular approaches2,3,4,6. To date, the evolutionary history of the B. tabaci complex has been inferred exclusively from the partial sequence of the COI gene2,6,50. Other molecular markers are greatly needed to further illustrate the complicated evolutionary relationships of the B. tabaci species complex. Of the 4 major clusters that represent the B. tabaci species complex, the most divergent is SubSaharan Africa, followed by New World. We have used 11 of the newly developed microsatellites to study the SubSaharan Africa 1 cryptic species and the results revealed that they work on this species (data unpublished). Owing to the absence of specimens, we did not study the New World cluster; however, it is less divergent than SubSaharan Africa. Therefore, our newly developed microsatellites appear to be conserved and many of them can likely be used to analyze other cryptic species of the B. tabaci complex. The genetic relationship of the 6 species derived from the 13 novel conserved microsatellite markers is in accordance with the phylogeny of the partial COI sequences, which provides additional evidence for the evolutionary relationships of the 6 cryptic species. In addition, our results suggest that microsatellites can be considered suitable markers for future evolutionary analyses in other species complexes.

Conclusion

In this study, we characterized a large number of transcriptome-derived microsatellites from members of the B. tabaci complex. Functional categorization of the microsatellite-containing genes provides valuable information about the potential functions of these microsatellites. In addition, 57 microsatellite markers were experimentally validated. Many of these microsatellites can be used for population analysis and are transferable across different species of the B. tabaci complex. Complex mutational patterns were observed in these transcriptome-derived microsatellites. Moreover, analysis of the genetic relationships of six B. tabaci species based on the newly developed microsatellites indicated that these microsatellites may be used as markers to describe the genetic diversity of the B. tabaci complex. These markers enrich the existing microsatellite markers of B. tabaci and can be used to analyze the genetic diversity and evolutionary pattern of this whitefly complex.

Methods

Mining the microsatellites from the transcriptome database

The MISA-Micro Satellite identification tool (http://pgrc.ipk-gatersleben.de/misa/) was used to search for microsatellites in the transcriptome data of MEAM1, MED and Asia II 3 whiteflies. Microsatellites were defined as mononucleotide repeats with ≥ 10 repeats and di-, tri-, tetra-, penta- and hexanucleotide repeats with ≥ 6 repeats33. Compound microsatellites were defined as microsatellites separated by an interval of ≤ 100 bases.
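
The repeat thresholds above can be illustrated with a small regular-expression search. This is a hedged sketch of the definition only; it is not MISA itself, it does not handle compound microsatellites, and the function names and the example sequence are hypothetical.

```python
import re

# Minimum number of repeats per motif length, as defined in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter motif."""
    n = len(unit)
    return not any(n % d == 0 and unit[:d] * (n // d) == unit for d in range(1, n))


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for perfect repeats meeting the thresholds."""
    sequence = sequence.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if not is_primitive(unit):
                continue  # e.g. (AA)6 is reported as a mononucleotide repeat instead
            hits.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical transcript fragment: (A)12, (AG)7 and a too-short (ATG)5.
example = "GGT" + "A" * 12 + "CGT" + "AG" * 7 + "CCT" + "ATG" * 5 + "C"
ssrs = find_ssrs(example)
for start, motif, count in ssrs:
    print(f"start={start} motif=({motif}){count}")
```

The (ATG)5 stretch is not reported because it falls below the six-repeat threshold.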

The location of microsatellites

Coding sequences (CDS) of each gene were first determined by BLASTx against the Swissprot database using a threshold of 1 × 10^-5. CDS with an unexpected stop codon in the BLAST hit region were removed. Start codon positions were determined by searching for an in-frame ATG codon within 30 bp upstream or downstream of the beginning of the aligned reference protein. Stop codon positions were determined by searching for in-frame TAA, TAG and TGA motifs within 30 bp of the stop codon of the reference protein. The 5′UTR and 3′UTR regions were defined based on the CDS prediction. The locations of microsatellites were determined based on the predicted 5′UTR, 3′UTR and CDS regions.
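
The procedure above can be sketched as two small steps: locate an in-frame start or stop codon near the alignment boundary, then classify each microsatellite by its position relative to the predicted CDS. The code below is an illustration under stated assumptions: coordinates are 0-based, the closest in-frame codon is chosen when several are present (the text does not specify this), and the function names and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_in_frame_codon(seq, anchor, codons, window=30):
    """Position of the in-frame codon from `codons` closest to `anchor`.

    Searches within +/- `window` bp in steps of 3 so the frame of `anchor`
    is preserved. Returns None if no such codon is found.
    """
    best = None
    for offset in range(-window, window + 1, 3):
        pos = anchor + offset
        if pos < 0 or pos + 3 > len(seq):
            continue
        if seq[pos:pos + 3] in codons:
            if best is None or abs(offset) < abs(best - anchor):
                best = pos
    return best


def classify_ssr(ssr_start, ssr_end, cds_start, cds_end):
    """Classify a microsatellite given half-open [start, end) coordinates."""
    if ssr_end <= cds_start:
        return "5'UTR"
    if ssr_start >= cds_end:
        return "3'UTR"
    if cds_start <= ssr_start and ssr_end <= cds_end:
        return "CDS"
    return "boundary"  # spans a CDS/UTR junction


# Hypothetical unigene: 2 bp 5'UTR, ATG, (GCT)4, TAA, then (GA)6 in the 3'UTR.
unigene = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GA" * 6

# Assume BLASTx aligned the reference protein from position 5 and that its
# stop codon maps to position 17 of the unigene.
cds_start = find_in_frame_codon(unigene, 5, {"ATG"})
stop_pos = find_in_frame_codon(unigene, 17, STOP_CODONS)
cds_end = stop_pos + 3

print(cds_start, cds_end)
print(classify_ssr(20, 32, cds_start, cds_end))  # the (GA)6 repeat
print(classify_ssr(5, 17, cds_start, cds_end))   # the (GCT)4 repeat
```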

GO and KEGG pathway analysis

To understand their functions, all microsatellite-containing genes were searched against the GenBank nr protein database using BLASTx with an E-value cut-off of 10^-5. Blast2GO software was used to assign Gene Ontology (GO) terms to these microsatellite-containing genes51. Blastall software was employed to perform the pathway analysis by searching all genes against the KEGG database.

Sample collection and DNA extraction

The invasive MEAM1 (mtCOI GenBank accession number: GQ332577) and MED (KF452516) and the indigenous Asia II 3 (KF452527), Asia II 6 (KC540758), China 1 (KF452525) and Asia II 7 (EU192043) species of the B. tabaci complex were collected from Zhejiang, China. These cryptic species were maintained separately on cotton (cultivar Zhe-Mian 1793) under the following controlled conditions: 27 ± 1°C, a photoperiod of 14 h light: 10 h darkness and relative humidity of 70 ± 10%7. Total DNA was extracted from individual female adult whiteflies following the method of Frohlich et al.52. The purity of the populations was verified by PCR amplification of a 0.7 kb fragment of the mtCOI gene. The following forward and reverse primers were used to amplify the partial mtCOI sequences: 5′-TGRTTYTTTGGTCATCCVGAAGT-3′ and 5′-TTACTGCACTTTCTGCCACATTAG-3′.
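
The forward mtCOI primer contains the degenerate IUPAC codes R, Y and V, so it represents a small pool of sequences rather than a single oligonucleotide. The following sketch expands both primers using the standard IUPAC nucleotide code table; the function name is illustrative.

```python
from itertools import product

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


forward = "TGRTTYTTTGGTCATCCVGAAGT"
reverse = "TTACTGCACTTTCTGCCACATTAG"

forward_variants = expand_primer(forward)
reverse_variants = expand_primer(reverse)
print(len(forward_variants), "forward variants;", len(reverse_variants), "reverse variant")
```

With one R, one Y and one V position, the forward primer expands to 2 × 2 × 3 = 12 sequences, while the reverse primer is non-degenerate.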

Validation of microsatellite markers using PCR

To test these markers, Primer Premier 5.053 was used to design PCR primers from the sequences flanking the microsatellites. Bemisia tabaci produces males from unfertilized eggs and females from fertilized eggs; males are therefore haploid, so only females were used to test polymorphism at each locus. An M13-specific tail (5′-CACGACGTTGTAAAACGAC-3′) was added to the 5′-end of each forward primer so that products could be labelled with an M13 primer carrying a fluorescent dye (FAM or HEX, Applied Biosystems)38. PCR was carried out in an S1000 thermal cycler (Bio-Rad). A 15 μL PCR reaction contained 0.25 μL of 100 pmol/μL forward primer, 0.25 μL of 100 pmol/μL reverse primer, 0.25 μL of 100 pmol/μL 5′ dye-labelled M13 primer, 1.6 μL of 10 × Ex Taq Buffer, 1.2 μL of 2.5 mM dNTP and 0.1 μL of Ex Taq polymerase (Takara, Japan). PCR cycling conditions were 94°C for 3 min, followed by 32 cycles of 96°C for 15 s, 51–63°C for 20 s and 72°C for 50 s. The PCR products were diluted and detected on a MegaBACE 1000 DNA analysis system (Amersham Biosciences) at the Center of Analysis and Measurement in Zhejiang University. The ET550-R size standard (GE Healthcare) and Genetic Profiler version 2.2 (GE Healthcare) were used to determine the sizes of the amplification products.

Polymorphism and microsatellite distribution analysis

POPGENE (version 1.31)54 was used to calculate the total number of polymorphic alleles (N), average number of alleles per locus (NA), average number of effective alleles per locus (Ne), observed heterozygosity (Ho), expected heterozygosity (HE), genetic identity (I), genetic distance (D) and the gene diversity for each locus of the different cryptic species. FSTAT (Version 1.2)55 was used to examine the allelic richness (R). The inbreeding index (FIS) and p (FIS) were estimated with GENEPOP 4.056. Polymorphism information content (PIC) was calculated with PIC-CALC version 0.657. Null alleles and technical artifacts such as stuttering and large allele dropout were assessed using MICRO-CHECKER v.2.2.358.

Data accessibility

DNA sequences: Sequences have been submitted to GenBank under accession numbers KF916587–KF916610. Detailed information on the predicted microsatellites from the transcriptomes of the three species is shown in Supplementary Table S1. PCR primer sequences are presented in Supplementary Table S5.