Introduction

The ichneumonid wasp Latibulus argiolus (Rossi, 1790) of the subfamily Cryptinae (Hymenoptera: Ichneumonidae) is a specialized ectoparasitoid of social paper wasps (Polistes Latreille, 1802) with respect to its larvae and pupae1,2. Because parasitoids use only one host for their full larval development3, they are under strong natural selection that in turn leads to optimal exploitation of the host4. Therefore, the most important behaviour increasing the fitness of free-living female parasitoids is their oviposition decision, including the choice of the host and number of eggs laid on a specific host among other factors. A nest of paper wasps containing many potential hosts (various developmental stages of the wasp), can be attractive for the L. argiolus female allowing it to theoretically maximize its fitness. On the other hand, the strong defensive capabilities of Polistes colonies against intruders can influence the oviposition mode of L. argiolus, for example, by reducing the number of eggs laid in a single nest and in turn, by enlarging the number of host nests parasitized by this parasitoid. Furthermore, up to 34% to 78% of the host nests are destroyed by predators (such as birds and ants)5, that can lead to high L. argiolus mortality and lower its genetic variability. In turn, this process might cause a decrease in the fitness of this parasitoid and might function as an additional factor for females not to limit their reproduction to a single host nest. By linking the oviposition decision and genetic variability of L. argiolus, we can interpret the biology of this parasitoid in a broad ecological and evolutionary context. Investigations of the L. argiolus and paper wasp relationships may serve as a model for the coevolution of solitary parasitoid–social host interactions.

The marker of choice for many areas of population research, such as mating systems, kinship structure, demography or conservation genetics, are microsatellite loci6,7,8. The usefulness of these markers results from their codominance, high polymorphism, and abundance throughout the genome9. The high popularity of microsatellite loci, resulting in their frequent application in population genetics studies, is caused also by the beneficial proportion between the amount of information gained and the sum of financial costs and expended work efforts. Thus far, no such loci have been described for the L. argiolus species; therefore, the purpose of this study was to investigate these loci. Several methods for novel microsatellite marker development exist. One of them introduces third generation sequencing on a PacBio RSII platform. The described de novo loci must be verified first for reliable amplification and high polymorphisms. Microsatellites fulfilling the criteria of error free amplification, high genotyping quality, and polymorphisms may be applied for population genetic analysis. In studies applying microsatellites, at least over a dozen loci are required to obtain high statistical power of the calculations results10,11. Therefore, concurrent amplification of markers in multiplex polymerase chain reactions (PCR) is desirable because this procedure will significantly shorten the time of laboratory work and reduce the cost of analysis. Such assays need to be first optimized and validated to justify the use of the loci in future research projects12,13,14.

This study aimed to develop a set of novel microsatellite loci and then to optimize and verify the newly designed multiplexes that could be applied in population and evolutionary studies of the specialized ichneumonid parasitoid L. argiolus species.

Material and methods

Sampling and DNA extraction

The study material consisted of 213 L. argiolus specimens (126 diploid females and 87 haploid males). Specimens were obtained from 43 Polistes nests from ten wild populations in Poland: (1) Białystok-Lotnisko (53°05′59.5″N 23°09′37.6″E), (2) Haćki (52°50′05.5″N 23°10′57.9″E), (3) Kalitnik (53°00′54.5″N 23°42′54.7″E), (4) Krzyżanowice (50°27′12.4″N 20°33′29.1″E), (5) Mietlica (52°32′28.0″N 018°22′57.3″E, (6) Przewóz (52°28′51″N 18°23′25″E), (7) Skupowo (52°49′51.0″N 23°42′55.9″E), (8) Suraż (52°55′19.3″N 22°57′49.9″E), (9) Stawiany (50°35′30.6″N 20°37′19.7″E), and (10) Zalesiany (53°04′10.4″N 23°02′32.9″E). The wasp nests collected in the field (in May/June) were transferred into plastic cages (24 × 17 × 16 cm) and kept in the laboratory at temperatures of 23 to 27 °C under natural light:dark (L:D) conditions. The individuals of L. argiolus, emerging from the host nest in June/July, were frozen and preserved at − 20 °C. DNA was obtained from the hind legs using the DNeasy Blood & Tissue Kits (cat. no. 69506, Qiagen).

Pacific biosciences RS platform sequencing

The PacBio library was constructed using 9 µg of genomic DNA consisting of 16 (12.5 µl) equally mixed female DNA samples. DNA was fragmented using miniTube (cat. no. 520064) in the Covaris System to a size of 1.5 to 3 kb according to protocol with minor modifications in which the intensity was set to 0.2 while the sonication time was reduced to 40 s. The shared sample was purified applying 0.6 × volume ratio of AMPure PB Beads (cat. no. 100-265-900, Pacific Biosciences). Furthermore, the 2 kb library was prepared using 750 ng of DNA according to the Pacific Biosciences protocol available online15 and using the SMRTbell Template Prep Kit 1.0 (cat. no. 100-259-100, Pacific Biosciences). The procedure included DNA damage and end repair followed by blunt ligation of hairpin adaptors at both ends of the DNA fragments. Failed ligation products were removed with ExoIII and ExoVII enzymes. Purifications steps separating enzymatic reactions were performed by applying 0.6 × volume ratio of AMPure PB Beads. Size distribution of DNA fragments during the library preparation procedure and for the final library was checked by electrophoresis on 0.5% agarose gels. The resulting library was doubly purified and prepared for sequencing using DNA/Polymerase Binding Kit P6 v2 (cat. no. 100-372-700, Pacific Biosciences) and applying MagBeads Kit v2 (cat. no. 100-676-500, Pacific Biosciences) for loading onto the sequencer according to the protocol generated with Binding Calculator v.2.3.1.1 (available online: https://github.com/PacificBiosciences/BindingCalculator). Single molecule real-time testing (SMRT) was carried out on a PacBio RS II sequencer running two SMRT Cells 8Pac v3 (cat. no. 100-171-800, Pacific Biosciences) and a 360-min data collection mode.

Data gathered from sequencing were subject to the RS_ReadsofInsert.1 protocol analysis with default settings to generate reads of insert (ROI). The ROI reads were subsequently subjected to microsatellite analysis and primer design using msatcommander v.1.0816 with a threshold of at least eight or ten repetitions for tri- and tetra-nucleotide repeats, excluding mononucleotide repeats. Primer design parameters including the msatcommander option “combine loci” consisted of several parameters: (1) size of primers ranging from 18 to 22 bp, (2) annealing temperature (Tm) ranging from 58 to 62 °C, (3) GC content ranging from 30 to 70%, and (4) amplicon product size range of 80 to 400 bp. Finally, only microsatellite sequences containing tri- or tetra-nucleotide motif with at least 10 repeats were used for marker selection and amplification test performance.

Microsatellite marker selection and testing

Loci with the highest number of tandem repeats were chosen from the identified microsatellite sequences set. The DNA sequences of the newly developed loci were aligned against the NCBI database of already described nucleotide sequences using BLAST algorithms (BLASTN 2.10.1 + and BLASTX 2.10.1 + programmes)17,18,19. In the next step, loci were filtered for their optimal amplification performance by applying following criteria calculated in silico: (1) maximal PCR efficiency and no dimer formation, (2) difference in annealing temperature between primer pair below 2 °C, (3) minimal penalty score, and (4) sequence length of microsatellite ranging from 100 to 350 bp. For multiplex design purposes, the selected markers were further ordered according to the predicted PCR product size into three categories long (> 300 bp), medium (150–300 bp), and short (< 150 bp). The finally selected markers were amplified on DNA of 16 diploid females. Polymorphisms and the allele range of the chosen markers were tested using the universal primer labelling method20,21. The forward primers were tailed at the 5′ end with a CAC-sequence tag (5′-CACGACGTTGTAAAACGAC). This tag sequence was identical to four fluorescent dye-labelled universal forward primers (labelled with the dyes, including 6-Fam, Hex, Tamra, and Rox). The PCR reactions were performed in a final volume of 15 µl containing 9–227 ng of DNA (1 µl), 1 × MasterMix of DreamTaq Hot Start Green PCR Master Mix (cat. no. K9022, Thermo Fisher Scientific), 0.02 M tagged forward locus specific primer, 0.2 M locus specific reverse primer, and 0.2 M fluorescent labelled universal forward primers. The amplification conditions consisted of several steps: (1) initial denaturation 95 °C for 5 min, (2) 35 cycles of 30 s at 94 °C, (3) 90 s at 60 °C, (4) 40 s at 72 °C, and (5) a final elongation step of 10 min at 72 °C. Negative controls were included for each PCR reaction. Positive controls were not used as the DNA concentration and quality of all samples was satisfactory enough to permit successful PCR reactions. All loci were amplified separately, and the reaction efficiencies were visualized on 2% agarose gels. For loci in which no amplification products were observed, the amplifications were repeated by lowering the annealing temperature to 57 °C. PCR products were separated using an automated sequencer ABI 3500XL Genetic Analyzer applying GeneScan 600 LIZ as an internal lane size standard (cat. no. 4366589, Applied Biosystems). PCR products labelled with different fluorescent dyes were mixed for joint analysis. The resulting fragment sizes were read using GeneMapper v.4.1 (Applied Biosystems) software. Out of the tested set, only loci fulfilling following criteria were selected for multiplex panel optimization: (1) clear peak morphology, (2) no artefacts co-amplification, and (3) high number of alleles.

Multiplex panel optimization

Out of 41 de novo described loci, 14 microsatellite markers were selected and organized into two multiplex PCR sets (seven loci each) by maximizing the number of loci labelled with the same dye (avoiding though allele overlap between loci) as shown in Fig. 1. The gaps between the neighbouring loci were set to the size of 66 to 137 bp according to the genotype data set of 16 females obtained during single microsatellite marker testing. The forward primer of each marker was fluorescence-labelled (Table 1, Fig. 1). Multiplex PCRs were performed with Labcycler Gradient (SensoQuest GmbH) in 5 µl reaction volume containing 2 µl genomic DNA (~ 20 ng), 1.7 µl Multiplex PCR 1 × MasterMix (cat. no. 206143, Qiagen), 0.3 µl mix of primers (cat. no. 450056, Life Technologies), and 1 µl RNase-free water (cat. no. 206143, Qiagen). Table 1 shows the final concentration of each primer pair. For each PCR reaction, a negative control was used as a sample to which no DNA was added. Each multiplex PCR started with an initial activation step of 95 °C for 15 min followed by 22 cycles of denaturation at 94 °C for 30 s, annealing at 57 °C for 90 s, extension at 72 °C for 60 s, and ended with a final extension at 60 °C for 30 min. The PCR products were mixed with 10 µl ultra-grade formamide (cat. no. 4311320, Applied Biosystems) and 0.2 µl GeneScan 500 LIZ size standard (cat. no. 4322682, Applied Biosystems), denatured at 95 °C for 5 min, rapidly cooled, and then subjected to fragment length analysis by using a four-capillary ABI 3130 Genetic Analyzer (Applied Biosystems). The fragment lengths of microsatellite alleles were estimated automatically using the AutoBin feature in GeneMapper 4.0 software (Applied Biosystems), checked, and then optionally corrected manually to avoid false alleles calling of the software. In addition, approximately 10% of the samples (N = 20) was randomly chosen, re-amplified, and genotyped in order to calculate the genotyping error of the optimized multiplex.

Figure 1
figure 1

Two multiplex sets developed for simultaneous amplification of 14 microsatellite loci for Latibulus argiolus.

Table 1 Characterization of 14 microsatellite loci developed for Latibulus argiolus and organized in two multiplex sets (1, 2).

Data analysis

The number of alleles (NA) and their size ranges for each of the preliminarily tested locus were calculated with GenAlEx v.6.503 programme22. The software MICRO-CHECKER 2.2.123 was used for identifying possible genotyping errors (stuttering, large allele drop-out, false, and null allele frequencies) by performing 1,000 randomizations. We used CERVUS 3.0.324 to estimate the number of alleles per locus (NA) and their size ranges. CERVUS 3.0.3 was also used to calculate the expected and observed heterozygosity (HE and HO, respectively), the mean polymorphic information content (PIC), and the null allele frequency values (Fnull), and to estimate departures from Hardy–Weinberg equilibrium (HWE) with Bonferroni correction24 for evaluating the significance of HWE deviations. Linkage disequilibrium between loci was tested using GENEPOP v4.025 with the Markov chain method (10,000 dememorization steps, 100 batches, 5,000 interactions) and Fisher’s exact test; a Bonferroni correction for multiple testing was then applied26. The described calculations were performed exclusively for diploid females (N = 110) with exceptions for the NA and size range values, which were conducted for the whole sample set (N = 197), including haploid males (N = 87). Because the analysed females came from populations, in which the number of females was on average only 7.2, calculations were performed for the pooled sample of 110 diploid individuals.

Results

De novo sequencing of microsatellite loci

Library sequencing allowed us to obtain 1,133 Mb of raw data covering 65,556 reads for the jointly used SMRT Cells. The average sizes of the inserts were 2,241 and 2,464 (with quality of the inserts reads exceeding 0.92). The final output of reads length reached 151.4 Mb. Out of these sequences set, 48,515 ROIs were acquired and accumulated with 90,546,727 bp. The average ROI length equalled 1,874 bp, and the average number of passes was over 17 (17.74). The final sequence list consisted of 11,355 DNA sequences that were subjected to microsatellite loci searching. Sorting of microsatellites from this set allowed 415 sequences to be found that had an at least eight repetition threshold for which 401 primer pairs could be successfully designed. Altogether, 194 microsatellite loci covering tri- or tetra-nucleotide repeats at a number higher or equal to 10 were identified (Supplementary Table S1), for which 173 primers pairs could be designed (Supplementary Table S2). Only six tetra-nucleotide loci fulfilling the microsatellite selection criteria (out of these set) were detected (Supplementary Table S1). From this set, 41 tri-nucleotide loci, named LA1–LA41 were selected for PCR amplification and testing (Supplementary Table S3) and then deposited in GenBank under accession numbers MN531308–MN531348.

Out of 41 amplified markers tested in 16 diploid L. argiolus females, 40 (97.6%) were successfully amplified and were polymorphic (Supplementary Table S4). For four markers (LA1, 14, 16, and 38), the annealing temperature had to be lowered to 57 °C in order to obtain PCR products. Most of the markers (78%) amplified well and yielded clear and readable products. Twenty-four (60%) of the markers gave single peaks, while in the others 16 (40%), additional loci peaks impeding potential uncertainness in allele scoring were observed. These peaks consisted of stutter bands or after-peaks. In seven cases, two peaks differing by one bp in length were observed. The latter effect is caused in cases in which polymerase does not add adenine to the 3′ end of the newly synthesised strand during replication (resulting in so called − A and + A bands occurrence)27. BLASTX analysis showed no results for neither of sequence of the 41 tested loci. BLASTN results indicated that eight DNA sequences were similar to previously described DNA or cDNA sequences deposited in GenBank (Supplementary Table S5). However, in all results, the obtained query cover values were too low for convenient sequence annotation to the already described in GenBank genes or transcripts. This indicates that the selected loci are probably located in the non-coding regions of the L. argiolus genome. The exception could be locus La36 for which the query cover values exceeded 50% together with low E value scores and relatively high percentage identity of the aligned sequences for several insect species. In most of these cases BLAST search identified transcription factors sequences as the most probable results. The observed allele number in the tested loci ranged from 4 to 18 with 25 loci exhibiting at least 10 alleles. The best amplified, readable, and most polymorphic loci (Supplementary Table S4) were considered for multiplex development. LA11 was excluded from this set since it exhibited difficulties in raw read rounding, which could have been caused by its hypervariability (NA = 18).

Development and characterization of the microsatellite multiplex assay

For multiplex optimization purposes, 14 loci were selected and organized into two multiplex panels (Table 1, Fig. 1). The combined microsatellite primers pairs in these assays successfully amplified the chosen loci to yield peaks that were clear and easy to score. For most of the loci, the initial primer concentration had to be adjusted to achieve balanced amplification of the markers (Table 1). We did not detect any evidence of genotypic errors using the MICRO-CHECKER 2.2.1 software, a finding that was also proven by the low frequencies of null alleles explored by CERVUS 3.0.3 (Table 1). The amplification success of the multiplexes was high, reaching 197 (100%) of the microsatellite profiles. Robustness of the optimized assay was further verified by obtaining 100% correctly genotyped samples for 20 duplicated samples (~ 10% of the studied samples set as recommend by Pompanon et al.27). No significant linkage disequilibrium among any pair of the loci under study after Bonferroni correction (adjusted value of P < 0.004) was found, indicating that the studied loci were most probably not linked. Deviations from Hardy–Weinberg equilibrium were significant (P < 0.001) for only one (La24) of the 14 loci tested after Bonferroni correction (Table 1).

The average number of alleles (NA) per locus was 17.6, ranging from 8 in locus La23 to 27 in loci La19 and 21. The set of diploid female samples was characterized by the highly observed heterozygosity (mean HO = 0.83) ranging from 0.75 in locus La24 to 0.92 in locus La8 in addition to the highly expected heterozygosity (mean HE = 0.85) ranging from 0.70 in locus La23 to 0.90 in loci La8 and 29. The mean polymorphic information content (PIC) ranged from 0.65 in locus La23 to 0.89 in loci La8 and 29 (Table 1).

Discussion

In this study, we successfully describe new loci and optimization of a multiplex assay for L. argiolus, the highly specialized parasitoid of social paper wasps. PacBio RSII platform sequencing allowed for reliable discovery of microsatellite markers. This method has been proven to perform very well in de novo microsatellite description and was recently often applied28,29,30. Application of this method is especially useful in non-model organisms in which no information about genomic data exists. In such cases, it also outperforms other methods by being more efficient and accurate30,31. This method is also relatively cost and time efficient compared to the classical methods of microsatellite development9,28. Out of 11,355 DNA sequences, approximately 1.7% contained tri- or tetra-nucleotide motifs exceeding ten repeats. Tri- and tetra-nucleotide loci were selected as they were proven to cause less difficulties in scoring and rounding32. This finding is important, especially since insects are considered problematic for microsatellite isolation and genotyping and are often biased with particularly high frequencies of null alleles9,33,34. Tri- and tetra-nucleotide loci have been suggested to be less frequent than di-nucleotide in the genome; however, they are less prone to amplifications errors, especially stuttering29. For this reason, primers and in silico amplifications were very carefully chosen. Sequences with high repeat numbers were selected as it was proven that such loci have higher polymorphism due to higher mutation rate caused by polymerase slippage35.

The amplification success of the designed markers was very high (97.6%), and most of the loci amplified well, which points to the high quality of the RSII sequencing and well-defined parameters for primers design28,29,30,31. The relatively long reads (on average, exceeding 2 kb) facilitated the design of primers that enabled high PCR performance28.

The approach of applying universal primer labelling20 has been reported to be very efficient and cost effective in other studies21,32,36. In our current study, it was also proven to be a very useful and suitable method for primer labelling. The selected markers amplified well, and in most cases, produced (78%) clear bands and in 60% produced single peaks that permitted confident reading. Almost all of the tested markers were polymorphic, which may have been the consequence of selecting sequences with more than ten repeats. The rate of polymorphic loci in conventional methods of de novo microsatellite isolation may be as low as 30%30,35. This finding indicates that appropriate microsatellite sequences were selected for amplification tests in single specimens. This finding may also point to a high intrapopulation polymorphism of the studied species. However, a trade-off between high variability and possible artefacts may exist. The extremely polymorphic markers may be biased with high null allele frequency as a consequence of high a mutation rate in these loci9,27,33. This bias could have occurred in the LA11 locus. For multiplex optimization, direct fluorescent labelling was applied to reduce primer-dimer formation and enhance the PCR efficiency32, which was 100% in the studied sample set.

Our data show that the 14 microsatellite loci selected for assay development are reliable markers for genetic analyses of L. argiolus individuals with a good quality of detection, high polymorphism, and low frequency of null alleles. It is true that one of the loci was not in HWE, but it should be emphasized that the HWE test was performed for pooled diploid individuals that came from different populations resulting most probably in Wahlund effect. We believe that the presented tool will be very useful in L. argiolus studies, especially in the context of its life history strategies after considering co-evolution of this haplodiploid parasitoid with its social host. Because oviposition decisions of female parasitoids primarily influence their fitness, we hypothesize that female of L. argiolus lays more eggs in larger nests of the paper wasp and exhibits adaptive sex ratio manipulation in response to the number of her offspring produced in a single nest of the host. The previously mentioned hypotheses could be now addressed by applying the developed marker set that would allow determination of kinship and could also be a very good tool for the rapid identification of sex of preimaginal stages of L. argiolus. We believe that application of microsatellite markers will yield many answers concerning the biology and genetic structure of this specialized parasitoid. Additionally, the deposited sequence data may be subject to further microsatellite searches and testing in other closely related species of this interesting group of parasitoids.