First microsatellite markers for the European Robin (Erithacus rubecula) and their application in analysis of parentage and genetic diversity

The European Robin is a small passerine bird associated with woodlands of Eurasia and North Africa. Despite being relatively widespread and common, little is known of the species’ breeding biology and genetic diversity. We used Next Generation Sequencing (NGS) to develop and characterize microsatellite markers for the European Robin, designing three multiplex panels to amplify 14 microsatellite loci. The level of polymorphism and its value for assessing parentage and genetic structure was estimated based on 119 individuals, including seven full families and 69 unrelated individuals form Poland’s Białowieża Primaeval Forest and an additional location in Portugal. All markers appeared to be highly variable. Analysis at the family level confirmed a Mendelian manner of inheritance in the investigated loci. Genetic data also revealed evidence for extra-pair paternity in one family. The set of markers that we developed are proven to be valuable for analysis of the breeding biology and population genetics of the European Robin.

The European Robin Erithacus rubecula, a small (c.19 g) passerine bird, is a common and widespread species of wooded habitats in Europe, parts of western Asia and North Africa 1 . Although it is a familiar and moderately well-known species in Western Europe, there is little information about the species' breeding biology across much of its range, and studies of its population genetics are mostly lacking [2][3][4][5][6][7][8][9] . This absence of genetic studies is probably related to the European Robin's elusiveness during the breeding season, when nests are notoriously difficult to locate 1 .
Most of the available fragmentary data on the breeding biology of European Robins in Britain comes from highly transformed environments, such as gardens and secondary woodlands 4 . However, these studies may not be representative of the European Robin's ecology in more natural forests elsewhere in its range. This study bias and difficultly of observing breeding European Robins means that there is no information characterising the species' typical patterns of behaviour and breeding biology in primaeval forest habitats, to which it was originally adapted 10 .
The European Robin is considered to be socially monogamous and highly territorial in the breeding season, when pairs may have up to three consecutive breeding attempts between spring and early summer. Nests are typically located in recesses or cavities on standing or fallen trees within several meters of the ground, in banks or walls, or under overhanging vegetation on the ground itself; nests generally contain 4-5 eggs that are incubated for around 15 days before hatching altricial nestlings that leave after a further 15 days, becoming independent around 10 days later 1,4,5,11,12 . Despite this basic knowledge there is no information on genetic relatedness or extra-pair paternity within or between broods and their parents.
Sample collection and DNA extraction. Samples of genetic material from European Robins were collected in two widely separated areas of Europe, in order to isolate microsatellite markers and investigate relatedness. In north-east Poland, birds were sampled between 2018 and 2019 in the Strict Reserve of Białowieża National Park (52° 46′ 0″ N, 23° 52′ 0″ E) ( Fig. 1.), which is characterized by features typical of pristine forest, namely a multi-storey profile of tree stands, abundant dead wood with many uprooted trees, tall tree heights and a diverse tree community of deciduous and coniferous species 10 . Genetic material was collect on three sample plots (about 30 ha each) at representative locations in the forest.
European Robins are summer migrants to the Białowieża forest, arriving in early spring in order to breed, after wintering in southern or western Europe. During the breeding season (late March to July), 95 individuals from the Białowieża forest were sampled for genetic analysis. Adult birds (N = 19: 11 males, 8 females) were caught using mist-nets (fine sheets of netting suspended between poles) and a recording of songs and calls as a lure. Further samples of a single chick (8-10 days old) from each of 28 nests were also collected. These combined samples enabled genetic analysis for 47 presumably unrelated birds. Additionally, we collected samples from seven 'full' families (social father and mother plus all the nestlings) to verify Mendelian inheritance of the isolated microsatellites. These seven 'full families' included 14 of the 19 adult birds reported above. The material for genetic analysis was collected during standard capture and marking (with numbered metal leg-rings) procedures for adult birds, and during nest inspections for nestlings, and involved plucking a single tail feather from each individual. Samples were stored in Eppendorf tubes filled with 96% pure ethyl alcohol.In Portugal, where the breeding population of European Robins is resident all year round 31 1.). Fradelos, Azurara and Arvore consist of mixed plantations of Pinus pinaster and Eucalyptus globulus with low management and an understory composed mostly of small native shrubs. Larça is a secondary native forest (tree height usually lower than 10 m) and a dense and diverse understory. These four locations are integrated in a landscape mainly composed by small forest patches, intertwined with agriculture and urbanized areas. Finally, Tapada de Mafra is a fenced property of more than 800 ha, managed for eco-tourism and big-game hunting with a mosaic of open areas and diverse forests dominated by Quercus spp. The birds were captured with mist nests placed along dirt tracks or with spring traps baited with mealworms (Tenebrio molitor). A blood sample was collected from each bird by puncturing the brachial vein near the joint between the humerus and ulna with a sterile needle (27G), following standard methods 32 . Approximately 75 µl of blood was collected with a capillary tube and then transferred to a 2 ml tube filled with 98% ethanol, which was stored at 4 °C.
For the feather samples, the main source of DNA is cells attached to the base of the calamus (quill) when feathers are in the growth phase 16,24 , or blood clots within the rachis of fully grown feathers 16,33 . Hence, approximately 0.5 cm of the calamus was extracted and cut into smaller pieces using a sterile scalpel on sterile Petri dish. Cut parts of the calamus were then incubated with a Proteinase K overnight at 56 °C.
For samples from Portugal, blood clots were removed from the alcohol and put into the buffer LT (EURx Polska) and incubated with a Proteinase K overnight at 56 °C. We then used a Tissue DNA Purification Kit (EURx Polska) to extract DNA from the calamus and blood following standard protocols. Each extract was analysed using Nanodrop (Thermo Fisher Scientific) to check the purity and concentration of DNA. The extracted DNA was stored at -20 °C.
Microsatellites typing with Pacific Biosciences RSII. An amount of 7.4 µg of genomic DNA consisting of four equally (20 µl) mixed DNA samples was used for a 2 kb library construction. DNA was fragmented, using Covaris System according to protocol, into sizes between 1.5 and 3 kb, applying the following modifications: intensity was set to 0.2, and time of sonication to 500 s. Fragmented DNA was purified using AMPure PB Beads applying 0.6 × volume ratio.
Subsequently, 750 ng of shared and concentrated DNA was used for library preparation with SMRTbell Template Prep Kit 1.0, applying the online Pacific Biosciences protocol (PacBio). This procedure consisted of: 1. DNA damage and ends repair, 2. blunt ligation of hairpin adaptors at both ends of the DNA fragments, and 3. removal of failed ligation products with ExoIII and ExoVII enzymes. To separate the enzymatic reactions, DNA was purified with 0.6 × volume ratio of AMPure PB Beads. Size distribution of DNA fragments and its integrity www.nature.com/scientificreports/ was monitored on 0.5% agarose gels during and after completing the procedure of library preparation. The final library was double purified with 0.6 × volume ratio of AMPure PB Beads. Binding Calculator v.2.3.1.1 (PacBio) was applied for protocol generation in order to prepare the library for sequencing applying the DNA/Polymerase Binding Kit P6 V2 and MagBeads reagents. Sequencing was performed on a PacBio RS II sequencer running three SMRT Cells in the 360-min data collection mode. Data generated through the PacBio sequencing platform was analysed to obtain Reads of Insert (ROI) according to the RS_ReadsofInsert.1 protocol applying default settings. The ROI reads were further analysed with the msatcommander v.1.08 34 program in order to identify microsatellite sequences and design primers. A threshold of at least five repetitions for tri-and tetra-nucleotide repeats, excluding mononucleotide repeats, was set in this analysis. Primer design parameters including the msatcommander option "combine loci" were as follows: length of the primers 18-22 bp, annealing temperature (Tm) 58 °C-62 °C, GC content 30%-70%, and amplicon product size 80-400 bp with minimum of 20-30 bp DNA sequence flanking from both sides of the tandem repeat of the microsatellite.
Microsatellite marker selection and testing. Selection of microsatellite markers was performed in several steps. Firstly, based on information obtained for the set of microsatellite sequences, identified in the genome of the European Robin from the above analyses, loci with the highest number of tandem repeats were selected (total number of 281 sequences). In the second step, loci were filtered based on criteria calculated in silico: (1) maximal PCR efficiency, (2) minimal penalty score, (3) no dimer formation, (4) a maximal difference in melting temperature of 2 °C between a primer pair, and (5) sequence length of the microsatellite ranging from 100 to 350 bp. This gave 40 selected microsatellite loci, which were then divided based on product size into three groups: (1) short < 200 bp, (2) medium 200-300 bp, and (3) long > 300 bp. The reason for this division was optimization of designed multiplex PCR.
Finally, for further testing, we selected 11 short loci, 14 medium loci and 15 long loci. These loci were initially tested on 17 unrelated individuals following the method of Schuelke (2000) 35  Analysis of the amplification of the microsatellite loci in different combinations of multiplex sets was verified using ABI 3500XL Genetic Analyzer. As a size standard we used GeneScan 600 LIZ (Applied Biosystems). The genotyping was performed using a mixture of formamide with LIZ-600 and PCR product, which was denatured for 5 min. at 95 °C. Genotypes were then identified using GENEMAPPER v.4.1.
Statistical analysis. All statistical analyses were performed on genotypes of 69 presumably unrelated individuals (45 from Poland and 24 from Portugal). For each locus we assessed the number of alleles (A), unbiased expected heterozygosity (H E ) and observed heterozygosity (H O ) using GenAlEx version 6.5 37 and Hardy-Weinberg equilibrium (HWE), and also Genepop on the Web version 4 38,39 . Additionally, the fixation index (F IS ), allelic richness and F ST were calculated using FSTAT version 2.9.3 40 .
Parameters of usefulness of the selected microsatellites in paternity verification and population genetics studies were also estimated, using Cervus 3.0 41 . Specifically, we calculated the PIC index (polymorphic information content), F(null) (null allele frequency), NE-1P (average non-exclusion probability for the first parent), NE-2P (average non-exclusion probability for the second parent), NE-PP (average non-exclusion probability for a candidate parent pair), NE-I (average non-exclusion probability for identity of two unrelated individuals) and NE-SI (average non-exclusion probability for identity of two siblings). Probability of identity (PI) and probability of identity for siblings (PI Sibs ) were estimated using GenAlEx version 6.5 37 . www.nature.com/scientificreports/ The final sequence list consisted of 62,872 DNA sequences that were subjected to microsatellite loci identification. Altogether 1932 microsatellite loci covering tri-or tetra-nucleotide repeats in at least five repetitions were identified. For the a priori chosen parameters of microsatellite selection for amplification testing, primer pairs for 961 loci were successfully designed. From this set, 18 tri-nucleotide and 22 tetra-nucleotide markers (named ER1-ER40), with the number of repeats ranging between 37 and 10, were further chosen for amplification and polymorphism description (Supplementary Table S1) and then deposited in GenBank under accession numbers SRX10929063-SRX10929102.

RSII
Agarose gel electrophoresis indicated that 36 out of 40 tested primer pairs (90%) were successfully amplified. The amplification was evaluated as successful if the length of the band was similar to the size of alleles indicated by mstacommander. Genotyping using ABI 3500XL Genetic Analyzer (Applied Biosystems) showed that 23 primer pairs amplified microsatellite fragments, meaning that the peaks on chromatograms were sharp and the score of the peaks was similar to the expected product size. Finally, 17 primer pairs were selected for designing multiplex PCR. In the case of three primer pairs, amplification failed in multiplex reaction, despite applying several combinations. Hence, we finally designed three multiplex panels ( Table 1), consisting of 14 primer pairs, as follows: Mix 1 (ER24, ER40, ER28, ER21, ER38, ER15), Mix 2 (ER32, ER7, ER4, ER5) and Mix 3 (ER39, ER13, ER25, ER31).
We failed to amplify loci ER25 in three samples, but the remaining 13 loci were amplified in all 119 samples. The resulting characterization of microsatellite polymorphism in 69 unrelated individuals of the European Robin indicated that the number of alleles at a single locus ranged from nine (locus ER32) to 26 (locus ER31).
The highest observed and expected values of heterozygosity were found in loci ER21, ER15, ER39, ER13 (H O = 0.913;) and ER31 (H E = 0.935). The lowest value of heterozygosity was observed in locus ER24 (H O = 0,594; H E = 0.696) ( Table 1). Two loci (ER24 and ER31) significantly deviated from HWE (Table 1). Accordingly, in these loci, F IS was significantly greater than zero, suggesting heterozygote deficiency. Overall, the observed heterozygosity was high (H O = 0.805) and significantly lower than the expected heterozygosity (H E = 0.849, P < 0.001). The estimated frequency of null alleles [F(null)] was very low, reaching 10% only in two cases (Table 1).    Genetic differentiation, assessed as F ST , was low between both regions (F ST = 0.015) but this was statistically significant (P < 0.05).
Parameters interlinked with the usefulness of identified loci in the parentage and relatedness analysis indicated that the described microsatellite panels can be successfully used in such studies. The polymorphic information content was high in all loci (PIC > 0.5; Table 2), and, accordingly, the analysis of 69 unrelated individuals showed a very low non-exclusion probability for the combination of 14 loci. The probability of identity and probability of sibling identity was the lowest in locus ER31 (PI = 0.01; PI Sibs = 0.28) and the highest were in loci ER24 (PI = 0.11; PI Sibs = 0.43) and ER32 (PI = 0.11; PI Sibs = 0.41). The respective values for the combination of 14 loci were also very low (Table 2).
Paternity analysis in the seven families (with known social parents) indicated extra-pair paternity in only one case (Table 3, family WE8). In two of the nestlings, we didn't find the putative father's alleles in ten out of 14 investigated loci. Moreover, analysis of full families confirmed that all microsatellite loci had Mendelian inheritance (Table 3).

Discussion
The PacBio RSII sequencing enabled microsatellite markers for European Robins to be discovered in a sufficient number, and the restricted selection using narrow criteria enhanced the possibility of choosing high quality markers for population analysis. This method is often chosen for de novo microsatellite characterization, and is recognised as being reliable and efficient [42][43][44] . It is a method of choice in cases of non-model organisms with no available genomic data, as well as allowing the obtaining of long reads (here ca. 2 kb) that cover the whole microsatellite sequence together with a large flanking region. This method, coupled with a universal primer labelling system 35 , was proven to be highly cost effective and efficient when compared to the traditional approaches of microsatellite description 42,45 , or testing via individual fluorescent labelling of all of the tested primers 36,46 .
In our study, we selected 14 microsatellite loci amplified in three multiplex reactions, which are useful for further analysis of the breeding biology and population genetics of the European Robin. The criteria of loci selection concentrated on tri-and tetra-nucleotide microsatellites, as they were reported to be less biased with amplification errors, especially stuttering, although they were less polymorphic and frequent than di-nucleotides. However, such an approach as ours results in far fewer problems in reading, and also later coding of alleles 43,46 .
The results of the analysis of nearly 70 unrelated birds from Poland and Portugal showed a high level of polymorphisms of the identified microsatellites, indicated by the values of allelic diversity and heterozygosity. Such a set of markers should allow adequate investigation of genetic diversity and population structure in the European Robin, complementing and advancing on the work of previous studies 22,23 . Indeed, we showed that the identified microsatellites amplified well in the two populations that were separated by a large distance across Europe (c. 2730 km). Comparison of birds from these two populations showed that resident birds from Portugal Table 2. Informativeness and exclusion probability of analyzed loci, estimated based on genotypes of 69 unrelated individuals of European Robin from Poland and Portugal. PIC-polymorphic information content, PI-probability of identity; PI Sibs -probability of identity for sibs; paternity non-exclusion probability (first second and pair parent, NE-1P, NE-2P and NE-PP respectively) and the combined non-exclusion probability of identity and sibling identity (NE-I and NE-SI) for each marker; overall-values for combination of 14 loci. www.nature.com/scientificreports/ have slightly lower genetic diversity (in terms of heterozygosity, as well as allelic diversity) than the migratory population from Poland. Higher genetic diversity of migratory populations compared to resident ones has also been confirmed in North American Golden-crowned Kinglets (Regulus satrapa) 47 .
We showed that genetic differentiation was low between the two populations, yet it was statistically significant. This supports previous studies that showed significant differentiation even within a limited geographical area, such as approximately 100 km 2 in one example 23 . However, several other studies have suggested that migratory birds have weakly marked genetic differentiation, even over a large area [48][49][50] . Due to limited geographical sampling so far, the issue of genetic differentiation among populations of European Robins should be investigated further. In particular, for such a widespread and generally abundant species, it would be informative to investigate the relationships and genetic diversity across the very large western Palearctic range, where some populations are highly migratory and others are sedentary or highly insular (e.g. island populations) 1 .
In the analysis of the genotypes of the seven full families of European Robins from Białowieża, we found strong evidence of extra-pair paternity in one nest. Although the sample size was low in this initial study (seven families, 48 nestlings), the result suggests a potentially low frequency of EPP in the study population (15% of sampled families, 4% of nestlings). However, this observation should be verified in a more extensive study, preferably by sampling a large number of representative families and populations across the species' range.
The rate of EPP that we found for European Robins was within the range of other European forest songbirds, such as the Wood Warbler Phylloscopus sibilatrix and Willow Warblers P. trochilus, which vary between zero and c. 50% of nests [51][52][53][54] . These rates are also similar to many other bird species 55 . Determining rates of EPP can be important for understanding strategies of breeding biology, settlement patterns and mate selection [56][57][58] .