The European Robin Erithacus rubecula, a small (c.19 g) passerine bird, is a common and widespread species of wooded habitats in Europe, parts of western Asia and North Africa1. Although it is a familiar and moderately well-known species in Western Europe, there is little information about the species’ breeding biology across much of its range, and studies of its population genetics are mostly lacking2,3,4,5,6,7,8,9. This absence of genetic studies is probably related to the European Robin’s elusiveness during the breeding season, when nests are notoriously difficult to locate1.

Most of the available fragmentary data on the breeding biology of European Robins in Britain comes from highly transformed environments, such as gardens and secondary woodlands4. However, these studies may not be representative of the European Robin’s ecology in more natural forests elsewhere in its range. This study bias and difficultly of observing breeding European Robins means that there is no information characterising the species’ typical patterns of behaviour and breeding biology in primaeval forest habitats, to which it was originally adapted10.

The European Robin is considered to be socially monogamous and highly territorial in the breeding season, when pairs may have up to three consecutive breeding attempts between spring and early summer. Nests are typically located in recesses or cavities on standing or fallen trees within several meters of the ground, in banks or walls, or under overhanging vegetation on the ground itself; nests generally contain 4–5 eggs that are incubated for around 15 days before hatching altricial nestlings that leave after a further 15 days, becoming independent around 10 days later1,4,5,11,12. Despite this basic knowledge there is no information on genetic relatedness or extra-pair paternity within or between broods and their parents.

Molecular methods have made an invaluable contribution to the study of the breeding biology and mating systems of birds13,14,15,16,17. These techniques have revealed that extra-partner mating is a much more frequent phenomenon among birds than was previously thought14,16,18, and extra-pair nestlings occur frequently even among socially monogamous species14. Such studies are important in advancing the understanding of female mate choice and reproductive fitness17,19,20. The benefits to females of adopting the extra-pair mating strategy are still debated, and there is no certain evidence that the advantages of such breeding activity outweigh the costs. Conversely, males obviously benefit from this behaviour through the increased reproductive success while engaging in the care of nestlings17.

Only three studies have applied molecular techniques to describe population genetics of the European Robin. The genetic structure of a resident population from the Canary Islands, which migrate short distances between breeding and wintering grounds, was characterized using polymorphism of mitochondrial DNA21. This study revealed that birds from Gran Canaria island should be considered as a distinct subspecies that is genetically separated from birds on continental Europe and the western Canary Islands. In contrast, analysis of population from the Azores indicated a lack of neutral genetic differences between group of birds from the archipelago and the group from mainland Portugal, suggesting a recent colonization of these islands22. Another resident population, from southern Italy, was investigated using cross-amplified microsatellites23, which suggested significant genetic differentiation between populations with long- and short-distance migratory behaviour.

The use of cross-amplified microsatellites in genetic studies is known to have several potential disadvantages, including particular markers being less polymorphic in target species than in the species of origin24,25,26. Also, diversification of the species may lead to mutations in sequences of flanking regions of some alleles, preventing the primer annealing during PCR27,28. This, in turn, may result in a high frequency of ‘null alleles’. Moreover, the success rate of amplification frequently decreases in proportion with the genetic distance between species29.

However, to date, there appears to have been no species-specific microsatellite markers described for the European Robin, including for the Central European population that represents the nominative and highly-migratory subspecies. Furthermore, this knowledge gap compounds the lack of data, supported by genetic information, for the diversity, ecology and biology of the European Robin’s populations in natural forest ecosystems across Europe. Such studies from primary forest conditions are informative and highly valuable as a benchmark for understanding a species’ biology in modified ecosystems and more outlying populations in the species’ range.

In this study we provide the first genetic data for European Robins from the last significant expanse of European temperate primaeval forest, the Białowieża Forest in north-eastern Poland, and from a second novel location on the western margin of the species’ European range, in Portugal. Information from the Białowieża Forest provides a unique opportunity to extend our knowledge of the European Robin’s biology in the natural, unmodified forest that still retains an almost complete suite of flora and fauna30. Such data provides an important reference system for further studies of evolutionary and ecological processes in this forest, in addition to those occurring in managed forests under strong anthropogenic pressure.

The main aim of the study is to describe the species-specific microsatellite markers for the European Robin, as a resource for population genetics and breeding biology studies. We then apply these molecular markers to investigate the paternity of multiple broods and individuals of European Robins from the Białowieża Forest, and additional samples from individuals resident approximately 2730 km away in Portugal, to estimate genetic diversity of the species across this geographical span.

Materials and methods

This study was carried out in compliance with the ARRIVE guidelines. All methods were performed in accordance with the relevant guidelines and regulations. The methods used for animal care and handling were approved by 2nd Ethical Committee for Animal Experimentation at Warsaw University of Life Sciences, Poland (decision number: WAW2/93/2017, dated on 15.12.2017) and Portuguese Institute for Conservation, Nature and Forests (ICFN, decision number: 347/2020/CAPT).

Sample collection and DNA extraction

Samples of genetic material from European Robins were collected in two widely separated areas of Europe, in order to isolate microsatellite markers and investigate relatedness. In north-east Poland, birds were sampled between 2018 and 2019 in the Strict Reserve of Białowieża National Park (52° 46′ 0″ N, 23° 52′ 0″ E) (Fig. 1.), which is characterized by features typical of pristine forest, namely a multi-storey profile of tree stands, abundant dead wood with many uprooted trees, tall tree heights and a diverse tree community of deciduous and coniferous species10. Genetic material was collect on three sample plots (about 30 ha each) at representative locations in the forest.

Figure 1
figure 1

Study area in Poland and Portugal (A). Figure (B) shows five places where robins were caught to collect samples.

European Robins are summer migrants to the Białowieża forest, arriving in early spring in order to breed, after wintering in southern or western Europe. During the breeding season (late March to July), 95 individuals from the Białowieża forest were sampled for genetic analysis. Adult birds (N = 19: 11 males, 8 females) were caught using mist-nets (fine sheets of netting suspended between poles) and a recording of songs and calls as a lure. Further samples of a single chick (8–10 days old) from each of 28 nests were also collected. These combined samples enabled genetic analysis for 47 presumably unrelated birds. Additionally, we collected samples from seven ‘full’ families (social father and mother plus all the nestlings) to verify Mendelian inheritance of the isolated microsatellites. These seven ‘full families’ included 14 of the 19 adult birds reported above. The material for genetic analysis was collected during standard capture and marking (with numbered metal leg-rings) procedures for adult birds, and during nest inspections for nestlings, and involved plucking a single tail feather from each individual. Samples were stored in Eppendorf tubes filled with 96% pure ethyl alcohol.In Portugal, where the breeding population of European Robins is resident all year round31, 24 samples of adult birds were collected from between April and June 2020 at five locations: (A) Fradelos (41° 21′ 58″ N, 8° 35′ 55.″ W); (B) Azurara (41° 21′ 02″ N, 8° 43′ 50″ W); (C) Arvore (41° 19′ 45″ N, 8° 43′ 59″ W); (D) Larça (40° 18′ 46″ N, 8° 24′ 13″ W); and (E) Tapada de Mafra (38° 57′ 40″ N, 9° 17′ 39″ W) (Fig. 1.).

Fradelos, Azurara and Arvore consist of mixed plantations of Pinus pinaster and Eucalyptus globulus with low management and an understory composed mostly of small native shrubs. Larça is a secondary native forest (tree height usually lower than 10 m) and a dense and diverse understory. These four locations are integrated in a landscape mainly composed by small forest patches, intertwined with agriculture and urbanized areas. Finally, Tapada de Mafra is a fenced property of more than 800 ha, managed for eco-tourism and big-game hunting with a mosaic of open areas and diverse forests dominated by Quercus spp. The birds were captured with mist nests placed along dirt tracks or with spring traps baited with mealworms (Tenebrio molitor). A blood sample was collected from each bird by puncturing the brachial vein near the joint between the humerus and ulna with a sterile needle (27G), following standard methods32. Approximately 75 µl of blood was collected with a capillary tube and then transferred to a 2 ml tube filled with 98% ethanol, which was stored at 4 °C.

For the feather samples, the main source of DNA is cells attached to the base of the calamus (quill) when feathers are in the growth phase16,24, or blood clots within the rachis of fully grown feathers16,33. Hence, approximately 0.5 cm of the calamus was extracted and cut into smaller pieces using a sterile scalpel on sterile Petri dish. Cut parts of the calamus were then incubated with a Proteinase K overnight at 56 °C.

For samples from Portugal, blood clots were removed from the alcohol and put into the buffer LT (EURx Polska) and incubated with a Proteinase K overnight at 56 °C. We then used a Tissue DNA Purification Kit (EURx Polska) to extract DNA from the calamus and blood following standard protocols. Each extract was analysed using Nanodrop (Thermo Fisher Scientific) to check the purity and concentration of DNA. The extracted DNA was stored at -20 °C.

Microsatellites typing with Pacific Biosciences RSII

An amount of 7.4 µg of genomic DNA consisting of four equally (20 µl) mixed DNA samples was used for a 2 kb library construction. DNA was fragmented, using Covaris System according to protocol, into sizes between 1.5 and 3 kb, applying the following modifications: intensity was set to 0.2, and time of sonication to 500 s. Fragmented DNA was purified using AMPure PB Beads applying 0.6 × volume ratio.

Subsequently, 750 ng of shared and concentrated DNA was used for library preparation with SMRTbell Template Prep Kit 1.0, applying the online Pacific Biosciences protocol (PacBio). This procedure consisted of: 1. DNA damage and ends repair, 2. blunt ligation of hairpin adaptors at both ends of the DNA fragments, and 3. removal of failed ligation products with ExoIII and ExoVII enzymes. To separate the enzymatic reactions, DNA was purified with 0.6 × volume ratio of AMPure PB Beads. Size distribution of DNA fragments and its integrity was monitored on 0.5% agarose gels during and after completing the procedure of library preparation. The final library was double purified with 0.6 × volume ratio of AMPure PB Beads. Binding Calculator v. (PacBio) was applied for protocol generation in order to prepare the library for sequencing applying the DNA/Polymerase Binding Kit P6 V2 and MagBeads reagents. Sequencing was performed on a PacBio RS II sequencer running three SMRT Cells in the 360-min data collection mode.

Data generated through the PacBio sequencing platform was analysed to obtain Reads of Insert (ROI) according to the RS_ReadsofInsert.1 protocol applying default settings. The ROI reads were further analysed with the msatcommander v.1.0834 program in order to identify microsatellite sequences and design primers. A threshold of at least five repetitions for tri- and tetra-nucleotide repeats, excluding mononucleotide repeats, was set in this analysis. Primer design parameters including the msatcommander option “combine loci” were as follows: length of the primers 18–22 bp, annealing temperature (Tm) 58 °C–62 °C, GC content 30%–70%, and amplicon product size 80–400 bp with minimum of 20–30 bp DNA sequence flanking from both sides of the tandem repeat of the microsatellite.

Microsatellite marker selection and testing

Selection of microsatellite markers was performed in several steps. Firstly, based on information obtained for the set of microsatellite sequences, identified in the genome of the European Robin from the above analyses, loci with the highest number of tandem repeats were selected (total number of 281 sequences). In the second step, loci were filtered based on criteria calculated in silico: (1) maximal PCR efficiency, (2) minimal penalty score, (3) no dimer formation, (4) a maximal difference in melting temperature of 2 °C between a primer pair, and (5) sequence length of the microsatellite ranging from 100 to 350 bp. This gave 40 selected microsatellite loci, which were then divided based on product size into three groups: (1) short < 200 bp, (2) medium 200–300 bp, and (3) long > 300 bp. The reason for this division was optimization of designed multiplex PCR.

Finally, for further testing, we selected 11 short loci, 14 medium loci and 15 long loci. These loci were initially tested on 17 unrelated individuals following the method of Schuelke (2000)35 and Austin et al. (2011)36. Briefly, the forward primers were tailed by a 5′-CACGACGTTGTAAAACGAC sequence tag, which was identical to four fluorescent labelled dyes: 6-Fam, Hex, Tamra and Rox. The PCR reaction mix contained: 1 µl of DNA, 7.5 µl of 2 × Dream Taq Hot Start Green PCR Master Mix (Thermo Fisher Scientific), 0.02 M tagged forward locus specific primer, 0.2 M locus specific reverse primer and 0.2 M fluorescent labelled universal forward primers. The reaction was performed under the following conditions: initial denaturation for 15 min at 95 °C, 35 cycles of 30 s at 95 °C, 50 s at 57 °C, 45 s at 72 °C and final elongation of 5 min at 72 °C. Microsatellite loci were amplified separately and visualised on 1.5% agarose gel. The genotyping analyses was performed using ABI 3500XL Genetic Analyzer (Applied Biosystems) and the PCR product sizes were read using GENEMAPPER v.4.1 software (Applied Biosystems). Based on the results obtained, 17 microsatellite loci were selected for further analysis and design of multiplex panels. These loci were characterized by a lack of amplification artefacts, a high number of alleles and no more than two peaks per locus on chromatograms.

The multiplex PCR reaction mix contained: 1 µl of DNA extract (containing 47–283 ng of DNA in the case of young birds) or 3 µl of DNA extract (containing 1–78 ng of DNA in the case of adult birds), 7.5 µl of PCR Master Mix (QIAGEN), 0.6 µl of the primer mix (forward and reverse for each locus; concentration: 0.85–1 µM) and 4.65 µl or 2.65 µl water for PCR depending on the amount of DNA extract used. This time forward primers were fluorescently labelled on their 5′ end with one of the following dyes: 6-FAM, TAMRA, HEX and ROX (Sigma-Aldrich). The amplification conditions were as follows: initial denaturation for 15 min at 95 °C, 40 cycles of 30 s at 94 °C, 90 s at 57 °C, 90 s at 72 °C and final elongation for 10 min at 72 °C.

Analysis of the amplification of the microsatellite loci in different combinations of multiplex sets was verified using ABI 3500XL Genetic Analyzer. As a size standard we used GeneScan 600 LIZ (Applied Biosystems). The genotyping was performed using a mixture of formamide with LIZ-600 and PCR product, which was denatured for 5 min. at 95 °C. Genotypes were then identified using GENEMAPPER v.4.1.

Statistical analysis

All statistical analyses were performed on genotypes of 69 presumably unrelated individuals (45 from Poland and 24 from Portugal). For each locus we assessed the number of alleles (A), unbiased expected heterozygosity (HE) and observed heterozygosity (HO) using GenAlEx version 6.537 and Hardy–Weinberg equilibrium (HWE), and also Genepop on the Web version 438,39. Additionally, the fixation index (FIS), allelic richness and FST were calculated using FSTAT version 2.9.340.

Parameters of usefulness of the selected microsatellites in paternity verification and population genetics studies were also estimated, using Cervus 3.041. Specifically, we calculated the PIC index (polymorphic information content), F(null) (null allele frequency), NE-1P (average non-exclusion probability for the first parent), NE-2P (average non-exclusion probability for the second parent), NE-PP (average non-exclusion probability for a candidate parent pair), NE-I (average non-exclusion probability for identity of two unrelated individuals) and NE-SI (average non-exclusion probability for identity of two siblings). Probability of identity (PI) and probability of identity for siblings (PISibs) were estimated using GenAlEx version 6.537.


RSII sequencing of the SMRT library generated 4,330 Mb of raw data covering 192,298 reads for the three SMRT Cells analysed jointly. The average length of inserts in these runs equalled 1,959, 1,873 and 2,997 bp, respectively (with quality ranging between 0.91 and 0.96). The final output length of reads reached 3,800 Mb. Of these sequence sets, 65,162 reads of insert (ROI) were acquired and accumulated with 87,123,089 bp. The average ROI length equalled 1,337 bp and the average number of passes exceeded 16.

The final sequence list consisted of 62,872 DNA sequences that were subjected to microsatellite loci identification. Altogether 1932 microsatellite loci covering tri- or tetra-nucleotide repeats in at least five repetitions were identified. For the a priori chosen parameters of microsatellite selection for amplification testing, primer pairs for 961 loci were successfully designed. From this set, 18 tri-nucleotide and 22 tetra-nucleotide markers (named ER1-ER40), with the number of repeats ranging between 37 and 10, were further chosen for amplification and polymorphism description (Supplementary Table S1) and then deposited in GenBank under accession numbers SRX10929063-SRX10929102.

Agarose gel electrophoresis indicated that 36 out of 40 tested primer pairs (90%) were successfully amplified. The amplification was evaluated as successful if the length of the band was similar to the size of alleles indicated by mstacommander. Genotyping using ABI 3500XL Genetic Analyzer (Applied Biosystems) showed that 23 primer pairs amplified microsatellite fragments, meaning that the peaks on chromatograms were sharp and the score of the peaks was similar to the expected product size. Finally, 17 primer pairs were selected for designing multiplex PCR. In the case of three primer pairs, amplification failed in multiplex reaction, despite applying several combinations. Hence, we finally designed three multiplex panels (Table 1), consisting of 14 primer pairs, as follows: Mix 1 (ER24, ER40, ER28, ER21, ER38, ER15), Mix 2 (ER32, ER7, ER4, ER5) and Mix 3 (ER39, ER13, ER25, ER31).

Table 1 The characterization of three multiplex PCR sets for the amplification of 14 microsatellite loci in 69 unrelated samples of European Robin.

We failed to amplify loci ER25 in three samples, but the remaining 13 loci were amplified in all 119 samples. The resulting characterization of microsatellite polymorphism in 69 unrelated individuals of the European Robin indicated that the number of alleles at a single locus ranged from nine (locus ER32) to 26 (locus ER31).

The highest observed and expected values of heterozygosity were found in loci ER21, ER15, ER39, ER13 (HO = 0.913;) and ER31 (HE = 0.935). The lowest value of heterozygosity was observed in locus ER24 (HO = 0,594; HE = 0.696) (Table 1). Two loci (ER24 and ER31) significantly deviated from HWE (Table 1). Accordingly, in these loci, FIS was significantly greater than zero, suggesting heterozygote deficiency. Overall, the observed heterozygosity was high (HO = 0.805) and significantly lower than the expected heterozygosity (HE = 0.849, P < 0.001). The estimated frequency of null alleles [F(null)] was very low, reaching 10% only in two cases (Table 1).

Considering the two separate locations, analysis of the 45 individuals from the population from Poland also indicated significant deviation from Hardy–Weinberg Equilibrium due to heterozygote deficiency (HO = 0.819; HE = 0.846, P < 0.002), although FIS was low and non-significant after Bonferroni correction (FIS = 0.04; adjusted P-value = 0.004, 280 permutations). Similarly, in the group of samples from Portugal the observed heterozygosity was lower than expected (HO = 0.780; HE = 0.820, P = 0.063), with an associated low value of FIS (FIS = 0.07). The mean number of alleles was higher in Poland (mean A = 13.64) than in Portugal (mean A = 10.71), although allelic richness was similar in both regions (Portugal R = 10.71; Poland R = 11.60).

Genetic differentiation, assessed as FST, was low between both regions (FST = 0.015) but this was statistically significant (P < 0.05).

Parameters interlinked with the usefulness of identified loci in the parentage and relatedness analysis indicated that the described microsatellite panels can be successfully used in such studies. The polymorphic information content was high in all loci (PIC > 0.5; Table 2), and, accordingly, the analysis of 69 unrelated individuals showed a very low non-exclusion probability for the combination of 14 loci. The probability of identity and probability of sibling identity was the lowest in locus ER31 (PI = 0.01; PISibs = 0.28) and the highest were in loci ER24 (PI = 0.11; PISibs = 0.43) and ER32 (PI = 0.11; PISibs = 0.41). The respective values for the combination of 14 loci were also very low (Table 2).

Table 2 Informativeness and exclusion probability of analyzed loci, estimated based on genotypes of 69 unrelated individuals of European Robin from Poland and Portugal.

Paternity analysis in the seven families (with known social parents) indicated extra-pair paternity in only one case (Table 3, family WE8). In two of the nestlings, we didn’t find the putative father’s alleles in ten out of 14 investigated loci. Moreover, analysis of full families confirmed that all microsatellite loci had Mendelian inheritance (Table 3).

Table 3 Microsatellite genotypes of members of three full families of the European Robin.


The PacBio RSII sequencing enabled microsatellite markers for European Robins to be discovered in a sufficient number, and the restricted selection using narrow criteria enhanced the possibility of choosing high quality markers for population analysis. This method is often chosen for de novo microsatellite characterization, and is recognised as being reliable and efficient42,43,44. It is a method of choice in cases of non-model organisms with no available genomic data, as well as allowing the obtaining of long reads (here ca. 2 kb) that cover the whole microsatellite sequence together with a large flanking region. This method, coupled with a universal primer labelling system35, was proven to be highly cost effective and efficient when compared to the traditional approaches of microsatellite description42,45, or testing via individual fluorescent labelling of all of the tested primers36,46.

In our study, we selected 14 microsatellite loci amplified in three multiplex reactions, which are useful for further analysis of the breeding biology and population genetics of the European Robin. The criteria of loci selection concentrated on tri- and tetra-nucleotide microsatellites, as they were reported to be less biased with amplification errors, especially stuttering, although they were less polymorphic and frequent than di-nucleotides. However, such an approach as ours results in far fewer problems in reading, and also later coding of alleles43,46.

The results of the analysis of nearly 70 unrelated birds from Poland and Portugal showed a high level of polymorphisms of the identified microsatellites, indicated by the values of allelic diversity and heterozygosity. Such a set of markers should allow adequate investigation of genetic diversity and population structure in the European Robin, complementing and advancing on the work of previous studies22,23. Indeed, we showed that the identified microsatellites amplified well in the two populations that were separated by a large distance across Europe (c. 2730 km). Comparison of birds from these two populations showed that resident birds from Portugal have slightly lower genetic diversity (in terms of heterozygosity, as well as allelic diversity) than the migratory population from Poland.

Higher genetic diversity of migratory populations compared to resident ones has also been confirmed in North American Golden‑crowned Kinglets (Regulus satrapa)47.

We showed that genetic differentiation was low between the two populations, yet it was statistically significant. This supports previous studies that showed significant differentiation even within a limited geographical area, such as approximately 100 km2 in one example23. However, several other studies have suggested that migratory birds have weakly marked genetic differentiation, even over a large area48,49,50. Due to limited geographical sampling so far, the issue of genetic differentiation among populations of European Robins should be investigated further. In particular, for such a widespread and generally abundant species, it would be informative to investigate the relationships and genetic diversity across the very large western Palearctic range, where some populations are highly migratory and others are sedentary or highly insular (e.g. island populations)1.

In the analysis of the genotypes of the seven full families of European Robins from Białowieża, we found strong evidence of extra-pair paternity in one nest. Although the sample size was low in this initial study (seven families, 48 nestlings), the result suggests a potentially low frequency of EPP in the study population (15% of sampled families, 4% of nestlings). However, this observation should be verified in a more extensive study, preferably by sampling a large number of representative families and populations across the species’ range.

The rate of EPP that we found for European Robins was within the range of other European forest songbirds, such as the Wood Warbler Phylloscopus sibilatrix and Willow Warblers P. trochilus, which vary between zero and c. 50% of nests51,52,53,54. These rates are also similar to many other bird species55. Determining rates of EPP can be important for understanding strategies of breeding biology, settlement patterns and mate selection56,57,58.

In summary, our results are the first to provide the essential basis for similar ecological and genetic studies of European Robins, which is an iconic and culturally significant species in parts of Europe4, but has nevertheless been poorly studied across much of its range until the present time. We encourage further studies of the genetic diversity of European Robins across its broad geographical range, which may be valuable in understanding any differences between migratory and sedentary populations. Analysis of genetic diversity may also assist in the understanding of any differences between populations inhabiting the varying climatic spectrum in the European Robin’s range (Mediterranean to Boreal) and the wide variety of modified habitats compared to the primeval forest of Białowieża. Comparisons in the genetic variability between other species across this range of enviornments and behaviours would also be valuable.