Next generation sequencing-aided comprehensive geographic coverage sheds light on the status of rare and extinct populations of Aporia butterflies (Lepidoptera: Pieridae)

The Black-veined White Aporia crataegi (Linnaeus, 1758), a common and widespread butterfly ranging from northwestern Africa to Europe and Asia, has been extinct in Britain since the 1920s and is on a steady decline in several other parts of its range. In order to investigate genetic diversity within A. crataegi and its correspondence with current subspecies-level taxonomy, we barcoded 173 specimens from across its range including, for the first time, extinct populations from Britain and Korea. Using next generation sequencing we also obtained a sequence for Aporia joubini, a peculiar taxon from China known only by its type specimen collected in the early twentieth century. Our phylogenetic analysis placed A. joubini sister to A. oberthuri, although further taxon sampling may reveal a different scheme. Within A. crataegi, we observed a shallow and weak mitogenomic structure with only a few distinct lineages in North Africa, Sicily, Iran, and Japan. Eurasian populations, including those extinct in Britain and Korea, clustered into a large set of closely allied lineages, consistent with a recent expansion during the Late Pleistocene glacial period. This study highlights the importance of museum collections and the unique opportunities they provide in documenting species diversity and helping conservation efforts.


Materials and methods
taxon sampling. We examined 173 A. crataegi individuals representing 23 subspecies out of the 25 recognized by Della Bruna et al. 9 from sites across Eurasia and North Africa (Maghreb region) ( Fig. 2A and Supplementary Dataset File). We were unable to obtain sequences from ssps. fert (Rhodes Island, Greece) and sachalinensis (Sakhalin, Russia). We analysed ten British individuals: four collected between 1860 and 1870, before the extinction of the native populations, and six at the beginning of the twentieth century, supposedly re-introduced from the continent. In addition, we analysed four specimens from Iranian populations of A. leucodice, and the only known (type) specimen of A. joubini, collected in 1902 from China. DNA extraction. A single leg was removed from each specimen and sent to the Centre for Biodiversity Genomics in Guelph, Canada for DNA extraction, amplification, and sequencing. DNA was extracted from a single leg on a Biomek FX liquid handling robot using a semi-automated DNA extraction protocol 26 with glass fiber plates (PALL Acroprep 96 with 3 µm GF membrane over 0.2 µm Bioinert membrane). DNA was eluted in 30-40 µl of ddH20 pre-warmed to 56 °C and stored at − 80 °C.
next-generation and Sanger sequencing. Due to DNA degradation in the very old specimens of A.
joubini and the British A. crataegi, the 658 bp COI barcode region was recovered using an NGS-based protocol 27 . In brief, for each sample, multiple short, overlapping amplicons were generated using nested, multiplex PCR. In order to associate reads with their source specimen, the amplicons were tailed with sample-specific universal molecular identifiers (UMI) before being pooled for sequencing on an Ion Torrent PGM. The short sequence reads were attributed to a sample via the UMIs, and filtered for quality and length before being assembled into a single barcode sequence.
For the remaining (younger) specimens, PCR and Sanger sequencing were carried out following standard procedures for Lepidoptera 28,29 . Briefly, the 658 bp barcode region was amplified using the primers LepF1 (5′ ATT CAA CCA ATC ATA AAG ATA TTG G 3′) and LepR1 (5′ TAA ACT TCT GGA TGT CCA AAA AAT CA 3′) and sequenced on an ABI 3730XL (Applied Biosystems capillary sequencer). The trace files were edited using Codon-Code Aligner 6.0.2 (CodonCode Corporation, Dedham, Massachusetts) and all resulting mtDNA sequences were aligned using the same program. All sequences were submitted to GenBank (Accession Numbers in Supplementary Dataset File) and BOLD system repository (dataset name: DS-APORIA; https ://doi.org/10.5883/ DS-APORI A).
Data set compilation and tree reconstruction procedures. To 40 . While the latter is based on the haplotype distribution and it is expected to be strongly affected by recombination, R2 is based on mutations and likely to be conservative on recombining regions 41 . Moreover, Fu's simulations suggest that FS is a more sensitive indicator of population expansion than other parameters, therefore FS should be regarded as significant if P < 0.02 42 . Arlequin 3.5 43 and DnaSP 5.0 30 were used to compute FS (P < 0.02) and R2 (P < 0.05) respectively, and to test their statistical significance by simulating random samples (10,000 replicates) under the null hypothesis of selective neutrality and constant population size using coalescent algorithms (both modified from Hudson 44 ). Expected mismatched distributions and parameters of sudden expansion τ = 2 μ t (τ, sudden demographic expansion parameter; t, true time since population expansion; μ is the substitution rate per gene) were calculated using Arlequin 3.5 by a generalized least-squares approach 45 , under models of pure demographic expansion and spatial expansion 46,47 . The probability of the data under the given model was assessed by the goodness-of-fit test implemented in Arlequin 3.5. Parameter confidence limits were calculated in Arlequin 3.5 through a parametric bootstrap (1,000 simulated random samples).

Specimens collection statement.
We declare that all samples analysed in this study were collected complying with institutional, national, and international guidelines.

Results
Sequencing results. A COI barcode sequence of 552 bp for A. joubini and sequences of various lengths (see Supplementary Dataset File) for seven out of ten British A. crataegi were recovered with NGS. These sequences perfectly matched with other Aporia barcodes, when using BOLD Identification System (IDS). Further confirmation of their validity was provided by the fact that they grouped with sequences from closely allied taxa. Phylogenetic analyses in the subtribe Aporiina. The ML tree confirmed the monophyly of A. crataegi clade (Bootstrap support, BS = 100) and the close relatedness to a clade (BS = 97) consisting of Aporia + Mesapia ( Fig. 1; 13,18 ). Our findings do not corroborate with the morphological groups proposed by Della Bruna et al. 9,19 based on shape of the uncus in male genitalia. For example, A. oberthuri and A. goutellei are closely related in our tree (Fig. 1), while the former has a bifid and the latter a pointed apex (see 18 ). Our analysis further highlighted  (Fig. 2B), rejected the null hypothesis of constant population size for these two phylogeographic units (Table 1). Mismatch distribution of these groups was examined according to sudden expansion model (Table 1), and goodness of fit tests did not show significant deviations from expected distributions, so that parameter τ = 2μt could be used to estimate the time (t) elapsed from population expansion. Estimated values of τ and their 5% and 95% confidence limits are shown in Table 1 barcode data alone provide strong support for previous findings based on combined data 13,18 on the systematic relationships within the genus Aporia (Fig. 1). The division of Aporia into three subgenera (Aporia, Metaporia and Mesapia) is not supported, since both Aporia (Aporia) and Aporia (Metaporia) (sensu 19 ) are found to be paraphyletic, with Mesapia nested within Aporia sensu lato. The diagnostic morphological characters for these taxa include smaller wingspan, more rounded wings, and hairy genitalia for Mesapia 49 , and minor differences in wing pattern elements for Metaporia 9 , none of which, in our opinion, merit a separate subgeneric (sensu 14,15 ) or generic (sensu 16  www.nature.com/scientificreports/ distance between these two clusters (2.67 ± 0.64%) falls well within the average distance between most Aporia species. Despite this, we refrain from elevating illumina as a separate species until supporting ecological or morphological evidence become available. As for A. joubini, based on similarities in wing pattern, Della Bruna et al. 9 suspected that this species may represent an aberrant form of A. harrietae paracraea, a closely related taxon that flies in the type locality of A. joubini. We agree with this assessment. If barcode sequences of A. hariettae become available and prove to be very close or identical to A. joubini, the latter should be downgraded to a junior synonym of A. hariettae.
Subspecific taxonomy and evolution of Aporia crataegi. Within A. crataegi, from 23 to 25 subspecies have been described based on fragmented distribution and few morphological characters 9,10 , but our mtDNA analyses reveal a total of six distinct lineages (Fig. 2). Our data highlighted several populations, described as different subspecies, sharing similar or identical mtDNA sequences, suggesting a very recent evolutionary origin. The most evident example occurs in Eurasia, where 66 samples from 53 localities, described as 12 subspecies according to Della Bruna et al. 9 , share an identical haplotype (H1: Fig. 2B) consistent with recent demographic expansion during the Late Pleistocene glacial period (about 67 kya: Table 1) for the Eurasian haplogroup (Fig. 2B). Populations from three major countries in the Mediterranean Basin (Italy, Greece, and Spain: Fig. 2) also showed very weak differentiation, despite three subspecies (ssp. rotunda, ssp. karavaievi, and ssp. rutae respectively) recognized by Della Bruna et al. 9 , confirming a pervasive phylogeographic pattern in European phylogeography 50,51 . The trace of demographic expansion model indeed dates a rapid expansion of the Mediterranean haplogroup to about 116 kya during the Late Pleistocene. The sea-level oscillations during this period, which created land bridges between the three main peninsulas 52 and permitted faunal exchange, can explain the fragmented distribution of this haplogroup (Fig. 2). This scenario has been suggested as a hypothetical distribution pattern during the last glacial period of many organisms including butterflies (e.g. Maniola jurtina 50 ). Afterwards, in the post glacial period, gene flow among these populations became interrupted, and they experienced a second expansion towards the continent. It is worth noting that although one mutation separates the two main haplogroups, the present postglacial distribution of the Mediterranean haplogroup in A. crataegi ( Fig. 2A) appears to be an example of the "grasshopper paradigm" 50 , frequently repeated in many animal and plant species 53,54 including butterflies (e.g. Zerynthia polyxena 55 ). The grasshopper paradigm is expected for species showing difficulties in crossing the mountain barriers represented by the Alps and Pyrenees 50,55 . Indeed, in A. crataegi, the populations of the Mediterranean haplogroup from Italy and Iberia remained trapped by the Alps and Pyrenees. Conversely, the Greek populations did not expand to Central Europe despite the absence of conspicuous mountain chains that would otherwise impede such colonization. Although the validity of using mtDNA as a marker in molecular ecology has been questioned in the last decade 56,57 , a recent study has provided evidence that mtDNA spatial differentiation is correlated with species traits known to affect the dispersal and colonization capabilities of butterflies 51 . Nevertheless, considering that the mtDNA variation observed in this study is not representative of the whole genomic variation, we cannot exclude a recent expansion of the Greek populations in Central Europe. Nuclear genetic analyses are therefore needed to better investigate the genetic variability, particularly in the contact zones between the main haplogroups. Furthermore, additional sampling might reveal a different genetic pattern in A. crataegi. The extinct British and Korean populations appear to be genetically most closely related to the other specimens from Eurasia (Supplementary File 1). Although this signals possible synonymy of these populations, we refrain from making any changes to their current taxonomy, as any proposal to synonymize a large number of 'subspecies' of A. crataegi with identical mtDNA haplotypes will require a more in-depth analyses of morphological differences and examination of type materials.
Our samples from five Sicilian populations showed three unique haplotypes, highlighting their significance for biodiversity in Sicily. In Iran, 15 samples from 12 localities encompassing four subspecies (sensu 9 ) share an identical haplotype (H41: Fig. 2B). Our analysis strongly supports the distinctiveness of the Iranian populations with the exception of two divergent sequences from Central Iran (Yazd, H36 and Kerman, H37: Supplementary Dataset File). In addition, our results support the distinctiveness of the single sample from Syria (ssp. augustior, sensu 9 ). As stated above, further sampling and nuclear genetic analyses particularly from Iran, Israel, Syria, Jordan, and Iraq will likely reveal additional haplotypes. Finally, our data supports the genetic peculiarity and evolutionary value of the populations from Japan and north-western Africa, previously recognized as two welldifferentiated subspecies (ssp. adherbal and ssp. mauritanica respectively). Although our phylogenetic analysis doesn't fully resolve the relationships between the North African and Sicilian populations (ssp. mauritanica and ssp. augusta respectively), the large differentiation between these populations of A. crataegi suggests a prolonged period of lack of genetic exchange between them, likely as a result of an early split in the range of the ancestral stock. Despite the small geographical distance between North Africa and Sicily, eight mutations divide these two subspecies (Fig. 2B). The last land bridge between Europe and Africa is known to have occurred at the end of Miocene (7-5.3 million years ago (mya)) 58 , when a temporary closure of the water corridors between Africa and the Iberian Peninsula permitted biotic exchange between the two continents 59 . During this period, known as the 'Messinian Salinity Crisis' , Spain was connected to Morocco, and Tunisia to Italy via Sicily 60,61 . Re-opening of the Gibraltar at the beginning of the Pliocene (5 mya) restored the barrier. This short period of connection between the two continents has been suggested as a plausible explanation for vicariance between the North African, Iberian, Sicilian and Italian lineages of many organisms, including butterflies (e.g. Melanargia 62 ; Pseudophilotes 63 ). However, considering Brower's 48  www.nature.com/scientificreports/ conclusions Natural history collections represent an invaluable source of information for species whose ranges are characterized by geomorphological or geopolitical impediments, but principally for storing specimens of extinct taxa when fresh biological material is no longer available, as the case of A. joubini and British populations of A. crataegi. These repositories allow us to fill the gap in genetic information and reconstruct the evolutionary history of species. For now, NGS-based analysis of COI barcodes has made efficient genomic studies of museum specimens possible, allowing the first look at the genetic traits of extinct species. Rapid advances in modern DNA technology will eventually allow complete reconstruction of genomes, uncovering the lost evolutionary history of these extinct populations and ushering us into a new age of Biological and Conservation Sciences.

Data availability
The datasets generated during the current study are available in the BOLD system repository: Dataset name: DS-APORIA; https ://doi.org/10.5883/DS-APORI A.