Tip-dated phylogeny of whirligig beetles reveals ancient lineage surviving on Madagascar

The temporal origin of Madagascar’s extraordinary endemic diversity is debated. A preference for Cenozoic dispersal origins has replaced the classical view of Mesozoic vicariance in the wake of molecular dating. However, evidence of ancient origins is mounting from arthropod groups. Using phylogenetic ‘tip-dating’ analysis with fossils, we show that a whirligig beetle species, Heterogyrus milloti, inhabiting forest streams in southeastern Madagascar is the last survivor of a once dominant and widespread Mesozoic group. With a Late Triassic to Early Jurassic origin (226–187 Ma) it is the hitherto oldest dated endemic lineage of animal or plant on Madagascar. Island biotas’ sensitivity to extinction is well known, but islands can also provide refuge from continental extinction. Heterogyrus milloti is an irreplaceable link to the freshwater biota of the Mesozoic and serves as a reminder of what may be lost without critical conservation efforts on Madagascar.


Results
The analysis resulted in a well-resolved phylogeny of extant and extinct taxa with all higher-level relationships discussed below strongly supported (posterior probability 0.98-1.00; Fig. 1 and Fig. S1). The tree was consistently stable across methods of analysis, models, and priors (Supplementary Materials). Notably, not a single fossil was recovered as stem taxa to extant Gyrinidae. Two Mesozoic fossils described in the genus Angarogyrus, with previous uncertain affinity, grouped with Spanglerogyrus, and this clade (hereafter Spanglerogyrinae) was sister to a clade with the remaining extant and fossil Gyrinidae. Re-examination of the Angarogyrus fossils ( Fig. 2a and d) reveals remarkable similarity to the living Spanglerogyrus (Fig. 2c). Both have a unique quadrate frons, with the frontolateral margins continued posteriorly over the dorsal eye (Fig. 2b,e,f), a pronotum with a strong medial lobe projecting anteriorly onto the head capsule (Fig. 2b,e,f), and a very small body size (Fig. 2a,c,d). An exceptionally well-preserved elytron of the fossil Angarogyrus minimus shows a similar coloration and covering of setae to the modern Spanglerogyrus albiventris (Fig. 2g and h).
Five other Mesozoic fossils described in the genera Cretotortor, Baissogyrus, and Mesogyrus were monophyletic together with the extant Malagasy genus Heterogyrus (hereafter Heterogyrinae). Heterogyrinae was sister to the subfamily Gyrininae, including all other extant taxa, as well as three Cenozoic fossils of the genera Miodineutes, Mesodineutes and Gyretes. Morphology of the well-preserved fossil Mesogyrus antiquus ( Fig. 3i and j) from the Karatau deposits 26 , shows marked similarity to Heterogyus milloti (Fig. 3f,g,h) including a short transverse sulcus of the metaventral discrimen, the triangular lateral wing of the metaventrite, the lobiform metanepisternum, and the elytron with nine elytral striae and sutural border. The Karatau deposits are the remains of a large, stable, freshwater, Jurassic lake 27 (Fig. 4). This indicates M. antiquus was lentic, compared to the extant H. milloti, known only from small trickling forest streams 22 , suggesting increased ecological diversity in the past. The heterogyrine genera were widely distributed and found throughout the Jurassic and Cretaceous (Table 1)  Previous TED analyses have, in some cases, resulted in unrealistically old ages, postulating extremely long ghost lineages with no trace in the fossil record 28,29 . This has been attributed to the type of relaxed clock model, morphological-molecular data conflict, and not accounting for systematic tip sampling bias 28,30 . We therefore tested the stability of estimated divergence times extensively (Supplementary Materials). The largest impact on divergence time estimation was the tree prior (uniform instead of FBD), which gave significantly older ages (Heterogyrinae-Gyrininae divergence: 248 Ma; Supplementary Materials), similar to ref. 30. Standard node calibration analyses with fossils excluded as terminal taxa gave 21-24 Ma younger estimates but still early Jurassic (182-185 Ma) for the Heterogyrinae-Gyrininae divergence (Supplementary Materials).

Discussion
A distinct transition is seen in the fossil record of whirligig beetles following the K-Pg boundary ( Table 1). The Heterogyrinae are replaced by the Gyrininae as the dominant whirligig beetle fauna, following the end Cretaceous extinction event. At the Arkhara site, bordering the K-Pg boundary, heterogyrines and gyrinines co-occur, but all younger fossils are gyrinines, with all older fossils being heterogyrines, apart from two spanglerogyrines. By the time of the K-Pg boundary, Madagascar was already isolated, as India had previously drifted away from the IndoMadagascan subcontinent 2, 7 . Therefore, as an island Madagascar likely served as a refugium allowing the last heterogyrine lineage's persistence through the K-Pg extinction event, while all other continental relatives went  (Fig. 1). The current distribution of Heterogyrus milloti is limited to humid mountain forests of Andringitra and Ranomafana, in the southern part of the eastern escarpment of Madagascar. It is only encountered in small forest streams, where no other gyrinids are found. A variety of gyrinine species occupy the larger streams, rivers, and ponds in the surrounding area at both lower and higher elevations. Given fossil heterogyrines occupied habitats similar to todays' gyrinines (Table 1) Table S2) and breaks a conceptual barrier rooted in the scientific community. Limited to discourse on the relative frequency of dispersal versus vicariance origins 2, 6, 7 , accounting for the fact that islands can serve as refugia from continental extinction, vicariance from India (88 Ma) or even Africa (160-130 Ma) 2 , is not necessarily the upper limit for Madagascar's 'paleoendemics' . Oplurid lizards, podocnemid turtles, xenotyphlopid blind and boid snakes, mantellid and microhylid frogs, and some cichlid fish, are the oldest endemic vertebrate clades (Fig. 5); but these are all at most Cretaceous or late Jurassic in age [5][6][7][8] . A few examples of endemic arthropod lineages in Madagascar (Fig. 5), such as astacoid freshwater crayfish 10 , scutigerine centipedes 11 and archaeid spiders 9 , are older, dated to the Mid-or Late Jurassic. Vascular plants of Madagascar, although rich in endemism at higher taxonomic levels, have but a few examples of Cretaceous ancestry 31 . In contrast, Heterogyrus milloti diverged from its closest living continental relative before the Gondwanan break-up. This is unprecedented in Madagascar but echoes the survival of the Tuatara on New Zealand 32 . Both represent the last surviving species of formerly widespread Triassic-Jurassic lineages 'rescued' from extinction by microcontinental islands. Madagascar serving as such a refugium sets the island in a new perspective and demonstrates that increased attention to arthropods will likely change our view of this famous natural laboratory of evolution.

Methods
Morphological data. Fourteen fossil taxa were included in the analysis, four representing outgroup taxa, and ten ingroup taxa, representing almost 50% of the known, unambiguous, gyrinid fossils (Table S3, Table 1). Fossil taxa were selected for inclusion based on availability and preservation of morphological features. Some described fossil gyrinid species could not be located in the indicated depository, such as Dineutus longiventris Heer, 1862 and Gyrinoides limbatus Motschulsky, 1856, and were unavailable for study. Gyrinid taxa described only from elytra (e.g. Protogyrininus sculpturatus, Mesogyrus anglicus, M. elongatus, M. sibiricus, Cretotortor archarensis) were not included in the final analysis, due to introduction of significant phylogenetic uncertainty. However, the elytral characters present in these fossils were accounted for through congeners (Protogyrininus is likely a synonym of Gyrinus) included in the analysis, which presented additional morphology. In total all unambiguous fossil gyrinid genera except one, Gyrinoides, were analyzed. A complete list of the fossil specimens included in the analysis and their depositories is provided in Table S3. The fossil Coptoclava longipoda was originally included in the dataset but was removed from all analyses except one due to ambiguous coptoclavid monophyly and affinity 17 . The result of the analysis with C. longipoda included is discussed in the Supplementary Materials.
Extant taxa were coded from specimens included in the study from ref. 22 and those listed in Table S4. Extant specimens were examined using a Zeiss Discovery.V8 SteREO microscope, as well as a scanning electron  Table S1.

Molecular data. Thirty-three additional ingroup gyrinid taxa and one outgroup taxon, Haliplus lineatocollis,
were added to the five-gene dataset from ref. 22 (Table S4). Additional taxa included sequences for all genes except EF1α which was found to have multiple copies in Gyrinidae 22 . In total, taxon sampling (extinct and extant taxa) for the analysis included ten outgroup taxa from Hydradephaga and 129 gyrinid taxa. The same primers were used as in ref. 22. New sequences are deposited in Genbank with accession numbers available from Table S4.
Phylogenetic analyses. All bayesian phylogenetic analyses were implemented using the MPI version of MrBayes 3.2.6 33 and were run on the super computer cluster 'Ulam' at the Center for Advanced Research Computing (CARC), University of New Mexico. The same 8-part partitioning scheme was used as in ref. 22. Each molecular partition was allowed a separate gamma distributed rate variation across sites parameter. Reversible-jump MCMC was used to integrate over the 203 possible symmetrical substitution rate models during analysis 33,34 . The morphological partition was given a Markov k model 35 , accounting for that only parsimony informative characters were included, and with a gamma shaped rate variation parameter across characters. Fifteen characters were treated as ordered (see Supplemental Material). A topological constraint defined Adephaga, all ingroup and outgroup taxa except Triaplus laticoxa, as re-examination of the fossil material suggests Triaplidae is not a member of Adephaga as previously proposed 20 . Each analysis included two independent runs starting at random topologies, each with one cold and three incrementally heated chains (temp = 0.1) run for 20 million generations, sampling every 1000th generation. MCMC convergence was monitored using Tracer v.1.6 36 and statistics provided by MrBayes 33 .

Clocks and calibration.
For a detailed discussion on the total evidence dating (TED) 23,28,30 under the fossilized birth-death (FBD) 25 process model and the sensitivity analyses performed, see Supplementary Materials. In all analysis, TED as well as node dating, topology and divergence times were inferred simultaneously, hence our inferred divergence times are not dependent on a topology presumed to be known without error. To set a prior on the base clock rate the method outlined by ref. 23 was followed. Here the tree height in units of expected number of substitutions per site from root to tips is estimated under a strict clock as an average across partitions, while making sure the root prior is unimportant (exp (0.1), (1.0) or (10)). The median of the tree height is divided by the minimum and maximum age of the root based on fossils to get a base clock rate interval in substitutions per  (Table S2). Only those lineages with median, minimum and maximum age estimates were included in the figure. Groups from left to right are plants, trematodes, decapods, centipedes, heterogyrine whirligig beetles (showing tip and node dating ages), insects (excluding whirligig beetles), spiders, mammals, birds, lizards, snakes, testudines, amphibians, and fish.
site per million years. The age of the earliest Tshekardocoleidae (c. 298 Ma) the oldest known coleopteran 20,37,38 was used to set a lower substitution rate level, and the min age of Triaplus laticoxa (Min age 221 Ma), the oldest fossil included in the dataset and hence the minimum age of the root, was used to inform on a maximum likely substitution rate. This guided the design of a lognormal (−5.7, 0.3) prior on the base clock rate. The tree prior was set to FBD with beta (1, 1) priors for the [0-1 interval] parameters (turnover and fossil sampling proportion) and an exp (10) prior for the net diversification parameter 25 . The proportion of sampled tips was set to 0.1 as about 10% of known species (ca. 1000) were sampled. The FBD tree prior is contingent on a root age, tmrca 25 . We used a uniform prior between 252-273 Ma for the root based on the oldest known Triaplidae fossil and the mid to late Permian boundary which also defines the border between only reticulated beetle elytra and the first smooth beetle elytra (see Supplementary Materials).
We used the uncorrelated relaxed clock model IGR 33,39 in MrBayes with the prior on rate variation across lineages set to exponential (10). The fossil terminals were assigned uniform age priors based on the dated period of respective fossil deposit (Table S3). Sampling assumption was set to diversified sampling for extant tips 30, 40 . Sensitivity analysis. To test the sensitivity of inferred divergence time estimates to model and parameter settings we ran a series of additional analysis (for full details see Supplementary Materials). The clock rate variation among lineages parameter was run with priors exp (1), (10) and (100). The variation of the base clock rate prior was increased to lognormal (−5.7, 0.6). The root age tmrca was set to a conservative uniform prior of [221 Ma − 298 Ma] based on oldest included fossil, and oldest known Coleoptera, as well as to offset exponential priors with offsets at 221 or 252 Ma. The net diversification prior of the FBD was run under an exponential (1), (10) and (100) prior. The sampling proportion of extant species was reduced to 0.01 to account for the possibility of a large proportion of unknown cryptic diversity. We ran an autocorrelated relaxed clock model (TK02) 33,41 together with the FBD tree prior as well as both uncorrelated (IGR) and autocorrrelated (TK02) clock models under a uniform tree prior. We allowed the FBD process to be piecewise divided into two or three time intervals 30,42 . Finally, we tested the effect on the topology and clade support of excluding the fossils and compared the divergence time estimates from TED with a standard node dating approach 28,29 . In the node dating analysis, we first used the inferred fossil placements from a non-clock analysis. Then we constrained the nodes Orectochilini, Dineutini and Heterogyrinae + Gyrininae to be monophyletic and set offset exponential calibration priors on these three nodes using the three oldest fossils from respective clades as the offset values (see Supplementary Materials). To be directly comparable with the TED analysis we ran this node dating analysis with an FBD tree prior but fixed the fossilization rate to 0 (no fossils sampled) since fossils were excluded as terminal taxa in this analysis.

Literature review.
We performed a literature review through 2016 of studies dating extant endemic Malagasy lineages of animal and plants, to compare ages. Multiple estimates for the same taxon were included, as were different dating techniques. The compiled age data are available in Table S2 of the Supplemental Materials.