Putting scales into evolutionary time: the divergence of major scale insect lineages (Hemiptera) predates the radiation of modern angiosperm hosts

The radiation of flowering plants in the mid-Cretaceous transformed landscapes and is widely believed to have fuelled the radiations of major groups of phytophagous insects. An excellent group to test this assertion is the scale insects (Coccomorpha: Hemiptera), with some 8,000 described Recent species and probably the most diverse fossil record of any phytophagous insect group preserved in amber. We used here a total-evidence approach (by tip-dating) employing 174 morphological characters of 73 Recent and 43 fossil taxa (48 families) and DNA sequences of three gene regions, to obtain divergence time estimates and compare the chronology of the most diverse lineage of scale insects, the neococcoid families, with the timing of the main angiosperm radiation. An estimated origin of the Coccomorpha occurred at the beginning of the Triassic, about 245 Ma [228–273], and of the neococcoids 60 million years later [210–165 Ma]. A total-evidence approach allows the integration of extinct scale insects into a phylogenetic framework, resulting in slightly younger median estimates than analyses using Recent taxa, calibrated with fossil ages only. From these estimates, we hypothesise that most major lineages of coccoids shifted from gymnosperms onto angiosperms when the latter became diverse and abundant in the mid- to Late Cretaceous.

whole specimens were left in lysis buffer and proteinase K overnight, two elutions of 50 µL were obtained at the end of the extraction, and the cuticle of each specimen was retrieved from the extraction column for slide preparation and identification.
Three nuclear markers were selected based on previous molecular studies on Coccomorpha: partial 18S, two regions of 28S (D2-3 and D10) and a region of EF-1 alpha. Table S1 provides the list of primers used for the amplification of the four regions as well the references from which they were selected; Genbank accession numbers are available on Table S2. PCR amplifications were performed with a Mastercycler ep Gradient S (Eppendorf) and consisted of 25 µL reactions with Illustra Ready-To-Go PCR beads (GE Healthcare), 1 µL of each primer (x10 dilution) and 2 to 4 µm of DNA template, depending on the quantity of DNA retrieved from the extraction. For 18S and 28S fragments, PCR conditions followed that of Hardy et al. [1]. The EF-1a fragment was amplified using two sets of primers covering two overlapping regions, which resulted in a fragment of ~ 1080 bp.
Conditions for PCR follow that of [2,3]. PCR products were purified using AMPure magnetic beads (Agencourt) and cycle-sequenced with the BigDye 1.1 Terminator Reaction Mix (Applied Biosystems, Inc.) and the same set of primers as PCR reactions. This protocol allowed a faster cycle sequencing program and used less BigDye. Cycle sequenced products were purified using CleanSeq (Agencourt) or ethanol precipitation. Sequencing was performed using an ABI 3730xl DNA Analyzer. Sequences were then compiled and edited using Geneious 5.1.7 [4] and saved in separate Fasta files for further analyses with additional sequences retrieved from Genbank. Sequence alignments were performed using MAFFT [5] through the Geneious platform. Introns from EF-1a were removed after alignment and the gene was partitioned by codon position (1 st and 2 nd as one partition and 3rd as one partition).
The best-fit model of nucleotide substitution for each of the three markers was assessed using jModeltest 2.1.4 [6] and the best models under the Bayesian Information Criterion (BIC) were: TIM3ef+I+G for 18S, TVM+I+G for both 28S regions combined, TPM2+I+G for the partition including the 1 st and 2 nd codon positions of EF-1a, and HKY+G for the partition including the 3 rd codon position of EF-1a. As TIM3ef, TVM and TPM models are not implemented in MrBayes, we used the GTR model for these cases. data (male morphology, female morphology and molecular sequence).
The combined matrices of Recent only and Recent and fossil sets are available as nexus files in GitHub. The original matrix from Vea and Grimaldi [11] is available in Morphobank under Project 1013. The final taxon sampling with GenBank accession numbers for molecular sequences is available in Table S2.
Four replicates of 10 to 80 million generations were run at an initial temperature of 0.2 for preliminary analyses (non-clock and strict-clock) and 0.1 for the divergence time estimate analyses. Convergence was considered achieved when the average standard deviation of split frequency was below 0.05. All trees were summarized using the command sumt in MrBayes with the option contype=allcompat. We chose to summarize the trees using all compatibility method because of the amount of missing data (fossils and a number of Recent terminals do not have molecular sequences) that tend to reduce significantly the posterior probabilities.

Preliminary analyses
A non-clock and strict-clock analyses were first performed with Coccoidea as a topology constraint in order to define prior values for the IGR model and the clock rate. For details on command lines, see the codebook available publicly in Github (https://github.com/ zourloubidou/Coccomorpha-divergence-time.md).

IGR model
The non-clock and strict-clock topologies were used as inputs to determine the prior distribution values for the IGR model and clock rate using the R script from Ronquist et al. [13]. Both node-dating and tip-dating analyses used a relaxed clock following the IGR model from Ronquist et al. [13]. For the node-dating analyses, we used the Recent only dataset (fossil terminals were defined in a variable using the command line "taxset fossils =" and excluded from prior to analysis using the command line "remove fossils;"), for which node age priors were set for the root, the ingroup and 10 families following an off set exponential distribution. All details on node calibration can be found in Table S3. For the tip-dating method, we used the Recent+fossil dataset and performed fixed prior ages on all fossil terminals (Table S4 for details on ages) as well as an offset exponential distribution for the root prior. In order to assess if any of node prior and the root prior distribution (either in the node-dating analysis or the tip-dating where a prior distribution was defined for the root)

Divergence-time estimate analyses
could influence the age estimates in MrBayes, we performed various analyses with both node-dating and tip-dating approaches as follow: -Node-dating method: 10 additional node-dating analyses were performed using all node priors and excluding one node prior at a time, as well as one analysis excluding the root prior, and one analysis changing the distribution of the root to lognormal (mean=240, SD=10).
-Tip-dating method: two additional analyses were performed with for the first one excluding the root prior, and the second changing the root prior distribution to lognormal (mean=240, SD=10).
All topologies with age estimates resulting from these analyses are available in TreeBASE (temporary link for reviewers: http://purl.org/phylo/treebase/phylows/study/TB2:S17496?xaccess-code=4cff04a26ac567ac00b91ec3451e380d&format=html) Heteromargarodes is a living genus, for which the only fossil species of the family was described in Cambay amber. Heteromargarodes hukamsinghi is in the total evidence analysis taxon sampling.

Ages and LTT plots
For a quick overview of age-estimate differences among the analyses, we obtained lineage through time (LTT) plots in R version 3.1.2 (2014-10-31) [20]. For all tip-dating analyses,  Table  S3). Node letters are clades mentioned in the results. Green area: period of major angiosperm radiation.