MicroRNAs Associated with Caste Determination and Differentiation in a Primitively Eusocial Insect

In eusocial Hymenoptera (ants, bees and wasps), queen and worker adult castes typically arise via environmental influences. A fundamental challenge is to understand how a single genome can thereby produce alternative phenotypes. A powerful approach is to compare the molecular basis of caste determination and differentiation along the evolutionary trajectory between primitively and advanced eusocial species, which have, respectively, relatively undifferentiated and strongly differentiated adult castes. In the advanced eusocial honeybee, Apis mellifera, studies suggest that microRNAs (miRNAs) play an important role in the molecular basis of caste determination and differentiation. To investigate how miRNAs affect caste in eusocial evolution, we used deep sequencing and Northern blots to isolate caste-associated miRNAs in the primitively eusocial bumblebee Bombus terrestris. We found that the miRNAs Bte-miR-6001-5p and -3p are more highly expressed in queen- than in worker-destined late-instar larvae. These are the first caste-associated miRNAs from outside advanced eusocial Hymenoptera, so providing evidence for caste-associated miRNAs occurring relatively early in eusocial evolution. Moreover, we found little evidence that miRNAs previously shown to be associated with caste in A. mellifera were differentially expressed across caste pathways in B. terrestris, suggesting that, in eusocial evolution, the caste-associated role of individual miRNAs is not conserved.

the colony in which adult males eclosed at the same time as the first workers eclosed), leaving 33 colonies in cohort 1.
We retained the colony queen in 13 of the 33 colonies (henceforth queenright colonies, generating worker-destined larvae) and removed the colony queen from the remaining 20 colonies (henceforth queenless colonies, generating queen-destined larvae). Queen removal was carried out 14 days after first worker eclosion. Because B. terrestris queens produce a queen pheromone in the presence of which female larvae develop into workers, queen removal induces queen production [3][4][5] . In the queenright colonies, we removed up to half of the 1st or 2nd instar larvae (1-3 days old) every 2-3 days until 10-14 days after first worker eclosion, after which further early-instar larvae were not sampled as colonies were likely to have reached their switch point. In the queenless colonies, we removed up to half of the 1st or 2nd instar larvae every 2-3 days for 6 days after the queen was removed. Because B. terrestris eggs hatch 5 days after they are laid 6 , any eggs present after this time would have been male eggs produced by egg-laying workers, and so sampling of early-instar larvae was then halted. In both queenright and queenless colonies, we allowed all unsampled larvae to develop to the 4th (final) instar, which is beyond the point in larval development when caste fate has been irreversibly determined 7 . We then sampled approximately half of the 4th instar larvae from both sets of colonies.
Among the 13 queenright colonies, eight colonies occurred in which all of the unsampled larvae eclosed as workers. These eight colonies were maintained for a further month after all focal larvae had pupated and eclosed as workers, which allowed us to determine that no queens or males eclosed for two weeks or more after final larval sampling. This meant that all sampled larvae in these colonies would have been very likely to develop into workers. We selected as colonies providing worker-destined larvae the four of these colonies that had the highest sample sizes of early-instar and late-instar larvae. In the four selected colonies, 100% (148/148) of unsampled late-instar larvae eclosed as workers (Table S2). All 1st and 2nd instar larvae sampled from these colonies were classified as early-instar worker-destined larvae and all 4th instar larvae were classified as late-instar worker-destined larvae. The masses (mean (range) mg, n = number of larvae) of the sampled worker-destined larvae were as follows: early-instar larvae: 7.5 (3.5-15.3) mg, n = 164; late-instar larvae: 252.4 (107.0-454.4) mg, n = 85. These values confirmed that the early-instar larvae were 1st or early 2nd instar larvae, as 2nd instar B. terrestris worker-destined larvae have been shown to have a maximum mass of 39.5 mg 3,4 . Likewise, these values confirmed that the late-instar larvae were 4th instar larvae, as B. terrestris worker-destined 4th instar larvae have been shown to have a minimum mass of 95.8 mg 3,4 .
Among the 20 queenless colonies, eight colonies produced queens. We sampled queendestined larvae from the four colonies among these eight that produced the highest percentages of adult queens from the unsampled late-instar larvae. In three colonies, this percentage was 100% (38/38 larvae; Table S2). In the fourth colony (QL- 14), it was 85% (11/13 larvae; Table S2), i.e. two unsampled larvae eclosed as workers. However, retrospective calculation of the proportion of all late-instar larvae (sampled and unsampled) that were queen-destined in this colony showed this proportion to be 95% (38/40; Table S2). All sampled late-instar larvae from all four colonies could be identified as queen-destined, because in B. terrestris it is possible to tell late-instar worker-and queen-destined larvae apart by mass alone, with queen-destined larvae having a mass approximately four times greater 3 . The masses (mean (range) mg, n = number of larvae) of queen-destined larvae sampled from the four colonies were as follows: early-instar larvae: 9.3 (7.2-14.7) mg, n = 76; late-instar larvae: 1168.7 (772.5-1455.4) mg, n = 65. These values confirmed that the early-instar larvae were 1st or early 2nd instar larvae, as 2nd instar B. terrestris queendestined larvae have been shown to have a maximum mass of 64.0 mg 3,4 . Likewise, these values confirmed that the late-instar larvae were 4th instar larvae, as B. terrestris queendestined 4th instar larvae have been shown to have a minimum mass of 256.2 mg 3,4 .
In summary, in the four selected queenright colonies, we monitored 397 larvae in total. Of these we sampled 164 early-instar worker-destined larvae and 85 late-instar worker-destined larvae. Of the remaining 148 unsampled larvae reaching the late instar, 100% eclosed as workers (Table S2). In the four selected queenless colonies, we monitored 192 larvae in total. Of these we sampled 76 early-instar queen-destined larvae and 65 late-instar queen-destined larvae. Of the remaining 51 unsampled larvae reaching the late instar, 96% eclosed as queens, i.e. all except for two larvae eclosing as workers (Table S2). These data confirmed that larvae sampled from each colony type would have developed along the expected caste trajectory in the vast majority of cases.
Within each colony, instar and caste, we pooled sampled larvae as whole bodies and then snap-froze them in liquid nitrogen and crushed them into a mass of fine frozen particles using a chilled ceramic mortar and pestle. We dissolved the pooled mixture into Trizol reagent (Invitrogen, Carlsbad, California, USA), using 1.5 ml of Trizol for every 100 mg of tissue. We snap-froze the resulting mixtures in liquid nitrogen and stored them at -80°C. We pooled larvae because sRNA-seq required up to 2 µg of total RNA per phenotype, which was more RNA than the early-instar larvae yielded individually. Pooling also meant that the miRNA read counts (the number of sequences returned from the sRNA-seq) of different samples reflected differences between phenotypes rather than individuals 8 . To ensure that the samples were biological replicates representing colony variation, each sample was created by pooling larvae from the same colony and phenotype. The samples consisted of 10-44 early-instar larvae per colony and 5-27 late-instar larvae per colony. These procedures generated 16 samples, representing four biological replicates (colonies) per each of the four phenotypes, with each colony providing one early-instar and one late-instar sample for each caste fate (Tables S3, S5).

Cohort 2: samples for Northern blots
Colonies in cohort 2 were reared and sampled following the same procedures as those used for cohort 1. The queen died soon after arrival in two colonies, which were therefore excluded from the experiment. Of the remaining 18 colonies, we maintained five in a queenright condition (to produce worker-destined larvae) and removed the colony queen from the remaining 13 colonies (to produce queen-destined larvae). We isolated worker-and queen-destined larvae following the methods used for cohort 1. In the five queenright colonies (Table S3), we monitored 551 larvae in total. Of these we sampled 223 early-instar worker-destined larvae and 92 late-instar worker-destined larvae. Of the remaining 236 unsampled larvae reaching the late instar, 235 (99.6%) eclosed as workers (a single male eclosed from these larvae in one colony; Table S3). In eight of the 13 queenless colonies, large numbers of workers eclosed and these colonies were therefore excluded from experiment. In the five remaining queenless colonies (Table S3), we monitored 284 larvae in total. Of these we sampled 122 early-instar queen-destined larvae and 56 late-instar queendestined larvae. Of the remaining 106 unsampled larvae reaching the late instar, 103 (97.1%) eclosed as queens (a single worker eclosed from these larvae in each of three colonies; Table  S3). As for cohort 1, because almost all unsampled larvae from each selected colony developed into the expected phenotype, we treated all sampled larvae from queenright colonies as worker-destined and all sampled larvae from queenless colonies as queendestined.
Larvae sampled from the cohort 2 colonies were pooled, snap-frozen and stored following the methods used for the cohort 1 larvae. Pooling was again conducted within colonies and phenotypes and was required because each Northern blot required up to 10 µg of total RNA. Samples consisted of 11-59 early-instar larvae per colony and 6-25 late-instar larvae per colony. This generated 20 samples, representing five biological replicates (colonies) per each of the four phenotypes, with each colony providing one early-instar and one late-instar sample for each caste fate (Tables S4, S6).
Cohort 3: samples for identifying miRNA stage-and tissue-specificity Colonies in cohort 3 were reared following the methods used for the colonies yielding queendestined larvae in cohorts 1 and 2. Following queen removal, one of the cohort 3 colonies produced 10 late-instar queen-destined larvae. We dissected four of the larvae into head, digestive tract and outer cuticle using a scalpel on a Perspex dissection board. The tissues were immediately separated from one another and homogenized in Trizol using a chilled ceramic mortar and pestle, frozen in liquid nitrogen, and stored individually at -80°C. We homogenized two of the larvae as whole-body preparations in Trizol, then froze them in liquid nitrogen and stored them individually at -80°C. The remaining four larvae were allowed to pupate in the colony. We sampled two of these two days after pupation ('early queen pupae') and the remaining two seven days after pupation ('late queen pupae'). We homogenized the four pupae in Trizol, froze them in liquid nitrogen, and stored them individually at -80°C. These procedures generated 18 individual samples for Northern blots: late-instar queen-destined larva head (4 samples); late-instar queen-destined larva digestive tract (4 samples); late-instar queen-destined larva outer cuticle (4 samples); late-instar queendestined larva whole body preparation (2 samples); early queen pupa (2 samples); and late queen pupa (2 samples).

RNA extraction
We extracted total RNA from the larval-Trizol homogenates (separately for each sample) according to the instructions of the Trizol manufacturer (Invitrogen, Carlsbad, California, USA) with minor modifications. Following RNA extraction, we quantified total RNA using a Nanodrop 8000 spectrophotometer (ThermoFisher Scientific, Loughborough, UK). We measured the integrity of the RNA on a 1.2% agarose gel. All RNA used for sRNA-seq and Northern blots was pure (260/280 ratio > 1.6) and there was no evidence of degradation (data not shown).

cDNA library preparation
In cohort 1, we used total RNA extracted from the queen-and worker-destined larvae to construct 16 cDNA libraries (Tables S3, S5). The 16 libraries consisted of four libraries of early-instar worker-destined larvae (EW1-4), four of late-instar worker-destined larvae (LW1-4), four of early-instar queen-destined larvae (EQ1-4) and four of late-instar queendestined larvae (LQ1-4) (Tables S3, S5). To make the cDNA libraries, we first enriched the total RNA for small RNAs (sRNA) (i.e. enriching the fraction of total RNA that was less than 200 bp in length) using a mirVana miRNA isolation kit (Ambion, Foster City, California, USA) according to the manufacturer's instructions. We then prepared the libraries using the TruSeq small RNA library preparation kit v.1.5 (Epicentre Technologies, Madison, Wisconsin, USA) with modifications to the 3' adapter to reduce sequencing bias 9 .
To ligate the adapters to the sRNA sequences, we followed the protocol provided with the TruSeq 1.5 library preparation kit with some modifications. Following preparation of the cDNA, we amplified each library with a unique index sequence using Illumina index primers (1-16). We tested the specificity of the PCRs by varying the cycle number for each library individually, choosing the cycle number that provided the clearest product when separated on an 8% polyacrylamide gel. We used the selected cycle number to prepare four PCR reactions for each library. We separated the PCR products on an 8% polyacrylamide gel, and then scanned the gel using a Molecular Imager FX Pro Plus (BIORAD, Hemel Hempsted, UK) and Quantity One software (BIORAD). We identified the 21-23mer miRNA band on the gel, and cut out the gel section that contained it. We then extracted the nucleic acids from the gel fragments and re-purified them using ethanol precipitation. To reduce the chances of adapteradapter contamination, which is a common problem during library preparation 10 , we separated the purified nucleic acid sample on a second 8% polyacrylamide gel and re-purified it by using ethanol precipitation and re-dissolving the pellet in ARG water. We took 1/6th of the purified product and separated it on a third gel to ensure that the miRNA band of interest was still present.

Sanger sequencing and sRNA-seq
To ensure that the cDNA libraries from the cohort 1 samples contained miRNAs, we cloned the construct from library EW-1 into a pGEMTeasy vector (Clontech, Saint-Germain-en-Laye, France) and then transformed it into DH5α super-competent Escherichia coli using blue-white colony staining. We then extracted the vectors containing the RNA fragment of interest by culturing the E. coli overnight, picking the white colonies and making mini-preps using the Qiagen mini-prep kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. We prepared ten samples using ready reactions (Life Technologies, Paisley, UK) and then sent them for Sanger chain-termination sequencing at the Earlham Institute (formerly The Genome Analysis Centre), Norwich Research Park, UK.
We used FinchTV sequence analysis software (Geospiza, Seattle, Washington) to identify each sequence. We were able to map six of the miRNA length sequences to the B. terrestris genome v.1.0 11 . One of the six sequences was identified as the miRNA Bte-miR-1, confirming that the libraries contained miRNAs (data not shown).
Finally, the 16 prepared cDNA libraries were shipped to BaseClear B.V (Leiden, The Netherlands), who conducted sRNA-seq of each library on the Illumina HiSeq2000 platform (single-ended reads, on 50 cycles). The sRNA-seq data are separated by each library and are available on the NCBI gene expression omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). The worker-destined larvae data (GEO accession: GSE64512) were used to predict new miRNAs in the B. terrestris genome 11 and were made available following publication of the genomes, while the queen-destined larvae are available under the GEO accession number GSE77870. Each library is labelled on GEO according to its phenotype and replicate number, e.g. Early Queen Destined 1 corresponds to library EQ1.

Bioinformatic analysis Quality checks and genome mapping
The sRNA-seq libraries from the cohort 1 samples returned 86 million reads across all 16 libraries (2.4-6.3 million reads per library). First, we filtered out all reads that contained unassigned nucleotides; these represented less than 1% of the total number of reads in each sample (Table S4). Next, we trimmed the adapter sequences by matching the first 8 nt of the 5' end of the 3' adapter sequence; on average, the adapter was found for 80% of reads (Table  S4). We then trimmed the 4 nt corresponding to the 3' HD adapter 9 ; only sequences longer than 16 nt were kept for further analysis (on average 82.4% of the sequences with the adapter; Table S4). We term all reads remaining after these steps 'accepted reads'.
We also inspected the nucleotide composition of reads within the 21-24 nt range ( Figure S1). No sequence motifs stood out for any size class; however, we observed that the 5' end of the 22mers and 23mers was enriched for thymine, which is the known hallmark of miRNAs (22mers) and their variants (23mers) 12 . In addition, there was no evidence of a single nucleotide dominating the sequences at any nucleotide position on the sequences ( Figure S1).
Next we calculated the number of non-redundant reads (unique sequences) and the corresponding read complexities, defined as the ratio of the number of redundant reads (total number of reads produced from the sequencing) to non-redundant reads (Table S4).
We mapped all accepted reads to the B. terrestris genome v.1.0 11 , full length, allowing one mis-match and no gaps, using PatMaN 13 . An average of 69.4% of the redundant reads and 54.9% of the non-redundant reads mapped to the genome across all 16 libraries (Table S4). The higher proportion of redundant reads mapping to the genome indicated that the majority of mapped reads were reads with high abundance rather than 'noise', and therefore that the mapped reads were likely to contain a high proportion of functional sRNAs (including miRNAs). We also mapped the reads to available annotations, including annotated miRNAs, tRNAs and coding DNA sequences (CDSs), of the B. terrestris genome (Table S6). As expected, a high proportion of reads was incident to miRNAs (an average of 24.6% of redundant reads across all libraries, Table S6). The complexity of these reads was low, supporting the conclusion that a small number of sequences with high abundance were classified as miRNAs. However, we found that sample LW4 had a very low proportion of miRNAs compared to the other samples (< 5% miRNAs). A small proportion of reads, less than 2% in all samples, matched to tRNAs. Around 40% of the reads in each sample matched to annotated CDSs. The complexities of these reads were higher (> 0.1) and the proportion of non-redundant incident reads was consistently higher than the proportion of redundant reads, indicating that these reads were likely to have been degradation fragments (Table S6).
We next compared the similarity between replicates and treatments, for the mapped reads, using the Jaccard similarity index ( Figure S2). The Jaccard index reflects the proportion of reads that are shared between samples (1.0 = samples identical, 0.0 = no shared reads) and was calculated for each pair of samples using the 500 most abundant reads in each sample 14 . In these comparisons, 15/16 libraries shared 50% or more of the 500 most highly expressed sequences between replicates of the same phenotype, the exceptional library being LW4 ( Figure S2). Size class distribution plots showed that 22mer sRNA sequences were the most frequently expressed within each phenotype, again except for library LW4 ( Figure S3).

Normalization of read abundances
We normalized the 16 libraries using the read count per total (rpt) normalization method [15][16][17] , with the normalization total for each library set at 4 million, which was the median total read count (rounded up to the nearest million) of the accepted reads across all libraries (Table  S4) 16 . To evaluate the outcome of the normalization, we plotted the distribution of pairwise differential expression, calculated using an offsetted fold change, across all read lengths 14 .
For most samples, rpm normalization was appropriate. However, libraries EW4, LW4, EQ2 and LQ2 were significantly differentially expressed when compared to the other three replicates within each respective phenotype, particularly for reads of length 28-34 nt ( Figure  S4), which reduced the sequencing space for shorter fragments, introducing technical differential expression. LW4 also had a very low proportion of miRNAs (see above). We therefore excluded these libraries from further analysis.

MicroRNA prediction and differential expression analysis
Using the normalized matrix of abundances, we identified the B. terrestris miRNAs by aligning the sRNA-seq reads to the miRNAs listed under Hexapoda in miRBase v.21 12 allowing up to 2 mis-matches and accepting only reads aligning to the positive strand of the mature sequences (we thereby updated the set of predicted miRNAs for worker-destined larvae libraries EW1-4 and LW1-4) described in Sadd et al. 11 ). A set of custom-made Perl scripts (available at https://github.com/mattlbeck/collins_et_al_MCDB) was used to create the set of conserved miRNAs. The abundances of mature miRNAs were calculated both at the read level and for mature miRNA variant regions following methods in Mohorianu et al. 14 . A miRNA variant region was defined by finding the per-nucleotide count coverage across all reads and libraries aligning to the mature miRNA. The largest increase and decrease in counts across the mature miRNA indicated the start and end of a variant region. The region count was calculated as the algebraic sum of read counts from reads incident to this region. These methods produced a final list of predicted miRNAs (Appendix 1).
To calculate differential expression between pairs of phenotypes, we determined confidence intervals from the replicates for each miRNA in each treatment as the range of the normalized read counts (i.e. difference between minimum and maximum normalized read counts) [18][19][20][21] . If, for a given miRNA, confidence intervals between two treatments did not overlap, the log2 offset fold change (OFC) was calculated from the ratio of the minimum expression level in the sample with higher mean expression and the maximum expression level in the sample with lower mean expression 18 . That is, for any sequence in samples k and k', we defined the confidence intervals as for sample k and for sample k', where and are the minimum values of the normalized expression levels in the replicates for the sequence, and and are the maximum values of the normalized expression levels in the replicates for the sequence. The log2 offset fold change was then calculated as follows: For up-regulation (i.e. mean expression higher in sample k'), as ; and for down-regulation (i.e. mean expression higher in sample k), as . The offset (o), which was applied to each abundance before the fold change was calculated, was set at 20, thereby preventing the resulting rank of fold changes from being skewed by low-abundance reads 14 .
Using these methods, we identified miRNAs that were differentially expressed between worker-and queen-destined larvae within instars (i.e. by comparing EW with EQ and LW with LQ) and that were differentially expressed between instars within each caste trajectory (i.e. by comparing EQ with LQ and EW with LW). Following previous studies 8,22,23 , we defined sequences as differentially expressed between phenotypes if the log2 offset fold change was greater than one.

Northern blots
To verify the presence of miRNAs in the libraries prepared from cohort 1, and to validate the expression patterns identified by the bioinformatic analysis of these libraries, we used Northern blots to probe for miRNAs in RNA samples extracted from early-and late-instar queen-and worker-destined larvae in the ten selected colonies from cohort 2 (Table S5). We designed probes that were reverse-complementary to ten selected miRNAs (Table S7). These miRNAs fell into four classes: (a) six miRNAs (Bte-miR-13a, Bte-miR-87a, Bte-miR-100, Bte-miR-306, Bte-miR-6001-5p, Bte-miR-6001-3p) of the total of nine miRNAs identified (by bioinformatic analysis of our sRNA-seq data) as being differentially expressed between queen-and worker-destined larvae ( Figure 1); note that Bte-miR-6001-5p and Bte-miR-6001-3p are part of the same duplex, but because both miRNAs showed differential expression and had a read count > 200, we treated the two arms as separate miRNAs; we omitted three miRNAs (Bte-miR-13-3p, Bte-miR-2518, and Bte-miR-8516) also identified as being differentially expressed between castes because they had relatively low expression levels compared to the majority of miRNAs (< 200 counts in any phenotype); (b) two miRNAs (Bte-miR-9a, Bte-miR-184) that were not differentially expressed between any of the four phenotypes but are homologues of miRNAs found by previous studies to be associated with larval or prepupal caste differences in Apis mellifera ( 24-26 ; Table S1); (c) one miRNA (Bte-miR-71) identified as being upregulated in early instar larvae compared to late instar larvae, and having a homologue that has previously been associated with caste differences in prepupae in A. mellifera ( 24 ; Table S1); and (d) one miRNA (Bte-miR-275) identified as being upregulated in late instar larvae compared to early instar larvae, and having a homologue that has previously been associated with caste differences in larvae in A. mellifera ( 26 ; Table S1).
For each selected miRNA, we produced at least two Northern blots (using RNA from each of two independent colony replicates) to compare the expression in the four phenotypes (EW, LW, EQ, LQ) alongside one another. We quantified the signal in each band using the imaging software Quantity One (Bio-Rad). The expression pattern shown by sRNA-seq was considered to be validated if each Northern blot showed the same expression pattern as that shown by sRNA-seq, i.e. either differential gene expression of more than two-fold in the same direction or no differential gene expression. In the case of Bte-miR-6001-3p, low expression levels meant that, in the five Northern blots attempted, only two yielded sufficient signal to allow us to determine relative expression.
We also used Northern blots of RNA extracted from larval and pupal samples in cohort 3 to investigate whether the caste-differentiated miRNAs that were validated by the Northern blots (Bte-miR-6001-5p and Bte-miR-6001-3p) were expressed in a tissue-or stage-specific manner.
To conduct Northern blots, we extracted total RNA from larvae sampled from cohort 2 using the Trizol protocol described above. We then separated the RNA on a 15% denaturing polyacrylamide gel and transferred the RNA to a Hybond-NX nylon membrane (GE Healthcare, Amersham, UK) using Semi-dry membrane transfer apparatus (Scie-plas, Cambridge, UK). We bound the RNA to the membrane by chemical crosslinking. We hybridized membranes overnight at 37°C in UltraHyb-Oligo hybridization buffer (Ambion) with probes that were reverse complementary to the miRNA of interest and were labelled with 32P using T4 polynucleotide kinase and [γ -32 P] ATP. We rinsed the membranes with a wash solution (0.2 × sodium chloride/sodium citrate, 0.1% (w/v) sodium dodecyl sulphate (SDS)), then exposed them on a blank phospho-imaging screen inside a radioactive cassette (Fujifilm, Billingham, UK). We scanned the screens on a Molecular Imager FX Pro Plus (Bio-Rad, Hemel Hempsted, Hertfordshire) using Quantity One to visualize the signal. To allow membrane re-use, we stripped the membrane of its radioactive signal by incubating it in a stripping solution (pH 8.5, 0.1% SDS, 5mM EDTA) at 95°C for 20-40 minutes. All membranes were re-probed with U6, which is a small nuclear RNA acting as a loading control 27 .

Identifying the genomic region and function of caste-differentiated miRNAs
To investigate miRNA function, we identified the region where validated caste-differentiated miRNAs were expressed on the B. terrestris genome 11 . We used Basic Local Alignment Search Tool (BLAST) software to identify the region, and the program GBrowse to identify the nucleotide sequence (and therefore the neighbouring genes) around each miRNA. These analyses revealed that the caste-associated miRNA, Bte-miR-6001, and its precursor sequence, form a mirtron 28,29 located in the gene Very High Density Lipoprotein (VHDL).
In addition, we used miRanda 30 to identify a list of targets for the two validated casteassociated miRNAs, Bte-miR-6001-5p and Bte-miR-6001-3p. We scanned the B. terrestris miRNAs against the 3'UTR regions of genes in A. mellifera on the NCBI database (http://www.ncbi.nlm.nih.gov/), since the 3'UTR regions are the sequences usually targeted by miRNAs [31][32][33] and the genome annotation in A. mellifera 34 is more extensive than in B. terrestris 11 . We then compared the 3'UTR region of these genes for sequence homology to B. terrestris genes, discarding sequences that were not conserved between A. mellifera and B. terrestris. We identified the gene ontologies of the remaining genes using the Drosophila gene ontology terms on FlyBase (http://flybase.org/). Table S1. Expression patterns (from other studies) of selected miRNAs in female larvae or prepupae of Apis mellifera miRNA Pattern in Apis mellifera References Ame-miR-9a More highly expressed in queen-destined prepupae than in worker-destined prepupae. 24 More highly expressed in worker jelly than in royal jelly. 25 More highly expressed in worker-destined larvae than in queen-destined larvae. 26 Ame-miR-184 More highly expressed in worker jelly than in royal jelly. 25 Experimental increase caused queen-destined larvae to develop worker-like characteristics. 25 More highly expressed in queen-destined larvae than in worker-destined larvae. 26 Ame-miR-71 More highly expressed in worker-destined prepupae than in queen-destined prepupae. 24 More highly expressed in worker jelly than in royal jelly. 25 Ame-miR-275 More highly expressed in worker jelly than in royal jelly. 25 More highly expressed in queen-destined larvae than in worker-destined larvae 26 Table S2. Details of samples of Bombus terrestris larvae used for sRNA-seq analysis (cohort 1). Columns of numerical data represent: total number of larvae that hatched during the period when brood development was being monitored by photographing colonies; of these, the number of early-instar larvae sampled (removed) and then pooled for RNA extraction; the number of early-instar larvae left within the colony; the number of late-instar larvae sampled and then pooled for RNA extraction; the number of late-instar larvae left within the colony to complete development to adulthood; and the adult caste of the remaining unsampled larvae, i.e. numbers of adult workers and queens eclosing. The far right column shows the sequencing library to which each colony contributed.  Table S3. Details of samples of Bombus terrestris larvae used for Northern blot analysis (cohort 2). Columns of numerical data represent: total number of larvae that hatched during the period when brood development was being monitored by photographing colonies; of these, the number of early-instar larvae sampled (removed) and then pooled for RNA extraction; the number of early-instar larvae left within the colony; the number of late-instar larvae sampled and then pooled for RNA extraction; the number of late-instar larvae left within the colony to complete development to adulthood; and the adult caste of the remaining unsampled larvae, i.e. numbers of adult workers and queens eclosing. The far right column shows the Northern blot sample to which each colony contributed.  Table S4. Details of libraries prepared from samples of Bombus terrestris larvae and used for sRNA-seq (cohort 1). Columns of numerical data represent: number of larvae pooled to make each library; total read count (total number of sequence reads obtained); filtered reads, %filt (number and percentage of reads that do not contain unassigned nucleotides); reads with an adapter sequence, %adp (number and percentage of reads for which the first 8 nt of the 3' adapter was found); accepted reads, %acc (number and percentage of accepted reads (longer than 16nt) after the HD signature was trimmed); NR (number of non-redundant (unique) reads in each sample); C (sequence complexity, i.e. ratio of non-redundant reads to redundant reads (all reads) in each sample); GMR, %GMR (number and percentage of redundant reads mapping to genome); GMNR, %GMNR (number and percentage of non-redundant reads mapping to genome); and GMC (complexity of reads mapping to genome).  Table S6. Details of the numbers and percentages of miRNAs, tRNAs, and coding DNA sequences (CDSs) represented in sRNA-seq data from samples of Bombus terrestris larvae. Columns of numerical data represent, for each class of gene: R, %R (the number and percentage (of the total sequences that mapped to the genome) that were redundant reads (all reads)); NR, %NR (the number and percentage (of the total sequences that mapped to the genome) that were non-redundant reads (unique reads)); and C (the sequence complexity, i.e. ratio of non-redundant reads to redundant reads (all reads) in each sample).  Figure S1: Boxplots showing the distribution of proportions of each specific nucleotide base at any given position in the 21mer, 22mer, 23mer, and 24mer sequences identified through sRNAseq in Bombus terrestris, including the four-nucleotide signature of the 5' end of the 3' HD adapter sequence (positions highlighted in red). X axis, the nucleotide position on the focal sequence; Y axis, the proportion of each base (black represents adenine, red represents cytosine, green represents guanine, blue represents thymine) at a given nucleotide position on the focal sequence. In each boxplot, the thick horizontal bar is the median, box edges are quartiles, whiskers are the upper and lower 5%, and circles are outliers. Data from the 12 libraries remaining in the final analysis. Figure S2. Matrix of Jaccard indices comparing 16 sRNA-seq libraries prepared from Bombus terrestris female larvae from cohort 1. For the 500 most abundant sequences, the Jaccard index denotes the proportion of shared sequences between pairs of libraries. There were four biological replicates of each of four phenotypes, the phenotypes being early-instar queen-destined larvae (EQ), early-instar worker-destined larvae (EW), late-instar queendestined larvae (LQ) and late-instar worker-destined larvae (LW), the suffixed numbers representing replicate numbers. Figure S3. Size class distribution plots (following normalization) of RNA sequences (of lengths 21-27 nt) from the 16 sRNA-seq libraries prepared from Bombus terrestris female larvae (four biological replicates of each of the four phenotypes, EW, EQ, LW and LQ, as defined in the legend to Figure S2). Figure S4. Distribution of differences in normalized gene expression levels (offset fold change) between pairs of biological replicates within each phenotype class in 16 sRNA-seq libraries (four phenotype classes, each with four biological replicates) prepared from Bombus terrestris female larvae, for reads within the size-class range, 16-35 nt. Figure panel header: replicates being compared, with phenotypes as defined in legend of Figure S2; X axis, size class of reads being compared; Y axis, log2 fold change (OFC) value following normalization. The red line at Y = 0 denotes no differential expression between replicates. The blue lines correspond to ± 0.5log2(OFC), which approximates ± 1.5(OFC). In each boxplot, the thick horizontal bar is the median, box edges are quartiles, whiskers are the upper and lower 5%, and black points are outliers.