EGFR gene methylation is not involved in Royalactin controlled phenotypic polymorphism in honey bees

The 2011 highly publicised Nature paper by Kamakura on honeybee phenotypic dimorphism, (also using Drosophila as an experimental surrogate), claims that a single protein in royal jelly, Royalactin, essentially acts as a master “on-off” switch in development via the epidermal growth factor receptor (AmEGFR), to seal the fate of queen or worker. One mechanism proposed in that study as important for the action of Royalactin is differential amegfr methylation in alternate organismal outcomes. According to the author differential methylation of amegfr was experimentally confirmed and shown in a supportive figure. Here we have conducted an extensive analysis of the honeybee egfr locus and show that this gene is never methylated. We discuss several lines of evidence casting serious doubts on the amegfr methylation result in the 2011 paper and consider possible origins of the author’s statement. In a broader context, we discuss the implication of our findings for contrasting context-dependent regulation of EGFR in three insect species, Apis mellifera, D. melanogaster and the carpenter ant, Camponotus floridanus, and argue that more adequate methylation data scrutiny measures are needed to avoid unwarranted conclusions.

worker larvae reared in hive" 12 . Since this result stands in stark contrast to the absence of amegfr methylation in published honey bee methylomes 9,15,16 , including larvae of the same age as in the Kamakura paper 12 , we have conducted detailed analyses to re-examine the methylation status of egfr in Apis mellifera. We show not only that amergf is never methylated but has a high GC content that is consistent with non-methylated genes, and consider the origins of this unfeasible result in the Kamakura study.

Results
Our desire to meticulously examine the levels of egfr methylation in Apis was inspired by the absence of any methylated CpGs in this locus in the published methylomes from both queens and workers Table 1). In particular, the larval methylomes 9 representing queen and workers of the same 96 hrs stage of development as in the Kamakura's paper 12 were indicative of a potential problem. In addition, no methylation of amegfr has been found in several brain methylomes generated by two independent labs 15,16 . For comparison, we have contrasted sequencing coverage and methylation levels for egfr with a consistently methylated gene nadrin using 11 methylomes representing different tissues, cell types and developmental stages (Table 1). We also have noted that the 736 bp region of amegfr deemed to be methylated in the Kamakura study 12 is rich in CpG dinucleotides, which is untypical for methylated genes in Apis that are CpG-depleted because of the known tendency of 5-methyl cytosines to be converted into thymines 8,15,17,18 . The ratio of observed to expected (o/e) CpG dinucleotides is 1.104 in the 736 bp fragment and 1.324 for the whole gene. Such values are associated with non-methylated genes in Apis 8,15 ( Fig. 1) that can be easily separated from methylated genes in a bimodal distribution of the o/e frequency of CpGs in all annotated genes. However, given the low sequencing depth of genome-wide methylomes and the possibility that amegfr may be methylated in a restricted number of cell types that could have been picked by chance in the 2011 study, we have decided to reproduce the original low-depth plasmid sequencing experiment using ultra-deep sequencing of the same genomic region 12 . Unfortunately, we have failed to amplify the 736 bp DNA fragment following the conditions described in the 2011 paper. This is not entirely surprising as it is nearly impossible to amplify DNA fragments of such length from bisulfite treated DNAs. The DNA fragmentation during the treatment 19 leads to a practical upper size limit of the PCR amplicon ~400-500 bp, and the need for nested primers and a second round of PCR is often essential 20 . Instead, we used two sets of outer, and two sets of nested primers to amplify two overlapping amplicons (A1 = 440 bp and A2 = 441 bp) covering the supposedly methylated region of amegfr. To properly design our amplicons we had to re-annotate the old erroneous EGFR gene model used in the 2011 study (see Figure S1 for details). These amplicons have been processed and barcoded for ultra-deep sequencing with the NEB kit for Illumina libraries. Both libraries A1 and A2 have been sequenced together with additional libraries representing confirmed methylated genes on Illumina MiSeq using the Reagent Kit v3 that generates up to 25 million reads (15Gb) of 2 × 300 bp paired end reads.
As shown in Table 2 even with a depth of over 330,000 reads per amplicon virtually no methylation of EGFR in larval samples has been detected. The methylation pattern counts were normalised with MPFE 21 in order to eliminate most of the spurious patterns caused by sequencing errors and incomplete bisulfite conversion. As a result, for both amegfr amplicons A1 and A2, over 99.99% of reads were classified as not methylated. The very few methylated patterns observed in egfr are most likely technical noise and  Table 1. Sequencing coverage and egfr/nadrin methylation in the honey bee whole-genome bisulfite sequencing projects. *average from queen and worker libraries, **in preparation.
represent less than 0.01% of the total counts. Failure to detect methylation in the egfr region is unlikely to be a PCR artefact because methylated DNA tends to be over-represented in bisulfite sequencing data 22 .
With this depth of sequencing it is likely the methylation patterns in essentially every cell type can be visualised as illustrated by a genuinely methylated gene, nadrin (Fig. 2). In larval samples, 157 distinct methylation patterns were observed in this amplicon after removing the spurious patterns. Whether or not these patterns represent unique epigenetic signatures of 157 cell types in growing larvae remains to be established.

Discussion
Our findings showing that the honey bee egfr gene belongs to the non-methylated category make the result shown in the Kamakura paper 12 impossible to reproduce. There are three lines of evidence questioning the correctness of the original claim. First, on the basis of its high CG content, egfr is bioinformatically predicted to be non-methylated. Second, genome-wide methylation profiles in larvae, brains and embryos from both queens and workers show no evidence of methylation. Third, our new data using ultra-deep amplicon sequencing failed to detect any sign of methylated cytosines in larval samples. At this stage the source of this questionable result is unclear. One possibility is that the original DNA samples were not properly converted with bisulfite thus leading to a false impression that some cytosines in egfr were protected by methyl tag. This problem could have been exaggerated by omitting nested primers that typically are used to improve the recovery of AT-rich methylated amplicons from BS-treated DNA. However, the increase of amegfr methylation only in worker larvae in two situations shown in the Kamakura paper ( Figure S34) suggests that such an experimental clarification is unlikely. Whatever the reason for this doubtful result is, our findings have important consequences for understanding how conditional phenotypes are implemented in various lineages by engaging cell surface signalling via EGFR and its growth factor ligands. While it is not surprising that this important cell surface receptor has been implicated in the queen crafting process in honey bees and growth regulation in Drosophila, it is evident that its regulation is not contingent on DNA methylation. Although a queen bee can be experimentally induced by silencing de novo DNA methylation in newly hatched larvae by means of RNAi approach 13 , such treatment  is pleiotropic and affects hundreds of genes and pathways relevant to nutritional sensing, such as for example, the TOR/insulin network, and importantly, Anaplastic Lymphoma Kinase that has the capacity to directly induce downstream events by bypassing insulin signalling 9,23 . A queen phenotype can also be induced by interfering with the expression of genes belonging to the TOR/insulin network 24 and possibly other manipulations affecting nutritional signalling. For example, royal jelly is exceedingly rich in both methionine and sources of methyl groups (choline) and some of its MRJPs are unusually rich in this essential amino acid, providing substrates for methylation activities. In this context, the finding by Grandison et al. 25 that in Drosophila, methionine alone was necessary and sufficient to increase fecundity as much as did full feeding, but without reducing lifespan, is striking. Although the effects of RJ on honey bee larval growth are still not fully appreciated, there is little doubt that methylation changes and diet are clearly linked. It may be prudent to more closely examine the relationships between caloric restriction, methylation and foods rich in methionine, acetylcholine with longevity and fecundity in both Apis and Drosophila. Royalactin is simply one of very many components that contribute to network flux.
Obviously, it has a defined role in this process, but until all components of RJ are better understood its exclusive role in vivo should be considered with caution. It is most interesting that egfr methylation has recently been implicated in generating quantitative variation in size of the carpenter ant 26 allowing for comparative analyses of a highly conserved EGFR in three insect species in the epigenetic context. Drosophila lost its DNA methyltransferases and has to use other epigenetic mechanisms for EGFR regulation, whereas in two Hymenopterans with the complete DNA methylation toolkit, ants and honey bees, only one recruits methylation to regulate egfr. One possible explanation for such a contrast between ants and honey bees is that methylation in the carpenter ant is utilised as an indirect modulator of egfr for continuous size variations, whereas in Apis such a flexible epigenomic modification of egfr would not be desirable for the proper development of the focal individual that is critical for the colony. These comparative analyses underscore the inherent pitfalls in data transferability between different species at particular levels. In spite of a high level of EGFR conservation, both structural and functional, its context-dependent regulation appears to be driven by distinct mechanisms in insect species. An organism can utilise many different cellular systems to accomplish the same end result as long as it is the desired phenotypic outcome. Indeed, to solve the same problem many different designs can be constructed even from heteromorphic, but functionally similar elements because of the high level of degeneracy in biology 27 .
In conclusion, we argue that methylation data claimed to be relevant to specific biological processes need to be supported by in-depth analyses using newer, high resolution methods. However, whatever platform one utilizes for methylomic insights, all of them have to be properly scrutinised to answer the critical question: how relevant is the observed differential methylation to the phenomenon under investigation?

Materials and Methods
DNA bisulfite conversion and amplicon preparation. 1.5 μ g of larval genomic DNA 28 was bisulfite converted using the QIAGEN Epitect ® Bisulfite Kit, as per the manufacturer's protocol 29 . We routinely perform two consecutive treatments to avoid incomplete conversion especially in GC-rich regions. The converted DNA was amplified via a nested PCR reaction with amegfr specific primers (see Table S2 for BS-seq primers). The PCR products were purified utilising Agencourt ® AMPure ® XP PCR Purification system (Beckman Coulter).

NGS library preparation.
Libraries were prepared from 500-600 ng of each amplicon utilising the NEBNext ® DNA Library Prep Master Mix for Illumina ® , and NEBNext ® Multiplex Oligos for Illumina ® Index Primers Set 1 and Set2 (New England Biolabs). Size selection of adaptor ligated DNA was performed using Agencourt AMPure XP beads (Beckman Coulter), with the bead:DNA ratio of the first bead selection 0.9X, followed by a second bead selection with bead:DNA ratio at 0.2X (not sure about the ratios, I do not think this is necessary). Each library was eluted in 30 μ L of 0.1X TE, library size confirmed via agarose gel electrophoresis (we used Caliper LabChip GXII and HT DNA High Sensitivity Assay), and diluted to a final concentration of 4 nM. NGS MiSeq sequencing. Next generation sequencing was performed on Illumina MiSeq instrument using MiSeq Reagent Kit v3 (Illumina) and 600 cycles. PhiX spike was added at 5% concentration as recommended by Illumina for low-diversity libraries.

Analysis of bs-seq results.
For each individual analysed the frequency at which a mCpG occurred was calculated across all reads using custom Python scripts and open-source software. The process comprised of two steps. In the first, pairs of reads with the 30 nucleotide sequence starting at position 4 matching exactly the last 30 nucleotides of the primers used for nested amplicon PCR were extracted from FASTQ files, aligned with in silico bisulfite-converted genomic template using MUSCLE 29 , overlapping regions (if any) were proportionally truncated and, after removing all aligner-introduced gaps, both reads were combined into one continuous sequence and appended to a to a separate file ("extract file") for each amplicon and each library/sample. In addition, a quality filter was applied, rejecting all sequences shorter than 90% of the length of the template or containing in excess of 5% gaps. In the second step, batches of sequences from the "extract" files were re-aligned with the template using MUSCLE (to eliminate any potential positional errors introduced by read indels), the aligned template sequence was used to calculate positional information of all the expected CpGs and SNPs, and the positional data were used to score methylation status [ie. 0 for T and 1 for C occurring at a CpG position] for each combined read pair. The data were next appended to a separate table for each amplicon and each library/sample. The final tables were used to calculate and graph amplicon methylation data using MPFE 21 .