Repeated inversions within a pannier intron drive diversification of intraspecific colour patterns of ladybird beetles

How genetic information is modified to generate phenotypic variation within a species is one of the central questions in evolutionary biology. Here we focus on the striking intraspecific diversity of >200 aposematic elytral (forewing) colour patterns of the multicoloured Asian ladybird beetle, Harmonia axyridis, which is regulated by a tightly linked genetic locus h. Our loss-of-function analyses, genetic association studies, de novo genome assemblies, and gene expression data reveal that the GATA transcription factor gene pannier is the major regulatory gene located at the h locus, and suggest that repeated inversions and cis-regulatory modifications at pannier led to the expansion of colour pattern variation in H. axyridis. Moreover, we show that the colour-patterning function of pannier is conserved in the seven-spotted ladybird beetle, Coccinella septempunctata, suggesting that H. axyridis’ extraordinary intraspecific variation may have arisen from ancient modifications in conserved elytral colour-patterning mechanisms in ladybird beetles.

Understanding the biology of intraspecific phenotypic diversity has been of considerable interest, from both ecological and genetic perspectives. Harmonia axyridis shows incredible phenotypic diversity, which is determined genetically, but is also influenced by environmental temperature.
Ando et al. have made a huge leap forward in understanding how H. axyridis can maintain such an array of diversity. Here, they have compared tissue structures during elytral development to show where colours are laid down, knocked out a candidate gene with RNAi (pannier), found differential expression of pannier in different wing sections, sequenced the genome and performed fine-scale linkage analysis. Together this comprehensive analysis shows pannier is required for melanisation and the pannier locus maps precisely to the colour patterning region of the genome. Pannier also determines colour pattern in other ladybirds (Coccinella septempunctata) demonstrating functional conservation throughout their evolution. This analysis is really impressive, work is succinctly and clearly explained and I really enjoyed reading it. I only have very minor suggestions/questions.

1. Line 80-81
From the main text, it's not clear how and why you chose pannier as an RNAi target. To test one candidate gene and find it's the right one seems very lucky. As you write, "This result was unexpected because pannier is not essential for wing blade patterning in Drosophila". If other genes were tested, perhaps they could be mentioned or put in a supplementary file. Otherwise, the RNAi experiments should follow after other evidence has been presented (genetic mapping).
2. Resequencing of two forms was done; hC (F6 strain) and hA (NT3 strain) but there didn't seem to be any comparative genomics across the pannier gene. It would be nice to know how different these forms are and whether there is any variation in Scalloped binding site number. Considering protein coding changes in pannier don't account for phenotypic diversity, it would have been nice to see a little more on the regulatory regions.
3. Line 31. It seems a bit unusual stating that previous genetic work was done by 'Asian' geneticists, yet you didn't provide the ethnic background of Theodosius Dobzhansky in the previous sentence. Consequently, I would recommend removing the ethnic associations unless they serve a clear scientific purpose.
4. Line 150-151. Claiming that Scalloped binding motifs accumulated in H. axyridis pannier intron 1, relative to other insects, seems like an unfair comparison. Given that intron 1 of H. axyridis is so much larger than that of the other insect species investigated, the expectation would be for all binding motifs to have 'accumulated'. A fairer comparison might be to compare the number of Scalloped binding sites in the 173 kb H. axyridis pannier intron 1 with the genome average across 173 kb windows. Is there any reason RNAseq expression data was only assessed for vestigial and not scalloped (Supp. Fig. 7
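The window-based comparison suggested in point 4 could be sketched roughly as follows. This is only an illustration under stated assumptions: the motif string, the toy sequence and the function names are placeholders and are not taken from the manuscript, and only one strand is scanned (a real analysis would also scan the reverse complement and use 173 kb windows).

```python
import re


def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps (one strand only)."""
    return len(re.findall(f"(?={re.escape(motif)})", seq))


def window_counts(genome, motif, window):
    """Motif counts in consecutive, non-overlapping windows of a fixed size."""
    return [
        count_motif(genome[i:i + window], motif)
        for i in range(0, len(genome) - window + 1, window)
    ]


def empirical_p(observed, background):
    """Fraction of background windows with at least as many motifs as observed."""
    return sum(c >= observed for c in background) / len(background)


# Toy data: the motif is a placeholder, not a validated Scalloped site.
motif = "GGAATG"
genome = "GGAATGAAAAAA" + "A" * 12 + "GGAATGGGAATG"
background = window_counts(genome, motif, window=12)
p = empirical_p(2, background)  # intron-like window carrying 2 motifs
```

An intron count that is unremarkable relative to the background windows would suggest that the apparent accumulation simply reflects intron length.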

Reviewer #3 (Remarks to the Author):
This is a review for the manuscript titled "pannier determines highly diverse intraspecific elytral colour patterns of ladybird beetles" submitted to Nature Communications. This addresses the overarching evo-devo question of how genetic variation can shape natural phenotypic variation. To weigh in on this question the authors leverage a well-suited model the ladybird beetle species Harmonia axyridis (H. axy.). It has been found for H. axy. that it has 20 alleles at a single genetic locus responsible for around 200 aposematic elytral colour patterns. The authors use a suite of genetic techniques to demonstrate that this genetic locus is for the conserved GATA transcription factor gene known as pannier. Pannier promotes the formation of black colour while simultaneously suppressing the red colour's formation. Moreover the work implicates the cis-regulatory region of pannier as harboring one of the major genetic variants contributing to the phenotypic variation. Furthermore, the authors extend beyond the intraspecific variation for H. axy. and show that pannier seems to have a more ancient role in elytral colour patterning as it seems to similarly shape the colour pattern of the seven spotted ladybird beetle species Coccinella septempunctata.
Overall, I found this manuscript to possess a compelling set of results on a compelling and central evo-devo question. While the figures look amazing, their descriptions were generally too vague. Most disappointing, though, was the absence of a clearly justified evolutionary model for the inclusion of pannier in elytral colour patterning. I suggest rejecting the manuscript, but I encourage a resubmission with an improved discussion and figure legends as described below.
Major Concerns:
1. The pannier in situ result presented in Figure 3b is not the most convincing. Specifically, the more intense expression in regions that develop to be black coloured is not obvious. If a better result cannot be provided, I suggest softening the language on page 13 line 98 to read "pannier seemingly showed higher".
2. The data supporting the co-option of Vestigial/Scalloped are very speculative, being based upon some motifs in the non-coding region of pannier. The presentation of this model should be more indicative of its weakly supported nature. Perhaps call this "One of many plausible models". Within this model, approximately when did pannier evolve to regulate the genes for black and red pigments? This model needs to be elaborated on. It would be beneficial to have a conceptual figure for this model. The co-option of the wing selectors and pannier regulation of red and black genes should date back before the common ancestor shared with Coccinella. Does the RNAi phenotype for pannier, where black colour is missing from the head, suggest that pannier regulated pigmentation before the co-option event of SD/Vg? This evolutionary model needs to be fleshed out. The discussion lacks clarity and focus on pages 20-21.
3. Page 19. The authors show that the regulatory region of an allele of pannier has expanded in H. axy. However, we know little of what the gene structure is for Coccinella. It is likely to be under a conserved regulatory hierarchy. Does it have SD/Vg binding motifs too?
4. The figure legends lack sufficient detail to appreciate what is being shown and its importance. In particular Figure 4a.
Minor Concerns:
Page 5 Lines 29 and 30 are highly redundant with Page 4 Lines 24-26.
Page 5 Line 31, replace "by Asian geneticists" with just "by geneticists".
Page 5 Line 38 and 39. This sentence seemed clunky and does not agree with "it had to be a single gene". I suggest changing it to something like "By elucidating the mechanisms responsible for how this single genetic locus evolved to shape such a strikingly diverse intraspecific colour polymorphism would provide a case-study that bears upon a major evolutionary-developmental biology question; how does morphology evolve?"
Page 11. It is not obvious how the authors came to test the Drosophila notum patterning pannier gene.
Page 17 line 141. Change "gene body size of pannier" to "size of the pannier locus".
Page 17 line 145. Change to "motif of the insect wing".
Page 17 line 146. Can you add a calculation to support the "more accumulated"? Perhaps motifs per 1 kilo base pair.
Page 20 line 179. Sentence does not work. Change to "regulate the multiple intraspecific wing colour patterns."
Page 20 line 180. Change "pathways" to "mechanisms" and change "represent" to "stem from". This is also a run-on sentence that should be chopped into two sentences.
Page 30 line 277. Change "raise" to "increase" and change "penetrance to" to "penetrance in".
Page 31 line 293. You should share the sense probe images to let readers compare and contrast with the antisense probe signal.
Page 36 line 351. Change "Totally 12" to "In total, 12".
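The per-kilobase calculation requested for Page 17 line 146 amounts to a simple length normalisation. A minimal sketch, assuming hypothetical motif counts and intron lengths (none of the numbers below come from the manuscript):

```python
def motifs_per_kb(n_motifs, length_bp):
    """Motif density, expressed as the number of motifs per 1,000 bp."""
    return n_motifs / (length_bp / 1000)


# Hypothetical (motif count, intron length in bp) pairs for illustration only.
introns = {
    "long_intron": (20, 100_000),
    "short_intron": (3, 10_000),
}
densities = {
    name: motifs_per_kb(count, length)
    for name, (count, length) in introns.items()
}
```

In this toy case the longer intron holds more motifs in absolute terms but has the lower density, which is the distinction the normalisation is meant to expose.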

Responses to Reviewers:
First of all, we performed additional sets of de novo genome assemblies of the hC, hA and h alleles in H. axyridis and of a strain of C. septempunctata, and obtained a contiguous genomic scaffold including pannier for each sample. As a result, we found traces of repeated inversions in the 1st intron of pannier in H. axyridis, which seem to have been the major driving force behind the high diversification of elytral colour patterns in the ancestral lineage of H. axyridis. In light of this finding, we changed the title of our paper to "Repeated inversions at the pannier intron drive diversification of intraspecific colour patterns of ladybird beetle". Please see our responses to each Reviewer's comments below, and please also evaluate the descriptions of the new findings in the revised version of our manuscript.
Responses to Reviewer #1:
Comment #1
-I found awkward, however, that the RNAi data was presented before the mapping, as usually, a forward genetics approach first identifies a candidate gene that is then assessed by reverse genetics, such as a knockdown. If this is really reflective of the experimental chronology that occurred, I would not mind keeping the current order, but typically, Fig. 4 should actually come as one of the first two figures before subsequent developmental genetics work, and loss-of-function data as a final climax?
The presentation order that Reviewer #1 suggested is indeed the normal one for a forward genetic approach. However, the order of the data presented in this paper follows the chronology that we actually experienced. Thus, we did not change the presentation order in the revised version of our manuscript. Instead, as pointed out by Reviewers #2 and #3, we inserted a description of our initial candidate approach. The actual phenotypes in the first small screening were listed in Supplementary Table 1
-Can the author clarify what is the evidence that pannier drives the HSp allele, while this was not included as an allelic parent in their crosses?
-And is it necessary to speculate here that pannier drives more minor alleles, while the mechanism for "mosaic dominance" is clearly far from being understood?
I recommend the authors to streamline their abstract, introduction and discussion to reflect the fact that pannier maps to three alleles for now (h, hC, and hA; see Methods) and that further work is required to test the hypothesis that this locus is a hotspot of phenotypic variation hosting more alleles. I trust pannier will map to more alleles, but I do not think it is timely to make that claim yet.
In H. axyridis, Hosino (1934) first described that the major four alleles are genetically linked to the same genetic locus based on his genetic crossing experiments. Successive genetic experiments by Hosino (Hosino, 1936, 1939, 1940, 1941, 1942, 1943a, 1943b, 1948 Table 2)
Comment #4
-GENOME SCAFFOLDING AROUND PANNIER: a big weakness of the paper is due to the lack of scaffold contiguity around pannier (for instance in Fig. 4). For now, the authors simply cannot substantiate their claim (here taken from the Abstract) that their data "reveal a ladybird-specific large 150 kb-scale intronic expansion in the pannier locus as a characteristic genomic structure that may have facilitated expansion of elytral colour pattern variation". Perhaps there is indeed something interesting in this first intron, but we want to know what it is instead of beating around the bush. This thus seems to me like a key aspect of the paper that needs to be addressed.
The authors could deploy alternative sequencing strategies to obtain contiguity across their genetic interval. A possibility would be to isolate High Molecular Weight Genomic DNA from their three strains, and outsource library preparation and low coverage sequencing using PacBio SMRT cells.
The long reads could then be used to scaffold their previous HiSeq short reads. A number of alternatives based on optical mapping or cross-linking exist (10X Genomics, Dovetail Genomics, NanoBioGenomics, fosmid libraries). Ultimately, it would be ideal to obtain the pannier genomic scaffold for the 3 mapped alleles and assess the possible role of structural variation (large indels, short inversions, TEs) in relationship to the phenotype.
We additionally performed linked-read and long-read analyses of the h, hA and hC alleles, and of C. septempunctata, using the PacBio (approximate mean coverage: 10x) and 10x Genomics Chromium platforms (approximate mean coverage: 200x [HiSeq X Ten]), and obtained contiguous scaffolds including pannier and neighboring genes. Four main new findings were obtained from this analysis: (1) The first intronic sequences of pannier are highly diverged among the h, hA and hC alleles compared to the neighboring genomic sequences, and contain traces of repeated inversions within the 1st intron (Fig. 5a); (2) Molecular phylogenetic analysis using the conserved intronic sequences revealed that the h allele and the common ancestor of the other three alleles diverged first during evolution, and that the latter three alleles diverged recently (Fig. 5d); (3) Several sequence blocks in the intron are conserved in C. septempunctata as well (Fig. 5a, C. sep); (4) Repertoires of known DNA binding motifs in the h, hA and hC pannier intronic regions are also highly diverged among the alleles (Table 1a). We included the results described above in our revised manuscript, and modified the discussion accordingly.

Comment #6
This is an exciting manuscript that deserves publication in Nat. Comm. , and the finding of pannier is well supported. I believe it deserves more efforts on the narrowing of the causal variation around pannier, and perhaps more efforts testing if obvious structural variations may underlie the three alleles.
Thank you very much for the comment. We additionally performed a RAD-seq analysis, and individual genotyping using three sets of genetic crosses in total, to narrow down the responsible genomic regions. Furthermore, we performed de novo genome assembly of the hC, hA, and h alleles in H. axyridis, and of C. septempunctata, to reveal structural variations around the pannier locus. We believe that the quality of our manuscript is now much improved thanks to your comments.
It seems a bit unusual stating that previous genetic work was done by 'Asian' geneticists Table 1).
We showed the expression levels of vg because this gene is the only transcription factor gene statistically upregulated at 24h AP. That is, sd was not significantly upregulated at this stage (data not shown). We inserted additional explanation to clarify this point as follows: "Furthermore, the RNA-seq data for the hC background also revealed that the sd co-activator gene vestigial was the only transcription factor gene that was significantly upregulated in the future black region from early pupal stages (Supplementary Figure 6) (2) Molecular phylogenetic analysis using the conserved intronic sequences revealed that the h allele and the common ancestor of other three alleles diverged first during evolution, and other three alleles diverged recently; (3) Several intronic sequences are conserved in C. septempunctata as well; (4) Repertoires of known DNA binding motifs in the h, hA and hC pannier intronic regions are also highly diverged among the alleles." Based on our new genomic data, we inferred the order of emergence among the 4 alleles in H. axyridis. In addition, we estimated that repeated inversion events within the 1st intron of pannier can be the major driving force to generate and maintain diverse colour patterns within a species.
The "head" phenotype mentioned in the comment is, to be exact, the "prothoracic" phenotype. We think that this prothoracic phenotype is a key trait for estimating the origin of the colour-patterning function of pannier. However, in order to address this issue, we need to examine the prothoracic colour-patterning function in other ladybird beetles, because at present we do not know whether the prothoracic colour-patterning function observed only in H. axyridis corresponds to the ancestral function or to a derived function. We think that this issue should be addressed in future research, but not in this paper.
We revised the latter explanation to avoid redundant expression (Page 5, Lines 26-29).
Comment #6
-Page 5 Line 31, replace "by Asian geneticists" with just "by geneticists"
We eliminated the ethnic association, as also pointed out by
We changed the wording as suggested (Page 34, Lines 321-322).

Comment #13
-Page 20 line 180. Change "pathways" to "mechanisms" and change "represent" to "stem from". This is also a run-on sentence that should be chopped into two sentences.
We changed the wording as suggested, and split the sentence into two sentences (Pages 34-35, Lines 322-324).

Comment #14
-Page 30 line 277. Change "raise" to "increase" and change "penetrance to" to "penetrance in"
We changed the wording as suggested (Page 44, Line 434).

Comment #15
-Page 31 line 293. You should share the sense probe images to let readers compare and contrast with the antisense probe signal.
We inserted sense probe images in Figure 3b (Page 13).

4. Line 150. Add the word "performed". "…genome assembler (Platanus2), and performed additional de novo.."
We added "performed" in the sentence.
It would also be great to have the intron 1 region indicated in this image, or to indicate exonic sequence, which should be conserved.
In Figure 5a, each sequence is compared relative to the one below it. To clarify this point, As suggested by Reviewer #1, we also performed an additional dot-plot analysis around the pannier locus, and clarified that the traces of the inversions are located within the first intron of the longest pannier transcript. (Supplementary Figure 5)