Identification of a TNF-TNFR-like system in malaria vectors (Anopheles stephensi) likely to influence Plasmodium resistance

Identification of Plasmodium-resistance genes in malaria vectors remains an elusive goal despite the recent availability of high-quality genomes of several mosquito vectors. Anopheles stephensi, with its three distinctly-identifiable forms at the egg stage, correlating with varying vector competence, offers an ideal species to discover functional mosquito genes implicated in Plasmodium resistance. Recently, the genomes of several strains of An. stephensi of the type-form, known to display high vectorial capacity, were reported. Here, we report a chromosomal-level assembly of an intermediate-form of An. stephensi strain (IndInt), shown to have reduced vectorial capacity relative to a strain of type-form (IndCh). The contig level assembly with a L50 of 4 was scaffolded into chromosomes by using the genome of IndCh as the reference. The final assembly shows a heterozygous paracentric inversion, 3Li, involving 8 Mbp, which is syntenic to the extensively-studied 2La inversion implicated in Plasmodium resistance in An. gambiae involving 21 Mbp. Deep annotation of genes within the 3Li region in the IndInt assembly using the state-of-the-art protein-fold prediction and other annotation tools reveals the presence of a tumor necrosis factor-alpha (TNF-alpha) like gene, which is the homolog of the Eiger gene in Drosophila. Subsequent chromosome-wide searches revealed homologs of Wengen (Wgn) and Grindelwald (Grnd) genes, which are known to be the receptors for Eiger in Drosophila. We have identified all the genes in IndInt required for Eiger-mediated signaling by analogy to the TNF-alpha system, suggesting the presence of a functionally-active Eiger signaling pathway in IndInt. Comparative genomics of the three type-forms with that of IndInt, reveals structurally disruptive mutations in Eiger gene in all three strains of the type-form, suggesting compromised innate immunity in the type-form as the likely cause of high vectorial capacity in these strains. This is the first report of the presence of a homolog of Eiger in malaria vectors, known to be involved in cell death in Drosophila, within an inversion region in IndInt syntenic to an inversion associated with Plasmodium resistance in An. gambiae.


Results
Assembly of the genome of IndInt strain. Fourteen generations of isofemale lines from a parent T6 lab-colony of the intermediate-form of An. stephensi (IndInt) were grown, both to achieve homogeneity required for high-quality assembly from sequencing of pooled DNA from multiple insects and to enrich for Plasmodium resistant phenotype 20 . Multiple tools were used to obtain contig-level assembly of IndInt from 60X coverage of reads sequenced using PacBio Sequel II technologies. The assembly statistics obtained from the various tools are shown in Table 1. In the first round, the best assembly statistics were obtained using the WTDBG2 tool with a L50 value of 6 and N50 9.1 Mbp. The next best statistics was shown by the tool Flye assembler with a L50 of 15 and N50 value of 3.9 Mbp.
To improve the contiguity of the assembly, the four different contig-level assemblies from CANU, FALCON, FLYE, and WTDBG2 [22][23][24][25] were merged in different combinations using the tool Quickmerge 26 . Assembly statistics www.nature.com/scientificreports/ were computed for each of the merged combinations to determine the best merged assembly (Table 1). During the second round, the assembly with the best statistics was obtained by merging assemblies from FLYE and WTDBG2, with a L50 of 4 and a high N50 of 17 Mbp. The computed genome size of this merged assembly was comparable to the actual size of An. stephensi genome, and was used for scaffolding. The scaffolds of the errorcorrected contigs were built using the SSPACE assembly tool with simulated mate-pairs of varying sizes from the assembly of IndCh strain as reference. The L50 value for the scaffold level assembly was 3 and the N50 value significantly increased from 17 to 37 Mbp (Table 1, last row). In order to check any heterozygosity from potential inversions, haplotype phasing of the assembly of the IndInt strain was done using a set of primary and associated contigs from FALCON and FALCON-unzip tools, respectively 23 . In the presence of heterozygosity, the mapped reads will be split between two contigs that represent the two forms of the heterozygous allele, resulting in reduced coverage. This creates two peaks in the histogram of read counts, one for the haploid level coverage from heterozygous regions and the other for the diploid level coverage from homozygous regions. A hump in the histogram indicated the presence of potential inversion heterozygosity in the genome of IndInt (Fig. 1a).
Synteny between the assemblies of IndInt and IndCh strains is shown in Fig. 1b. The block view clearly suggests that IndInt is homozygous for the standard form of 2Rb (2R +b /2R +b ) inversion similar to the IndCh strain. However, in the 3L arm of chromosome 3, IndInt shows the clear presence of an inversion not seen in the IndCh assembly. The locus of the physical markers revealed that the markers from 42A to 44C are reversed in the assembly, which matched with a reported breakpoint for the 3Li inversion in An. stephensi 9 . Based on the photomaps from several individuals of IndInt strain (see Fig. 1e), it is clear that 48% of the individuals are heterozygous for the 3Li inversion (3L +i /3Li), with the rest (52%) being homozygous for the standard form of 3Li (3L +i /3L +i ) ( Supplementary Fig. 1). It should be mentioned that the percentage heterozygosity reported here is from an outgrown generation maintained in the lab after the generation used for assembly. Markers 41A, 41B, 41C, and 46D shown in the standard form reported for 3Li (top Fig. 1c) are not detectable in the IndInt assembly using the BLASTN tool, perhaps because of limitations of the assembly tools in handling noise generated by reads representing the two arms near the inversion breakpoints from the heterozygous arms. Figure 1d suggests that the IndInt assembly is comparable to the high-quality genomes of other Anopheles species, Drosophila, and humans at both the contig and scaffold levels.
Validation of the 2Rb breakpoint region in IndInt. The IndInt strain is homozygous for the standard form of the 2Rb inversion, making it the second assembled genome for one of the three possible 2Rb genotypes, such as 2R +b /2R +b . This provides an opportunity to further refine and validate the breakpoint regions described previously by our group 6 . A BLASTN of the sequences from breakpoint regions from IndCh including 1000 bases to the left and right of both breakpoints, against the IndInt assembly, shows 100% identity for all the 2079 bases from the centromere-proximal breakpoint and 99.8% identity for all the 2009 bases from the centromeredistal breakpoint. We also provide the actual alignment with 50± nucleotides around the breakpoints in Supplementary Fig. 2 This confirms the 2Rb genotype to be 2R +b /2R +b for IndInt.

Interrogation of 3Li breakpoints in IndInt.
In order to define the breakpoints for the 3Li inversion, a contact map was generated by mapping the HiC reads from the UCI strain onto the final IndInt assembly. Since UCI is homozygous for the standard 3Li (3L +i /3L +i ) configuration and IndInt is heterozygous (3L +i /3Li), two expected butterfly-like structures in the chromosome 3L were observed in the contact map ( Fig. 2a) defining the location of both the proximal and distal breakpoints of the 3Li inversion in the IndInt assembly. By magnifying each of the butterfly structures to a resolution of 5 Kbp per dot, the proximal and distal breakpoints of the 3Li inversion were estimated to be between bp 22,545,000 to 22,550,000, and between bp 31,555,000 to 31,560,000 respectively.
Synteny of 3Li inversion with 2La of An. gambiae. Synteny of the final IndInt assembly was also plotted against the An. gambiae PEST strain 27 . Figure 2b shows that chromosome X, 2R, and 3R of An. gambiae are homologous to that of IndInt, respectively. However, the synteny between chromosomal arms 2L and 3L of An. gambiae are switched to chromosomal arms 3L and 2L of IndInt, respectively, as reported elsewhere 6 . Figure 2c shows the synteny between the 2La inversion of An. gambiae with the 3Li region of IndInt. The 2La region of An. gambiae was taken from a previous study 28 . The circle view (Fig. 2c) of this synteny reveals that the entire 3Li inversion region of IndInt consisting of 8 Mbp is syntenic to a big part of the 2La inversion region of An. gambiae comprising 21 Mbp. Since the 2La inversion of An. gambiae is associated with Plasmodium sensitivity 12,13 , by analogy, we infer that the 3Li of IndInt may also be associated with the Plasmodium resistance observed for IndInt strain. Furthermore, it is reported that the strains of An. gambiae that are homozygous for the standard form of 2La (2L +a /2L +a ) show higher vectorial capacity than the corresponding strains that are heterozygous for the inverted form. Similarly, the strain IndCh with higher vectorial capacity is homozygous for the standard form of 3Li (3L +i /3L +i ) and IndInt is heterozygous for the 3Li inversion (3L +i /3Li), suggesting the potential role of the 3Li inversion in Plasmodium sensitivity.
Functional characterization of genes within the 3Li region. Within   www.nature.com/scientificreports/ species 29 , while also resisting Plasmodium infection 30 . We found 99 missense mutations in the IndInt strain within 23 copies of the cuticle genes, as compared to the IndCh strain of the type form, which has a homozygous configuration for the standard form of 3Li, like the UCI strain. In order to find genes implicated in immune response within the 3Li region of IndInt, we intersected the 494 genes within this region with a list of 589 genes implicated in immunity in insects 16 . Out of these 589 genes, we found only 17 genes within the 3Li region (Table 2) including NFkB, a transcription factor implicated in cytokine production in humans. www.nature.com/scientificreports/ The genes implicated in Plasmodium resistance in An. gambiae 31 LRIM1, APL1C, and TEP1 are not within the 3Li region in IndInt. The TEP1 protein in IndInt is located on chromosomal arm 2L. The LRIM1 gene is outside the 3Li region, while the gene APL1C is located immediately outside the distal breakpoint of the 3Li region. The LRIM1 gene has leucine-rich repeats and forms a disulfide-bonded complex with APL1C (PDB ID: 3OJA 31 ), which stabilizes the mature TEP1 and promotes its binding to parasites to trigger pathogen destruction. Also, more recently, a knockout of LRIM1 in An. stephensi was shown to play a role in vector competence 32 . Furthermore, LRIM1 and APL1C are within the 2La region in An. gambiae, which is three times larger than the 3Li region in IndInt. In IndInt, the LRIM1 has four and six missense mutations compared to IndCh and UCI strains of type-form, respectively. However, since the gene APL1C shows no missense mutation in IndInt compared to IndCh, we expected other unannotated paralogs of LRIM1 within the 3Li region of IndInt. Deep annotation of the 30 functionally uncharacterized genes within the 3Li region shown in Supplementary Fig. 3 with fold-prediction algorithms, revealed two novel leucine-rich proteins, g22432 and g22212, similar in structure to LRIM1 (Supplementary Fig. 3a and 3b). These two LRIM1-like genes could potentially work with the intact APL1C gene in IndInt to launch a host defense system to antagonize the malaria parasite. The high conservation of these, otherwise uncharacterized genes across Anopheles species, shown in the multiple sequence alignments of g22432 and g22212, suggests functional conservation across species (Supplementary Fig. 4a and 4b).
Recently, members of LPS-induced TNF-alpha factors, LITAFs, have been shown to regulate anti-Plasmodium immunity in An. Gambiae 33,34 . A search in the assembled genome of An. stephensi identified four members of LITAF-like transmembrane genes (Supplementary Text 1). LITAFs are known to send signals to the nucleus to transcribe TNF-alpha molecules in humans 35 . In fact, LPS-induced animal models were used in developing drugs against TNF-alpha to treat auto-immune disorders. We were also able to find homologs of all the genes involved in this signaling pathway in IndInt (Supplementary Text 2).
We found an Eiger homolog, g22826, which is within the 3Li region of IndInt and is missing in the reported list of 361 genes linked to the immune repertoire in An. stephensi 5 . A crystal structure of the complex of Eiger and its receptor from Drosophila has revealed that Eiger belongs to the TNF superfamily and its two receptors, Wengen and Grnd, are members of the TNFR superfamily (PDB: 6ZT0 36 ) [37][38][39] . A search for an active TNFRlike gene in An. stephensi, using the HMM model (PF00020) of the cysteine-rich domain (CRD) from PFAM database, identified two potential hits g1129 and g18030 in IndInt. It should be mentioned that while Wengen displays detectable homology with TNFRSF, predicted Grnd homologs from many sequenced species, including malaria vectors, remain uncharacterized by sequence-based annotation tools. The gene, g1129, from chromosome X of the IndInt strain encodes a Wengen homolog of Drosophila, with a well-defined extracellular CRD, and a transmembrane region. The Grnd homolog (g18030) in the IndInt strain consists of an intact CRD region sandwiched between a predicted signal sequence and transmembrane region (Supplementary Fig. 5c and 5d). We found that the Eiger gene has a frame-shift mutation at the tail-end of the very conserved folding domain in the IndCh strain. Also, in the UCI and STE2 strains, the predicted Eiger homologs are missing the critical Table 2. Genes within the 3Li region of IndInt that are reported to be implicated in immune response by Waterhouse et al. 16 .  Fig. 6). In Fig. 3a, a 3D Fig. 3b. The choice of templates for comparative modeling of the CRDs of the two receptors is based on the conserved cysteine patterns of the ligand binding regions, which are different for the two receptors Wengen and Grnd. Interestingly, the Wengen binding domain has a cysteine pattern similar to TNFR-1A of humans and that of Grnd is similar to that from TNFR-1B, suggesting an evolutionary-conserved link between Eiger and TNF-alpha mediated signaling in arthropods and vertebrates, respectively.

A functional TNF-alpha/TNFR pathway in malaria vectors with lower vectorial capacity.
It is now well established that the Drosophila genome harbors two functional TNFR receptors and a single ligand, Eiger, belonging to the TNF-TNFR system 36 . In order to predict whether the homologs of the Eiger and its cognate receptor genes in An. stephensi (IndInt) are potentially functional, we used signal peptide and transmembrane predictions on the respective protein sequences derived from IndInt ( Supplementary Fig. 5). The Eiger homolog shows a transmembrane region at its N-terminus ( Supplementary Fig. 5a) consistent with type II configuration common to all members of the TNF superfamily across organisms. The two receptor homologs show clear transmembrane regions (Supplementary Fig. 5b and 5c) following the cysteine-rich folding domain. However, only Grnd, but not Wengen, has a predicted signal peptide signature ( Supplementary Fig. 5d) N-terminal to the CRD, consistent with the localization of these two genes in Drosophila, where Wengen is almost always localized in intracellular vesicles 36 . We also found the homolog of the TACE gene in IndInt, which is a TNF-alpha processing metalloprotease that converts the type-II membrane-anchored ligand into its soluble/   41 . In the Eiger gene from IndInt, we also found a cleavage site (AQ) N-terminal to the folding domain that is known to be specific to TACE 42 . The gene TACE itself is rendered active by genes called iRhom1 and iRhom2, which are actively being developed as potential targets to inhibit TNF-alpha processing. Interestingly, there are homologs of iRhoms in IndInt, albeit with a much longer N-terminal region than human iRhom1 and 2.
In order to show that all the genes in the cytoplasm necessary for Eiger-Wgn-Grnd mediated signaling are encoded by the genome of IndInt, we identified homologs of all the pathway genes from TNF-alpha and eiger signaling pathways from KEGG [43][44][45] (Supplementary Text 2). Figure 4 shows genes implicated in Wengen-based (left) and Grnd-based (right, hypothesis) based on the homology of the receptors to TNFR-1A and -1B, respectively. Several genes are disrupted in the other strains of type-form viz IndCh, UCI and STE2, as shown in Fig. 4, Supplementary table 1 and Supplementary table 2. The homologs for genes marked by red and purple stars are disrupted and absent, respectively, in the IndCh genome, potentially disrupting the Eiger-mediated signaling pathway. These genes are Eiger, ADAM17, Tab2, IKKb, NFkB, and Caspase. Interestingly, genes with blue stars are missing in all 4 assembled genomes.
It should be mentioned that a majority of these genes including Eiger, Grnd, and Wengen are missing in the collated list of 589 immune related genes in insects 16 and, more specifically, missing from a list of 361 immune related genes in An. stephensi reported recently in the UCI strain 5 .

Discussion
We report a high-quality genome assembly of an isofemale line of the intermediate-form of An. stephensi, Ind-Int, displaying low vectorial capacity relative to the strain IndCh of type-form 20 . A comparative analysis of this assembly with those of type-forms reveals a 3Li inversion (in a heterozygous configuration) that is uniquely . Pathways leading to Eiger-Grnd/Wgn-mediated signaling collated from TNF-TNFR and Eiger-Wgnmediated signaling pathways obtained from the KEGG database [43][44][45] and validated using other sources 46,47 . Red stars represent the disrupted genes in the IndCh genome, purple star is the gene missing in IndCh but present in IndInt, blue stars are the missing pathway genes in all four assembled An. stephensi genomes. The inset table shows the status of the genes in all 4 strains (IndInt, IndCh, UCI and STE2) of An. stephensi for which a highquality genome is available.  Fig. 2c, this inversion is syntenic to the 2La inversion region in An. gambiae, the heterozygous genotype of which is shown to be associated with Plasmodium resistance 12 ; thus, providing an opportunity to zero-in on genes that may be offering resistance to Plasmodium.
Here, for the first time, we report the presence of an Eiger homolog, a member of the TNF family, within this inversion region (Fig. 3b). Considering that Eiger has been shown to cause cell death in Drosophila 37 , it is tempting to hypothesize apoptotic role for Eiger in defence of parasites. Furthermore, considering that the Eiger gene in the IndCh strain harbors a frameshift mutation within the structurally folding domain and considering the extensive role of TNF-alpha in human immunity, it is tempting to predict a role for the Eiger gene in the innate immune response of An. stephensi against Plasmodium in IndInt strain. Similarly, the predicted Eiger genes in the STE2 and UCI strains are both disrupted in the N-terminal segments lacking a transmembrane region and several genes in the Eiger pathway are disrupted in the IndCh strain (Fig. 4, Supplementary table 2). These observations suggest a testable correlation between high vectorial capacity in An. stephensi strains with a dysfunctional Eiger-mediated signaling pathway, which has a lower vectorial capacity for Plasmodium with a potentially functional Eiger-signaling pathway. Furthermore, the increased expression of Eiger-Wgn genes in response to infection by Nora virus in Drosophila 48 demands an exploration of the role of Eiger mediated signalling in Plasmodium resistance in An. stephensi and in other malaria vectors, in general. More recently, in support of this, it has been shown that inhibitors of JNK 49 , a downstream gene of the Eiger/TNF-alpha pathway, abolish resistance to Plasmodium in both An. gambiae 50 and An. stephensi 51 .
The gene TNF-alpha and its two cognate receptors in humans, TNFR-1A and -1B, are implicated both in adaptive and innate immune response. Indeed, drugs inhibiting this pathway are now available to treat various autoimmune diseases such as rheumatoid arthritis, Crohn's disease, and psoriasis. The two cognate receptors of Eiger, Wengen and Grnd, show homology in their cysteine patterns to TNFR-1A and TNFR-1B respectively (Fig. 3c), suggesting evolutionary conservation of this signaling pathway. Consistent with the ability of the Eiger gene to participate in innate immune response analogous to signaling by TNF-alpha, we found genes in An. stephensi (IndInt) that are homologous to genes implicated in TNF-alpha mediated signaling (Fig. 4). Most interestingly, Eiger is not only a type-2 membrane protein like TNF-alpha, but the presence of homologs of ADAM17 and those of iRhoms in IndInt suggests that Eiger is rendered soluble/active similar to TNF-alpha, thus strengthening the evolutionary conservation of this entire pathway. Considering that many genes downstream of the TNF-alpha pathway are involved in multiple processes, it is surprising that a majority of these are missing from the list of genes with immune repertoire reported for insects 16 (Supplementary Table 1).
It is known that human TNF-alpha reduces Plasmodium falciparum growth and activates calcium channels in human malarial parasites 52 . It is reasonable to hypothesize that mosquito vectors may also elicit similar responses via the Eiger-Wengen-Grnd signaling pathway in parasites to control their growth. Furthermore, the Eiger mediated cell-death in Drosophila could be reversed by JNK inhibitors 36 . Most importantly, JNK inhibitors reverse Plasmodium resistance in both An. gambiae 50 and An. stephensi 51 . These observations support the hypothesis that a functional Eiger pathway in malaria vectors may reduce the growth of Plasmodium falciparum, similar to the action of TNF-alpha in the human host.
Drosophila and mosquitoes diverged some 350 million years ago, suggesting that the TNF-alpha based innate immunity may have evolved before this point of inflection between vertebrate and invertebrates. A search for TNF and TNFR-like molecules in the Alpha-fold database of C. elegans reveals ~ 14 TNF-like molecules without any TNFR-like homologs. The TNF-like molecules in C. elegans are actually cerebellin homologs. Interestingly, cerebellins are also both type 2-membrane proteins and form threefold functional oligomers just like members of the TNF superfamily. As proof of this, there are homologs of glutamate receptors and neurexin genes encoded by C. elegans, which are involved in forming synaptic structures in rats 53 . Since the TNF-alpha signaling system is conserved in flies, mosquitoes and vertebrates, missing/lost in nematodes, innate immunity by the TNF-alpha mediated signaling may predate the invertebrate-vertebrate split and TNF-like fold evolution may even be older.

Conclusion
The three forms of An. stephensi (type, intermediate and mysorensis) displaying varying vector competence, which are clearly identifiable by their number of egg ridges, offered an ideal system to discover genes implicated in Plasmodium resistance by interrogating genomes of strains with low vectorial capacity. Here, we have demonstrated that a bottom-up approach starting from phenotype to gene discovery, is more efficient in identifying hidden/divergent genes implicated in Plasmodium sensitivity. We have identified several genes in An. stephensi from gene families known to be involved in innate immunity including LITAFs, novel homologs of LRIM1 within the 3Li region, a homolog of TNF-alpha within the 3Li region, and majority of the genes from TNF-alpha signaling pathway. In particular. this is the first report of a homolog of Eiger in vectors with demonstrated role in innate immunity particularly apoptosis. This approach has opened up several novel avenues/molecular targets for advancing our understanding of the evolution of Plasmodium resistance in vectors and, perhaps, offer effective ways to manage malaria in the near future.

Materials and methods
Maintenance of An. stephensi in insectary. Larvae of An. stephensi were collected from Sriramanahalli area of Bangalore rural (13.4310° N, 77.3310° E) from the state of Karnataka, India, and successfully colonized in the insectary. Larvae were provided with larval food prepared by mixing Brewer's yeast and dog biscuits at a ratio of 30:70. Pupae were bleached with 1% sodium hypochlorite for 1 min and kept in adult rearing cages (Bugdorm-4S3030) (W32.5 × D32.5 × H32.5 cm). The adults were fed on a mixture of 8% sucrose, 2% glucose solution mixed with 3% multivitamin syrup (Polybion LC®) 54 . Adults were maintained in the insectary at 28 ± 1 °C, RH 75 ± 5% and 12:12-h day and night photoperiod cycles. Adult females were provided with blood on day 7 or 8  55 . On day 3, each gravid female was transferred carefully to a single ovicup. Each ovicup was covered with a nylon net. A cotton ball soaked with sugar solution was kept on the top of each ovicup. The line from single female (G 0 ) was selected to generate the isofemale lines that laid the highest number of eggs and had highest percent hatchability with egg ridge numbers 15-17. The eggs thus collected from the single female were allowed to hatch, and the larvae were provided with larval food and reared to adults as per the protocol mentioned earlier. Emerged adult siblings (G 1 ) were kept in a mosquito rearing cage and allowed to inbreed (sibling mating). Mated females were blood fed, and the same procedure was followed as mentioned earlier and continued till further generations. As of July 2022, the IndInt isofemale colony has been maintained in our insectary for 55 generations.
Karyotyping of chromosome 3 of IndInt strain. Polytene chromosomes were prepared from the ovarian nurse cells collected from semi-gravid females of the IndInt isofemale lines as per the method of Ghosh & Shetty, which was adapted from our recent publication 6 . The semi-gravid females were anesthetized and placed on a microslide in a drop of phosphate buffer. The ovaries were pulled out gently and fixed in modified Carnoy's fixative (methanol: acetic acid, 3:1) for 2-3 min. The remaining mosquito body was preserved in an Eppendorf tube with a few drops of phosphate buffer solution and kept at − 20 °C for PCR analysis. After fixation, the ovaries were stained with lacto acetic orcein for 15-20 min. After staining, 60% acetic acid was added to it and a clean coverslip was placed on top of the stained sample. Gentle pressure was applied on the cover glass for squashing. The edges of the coverslip were sealed with nail polish to avoid evaporation. The slides were examined under a microscope using a 40X objective lens for chromosome inversions. The inversion nomenclature and their frequency were recorded 56 .
Assembly. PacBio reads from the IndInt strain were assembled independently using FLYE 24 and WTDBG2 25 assemblers. The resulting assemblies were combined using Quickmerge 26 where the FLYE assembly was taken as the query and the WTDBG2 assembly served as the reference. Two rounds of Arrow polishing (PacBio gcpp v2.0.2) using PacBio reads, followed by two rounds of Pilon polishing (v1.22) using 100 × Illumina reads, were carried out on the merged assembly. In order to proceed with reference based de novo assembly, simulated mate-pair reads were generated from the IndCh standard assembly (https:// doi. org/ 10. 1038/ s41598-022-07462-3) using the tool WGSIM (https:// github. com/ lh3/ wgsim) for the sizes 50 Kbp, 100 Kbp, 250 Kbp, 500 Kbp, 1 Mbp, 2.5 Mbp and 5 Mbp. These simulated mate-pair reads along with the contig level assembly from FLYE + WTDBG2 served as input to obtain a scaffold level assembly using the SSPACE 57 tool. DNA physical marker sequences for each chromosome arm published by Jiang et al. 21 . were downloaded and aligned against the scaffolds using BLAST, in order to assign the chromosomal positions to the scaffolds. The scaffolds were stitched based on the order and orientation of the markers into pseudo-chromosomes. 100 N's were added between two scaffolds during stitching. Physical marker sequences were realigned to the stitched chromosomes for validation.
Haplotype phasing. The assemblers FALCON and FALCON-Unzip 23 were used to phase the IndInt genome into haplotypes. Raw IndInt PacBio reads were used by the FALCON assembler to produce a set of primary contig files and an associate contig file representing the divergent allelic variants. The output of primary and associate contig files were then used by FALCON-Unzip to produce partially phased primary contigs (all_p_ctg.fa) and fully phased haplotigs (all_h_ctg.fa), which represent divergent haplotypes. Polishing was performed on the phased contigs by FALCON-Unzip using Arrow. This procedure was adapted from our recent publication 6 .
Purge haplotigs. The tool 'Purge Haplotigs' 58 was used to determine the degree of heterozygosity in the IndInt strain. PacBio reads of IndInt strain were mapped to the consensus primary assembly (cns_p_ctg.fasta) obtained from FALCON-Unzip to obtain an aligned BAM file. Coverage analysis was performed on the BAM file to generate a read-depth histogram.

Identification of 3L breakpoints in IndInt.
The tool HiC-Pro 59 was used to generate a contact map by mapping the HiC reads from the UCI strain onto the genome of IndInt with different inversion genotype to produce a butterfly-like structure representing the breakpoints, which were visualized and extrapolated from the contact map of the tool Juicebox 60 . The breakpoints were validated by mapping the PacBio and Illumina reads onto the respective genomes. Alignment using 'unimap' was performed between the genomes of the IndInt and UCI strains, to determine the precise breakpoints for the 3Li inversion to consolidate with the findings from HiC-Pro.
Gene annotation. AUGUSTUS (version 3.2.3) 61 , an eukaryotic gene prediction tool, was used to find protein-coding genes for all the assembled genomes. The model organism closest to An. stephensi, Ae. aegypti, which was made available by AUGUSTUS, was used for gene prediction. This procedure was adapted from our recent publication 6 .