Unique features of a global human ectoparasite identified through sequencing of the bed bug genome

The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.

Distribution of transcription factor families across insect genomes. Heatmap depicting the abundance of transcription factor (TF) families across a collection of insect genomes.
Each entry indicates the number of TF genes for the given family in the given genome, based on presence of DNA binding domains. Color key is depicted at the top (blue means the TF family is completely absent). Species and TF families were hierarchically clustered using average linkage clustering. C. lectularius is boxed, and two TF families discussed in the text are indicated with triangles.
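The species and TF-family ordering in this heatmap comes from average-linkage hierarchical clustering. The minimal Python sketch below illustrates the idea on per-species vectors of TF family counts. At each step it merges the two clusters whose members have the smallest mean pairwise distance. The Euclidean distance, the function names and the toy counts are illustrative assumptions, because the caption does not state which distance metric was used. Applying the same routine to the transposed matrix would cluster TF families instead of species.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping a label (e.g. a species) to a list of TF family counts.
    Returns merge steps as (cluster_a, cluster_b, distance) tuples, where
    clusters are frozensets of labels.
    """
    clusters = [frozenset([label]) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            total = sum(euclidean(profiles[x], profiles[y]) for x in a for y in b)
            d = total / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, d))
    return merges


# Hypothetical counts for three TF families in three placeholder genomes.
toy = {
    "species_A": [10, 0, 3],
    "species_B": [11, 0, 3],
    "species_C": [2, 5, 9],
}
for a, b, d in average_linkage(toy):
    print(sorted(a), sorted(b), round(d, 2))
```

The naive pairwise search is cubic in the number of genomes. That cost is acceptable for a few dozen species. A production analysis would normally call an established clustering library instead.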

Supplementary Figure 42. Confirmation of candidate Lateral Gene Transfers (LGTs).
LGT regions were amplified using one of two procedures; either Phusion High-Fidelity … . Amplicons were purified using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel; Düren, Germany) and sequenced with the primers used for amplification.
These notes were provided to complement the manual annotation process. The sections have been edited minimally for formatting, but largely remain as is to convey information provided directly from each annotation group.

SUPPLEMENTARY NOTE 1
Community curation
A total of 1,352 gene models and 1,479 mRNA models were curated, in addition to 2 pseudogene models. Some models received only functional curation, which includes gene and mRNA names, symbols, descriptions, PubMed references, Gene Ontology categories, cross-references to other genes, and comments explaining annotation actions as necessary; others were both structurally and functionally annotated. Curators handled a diversity of genes, although some gene families dominated. For example, 146 cuticle proteins from the CPR family were curated, along with 102 unclassified cuticle proteins; 114 chemoreceptors were manually annotated, comprising 30 IRs, 36 GRs, and 48 ORs. Information about the C. lectularius genome project is available at its i5k Workspace@NAL organism page (https://i5k.nal.usda.gov/Cimex_lectularius). All tracks used by the general curation group, as well as the Official Gene Set, are publicly accessible via the JBrowse genome browser 8 at the i5k Workspace@NAL (https://apollo.nal.usda.gov/cimlec/jbrowse/) 9 . The genome assembly and Official Gene Set can also be searched via BLAST+ 10 (https://i5k.nal.usda.gov/webapp/blast/). Information on all manually curated genes is provided as a summary table (Supplementary Data 2).

SUPPLEMENTARY NOTE 2
Significance of antioxidant genes in bed bug biology
Blood meals are rich in pro-oxidants and contain high concentrations of compounds that lead to the formation of reactive oxygen species (ROS). Digestion of hemoglobin, and specifically the release of heme, is known to generate large amounts of ROS. A strong antioxidant enzyme system is therefore required to counteract blood meal-induced oxidative stress. We identified 36 genes belonging to 8 primary and secondary antioxidant gene families (excluding glutathione transferases) in the bed bug genome (Supplementary …). Bed bugs possess all the antioxidant enzymes found in other blood-feeding insects such as Rhodnius, Pediculus, and Anopheles. Interestingly, however, preliminary analysis shows that bed bugs have more catalase (Cat) and thioredoxin reductase (TrxR) genes than A. gambiae, P. humanus, and D. melanogaster.
Catalases prevent the formation of free hydroxyl radicals by converting hydrogen peroxide into water and oxygen. TrxRs catalyze the reduction of oxidized thioredoxin (TrxS2 → Trx(SH)2), which regenerates this antioxidant enzyme in its active form. Previous research has shown that a strain of Anopheles gambiae refractory to Plasmodium infection exhibited differential expression of certain thioredoxin, catalase, and superoxide dismutase genes following blood feeding 11 . Analyzing expression of antioxidant genes in the bed bug before and after blood feeding could identify the genes most important for ROS detoxification, heme digestion, and immunity. Finally, three bacterial catalases were also annotated. These likely reflect contamination from endosymbiont DNA, as noted in the Apis mellifera and Drosophila genome sequencing projects 12,13 .
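For reference, the two enzymatic reactions discussed in this note are written out below in standard biochemical notation. Both are textbook reactions rather than results of this study. The NADPH cofactor shown for TrxR is general background and was not examined here.

```latex
\begin{align}
2\,\mathrm{H_2O_2} &\xrightarrow{\text{catalase}} 2\,\mathrm{H_2O} + \mathrm{O_2} \\
\mathrm{Trx\text{-}S_2} + \mathrm{NADPH} + \mathrm{H^+} &\xrightarrow{\text{TrxR}} \mathrm{Trx\text{-}(SH)_2} + \mathrm{NADP^+}
\end{align}
```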

SUPPLEMENTARY NOTE 3
Aquaporin genes in the bed bug genome
We identified 7 aquaporin genes in the bed bug genome: the typical Drip, AQP2, AQP4 (two sequences), AQP5, AQP6, and Bib (Supplementary Fig. 1). This number falls within the range found in most insects (6–8), and Cimex has members of each group previously identified for insects 14 .

SUPPLEMENTARY NOTE 4
Supplementary text for the Cimex lectularius chemoreceptors
The chemoreceptor (OR, GR, and IR) families were manually annotated. Briefly, TBLASTN searches of the genome assembly were performed using Acyrthosiphon, Pediculus, and Drosophila proteins as queries. Gene models were then manually assembled in the text editor TEXTWRANGLER. Iterative searches were conducted with each new Cimex protein as query until no new genes were identified in each major subfamily or lineage. Additional searches included BLASTP and PSI-BLASTP searches 15 of both the MAKER and AUGUSTUS gene model proteins. The BLASTP searches of the AUGUSTUS proteins were the most useful, and the subsequent PSI-BLASTP searches turned up only one additional divergent OR. All of the Cimex genes and encoded proteins are detailed in Supplementary Data 6-8. The gene models for these have been updated in the WebApollo genome browser.
Unusually, there were no long pseudogenes in any of the three families. There were, however, several shorter gene fragments that were excluded from Supplementary Data 6-8 and from the analyses because they encode less than 50% of a typical family member. All Cimex, Acyrthosiphon, and Pediculus proteins in each family, as well as select other insect GRs and all Drosophila melanogaster IRs, were aligned in CLUSTALX v2.0 16 using default settings, and problematic gene models were refined in light of these alignments. For the GRs, whole-family alignments appeared unsatisfactory. Separate alignments of the sugar, carbon dioxide, fructose, and remaining GRs by species were therefore performed and then combined in profile alignment mode to obtain the final alignment. PhOr11 and PhOr12 were too short to include, and ApGr12 was removed because it is so divergent that it disrupts the alignments.
For phylogenetic analysis, the poorly aligned and variable length N-terminal and C-terminal regions were excluded, as were any long internal length difference regions, for example between the longer ORCO proteins and most of the other ORs, and multiple regions within the IR alignment, using TRIMAL v4.1 17 , with positions retained only if present in 80% of the sequences. Phylogenetic analysis was carried out using maximum likelihood executed in PHYML v3.0 18 with default settings. Trees were colored and arranged in FIGTREE v1.4 (http://tree.bio.ed.ac.uk/software/figtree/), declaring roots as indicated in each figure legend.
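The occupancy rule used for trimming retains an alignment position only if at least 80% of sequences have a residue there. The short Python sketch below illustrates that rule. It is not trimAl's implementation, and the text gives no trimAl settings beyond the 80% threshold. The function name, the "-" gap character and the toy alignment are assumptions for illustration.

```python
def filter_columns(alignment, min_occupancy=0.8, gap="-"):
    """Keep alignment columns where at least `min_occupancy` of sequences are non-gap.

    alignment: list of equal-length aligned sequences.
    Returns (trimmed_sequences, kept_column_indices).
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    kept = [
        i for i in range(length)
        # Count non-gap characters in column i and compare with the threshold.
        if sum(seq[i] != gap for seq in alignment) >= min_occupancy * n_seqs
    ]
    trimmed = ["".join(seq[i] for i in kept) for seq in alignment]
    return trimmed, kept


# Hypothetical five-sequence alignment; column 2 is occupied in only 3 of 5 sequences.
aln = ["MK-LV", "MKALV", "M--LV", "MKAL-", "MKALV"]
trimmed, kept = filter_columns(aln)
print(kept)
print(trimmed)
```

Columns that are exactly at the threshold (4 of 5 here) are retained, matching "present in 80% of the sequences".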
All peptide sequences for the GRs, IRs, and ORs can be acquired by request from Hugh Robertson (hughrobe@uiuc.edu) or Joshua Benoit (joshua.benoit@uc.edu).

The odorant receptor family
The odorant receptor (Or) family of seven-transmembrane proteins in insects mediates most of insect olfaction (e.g. Su et al. 19 , Touhara and Vosshall 20 ), with additional contributions from a subset of the distantly related gustatory receptor (Gr) family, for example, the carbon dioxide receptors in flies [21][22][23][24] , and a subset of the more recently described and unrelated ionotropic receptors (IRs) [25][26][27][28] . The Or family ranges in size from a low of 12 genes in the human body louse Pediculus humanus 29 to 400 in the ant Pogonomyrmex barbatus 30 . The other sequenced hemipteroid insect, the pea aphid Acyrthosiphon pisum, has 79 genes 31 , which is an average size for insects. Although most of the 60 Or genes in Drosophila melanogaster are scattered around the genome (e.g. Robertson et al. 32 ), with only a few in small tandem arrays, tandem arrays are more typical of other sequenced insects, especially those with large repertoires, from which it is inferred that these larger repertoires partly result from retention of gene duplicates generated in these tandem arrays by unequal crossing over (e.g. Robertson and Wanner 33 ).
The ClOr gene set consists of 48 gene models, with one model encoding two proteins through alternative splicing, for a total of 49 proteins (Supplementary Data 6). All are intact, which is somewhat unusual for insect odorant receptors, where there are usually at least a few pseudogenes. Two genes are nevertheless incomplete because of gaps in the assembly, but they are likely to be intact in the genome. Two more required fixes of the assembly. ClOr12 has part of an exon missing in a gap and it was fixed with raw reads. More remarkably, the ORCO gene also required a fix to replace an in-frame stop codon in the third exon. The raw reads reveal that this is a "polymorphism" involving a 3-bp indel, which in the intact version introduces an extra amino acid as well (thus the version in WebApollo is a readthrough of the pseudogenic allele). The exact nature of this situation is unclear, because approximately equal numbers of reads are present for each version in all libraries, including the long mate pair libraries that were generated from multiple individuals. It seems unlikely that a balanced polymorphism would have precisely 50% heterozygotes in the lab colony, unless it was a balanced lethal (presumably involving this gene and/or neighboring genes). Alternatively, both copies of this gene exist in the genome and their assembly was merged in the whole genome shotgun assembly.
Detailed examination of the two haplotypes has not yet resolved this issue.
The MAKER set of gene models employed as the Official Gene Set was particularly depauperate for these Ors, with partial models for just three genes. Most of them were, however, at least partially modeled in the AUGUSTUS set (Supplementary Data 6), with only 7 absent, although many required changes. The gene structures of these Ors share a few features with other insect Or genes. These include the commonly present final three phase 0 introns and a preceding phase 2 intron, which is usually preceded by a long first exon. In Cimex, however, this long first exon is often interrupted by a variety of additional introns, up to a total of six in Or39 (Supplementary Data 6; Supplementary Fig. 2).
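Intron phase describes where an intron interrupts the coding sequence relative to the codons. A phase 0 intron falls between codons, a phase 1 intron after the first base of a codon, and a phase 2 intron after the second base. If each exon's coding length is known, an intron's phase equals the cumulative coding length upstream of it modulo 3. The Python sketch below assumes exons are entirely coding, with UTRs excluded. The exon lengths are hypothetical and were chosen to reproduce the typical Or pattern: one phase 2 intron followed by three phase 0 introns.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron in a gene.

    cds_exon_lengths: coding lengths (bp) of consecutive exons, starting at
    the start codon. The phase of an intron is the number of coding bases
    upstream of it modulo 3.
    """
    phases = []
    coding_upstream = 0
    for length in cds_exon_lengths[:-1]:  # no intron follows the last exon
        coding_upstream += length
        phases.append(coding_upstream % 3)
    return phases


# Hypothetical gene: long first exon followed by four shorter exons.
print(intron_phases([500, 100, 102, 90, 120]))
```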
The phylogenetic tree reveals that the Cimex and Pediculus OR families contain mostly old lineages, with entirely species-specific expansions. In stark contrast, the Acyrthosiphon ORs consist of two large expansions where most genes are very young 31 .
Most of the bed bug ORs have long branches, with only a few recent duplications, e.g. Or7/8, 10/11, 13/14, 16/17, 18a/b, 25/26, 29/30, and 40/41. Most of these are tandem pairs in the genome, as are a few older pairs and one triplet (Supplementary Data 6). It therefore appears that the olfactory abilities of the bed bug and Pediculus have not changed much for a long time, whereas Acyrthosiphon has undergone enormous recent changes in its olfactory abilities 34 .
The Gr family ranges in size from a low of 6 genes encoding 8 proteins in the human body louse 29 and 10 genes in the honey bee Apis mellifera 33 to 215 genes encoding 245 proteins in the flour beetle Tribolium castaneum 36 . The pea aphid Acyrthosiphon pisum has 77 Gr genes 36 . The Gr family is more ancient than the Or family, which was clearly derived from within it 32,37 , and is found in the crustacean Daphnia pulex 38 , the centipede Strigamia maritima 39 , the tick Ixodes scapularis (HMR, unpublished), and many other animals (Saina et al. 40 ; HMR, unpublished). This evolutionary history is reminiscent of the more recently described ionotropic receptors (Irs) 24,25,27 , many of which also function in gustation 41,42 .
The ClGr gene set consists of only 24 models, encoding 36 proteins. This is smaller than the Gr set of most other insects except Apis mellifera 33 , Pediculus humanus 29 , Ceratosolen solmsi 43 , and Glossina morsitans 2 . As with the Ors, there are no long pseudogenes, although a few highly degraded pseudogenic fragments are present in the genome. Five genes were modeled as alternatively spliced, in the same fashion as several Grs in flies and some other insects: alternative long first exons are spliced into three shared short C-terminal exons. In the absence of transcriptome evidence, however, these models remain hypothetical. Some of these proteins are so divergent that we were concerned about missing some. In addition to TBLASTN searches, we therefore ran a final two-iteration PSI-BLASTP search of the AUGUSTUS modeled proteins to check for possible divergent genes/proteins. This search did not reveal any new models (the AUGUSTUS models more commonly included the existing Grs; see Supplementary Data 8). The AUGUSTUS modeling had access to all available insect Grs in GenBank for comparative information. It succeeded in building at least partial gene models for 19 of these 24 genes (but not the alternatively spliced transcripts); however, only one of these was incorporated into the Official Gene Set. Most of the AUGUSTUS models required at least one change. The basic gene structure for the entire ClGr set is a long first exon, followed by three short C-terminal exons separated by three phase 0 introns. The locations and phases of these introns match those predicted by Robertson et al. 32 to be ancestral to the entire insect chemoreceptor superfamily, and are also shared with Gr genes in other animals (Saina et al. 40 ; HMR unpublished). There were only a few exceptions: Gr5 has one additional intron, while Gr1-4 have 2-3 additional introns, all interrupting the long first exon.
Cimex contains four genes encoding proteins related to the highly conserved carbon dioxide receptors of flies, and these were named Gr1-4 (Supplementary Figure 3). This carbon dioxide receptor lineage is absent from all Hymenoptera sequenced to date, as well as from Acyrthosiphon and Pediculus, so it appears to have been lost repeatedly. A large related subfamily expansion was discovered in the termite Zootermopsis nevadensis 44 , indicating that this gene lineage is indeed ancient in insects, and its presence in Cimex confirms this inference. It remains to be shown that these receptors participate in perception of carbon dioxide.
Cimex also contains a gene encoding another conserved protein, named ClGr5, an ortholog of the DmGr43a protein that functions as a fructose receptor (Supplementary …). The remaining Cimex GRs (6-24) are quite divergent from any of the conserved Grs and form a distinct lineage in the tree. These include all of the alternatively spliced models. As with most of the Ors, the long branches leading to most of these proteins resemble those leading to the Pediculus proteins. They contrast sharply with most of the aphid Grs, which form two recently expanded gene subfamilies that show evidence of positive selection on amino acids, indicative of adaptive divergence 31 . Most of the remaining Drosophila Grs are implicated in perception of bitter tastants, but it is hard to be confident of such a function for these bed bug Grs and their Pediculus relatives.

The ionotropic receptor family
In addition to the Or and Gr families in the insect chemoreceptor superfamily 32 , there is a second completely different family of olfactory and gustatory receptors in insects, the ionotropic receptors 25,28 , which clearly evolved from the ionotropic glutamate receptors involved in synaptic transmission 26 . These proteins are somewhat larger than the Ors and Grs, and have three transmembrane domains comprising a cation channel and an external ligand-binding domain. They function as obligate heterodimers or higher multimers. While some of these Irs are highly conserved, and have been implicated in olfaction, others are highly divergent and some are implicated in gustation 28,41,42 . Like the Ors, all of which function as heterodimers with the highly conserved ORCO protein 40 , most IRs function in complexes with some of the most conserved proteins, specifically IR8a and/or IR25a 27,28 .
The ClIr gene set consists of 30 models, larger than those of Acyrthosiphon with 19 and Pediculus with 14 (Supplementary Fig. 4) 26,44 . This number is nevertheless considerably less than the 65 genes in Drosophila melanogaster 23,25,28 , which has at least one large fly-specific expansion. It is also much smaller than in the termite Zootermopsis nevadensis, which has 150 Irs 44 . Once again there are no long pseudogenes. The N-terminus could not be identified for two genes (Ir41e and 75b), so these might be pseudogenes, although genome assembly problems are an equally possible explanation. Ir41d contains a gap in the assembly that was repaired with raw reads (…). Naming of the Irs is somewhat complicated. Following the example of Croset et al. 26 , Irs with obvious simple orthologs in Drosophila were named after that gene/protein. These names have no significance for the bed bug, as they were designated for cytological locations in Drosophila melanogaster (see also Terrapon et al. 44 ). Cimex has two paralogous amplifications of receptors that are also multi-copy in Drosophila (DmIr41a/76a/92a and DmIr75a-d/31a/64a/84a). These were named with lower-case letters that do not imply orthology with the similarly named Drosophila genes (Ir41a and Ir75a-d). Finally, Cimex, like Pediculus and Acyrthosiphon, has a set of highly divergent IRs only weakly related to the divergent Irs of Drosophila. These were named Ir101-106 to avoid any confusion with the Drosophila Irs, which only go up to Ir100a.
The ligand specificity is known for only a few Irs in Drosophila, so relatively little can be said about possible ligands and roles for these Cimex Irs. Grosjean et al. 47 report that DmIr84a, together with Ir8a, is responsible for perception of phenylacetic acid and phenylacetaldehyde. Ir84a, however, has no simple hemipteran ortholog, although it belongs to the Ir75 expansions (Supplementary Fig. 4). In Drosophila, Ir75a-c together with Ir8a are implicated in perception of propionic acid. Ir76a (which is related to the Ir41a expansion), Ir76b (a reasonably conserved potential co-receptor), and the co-receptor Ir25a form a functional receptor for phenylethyl amine. The Cimex relatives of some of these lineages may therefore be involved in similar perception. The large expansion of the Ir75 lineage into 12 genes is of particular interest and might be important in blood feeding. This lineage is, however, reduced to two genes in Pediculus, which is also an obligate blood feeder. It has also expanded independently to 17 genes in the termite Zootermopsis nevadensis 44 , where these genes are presumably involved in some other aspect of chemical ecology.

SUPPLEMENTARY NOTE 5
Circadian clock genes in the bed bug genome
… Hymenoptera, TIM is not present at all and PER is known to heterodimerize with CRY2 (reviewed by Bloch 49 ). CRY2 is homologous to the mammalian CRY: it does not function as a photoreceptor but as a transcriptional repressor 50,51 . In Diptera, this role is performed by PER 52,53 . In Lepidoptera, both CRYs (mammalian-like and Drosophila-like) are present. CRY1 is photosensitive, while CRY2 acts with TIM and PER as a transcriptional regulator (reviewed by Reppert 54 ).
In C. lectularius the first feedback loop of the clock seems to rely on CRY2 … In the Cimex genome we did not find any sequences for either CRY1 (Drosophila-like) or JET (JETLAG) (Supplementary Data 9). Both are necessary in D. melanogaster for the light input pathway to the clock 58,59 . In C. lectularius, TIM may simply increase PER stability, or CRY2 may act as a blue-light photoreceptor in this molecular clock. Indeed, mammalian-like cryptochromes can also be activated by light in living cells 60 . Whether C. lectularius CRY acts as a photoreceptor or represses the activity of the two transcription factors CLK and CYC remains to be determined experimentally.

Cuticular proteins
It is well established that the bed bug cuticle plays a substantial role in insecticide resistance. This is thought to be due, at least in part, to changes in the expression of cuticle protein genes in resistant strains [61][62][63][64] . Identifying and classifying bed bug cuticle protein genes is therefore essential to understanding the genetic and physiological basis of penetration-based insecticide resistance. Using the criteria established by … there were three occasions where proteins from separate families were co-located in the same cluster: cluster 1 (CPR/CPRL), cluster 5 (CPF/CPFL/CPAP1), and cluster 9 (CPR/CPAP3).
As in other insects, bed bug cuticle protein genes are arranged in gene clusters, and this arrangement may accelerate the development of insecticide resistance. Genes within a cluster may be coordinately regulated, so a single regulatory change could affect the expression of many or all genes in the cluster. In addition, gene clusters are prone to expansion via unequal crossing over, which is facilitated by the high sequence identity among genes within a cluster.
As in other insects, the CPR family is the largest single family of putative cuticle protein genes in the bed bug genome. Its members separate relatively neatly into RR-1 (soft cuticle) and RR-2 (hard cuticle) types, and the latter are far more abundant in the genome (Supplementary Fig. 7). We identified 121 CPR-type genes, slightly more than in Drosophila 66 but fewer than in the silkworm 67 or the malaria mosquito 68,69 ; data for other hemipterans are not currently available. Although the number of genes is not extraordinary, several features of bed bug CPR genes are notable. Virtually all CPR genes contain only a single CB4 domain, although each insect genome examined to date contains a few exceptions. In the bed bug, these are CPR115 (6 CB4 domains), CPR116 (2 CB4), and CPR14 (2 CB4), as well as an unusual cluster of 10 genes on Scaffold 24. These genes all share an identical gene structure: the first exon encodes a signal peptide, and each of the next two exons encodes a CB4 domain (Supplementary Fig. 7). Examination of the coding sequence suggests that these genes arose from a duplication of CPR45 (located in the same cluster). In this duplication, both the donor and acceptor splice sites were derived from different pre-existing parts of the ancestral gene. This event must have occurred relatively recently, as this cluster is not present in the other hemipteran genomes (R. prolixus and A. pisum), although homologs of the ancestral CPR45 gene are.
Adelman and colleagues previously identified the bed bug pro-resilin gene, a conserved CPR (now CPR78) containing an RR-2-like CB4 domain, an N-terminal consensus (EPPVNSYLPPKS), and a series of glycine-rich repeats 61 . Analysis of the full genome sequence revealed four other CPR genes that cluster with CPR78, each containing >20% glycine. Interestingly, in a second cluster of CPRs, 5 of 8 genes also contain >20% glycine, with CPR22 at 31%. Most surprising was the identification of CPR57 and CPR58, both G-rich CPRs with a clear RR-1 consensus CB4 domain but also a clear pro-resilin consensus at the N-terminus (Supplementary Fig. 8).
In fact, CPR57 is a protein of over 600 amino acids and is more than 40% glycine. This suggests that bed bugs may have expanded and diversified the resilin family, potentially to accommodate the stretching and reformation of the cuticle required during the acquisition of a blood meal.
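The glycine thresholds used above (>20% glycine, and more than 40% for CPR57) are simple composition measures. The Python sketch below shows such a screen. The sequence names and fragments are hypothetical placeholders, not bed bug proteins. The same function called with residue "A" would give the alanine content used for the CPF family in the next paragraph.

```python
def residue_fraction(seq, residue="G"):
    """Fraction of positions in a protein sequence that are `residue`."""
    seq = seq.upper()
    return seq.count(residue) / len(seq) if seq else 0.0


def glycine_rich(proteins, threshold=0.20):
    """Return {name: glycine fraction} for proteins exceeding `threshold`."""
    result = {}
    for name, seq in proteins.items():
        frac = residue_fraction(seq, "G")
        if frac > threshold:
            result[name] = frac
    return result


# Hypothetical short fragments for illustration only.
proteins = {
    "candidate_1": "GGAGSGGYGG",  # 7 of 10 residues are glycine
    "candidate_2": "MKLVPEPPVN",  # no glycine
}
print(glycine_rich(proteins))
```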
Other cuticle protein families, such as Twdl, CPF, CPAP1, and CPAP3, are well conserved between bed bugs and other insects. A slight expansion of the alanine-rich CPF family (25-30% Ala) was observed, as bed bugs encode 5 such genes compared to R. prolixus (2) and A. pisum (2) (Supplementary Fig. 9, 10). Finally, we identified a cluster of 17 bed bug genes that encode predicted proteins containing 1-12 copies of the 18-amino acid motif that Nakato et al. 70 and Anderson et al. 71 identified in insect cuticles (Supplementary Fig. 11). These genes lie in a cluster with CPR genes and share similar low-complexity regions with them, as well as the 18-amino acid motif. We therefore propose to name this family CPR-like, or CPRL. The defining features of this family would thus be the presence of one or more 18-amino acid motifs and the absence of a CB4 chitin-binding domain.
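The proposed CPRL definition is, in effect, a two-feature decision rule. The Python sketch below encodes it and assumes that motif and CB4 domain counts have already been obtained by some upstream sequence search. The class name, the field names and the "other" label are hypothetical, and the rule is simplified to the two features named in the text.

```python
from dataclasses import dataclass


@dataclass
class CuticleCandidate:
    name: str
    motif18_copies: int  # copies of the 18-amino acid cuticle motif found
    cb4_domains: int     # CB4 (chitin-binding) domains found


def classify(candidate):
    """Apply the simplified CPR / CPRL distinction described in the text."""
    if candidate.cb4_domains > 0:
        return "CPR"   # CB4 domain present: treated as a CPR protein
    if candidate.motif18_copies >= 1:
        return "CPRL"  # 18-aa motif(s) present and CB4 domain absent
    return "other"


for c in [CuticleCandidate("gene_x", 3, 0),
          CuticleCandidate("gene_y", 1, 1),
          CuticleCandidate("gene_z", 0, 0)]:
    print(c.name, classify(c))
```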

SUPPLEMENTARY NOTE 7
The repertoire of digestive genes in Cimex lectularius
A total of 10 gene groups potentially associated with digestion were annotated, including serine proteases, cysteine proteases, aspartate proteases, carboxypeptidases, aminopeptidases, and lipases. Most of these putative proteins have N-terminal secretory signal peptides, suggesting that they are secreted into the midgut. … Although this is not as large as the numbers of serine proteases present in dipteran species, e.g. Drosophila melanogaster (204 gene copies) and Anopheles gambiae (305 gene copies), or in coleopteran species, e.g. Tribolium castaneum (~160 gene copies), serine proteases appear to be the most abundant digestive enzymes in C. lectularius. Furthermore, the serine protease genes of C. lectularius are arranged in tandem on the scaffolds, suggesting expansion by tandem duplication during evolution. Specifically, 13 serine protease genes, most of which contain 6-9 exons, are located in tandem within a 323-kb region on Scaffold 51. Strikingly, we also identified 32 serine protease genes that contain only a single exon, 22 of which form a single subclade in our phylogenetic analysis (Supplementary Fig. 12). Most serine protease genes in hemipteran insects consist of multiple exons; for example, only 4 of 90 serine protease genes in Nilaparvata lugens are single-exon genes 72 . The abundance of these single-exon serine protease genes in the C. lectularius genome, together with their phylogenetic relatedness, therefore suggests recent expansion through gene duplication and/or rapid deployment during digestion. This expansion of the serine protease class could be driven by the high demand for protein digestion after C. lectularius takes a very large blood meal.
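Detecting tandem arrays such as the 13-gene cluster on Scaffold 51 amounts to grouping genes that lie on the same scaffold within some maximum intergenic distance. The Python sketch below shows one simple way to do this. The 10-kb gap cutoff, gene IDs and coordinates are hypothetical, and the text does not state which criterion the authors used. A fuller analysis would also require the grouped genes to be homologous.

```python
from collections import defaultdict


def tandem_clusters(genes, max_gap):
    """Group genes into tandem clusters.

    genes: iterable of (gene_id, scaffold, start, end) tuples.
    max_gap: largest allowed distance (bp) between the end of the current
    cluster and the start of the next gene for that gene to join it.
    Returns a list of (scaffold, [gene_ids]) for clusters of two or more genes.
    """
    by_scaffold = defaultdict(list)
    for gene_id, scaffold, start, end in genes:
        by_scaffold[scaffold].append((start, end, gene_id))

    clusters = []
    for scaffold, items in by_scaffold.items():
        items.sort()  # order genes by start coordinate
        current, current_end = [items[0][2]], items[0][1]
        for start, end, gene_id in items[1:]:
            if start - current_end <= max_gap:
                current.append(gene_id)
                current_end = max(current_end, end)
            else:
                if len(current) > 1:
                    clusters.append((scaffold, current))
                current, current_end = [gene_id], end
        if len(current) > 1:
            clusters.append((scaffold, current))
    return clusters


# Hypothetical coordinates: three closely spaced genes, one distant gene,
# and a lone gene on another scaffold.
genes = [
    ("sp_1", "scaffold_A", 1_000, 3_000),
    ("sp_2", "scaffold_A", 5_000, 7_000),
    ("sp_3", "scaffold_A", 9_000, 11_000),
    ("sp_4", "scaffold_A", 400_000, 402_000),
    ("sp_5", "scaffold_B", 100, 900),
]
print(tandem_clusters(genes, max_gap=10_000))
```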
In contrast to most other insects that have more cathepsin B and L (both are cysteine proteases) than cathepsin D (aspartic protease) genes, C. lectularius possesses more of the cathepsin D within its genome. For example, Acyrthosiphon pisum has 29 cathepsin B and 2 cathepsin L but only 1 cathepsin D while C. lectularius has 19 cathepsin D genes compared to 8 cathepsin B and 9 cathepsin L genes (Supplementary Data 12).
However, this pattern resembles that of another blood-sucking hemipteran, Rhodnius prolixus, whose digestive tract expresses 17 cathepsin D genes but only 2 cathepsin B and 6 cathepsin L genes 73 . Phylogenetic analysis suggests that cathepsin D genes probably expanded in C. lectularius multiple times during evolution, independently of the expansion in Rhodnius prolixus (Supplementary Fig. 13). Cathepsin D is an aspartic protease with optimal activity at acidic pH, a condition found in the midgut of hemipteran insects 74 . The number of cathepsin D genes in the C. lectularius genome is unusually high. If these genes are expressed in the C. lectularius midgut, as they are in R. prolixus, their duplication is likely an adaptation to the digestion of a large blood meal.

Comparison of epigenetic systems in Oncopeltus and Cimex
It is not clear if Oncopeltus, Cimex, or Rhodnius have functional DNA methylation systems. An ortholog of Dnmt3 (the de novo methyltransferase) was not identified in Oncopeltus, Cimex, or in Rhodnius prolixus. This is suggestive of a loss of Dnmt3 in the lineage leading to this clade of insects. However, the Oncopeltus, Cimex, and Rhodnius genomes do encode copies of the maintenance methyltransferases (two copies in Oncopeltus, one in Cimex, two in Rhodnius). All three genomes encode an ortholog of Tet1 (putative demethylation enzyme).
Oncopeltus is unusual in that there is a very small number of genes encoding histone proteins and no loci could be detected that encode the linker histone, Histone H1.
This seems specific to Oncopeltus as Cimex has a large number of loci encoding histone proteins, similar to Daphnia (Supplementary Data 13).
In Drosophila the histone genes are present in the genome in large numbers of quintet clusters, each cluster having one gene from each of the five classes of histones.
This arrangement of genes is also observed in other insects such as the pea aphid 75 . In Cimex we see two quintet clusters of histone genes (Supplementary Fig. 14). The remaining histone genes occur only as single copies on a scaffold, are interrupted by non-histone genes, or are the result of recent gene duplications.
Oncopeltus, unlike Cimex, does not have these quintet clusters. All of the histone genes are present as single copies on a scaffold.
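Finding a histone quintet cluster can be framed as a scan along the ordered genes of a scaffold. The scan looks for runs of five adjacent genes that contain exactly one member of each histone class (H1, H2A, H2B, H3 and H4). The Python sketch below implements this scan. The gene IDs and gene order are hypothetical. The sketch deliberately ignores transcriptional orientation and intergenic distance, which a real analysis would also consider.

```python
HISTONE_CLASSES = {"H1", "H2A", "H2B", "H3", "H4"}


def find_quintets(ordered_genes):
    """Find runs of five adjacent genes containing one of each histone class.

    ordered_genes: list of (gene_id, histone_class) in genomic order along one
    scaffold; histone_class is None for non-histone genes.
    Returns a list of quintets, each a list of five gene IDs.
    """
    quintets = []
    i = 0
    while i + 5 <= len(ordered_genes):
        window = ordered_genes[i:i + 5]
        classes = {cls for _, cls in window}
        if classes == HISTONE_CLASSES:  # five genes, five distinct classes
            quintets.append([gene_id for gene_id, _ in window])
            i += 5  # quintets do not overlap
        else:
            i += 1
    return quintets


# Hypothetical scaffold: one quintet, an interrupted stretch, then a second quintet.
scaffold = [
    ("g1", "H1"), ("g2", "H2A"), ("g3", "H2B"), ("g4", "H3"), ("g5", "H4"),
    ("g6", "H3"), ("g7", None),
    ("g8", "H4"), ("g9", "H3"), ("g10", "H2A"), ("g11", "H2B"), ("g12", "H1"),
]
print(find_quintets(scaffold))
```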
Oncopeltus is unusual in that there are duplications of the MYST histone acetyltransferases mof (males absent on the first) and enok (enoki mushroom).
Duplications of these genes have only previously been reported for Acyrthosiphon pisum 74 . Cimex also has duplications of these genes (enok and mof), but Rhodnius does not. Phylogenetic analysis indicates that these genes have duplicated independently in the lineages leading to Cimex and Oncopeltus from the lineage leading to the pea aphid ( Supplementary Fig. 15).

SUPPLEMENTARY NOTE 9
Visual genes / Light detection

Summary
Bed bugs are equipped with relatively small but canonically organized compound eyes that protrude prominently from the lateral head capsule 76,77 . In behavioral assays, bed bugs are attracted to darker objects. There is also tentative evidence of object recognition, which is suspected to play a role in host habitat detection 78 . Consistent with this evidence for low-resolution landscape vision, the bed bug genome contains a relatively small set of known light-sensitive G-protein coupled receptor genes. These include one member each of the UV- and broadband long wavelength-sensitive rhabdomeric opsin subfamilies. This is in line with most other hemipteran genomes sequenced so far (Supplementary Fig. 16), and also with crepuscular insect species in general 89 .

Annotation
The UV-opsin subfamily is conserved in Holometabola. UV-opsin(s) have been recovered in all Hemiptera, including Cimex, which has a singleton ortholog 2.
The B-opsin subfamily is a visual opsin subfamily that is conserved in Holometabola. This subfamily is missing in Cimex and was likewise not recovered in most Hemiptera (Acyrthosiphon pisum, Cimex lectularius, Drosophila melanogaster, Oncopeltus fasciatus, Rhodnius prolixus). However, singleton B-opsins were found in Pachypsylla venusta and Frankliniella occidentalis, so this loss likely happened at some point during hemipteran evolution.
The LW-opsin subfamily is a visual opsin subfamily that is conserved in Holometabola. All Hemiptera have at least one LW-opsin gene, including Cimex, which has a singleton ortholog.
The C-opsin subfamily is a non-retinal opsin subfamily that is conserved in many Holometabola but not in Diptera. Most Hemiptera have C-opsin genes, including Cimex, which has a singleton ortholog.
The Rh7-opsin subfamily is conserved in holometabolous insects but has not been functionally characterized. All Hemiptera, including Cimex, have a singleton ortholog.
The arthropsin subfamily was recently discovered in Daphnia and other noninsect arthropods but has not been functionally characterized. Partial sequences for arthropsin genes were discovered in Oncopeltus and the pea aphid, but no genes were recovered from Cimex.
With only two visual opsins conserved, the Cimex opsin repertoire seems typical of that of highly crepuscular species such as the human louse Pediculus humanus or the red flour beetle Tribolium castaneum.
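The "only two visual opsins" conclusion follows from tabulating the subfamily calls above. The minimal Python sketch below encodes those calls as a small record type; the `visual` flag reflects how this note treats each subfamily (C-opsin, Rh7 and arthropsin are not described as visual), and all names are illustrative rather than part of any published dataset.

```python
from typing import NamedTuple


class OpsinSubfamily(NamedTuple):
    name: str
    visual: bool      # True where this note treats the subfamily as visual
    cimex_genes: int  # number of Cimex genes recovered for the subfamily


# Opsin subfamilies and Cimex gene counts as summarized in this note.
CIMEX_REPERTOIRE = [
    OpsinSubfamily("UV", True, 1),
    OpsinSubfamily("B", True, 0),
    OpsinSubfamily("LW", True, 1),
    OpsinSubfamily("C", False, 1),
    OpsinSubfamily("Rh7", False, 1),
    OpsinSubfamily("arthropsin", False, 0),
]


def visual_opsins_present(repertoire):
    """Names of visual subfamilies with at least one recovered gene."""
    return [s.name for s in repertoire if s.visual and s.cimex_genes > 0]


print(visual_opsins_present(CIMEX_REPERTOIRE))  # ['UV', 'LW']
```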

SUPPLEMENTARY NOTE 11
Heat shock genes
Bed bug heat shock genes are similar to genes that encode heat shock proteins in other insects (Supplementary Data 15). Several match the same gene model, which could reflect a reduction in, or lack of, duplication events in bed bugs compared to other insects 80 .

SUPPLEMENTARY NOTE 12
Hox cluster
Hox genes are a classic example of conservation across the Bilateria, both in the genomic organization and in the developmental function of these transcription factors 81,82 .
We were able to find all ten Hox genes (Supplementary Data 16) in the expected order and orientation (same transcriptional orientation for all genes, with the anteriormost gene at the 3' end of the cluster) on one of the largest scaffolds of the Cimex genome (Scaffold 8, 16.6 Mb; Supplementary Fig. 17A). The cluster occupies a 3.5 Mb region. Compared to the coleopteran Tribolium castaneum (160 Mb genome with a 0.71 Mb cluster 83 ) and the dipteran Drosophila melanogaster (120 Mb genome with a split cluster of combined size 0.65 Mb 84,85 ), the larger size of the Cimex Hox cluster appears roughly proportional to genome size. This increase in size is largely due to longer intergenic and intronic distances. At the same time, the previously observed trend in protein size continues: just as Tribolium Hox proteins are smaller than their Drosophila orthologues 82 , several of the Cimex orthologues (Lab, Pb, Dfd, Ftz) are up to 20% smaller than their Tribolium counterparts, while the diverged Hox gene zen 86,87 is the only one that encodes a larger protein (25% larger than in Tribolium). Splice sites are also well conserved relative to Tribolium for most genes, with zen and the other diverged Hox gene, ftz, again being the exceptions.
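The two checks in this paragraph, collinearity and cluster size relative to genome size, can be sketched in a few lines of Python. The sketch assumes a hypothetical input mapping each Hox gene to its start coordinate and strand, and it uses the 650 Mb assembly size from the abstract for Cimex; with the flow-cytometry estimate of about 864.5 Mb (Supplementary Note 17) the Cimex value would be about 4.0 kb/Mb instead. Function names are illustrative.

```python
HOX_ORDER = ["lab", "pb", "zen", "Dfd", "Scr", "ftz", "Antp", "Ubx", "abd-A", "Abd-B"]


def is_collinear(genes):
    """genes maps each Hox gene name to (start coordinate, strand).

    True if all genes share one strand and lab (anteriormost) lies at the
    3' end of the cluster, followed in order by the more posterior genes.
    """
    strands = {genes[name][1] for name in HOX_ORDER}
    if len(strands) != 1:
        return False  # mixed transcriptional orientation
    starts = [genes[name][0] for name in HOX_ORDER]
    pairs = list(zip(starts, starts[1:]))
    if strands == {"+"}:
        # Plus strand: the 3' end is at higher coordinates,
        # so coordinates must decrease from lab to Abd-B.
        return all(a > b for a, b in pairs)
    # Minus strand: the 3' end is at lower coordinates.
    return all(a < b for a, b in pairs)


def kb_per_mb(cluster_mb, genome_mb):
    """Hox cluster span (kb) per Mb of genome."""
    return 1000 * cluster_mb / genome_mb


for species, cluster_mb, genome_mb in [
    ("Cimex", 3.5, 650),
    ("Tribolium", 0.71, 160),
    ("Drosophila", 0.65, 120),
]:
    print(f"{species}: {kb_per_mb(cluster_mb, genome_mb):.1f} kb/Mb")
# Cimex: 5.4 kb/Mb, Tribolium: 4.4 kb/Mb, Drosophila: 5.4 kb/Mb
```

The similar kb/Mb values are what "roughly proportional to genome size" amounts to numerically.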

Iro-C cluster
Synteny is also conserved within the small Iroquois-Complex (Iro-C), a second family of homeodomain transcription factors that arose from ancient tandem duplications. Whereas Drosophila has three family members 88-90 , we find that, as in Tribolium, Cimex possesses just two: mirror (mirr), conserved across the Insecta, and iroquois (iro), the single-gene ortholog of the tandem paralogs araucan (ara) and caupolican (caup) in Drosophila. As in Tribolium, the two Iro-C genes in Cimex occur in tandem with the same transcriptional orientation along the scaffold, with iro upstream of mirr (Supplementary Fig. 17B), although the Cimex Iro-C cluster is 2.3- and 3-fold larger than in Tribolium and Drosophila, respectively. Both Iro-C genes are also fairly well conserved compared to Tribolium at the level of gene structure (3-4 conserved splice sites out of 4-6) and protein sequence (≥58% identity over ≥69% sequence coverage).

Assembly quality/accuracy of automated annotation
Maker predictions were available for all the genes annotated here. However, the Maker predictions were fragmented, and it was necessary to manually inspect and merge at least two Maker models to build complete models matching the protein queries. Readers should be aware of possible fragmentation when examining other automated gene models. Good to very good RNA-seq evidence was available for all gene models, which greatly facilitated the prediction of exon structure and UTR assignment.
Only one possible misassembly was spotted, in the model for Abd-B. This model has three exons in Cimex, whereas Abd-B has only two exons in Tribolium and other insects. The intron between exons 1 and 2 is largely filled by two large gaps, and the end of exon 1 and the beginning of exon 2 have exactly the same nucleotide sequence. A duplication in this region was not observed in homology alignments, and we found no way to set splice boundaries that would keep only one copy of the sequence. We therefore suggest that the duplication is a misassembly artifact and that the gene model should consist of only two exons.
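The diagnostic used for Abd-B, an exon end that repeats verbatim at the start of the next exon, can be screened for automatically. The minimal Python sketch below finds the longest suffix of one exon that exactly matches a prefix of the next. The function name, the 20-base minimum and the example sequences are hypothetical; the check only detects exact repeats and does not consider the assembly gaps that accompany the Abd-B case.

```python
def junction_overlap(exon1: str, exon2: str, min_len: int = 20) -> int:
    """Length of the longest suffix of exon1 that exactly matches a prefix of
    exon2, or 0 if no such overlap of at least min_len bases exists."""
    min_len = max(min_len, 1)  # a zero-length slice exon1[-0:] would return all of exon1
    for k in range(min(len(exon1), len(exon2)), min_len - 1, -1):
        if exon1[-k:] == exon2[:k]:
            return k
    return 0


# Hypothetical exons sharing a 20-base repeat across the junction.
repeat = "CCGGTTAACCGGTTAACCGG"
print(junction_overlap("ATGAAA" + repeat, repeat + "TTTGA"))  # 20
```

A non-zero result flags a junction whose duplicated sequence is worth checking against homologs, as was done here for Abd-B.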

Supplement: methodology
We annotated the Hox genes by performing tblastn searches on the Cimex lectularius scaffolds with the corresponding Tribolium and Oncopeltus Hox protein sequences available in NCBI (the current official gene set, OGS, models). To confirm orthology, we then blasted our Cimex models back against NCBI. Homology, intron/exon boundaries and protein sequence completeness were assessed by manual inspection and correction of protein alignments generated with ClustalW2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/), and the resulting hit sequences were then re-blasted against NCBI to retrieve Arthropoda hits.
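The forward search followed by blasting models back into NCBI is close in spirit to a reciprocal best hit (RBH) test, a common orthology criterion. The minimal Python sketch below formalizes it under the assumption that both searches have been reduced to (query, subject, bitscore) rows; the IDs are hypothetical, and the authors' confirmation also relied on manual alignment inspection rather than this rule alone.

```python
def best_hits(rows):
    """rows: iterable of (query, subject, bitscore) tuples from a tabular BLAST run.
    Returns {query: subject}, keeping the highest-scoring subject per query."""
    best = {}
    for query, subject, bits in rows:
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical Cimex models (c1..c3) searched against reference Hox proteins, and back.
forward = best_hits([("c1", "t_lab", 250.0), ("c1", "t_pb", 90.0),
                     ("c2", "t_pb", 180.0), ("c3", "t_Dfd", 60.0)])
reverse = best_hits([("t_lab", "c1", 245.0), ("t_pb", "c2", 178.0), ("t_Dfd", "c9", 75.0)])
print(reciprocal_best_hits(forward, reverse))  # [('c1', 't_lab'), ('c2', 't_pb')]
```

Here c3 is not reported because the reference Dfd protein's best hit is a different model (c9).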

SUPPLEMENTARY NOTE 13
Insecticide targets
Our task was to identify and annotate the following insecticide target relevant genes and
The putative nAChR alpha 3 subunit has little RNA-seq support. The annotated transcript appears to be 5' incomplete, since a GTC (V) start codon is unlikely, and there is no in-frame ATG upstream before the next stop codon.
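The reasoning about the nAChR alpha 3 start can be expressed as a simple upstream scan. The minimal Python sketch below walks upstream from the annotated start codon in steps of three and reports the first in-frame ATG, stopping at an in-frame stop codon. It assumes a hypothetical plus-strand sequence string and a 0-based start index; the function name and example sequences are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_atg(seq: str, start: int):
    """Return the 0-based index of the nearest in-frame ATG upstream of the
    codon at `start`, or None if an in-frame stop codon (or the 5' end of
    the sequence) is reached first."""
    seq = seq.upper()
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon == "ATG":
            return pos
        if codon in STOP_CODONS:
            return None  # reading frame is closed upstream: no alternative start
        pos -= 3
    return None  # ran off the 5' end of the available sequence


# Annotated model starting with GTC at index 6 (hypothetical sequences).
print(upstream_inframe_atg("TAACCCGTC", 6))  # None: stop before any ATG
print(upstream_inframe_atg("ATGCCCGTC", 6))  # 0
```

A None result from the stop-codon branch corresponds to the situation described above, where no in-frame ATG exists before the next upstream stop.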

SUPPLEMENTARY NOTE 14
Peptidergic and aminergic signalling in the bed bug Cimex lectularius
Most developmental and physiological processes are hormonally regulated or orchestrated by regulatory peptides or biogenic amines, which are produced by endocrine or neuroendocrine cells. Neuropeptides and biogenic amines also act as neuromodulators or neurotransmitters within the nervous system and play a key role in controlling behavior.
Of special interest for hematophagous insects such as the bed bug are peptides and biogenic amines that induce or terminate post-feeding diuresis, as well as the peptidergic and aminergic signaling networks that control feeding behavior and digestion 91 . While biogenic amines are metabolites, bioactive peptides are produced by posttranslational processing of larger precursor molecules called prepropeptides 92 . Most regulatory peptide and amine signals are received and transduced by G protein-coupled receptors (GPCRs), also known as seven-transmembrane-domain receptors. Peptides, biogenic amines and especially their GPCRs represent attractive molecular targets for synthetic or naturally occurring insecticides 93,94 .

Prepropeptide genes
In the bed bug genome, we identified 50 genes encoding putative prepropeptides (Supplementary Data 17) containing >100 putative bioactive peptides, predicted from sequence homology to biochemically confirmed or predicted peptides from other insect species and from the presence of flanking prohormone convertase cleavage sites 95 .
C. lectularius possesses the core set of 20 regulatory peptides common to all insects characterized thus far (Supplementary Fig. 34). The other 30 prepropeptide genes cover most insect peptides that occur only in a subset of taxa, including the recently identified CNMamide, RYamides, elevenin, natalisin, EFLamide and the restrictively distributed ACP (Supplementary Fig. 35). Like the brown planthopper (Nilaparvata lugens), hitherto the best-characterized hemipteran in terms of peptides, C. lectularius also lacks trissin (found in holometabolous insects and the body louse 96 ), as well as neuropeptide-like precursor 2 (NPLP2), which is lacking in most insect species 97 . NPLP3 and NPLP4 homologs were annotated as neuropeptide precursors in the Tribolium genome but more likely represent cuticular peptides/proteins. We were unable to find a gene for inotocin/arginine vasopressin-like peptide, which is, however, present in the Nilaparvata transcriptome 98 .
C. lectularius appears to have only two insulin-like peptide (ILP) genes: one ILP B ("insulin-like") and one with some similarities to both ILP B and ILP C ("IGF-like"). Other Hemiptera have several ILP Bs 98,99 . Unlike Rhodnius 100,101 , the bed bug also appears to have only one capa gene encoding anti-diuretic hormones (periviscerokinins).
Remarkably, C. lectularius seems to possess an unusual myosuppressin (MS) sequence, which is longer and even more derived from the insect consensus than the already unusual MS sequence found in Rhodnius 102,103 . Notably, a derived MS sequence is not a general feature of the Heteroptera, as pentatomid MS shows an insect consensus sequence 104 .
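The cleavage-site criterion mentioned above for predicting bioactive peptides can be illustrated with a deliberately simplified sketch. The Python below splits a precursor at runs of two or more basic residues (K/R) and treats a C-terminal glycine as an amidation signal. This is an assumption-laden stand-in for real prohormone convertase rules, which also involve monobasic sites and sequence context; it is not the prediction method of reference 95, and the example precursor is hypothetical.

```python
import re

# Runs of two or more basic residues (K/R): a simplified stand-in for
# prohormone convertase cleavage sites.
DIBASIC_SITE = re.compile(r"[KR]{2,}")


def cleave_precursor(precursor: str, min_len: int = 3):
    """Split a precursor (signal peptide already removed) at dibasic sites and
    mark a C-terminal Gly as an amidation signal (Gly removed, '-NH2' appended)."""
    peptides = []
    for piece in DIBASIC_SITE.split(precursor):
        if len(piece) < min_len:
            continue  # drop empty or very short fragments
        if piece.endswith("G"):
            piece = piece[:-1] + "-NH2"
        peptides.append(piece)
    return peptides


print(cleave_precursor("DQNPSLRLRFGKRAYDPSLRLRFGRR"))
# ['DQNPSLRLRF-NH2', 'AYDPSLRLRF-NH2']
```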
Also unusual is the methionine residue at position 2 of bed bug Arg7-corazonin. While Arg7-corazonin is common among insects 105 and was also found in the reduviid bug Triatoma infestans 106 , Met2 is a new variant. These predicted sequences, however, will need biochemical confirmation.
From a genomic perspective, it is interesting to note that the loci of several highly related peptide genes that are thought to have arisen from gene duplications lie very close together. This holds true for AstC and AstCC, neuroparsins 2-4, the bursicons, tachykinin-related peptides and natalisin, the insulin-like peptides and the glycoprotein hormones.
Moreover, prepropeptide genes seem to be unevenly distributed across the scaffolds and may tend to cluster in specific regions. For example, out of the 18 prepropeptides located on the ten largest scaffolds, 15 are located on either scaffold 3, 6,

Peptide GPCR genes
For most of the predicted peptides, a GPCR was found as suggested by a sequence homology/phylogenetic tree analysis (Supplementary Data 17, Supplementary Fig. 36).
This supports the occurrence of the predicted peptidergic signaling pathways. Notably, C. lectularius seems to possess two different receptors each for CCAP, CRF-like diuretic hormone, sulfakinin and SIFamide. The functional significance of this finding remains unclear.
For quite a number of peptides, we were unable to identify a receptor. Although this may indicate that these receptors are absent in C. lectularius, it is more likely that they have been overlooked and will become identifiable with increasing genome coverage. It is also conceivable that we failed to assign identified orphan receptors to their peptide ligands. Notably, apart from the orphan receptors, there is only one identified receptor for which we did not find a ligand: the homolog of the Drosophila trissin receptor CG34381 96 . Trissin peptides contain six Cys residues, which form three intramolecular disulfide bridges. So far they have been found in Diptera and Lepidoptera but not in other insect taxa, although a rather similar peptide has been predicted in bees (see Caers et al. 107 ). We were also unable to detect a homologous peptide in the bed bug.

Biogenic amine GPCR genes
We identified the expected set of GPCRs for octopamine, tyramine, dopamine, serotonin (5-HT) and acetylcholine, as well as some orphan "trace-amine" receptors (Supplementary Data 18, Supplementary Fig. 37 108 ). This set is complete when compared to the honey bee and fruit fly 108 , with the exception of a dedicated tyramine receptor (CG7431 and CG16766 in Drosophila, Am13 in the honey bee). As discussed for the peptide GPCRs above, this does not necessarily indicate that this receptor type is absent in C. lectularius.
From a more general perspective, the predicted peptidome and receptor repertoire of Cimex show few peculiarities. This is expected, since all characterized insect peptidomes (with the exception of that of the small parasitoid Nasonia 109 ) are very similar to each other. In fact, most insect neuropeptide families also occur in other arthropod orders 110,111 , and there is good evidence that not only aminergic but also many peptidergic signaling pathways are of ancient bilaterian origin 112 .
In conclusion, we were able to identify the majority of genes for prepropeptides as well as for peptide and amine GPCRs in the bed bug genome. Though most likely correct, the predicted peptide sequences need to be confirmed biochemically, as posttranslational processing cannot be predicted with absolute certainty. The bed bug genome and our prepropeptide gene annotation now provide a solid platform that greatly facilitates in-depth biochemical characterization of the peptidome.
Similarly, the inferred receptor specificities are likely to be correct in most cases but also need to be tested, e.g. by expressing the receptors in heterologous cell systems. The annotation of peptide and amine signaling genes already opens the door to experimentally dissecting, by RNAi, the functions of peptides and biogenic amines in regulating physiological and developmental processes such as feeding, ecdysis, reproduction and diuresis (e.g. Mamidala et al. 113 ). It also makes it possible to examine the temporal and cellular/tissue expression profiles of peptides and receptors by PCR and in situ hybridization, and it informs about the specificity of immunolabelings. Not least, our analysis, especially of the GPCRs, can inform the development of new insecticides that help to control pyrethroid-resistant bed bug strains 114 .

SUPPLEMENTARY NOTE 15
Odorant-binding proteins (OBP) and chemosensory proteins (CSP)
We annotated 11 OBP-coding genes and 16 CSP-coding genes, four of which are partial. The bed bug genome contains fewer OBP genes than other blood-sucking insects such as mosquitoes or tsetse flies, but more than the black-legged tick Ixodes scapularis. The number of CSP genes is lower than in the mosquitoes Aedes aegypti (Aaeg) and Culex quinquefasciatus (Cqui), higher than in the tsetse fly Glossina morsitans (Gmm), and most similar to those of the tick I. scapularis and the human louse Pediculus humanus. Both the OBP and CSP gene families contain gene duplications. Bed bug OBP genes do not cluster with those of any other insect and appear to be species-specific. On the other hand, bed bug CSP genes are more conserved across blood-sucking insects, with homologous genes in all three mosquito species (Cqui, Anopheles gambiae (Agam) and Aaeg).

SUPPLEMENTARY NOTE 16
Development

SUPPLEMENTARY NOTE 17
Genome size determination
The bed bug colony used in this study originated from Columbus, Ohio, in 2002 and has since been maintained at 85% RH, 15 h:9 h light:dark and 22 °C to promote colony longevity 126 . Blood feeding was as described in Montes et al. 116 . Briefly, bed bug colonies were held in 1-pint glass Mason jars on folded filter paper (10 cm diameter).
Individuals were fed on chicken blood twice a week through a membrane (Parafilm M, Pechiney Plastic, Menasha, WI) maintained at 37 °C with a circulating water bath.
Females and males utilized in this study were two weeks post-eclosion.
Genome size was determined after Johnston et al. 117  lectularius. The genome size of males and females was compared using the GLM procedure from SAS (NC).
The 2C nuclei from C. lectularius ran well, producing a peak whose average channel number was more than twice that of the 2C nuclei of D. virilis (Supplementary Fig. 38). The average genome size of C. lectularius females was 1C = 864.5 ± 1.7 Mb; the average male genome size was significantly smaller (P < 0.01), at 1C = 823.5 ± 3.7 Mb.
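In flow cytometry, genome size is read off as a ratio: the sample's 2C peak channel divided by the 2C peak channel of a co-stained standard, multiplied by the standard's known 1C size. The minimal Python sketch below shows that arithmetic. The peak channels are hypothetical, and the 328 Mb 1C value for the D. virilis standard is an assumption for illustration, not a number stated in this note; under that assumption, 864.5 Mb corresponds to a channel ratio of about 2.64, consistent with a peak "more than twice" that of the standard.

```python
def genome_size_1c_mb(sample_2c_channel, standard_2c_channel, standard_1c_mb):
    """Estimate a sample's 1C genome size from the ratio of its 2C peak position
    to that of a co-stained standard with known 1C size."""
    return standard_1c_mb * sample_2c_channel / standard_2c_channel


# Hypothetical peak channels; 328 Mb is an ASSUMED 1C size for the D. virilis standard.
estimate = genome_size_1c_mb(264.0, 100.0, 328.0)
print(f"{estimate:.1f} Mb")  # 865.9 Mb
```

Because both peaks are 2C, the ratio of channels equals the ratio of 1C sizes, so no factor of two appears in the formula.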
The limited intraspecific genome size variation and the significant genome size difference between males and females of C. lectularius reported here are consistent with published cytology. A Berkeley strain of C. lectularius was determined to have 26 autosomal chromosomes, with a sex chromosome complement of X1X2Y in males and X1X1X2X2 in females. Strains with supernumerary X chromosomes have been reported 119 .
Establishing an accurate genome size for Cimex lectularius L. is essential for furthering molecular research on the bed bug.
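The male-female difference can be related to this karyotype with a short worked calculation. Let $A$ denote the haploid autosomal DNA content and $X_1$, $X_2$, $Y$ the DNA content of each sex chromosome. Assuming the Berkeley karyotype (no supernumerary X chromosomes) and $2C = 2 \times 1C$, the reported 1C values give:

```latex
\begin{aligned}
2C_{\mathrm{f}} &= 2A + 2X_1 + 2X_2, \qquad 2C_{\mathrm{m}} = 2A + X_1 + X_2 + Y,\\
2C_{\mathrm{f}} - 2C_{\mathrm{m}} &= X_1 + X_2 - Y\\
&= 2\,(864.5 - 823.5)\ \text{Mb} = 82\ \text{Mb}.
\end{aligned}
```

Under these assumptions, and ignoring measurement error, the two X chromosomes together would exceed the Y chromosome by roughly 82 Mb, which is the kind of difference the karyotype predicts for a male-smaller genome.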

SUPPLEMENTARY NOTE 18
Cimex lectularius sialogenome
The saliva of blood-sucking animals contains a complex cocktail that disarms their hosts' hemostasis, the physiological process that prevents blood loss and consists of platelet aggregation, vascular responses and blood clotting. A previous analysis of the bed bug sialotranscriptome (from the Greek sialo, saliva), followed by proteome analysis, revealed its complexity 122 . Several enzymes were found, including a previously described novel apyrase 123 that hydrolyses ADP and ATP, agonists of platelet and neutrophil aggregation; diadenosine phosphatases that might hydrolyze nucleotides released by platelets; serine proteases that might be involved in fibrinolysis; esterases similar to acetylcholinesterase; and inositol phosphate phosphatases. Salivary serpins may account for the anticlotting function found in Cimex saliva 124 . Small-molecule-binding proteins include the heme-containing nitrophorins, which transport nitric oxide and are themselves members of the inositol phosphatase family, and salivary odorant-binding proteins of unknown function. Products belonging to the antigen 5 family, ubiquitously found in sialotranscriptomes, were also found, as well as several other secreted products of unknown function.
Gene families that have expanded (usually as tandem gene repeats) and been recruited to a blood-feeding salivary function are commonly found, as are single gene duplication events in which one product is co-opted for salivary function. The current assembly of the bed bug genome provides insight into the evolutionary processes leading to the unique adaptations required for a blood-sucking mode of life. Supplementary Fig. 39 shows the number of genes found in the bed bug genome for particular gene classes, compared to other genomes. The Cimex-type apyrase was originally discovered in bed bug salivary glands and shown to be ubiquitously distributed, where it possibly plays a cellular role in driving glycosylation reactions. As an example of convergent evolution, this type of enzyme was co-opted as the salivary apyrase in Rhodnius (based on the unique calcium dependence of this type of enzyme 125 ) and in sand flies 126 , while a modified 5'-nucleotidase was co-opted in mosquitoes 127 and in Triatoma infestans 128 .
Ap4 hydrolases: gnl|CDD|239520 cd03428, Ap4A_hydrolase_human_like, diadenosine tetraphosphate (Ap4A) hydrolase, a member of the Nudix hydrolase superfamily.

SUPPLEMENTARY NOTE 19
Vitamin metabolism
A comparison of known vitamin metabolism genes from D. melanogaster with the scaffolds available for C. lectularius revealed high levels of similarity between these two species and other insects (Supplementary Data 19). Protein sequences from D. melanogaster were blasted against C. lectularius transcripts to find the best match (lowest E-value), and these transcripts were then blasted against the C. lectularius genome scaffolds to find their location (scaffold number). The identified genes were then blasted against databases for Anopheles, Pediculus and Tribolium to find specific orthologs in those species.
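Picking the best match for each query, as in the first step above, amounts to taking the hit with the lowest E-value. The minimal Python sketch below assumes hits have already been parsed into (transcript_id, evalue, bitscore) tuples; breaking ties on bitscore is an added assumption, not something stated in this note, and the IDs are hypothetical.

```python
def best_match(hits):
    """hits: list of (transcript_id, evalue, bitscore) for one query protein.
    Best = lowest E-value, with ties broken by the higher bitscore."""
    if not hits:
        return None
    return min(hits, key=lambda hit: (hit[1], -hit[2]))[0]


hits = [("tx_a", 1e-30, 120.0), ("tx_b", 1e-50, 180.0), ("tx_c", 1e-50, 175.0)]
print(best_match(hits))  # tx_b
```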
Of the 83 genes studied, only 5 did not have a close ortholog in C. lectularius (CG7560, CG32099, CG8446, CG12237, CG10581). Several distinct D. melanogaster genes in related pathways (e.g. folate production) mapped to a single gene in C. lectularius, suggesting duplication events in D. melanogaster. No genes were identified that were unique to D. melanogaster and C. lectularius alone.

SUPPLEMENTARY NOTE 20
Cimex lectularius immune response analysis
The predicted protein set was queried against a recently curated set of insect immune proteins 132 via BLASTP at high stringency (E-value cutoff of 10^-7). Hits were then matched reciprocally to this gene set and aligned to determine protein completeness. Several problematic families (e.g., CLIP serine proteases and scavenger receptor proteins, which contain CLIP domains) are included with best-estimate names but will require additional alignments, as well as protein evidence, for clarity. Antimicrobial peptides (AMPs) were identified by querying Cimex proteins with a predicted length of 100 aa or shorter using PSI-BLAST. Query sequences included all AMPs found in paurometabolous insects and exemplars from each of the other insect orders (n = 113 query sequences).
Sequences that returned at least one match (defensins and diptericins) were then used to query the Cimex genome assembly for any proteins not included as gene models (none were found).
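The two filters described above, pruning the protein set to short models for the AMP search and keeping only hits at the 10^-7 stringency, can be sketched as plain Python functions. The data layout (a dict of ID to sequence, and (query, subject, evalue) tuples) and the function names are assumptions; the sequences below are length-matched placeholders for the diptericin-related models described in this note, not real sequences.

```python
def prune_by_length(proteins, max_len=100):
    """Keep only protein models of at most max_len residues."""
    return {pid: seq for pid, seq in proteins.items() if len(seq) <= max_len}


def significant(hits, cutoff=1e-7):
    """Keep (query, subject, evalue) hits at or below the E-value cutoff."""
    return [hit for hit in hits if hit[2] <= cutoff]


# Placeholder sequences with the model lengths given in this note.
proteins = {"CLEC003672-PA": "X" * 154, "CLEC003673-PA": "X" * 70, "CLEC003674-PA": "X" * 164}
print(sorted(prune_by_length(proteins)))  # ['CLEC003673-PA']
```

Note that a length filter of this kind keeps only models of 100 aa or shorter, so longer models such as the 154 and 164 aa ones are outside the pruned set.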
Additional details
1) Two solid defensin paralogs emerged from direct searches of the gene set. Nominally, they should be named Defensin1 (CLEC002659-PA) and Defensin2 (CLEC002658-PA).
2) There is an interesting cluster of diptericin-related antimicrobial peptides, also close to 'Prolixin' from Rhodnius. These have consecutive gene IDs (CLEC003672-PA, CLEC003673-PA and CLEC003674-PA) and will need further alignment to resolve naming. CLEC003672-PA and CLEC003674-PA are 154 and 164 aa long, respectively, and indeed seem to contain two tandem diptericin components each. CLEC003673-PA is 70 aa long and is an intact diptericin. These would be worth examining in a genome browser together with expression data; there appear to be at least three, and possibly five, peptides in the array.
All searches were repeated using PSI-BLAST against a protein dataset pruned to include only models of 100 aa or shorter, and no additional matches were found. Identifying the remaining AMPs will likely require close analysis of transcripts or peptides upregulated after a challenge or mating.

UDP-glycosyltransferase annotation
The Cimex lectularius genome contains 7 UDP-glycosyltransferase (UGT) genes, including one partial sequence due to a genomic gap (Supplementary Data 20). The bed bug genome contains fewer UGT genes than any phytophagous insect genome sequenced so far, and fewer than other blood-sucking insects such as mosquitoes or tsetse flies, but more than the human body louse Pediculus humanus corporis (4 UGTs). Four bed bug UGT genes form a cluster at one genomic location (Scaffold64); they share three common exons but have different first exons, probably joined by alternative splicing. This suggests that domain duplication diversified these genes and may have broadened their substrate specificity (Supplementary Fig. 40).
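The proposed structure, alternative first exons joined to a shared set of downstream exons, can be modeled as simple string assembly. The minimal Python sketch below uses hypothetical exon names and toy sequences; it only illustrates how four variants can arise from four first exons plus three common exons, not the actual Scaffold64 models.

```python
def splice_variants(first_exons, shared_exons):
    """Join each alternative first exon to the same downstream exons."""
    tail = "".join(shared_exons)
    return {name: first + tail for name, first in first_exons.items()}


# Hypothetical first exons and three shared downstream exons (toy sequences).
first_exons = {"UGT_a": "AAA", "UGT_b": "CCC", "UGT_c": "GGG", "UGT_d": "TTT"}
shared_exons = ["ACG", "TGC", "GTA"]
variants = splice_variants(first_exons, shared_exons)
print(variants["UGT_b"])  # CCCACGTGCGTA
```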

Transcription factors
We identified likely transcription factors (TFs) by scanning the amino acid sequences of predicted protein-coding genes for putative DNA-binding domains (DBDs) and, when possible, predicted the DNA-binding specificity of each TF using the procedures described in Weirauch et al. 135 . Briefly, we scanned all protein sequences for putative DBDs using the 81 Pfam 136 models listed in Weirauch and Hughes 137 and the HMMER tool 138 , with the recommended detection thresholds of per-sequence E-value < 0.01 and per-domain conditional E-value < 0.01. Each protein was classified into a family based on its DBDs and their order in the protein sequence (e.g., bZIPx1, AP2x2, Homeodomain+Pou). We then aligned the resulting DBD sequences within each family using Clustal Omega 139 with default settings. For proteins with multiple DBDs, each DBD was aligned separately. From these alignments, we calculated the sequence identity of all DBD sequence pairs (i.e., the percentage of amino acid residues that are identical across all positions in the alignment). Using previously established sequence identity thresholds for each family 135 , we mapped the predicted DNA-binding specificities by simple transfer. For example, the DBD of CLEC000015-PA is 95% identical to that of the Drosophila melanogaster Awh protein.
Since the DNA binding specificity of Awh has already been experimentally determined, and the cutoff for homeodomain TFs is 70%, we can infer that CLEC000015-PA will have the same binding specificity as Awh.
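The transfer step above reduces to computing DBD percent identity from an alignment and comparing it with the family cutoff. The minimal Python sketch below assumes two already-aligned DBD strings of equal length and counts columns with a gap in either sequence as non-identical, which is one reasonable reading of "across all positions in the alignment"; only the 70% homeodomain cutoff comes from this note, and the sequence fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both DBD sequences carry the same
    residue; columns with a gap ('-') in either sequence count as non-identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    same = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * same / len(aligned_a)


def can_transfer(identity: float, family_threshold: float) -> bool:
    """Transfer a characterized TF's specificity if DBD identity reaches the family cutoff."""
    return identity >= family_threshold


HOMEODOMAIN_CUTOFF = 70.0  # cutoff for homeodomain TFs quoted in this note

# Hypothetical aligned homeodomain fragments differing at one position.
identity = percent_identity("RKRGRQTYTRYQ", "RKRGRQTYSRYQ")
print(round(identity, 1), can_transfer(identity, HOMEODOMAIN_CUTOFF))  # 91.7 True
```

With the 95% identity reported for CLEC000015-PA and Awh, the same comparison against the 70% cutoff would permit transfer.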
Using the above procedure, we identified a total of 634 putative TFs in the C.
Likewise, for the most part, the number of members of each TF family is comparable to that of other insects ( Supplementary Fig. 41), with some notable exceptions. For example, the "MADF" family consists of 29 proteins in C. lectularius, which is more than double the number that is present in the genomes of the related insects A. mellifera (11 members) and P. humanus (1 member). Conversely, the chromatin reorganizing family "BAF1_ABF1" is not present in the C. lectularius genome, despite being nearly ubiquitously present in other insect genomes, including up to 18 members in drosophilids.
Of the 634 C. lectularius TFs, we were able to infer motifs for 214 (34%) (Supplementary Data 31-32), mostly based on DNA-binding specificity data from D. melanogaster (133 TFs), but also from species as distant as human (54 TFs) and mouse (12 TFs