Transcriptomic meta-signatures identified in Anopheles gambiae populations reveal previously undetected insecticide resistance mechanisms

Increasing insecticide resistance in malaria-transmitting vectors represents a public health threat, but underlying mechanisms are poorly understood. Here, a data integration approach is used to analyse transcriptomic data from comparisons of insecticide resistant and susceptible Anopheles populations from disparate geographical regions across the African continent. An unbiased, integrated analysis of this data confirms previously described resistance candidates but also identifies multiple novel genes involving alternative resistance mechanisms, including sequestration, and transcription factors regulating multiple downstream effector genes, which are validated by gene silencing. The integrated datasets can be interrogated with a bespoke Shiny R script, deployed as an interactive web-based application, that maps the expression of resistance candidates and identifies co-regulated transcripts that may give clues to the function of novel resistance-associated genes.

Gene ID, Gene Name, Probe Sequence, Anopheles gambiae BLAST hits, % identity of off-target BLAST hits and the alignment of the off-target BLAST hits. Probes highlighted in yellow may cross-hybridise because of their similarity to off-target sequences. Sheet 2 (dsRNA): Gene ID, Gene Name, Construct Sequence, Anopheles gambiae BLAST hits, % identity of off-target BLAST hits, and information on the construct alignment and putative off-target siRNA targets. Constructs highlighted in yellow have at least one off-target hit that aligns with 100% identity to a 20 bp siRNA, which may be produced from the dsRNA in vivo.
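The dsRNA criterion above flags any construct that shares a perfect 20 bp match with an off-target sequence. The sketch below expresses that rule as exact 20-mer matching on both strands of the construct. It is a simplification: the table itself was built from BLAST alignments. The function names and toy sequences are hypothetical.

```python
# Complement table for DNA bases (N is left as N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def kmers(seq: str, k: int = 20) -> set[str]:
    """All length-k substrings of seq, upper-cased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def shared_sirna_targets(construct: str, off_target: str, k: int = 20) -> list[str]:
    """k-mers from either strand of the dsRNA construct that occur with 100%
    identity in the off-target sequence (candidate off-target siRNAs)."""
    construct_kmers = kmers(construct, k) | kmers(reverse_complement(construct), k)
    return sorted(construct_kmers & kmers(off_target, k))


# Hypothetical example: an off-target sequence embedding 22 bp of the construct.
construct = "ATGCGTACGTTAGCCGATCGATGCTAGCTAGGCTA"
off_target = "CCCC" + construct[5:27] + "GGGG"
print(len(shared_sirna_targets(construct, off_target)) > 0)  # True: construct would be flagged
```

Exact matching avoids any alignment scoring, because the flag only concerns 100% identity over 20 bp.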
Maf-S regulated genes overlapping with

Section 1: Introduction
IR-TEx is an app written in ShinyR for exploring, in a user-friendly manner, publicly available microarray datasets that compare insecticide-resistant and susceptible Anopheles gambiae, An. coluzzii and An. arabiensis populations. In its current form, IR-TEx lets the user search for a transcript of interest by its VectorBase Transcript ID and filter the datasets by Country, Exposure Status, Species and Insecticide Class. The user can also find transcripts whose expression correlates with the transcript of interest across experiments by adjusting the Absolute Correlation Value (recommended: 0.7-0.9). The outputs from IR-TEx come in several forms.
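The search-and-filter behaviour described above can be illustrated outside the app. The sketch below keeps only the datasets whose descriptors match every selected filter exactly. It then counts how many of them show the transcript as significant, which corresponds to the dashboard's Summary Data.

Several details are assumptions rather than the app's R implementation:

- the Q value cut-off of 0.05;
- the `Dataset` structure;
- the placeholder transcript ID `AGAP000000-RA`.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    country: str
    exposure: str                  # "Exposed" or "Unexposed"
    species: str                   # full species ID, e.g. "Anopheles gambiae"
    insecticide: str               # insecticide class, or "None" if unexposed
    fold_change: dict[str, float]  # transcript ID -> raw fold change
    q_value: dict[str, float]      # transcript ID -> Q value (adjusted p-value)


def select(datasets, **filters):
    """Keep datasets whose descriptor fields equal every given filter value."""
    return [d for d in datasets
            if all(getattr(d, field) == value for field, value in filters.items())]


def count_significant(datasets, transcript_id, q_cutoff=0.05):
    """Number of datasets where transcript_id has a Q value below q_cutoff
    (the 0.05 default is an assumption, not taken from IR-TEx)."""
    return sum(1 for d in datasets
               if transcript_id in d.q_value and d.q_value[transcript_id] < q_cutoff)


# Hypothetical toy data; not real expression results.
TID = "AGAP000000-RA"
datasets = [
    Dataset("PopA", "Burkina Faso", "Unexposed", "Anopheles gambiae", "None",
            {TID: 4.2}, {TID: 0.001}),
    Dataset("PopB", "Burkina Faso", "Exposed", "Anopheles coluzzi", "Pyrethroid",
            {TID: 1.1}, {TID: 0.40}),
    Dataset("PopC", "Cameroon", "Unexposed", "Anopheles gambiae", "None",
            {TID: 2.5}, {TID: 0.02}),
]
gambiae = select(datasets, species="Anopheles gambiae")
print([d.name for d in gambiae])       # ['PopA', 'PopC']
print(count_significant(gambiae, TID))  # 2
```

Filters use exact string equality. This is why descriptor values must be entered consistently when datasets are added.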

IR-TEx basics
IR-TEx can be used to explore the relationships between expression levels of transcripts across populations of Anopheline vectors with varying levels of insecticide resistance. To run IR-TEx, simply visit the GitHub page, which contains a link to the current application web page. IR-TEx is currently hosted at https://www.lstmed.ac.uk/projects/ir-tex. The output below shows the appearance of a typical Interactive Dashboard displaying transcript expression, experiment and geographical location.
The interactive dashboard is composed of the following:
1) Expression Line Graph showing the log2 fold change of the transcript of interest for each microarray dataset.
2) Probe Expression Table showing the VectorBase Transcript ID, Detoxification Class, Transcript Description, raw Fold Change (FC) and Q value (Q, the adjusted p-value) for each probe (row) and dataset (column).
3) Summary Data showing the number of arrays in which the transcript is significantly differentially expressed.
4) Download, to obtain a local copy of the Probe Expression Table.
5) Map highlighting the locations of the datasets containing the transcript of interest, with significant differential expression illustrated as a traffic light system.
6) Correlation Line Graph showing the log2 fold change of the transcript of interest and of any transcripts whose absolute correlation with it exceeds the user-defined threshold.
7) Correlated Transcript Expression Table showing the VectorBase Transcript ID, Detoxification Class, Transcript Description, raw Fold Change (FC) and Q value (Q, the adjusted p-value) for the transcript of interest and each correlated transcript (row) and dataset (column).
8) Download, to obtain a local copy of the Correlated Transcript Expression Table.

Performance and Resources
IR-TEx requires only a modest amount of computing power per user. The most computationally intensive part of the application is the initial generation of the correlation matrix. The default matrix that loads on application start-up uses an optimum number of datasets and an optimum correlation threshold. It typically takes ~4 s to load on a standard Intel i7 processor and consumes ~3 GB of RAM in the process. If regular use is intended, IR-TEx is best run locally.
NB. The correlation matrix must be recalculated each time a dataset is added or removed, or an option is changed. More processing power is required if, for example, all studies are included or a low correlation threshold is selected.
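Each transcript is compared with the transcript of interest across the selected datasets. Only those whose absolute correlation exceeds the threshold are kept.

This guide does not state which correlation coefficient IR-TEx uses. The sketch below therefore assumes Pearson correlation on log2 fold changes and computes only the row of the matrix for a single query transcript. The function names and toy profiles are hypothetical.

A full matrix compares every pair of transcripts. Its cost therefore grows quadratically with the number of transcripts and linearly with the number of datasets included.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant profile; treated here as uncorrelated
    return cov / (sx * sy)


def correlated_transcripts(profiles, query, threshold=0.8):
    """profiles: transcript ID -> raw fold changes, one per selected dataset.
    Returns {transcript ID: r} for transcripts with |r| > threshold vs. query."""
    log_profiles = {t: [math.log2(fc) for fc in fcs] for t, fcs in profiles.items()}
    target = log_profiles[query]
    hits = {}
    for t, values in log_profiles.items():
        if t == query:
            continue
        r = pearson(target, values)
        if abs(r) > threshold:  # absolute value: negative correlations are kept too
            hits[t] = r
    return hits


# Hypothetical fold changes across four datasets.
profiles = {
    "T1": [2, 4, 8, 1],
    "T2": [4, 16, 64, 1],           # log2 profile is 2 x that of T1
    "T3": [0.5, 0.25, 0.125, 1],    # log2 profile is -1 x that of T1
    "T4": [1, 2, 1, 2],             # weakly related
}
print(sorted(correlated_transcripts(profiles, "T1")))  # ['T2', 'T3']
```

Taking log2 first makes up- and down-regulation symmetric around zero. Without it, fold changes below 1 would be compressed.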
Installing IR-TEx
IR-TEx is a ShinyR application and can be downloaded and installed for free from GitHub at https://github.com/LSTMScientificComputing/IR-TEx. The repository includes several files, among them a table of fold changes and Q values and a longitude-latitude file giving the geographical locations of the collection sites. To install the app locally, it must be installed alongside the following packages:
- dismo (https://cran.r-project.org/web/packages/dismo/index.html)
- rgdal (https://cran.r-project.org/web/packages/rgdal/index.html)
- shinycssloaders (https://cran.r-project.org/web/packages/shinycssloaders/index.html)
- ShinyR, for which installation instructions are available at https://shiny.rstudio.com
Section 2: Inputting other resistance datasets
Overview of Entering Data
All datasets used in this app currently come from the LSTM Agilent 15K array V1 (A-MEXP-2196), which dates from AgamP3.5 (2009). This array is the most commonly used for insecticide resistance experiments because it has multiple probes for 'detoxification family' genes, but we recognise that other array designs may be used in the future. There are two options for inputting such datasets:
1) use fold changes and Q values only for probes found on the original array;
2) set probes that are missing from the new dataset to '0', which causes them to appear as missing within the app.
Below is a walkthrough for adding more resistance datasets to the existing data without changing any core code within the app.
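The second option amounts to aligning the new dataset to the reference probe list of the original array. The sketch below orders values by the reference probes and fills any probe absent from the new dataset with 0. Probes that exist only on the new platform are reported separately: they have no row in the existing table, so they cannot be added this way.

The probe names and function names are hypothetical. The real row order is whatever the shipped Fold Changes.txt uses.

```python
def align_to_reference(reference_probes, new_values, fill=0.0):
    """Values of new_values in reference probe order; probes missing from the
    new dataset receive `fill` (0, so the app treats them as missing)."""
    return [new_values.get(probe, fill) for probe in reference_probes]


def extra_probes(reference_probes, new_values):
    """Probes measured on the new platform that are absent from the reference array."""
    reference = set(reference_probes)
    return sorted(p for p in new_values if p not in reference)


# Hypothetical example: p2 was not measured, p9 is not on the original array.
reference = ["p1", "p2", "p3"]
new_fc = {"p1": 1.5, "p3": 0.7, "p9": 2.0}
print(align_to_reference(reference, new_fc))  # [1.5, 0.0, 0.7]
print(extra_probes(reference, new_fc))        # ['p9']
```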
Adding data to the web-based app
To add new datasets to the existing web-based app, please email the first author (victoria.ingham@lstmed.ac.uk) with the new experimental files and designs, together with the latitude and longitude of the collection site of the resistant population.

Adding data to a local IR-TEx installation
The following is a step-by-step guide to adding data to a local installation of IR-TEx. Please follow the steps below.
4. Insert dataset descriptors: fill the top five cells of a new column as follows.
- First cell: a name for the population, followed by FC.
- Second cell: the country of the resistant population.
- Third cell: either Exposed or Unexposed, depending on whether the resistant population was exposed.
- Fourth cell: the species, Anopheles gambiae, Anopheles coluzzi or Anopheles arabiensis (this MUST be the full species ID).
- Fifth cell: the class of insecticide, or 'None' if unexposed.
Underneath these, paste the raw fold changes, each on the row of its corresponding probe. Repeat this with the Q values in the furthest right column of the sheet. Keep the information in the top five cells identical, except that FC is replaced with Q.
5. Mapping data: open geography.txt. It contains three columns:
- resistant population names, written exactly as they appear in the Q value columns of Fold Changes.txt;
- latitude;
- longitude.
Below the last row of the first column, enter your population name EXACTLY as it appears in the Q value column. Follow it with the latitude and longitude of the collection site (or approximate original location) of the new resistant population.
6. Install: replace the existing files with your newly modified files (Fold Changes.txt, geography.txt) and restart the application.
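Steps 4 and 5 depend on exact strings, so a small pre-flight check can catch typos before the files are replaced.

The sketch below builds one FC or Q column (five descriptor cells followed by one value per probe row). It rejects species or exposure values outside those listed in step 4 and reports Q-column names with no exactly matching entry in geography.txt.

Two assumptions apply:

- The header is the population name immediately followed by FC or Q, with no separator. Copy the convention used by the existing columns in the shipped file.
- The function names are hypothetical helpers, not part of IR-TEx.

```python
ALLOWED_EXPOSURE = {"Exposed", "Unexposed"}
ALLOWED_SPECIES = {"Anopheles gambiae", "Anopheles coluzzi", "Anopheles arabiensis"}


def descriptor_column(population, suffix, country, exposure, species, insecticide, values):
    """One column for Fold Changes.txt: five descriptor cells, then one value per probe.
    suffix is "FC" for fold changes or "Q" for Q values."""
    if suffix not in {"FC", "Q"}:
        raise ValueError("suffix must be 'FC' or 'Q'")
    if exposure not in ALLOWED_EXPOSURE:
        raise ValueError(f"exposure must be one of {sorted(ALLOWED_EXPOSURE)}")
    if species not in ALLOWED_SPECIES:
        raise ValueError("species must be the full species ID, e.g. 'Anopheles gambiae'")
    if exposure == "Unexposed" and insecticide != "None":
        raise ValueError("unexposed populations use 'None' as the insecticide class")
    header = [population + suffix, country, exposure, species, insecticide]
    return header + [str(v) for v in values]


def unmapped_populations(q_headers, geography_names):
    """Q-column names with no exactly matching population name in geography.txt."""
    return sorted(set(q_headers) - set(geography_names))


# Hypothetical population "PopX" with two probe rows.
fc_col = descriptor_column("PopX", "FC", "Ghana", "Exposed", "Anopheles gambiae",
                           "Pyrethroid", [1.2, 0.8])
print(fc_col)  # ['PopXFC', 'Ghana', 'Exposed', 'Anopheles gambiae', 'Pyrethroid', '1.2', '0.8']
print(unmapped_populations(["PopXQ", "PopYQ"], ["PopXQ"]))  # ['PopYQ']
```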
User Guide - IR-TEx
Section 3: Adapting the App to Handle Other Expression Data
The app is also useful beyond insecticide resistance. It will be particularly valuable in fields with a variety of transcriptomic data from different experiments that would benefit from being analysed together. Achieving this requires changes to the core code. No previous knowledge of R is needed to follow this walkthrough, but fully adapting the code will require R knowledge. For this reason, the walkthrough covers inputting data with ONLY 4 filtering criteria.
Creating a new data file
The first task is to create a new tab-delimited Fold Changes.txt file that matches the template provided for insecticide resistance, as seen below.
Populating the data file
Enter the appropriate parameters for each filter, as in the example below. Capitalisation and punctuation are significant in R, so keep every value consistent. DO NOT switch, for example, between 'female' and 'Female'. Each filter value should be shared by several datasets; otherwise you will not be able to select multiple datasets.
Modifying the R code to accept your data
Once the data are entered, with the probes matching across rows for all datasets, the R code can be modified by following the steps below.
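Before the R code is modified, the filter rows can be checked for the two consistency problems described above.

The sketch below groups each filter's values case-insensitively, ignoring surrounding whitespace, and reports values that differ only in capitalisation. It also lists values used by a single dataset, which can never select more than one dataset. It does not catch punctuation differences such as 'Malpighian tubule' vs 'Malpighian-tubule'.

The input layout (filter name mapped to one value per dataset column) and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def case_clashes(filter_rows):
    """filter_rows: filter name -> values, one per dataset column.
    Returns, per filter, groups of values that differ only in case/whitespace."""
    problems = {}
    for name, values in filter_rows.items():
        groups = defaultdict(set)
        for v in values:
            groups[v.strip().casefold()].add(v)
        clashes = [sorted(vs) for vs in groups.values() if len(vs) > 1]
        if clashes:
            problems[name] = clashes
    return problems


def singleton_values(filter_rows):
    """Per filter, values used by only one dataset."""
    result = {}
    for name, values in filter_rows.items():
        singles = sorted(v for v, count in Counter(values).items() if count == 1)
        if singles:
            result[name] = singles
    return result


# Hypothetical filter rows for three datasets.
rows = {"Sex": ["female", "Female", "male"], "Tissue": ["gut", "gut", "legs"]}
print(case_clashes(rows))      # {'Sex': [['Female', 'female']]}
print(singleton_values(rows))  # {'Sex': ['Female', 'female', 'male'], 'Tissue': ['legs']}
```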
This guide assumes that geography is NOT relevant to these datasets, so the map will be removed, along with geography.txt. If geography is relevant, ignore step a.) and the final step (deleting lines 479-584), and modify the geographical parameters as in section (ii).
a.

Inclusive of the brackets: })
Use of one-colour arrays and RNA-seq data with the app
One-colour arrays have often been used because two-colour arrays were originally expensive. In addition, different array platforms and different analysis techniques are often applied to array data. The advent of RNA-seq data adds a further confounding variable when using this app.
These differences in distribution are a confounding factor when integrating the data. This is especially true for correlation networks, which are driven by extreme values such as those often seen in RNA-seq data.
To input data, follow the guide in section (ii). Once this is complete and the data are stored in a tab-delimited text file, normalisation can be performed. Quantile normalisation applied across a table containing all fold changes makes the distributions statistically identical, and it is a technique widely employed in microarray data analysis. As seen below, the distributions are much more similar after normalisation. This transformation can also be used across array platforms (Affymetrix, Agilent, Illumina, etc.). For further information about the algorithms, and a description of the methods and use case, please see our publication: V.A. Ingham, S. Wagstaff and H. Ranson, "Transcriptomic meta-signatures identified in Anopheles gambiae populations reveal previously undetected insecticide resistance mechanisms", Nature Communications.
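The quantile normalisation described above works in three steps:

1. Sort each dataset's values.
2. Average across datasets at each rank to obtain a reference distribution.
3. Give every value the reference value for its rank within its own dataset.

The sketch below is a minimal pure-Python version under two assumptions: the fold changes are held as one list per dataset, and the probes are aligned across lists. Tied values receive consecutive ranks in their original order. Some R implementations, such as limma's normalizeQuantiles, instead average the reference values of tied entries.

```python
def quantile_normalise(columns):
    """columns: equal-length lists, one per dataset, probes aligned by position.
    Returns new columns that all share the same distribution."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all datasets must have the same number of probes")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean value at each rank across datasets.
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalised = []
    for col in columns:
        # Probe indices ordered from smallest to largest value (stable for ties).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised


# Toy example with three datasets of four probes each.
a, b, c = quantile_normalise([[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]])
print(sorted(a) == sorted(b) == sorted(c))  # True: identical distributions
print(a[1], c[0])                           # 2.0 2.0 (both are the smallest value)
```

Only the rank of each value within its dataset is preserved. This is why the method can bring data from different platforms onto a common scale.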
Where to find help
To install IR-TEx, download the current files from https://github.com/LSTMScientificComputing/IR-TEx and run them in a ShinyR environment.
Detailed instructions on how to deploy the ShinyR environment can be found on the Shiny project webpage: https://shiny.rstudio.com. Example instructions on how to configure ShinyR for Ubuntu can be found here.