DoOR 2.0 - Comprehensive Mapping of Drosophila melanogaster Odorant Responses

Odors elicit complex patterns of activated olfactory sensory neurons. Knowing the complete olfactome, i.e. the responses in all sensory neurons for all relevant odorants, is desirable to understand olfactory coding. The DoOR project combines all available Drosophila odorant response data into a single consensus response matrix. Since its first release many studies have been published: receptors were deorphanized and several response profiles were expanded. In this study, we add unpublished data to the odor-response profiles for four odorant receptors (Or10a, Or42b, Or47b, Or56a). We deorphanize Or69a, showing a broad response spectrum with the best ligands including ethyl 3-hydroxyhexanoate, alpha-terpineol, 3-octanol and linalool. We include all of these datasets in DoOR and provide a comprehensive update of both code and data, along with new tools for data analysis and visualization. The DoOR project has a web interface for quick queries (http://neuro.uni.kn/DoOR), and a downloadable, open source toolbox written in R, including all processed and original datasets. DoOR now gives reliable odorant responses for nearly all Drosophila olfactory responding units, listing 693 odorants, for a total of 7381 data points.


Results
Different types of new data. With this update of the DoOR.data package we introduce three classes of new data: (1) Higher-precision raw data for datasets that were already included. When we published the first version of DoOR, not all raw data from the considered publications were available, and in many cases we had to estimate quantitative responses from graphical plots. In the meantime we received the raw data for many of these studies from the authors and updated the existing DoOR datasets. This situation is improving: publishing the data that underlie the plots of a paper in the supplements is becoming widespread practice, giving the community increasing access to raw data. (2) Raw data from new publications. Many new studies have been published since the first DoOR release. These studies added important new data to the Drosophila olfactome (Fig. 1a). Importantly, they deorphanized most of the remaining OR response profiles for which no ligand was known previously. (3) Data recorded by us that are first published along with this paper. Here we present the response profiles of five OSN classes, measured via calcium imaging. Our set of ~100 odorants added a total of 529 new odorant responses to DoOR. One of these response profiles deorphanizes the Or69a sensory neurons, others contribute to existing profiles (Fig. 2). See Table S1 for an overview of all studies that contributed to the DoOR project. We note, with respect to points 1 and 2, that many but not all colleagues were willing to share (published) odorant response data.
Figure 1. (a) Each point in the matrix represents an odorant-responding unit combination. Colors indicate whether that combination is new in DoOR (red), was updated (blue), is unchanged (green) or is still missing (grey). Responding units were sorted according to the number of odorants they were tested with, odorants were sorted accordingly (see Tables S2, S3 and S4 for responding unit and odorant names respectively). (b) Visualization of ensemble responses. Responses elicited by propanoic acid mapped onto a representation of the Drosophila antennal lobe model from Grabe et al. 17. Glomerulus names are shown in the top panel, the corresponding receptor names are shown in the bottom panel. DoOR 2.0 contains mappings of 13 IR innervated glomeruli that were still unmapped (dark grey) in DoOR 1.0. The two dark grey glomeruli VP2 and VP3 are non-olfactory.
We added original data for the following receptors (Fig. 2 and Table S5): For Or10a OSNs, methyl salicylate, ethyl benzoate and butyric acid elicited the strongest responses in our hands. About one third of the tested odorants led to a reduction of calcium concentration (inhibition). In total our dataset added 52 odorants to the known response profile. Or10a is expressed in ab1D neurons, whose axons innervate glomerulus DL1. The cells co-express the gustatory receptor Gr10a 8. For Or42b our data added 76 substances. 3-hexanone, ethyl propionate and ethyl (S)-(+)-3-hydroxybutyrate were the three strongest ligands in this dataset. The receptor is chirality selective: the stereoisomer ethyl (R)-(−)-3-hydroxybutyrate did not activate these neurons. Or42b is expressed in ab1A neurons, whose axons innervate glomerulus DM1. Or47b responded mainly with inhibition, confirming previous reports 3,25,26. We added 61 new odorant responses to DoOR and observed weak inhibitions for most substances. Benzaldehyde, furfural and acetic acid produced stronger inhibitions. Excitatory responses were weak compared to other receptor cells. The strongest responses in our set were to (S)-carvone and propanoic acid, but these are not the best ligands. A stronger ligand, methyl laurate, was recently discovered by Dweck et al. 27; that dataset is included in DoOR. Or47b is expressed in at4 neurons, whose axons innervate glomerulus VA1d. Or56a expressing ab4B OSNs responded best to geosmin, confirming the deorphanizing data by Stensmyr et al. 28. In addition, stimulation with (1R)-(−)-fenchone and alpha-ionone evoked lower but reliable signals. We also observed inhibitory responses: 2,3-butanedione and acetic acid produced the strongest decreases in fluorescence. In total we added 82 responses to the response profile. Or56a is expressed in ab4B neurons, whose axons innervate glomerulus DA2. The cells co-express the receptor Or33a 8.
For Or69a no ligands were known previously. We recorded responses to 106 odorants. Or69a OSNs responded particularly broadly, showing activity towards most of the odorants in our set: the receptor kurtosis was −0.36. We found the strongest responses for ethyl 3-hydroxyhexanoate, alpha-terpineol, 3-octanol and linalool. Or69a is expressed in ab9 neurons, whose axons innervate glomerulus D. Overall, across the five characterized receptors, we analyzed the response profiles with respect to chemical class (Fig. 2). It is apparent from the figure that chemical class is not a good response predictor: all colors intermingle across the entire odor-response range. In other words, it is not useful to characterize individual receptors as "alcohol receptor" or "ester receptor".
Figure 2. Response profiles for five OSNs measured via calcium imaging and added to DoOR. Bars represent mean calcium signals (n = 3–16 and 122–296 for controls) measured from five different GAL4 driver lines in response to a set of 100 odorants. Colors indicate the chemical classes the different odorants belong to. Shaded areas indicate half maximal and half minimal response ranges respectively. Or56a was recorded using a different reporter (GCaMP3 vs. GCaMP1.3); the different scale is due to the reporter, and does not indicate different receptor calcium response properties. Mineral oil solvent responses were subtracted. Number of odorants in the dataset is given as n. All odorant names and response values are given in Table S5.
Deorphanized OSNs. With the new datasets included, DoOR response profiles now exist for all known OSNs except for Ir40a, and all antennal lobe glomeruli except VA7m have been assigned to a sensory neuron (Fig. 1b and Table 1). All other glomeruli without a cognate receptor gene in the previous 2010 version of DoOR have now been deorphanized. They are innervated by IR expressing OSNs housed in coeloconic sensilla or in the sacculus 12. We updated the "OSN to glomerulus" mappings and the glomerulus nomenclature according to Silbering et al. 12 and the recently published in-vivo atlas of the Drosophila AL by Grabe et al. 17. Ligands have been published for IRs 12, and are included in DoOR.
Updated DoOR mappings. Already in DoOR 1.0 there were multiple cases where response profiles could not be assigned unambiguously to a single receptor. These included cases where the receptor of the measured OSN was not known (e.g. OSN ac1A), or where more than one functional receptor was expressed in an OSN (e.g. ab5B, which expresses Or47a and Or33b). In these cases we assigned the response profile to the OSN name. While some of these cases have been resolved in the meantime (unknown partners were mostly IRs), others have to remain. For example, when a sensory neuron expresses two odorant receptors, it is necessary to measure each receptor and the neuron separately, generating three datasets (for more examples, see below). Consequently, we had to expand this naming scheme, assigning some response profiles to the sensillum and others to the glomerulus they were measured in. Due to these difficulties in nomenclature we refer to the different origins of DoOR response data as "responding units" throughout the text. In most (but not all) cases a "responding unit" consists of an unambiguous mapping of receptor cell in a given sensillum, receptor gene/protein, and glomerulus in the antennal lobe. All "responding units" are listed in Table 1, together with the relevant information. In many cases electrophysiological recordings from coeloconic sensilla could not be mapped to the individual OSN, because spike amplitudes and/or shapes were not distinct enough to allow separation. For example, single sensillum recording (SSR) data from Silbering et al. 12 were integrated as summed OSN responses for the individual coeloconic sensilla ac1-4. Marshall et al. 29 were able to separate the unit with the strongest amplitude (the A neuron) but summed the remaining units. We entered these data as e.g. "ac1A" (the strongest, unambiguous unit) and "ac1BC" (the other two, not separable). Mapping responses recorded from glomeruli to the corresponding IR is also not always straightforward due to complex innervation patterns. For example, OSNs that express Ir64a project from chamber III in the sacculus to the two glomeruli DC4 and DP1m 30. As a mapping to the IR name would be ambiguous and the OSN names for ac sensilla are not well defined, we extended our nomenclature and introduced concatenated names of receptor and glomerulus (Ir64a.DC4 and Ir64a.DP1m). Ir75d is expressed in three different OSNs housed in the three sensilla ac1, ac2 and ac4, all of which target the VL1 glomerulus. Assuming that the IR is the main determinant of the OSN response, we mapped recordings from the VL1 glomerulus to Ir75d. We also used the published IR response profiles to estimate putative sensillum/receptor/glomerulus relationships (see below).
We updated existing names in two instances. We renamed Gr21a to Gr21a.Gr63a as neither receptor alone is functional and neither is known to be a co-receptor 31. We renamed ab3B to Or85b as the co-expression of Or98b is not clear and no response profile exists for the latter 7. Or23a and Or83c as well as Or2a, Or19a and Or43a were initially described as being expressed in the trichoid sensilla at2 and at3 7. Ronderos et al. 32 describe Or23a and Or83c OSNs as being housed in an intermediate sensillum and thus rename at2 to ai2. Dweck et al. 33 renamed at2 and at3 to ai1 and ai2 respectively. Since these two proposed nomenclatures conflict, we decided to keep the old at1-at4 nomenclature to reduce confusion.
Merging algorithms. We rewrote and optimized large parts of the DoOR code, mainly to increase computational speed. The logic of the core algorithms for merging several datasets into a single consensus response profile remained unchanged. Pairwise merging is based on the assumption that the same monotonic relationship between odorant responses for a given responding unit applies to all datasets: a better ligand A should always elicit a stronger response than a weaker ligand B, regardless of the recording technique used. Pelz et al. 2006 34 have shown that this assumption is valid for calcium responses and extracellularly recorded action potentials in Drosophila OSNs. The merging procedure consists of the following steps: datasets are rescaled to the range [0, 1] and merged pairwise by calculating the best fitting function on odorants recorded in both studies. Odorants measured in only one of the studies are subsequently projected onto this function. The sequence of merging is determined by iteratively finding the pairs that produce the best fit 4. The "best fit" is quantified as the fit yielding the lowest "mean orthogonal distance" (MD, see Methods). Where feasible (n ≤ 10 datasets, i.e. at most 10! = 3,628,800 permutations) we computed all possible merging sequences and selected the one with the lowest MD to all original datasets. We were able to test all possible permutations for all datasets except Or22a.
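The merging steps can be illustrated with a minimal Python sketch. It is not the DoOR implementation (which is written in R): the straight-line least-squares fit stands in for the "best fitting function", the averaging of shared odorants is a simplification chosen here, and all function names, odorant names and response values are hypothetical.

```python
from math import factorial


def rescale(responses):
    """Rescale an odorant -> response dict linearly to the range [0, 1]."""
    lo, hi = min(responses.values()), max(responses.values())
    if hi == lo:
        raise ValueError("cannot rescale a flat response profile")
    return {odor: (r - lo) / (hi - lo) for odor, r in responses.items()}


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x (stand-in for the best fit)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def merge_pair(study_a, study_b):
    """Merge two studies of one responding unit, expressed on the scale of study_b."""
    a, b = rescale(study_a), rescale(study_b)
    shared = sorted(set(a) & set(b))
    intercept, slope = fit_line([a[o] for o in shared], [b[o] for o in shared])
    merged = dict(b)  # odorants measured only in study_b keep their value
    for odor, value in a.items():
        projected = intercept + slope * value
        # Shared odorants: average both estimates; others: use the projection.
        merged[odor] = (projected + b[odor]) / 2 if odor in b else projected
    return merged


# Hypothetical raw responses (arbitrary units) for one responding unit.
study_a = {"odor1": 10, "odor2": 20, "odor3": 40, "odor4": 30}
study_b = {"odor1": 0.0, "odor2": 0.3, "odor3": 1.0, "odor5": 0.6}
consensus = merge_pair(study_a, study_b)

# Number of possible merging sequences for n = 10 datasets.
n_sequences = factorial(10)
```

Because the number of sequences grows factorially, an exhaustive search quickly becomes impractical for responding units with more than about ten datasets, which is consistent with Or22a being the one exception mentioned above.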

InChIKeys as new default odorant identifiers.
With this version of DoOR we switched from CAS numbers to InChIs (International Chemical Identifiers) and their corresponding InChIKeys as the main chemical identifiers used in DoOR 35. InChIs are unique chemical identifiers derived from the chemical structure of a compound. InChIs are free to use and the algorithm for generating them is freely available under an open source license (http://www.inchi-trust.org/downloads/). Another advantage is that InChIs are human readable and thus their correct use can be verified. Compared to InChIs, CAS numbers are ambiguous: they are assigned to substances rather than compounds. This can result in several CAS numbers for the same chemical. Isopentyl acetate for example, an odorant with a banana-like smell for humans, was tested in many studies included in DoOR. PubChem lists 152 synonyms, including "isopentyl acetate", "isoamyl acetate" and its IUPAC name "3-Methylbutyl acetate". Among the synonyms are also two different CAS numbers, 123-92-2 and 29732-50-1, which both map correctly to isopentyl acetate but might have created two separate entries in DoOR. In contrast, the InChI algorithm always produces the standard InChI InChI=1S/C7H14O2/c1-6(2)4-5-9-7(3)8/h6H,4-5H2,1-3H3 and the corresponding InChIKey MLFHJEHSLIIPHL-UHFFFAOYSA-N. As InChIs can be quite long, we use InChIKeys for all computations within the DoOR algorithms. InChIKeys are the 27-character hashed version of each InChI. While only InChIKeys are used in the DoOR algorithms, we included additional information as a service to the users: the name, CID (PubChem Compound Identification) and CAS (Chemical Abstracts Service) identifiers, and SMILES (simplified molecular-input line-entry system), another structure-based chemical identifier.
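A short Python sketch shows why keying on InChIKeys avoids duplicate entries. The two CAS numbers and the InChIKey are the ones quoted above for isopentyl acetate; the lookup table, the function names and the response values are hypothetical, and averaging the collapsed entries is just one possible way to combine them.

```python
import re

# Identifiers quoted in the text for isopentyl acetate.
ISOPENTYL_ACETATE_KEY = "MLFHJEHSLIIPHL-UHFFFAOYSA-N"
CAS_TO_INCHIKEY = {
    "123-92-2": ISOPENTYL_ACETATE_KEY,
    "29732-50-1": ISOPENTYL_ACETATE_KEY,
}

# A standard InChIKey has 27 characters: blocks of 14, 10 and 1 uppercase letters.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_inchikey(key):
    """Check the overall layout of an InChIKey (not its chemical validity)."""
    return bool(INCHIKEY_PATTERN.match(key))


def collapse_by_inchikey(records):
    """Average responses of entries that resolve to the same InChIKey.

    records: list of (cas_number, response) pairs from one dataset.
    """
    grouped = {}
    for cas, response in records:
        grouped.setdefault(CAS_TO_INCHIKEY[cas], []).append(response)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


# Hypothetical dataset listing the same compound under both CAS numbers.
records = [("123-92-2", 0.5), ("29732-50-1", 0.75)]
by_key = collapse_by_inchikey(records)
```

Keyed by CAS number, the two records would have remained separate entries; keyed by InChIKey, they collapse into one.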
Redundancy in nomenclature can create duplicates. In fact, even in published sets, we found several cases where the same odorant appeared multiple times in a single dataset, sometimes with different chemical names. One possible explanation is that the different instances of a chemical were obtained from different suppliers. In these cases we merged the entries by calculating the mean response. Whenever we performed such a merge, we noted it in the dataset.info data frame.
New tools. With DoOR 2.0 we provide several new tools. These include seven new functions for data visualization, e.g. a function for mapping odorant responses onto the AL model of Grabe et al. 17 (dplot_ALmap(); Fig. 1b), a function for generating tuning curves (dplot_tuningCurve(); Figs 3 and 4) and several functions to visualize response profiles or to compare responses across responding units. We added a tool for sensillum identification based on physiological measurements (e.g. single sensillum recordings; identifySensillum()) and a tool to find odorants that sparsely activate a given responding unit (privateOdorant()). Additionally we provide several helper functions, for example to translate chemical identifiers (transID()). A complete list of new functions is available in the detailed documentation that is provided as R vignettes with the DoOR.functions package and on the DoOR web page.
Contribute to DoOR. The source code of the two DoOR packages (DoOR.data and DoOR.functions) is now available via GitHub (https://github.com/Dahaniel/DoOR.data and https://github.com/Dahaniel/DoOR.functions). This allows users to download pre-release versions of DoOR. Everybody in the community can now contribute feature requests, bug reports and improved code. Package releases will be available via Zenodo (zenodo.org) with individual DOIs assigned, thus all DoOR.data and DoOR.functions versions will be citable. The most recent DoOR version will also be made available via the CRAN R-package repository for easy installation from within R.
We encourage all users to install the full DoOR.data and DoOR.functions packages, because they offer several important features: direct computer-readable access to all datasets used, routines for back-calculating consensus responses onto particular datasets, the possibility to add, include or test one's own datasets, and many ways to visualize data easily via the DoOR plotting functions. However, we have seen that many users value DoOR for the ease of getting immediate answers to quick questions, such as "what is the best ligand for receptor X", "which are the receptors responding to odorant Y", or "which glomerulus is innervated by OrZ". For all of these uses, and for graphical displays, we implemented a web interface as a service to the community at http://neuro.uni.kn/DoOR. The interface was improved with respect to DoOR 1.0, adding sortable tables and changes in the graphical display of the antennal lobe (now based on Grabe et al. 17).
Broadly and narrowly tuned responding units. The size of the set of odors that a given responding unit is sensitive to can be quantified as its sparseness. Several sparseness measures exist; two commonly used ones are lifetime sparseness 36 (LTS) and lifetime kurtosis 37 (LTK; Equation 1). We chose LTK as a sparseness measure because in contrast to LTS it allows for negative values (inhibitory responses), which are frequent in Drosophila OSNs. We implemented both statistics in the sparse() function in DoOR.functions. We computed LTK across all DoOR responding units that contained at least 50 odorant responses. We arbitrarily defined the threshold of 50 responses to exclude unreliable LTK estimates calculated on response profiles where only a few odorants were measured. We found responding units to be widely distributed across the LTK scale, with many being broadly tuned (low or negative kurtosis; Fig. 3a,c) and fewer that responded only to a few specific ligands (high kurtosis; Fig. 3a,b). We found the highest LTK values (most narrowly tuned receptors) for Or82a (LTK = 63.88, n = 180, narrowly tuned to geranyl acetate), ac2A (LTK = 39.12, n = 84, narrowly tuned to putrescine), Or49b (LTK = 35.87, n = 164, narrowly tuned to 2-, 3-, and 4-methylphenol), Gr21a.Gr63a (LTK = 23.57, n = 52, specifically activated by CO2) and ab2B (LTK = 20.9, n = 101, tuned to ethyl 3-hydroxybutyrate and cyclohexanol; Fig. 3a,b). At the lower end of the scale we found ac3B (LTK = −0.92, n = 98), Or35a (LTK = −0.44, n = 123; expressed in ac3B OSNs and also measured individually in the empty neuron system), the newly deorphanized Or69a (LTK = −0.26, n = 107) and Or85f (LTK = 0.17, n = 114) (Fig. 3a,c). The lowest LTK value resulted for ab4B (LTK = −1.53, n = 182), but see below.
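As an illustration of how kurtosis separates narrow from broad tuning, the following Python sketch computes the standard excess kurtosis of a response profile. This is an assumption about the form of the statistic: Equation 1 in the Methods is authoritative, and the sparse() function in DoOR.functions (R) is the actual implementation. The two profiles are hypothetical.

```python
def lifetime_kurtosis(responses, min_n=50):
    """Excess kurtosis of one responding unit's responses across odorants.

    Returns None when fewer than min_n odorants were measured, mirroring the
    threshold of 50 responses used in the text.
    """
    n = len(responses)
    if n < min_n:
        return None
    mean = sum(responses) / n
    variance = sum((r - mean) ** 2 for r in responses) / n
    if variance == 0:
        raise ValueError("kurtosis is undefined for a flat profile")
    fourth_moment = sum((r - mean) ** 4 for r in responses) / n
    return fourth_moment / variance ** 2 - 3.0


# Hypothetical profiles over 60 odorants, rescaled to [0, 1].
narrow = [0.0] * 59 + [1.0]           # a single strong ligand
broad = [i / 59 for i in range(60)]   # graded responses to most odorants

ltk_narrow = lifetime_kurtosis(narrow)
ltk_broad = lifetime_kurtosis(broad)
too_few = lifetime_kurtosis([0.1, 0.9, 0.3])
```

The narrow profile yields a large positive value and the broad profile a negative one, which matches the reading of the LTK scale used above.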
A low LTK value can also indicate that the best ligands for this responding unit are still unknown. For example, Or47b had a low lifetime kurtosis of 0.35 in DoOR 1.0 (where it seemed to be broadly tuned) because at that time only weak responses were known. Dweck et al. 27 discovered Or47b to be narrowly tuned to the single compound methyl laurate; for their dataset we calculated the high LTK value of 33.12 (i.e. narrow tuning; Fig. S1a). When datasets with newly discovered single ligands of narrowly tuned OSNs are added to DoOR, these responses are systematically underestimated. The reason lies in the mathematical model used: merging functions are calculated on the overlapping range of two datasets. If one of the two datasets contains an extremely good ligand and the other does not, that ligand is, from a statistical point of view, an outlier, and cannot be considered for the merging function. We add these outliers using a linear function with slope 1, appended to the fitting function outside of the overlap region. For good ligands, this function creates a systematic underestimation in the consensus set. The situation becomes particularly bad when the overlap between two studies (i.e. the odors in common) is low. The Or47b dataset from Dweck et al. 27, for example, was excluded from the default merge because it overlapped with all other datasets by only three odorants (the minimum criterion in DoOR is five), and thus the resulting consensus spectrum did not contain the best ligand methyl laurate. Consequently, LTK was low (LTK: 0.25, Fig. S1a). When including the Dweck et al. 27 dataset by manually adjusting the minimum criterion to three overlapping odors, the resulting LTK value increased to 3.4. This was still lower than the LTK value of 33.12 calculated for the original Dweck et al. 27 dataset (Fig. S1b), due to the necessity of mapping to a function with slope 1.
We observed a similar effect for ab4B (Or56a): the LTK value of −1.53 increased to 89.35 when we merged only the studies that included its best ligand geosmin (Fig. 3c and S1).
We did not apply any manual selection of source datasets for the pre-computed consensus matrices included in DoOR.data, thus Or47b and ab4B (Or56a) have a low LTK in these matrices. As new datasets are published and included in DoOR, the consensus spectrum will better reflect the new ligands. We have added a comment to this effect on the homepage. However, we encourage users to install the DoOR packages and to adjust the merging parameters to their individual needs.
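The slope-1 extension outside the overlap region can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the DoOR code: the fitted function (a line with slope 2), the overlap range and the ligand position are all hypothetical, and the function name is invented for this example.

```python
def extend_with_unit_slope(fit, lo, hi):
    """Wrap a fitted function so that it continues with slope 1 outside [lo, hi].

    fit: function fitted on the overlapping response range of two datasets.
    lo, hi: limits of that overlap on the x axis.
    """
    def mapped(x):
        if x < lo:
            return fit(lo) + (x - lo)
        if x > hi:
            return fit(hi) + (x - hi)
        return fit(x)
    return mapped


def fitted(x):
    """Hypothetical function fitted on the overlap: steeper than slope 1."""
    return 2.0 * x


# The shared odorants only cover weak responses, x in [0.0, 0.3].
mapping = extend_with_unit_slope(fitted, 0.0, 0.3)

inside_overlap = mapping(0.2)    # follows the fitted function
strong_ligand = mapping(1.0)     # far outside the overlap: slope-1 extension
naive_extrapolation = fitted(1.0)
```

In this toy case the strong ligand receives a lower value through the slope-1 extension than it would by extrapolating the steeper fitted function, which is one way the underestimation of newly discovered best ligands can arise.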
Odorants activated responding units with differing specificity. We calculated the kurtosis from the perspective of the odorant (population kurtosis, PK). This quantifies how specifically or unspecifically a given odorant activated the population of responding units (Fig. 4). Considering only odorants that were tested with at least half of the responding units present in DoOR (39), we found odorants to be continuously distributed along the PK scale, with some odorants activating small subsets of responding units and many eliciting broad ensemble responses (Fig. 4a). We found the following odorants at the upper end of the PK scale: CO2 activated Gr21a.Gr63a, geranyl acetate activated Or82a, water activated Ir64a.DC4, (1R)-(−)-fenchone activated Or85e and methyl salicylate activated Or10a the strongest (Fig. 4a,b). These are candidates for a "labeled line" coding logic. The broadest ensemble responses were elicited by 2-heptanone, hexanol, isopentyl acetate, Z3-hexenol and 4-methylphenol (Fig. 4a,c).
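Population kurtosis applies the same statistic along the other axis of the response matrix. The Python sketch below assumes the standard excess kurtosis and a coverage filter of at least half of the responding units; the matrix layout, unit names, odorant names and values are hypothetical and only illustrate the principle.

```python
def excess_kurtosis(values):
    """Standard excess kurtosis (population moments)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        raise ValueError("kurtosis is undefined for constant values")
    return sum((v - mean) ** 4 for v in values) / (n * variance ** 2) - 3.0


def population_kurtosis(matrix, min_fraction=0.5):
    """Kurtosis per odorant across responding units.

    matrix: dict mapping responding unit -> {odorant: response}; odorants that
    were not tested with a unit are simply absent from that unit's dict.
    Odorants tested with fewer than min_fraction of the units are skipped.
    """
    odorants = {odor for profile in matrix.values() for odor in profile}
    result = {}
    for odor in sorted(odorants):
        values = [profile[odor] for profile in matrix.values() if odor in profile]
        if len(values) >= min_fraction * len(matrix):
            result[odor] = excess_kurtosis(values)
    return result


# Hypothetical consensus matrix with six responding units.
matrix = {
    "unit1": {"specific": 1.0, "broad": 0.50, "rarely_tested": 0.2},
    "unit2": {"specific": 0.0, "broad": 0.60, "rarely_tested": 0.4},
    "unit3": {"specific": 0.0, "broad": 0.40},
    "unit4": {"specific": 0.0, "broad": 0.55},
    "unit5": {"specific": 0.0, "broad": 0.45},
    "unit6": {"specific": 0.0, "broad": 0.50},
}
pk = population_kurtosis(matrix)
```

An odorant that drives a single unit ends up higher on the PK scale than one that activates all units to a similar degree, and odorants with too little coverage are excluded from the comparison.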
Mapping IRs to their corresponding OSNs. It is difficult to map IR responses to OSNs because the two available data sources are hard to reconcile. On the one hand, there are single sensillum recordings in which spikes are difficult to sort (with the exception that in some studies the unit with the largest spike amplitude could be separated). On the other hand, there are calcium imaging data for single IRs, complicated by the fact that individual IRs are expressed in several OSN types. See Table 1 for responding units and their relationship to IR/OSN/sensillum/glomerulus. By correlating response profiles from SSR recordings with response profiles from IR calcium data (using the mapReceptor() function from DoOR), we have created hypotheses about their respective mappings.
We considered correlations that were significant (p < 0.05) and relevant (correlation coefficient above 0.75). With these criteria we found that glomerulus VC5 (Ir41a) responses correlated with high significance to the DoOR responding units ac2 (summed OSN data; r = 0.8, p-value = 2.3 × 10^−6, n = 24) and ac2A (r = 0.76, p-value = 4 × 10^−3, n = 12). For DP1l the situation is more complex: two classes of OSNs express Ir75a. Neurons from ac2 sensilla that target glomerulus DP1l express only Ir75a. Neurons from ac3 sensilla that target glomerulus DL2 additionally express Ir75b and Ir75c 12. In our analysis, data originating from DP1l correlated with high significance to the ac3A responding unit in DoOR (r = 0.82, p-value = 7.2 × 10^−7, n = 24), indicating that ac3A neurons express Ir75a/b/c and that Ir75a accounts for large parts of the response profile. This supports the notion from Yao et al. 38 that the B neuron expresses Or35a. Together the situation appears to be the following: ac2A neurons express Ir75a and target the DP1l glomerulus; ac3A neurons express Ir75a/b/c and target the DL2 glomerulus; ac3B neurons express Or35a (and Ir76b) and target the VC3 glomerulus (Table 1).
While these correlations match published IR-sensillum expression patterns 12, we note that Ir64a.DC4 correlated with high significance to ac3A (r = 0.81, p-value = 2 × 10^−6, n = 24), which would contradict that Ir64a is expressed in sacculus neurons 16,30: more experimental data will be needed here.
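The correlation-based screening described in this section can be sketched in Python. This is not the mapReceptor() implementation: it only applies the relevance criterion (Pearson r above 0.75) on shared odorants and leaves out the significance test (p < 0.05), which would need a statistics library. All profile names and values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rank_candidates(query, candidates, min_r=0.75):
    """Correlate a query profile with each candidate over shared odorants.

    query: {odorant: response}; candidates: {unit name: {odorant: response}}.
    Returns (unit, r) pairs with r above min_r, best match first.
    """
    hits = []
    for unit, profile in candidates.items():
        shared = sorted(set(query) & set(profile))
        if len(shared) < 3:
            continue  # too few shared odorants for a meaningful correlation
        r = pearson_r([query[o] for o in shared], [profile[o] for o in shared])
        if r > min_r:
            hits.append((unit, r))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical imaging profile and two hypothetical SSR-derived profiles.
query = {"odorA": 0.1, "odorB": 0.9, "odorC": 0.5, "odorD": 0.2}
candidates = {
    "unit_X": {"odorA": 0.2, "odorB": 1.0, "odorC": 0.6, "odorD": 0.3},
    "unit_Y": {"odorA": 0.9, "odorB": 0.1, "odorC": 0.4, "odorD": 0.8},
}
hits = rank_candidates(query, candidates)
```

A high correlation only generates a mapping hypothesis; as the Ir64a.DC4 case shows, it still has to be checked against expression data.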

Discussion
Animals can code for millions of odors with a limited number of olfactory receptors. Even though estimating the exact capacity of olfactory systems is a matter of fierce debate 39–41, it is clear that the combinatorial nature of olfaction lies at the basis of this astounding capacity: it is the pattern of activity across receptor neurons that gives the brain the necessary information about the chemicals in its environment. In a binary combinatorial world (each receptor either ON or OFF) the theoretical capacity of n receptor types would be 2^n; with 50 receptors in Drosophila that would be 2^50, corresponding to approx. 10^15. Since olfactory coding is not binary (i.e. receptors can have all intermediate and also negative activity values), the number might even be higher. On the other hand, the capacity might be smaller, since the code is redundant and also has to accommodate temporal complexity, concentrations, and mixture analysis, making a prediction of the exact number difficult. Nevertheless, it is apparent that understanding the olfactory code is only possible if the response to a substance is known for all sensory neurons. Given the large number of receptors, and the large number of possible stimuli, this is a daunting task, and not one that a single research group, or even a consortium, could accomplish. Therefore, we have created a technology that allows merging data from different groups, recorded with different techniques, into a consensus database. The first version has been used by the community for five years now, with great success. We present the second version, with more data (including previously unpublished data) and better tools, in this paper.
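The arithmetic behind the binary capacity estimate can be checked with a few lines of Python (variable names are chosen here for illustration only):

```python
from math import log10

n_receptor_types = 50

# Each receptor type is either ON or OFF, giving 2^n distinct activity patterns.
binary_patterns = 2 ** n_receptor_types

# Order of magnitude of that number.
order_of_magnitude = log10(binary_patterns)
```

This gives 1,125,899,906,842,624 patterns, slightly more than 10^15, in line with the approximation in the text.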
Every database is a service to the community that needs long-term care and participation. To maintain its value we need to continuously update and improve data and code, and for this we rely on support from the community. For DoOR 2.0 we received considerable help from many colleagues who supplied us with the raw data of their publications, and even with unpublished datasets. Some datasets however could not be obtained (lost data, crashed disks, refusals to reply to emails). DoOR version 2.0 has many improvements over DoOR 1.0. DoOR is now hosted on GitHub so the most recent code and data are always accessible. At the same time releases with individual DOIs will be available via Zenodo and CRAN. GitHub also eases bidirectional communication: users can send feature requests, report bugs and point out missing data via the GitHub issue tracker system. InChIKeys are the new unique identifiers used in DoOR, and we added several new functions, e.g. for calculating kurtosis (sparse()) or identifying a recorded OSN based on its odorant specificity (identifySensillum()). Our goal is to move DoOR increasingly towards a common tool used and shaped by the community, rather than just provided by us.
This new version contains additional data for new odorants and additional receptors. DoOR now includes 11 new studies contributing 15 new datasets. 467 new odorants were added, 13 glomeruli and five additional response profiles were deorphanized. In total 2894 responses of new odorant-responding unit combinations were added.
Scientific Reports | 6:21841 | DOI: 10.1038/srep21841
Where are the remaining major gaps in the dataset now? From a neuro-circuitry point of view, the most important gap concerns glomerulus VA7m, which has not yet been linked to a receptor. Also, the area innervated by IRs in the antennal lobe and their mapping to sensilla and response profiles remain understudied. More data in the next few years will fill in the missing information. From an olfactory point of view, many odorant responses are still missing (see grey area in Fig. 1a). But the number of potentially interesting chemical compounds that may have an odor is virtually infinite, and therefore just adding new compounds will have only a limited effect on our understanding of the system. Thus, the major question for the coming years will concern those receptors which do not yet have very strong best responses (e.g. Or2a, Or23a and ab10B; see Table 1 for a full list). It is conceivable that we are missing the best ligands for these receptors, and that finding a stronger ligand will help us understand the importance of that responding unit within the olfactory code, and for the animal's ecological niche. These studies will also help us in defining how odors are encoded in the first place, i.e. how the brain extracts information from the activity patterns across receptors. When does the animal interpret activity in a receptor with high kurtosis (e.g. geosmin in ab4B) as that substance being present, and when does it interpret activity in the same receptor as the result of another odorant with weaker affinity? The answer is most likely found in the ensemble response across receptors, and can only be found if we know that representation. The value of DoOR lies in the overview of the Drosophila olfactome that helps to understand the nature of combinatorial coding. It enables analyses such as calculating the kurtosis of ensemble responses elicited by individual odorants or mapping unknown response profiles. It is a resource for modeling and for selecting experimental parameters, and can be used for extrapolating new response patterns.
Nevertheless DoOR is, and always will be, incomplete, not only in the sense that some odorants will always be missing. Some aspects of olfactory coding are not yet covered, others are impossible to cover in such a consensus approach. Aspects that might be covered in future versions include odorant concentrations. Merging responses to different odorant concentrations across labs is problematic because it is difficult, if not impossible, to measure absolute concentration in a controlled way. For one, the absolute odorant concentration reaching the animal depends on the vapor pressure of a compound. Concentrations are also influenced by how a stimulus is presented. Within studies, extremely potent substances are often used in higher dilutions, so that we had to exclude these from some of the DoOR datasets. This is unfortunate because these are, after all, the best ligands. While it is impossible to correctly integrate absolute concentrations of stimuli across studies, it is mathematically possible to integrate concentration-response curves, should they be published more often. In this case, we would add separate datasets per concentration/dilution to DoOR. Similarly, saturating and adapted responses created by strong ligands are problematic as they lead to flattening of the odorant response patterns and create distortions in the mapping function. As an example: butyl acetate and ethyl hexanoate elicit equally strong responses from Or22a expressing OSNs when tested at a 1:100 dilution 3, but the OSNs are sensitive to an almost three log steps lower concentration of ethyl hexanoate compared to butyl acetate (quantified as the dose eliciting the half maximal response in Pelz et al. 34).
Other features that cannot be merged into a consensus database include the temporal dynamics of odorant responses. However, such information could be implemented in future DoOR releases by storing full time traces as meta links to the individual response points in the matrix, giving the user access to all response time courses. Similarly, mixture responses or responses to complex odor stimuli are unlikely to enter DoOR because of the many parameters they depend on, but a future version of DoOR might still act as a repository for such information.
There are many other online databases that collect information across studies to provide important tools for neuroscience research. To name a few, FlyBase (http://flybase.org/) and the Virtual Fly Brain (http://www.virtualflybrain.org/) focus on genetic or morphological aspects of Drosophila; SenseLab (https://senselab.med.yale.edu/) hosts four different olfactory databases providing, e.g., odorant activity maps of mouse olfactory bulbs and genetic codes of olfactory receptors of many species; PubChem (https://pubchem.ncbi.nlm.nih.gov/) and ChemSpider (http://www.chemspider.com/) provide detailed information on millions of chemical compounds. All these databases complement each other, and the value of each individual database increases with mutual links, so that additional information on odorants, genes or morphological structures is only a click away. Currently we link from the individual records of our DoOR web page to FlyBase, the Virtual Fly Brain and PubChem. We plan to increase the number of linked databases in the future.
DoOR facilitates a variety of analyses of the Drosophila olfactome and thus allows important questions to be addressed. As an example, we searched for odorants that elicit narrow activation patterns across responding units (Fig. 4b). These odorants are likely of special importance for Drosophila, since narrow response patterns are computationally easy to distinguish from patterns elicited by other odorants. One example of such an important odorant for Drosophila is CO2, which activates a single type of OSN and innately mediates aversion 42 . Water had the second highest kurtosis value in our analysis. Water is a very important stimulus for every animal. However, the activated Ir64a.DC4 is likely an acid sensor, and whether the water response results from the pH < 7 of the distilled water remains to be investigated (Ai et al. 54 ; Ana Silbering, personal communication). We also provide a sensillum identification tool, identifySensillum(): given the odor-response values of a recorded unit, it returns plausible responding unit and sensillum candidates. We hope it will help electrophysiologists and other experimenters to reliably identify recorded units in physiological experiments.
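The idea behind such an identification step can be sketched in a few lines of Python. This is not the code of identifySensillum(), whose algorithm is not described here: the sketch assumes that candidates are ranked by the Pearson correlation between the recorded responses and each stored profile, and all names (`rank_candidates`, `unit_A`, `odorant_1`, ...) and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_candidates(recorded, profiles, min_shared=3):
    """Rank responding units by similarity to a recorded response profile.

    recorded: dict mapping odorant name -> measured response
    profiles: dict mapping unit name -> dict of odorant name -> response
    Units sharing fewer than `min_shared` odorants with the recording
    are skipped because a correlation would not be informative.
    """
    scores = []
    for unit, profile in profiles.items():
        shared = [odorant for odorant in recorded if odorant in profile]
        if len(shared) < min_shared:
            continue
        r = pearson([recorded[o] for o in shared], [profile[o] for o in shared])
        scores.append((unit, r))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up data: one recording and three stored profiles.
recorded = {"odorant_1": 0.9, "odorant_2": 0.1, "odorant_3": 0.4, "odorant_4": 0.0}
profiles = {
    "unit_A": {"odorant_1": 0.8, "odorant_2": 0.2, "odorant_3": 0.5, "odorant_4": 0.1},
    "unit_B": {"odorant_1": 0.1, "odorant_2": 0.9, "odorant_3": 0.3, "odorant_4": 0.6},
    "unit_C": {"odorant_1": 0.5, "odorant_2": 0.4},  # too few shared odorants
}

ranking = rank_candidates(recorded, profiles)
for unit, r in ranking:
    print(f"{unit}: r = {r:.2f}")
```

A distance-based score could be used in place of the correlation; the choice matters mainly when response amplitudes differ between recording techniques.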
The increasing number of narrowly tuned OSNs described in Drosophila in recent years 27,28,42 has sparked a debate about whether effective coding of odorants works via a set of narrowly tuned labeled-line OSNs, or whether the predominant nature of olfactory coding is combinatorial 27,28,42–44 . Drosophila also has many OSNs that are broadly tuned to many different odorants, such as Or35a (in ac3B) or Or69a (in ab9) (Fig. 3a,b). Both strategies have advantages. On the one hand, being extremely sensitive to ecologically relevant substances is a prerequisite for finding proper food, mating partners or egg-laying sites. On the other hand, broadly tuned OSNs that sample across the chemical space ensure that animals are not anosmic to new substances and can adapt to new or changing environments. Indeed, the Drosophila olfactory system is able to detect and distinguish substances that do not play a role in a fly's daily life, like explosives, drugs and breast cancer metabolites 29,45 . It may well be that the same glomerulus that is highly specialized for a particular odorant in a low concentration/high sensitivity mode participates in an across-glomeruli combinatorial representation in a high concentration/low sensitivity situation 46 .
We put this service into the hands of our colleagues and hope to provide a useful tool for exploring the olfactory code and for designing future experiments. We strongly rely on your support and feedback and are happy to include new data and to process feature requests and bug reports. We also note that the DoOR framework is in no way restricted to Drosophila but can be used for other species right away if sufficient odor-response data is available. With the advances in in vitro screenings of human olfactory receptors 47,48 , a human DoOR might soon be possible.

Material and Methods
Animals. For  Flies were kept at 25 °C in a 12/12 light/dark cycle at 60-70% RH. Animals were reared on standard medium (100 mL contain: 2.2 g yeast, 11.8 g of sugar beet syrup, 0.9 g of agar, 5.5 g of cornmeal, 1 g of coarse cornmeal and 0.5 mL of propionic acid).

Odorant preparation.
Odorants were purchased from Sigma-Aldrich in the highest purity available. Pure substances were covered with argon to avoid oxidation. All odorants were applied at a dilution of 10⁻² in 5 mL mineral oil (Sigma-Aldrich, Steinheim, Germany). Odorants were prepared in 20 mL head space vials covered with pure nitrogen to avoid oxidation (Sauerstoffwerk Friedrichshafen GmbH, Friedrichshafen, Germany) and immediately sealed with a Teflon septum (Axel Semrau, Germany). A list of all odorants and the measured responses is given in Table S5.
Calcium imaging. Calcium imaging was performed on two setups, each consisting of a fluorescence microscope (BX50WI or BX51WI, Olympus, Tokyo, Japan) equipped with a 50× air lens without cover slip correction (Olympus LM Plan FI 50×/0.5). Images were recorded with a CCD camera (SensiCam, PCO, Kelheim, Germany) with 8 × 8 pixel on-chip binning, which resulted in images of 80 × 60 pixels. We recorded each stimulus for 20 s at a rate of 4 Hz using TILLvisION (TILL Photonics, Gräfelfing, Germany). A monochromator (Polychrome II or Polychrome V, TILL Photonics, Gräfelfing, Germany) produced excitation light of 470 nm wavelength, which was directed onto the antenna via a 500 nm low-pass filter and a 495 nm dichroic mirror; emission light was filtered through a 505 nm high-pass emission filter.

Stimulus application.
A computer-controlled gas chromatography autosampler (PAL, CTC Switzerland) was modified and used for automatic odorant application. A head space of 2 mL was injected in two 1 mL portions at time points 6 s and 9 s with an injection speed of 1 mL s⁻¹ into a continuous flow (60 mL min⁻¹) of purified air. The stimulus was directed at the antenna of the animal via a Teflon tube (inner diameter 2 mm, length 39.5 cm, with the exit positioned ~2 mm from the antenna). Stimuli arrived at the antenna with a 750 ms delay due to delays in the autosampler and the flow. Therefore, stimulus onset was determined as 6.75 s and 9.75 s.
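The timing arithmetic above can be written out as a short Python sketch that converts the two stimulus onsets into frame indices of a 20 s recording at 4 Hz. The constant names are hypothetical, and the sketch assumes that frame 0 is acquired at t = 0 s, which is not stated explicitly in the text.

```python
FRAME_RATE_HZ = 4                # acquisition rate of the recordings
RECORDING_S = 20                 # duration of one recording
INJECTION_TIMES_S = (6.0, 9.0)   # start of the two 1 mL injections
DELAY_S = 0.75                   # delay caused by autosampler and flow

# Time at which each stimulus portion reaches the antenna.
onsets_s = [t + DELAY_S for t in INJECTION_TIMES_S]

# Index of the first frame at or after each onset (frame 0 assumed at t = 0 s).
onset_frames = [round(t * FRAME_RATE_HZ) for t in onsets_s]

n_frames = RECORDING_S * FRAME_RATE_HZ

print(onsets_s, onset_frames, n_frames)
```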
Four to eight odorants were presented in a row (one block, ISI > 2 min), interspersed with a solvent control, a room air control and a receptor-specific reference odorant. The reference odorants were
Data analysis: calcium imaging. We analyzed calcium imaging data using custom-written routines in IDL (ITT VIS, USA) and R 51 .
Recorded movies were manually corrected for lateral movement artifacts. Then, an area of interest was defined for the parts of the antenna that showed a fluorescence increase upon stimulation. Time traces were averaged across this area. We included all measurements in the analysis as long as animals showed stable responses to the reference odorant.
Relative percentage fluorescence change was calculated as $\Delta F/F = ((F_i - F_0)/F_0) \times 100$, with $F_i$ being the fluorescence at frame $i$ and $F_0$ being the mean fluorescence of the 5 s before stimulus onset.
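The calculation of the relative fluorescence change can be sketched in Python as follows. This is an illustration and not the original IDL/R routine: the function name `delta_f_over_f` and the example trace are made up, and the sketch assumes that the baseline window ends at the first stimulus onset (6.75 s, i.e. frame 27 at 4 Hz).

```python
def delta_f_over_f(trace, onset_frame=27, frame_rate_hz=4, baseline_s=5):
    """Relative fluorescence change in percent for every frame of a trace.

    F0 is the mean raw fluorescence over the `baseline_s` seconds that
    immediately precede `onset_frame`.
    """
    n_baseline = int(baseline_s * frame_rate_hz)
    start = onset_frame - n_baseline
    if start < 0:
        raise ValueError("baseline window starts before the recording")
    baseline = trace[start:onset_frame]
    f0 = sum(baseline) / len(baseline)
    return [(f_i - f0) / f0 * 100 for f_i in trace]


# Made-up raw trace: 80 frames (20 s at 4 Hz) with a step increase at onset.
raw_trace = [200.0] * 27 + [220.0] * 53
percent_change = delta_f_over_f(raw_trace)

print(percent_change[0], percent_change[30])
```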
To correct for the photo-bleaching of the dye, we fitted an exponential decay function of the form $A \cdot \exp(-x/B) + C$ to each response trace using the nls() function in R. Because some odorant responses would not reach baseline within measurement time, the decay rate parameter B was estimated from the median mineral oil control trace within each animal. We omitted 750 ms at the beginning of the time trace and 11 s after stimulus presentation. The pre-stimulus part of the recording was weighted 100-fold 52 .
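The bleach correction can be sketched in Python under several assumptions that go beyond the description above. Once B is fixed, the model is linear in A and C, so the sketch uses closed-form weighted least squares in place of nls(). It further assumes that omitted frames receive a weight of zero, that the 11 s window starts at the first stimulus onset (6.75 s), that frames after this window have weight 1, and that the fitted curve is subtracted from the trace. All function and parameter names are hypothetical.

```python
from math import exp


def fit_bleaching(times, trace, decay_b, onset_s=6.75, skip_start_s=0.75,
                  skip_after_onset_s=11.0, pre_weight=100.0):
    """Weighted least-squares fit of A and C in A * exp(-t / B) + C, B fixed."""

    def weight(t):
        if t < skip_start_s:
            return 0.0          # omitted start of the recording
        if t < onset_s:
            return pre_weight   # pre-stimulus part, weighted 100-fold
        if t < onset_s + skip_after_onset_s:
            return 0.0          # omitted response window
        return 1.0              # remaining tail of the recording

    e = [exp(-t / decay_b) for t in times]
    w = [weight(t) for t in times]

    # Sums of the weighted normal equations for the model y = A * e + C.
    s_w = sum(w)
    s_we = sum(wi * ei for wi, ei in zip(w, e))
    s_wee = sum(wi * ei * ei for wi, ei in zip(w, e))
    s_wy = sum(wi * yi for wi, yi in zip(w, trace))
    s_wey = sum(wi * ei * yi for wi, ei, yi in zip(w, e, trace))

    det = s_wee * s_w - s_we ** 2
    a = (s_wey * s_w - s_we * s_wy) / det
    c = (s_wee * s_wy - s_we * s_wey) / det
    return a, c


def correct_bleaching(times, trace, decay_b, **fit_options):
    """Subtract the fitted bleaching curve from the trace."""
    a, c = fit_bleaching(times, trace, decay_b, **fit_options)
    return [y - (a * exp(-t / decay_b) + c) for t, y in zip(times, trace)]


# Synthetic example: known bleaching curve plus a response between 6.75 and 12 s.
times = [i / 4 for i in range(80)]  # 20 s at 4 Hz
true_a, true_b, true_c = 3.0, 8.0, 1.0
trace = [true_a * exp(-t / true_b) + true_c + (5.0 if 6.75 <= t < 12 else 0.0)
         for t in times]

fitted_a, fitted_c = fit_bleaching(times, trace, decay_b=true_b)
corrected = correct_bleaching(times, trace, decay_b=true_b)
```

In the synthetic example, the response falls entirely inside the zero-weight window, so the fit recovers the bleaching parameters and the corrected trace keeps only the response.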