Introduction

Cancer cells exhibit a metabolism that is fundamentally altered as compared to that of normal cells1,2, leading to changes in the tumor's microenvironment, in lipid peroxidation activity3,4 and to a variety of potential intra- and extracellular cancer-specific markers. Thus, quantitative and qualitative variations of metabolites provide important information on cell condition5. As a consequence, metabolomics is emerging as a novel diagnostic methodology. The detection of cancer-related marker molecules has been focussed on examining the composition of cell serum or culture media6,7. Since many metabolic products are small molecules, the analysis of volatile organic compounds (VOCs), e.g. in breath, is a promising non-invasive alternative for identifying cancer tissues8,9,10.

Several studies investigated VOC profiles in the headspace of cancer cell cultures11,12,13. Traditionally, analyses of cancer cell VOCs have been performed using gas chromatography-mass spectrometry (GC-MS), the gold standard for sequential and analytic VOC analysis. In recent years, VOC analysis has been complemented by gas sensor arrays11,14,15 which offer a simultaneous and synthetic representation of the VOC profile in real time. Detection of cancer with gas sensor arrays has been demonstrated both in vitro using the headspace of cell cultures11,16 and in vivo by analysing breath samples14,15.

Gas sensor arrays are also referred to as electronic noses. They resemble natural olfactory systems in that they consist of an array of differentially selective units17. However, electronic noses are still limited with respect to sensitivity and receptive range18, a criterion for which they can perform worse than biological systems19.

The observation that animals, e.g. dogs, can recognise various types of cancer set off a new wave of experiments20,21 (reviewed in22,23). Dogs were able to detect melanoma tissue in skin samples24, bladder cancer25 and prostate cancer26 in urine samples, lung and breast cancer in breath samples27,28, ovarian carcinoma in tissue and blood samples29,30 and colorectal cancer in breath and watery stool samples31. These and other applications, where dogs are employed e.g. to detect illicit substances, show how well-suited natural noses are for chemosensing, even if the target odours do not occur in the animal's natural environment. Similarly, fruit flies can distinguish odours that do not belong to their normal habitat32.

Using dogs for chemosensing has the limitation that access to the signals is only indirect, via the animal's behaviour in combination with a human trainer and interpreter. Avoiding a behavioural readout, we here show that the fruit fly's antenna can be accessed directly by calcium imaging, giving rise to a natural chemosensor array that can detect cancer-cell specific volatile profiles.

Calcium imaging of the fruit fly's antenna

The structure of most olfactory systems is similar: Olfactory receptor neurons (ORNs) express a given olfactory receptor (OR) type. In the fruit fly Drosophila melanogaster there are about 50 different types of OR expressing ORNs. The expressed type of OR defines the characteristic spectrum of ligands the ORN will respond to33. A particular odour elicits an ensemble response which consists of the differentially activated, inhibited or non-activated ORNs34,35. This ensemble response can be observed by calcium imaging on the antenna surface of the fly (Fig. 1a) and it is conveyed to higher brain areas for further processing (Fig. 1b).

Figure 1

Calcium imaging of the fly's antenna.

(a) Top: The antenna of a fixed fruit fly is exposed to fluorescent light. Bottom: Fluorescent image of the left antenna of an Orco-GCaMP3 fly. (b) Top: The peripheral olfactory organs comprise the antennae and the maxillary palps. Antennae were used in this study. Bottom: Morphological categories of sensilla (which house the ORN's dendrites) and their distribution on the third antennal segment. The fluorescent area on the antennal surface in (a) overlaps with the area of large and small basiconic sensilla. (c) Schematic of GCaMP function (left) and Ca2+ sources (right). VGCC voltage gated cation channel, OR odorant receptor, ORCO OR co-receptor, CICR Ca2+ induced Ca2+ release, IP3ICR IP3 (second messenger molecule) induced Ca2+ release.

Calcium imaging relies on the fact that the intracellular Ca2+ concentration is correlated with action potential rate and can be used as a proxy for neuronal activity36,37. The cytosolic Ca2+ concentration rises mainly through extracellular Ca2+ influx via voltage-gated Ca2+- or cation-channels36,38,39, or via release from intracellular stores39. For Drosophila ORNs, influx of extracellular Ca2+ via ligand-gated channels is also discussed40,41 (Fig. 1c). The change in Ca2+ concentration can be measured with reporter proteins like GCaMP42,37. GCaMP consists of a Ca2+ binding calmodulin domain and a green fluorescent protein. Upon Ca2+ binding, GCaMP changes its conformation and fluorescence properties, now emitting more light at 530 nm (Fig. 1c).

Here, we expressed GCaMP3 in approx. 30 (of the 50) antennal ORN types in Drosophila melanogaster by utilising the promoter of the naturally expressed co-receptor Orco in these ORNs43. We could thus record odour evoked activation patterns on the antenna by calcium imaging (see Fig. 1a)44.

A sensitive, natural receptor array

Insect odorant receptors are highly sensitive chemosensors45,46,47. Some studies have accessed the insect olfactory periphery through measuring an antennal compound signal. For instance, the antenna of the Colorado potato beetle Leptinotarsa decemlineata was integrated into a field effect transistor (FET)48,49 which could resolve different concentrations (on the order of 10^−9) of Z-3-hexen-1-ol, a green-leaf odour48.

FET signals as in48,49 and electroantennograms (EAGs) measure summed antennal activity, providing only a single feature. Reading out an array of olfactory receptors would lead to multiple features, enabling odour discrimination as by artificial sensor arrays. In a comparative study19, a natural Drosophila receptor array outperformed a metal oxide semiconductor sensor array, a popular gas sensor type. The fruit fly's natural sensors proved to be less correlated with each other and had a broader receptive range than their artificial counterparts, effectively covering larger parts of odour space. While Berna et al.19 relied on a collection of single sensillum recordings from the literature to evaluate the Drosophila receptor array in silico, Park et al.50 developed an in vivo sensor array: Simultaneous EAGs from four different insect species resulted in a sensor array where each of the four sensors had a distinct odour response profile.

Here, we show that a large receptor array on the Drosophila antenna can be read out in vivo by calcium imaging, with a quality that is sufficient to identify cancer cell lines.

Results

Cancer odours elicit responses on the antenna

By selectively expressing GCaMP in all Orco bearing cells we measured odour responses on the Drosophila antenna (Fig. 2). The seven test odours (Methods) were taken from the headspace of the culture media of five different cancer cell line samples (canc1 to canc5), of a healthy control cell sample (healthy control) and of a cell free medium sample (medium control). The test odours were given as 1 s double pulses and in random order in each fly. Additionally, butanol and N2 (odourless) were measured as references at the beginning and at the end of each experiment. Odour delivery resulted in strong intracellular Ca2+ increases, whereas no response was visible during N2 applications (Fig. 2). Spatial response patterns differed between odours, but were reproducible for repeated stimulation with the same odour (Supplementary Fig. S1).

Figure 2

Spatial odour response patterns on the antenna differ for different cancer cell lines: Fluorescence changes ((F − F0)/F0) in response to odour stimulation.

Images are spatially filtered with a Gaussian kernel (width = 3). The colour scale is min-max (with respect to the entire movie: see Supplementary Fig. S1).
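The relative fluorescence change (F − F0)/F0 used throughout the figures can be illustrated with a minimal sketch for a single pixel. How F0 was obtained is not stated in this section, so taking it as the mean of a few pre-stimulus frames is an assumption here, and the function name and the example values are hypothetical.

```python
from statistics import mean

def delta_f_over_f(trace, baseline_frames):
    """Return (F - F0) / F0 for one pixel's fluorescence trace.

    F0 is taken as the mean over the first `baseline_frames` frames.
    This baseline choice is an assumption of the sketch, not a detail
    taken from the text.
    """
    f0 = mean(trace[:baseline_frames])
    return [(f - f0) / f0 for f in trace]

# Hypothetical raw fluorescence values: flat baseline, then an odour response.
trace = [100.0, 100.0, 100.0, 100.0, 110.0, 120.0, 105.0]
print(delta_f_over_f(trace, baseline_frames=4))
```

Spatial Gaussian filtering, as mentioned in the caption, would be applied per image and is left out of this sketch.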

Example spatial response patterns for healthy control, canc2 and canc1 are shown in Fig. 2 (complete odour set: see Supplementary Fig. S1). All three response patterns differed clearly from the N2 control. The cancer odours exhibited a spatially broader and higher amplitude response than healthy control (Fig. 2).

Formalising the visual impression, we computed distances between odour response patterns based on a subset of the pixels: As noisy or non-responding pixels can obscure odour distances, we employed unsupervised feature selection (Methods) to select a set of 300 informative response spots on the antenna of each fly. A response pattern then comprises the activity values of all response spots in the same fly at a particular point in time. Fig. 3a visualises the positions of such response spots on one antenna (same fly as in Fig. 2). Thus, we reduced the CCD-camera readout of 80 × 60 pixels down to 300 representative spots. Each spot represents a place on the antenna and thus the response of an unknown number of ORNs. Because ORNs are not uniformly distributed on the antenna, responses differed for different locations.

Figure 3

Response spots form spatially contiguous clusters with common response properties.

(a) Left: Raw fluorescence image showing the Drosophila antenna. Right: Clustering overlaid (see Methods). (b) Mean time series ((F − F0)/F0) of the black and red clusters from (a). Note that the black cluster responds to butanol and all test odours, while the red cluster responds to cancer cells, but not to clean medium or butanol. The sequence of odour application (double pulses, 1 s each) is indicated on the x-axis. In the experiments, odours were separated by a pause of one minute. The black bars at the top mark the time points (5 for each response peak) selected for further analysis.

Next, we clustered the response spots based on response time-series similarity (Methods). Fig. 3a reveals several, spatially contiguous clusters on the surface of the antenna. Several clusters exhibited responses to all odours, with slight differences in response amplitude. Others showed distinguished, odour-specific responses. For example, in Fig. 3b the response spots in the black cluster exhibited strong responses to all stimuli except for the N2 control, while the spots in the red cluster showed higher responses to the cancer cell samples than to healthy control and medium control.

Response spot positions are roughly conserved between flies

We next compared responses across flies. To this end, for each measurement we used the time windows where responses peaked (frames 30–34 and 40–44; marked as black bars in Fig. 3b). These values, across odour responses, were z-score normalised to correct for animal-specific differences in signal amplitude and variability (Methods), resulting in what we now call “response profiles”.
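The construction of a response profile can be sketched as follows for one response spot. Treating the quoted frame numbers as 0-based inclusive indices, and using the population standard deviation for the z-score, are assumptions of this sketch; the function name and the example traces are hypothetical.

```python
from statistics import mean, pstdev

# Peak windows quoted in the text (frames 30-34 and 40-44). Interpreting
# them as 0-based, inclusive indices is an assumption of this sketch.
PEAK_FRAMES = list(range(30, 35)) + list(range(40, 45))

def response_profile(movies_for_spot):
    """Build a z-scored response profile for one response spot.

    `movies_for_spot` maps an odour name to that spot's time series
    (one value per frame). The 10 peak-frame values of every odour are
    concatenated and then z-scored across all odours of the same fly.
    """
    values = [series[f]
              for series in movies_for_spot.values()
              for f in PEAK_FRAMES]
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical 80-frame traces for two odours.
movies = {
    "medium control": [0.01 * f for f in range(80)],
    "canc1": [0.03 * f for f in range(80)],
}
profile = response_profile(movies)
print(len(profile))  # 10 peak frames for each of the 2 odours
```

Because the z-score uses the mean and spread over all odour responses of one fly, profiles from animals with different overall signal amplitude become comparable.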

We clustered these response profiles across all flies (Fig. 4). Within each fly, the resulting clusters were spatially contiguous, confirming the findings above (Fig. 3a). Every fly contributed to most clusters and the approximate spatial location of these clusters was conserved across flies (Fig. 4a, see also Supplementary Fig. S2).

Figure 4

Response spot clusters are roughly conserved across flies.

(a) Clustering of response spots across all flies. The 14 (out of 15) clusters with members in each fly are superimposed onto raw fluorescence images from the respective antenna movies. For an extended version of this figure, see Supplementary Fig. S2. (b) Cluster centers for two of the clusters from (a). The cluster centers are normalised odour response profiles containing 10 time points for each odour presentation. (c) False-colour coded response magnitudes (mean response over all 10 time points during the odour responses) of each cluster for all test odours. (d) Response magnitudes across odours for clusters 5, 12 and 13 (same data as in c). Note that cluster 12 has no response to butanol, but responds to the different test odours, while cluster 5 is strongly responsive to butanol and control medium, but less to cancer cell odours. Cluster 13 has a less specific response profile.

Clusters differed in their odour response profiles. For example, cluster 11 exhibited stronger responses to the reference odour butanol than to the test odours, while cluster 12 responded to the cancer odours but less to healthy control (Fig. 4b, c). Similarly, the response profiles of clusters 5 and 12 were complementary, while cluster 13 responded to all test odours, but more strongly to the test odours than to the butanol reference (Fig. 4d). While distances between odour response patterns are based on many response spots, already a single, spatially confined region on the antenna (e.g. cluster 12) might contribute significantly to a cancer detection task.

Note that the clustering was performed on response profiles, i.e. using odour response information only. The spatial aspects, contiguity within a fly and conserved position across different flies, emerged solely as a consequence of response profile similarity. The odour set employed in the experiments has induced a functional segmentation of the antenna surface: Control and cancer odours lead to distinguished and reproducible odour responses on the antenna with clusters of response spots that are functionally similar across animals.

Odour distances are strongly conserved between flies

We investigated whether distances between odour response profiles allowed for discrimination of healthy control and cancer odours. For a single antenna, Fig. 5a visualises the distances between odour response profiles by a two-dimensional PCA projection (Methods): The high-amplitude responses to butanol and the (odourless) N2 ended up at opposite sides of the 2D space, with the seven test odours in between, approximately aligned along an axis.

Figure 5

Cancer odours can be detected using Drosophila antennae.

(a) Single fly (same fly as in Fig. 2): Both reference and test odours in a 2D space spanned by PC1 and PC2. (b) Only test odours in a 2D space spanned by PC1 and PC2. (c) Scree plot of the PCA shown in (a). Variability accounted for remains high for the first 10 dimensions. (d) Scree plot of the PCA shown in (b). Variability accounted for remains high for the first 6 dimensions. (e) Mean (and standard deviation) Pearson correlation between the 7 × 7 test odour distance matrices over time. Each odour was applied two times (black bars). Time points marked in orange were used for the response profiles. (f) Ranked distances (mean and std. dev.) of all stimuli that contain the culture medium to medium control. Stimuli are significantly different (Kruskal-Wallis rank sum test, p = 0.00003). Post-hoc testing: Significant differences (corrected p < 0.05; pairwise Wilcoxon rank sum tests with Holm correction) are marked with stars.

For Fig. 5b, PCA was computed on a matrix containing only responses to the seven test odours: Uninfluenced by the strong amplitude difference between the butanol and N2 responses, the two-dimensional projection better resolved the smaller distances between the test odours, leading to a clear separation of the two control odours (medium control and healthy control) from the cancer odours (canc1 to canc5). We obtained comparable results on pooled data, considering all response spots from all antennae (Supplementary Fig. S3).

We further assessed dimensionality of the odour responses by scree plots (Fig. 5c,d), showing that 10 or 6 dimensions, respectively, contributed to the variance in the odour responses. How significant are these differences across animals? If the observed distances are based on characteristic odour responses, correlation should be high at odour response and low otherwise, while variability across animals should be low at odour response and high otherwise. This is indeed what we found: Based on the signals of the c = 300 response spots in each animal and for each time point of the measurements, we calculated the 7 × 7 Euclidean distance matrices between the seven test odours. These were then correlated (Methods) to quantify odour distance conservation across animals (Fig. 5e). We found two pronounced peaks in mean correlation that coincided with the two odour pulses. Maximum mean Pearson correlations were greater than 0.9 (Fig. 5e).
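The comparison of odour distances between animals can be sketched as below. The exact correlation procedure is given in the Methods and is not reproduced in this section, so correlating the upper-triangle entries of two distance matrices is an assumption; the function names and the tiny example patterns are hypothetical.

```python
from math import dist, sqrt  # math.dist requires Python 3.8+

def distance_matrix(patterns):
    """Pairwise Euclidean distances between odour response patterns."""
    return [[dist(p, q) for q in patterns] for p in patterns]

def upper_triangle(matrix):
    """Flatten the entries above the diagonal (each odour pair once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Hypothetical patterns: 3 odours x 2 response spots at one time point.
fly_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
# Second fly with the same odour geometry but twice the signal amplitude.
fly_b = [[2.0 * v for v in pattern] for pattern in fly_a]

r = pearson(upper_triangle(distance_matrix(fly_a)),
            upper_triangle(distance_matrix(fly_b)))
print(round(r, 6))
```

In the study the patterns would contain 300 response spots and seven test odours per fly, and the computation would be repeated for every time point and pair of animals.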

Next, we quantified the distances (using full-dimensional response profiles) of healthy control and of the cancer odours to medium control. Healthy control consistently had the smallest distance to medium control, while the distance to the cancer cell samples was much larger and significantly different (Kruskal-Wallis, p = 0.00003; post-hoc: pairwise Wilcoxon with Holm correction). Fig. 5f summarises ranked distances to medium control. We also found high inter-rater concordance for the distance ranking as measured by Kendall's coefficient of concordance (W = 0.81, p = 0.0002). Thus, based on distances between odours as measured by calcium imaging of the antenna, it is possible to discriminate healthy control cells from cancer cells.
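For readers unfamiliar with Kendall's coefficient of concordance, the following minimal sketch computes W for untied ranks, with each fly acting as a rater that ranks the odours by their distance to medium control. The function name and the example ranks are hypothetical, and the significance test reported in the text is not reproduced here.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for untied ranks.

    `rankings` holds one list per rater (here: per fly); each list
    contains the ranks 1..n assigned to the same n items (odours).
    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared
    deviations of the per-item rank sums from their mean.
    """
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical example: three flies rank four odours identically.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
```

W ranges from 0 (no agreement between raters) to 1 (identical rankings).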

In order to assess the reliability of this approach, we performed linear discriminant analysis (LDA, MASS package for R) with leave-one-out cross validation: One odour/animal combination was omitted, then LDA was performed and the odour class of the omitted odour/animal combination was predicted. We found that classification success was high between the groups {canc1, canc2, canc3} and {canc4, canc5}, but misclassification was frequent within these groups (see Supplementary Fig. S3c). This indicates that it is not possible to reliably discriminate canc4 from canc5, or canc1 from either canc2 or canc3, but that discrimination between these two groups is robust. Importantly, in all cases healthy cells could be discriminated from cancer cells (LDA and leave-one-out cross validation, Supplementary Fig. S3d).
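The leave-one-out scheme can be sketched independently of the classifier. The analysis in the text used LDA from the MASS package for R; the sketch below deliberately swaps in a much simpler nearest-class-mean classifier to keep the example self-contained, so it illustrates only the cross-validation loop, not the LDA itself. All names and data values are hypothetical.

```python
from math import dist  # Python 3.8+

def nearest_centroid_predict(train, sample):
    """Predict the label whose class mean is closest to `sample`.

    Stand-in classifier for illustration only (the study used LDA).
    """
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], sample))

def leave_one_out_accuracy(data):
    """Hold out each sample once, train on the rest, and score the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_centroid_predict(train, features) == label
    return hits / len(data)

# Hypothetical, well-separated two-class data (features, label).
data = [
    ([0.0, 0.1], "control"), ([0.1, 0.0], "control"), ([0.0, 0.0], "control"),
    ([1.0, 1.1], "cancer"), ([1.1, 1.0], "cancer"), ([1.0, 1.0], "cancer"),
]
print(leave_one_out_accuracy(data))
```

In the study, each held-out item was one odour/animal combination, and the per-class errors were summarised in Supplementary Fig. S3.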

Both in the PCA projection (Fig. 5a, b) and in the distance analysis (Fig. 5f) two odour clusters became visible, grouping {canc1, canc2, canc3} and {canc4, canc5} together. We note that this clustering corresponds to differences in cell proliferation: The same grouping appears in the proliferation curves (Supplementary Fig. S4a), as canc4 and canc5 derive from faster growing cell lines (lower doubling time, Supplementary Fig. S4b) than canc1, canc2 and canc3.

While our protocol controlled for cell density by inoculating an appropriate number of cells (Supplementary Fig. S4a, c and Methods), this observation could also suggest that faster growing cells produce quantitatively more metabolites, rather than qualitatively different metabolites. As a result, our analysis would detect quantitative differences. Therefore, we quantified the total amount of substances in the sample headspaces using GC-MS. We found that canc3 had slightly less total VOC abundance than the other samples, while canc4 had slightly more total VOC abundance. Thus, VOC abundance does not correlate with cell proliferation (Supplementary Fig. S4b, d), indicating that quantitative differences in metabolite production cannot explain the results reported here. Possibly, cell proliferation speed in cancer and healthy control cells (highest doubling time, Supplementary Fig. S4b) translates into metabolic differences that affect the qualitative chemical composition of the culture medium headspace.

Discussion

We have shown that an array of odorant receptors on the fruit fly's antenna can be read out by calcium imaging, giving access to a highly sensitive, natural chemosensor array. These Drosophila sensors are capable of detecting and discriminating medically relevant odours at low concentrations. Odours from culture media imprinted by different cancerous and healthy cell samples were represented by distinct odour response patterns on the antenna and distances between response patterns were conserved across flies. The finding that cancer cells produce different VOC profiles opens up the possibility of employing Drosophila sensors for analysing volatiles emitted from human breath, skin or other bodily products.

Our results confirm earlier findings32 indicating that odorant receptors with broad receptive ranges can provide distinguished responses to a variety of chemicals even if these chemicals do not occur in the animal's natural environment. This generality of olfactory sensing is also the basis for the reported cases of cancer classification by other animals, such as dogs23. Direct access to an array of odorant receptors, as established in this work, is an objective and standardisable way of circumventing the behavioural readout from animal systems that is prone to errors and that requires training of animals and human operators.

In contrast to electrophysiological recordings of single sensilla32, imaging a receptor array allows many receptors to be recorded simultaneously. Initially, imaging the antenna surface results in data from a number of sensors (pixels) that is much larger than the number of accessible receptor types (approx. 30). Pixel responses may contain single receptor signals or unknown mixtures of multiple receptor signals. Pixels can be redundant or they may contain only noise. To obtain a diverse set of informative features for distance analysis, without using information about odour identity, we thus employed an unsupervised feature selection approach (Methods).

Relying on a natural receptor array differs from test assays targeted at cancer-specific metabolites or markers that have to be known beforehand. Our array strategy also works in the absence of such knowledge, allowing cancer cells to be discriminated from healthy cells based on differences in their chemical makeup that need not be known a priori. An array response comprises multiple, to some degree independent, features that can be combined and that increase the sensor's robustness. Differences between cancer cells and healthy cells that consist not of the presence or absence of key compounds, but rather of modified relative concentrations, are easily detected in a combinatorial array response pattern, while they can be confounded with sample concentration in a single-detector assay.

Distances between odours were conserved across flies and differences in cell growth rate may account in part for the observed distances and clusters. However, there was no complete correspondence between cell growth rate (Supplementary Fig. S4) and odour distance (Fig. 5): canc4 and canc5, which had the highest growth rate, had a smaller distance to healthy control (lowest growth rate) than canc1 to canc3 (intermediate growth rate). Thus, growth rate and cell density may contribute to the odour distances, but they cannot explain them entirely.

Based on the odour response distances, we can state that 1) all cell types leave traces in headspace VOC profiles that distinguish them from medium control, that 2) larger differences occur for the cancer cells than for healthy control and that 3) at least one chemical occurs in different concentration in cancer cells with respect to healthy control cells. The latter difference may be a decreased or an increased concentration. The increased responses to the cancer odours in cluster 12 suggest that the concentration of at least one substance was increased.

The findings of this study may also lead to a more targeted test assay for the breast cancer types. This would involve identification of several key odorant receptors that contribute most to the odour response distances. We found that, across flies, a spatially confined region on the antenna shows graded responses to the test odours (cluster 12 in Fig. 4b). Judging from the position on the antenna, it appears likely that the responses in this region stem from large basiconic sensilla51 (see also Fig. 1b), a hypothesis that needs to be tested with dedicated experiments on single receptor cell lines.

Detection of cancer odours with artificial chemosensing systems has been demonstrated before, e.g. by GC-MS, by metalloporphyrin-coated quartz microbalances or by gold nanoparticle sensor arrays12,14,11. While sensor arrays can provide real-time responses, gas chromatography is only sequential and often requires preconcentrating the samples in order to achieve sufficient sensitivity. Electronic noses are not yet universal detectors and they are usually limited in their receptive range and their sensitivity18. Even though the Drosophila sensor array is bound to be limited in its receptive range as well, harnessing the capabilities of biological sensor systems can ultimately lead to sensitive chemosensors with a broader receptive range than available today and it might help to complement existing electronic noses, filling the gaps that cannot be reached by artificial systems.

Here, we have presented a proof of concept, showing that the Drosophila sensor array is suitable for medical applications. Future work includes integrating odorant receptors into artificial systems with the potential for real-time readouts from a sensor array with high sensitivity and a broad receptive range.

Methods

Cell cultures

Six different human breast cell lines were used in the experiments: MDA-231, MCF-7, SKBR3 (kindly provided by Prof. Giannini G., Department of Molecular Medicine, “Sapienza” University of Rome, Rome, Italy), BT474, ZR75-1 and MCF-10A (kindly supplied by Dr. Falcioni R., Department of Experimental Oncology, Regina Elena National Cancer Institute, Rome, Italy).

The immortalised, non-transformed human mammary epithelial cell line MCF-10A, referred to as healthy control, was grown in DMEM/F12 medium (Sigma-Aldrich) supplemented with 5% fetal bovine serum, 20 ng/ml epidermal growth factor (EGF), 10 μg/ml insulin, 0.5 μg/ml hydrocortisone (Sigma-Aldrich), 100 units/ml penicillin and 100 μg/ml streptomycin (Sigma-Aldrich), as previously described52.

The five human breast cancer cell lines (canc1 to canc5) were derived from different breast cancer histotypes: MDA-231 (canc4), MCF-7 (canc5) and SKBR3 (canc1) cell lines from metastatic breast adenocarcinoma (MetAC), BT474 cells (canc2) from invasive ductal carcinoma (IDC) and ZR75-1 cells (canc3) from metastatic invasive ductal carcinoma (MetIDC) (see ATCC.org website). These cancer cells were grown in DMEM culture medium (DMEM high-glucose medium (Sigma-Aldrich) supplemented with 10% fetal bovine serum (Sigma-Aldrich), 100 units/ml penicillin and 100 μg/ml streptomycin (Sigma-Aldrich)). All cells were cultured under standard conditions at 37°C in humidified atmosphere containing 5% CO2.

Analysis of cell growth rate

In order to inoculate the media for the later VOC experiments with a number of cells that will result in a similar cell number for all cell lines after 24 h, the proliferation rate of healthy control and cancer cell lines was analysed beforehand by cell count over 96 h (Supplementary Fig. S4). Each cell line was seeded in triplicate in its specific culture medium at an initial concentration of 2.5 × 10^5 cells/flask (25 cm2) and cell growth was monitored after 24, 48, 72 and 96 h. Data from individual growth curves were used to calculate growth rate and doubling time (td) with http://www.doubling-time.com/compute.php. The proliferation of healthy control cells was also analysed in DMEM culture medium (as used in the later VOC analysis) to check for a possible alteration of growth rate: No significant changes (compared to their growth in the specific culture medium) were observed up to 96 h of culture (Supplementary Fig. S4c).
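As a worked illustration of growth rate and doubling time, the sketch below uses the standard exponential growth model N(t) = N0 · exp(r · t), for which td = ln 2 / r. That the online tool cited above applies exactly this formula is an assumption, and the cell counts in the example are hypothetical.

```python
from math import log

def growth_rate(n_start, n_end, hours):
    """Exponential growth rate r (per hour) from two cell counts."""
    return log(n_end / n_start) / hours

def doubling_time(n_start, n_end, hours):
    """Doubling time td = ln(2) / r, in hours."""
    return log(2) / growth_rate(n_start, n_end, hours)

# Hypothetical counts: 2.5e5 cells grow to 1.0e6 cells within 48 h,
# i.e. two doublings.
print(doubling_time(2.5e5, 1.0e6, 48))
```

A lower doubling time corresponds to a faster growing cell line, which is the sense in which the term is used in the Results.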

Sample preparation

For VOC analysis, healthy control and cancer lines (canc1 to canc5) were seeded in triplicate in culture flasks (25 cm2) in 5 mL of their specific culture medium and were grown for 24 h. The number of plated cells was chosen based on the specific doubling time of every cell line, in order to obtain a comparable cell number at the end of the incubation of 24 h. After 24 h, the specific culture medium was removed and replaced with 5 mL of the DMEM culture medium. Cells were grown in these conditions for the next 96 h, up to a confluence of 50%-60% (around 1.5 × 10^6 cells/flask). After this incubation period, the DMEM culture medium was harvested, centrifuged at 1200 rpm for 5 min and collected in sterilised glass vials. Note that with this procedure, all samples derive from flasks with comparable cell density.

The medium control was obtained by incubating DMEM culture medium in the same conditions as the cell samples, but without seeded cells. Thus, medium control contains the same background odour as the cell samples, but without the influence of cells. Cell viability was evaluated by Trypan Blue exclusion test in order to assess the effect of any cell stress during the incubation time.

Animals

Drosophila melanogaster were kept at 25°C on a 12/12 light/dark cycle. Flies were reared on standard medium (100 ml contain: 0.7 g of agar, 2.4 g yeast, 2.1 g of sugar beet syrup, 7.1 g of cornmeal, 6.7 g of fructose, 1.4 ml of Nipagin (10%), 0.6 ml of propionic acid).

Flies were of genotype w; P[Orco:Gal4]; P[UAS:GCaMP3]attP40, expressing the Ca2+ reporter GCaMP342,37 in all Orco bearing cells (UAS-GCaMP3 flies were provided by Loren L. Looger, Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, Virginia, USA). 1-5 day old female flies were used for experiments.

Odorant preparation

The reference odorant 1-butanol was purchased from Sigma (Sigma-Aldrich, Steinheim, Germany; CAS: 71-36-3) in ≥99.5% purity and diluted in 5 ml mineral oil (Sigma-Aldrich, Steinheim, Germany; CAS: 8042-47-5) to a concentration of 10^−4 vol/vol. All odours were prepared in 20 ml headspace vials, covered with nitrogen and sealed with a Teflon septum (Axel Semrau, Germany). Cancer samples were used in 1 ml aliquots. Nitrogen was directly taken from an injector connected to a gas bottle.

Calcium imaging

Calcium imaging was performed as described elsewhere44,53. In brief, we used a fluorescence microscope (BX50WI, Olympus, Tokyo, Japan) equipped with a 50× air lens (Olympus LM Plan FI 50×/0.5). A CCD camera (SensiCam, PCO, Kelheim, Germany) was mounted on the microscope recording with 8 × 8 pixel on-chip binning, which resulted in 80 × 60 pixel sized images. For each stimulus, recordings of 20 s at a rate of 4 Hz were performed using TILLvisION (TILL Photonics, Gräfelfing, Germany). A monochromator (Polychrome V, TILL Photonics, Gräfelfing, Germany) produced excitation light of 470 nm wavelength which was directed onto the antenna via a 500 nm low-pass filter and a 495 nm dichroic mirror. Emission light was filtered through a 505 nm high-pass emission filter.

Stimulus application

Odours were applied automatically using a computer-controlled autosampler (PAL, CTC Switzerland). 2 ml of headspace was injected in two 1 ml portions at time points 6 s and 9 s with an injection speed of 1 ml/s into a continuous flow of purified air flowing at 60 ml/min. The 1 s stimulus was directed onto the antenna of the fly via a Teflon® tube (inner diameter 1 mm, length 38 cm).

Experiments were performed double-blind. The seven test odours (healthy control, medium control, canc1, canc2, canc3, canc4, canc5) were measured in random order. Reference odours (1-butanol and N2) were measured before and after a full block of test odours in order to ensure reproducibility and viability of the preparation (1-butanol) and to exclude contamination of the system (N2). The autosampler syringe was flushed with purified air for 1 min after each injection and washed with pentane (Merck, Darmstadt, Germany) automatically after each application of 1-butanol.

GC-MS measurements

Vials containing samples were placed in a water bath equilibrated at 40°C. Headspace VOCs were preconcentrated onto a SPME fibre (50/30 μm divinylbenzene/carboxen/PDMS, SUPELCO, Bellefonte, PA, USA) manually exposed to sample vapours for 1 h. The fibre with extracted VOCs was transferred to the GC-MS instrument (GCMS-QP 2010, Shimadzu) and desorbed at 250°C for 3 min in the injection port of the GC. The instrument is equipped with an EQUITY-5 (poly(5% diphenyl/95% dimethyl siloxane) phase, SUPELCO, Bellefonte, PA, USA) capillary column, 30 m length × 0.25 mm I.D. × 0.25 μm thickness. The analysis was conducted in splitless mode using ultra-high purity helium as carrier gas. The carrier gas linear velocity was kept constant at 30.2 cm/min. The oven temperature was kept at 40°C for 5 min, then increased by 7°C/min up to 220°C and then at 15°C/min up to 300°C. The final temperature was held for 2 min (total run time: 39 min). The mass spectrometer was used in the full scan mode over a mass range of 40–450 m/z. The detector voltage was 0.7 kV. The temperature of interface and ion source was kept constant at 250°C. The GC area of the samples was calculated using the section GCMS post-run analysis of the GCMS solutions software (version 2.4, Shimadzu Corporation).

Unsupervised feature selection

Antenna imaging movies were processed in a KNIME (www.knime.org) workflow using the ImageBee plugin54 for insect neuroimage data (http://tech.knime.org/community/image-processing). For feature selection, all 11 movies recorded in one fly (reference and test odours) were concatenated, resulting in a movie matrix A with dimensions (m = 80 × 11 time points) × (n = 80 × 60 pixels). All images were aligned by cross correlation to correct for animal movement.

In principle, all n pixels could be used for distance computations, but only at the cost of including many unresponsive or noisy pixels that can obscure odour distances. We therefore selected c = 300 pixels (features) based on their contribution to the norm of A.

Before feature selection, the data were preprocessed to ensure that the norm of A was not dominated by, for example, unspecific background fluorescence: 1) Background fluorescence was removed by subtracting the mean image separately for each of the 11 movies. 2) Photon shot noise was reduced by smoothing individual images with a Gaussian kernel (width 9). Then, we selected c column vectors (pixels) into the m × c matrix C such that the Frobenius norm error ‖A − CC⁺A‖_F was minimised (where C⁺ is the pseudoinverse of C). The objective criterion was optimised with the convex cone algorithm55. Minimising the Frobenius norm error in this way corresponds to the norm error minimisation objective of PCA, and it helps to identify a diverse set of pixels that contribute strongly to the variance in A. That is, instead of selecting pixels on an equally spaced grid, pixel selection was biased towards areas that actually responded to the odour stimuli.
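The selection criterion can be illustrated with a minimal sketch that only evaluates the reconstruction error ‖A − CC⁺A‖_F for a given set of columns. It does not implement the convex cone algorithm used for the optimisation; the function name `column_subset_error` and the toy matrix are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def column_subset_error(A, cols):
    """Frobenius norm of A - C C^+ A for the column subset `cols` (hypothetical helper)."""
    C = A[:, cols]                      # m x c matrix of selected pixel time series
    projector = C @ np.linalg.pinv(C)   # orthogonal projector onto the column space of C
    return np.linalg.norm(A - projector @ A, ord="fro")

rng = np.random.default_rng(0)
# Toy "movie matrix": 20 time points x 6 pixels, generated from 2 underlying signals.
signals = rng.normal(size=(20, 2))
mixing = rng.normal(size=(2, 6))
A = signals @ mixing

err_two = column_subset_error(A, [0, 1])  # two columns can span the rank-2 data
err_one = column_subset_error(A, [0])     # a single column cannot
print(err_two, err_one)
```

In this toy case two well-chosen pixels reconstruct all others, which is the sense in which a small, diverse pixel set can stand in for the full image.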

Response spot time series

Unsupervised feature selection from the movie matrix A (with m time points and n pixels) resulted in a set of c = 300 pixel indices. Before extracting the pixels from A (movement-corrected, but otherwise unprocessed), dye bleaching and other global intensity changes were reduced by histogram normalisation: for each of the 11 movies in A, we took the first image as a reference and matched the histograms of all remaining images to the histogram of the first image, obtaining a processed movie matrix. The c pixels were then extracted from this processed movie matrix. In order to reduce noise, a postprocessing step described in55 replaced each pixel pj by the average of those pixels whose time series were more similar to that of pj than to those of any of the other c − 1 selected pixels.

In summary, from each antenna recording we obtained a set of c representative response spots, i.e. pixel positions and the corresponding averaged and processed time series, in an unsupervised fashion. Response spots were normalised to the prestimulus interval, separately for each of the 11 stimuli, by computing (Fi − F0)/F0 for each time point i. Here, F0 is the mean fluorescence during 20 time points before stimulus application and Fi is the fluorescence value at time point i.
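As a minimal sketch of the prestimulus normalisation, the following computes (Fi − F0)/F0 for one response spot and one stimulus. The function name, the `stimulus_onset` index and the assumption that the 20 baseline frames end immediately before stimulus onset are illustrative choices, not details taken from the recordings.

```python
def delta_f_over_f(trace, stimulus_onset, n_baseline=20):
    """Return (F_i - F_0) / F_0 for every time point i of one recording."""
    if stimulus_onset < n_baseline:
        raise ValueError("not enough prestimulus frames for the baseline")
    # Assumption: the baseline window is the n_baseline frames directly before onset.
    baseline = trace[stimulus_onset - n_baseline:stimulus_onset]
    f0 = sum(baseline) / len(baseline)
    return [(f - f0) / f0 for f in trace]

# Toy trace: flat baseline of 100, then a response of 110 from frame 25 onwards.
trace = [100.0] * 25 + [110.0] * 5
normalised = delta_f_over_f(trace, stimulus_onset=25)
print(normalised[0], normalised[-1])
```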

For each time series, we selected five time points from each of the two response peaks (marked in Fig. 3b), resulting in t = 10 time points for each of the s = 11 stimuli, i.e. 110 time points. Taken together, each fly contributed a c × (s × t) response profile matrix M, which was then z-score normalised: from each row i (response spot), the mean μi was subtracted and the row was divided by its standard deviation σi. Likewise, from each column j (time point), the mean μj was subtracted and the column was divided by its standard deviation σj.
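The two-step z-score normalisation can be sketched as follows. The sketch assumes that rows are normalised first and columns second, and that the population standard deviation is used; neither detail is specified above, and the toy matrix is far smaller than the actual c × (s × t) matrix.

```python
from statistics import fmean, pstdev

def zscore_rows(matrix):
    """Subtract each row's mean and divide by its (population) standard deviation."""
    out = []
    for row in matrix:
        mu, sigma = fmean(row), pstdev(row)
        out.append([(x - mu) / sigma for x in row])
    return out

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def zscore_rows_then_columns(matrix):
    """Assumed order: normalise rows (response spots), then columns (time points)."""
    rows_done = zscore_rows(matrix)
    return transpose(zscore_rows(transpose(rows_done)))

# Toy response profile matrix: 3 response spots x 3 time points.
M = [[1.0, 2.0, 4.0],
     [2.0, 9.0, 3.0],
     [5.0, 1.0, 8.0]]
Z = zscore_rows_then_columns(M)
```

Note that after the second step only the columns are guaranteed to have zero mean and unit standard deviation; the rows are approximately, but no longer exactly, standardised.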

Clustering, distance matrices, PCA

For Fig. 3, clustering was performed on complete (unnormalised) response spot time series from a single fly. For Fig. 4, clustering of (normalised) response profiles was performed on the row-concatenated matrix (all a = 1, …, N flies pooled). In both cases, we used the k-means clustering algorithm (stats package for R, default settings, 1000 restarts). The number of clusters (15) was estimated based on a scree plot of the overall within-cluster sum of squares error.
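The scree-plot estimate of the number of clusters can be sketched by computing the within-cluster sum of squares for a range of k. The sketch below uses a plain Lloyd-type k-means with random restarts in Python, which is a stand-in and not the R implementation used for the analysis (whose default algorithm is Hartigan-Wong); the function name, the toy data and all parameter values are assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def kmeans_wss(X, k, n_restarts=20, n_iter=100, seed=0):
    """Lloyd-type k-means with random restarts; returns (labels, within-cluster SS)."""
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_restarts):
        # Initialise the centres with k distinct data points.
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = sq_dist.argmin(axis=1)
            new_centres = centres.copy()
            for j in range(k):
                members = X[labels == j]
                if len(members) > 0:  # keep the old centre if a cluster is empty
                    new_centres[j] = members.mean(axis=0)
            if np.allclose(new_centres, centres):
                break
            centres = new_centres
        # Final assignment and error for this restart.
        sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        wss = sq_dist[np.arange(len(X)), labels].sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss

# Toy data: three well-separated groups of 20 two-dimensional "response profiles".
rng = np.random.default_rng(42)
blob_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in blob_centres])

# Values for a scree plot: the error drops sharply up to the true number of groups.
wss_by_k = {k: kmeans_wss(X, k)[1] for k in range(1, 7)}
```

Plotting `wss_by_k` against k and looking for the point where the curve flattens gives the kind of estimate described above.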

For analysis of odour distances, the Ma were reshaped as s × (t × c). Odour × odour (s × s) Euclidean distance matrices (Fig. 5e, f) were computed on the Ma. For correlation analysis, we considered only the 7 × 7 distance matrices for the 7 test odours, enabling us to state explicitly that distances between the relevant test odours are correlated. We correlated, for each time point, the distance matrices of the individual flies: Fig. 5e shows the mean of all N(N − 1)/2 pairwise correlations between the N flies over time. By correlation we refer to the Pearson product-moment correlation coefficient. Only the lower triangular submatrices (without the diagonal) of the distance matrices were correlated.
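The correlation of distance matrices between flies can be sketched as follows for a single time point. All function names and the toy data (three "flies", four odours, two features) are assumptions for illustration; the actual analysis used the 7 × 7 matrices of the test odours.

```python
from itertools import combinations
from math import dist, sqrt

def distance_matrix(profiles):
    """Odour x odour Euclidean distances; `profiles` holds one feature vector per odour."""
    return [[dist(p, q) for q in profiles] for p in profiles]

def lower_triangle(matrix):
    """Entries strictly below the diagonal, row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]

def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_pairwise_correlation(profiles_per_fly):
    """Mean over all N(N - 1)/2 fly pairs of the correlation between distance matrices."""
    triangles = [lower_triangle(distance_matrix(p)) for p in profiles_per_fly]
    pairs = list(combinations(triangles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Toy data: the second and third fly show the same odour geometry, scaled or shifted.
fly_1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
fly_2 = [[2 * x, 2 * y] for x, y in fly_1]
fly_3 = [[x + 1.0, y - 1.0] for x, y in fly_1]
r = mean_pairwise_correlation([fly_1, fly_2, fly_3])
```

Because the Pearson coefficient is invariant to scaling, flies with different overall response amplitudes but the same relative odour distances still yield a high correlation.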

PCA was computed on the full s × (t × c) matrix M from one fly (Fig. 5a) and on a modified matrix M′, from which all data points stemming from the reference odours butanol and N2 were removed (Fig. 5b). PCA for pooled data (Supplementary Fig. S3) was computed on the column-concatenated matrices Mpooled = {M1, …, MN} and M′pooled = {M′1, …, M′N}, respectively.
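A minimal sketch of the two PCA variants is given below, using a singular value decomposition of the column-centred matrix. The centring step, the number of components, the row positions of the reference odours and the random toy data are all assumptions; the software and settings actually used for PCA are not stated above. NumPy is assumed to be available.

```python
import numpy as np

def pca_scores(M, n_components=2):
    """Project the rows of M (one row per odour) onto the leading principal components."""
    centred = M - M.mean(axis=0)                       # assumption: column-wise centring
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # coordinates of each odour
    explained = (S ** 2) / np.sum(S ** 2)              # fraction of variance per component
    return scores, explained[:n_components]

rng = np.random.default_rng(1)
s, t, c = 11, 10, 5                 # toy c; the analysis used c = 300
M = rng.normal(size=(s, t * c))     # random stand-in for one fly's s x (t x c) matrix
scores_full, explained_full = pca_scores(M)

reference_rows = [0, 1]             # hypothetical row positions of 1-butanol and N2
M_prime = np.delete(M, reference_rows, axis=0)
scores_test, explained_test = pca_scores(M_prime)
```

For pooled data, the per-fly matrices would first be concatenated column-wise (for example with `np.hstack`) before calling the same function.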