Main

Gastric cancer (GC) is one of the most common causes of death from cancer worldwide, and most of the cases occur in developing countries (Ferlay et al, 2010). Unspecific clinical symptoms and the lack of defined risk factors often delay the diagnosis of the disease, leading to extremely poor prognosis and high rates of recurrence (Yasui et al, 2005; Pisters et al, 2008). Earlier diagnosis substantially improves the prognosis: 95% of patients with cancer that is confined to the inner lining of the stomach wall will survive longer than 5 years (Crew and Neugut, 2006).

The standard method for diagnosing GC is upper digestive endoscopy combined with biopsy and histopathological evaluation of the biopsy samples. This method has a high diagnostic accuracy of 95 to 99% (Dooley et al, 1984), but has some prominent drawbacks that limit its suitability for population-based screening. First, compliance is reduced by the invasive and unpleasant nature of the procedure (Chen et al, 2009); second, the method is relatively costly and requires highly skilled medical staff.

The incidence of GC varies widely in different regions of the world, reaching peak values in the countries of East Asia, Eastern Europe and South America (Pisters et al, 2008). In China, for instance, the age-adjusted incidence in men is 41.3/100 000 per year, whereas in the United States the corresponding incidence is almost one order of magnitude lower (5.7/100 000 per year) (Ferlay et al, 2010). The availability of upper endoscopy may be restricted in high-incidence areas, especially in the developing world, where population-wide screening would be necessary.

Japan, the first country that has started a population-based GC-screening programme, still recommends photofluorography both for organised and opportunistic screening (Hamashima et al, 2008), even though it involves exposure to X-ray irradiation. Indirect screening for atrophy, using blood pepsinogen tests, is recommended by the Asian-Pacific guidelines for high-risk populations (Fock et al, 2008), but so far the method has not been implemented for any organised screening programme. In areas of low GC incidence, on the other hand, endoscopy is frequently overused without major clinical gain, burdening the health budget. Hence, there is globally a high demand for a simple and non-invasive GC-screening test, to identify individuals at increased risk that should undergo an endoscopic examination, while avoiding unnecessary endoscopic investigations and costs in populations that are not at risk.

Biomarkers that are derived from exhaled breath may provide a safe and elegant solution for mass GC screening. Over the past two decades, the analysis of volatile organic compounds (VOCs) has witnessed an enormous boost, as they have been described as a possible method to diagnose rapidly a variety of diseases, for example, cancers of the lung, breast, colon, prostate, liver, head-and-neck, as well as kidney disease, multiple sclerosis and Parkinson’s disease (Gordon et al, 1985; O’Neill et al, 1988; Mendis et al, 1994; Phillips et al, 1994, 1999; Miekisch et al, 2004; Amann et al, 2007; Barash et al, 2009; Peng et al, 2009, 2010; Shuster et al, 2010; Song et al, 2010; Hakim et al, 2011; Ionescu et al, 2011; Tisch et al, 2011; Broza et al, 2013).

Haick and co-workers have developed highly sensitive, crossreactive, nanomaterial-based gas sensors that could classify different types of cancer in the exhaled breath, using statistical pattern recognition methods, irrespective of the patients’ gender, lifestyle, smoking habits and other confounding factors. The discriminative power of the sensor arrays was demonstrated in pilot studies, using limited patient cohorts (Peng et al, 2010; Tisch and Haick, 2010a, 2010b, 2010c; Hakim et al, 2011; Broza et al, 2013). Here, we demonstrate that arrays of nanomaterial-based sensors can distinguish the benign and malignant ulcers from other less severe gastric lesions, using breath samples of patients with gastric complaints. We further demonstrate that the results were not affected by important confounding factors such as alcohol/tobacco consumption and Helicobacter pylori (H. pylori) infection.

Patients and methods

Patients

Breath samples were collected after written informed consent from 160 volunteers with gastric complaints, aged 27–73 years, at the First Affiliated Hospital of Anhui Medical University (Hefei, China) (see Table 1).

Table 1 Clinical characteristics of all tested patients.

All volunteers underwent upper digestive endoscopy after recruitment according to the hospital’s routine clinical protocol. Biopsy samples were taken for histopathology, if lesions (including ulceration of the stomach lining) were visually observed. Otherwise, the endoscopic abnormalities were assessed according to the Sydney classification system of endoscopic division (Tytgat, 1991). The following exclusion criteria were applied before sample collection: patients who have undergone gastric resection in the past; patients who were found to suffer from endoscopically detectable precancerous conditions (e.g. mucosal atrophy); and patients who took medication affecting gastric acid secretion (e.g. proton pump inhibitors) and/or antibiotics during an interval of 1 month before the breath test. The reason for the latter exclusion criterion for this pilot study was that previous medication could strongly affect the composition of the exhaled breath.

After exclusion of the breath samples of 30 patients, which were damaged during storage and/or transport, the breath samples of 130 patients were analyzed for this study: 37 GC patients (early stages I and II: 17; late stages III and IV: 18; without staging information: 2), 32 patients with benign gastric ulcers and 61 patients with less severe gastric conditions (see Table 1). The less severe stomach conditions included cases with no endoscopic abnormalities (32) and with endoscopic abnormalities without ulceration (29) (see Table 1). The latter were classified by the treating physicians, according to the Sydney classification system of endoscopic division (Tytgat, 1991), as erythematous/exudative gastritis, flat erosive gastritis, raised erosive gastritis, hemorrhagic gastritis, enterogastric reflux gastritis or rugal hyperplastic gastritis. However, for this study we did not further subdivide the group of ‘less severe gastric conditions’, because the detection accuracy for premalignant lesions purely on endoscopic appearance at white-light endoscopy is highly controversial (Atkins and Benedict, 1956; Carpenter and Talley, 1995).

Ethical approval was obtained from the ethics committee of Anhui Medical University (Hefei, China), and the clinical trial was registered. The treatment decisions were based solely on the conventional diagnosis described above. Neither the patients nor their treating physicians were informed of the results of the breath tests.

Collection of the breath samples

Exhaled alveolar breath was collected in a controlled manner, as described in Peng et al (2009, 2010) and Hakim et al (2011). The volunteers were invited on specific collection days in groups of 10 to 20. None of the volunteers consumed food, tobacco or alcohol during an (overnight) 12-h interval before the breath collection. All volunteers were asked to rest for 1 h before the breath sampling and did not perform heavy physical exercise 24 h before giving the breath sample. All breath samples were collected in the same clinical environment and in duplicates (for the dual analysis, see section below) from each volunteer, and were stored in two-bed ORBO™ 420 Tenax TA sorption tubes for gas and vapor sampling (Sigma-Aldrich, St Louis, MO, USA). Unfiltered hospital air was sampled in the morning of each collection day. A detailed description of the breath collection, sample preparation and storage can be found in section S1.1 of the Supplementary Online Material (SOM).

Characterisation of the breath samples

The breath samples were characterised in a dual approach, using two totally independent, complementary characterisation methods: (i) chemical analysis of the breath samples with the aim to identify the VOCs that show statistically different concentrations in the compared subpopulations, using gas-chromatography/mass spectrometry (GC-MS). Compound identification and quantification were achieved through measurement of external standards, as recommended in Bajtarevic et al (2009), Ligor et al (2009), Sponring et al (2009) and Filipiak et al (2010). The breath sample analysis with GC-MS is described in detail in section S1.2 of the SOM. (ii) Characterisation of the breath samples with an array of 14 nanomaterial-based sensors, combined with a statistical pattern recognition algorithm (see section ‘Statistical analysis’), with the aim of identifying specific patterns (the so-called breath prints) for GC and non-malignant gastric conditions, and the subcategories described above. The sensors included layers of gold nanoparticles with 11 different organic ligands and layers of single-walled carbon nanotubes capped with three different organic overlayers (see SOM and Tisch and Haick (2010a, 2010b, 2010c)). The breath sample analysis with the nanomaterial-based sensor array is described in detail in section S1.3 of the SOM. A description of the nanomaterial-based sensor array is given in section S1.4 of the SOM.

A small number of samples (from 30 patients) were damaged or destroyed because of breakage during the transport and storage.

Study design

The primary aim of this cross-sectional comparative study was to distinguish GC patients from patients with benign gastric conditions who may present similar clinical symptoms. The secondary aim was to distinguish subpopulations in the malignant and non-malignant study groups. Conventional diagnosis served as reference standard.

This single-centre pilot study with a limited patient cohort of 160 (after application of the exclusion criteria, see section ‘Patients’) was designed as a feasibility test of a nanomaterial-based breath test for GC, with the aim of delivering a proof of concept that would justify a large-scale, multicentre trial with a more realistic ratio of malignant to non-malignant gastric conditions.

The breath samples of 30 patients were damaged during storage and/or transport and could not be analyzed. Hence, the samples of 130 patients were analyzed for this study: 37 GC patients (early stages I and II: 17; late stages III and IV: 18; without staging information: 2), 32 patients with benign gastric ulcers and 61 patients with less severe gastric conditions (see Table 1).

Statistical analysis

GC-MS

The VOCs that showed significant differences (cutoff P-value: 0.05) between the study groups were determined from the GC-MS results by means of the non-parametric Wilcoxon/Kruskal–Wallis test for populations whose data cannot be assumed to be normally distributed (Wilcoxon, 1945), using JMP, version 9.0.0 (SAS Institute Inc., Cary, NC, USA; 1989–2005).
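The group comparison described above can be illustrated with a minimal Python sketch of a two-sided Wilcoxon rank-sum test. It is not the JMP procedure used in this study: the function name `rank_sum_test`, the normal approximation with tie correction (without continuity correction) and the example concentrations are assumptions made purely for illustration.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with tie correction and no continuity
    correction (an illustrative choice). Returns (U statistic of x, P-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        midrank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p_value


# Hypothetical concentrations (arbitrary units) of one VOC in two groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]
u, p = rank_sum_test(group_a, group_b)
significant = p < 0.05  # cutoff used in this study
```

For two groups, the Kruskal–Wallis test is equivalent to this rank-sum comparison; for small samples an exact test would be preferable to the normal approximation sketched here.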

Sensor array

Each of the 14 sensors in the array responded to all (or to a certain subset) of the VOCs found in the exhaled breath samples. Specific patterns and predictive models for the studied gastric conditions were derived from the sensor array output, using discriminant factor analysis (DFA) (Ionescu et al, 2002). Discriminant factor analysis is a linear, supervised pattern recognition method that effectively reduces the multidimensional experimental data, in which the classes to be discriminated are defined before the analysis is performed. Discriminant factor analysis was also used as a heuristic to select the sensors with the most relevant organic functionality out of the repertoire of 14, by filtering out non-contributing sensors. The reason for selecting a certain set of sensing features for a particular problem was directly derived from their ability to discriminate between the various classification groups. The input variables for DFA were the four features extracted from each of the 14 sensors’ time-dependent resistance responses, that is, a total of 56 sensing features (see sections S1.3, S1.4 and Supplementary Table S1 in the SOM). The four sensing features were related to the normalised resistance change at the beginning of the exposure, at the middle of the exposure and at the end of the exposure (with respect to the value of the sensor’s resistance in vacuum before the exposure), and to the area beneath the time-dependent resistance response during the last third of the exposure period, as described in section S1.3 in the SOM. Discriminant factor analysis determines the linear combinations of the input variables such that the variance within each class is minimised and the variance between classes is maximised. The DFA output variables (i.e. canonical variables) are obtained in mutually orthogonal dimensions; the first canonical variable is the most powerful discriminating dimension.
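The construction of the 56 sensing features can be sketched as follows. The exact feature definitions are given in the SOM and are not reproduced here, so every detail of this Python sketch is an assumption: the normalisation (R − R0)/R0, the choice of the nearest sampled time point for "beginning", "middle" and "end", the trapezoidal rule for the area, and all function names and example values are hypothetical.

```python
def nearest_index(times, t):
    """Index of the sampled time point closest to t."""
    return min(range(len(times)), key=lambda i: abs(times[i] - t))


def extract_features(times, resistance, r_vacuum, t_start, t_end):
    """Four features from one sensor's resistance trace (assumed definitions).

    r_vacuum is the sensor resistance in vacuum before the exposure;
    t_start and t_end delimit the exposure period.
    """
    # Normalised resistance change relative to the vacuum baseline.
    delta = [(r - r_vacuum) / r_vacuum for r in resistance]

    f_begin = delta[nearest_index(times, t_start)]
    f_middle = delta[nearest_index(times, (t_start + t_end) / 2)]
    f_end = delta[nearest_index(times, t_end)]

    # Area beneath the response during the last third of the exposure,
    # using the trapezoidal rule on sampling intervals inside that window.
    t_last_third = t_end - (t_end - t_start) / 3
    area = 0.0
    for i in range(len(times) - 1):
        if times[i] >= t_last_third and times[i + 1] <= t_end:
            area += 0.5 * (delta[i] + delta[i + 1]) * (times[i + 1] - times[i])

    return [f_begin, f_middle, f_end, area]


def feature_vector(sensor_traces, times, t_start, t_end):
    """Concatenate the four features of every sensor into one vector."""
    vector = []
    for resistance, r_vacuum in sensor_traces:
        vector.extend(extract_features(times, resistance, r_vacuum, t_start, t_end))
    return vector


# Hypothetical linear response of 14 identical sensors, sampled once per second.
times = list(range(13))
trace = ([100 + 2 * t for t in times], 100)
sample_vector = feature_vector([trace] * 14, times, t_start=0, t_end=12)
# 14 sensors x 4 features = 56 input variables per breath sample
```

A vector of this kind, one per breath sample, would then serve as the input to the discriminant analysis.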
The classification success was estimated through leave-one-out cross-validation in terms of the number of true-positive (TP), true-negative (TN), false-positive (FP) and false-negative (FN) predictions. Given n measurements, the model was computed using n−1 training vectors. The validation vector that was left out during the training phase was then projected onto the model, producing a classification result. All possibilities of leave-one-sample-out were considered, and the classification accuracy was estimated as the averaged performance over the n tests. Pattern recognition and data classification were conducted using MATLAB (The MathWorks, Natick, MA, USA). Following the leave-one-out cross-validation, 25% of the samples were randomly blinded for an additional validation test, the DFA model was calculated again with the remaining 75% of the samples and the blind set was classified as described above.
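The two validation steps described above can be sketched in Python. This is not the MATLAB code used in this study: a nearest-class-mean classifier is swapped in as a simple stand-in for DFA, and the function names, the random seed and the toy feature vectors are hypothetical.

```python
import random


def class_means(vectors, labels):
    """Mean feature vector of each class (stand-in for fitting a DFA model)."""
    sums, counts = {}, {}
    for vector, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(means, vector):
    """Assign the class whose mean is closest in squared Euclidean distance."""
    def sq_dist(mean):
        return sum((a - b) ** 2 for a, b in zip(mean, vector))
    return min(means, key=lambda label: sq_dist(means[label]))


def leave_one_out(vectors, labels, positive):
    """Train on n-1 samples, classify the left-out sample, repeat n times."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for i in range(len(vectors)):
        train_x = vectors[:i] + vectors[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predicted = predict(class_means(train_x, train_y), vectors[i])
        if predicted == positive:
            counts["TP" if labels[i] == positive else "FP"] += 1
        else:
            counts["FN" if labels[i] == positive else "TN"] += 1
    return counts


def blind_split(n_samples, fraction=0.25, seed=0):
    """Randomly blind a fraction of the sample indices; the rest form the training set."""
    rng = random.Random(seed)
    blinded = sorted(rng.sample(range(n_samples), int(n_samples * fraction)))
    blinded_set = set(blinded)
    training = [i for i in range(n_samples) if i not in blinded_set]
    return training, blinded


# Toy two-feature data, for illustration only.
vectors = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["benign", "benign", "benign", "GC", "GC", "GC"]
loo_counts = leave_one_out(vectors, labels, positive="GC")
training_idx, blinded_idx = blind_split(130, fraction=0.25)
```

With 130 samples and a 25% blinded fraction, `blind_split` sets aside 32 samples and keeps 98 for training, matching the split sizes reported in the Results.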

Results

Chemical analysis of the breath samples

The GC-MS analysis identified hundreds of different VOCs per individual breath sample, and 214 VOCs were present in >85% of the breath samples. The GC-MS chromatograms of pristine Tenax material from unused ORBO™ 420 Tenax TA sorption tubes showed several prominent peaks corresponding to five VOCs that are probably contaminants of the Tenax sorbent material of the collection tubes. The VOCs were tentatively identified by spectral library match (Compounds library of the National Institute of Standards and Technology, Gaithersburg, MD, USA, see section S1.2 in the SOM) as methylene chloride, acetaldehyde, L-cysteine sulphonic acid, malonic acid and naphthalene (Amal et al, 2012). These substances were disregarded in the subsequent comparative analysis. Propanol, ethanol and methyl-isobutyl-ketone (also tentatively identified by spectral library match) were found in high abundance in the room air samples at the location of the breath tests that were taken on each collection day. These are typical hospital contaminants (Amann et al, 2010). However, they were found in much lower (almost negligible) abundance in <85% of the breath samples, because of the effective lung washout prior to the breath collection that was routinely performed as an integral part of the one-step breath collection procedure (see section S1.1 in the SOM). Hence, 209 compounds were further analysed. Shapiro–Wilk tests rejected the null hypothesis of a normal distribution of the GC-MS data for these 209 VOCs. Therefore, non-parametric Wilcoxon/Kruskal–Wallis tests with a cutoff value of P=0.05 were used for the comparative analysis of the GC-MS data. We compared all possible pairs of the following groups in two data sets: GC; ulcer; less severe conditions; and non-malignant gastric conditions=ulcer+less severe conditions (see Table 1). Initially 35 VOCs were found to be of statistical significance for the separation of the groups.
In total, 27 VOCs were excluded after comparison with the room air and Tenax TA control samples, because they appeared in similar abundance or showed strong day-to-day fluctuation. The remaining 11 VOCs were tentatively identified through spectral library match as tetra-chlorobutyl acetate, 2-propenenitrile, 1-methoxy-2-propanol, 2-butoxy-ethanol, furfural, 2-pentyl acetate, 6-methyl-5-hepten-2-one, isoprene, 4,5-dimethyl-nonane, 2-phenoxy-ethanol and 1-pentene. After measurement of calibration mixtures of high-purity external standards (Bajtarevic et al, 2009; Ligor et al, 2009; Sponring et al, 2009; Filipiak et al, 2010), we have excluded five VOCs: 1-methoxy-2-propanol, 2-phenoxy-ethanol and 1-pentene were excluded because of retention time mismatch between breath samples and calibration standards; tetra-chlorobutyl acetate and 2-pentyl acetate were excluded because the measured concentrations in the breath samples were below the corresponding limit of quantification (LoQ). Furthermore, 4,5-dimethyl-nonane was excluded, because we were not able to obtain a high-purity calibration standard, and, hence, could not perform identity confirmation and quantification.

The remaining five VOCs from the families of nitriles, alcohol ethers, aldehydes, ketones and alkenes showed statistically significant differences in the concentration levels of the compared groups (see Table 2). Three compounds (2-propenenitrile, furfural and 6-methyl-5-hepten-2-one) were on average elevated in GC, as compared with the less severe gastric conditions without ulceration (P<0.0001, see Table 2). Four VOCs (2-butoxy-ethanol, furfural, 6-methyl-5-hepten-2-one and isoprene) distinguished between patients suffering from non-malignant gastric ulcer and patients with less severe gastric conditions, showing significantly higher concentration levels in the former (see Table 2). The VOCs, which were significantly elevated in patients with GC and/or peptic ulcer, as compared with less severe gastric conditions, were found in the room air in significantly lower concentrations (P<0.05). However, it should be noted that these VOCs were found both in the room air and in the breath samples in the single p.p.b.v range, except in the case of isoprene (see Table 2). Indeed, the average concentrations of 2-propenenitrile, 2-butoxy-ethanol, furfural and 6-methyl-5-hepten-2-one in the breath samples of patients with less severe gastric conditions were not different from the room air concentrations.

Table 2 VOCs from exhaled breath samples identified in GC, gastric ulcer and less severe gastric conditions without ulceration, which show significant statistical differences between the study groups (P<0.05).

Identification and distinction of malignant and non-malignant gastric conditions using the nanomaterial-based sensors

The feasibility of the nanomaterial-based sensors to identify GC among patients with gastric complaints was demonstrated by building a DFA model based on all 130 characterised breath samples that discriminated well between the 37 GC patients and the 93 patients with non-malignant gastric conditions (see Table 1). Figure 1A shows the DFA plot obtained from the responses of seven sensors with different organic functionalities (see Supplementary Table S1). The malignant and non-malignant gastric conditions formed two well-defined clusters in two-dimensional DFA space with no overlap and with few misclassified samples. The clusters were completely separated along the first canonical variable (CV1). High classification success of the first DFA model was verified through leave-one-out cross-validation. Table 3 lists the excellent cross-validation results for accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). To further test the stability of this DFA model, we randomly blinded 32 of the 130 samples (25%), calculated the DFA model again with the remaining training set of 98 samples and projected the blinded test set onto the model. Subsequent disclosure of the sample identity (6 GC and 26 non-malignant conditions) yielded 5 TP (GC classified as GC), 25 TN (benign conditions classified as benign conditions), 1 FP (benign condition classified as GC) and 1 FN (GC classified as benign condition). The accuracy, sensitivity and specificity that were achieved in this additional blind validation test were 94%, 83% and 96%, respectively. Random blinding of different subsets of the data (totalling 25% of the samples irrespective of the sample identity) yielded similar results, demonstrating the stability of the proposed DFA model.
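The blind-test figures quoted above follow directly from the four prediction counts, as the short Python sketch below illustrates. The counts are those of the blinded test set (5 GC samples classified as GC, 25 benign samples classified as benign, 1 benign sample classified as GC and 1 GC sample classified as benign); the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard performance measures derived from a 2x2 confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # fraction of GC cases detected
        "specificity": tn / (tn + fp),  # fraction of benign cases recognised
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Counts from the blinded test set of 32 samples (6 GC, 26 non-malignant).
blind_test = classification_metrics(tp=5, tn=25, fp=1, fn=1)
for name, value in blind_test.items():
    print(f"{name}: {100 * value:.0f}%")
```

The accuracy is 30/32, the sensitivity 5/6 and the specificity 25/26, which round to the reported 94%, 83% and 96%. Note that PPV and NPV depend on the proportion of GC cases in the tested group, so they would differ in a screening population with a lower prevalence.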

Figure 1

Discriminant factor analysis separating between patients with: (A) GC and non-malignant gastric conditions; (B) early- and late-stage GC; (C) gastric ulcer and less severe gastric conditions; (D) gastric cancer, gastric ulcer and less severe gastric conditions. The less severe gastric conditions include the endoscopic abnormalities described in the Sydney classification for gastritis, as well as no obvious gastric mucosal lesions. Every point represents one patient.

Table 3 Statistical classification success, using DFA and leave-one-out cross-validation

Among the malignant gastric conditions, a second DFA model that was based on 35 of the 37 GC patients could completely separate the 17 early-stage GC cases from the 18 late-stage GC cases along CV1 (see Figure 1B). Two of the GC patients were excluded from this analysis, because no staging information was available for them (see Table 1 and Supplementary Table S1). The following classification success of the second DFA model was achieved: 91% accuracy, 89% sensitivity, 94% specificity, 94% PPV and 89% NPV (see Table 3). Random blinding of different subsets of the data (each blinded test set included 8 samples of the 35 staged GC samples, totalling 25% of the GC samples irrespective of the sample identity) yielded on average 90% accuracy, 88% sensitivity and 93% specificity, with little variation between the different data sets.

A third DFA model was built based on the samples of the 93 patients having non-malignant gastric conditions for distinguishing between benign gastric ulcer and gastric conditions without ulceration (32 and 61 samples, respectively, see Table 1 and Supplementary Table S1). Figure 1C shows that clusters were formed for the two subpopulations along CV1, but the clusters had some overlap and were more spread out than for the first two DFA models. Nevertheless, leave-one-out cross-validation yielded reasonable values for accuracy (86%), sensitivity (84%), specificity (87%), PPV (77%) and NPV (91%) (see Table 3). Randomly blinding a subset of 23 samples yielded 83% accuracy, 83% sensitivity and 83% specificity. However, repeating the blind test with different randomly chosen blinded test sets of 23 samples showed some variability of the results (classification accuracies varied between 65 and 83%), indicating that the third DFA model is less stable than the first two models.

Figure 1D shows that a DFA model based on all 130 samples could distinguish between GC, gastric ulcer and less severe gastric conditions in one step with 77% classification accuracy. The separation between the three clusters requires the calculation of the first and the second canonical variable (CV1 and CV2). Randomly blinding different subsets containing a total of 32 samples yielded stable results for the classification accuracy with little variability (on average 75%).

In addition, we tested a DFA model based on the 61 samples from patients with less severe gastric conditions for distinguishing between patients with endoscopic abnormalities without ulceration and patients with no visible endoscopic abnormalities (see Supplementary Table S1). Table 3 shows that a high classification success could be achieved also for this case.

Finally, we have explored the possible effect of the most important confounding factors on the sensing results. In this study, we have paid special attention to the possible effects of tobacco and alcohol consumption, as well as the presence or absence of H. pylori infection. Tobacco consumption among the participants of this study varied between 19 and 44%, depending on the subpopulation, and alcohol consumption varied between 19 and 47% for the different subpopulations (see Table 1). The effect of tobacco consumption on the composition of the exhaled breath has been studied by mass spectrometry methods, and a variety of breath VOCs has been associated with tobacco consumption (see for example Amann et al (2010), Fuchs et al (2010), Kischkel et al (2010), and references therein). We have therefore carefully verified that none of the DFA models that were developed for this study was sensitive to either tobacco or alcohol consumption of the participants. For this purpose, we applied each DFA model separately to the two subpopulations for which it was developed, and defined consumers and non-consumers of alcohol or tobacco as the two classes to be separated. The DFA clusters showed complete overlap for all models, and the classification was correct in only 38–54% of cases (i.e. arbitrary). The percentage of H. pylori-infected participants showed a stronger variation (between 13 and 65% per subpopulation, see Table 1). We have verified that all the DFA models that were used in this study were insensitive also to H. pylori infection, with complete cluster overlap and arbitrary classification. However, we were able to develop a new DFA model for separating between infected and infection-free participants within the group of cancer patients. The study of a VOC-based breath test for H. pylori infection is currently underway and will be published elsewhere.

Discussion

Chemical composition of the breath samples

In the following section, we attempt to explain the possible biochemical origin of some of the VOCs that differed significantly between the study groups. However, the origin of the other breath VOCs cannot yet be easily understood.

2-Propenenitrile (acrylonitrile) can be found as an environmental pollutant in cigarette smoke and in car exhaust, and was classified as a Class 2B carcinogen (i.e. possibly carcinogenic) by The International Agency for Research on Cancer (IARC) (1999). As such, this compound could reach the blood after inhalation and could accumulate in the body, yielding the observed higher relative amounts of the compound in the GC samples and, thus, possibly indicating an increased GC risk in subjects who were exposed to the substance. Therefore, it is important to consider the effect of inhaling exogenous compounds on the blood, as any change in the composition of blood can affect the body’s metabolism and, hence, the breath VOC profile (Hakim et al, 2012). Exogenous compounds that increase the risk for certain types of cancer could be considered as exogenous cancer markers. Interestingly, the opposite trend has recently been reported for lung cancer patients: 2-propenenitrile was found at decreased levels in the breath of smokers with lung cancer, as compared with healthy smokers (Kischkel et al, 2010).

Of the four VOCs observed at significantly increased levels in the breath of ulcer patients (2-butoxy-ethanol, furfural, 6-methyl-5-hepten-2-one and isoprene), only isoprene could be explained in terms of endogenous biochemical pathways. Isoprene is formed along the mevalonic pathway as part of the cholesterol biosynthesis and is always present in high and varying concentration in human exhaled breath (Miekisch et al, 2004). Also, isoprene concentrations show a strong dependency on physical activity, cardiac output and minute ventilation, and may be re-distributed between peripheral and central compartments (King et al, 2010). In this study, all participants were asked to rest for 1 h before the breath sampling and did not perform heavy physical exercise 24 h before providing the breath sample, to minimise the effect of physical exercise on the blood and, hence, on the exhaled breath. It was recently shown that H. pylori uses the host cholesterol in defence against antibiotics (McGee et al, 2011), which would result in elevated cholesterol biosynthesis and could explain the observed higher levels of exhaled isoprene in gastric ulcer patients. In this context, it should be mentioned that, in contrast, decreased isoprene levels were observed in the breath of lung cancer patients (Wehinger et al, 2007; Bajtarevic et al, 2009).

An additional comparison with the (hospital) room air levels at the collection site showed that 2-propenenitrile, 2-butoxy-ethanol, furfural and 6-methyl-5-hepten-2-one were present at similar levels in the room air as in the breath of the patients with less severe gastric conditions. It is therefore possible that the results were confounded through previous inhalation or uptake of the four VOCs from the hospital environment, storage in the body and subsequent gradual expiration. In this case, the concentration in the exhaled breath might be correlated with the period of previous exposure, rather than with the disease state. 2-Butoxy-ethanol is most likely exogenous, as it occurs in paints and in many cleaning products for industrial and home use, and could be taken up by the body through inhalation. Furfural occurs in many foods and flavourings, but was reported to be toxic with a median lethal dose of 300–500 mg kg−1 in mice after oral intake (Hoydonckx et al, 2007). 6-Methyl-5-hepten-2-one is used as artificial flavouring, and could be taken up with food. The increased levels of these compounds in ulcer patients could indicate that exposure increases the risk for this disease, and, hence, they could be candidates as exogenous markers of ulcer. However, a larger study is necessary to verify these observations.

The breath prints of malignant and non-malignant gastric conditions that were derived from the nanomaterial-based sensors

The study design simulated a possible future breath test, based on a single breath sample, for the screening and differential diagnosis of gastric conditions that could be used to recommend upper digestive endoscopy, if indicated, or determine therapeutic intervention for less severe gastric conditions (see Figure 1, top panel). Breath samples would be taken from a wide population with gastric complaints and analyzed using the sensor array. The first part of the test would be to check for malignancy, using the DFA model that can distinguish between malignant and non-malignant gastric conditions. In the second part of the test, the malignant and non-malignant populations would be further distinguished: (i) the GC stage would be determined in the GC-positive subjects, using the DFA model that can distinguish between early- and late-stage GC; (ii) the GC-negative subjects would be tested for non-malignant ulcer, using the DFA model that can distinguish between benign gastric conditions with and without ulceration. In addition, we could distinguish in this study between patients with endoscopic abnormalities without ulceration and patients with no visible endoscopic abnormalities. This encouraging preliminary result could eventually lead to a breath test for gastritis, which would be of high clinical interest. However, in the absence of histology data for these two subpopulations, we cannot be certain how well the endoscopic observations correlated with clinically significant histological differences. An extended multicentre study that includes biopsies for all patients is underway and will be published elsewhere.

It is of special relevance that all DFA models were insensitive to the typical breath VOC patterns that are generated through tobacco/alcohol consumption and H. pylori infection, as these could be important confounding factors among patients with gastric complaints.

The results of the GC breath test correlated very well with the results of upper digestive endoscopy and biopsy (see Figure 1A and Table 3). Furthermore, the excellent classification success of early- and late-stage GC (see Figure 1B and Table 3) might be of high clinical interest for the subsequent targeted endoscopic examination of early-stage GC, supporting swift, lifesaving treatment decisions. In addition, the possibility to discriminate between benign ulcer and less significant lesions of the stomach may facilitate an appropriate selection of ulcer patients for endoscopy (see Figure 1C and Table 3).

The one-step distinction between different lesions (GC, ulcer disease, less significant lesions) is of principal relevance as it potentially allows simultaneous confirmation of one disease while excluding another. This may have important clinical consequences.

A breath test for distinguishing GC from less severe gastric conditions without ulceration (including gastritis) could be combined with conventional endoscopy with biopsy to increase the diagnostic yield for gastritis-like carcinomas, corresponding to type IIb (flat) GC in the Japanese classification. These subtle mucosal changes are visible only as slight surface irregularities and may be hard to distinguish from unspecific or inflammatory lesions by conventional endoscopy (Suzuki et al, 2006). As the breath test is fast and potentially inexpensive, and results could in principle be obtained in real time, the test could be repeated in case of a positive result, and, hence, the number of FP test results could be reduced through test repetition.
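
As a rough worked example of how repetition could lower the FP rate: if repeated tests were statistically independent (a strong assumption that is not verified here; a systematic confounder would violate it) and a subject were referred only when every repeat is positive, the FP rate would fall to (1 - specificity)^n, while the sensitivity would fall to sensitivity^n. The function name and the input values below are illustrative assumptions, not results from this study.

```python
from typing import Tuple


def repeated_test_rates(
    sensitivity: float, specificity: float, n_tests: int
) -> Tuple[float, float]:
    """Rates for an 'all n tests positive' rule, assuming independent repeats.

    Returns (combined sensitivity, combined false-positive rate).
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    # A non-diseased subject is a false positive only if every repeat is positive.
    fp_rate = (1.0 - specificity) ** n_tests
    # A diseased subject is detected only if every repeat is positive.
    combined_sensitivity = sensitivity ** n_tests
    return combined_sensitivity, fp_rate


# Illustrative values only: 90% sensitivity and 90% specificity, one repeat.
sens, fp = repeated_test_rates(0.9, 0.9, 2)
print(f"sensitivity = {sens:.2f}, FP rate = {fp:.2f}")
```

With these illustrative inputs, one repetition would reduce the FP rate from 10% to about 1%, at the cost of lowering the sensitivity from 90% to about 81%. The trade-off would have to be assessed with real repeat-measurement data.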

Although this small-scale pilot study does not allow far-reaching conclusions to be drawn, the encouraging preliminary results presented here have initiated a large multicentre clinical trial to confirm the observed breath prints. The large-scale study, which is currently underway, also includes very early-stage GC and precancerous conditions.

Note that the validation methods used in this pilot study ((i) leave-one-out cross-validation and (ii) randomly blinding 25% of the data after building the DFA models with the entire data set) did not accommodate an independent sample set that would be necessary for blind validation. The limited sample size of this pilot study did not allow the training set for building the DFA models to be reduced. The recently initiated large-scale multicentre trial addresses this point and accommodates a test set comprising 25% of the collected samples, which are being blinded before the analysis and which will be disclosed only after the classification of the blind samples by the developed models. This blind validation is designed to validate the results that were presented here.
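
Leave-one-out cross-validation can be sketched as follows. This is a minimal illustration under stated assumptions: a simple nearest-centroid classifier stands in for the DFA models (it is not the method used in the study), and all function names are hypothetical. The point of the sketch is that the model is refitted without the held-out sample in every round, whereas blinding samples after a model has been built with the entire data set leaves those samples inside the training data.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def fit(train: List[Sample]) -> Dict[str, List[float]]:
    """Stand-in model: one centroid (mean feature vector) per class."""
    by_label: Dict[str, List[Sequence[float]]] = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(model: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centre: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, centre))

    return min(model, key=lambda label: squared_distance(model[label]))


def loocv_accuracy(data: List[Sample]) -> float:
    """Hold out each sample once, refit on the rest, and score the held-out one."""
    hits = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # the held-out sample is excluded
        model = fit(train)
        if predict(model, features) == label:
            hits += 1
    return hits / len(data)
```

Even with this procedure, every sample contributes to model building in all rounds except its own, which is why a separate blinded test set is still needed for independent validation.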

It should furthermore be noted that previous medication can strongly affect the chemical composition of the exhaled breath. We have therefore excluded from this pilot study patients who took medication affecting gastric acid secretion (e.g. proton pump inhibitors) and/or antibiotics during an interval of 1 month before the breath test. However, to be suitable for clinical use, a future breath test for GC should be stable against the typical medication that might be consumed by patients with gastric complaints. The possible confounding effects of previous medication are currently being investigated in our recently initiated multicentre clinical trial, with the aim of achieving stability against the most important medications.

The fundamental differences between the breath print characterisation with the nanomaterial-based sensor array and the chemical analysis of the breath samples by GC-MS

It is important to note that the breath print characterisation with the nanomaterial-based sensors is fundamentally different from the chemical analysis of the breath samples by GC-MS. The two methods should be considered as completely independent, complementary approaches. The patterns derived from the nanomaterial-based sensors usually show a better discriminative ability than the chemical analysis by GC-MS (Peng et al, 2010; Hakim et al, 2011). This can be understood in terms of the fundamental differences between the two methods. The sensors used in this study were broadly cross-reactive, that is, all of the sensors are expected to respond to a wide variety of breath VOCs, with much overlap in the sensitivities to specific VOCs. While the responses to the same compound at a certain concentration are individually different between the constituent sensors, because of the chemical diversity of the organic sorbent phase, the signals to the constituent VOCs that are present in the breath sample are in good approximation additive (Konvalina and Haick, 2012). Hence, the overall signal of one sensor can be expected to stem from a total p.p.m.v. amount of VOCs. Among the VOCs that contribute to the sensors’ signals could very well be compounds that cannot be detected or quantified by GC-MS, because their individual concentrations lie below the LoD or LoQ of our GC-MS equipment. It is reasonable to assume that the sensors’ responses are less affected by noise than the detected p.p.b.v. concentrations of the separate compounds in the GC-MS analysis. On the other hand, the nanomaterial-based sensors are typically more sensitive to certain classes of VOCs, and less sensitive to other classes, because of the nature of the organic materials of the chemiresistive layers that adsorb the VOCs from the breath samples (see section S1.4 in the SOM). Hence, the signal from the nanomaterial-based sensors in the study might not stem from the same VOCs that were detected by GC-MS. For example, none of the DFA models in this study was sensitive to the smoking habits of the study population, even though the GC-MS analysis identified 2-propenenitrile, a known smoking marker, as a distinguishing VOC between GC patients and subjects having less severe gastric conditions.
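
The additive approximation can be illustrated with a toy calculation. This is a sketch under stated assumptions only: the sensitivities, concentrations and quantification limit below are invented for illustration and are not measured values, and the function names are hypothetical. It shows how a cross-reactive sensor could produce an appreciable signal from many compounds that are each too dilute to be quantified individually.

```python
from typing import List, Sequence


def sensor_signal(
    sensitivities: Sequence[float], concentrations_ppbv: Sequence[float]
) -> float:
    """Additive approximation: signal = sum over VOCs of sensitivity * concentration."""
    if len(sensitivities) != len(concentrations_ppbv):
        raise ValueError("one sensitivity is required per VOC")
    return sum(s * c for s, c in zip(sensitivities, concentrations_ppbv))


def individually_quantifiable(
    concentrations_ppbv: Sequence[float], loq_ppbv: float
) -> List[float]:
    """Concentrations that a compound-by-compound analysis could quantify."""
    return [c for c in concentrations_ppbv if c >= loq_ppbv]


# Invented example: 20 VOCs at 0.5 p.p.b.v. each, a hypothetical LoQ of
# 1 p.p.b.v., and a sensor with unit sensitivity (arbitrary units) to every VOC.
concentrations = [0.5] * 20
total = sensor_signal([1.0] * 20, concentrations)
seen = individually_quantifiable(concentrations, loq_ppbv=1.0)
print(total, len(seen))
```

In this invented case no single compound reaches the quantification limit, yet the summed contribution to the sensor signal corresponds to 10 p.p.b.v. of total VOCs.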

Possible future relevance for clinical practice

Upper digestive endoscopy with biopsy is currently the standard for diagnosing GC and distinguishing it from benign gastric conditions that may present similar clinical symptoms. A major drawback is the limited patient compliance with this highly accurate but invasive and costly procedure (Chen et al, 2009). The survival from GC is poor; in Europe, the 5-year survival is below 25%, and the situation is even worse in the United States (Verdecchia et al, 2007). Also, some early GCs provide very little optical contrast with the surrounding mucosa, so that they could be missed during a routine endoscopic examination. Tumour markers could in principle be used to complement (unambiguous) endoscopic findings, but the yield of the traditional tumour markers (CEA, CA 19-9, CA 242 and CA 72-4) for the detection of GC is low (Carpelan-Holmstrom et al, 2002).

A future nanomaterial-based breath test for the simultaneous detection of malignant and benign gastric conditions in patients with unspecific gastric complaints would be suited to precede and complement upper digestive endoscopy with biopsy. Breath testing is fast, simple and non-invasive. Hence, the test would be highly acceptable to patients and would therefore be well suited for identifying at-risk individuals who should undergo further endoscopic investigations, while avoiding unnecessary invasive procedures. In this setting, breath testing could indicate malignancy before the endoscopic examination, thus allowing a well-directed, systematic search for malignant lesions, including hidden and small lesions that could otherwise be missed during endoscopy/biopsy. The results of the breath test could potentially provide valuable complementary information for distinguishing malignant and benign ulceration with identical morphology.

Conclusion

We have presented initial data demonstrating that VOC-based breath prints detected by nanomaterial-based sensors could be used for identification of GC and distinction from benign stomach ulcers and less severe stomach conditions, irrespective of important confounding factors such as tobacco/alcohol consumption and H. pylori infection. Chemical analysis of the breath samples showed that five VOCs (2-propenenitrile, 2-butoxy-ethanol, furfural, 6-methyl-5-hepten-2-one and isoprene) were significantly elevated in patients with GC and/or peptic ulcer, as compared with less severe gastric conditions. The concentrations both in the ambient (hospital) air and in the breath samples were in the single p.p.b.v. range, except in the case of isoprene. Therefore, it cannot be excluded that 2-propenenitrile, 2-butoxy-ethanol, furfural and 6-methyl-5-hepten-2-one stem from acute or chronic accumulation in the body because of exposure to the hospital atmosphere. It should be noted that the applied methods were complementary and the potential marker compounds identified by GC-MS were not necessarily responsible for the differences in the sensor responses. A GC breath test could be developed in the future that could be used to precede and complement conventional upper digestive endoscopy with biopsy as a low-cost, large-scale screening tool for identifying individuals who should be referred for the endoscopic examination. However, this small-scale pilot study does not allow far-reaching conclusions to be drawn. The encouraging preliminary results presented here have initiated a multicentre clinical trial with considerably increased sample size to confirm the observed breath prints. This study is currently underway and will be published elsewhere.