Introduction

Breast cancer is the most prevalent malignancy among women around the world and a major cause of female deaths. Each year, over 1.3 million women are diagnosed with breast cancer and approximately 0.5 million women die of this disease1. In the United States, nearly 3 million women have a history of invasive breast cancer and 226,870 new cases of breast cancer were reported in 20122. Breast cancer is a progressive disease in which larger tumor size and the presence of lymph node metastasis are associated with worse prognoses3. An early breast cancer diagnosis can effectively reduce patient mortality rates3,4,5. Currently, the most common breast cancer screening method is mammography4,5. Many clinical trials have confirmed that mammography screening can significantly decrease mortality rates among breast cancer patients3,6,7. However, mammography remains an imperfect test; in particular, one limitation of mammography screening is that it does not detect all breast tumors. In a randomized, controlled mammographic screening trial, mammographic sensitivity to detect breast cancer ranged from 71% to 96%5. Patients with dense breast tissue had even lower mammographic sensitivities, from 48% to 70%5,8.

Ultrasonography is more sensitive than mammography at detecting lesions in women with dense breasts9,10. However, ultrasonography does not detect most microcalcifications, which are the typical findings in ductal carcinoma in situ. In fact, 75% of cancers missed by ultrasonography were ductal carcinoma in situ and 25% were invasive carcinomas11. In addition, the results of ultrasonography can vary widely, depending on the expertise of the technician12,13.

Due to recent advances in analytical chemistry, the association between cancer and volatile organic metabolites in exhaled breaths has attracted increasing attention from researchers14,15,16. Breath analysis is suitable for disease screening because it is non-invasive, rapid and readily accepted by patients. Preliminary studies have confirmed that analyses of exhaled volatile organic metabolites can differentiate between breast cancer patients and healthy controls17,18,19. Here, we report a systematic study of gas metabolite profiles in human exhaled breath using pattern recognition methods. Volatile organic compounds from exhaled air in healthy individuals, breast cancer patients and cyclomastopathy and mammary fibroma patients were used as profile defining variables. Potential biomarkers of breast cancer, cyclomastopathy and mammary fibroma were analyzed.

Results

Breast Cancer Patients versus Controls

A total of 434 metabolites were consistently detected in 50% of the samples from breast cancer patients and normal controls. The two-dimensional PCA score plot displayed a good separation trend (Fig. 1A), and the OPLSDA score plot demonstrated separation between breast cancer patients and normal controls using one predictive component and three orthogonal components (R2X = 0.51; R2Y = 0.876; Q2 = 0.762; Fig. 1B). Moreover, the R2 and Q2 values calculated from the permuted data were lower than the original values in the validation plot, which confirmed the validity of the supervised model (Fig. 1C).

Figure 1

(A): PCA score plot. (B): OPLSDA score plot (one predictive component and three orthogonal components; R2X = 0.51; R2Y = 0.876; Q2 = 0.762). (C): PLSDA validation plot intercepts: R2 = (0.0, 0.306); Q2 = (0.0, −0.512).
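The logic behind a permutation-based validation plot can be illustrated with a small sketch. The code below is not the SIMCA-p procedure, which refits the PLSDA model on permuted class labels and compares R2 and Q2; as a simplifying assumption it uses the absolute difference in group means along a single score axis as the statistic. The score values, the function names and the random seed are hypothetical and serve only as an illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def separation(scores, labels):
    """Absolute difference between the mean scores of the two classes."""
    group_1 = [s for s, lab in zip(scores, labels) if lab == 1]
    group_0 = [s for s, lab in zip(scores, labels) if lab == 0]
    return abs(mean(group_1) - mean(group_0))


def permutation_p_value(scores, labels, n_permutations=100, seed=0):
    """Compare the observed separation with separations from shuffled labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = separation(scores, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # class sizes are preserved by shuffling
        if separation(scores, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical scores on one predictive axis: 8 patients (1) and 8 controls (0).
scores = [0.9, 1.1, 1.3, 0.8, 1.2, 1.0, 1.4, 0.7,
          -0.9, -1.2, -0.8, -1.1, -1.0, -1.3, -0.7, -1.4]
labels = [1] * 8 + [0] * 8

observed, p_value = permutation_p_value(scores, labels)
print(f"observed separation = {observed:.2f}, permutation p = {p_value:.3f}")
```

If the observed statistic is rarely matched after the labels are shuffled, the separation is unlikely to be an artifact of overfitting, which is the same reasoning that underlies the comparison of permuted and original R2 and Q2 values.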

Breast Cancer versus Cyclomastopathy

A total of 406 metabolites were consistently detected in 50% of the samples from breast cancer patients and patients with cyclomastopathy (mammary gland hyperplasia). The two-dimensional PCA score plot demonstrated a good separation trend (Fig. 2A), and the OPLSDA score plot demonstrated separation between breast cancer patients and patients with mammary gland hyperplasia when using one predictive component and three orthogonal components (R2X = 0.501; R2Y = 0.777; Q2 = 0.565; Fig. 2B). Moreover, all of the R2 and Q2 values calculated from the permuted data were lower than the original values in the validation plot, which confirmed the validity of the supervised model (Fig. 2C).

Figure 2

(A): PCA score plot. (B): OPLSDA score plot (one predictive component and three orthogonal components, R2X = 0.501; R2Y = 0.777; Q2 = 0.565). (C): PLSDA validation plot intercepts: R2 = (0.0, 0.226); Q2 = (0.0, −0.239).

Breast Cancer versus Mammary Gland Fibroma

A total of 408 metabolites were consistently detected in 50% of the breast cancer and mammary gland fibroma samples. The two-dimensional PCA score plot demonstrated a good separation trend (Fig. 3A), and the OPLSDA score plot demonstrated separation between breast cancer and mammary gland fibroma using one predictive component and one orthogonal component (R2X = 0.225; R2Y = 0.686; Q2 = 0.524; Fig. 3B). Moreover, all of the R2 and Q2 values calculated from the permuted data were lower than the original values in the validation plot, which confirmed the validity of the supervised model (Fig. 3C).

Figure 3

(A): PCA score plot. (B): OPLSDA score plot (one predictive component and one orthogonal component; R2X = 0.225; R2Y = 0.686; Q2 = 0.524). (C): PLSDA validation plot intercepts: R2 = (0.0, 0.26); Q2 = (0.0, −0.222).

Potential Biomarkers

Among the significant metabolites identified using the VIP values in the OPLSDA model and the FDR values, 29 differential metabolites were annotated using the NIST 11 database with a similarity threshold of 75% (Table 1).

Table 1 Potential biomarkers

2-acetyl aminopropionic acid, methylacrylic acid, butyl acetate, benzocyclobutene, 4-hydroxybutanoic acid, 1,3,5,7-tetroxane, ethylene carbonate, 2,2-dimethyl decane, 2,3,4-trimethylheptane, 5-methyl-3-hexanol, 5-butylnonane, tetradecane, hexadecane, 2,3,6-trimethyloctane, alpha,alpha-dimethylbenzenemethanol and 2,5-dimethylhexane-2,5-dihydroperoxide exhibited significantly lower levels in the group of individuals with breast cancer than in the group of healthy individuals (P < 0.05). In contrast, breast cancer patients had significantly higher levels of 2,5,6-trimethyloctane, 1,4-dimethoxy-2,3-butanediol, cyclohexanone, dimethylacetamide and trans-2-butene oxide in their exhaled breath (P < 0.05).

The exhaled air from the breast cancer group compared with the cyclomastopathy group revealed significant differences in six potential biomarkers. The breast cancer group exhibited significantly lower levels of cyclooctanemethanol and trans-2-dodecen-1-ol (P < 0.05), but significantly higher levels of 2,5,6-trimethyloctane, 1,4-dimethoxy-2,3-butanediol, butyl glycol and cyclohexanone were found in the breast cancer group (P < 0.05).

The comparison of exhaled air from the breast cancer group vs. the mammary gland fibroma group revealed significant differences in eight potential biomarkers. Cyclopentanone exhibited significantly lower levels in the breast cancer group (P < 0.05), which had significantly higher levels of 1,2-propanediol, cyclohexanone, butyl glycol, 1,4-dimethoxy-2,3-butanediol, 2,5,6-trimethyloctane, 3,4,5,6-tetramethyloctane and ethylaniline (P < 0.05).

2,5,6-trimethyloctane, 1,4-dimethoxy-2,3-butanediol and cyclohexanone were increased significantly in breast cancer patients relative to healthy individuals, mammary gland fibroma patients, or patients with cyclomastopathy (P < 0.05). Additional information is provided in Table 1.
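The overlap among the three comparisons can be checked with a short sketch. The sets below simply transcribe the metabolites reported above as elevated in the breast cancer group for each comparison; the variable names are illustrative only.

```python
# Metabolites reported as significantly higher in the breast cancer group,
# transcribed from the three pairwise comparisons described in the text.
elevated_vs_healthy = {
    "2,5,6-trimethyloctane",
    "1,4-dimethoxy-2,3-butanediol",
    "cyclohexanone",
    "dimethylacetamide",
    "trans-2-butene oxide",
}
elevated_vs_cyclomastopathy = {
    "2,5,6-trimethyloctane",
    "1,4-dimethoxy-2,3-butanediol",
    "butyl glycol",
    "cyclohexanone",
}
elevated_vs_fibroma = {
    "1,2-propanediol",
    "cyclohexanone",
    "butyl glycol",
    "1,4-dimethoxy-2,3-butanediol",
    "2,5,6-trimethyloctane",
    "3,4,5,6-tetramethyloctane",
    "ethylaniline",
}

# Metabolites elevated in all three comparisons.
shared = elevated_vs_healthy & elevated_vs_cyclomastopathy & elevated_vs_fibroma

# Metabolites elevated against both benign groups but not against healthy controls.
benign_only = (elevated_vs_cyclomastopathy & elevated_vs_fibroma) - elevated_vs_healthy

print(sorted(shared))
print(sorted(benign_only))
```

The intersection contains exactly the three compounds named above, whereas butyl glycol appears only in the two comparisons against benign breast disease.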

Discussion

Diseases of the breast are some of the most common types of diseases among women. In particular, breast fibroadenoma, mammary gland hyperplasia and breast cancer are three major diseases of the breast that are challenging for clinicians to diagnose. Existing screening and diagnostic techniques remain unsatisfactory. For instance, the detection of serum markers not only exhibits poor specificity but also requires an invasive blood draw, which could facilitate the transmission of blood-borne infectious diseases. MRI examinations are expensive and require both state-of-the-art equipment and well-trained physicians with sophisticated technological knowledge and skills.

In recent years, several studies have confirmed that certain specific volatile metabolites are present in abnormally high concentrations in the exhalations of breast cancer patients and the origin of these compounds has been analyzed15,20,21. Peng reported that exhaled breaths from breast cancer patients and healthy controls exhibited significantly different levels of five volatile compounds: 3,3-dimethyl pentane; 2-amino-5-isopropyl-8-methyl-1-azulenecarbonitrile; 5-(2-methylpropyl)nonane; 2,3,4-trimethyl decane; and 6-ethyl-3-octyl ester 2-trifluoromethyl benzoic acid15. Phillips et al. suggested that alkanes and methylated alkane derivatives could be utilized as specific volatile markers for breast cancer; in particular, the researchers proposed that these compounds were produced because the mitochondrial release of reactive oxygen species (ROS) created oxidative stress that resulted in lipid peroxidation of polyunsaturated fatty acids in the cell membrane20. Hietanen et al. observed significantly higher than normal pentane concentrations in exhaled breath samples from breast cancer patients and they conjectured that this difference originated from the peroxidation of fatty acids in the cell membrane21.

Previous studies that have addressed the use of exhaled breath analysis in the context of breast disease have had a few shortcomings15,20,21. First, these studies have been limited to comparisons between cancer patients and healthy individuals; thus, the results of these investigations can be used to discriminate between these two groups, but are not particularly helpful for the screening and differential diagnosis of breast diseases. Second, various complex pathological types of breast cancer are observed in patients and these studies do not distinguish among these different pathological types; however, it is known that different pathological types of cancer cells can generate unique volatile metabolites22,23.

By comparing these three sets of experimental results, we revealed that three potential biomarkers, 2,5,6-trimethyloctane, 1,4-dimethoxy-2,3-butanediol and cyclohexanone, were dramatically more concentrated in the exhaled air from breast cancer patients relative to the exhaled air from healthy individuals, mammary gland fibroma patients, or patients with cyclomastopathy. We therefore concluded that these three chemicals were relatively specific for breast cancer. Although butyl glycol levels were similar in the exhaled air from breast cancer patients and healthy individuals, significantly higher butyl glycol levels were detected in the exhaled air of breast cancer patients than in the exhaled air of breast fibroma patients or patients with cyclomastopathy. We speculate that this discrepancy reflects external contamination because we would otherwise have expected to observe a difference between the breast cancer patients and healthy individuals. Butyl glycol is commonly used in paint and ink solvents, metal cleaning agents and dye dispersants. Each of the remaining examined metabolites exhibited specificity for a particular participant group, without crossover between these groups. Thus, the data from each group could potentially be used to construct specific metabolite models for breast cancer, mammary gland fibroma and cyclomastopathy that could be used for the clinical screening of these breast diseases.

Most of the volatile markers identified in this study are alkanes, ketones, aldehydes, alcohols, or olefins. The mechanisms through which these metabolites are generated continue to be explored and no unified consensus has been reached. However, the majority of relevant experimental results support the idea that these compounds result from oxidative stress24. Tumor tissues are characterized by vigorous growth and high energy demands. The malignant growth processes of cancer cells can lead to gene mutations and protein expression abnormalities. As a result, polyunsaturated fatty acids in cell membranes are subject to excessive oxidation and an individual may produce excessive ROS20. In addition, reductions in other chemicals may result from the consumption of these substances by tumor cells25. The specific biological mechanisms that generate volatile metabolites remain under investigation. Phillips has hypothesized that the volatile biomarkers of breast cancer relate to changes in estrogen metabolism mechanisms17. This notion is supported by the fact that estrogen can stimulate the proliferation of both normal and neoplastic mammary epithelial cells26. Investigations have also revealed that abnormally high aromatase expression occurs in breast cancer tissues27,28. Aromatase (estrogen synthetase) is a component of the cytochrome P450 (CYP) enzyme complex that converts C19 androgen to C18 estrogen, thereby increasing estrogen generation29. Other P450 enzymes are also activated in breast cancer tissues, such as CYP1A1, CYP1B1 and CYP3A430. P450 can induce a wide variety of biological responses, including promoting the biotransformation of alkanes, alkenes and aromatic compounds31. The metabolism of normal body cells can generate certain volatile metabolites, such as alkanes, that are produced as a result of oxidative stress32. Given the multitude of P450 functions, the elevated activity of this enzyme in breast cancer tissues may markedly alter the components of exhaled air. Phillips has suggested that breast diseases are associated with increases in oxidative stress and elevated cytochrome P450 activity17.

Estrogen is a common precipitating factor of breast diseases and changes in estrogen metabolism mechanisms play a crucial role in the carcinogenic processes of breast cancers. Spink reported that cytochrome P4501B1 catalyzes the conversion of estrogen to 4-hydroxyestradiol (4-OH-E2), which is a main driver of carcinogenic mechanisms in breast cancer tissues33. Liehr et al. demonstrated that the ratio of 4-OH-E2 to 2-hydroxyestradiol (2-OH-E2) is significantly higher in breast cancer tissue than in normal breast tissue34. Estrogen metabolites can promote carcinogenesis by damaging cellular macromolecules and promoting the proliferation of injured cells through receptor-mediated processes34. Changes in estrogen metabolism can generate certain distinct volatile substances, which may be associated with the chemicals that we observed at elevated levels in the exhaled air from breast cancer patients. However, the detailed mechanisms of these associations must be elucidated through additional clinical research. Furthermore, our observations that the concentrations of certain volatile compounds were at lower than normal levels in the exhaled air from breast cancer patients may relate to cellular consumption and increases in P450 activity.

There are certain limitations associated with this study. First, the impact of subject age was not addressed. The individuals in the breast cancer, mammary gland fibroma and cyclomastopathy groups were approximately 30, 40 and 50 years of age, respectively. Existing studies35 have demonstrated that increased age is associated with elevated oxidative stress levels and the resulting oxidation products may damage proteins, DNA, lipids and other biological macromolecules, thereby generating certain volatile metabolites that may affect the results of this experiment. Second, although the breast cancer patients suffered from the same pathological type of breast cancer, there was no attempt to differentiate among breast cancers of different grades in this experiment. The presence of tumors of different clinical grades may affect the experimental results. In future studies, we will expand our sample size and differentiate among pathological types and stages of tumors in greater detail, allowing for more accurate volatile biomarkers to be identified.

Breast cancer, cyclomastopathy and breast fibroma exhibit specific metabolic profiles with respect to volatile metabolites. Three volatile organic metabolites (2,5,6-trimethyloctane, 1,4-dimethoxy-2,3-butanediol and cyclohexanone) associated with breast cancer may serve as novel diagnostic biomarkers.

Methods

Human Subjects

The present experiments were conducted in accordance with the Declaration of Helsinki. The protocol in this study was approved by the Ethics Committee at Harbin Medical University (No.201314) and written informed consent was obtained from patients prior to study enrollment. This study was conducted between May 2011 and April 2012 at the Department of Anesthesiology at the First Affiliated Hospital of Harbin Medical University.

Included in this study were women between 25 and 80 years of age identified as ASA I and II individuals and scheduled for breast surgery. The following exclusion criteria were used: 1) currently breastfeeding, pregnant, or the possibility of becoming pregnant; 2) a diagnosis of a known congenital disease; 3) radiotherapy or chemotherapy treatment prior to testing or another diagnosed malignant cancer at the time of testing; 4) co-existent chronic obstructive lung disease, asthma, or pulmonary tuberculosis and other pulmonary diseases; 5) the presence of a chronic inflammatory disease; and 6) the manifestation of any acute disease symptoms during the preceding two weeks. Moreover, to ensure uniformity of the experimental results by minimizing the impact of diet and environment on the composition of the subjects' exhaled breaths, study participants were asked to strictly fast for 8 hours (h) prior to breath sample collection. In addition to the group of breast cancer patients, this study also examined healthy female volunteers. The inclusion criterion for these individuals was the absence of a history of malignancies or infectious diseases. This study involved a total of 85 patients with histologically confirmed cases of breast disease (including 39 individuals with infiltrating ductal cancer, 21 individuals with mammary gland fibroma and 25 individuals with cyclomastopathy) and 45 healthy volunteers (who were negative for breast cancer by mammography and ultrasound examination). The demographic characteristics are summarized in Table 2.

Table 2 Demographic characteristics of study subjects

Breath Collection

Breath sampling and the parallel collection of ambient air were performed within 24 h after overnight fasting. Alveolar breath sampling was performed as previously described36,37. Briefly, 20 ml of exhaled gas were drawn into a gas-tight syringe (50 ml; Agilent Inc., USA). These samples were transferred immediately into evacuated 20 ml glass vials (Supelco Inc., USA). All vials were flushed thoroughly and cleaned with nitrogen gas (purity of 99.999%, Liming gas Inc., China) before being evacuated for breath sampling to remove any residual contaminants38. All samples were analyzed within 3 h post-sampling.

Solid-Phase Microextraction (SPME)

A manual SPME holder with carboxen/polydimethylsiloxane (CAR/PDMS) fibers of 75 µm thickness was purchased from Supelco (Bellefonte, USA). The SPME fiber was inserted into the vial and exposed to the gaseous sample for 20 min at 40°C. Subsequently, the desorption of volatiles occurred in the hot GC injector at 200°C for 2 min.

Gas Chromatography-Mass Spectrometry (GC/MS) Analysis

Analysis was performed on a GC/MS (Shimadzu GC-MS QP 2010, Shimadzu, Japan) equipped with a DB-5MS capillary column (length 30 m × ID 0.250 mm × film thickness 0.25 µm; Agilent Technologies, USA). Injections were performed in the splitless mode and the splitless time was 1 min. The temperature of the injector was 200°C. The flow rate of the helium (99.999%) carrier gas was kept constant at 2 ml min−1. The column temperature was held at 40°C for 2 min to concentrate the hydrocarbons at the head of the column, then increased at 70°C min−1 to 200°C and held for 1 min, and finally ramped at 20°C min−1 to 230°C and held for 3 min. The MS analyses were performed in full-scan mode, using a scan range of 35–200 amu. The ion source was maintained at 200°C and an ionization energy of 70 eV was used for each measurement.
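As a worked check of the oven program, the sketch below adds up the hold and ramp times. It assumes that each ramp is linear and that each stated hold begins as soon as the target temperature is reached; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OvenStep:
    rate_c_per_min: float  # ramp rate; 0 means an isothermal hold only
    target_c: float        # temperature at the end of the step
    hold_min: float        # hold time at the target temperature


def total_run_time(initial_c, steps):
    """Sum ramp and hold durations for a temperature program (minutes)."""
    total = 0.0
    current = initial_c
    for step in steps:
        if step.rate_c_per_min > 0:
            total += abs(step.target_c - current) / step.rate_c_per_min
        total += step.hold_min
        current = step.target_c
    return total


# Program as described in the text: 40 °C for 2 min, 70 °C/min to 200 °C
# (1 min hold), then 20 °C/min to 230 °C (3 min hold).
program = [
    OvenStep(rate_c_per_min=0, target_c=40, hold_min=2),
    OvenStep(rate_c_per_min=70, target_c=200, hold_min=1),
    OvenStep(rate_c_per_min=20, target_c=230, hold_min=3),
]

total_min = total_run_time(40, program)
print(f"total oven program time: {total_min:.2f} min")
```

Under these assumptions the ramps contribute 160/70 ≈ 2.29 min and 30/20 = 1.5 min, so the full program takes roughly 9.8 min per injection.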

Extraction and Pretreatment of the GC/MS Raw Data

Raw GC/MS data were converted into CDF format (NetCDF) files using Shimadzu GCMS Postrun Analysis software and subsequently processed using the XCMS toolbox (http://metlin.scripps.edu/download/). The XCMS parameters consisted of the default settings with the following exceptions: xcmsSet (fwhm = 8, snthresh = 6, max = 200); retcor (method = "linear", family = "gaussian", plottype = "mdevden"); and a bandwidth of eight for the first grouping command and four for the second grouping command39,40. The data set of the aligned mass ions was exported from XCMS and further processed using Microsoft Excel to normalize the data prior to multivariate analyses.
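The pretreatment of the exported peak table can be sketched as follows. The text does not specify the normalization method, so scaling each sample to its total ion intensity is an assumption made purely for illustration, as is treating a zero intensity as "not detected"; the 50% detection filter mirrors the criterion mentioned in the Results, and the toy matrix and function names are hypothetical.

```python
def filter_by_detection(matrix, min_fraction=0.5):
    """Keep features detected (intensity > 0) in at least min_fraction of samples.

    matrix: list of samples, each a list of feature intensities.
    Returns the kept feature indices and the filtered matrix.
    """
    n_samples = len(matrix)
    n_features = len(matrix[0])
    kept = [
        j for j in range(n_features)
        if sum(1 for row in matrix if row[j] > 0) / n_samples >= min_fraction
    ]
    return kept, [[row[j] for j in kept] for row in matrix]


def normalize_to_total(matrix):
    """Scale each sample so its intensities sum to 1 (assumed normalization)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return normalized


# Toy peak table: 4 samples x 3 aligned mass-ion features.
toy = [
    [10.0, 0.0, 30.0],
    [20.0, 0.0, 20.0],
    [0.0, 5.0, 15.0],
    [40.0, 0.0, 10.0],
]

kept, filtered = filter_by_detection(toy)
normalized = normalize_to_total(filtered)
print(kept)
print(normalized[0])
```

In this toy table the second feature is present in only one of four samples and is therefore dropped before normalization.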

Statistical Analyses

The normalized data were exported to SIMCA-p 11.5 for principal component analysis (PCA), partial least-squares discriminant analysis (PLSDA) and orthogonal partial least-squares discriminant analysis (OPLSDA). To prevent overfitting, the default seven-round cross-validation in the SIMCA-p software was applied and permutation tests using 100 iterations were performed to further validate the supervised model. Additionally, the nonparametric Kruskal-Wallis rank sum test was performed for each metabolite and the corresponding false discovery rate (FDR) based on p-values was calculated to correct for multiple comparisons. The potential metabolic biomarkers were selected based on variable importance in the projection (VIP) values calculated from the OPLSDA model and FDR values, using thresholds of 1.5 and 0.05, respectively.
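The univariate part of this selection can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it computes the Kruskal-Wallis test for two groups (where the chi-square reference distribution has one degree of freedom), applies the Benjamini-Hochberg procedure as an assumed FDR method, and keeps features with VIP above 1.5 and FDR below 0.05. The direction of both cutoffs, the toy intensities and the VIP values are assumptions; in practice the VIP values would come from the OPLSDA model.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H and p-value for two groups (chi-square, 1 df)."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b))
    h -= 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    # Survival function of chi-square with 1 df: erfc(sqrt(h / 2)).
    return h, math.erfc(math.sqrt(max(h, 0.0) / 2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy intensities per feature: (patients, controls). Values are hypothetical.
features = {
    "m1": ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
    "m2": ([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]),
    "m3": ([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]),
}
vip = {"m1": 2.0, "m2": 1.8, "m3": 1.0}  # hypothetical VIP values

names = list(features)
pvalues = [kruskal_two_groups(*features[name])[1] for name in names]
fdr = dict(zip(names, benjamini_hochberg(pvalues)))

# Assumed cutoff directions: VIP > 1.5 and FDR < 0.05.
selected = [name for name in names if vip[name] > 1.5 and fdr[name] < 0.05]
print(selected)
```

Requiring both criteria means that a feature must contribute strongly to the multivariate model and also differ between groups in a univariate test after correction for multiple comparisons.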

Abbreviations

VOCs, volatile organic compounds; CAR/PDMS, carboxen/polydimethylsiloxane; SPME, solid-phase microextraction; GC/MS, gas chromatography-mass spectrometry; PCA, principal component analysis; PLSDA, partial least-squares discriminant analysis; OPLSDA, orthogonal partial least-squares discriminant analysis; FDR, false discovery rate; VIP, variable importance in the projection; ROS, reactive oxygen species; CYP, cytochrome P450; 4-OH-E2, 4-hydroxyestradiol; 2-OH-E2, 2-hydroxyestradiol.