Introduction

Breast cancer is the most frequent cause of death in women worldwide. In 2020, over two million new cases of breast cancer were diagnosed, and 684,996 persons died from the disease1. The early detection of breast cancer is an important step toward achieving efficient treatment. Mammography (MG), the most commonly used screening test at present, can detect breast cancers during the asymptomatic phase and reduce mortality among women of certain ages2,3,4. Yet MG screening has several drawbacks. First, MG detects benign lesions, which can lead to unnecessary testing, treatment, and anxiety5. Second, MG is less sensitive in dense breast6. Third, MG is associated with significant pain caused by the relatively strong pressure applied to the breast. An alternative to MG that can screen for breast cancer safely, painlessly, and noninvasively is therefore urgently awaited.

Volatile organic compounds (VOCs) in biological samples such as breath and blood have been investigated in connection with cancer detection for more than two decades. The potential of VOCs as non-invasive biomarkers has been supported by reports on the capabilities of sniffer dogs and sensory devices in distinguishing between healthy controls and patients with cancers of the lung7,8,9,10, colon or rectum11, stomach12,13, liver14, head and neck15,16,17, ovaries18, and breast19,20,21,22,23,24,25,26. The combination of multiple biomarkers has strengthened the discriminatory power of this approach, raising accuracy to rates of 0.9 or higher in multiple studies.

While the previous studies have yielded promising results, critical steps still need to be taken to standardize the sample collection and storage and handling of the data, and to validate the results in independent samples. While the advantages of urine as an alternative matrix for volatile biomarkers have been outlined in lung cancer27, the data are scanty on cancers of other organs, including the breast. In this study we sought to identify and analyze VOCs that appear specifically in the urine of breast cancer patients. We assessed the potential of VOCs to become biomarkers of breast cancer by constructing a prediction model using a training set and an external validation set with triplicated reproducibility tests.

Methods

Subjects

The subjects were divided into two groups: patients with primary breast cancer and healthy volunteers (controls). The primary breast cancer patients were who were diagnosed by either fine-needle aspiration cytology or core-needle biopsy at Nippon Medical School Chiba-Hokusoh Hospital from November 2015 to October 2019 were enrolled. Clinical stages of the breast cancer patients were classified according to the Union for International Union Cancer Control (UICC) classification. The histological subtypes were based on the 15th St. Gallen International Breast Cancer Conference 201728. Control subjects with no histories of previously diagnosed cancer of any type were recruited from the public in systemic cancer screenings at Nippon Medical School Chiba-Hokusoh Hospital and Koyukai Asakusa Clinic over the same period. All procedures performed in this study involving human participants were performed in accordance with the Declaration of Helsinki (as reserved in 2013). The study was approved by the ethics committees of Nippon Medical School Chiba Hokusoh Hospital (IRB#320). All subjects provided their signed informed consent before enrolment.

Urine samples

Urine samples were collected with paper cups (Harn cup laminate A, Nissho Sangyo, Tokyo, Japan), transferred to sterile test tubes (Sterile SP tube TD4000, Eiken Chemical Co., Tokyo, Japan), sealed with caps, and stored at − 30 ℃ until analysis in 3 ml volumes. The breast cancer patients provided the samples a few days before surgery; the controls provided them during the cancer screening tests. All of the samples were transferred to the analysis institution, RIKEN KEIKI Co., Ltd., by a refrigerated courier service.

After the urine samples were thawed in a refrigerator, more than 3 mL of each sample was filtered and sterilized (Hawatch Scientific, PES Syringe Filter: Pore Size = 0.22 μm, Diameter = 25 mm, Material = PES Gama Sterile). Next, the sterilized samples were pipetted in 3 ml volumes into vials for an HS-20, and NaCl (> 99.5%, FUJIFILM Wako Pure Chemical Corporation) was added. These vials were sealed with the aluminum-cap (Silicone/PTFE, Shimazu GLC). These protocols were performed at a maximum of 3 samples at once to avoid degradation.

Gas chromatography–mass spectrometry (GCMS)

GCMS analysis was performed with GCMS-QP2010 Ultra Gas Chromatograph Mass Spectrometer (Shimadzu Co., Kyoto, Japan) equipped with HS-20 Trap with a capillary column (Inert Cap Pure WAX, 32 m length; 0.25 mm internal diameter; 0.25 μm film thickness). A helium (99.999%) carrier gas set at a flow rate of 1.76 mL/min was used for the GCMS analysis. The GC column temperature was maintained at 30 ℃ for 5 min, raised from 30 to 250 ℃ at a rate of 10 ℃ per min, and maintained at 260 ℃ for 6 min. The mass spectrometry was performed in a scanning mode (m/z = 33.00–300.00).

Sample and data analysis

Urine samples from breast cancer patients, healthy controls, and blank controls were analyzed in the GCMS on the same day using GCMS solution and LabSolutions Insight software Version 2.0 (Shimadzu, Co., Kyoto, Japan, https://www.shimadzu.com/an/products/liquid-chromatograph-mass-spectrometry/lc-ms-software/labsolutions-insight/index.html). The peak-data were obtained from the total ion current chromatogram (TIC) of each sample. Each set of peak-data was annotated according to the NIST/EPA/NIH Mass Spectral Library (NIST11) and WILEY REGISTRY® of Mass Spectral Data 9th Edition (Wiley9), and the area was quantitated. Air contamination and error were adjusted using a stable standard and a blank and samples. The urine of the experiment staff was used as a standard sample. Urine samples were collected from the same staff over several days and stored frozen. After a certain amount of urine has been collected, thaw the frozen urine sample in a refrigerator and mix all it to make it homogenized. After that, the sample was divided into several small-volume storage containers (Eiken Chemical FT2100 sterile screw round-bottom spits) and frozen again. These samples were used as the standard urine sample. A blank sample referred to a tube that contained room air on each day of the sample analysis. These data were further adjusted by urine creatinine levels.

The cancer urine samples were divided into two groups: a training set for building a prediction model, and an external validation set. Compounds detected in fewer than 80% of the samples from the training set were excluded from each group, and the remaining compounds were assessed by the following statistical analysis. Several of the variables were selected by stepwise analysis. Next, the compounds showing breast cancer patient (BCP) < healthy control (HC) and BCP > HC were selected. The prediction model was then constructed by discriminant analysis. The discriminant factor was analyzed by an Receiver Operating Characteristic (ROC) analysis. For reproducibility, 10 randomly chosen BCP samples and 10 randomly chosen HC samples were analyzed three times on different days. The external sample set was also tested, for validation. Discrimination analysis by multiple regression analysis was performed by Microsoft Excel (Microsoft 365). All the other statistical analyses were performed using the R statistical package (www.r-project.org).

Ethics approval and consent to participate

This study was approved by the ethics committees of Nippon Medical School Chiba Hokusoh Hospital (IRB#320).

Consent for publication

A signed informed consent was obtained from each participant. The informed consent is available upon request.

Results

Subjects enrolled in the study

Two hundred and eighty-seven subjects were enrolled in this study, including 110 BCPs and 177 HCs. The BCPs and HCs were randomly allocated to a training set and an external set using software. The training set included 60 BCPs (age 34–88 years, mean 60.3) and 60 HCs (age 34–81 years, mean 58.7). The external validation set included 50 BCPs (age 35–85 years, mean 58.8) and 117 HCs (age 18–84 years, mean 51.2) (Table 1). The clinical stages and histological subtypes are summarized in Table 1.

Table 1 Subjects enrolled in the study.

GCMS analysis

The GCMS analysis of the urine samples showed numerous peaks. A representative total ion chromatogram (TIC) is presented in Fig. 1. Compounds detected in fewer than 80% of the subjects were excluded in each group, and 191 compounds were assessed by an ensuing statistical analysis. The compounds identified are listed in Supplemental Table S1.

Figure 1
figure 1

A representative GCMS total ion chromatograms (TIC) of urine volatile organic compounds. TIC of volatile organic compounds from urine samples collected from a breast cancer patient (A) and healthy control (B). Both samples showed various peaks. TIC total ion chromatogram.

Building and validating the prediction model

Using the detected peak data, 10 compounds were selected by a stepwise backward elimination method. Next, the prediction model was built based on the P-value, standardized partial regression coefficient, tolerance, and variance inflation factor (VIF). First, discriminant models were created using the area values of each compound and the area values of each compound corrected for creatinine concentration. To create the discriminant model, we used 60 samples from BCP (stage I = 30 samples, stage II = 30 samples) and 60 samples from HC. By stepwise variable selection method, the detected compounds were narrowed down to constructing the discriminant model. After narrowing down the original compound list to a few compounds, box charts of each compound were used to compare the areas between BCP and HC. The compounds which were "BCP < HC” and "BCP > HC" were selected. A linear discriminant analysis was performed using the selected compounds. The prediction model built through this procedure used two VOCs in combination, 2-propanol and 2-butanone. As the box plots generated by this model show, 2-butanone was higher in BCP than in HC, while 2-propanol was higher in HC than in BCP (Fig. 2). The obtained discriminant equation was used as the discriminant model equation. The discriminant coefficients in the discriminant model equation were obtained by ROC analysis. The scattered plot and the area under the curve (AUC) are shown in Fig. 3. Using this AUC, 0.9442, and the cutoff value were decided by Youden index. The sensitivity was 93.3%, specificity was 83.3%, positive predictive value was 84.8%, negative predictive value was 92.6%, and accuracy was 88.3% for this model (Table 2). The performance of the constructed model for the histological subtypes were also evaluated (Table 3).

Figure 2
figure 2

Box charts of the peak areas of the 2-butanone and 2-propanol using the model. Box plots of the peak areas of 2-propanol and 2-butanone generated by the model. The model indicated that 2-butanone was higher in breast cancer patients than in healthy controls (A), and that 2-propanol was higher in healthy controls than in breast cancer patients (B). BCP breast cancer patient, HC healthy control.

Figure 3
figure 3

Scatter plots and the area under the curve (AUC) of the samples in the training set. (A) The scatter plots show the areas of 2-butanone and 2-propanol in each sample in the training set. The X-axis and Y-axis represent 2-butanone and 2-propanol, respectively. (B) The AUC, sensitivity, and specificity for this model were 0.9442, 93.3%, and 83.3%, respectively.

Table 2 The performance of the constructed model with the training set.
Table 3 The results for each histological subtype when applying the model to the training set.

Additional reproducibility tests were performed by randomly choosing ten models from each group in the training set. The reproducibility tests were triplicated on different days, and the results showed a matching rate of 100% for the BCP group and 90% for the HC group (Table 4).

Table 4 The results of the reproducibility tests.

Validation test using an external validation set

Next, an external validation set was used to confirm the validity of the training models. The box plots of the peak areas of 2-butanone and 2-propanol generated by the constructed model are shown in Fig. 4. The AUC was 0.9228, sensitivity was 84%, specificity was 90.5%, positive predictive value was 79.2%, negative predictive value was 92.9%, and accuracy was 88.6% (Fig. 5 and Tables 5, 6).

Figure 4
figure 4

Box charts of the peak areas of the 2-butanone and 2-propanol of the external validation set using the model. Box plots of the peak areas of 2-propanol and 2-butanone in the external validation set using the model. The peak areas in the external validation set were similar to those in the training set. (A) Peak areas of 2-butanone and (B) 2-propanol. BCP breast cancer patient, HC healthy control.

Figure 5
figure 5

Scatter plots of the samples and the area under the curve (AUC) in the external validation sets. (A) The external validation set was used to confirm the validity of the training models. (B) The area under the curve using the external validation set. The AUC, sensitivity, and specificity for this model were 0.9228, 84.0%, and 90.6%, respectively.

Table 5 The performance of the constructed model applying to the external validation set.
Table 6 The results for each histological subtype when applying the model to the external validation set.

Discussion

To the best of our knowledge, our study is the first to build a prediction model to detect breast cancer using a combination of only two VOCs, with the results validated by triplicated reproducibility tests and a validation set analyzed by GCMS. Among the different types of volatile compounds analyzed, we demonstrated that a combination of 2-butanone and 2-propanol was highly effective in detecting breast cancer at various clinical stages, achieving a sensitivity of 93.3% and a specificity of 83.3%.

While cancer screening biomarkers using biological samples have been extensively for the past decade, including several with blood and urine, a model to detect specific VOCs as cancer biomarkers holds the potential for application as a rapid, noninvasive, and inexpensive cancer screening technique that reduces the burdens on the individuals screened. The science of detecting cancers through body fluids was pioneered by Linus Pauling in 197129. More recently, the development of sensor techniques and devices has led to an exponential increase in studies to detect cancer from samples of exhaled breath, blood, urine, and cell-cultured mediums of cancers of the the lung7,8,9,10, colon or rectum11, stomach12,13, liver14, head and neck15,16,17, prostate30, kidney31, ovaries18, and breast19,20,21,22,23,24,25,26. Most of the previous studies on VOCs in breast cancer patients have examined breath samples. A series of studies by Phillips et al.25,32 demonstrated that breath oxidative stress markers can distinguish between women with breast cancer and healthy controls. Later, experiments by the same group found that breath samples show good potential as a biomarker for breast cancer21. Compared with earlier studies that included both early and advanced stages of breast cancer, our study is unique in enrolling a large population mainly consisting of patients with early-stage breast cancer, a disease difficult to diagnose with the current screening methods. Our study design supports the utility of this screening for early-stage breast cancer.

While breath sampling is easy and non-invasive, several obstacles impede further research on the development of the technique for cancer screening33. The procedure is impracticable for routine application, the samples are unstable, and noise from background concentrations can interfere with the very small concentrations of volatiles involved. Urine samples are thus expected to offer an alternative matrix for detecting VOC biomarkers. Adapting the headspace gas of urine samples to GCMS analysis has been well established using classical and basic water analysis techniques. Further, the urine can be partitioned, dispensed, mixed, spikes, stored, and dispatched. The only previous studies to investigate VOCs in urine samples of breast cancer patients were based on smaller populations and employed neither reproducibility tests nor external validation sets34,35,36. Our study sought to build a VOC-screening model with urine samples using a training set, an external set, and triplicated reproducublity tests.

The previous studies on the breath samples of breast cancer patients investigated analyses with cross-validation tests20,21,25 and/or external validation sets21. We designed and performed triplicated reproducibility tests by randomly choosing ten samples from each group as a training set and then confirmed the results with an external validation set. The results showed a concordance of 100% for the BCP group and 90% for the HC group, and the constructed model was confirmed by the external validation set. This is the first study to perform triplicated reproducibility tests and validation tests with external sets.

No studies to date have a identified a single VOC that can determine the presence of cancer with high reliability, and no consensus has been reached on a causative connection between specific VOCs or any type of cancer. A number candidate VOCs have the potential to become common biomarkers among cancers in general, but none are specific to any one type of cancer. A combination of two different VOCs in breath might serve as a marker of disease when one is high and the other is low22. We therefore postulated that combinations of several VOCs may be able to increase the overall sensitivity and specificity.

Several previous studies on VOCs in breast cancer patients are listed in Table 7. Previous analyses of the urine samples of breast cancer patients34,35,36 have identified a significant decrease in dimethyl disulfide, and increases in 4-carene, 3-heptanone, phenol, 1,2,4-trimethylbenzene, and 2-methoxythiophene. In our samples, the combination of 2-butanone and 2-propanol was the best detector of breast cancer of various clinical stages, achieving a sensitivity and specificity of 93.3% and 83.3%, respectively. The VOC 2-propanol, also known as isopropyl alcohol, is a colorless liquid used in making cosmetics, perfumes, skin and hair products, and other chemicals. Elevated levels of 2-propanol were identified in an earlier breath analysis of breast cancer patients22 and a study of lung cancer cells in vivo37. On the other hand, 2-propanol was significantly decreased VOCs in urine samples of cholangiocarcinoma or pancreatic cancer38. The level of 2-propanol may change in association with the altered activity of cytochrome P450 in breast cancer26, and the discrepancy of the result may attribute to the different sample types (i.e. breath or urine) used in each study. The detailed mechanism needs to be clarified. The VOC 2-butanone, more widely known as methyl ethyl ketone, is a widely used solvent. It can be obtained by the dehydration of 2,3-butanediol, a natural metabolite produced from glucose by several microorganisms such as Escherichia coli and Klebsiella pneumoniae39,40,41. Our preliminary test confirmed that 2-butanone was not eluted from blank tubes (data not shown). Several ketones, including 2-butanone, are elevated in the breath samples of lung cancer patients42, urine samples of prostate cancer patients43, and one study identified higher levels of 2-butanone in cultured lung cell lines than in normal cell lines in vitro37. The possible origins are endogenous production, microbiota, or environmental exposure. Fatty acid oxidation, the mechanism found to cause 2-butanone production in cancer progression, may result in elevated levels of ketones44,45. Butanoate metabolism is also reported to be highly activated in breast cancer and colon cancer patients34. Though 2-butanone has not been identified as a specific marker of breast cancer, the combination of 2-propanol and 2-butanone, applied with our prediction model constructed by multivariate analysis, proved to be extremely sensitive and specific in distinguishing breast cancer of all histological subtypes from healthy controls. Our study is the first to build a prediction model based on the P-value, standardized partial regression coefficient, and VIF. Further studies to identify the underlying biological mechanism of this combination of VOCs, and its clinical significance for daily practice, are merited.

Table 7 Published studies on VOCs on breast cancer.

Our study has some limitations. First, the results were derived from an analysis of a fairly non-diverse population, and thus may not extend to a broader population. Second, the control samples were only collected from healthy individuals, and the VOCs examined were not confirmed to be breast-cancer-specific as biomarkers. Previous analyses of VOCs in breath samples20,46,47 have shown that, among the VOCs that were significantly increased in breast cancer patients versus healthy controls and benign breast tumors, only one compound was significantly altered in the breast cancer patients versus the benign tumors. While the two markers in the current study are useful in complementing the current diagnostic tools, further studies with larger populations inclusive of benign tumors and non-breast malignancies are warranted. Furthermore, since breast cancer is a heterogeneous disease, analysis including the molecular subtypes, which is an independent classification from the histological subtypes, is desirable. However, the molecular subtypes were not available for the current study. To further substantiate our results, molecular subtypes by gene expression analysis are needed.

Conclusion

Our prediction model using the combination of the VOCs 2-propanol and 2-butanone usefully complements the current diagnostic tools for early-stage breast cancer. Further studies inclusive of benign tumors and non-breast malignancies are warranted.