In-Depth Characterization and Validation of Human Urine Metabolomes Reveal Novel Metabolic Signatures of Lower Urinary Tract Symptoms

Lower urinary tract symptoms (LUTS) are a range of irritative or obstructive symptoms that commonly afflict the aging population. Diagnosis is mostly based on patient-reported symptoms, and current medication often fails to eliminate these symptoms completely. There is a pressing need for objective, non-invasive approaches to measure symptoms and understand disease mechanisms. We developed an in-depth workflow combining urine metabolomics analysis and machine learning bioinformatics to characterize metabolic alterations and support objective diagnosis of LUTS. Machine learning feature selection and statistical tests were combined to identify candidate biomarkers, which were statistically validated with leave-one-patient-out cross-validation and absolutely quantified by a selected reaction monitoring assay. Receiver operating characteristic analysis showed that the candidate biomarkers stratified patients into diseased and non-diseased categories with high accuracy. The key metabolites and pathways may be correlated with smooth muscle tone changes, increased collagen content, and inflammation, which have been identified as potential contributors to urinary dysfunction in humans and rodents. Periurethral tissue staining revealed a significant increase in collagen content and tissue stiffness in men with LUTS. Together, our study provides the first characterization and validation of LUTS urinary metabolites and pathways to support the future development of a urine-based diagnostic test for LUTS.

Figure 1. Comprehensive workflow of urinary metabolite biomarker discovery of LUTS (A) and flowchart of metabolite identification process (B). Accurate mass matching with multiple online databases was conducted with a mass error ∆ ppm < 5. Because there are far fewer entries in MS/MS metabolite databases than in MS databases, features with matching results in MS but not in MS/MS databases were considered putative identifications.
Scientific RepoRts | 6:30869 | DOI: 10.1038/srep30869

Technical Reproducibility of Metabolomics Platform. Quality control (QC) is of utmost importance in large-scale metabolomics biomarker studies to ensure stable system performance and limit experimental bias. A QC standard was prepared as a pooled mixture of aliquots from all urine specimens (26 LUTS patients and 20 control patients). The QC sample was injected before and frequently throughout the analytical run to monitor instrument stability. Technical reproducibility of the platform was assessed by analyzing the QC sample repeatedly within the same day and across months. The average mass deviation was less than 1 ppm across months, and the average relative standard deviations (RSDs) for retention time were 0.9% (intraday) and 5.4% (interday). The average RSDs for peak areas were 7.8% (intraday) and 19.6% (interday). LC-MS peak areas were highly correlated between technical replicates of both inter- and intra-day injections (Supplementary Fig. S1). The metabolomics profiling platform yielded consistent peak areas, m/z values, and retention times, allowing reliable comparisons of metabolite profiles between LUTS patients and controls.
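The RSD figures quoted above can be reproduced for any single feature from its repeated QC injections. The following is a minimal sketch in Python, assuming the RSD is the sample standard deviation divided by the mean; the replicate values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def rsd_percent(values):
    """Relative standard deviation (sample SD / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical QC replicate measurements for one feature (illustrative only).
qc_peak_areas = [1.02e6, 0.98e6, 1.05e6, 0.95e6]
qc_retention_times_min = [5.01, 5.03, 4.99, 5.02]

area_rsd = rsd_percent(qc_peak_areas)
rt_rsd = rsd_percent(qc_retention_times_min)
print(f"peak area RSD: {area_rsd:.1f}%, retention time RSD: {rt_rsd:.2f}%")
```

Averaging such per-feature values over all features would give platform-level numbers of the kind reported here.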

Selection and Identification of Candidate Biomarkers.
Overall, 2802 aligned spectral features were detected in the LUTS patients vs. control data set. Candidate biomarkers were selected accurately and efficiently by combining machine learning feature selection with traditional statistical tests, which reduced the 2802 peaks to 118 features for subsequent metabolite identification.
Because of the complexity of a metabolome and the absence of a complete metabolite database, metabolite identification is one of the most challenging tasks in metabolomics studies. The use of a high-resolution and accurate-mass (HR/AM) Orbitrap MS and the strict evaluation of QC samples to ensure system reproducibility laid the basis for successful metabolite identification. Following the designed flowchart in Fig. 1B, a total of 63 metabolites were identified (Supplementary Table S3), and examples of ID confirmation are illustrated in Supplementary Fig. S2. A list of representative metabolites is shown in Table 1. The heat map of the 63 identified metabolites is displayed in Fig. 2A. The blue and red heat map provides a direct visual comparison of relative expression levels of metabolites (rows) grouped by sample type (columns).

Binary Classification Model and Statistical Validation.
A predictive model for patient classification was constructed from the dataset of 63 identified metabolites with the linear SVM algorithm. The SVM algorithm is robust in handling noisy data and is generally not susceptible to outliers, which makes it well suited to metabolomics data sets 29 . On the training set, this model classified the LUTS vs. control patients with an AUC ROC of 0.93 (Fig. 2B). To evaluate whether this model is over-fitted and how it can be expected to perform on future patients, we cross-validated the whole process of biomarker selection and classification model construction: the entire biomarker selection process was repeated using just the training set for each fold, and the resulting features were used to construct the predictive model for that fold, which was then applied to the held-aside test patient for that fold. The cross-validated AUC ROC was 0.90, a relatively modest difference from 0.93, indicating that over-fitting was small. Compared with the results without biomarker selection (AUC ROC of 0.68 for all 2802 features), the established model demonstrated substantially increased discriminatory power and prediction accuracy.
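For readers less familiar with the AUC ROC values quoted above, the quantity can be computed directly from per-patient scores as the probability that a randomly chosen LUTS patient receives a higher score than a randomly chosen control. The sketch below is a generic illustration in Python, not the WEKA routine used in the study; the function name and the toy scores are assumptions.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    labels: 1 for LUTS, 0 for control.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores (illustrative only).
example_auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which 0.93, 0.90, and 0.68 should be read.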

Metabolic Pathway Analysis.
In the human body, metabolites can act synergistically within functionally defined pathways. Metabolic pathway analysis is based on the association between identified metabolites and their related biological processes. Besides the 63 identified candidate biomarkers, an additional 105 metabolites were putatively identified by accurate mass matching (Δ ppm < 1) in order to include as many metabolites as possible in given pathways regardless of their statistical significance. Eventually, four potentially regulated metabolic pathways were identified: the lysine degradation pathway, the arginine and proline metabolism pathway, the nicotinate and nicotinamide metabolism pathway, and the tyrosine metabolism pathway (Supplementary Table S1). The potentially disrupted metabolic pathways are illustrated in Fig. 3 (important pathway segments) and Supplementary Fig. S3 (complete KEGG maps). Given the complexity of the selected metabolic pathways, it is possible that the entire pathway was altered, or that only specific fragments within the pathway were perturbed in the disease state.
Biomarker Verification by Absolute Quantification. Candidate biomarkers were further verified by absolute quantification using selected reaction monitoring (SRM). Seven metabolites were selected from the represented metabolic pathways: proline, pipecolic acid, lysine, carnitine, spermine, spermidine, and tyrosine (Table 2). An eight-point standard curve was constructed for each metabolite with a fixed concentration of the corresponding isotopically labeled internal standard. Excellent linearity (average R² = 0.9986) was achieved for each metabolite across three orders of magnitude in dynamic range (Supplementary Fig. S4). Following this assessment of dynamic range, we performed targeted absolute quantification of the seven metabolites in 46 clinical urine samples. As illustrated in Fig. 4, all of the metabolites exhibited consistent trends between absolute and relative quantification, validating our quantitation method and producing important molecular targets for future mechanistic study.
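The standard-curve step can be sketched as an ordinary least-squares fit of the analyte-to-internal-standard peak area ratio against known concentration, followed by inversion of the fitted line for an unknown sample. This is a minimal illustration in Python under that assumption (unweighted regression); the concentrations and ratios are hypothetical and were chosen to be exactly linear.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

def quantify(analyte_area, internal_std_area, slope, intercept):
    """Convert a measured area ratio into a concentration via the fitted curve."""
    ratio = analyte_area / internal_std_area
    return (ratio - intercept) / slope

# Hypothetical eight-point curve spanning three orders of magnitude.
concentrations = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
area_ratios = [0.02 * c + 0.001 for c in concentrations]

slope, intercept, r_squared = fit_line(concentrations, area_ratios)
unknown_conc = quantify(analyte_area=2.01e5, internal_std_area=1.0e6,
                        slope=slope, intercept=intercept)
```

Because the internal standard is spiked at a fixed concentration, dividing by its area compensates for injection-to-injection variation before the curve is applied.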
Collagen Assessment. Lower urinary tract inflammation and fibrosis are very commonly observed in prostatic tissues from male LUTS patients and have recently been associated with increased symptom severity and risk for clinical progression of LUTS/BPH 3,30-32 . However, the mechanistic and molecular bases for this association are unclear. Fibrosis is an aberrant wound-healing process downstream of inflammation, which can be characterized by myofibroblast accumulation, collagen deposition, extracellular matrix remodeling and tissue rigidity 5 . Many of the dysregulated metabolites identified in urine are related to collagen synthesis and deposition, such as metabolites in the arginine and proline metabolism pathway [33][34][35] . In order to investigate prostatic fibrosis as a potential contributor to LUTS and to correlate metabolite dysregulation with the changes in collagen deposition that indicate fibrosis, we performed a follow-up study to assess collagen content and tissue stiffness in periurethral prostatic tissues of LUTS patients vs. controls. The collagen content was significantly higher in LUTS patients (n = 5) than in control patients (n = 7), as illustrated by Picrosirius red staining images and colorimetrically quantified birefringence (Fig. 5). Total collagen content was significantly increased in men with LUTS (p-value = 0.04), as were large collagen fibers (orange, p-value = 0.02). Medium collagen fibers (yellow, p-value = 0.37) and very large fibers (red, p-value = 0.13) were also increased in LUTS patients but did not reach statistical significance, probably because of the limited sample size. The average tangent modulus of periurethral prostatic tissue was 1978 ± 314 kPa for LUTS patients vs. 411 ± 274 kPa for controls (p-value = 0.00002), representing significantly greater tissue stiffness in the LUTS group.
These data suggest that increased collagen content and tissue stiffness in the periurethral prostatic area indicating fibrosis may contribute to the dysregulation of urine metabolites related to collagen synthesis and deposition.

Discussion
A panel of metabolite biomarker candidates and their related metabolic pathways was successfully generated from our designed workflow. The established binary classification model has great potential for future development of a urine-based diagnostic test for LUTS. In addition, the key metabolites and their related bioprocesses are discussed here to help elucidate the underlying molecular functions involved in LUTS development and progression, particularly their possible associations with the function of the lower urinary tract, which involves complex regulation of smooth muscle contraction and relaxation as well as the coordination of neural networks. The arginine and proline metabolism pathway was found to be distinctly perturbed, with more than 30 identified metabolites (Fig. 3B and Supplementary Fig. S3B). This pathway is known to be related to the synthesis of collagen [33][34][35][36] . Two crucial polyamines, spermine and its precursor spermidine, were significantly increased in LUTS patients' urine and verified by absolute quantification. Prostatic tissue is one of the highest polyamine-producing organs in the body 37 . Studies have suggested that polyamines can promote collagen production and cell proliferation. Arginase activity can also have direct effects on fibrosis, which is a potential contributor to LUTS/BPH 3,36,38 . Increased collagen content and extracellular proteins cause tissue stiffness as well as reduced tissue elasticity and compliance 5 . Additionally, spermidine regulates Ca²⁺ influx and Na⁺/K⁺-ATPase activity, which are closely related to the contraction activity in the detrusor of the urinary tract and the bladder smooth muscle 39 . Given that we and others have found that the accumulation of extracellular matrix 3,5 , especially collagen 31 , is associated with LUTS in men, the identification of collagen precursors in urine suggests that these metabolites are putative biomarkers of LUTS.
Perhaps more importantly, these putative biomarkers may inform personalized medical therapy in men presenting with LUTS, because current therapies do not target the extracellular matrix and hence may not be effective in men presenting with these urinary markers.
Methylated intermediates in the lysine degradation pathway, including dimethyllysine, trimethyllysine, and hydroxy-trimethyllysine, were found to be significantly increased in LUTS patient urine (Fig. 3A). Methylation patterns of lysine serve as important biological signals that establish chromatin structure and regulate carnitine biosynthesis and fatty acid oxidation 40,41 . It has been reported that the methylation of histone H3 at lysine 4 (H3-K4) is associated with transcriptional regulation of the prostate-specific antigen (PSA) gene in a prostate cancer cell line 42 . However, further targeted investigation into methylated metabolites is necessary.
Tyrosine metabolism is related to signal transduction in the human body, and tyrosine kinase can modulate smooth muscle contraction through Ca²⁺ sensitization 43 . The majority of the identified metabolites in tyrosine metabolism were down-regulated in LUTS patients (Fig. 3C). Results of pathway analysis also indicated potential disruption of the nicotinate and nicotinamide metabolism pathway in LUTS (Fig. 3D) significantly elevated in the LUTS patient, was also found to associate with the regulation of inflammatory actions which is one of the most important etiologies of LUTS 4,44 .
These possible metabolic correlations with disease etiologies are consistent with our recent urinary proteomics study of LUTS in men, which identified and relatively quantified a group of proteins related to fibrosis and inflammatory responses 45 . Changes in smooth muscle tone, prostatic hyperplasia, inflammation, and increased collagen content have been identified in urology studies as possible contributors to urinary dysfunction 3,4,7,31,44 . However, in order to confirm the possible correlation between the changes of metabolites and functional bioprocesses of LUTS, future targeted mechanistic studies are necessary via cell culture or justified mouse models of LUTS 4,15,46-50 .
In summary, we have developed and implemented an in-depth metabolomics analytical platform combining MS-based analysis and advanced machine learning bioinformatics tools. The established method was successfully applied to study LUTS in men, resulting in important disease-associated biomarker and pathway candidates as well as a sensitive and specific classification model for potential non-invasive diagnosis of LUTS. The hypothesized metabolic correlation with collagen deposition was further studied by periurethral prostatic collagen staining. Aging female patients are also known to develop LUTS. Unlike LUTS in men, which has historically been attributed to benign prostatic hyperplasia, LUTS in women is more likely associated with other factors such as bladder dysfunction, urinary tract infection, and postmenopausal urogenital changes. Because of the different etiology of LUTS in women 51,52 , we focused only on LUTS in men in the present study. The established workflow can also be applied to future studies of LUTS in women. It is also worth pointing out that the metabolites and pathways generated in this study are only candidate signatures of LUTS. Future targeted mechanistic studies and clinical validation with a separate large cohort of patient samples are necessary before use in clinical practice. Together, this study provides a well-designed methodology and promising molecular targets that are useful for future clinical diagnosis and pathophysiological study of the disease.

Clinical Sample Collection. This study was approved by Institutional Review Board (IRB) Protocol and conducted under the guidance of the University of Wisconsin-Madison Human Research Protection Program (HRPP). All human subjects provided informed consent before participating in this study. Midstream urine samples were collected from 26 patients with LUTS and 20 controls without LUTS in the Urology clinic of the University of Wisconsin Hospital according to the approved IRB protocol. Because of the physiological and anatomical differences in the lower urinary tract between women and men, the etiology and risk factors of developing LUTS are often studied separately for female and male patients 51,53 . In this study, the recruited LUTS patients were men with significant urinary frequency and urgency for a duration of more than 6 months, as described by American Urological Association Symptom Index (AUASI) frequency + urgency symptom scores of > 7 8 . Control male patients had no history of significant LUTS and a symptom score ≤ 3 (detailed patient inclusion and exclusion criteria are provided in Supplementary Fig. S5). Because many LUTS patients have a history of other urologic conditions, including renal cell carcinoma, renal cystic disease, kidney stones, erectile dysfunction, hydrocele, and low-grade prostate cancer, controls were also selected from patients with such a diagnostic history not specifically associated with LUTS, in order to provide a spectrum of patients that can dilute the effect of confounding variables. Age and body mass index were matched between recruited LUTS and control patients (Supplementary Table S2). After collection, all midstream urine samples were centrifuged at 1000 g for 10 min, spiked with sodium azide, de-identified, and stored at −80 °C until analysis.
In order to compare collagen levels in men with or without LUTS, human periurethral prostatic tissues were collected from a separate group of LUTS patients (n = 5) and age-matched controls (n = 7) assessed under AUASI criteria. Periurethral tissues were procured at surgery from men undergoing radical prostatectomy, who had completed the AUASI within 30 days before surgery. Patient clinical information was provided in Supplementary Table S2. Experimental details were described previously 3,31 . Tissue samples and related clinical information were obtained with IRB approval.

Urine Sample Preparation. Urinary metabolites were separated from large molecules using 3 kDa molecular weight cut-off (MWCO) ultracentrifugation filters (Millipore Amicon Ultra, MA) according to the manufacturer's protocol. The flow-through fractions were collected as urinary metabolites. Osmolality of each metabolite fraction was measured by a freezing-point depression osmometer (Osmometer Model 3250, Advanced Instruments, MA) and metabolite samples were diluted to achieve the same osmolality. By pre-acquisition normalization of the urine, we ensured that each sample has the same osmolality and similar total metabolite concentration before instrumental analysis (Supplementary Fig. S6). For absolute quantification, a mixture of isotopically labeled internal standard (I.S.) was spiked in urine samples before instrumental analysis.
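The osmolality-based normalization can be illustrated with a short dilution calculation. The sketch below, in Python, assumes that osmolality falls in proportion to dilution and that every sample is diluted down to the lowest measured osmolality; the text states only that samples were brought to the same osmolality, so the choice of target, the sample names, and the volumes are all hypothetical.

```python
def dilution_plan(osmolalities, sample_volume_ul=100.0):
    """Compute water volumes that bring every sample to a common osmolality.

    osmolalities: dict mapping sample ID to measured osmolality (mOsm/kg).
    Assumption: the target is the lowest measured value, and osmolality
    scales linearly with dilution.
    """
    target = min(osmolalities.values())
    plan = {}
    for sample_id, measured in osmolalities.items():
        factor = measured / target
        plan[sample_id] = {
            "dilution_factor": factor,
            "water_to_add_ul": sample_volume_ul * (factor - 1.0),
        }
    return target, plan

# Hypothetical measurements (illustrative only).
target, plan = dilution_plan({"LUTS_01": 600.0, "CTRL_01": 300.0, "CTRL_02": 450.0})
```

Normalizing before acquisition, rather than scaling peak areas afterwards, keeps the amount of material loaded on the column comparable across samples.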

LC-MS and LC-MS/MS Analysis.
Ultra-performance LC-MS analyses of urine samples were conducted using a Dionex UltiMate 3000 LC system coupled with a Q-Exactive™ Orbitrap mass spectrometer (San Jose, CA). Urinary metabolites were separated with a 20 min gradient on a Phenomenex biphenyl column (2.1 × 100 mm, 2.6 μm, 100 Å) at a flow rate of 0.3 ml/min. Mobile phase A was 0.1% formic acid in H₂O and mobile phase B was 0.1% formic acid in MeOH. The gradient was set as follows: 0-5 min, 0-3% solvent B; 5-15 min, 3-40% solvent B; 15-18 min, 80% solvent B. Full MS acquisition scanned from 70 to 1000 m/z at a resolution of 70 K. Automatic gain control (AGC) target was 1 × 10⁶ and maximum injection time (IT) was 100 ms. UPLC targeted-MS/MS analyses were acquired at a resolution of 35 K with AGC target of 5 × 10⁵, maximum IT of 50 ms, and isolation window of 2 m/z. Collision energy was optimized for each target with higher-energy collisional dissociation (HCD) fragmentation. The injection order of urine samples with 3 technical replicates was randomized to reduce experimental bias.
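A randomized injection sequence with interleaved QC injections could be generated along the following lines. This is a sketch in Python, not the authors' scheduling procedure: the QC spacing (`qc_every`), the seed, and the sample names are assumptions, since the text says only that the QC sample was injected before and frequently throughout the run.

```python
import random

def build_injection_sequence(sample_ids, n_replicates=3, qc_every=10, seed=0):
    """Shuffle all (sample, replicate) runs and interleave pooled-QC injections.

    qc_every is a hypothetical spacing; the run always starts with a QC.
    """
    runs = [(sid, rep) for sid in sample_ids
            for rep in range(1, n_replicates + 1)]
    rng = random.Random(seed)  # fixed seed so the sequence can be reproduced
    rng.shuffle(runs)
    sequence = [("QC", 0)]
    for i, run in enumerate(runs, start=1):
        sequence.append(run)
        if i % qc_every == 0:
            sequence.append(("QC", 0))
    return sequence

# Hypothetical sample IDs (illustrative only).
sequence = build_injection_sequence(["LUTS_01", "CTRL_01"], qc_every=2, seed=1)
```

Randomizing at the level of individual replicate injections prevents drift in instrument response from lining up with disease status.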

Data Processing and Statistical Analysis. Data files acquired by Thermo Scientific Xcalibur software were processed by commercial SIEVE™ software for peak alignment and framing. Total ion current (TIC) normalization embedded in SIEVE was performed to reduce instrumental variation before statistical analysis. A total of 2802 aligned spectral features were detected after filtering out irreproducible peaks that were present in fewer than three biological replicates from each group (LUTS or control). In order to select mass features whose peak areas differentiate between the disease and control groups, a Student's t-test was conducted to generate the average fold change and p-value of each detected feature. False discovery rate (FDR) correction was used to estimate the chance of false positives and correct for multiple hypothesis testing. The distribution of p-values was used to calculate q-values using the Benjamini-Hochberg algorithm in an R package 54 . Features with both p-value and q-value < 0.05 were considered statistically significant. For metabolic pathway analysis, the pathway's p-value was calculated as the median of the p-values of all the identified metabolites involved in the specific pathway. For absolute quantification and collagen staining experiments, the two-tailed Student's t-test was conducted and p-value < 0.05 was considered statistically significant.
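The multiple-testing correction and the pathway-level summary described above can be sketched in a few lines. The Python below implements the standard Benjamini-Hochberg step-up adjustment and the median-based pathway p-value; it is an illustration rather than the R routine cited in the text, and the p-values are hypothetical.

```python
from statistics import median

def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

def significant_features(p_values, alpha=0.05):
    """Indices of features with both p-value and q-value below alpha."""
    q_values = benjamini_hochberg(p_values)
    return [i for i, (p, q) in enumerate(zip(p_values, q_values))
            if p < alpha and q < alpha]

def pathway_p_value(metabolite_p_values):
    """Pathway p-value as the median of its metabolites' p-values."""
    return median(metabolite_p_values)

# Hypothetical per-feature p-values (illustrative only).
p_example = [0.01, 0.04, 0.03, 0.005, 0.6]
q_example = benjamini_hochberg(p_example)
```

Because a q-value is never smaller than its p-value, the joint criterion is in practice determined by the q-value threshold.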
Machine Learning Feature Selection. Chromatographic peak areas of detected features generated from SIEVE were input into WEKA 3.6 software 55 for machine learning based feature selection. Support vector machine (SVM) based attribute evaluation and information gain (IG) based attribute filtering were used to conduct feature selection and rank features based on their contributions to separating the LUTS and control groups. SVM constructs a hyperplane with the maximum margin to separate two groups as widely as possible 29 . IG measures the effectiveness of an attribute in classifying the data based on the entropy measure in information theory 56 . After the ranks of all detected features were obtained, the top 100 features in the SVM and IG evaluations were selected.
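The entropy-based ranking can be made concrete with a small example. The Python sketch below scores a continuous feature by the largest entropy reduction achievable with a single threshold split; this is a simplification and an assumption, since WEKA applies its own discretization to numeric attributes before computing information gain. Feature names and values are hypothetical.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * log2(p)
    return result

def information_gain(values, labels):
    """Best entropy reduction over all single-threshold splits of one feature."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        best = max(best, base - weighted)
    return best

def top_features(feature_table, labels, n_top):
    """Rank features (name -> list of peak areas) by information gain."""
    scores = {name: information_gain(vals, labels)
              for name, vals in feature_table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_top]

# Hypothetical peak areas for two features across four patients.
labels = [0, 0, 1, 1]  # 0 = control, 1 = LUTS
ranked = top_features({"feat_A": [1, 2, 3, 4], "feat_B": [1, 3, 2, 4]}, labels, 1)
```

A feature that separates the two groups perfectly recovers the full class entropy (1 bit for balanced groups), while an uninformative feature scores near zero.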
Metabolite Identification. Significantly altered features from the statistical test (both p-value and q-value < 0.05) and feature selection (top 100 ranking features from both SVM and IG algorithms) were overlapped to compile a list of the most significant features. Metabolite identification was performed according to our designed flowchart in Fig. 1B. First, accurate masses of selected features were searched with the MetaboSearch software 57 against multiple databases (mass error < 5 ppm), including the Human Metabolome Database, Madison Metabolomics Consortium Database, Metlin, and LIPID MAPS. Features with matching results from the databases were subjected to LC targeted-MS/MS analysis, and their MS/MS fragments were searched using the MetFrag software 58 to confirm identities. Metabolite IDs were also confirmed with available metabolite standard compounds.
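The accurate-mass search step amounts to comparing each observed m/z with theoretical ion masses within a ppm tolerance. The Python sketch below illustrates that comparison only; it is not MetaboSearch, the two-entry dictionary is a hypothetical stand-in for the online databases, and the use of protonated ions ([M+H]+) is an assumption because the ionization mode and adducts are not stated in this section.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da, used for [M+H]+ ions

# Hypothetical mini-database of neutral monoisotopic masses (Da).
NEUTRAL_MASSES = {
    "proline": 115.06333,
    "lysine": 146.10553,
}

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def match_masses(observed_mz, neutral_masses=NEUTRAL_MASSES, tol_ppm=5.0):
    """Names whose [M+H]+ m/z lies within tol_ppm of the observed m/z."""
    hits = []
    for name, neutral in neutral_masses.items():
        theoretical_mz = neutral + PROTON_MASS
        if abs(ppm_error(observed_mz, theoretical_mz)) < tol_ppm:
            hits.append(name)
    return hits

# Hypothetical observed feature (illustrative only).
candidates = match_masses(116.0706)
```

A mass match alone cannot distinguish isomers, which is why matched features go on to MS/MS fragmentation and comparison with standards.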
Machine Learning Classification. Chromatographic peak areas of identified metabolites were directed into WEKA software to build binary classification models with the linear SVM algorithm, which has been shown to work well in high dimensional data. Leave-one-patient-out cross-validation was carried out to evaluate classification accuracy and measure the proportion of patient subjects correctly classified in this task. This procedure withholds one patient at a time as a test set and uses the rest of the data as a training set and repeats this process until all patients have been used exactly once as the test set and classified. The resulting probability values of LUTS for each patient, when that patient is used for testing, can be used to compute a ROC curve. To test the value of the biomarker selection, a ROC curve was also generated using all the features in the dataset for comparison.
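The leave-one-patient-out procedure, with feature selection repeated inside every fold as described in the Results, can be sketched as follows. To stay self-contained the Python below substitutes a nearest-centroid score for the linear SVM and a standardized mean difference for the SVM and IG rankers, so it illustrates the validation design rather than the WEKA model; it also assumes one row per patient, and the data are hypothetical.

```python
from statistics import mean, pstdev

def select_features(X, y, n_keep):
    """Rank features by absolute class-mean difference scaled by overall SD."""
    scored = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = pstdev([row[j] for row in X]) or 1.0  # guard constant features
        scored.append((abs(mean(pos) - mean(neg)) / spread, j))
    return [j for _, j in sorted(scored, reverse=True)[:n_keep]]

def centroid_score(train_X, train_y, test_row, features):
    """Stand-in classifier: positive when closer to the LUTS centroid."""
    def squared_distance(label):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        return sum((test_row[j] - mean([r[j] for r in rows])) ** 2
                   for j in features)
    return squared_distance(0) - squared_distance(1)

def leave_one_patient_out(X, y, n_keep):
    """Score each patient with a model that never saw that patient."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Feature selection uses the training patients only, so the held-out
        # patient cannot influence which features are chosen.
        features = select_features(train_X, train_y, n_keep)
        scores.append(centroid_score(train_X, train_y, X[i], features))
    return scores

# Hypothetical peak areas: feature 0 is informative, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 5.2]]
y = [0, 0, 0, 1, 1, 1]
held_out_scores = leave_one_patient_out(X, y, n_keep=1)
```

The held-out scores collected this way are what a cross-validated ROC curve is built from. With technical replicates, all rows belonging to the held-out patient would need to be removed from the training set together.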

Metabolic Pathway Analysis.
For metabolic pathway analysis, the identities of candidate biomarkers were input into the MetaboAnalyst 2.0 software 59 to query against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and generate a list of involved pathways 60 . Only pathways with more than two identified compounds were considered and additional metabolites were identified in order to maximize the metabolite coverage associated with a given pathway, regardless of their p-value and feature ranker.
Collagen Staining. Human periurethral prostatic tissues were subjected to mechanical testing to assess rigidity and stiffness as described previously 3 . The tangent modulus of a tissue sample was measured as the terminal slope of the nominal stress vs nominal strain response in kPa, representing passive tissue stiffness. For collagen staining, human periurethral prostatic tissues were fixed in 10% neutral buffered formalin, embedded in paraffin, and sectioned onto positively charged microscope slides. The tissue sections were stained with Picrosirius red and images were acquired under polarized light as described previously 31 . The different collagen fiber sizes were assessed by imaging and quantitating different colors of birefringence using Image J software suite. Student's t-test was performed for each color (green, yellow, orange, and red) and total birefringence.
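The tangent modulus defined above as the terminal slope of the stress-strain response can be estimated by fitting a straight line to the last few points of the curve. The Python sketch below assumes a least-squares fit over a fixed number of terminal points; that number, the function name, and the curve are hypothetical, and the actual procedure is the one in the cited mechanical testing reference.

```python
def tangent_modulus(strain, stress_kpa, n_terminal=5):
    """Least-squares slope (kPa) over the last n_terminal points of the curve.

    strain is dimensionless nominal strain; stress_kpa is nominal stress.
    n_terminal is a hypothetical choice for the 'terminal' region.
    """
    x = strain[-n_terminal:]
    y = stress_kpa[-n_terminal:]
    if len(x) < 2:
        raise ValueError("need at least two points to estimate a slope")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical curve: compliant toe region followed by a stiff linear region.
strain = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
stress_kpa = [0.0, 5.0, 20.0, 220.0, 420.0, 620.0]
modulus_kpa = tangent_modulus(strain, stress_kpa, n_terminal=3)
```

Restricting the fit to the terminal region excludes the low-stiffness toe of the curve, so the estimate reflects the stiffness of the loaded tissue.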