Integrated targeted metabolomic and lipidomic analysis: A novel approach to classifying early cystic precursors to invasive pancreatic cancer

Pancreatic cystic neoplasms (PCNs) are a highly prevalent disease of the pancreas. Among PCNs, Intraductal Papillary Mucinous Neoplasms (IPMNs) are common lesions that may progress from low-grade dysplasia (LGD) through high-grade dysplasia (HGD) to invasive cancer. Accurate discrimination of IPMN-associated neoplastic grade is an unmet clinical need. Targeted (semi)quantitative analysis of 100 metabolites and >1000 lipid species were performed on peri-operative pancreatic cyst fluid and pre-operative plasma from IPMN and serous cystic neoplasm (SCN) patients in a pancreas resection cohort (n = 35). Profiles were correlated against histological diagnosis and clinical parameters after correction for confounding factors. Integrated data modeling was used for group classification and selection of the best explanatory molecules. Over 1000 different compounds were identified in plasma and cyst fluid. IPMN profiles showed significant lipid pathway alterations compared to SCN. Integrated data modeling discriminated between IPMN and SCN with 100% accuracy and distinguished IPMN LGD or IPMN HGD and invasive cancer with up to 90.06% accuracy. Free fatty acids, ceramides, and triacylglycerol classes in plasma correlated with circulating levels of CA19-9, albumin and bilirubin. Integrated metabolomic and lipidomic analysis of plasma or cyst fluid can improve discrimination of IPMN from SCN and within PMNs predict the grade of dysplasia.


Results
study characteristics. This cohort study included 35 patients undergoing pancreas resection, from whom pre-operative blood plasma (n = 21) and peri-operative cyst fluid (n = 31) were collected ( Supplementary  Fig. S1). Following histological validation of resected tissues, four groups were assigned: serous cystic neoplasm (SCN), IPMN with low-grade dysplasia (LGD), IPMN with high-grade dysplasia (HGD), and invasive IPMN (Cancer) for which clinical parameters are summarized in Table 1. As expected, the IPMN group was older, of mixed gender, and had comparable BMI with SCN controls. Cardiovascular disease (CVD) and diabetes were more common in patients with IPMN. Compared to SCN, IPMN LGD and HGD showed no significant elevation of circulating CA19-9, HbA1c, amylase, albumin, bilirubin, or white blood cell count. Only Cancer had significantly increased circulating CA19-9 or HbA1c levels.
Metabolite profiling reveals alterations of lipid metabolism pathways. Cyst fluid and plasma were profiled single-blinded using a targeted and (semi)quantitative liquid chromatography-tandem mass spectrometry (LC-MS/MS) method. A total of 90 and 91 different metabolites were measured in cyst fluid and plasma, respectively. A hierarchical clustered heatmap of the metabolomic data showed no clear grouping of metabolite profiles according to diagnose group (Fig. 1A,B). However, principal component analysis (PCA) showed that cyst fluid but not plasma from SCN was dissimilar to all other groups (Fig. 1C,D).
We next applied quantitative metabolic pathway enrichment analysis, using metabolite identities (see Supplementary Table S1). Because HGD and Cancer are target groups for resection, these groups were combined (HGD/Cancer). Compared to SCN, 34 enriched pathways were found in cyst fluid from HGD/Cancer and 12 enriched pathways from LGD at a significance level of 0·05 ( Supplementary Fig. S2). Among these, lipid pathways appeared to dominate, e.g. phosphatidylethanolamine biosynthesis, phosphatidylcholine biosynthesis, taurine and hypotaurine metabolism, phospholipid biosynthesis, beta oxidation of very long chain fatty acids, fatty acid metabolism, oxidation of branched chain fatty acids, and sphingolipid metabolism. Several of these were also significantly enriched in plasma samples of HGD/Cancer or LGD compared to SCN, including sphingolipid metabolism, phosphatidylethanolamine biosynthesis, phosphatidylcholine biosynthesis ( Supplementary Fig. S2).
Lipidomic profiling indicates a difference between IPMNs and SCN. As metabolic analysis of IPMN indicated altered lipid metabolism, and considering the pancreatic exocrine function of secreting lipases that might be affected in PCN patients, we next performed a high definition lipid profiling of paired aliquots using the SCIEX Lipidyzer ™ technology, where 1100 lipid molecular species were measured in all the samples. Out of those 1100 measured lipid molecular species, we successfully detected and quantified a total of 430 in cyst fluid and 941 in plasma. Heatmap visualization showed that triacylglycerol (TAG)-related lipids are the most abundant type among all lipid classes in both cyst fluid and plasma ( Fig. 2A,B). Lipidomic profiles of HGD and Cancer appear to be similar to each other, while LGD was clustered slightly differently. Although the lipidomic profiles of plasma and cyst fluid of the SCN group, as projected on the first two principal components (PCA), show a different direction and shape of the clouds of points when compared to IPMN groups, no evidently clear separation of diagnose groups can be observed from the 2D plot (Fig. 2C,D). The lipidomic analyses suggest alterations in lipid compound composition in cyst fluid and plasma, pointing to the possibility of discrimination of pancreatic disease severity, confirming the pathway enrichments we observed. Looking at fold change estimations ( Fig. 3 and Supplementary Table S1) it is possible to observe a clear alteration of the TAG class in plasma samples in both LGD and HGD/Cancer compared to SCN. In IPMN cyst fluid samples, instead, free fatty acids (FFA) and www.nature.com/scientificreports www.nature.com/scientificreports/ ceramides (CER) appear to have, on average, higher concentrations than those of SCN samples while having lower amount of TAGs. Interestingly, the profile of the Cancer group is similar to that of LGD in plasma and to that of HGD in cyst fluid, while only TAGs differ significantly between HGD and LGD in both plasma and cyst fluid ( Supplementary Fig. S3).
Integrated metabolite and lipid data predict IpMN disease groups. Accurate classification of IPMN severity using novel biomarkers in cyst fluid or plasma may facilitate the discrimination of low-risk from high-risk patients. We therefore assessed the predictive capacity of the integrated metabolome and lipidome profiles to classify samples according to their corresponding disease group. As IPMN HGD and Cancer are considered as high-risk lesions, we combined these into a single group. Effects of clinical covariates (Table 1) were estimated and subtracted from the raw data prior to analysis, and the only covariates that improved the classification model were age and BMI. The result of binary classifications and the performance of the CPPLS-DA model are given in Table 2 and Supplementary Table S2. The model discriminated between SCN and IPMN with very high accuracy (100%) in both cyst fluid and plasma samples. Choline, 2-aminoisobutyrate, trimethylamine n-oxide, glycine, alanine, and glyceraldehyde were found to be essential discriminatory molecules in both cyst fluid and plasma. Furthermore, dimethylglycine was a discriminatory compound for cyst fluid while serine and GABA were for plasma. Overall, the two biofluids displayed similar predicting power, with cyst fluid-based classification performing slightly better (accuracy of approximately 90%) when classifying the three groups SCN, LGD and HGD/Cancer. Nevertheless, plasma-based classification could easily detect LGD samples (90% accuracy) while cyst fluid molecules' concentrations were better for predicting HGD/Cancer samples (90% accuracy) ( Table 2). The model had a low sensitivity in discriminating the three IPMN groups from each other when HGD and Cancer were considered separately. SCN samples form a distinct cluster from IPMN samples, whereas the other two groups overlap significantly in cyst fluid (Fig. 4A,B). The top 15 molecules in cyst fluid and plasma ranked by their VIP scores are presented in Fig. 4C,D. Interestingly, only a subset of metabolites, without lipids, were sufficient to achieve best performance. In particular, amino acids were the most important molecules for the classification of plasma samples.

Discussion
Pancreatic IPMNs are common precancerous lesions [6][7][8] . Today, only some radiological and clinical parameters are used to identify patients with high risk for cancer progression or malignancy, for example, main pancreatic duct dilatation, cyst diameter, rate of progression and elevated serum CA19-9 [14][15][16] . Unfortunately, no high-accuracy tools are available that determine the IPMN-associated grade of dysplasia or that offer accurate differential diagnosis from other benign and low-risk PCNs (i.e. SCNs). These diagnostic limitations negatively affect patient management and treatment. Until recently, metabolites involved in IPMN disease progression have been scarcely studied 27 . A holistic view of the plasma and cyst fluid metabolic profile may aid the discovery of biomarkers capable of improving pre-operative PCN diagnosis. www.nature.com/scientificreports www.nature.com/scientificreports/ We have shown that an integrated metabolomics and lipidomics approach can be used to 1) discriminate between IPMN and SCN and 2) determine the IPMN-associated grade of dysplasia. Our analysis offered superior predictive accuracy compared to conventional cross-sectional imaging or EUS-FNA 15 . The LOOCV balanced accuracy of discriminating Cancer/HGD from SCN and LGD was 90.6% for cyst fluid and 81.8% for plasma. When discriminating IPMN as a whole from SCN, accuracy reached 100% for both plasma and cyst fluid. The availability of accurate plasma-based tests could represent a major advantage for patients who do not require invasive procedures like EUS-FNA, which are associated with risk of complications and low-diagnostic accuracy 31 .
While previous PC metabolome studies measured around 50-100 metabolites per case 24,26,29,32 , our high-definition integrated approach measured around 100 metabolites from 15 different biological classes and 1000 lipid molecular species from 13 different lipid classes, largely covering the important metabolome spectrum, i.e. sugars, nucleotides, amino acids, and lipids. Recent elegant metabolome studies 24,28 pointed out that a number of compounds, including very long-chain fatty acids, phospholipids, and taurine, were differentially present in PC patients or PC tissue compared to healthy subjects or parenchyma tissue, respectively. This agrees with our findings of enriched taurine and fatty acid metabolism pathways and phospholipid biosynthesis pathways in cyst fluid of pre-malignant or early malignant cases, e.g. and LGD and Cancer/HGD, as compared to SCN. Moreover, our study also has shown significant alterations of different classes of molecules, mainly TAGs, detected in plasma of Cancer/ HGD, and LGD, compared to SCN. We did not observe significant alterations of molecules in plasma when comparing HGD/Cancer with LGD, suggesting these disease groups display a more comparable plasma profile. www.nature.com/scientificreports www.nature.com/scientificreports/ The tumor marker CA19-9 is used for predicting malignancy of IPMN and monitoring PC progression, but its use as a definitive diagnostic marker, especially detecting IPMN HGD, is limited 33,34 . Combining additional blood markers with CA19-9 was recently shown to improve early detection of PC 33 , and building a broader   www.nature.com/scientificreports www.nature.com/scientificreports/ molecular profile around CA19-9 in IPMN patients may enhance the diagnostic accuracy. The lipid metabolites strongly associated with CA19-9 in this cohort are therefore of interest and need to be examined further, possibly together with metabolites correlating with serum albumin and bilirubin. Our findings that plasma, but not cyst fluid metabolites, strongly correlated with these three markers, suggest that systemic, rather than local factors, may have an influence on development and progression of IPMN.
A strength of this paper is the investigation of a surgical cohort of patients, with definitive histology and assessment of the grade of dysplasia. This allowed us to accurately match the actual IPMN-associated grade of dysplasia with our data, avoiding problems of misdiagnosis that occur in more than one third of the patients undergoing EUS-FNA with cyst fluid analysis 12 . In addition, we used a validated targeted and (semi) quantitative analysis through a robust and reliable LC-MS/MS approach with strict quality management 35 . We furthermore tested several linear mixed models to assess many covariate parameters (see Table 1), and while use of statins was considered as possible confounding factor, it did not improve the final classification model which only adjusted for age and BMI.
However, this study has also some limitations. Firstly, there is possible selection bias because we analyzed a small and homogenous cohort of patients which might not be fully representative of the entire population and thus might potentially restrict the predictive power of lipidomic/metabolomics profiling to certain groups within the general population. In this relatively small cohort all SCNs cases are female, which is not surprising as the prevalence rate in the general population for SCNs is 9-16% of all cystic lesions while approximately 75% of patients with SCNs are women 36 . Although this gender imbalance can be certainly considered a limitation we observed that, after adjusting for BMI and age, the effect of sex did not have any impact on out-of-sample model prediction accuracy. Therefore, we believe that our classification accuracy is mainly a consequence of differences between diseases while we expect unbalances in sex distribution to have only a very negligible influence. Moreover, we did not include other types of mucinous tumors of the pancreas, such as mucinous cystic neoplasm www.nature.com/scientificreports www.nature.com/scientificreports/ (MCN). However, such lesions are easier to recognize, due the peculiar epidemiological and radiological features (typically in body and tail of the pancreas, almost exclusively in young females) that make diagnosis not particularly challenging 37 . Additionally, the current study might suffer from a possible sampling bias, considering that fluid was aspirated from one or two accessible cysts, despite IPMNs often being multifocal and occurring in different locations of the pancreas (head/body/tail). Therefore, one cannot exclude the possibility that metabolic profiles of the sampled cysts might not be representative of all cystic lesions. Lastly, we did not investigate possible factors that might have potential effects on (lipid) metabolism, such as specific genetic mutations (e.g GNAS or KRAS gene) 38,39 .
This study has comprehensively mapped the metabolite and lipid makeup of cyst fluid and plasma from PCN patients with defined pathology, using integrated metabolomics and lipidomics. Our findings have clinical implications and may support assay development for differential diagnostics of PCNs to improve patient management. Future studies are needed to test larger patient cohorts using the proposed model, to better understand associations between metabotypes and IPMN malignancy risks.

Methods study population and ethical considerations.
In this prospective cohort study, patients undergoing pancreatic surgery for suspected pancreatic cystic neoplasm (PCN) with post-surgically validated intraductal papillary mucinous neoplasm (IPMN) and serous cystic neoplasm (SCN) from February 2016 to January 2017 at Karolinska University Hospital, Sweden, were included. Excluded were cases without a cystic component, non-IPMNs, or those without cyst fluid in the resected pancreas ( Supplementary Fig. S1). This study follows the Helsinki convention and good clinical practice with permission of the Ethical Review Board Stockholm and the Karolinska Biobank Board (Dnr 2015/1580-31/1). Written informed consent was obtained from all patients.
Pancreatic cyst fluid collection. Fresh resection specimens were received at the pathology laboratory within 20 minutes of surgical removal, in sterile conditions and on ice. Macroscopic assessment to identify the cystic lesion and main pancreatic duct was done by a specialist pancreatic pathologist. Fluid from the main pancreatic duct was collected using a syringe without needle. When the cystic lesion was readily identified in the intact specimen, the fluid was aspirated using a syringe with needle. For specimens in which the cystic lesion was not readily accessible from the surface the specimen was cut or when the cyst content was too viscous content was aspirated using a syringe without needle. Aspirated fluid was stored at −80 °C until further analysis.
peripheral blood collection and plasma isolation. Venous whole blood was collected in K 2 EDTA Vacutainer ® tubes (BD, Stockholm, Sweden) immediately prior to surgery. Within four hours of collection, blood was processed using Ficoll Paque Plus (GE Life Sciences, Uppsala, Sweden) gradient-density centrifugation following manufacturer's instructions and the plasma fraction was stored at −80 °C until further analyses.
Histopathological diagnosis and cyst fluid classification. Resection specimens were fixed in 4% formaldehyde and processed for routine histopathological diagnosis. The cystic lesions were classified by light microscopic examination of hematoxylin-eosin stained slides by a specialized pancreatic pathologist as IPMN or SCN 40 . The grade of dysplasia in IPMN was assessed using a 2-grade (high/low) scale, according to current international standard 41 . To make the cyst fluid classification more representative of the neoplastic epithelium that produces it, specimens showing <5% high-grade dysplasia (HGD) were classified as low-grade dysplasia (LGD). Specimens with concomitant invasive carcinoma were classified as "Cancer" and considered as a separate class for further analyses.
Chemicals. All metabolite standards used in the analysis were purchased from Sigma-Aldrich (Helsinki, Finland), while isotopically labelled metabolite internal standards (IS) were obtained from Cambridge Isotope Laboratory (Tewksbury, MA, USA). For lipidomics, kits containing 50 labelled internal standards across 13 lipids classes were purchased from SCIEX (Framingham, MA, USA). Ammonium formate, ammonium acetate, and ammonium hydroxide were obtained from Sigma-Aldrich (Helsinki, Finland). Formic acid (FA), acetonitrile (ACN), methanol (HiPerSolv CHROMANORM, LCMS grade), ethyl acetate (HPLC grade), 2-propanol, 1-propanol, and dichloromethane were purchased from VWR International (Helsinki, Finland). Deionized water, up to a resistivity of 18 MΩ⋅cm, was purified with a Barnstead Easypure RoDi water purification system (Thermo Scientific, Marietta, OH, USA). Whole blood was purchased from the Finnish Red Cross blood service (Helsinki, Finland) from which serum samples were prepared and used as internal quality control samples.
Metabolomic analysis. Metabolomic analysis of samples was performed using liquid chromatography-mass spectrometry as previously described in the supplementary data of Nandania et al. 35 . Briefly, 15 labeled internal standards were used to estimate quantitative levels of 100 metabolites. To 100 µL thawed sample (plasma or cyst fluid), 10 µL of labelled internal standard mixture was added, and then metabolites were extracted using protein precipitation by adding acetonitrile +1% formic acid (1:4, sample:solvent). The collected extracts were dispensed in Ostro 96-well plates (Waters Corporation, Milford, USA) and filtered by applying a vacuum at a delta pressure of 300-400 mbar for 2.5 min on robot's vacuum station. Filtered sample extract (5 µL) was injected in an Acquity UPLC-system coupled to a Xevo TQ-S triple quadrupole mass spectrometer (Waters Corporation, Milford, MA, USA) which was operated in both positive and negative polarities with switching time of 20 milliseconds. Multiple Reaction Monitoring (MRM) acquisition mode was selected for the quantification of metabolites. MassLynx 4.1 software was used for data acquisition, data handling and instrument control. Data processing was done using TargetLynx 4.1 software.
www.nature.com/scientificreports www.nature.com/scientificreports/ Lipidomic analysis. Lipids were extracted with liquid-liquid extraction (LLE) method using ethyl acetate and methanol. In borosilicate glass tubes, to 100 µL thawed sample (plasma or cyst fluid), 1 mL methanol and 1 mL water was added. Then, 100 μL of labelled internal standard mixture (prepared as per SCIEX LIPIDYZER manual's instructions) was added and allowed to equilibrate with the samples. To each tube 3.5 mL of ethyl acetate was added after which tubes were put on a rotator shaker for 15 min at 30 RPM, followed by centrifugation at 3000 RPM for 10 min. After centrifugation, the upper layer of ethyl acetate was collected and dried under N 2 gas. Dried samples were reconstituted with 250 μL of mobile phase (dichloromethane:methanol (50:50) containing 10 mM ammonium acetate) for injection. Lipid separation and quantitation was performed on the SCIEX Lipidyzer ™ platform using a SCIEX 5500 QTRAP ® mass spectrometer (SCIEX, Washington, D.C., USA) with SelexION ® Differential ion mobility (DMS) technology by directly infusing 50 μL of extracted samples with a mobile phase at flow rate of 70 µL/min. Two acquisition methods, with and without SelexION ® technology, were used to cover 13 lipid classes using flow injection analysis. The lipid molecular species were measured using MRM strategy in both positive and negative polarities. Positive ion mode was used for the detection of lipid classes -sphingomyelins (SM), diacylglycerols (DAG), cholesteryl esters (CE), ceramides (CER), triacylglycerols (TAG), and negative ion mode was used for the detection of lipid classes -lysophosphatidylethanolamines (LPE), lysophosphatidylcholines (LPC), phosphatidylcholines (PC), phosphatidylethanolamines (PE) and free fatty acids (FFA). Lipidomics Workflow Manager software was used for acquisition of samples, automated data-processing, signal detection and lipid species concentration calculations. statistical analysis. All data analyses were performed with R 3.5.1 and Stan 2.17.1 42,43 . Concentration values in μmol/L were assumed to follow a lognormal distribution and were therefore log-transformed as a preliminary normalization operation. Missing values were removed from the dataset following the "modified 80% rule", according to which a variable is discarded if the relative frequency of missing values is more than 0.8 in all clinical groups 44 . Remaining missing values were imputed with the QRILC function from R package imputeLCMD 45 . Considering the lack of balance for basic clinical parameters (Table 1), all values were adjusted for the effects of confounding factors using a linear mixed model. Let y gij be the log-concentration of molecule j observed at sample i belonging to phenotypic group g. The basic assumption of our model is that observations are conditionally independent and normally distributed, with same standard deviation σ but different mean value μ: gij gij gij 2 Covariates to include in the model were selected by the highest out-of-sample point-wise predictive accuracy 46 . Parameter μ was then defined as the linear combination where θ is the grand mean, φ i is the general effect of sample i, λ j is the effect of molecule j, η gj is the effect of phenotypic group g on molecule j, and γ = v ( 1, 2) v are effects varying with molecules. Fold change of molecule j between groups g and h was therefore defined as η η − e ( ) gj hj . All coefficients associated with the same discrete category are constrained to sum to zero in order to make the model identifiable. We assigned to each unknown parameter weakly informative prior distributions as follows: www.nature.com/scientificreports www.nature.com/scientificreports/ where t refers to the three-parameters Student's t-distribution and halft to the same distribution but truncated at 0 and defined only on the positive values 47 . Total number of phenotypic groups k depended on the particular statistical analysis being conducted. Estimated fold changes and their corresponding 95% credibility intervals, computed from 20,000 posterior samples, are available in Supplementary Table S1. Prior to visualization, classification, and enrichment analyses, the dataset was adjusted for the confounding covariates "age" and "BMI" and subsequently standardized to a mean of zero and unit variance. Principal Component Analysis (PCA) was applied to the adjusted data for exploratory data purposes. Heatmaps and data projection on the first two principal components were used to visualize the dataset. Classification was performed using a Canonical Powered Partial Least Squares Discriminant Analysis (CPPLS-DA), fitted with the pls R package 48,49 . Classification performance was measured with a Leave-One-Out Cross Validation (LOO-CV) strategy, and balanced accuracy (average between sensitivity and specificity of the classifier) is reported 50 . Best explanatory molecules were selected according to their Variable Importance in Projection (VIP) ranking scores according to the following iteration scheme 51 . At each step of the algorithm the performance of the model was recorded with a LOO-CV strategy and the molecules were sorted according to their VIP score. Subsequently, 5% of the molecules with the lowest VIP score were discarded and this operation was repeated until the number of molecules allowed model identifiability. The model with the highest performance was ultimately selected. Pathway enrichment analysis (QEA) was performed with the free web service MetaboAnalyst 4.0 52 . All figures were generated in R 3.5.1 42 . ethical considerations. This study follows the Helsinki convention and good clinical practice. This study was conducted at Karolinska University Hospital under permission of the Ethical Review Board Stockholm and the Karolinska Biobank Board (Dnr 2015/1580-31/1). Written informed consent was obtained from all patients.

Data Availability
The raw datasets generated during the current study are available from the corresponding author on reasonable request. Results from analyzed datasets are available in Supplemental Tables 1-3.