Metabolomics of sebum reveals lipid dysregulation in Parkinson’s disease

Parkinson’s disease (PD) is a progressive neurodegenerative disorder characterised by degeneration of distinct neuronal populations, including dopaminergic neurons of the substantia nigra. Here, we use a metabolomics profiling approach to identify PD-associated changes to lipids in sebum, a non-invasively available biofluid. We used liquid chromatography-mass spectrometry (LC-MS) to analyse 274 samples from participants (80 drug-naïve PD, 138 medicated PD and 56 well-matched control subjects) and detected metabolites that could predict PD phenotype. Pathway enrichment analysis shows alterations in lipid metabolism related to the carnitine shuttle, sphingolipid metabolism, arachidonic acid metabolism and fatty acid biosynthesis. This study shows that sebum can be used to identify potential biomarkers for PD.



Prof. Perdita Barran
Oct 16, 2020

Software and code
Data collection: MassLynx 4.2 (Waters Corp.) was used for LC-MS data acquisition.

Data analysis: LC-MS raw data were deconvolved using Progenesis QI (Waters, Wilmslow, UK). Peak picking, alignment and area normalisation were carried out with reference to a pooled QC using default parameters. Features extracted from raw data were annotated by accurate mass match against METLIN (https://metlin.scripps.edu/), the Human Metabolome Database (HMDB) and LipidMaps (https://www.lipidmaps.org/). The data were mean centered and auto-scaled, and missing values were replaced by cubic spline interpolation in MATLAB 2019a (MathWorks) prior to statistical analysis.

PLS-DA was performed for classification and prediction of data; re-sampling with replacement (bootstrapping) was used for model validation, with the correct classification rates (CCRs) from the Y-variable computed for the test data sets (n=250). The algorithm used in the MATLAB (2019a) script that performed PLS-DA is hosted at www.biospec.net (code available at https://github.com/Biospec/cluster-toolbox-v2.0/blob/master/cluster_toolbox/plsda_boots.m). Univariate ROC analysis was performed in Origin (Version 2017, OriginLab Corporation, Northampton, MA, USA). Multivariate ROC curve based exploratory analysis was executed using MetaboAnalyst Biomarker Analysis (Version 4.0), in which the data matrix was auto-scaled and PLS-DA with a two latent variable input was used as both the classification method and the feature ranking method.

LC-MSE raw data were deconvolved using Progenesis QI (Waters, Wilmslow, UK). Peak picking, alignment and area normalisation were carried out using one of the QC data files as the reference. Significant features extracted from raw data were aligned to significant features in clinical samples using a RT window of ±15 s and a mass tolerance of ±10 ppm.
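The alignment of features by retention time and mass tolerance described above can be illustrated with a minimal sketch. It is not the Progenesis QI procedure; the `Feature` type, the function names and the all-pairs matching strategy are assumptions made for illustration, and only the ±15 s and ±10 ppm windows come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float    # mass-to-charge ratio
    rt_s: float  # retention time in seconds


def ppm_error(mz_observed: float, mz_reference: float) -> float:
    """Signed mass error in parts per million relative to the reference m/z."""
    return (mz_observed - mz_reference) / mz_reference * 1e6


def match_features(queries, references, rt_window_s=15.0, ppm_tol=10.0):
    """Pair each query with every reference that lies inside both windows.

    Hypothetical helper: a pair is kept only if the retention times differ by
    at most rt_window_s AND the absolute mass error is at most ppm_tol.
    """
    matches = []
    for q in queries:
        for r in references:
            rt_ok = abs(q.rt_s - r.rt_s) <= rt_window_s
            mz_ok = abs(ppm_error(q.mz, r.mz)) <= ppm_tol
            if rt_ok and mz_ok:
                matches.append((q, r))
    return matches
```

Because the mass window is expressed in ppm, its absolute width grows with m/z: 10 ppm corresponds to 0.003 at m/z 300 and 0.008 at m/z 800.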
Features were annotated using accurate mass match and tandem MS data with Lipid Maps (https://www.lipidmaps.org/), Lipid Blast (https://fiehnlab.ucdavis.edu/projects/lipidblast) and METLIN (https://metlin.scripps.edu/). Mass tolerances of 10 ppm and 30 ppm were applied for precursor and fragment ions, respectively. Compounds with a fragmentation score < 20 were not annotated. Progenesis QI score, fragmentation score and isotope similarity are reported for all annotations based on a combination of accurate mass and fragmentation data.

October 2018

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.
Sample size: A total of 274 participants were recruited from three subject groups: control (n=56), drug naïve PD (n=80) and medicated PD (n=138). It is difficult to perform sample size calculations for untargeted metabolomics data due to the lack of pilot data and the highly multivariate nature of the data. It is generally accepted that over 100 samples within a study is a minimum sample size to derive sufficient statistical power.

Data exclusions: We excluded data from control participants who accompanied PD participants to their neurological appointment ('paired controls'), since we found evidence of contamination in the sebum from some of these individuals. We also recruited another control cohort from a separate scheme ('independent controls') who were not linked to a PD participant. The contamination hypothesis was tested using PLS-DA classification models between the 'paired control' and 'independent control' cohorts, which showed high levels of classification between these control types. This is further supported by the poor classification seen for 'paired control' vs. both PD cohorts in comparison to the 'independent control' vs. PD models, which indicated an overlap in sebum signature between the paired control subjects and PD participants.
Replication: Biological replicate samples were analysed from control (n=56), drug naïve PD (n=80) and medicated PD (n=138) participants. The findings were not replicated in any independent cohort; however, re-sampling methods were used to ensure that the models were trained on a subset of the data and tested on unseen data from the experiment.
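The idea of training on a resampled subset and testing on unseen data can be sketched as follows. This is not the authors' MATLAB script: a nearest-centroid classifier stands in for PLS-DA so that the example needs only the standard library, and the choice of out-of-bag samples as the test set, all function names and the default seed are assumptions. Only bootstrapping with replacement and the 250 resampling rounds come from the reporting summary.

```python
import random
import statistics


def autoscale_params(rows):
    """Column means and standard deviations estimated from the training rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns so scaling never divides by zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, sds


def autoscale(rows, means, sds):
    """Mean centre and scale each column with the supplied parameters."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def fit_centroids(rows, labels):
    """Nearest-centroid stand-in for PLS-DA: one mean vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [statistics.fmean(c) for c in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest in Euclidean distance."""
    def sqdist(centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return min(centroids, key=lambda cls: sqdist(centroids[cls]))


def bootstrap_ccr(rows, labels, n_boot=250, seed=0):
    """Return one out-of-bag correct classification rate per bootstrap round."""
    rng = random.Random(seed)
    n = len(rows)
    ccrs = []
    for _ in range(n_boot):
        train_idx = [rng.randrange(n) for _ in range(n)]  # with replacement
        chosen = set(train_idx)
        test_idx = [i for i in range(n) if i not in chosen]  # unseen samples
        train_labels = [labels[i] for i in train_idx]
        if not test_idx or len(set(train_labels)) < 2:
            continue  # skip degenerate resamples
        train_rows = [rows[i] for i in train_idx]
        # Scaling parameters come from the training rows only, so the test
        # samples do not leak into the model.
        means, sds = autoscale_params(train_rows)
        model = fit_centroids(autoscale(train_rows, means, sds), train_labels)
        test_rows = autoscale([rows[i] for i in test_idx], means, sds)
        hits = sum(predict(model, r) == labels[i]
                   for r, i in zip(test_rows, test_idx))
        ccrs.append(hits / len(test_idx))
    return ccrs
```

Calling `bootstrap_ccr(rows, labels)` with a feature matrix and group labels returns a list of CCRs whose mean and spread summarise how well the model generalises to samples it was not trained on.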
Randomization: Samples were randomized (with stratification) such that each analytical batch had an equal distribution of samples from each group. This ensured that errors or instrument failure during analysis would not cause the loss of a large number of samples specific to any one group. Subsequently, within each batch, samples were randomized and also blinded (see below).
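A stratified assignment of this kind can be sketched as below. The round-robin dealing, the function name, the sample IDs and the number of batches are assumptions for illustration; the reporting summary does not state how the stratification was implemented.

```python
import random
from collections import defaultdict


def stratified_batches(sample_groups, n_batches, seed=0):
    """Assign samples to batches so each group is spread evenly across them.

    sample_groups maps a (hypothetical) sample ID to its group label.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in sample_groups.items():
        by_group[group].append(sample_id)

    batches = [[] for _ in range(n_batches)]
    for group in sorted(by_group):
        members = by_group[group]
        rng.shuffle(members)
        # Deal shuffled members round-robin, so the per-group count in any
        # two batches differs by at most one.
        for i, sample_id in enumerate(members):
            batches[i % n_batches].append(sample_id)

    # Randomize the run order within each batch.
    for batch in batches:
        rng.shuffle(batch)
    return batches
```

With this scheme, losing any single batch removes roughly the same fraction of samples from every group.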
Blinding: During data collection, randomised batches of samples were blinded by a person in the group who was not collecting data. For example, the 35 samples stratified into batch 1 were randomised, labelled 1 to 35 and presented to the investigator as samples 1 to 35 during data collection. Batch 2 samples were randomised and then labelled 36 to 70, and so on. The blinding was removed once the data had been collected, because supervised multivariate analysis requires group labels to be known to the statistician prior to analysis.
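The consecutive labelling scheme described above can be sketched as follows. The function names and the key structure are hypothetical; the sketch assumes only that each batch is shuffled and relabelled with running numbers, while a separate key held by someone not collecting data maps the numbers back to sample IDs.

```python
import random


def blind_batches(batches, seed=0):
    """Return (blinded run lists, key) using consecutive numeric labels.

    The key maps each blinded label back to the original sample ID and is
    meant to be withheld from the investigator until data collection ends.
    """
    rng = random.Random(seed)
    key = {}
    blinded = []
    next_label = 1
    for batch in batches:
        order = list(batch)
        rng.shuffle(order)  # randomize run order within the batch
        labels = []
        for sample_id in order:
            key[next_label] = sample_id
            labels.append(next_label)
            next_label += 1
        blinded.append(labels)
    return blinded, key


def unblind(key, label):
    """Look up the original sample ID once blinding is lifted."""
    return key[label]
```

For two batches of 35 samples, the investigator would see labels 1 to 35 and 36 to 70, matching the numbering in the example above.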