Schizophrenia-risk and urban birth are associated with proteomic changes in neonatal dried blood spots

In the present study, we tested whether there were proteomic differences in blood between schizophrenia patients after the initial onset of the disorder and controls; and whether those differences were also present at birth among neonates who later developed schizophrenia compared to those without a psychiatric admission. We used multiple reaction monitoring mass spectrometry to quantify 77 proteins (147 peptides) in serum samples from 60 first-onset drug-naive schizophrenia patients and 77 controls, and 96 proteins (152 peptides) in 892 newborn blood-spot (NBS) samples collected between 1975 and 1985. Both serum and NBS studies showed significant alterations in protein levels. Serum results revealed that Haptoglobin and Plasma protease C1 inhibitor were significantly upregulated in first-onset schizophrenia patients (corrected P < 0.05). Alpha-2-antiplasmin, Complement C4-A and Antithrombin-III were increased in first-onset schizophrenia patients (uncorrected P-values 0.041, 0.036 and 0.013, respectively) and also increased in newborn babies who later develop schizophrenia (P-values 0.0058, 0.013 and 0.044, respectively). We also tested whether protein abundance at birth was associated with exposure to an urban environment during pregnancy and found highly significant proteomic differences at birth between urban and rural environments. The prediction model for urbanicity had excellent predictive performance in both discovery (area under the receiver operating characteristic curve (AUC) = 0.90) and validation (AUC = 0.89) sample sets. We hope that future biomarker studies based on stored NBS samples will identify prognostic disease indicators and targets for preventive measures for neurodevelopmental conditions, particularly those with onset during early childhood, such as autism spectrum disorder.


Introduction
Despite decades of research, the aetiology of schizophrenia is poorly understood. Schizophrenia is a severe and disabling psychiatric disorder involving impairments in perception, cognition and motivation that usually become evident in late adolescence or early adulthood.
Early diagnosis of schizophrenia is beneficial for patients as shorter periods of untreated psychosis have been linked to better patient outcomes 1 . However, as there are no diagnostic tests for schizophrenia, diagnosis is still based on the evaluation of signs and symptoms in clinical interviews. Consequently, misdiagnosis is common 2 as patients are required to acknowledge the occurrence of symptoms of psychosis, such as hallucinations and delusions. Furthermore, other psychiatric disorders can present with overlapping symptoms.
To date, most proteomic and biomarker studies have focused on the detection of changes in protein levels in patients with confirmed disease status versus healthy individuals. However, for certain adult onset diseases such as type 2 diabetes, hypertension and stroke, increasing attention is being given to detecting prognostic disease markers in early life and even in newborn babies 3 . Early detection of disease predisposition could allow for targeted prevention or amelioration of disease course before overt symptoms develop. This could be achieved by therapeutic or lifestyle interventions.
Since the late 1960s, newborn blood-spot (NBS) screening programs have become routine to test for rare but serious metabolic health conditions, such as cystic fibrosis and sickle cell disease. The stability of DNA, RNA, small molecules and proteins within the dried blood-spot (DBS), combined with the ease of collection, shipping and storage, provides a powerful tool for screening programs and for large population-based surveys. DBS sampling will be particularly important for diseases like psychiatric disorders in which patient recruitment is notoriously difficult and expensive. We 4 and other researchers 5 have previously demonstrated the potential utility of DBS sampling for clinical proteomics and personalised medicine applications using multiple reaction monitoring (MRM). In MRM, a highly specific, reproducible and sensitive mass spectrometry (MS) technique, pre-defined protein peptides or small molecules of interest can be robustly quantified from small sample volumes.
Reported environmental risk factors for schizophrenia that potentially affect early neurodevelopment during pregnancy include infections 6 and nutritional deficiencies 7 , intrauterine growth restriction 8 and other pregnancy and birth complications. Established risk factors following birth include infections 6 , socioeconomic and childhood adversity. Epidemiological studies have also revealed an increased risk of developing psychiatric disorders for individuals born 9,10 and living 11 in urban environments. However, whether the effect of urbanicity on schizophrenia incidence is a consequence of unknown risk factors associated with place of birth, place of residence or both is unclear.
In the present study, we tested whether serum protein abundance differed between first-onset drug-naive schizophrenia patients and controls. We then tested whether those protein differences were present in NBS samples collected from newborn babies who later developed schizophrenia ('future schizophrenia patients') and those without a psychiatric admission. For the latter analysis, we had to initially determine whether we could detect the targeted protein peptides in stored NBS samples collected from neonates born in Sweden between 1975 and 1985. As our study population included babies exposed to urban and rural environments during pregnancy, we also tested whether protein abundance at birth was associated with urbanicity.

Subjects
The Cologne study, as previously described 12,13 , consisted of serum samples from 60 first-onset drug-naive schizophrenia patients and 79 age and sex matched controls recruited by the Department of Psychiatry, University of Cologne (Table 1a). The ethical committees of the Medical Faculty of the University of Cologne and Addenbrooke's Hospital (Cambridge, UK) approved the protocols of this study including procedures for sample collection and analysis. Informed consent was given in writing by all participants.
The Stockholm population, as previously described 14 , consisted of all persons born in Sweden in 1975-1985 and treated for non-affective psychosis within psychiatric services in Stockholm County as inpatients (from 1987) or outpatients (from 1997 until 2004). The other population consisted of persons born between 1975 and 1985 in two Northern counties (Västerbotten and Norrbotten) and treated for non-affective and other psychoses between 1987 and 2005. Control subjects had no history of inpatient psychiatric admission, according to the National Patient Register 15 , and had to be alive and resident in Sweden. The controls were matched for sex, birth year and birth hospital. The aim was to recruit two controls per patient. Schizophrenia was defined as ICD9-code 295 (excluding 295F and 295H) or ICD10-code F20. Non-affective psychosis (excluding schizophrenia) was defined as ICD9-code 297, 298C-298X, 295F and 295H or ICD10-code F21-F29. The Northern Sweden data also included affective psychosis patients defined as ICD9-code 296 and 299 or ICD10-code F39, F333, F323, F315, F312 and F302. We formed a 'psychosis patient group', consisting of patients with either non-affective or affective psychosis, solely as a validation sample set to assess whether the urban-rural associations identified in controls could also be detected in an independent cohort (i.e. future disease status is not relevant for this comparison). At the end of December 2003, the neonatal study consisted of 645 controls (no psychiatric diagnosis, subsequently referred to as 'controls'), 172 psychosis patients and 75 schizophrenia patients (Table 1b). All samples were stored in the same NBS sample repository in Stockholm.
We obtained the following information through linkage to the Medical Birth Register 16 : gestational age at birth, birth weight and length, birth order, Apgar score, head circumference, maternal eclampsia, maternal immigration, maternal age and place of residency (municipality) at delivery. Data on population density (number of inhabitants per km²) for each municipality in 1991 were obtained from Statistics Sweden. The study was approved by the regional ethics committee in Stockholm and all participants provided their signed consent.

Targeted protein quantification in NBS and serum
NBS and serum samples were prepared in a 96-well plate format as described previously 4 . Briefly, proteins were extracted from serum and NBS samples using ammonium bicarbonate. Then, disulphide bond reduction and cysteine alkylation were performed using dithiothreitol and iodoacetamide, respectively. Proteins were digested overnight using trypsin (Supplementary Information). Isotopically labelled internal standard peptides were spiked into both NBS and serum samples prior to the MS run. Quality control (QC) samples were used in this study to monitor method performance and instrument stability (Supplementary Information).
In this study, a total of 101 serum proteins (172 peptides), the majority previously associated with psychiatric disorders, were selected. Three to four interference-free transitions were selected for each targeted peptide as described previously 4 . Tryptic digested peptides were monitored using an Agilent 1290 Liquid Chromatography (LC) system coupled with 6495 Triple Quadrupole MS equipped with jet-stream nano ESI source operated in positive mode. MS data were acquired in MRM mode. The chromatographic separation was carried out on Agilent AdvanceBio Peptide Map column (2.1 × 150 mm 2.7-micron) at 50°C. Peptides were eluted over a linear gradient from 3% to 30% acetonitrile in 0.1% formic acid in 45 min.

Statistical analysis
Data pre-processing and quality control
Raw MS files were processed using the Skyline software package (Version 3.1.0). Peaks were manually checked, and peak integrations were adjusted when necessary. The endogenous and internal standard peptide-transition peak areas were estimated and exported as a comma-delimited data file for statistical analysis in R (Version 3.2.3) 17 . MS data pre-processing is described in the Supplementary Information.

Coefficient of variation
We used the geometric coefficient of variation (CV), which describes the amount of variability relative to the mean, to quantify the degree of variation for the peptides across the MS runs. For natural log transformed data, the geometric CV = $\sqrt{e^{\mathrm{sd}^2} - 1} \times 100$ (ref. 18), where sd is the standard deviation of the log-transformed data. Note that the geometric CV was used as it is important to estimate the variability on the original scale of measurement.
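As an illustration of the formula above, the geometric CV can be computed in a few lines. This is a minimal sketch, not the authors' code: the function name `geometric_cv`, the use of the sample standard deviation (n - 1 denominator) and the example values are assumptions made for illustration.

```python
import math
import statistics


def geometric_cv(values):
    """Geometric CV in percent: sqrt(exp(sd^2) - 1) * 100.

    sd is the standard deviation of the natural-log-transformed values.
    The sample SD (n - 1 denominator) is an assumption; the text does
    not state which denominator was used.
    """
    logs = [math.log(v) for v in values]  # values must be positive
    sd = statistics.stdev(logs)
    return math.sqrt(math.exp(sd ** 2) - 1.0) * 100.0


# Hypothetical relative abundances of one peptide in a pooled QC sample
qc_runs = [0.98, 1.05, 1.10, 0.93, 1.02]
print(f"geometric CV = {geometric_cv(qc_runs):.2f}%")
```

Because the calculation works on log-transformed values, multiplying every measurement by a constant leaves the result unchanged, which is why the statistic is reported as a percentage of the original measurement scale.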

Patient-control association analysis
We tested the association between relative peptide abundance and disease status (0 = control, 1 = schizophrenia) in the Cologne study using a logistic regression model. As body mass index was missing for over 20% of the participants and smoking for over 20% of patients (Table 1a), only age and sex were available for selection. In the analysis of schizophrenia patients and controls from the neonatal study, we used a generalised additive model (GAM) 19 . As proteins dried on filter paper can degrade over time 20 and degradation may not be a linear function of time, the GAM allowed a smooth to be fitted for year of birth, which represents the time of storage. The smooth may also better fit any changes in protein decay associated with the 1981 change in the storage of the Swedish NBS collection cards from room temperature to 4 °C and 30% humidity. In the R package mgcv 21 , smooth functions of the GAM are represented using penalised regression splines. The following covariates were available for selection: sex, year of birth (linear or smooth; Supplementary Table 1), whether the mother was born abroad, Apgar score at 1 min, Apgar score at 5 min, parity, whether the child was the first born, caesarean section, completed weeks of gestation, birth weight, length at birth, head circumference, whether the baby was small for their gestational age, age of mother, whether the mother suffered from eclampsia, and population density of the municipality where the mother was living at the time of the birth of the child (grouped as 0.1-49, 50-99, 100-499, 500-999, 1000-2999 and 3000-3999 per km²). We used the R package mice 22 to replace missing covariate values using multiple imputation (Supplementary Table 2). Model selection was based on forward selection with the Bayesian information criterion.
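The forward-selection step described above can be sketched generically. The sketch below is an assumption-laden illustration rather than the authors' procedure: `bic` uses the standard definition k ln(n) - 2 ln(L), and `forward_select` takes a hypothetical callable `score_model` that fits a model for a given covariate subset and returns its BIC (in the study, that fitting was done with GAMs in R, which is not reproduced here).

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def forward_select(candidates, score_model):
    """Greedy forward selection.

    candidates  : list of covariate names available for selection
    score_model : hypothetical callable mapping a list of covariate names
                  to the BIC of the model fitted with those covariates
    Returns the selected covariates and the BIC of the final model.
    """
    selected = []
    best_score = score_model(selected)  # BIC of the covariate-free model
    while True:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Score every one-covariate extension of the current model
        scored = [(score_model(selected + [c]), c) for c in remaining]
        score, choice = min(scored)
        if score >= best_score:  # no extension lowers the BIC: stop
            break
        selected.append(choice)
        best_score = score
    return selected, best_score
```

At each step the covariate that lowers the BIC the most is added, and the search stops as soon as no remaining covariate improves the criterion.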
We also fitted a joint effects model to predict disease status using ten-fold cross-validation with least absolute shrinkage and selection operator (lasso; Supplementary Information) regression as implemented in the R package glmnet 23,24 .

Urban-rural association analysis
We tested the association between relative peptide abundance and urbanicity at birth (0 = rural, 1 = urban) in controls from the neonatal study using a GAM. Rural was defined as a population density <50 per km² (182 controls) and urban centre as a population density ≥1000 per km² (214 controls); the threshold of 1500 per km² used by the European Union and the Organisation for Economic Co-operation and Development 25 falls within our population density group 1000-2999 per km². Model selection, including lasso regression, and variables available for selection were as in the case-control comparison. We attempted to validate the urbanicity prediction model in 34 rural and 45 urban future (affective and non-affective) psychosis patients from the neonatal study; only location of birth, and not future disease status, was relevant for this comparison.
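The urban-rural coding defined above can be written as a small helper. This is a sketch under the stated thresholds only; the function name `classify_urbanicity` and the choice to return `None` for intermediate densities (which are not part of the urban-rural contrast) are assumptions for illustration.

```python
def classify_urbanicity(density_per_km2):
    """Code a municipality by population density (inhabitants per km²).

    Returns 0 for rural (< 50), 1 for urban centre (>= 1000) and None for
    intermediate densities, which are excluded from the urban-rural contrast.
    """
    if density_per_km2 < 50:
        return 0
    if density_per_km2 >= 1000:
        return 1
    return None


# Hypothetical municipality densities
for density in (12.5, 320.0, 1500.0):
    print(density, classify_urbanicity(density))
```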

Targeted protein detection and their coefficients of variation
We monitored 77 proteins (147 peptides) in 139 serum samples from Cologne. These samples were randomly assigned to two 96-well plates, the second plate was half filled, and run over one and a half weeks on the MS. We used the CV to quantify the degree of variation (robustness) in the relative peptide abundances measured in a pooled serum sample. The median CV across the plates was 7.23% (6.54% in plate 1 and 7.92% in plate 2; Supplementary Fig. 1a).
As we had previously only processed DBS samples within 6 months of collection 4 , we had to determine whether we could detect the targeted protein peptides in stored NBS samples collected between 1975 and 1985 (Supplementary Table 1). We initially tested ten samples collected in 1975 and 1985, five from each year. A total of 101 serum proteins were monitored in these test samples (data not shown), and 96 proteins (152 peptides) were selected and subsequently monitored in 892 NBS samples. The samples were randomly assigned to ten 96-well plates and run over 10 weeks on the MS.
The median CV for the relative peptide abundances measured in a pooled NBS sample across plates 2-10 was 10.83% (range 9.50-11.52%; Supplementary Fig. 1b), clearly demonstrating that we could reproducibly measure the targeted peptides in stored NBS samples collected between 1975 and 1985.

Patient and control analysis
After QC, we analysed 68 proteins (128 peptides; Supplementary Table 3) in 60 first-onset drug-naive schizophrenia patients and 77 controls from Cologne. A total of 14 proteins (22 peptides) had an uncorrected P < 0.05 for abundance differences between patients and controls (Table 2a). After P-values were corrected for multiple testing, three Haptoglobin (HPT) peptides and a Plasma Protease C1 Inhibitor (IC1) peptide were significant. The volcano plot suggested that there were more peptide-transitions with higher abundances in schizophrenia patients than would be expected by chance alone (Fig. 1a); chance alone would produce a more symmetric pattern around a log2 fold-change of zero. We note that only the apolipoproteins A2, A3, A4, C1 and C3 were downregulated in patients compared to controls (Table 2a; Fig. 1a). In total, 13 of these 14 proteins have previously been associated with schizophrenia (Table 4). Although IC1 has not been linked to schizophrenia before, recent reports have linked IC1 dysregulation to Alzheimer's disease 26,27 . The lasso prediction model consisted of 11 proteins (11 peptides; Supplementary Table 6a) and, despite the absence of a clinical rating scale predictor, had a good predictive performance (area under the receiver operating characteristic curve (AUC) = 0.80).
We then investigated whether any of these 14 proteins (Table 2a) also differed in abundance at birth. To this end, we analysed NBS samples obtained from 75 future schizophrenia patients and 644 controls. In total, 12 of the 14 proteins were available for analysis. We found Alpha-2-antiplasmin (A2AP), Complement C4-A (CO4A) and Antithrombin-III (ANT3) to be significantly different at birth (one-sided P < 0.05; Table 2b) as well as after the onset of the disorder. We also analysed the remaining 73 proteins (110 peptides) measured in the neonatal study to investigate whether the abundance of any other proteins was significantly different at birth. No other proteins were significantly different between future schizophrenia patients and controls after P-values were corrected for multiple testing (Table 2c). As in the first-onset schizophrenia analysis (Fig. 1a), there were more peptide-transitions with higher abundances in future schizophrenia patients compared to controls (Fig. 1b).
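The multiple-testing correction referred to in this section is identified in the figure legends as the false discovery rate. A minimal sketch of one common FDR method, the Benjamini-Hochberg step-up procedure, is given below. That this specific procedure was used is an assumption, and the function name and example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    Assumed FDR method; the text only states that P-values were corrected
    for multiple testing using the false discovery rate.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that adjusted values
    # stay monotone in the rank of the raw P-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical uncorrected P-values for four peptides
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A peptide would then be called significant when its adjusted value falls below the chosen threshold (0.05 in the text).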

Urbanicity
As birth in an urban environment has been associated with an increased risk for psychiatric disorders, we analysed 85 proteins (125 peptides) measured in 396 controls, 214 from urban and 182 from rural environments, to test whether protein abundance ratios at birth differed by urban-rural environment. Abundances of 24 proteins (26 peptides) differed significantly after P-values were corrected for multiple testing and had a fold-change >10% (Fig. 1c; Table 3). We then attempted to validate these 24 proteins using NBS samples from 79 psychosis patients, 45 from urban and 34 from rural environments. Fifteen of the 24 proteins were validated (16 peptides; one-sided P < 0.05; Table 3). We did not attempt to further validate the associations in the future schizophrenia patients because of the relatively small number of patients from urban and rural environments, 17 and 38 respectively.
The lasso urbanicity prediction model, fitted to the controls, consisted of one covariate and 13 proteins (13 peptides; Supplementary Table 6b) and had an excellent predictive performance (AUC = 0.90; Supplementary Fig. 2). We attempted to validate the fitted model in the psychosis patients and found that the excellent predictive performance was maintained (AUC = 0.89; Supplementary Fig. 2).
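For readers unfamiliar with the AUC values quoted above, the statistic can be computed directly from predicted scores as the proportion of (urban, rural) pairs in which the urban sample receives the higher score. The sketch below is illustrative only: the function name `roc_auc` and the example labels and scores are hypothetical, and it does not reproduce the lasso model itself.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels : 0 (e.g. rural) or 1 (e.g. urban) for each sample
    scores : model-predicted score for each sample (higher = more likely 1)
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of urban birth
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so values near 0.90 indicate that the model ranks an urban sample above a rural one in roughly nine out of ten pairs.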
The common functional pathways of the differentially expressed peptides listed in Tables 1 and 2 are summarised in Table 4.
Table 2 (a) The most associated protein peptides with an uncorrected P < 0.05 for the difference between 60 first-onset schizophrenia patients and 77 controls from the Cologne study. (b) The association for 12 of the 14 proteins reported in (a) and available in the 75 future schizophrenia patients and 644 controls from the neonatal study. (c) The most associated protein peptides with an uncorrected P < 0.10 for the 75
a One-sided test conducted when the direction of the fold-change is consistent with that from the Cologne study and the two-sided P < 0.10. b Two-sided P < 0.10, but direction of the fold-change is not consistent. c Same protein peptide but different transition in (b) compared to (a).

Discussion
We have previously demonstrated that we can successfully detect and reproducibly monitor tens of proteins isolated from serum and DBS 4 samples using MRM, and here, we demonstrate that we can also do this in stored NBS samples collected between 1975 and 1985 (median CV 10.8%; Supplementary Fig. 2). This has important research implications for countries that routinely store NBS samples and have an associated patient registry (such as Sweden and other Nordic countries) because of the potential to identify prognostic markers for conditions with an onset during early childhood, such as autism, attention deficit hyperactivity disorder and certain types of epilepsy.
Fig. 1 a A volcano plot summarising the association between protein abundance in 60 first-onset schizophrenia patients and 77 controls from Cologne (Table 2a). Light blue points indicate proteins that were significant after correction for multiple testing using the false discovery rate and had a fold-change >10%. Labelled proteins had uncorrected P < 0.05. b A volcano plot summarising the association between protein abundance in 75 future schizophrenia patients and 644 controls from the neonatal study (Table 2c). Note that none of the protein changes remained significant after correction for multiple testing using the false discovery rate. Labelled proteins had uncorrected P < 0.05. Interestingly, we identified a significantly greater number of increased proteins in the blood of newborn babies who were later diagnosed with schizophrenia. c A volcano plot summarising the association between protein abundance in urban and rural environments at birth. Note that serum albumin (ALBU) was excluded for display purposes only (Table 3). Light blue points indicate proteins that were significant after correction for multiple testing using the false discovery rate and had a fold-change >10%. Ctrl control, Scz schizophrenia
We identified serum proteins that differ between first-onset drug-naive patients and controls (Table 2a), 13 of which have previously been associated with schizophrenia (Table 4). We also tested whether any of these proteins were significantly different in NBS samples from newborn babies who later developed schizophrenia and those without a psychiatric diagnosis. The levels of A2AP, CO4A and ANT3 were found to be significantly different at birth (Table 2b). Both A2AP and ANT3 are protease inhibitors that regulate a wide variety of biological processes, including coagulation and inflammation, and are involved in oxidative stress responses 28-30 .
Genetic variants associated with greater expression of CO4A, a split product of C4, have previously been associated with an increased risk of schizophrenia 31 . The classic complement cascade, of which C4 is a member, is critically involved in synaptic pruning processes 32 . In the immune system, C4 promotes the activation of complement component C3, which in turn modulates inflammation processes in blood. Interestingly, studies in mice indicate that C4 can also mediate synapse elimination during postnatal development 31-33 . In support of our current findings, Hakobyan et al. 34 previously reported elevated C4 activity in serum from individuals diagnosed with schizophrenia as compared to controls. Furthermore, the volcano plot (Fig. 1b) of the neonatal study results suggests a greater number of proteins with increased levels in the blood of newborn babies who were later diagnosed with schizophrenia than would be expected by chance alone. This points to an increase in several inflammation-related proteins, as previously reported for adult first-onset schizophrenia patients 12 and evident in the Cologne study (Fig. 1a).
Although the pathogenesis of schizophrenia remains unknown, increasing evidence from genomic, transcriptomic and proteomic studies supports a role for coagulation, metabolism and inflammation 35-39 . Other predisposing factors include ethnicity, lifestyle, pre-natal and neonatal infections, maternal malnutrition and complications during birth. A common pathological pathway for these predisposing factors could be their shared propensity to induce cellular metabolic stress, which increases the possibility of oxidative stress and damage 40 . Our findings could suggest that an increased oxidative stress response may represent an inherent schizophrenia vulnerability.
As birth in an urban environment has been associated with an increased risk for psychiatric disorders 10 , we tested whether protein abundance at birth was associated with urbanicity. We found 24 proteins significantly associated with urbanicity in 397 controls (214 urban and 183 rural; Table 3) and confirmed 15 of the 24 proteins in a validation cohort of 97 psychosis patients (45 urban and 52 rural; Table 3). The majority of these 15 differentially expressed proteins relate to immune function, especially the acute phase response, and to metabolic function (Table 4). The protein with the greatest fold-change was albumin, 3.2-fold and 4.1-fold in controls and psychosis patients, respectively. This is of interest, as albumin has been shown to be the main plasma protein in newborn babies that is modified by oxidative stress, especially through non-bound plasma metals such as iron 41 . Furthermore, several calcium and copper binding proteins were found to differ between urban and rural birth environments, notably ceruloplasmin (CERU; Tables 2 and 3). CERU is a major copper binding protein in plasma 42 and has previously been associated with neuropsychiatric diseases including schizophrenia 43-47 . Interestingly, altered levels of CERU have also been linked to Wilson's disease, which can present with schizophrenia-like psychosis and can result in misdiagnosis 48,49 . An urban environment is associated not only with more stress and trauma and with adverse lifestyles, such as drug and alcohol problems among pregnant mothers, but also with air pollution 50 . Previous studies indicate that air pollution in urban locations can affect cognitive and brain development directly. Some air pollutants, such as lead, can cross the blood-brain barrier, resulting in immune dysregulation and oxidative stress responses at both systemic and brain levels 50 . The predictive performance of the lasso-derived model for birth environment was excellent (AUC = 0.90; Supplementary Fig. 2), and this was maintained when we applied the fitted model to the psychosis patient group (AUC = 0.89; Supplementary Fig. 2). Although the ability to distinguish between birth in urban and rural environments per se may not be clinically relevant, it is of great interest that oxidative stress-related protein changes could be identified both in newborn babies who later develop schizophrenia and in babies born in an urban setting. The newborn infant is very susceptible to oxidative damage and a wide variety of consumer products and industrial pollutants have been associated with neurotoxicity in distinct developmental time windows 51 . Antioxidant protection for pregnant mothers and newborn infants in the form of dietary supplementation could be evaluated in future epidemiological studies.
There are several limitations to the present study. First, the number of patients and controls from Cologne limits the statistical power of the analysis. Second, given the time lag between birth and schizophrenia diagnosis, relatively few differences in protein abundance were observed. The investigation of a larger number of individuals who later develop schizophrenia will be required to provide the statistical power to identify robust protein abundance differences at birth. Third, as the participants in the analysed studies have not been genotyped, we cannot test whether genetic variants are also associated with elevated CO4A abundance. Fourth, as population density is based on municipality, and some municipalities are geographically large and include smaller, densely populated areas, the indicator of urbanicity used here is crude.
In conclusion, we have demonstrated that reproducible multiplexed quantitation of proteins in stored NBS samples can be achieved using MRM. We have provided further evidence that A2AP, CO4A and ANT3 may be associated with schizophrenia risk and the early disease process. The CO4A association is of particular interest given that genetic variants in CO4A have previously been associated with schizophrenia risk, and it offers additional support for a potential role of CO4A in the aetiology of schizophrenia. In addition, we found and validated proteomic differences associated with birth environment. Future biomarker studies based on stored NBS samples used in conjunction with MRM could identify risk factors and/or early disease indicators for conditions with onset during early childhood.