Introduction

Type 1 diabetes is an immune-mediated disease that results from destruction of the insulin-secreting beta cells in the pancreas, necessitating lifelong treatment with exogenous insulin. The disease is characterized by a presymptomatic period of variable length that precedes the clinical manifestation of the disease at the late stage of beta cell destruction1. A better understanding of the events that are associated with the development of type 1 diabetes is needed for the improvement of early prediction and monitoring of the disease and recognition of at risk individuals for clinical interventions and therapy. Currently, the presence of circulating type 1 diabetes-associated autoantibodies is the primary sign of the ongoing disease process and an increased risk for type 1 diabetes2. In particular, subjects who develop two or more autoantibodies carry a high probability of developing the disease, whereas those who remain positive for a single autoantibody have a considerably lower risk3,4. However, even for multiple autoantibody positive subjects, the interval between the first autoantibody detection that is confirmed (i.e. seroconversion) and the clinical manifestation of the disease is unpredictable, varying from months to decades5,6, and the time to diagnosis can only be predicted by following glucose metabolism that starts to deteriorate 1–2 years before the diagnosis7,8.

The discovery of early signs of type 1 diabetes has been challenged by the heterogeneity in the rate of disease development. As a source for such signals serum/plasma has been of particular interest because blood is easily accessible and changes in its composition can be indicative of the disease. Accordingly, mass spectrometry (MS)-based proteomics has been used for the detection of circulating type 1 diabetes-associated proteins9,10,11,12,13,14. Recently, technological improvements in throughput, data acquisition and speed, together with the possibility to analyse serum proteins in an unbiased and hypothesis-free manner, have added to the utility of proteomics for the discovery of disease related patterns15. Previous type 1 diabetes plasma and serum proteomics analyses have ranged from post-diagnosis cross-sectional comparisons9,10,11 to pre-onset sample series obtained from prospective sample collections12,13,14. Studies of diagnosed patients have used pooled samples to determine type 1 diabetes-associated proteins in newly diagnosed patients9,11 and in patients with longer duration of the disease10. To establish changes occurring prior to diagnosis, temporal studies of both heterogeneous12,14 and more homogeneous populations, although with very few samples per individual13, have been reported. Taking into consideration the disease heterogeneity together with the effects of age in children16,17,18,19 and large inter-individual variability in serum/plasma protein levels18,20, these earlier type 1 diabetes serum/plasma proteomics studies have been limited by the number of sampling points or lack of homogeneity in the selection of participants.

In the current study, MS-based proteomics was used in the search of early signs of type 1 diabetes. Focusing on young children who progressed rapidly to type 1 diabetes within months from seroconversion, longitudinal plasma and serum samples collected in the Finnish Type 1 Diabetes Prediction and Prevention (DIPP) study were analysed. Using longitudinal data modelling, plasma protein profiles of multiple autoantibody positive children who developed type 1 diabetes before 4 years of age (progressors) were compared to similar measurements from matched single autoantibody positive (1AAb+) and autoantibody negative (AAb−) children. Statistically significant differences (defined in LonGP analysis in “Methods”) in the plasma protein profiles were observed for several proteins. Out of these, 11 candidates were selected for verification measurements using a targeted proteomics approach and a larger independent yet similar cohort of DIPP children with more frequent sampling points.

Results

Detection of type 1 diabetes-associated proteins by discovery proteomics

To investigate early signs of type 1 diabetes in young children who rapidly progress to the disease after seroconversion, longitudinal sample series were analysed from 10 multiple autoantibody positive DIPP children who were diagnosed with type 1 diabetes under 4 years of age and their matched 1AAb + (n = 10) and AAb- children (n = 10) (Fig. 1a, Supplementary Table S1, Supplementary Fig. S1). The children were matched on the basis of human leukocyte antigen (HLA) risk group, sex, region and date of birth. MS-analysis of these depleted plasma samples resulted in the identification and quantification of 269 protein groups (hereafter referred as proteins), including proteins identified with uniquely associated peptides and proteins that could not be distinguished by the identified peptides and were therefore collapsed to protein groups (Supplementary Table S2), with an average of 254 (± 17) proteins per sample. An additive Gaussian process regression model (LonGP) was used to deconvolute the longitudinal protein patterns into various effects associated with seroconversion, the onset of type 1 diabetes, study groups, age and sex. These effects could separate the study groups, as well as depict the normal variation in plasma proteomes including the effects of age and sex.

Figure 1
figure 1

Schematic presentation of the study design. (a) First, discovery proteomics was used for the identification of disease-associated proteins after which (b) the selected candidates were verified using targeted proteomics. Progressors group = children who progressed to type 1 diabetes. In the discovery proteomics study, all progressors were also positive for ≥ 2 autoantibodies. 1AAb+ group = children positive for a single autoantibody without progression to type 1 diabetes. AAb− group = autoantibody negative children.

Age is one of the strongest factors affecting plasma proteome in early childhood

In these measurements, 115 out of the total 269 quantified proteins showed statistically significant age-associated trends (Fig. 2a, Supplementary Table S2), the majority of which were concordant with previously published proteomics studies on paediatric serum/plasma samples16,17,18,19 (Fig. 2a, Supplementary Table S3). The age-associated protein patterns clustered into two distinct groups with either increasing or decreasing age trend (Fig. 2a–c). Sex-associated differences were observed for three immunoglobulins (Supplementary Table S2), which remained lower in boys in the first 18 months of life. Overall, a wide degree of variability in the plasma proteomes was observed both at the protein level and in longitudinal trends.

Figure 2
figure 2

Proteins with age-associated changes in the discovery proteomics cohort in children up to 36 months of age. (a) Hierarchical clustering of the 115 proteins that followed statistically significant age-associated changes in LonGP analysis. The proteins clustered into two distinct groups with either increasing or decreasing age trends. Child-specific z-scores were calculated for each protein at all time points, each time point was then assigned to a specific age group and the z-scores were averaged across all children within the age group. Proteins reported to have age-associated changes in previous pediatric publications are indicated in the separate color columns (regardless of the reported trend). The proteins color coded with green and orange indicate the most significantly enriched Gene Ontology classifications with increasing and decreasing age trends, respectively. Examples of the proteins with (b) increasing and (c) decreasing age trends are included.

Type 1 diabetes-associated changes in protein levels observed in the discovery data

With the alignment of the longitudinal protein profiles of the progressors, 1AAb+ and AAb− children on the basis of seroconversion dates and age at sampling, 14 proteins with seroconversion-associated changes were subsequently identified by LonGP analysis (Table 1, Supplementary Table S2). The largest changes were observed in prolow-density lipoprotein receptor-related protein 1 (LRP1), immunoglobulin J chain (IGJ) and apolipoprotein A4 (APOA4). Among the seroconversion-associated proteins, apolipoprotein(a) (LPA) and insulin-like growth factor binding protein 2 (IGFBP2) also showed age-associated changes.

Table 1 Type 1 diabetes-associated protein level changes in the discovery cohort using LonGP. LonGP terms refers to changes associated to the time of seroconversion (“sero”), changes associated to the time of type 1 diabetes diagnosis (“t1d”) or protein level differences observed between the three study groups (“group” and “group × age”). To consider the effect as statistically significant, the explained variation of the effect was required to be > 1%. Selected candidates for the SRM analysis are indicated in the last column. aCD5L was selected for SRM analysis on the basis of its known association with IGHM and the similarity between the individual specific protein profiles of CD5L and IGHM.

LonGP analysis identified three proteins that exhibited changes associated with the clinical manifestation of type 1 diabetes (Table 1, Supplementary Table S2). These were revealed through the alignment of the longitudinal protein profiles of the progressors according to the date of diagnosis, and 1AAb+ and AAb− children with the diagnosis age of the matched progressors. Statistically significant differences were observed in the longitudinal profiles of cysteine-rich secretory protein 3 (CRISP3), apolipoprotein C1 (APOC1) and complement C1q subcomponent subunit B (C1QB).

The three study groups were distinguished from each other by protein level differences in immunoglobulin heavy constant mu (IGHM) and alpha-2-macroglobulin (A2M), which were both depletion targets (Table 1, Supplementary Table S2). The lowest IGHM plasma levels were detected in the group of progressors, with higher levels in the 1AAb + group, and the highest levels in the AAb- group. One of the functions of the IGHM pentamer is to act as a carrier for CD5 antigen like (CD5L) in the blood21. Individual specific longitudinal protein patterns of CD5L were strikingly similar with the IGHM patterns and therefore CD5L was selected as an additional target for verification, although its expression level differences remained below the significance threshold.

Targeted proteomics analysis of the selected candidates

Based on the discovery study results, verification measurements were made to further explore the protein levels of 11 candidates (Table 1, Supplementary Fig. S2) in an independent set of longitudinal serum samples from 31 DIPP children who were diagnosed with type 1 diabetes before 5 years of age and matched AAb− children (n = 31) (Fig. 1b, Supplementary Table S4, Supplementary Fig. S3). The selection of the candidates was based on LonGP results, a visual inspection of the data and data completeness. An SRM assay was designed for the analysis of 24 peptides representing the 11 candidates and an internal standard. Analysis of the normalised SRM data was carried out at the peptide level using the LonGP model.

Out of the 11 candidates, the age-associated changes observed for four proteins in the discovery study were verified in the targeted proteomics analysis (Supplementary Table S5, Supplementary Fig. S4). Moreover, the targeted proteomics analysis with a larger cohort of children and more frequent sampling points, revealed age-associated trends for peptides representing APOA4, APOC1, IGJ and serotransferrin (TF), as well as one of the measured peptides representing vitamin D-binding protein (GC) (Supplementary Table S5, Supplementary Fig. S4).

In terms of verification of the disease-associated changes, the levels of both peptides representing APOC1 were consistently decreased after seroconversion (Fig. 3a,b, Supplementary Table S5).

Figure 3
figure 3

Targeted proteomics verification of APOC1 peptides. LonGP moldels of the APOC1 peptides (a) EFGNTLEDK and (b) EWFSETFQK showing normalized longitudinal SRM measurements of each individual on the background and on the top the statistically significant seroconversion-associated changes between progressors (red) and AAb- children (blue) (“sero” term in LonGP). The shaded areas are 95% confidence intervals. Progressors = children who progressed to type 1 diabetes. AAb− children = autoantibody negative children.

Discussion

Better understanding of the early events in the development of type 1 diabetes is needed to help identify increased risk and to monitor the presymptomatic period of the disease that varies conspicuously among individuals. Currently, the detection of autoantibodies provides prognostic markers, which indicate the overall probability of developing the disease. However, heterogeneity in the rate of disease progression has challenged the characterization of the early stages of disease development. To address this challenge and establish protein patterns that would distinguish children who rapidly progress to type 1 diabetes from matched AAb− children, we studied longitudinal plasma and serum proteomes of more homogeneous cohorts of genetically susceptible children, in which the progressors developed the disease before 5 years of age. From these discovery proteomics analysis, 11 proteins showing disease-associated changes were selected for verification using targeted proteomics and a larger independent cohort of children. The measurements revealed how the levels of two peptides representing APOC1 decreased after seroconversion and remained lower in children who developed type 1 diabetes.

In keeping with earlier studies, these data reiterate that age is one of the most dominating factors affecting plasma/serum proteome in early childhood16,17,18,19. Targeted proteomics analysis of the 11 candidates in a larger cohort with more frequent sampling points, verified age-associated protein level changes for four proteins and identified another five age-associated proteins. Of those five proteins, for APOC1, APOA4 and IGJ, the most prominent effect of age was observed during the first year of life. The identification of these additional trends was likely enabled by the more frequent sampling during the first year of life in the targeted proteomics dataset (up to 4 samples per child in the verification cohort before the age of 13 months, compared with up to 2 samples per child in the discovery cohort). Taken together, longitudinal study designs with frequent sampling points and well-matched controls are key considerations to minimize possible sources of error in serum/plasma discovery proteomics studies in young children. Moreover, quantification differences between plasma and serum have been previously noted, particularly with coagulants and platelet proteins22. To avoid such bias in the current study, plasma and serum were not mixed within the study cohorts, and the targets selected for SRM analysis were not among the most significantly altered proteins between plasma and serum22.

In the discovery study, lower APOA4 levels were observed after seroconversion to autoantibody positivity both in the progressors and in the 1AAb+ children compared to AAb− children. Similarly, in earlier proteomics studies lower APOA4 levels have been observed in longitudinal serum profiles of multiple autoantibody positive children who later progressed to type 1 diabetes at the median age of 4.1 years12, and in seroconverted children with a median age of 3.2 years13. In the current verification cohort, however, the disease-associated changes were not observed. Instead, age-associated changes were detected in peptides representing APOA4 (Supplementary Fig. S4). For the discovery measurements, the limited number of samples collected under one year of age, might have caused some bias, as discussed in the previous section. This emphasizes the importance of data completeness in longitudinal studies.

Decreasing levels of IGFBP2 were observed in the discovery cohort in the progressors and in the 1AAb+ children compared to the AAb− children, which is in line with the previous proteomics studies that focused on children during the presymptomatic period of the disease12,14. In contrast, higher levels of IGFPB2 were reported in patients with type 1 diabetes, although with longer disease duration10. Coxsackievirus B1 (CVB1) infections have been associated with the risk of developing type 1 diabetes-associated autoimmunity and in particular with the appearance of insulin autoantibodies (IAA) as the first autoantibody23. In our recent study, persistent CVB1 infection suppressed the secretion of IGFBP2 from human pancreatic ductal-like cell line PANC124. Similarly, IGFBP2 expression was suppressed in a CVB1-infected human lung cancer cell line, and the suppression was reversed after treatment with an antiviral drug25. Moreover, the expression of IGFBP2 is the highest in the pancreas and the liver26, indicating that the changes in IGFBP2 levels in plasma might reflect changes in its expression in these organs. In the current study, the majority of seroconverted children in the discovery cohort had IAA as the first autoantibody, whereas in the targeted proteomics cohort the proportion was for only one third. Distinct autoantibody profiles have been associated with different disease trajectories27, and therefore such differences between the discovery and verification cohorts might influence the ability to verify the disease-associated difference in IGFBP2 levels. Accordingly, it would be important to study IGFBP2 expression specifically in subjects with IAA as their first autoantibody.

Although targeted for immunodepletion in the discovery analyses, plasma IGHM levels separated the three study groups from each other, with levels ranging from the lowest in the group of progressors, intermediate levels in the 1AAb+ group and the highest levels in the AAb− group. However, in the targeted proteomics analysis, the levels of IGHM did not distinguish the progressors from the AAb− children. Nevertheless, reduced total plasma IgM has recently been reported in school aged children with type 1 diabetes28. Certain variants of the IGHM gene locus have been associated with an increased risk to type 1 diabetes29,30. Interestingly, in a plasma proteomic study of twins, both IGHM and its binding partner CD5L were reported among the top proteins, that were mostly affected by heritability20. Further investigation of the risk variants in the IGHM region and their association with serum/plasma IGHM levels remains to be made.

Targeted proteomics analysis of the selected candidates revealed that the levels of two peptides from APOC1 distinguished the children who progressed to type 1 diabetes from AAb− children. A discrepancy was observed between the APOC1 LonGP results in the discovery and the validation dataset, by which the changes were associated with the type 1 diabetes diagnosis and seroconversion, respectively (Supplementary Fig. S2 and Fig. 3a,b). For the discovery dataset, LonGP regression indicated a consecutive peak and trough in APOC1 levels prior to T1D diagnosis (Supplementary Fig. S2), whereas a general decrease in APOC1 after seroconversion was observed in the validation dataset (Fig. 3a,b). In keeping with the coverage and consistency of the validation samples relative to the smaller sample size per individual in the discovery data, we conclude that the data from the progressor group are consistent with a decrease in the APOC1 level after seroconversion. This is described in further detail in the Supplementary Discussion and Supplementary Figs. S5S7. Closer inspection of data from our earlier longitudinal serum proteomics study of children also demonstrated lower serum APOC1 levels in children who progressed to type 1 diabetes compared to AAb− controls12 (see Supplementary Fig. S8). Furthermore, lower plasma and serum abundance of APOC1 was reported in patients with recently diagnosed type 1 diabetes compared to healthy controls although the difference was not statistically significant9. In contrast, increased APOC1 levels have been reported in adult patients affected by type 1 diabetes31. APOC1 plays important roles in lipid homeostasis, and lipid disorders are commonly diagnosed in patients with type 1 diabetes32,33. Several enzymes that participate in lipoprotein metabolism are modulated by APOC1, including cholesterol ester transfer protein (CETP), the activity of which is potentially inhibited by APOC131,34. Increased CETP-activity has been observed in patients with type 1 diabetes and plasma APOC1 concentrations correlated inversely with CETP-activity in normoglycemic-normolipidemic controls but not in individuals with type 1 diabetes, indicating that APOC1 activity could be compromised in this population31. The latter study also suggests that the glycation of APOC1 as a result of hyperglycemia in diabetes may impair its inhibitory function. Differences in plasma APOC1 levels have also been previously reported in the context of type 2 diabetes and in relation to nutrition35,36. Recently, lower plasma APOC1 levels were reported in adults after caloric restriction and increasing levels in response to glucose consumption36. Interestingly, in the latter study, decreased plasma APOC1 was also observed to be associated with undiagnosed type 2 diabetes and prediabetes36. To conclude, lower APOC1 levels are detected during the presymptomatic period of type 1 diabetes and in recently diagnosed patients whereas higher APOC1 levels are observed in adult patients. The reasons behind these contrary observations, as well as the possible role of structurally modified or otherwise dysfunctional APOC1, remain to be addressed.

In summary, our data further explored the strength of moderately abundant serum proteins for the characterization of the emerging risk of type diabetes development among HLA-conferred and autoantibody positive children. These analyses reiterated that age is among the most dominating factors affecting plasma/serum proteome in early childhood and should therefore be carefully considered in study designs. Using LonGP modelling, we were able to take into account the nonlinear effect of age while extracting disease-associated changes from the data. From the initial discovery proteomics measurements, several proteins with prior literature associations were noted. The magnitude of changes of the discussed proteins (APOA4, IGFBP2, IGHM, and APOC1) were at a similar level in the discovery data. However, despite replicating the age-associated changes in these well-matched cohorts, the disease-associated signals were challenging to reproduce. This could be partly explained by the subtle levels of disease-associated changes, the relatively low number of participants, dissimilarities in their sample series and autoantibody profiles, in addition to differences in sample storage, preparation and analysis. Nevertheless, we were able to verify that decreased serum levels of two peptides representing APOC1, were associated with rapid progression to type 1 diabetes. In future follow-up studies, APOC1 measurements could be combined with other relevant targets, which might assist with the monitoring of disease progression and stratification for early interventions, especially in young at risk individuals. The causal connection between low APOC1 levels and type 1 diabetes, and the mechanisms involved, remain to be explored.

Methods

Study participants

Blood plasma or serum samples, collected in the DIPP study from children (n = 92) carrying HLA-conferred genetic risk for type 1 diabetes37, were used for these analyses. The selected participants regularly attended their DIPP clinical study centre (1996–2015), in the cities of Tampere or Turku. At each visit a non-fasting plasma or serum sample was collected. The samples were stored at − 80 °C until analysis. The parents gave their written informed consent for participation. The study followed the guidelines of the Declaration of Helsinki and the Regional Ethics Committee of the Joint Municipal Authority of Northern Ostrobothnia Hospital District, Oulu, Finland, approved the study protocol.

Discovery proteomics

A subset of ten children who had developed multiple (≥ 2) autoantibodies and who all had later developed type 1 diabetes before 4 years of age were defined as progressors. Each progressor was matched, on the basis of HLA risk group, sex, region and date of birth (± 1.4 years), with one child testing positive for a single autoantibody (1AAb+) and another autoantibody negative child (AAb−). For each child, a longitudinal series, with up to four follow-up plasma samples was analysed (Supplementary Table S1, Supplementary Fig. S1). The samples were collected between 3 and 36 months of age, 101 samples in total. Neither the 1AAb+ nor AAb− children developed diabetes, and the AAb- remained autoantibody negative during the follow-up period.

The sample series from each matched triplet (formed by progressor, 1AAb+ and AAb− child) were prepared and analysed batch-wise in a blinded fashion. The sample preparation, including immunodepletion of the 12 most abundant plasma proteins (Pierce Top 12 Abundant Protein Depletion Spin Columns, Thermo Scientific, USA), was carried out as described in Lietzén et al.18 with modified double desalting. Retention time standard peptides (iRT, Biognosys, Switzerland) were spiked into each sample for quality control of the liquid chromatography tandem mass spectrometry (LC–MS/MS) analyses.

Each sample was analysed in triplicate using Q Exactive™ Hybrid Quadrupole-Orbitrap™ mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) that was interfaced with an EASY-nLC 1000 liquid chromatograph (Thermo Fisher Scientific) with a nano-electrospray ion source (Thermo Fisher Scientific). See Supplementary Methods online for further details of the LC–MS/MS method.

Discovery proteomics data processing

The LC–MS/MS data was processed using MaxQuant software version 1.5.5.138 with the built-in Andromeda search engine39. A combined SwissProt human and TrEMBL enterovirus protein sequence database (April 2014, 63,470 entries) with added iRT peptide sequences and common contaminants was used. Label-free quantification (LFQ) was selected with trypsin digestion (maximum two missed cleavages), methionine oxidation as a variable modification and cysteine carbamidomethylation as a fixed modification. A false discovery rate of 1% for protein and peptide levels was applied, determined by searching the reversed database. Otherwise, MaxQuant default settings were used with selection of “match between runs”.

The MaxQuant output file was pre-processed with Perseus software40 using normalised LFQ intensities. Contaminants and proteins detected with less than two unique peptides, or present in less than 50% of the samples were filtered out. The median intensities were calculated from the technical replicates and log2 transformed for statistical analyses. See Supplementary Methods online for further details.

Targeted proteomics verification

For the verification of observations from the discovery analyses, a separate series of 524 serum samples from 62 DIPP children were analysed. These were collected between 2 and 60 months of age from children diagnosed with type 1 diabetes before 5 years of age (n = 31) and their matched AAb− children (n = 31) (difference in dates of birth of the matched pairs ± 3.4 months), with a median of nine samples per child (Supplementary Table S4, Supplementary Fig. S3). The AAb− children remained autoantibody negative and did not develop diabetes during the follow-up period.

The verification samples were prepared without immunodepletion of the abundant proteins, as described elsewhere41. Heavy isotope-labeled synthetic peptide analogues (PEPotec, Thermo Fisher Scientific, USA) for the protein targets were spiked (~ 10 fmol/µl) into the digests together with retention time standards (MSRT1, Sigma-Aldrich, USA). The targets and additional details are provided in Supplementary Table S6 and Supplementary Methods online.

A TSQ Vantage Triple Quadrupole Mass Spectrometer (Thermo Scientific, Bremen, Germany) coupled with an Easy-nLC 1000 liquid chromatograph (Thermo Scientific) was used to perform the selected reaction monitoring (SRM). The analyses were conducted in three batches, using the same column configuration as described for the discovery measurements and slightly varying chromatographic conditions. Aliquots of a reference sample were periodically analysed to monitor the performance and reproducibility of the assay (see Supplementary Table S7). Skyline software42 was used to develop the data acquisition method and inspect the data. The peptide peak areas were normalised by dividing the total peak area of each peptide with the sum of the peak areas of endogenous alpha-1B-glycoprotein (A1BG) peptides (SGLSTGWTQLSK and ATWSGAVLAGR) in the same sample. A1BG was selected as a global standard based on our previous studies, in which it was among the most stable proteins in serum and plasma in children12,18. The exported peptide level data was then analysed with LonGP.

LonGP analysis

LonGP was used to model the longitudinal changes in both datasets43. Selected continuous covariates included age (days from birth to sampling date), sero (days from seroconversion date to sampling date) and t1d (days from type 1 diabetes diagnosis date to sampling date), which were converted into the unit of months. Discrete covariates included sex (male = 1, female = 0), group (progressors = 1, 1AAb+ = 2, AAb− = 3), pair (pair id) and id (individual id). Interaction flags for sero and t1d were set to false. Non-stationary kernels were used for “sero” and “t1d”, where we used the kernel parameters (a = 0.5, b = 0, c = 40) such that the main variations occur between − 12 month to + 12 month. For the data from targeted analysis, an additional discrete covariate “batch” was used to represent different MS batches for the same peptide. Default LonGP parameters were used. The preprocessed data files and parameter specification files for both discovery and validation experiments are provided in a supplementary file (see Supplementary Data online). To be considered statistically significant, the effect was required to be included in the final cross-validated model and the explained variation of the effect to be > 1%. Although the variation threshold of 1% corresponds to a small effect size, the size of some effects was notably larger than 1%, and for consistency the explained variation for each protein is reported.