Introduction

Multisystem inflammatory syndrome in children (MIS-C) is a severe, life-threatening immunological condition that occurs weeks after infection with SARS-CoV-21,2. Although MIS-C is established as a form of immune dysregulation leading to cytokine storm3, the underlying pathogenesis remains unresolved4. The incidence of MIS-C was approximately one in 3000 children infected with SARS-CoV-2 during the pre-Omicron waves, but it decreased substantially once Omicron variants became dominant5,6,7. This decrease has been attributed to a reduced ability of Omicron to trigger hyperinflammation, as Omicron is phylogenetically distinct from the pre-Omicron variants and shows enhanced immune escape8. Further, vaccination has been shown to decrease the incidence of MIS-C9,10. Still, sporadic MIS-C cases occur, and MIS-C may resurge with waning vaccine-induced immunity and novel SARS-CoV-2 variants.

Children with MIS-C present with fever and multiorgan involvement, including mucocutaneous, gastrointestinal, and cardiovascular involvement, often accompanied by circulatory shock1,2,6. The condition can be misdiagnosed as sepsis, an abdominal emergency, or Kawasaki disease11,12. The lack of a diagnostic test for MIS-C can therefore delay lifesaving immunomodulating therapy and lead to prolonged, unnecessary courses of broad-spectrum antibiotics.

To address this diagnostic challenge, innovative host-specific omics methodologies have been suggested4. Proteomics offers a comprehensive, unbiased approach that investigates hundreds of plasma proteins simultaneously13. Because circulating plasma proteins are markers of whole-body metabolic processes14, proteomics has the potential both to identify plasma proteins useful as diagnostic markers and to explore disease mechanisms. Recent technological improvements in proteomics pipelines have made a comprehensive system-wide approach feasible15. However, a limitation of novel omics approaches is the interpretation of large amounts of complex data and the translation of this information into clinical medicine. This limitation may be overcome by artificial intelligence (AI)-based techniques, which offer a powerful avenue for analyzing comprehensive, unbiased proteomic data.

We employed AI-assisted proteomics to develop a unique diagnostic signature for children with MIS-C and to gain insight into the underlying disease mechanisms.

Results

We enrolled 94 children, including 27 cases with MIS-C and 67 febrile controls consisting of 28 children with bacterial infection, 22 with viral infection, 7 with Kawasaki disease, and 10 with severe sepsis (Table 1, Fig. 1a). All children with MIS-C had PCR-confirmed SARS-CoV-2 infection, including 15 (56%) with the Alpha variant, 11 (41%) with the Delta variant, and one (4%) with the Omicron variant. None had comorbidities. Twenty-one of 27 (78%) presented with shock. Twenty-four (89%) were admitted to an intensive care or semi-intensive care unit, and 9 (33%) received inotropes. Blood samples for proteomics were collected before or within 24 h of treatment initiation. In nine of 27 (33%) patients, intravenous immunoglobulin therapy was initiated before the blood sample was collected. Seven patients with MIS-C had additional samples for proteomics collected on days 2–4 after treatment initiation, and nine had additional samples collected at full recovery, a median of 39 days (19–76) after the diagnosis of MIS-C. All bacterial febrile controls had urinary tract infection confirmed by a positive urine dipstick and a urine culture positive for Escherichia coli. Viral controls had fever, a virus detected in a nasopharyngeal specimen, and C-reactive protein below 25 mg/L (Table 1). There were no deaths in any of the groups.

Table 1 Characteristics of patients with MIS-C and febrile controls
Fig. 1: Patient overview and proteomics workflow.

a Overview of the number of samples included in the study, by diagnostic group. b Laboratory and analytical workflow: plasma sample preparation on a semi-automated BRAVO robot, liquid chromatography on an Evosep One system, and tandem mass spectrometry on an Exploris 480 system (Thermo Fisher Scientific). Raw mass spectrometry data were processed in Spectronaut, followed by bioinformatic analysis with the Clinical Knowledge Graph in a Jupyter Python environment. Created with Adobe Illustrator software and Biorender.com. c Dynamic range of the 450 proteins measured by liquid chromatography-tandem mass spectrometry. Proteins are ranked by abundance and colored by missingness across all samples. Named proteins illustrate the measured concentration range, from the most abundant (albumin, ALB) to low-abundance signaling molecules (cathelicidin antimicrobial peptide, CAMP; adiponectin, ADIPOQ). d Histogram of the number of proteins measured in each sample. Three samples were excluded from further analysis due to low protein numbers (marked in pink).

Plasma proteins in children with MIS-C compared to controls

The data set used to identify differences in protein levels included patients with MIS-C, febrile controls with viral and bacterial infections, and children with Kawasaki disease or severe sepsis (Fig. 1a). We identified 450 plasma proteins across all plasma samples in the initial proteomic analysis (Fig. 1b, c). Three samples, all from children with sepsis, were excluded due to low numbers of measurable proteins (Fig. 1d). After data quality assessment, 245 proteins were selected for further analysis. A further 66 proteins related to therapy with intravenous immunoglobulin were excluded, resulting in a total of 179 proteins in the final dataset (source data file: Supplementary Data 1). Overall, the proteomic data separated disease categories, as visualized by the uniform manifold approximation and projection (UMAP) plot (Fig. 2a) and the unsupervised heatmap (Fig. 2b), both of which revealed high correlation among samples from children with MIS-C. A total of 105 proteins differed significantly in children with MIS-C compared to febrile controls, Kawasaki disease, and severe sepsis (Fig. 2c; Table 2). Figure 2d displays the overlap between significant protein findings in MIS-C patients compared to the control groups combined or to each control group separately (Supplementary Data 2).

Fig. 2: Differences in protein levels between MIS-C and controls.

a Uniform manifold approximation and projection (UMAP) analysis of proteomic data from each patient. Diagnostic groups are shown in different colors. b Unsupervised heatmap of Pearson correlations between each pair of proteome samples. The color scale represents Pearson correlation values, with light colors indicating values close to 1 and dark colors values close to 0.75. Hierarchical clustering of the samples is shown on the top and left sides of the heatmap, where the diagnostic group of each sample is indicated by color. c Volcano plot of proteomic differences between MIS-C and febrile controls. Each point represents a protein. The x-axis shows the log2 fold change and the y-axis the −log10(p-value). The dashed line marks the statistical significance threshold after adjustment for multiple testing. Proteins increased in MIS-C patients compared to febrile controls are shown in red, such as C-reactive protein and Fc Gamma Receptor IIIa (FCGR3A). Proteins decreased in MIS-C are shown in blue, including platelet factor 4 (PF4) and pro-platelet basic protein (PPBP). Proteins that did not differ significantly are shown in gray. Labeled proteins were both significant and had a large log2 fold change (>1 or <−1). d Venn diagram of the overlap between proteins that differed significantly when MIS-C protein levels were compared to those of each febrile condition or to febrile controls combined. For example, 59 significant proteins overlapped between all comparisons except severe sepsis. No proteins differed between MIS-C and Kawasaki disease after adjustment for multiple testing. e Protein-protein network based on the proteins with significantly different levels in MIS-C compared to febrile controls. The network is based on the Spearman correlation of protein levels followed by Louvain clustering. As an example, cluster 7 contains three platelet-related proteins, platelet factor 4 (PF4), pro-platelet basic protein (PPBP), and thrombospondin-1 (THBS1), with significantly lower levels in MIS-C patients compared to febrile controls. f Gene Ontology biological process enrichment analysis of the proteome data identified 15 biological pathways affected in MIS-C patients compared to febrile controls. Red indicates an upregulated pathway and blue a downregulated pathway. The x-axis shows the −log10 adjusted p-value, ordering pathways by statistical strength.

Table 2 Plasma protein alterations in children with MIS-C

Most proteins with significantly different levels between children with MIS-C and controls could be categorized into four groups: (1) immunological response, (2) coagulation, (3) cell death and cell growth, and (4) lipid profile (Table 2). Immunological response: Proteins involved in the immunological response included elevated lymphocyte cytosolic protein 1 and Fc Gamma Receptor IIIa, both involved in the adaptive immune response, as well as several elevated acute phase reactants, including alpha-1-antichymotrypsin. Proteins playing a role in the innate immune response also differed significantly in children with MIS-C, such as decreased peptidoglycan recognition protein 2 and increased levels of several complement factors (Table 2). Coagulation: Numerous coagulation-related proteins differed significantly in children with MIS-C, with reduced coagulation factors XII and XIII, increased procoagulants fibrinogen and von Willebrand factor, and reduced anticoagulants, including antithrombin, protein C, and platelet factor 4. Children with MIS-C also had altered levels of proteins related to the recruitment and activation of platelets. Cell death and growth: Levels of actin B, extracellular matrix protein 1, fibronectin, and other proteins implicated in cell and tissue remodeling were altered in MIS-C. Lipid profile: Finally, the lipid profile of children with MIS-C differed from that of febrile controls, with reduced apolipoproteins A, C, and H and elevated apolipoproteins E and F.

Unsupervised, machine learning-guided protein-protein co-expression network analysis revealed eight clusters of proteins (Fig. 2e). These clusters grouped apolipoproteins together with proteins involved in the immune response (clusters 0, 1, and 2) and grouped proteins participating in the complement cascade (cluster 3), coagulation (cluster 4), oxygen transport (cluster 6), and platelet function (cluster 7). Cluster 5 comprised heterogeneous proteins related to coagulation, inflammation, and liver function. Unbiased pathway enrichment analysis of the proteins that differed significantly in children with MIS-C revealed 15 biological pathways, again primarily involving (1) immunological responses, (2) coagulation, (3) cell death and cell growth, and (4) platelet activation (Fig. 2f).
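The enrichment step can be illustrated with a simple over-representation test. The sketch below is a minimal example, assuming a one-sided hypergeometric test of each annotation term against the set of quantified proteins; the exact enrichment procedure used in the Clinical Knowledge Graph is not described here, and the function name, term names, and protein sets are hypothetical placeholders rather than study data.

```python
from scipy.stats import hypergeom


def overrepresentation(hits, background, term_to_proteins):
    """One-sided hypergeometric over-representation test (illustrative sketch).

    hits: proteins that differed significantly
    background: all quantified proteins (the universe for the test)
    term_to_proteins: mapping of annotation term -> member proteins
    Returns (term, overlap, term_size, p_value) tuples sorted by p-value.
    """
    background = set(background)
    hits = set(hits) & background
    M, N = len(background), len(hits)
    results = []
    for term, members in term_to_proteins.items():
        members = set(members) & background  # restrict each term to the universe
        k = len(members & hits)
        if k == 0:
            continue
        # P(X >= k) for X ~ Hypergeom(M proteins, len(members) in term, N drawn)
        p = hypergeom.sf(k - 1, M, len(members), N)
        results.append((term, k, len(members), float(p)))
    return sorted(results, key=lambda r: r[3])


# Hypothetical toy example: 100 quantified proteins, 10 significant hits.
universe = [f"P{i}" for i in range(100)]
significant = universe[:10]
terms = {
    "term_A (hypothetical)": universe[:12],   # contains all 10 hits
    "term_B (hypothetical)": universe[5:55],  # 5 hits among 50 members
}
enriched = overrepresentation(significant, universe, terms)
```

In practice the per-term p-values would also be adjusted for multiple testing (for example with the Benjamini–Hochberg method) before terms are called enriched.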

Diagnostic classifier for MIS-C using machine learning

The data set used to develop a diagnostic signature for MIS-C included children with MIS-C and febrile controls with viral and bacterial infections (Fig. 3a). All but one of the 12 machine-learning algorithms achieved a Matthews Correlation Coefficient (MCC) and area under the curve (AUC) between 0.77 and 1 (Fig. 3b). We continued the analysis with the support vector classification (SVC) model, which had an AUC of 100% and an MCC of 1 (Fig. 3b). Recursive feature elimination showed that only four proteins were needed to obtain high predictive performance (Fig. 3c). The four selected proteins were lymphocyte cytosolic protein 1, Fc Gamma Receptor IIIa, alpha-1-antichymotrypsin, and butyrylcholinesterase (Table 2; Fig. 3e; Supplementary Data 3).

Fig. 3: Development of a diagnostic signature using artificial intelligence.

a Overview of the machine learning strategy. Models were developed on samples from patients with MIS-C (acute stage) and with viral and bacterial infections. Model development was split into five cross-validation folds, with 80% of samples in the training set and 20% in the test set. Cross-validation was used to identify the combination of algorithm, hyperparameters, and number of proteins that yielded the best-performing model. After model training and selection, model performance was reported on the test set from model development and on internally collected validation cohorts, including Kawasaki disease, severe sepsis, MIS-C 2–4 days after treatment initiation (‘during admission’, i.e., still ‘MIS-C positive’), and MIS-C after full recovery (MIS-C ‘negative’). Lastly, the machine learning-based diagnostic strategy was applied to an external MIS-C cohort. b Output of the cross-validation search over algorithms, hyperparameters, and number of proteins, displayed as the area under the receiver operating characteristic curve (AUROC) and Matthews Correlation Coefficient (MCC) of the predictions on the test sets. The number of proteins used in each model is shown on the right. Error bars represent the standard deviation across the 5-fold cross-validation runs. c Recursive feature elimination for the best-performing algorithm, the support vector classification (SVC) model. The x-axis shows the number of proteins and the y-axis the weighted F1-score. The SVC model achieved high performance across the entire range of protein numbers, with the highest performance obtained using four proteins. d Boxplots of the probability of each sample being classified as MIS-C, computed using the SVC models from each cross-validation fold. Probabilities are plotted for the test set and for each sample in the different diagnostic groups. Boxes show the interquartile range (IQR), lines within boxes the median, and whiskers 1.5 × IQR.
e Boxplots of protein levels (label-free quantification) across the diagnostic groups for the four proteins in the diagnostic signature: Fc Gamma Receptor IIIa (FCGR3A), lymphocyte cytosolic protein 1 (LCP1), alpha-1-antichymotrypsin (SERPINA3), and butyrylcholinesterase (BCHE). Boxes show the interquartile range (IQR), lines within boxes the median, and whiskers 1.5 × IQR. f Prediction performance of the 4-protein diagnostic signature, shown as the area under the receiver operating characteristic curve (AUC) and Matthews Correlation Coefficient (MCC). Semi-transparent shading around the curves represents the standard deviation of the true-positive and false-positive rates across the five cross-validation folds. The label “AUC = 0.87 (std = 0.1)” gives the AUC with its standard deviation across the 5-fold cross-validation runs.

Validation of the 4-protein diagnostic signature

When the 4-protein diagnostic signature was applied to the test set, it achieved an AUC of 100% (Fig. 3f). The median prediction probability was 83.0% (IQR 11.8) for patients with MIS-C at the acute stage, 9.1% (IQR 6.0) for viral infections, and 12.2% (IQR 7.1) for bacterial infections (Fig. 3d, Supplementary Data 4). Applied to the internal validation cohorts, comprising MIS-C patients 2–4 days after treatment initiation, fully recovered MIS-C patients, and children with Kawasaki disease or severe sepsis, the signature achieved a combined AUC of 93.4% (95% CI 92.1–94.7). Children with MIS-C who had received immunomodulating therapy for 2–4 days had a median prediction probability of 87.7% (IQR 9.3), whereas children who had fully recovered from MIS-C had a median MIS-C prediction probability of 7.3% (IQR 1.5) (Fig. 3d, Supplementary Data 4). Children with severe sepsis and Kawasaki disease had median prediction probabilities of 20.7% (IQR 13.1) and 55.8% (IQR 43.7), respectively.

To evaluate the generalizability of AI-based proteomic prediction of MIS-C, we investigated an external U.S. validation cohort of 25 children with MIS-C and 34 healthy controls (Fig. 3a). Because the four signature proteins were not measured in the external cohort, the 28 plasma proteins measured in both studies were used to assess our AI-based approach to MIS-C diagnostics. A new support vector classification model based on these 28 proteins achieved high prediction performance, with an AUC of 86.7% (95% CI 79.7–93.7) (Fig. 3f).

Discussion

In this study, we applied AI to complex proteomic data to develop a diagnostic signature for children with MIS-C and to explore the underlying biological mechanisms of the disease. A comparison of multiple machine learning algorithms showed that MIS-C could be discriminated from bacterial and viral infections, and we identified a highly accurate diagnostic signature with an AUC of 100% based on only four proteins. The 4-protein diagnostic signature offers a promising basis for a rapid, low-cost bedside diagnostic test, with important implications for early recognition and targeted treatment. Further, we found proteomics to be a powerful and unbiased tool for assessing disease pathogenesis in children with MIS-C. AI-assisted analysis extracted intricate protein patterns, beyond the reach of traditional methods, which characterized MIS-C as a condition of immune dysregulation closely linked to changes in apolipoproteins, global hypercoagulability, and extensive cell and tissue remodeling.

The diagnosis of MIS-C is based on clinical manifestations and elevated acute phase reactants, such as C-reactive protein, which are often indistinguishable from those of a wide range of other diseases11,12. The lack of a diagnostic test has delayed targeted immunomodulating treatment and led to unnecessary courses of broad-spectrum antibiotics. The diagnostic signature found in this study was based on only four plasma proteins, three of which were involved in the immune response. Recursive feature elimination, a machine learning technique, showed that these four plasma proteins, out of a total of 179, were sufficient to differentiate MIS-C from other febrile conditions. A few patients with septic shock or Kawasaki disease overlapped with MIS-C, possibly reflecting shared pathophysiological features between these conditions. The 4-protein diagnostic signature remained highly accurate in children with MIS-C 2–4 days after initiation of immunomodulating therapy, suggesting robustness to delayed sample collection and partial clinical recovery. Although we could not validate our 4-protein signature in the external U.S. validation cohort (as those four proteins were not part of its protein panel), we demonstrated the validity of our AI-based approach to MIS-C diagnostics, as the new support vector classification model, using different proteins, also diagnosed MIS-C with high accuracy. During algorithm selection, we also found that several models based on different proteins had high AUCs. This suggests that several combinations of proteins could serve as an MIS-C signature and underscores proteomics as a powerful tool for MIS-C diagnostics.

Previously, a 3-protein signature has been shown to distinguish MIS-C patients from other disease controls with an AUC of 86% in a study investigating seven host proteins16. Additionally, a 5-gene blood RNA expression signature for MIS-C has been reported17; it was validated with an RT-qPCR assay and showed high diagnostic accuracy. Collectively, these results suggest that MIS-C has distinct host responses detectable by both transcriptomics and proteomics, which may be suitable for innovative diagnostic tests. However, a diagnostic test based on a few plasma proteins may yield faster results and be cheaper to implement in clinical medicine as a routine laboratory analysis.

The pathogenesis of MIS-C was explored using the extensive proteome dataset obtained by unbiased mass spectrometry, which revealed global changes in mechanisms and pathways involved in the disease. Immune dysregulation was indicated by increased levels of proteins in multiple pathways leading to hyperinflammation, involving both the innate and adaptive immune responses, as described in previous studies18,19,20. Consistently, three of the four proteins in the 4-protein signature were involved in immune dysregulation: alpha-1-antichymotrypsin is involved in complement activation, and Fc Gamma Receptor IIIa in antibody-dependent cellular cytotoxicity. Further, elevated lymphocyte cytosolic protein 1 indicated T-cell activation, supporting the hypothesis of superantigen-mediated polyclonal T-cell activation21. The fourth protein in the signature, butyrylcholinesterase, hydrolyzes choline esters and was reduced; its significance in the pathogenesis of MIS-C is unknown.

We found profound changes in lipid metabolism, consistent with previous studies22,23. Lipid mediators have been described to be involved in vasodilation and increased vascular permeability24, a frequent and severe clinical manifestation of MIS-C6. Further, the unsupervised, machine learning-based protein-protein co-expression network analysis indicated an intricate connection between apolipoproteins and immune dysregulation, which may be due to the function of lipids as proinflammatory mediators, as described in other conditions25. This suggests that MIS-C is an immunometabolic condition26. Alterations in numerous coagulation factors, with elevated procoagulants, reduced anticoagulants, and impaired fibrinolysis, point towards global hypercoagulability and may explain the risk of thrombosis in children with MIS-C27. Overall, our results align with those of a previous study, which also reported upregulated Fc Gamma Receptor IIIa, immune and complement activation, and reduced lipid transport and clearance mechanisms28. Our results were also consistent with studies exploring single coagulation factors or coagulation profiles29,30. The significant alterations in proteins related to cell growth, cell death, and/or cell remodeling in children with MIS-C have not been reported previously but align with the profound dysregulation of cellular and immunological processes and the multiorgan nature of the disease3,6. These results serve as a proof-of-concept for AI-assisted proteomics in exploring the mechanisms of new diseases or of complex diseases not yet fully understood.

This study has several limitations. First, while the quality and size of the dataset were sufficient to develop a diagnostic signature, the validation cohort was too small to identify significant protein changes between children with MIS-C and Kawasaki disease. Second, spurious findings unrelated to MIS-C are possible in our large proteome dataset, as we were unable to explain the function of all significantly altered proteins. Because AI techniques model complex data patterns, identifying the precise features that drive the predictions can be challenging. Nevertheless, the disease mechanisms we identified are supported by clinical correlations and existing knowledge of the disease pathogenesis and pathophysiology. Third, we were unable to validate the 4-protein signature in external validation cohorts, as no published studies include those four proteins. Further, the decline in the incidence of MIS-C during the Omicron era prevented us from validating the accuracy of the 4-protein signature in a prospective MIS-C cohort. Finally, the decline of MIS-C could challenge the relevance of our findings. However, as a resurgence of MIS-C remains possible with new variants and waning vaccine-induced immunity, we consider continued investigation of this severe and potentially life-threatening disease important.

In conclusion, we harnessed AI to explore complex proteomic data from children with MIS-C, a condition that remains elusive. The study demonstrated the potential of proteomics to impact pediatric disease trajectories through early diagnosis: we identified a 4-protein diagnostic signature that accurately distinguished MIS-C from phenotypically similar diseases. We provided a global characterization of proteomic changes in the pathogenesis of MIS-C, highlighting AI-assisted proteomics as a powerful and unbiased tool for assessing disease pathogenesis that may pave the way for more efficient future interventions.

Methods

This nationwide population-based study prospectively included patients aged 0–17 years with MIS-C from all 18 Danish pediatric departments from April 1, 2020 to March 15, 20226,10. Patients met the US Centers for Disease Control and Prevention MIS-C case definition. Febrile controls consisted of children with viral and bacterial infections. Patients with Kawasaki disease and severe sepsis were enrolled from January 1, 2019, to December 31, 2019, before the COVID-19 pandemic. Children with Kawasaki disease met the American Heart Association criteria for complete or incomplete disease. Two pediatric infectious disease specialists adjudicated the final diagnosis for all patients once testing results and clinical outcomes were known.

Sample size calculations were not performed due to the exploratory nature of the study.

Patients were recruited with approval from the Ethics Committee of the Capital Region of Denmark (H-20028631) and the Danish Data Protection Agency (P-2019-29). Informed oral and written parental consent was obtained before participation. All ethical regulations relevant to human research participants were followed. The study was registered at ClinicalTrials.gov (NCT05334134).

Liquid chromatography mass spectrometry data analysis

Venous blood samples were collected in EDTA-containing tubes, centrifuged at 3000 × g for 10 min at 4 °C within 2 h, and stored at −80 °C. Sample preparation for proteomic analysis was performed as previously published31. Samples were analyzed on an Evosep One liquid chromatography system (Evosep Biosystem, Denmark) coupled to an Exploris 480 mass spectrometer (Thermo Fisher Scientific), and proteomic data were acquired in data-independent acquisition mode. Proteins related to therapy with intravenous immunoglobulin, including immunoglobulin heavy chains, light chains, J chains, and variable regions, were excluded32. The mass spectrometry raw files were processed with Spectronaut version 17 (Biognosys, Zurich, Switzerland) using a previously generated plasma spectral library containing 2137 protein groups and 16,254 peptides.

Statistics and reproducibility

Data were processed using the Clinical Knowledge Graph and Jupyter Notebook33. In short, protein intensities were log-transformed, a stringent missingness filter was applied (>70% completeness across all samples and at least 50% within each group), and missing values were imputed from a downshifted normal distribution. Sample quality was assessed as described previously34. Batch correction was performed with the Clinical Knowledge Graph (pyCombat) so that the plate on which a sample was run would not affect downstream results. Unpaired t-tests were used to identify proteins with significantly different levels between the cohorts. Multiple hypothesis correction was applied using the Benjamini–Hochberg method, with adjusted P-values < 0.05 considered statistically significant.
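The filtering, imputation, and testing steps can be sketched as follows. This is a minimal illustration, not the Clinical Knowledge Graph implementation: it assumes a samples × proteins intensity table with missing values stored as NaN, log2 as the log base, and the commonly used Perseus-style imputation parameters (downshift 1.8 SD, width 0.3 SD), none of which are stated in the text. The function names are hypothetical, and batch correction (pyCombat) and sample quality assessment are omitted.

```python
import numpy as np
import pandas as pd
from scipy import stats


def filter_missingness(log_df, groups, overall=0.7, per_group=0.5):
    """Keep proteins with >70% completeness overall and >=50% within every group."""
    present = log_df.notna()
    keep = present.mean(axis=0) > overall
    for g in np.unique(groups):
        keep &= present.loc[groups == g].mean(axis=0) >= per_group
    return log_df.loc[:, keep]


def impute_downshifted(log_df, shift=1.8, width=0.3, seed=0):
    """Replace NaNs per protein with draws from a narrow, down-shifted normal.
    shift/width are assumed values that mimic intensities below detection."""
    rng = np.random.default_rng(seed)
    out = log_df.copy()
    for protein in out.columns:
        missing = out[protein].isna()
        if missing.any():
            mu, sd = out[protein].mean(), out[protein].std()
            out.loc[missing, protein] = rng.normal(mu - shift * sd, width * sd, missing.sum())
    return out


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # running minimum from the largest p-value downwards keeps adjusted values monotone
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def differential_proteins(intensities, groups, case, control, alpha=0.05):
    """Unpaired t-test per protein between two groups, BH-adjusted."""
    groups = np.asarray(groups)
    logged = np.log2(intensities)  # assumed log base
    data = impute_downshifted(filter_missingness(logged, groups))
    a = data.loc[groups == case].to_numpy()
    b = data.loc[groups == control].to_numpy()
    t, p = stats.ttest_ind(a, b, axis=0)  # Student's unpaired t-test
    res = pd.DataFrame(
        {"log2_fc": a.mean(axis=0) - b.mean(axis=0), "t": t, "p": p,
         "p_adj": benjamini_hochberg(p)},
        index=data.columns,
    )
    res["significant"] = res["p_adj"] < alpha
    return res.sort_values("p_adj")


# Synthetic demo (not study data): 20 cases, 20 controls, 5 proteins.
rng = np.random.default_rng(1)
demo = pd.DataFrame(2.0 ** rng.normal(20, 1, size=(40, 5)),
                    columns=[f"P{i}" for i in range(5)])
demo_groups = np.array(["MIS-C"] * 20 + ["control"] * 20)
demo.iloc[:20, 0] *= 8.0    # P0 raised by ~3 log2 units in cases
demo.iloc[:15, 4] = np.nan  # P4 too incomplete, so it is filtered out
result = differential_proteins(demo, demo_groups, "MIS-C", "control")
```

Imputing from a down-shifted distribution reflects the assumption that values are missing mainly because they fall below the detection limit; if missingness were random, a different imputation strategy would be more appropriate.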

UMAP was performed to illustrate the underlying structure of the data. The UMAP plot reduces the multidimensional proteomic data to two dimensions, allowing patient groups to be separated by visual inspection. Hierarchical clustering using Pearson correlation distance was used to compute a sample correlation heatmap. Both the UMAP and the heatmap of Pearson correlations were unsupervised, meaning that the grouping of individuals was based on the proteomic data alone and not informed by disease group. Volcano plots were used to visualize plasma proteins that differed significantly between MIS-C and controls. Gene Ontology Biological Process enrichment analysis was performed to identify biological processes enriched among the significantly different proteins. Protein-protein co-expression clusters were identified with the Clinical Knowledge Graph by Spearman correlation analysis followed by Louvain network clustering. The clusters were visualized using Cytoscape35.
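The co-expression clustering can be sketched as below, assuming that proteins become graph nodes, that protein pairs with an absolute Spearman correlation of at least 0.5 are joined by edges weighted by |ρ|, and that NetworkX's Louvain implementation (`louvain_communities`, available from NetworkX 2.8) stands in for the Clinical Knowledge Graph's own routine. The threshold and the helper name are hypothetical.

```python
import numpy as np
import pandas as pd
import networkx as nx
from networkx.algorithms.community import louvain_communities


def coexpression_clusters(data, min_abs_rho=0.5, seed=0):
    """Map each protein (column) to a Louvain cluster of a Spearman co-expression graph."""
    rho = data.corr(method="spearman")
    graph = nx.Graph()
    graph.add_nodes_from(rho.columns)
    proteins = list(rho.columns)
    for i, a in enumerate(proteins):
        for b in proteins[i + 1:]:
            r = rho.loc[a, b]
            if abs(r) >= min_abs_rho:  # assumed edge threshold
                graph.add_edge(a, b, weight=abs(r))
    communities = louvain_communities(graph, weight="weight", seed=seed)
    # label clusters 0, 1, ... from largest to smallest
    ordered = sorted(communities, key=len, reverse=True)
    return {protein: k for k, members in enumerate(ordered) for protein in members}


# Synthetic demo: two latent factors drive two modules of three proteins each.
rng = np.random.default_rng(2)
f1, f2 = rng.normal(size=(2, 60))
demo = pd.DataFrame({
    **{f"A{i}": f1 + 0.1 * rng.normal(size=60) for i in range(3)},
    **{f"B{i}": f2 + 0.1 * rng.normal(size=60) for i in range(3)},
})
clusters = coexpression_clusters(demo)
```

Spearman correlation is rank-based, so the network is less sensitive to outlying intensities than a Pearson-based network would be; the edge threshold controls how sparse the graph is before Louvain optimizes modularity.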

Supervised machine learning analysis was performed to investigate the feasibility of a diagnostic signature for MIS-C. Our dataset of children with MIS-C and febrile controls with viral and bacterial infections was divided into a training set (80%) for model development and a test set (20%) for model validation in a 5-fold cross-validation scheme. Each set had the same ratio of MIS-C to febrile control samples. Z-scored data (mean 0 and standard deviation 1 within each sample) were used as input, and MIS-C diagnosis (yes/no) was the classification target. Twelve different machine-learning algorithms were investigated. The final model was based on a hyperparameter search (random grid search specific to each algorithm) and recursive feature elimination (only for models with a feature importance attribute), combined with 5-fold cross-validation, to identify the balance between a minimal combination of proteins and high predictive performance. Model selection was based on the highest prediction performance as measured by the AUC and MCC. Prediction probabilities were calibrated for the final model. Machine learning analyses and model calibration were performed using Python (3.7.9) with the scikit-learn (sklearn) library36.
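A compressed sketch of this scheme with scikit-learn is shown below, under several assumptions not stated in the text: a linear-kernel SVC (so that recursive feature elimination can rank proteins by the magnitude of `coef_`), a fixed C instead of the per-algorithm randomized hyperparameter search, sigmoid calibration via `CalibratedClassifierCV`, and a 0.5 probability threshold for the MCC. Feature elimination is refit inside each training fold so that the test fold never influences which proteins are kept. The function names are hypothetical and this is not the authors' code.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_selection import RFE
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def zscore_samples(X):
    """Sample-wise z-scoring: mean 0, SD 1 within each row (sample)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def cross_validate_signature(X, y, n_proteins=4, C=1.0, seed=0):
    """5-fold stratified CV with in-fold RFE and probability calibration."""
    X, y = zscore_samples(X), np.asarray(y)
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    aucs, mccs, panels = [], [], []
    for train, test in folds.split(X, y):
        # Rank proteins by |coef_| of a linear SVC; keep the top n_proteins.
        rfe = RFE(SVC(kernel="linear", C=C), n_features_to_select=n_proteins)
        rfe.fit(X[train], y[train])
        panel = np.flatnonzero(rfe.support_)
        # Refit on the selected panel with calibrated probabilities.
        model = CalibratedClassifierCV(SVC(kernel="linear", C=C), method="sigmoid", cv=3)
        model.fit(X[train][:, panel], y[train])
        prob = model.predict_proba(X[test][:, panel])[:, 1]
        aucs.append(roc_auc_score(y[test], prob))
        mccs.append(matthews_corrcoef(y[test], (prob >= 0.5).astype(int)))
        panels.append(panel)
    return float(np.mean(aucs)), float(np.mean(mccs)), panels


# Synthetic demo (not study data): 30 cases, 30 controls, 30 proteins,
# with only the first four proteins shifted in cases.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(60, 30))
y_demo = np.array([1] * 30 + [0] * 30)
X_demo[:30, :4] += 2.0
mean_auc, mean_mcc, demo_panels = cross_validate_signature(X_demo, y_demo)
```

Because the selected panel can differ between folds, comparing `demo_panels` across folds gives a rough indication of how stable a small protein signature is.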

The performance of the developed model was tested on both the test set and the internal validation cohorts, including Kawasaki disease, severe sepsis, and children with MIS-C during partial and after full recovery. The model was also tested on an external validation cohort from the Proteomics Identifications Database (PRIDE) consisting of 22 children with MIS-C and 25 healthy controls (PXD029375)37. Data were z-scored (sample-wise), and only proteins found in both cohorts were used as features. We used the same algorithm type and hyperparameters as for the previous model, but refitted the model on the protein overlap between the two cohorts. The model was trained on our training set and applied to the test set and the external validation cohort.
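The refitting step could look like the sketch below. It assumes pandas DataFrames with protein identifiers as column names, sample-wise z-scoring applied after restricting both cohorts to their shared proteins (the order of these two steps is not specified in the text), and a linear SVC as a stand-in for the selected model. `sklearn.base.clone` copies the hyperparameters of an estimator without its fitted state. The helper names, column names, and cohort sizes in the demo are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC


def zscore_samples(df):
    """Z-score each sample (row) across the proteins it is given."""
    return df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)


def refit_on_shared_proteins(template, train_X, train_y, external_X):
    """Refit the template's algorithm and hyperparameters on proteins shared by both cohorts."""
    shared = train_X.columns.intersection(external_X.columns)
    model = clone(template)  # same hyperparameters, no fitted state
    model.fit(zscore_samples(train_X[shared]), train_y)
    scores = model.decision_function(zscore_samples(external_X[shared]))
    return model, list(shared), scores


def simulate_cohort(n_per_group, proteins, signal, rng):
    """Hypothetical cohort: cases have the `signal` proteins shifted upwards."""
    X = pd.DataFrame(rng.normal(size=(2 * n_per_group, len(proteins))), columns=proteins)
    X.loc[: n_per_group - 1, signal] += 3.0
    y = np.array([1] * n_per_group + [0] * n_per_group)
    return X, y


rng = np.random.default_rng(4)
ours_X, ours_y = simulate_cohort(30, [f"P{i}" for i in range(10)], ["P5", "P6"], rng)
ext_X, ext_y = simulate_cohort(20, [f"P{i}" for i in range(5, 15)], ["P5", "P6"], rng)
template = SVC(kernel="linear", C=0.5)
model, shared, ext_scores = refit_on_shared_proteins(template, ours_X, ours_y, ext_X)
ext_auc = roc_auc_score(ext_y, ext_scores)
```

Z-scoring each sample over only the shared proteins keeps the input scale comparable between cohorts that were measured on different protein panels.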

The performance metrics used included the MCC, AUC, ROC curves, the distribution of prediction probabilities, and confusion matrices.
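For reference, the Matthews Correlation Coefficient is computed from the confusion-matrix counts (true and false positives and negatives, with MIS-C as the positive class) using its standard definition. It ranges from −1 to 1, where 1 denotes perfect prediction and 0 performance no better than chance. The worked numbers below use hypothetical counts, not study results.

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}

% Hypothetical counts: TP = 5, TN = 6, FP = 0, FN = 1
\mathrm{MCC} = \frac{5 \cdot 6 - 0 \cdot 1}{\sqrt{(5)(6)(6)(7)}} = \frac{30}{\sqrt{1260}} \approx 0.85
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it informative when the classes are imbalanced, as they are here between MIS-C cases and febrile controls.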

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.