Stratification of amyotrophic lateral sclerosis patients: a crowdsourcing approach

Kueffner, Robert; Zach, Neta; Bronfeld, Maya; Norel, Raquel; Atassi, Nazem; Balagurusamy, Venkat; Di Camillo, Barbara; Chio, Adriano; Cudkowicz, Merit; Dillenberger, Donna; Garcia-Garcia, Javier; Hardiman, Orla; Hoff, Bruce; Knight, Joshua; Leitner, Melanie L.; Li, Guang; Mangravite, Lara; Norman, Thea; Wang, Liuxia; Xiao, Jinfeng; Fang, Wen-Chieh; Peng, Jian; Yang, Chen; Chang, Huan-Jui; Stolovitzky, Gustavo

doi:10.1038/s41598-018-36873-4

Download PDF

Article
Open access
Published: 24 January 2019

Stratification of amyotrophic lateral sclerosis patients: a crowdsourcing approach

Robert Kueffner¹,
Neta Zach²,
Maya Bronfeld³,
Raquel Norel ORCID: orcid.org/0000-0001-7737-4172⁴,
Nazem Atassi⁵,
Venkat Balagurusamy ORCID: orcid.org/0000-0002-1994-2634⁴,
Barbara Di Camillo ORCID: orcid.org/0000-0001-8415-4688⁶,
Adriano Chio ORCID: orcid.org/0000-0001-9579-5341⁷,
Merit Cudkowicz⁵,
Donna Dillenberger⁴,
Javier Garcia-Garcia⁸,
Orla Hardiman⁹,
Bruce Hoff ORCID: orcid.org/0000-0002-4629-2091¹⁰,
Joshua Knight⁴,
Melanie L. Leitner¹¹,
Guang Li¹²,
Lara Mangravite ORCID: orcid.org/0000-0001-7841-3612¹⁰,
Thea Norman¹⁰,
Liuxia Wang¹³,
The ALS Stratification Consortium,
Jinfeng Xiao ORCID: orcid.org/0000-0002-1215-8038¹⁴,
Wen-Chieh Fang¹⁵,
Jian Peng¹⁴,
Chen Yang¹⁶,
Huan-Jui Chang¹⁷ &
…
Gustavo Stolovitzky⁴

Scientific Reports volume 9, Article number: 690 (2019) Cite this article

7747 Accesses
38 Citations
16 Altmetric
Metrics details

Subjects

Abstract

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease where substantial heterogeneity in clinical presentation urgently requires a better stratification of patients for the development of drug trials and clinical care. In this study we explored stratification through a crowdsourcing approach, the DREAM Prize4Life ALS Stratification Challenge. Using data from >10,000 patients from ALS clinical trials and 1479 patients from community-based patient registers, more than 30 teams developed new approaches for machine learning and clustering, outperforming the best current predictions of disease outcome. We propose a new method to integrate and analyze patient clusters across methods, showing a clear pattern of consistent and clinically relevant sub-groups of patients that also enabled the reliable classification of new patients. Our analyses reveal novel insights in ALS and describe for the first time the potential of a crowdsourcing to uncover hidden patient sub-populations, and to accelerate disease understanding and therapeutic development.

An autoantibody signature predictive for multiple sclerosis

Article 19 April 2024

Genome-wide association studies

Article 26 August 2021

Generative models improve fairness of medical classifiers under distribution shifts

Article Open access 10 April 2024

Introduction

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder which causes the death of motor neurons that control voluntary muscles, leading to progressive muscle weakening and paralysis and death within an average of only 3–5 years from symptom onset¹. Existing therapeutic options extend survival by merely a few months^2,3. One of the biggest challenges today is the well-established heterogeneity of ALS^1,4, with patients displaying widely different patterns of disease manifestation and progression, and genetic analyses suggesting heterogeneity of the underlying biological mechanisms^5,6,7,8. This heterogeneity has detrimental effects on clinical trial planning and interpretation³, on attempts to understand disease mechanisms, and on clinical care, as it increases uncertainty about prognosis and optimal treatment. Thus, successfully stratifying ALS patients into clinically meaningful sub-groups can be of great value for advancing the development of effective treatments and achieving better care for ALS patients.

Early classification systems for ALS patients were based on clinical presentation of the disease and were intended for ascertainment of an ALS diagnosis, but had limited capacity to predict disease prognosis or suggest underlying disease mechanisms^4,9,10. More recent attempts towards ALS patient classification focused on prediction of clinical outcomes but were often limited by small sample sizes and sparse clinical information^11,12,13,14. In the current study, we sought to use the power of state of the art machine learning algorithms applied to a large-scale, diverse and clinically detailed database of ALS patients to uncover and characterize clinically relevant subpopulations of ALS patients.

Two complementary data sources were used in this challenge. The first was data from ALS national or regional registers from Ireland and the Piemonte and Valle d’Aosta region in Italy, representing ALS community data collected as part of standard clinical visits. The second dataset was ALS clinical trial data, from the Pooled Resource Open-Access ALS Clinical Trials platform (PRO-ACT, www.ALSdatabase.org, developed by Prize4Life and Massachusetts General Hospital), an open-access database containing harmonized and de-identified data of over 10,000 ALS patients from 23 completed clinical trials¹⁵.

The PRO-ACT database was previously used for a crowdsourcing computational challenge: The 2012 DREAM-Phil Bowen ALS Prediction Prize4Life challenge (The ALS Prediction Challenge)¹⁶. This Challenge invited participants to develop computational algorithms that could predict ALS disease progression, with the best performing algorithms achieving a prediction accuracy that would allow a 20% reduction in the number of patients needed for a trial¹⁶ and are currently being tested in real-world clinical trial applications^17,18.

Building upon the success of the earlier prediction challenge, the 2015 DREAM ALS Stratification Prize4Life Challenge (The ALS Stratification Challenge) sought to extend the scope of prediction algorithms by inviting participants to stratify the ALS patient population into distinct clusters and develop separate predictive models for each subpopulation. The ALS Stratification Challenge included both disease progression and survival as predicted outcome measures and used both clinical trial and community-based registries data. A prize of $28,000, collected through a crowdfunding effort, was offered to best performing algorithms (see Supplementary material part 1 and 2 for detailed description of both datasets and the challenge description as given to participants, respectively).

In this publication we describe the results of the challenge including analysis of the best performing algorithms’ performance and methods, as well as novel methods to uncover the patient sub-populations, their statistical significance and relative importance of different predictive features obtained from cross-model assessment.

Results

Challenge outline

The ALS Stratification Challenge was developed and ran through a collaboration between the nonprofit organizations Dialogue for Reverse Engineering Assessments and Methods initiative (DREAM, http://dreamchallenges.org/) and Prize4Life (www.prize4life.org.il) using the Sage Bionetworks Synapse platform (www.synapse.org). The challenge ran between June and October 2015 and drew 288 registrants, eventually leading to final submissions by 30 teams (88 individual participants) from 15 countries (see Supplementary material part 3 for participant survey).

The challenge was based on two datasets: (1) ALS clinical trials data collected through the PRO-ACT database, and (2) community-based ALS clinical data collected through ALS registries. Both datasets contained longitudinally sampled demographic and clinical information with some additional genetic (specific mutation) and family history data in the registries and detailed laboratory tests in PRO-ACT (See Supplementary material part 1 and 2). We solicited predictions across four sub-challenges, namely of (1) disease progression or (2) survival probability using PRO-ACT data and (3) disease progression or (4) survival probability using ALS registries data.

Challenge participants were asked to use patients’ data from the first 3 months of records to predict disease progression at 12 months or probability of survival at 12, 18 & 24 months. Disease progression was defined as the slope of the ALS Functional Rating Scale (ALSFRS or ALSFRS-R) between 3 and 12 months (see online methods). To avoid overfitting of algorithms¹⁹, data from the publicly available PRO-ACT database was used as the training set, and a separate set of data from six additional clinical trials, which were not previously publicly available, was used for validation. The registry data, which was never before made publicly available, was divided randomly (split evenly across the two registries) into training and validation sets.

The challenge introduced an additional requirement that predictions are limited in the number of used clinical features (Fig. 1). The requirement for limiting the number of features was highlighted by our clinical advisors to facilitate the application of predictive algorithms in natural clinical setting¹³. A preliminary analysis indicated the benefit from clustering (in terms of improved prediction accuracy) tends to increase when the number of features is restricted, and this effect plateaued at around 6 features (see Supplementary material part 4). Thus, participants were asked to write algorithms that first selected the most informative features (up to 6 features), and in a second step used only the data from these selected features to make progression or survival predictions (Fig. 1).

To enforce the separation between these steps we required participants to implement their algorithms in Docker containers (https://docs.docker.com) executed on the IBM Z cloud (https://www.ibm.com/it-infrastructure/z/capabilities/enterprise-security). Algorithms were thus run in a secured environment where the participants could not see the validation data or other jobs.

Comparative assessment of prediction methods

In general, the top performing teams in each sub-challenge (except for the registry progression sub-challenge) significantly outperformed the best baseline algorithm (Fig. 2, See Supplementary material part 5 for full results). Random forest was by far the most commonly used prediction method, with overall very good results, scoring among the first three ranks in each subchallenge (Fig. 2). It was the method used by the best performing team in the registry progression sub-challenge and by most algorithms ranked between the 2nd and 8th places across all four sub-challenges (Fig. 2). Another successful method included a Gaussian process regression model with an arithmetic mean kernel, which was the best performing for the two survival challenges but did not perform so well for the progression challenges.

The performance of the top teams, as was observed in other crowdsourcing challenges^20,21, varied substantially even if the same machine learning method was used, and depended to a large extend on data pre-processing, especially feature selection and representation of time-resolved features. Here, the best performing teams represented time-resolved information by a combination of simple summary statistics (for instance minimum, maximum and average of the feature values). For feature selection, the best teams evaluated the contribution of the complete set of selected features (as opposed to selecting them one by one), for example through evaluating sets of features by their combined information gain or by aggregating their weight along the paths of all trees in a random forest.

Survival predictions deserve special consideration, as one team substantially out-performed other participants in both sub-challenges. Survival predictions are particularly challenging due to the right-censored outcome variable survival time: data can be terminated by either patient death or by trial drop out. The standard Cox proportional hazards model, routinely used to explore the dependency between clinical features and survival, ignores such censored cases. In the current challenge, Yuanfang Guan generalized this right censored problem via a novel strategy GuanRank by complete ranking of training examples regardless of the censoring status, enabling GuanRank to be built-in to any base-learners (in this case, Gaussian Process Regression)²². This defined the outcome variable more precisely, which led to a more adequate training of regression models (here: Gaussian process models) explaining the algorithm’s superior prediction performance. Indeed, the strategy outperformed a standard Cox model by 20% accuracy²², and outperformed the respective second best team in 98.4% and 99.4% of the bootstrap samples (in the PROACT and registry subchallenges, respectively, compare Fig. 2).

Notably, for the sub-challenges running on the PRO-ACT database, the best performing algorithms significantly outperformed the winning method from the first challenge¹⁶. This is even more noteworthy given that the validation set was not randomly divided from the training set, but actually included the more difficult and realistic criteria of application of the algorithms to a data comprising of six new never used before trials.

Predictive clinical features

The challenge’s requirement of using only up to six features for prediction encouraged participants to identify the most informative features for prediction of disease progression or survival (Fig. 3). The most frequently used features across all sub-challenges were those that are well described in the literature as being strongly related to ALS prognosis: time from disease onset and total ALSFRS score^23,24,25. Age and gender were more informative for predicting survival rather than disease progression, in line with literature^{22,23,24,26,27,28}. While age and gender were generally predictive in this data as well, they were not specifically more predictive for one particular subgroup of patients, and therefore less relevant for stratification. Bulbar function is also known to be particularly informative for clinical outcome prediction^24,25,26 and ALSFRS questions 1 and 3 (bulbar functions) were selected more frequently compared to any other functional domains across all sub-challenges. Interestingly, ALSFRS question Q2 (salivation) was rarely used, in line with previous works indicating this question might be less well correlated with total ALSFRS scores and/or disease progression, due to effective available treatment options²⁹.

Data availability was another important factor for feature selection. For example, while weight or BMI, known to correlate with ALS prognosis^30,31, was frequently selected for predictions for the PRO-ACT dataset, it was recorded in only < 10% of the cases in the registry data, making it unusable for predictions (Fig. 3). Similar potential predictive benefit of features was observed with respect to features evaluating breathing capacity, which is the main cause of death for ALS patients³², but not routinely collected for registry patients. This suggests that clinicians could potentially improve their insight into individual patient prognosis by incorporating a few rather accessible measures into routine clinical monitoring (relevant features are bolded in Fig. 3). Indeed, since the challenge ran, clinics involved in the Italian registry used in this challenge have been careful to add measurements of patient weight, BMI and respiratory functions.

Clinically relevant patient clusters

In this challenge we used a crowdsourcing approach to explore different stratification schemes for ALS patients and use them to identify clinically significant patient sub-populations. The main goal of stratification, however, was not necessarily to impact prediction accuracy, and indeed we did not observe any consistent advantage (in terms of improved prediction accuracy) for clustering in any of the evaluation metrics. Instead, as described in the following, we developed novel strategies utilizing this crowdsourcing approach to reveal, across a large variety of methods, consistent clusters of patients, to support understanding the ALS pathology, improving clinical care and planning better, more efficient clinical trials. This sort of analysis can only be accomplished in the context of a large communal effort that allows comparison of independently designed algorithms working on sufficiently large datasets, as was the case in this challenge.

In comparison, individual clustering methods led to heterogeneous results and the numbers of clusters per method varied from 2 to more than 100. Integrating clustering across a large variety of methods on the other hand, revealed a small set of discrete consensus clusters (Fig. 4, see online methods). In both sub-challenges based on PRO-ACT, patients within consensus clusters were significantly more strongly connected than expected by chance (pairs of patients co-clustered significantly with FDR < 5% are depicted as edges in Fig. 4b). The stronger clustering effect in the PRO-ACT vs. the registry sub-challenges is likely a reflection of sample size (10,000 + vs. ~1,500 patients, respectively).

Overall, consensus clusters could be broadly regarded as classifying patients as slow progressing (“red”), fast progressing (“blue”), early stage (“purple”) or late stage (“green”). We chose to focus on clinical characteristics of the PRO-ACT progression sub-challenge, since these consensus clusters reached the highest level of statistical significance, but similar clusters were found across all sub-challenges.

In order to demonstrate how the identified clusters can be utilized in a clinical setup, we also examined whether new patients can be assigned into their respective cluster reliably. By using the most significant 20 features (FDR < 0.01%) we were able to re-associate 84% of the core patients (the patients closest to their respective cluster center, compare Fig. 4). Note, that the consensus clustering that was derived solely from the participants clustering, while the re-clustering is solely based on the values of the clinical features (see online methods). In other words, based on the commonalities of the methods generated by separate teams we were able to uncover a network of sub-groups that can be recreated with good success, strongly suggesting clinical usability in a field where patient stratification is pivotal to development of an effective ALS treatment.

In the following, we characterize these clusters and describe the relationship between patient strata and the most discriminative clinical features: One cluster, the “red” or “slow progressing” cluster included patients who, despite having experienced symptoms for a relatively long time (2.2 years from disease onset, on average) still maintained relatively high functional capabilities (average ALSFRS-R scores of 40.25 at the beginning of the clinical observation period), with functional impairments mostly limited to limbs and little bulbar or respiratory involvement. Accordingly, these patients had slow disease progression (annual average loss of −0.48 on the ALSFRS-R scale). These are the patients with the best prognosis for ALS and therefore merit closer investigation, as they might hold clues for clinical or biological features underlying enhanced resilience. First, only few (3%) of these “slow progressing” patients had bulbar onset. While bulbar onset had been frequently correlated with poorer prognosis^22,23,24, this cluster analysis suggests that bulbar patients will rarely be classified as a slow progressing disease presentation. A second important observation is low creatinine levels (average of 67 mcmol/L in the first three month of data collection, which is considered abnormally low level compared to the desired range of 74.3–107mcmol/L). Creatinine was reported as a predictor in the previous challenge¹⁶, and these results suggest that it might also serve as a predictor of this special case of patients with improved prognosis.

A very similar cluster was observed in the analysis of clusters derived from the PRO-ACT survival sub-challenge, with patients living with ALS for an average of 1.5 years while displaying little functional decline (average total ALSFRS-R score 39.4), largely intact breathing functionality (FVC 94.4% of normal on average) and an ALSFRS progression rate of −0.72 points/months.

To characterize clusters and the involved patients for clinical relevance, we compared all pairs of clusters (using ANOVA and t-test, resulting in multiple-testing corrected false discovery rates or FDRs) to assess which features had values specifically different between the clusters (Fig. 4c). We also examined the correlation between feature values and clinical outcome (progression rate) in each cluster to identify features which were important for prediction in some clusters but not in others. Based on the clusters clinical features analysis, the two features unique for this “slow progressing” cluster were urine creatinine and segmented neutrophils (Fig. 4b,c). Both of these features are therefore potential biomarkers of slow progression in ALS. Neutrophils were indeed found to be connected to ALS progression in some studies^33,34 but not others³⁵ and the stratification to sub-groups might shed further light on these results.

Another cluster that was superficially similar but in fact quite different was the “purple” or “early stage patients”. In our pair-wise comparison of features across clusters, the “early stage” and “slow progression” clusters (“purple” and “red” clusters respectively) were similar in having high ALSFRS-R and FVC scores in the first 3 months of assessment, indicating little functional impairment (Fig. 4b,c). However, the distinctive feature of patients in the “purple” cluster was the fact that they were early in their disease, on average ~10 months from symptom onset. Thus, the largely preserved functional state of these patients could be attributed to their early disease stage, rather than a slow progression rate. Indeed, with time these patients became fast progressors (−0.93 ALSAFRS-R points decline monthly on average) and had marginally lower than normal creatinine levels (72.25 mcmol/L on average on the first three months of data). Curiously, urine creatinine was correlated with disease progression (r² = 0.467 p = 0.01, see Supplementary material 7 for figures) only for this cluster, suggesting again that urine creatinine might be useful predictor of disease progression already early in the disease. The relationship between urine creatinine and serum creatinine and with both to muscle breakdown is not straight forward³⁶. The correlations to creatinine and Urine creatinine in the “red” (slow progressors) and “purple” (early patient) cluster suggests a scheme to enable early stage assessment of expected disease progression that can aid clinical trial recruitment and clinical care planning, both highly needed early in the disease.

A third “green” cluster of “late stage” patients included patients that were clinically advanced in their disease while being, on average, only 1.7 years from disease onset. These patients were severely disabled in all functional domains (average ALSFRS-R score of 29 points at the beginning of the clinical observation period) and were displaying early signs of breathing dysfunction (average FVC 77%) and shorter survival time (1.5 years on average from first recording time, with an average disease duration of 3.2 y).

These patients had a significant correlation between their ALSFRS “trunk” score (questions about dressing and hygiene and turning in bed) and disease progression (r² = 0.314, p = 0.001), which might be indicative of their advanced disability status. A second feature unique predictive for this cluster was Slow Vital Capacity (SVC) (Fig. 4c, examples for both features are available in Supplementary material part 7). These two classifiers should be taken into consideration clinically as they might be stronger indicators that the patients are reaching the final stages of their disease. The “trunk” score indeed represents complex functions that require the combined efforts of upper and lower motor neurons, and is therefore more clearly impaired later in the disease. SVC, while highly correlated with FVC, might become predictive later in the disease when respiratory function diminishes. They could also be used for clinical trial exclusion criteria to improve patient survival throughout the trial. Indeed, SVC was recently suggested as an indicator of respiratory failure in ALS³⁷.

The last cluster, “blue” or “fast progressing” patients, represents the most critical patients, who have been experiencing symptoms for only 10 months on average, but were already significantly impaired in all functional categories (average ALSFRS-R score of 35.75) at the beginning of the clinical observation period. These patients continue to have a very fast disease progression rate (ALSFRS progression slope of −1.05points/month) and an overall average survival of only 2.7 years from disease onset. Half of the patients in this cluster had bulbar onset (compared to 20% bulbar onset across all patients) and were more likely to have lower scores in all ALSFRS-R functional domains (leg, hand, trunk, bulbar and respiratory functions) and to have significant weight loss over the 1-year observation period (average of 5 kg lost per year).

Importantly, a similar yet more severe cluster was observed in the PRO-ACT survival data, with patients showing diminished disease states (initial ALSFRS-R of 24 ALSFRS points) and survival (average of 443 days from trial onset). Women were more likely to be fast progressors, making up 53% of this cluster, compared to 40% women across all clusters, even beyond their higher likelihood to have bulbar onset (32% of women). This cluster of patients with the poorest prognosis was also found in the registry consensus clustering for both the progression and the survival data.

When looking at features that were significantly different between almost all pairs of clusters, a noticeable observation is the discriminative power of ALSFRS question 1 (speech) and the combined “mouth” measure (averaging ALSFRS questions 1–3), highlighting again the important role of bulbar function in discriminating ALS consensus clusters. Overall this cluster helps integrate information, some already accepted (such as the association of bulbar onset and respiratory signs with poorer prognosis) and some suggested (the potential predictive roles of creatinine, urine creatinine, neutrophil and others) in a statistically supported unified framework, enabling discerning fast and slow progressing patients earlier in their disease course, as well as markers helping to identify patient reaching the final stages of their disease.

Discussion

Disease heterogeneity, and particularly large unexplained variance in disease progression rate and survival, is a hallmark feature of ALS disease³⁸. The hypothesized existence of distinct subgroups of ALS patients and their importance for ALS research and clinical care were highlighted in recent years by clinical trials in which only a subset of patients responded to the tested treatment^3,39. Despite some recent advances from genetic studies^40,41, there is currently no generally accepted stratification scheme for ALS patients, and more importantly, no consistent way to tailor survival and disease progression estimates to individual patients. The ALS Stratification Challenge was a global crowdsourcing effort aimed to develop new tools and insights for understanding patient subpopulations as they relate to ALS disease progression and survival. Thirty teams submitted algorithms to the challenge, with the winning solutions outperforming currently available prediction algorithms (adapted from the previous ALS Prediction challenge)¹⁶.

Compared to the ALS Prediction Challenge¹⁶, the current challenge included additional data and design features that made its resulting algorithms more robust and more relevant for clinical application, including the use of community-based data, the limitation on the number of features used for prediction, and the prediction of survival as well as disease progression. Another requirement highly relevant to the application for future clinical trials: the validation of algorithms on a dataset derived from completely independent clinical trials.

The current challenge invited participants to develop prediction algorithms either based on clinical trial data or from ALS registries containing data collected through ALS clinics. This is the first time that registry data was made publicly available, and the design of the challenge enabled us to directly compare performance of prediction algorithms when applied to PRO-ACT vs. registry data. We suggest a number of clinical features (Fig. 3), such as FVC and weight, which could be added to routine clinical assessment to potentially improve prediction accuracy and aid clinicians in predicting individual patient prognosis. Conversely, a number of features that could be found exclusively in the registries data, including common genetic mutation data and detailed onset site assessments, were both highly informative for prediction and should be considered for incorporation into ALS clinical trial screening or baseline assessment.

The main goal of this challenge was to uncover clinically meaningful subgroups of ALS patients, a challenging task since no known “ground truth” exists for ALS patient stratification. In this study we designed a novel “bottom-up” method for the identification of consensus patient clusters and the determination of discriminating features (see online methods). We did not make any a-priori assumptions regarding patient sub-populations, but instead defined patient clusters by a “consensus vote” based on participants’ submitted algorithms. Challenge participants were free to base their clustering on any subset of the available clinical data, choose any type of clustering method and any number of clusters. While clustering was not strongly related to immediate benefit in algorithms’ prediction accuracy, it did reveal consistent patterns of patient classification that are of great clinical interest and that was robust enough to enable classification of new patients with high degree of success. We suggest that these clusters could be used to identify subgroups of patients to guide further research of disease mechanisms and the planning of individual patient care programs and ALS clinical trials. As most clinical trials aim to enroll patients early in their disease to ensure a sufficient therapeutic window, separating patients that will be slow or fast progressors early is critical to enable correct clinical trial development and interpretation. Similarly, signed of end of life in patients can aid patient decision making and clinical care substantially.

The results of this study can help accelerate disease understanding in several ways: the stratification scheme suggested in this analysis offers novel insights that can integrated in the development of novel ALS therapeutics, aiding patient selection and result interpretation. Novel differentiating features such as creatinine or SVC can also help shed light on mechanisms related to disease progression, as well as mechanisms related specifically to end of life in ALS, a topic of critical clinical importance. Ideally, in the future, clinical data such as described here would be further integrated with data obtained from different types of high-throughput technologies (such as transcriptomic, genomics, metabolomics), allowing for the identification of predictive biomarkers for early diagnosis and treatments. Several such highly needed large scale initiatives are being developed now⁴².

Given the covert nature of ALS patient stratification, only a large-scale crowdsourcing effort, where different and independent teams apply diverse methods on a similar and large enough dataset can uncover such an underlying population structure free from a-priory assumptions. This communal approach indeed revealed a few sub-groups of patients which not only tended to cluster together across different algorithms but also displayed similar characteristics across different sub-challenges - clusters which may be the basis for a new stratification framework for ALS patients. Overall, we could significantly differentiate four patient groups: slow progressing patient and fast progressing patients, as well as patients with an average progression rate which were either early or late in their disease at the beginning of the recorded clinical observation period.

We examined the features most often chosen for prediction by the different challenge participants to assess their predictive power. This analysis revealed several features that could help classify patients into sub-groups. While some features are already well described, such as age, gender^22,23,24 and respiratory capacity.^{22,29,43,44,45} and other, such as limb motor function, specific ALSFRS-R scores⁴⁶ and creatinine^{16,29,47,48,49} and specific ALS staging scores were at least suggested as predictive, our results not only supported these findings but help put them in to a more usable and testable context. For example, creatinine was found to be predictive specifically for patients early in their disease. Segmented neutrophils were also suggested by our analysis as a relevant novel predictor, specifically for slower progressing patients, while SVC and ALSFRS “trunk” scores were associated with outcomes only specifically for patients later in their disease.

Overall, these results suggest a novel stratification scheme for ALS, with relevant classifiers and group-specific predictors. Stratification is highly needed to advance clinical development, for clinical care and to allow more personalized treatment. The tools and insights presented in this study can offer a first attempt for improvement in clinical trial development, selection and interpretation, accelerating the development of a much-needed treatment for a dire aliment such as ALS. More broadly they open the door to a new avenue of research using crowdsourcing approaches to uncover patient sub-groups unattainable by other means.

Methods

Datasets

The challenge made use of two datasets: data collected during clinical trials (PRO-ACT) and data collected during routine visits to ALS clinics (ALS registries). Both datasets included both static and time resolved measurements covering a wide range of data types and clinical measurements (full data dictionaries for both datasets can be found in the supplementary material). The time in which measurements were taken was noted in days relative to clinical trial onset or to first clinic visit available on records (time “delta”). Data was provided to challenge participants in tabular form. Each line represented the measurement of a single feature for a single patient at a particular time point Outcome measures (ALSFRS slope for progression and survival) for the training datasets were provided to challenge participants in separate tables.

PRO-ACT

The Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) Database was formed in 2011 by Prize4Life in collaboration with the Neurological Clinical Research Institute (NCRI) of Massachusetts General Hospital, the Northeast ALS Consortium, and with funding from the ALS Therapy Alliance. PRO-ACT contains data collected during phase II/III ALS clinical trial, volunteered by PRO-ACT Consortium members. Data from 17 clinical trials¹⁵ was used for the previous prediction challenge¹⁶ and later made publicly available via the PRO-ACT web platform (www.ALSdatabase.org). This data was used for algorithms’ training in the current challenge. Data from 6 additional clinical trials^{50,51,52,53,54,55}, never before made publicly available, was homogenized to comply with PRO-ACT data format standards and used for algorithms’ validation and assessment. Similar to the design of the previous challenge, the data used for validation was randomly subdivided into a test set (leaderboard data, 400 patients) and a validation set (1,488 patients). Challenge participant could submit versions of their code to be tested on the test set (limited to once per week per team to avoid overfitting) with the results published on a public leaderboard which served to provide feedback to participants. After the challenge was completed all submitted algorithms were assessed using the validation set data (see main manuscript, Fig. 2).

ALS registries data

Community-based data, never before publicly released, used in this challenge was collected through two ALS registries: 1) The Irish National ALS Register including data collected from ALS clinics in Ireland. 2) The Piemonte and Valle d’Aosta Register for ALS, including data collected from ALS clinics in Piemonte and Valle d’Aosta region of Italy. Data from the two registries was merged, harmonized and converted to the same format as the PRO-ACT data. Data was divided into a training (986 patients) and a validation (493) set. Stratification of data ensured a similarly proportional representation of patients from the Irish and Italian cohorts but was otherwise done randomly. Due to the small number of patients available in this dataset there was no test set (leaderboard data) available to challenge participants.

Definition of predicted outcome measures - “ground truth” calculation

Disease progression

ALS disease progression was defined as the slope of the total ALSFRS score, similar to the definitions used in the 2012 prediction challenge¹⁶. Briefly, ALSFRS was calculated as:

$$\frac{ALSFRS({t}_{2})-ALSFRS({t}_{1})}{{t}_{2}-{t}_{1}}$$

where t₁ and t₂ were the time points at the first and last clinic records in the relevant time period 92–365 days in which total ALSFRS scores were recorded. Whereas time data in the both challenge datasets was given in days, it was converted to months for the calculation of ALSFRS slope, according to the following: t_(months) = (t_(days)/365) *12). Patients had to have at least two clinical records in the relevant time period for their data to be used for validation. Participants were required to write algorithms that would predict ALSFRS slope between 3 and 12 months, based on data collected in the first three months of clinical records.

Survival

Survival was defined as time until death or until tracheostomy surgery (the introduction of invasive breathing tube- time where without intervention the patient was unlikely to survive), whenever this information was available. For patients who had no time of death logged on the clinical records the time of the last clinical visit was recorded in the survival records with a status indicating they were alive.

Patients whose final records were on or before day 90 (from onset of clinical trial or of clinical records) were excluded from the survival analysis. Challenge participants were required to write algorithms that predicted the likelihood of survival for each patient at three time points; 12, 18 and 24 months from the onset of clinical records.

Predictions assessment and scoring scheme

All methods for performance assessment were based on evaluating how close the algorithms’ predictions were compared to the respective ground truth, averaged across all patients. We used three different evaluation metrics to assess submitted algorithms’ performance in the disease progression sub-challenges. Two methods, the root mean square deviation (RMSD) and Pearson’s correlation (PCC) were used and described in the previous ALS prediction challenge¹⁶. In the current challenge we added a third evaluation metric: the concordance index (CI), which evaluates the similarity of ranks between predicted and actual ordered lists of measurements. The concordance index was the only metric used to evaluate performance in the survival prediction sub-challenge, since it is commonly used in survival analysis and is best suited for assessing censored data⁵⁶. When there is no censored data or ties, the c-index between a predicted list of survival times of n patients, $Pred=\{{p}_{1},{p}_{2},\ldots ,{p}_{n}\}$, and the actual survival times for the same n patients, $Actual\,=\,\{{a}_{1},{a}_{2},\ldots ,{a}_{n}\}$, is calculated as:

$$CI=\frac{2}{n(n-1)}\sum _{i < j}h(i,j)$$

where,

$$h(i,j)=\{\begin{array}{c}1,\,\mathrm{if}\,({a}_{i} > {a}_{j}\,\& \,{p}_{i} > {p}_{j})\,OR\,({a}_{i} < {a}_{j}\,\& \,{p}_{i} < {p}_{j})\\ 0,\,\mathrm{if}\,({a}_{i} > {a}_{j}\,\& \,{p}_{i} < {p}_{j})\,OR\,({a}_{i} < {a}_{j}\,\& \,{p}_{i} > {p}_{j})\end{array}$$

Please see supplementary data on details accommodating for the possibilities of ties and censored data.

The three performance assessment scores (or three CI values, in the case of survival prediction) were combined using Z-score transformation. We generated 100,000 random sets of predictions for each of the four sub-challenges, by shuffling the assignment of “ground truth” progression or survival values to patients. We then calculated the RMSD, PCC and CI scores for each of these random shuffles and used the resulting three sets of 100,000 values to calculate the mean and standard deviation (SD) of each of the three scoring metrics. These mean and SD values were used to calculate the Z-scores for the three assessment metrics for each of the submitted algorithms, given by:

$${z}_{score}=(score-mean)/SD$$

$${z}_{slope}={z}_{CI}+{z}_{PCC}-{z}_{RMSD},\,{z}_{survival}={z}_{12}+{z}_{18}+{z}_{24}$$

Outcomes of the survival sub-challenges were further validated by a receiver operating characteristic analysis (time ROC analysis). For the purpose of the challenge, time ROC was calculated using the R-package timeROC (https://cran.r-project.org/web/packages/timeROC/timeROC.pdf). On the plus side, the timeROC analysis incorporated the three survival time frames to be predicted (12, 18, 24 months), and performance is thus specific to the time frame in contrast to the analysis via CI, where the same set of predictions would evaluate the same irrespective to the time frame. However, the CI is better suited to represent the right censured nature of the survival data. The rank ordering of submitted algorithms for both survival sub-challenges was very similar when comparing the outcomes of the combined z-transformed CI index and the time ROC analysis. Results of the time ROC analysis could be found in the supplementary material.

Consensus clusters and determination of discriminating features

For each subchallenge, we integrated the teams’ clustering into a consensus clustering. First, we created a square patient x patient co-clustering matrix M where each matrix element m_i,j contained a normalized score that expressed how often the corresponding patients i and j appear together in a single cluster across all team submissions. The normalized score takes the size of the teams’ clusters into account such that two patients clustered together in a smaller cluster receive a larger weight. The scores m_i,j were calculated as the sum of contributions (eq. 1), across all submissions s, where patients i and j appear together in the same cluster:

$${m}_{i,j}=\,{\sum }_{s=1}^{size\_s}\mathrm{log}\,\frac{p}{{C}_{s}\,}if\,i,\,j\,in\,the\,same\,cluster\,in\,S;0\,otherwise,$$

where p is the total number of patients in the given sub-challenge and c_s is the number of patients in the cluster of submission s that includes both i and j (to normalize by cluster size).

We next determined the statistical significance of m_i,j values, i.e. to determine whether patients i and j are clustered together more often than expected by chance. We randomly assigned patients into clusters of the same size as contained in the original submissions and repeated this random sampling process 100 times. We then calculated m_i,j scores for each set of randomly assigned clusters, giving us 100 m_i,j scores for each i, j patient pair. We used these permuted m_i,j value to calculate a false discovery rate for any given m_i,j score derived from the participants’ submissions data, by assigning:

$$FDR(i,j)=\frac{100\ast {N}_{perm}\ast {N}_{true}}{{N}_{false}}$$

where N_perm is the number of permutations, N_true and N_false refer to the number of scores computed from submitted and permuted data, respectively, that were less or equal than the given score. This approach to FDR calculation was adapted from significance analysis of microarrays⁵⁷ and described in detail in previous work⁵⁸. Subsequently a FDR threshold of 5% was applied to identify significantly pairs of patients that significantly tend to be clustered together.

A graph of significant pairs was then plotted using the graphviz software package in order to visually determine a plausible number of clusters k. To generate the final patient clusters, we performed k-means clustering for k clusters based on the matrix M using average linkage and Pearson’s correlation metric. Thus, the correlation metric calculates the distance between patients i and j by comparing the corresponding rows, denoted as m_i and m_j, in the matrix M. This analysis identified three consensus clusters for both survival and the registry progression sub-challenges and four clusters for the PROACT progression sub-challenge.

Detection of features differentially distributed across patient groups

After finding a small number of consensus clusters of patients for each sub-challenge, we determined the features that discriminate between consensus clusters by statistical tests. Values of longitudinal variables (e.g., ALSFRS) were averaged across time, and we looked at average values of two different time periods consistent with the challenge’s overall design: 0–3 months and 3–12 months. We applied one-way analysis of variance (ANOVA) to continuous features (e.g. age of onset) and Fisher’s exact test to discrete features (e.g. site of onset). This leads to two matrices, C and D, capturing the values of the continuous and discrete features, respectively. Rows in these matrices correspond to the features, columns to the patients. In addition, each feature vector (containing the values for that feature across the p patients) was randomly permuted a hundred times and statistical tests were applied to the 100 randomized datasets to compute FDRs (analogous to the description above) separately for the results from Fisher’s exact test and ANOVA.

We then regard the relative ranks from the ANOVA and Fisher as new test values. From these, we compute FDRs now across discrete and continuous features together. By this integrated statistical assessment, we obtained FDR values for all features. Features were deemed statistically significant across all clusters if they exhibited FDR values of <5%. See below pseudo code (Fig. 5) for details.

Subsequently, t-tests were applied to identify all pairs of clusters where ANOVA determined that continuous features exhibited significantly differential values (FDR < 5%). The same permutations were applied to the feature vectors as above, such that we were able to transform t-test p-values into false discovery rates analogously. As before, we regard a feature as statistically significantly different between two clusters if such a comparison resulted in an FDR <5%. FDRs do not need multiple testing correction as it is already built into the permutation test. Note that we applied the ANOVA “trick” here: pairwise comparisons were only performed (and thus subject to multiple testing correction) for features with overall significance across all clusters. This leads to a less severe multiple testing correction and accordingly to a more sensitive test for the pairwise comparisons in contrast to the case were pairwise comparisons would have been performed and corrected for all features.

In order to summarize the differences between the clusters more succinctly, we aimed to integrate the pairwise comparisons, by collapsing the metrics based on pairs of clusters into a metrics based on individual clusters. We created a simple rank order statistics for each feature and cluster by counting pairwise comparisons in the following way: If a comparison between a pair of clusters (a, b) is significant, and the given feature displays higher values in a as compared to b, we increase the rank of a via r(a) = r(a) + 1 and decrease the rank of b via r(b) = r(b) − 1. After all pairwise comparisons are integrated, a heatmap is created (Fig. 4c) by linearly scaling each feature to [−1 … +1] range.

Assignment of patients to consensus clusters (re-clustering)

The purpose of re-clustering is to assign new patients to the previously defined consensus clusters. While the consensus clusters were derived based on the submissions of the individual challenge participants, the re-clustering is based on the values of the features, i.e. new patients are assigned to clusters where feature values of patients and average feature values of the consensus clusters match best. The features to be matched are those features previously determined to be discriminating. The procedure involves two steps, (1) normalization of feature values and (2) matching. Feature values are normalized by subtracting the average and dividing by the standard deviation of each feature, i.e. they are transformed into z-scores. Subsequently, patients are tested against the averaged and z-score normalized feature vector of each consensus cluster via the uncentered correlation. Each patient is then assigned to the cluster that resulted in the highest value of the uncentered correlation. A tenfold cross validation (10CV) has been applied to determine assignment accuracy. Here, 10% of all patients (CV test set) have been removed in each step of CV before differentiating features were determined. After doing this 10 times, accuracy was determined based on the assignment of patients across the 10 test sets.

References

Swinnen, B. & Robberecht, W. The phenotypic variability of amyotrophic lateral sclerosis. Nat Rev Neurol. 10, 661–70 (2014).
Article PubMed Google Scholar
Miller, R. G., Mitchell, J. D., Lyon, M. & Moore, D. H. Riluzole for amyotrophic lateralsclerosis (ALS)/motor neuron disease (MND). Cochrane Database Syst Rev. CD001447 (2002).
Edaravone (MCI-186) ALS Study Group. Safety and efficacy of edaravone in well defined patients with amyotrophic lateral sclerosis: a randomised, double-blind, placebo-controlled trial. Lancet Neurol. 16, 505–512 (2017).
Article Google Scholar
Ravits, J. M. & La Spada, A. R. ALS motor phenotype heterogeneity, focality, and spread: deconstructing motor neuron degeneration. Neurology. 73, 805–11 (2009).
Article PubMed Central PubMed Google Scholar
Logroscino, G. Classifying change and heterogeneity in amyotrophic lateral sclerosis. Lancet Neurol. 15, 1111–2 (2016).
Article PubMed Google Scholar
Kenna, K. P. et al. Delineating the genetic heterogeneity of ALS using targeted high-throughput sequencing. J Med Genet. 50, 776–83 (2013).
Article CAS PubMed Google Scholar
Turner, M. R. et al. Controversies and priorities in amyotrophic lateral sclerosis. Lancet Neurol. 12, 310–22 (2013).
Article CAS PubMed Central PubMed Google Scholar
Sabatelli, M., Conte, A. & Zollino, M. Clinical and genetic heterogeneity of amyotrophic lateral sclerosis. Clin Genet. 83, 408–16 (2013).
Article CAS PubMed Google Scholar
Brooks, B. R. El Escorial World Federation of Neurology criteria for the diagnosis of amyotrophic lateral sclerosis. Subcommittee on Motor Neuron Diseases/Amyotrophic Lateral Sclerosis of the World Federation of Neurology Research Group on Neuromuscular Diseases and the El Escorial “Clinical limits of amyotrophic lateral sclerosis” workshop contributors. J Neurol Sci. 124(Suppl), 96–107 (1994).
Article PubMed Google Scholar
Carvalho, M. D. & Swash, M. Awaji diagnostic algorithm increases sensitivity of El Escorial criteria for ALS diagnosis. Amyotroph Lateral Scler. 10, 53–7 (2009).
Article PubMed Google Scholar
Ganesalingam, J. et al. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups. PLoS One. 4, e7107 (2009).
Article ADS PubMed Central PubMed Google Scholar
Su, X. W. et al. Biomarker-based predictive models for prognosis in amyotrophic lateral sclerosis. JAMA Neurol. 70, 1505–11 (2013).
PubMed Google Scholar
Elamin, M. Predicting prognosis in amyotrophic lateral sclerosis: a simple algorithm. J Neurol. 262, 1447–54 (2015).
Article CAS PubMed Central PubMed Google Scholar
Marin, B. et al. Stratification of ALS patients’ survival: a population-based study. J Neurol. 263, 100–11 (2016).
Article PubMed Google Scholar
Atassi, N. et al. The PRO-ACT database: design, initial analyses, and predictive features. Neurology. 83, 1719–25 (2014).
Article CAS PubMed Central PubMed Google Scholar
Küffner, R. et al. Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression. Nat Biotechnol. 33, 51–7 (2015).
Article PubMed Google Scholar
Zach, N. et al. Being PRO-ACTive: What can a Clinical Trial Database Reveal About ALS? Neurotherapeutics. 12, 417–23 (2015).
Article CAS PubMed Central PubMed Google Scholar
Taylor, A. A. et al. Pooled Resource Open‐Access ALS Clinical Trials Consortium. Predicting disease progression in amyotrophic lateral sclerosis. Ann Clin Transl Neurol. 3, 866–875 (2016).
Article PubMed Central PubMed Google Scholar
Norel, R., Rice, J. J. & Stolovitzky, G. The self-assessment trap: can we all be better than average? Mol Syst Biol. 7, 537 (2011).
Article PubMed Central PubMed Google Scholar
Marbach, D. et al. Wisdom of Crowds for Robust Gene Network Inference. Nature methods 9, 796–804 (2012).
Article CAS PubMed Central PubMed Google Scholar
Rhrissorrakrai, K. et al. Understanding the limits of animal models as predictors of human biology:lessons learned from the sbv IMPROVER Species Translation Challenge. Bioinformatics. 31, 471–83 (2015).
Article CAS PubMed Google Scholar
Huang, Z. et al. Complete hazard ranking to analyze right-censored data: An ALS survival study. PLoS Comput Biol 13, e1005887 (2017).
Article PubMed Central PubMed Google Scholar
Magnus, T. et al. Disease progression in amyotrophic lateral sclerosis: predictors of survival. Muscle Nerve 25, 709–714 (2002).
Article CAS PubMed Google Scholar
del Aguila, M., Longstreth, W., McGuire, V., Koepsell, T. & Van Belle, G. Prognosis in amyotrophic lateral sclerosis: a population-based study. Neurology 60, 813–819 (2003).
Article PubMed Google Scholar
Pastula, D. M. et al. Factors associated with survival in the national registry of veterans with ALS. Amyotroph. Lateral Scler. 10, 332–338 (2009).
Article PubMed Google Scholar
Czaplinski, A., Yen, A. A. & Appel, S. H. Amyotrophic lateral sclerosis: early predictors of prolonged survival. J Neurol. 253, 1428–36 (2006).
Article PubMed Google Scholar
Paganoni, S. et al. Uric acid levels predict survival in men with amyotrophic lateral sclerosis. J. Neurol. 259, 1923–1928 (2012).
Article CAS PubMed Central PubMed Google Scholar
Chiò, A. et al. Piemonte and Valle d’Aosta Register for Amyotrophic Lateral Sclerosis. Amyotrophic lateral sclerosis outcome measures and the role of albumin and creatinine: a population-based study. JAMA Neurol. 71, 1134–42 (2014).
Article PubMed Google Scholar
Pinto, S., Gromicho, M. & de Carvalho, M. Sialorrhoea and reversals in ALS functional rating scale. J Neurol Neurosurg Psychiatry. 88, 187–188 (2017).
Article PubMed Google Scholar
Paganoni, S., Deng, J., Jaffa, M., Cudkowicz, M. E. & Wills, A. M. Body mass index, not dyslipidemia, is an independent predictor of survival in amyotrophic lateral sclerosis. Muscle Nerve 44, 20–24 (2011).
Article PubMed Central PubMed Google Scholar
Paganoni, S., Deng, J., Jaffa, M., Cudkowicz, M. E. & Wills, A. M. What does body mass index measure in amyotrophic lateral sclerosis and why should we care? Muscle Nerve 45, 612 (2012).
Article PubMed Central PubMed Google Scholar
Corcia, P. et al. Causes of death in a post-mortem series of ALS patients. Amyotroph Lateral Scler. 9, 59–62 (2008).
Article PubMed Google Scholar
Murdock, B. J. et al. Increased ratio of circulating neutrophils to monocytes in amyotrophic lateral sclerosis. Neurol Neuroimmunol Neuroinflamm. 3, e242 (2016).
Article PubMed Central PubMed Google Scholar
Murdock, B. J. et al. Correlation of peripheral immunity with rapid amyotrophic lateral sclerosis progression. JAMA Neurol (2017).
Chiò, A. et al. Amyotrophic lateral sclerosis outcome measures and the role of albumin and creatinine: a population-based study. JAMA Neurol. 71, 1134–42 (2014).
Article PubMed Google Scholar
Baxmann, A. C. et al. Influence of Muscle Mass and Physical Activity on Serum and Urinary Creatinine and Serum Cystatin C. Clinical Journal of the American Society of Nephrology: CJASN. 3, 348–354 (2008).
Article CAS PubMed Central PubMed Google Scholar
Andrews, J. A et al. Association between decline in slow vital capacity and respiratory insufficiency, use of assisted ventilation, tracheostomy, or death in Patients with amyotrophic lateral sclerosis. JAMA Neurol. (2017).
Bedlack, R. S. et al. How common are ALS plateaus and reversals? Neurology. 86, 808–12 (2016).
Article PubMed Central PubMed Google Scholar
Fiala, M., Mizwicki, M. T., Weitzman, R., Magpantay, L. & Nishimoto, N. Tocilizumab infusion therapy normalizes inflammation in sporadic ALS patients. Am. J. Neuro. Dis. 2, 129–139 (2013).
Google Scholar
Fogh., I. et al. Association of a Locus in the CAMTA1 Gene with Survival in Patients With Sporadic Amyotrophic Lateral Sclerosis. JAMA Neurol. 73, 812–20 (2016).
Article PubMed Central PubMed Google Scholar
Umoh, M. E. et al. Comparative analysis of C9orf72 and sporadic disease in an ALS clinic population. Neurology. 87, 1024–30 (2016).
Article CAS PubMed Central PubMed Google Scholar
Project MinE, A. L. S. Sequencing Consortium. Project MinE: study design and pilot analyses of a large-scale whole-genome sequencing study in amyotrophic lateral sclerosis. Eur J Hum Genet. 26, 1537–1546 (2018).
Article Google Scholar
Traynor, B., Zhang, H., Shefner, J., Schoenfeld, D. & Cudkowicz, M. Functional outcome measures as clinical trial endpoints in ALS. Neurology 63, 1933–1935 (2004).
Article CAS PubMed Google Scholar
Kollewe, K. et al. ALSFRS-R score and its ratio: a useful predictor forALS-progression. J. Neurol. Sci. 275, 69–73 (2008).
Article PubMed Google Scholar
Vender, R. L., Mauger, D., Walsh, S., Alam, S. & Simmons, Z. Respiratory systems abnormalities and clinical milestones for patients with amyotrophic lateral sclerosis with emphasis upon survival. Amyotroph. Lateral Scler. 8, 36–41 (2007).
Article PubMed Google Scholar
Chiò, A., Hammond, E. R., Mora, G., Bonito, V. & Filippini, G. Development and evaluation of a clinical staging system for amyotrophic lateral sclerosis. J Neurol Neurosurg Psychiatry. 86, 38–44 (2015).
Article PubMed Google Scholar
Bozik, M. E. et al. A post hoc analysis of subgroup outcomes and creatinine in the phase III clinical trial (EMPOWER) of dexpramipexole in ALS. Amyotroph Lateral Scler Frontotemporal Degener. 15, 406–13 (2014).
Article CAS PubMed Google Scholar
Chen, X. et al. An exploratory study of serum creatinine levels in patients with amyotrophic lateral sclerosis. Neurol Sci. 35, 1591–7 (2014).
Article CAS PubMed Google Scholar
Van Eijk, R. P. A. et al. Monitoring disease progression with plasma creatinine in amyotrophic lateral sclerosis clinical trials. J Neurol Neurosurg Psychiatry 89, 156–161 (2018).
Article PubMed Google Scholar
Cudkowicz, M. E. et al. EMPOWER investigators. Dexpramipexole versus placebo for patients with amyotrophic lateral sclerosis (EMPOWER): a randomised, double-blind, phase 3 trial. Lancet Neurol. 12, 1059–67 (2013).
Article CAS PubMed Google Scholar
Cudkowicz, M. E. et al. Ceftriaxone Study Investigators. Safety and efficacy of ceftriaxone for amyotrophic lateral sclerosis: a multi-stage, randomised, double-blind, placebo-controlled trial. Lancet Neurol. 13, 1083–1091 (2014).
Article CAS PubMed Central PubMed Google Scholar
Gordon, P. H. et al. Efficacy of minocycline in patients with amyotrophic lateral sclerosis: a phase III randomised trial. Lancet Neurol. 6, 1045–53 (2007).
Article CAS PubMed Google Scholar
Kaufmann, P. et al. Phase II trial of CoQ10 for ALS finds insufficient evidence to justify phase III. Ann Neurol. 66, 235–44 (2009).
Article CAS PubMed Central PubMed Google Scholar
Sorenson, E. J. et al. Subcutaneous IGF-1 is not beneficial in 2-year ALS trial. Neurology. 71, 1770–5 (2008).
Article CAS PubMed Central PubMed Google Scholar
Aggarwal, S. P. et al. Safety and efficacy of lithium in combination with riluzole for treatment of amyotrophic lateral sclerosis: a randomised, double-blind, placebo-controlled trial. Lancet Neurol. 9, 481–8 (2010).
Article CAS PubMed Central PubMed Google Scholar
Noren, D. P. et al. A Crowdsourcing Approach to Developing and Assessing Prediction Algorithms for AML Prognosis. PLoS Comput Biol. 2016(12), e1004890 (2016).
Article Google Scholar
Tusher, V. G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98, 5116–21 (2001).
Article ADS CAS PubMed Central PubMed Google Scholar
Marozava, S. et al. Physiology of Geobacter metallireducens under excess and limitation of electron donors. Part I. Batch cultivation with excess of carbon sources. Syst Appl Microbiol. 37, 277–86 (2014).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

We are grateful to the following people for their important assistance with this manuscript: The clinicians and researchers behind the Irish and Italian ALS registers and the pharmaceutical companies which provided data to the PRO-ACT dataset, that enabled this entire endeavor, the hundreds of participants on the crowdfunding effort that provided this challenge’s award. Prof. David Schoenfeld for his assistance with statistical considerations, and, of course, the solvers who participated in the challenge and the patients who inspired this effort. Data used in the preparation of this article were obtained from the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) Database. As such, the following organizations and individuals within the PRO-ACT Consortium contributed to the design and implementation of the PRO-ACT Database and/or provided data, but did not participate in the analysis of the data or the writing of this report: Neurological Clinical Research Institute at MGH, Northeast ALS Consortium, Novartis, Prize4Life Israel, Regeneron Pharmaceuticals Inc., Sanofi, Teva Pharmaceutical Industries Ltd.

Author information

Authors and Affiliations

Icahn School of Medicine at Mount Sinai, New York, NY, USA
Robert Kueffner
Teva Pharmaceuticals, Netanyah, Israel
Neta Zach
Prize4Life, Haifa, Israel
Maya Bronfeld
IBM Research, Yorktown Heights, NY, USA
Raquel Norel, Venkat Balagurusamy, Donna Dillenberger, Joshua Knight & Gustavo Stolovitzky
Massachusetts General Hospital, Boston, MA, USA
Nazem Atassi & Merit Cudkowicz
Information Engineering Department, University of Padova, Padova, Italy
Barbara Di Camillo
University of Turin, Turin, Italy
Adriano Chio
Pompeu Fabra University, Barcelona, Spain
Javier Garcia-Garcia
Institute of Neuroscience, Trinity College, Dublin, Ireland
Orla Hardiman
Sage Bionetworks, Seattle, WA, USA
Bruce Hoff, Lara Mangravite & Thea Norman
Accelerating NeuroVentures, Boston, MA, USA
Melanie L. Leitner
Amazon, Seattle, WA, USA
Guang Li
Zillow, Seattle, WA, USA
Liuxia Wang
Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA
Jinfeng Xiao & Jian Peng
Department of Information and Learning Technology, National University, Tainan City, Taiwan
Wen-Chieh Fang
Department of Computer Science and Information Engineering, National University, Tainan City, Taiwan
Chen Yang
Faculty of Information Technology, Monash University, Clayton, Australia
Huan-Jui Chang
Department of Human Genetics, McGill University, Montreal, Canada
Rached Alkallas
Ontario Institute for Cancer Research (OICR), Toronto, Canada
Catalina Anghel, Shadrielle Melijah G. Espiritu, Fan Fan, Julia Hopkins, Barbara Huang, Jouhyun Jeon, Christopher Lalansingh, Xihui Lin, Dorota Sendorek, Yu-Jia Shiah & David Wang
Departement d’Economie Ecole Polytechnique, Paris, France
Jeanne Avril, Thomas Larrieu & Violette Nahmias
Interdisciplinary Computing and Complex BioSystems (ICOS) research group, Newcastle University, Tyne, UK
Jaume Bacardit & Nicola Lazzarini
Veristat Inc, Southborough, MA, USA
Barbara Balser, John Balser, Robin Bliss, Jialu Cai, Bhavna Ahuja Nicole Corriveau, Eve Desplats, Jarrett Lowe, Adrienne O’Donnell, Susan Paadre, Jean-Karl Sirois, Zhiqing Xu & Li Zhang
Medical Research, Kfar Malal, Israel
Yoav Bar-Sinai
Department of Computer Science, Ben-Gurion University of the Negev, Beersheba, Israel
Noa Ben-David
Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Negev, Israel
Eyal Ben-Zion & Boaz Lerner
Analytica Laboratories, Hamilton, New Zealand
Anatoly Chernyshev
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City, Taiwan
Jung-Hsien Chiang & Hsih-Te Yang
Princess Margaret Cancer Centre, Toronto, Ontario, Canada
Davide Chicco
Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS, USA
Junqiang Dai, Stefan Graw, Alex Karanevich, Devin C. Koestler, Richard Meier, Rama Raghavan & Joseph Usset
MIT, Department of Mathematics, Cambridge, MA, USA
Yash Deshpande
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Joseph S. Durgin
Centre de Recherche en Economie et Statistique (CREST), Paris, France
Philippe Fevrier
Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA
Brooke L. Fridley
Program on Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
Adam Godzik
Institute of Informatics, University of Białystok, Ciołkowskiego, Białystok, Poland
Agnieszka Golińska, Aneta Polewko-Klim & Witold Rudnicki
Department of Engineering, University of Cambridge, Cambridge, UK
Jonathan Gordon
RTI International, Research Triangle Park, NC, Triangle Park, USA
Yuelong Guo
KU Leuven, Department of Public Health and Primary Care, Kortrijk, Belgium
Tim Herpelinck & Celine Vens
Department of Biochemistry, University of Colorado, Boulder, CO, USA
Jeremy Jacobsen
Origent Data Sciences, Inc., Vienna, VA, USA
Samad Jahandideh
National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Huairou, China
Wenkai Ji & Shihua Zhang
Stanford University, Center for Biomedical Informatics Research, Stanford, CA, USA
Kenneth Jung
Department of Statistics, Tel-Aviv University, Tel Aviv-Yafo, Israel
Michael Kozak
Helmholtz Zentrum München, Institute of Health Economics and Health Care Management, Munich, Germany
Christoph Kurz
Department of Bioinformatics, University of Bialystok, Ciolkowskiego, Bialystok, Poland
Wojciech Lesinski
Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, China
Xiaotao Liang & Shanfeng Zhu
Microsoft Research New England, Cambridge, MA, USA
Lester Mackey
School of Computer, Wuhan University, Wuhan, China
Wenwen Min
Computational Centre, University of Białystok, Ciołkowskiego, Białystok, Poland
Krzysztof Mnich & Witold Rudnicki
Children’s Mercy Hospital, Kansas City, MO, USA
Janelle Noel-MacDonnell
LinkedIn, Sunnyville, CA, USA
Ji Park
Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Pawińskiego, Warsaw, Poland
Witold Rudnicki
Department of Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
Ehsan Saghapour
Ceremade Universite Paris-Dauphine, Paris, France
Jean-Bernard Salomond
Université Paris-Est, Laboratoire d’Analyse et de Mathématiques Appliquées, Créteil, France
Jean-Bernard Salomond
Stanford University, Department of Statistics, Stanford, CA, USA
Kris Sankaran
Stanford University, Department of Electrical Engineering, Stanford, CA, USA
Vatsal Sharan
Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka
Dinithi N. Sumanaweera
Department of Computer Science, University of California, Irvine, CA, USA
Yeeleng S. Vang & Xiaohui Xie
Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
Dave Wadden
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Wing Chung Wong
Dept of Computer Science, Bren School of Information and Computer Sciences, University of California, Irvine, CA, USA
Xiaohui Xie
University of Pennsylvania, Philadelphia, PA, USA
Xiang Yu
University of Maryland, Baltimore, MD, USA
Haichen Zhang
Centre for Computational System Biology, ISTBI, Fudan University, Shanghai, China
Shanfeng Zhu

Author notes

A comprehensive list of consortium members appears at the end of the paper

Authors

Robert Kueffner
View author publications
You can also search for this author in PubMed Google Scholar
Neta Zach
View author publications
You can also search for this author in PubMed Google Scholar
Maya Bronfeld
View author publications
You can also search for this author in PubMed Google Scholar
Raquel Norel
View author publications
You can also search for this author in PubMed Google Scholar
Nazem Atassi
View author publications
You can also search for this author in PubMed Google Scholar
Venkat Balagurusamy
View author publications
You can also search for this author in PubMed Google Scholar
Barbara Di Camillo
View author publications
You can also search for this author in PubMed Google Scholar
Adriano Chio
View author publications
You can also search for this author in PubMed Google Scholar
Merit Cudkowicz
View author publications
You can also search for this author in PubMed Google Scholar
Donna Dillenberger
View author publications
You can also search for this author in PubMed Google Scholar
Javier Garcia-Garcia
View author publications
You can also search for this author in PubMed Google Scholar
Orla Hardiman
View author publications
You can also search for this author in PubMed Google Scholar
Bruce Hoff
View author publications
You can also search for this author in PubMed Google Scholar
Joshua Knight
View author publications
You can also search for this author in PubMed Google Scholar
Melanie L. Leitner
View author publications
You can also search for this author in PubMed Google Scholar
Guang Li
View author publications
You can also search for this author in PubMed Google Scholar
Lara Mangravite
View author publications
You can also search for this author in PubMed Google Scholar
Thea Norman
View author publications
You can also search for this author in PubMed Google Scholar
Liuxia Wang
View author publications
You can also search for this author in PubMed Google Scholar
Jinfeng Xiao
View author publications
You can also search for this author in PubMed Google Scholar
Wen-Chieh Fang
View author publications
You can also search for this author in PubMed Google Scholar
Jian Peng
View author publications
You can also search for this author in PubMed Google Scholar
Chen Yang
View author publications
You can also search for this author in PubMed Google Scholar
Huan-Jui Chang
View author publications
You can also search for this author in PubMed Google Scholar
Gustavo Stolovitzky
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

The ALS Stratification Consortium

Rached Alkallas
, Catalina Anghel
, Jeanne Avril
, Jaume Bacardit
, Barbara Balser
, John Balser
, Yoav Bar-Sinai
, Noa Ben-David
, Eyal Ben-Zion
, Robin Bliss
, Jialu Cai
, Anatoly Chernyshev
, Jung-Hsien Chiang
, Davide Chicco
, Bhavna Ahuja Nicole Corriveau
, Junqiang Dai
, Yash Deshpande
, Eve Desplats
, Joseph S. Durgin
, Shadrielle Melijah G. Espiritu
, Fan Fan
, Philippe Fevrier
, Brooke L. Fridley
, Adam Godzik
, Agnieszka Golińska
, Jonathan Gordon
, Stefan Graw
, Yuelong Guo
, Tim Herpelinck
, Julia Hopkins
, Barbara Huang
, Jeremy Jacobsen
, Samad Jahandideh
, Jouhyun Jeon
, Wenkai Ji
, Kenneth Jung
, Alex Karanevich
, Devin C. Koestler
, Michael Kozak
, Christoph Kurz
, Christopher Lalansingh
, Thomas Larrieu
, Nicola Lazzarini
, Boaz Lerner
, Wojciech Lesinski
, Xiaotao Liang
, Xihui Lin
, Jarrett Lowe
, Lester Mackey
, Richard Meier
, Wenwen Min
, Krzysztof Mnich
, Violette Nahmias
, Janelle Noel-MacDonnell
, Adrienne O’Donnell
, Susan Paadre
, Ji Park
, Aneta Polewko-Klim
, Rama Raghavan
, Witold Rudnicki
, Ehsan Saghapour
, Jean-Bernard Salomond
, Kris Sankaran
, Dorota Sendorek
, Vatsal Sharan
, Yu-Jia Shiah
, Jean-Karl Sirois
, Dinithi N. Sumanaweera
, Joseph Usset
, Yeeleng S. Vang
, Celine Vens
, Dave Wadden
, David Wang
, Wing Chung Wong
, Xiaohui Xie
, Zhiqing Xu
, Hsih-Te Yang
, Xiang Yu
, Haichen Zhang
, Li Zhang
, Shihua Zhang
& Shanfeng Zhu

Contributions

R.K., N.Z., R.N., T.N., L.M., G.S. and M.L.L. designed the challenge. O.H., M.C., N.A. and A.C. contributed and prepared the data and providing clinical insight, R.K., B.D., J.G.-G., R.N., L.W. and G.L. prepared the data and baseline algorithm, J.G.-G., R.K., B.H., V.B., J.K. and D.D. provided challenge data and technical support, N.Z. and R.K. managed the challenge execution. R.K., M.B., R.N. and N.Z. analyzed the results and wrote the paper. The ALS Stratification Consortium, J.X., W.-C.F., J.P., C.Y. and H-J.C. provided methodology. S.J. edited and reviewed portions of the paper.

Corresponding authors

Correspondence to Robert Kueffner or Neta Zach.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Info 1

Supplementary Info 2

Supplementary Info 3

Supplementary Info 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Kueffner, R., Zach, N., Bronfeld, M. et al. Stratification of amyotrophic lateral sclerosis patients: a crowdsourcing approach. Sci Rep 9, 690 (2019). https://doi.org/10.1038/s41598-018-36873-4

Download citation

Received: 17 July 2018
Accepted: 26 November 2018
Published: 24 January 2019
DOI: https://doi.org/10.1038/s41598-018-36873-4

This article is cited by

Ensemble-imbalance-based classification for amyotrophic lateral sclerosis prognostic prediction: identifying short-survival patients at diagnosis
- Fabiano Papaiz
- Mario Emílio Teixeira Dourado
- Joel Perdiz Arrais
BMC Medical Informatics and Decision Making (2024)
Leveraging process mining for modeling progression trajectories in amyotrophic lateral sclerosis
- Erica Tavazzi
- Roberto Gatta
- Barbara Di Camillo
BMC Medical Informatics and Decision Making (2023)
Stratification of responses to tDCS intervention in a healthy pediatric population based on resting-state EEG profiles
- Paulina Clara Dagnino
- Claire Braboszcz
- Aureli Soria-Frisch
Scientific Reports (2023)
Elevated peripheral levels of receptor-interacting protein kinase 1 (RIPK1) and IL-8 as biomarkers of human amyotrophic lateral sclerosis
- Jun Wei
- Min Li
- Yunhong Zha
Signal Transduction and Targeted Therapy (2023)
PRO-ACTive sharing of clinical data
- Neta Zach
- Melanie L. Leitner
Nature Biotechnology (2022)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.