To the Editor: Crowdsourced data science challenges can achieve in months what would take years through conventional research approaches. However, they remain largely untapped for underserved or critical biomedical challenges, such as treatment of malaria, in which computational modeling lags and accelerated innovation is urgently needed to combat the emergence of drug resistance.
The diversity and quality of computational solutions obtained in data science challenges1 combined with the rapid pace at which they are developed (typically weeks to months for a single challenge) could in principle accelerate research in fields in which data science expertise is highly underrepresented. Regarding disease-related research questions, data science crowdsourcing platforms have been used in only a limited way and for a very restricted number of diseases. For example, of the 19 ongoing challenges on Kaggle at the time of writing, only one is for a disease, cancer. Since the establishment of DREAM Challenges in 2007, not a single challenge has involved an infectious disease, whereas 15 have involved cancer. In this context, we believe that there is an untapped opportunity for the data science challenges model of innovation to expand beyond the areas to which it has been traditionally applied into new areas of need, such as neglected diseases. Here, we consider the case of emerging drug resistance to the anti-malarial drug artemisinin and present a call for participation in the Malaria DREAM Challenge for computational models of drug resistance.
The need for crowdsourced data science challenges in malaria
Malaria can serve as a clear example of a disease area in which data science challenges could accelerate the pace of research. First, despite numerous eradication efforts, malaria remains a global health challenge. Multiple reports of resistance to artemisinin, the last line of defense against multi-drug-resistant malaria, have emerged in recent years within Southeast Asia2. If artemisinin resistance were to spread to Africa, where most cases and deaths due to malaria occur, decades of progress and efforts in malaria eradication could be erased. Before artemisinin resistance reaches Africa, a concerted effort must be made to understand the mechanistic changes that malaria parasites undergo to obtain resistance and to determine what drugs might be most effective in combination with artemisinin derivatives to deter or counter further resistance. This calls for accelerated innovation for alternative drugs. Collaborative crowdsourcing through data science challenges can accelerate innovative solutions with modest funding investment.
Second, although investment in global health research, the availability of large-scale datasets and overall sharing of data have increased, the field of malaria research is lacking in data scientists and job openings. For example, a search on LinkedIn for data science jobs in malaria yields only 11 jobs, compared with 1,601 jobs for cancer-focused data scientists. As has happened in other challenges, such as the ALS DREAM Challenge3, a malaria-focused data science challenge could enable the field to tap into data scientists who do not ordinarily work on malaria and, with the right incentives, elicit interest in the disease. In addition, malaria research has historically not kept pace with modeling advances in machine learning, because it has failed to attract expertise outside the field, a trend also seen across other research areas of global health. For example, many data scientists are turning to cancer modeling, but it is difficult to identify any who have turned to malaria. Crowdsourced data science challenges open to a wider community of modelers are bound to inject new modeling ideas into the field, so that real-time discoveries can guide refined disease-control strategies.
Finally, even though data generation has grown over the past decade, many areas in malaria research lack what would qualify as ‘big’ data. Developing computational solutions that could work on ‘small’ biological datasets through crowdsourcing malaria challenges could drive innovation in data science with limited data and convince more communities that crowdsourced data science challenges are a good avenue for unpublished, difficult-to-create clinical datasets4.
Time for a malaria DREAM Challenge
Recently, we held the ‘DREAM of Malaria’ hackathon, a 1-week effort that brought together young scientists from various African countries, many with no expertise in malaria, to assess whether malaria transcriptomic data might be used to predict artemisinin resistance5. The hackathon demonstrated the potential for data scientists outside the current malaria research community to make valuable contributions in pre-publication exploratory data analysis5, and it highlighted an opportunity to bring a level of rigor inherent in community analysis that cannot be captured in a typical research project run by one group.
In late April, we launched the Malaria DREAM Challenge, which is open to anyone interested in contributing to the development of computational models that address important problems in advancing the fight against malaria6. The overall goal of this Malaria DREAM Challenge is to predict artemisinin drug-resistance levels for a test set of malaria parasites by using their in vitro transcription data and a training set consisting of published in vivo and unpublished in vitro transcriptomes (Fig. 1). The in vivo dataset consists of approximately 1,000 transcription samples from various geographic locations covering a wide range of life cycles and resistance levels, with other accompanying data such as patient age, geographic location and artemisinin combination therapy used7. The in vitro transcription dataset consists of 55 isolates, with transcriptional data collected at two time points (6 and 24 h post-invasion), in the absence or presence of an artemisinin perturbation, for two biological replicates, by using a custom microarray designed at the laboratory of M. T. Ferdig at the University of Notre Dame. Using these transcription datasets, participants will be asked to predict three different measures of resistance for a subset of the 55 in vitro isolate samples: 50% inhibitory concentration values (IC50), patient clearance half-life and categorical resistance state (resistant/sensitive). The Malaria DREAM Challenge could enable the field to gain insights into the mechanisms that underlie a lack of correlation between in vitro (IC50) and in vivo (clearance rate) measures of artemisinin drug response. This lack of correlation is a major drawback for the laboratory studies that are essential for understanding mechanisms of artemisinin resistance, because in vitro drug responses fail to capture in vivo parasite clearance rates8.
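As a rough illustration of the in vitro design described above, the following Python sketch enumerates one row per array under the assumption that every combination of isolate, time point, treatment and replicate was profiled. The isolate identifiers, treatment labels and the function name are hypothetical and are not taken from the challenge data, and the real dataset may contain fewer arrays if some combinations are missing.

```python
from itertools import product

# Design factors as stated in the text; the labels themselves are hypothetical.
N_ISOLATES = 55
TIME_POINTS_H = (6, 24)                     # hours post-invasion
TREATMENTS = ("untreated", "artemisinin")   # absence or presence of perturbation
REPLICATES = (1, 2)                         # biological replicates


def design_table(n_isolates=N_ISOLATES):
    """Return one dict per array, assuming a fully crossed design."""
    isolates = [f"isolate_{i:02d}" for i in range(1, n_isolates + 1)]
    return [
        {"isolate": iso, "time_h": t, "treatment": trt, "replicate": rep}
        for iso, t, trt, rep in product(
            isolates, TIME_POINTS_H, TREATMENTS, REPLICATES
        )
    ]


rows = design_table()
arrays_per_isolate = len(TIME_POINTS_H) * len(TREATMENTS) * len(REPLICATES)
print(arrays_per_isolate, len(rows))  # 8 arrays per isolate, 440 in total
```

Under this assumption each isolate contributes 8 arrays, giving at most 440 in vitro transcription profiles from which the three resistance measures would be predicted.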
Potential for impact in malaria and beyond
The internet is a powerful enhancer of human collaboration globally. Crowdsourced data science challenges enable a wider community to be engaged via the internet, including individuals not traditionally involved in a field, by lowering the entry barriers to a research subject through well-curated datasets and a community of interested data scientists who compete collaboratively. Furthermore, data science challenges marshal a large number of solutions to a single well-defined problem within a field, thus increasing the chance for new and better solutions to emerge. Crowdsourced data science challenge platforms, such as DREAM Challenges, also promote sharing of open-source code, which anybody can access and reuse after the challenge. A Malaria DREAM Challenge on emerging artemisinin drug resistance will engage perhaps the largest community of modelers to work simultaneously on the problem of drug resistance in the disease. Our experience suggests that this process will inspire new ways of problem-solving in the field.
We also contend that a Malaria DREAM Challenge could serve as a model for the potential of data science challenges to provide solutions to underexplored global health challenges in biomedicine, such as in neglected tropical diseases9. Indeed, the value of this challenge approach is further shown by the funding support for this challenge from the Bill & Melinda Gates Foundation as well as the Foundation’s exploration of crowdsourcing global health data as a mechanism to vastly accelerate learning and address difficult global health problems in which data and modeling can function as key drivers.
1. Saez-Rodriguez, J. et al. Nat. Rev. Genet. 17, 470–486 (2016).
2. Das, D. et al. N. Engl. J. Med. 51, e82–e89 (2009).
3. Küffner, R. et al. Nat. Biotechnol. 33, 51–57 (2015).
4. Khare, R. et al. Brief. Bioinform. 17, 23–32 (2016).
5. Ghouila, A. et al. Genome Res. 28, 759–765 (2018).
6. Malaria DREAM Challenge. Synapse https://www.synapse.org/#!Synapse:syn16924919/wiki/583955 (2019).
7. Mok, S. et al. Science 347, 431–435 (2015).
8. Amaratunga, C. et al. Lancet Infect. Dis. 12, 851–858 (2012).
9. Feasey, N. et al. W. Br. Med. Bull. 93, 1–5 (2010).
The data collection for the Malaria DREAM Challenge is funded by an NIH R21 grant to M.T.F. (AI103872-01AI). The Malaria DREAM Challenge has received funding support from the Bill & Melinda Gates Foundation. H3ABioNet was supported by the National Human Genome Research Institute (NHGRI) and the Office of The Director (OD), National Institutes of Health under award number U41HG006941. We are grateful to K. Davis (Center for Research Computing) for designing the figure presented in this paper.
The authors declare no competing interests.