Main

There remains a substantial burden from infectious disease in low-resource rural communities, not least as a consequence of malaria. After two decades of decline in prevalence, the disease is now increasing in 13 countries, with around 228 million malaria cases and 405,000 deaths globally each year. More than 90% of these cases are in Africa1. Diagnostic testing continues to underpin control and prevention strategies, primarily through the use of rapid, point-of-care, lateral flow immunoassays, which are affordable, sensitive, specific, user-friendly, rapid and robust, equipment-free and deliverable devices, meeting the World Health Organization (WHO) ASSURED criteria2.

Despite recent successes in testing, the 2019 WHO World Malaria Report1 highlights that a considerable proportion of populations living in rural areas still do not have access to prompt diagnosis, and emphasizes the need for rapid integrated diagnostic testing that can inform treatment and underpin elimination strategies. Current reports also point to the need for infectious disease diagnostics to become embedded in regional or national case management systems, with improved digital connectivity, enabling local access to surveillance data in remote communities3.

One challenge is that such digital connectivity must be compatible with the different levels of medical record keeping within low- and middle-income countries (LMICs), which may range from sophisticated internet-enabled systems in urban settings to more distributed and often fragmented infrastructure in low-resource rural districts (where paper records and registries are often considered the norm)4. Providing methods to enable connectivity between such rural communities and centralized medical facilities is particularly important1, as this information drives the flow of healthcare resources from governments, healthcare systems and charities5.

A second challenge is that data collected in rural settings are often reported and re-recorded as they travel through administrative structures, from village communities, local district offices and regional administrative centres to the health ministries of national governments. Thus, improving connectivity in decision making (as well as embedding trust in the recording, transfer and reporting of data) is important, as it informs the timing of national campaigns, such as regional mass drug administrations, when treating reservoirs of infectious diseases in local communities6.

Linking the collection of patient healthcare diagnostic data from testing with geospatial information in communities is also increasingly important across all healthcare systems, allowing disease prevalence to be mapped in real time and interventions to be focused rapidly (whether this be for endemic diseases such as malaria in low-resource communities or for seasonal flu in high-resource healthcare settings7). The development of systems that can be readily adapted to accommodate existing reporting mechanisms and can ensure that information is transmitted securely will increase trust in the recorded data that underpin intervention, treatment and prevention strategies.

Within LMICs, sub-Saharan Africa is at the forefront of developing and adopting digital technologies to improve healthcare8. For example, mobile phones are already widely used in the logistics and implementation of diagnostics, surveillance, prevention and treatment programmes and have the potential to be fully integrated into geotagged information systems. Examples of mobile health applications using artificial intelligence (AI) also already exist, including those for eye diagnostics in rural settings9. Furthermore, mobile phone-enabled data connectivity has been used for imaging and cloud-based analysis to standardize malaria detection using rapid diagnostic test (RDT) lateral flow immunoassays, most commonly detecting the histidine-rich protein-2/3 (hrp2/3) antigen present in the malaria parasite10.

However, problems with malaria diagnosis using RDTs have recently emerged. It was reported in 2016 that eight commercial RDTs gave sensitivities to detect malaria parasites of only ~75%, with a negative predictive value of ~75% compared to the gold-standard laboratory-based polymerase chain reaction (PCR)11. The WHO12 has recently attributed the poor performance of the current immunodiagnostic RDTs in part to the increasing prevalence of parasites with hrp2/3 gene deletions, causing a high prevalence of false-negative RDT immunodiagnostic results among symptomatic malarial patients13. The WHO now classifies hrp2/hrp3 deletions as a threat to malaria control and elimination and has called for disease monitoring using DNA-based assays14,15. However, these technologies require bulky and complex instrumentation and often need training in the interpretation of results where multiple test and control outputs are measured as part of a care pathway.

In this Article, we report a smartphone-based system that uses deep learning algorithms to provide local decision support for a multiplexed paper-based microfluidic lateral flow DNA molecular assay for Plasmodium sp. (Fig. 1). Our approach includes blockchain end-to-end connectivity to enable local healthcare staff to securely interpret and report the outcomes of diagnostic readouts. The system offers high diagnostic accuracy in the collection, interpretation and reporting of results, while improving the trustworthiness of data collection and transfer and providing end-to-end diagnostics for low-resource, rural settings. We illustrate its capabilities via field testing in rural communities in East Uganda.

Fig. 1: System architecture.

Schematic of the system architecture, which includes a mobile heater, an Android app to control the heater and manage the diagnostic (paper-based microfluidic) assay (including start/stop), as well as a backend engine comprising a blockchain network for secure and trusted connections and a deep learning model for decision support.

Standard web-based security approaches are currently not sufficient to support the transfer of safety-critical and sensitive data including medical diagnostic information over wireless networks16. Alternative security systems that provide data provenance and management often require specialized equipment and trained personnel, adding an increased burden to resource-limited settings17. By contrast, blockchain provides a low-power and low-cost approach to incorporate digital security into governed processes, improving interoperability while supporting immutability and high levels of trust by allowing access for only ‘endorsed’ transactions. In such methods, an individual’s information is stored in a tamperproof digital ledger, secured by a unique digital signature. Copies of this ledger can be held locally by healthcare workers in a blockchain network, which ensures that it remains accessible and consistent, and that each change to the network is verified by a consensual mechanism. Such methods are widely used in financial transactions and have recently found application in geospatial tracking of individuals’ interactions during the COVID-19 pandemic7. They have also been used before for medical data-sharing schemes in well-resourced settings to alleviate security and privacy issues18,19,20.
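The tamper-evidence property described above can be illustrated with a minimal hash-linked ledger. This is a sketch under stated assumptions, not the Hyperledger-based network used in this work: the record fields (`test_id`, `result`) and function names are hypothetical, and digital signatures, endorsement and consensus are deliberately left out.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """SHA-256 digest over the block contents and the previous block's hash."""
    body = json.dumps({"index": index, "payload": payload, "prev": prev_hash},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(ledger, payload):
    """Append a record, linking it to the hash of the previous block."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    index = len(ledger)
    ledger.append({"index": index, "payload": payload, "prev": prev_hash,
                   "hash": block_hash(index, payload, prev_hash)})


def is_consistent(ledger):
    """Recompute every hash and check each link; False if any record was altered."""
    prev_hash = "0" * 64
    for block in ledger:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["payload"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, {"test_id": "QR-0001", "result": "negative"})  # hypothetical records
append_block(ledger, {"test_id": "QR-0002", "result": "invalid"})
print(is_consistent(ledger))                 # True
ledger[0]["payload"]["result"] = "positive"  # simulated tampering
print(is_consistent(ledger))                 # False
```

Because each block's hash covers the previous hash, altering any earlier record invalidates the chain from that point on, so any holder of a copy can detect the change.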

The paper-based microfluidic diagnostic test we developed differentiates the endemic malaria-causing parasitic species in East Africa, Plasmodium falciparum, from all other parasitic species that cause malaria21, enabling informed species-specific therapy or, in the future, surveillance. Our species-specific DNA-based diagnostic devices closely align with the ASSURED criteria of the WHO2 and have been designed to integrate with wireless communication systems through existing cellular networks, without additional requirements. By using a common, secure protocol (GitHub OAuth22), which is open-source and independent of the manufacturer, we can provide information exchange that is flexible (including images and metadata) and capable of integration into existing testing and reporting systems (so as to be accessible securely through devices such as smartphones, basic computers and mobile apps). Our system can be readily adapted to other sources of data input, including those for other infectious and chronic disease co-morbidities. It can also be used to input information into existing digital health management systems being used either locally or nationally23. One potential example is linking our mobile platform to DHIS2, an open-source platform used extensively for healthcare-related digital information.

Diagnostic system

The key components of our diagnostic system (Fig. 1) are a mobile phone-controlled heater for DNA amplification through isothermal heating, a paper microfluidic chip for DNA testing21, a blockchain architecture and an AI component for decision support (Fig. 2).

Fig. 2: System design.

a, The assembled device, showing the phone used to supply power, control the assay conditions (on/off, start/stop and temperature), collect results, communicate with the cloud, analyse data and provide geotagging. The diagnostic chip is shown, inserted into the heating element. The whole device, including the mobile phone, is lightweight (<500 g) and can be held in one hand, with the potential to enable diagnostics to be delivered anywhere (without the need to transport equipment, for example). b, Open section view of the device and associated circuit. The numbered parts, respectively, are (1) the casing and main body of the device, (2) the aluminium band for receiving the diagnostic device and conducting heat for the nucleic acid amplification assay, (3) circuit components including a microcontroller, heater controller and power supply unit, and (4) the external port for thermal calibration. c, The plastic cartridge including a microfluidic circuit with chambers for the LAMP reaction and lateral flow strips for readout, as well as the QR code for traceability. The dashed lines outline the cropped area for analysis by AI, with the test and control lines indicated (see Supplementary Fig. 6 for details).

The heater’s performance was characterized using a thermocouple to validate the temperatures sensed and applied, with different target temperatures. Figure 3 shows that the errors between the real temperature and the target temperatures were all within ±0.5 °C (standard deviation) at 40, 65, 75 and 90 °C. A 10,000-mAh power bank could be used to provide more than the 9 h of the phone’s battery life, if required, in the absence of mains power supply.

Fig. 3: Mobile heater characterization.

The temperature was recorded for different target temperatures, 90 °C (purple), 75 °C (blue), 65 °C (red) and 40 °C (black). The temperature decreased when the battery power became limited. The lifetime of the phone’s batteries was dependent upon whether other functionalities were being used (for example, Wi-Fi connectivity). If required, the battery lifetime of the mobile phone was extended by use of a hand-held battery pack. Inset: zoom-in on the temperature ‘ramping’ up, demonstrating the effectiveness of the control of the proportional, integral and derivative algorithm (axes are the same as in the main panel). Heating to 65 °C took 10 min (600 s), providing the ability to run a full LAMP assay in <1 h (including sample processing21).
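The proportional, integral and derivative control mentioned in the caption can be sketched as a short simulation. This is an illustration only: the gains (`kp`, `ki`, `kd`), the first-order thermal model and its constants (`max_rise_c`, `tau_s`) are assumptions chosen for the example, not the parameters of the device, so the simulated ramp time does not reproduce the measured 10 min.

```python
def simulate_pid_heater(target_c, seconds=1800, dt=1.0,
                        kp=0.2, ki=0.005, kd=0.1,
                        ambient_c=25.0, max_rise_c=80.0, tau_s=120.0):
    """Simulate a PID-controlled heater; returns the temperature trace in deg C.

    All gains and plant constants are illustrative assumptions.
    """
    temp = ambient_c
    prev_temp = temp
    integral = 0.0
    trace = []
    for _ in range(int(seconds / dt)):
        error = target_c - temp
        derivative = -(temp - prev_temp) / dt       # derivative on the measurement
        raw = kp * error + ki * integral + kd * derivative
        duty = min(1.0, max(0.0, raw))              # heater duty cycle, clamped to 0..1
        if raw == duty:                             # anti-windup: integrate only when unsaturated
            integral += error * dt
        prev_temp = temp
        # First-order plant: heating scales with duty, loss scales with (temp - ambient)
        temp += dt * (max_rise_c * duty - (temp - ambient_c)) / tau_s
        trace.append(temp)
    return trace


for target in (40.0, 65.0, 75.0, 90.0):
    final = simulate_pid_heater(target)[-1]
    print(f"target {target:.0f} C -> simulated steady value {final:.2f} C")
```

The integral term is what removes the steady-state offset that a purely proportional controller would leave, which is the behaviour needed to hold the setpoint within a tight tolerance.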

The performance of a blockchain network (Fig. 4), including latency and maximum throughput, can influence user experience when uploading diagnostic tests onto the cloud for data provenance and long-term preservation. Evaluation was targeted at these two main functions of the blockchain network and was measured against the benchmark Hyperledger Caliper 0.2.0. The test environment was an Ubuntu 18.04 virtual machine with 4-GB RAM and a four-core processor. The test was carried out with a Caliper 2-organization-1-peer test model and the whole process included 12 rounds (6 rounds for each transaction) with different numbers of transactions and send rates. Table 1 shows that the maximum throughput of the blockchain was ~10 transactions per second, and the system did not lose any data when tested under conditions involving high send rates (Supplementary Table 1 provides the system resource usage during the test).

Fig. 4: System architecture of the blockchain network.

The business network archive (BNA) file was packed with the model file, the script file, the access control file and the query file, and was deployed to a Fabric Runtime. Users’ access to the blockchain network was from a web browser on a standard desktop or laptop computer or through the mobile app. OAuth 2.0 provided the authentication service. The users’ network cards containing private and public keys were stored in their wallet.

Table 1 Blockchain performance evaluation results

A dataset with five categories containing 92 test images was used for training the AI decision support tool. These images comprised examples collected from the loop-mediated isothermal amplification (LAMP) diagnostic tests performed in the laboratory and included 11 ‘1N2P’ (one negative ‘N’ and two positive ‘P’ test lanes), 13 ‘1P2N’, 23 ‘double-positive’, 15 ‘negative’ and 30 ‘invalid’ tests (as defined later). This library of results was used to test the accuracy of the convolutional neural network (CNN) model (classification credentials are provided in Supplementary Table 2). The test dataset and the training dataset were independent of each other and generated randomly (Methods). The sparse categorical cross-entropy loss function, which can be presented as equation (1), was used to evaluate the performance of the tool:

$$L_i = -\sum_{j} t_{i,j}\,\log(p_{i,j})$$
(1)

where $i$ indicates the samples and $j$ the class (1–5), enabling the loss value $L_i$ to be calculated using $p_{i,j}$ as the likelihood of prediction and $t_{i,j}$ the true value. The whole training process included 20,000 steps (equivalent to 500 epochs).
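As a worked illustration of equation (1), the loss for a single sample can be computed directly. Because the true value is one-hot, the sum reduces to the negative log of the probability assigned to the true class. The probabilities below are hypothetical, and the classes are indexed 0–4 here rather than 1–5.

```python
import math


def sparse_categorical_cross_entropy(true_class, probabilities):
    """Loss L_i for one sample: -sum_j t_ij * log(p_ij) with one-hot t_ij.

    t_ij is 1 only for the true class, so the sum reduces to -log(p_i,true).
    """
    eps = 1e-12                                   # guard against log(0)
    return -math.log(max(probabilities[true_class], eps))


# Hypothetical softmax outputs over the five classes
probs = [0.05, 0.80, 0.05, 0.05, 0.05]
print(sparse_categorical_cross_entropy(1, probs))   # -log(0.8), about 0.223
```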

Figure 5a,b demonstrates the efficiency of training convergence (with an accuracy of 97.83%) with a low loss (0.16). The confusion matrix (Fig. 5c) shows that three of the diagnostic categories demonstrate 100% accuracy (1N2P, invalid and negative), while 8% of 1P2N were wrongly classified as invalid and only 4% of the double-positive cases were mistakenly predicted as 1N2P.
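The reported figures can be cross-checked with simple arithmetic on the class counts given earlier. The sketch below assumes that exactly one 1P2N image and one double-positive image were misclassified; this is an inference from the stated percentages (1/13 ≈ 8%, 1/23 ≈ 4%) rather than a number given in the text.

```python
classes = ["1N2P", "1P2N", "double-positive", "negative", "invalid"]
counts = {"1N2P": 11, "1P2N": 13, "double-positive": 23, "negative": 15, "invalid": 30}

# confusion[actual][predicted]; start from a perfect classifier
confusion = {a: {p: 0 for p in classes} for a in classes}
for name, n in counts.items():
    confusion[name][name] = n

# Assumed misclassifications (one image each), inferred from the reported 8% and 4%
confusion["1P2N"]["1P2N"] -= 1
confusion["1P2N"]["invalid"] += 1
confusion["double-positive"]["double-positive"] -= 1
confusion["double-positive"]["1N2P"] += 1

total = sum(counts.values())
correct = sum(confusion[c][c] for c in classes)
overall_accuracy = correct / total
per_class = {c: confusion[c][c] / counts[c] for c in classes}

print(f"{correct}/{total} = {overall_accuracy:.2%}")   # 90/92 = 97.83%
```

Under this assumption, the overall accuracy of 90/92 matches the reported 97.83%.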

Fig. 5: AI performance.

a,b, Accuracy (between 0 and 1) (a) and loss (defined in equation (1)) (b) over the epochs of the AI training process (the red trace provides results for the training dataset and the blue trace for the validation dataset). c, Confusion matrix of the test results for the CNN model, representing the predicted label and the actual label of every test image. The background colour of each cell of the matrix represents the number of images that were classified into that case (darker indicating more images), and the number in each cell is the relative success of predictions in that case (where 1.0 represents 100%). d, Precision–recall curves for our CNN (blue) and the SSD ResNet50 (red) to compare their predictive abilities, with recall measured as TP/(TP + FN) and precision as TP/(TP + FP), where TP are the true-positive predictions, FN the false-negative predictions and FP the false-positive predictions. If the confidence level of a prediction exceeds a threshold (for example, 0.8), the result is deemed a positive case; if not, it is a negative case. If the prediction matches the true label of the input then the output is true; if not, it is false. The areas under the curve (AUC) for the CNN and the SSD ResNet50 are 0.993 and 0.983, respectively.
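The definitions of precision and recall given in the caption can be made concrete with a short sketch. Each prediction is represented as a pair of a confidence value and a flag for whether it matches the true label; the sample values are hypothetical and are not taken from the test dataset.

```python
def precision_recall(predictions, threshold):
    """predictions: list of (confidence, is_correct) pairs for individual test images."""
    # Positive = confidence above threshold; true = prediction matches the label
    tp = sum(1 for conf, ok in predictions if conf > threshold and ok)
    fp = sum(1 for conf, ok in predictions if conf > threshold and not ok)
    fn = sum(1 for conf, ok in predictions if conf <= threshold and ok)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model outputs: (confidence, prediction matches the true label)
outputs = [(0.99, True), (0.95, True), (0.85, False), (0.70, True), (0.60, False)]
for threshold in (0.5, 0.8, 0.9):
    print(threshold, precision_recall(outputs, threshold))
```

Sweeping the threshold and plotting the resulting pairs gives a precision–recall curve of the kind shown in panel d.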

In a diagnostic context, any invalid classification has only minor repercussions if it is identified immediately, as the test is being performed at the point of care and can simply be repeated. Such an event translates into only minor delays (as the assay is rapid, at <1 h from ‘sample to answer’) and marginal increases in costs. For example, the mislabelling of double-positive tests, where a patient with a P. falciparum infection is detected but does not obtain a positive Pan test, will result in a prompt for the operator to repeat the test, with limited impact on the patient’s outcome (the Pan primers cover all Plasmodium species, including P. falciparum, so any patients positive for P. falciparum should be positive for Pan). Alternatively, in cases where a test is misclassified as invalid or a double positive is misclassified as 1N2P, the patient would still be given the correct treatment. In all cases, as long as the test result is immediately available and the healthcare practitioners are informed by the decision support tool, any repetition of the assay will not result in a delay to treatment of >1 h. Thus, in both cases, the system’s trustworthiness is not negatively impacted.

When compared to previously demonstrated approaches in decision support10, our CNN model is able to efficiently and accurately provide outputs that can be trusted. Furthermore, it does so using smartphone edge computing that does not rely upon connectivity to the cloud, thus making it more suitable for use in rural settings and inherently more secure. Interpretation of the AI output provides a prompt that supports the practitioners’ ability to understand what care pathway or treatment/action is suitable for each result in all possible eventualities. There is no further interpretation required by the practitioner to estimate possible misclassification probabilities or errors, enforcing high explainability with no need for transparency in the decision taken by the algorithm, so providing dimensions that translate to accountable, reliable and trustworthy (ART) principles24.

Compared to the state of the art in malaria cloud-based diagnostics10, our approach greatly increases accuracy to 97.83%. It should also be noted that the use of AI decision support architectures allows the training to be continuously enhanced, thus providing further improved decision support capabilities over time. This would be beneficial for new tests that do not attain the same level of accuracy using laboratory-based training sets. In this context, the use of blockchain technology also ensures that images and results can be used for future training while abiding by stringent privacy constraints.

To validate the applicability of the platform, we performed field testing in a rural community in Uganda (as described in the Methods and elsewhere21). Information on the diagnostic device manufacturing and logging was recorded on the mobile phone in the UK, before arrival in Uganda, and linked via a QR code printed on each device. The operator in Uganda first scanned the QR code before entering patient information (most commonly as an anonymized ID number), and then again before performing the sample nucleic acid amplification in the heater (Supplementary Fig. 1 and Supplementary Video 1). The analyst also scanned the QR code to record the results manually (as part of our internal validation protocols).

To further demonstrate the versatility of the blockchain/AI interface in enabling different AI strategies to be applied to the platform, the system was modified to detect each individual DNA-based lateral flow result, assign one of three classes (positive if two lines are found, negative if only one is read, invalid if none is present) and then combine these outcomes to reveal an overall readout, providing a decision support prompt for the result of the test as information on the detection or not of Plasmodium (Supplementary Fig. 2).
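The per-strip logic described above can be sketched as follows. The three-class rule for a single strip follows the text; the strip layout (control, P. falciparum, Pan) and the rules for combining strips into an overall prompt are assumptions made for illustration, as the actual combination logic is given in Supplementary Fig. 2.

```python
def classify_strip(lines_detected):
    """Two lines -> positive, one line -> negative, no line -> invalid."""
    if lines_detected >= 2:
        return "positive"
    if lines_detected == 1:
        return "negative"
    return "invalid"


def overall_readout(control_lines, pf_lines, pan_lines):
    """Combine three strips into one prompt. The combination rules are assumptions."""
    control = classify_strip(control_lines)
    pf = classify_strip(pf_lines)
    pan = classify_strip(pan_lines)
    # Assumed: any unreadable strip or a failed human-gene control voids the test
    if "invalid" in (control, pf, pan) or control != "positive":
        return "invalid: repeat test"
    # Pan primers cover P. falciparum, so Pf-positive with Pan-negative is inconsistent
    if pf == "positive" and pan == "negative":
        return "inconsistent: repeat test"
    if pan == "positive":
        return "Plasmodium detected"
    return "Plasmodium not detected"


print(overall_readout(control_lines=2, pf_lines=2, pan_lines=2))
print(overall_readout(control_lines=2, pf_lines=1, pan_lines=1))
print(overall_readout(control_lines=2, pf_lines=0, pan_lines=2))
```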

The ability of the platform to support different AI systems was validated by comparing an SSD ResNet50 neural net to obtain diagnostic test results with our CNN model. Both showed excellent performance in giving diagnostic predictions. The SSD ResNet50 and our CNN model have different advantages in analysing the diagnostic results. The CNN is both simpler and faster, while the SSD ResNet50 can provide more information, such as the result on each strip. Figure 5d provides the precision–recall curves for our CNN model and for the SSD ResNet50.

A total of 40 tests were carried out on 40 school children in a village setting. Only one test was incorrectly labelled by the model, which was a test on which the positive control (human gene) was labelled as negative (it should have been positive and thus valid). The model was able to correctly label 11 tests as invalid where experimental outcomes had been compromised (there were three reasons for the tests to be categorized as ‘invalid’: most commonly when channels were blocked and no sample reached the strips, when test strips did not show the control lane, or when a control assay did not provide the expected result).

In future device iterations, when mass manufacturing processes such as moulding can be used, we would expect these issues to be greatly reduced. Importantly, this ‘invalid’ test outcome validates that our decision support system can identify such errors and inform the user if a test needs to be repeated. This can be performed readily in practice, at the point of care, so contributing to an enhancement of ‘trust’ in the technology by determining the efficacy of the test and distinguishing between failures and valid cartridges, so further improving the explainability of the model’s decision.

All results were unblinded retrospectively at The University of Glasgow against independent manual recording to avoid potential bias, but also to compare the results with the gold-standard PCR assay results (Supplementary Methods). Of the 28 tests that were correctly assigned and valid, 16 were true positives (positive for the manually recorded test, the blockchain records and real-time PCR), six were true negatives, three were false negatives and three were false positives (with respect to the gold standard). The results of all tests are provided in Supplementary Data 1. The blockchain implementation ensured the security of transactions, opening up the possibility for integration into surveillance databases, while maintaining the required safety around data privacy.
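For reference, standard diagnostic metrics can be derived from the four counts reported above. These derived values are simple arithmetic on the reported numbers and are not stated in the text; with only 28 valid tests they carry wide uncertainty and should be read as indicative only.

```python
tp, tn, fn, fp = 16, 6, 3, 3          # counts reported above for the 28 valid tests

sensitivity = tp / (tp + fn)          # 16/19
specificity = tn / (tn + fp)          # 6/9
ppv = tp / (tp + fp)                  # 16/19
npv = tn / (tn + fn)                  # 6/9
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 22/28

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("accuracy", accuracy)]:
    print(f"{name}: {value:.1%}")
```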

Integration and trustworthiness

Globally, the number of smartphone users grew from 2.5 billion to 3.2 billion between 2016 and 2019, and this number has been predicted to rise to around 3.8 billion worldwide in 2021 (ref. 25). It is also anticipated that sub-Saharan Africa will remain the fastest-growing region, with growth of ~5% and an additional 167 million subscribers over the period to 2025. Smartphone penetration in sub-Saharan African countries continues to rise in the general population, with Uganda reaching 23% and the local Ministry of Health using smartphone apps to provide frontline health workers with access to patient healthcare records26. As the number of individual mobile connections increases, Internet of Things (IoT) connections (now approaching nine billion devices globally) will also expand rapidly. IoT is already predicted to be one of the principal vehicles driving improvements in global healthcare provision8.

It is also predicted that increasing the versatility of smartphones will greatly reduce the costs of digital interventions when compared with traditional methods. Mobile health innovations are actively being developed for a wide variety of use cases5,27. However, there is still much room for improvement in terms of latency-tolerant solutions that do not depend on continuous network support, as there are still many geographic ‘dead zones’ with limited or no cellular service. We integrated this latter concern in the design of our system, which we ensured had a facility to enable all transactions to be stored in the mobile phone until a cellular service was available. Stakeholders in sub-Saharan Africa, including telecommunications industries and government agencies, continue to demonstrate a keen interest in this activity, with an overarching aim to address cost and infrastructure challenges.
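The store-and-forward behaviour described above, in which transactions are held on the phone until a cellular service is available, can be sketched as a simple queue. The class name and the `send` and `is_online` callables are hypothetical stand-ins; the actual app's upload and connectivity checks are not described in the text.

```python
from collections import deque


class StoreAndForwardQueue:
    """Hold transactions locally and submit them once a network is available."""

    def __init__(self, send, is_online):
        self._pending = deque()
        self._send = send              # hypothetical callable that uploads one transaction
        self._is_online = is_online    # hypothetical callable returning True when connected

    def submit(self, transaction):
        """Queue a transaction and try to upload; returns the number sent."""
        self._pending.append(transaction)
        return self.flush()

    def flush(self):
        """Send queued transactions in order; stop at the first failure."""
        sent = 0
        while self._pending and self._is_online():
            try:
                self._send(self._pending[0])
            except OSError:
                break                  # keep the transaction for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


uploaded = []
network = {"up": False}
queue = StoreAndForwardQueue(uploaded.append, lambda: network["up"])
queue.submit({"test_id": "QR-0001", "result": "negative"})   # stays queued while offline
network["up"] = True
queue.flush()                                                # uploaded once coverage returns
print(len(queue), uploaded)
```

A transaction is removed from the queue only after a successful send, so an interrupted upload is retried rather than lost.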

By combining smartphones into this IoT context, we have demonstrated the capability, capacity and opportunity for edge computing to be used in such remote, rural geographical environments, where local conditions of connectivity to the internet (and thus the cloud) can be intermittent—so advancing the state of the art compared to existing cloud-based diagnostics such as the IoT solution for e-health28. Importantly, our system enables the integration of geotagged infectious disease prevalence and treatment information, endorsed through blockchain communication, to be introduced within digital data healthcare management systems, such as those being developed on the open-source platform DHIS223.

We have used machine learning approaches to classify images. In contrast to existing methods that have previously been implemented on a standard smartphone to assess data under a variety of conditions (for example, background colour, blur and light conditions) by using a CNN, we have obtained better, more accurate results when suitably trained with sufficient data. Necessarily, the size of the dataset is a key factor to be considered in the selection of a classification algorithm, particularly when environment conditions are varied29. Our results show that the CNN performs well, even when trained with a small dataset, and was sufficiently able to adjust to any lighting/background conditions encountered during testing in the rural community settings. Importantly, CNNs have very low demands in terms of memory and central processing unit in the inference stage, making them suitable for deployment on mobile phones and IoT scenarios in under-served community settings30,31.

We also note that, when compared to traditional computational vision approaches, which need developers to define with a high degree of granularity which features are important in every image, the CNN requires no expert analysis and adjustment in image classification tasks30,31,32. In this case, we trained our CNN with a dataset that only included images and corresponding labels, although we also note that our CNN model is retrainable, so it could be updated with different datasets and used for different diagnoses or different multiplex number, without having to redesign the algorithm.

Trust in healthcare delivery is perceived through trust in the data recorded and generated through the particular technology/system33. Interestingly, the use of AI in neural networks has recently been challenged in terms of trustworthiness34, particularly in safety-critical outputs, thus presenting a potential barrier in its acceptability in healthcare. Despite recent approvals of AI-as-medical-devices35, when diagnosis accuracy is the critical priority, trust is still linked directly to the ease of identification of false predictions and their subsequent effect. Through the implementation of blockchain, we can improve data provenance and enable standardization, thus improving trustworthiness in our overall system.

To achieve this, we developed end-to-end trustworthiness, which was enabled through the system itself, with layers of trust mapped hierarchically onto the device, while using the mobile app to implement the CNN inference and the blockchain network. By applying this architecture, we were able to integrate three distinct layers of trustworthiness within our diagnostic system: trust in data accuracy at the sensor layer, supporting the accountability and reliability of AI; trust in the decisions generated from the data, within the application layer, supporting accountability and explainability; trust in the security of the whole system, within the networking and processing layer, addressing confidentiality, integrity and availability. By doing so, we propose that our system addresses the full scope of trust, required in AI systems, as defined by the ART principles24.

We also addressed trustworthiness through authenticity and data privacy in the provisions of blockchain, with data integrity being addressed through the identification of valid/invalid tests within the CNN. Data availability was also supported by the blockchain architecture and the edge computing app, providing both global provenance and local diagnostic decision support. As a consequence, within this framework, the diagnostic data quality was ensured through the model’s accuracy in evaluation, including, for example, in discarding invalid tests. Device interoperability was itself fundamentally supported by the blockchain architecture, and data freshness was ensured by the point-of-care collection and processing of the diagnostic test. Such features are closely aligned with the WHO recommendations to support the benefits of digital health involving the use of decision support systems for healthcare workers36.

Finally, we note that the issue of trust, including that associated with cloud-enabled diagnostic testing, raises ethical concerns, including the capability and capacity to transmit personal identifiable information. Privacy preservation of private and individual data is of paramount importance, motivating the need for privacy preservation frameworks including the recently proposed BeepTrace networks7. Developing secure, trusted data transmission that can be endorsed is needed to overcome the concerns of privacy, security and data ownership. This is particularly important in decentralized diagnostics in low-resource areas, where data are collated and used by multiple agencies (including government, charities and universities). It is often the case that stakeholders involved in different aspects of the implementation of screening and treatment programmes within these ‘care systems’ are from different national states and may be operating under different legal frameworks. For example, data centres and servers, upon which information may be stored, may be subject to different data protection laws depending on their own location and jurisdiction, leading to potential issues not just over their ability to securely store data, but also over its ownership37.

We believe that our approach provides a framework that could inform an open-source connectivity standard for disease surveillance, treatment and ultimately for elimination programmes to build upon this resource. It provides a secure mechanism to connect the actionable information from rural and urban healthcare infrastructures with governments and healthcare agencies to implement and improve care pathways and outcomes. By way of example, in the case given in this study, permissions were needed both from central and regional offices from within the Ministries of Education and Health to both collect and use the data, while including a duty to report epidemiological findings directly to government.

Limitations

Following guidelines from the WHO, we have developed an ASSURED system that provides point-of-care testing2. The approach is not fully equipment-free as it requires a mobile phone and a heater device, although we argue that this is the minimum necessity for assay testing and an improvement in overall vertical cost. We note that the requirement for the use of smartphones (over earlier models with non-smartphones) represents a potential barrier and should be further evaluated in-country. However, as discussed earlier, such phones are becoming increasingly pervasive and ubiquitous, with widespread use in diagnostic applications in sub-Saharan Africa.

In terms of AI trustworthiness, we have evaluated the possible misclassification errors and proven that, in all cases, any errors would lead to a prompt, requesting repetition of the test. Any unintended biases introduced through the training set did not demonstrate a negative effect. This is evidenced through both the accuracy of the model and the specific misclassifications. The misclassification cases were effectively leading to appropriate actions and thus to a fully trustworthy system, in accordance with the ART principles24.

Our approach bases its safety and security provision on the implementation of blockchain features, although we recognize that recent studies have demonstrated that blockchain is not fully tamperproof, and that an approach using distributed ledger and a combination of classic cryptographic security for both the local data held in the app and the website application may be required in the future35. We also recognize that information is temporarily stored on the phone (for later propagation to the cloud) and that this, including similarly locally stored information on web browsers, could be another point with a security risk.

According to recent regulation, anonymization is no longer enough to preserve privacy38, and strategies that include obscuring the exact geolocation while preserving the locality are being proposed in the literature39. In our approach, we have focused on primary anonymization, using IDs to protect sensitive information. In the future, further steps might be required to ensure privacy in the data collected on the cloud database. However, the geolocation information that we collect relates only to the place the test was conducted (for example, local clinic or community centre), which does not in itself individually identify the people visiting this facility (thus satisfying the privacy requirements in terms of geolocation obscurity).

Finally, and as stated above, our CNN system allows for models to be incrementally retrained, either as more data are accumulated in the cloud or as new diagnostic opportunities arise. In the future, the effect of this retraining on accuracy must be evaluated (such that, if higher-accuracy models are developed, the improved model can be deployed to all phone users).

Conclusions

We have reported a smartphone-based end-to-end platform for multiplexed DNA-based lateral flow diagnostic assays in remote, low-resource settings. Our decision support tool provides automated detection of the results and their analysis, supporting human expertise, and transactions involved in data handling are secured, trusted and endorsed using blockchain technology. In anticipation of future AI guidelines for healthcare40, we designed our platform so that it supports the following functionalities: explainability41, accuracy to enable AI decision trust, ethical use of data through privacy-preserving blockchain networks, interoperability to enable wider connectivity with divergent standards and policies, and data formatting for standardization and provenance. In the future, we will improve user-friendliness for practitioners in different sub-Saharan countries.

Methods

The diagnostic platform comprises both hardware and software. The hardware includes a three-dimensional (3D) printed mobile heater for LAMP-based diagnostics42 as well as a mobile phone and a low-cost disposable sensor cartridge, while the software includes an Arduino program, an Android app and a Hyperledger blockchain network. These are described in detail below.

Hardware

Circuit design for the diagnostic instrumentation

The instrument was designed as a low-cost diagnostic device, controlled by an Android mobile phone. Integral to the delivery of the DNA-based diagnostic assay is LAMP, performed by integrating paper microfluidics within low-cost disposable cartridges, as described and validated in our previous publication21. To implement the assay, the sample needs to be heated to a constant temperature of 65 °C. The heater is powered either through the mobile phone USB On-The-Go (OTG) functionality or with a back-up battery power pack (through a micro-USB port, a two-way switch and a voltage regulator (LM317T)). The temperature is maintained using a control circuit (the circuit diagram is provided in Supplementary Fig. 3), a microcontroller unit, two temperature sensors (one acting as reference, one measuring the cartridge temperature) and a heating unit. This forms a small, lightweight, low-cost and long-lasting instrument (Fig. 2a and Supplementary Fig. 4).

The microcontroller uses a Bluno Beetle Arduino board (DF Robot). The heating unit comprises a thermoelectric generator (TEG; Peltier module, 0.76 W, 600 mA, 2.5 V, 15 × 15 mm, RS) and an n-channel MOSFET (IRLB8721). The temperature sensor unit uses two AD-8495 analogue k-type thermocouple amplifiers (Adafruit). The TEG serves as the heat source, controlled by the MOSFET. Through interaction with a mobile phone, the state of the heater (and thus the function of the device) can be both monitored and managed, including switching and cycling heaters ‘on’ or ‘off’ to maintain constant isothermal heating. For all assays, temperature profiles during LAMP are recorded as part of the quality assurance process.

Device design

The casing of the heater was designed with Autodesk Inventor 2019 and 3D-printed (Stratasys F170) with acrylonitrile butadiene styrene co-polymer (Fig. 2a,b). The heater includes an aluminium band around the LAMP reaction chambers in the cartridge (numbered 2 in Fig. 2a and 4 in Supplementary Fig. 2) to enhance thermal transfer and ensure homogeneity of temperature across the device.

The heater enables control of the temperature of the LAMP reaction chambers embedded within the plastic microfluidic chip. This uses a proportional, integral and derivative (PID) control mechanism in the Arduino code, adjusting the duty cycle of the output pulse width modulated (PWM) signal. Supplementary Tables 3 and 4 provide indicative manufacturing and operating costs, on a laboratory scale, illustrating the suitability for resource-limited settings.
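As an illustration of the control principle only, the following Python sketch shows a discrete PID loop whose output is clamped to a PWM duty cycle between 0 and 1. It is not the Arduino firmware used in the instrument: the gains, the time step and the first-order thermal model in `simulate` are hypothetical values chosen so that the toy example settles at the 65 °C setpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Discrete PID controller; output is a PWM duty cycle clamped to [0, 1]."""
    kp: float
    ki: float
    kd: float
    setpoint: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measured: float, dt: float) -> float:
        error = self.setpoint - measured
        # No derivative contribution on the first call (no previous error yet).
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        candidate_integral = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        duty = min(1.0, max(0.0, raw))
        # Simple anti-windup: keep the new integral only when the output is not saturated.
        if duty == raw:
            self.integral = candidate_integral
        self.prev_error = error
        return duty


def simulate(steps: int = 600, dt: float = 1.0, ambient: float = 25.0) -> float:
    """Run the controller against a toy heater model and return the final temperature."""
    pid = PID(kp=0.08, ki=0.002, kd=0.0, setpoint=65.0)  # hypothetical gains
    temp = ambient
    for _ in range(steps):
        duty = pid.update(temp, dt)
        # Hypothetical plant: heating proportional to duty, loss proportional to (temp - ambient).
        temp += dt * (1.2 * duty - 0.02 * (temp - ambient))
    return temp


final_temperature = simulate()
```

In this toy model the heater runs at full duty while the cartridge is far below the setpoint, and the integral term then removes the residual offset that proportional control alone would leave.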

Blockchain network

The blockchain network, shown in Fig. 4, was based on the open development toolkit Hyperledger Composer and Hyperledger Fabric blockchain network, which were hosted on a Google Cloud server. The core of the Hyperledger Composer blockchain network is a business network archive (BNA), including a model file, script file, access control file and query file. The BNA is deployed to an existing Hyperledger Fabric Runtime (including fabric ordering service, certificate authority and peer nodes). Users need to use a peer card, which contains the public key and their private key, to obtain access to the blockchain network.

The support of the representational state transfer (REST) application program interface (API) and GitHub OAuth authentication allows users to access the blockchain from a web browser on a standard desktop computer or from a mobile phone using a purpose-built Android app. The database service in the cloud provides a central point to collect information, enabling later analysis of geotagged disease propagation in communities, with a secure point accessible by healthcare providers across the hierarchy of the healthcare system. Anonymization of information in this database ensures privacy, while trust in the recorded data is always maintained, greatly improving the endorsement and privacy aspects compared with either manual or email transfer of records.

The asset (in our case the diagnostic device), participants (manufacturers), operators (the healthcare workers involved in the delivery of the diagnostic assays and their analysis) and transactions (as connections) are all defined in the BNA file (Supplementary Fig. 5). The diagnostic device is addressed with a unique identifier (ID), and the related information (including, for example, the date of manufacture) is printed as a QR code on the device (Fig. 2c). Participants have their own ID and username stored on the chaincode (the ledger). The role they can play is limited by access control, although they can create a new device record or update device information.

The algorithms are as follows:

Algorithm 1

ProduceDevice

Input: device ID, test name, manufacturer, date of manufacture, expire date, bench number, production place, status

Result: Add new device record to the chaincode

If device exists then
    return
else
    set test name, participant (manufacturer), date of manufacture, expire date, bench number, production place, status to device attribute
    get asset registry
    emit ‘produce device’ event

Algorithm 2

DoTheTest

Input: device ID, status, operator, test date, patient ID, gender, weight, URL (link to image of device after test), result, geo-location

Result: Add test information to existing device

If device does not exist then
    return
else
    update status
    set operator, test date, patient ID, gender, weight, URL, result, test place to device attribute
    emit ‘do the test’ event
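The control flow of Algorithms 1 and 2 can be sketched in Python with an in-memory dictionary standing in for the asset registry. This is an illustration only and not the Hyperledger Composer chaincode: the class name `DeviceLedger`, the method names and the example identifiers are hypothetical, and no access control or cryptographic endorsement is modelled.

```python
class DeviceLedger:
    """In-memory stand-in for the asset registry (NOT the deployed chaincode)."""

    def __init__(self):
        self.devices = {}  # device ID -> dict of device attributes
        self.events = []   # emitted (event name, device ID) pairs

    def produce_device(self, device_id, **attributes):
        # Algorithm 1: do nothing if the device already exists.
        if device_id in self.devices:
            return False
        self.devices[device_id] = dict(attributes)
        self.events.append(("produce device", device_id))
        return True

    def do_the_test(self, device_id, status, **test_info):
        # Algorithm 2: test information can only be added to an existing device.
        if device_id not in self.devices:
            return False
        record = self.devices[device_id]
        record["status"] = status
        record.update(test_info)
        self.events.append(("do the test", device_id))
        return True


# Hypothetical usage: a manufacturer registers a device, then an operator records a test.
ledger = DeviceLedger()
ledger.produce_device("DEV-001", test_name="malaria LAMP", status="produced")
ledger.do_the_test("DEV-001", status="used", operator="OP-07", result="positive")
```

Both methods return early without emitting an event when their precondition fails, mirroring the `return` branches of the two algorithms.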

Deep learning

We implemented two different neural networks to analyse the results as images of the devices after the diagnostic test21, an example of which is shown in Fig. 2c. We developed a CNN model based on Keras TensorFlow 2.0, using an object-detection model within a Python program for data analysis as a fast classifier, helping local healthcare workers to test the results after each diagnosis. The functions include loading images from Firebase cloud storage and validating and comparing the test results with the records stored on the blockchain network.

An object detection model based on a faster region-based convolutional neural network (R-CNN) ResNet50 model was also developed alongside the CNN but was not implemented in our app. Instead, it was used as a gold-standard reference for post analysis to independently validate the results. Both methods were developed based on TensorFlow 2.0 Keras.

Classification network

The CNN model was developed and integrated into a mobile app to classify the images of the paper-based microfluidic diagnostic tests, automatically. The five-plex DNA diagnostic strips, including species-specific diagnostics for Plasmodium sp. as well as controls, were used as designed previously21, based on lateral (capillary) flow with a control line and a test line (Fig. 2c). These comprise two test strips for detecting Plasmodium falciparum (Pf) and Plasmodium pan (Ppan), which cause malaria, one positive control channel (using a BRCA1 human gene) and two negative control strips (one for each species) (see below).

The test strip in each channel has three possible outcomes: negative, positive and blank (invalid). Thus, using all combinations of results across the five lateral flow strips gives 243 different possible result scenarios, including operator errors. To reduce the complexity of classification, the outputs are subdivided into five categories, as described in Supplementary Table 2 (‘+’ for positive, ‘−’ for negative, ‘/’ for invalid). Supplementary Fig. 6 shows an example of the result in each class.
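The figure of 243 scenarios follows from three possible outcomes on each of five strips (3^5 = 243). A short Python enumeration makes this explicit; the mapping of scenarios onto the five output categories is defined in Supplementary Table 2 and is not reproduced here.

```python
from itertools import product

OUTCOMES = ("+", "-", "/")  # positive, negative, invalid (blank)
N_STRIPS = 5

# Every combination of outcomes across the five lateral flow strips.
scenarios = list(product(OUTCOMES, repeat=N_STRIPS))
n_scenarios = len(scenarios)

# Scenarios in which no strip is invalid (each strip is either '+' or '-').
n_without_invalid = sum(1 for s in scenarios if "/" not in s)
```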

The training datasets were obtained by carrying out targeted tests on synthetically prepared samples. Positive samples were obtained from LAMP of a Pf target sequence (the WHO DNA standard obtained from the National Institute for Biological Standards) at 10^5 copies per reaction. Negative samples were obtained by LAMP using Pf primers and probes without any target (in this case using deionized water). Both networks were initially trained in the cloud. The trained network was then incorporated in the Android app for edge-computed decision support.

To increase the range of intensities in the bands available for training, amplicons were used at different amplification times (5, 10 and 15 min), leading to 100 images in each class. To reduce the training time and improve the accuracy, the images were cropped in the app to a small 16:9 picture that contained results (Fig. 2c). All training images were resized to 128 × 128 × 3 before sending to the model. An image generator (TensorFlow) was used to address the overfitting issue caused by the small dataset. During the training process, the image generator randomly adjusted the parameters of the input images, such as brightness, contrast, zoom range and orientation, at the beginning of every training step.

The CNN was based on the TensorFlow Keras sequential model, which is ‘a plain stack of layers where each layer only has one input tensor and one output tensor’43. The structure of the sequential model is simple, allowing us to build it in a shorter time with Keras API, which generated computationally lightweight models suitable for smartphone deployment44.

The structure of the CNN was fine-tuned by adjusting the number of layers and parameters such as the number of nodes, batch size and learning rate, through an iterative process to give our final CNN. To balance the model between overfitting and underfitting, several models with different structures were trained with the same training dataset and tested with the same test dataset (the training and test sets are independent of each other) to increase accuracy and lower loss.

Our model hosted 16 layers—four convolution layers, four max-pooling layers, a flatten layer, three dropout layers and four dense layers. The model structure is shown in Supplementary Fig. 7. The convolutional layers extract features from the input images by scanning the input with a weighted matrix (convolution kernel). The process of generating a single feature map can be presented as

$$A_j = f\left( \sum_{i=1}^{N} I_i \times K_{i,j} + B_j \right)$$
(2)

Every input matrix I_i is convolved with the kernel K_{i,j}, and a bias B_j is added to every element in the sum of the convolved matrices. The nonlinear activation function f(x) is then applied to the resulting matrix. All convolution layers use the ReLU (rectified linear unit) activation function, which sets all negative values of its input to zero, to improve the learning speed and nonlinearity of the model. The max-pooling layer reduces the dimension of the output matrices of the previous convolution layer, using a 2 × 2 kernel with stride 2 to scan its input and taking the largest number from four adjacent elements.
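To make the two element-wise operations concrete, the following pure-Python sketch applies ReLU and then 2 × 2 max pooling with stride 2 to a small feature map. The 4 × 4 input values are arbitrary and chosen only for illustration; in the model itself these operations are performed by the Keras layers.

```python
def relu(matrix):
    """Set every negative entry to zero."""
    return [[max(0.0, v) for v in row] for row in matrix]


def max_pool_2x2(matrix):
    """2 x 2 max pooling with stride 2; assumes even height and width."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r][c], matrix[r][c + 1], matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


feature_map = [
    [1.0, -2.0, 3.0, 0.0],
    [-1.0, 5.0, -6.0, 2.0],
    [0.0, -3.0, 4.0, -4.0],
    [7.0, -8.0, -1.0, 1.0],
]

activated = relu(feature_map)
pooled = max_pool_2x2(activated)  # each output entry summarises one 2 x 2 block
```

Pooling halves each spatial dimension, so the 4 × 4 map becomes 2 × 2.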

To extract sufficient features and detail while reducing the number of parameters in the training process, four convolutional layers were implemented, with a pooling layer following each convolutional layer. The output of the last pooling layer was flattened to a 1D tensor and sent to the fully connected dense layers by the flatten layer. As the training dataset was relatively small and only had three categories, the model needed to have more fully connected (FC) layers and relatively fewer neurons45. Consequently, three dense layers (size 128) and one dense layer (size 5) were used to obtain better accuracy. Between every dense layer, a dropout layer was utilized to prevent overfitting. The first three dense layers also used ReLU as their activation function, with the last dense layer, which was the output layer, using SoftMax (S(x_i)) as its activation function to provide the predictions and their probability:

$$S\left( x_i \right) = \frac{\mathrm{e}^{x_i}}{\sum_j \mathrm{e}^{x_j}}$$
(3)
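Equation (3) can be evaluated directly, as in the following Python sketch. Subtracting the largest input before exponentiating is a standard numerical safeguard that leaves the result unchanged; the five example logits are arbitrary values, one per output class, and are not taken from the trained model.

```python
import math


def softmax(logits):
    """Return S(x_i) = exp(x_i) / sum_j exp(x_j) for each entry of logits."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 0.5, -1.0, 0.0, 1.0]  # hypothetical outputs of the last dense layer
probabilities = softmax(example_logits)
predicted_class = probabilities.index(max(probabilities))
```

The outputs are positive and sum to one, so the largest entry can be read as the predicted class together with its probability.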

Object detection network

The same dataset without cropping was used and labelled with LabelImg for training the object detection model. There are five labels in the label map—negative, positive, empty, device and QR code—where negative, positive and empty indicate the outcome of each strip. After labelling, the images were divided into two separate subsets, 90% for training and 10% for testing, and corresponding tf-record (a format of the TensorFlow dataset) files were created.
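A 90%/10% split of this kind can be sketched in a few lines of Python. The file names, the random seed and the dataset size of 500 images (100 per class across five classes) are assumptions made for illustration; the conversion to tf-record files is not shown.

```python
import random


def split_dataset(items, train_fraction=0.9, seed=0):
    """Shuffle a copy of items reproducibly and split it into train and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names for the labelled images.
image_files = [f"img_{i:03d}.jpg" for i in range(500)]
train_files, test_files = split_dataset(image_files)
```

Shuffling before the cut avoids placing all images of one class, which may be stored consecutively, in the same subset.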

Android app

The Android app was designed with Android Studio v3.5+ in Java. The minimum requirements of using this app are Android version >5.0 and Bluetooth 4.0. It provides different functions and screens for different users (screenshots are provided in Supplementary Fig. 1 and three clips of use in Supplementary Video 1 as the different participants). Manufacturers can add new device records to the blockchain from the app.

The operators also use the app to control the LAMP heater via Bluetooth. After each diagnostic test, the healthcare worker only needs to take a picture of the device and enter any metadata, as a text file, about the test. The app connects the image to cloud storage and updates the device information on the blockchain. Device information can be viewed simply by scanning the QR code on the manufactured device or manually entering the device ID.

The location API provided by Google Play is used to collect geographic information. When the app is launched, it requests permission to use the location data. Once a new device record is created or a diagnostic test is complete, the app obtains the location information using Wi-Fi, a cellular tower or GPS (depending on availability and battery charge levels). These geographic data, including latitude and longitude, are uploaded with other information to the blockchain network.

Multiplex LAMP system

The primer sets used for the LAMP assay were based on previously published21 primer sequences for P. falciparum, with a BRCA1 gene fragment serving as the positive control. The primers were all purchased from Eurofins Genomics. The sequences were provided by Reboud et al.21, and the reactions were amplified for no more than 45 min at 65 °C. For training the CNN, lateral-flow test strips were obtained from devices with experiments carried out with artificial Pf templates as described above.

Field testing

Field testing was carried out in Uganda and followed the same protocol as previously used21 to demonstrate the functionality of the platform in the field. Briefly, we tested blood samples collected from 40 school children from Kocoge Primary School in Tororo District. This study was conducted as part of the activity carried out by the Vector Control Division (VCD) of the Ministry of Health (MoH) in Kampala, Uganda on neglected tropical diseases, and was approved by the VCD MoH Research and Ethics Committee (VCDREC/078) and Uganda National Council for Science and Technology (HS 2193). Anonymized pupil details were computerized and tagged using ID numbers. The participants were 5–12 years old and gender balanced. No personal data were revealed to the investigators. Written informed consent of the children’s parents and the head teacher was also obtained (see protocols in the previous study21 for details). All samples were retrospectively retested by PCR in the UK21.

Ethics approval was carried out on the basis of presumed positive, given the high prevalence of the disease, which is endemic in the region. All individuals were treated accordingly following testing under MoH Uganda guidance. Analyses were performed in the children’s classrooms. For each individual, a finger-prick (~5 µl) of whole blood was used, with sample processing including sample lysis, DNA extraction and amplification performed using the paper ‘origami’ protocol, as previously published21 (further details are provided in the Supplementary methods).

For each sample, the person running the test scanned the QR code of the device to be used (already ‘created’ by the manufacturer) and entered the required information on the test (Supplementary Video 1) before inserting the device into the heater, controlled by the phone for amplification. The QR code was scanned and a picture of the results taken, directly in the field without any specific control (that is, without any ‘reader’), generating images within a changing environment. The phone then returned the results for interpretation by the ‘analyst’, who could provide decision support to the person in charge of treatment. All testing steps (including results) were also recorded manually to ascertain the validity of the results presented. When network connectivity was not available, the transactions were stored in the phone until connectivity was restored.
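The store-and-forward behaviour described above, in which transactions are held on the phone until connectivity returns, can be sketched as a simple first-in, first-out queue. This Python example is illustrative only and is not the Android implementation: `OfflineQueue`, `FakeNetwork` and the use of `ConnectionError` to signal loss of connectivity are all assumptions.

```python
from collections import deque


class OfflineQueue:
    """Holds transactions locally and forwards them in order once sending succeeds."""

    def __init__(self, send):
        self._send = send        # callable that submits one transaction; raises ConnectionError when offline
        self._pending = deque()

    def submit(self, transaction):
        """Queue a transaction, then try to forward everything that is pending."""
        self._pending.append(transaction)
        return self.flush()

    def flush(self):
        """Send pending transactions oldest first; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break
            self._pending.popleft()  # remove only after a successful send
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


class FakeNetwork:
    """Hypothetical stand-in for the blockchain REST endpoint."""

    def __init__(self):
        self.online = False
        self.received = []

    def send(self, transaction):
        if not self.online:
            raise ConnectionError("no connectivity")
        self.received.append(transaction)


network = FakeNetwork()
queue = OfflineQueue(network.send)
queue.submit({"device_id": "DEV-001", "result": "positive"})  # offline: kept on the phone
network.online = True
queue.submit({"device_id": "DEV-002", "result": "negative"})  # online: both are forwarded
```

A transaction is removed from the queue only after it has been sent successfully, so a dropped connection cannot lose a record, and the original order of the tests is preserved.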

All analyses were double-blinded between on-site field testing and reference tests, performed retrospectively using real-time PCR in the laboratories at the University of Glasgow as a gold standard, as described in detail in the Supplementary methods. Data analysis was performed using Microsoft Excel for Mac (v16.44) and Origin (OriginLab, v2016). After testing at the local community school in Uganda, all used paper devices and small plastic consumables were incinerated, while glass slides and RDTs used as reference techniques were stored in a biohazard container for safe disposal at the VCD, Kampala.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.