## Main

In this supplement, we aim to provide the first in a series of simple, user-friendly operational guides on how to design and conduct evaluations of diagnostic tests for infectious diseases that are of public health importance in the developing world. Each guide will contain a set of general principles on the design and conduct of diagnostic evaluations followed by disease-specific considerations. The first in this series is the malaria guide. This article provides background information and discusses why such guides are needed and their importance in improving the diagnosis of infectious diseases in the developing world.

### The need for good quality diagnostic tests

The lack of access to good quality diagnostic tests for infectious diseases contributes to the enormous burden of ill health in the developing world, where infectious diseases are the major causes of death and account for more than half of all deaths in children1 (Table 1, Fig. 1). Each year, more than 2 million people die of malaria, approximately 4 million of acute respiratory infections and almost 3 million of enteric infections. HIV and tuberculosis together are estimated to kill some 5.8 million people each year2,3. More than 95% of these deaths occur in developing countries. Early diagnosis and treatment not only reduce the risk of the patient developing long-term complications but, for diseases such as tuberculosis, sexually transmitted infections (STIs) and HIV, prompt treatment also reduces further transmission of the disease to other members of the community.

A confident diagnosis can sometimes be made on the basis of clinical signs or symptoms but accurate diagnosis usually requires a specific diagnostic test, often involving access to a diagnostic laboratory. In settings where access to diagnostic laboratory services is limited, the WHO recommends the use of a syndromic approach to clinical management, where patients presenting with a particular syndrome are treated for all of the major causes of the syndrome. Algorithms for syndromic management have been developed for STIs and for common childhood diseases, the latter through the integrated management of childhood illness (IMCI)4,5. Although such algorithms are simple to use and the recommended treatment packages are generally inexpensive, a major disadvantage of this approach is the risk of giving inappropriate treatment to people without the syndromically diagnosed disease and the accompanying potential for inducing antibiotic resistance. Diagnostic tests can complement syndromic management by facilitating evidence-based management of patients, improving the specificity of treatment and, in some diseases, allowing contact tracing and other disease-control measures.

Laboratory testing is perhaps most useful for detection of asymptomatic infections to prevent development of sequelae and transmission, and for public health surveillance and interventions. Table 2 shows the role of diagnostic tests in the control of some of the diseases that are prevalent in developing countries6.

Good quality diagnostic tests that are fit for purpose and provide accurate results are therefore of paramount importance in reducing the burden of infectious diseases (Box 1). The choice of diagnostic test depends on which tests have been approved for use by regulatory authorities in a particular country (if they are regulated at all), which tests have been purchased for use in the health service, and the physician's judgement of which of the available tests might be useful in clinical decision-making. Unfortunately, in many developing countries, clinical care is often critically compromised by the lack of regulatory controls on the quality of diagnostics, and physicians can be faced with having to select tests based only on information provided in the product insert or on published data that often originate from inadequate or flawed study designs.

### The development of diagnostic tests

The development of a diagnostic test usually follows a path from identification of the diagnostic target and optimization of test reagents to the development of a test prototype (Fig. 2). Proof-of-principle studies are then conducted to establish that the test detects the intended target. The test then undergoes further evaluations, first using 'convenience' samples or archived specimens, followed by evaluations in populations of intended use. These trial results are used to obtain data for regulatory submission and approval so that the tests can be marketed and sold in a country. For post-approval marketing purposes, companies often fund physicians to conduct studies to demonstrate the utility and potential impact of the diagnostic test.

### Test characteristics requiring evaluation

Diagnostic tests can be purchased by patients, health providers, clinics, hospitals, national disease control programmes, procurement agencies for organizations or donors. Although the criteria on which procurement decisions are made can vary, selections are generally based on the factors discussed below.

Test performance. Test sensitivity and specificity, and the positive and negative predictive values of a test, are important considerations. High sensitivity is important for a screening test for diseases such as syphilis where a missed diagnosis has serious consequences. Poor specificity might matter less if over-treatment rarely results in adverse side effects, as in the treatment for syphilis, but might be a serious disadvantage if the treatment is highly toxic as, for example, is the case with drugs used to treat advanced trypanosomiasis (sleeping sickness).

Ease of use. The number of processing steps, whether the test can use whole blood and the need for accurate timing will influence the extent of training and supervision required.

Conditions of use. In hot or humid conditions, the selection of tests that are heat-stable and individually packaged in moisture-proof pouches is a priority.

Conditions of storage. There are defined storage temperatures for most tests. If the temperature in the clinic is above 30°C and the accuracy of the test results is not guaranteed above this temperature, periodic quality-control checks to ensure the ongoing validity of the tests are needed.

Shelf life. A long shelf life reduces the pressure on the supply chain and the probability of wastage of expired tests. Tests with a shelf life in excess of 18 months are recommended for use in remote, poorly resourced areas.

Of these factors, the test performance is of paramount importance. Many diagnostic evaluations therefore focus primarily on evaluations of test performance, that is, the sensitivity and specificity or the positive and negative predictive values.

### Lack of regulatory standards and guidelines

National regulatory processes should provide safeguards for the safety and effectiveness of drugs used in a country. The tightening of governmental regulatory requirements for drugs in developing countries has done much to improve the standardization and quality of drug trials, in which efficacy and adverse effects are assessed and compared. Unfortunately, regulatory standards are often lacking for diagnostic tests, especially those targeting diseases that are uncommon in industrialized countries. As a result, diagnostic tests are often sold in the developing world without any formal evaluation of their performance and effectiveness. An exception to this is tests used for blood banking, for which rigorous international standards exist.

WHO/TDR conducted a global survey of regulatory practices for diagnostic tests in 2001. A questionnaire was sent to all 191 WHO member states to enquire whether in vitro diagnostics, other than those used for blood banking, were regulated in their country and, if so, whether clinical trials were required for regulatory approval. Of the 85 countries that responded, fewer than half (48%) reported that they regulated in vitro diagnostics for infectious diseases7. A greater proportion of countries in the developed world regulate in vitro diagnostics than in the developing world (Fig. 3a). Of the countries that regulated diagnostics, 68% required the submission of clinical trial data (Fig. 3b).

There is also variability from country to country in which tests for specific infectious diseases are regulated. Of the 24 countries that provided these data, 83% regulated diagnostics for HIV, 92% for hepatitis, 42% for STIs and 13% each for tuberculosis and malaria7.

An industry survey conducted by WHO/TDR in 2003 found that companies can spend from as little as US$2,000 to more than US$1,000,000 on diagnostic trials of different products, with some diagnostic trials conducted in as few as 15 patients (unpublished TDR data).

Even when clinical trials are mandated by regulatory authorities, there is a lack of national and international guidelines for the evaluation of diagnostic tests for diseases that are prevalent in developing countries. Standards for the evaluation of diagnostic tests are set by regulatory bodies such as the US Food and Drug Administration (FDA) and the European Union, and, for example, the Clinical and Laboratory Standards Institute in the USA publishes standards that are widely used by manufacturers targeting markets in established economies. However, these standards were developed for the evaluation of tests in developed countries and are often not applicable to diseases that are prevalent in the developing world.

Data from clinical trials designed to evaluate the performance characteristics of diagnostic tests often appear only on product inserts or remain in company files. Although every product insert contains claims of high sensitivity and specificity, there is no requirement to report the sample size or the confidence intervals. One product dossier recently submitted to WHO/TDR showed that the test was evaluated in more than 100 patients, of whom only three were positive for the disease by the reference standard (unpublished TDR data); the insert nonetheless claimed that the test was 100% sensitive and 100% specific. In many countries, the lack of regulatory oversight of the design and conduct of diagnostic evaluations has led to inflated claims of test performance in product inserts.
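The weakness of a "100% sensitive" claim built on three reference-positive patients is easy to quantify. When all n reference-positive patients test positive, the lower limit of the exact (Clopper-Pearson) 95% confidence interval for sensitivity has a closed form that needs no statistical library; a sketch:

```python
def exact_lower_bound_all_positive(n, alpha=0.05):
    """Lower limit of the Clopper-Pearson (exact binomial) confidence
    interval for a proportion when all n trials succeed (x = n).
    In that special case the limit reduces to (alpha/2)**(1/n)."""
    return (alpha / 2) ** (1 / n)

# Three reference-positive patients, all detected: the observed
# sensitivity is 100%, but the exact 95% CI stretches down to ~29%.
print(round(exact_lower_bound_all_positive(3), 3))  # → 0.292
```

In other words, a 3/3 result is statistically compatible with a true sensitivity anywhere above about 29%, which is why reporting sample sizes and confidence intervals matters.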

This underscores the need for a set of international standards to regulate diagnostics for infectious diseases (outside of blood banking). The Global Harmonization Task Force has published guidance on the regulation of medical devices and a scheme for classifying medical devices, but international standards for the regulatory approval of diagnostic tests of public health importance in the developing world remain a distant prospect.

### Evaluations published in the peer-reviewed literature

The design and quality of trials of diagnostic tests have profound effects on the estimation of the performance characteristics of these tests. Trials to evaluate the performance and operational characteristics of diagnostic tests can be conducted by test manufacturers, public health agencies and end-users such as physicians and laboratory managers in hospitals and clinics. These studies are sometimes published in the peer-reviewed literature, either sponsored by the manufacturing company or conducted by independent investigators. Reviews of diagnostic publications since the late 1970s have shown that although the quality of diagnostics trials is improving, many are still lacking in rigour8,9,10,11,12. For industry-sponsored studies, this might be because diagnostics for infectious diseases that are prevalent in the developing world tend to be produced by small biotechnology companies that have relatively few resources and limited expertise in field trials13. Some common design problems in diagnostic evaluations are listed below.

Evaluation in an inappropriate study group. To assess properly how a test will perform in routine use, diagnostic evaluations must be performed in a study group that is sampled from the population for which the test is intended. The diagnostic performance can vary in symptomatic and asymptomatic patients, and can also differ when diagnostics are used to detect active versus latent disease. The data from diagnostic evaluations are only useful if there is an adequate description of the study group used in the evaluation, with well-defined inclusion and exclusion criteria and an adequate sample size for each sub-population.

Evaluation in an inappropriate setting. Evaluations in low-prevalence settings can result in a much higher proportion of false-positive to true-positive results than would be found in a high-prevalence setting.

Inappropriate purpose. Diagnostic tests used for screening of asymptomatic patients, diagnosis in symptomatic patients, surveillance, and verification of elimination all require different performance characteristics. Diagnostic trials should be designed and conducted for a specific purpose to yield meaningful results.

Inappropriate reference standard test. The reference standard test is the comparator for the test under evaluation. The selection and the quality of the reference standard test directly affect the measurement of test performance. An ongoing challenge for diagnostic evaluations is to deal with trials where the test under evaluation is more sensitive and/or specific than the reference standard test.

Inadequate sample size. For reasons of economy and time, diagnostic evaluations are often conducted in a small number of patients, leading to wide confidence intervals around the estimates of sensitivity or specificity14.

Lack of blinding. Providing readers of the reference standard test with the results from the test under evaluation, or vice versa, might artificially inflate the agreement between the two. A recent review of studies reporting diagnostic evaluations of tests for tuberculosis showed that only 34% reported any form of blinding12.

The quality of evaluation trials. The proficiency of the site staff in performing the reference standard test and the test under evaluation is often difficult to discern from publications or from manufacturers' dossiers.
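The effect of study setting on predictive values, noted above, can be made concrete with a short sketch: hold sensitivity and specificity fixed and watch the positive predictive value collapse as prevalence falls (the sensitivity, specificity and prevalence figures are illustrative only):

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value, P(disease | positive test),
    obtained from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same test (95% sensitivity, 95% specificity) in two settings:
for prev in (0.30, 0.01):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.95, 0.95, prev):.2f}")
# At 30% prevalence most positives are true cases; at 1% prevalence
# the great majority of positive results are false positives.
```

An evaluation conducted only in a high-prevalence referral hospital can therefore give a misleadingly favourable picture of how the same test will perform in routine, low-prevalence use.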

Reid et al. examined diagnostic evaluations reported in four prominent general medical journals from 1978 to 1993 and found that fewer than half of the studies fulfilled more than three of the seven methodological standards outlined in Figure 4 (Ref. 10; see also Ref. 15 for a more recent evaluation).

### Initiatives to improve the standard of diagnostic evaluations

Apart from the standards set by national regulatory agencies such as the US FDA or the equivalent organization in Thailand, there are several other initiatives that provide guidelines on diagnostic evaluations.

Meta-analysis and systematic reviews. In 1994, guidelines were published for the conduct, reporting and critical appraisal of meta-analyses evaluating diagnostic tests16. A systematic review of near-patient test evaluations in primary care was conducted in 1999 in an attempt to identify and synthesize results from studies that examined the performance and effect of such tests17. One hundred and one relevant papers published between 1986 and 1996 were identified. The authors concluded that the quality of the papers was generally low. The performance of most tests had not been adequately evaluated and most papers reported biased assessments of the effect of near-patient tests on patient outcomes, organizational outcomes or cost.

Such meta-analyses serve both to provide an overall summary of diagnostic accuracy from several studies and to identify deficits in published studies that need to be addressed in future studies.

The STARD initiative. The variable quality of publications on diagnostic evaluations led to the launch of the STARD (Standards for Reporting of Diagnostic Accuracy) initiative in 2003 (Refs 18, 19). STARD aims to improve the quality of diagnostic test evaluations reported in the peer-reviewed literature. The STARD checklist is included in the general guide (Evaluation of diagnostic tests for infectious diseases: general principles) in this supplement. It is hoped that this initiative will gradually have an impact on the design and execution of trials in the developing world.

Other reference material. Other available reference material ranges from articles on how to read a paper on diagnostics to books on the design and conduct of field trials of health interventions in developing countries20,21. Although general recommendations can be found in various sources, there is limited disease-specific guidance on the design and conduct of diagnostic trials for diseases prevalent in the developing world. These disease-specific considerations include which populations should be targeted, what reference standard should be used, how to define case and control populations, how sampling should be performed and how to ensure blinding of results.

### DEEP and the development of best practice

In the absence of robust standards for diagnostic trials, scarce public sector resources might be wasted on diagnostics that not only lead to mismanagement of patients but also have little impact on reducing the disease burden. There is a need for stricter controls on the introduction and use of diagnostic tests in national public health programmes in many developing countries, based on the rigorous evaluation of tests before, or during, deployment. Data on the performance and operational characteristics of diagnostic tests from well-designed trials are required to allow those responsible for procuring tests to make informed decisions about the choice of specific tests.

WHO/TDR has assembled a Diagnostic Evaluation Expert Panel (DEEP) to advise WHO/TDR and its close collaborator, the Foundation for Innovative New Diagnostics (FIND) (Box 2), on recommendations for best practice in the design and conduct of diagnostic trials for selected infectious diseases of public health importance in the developing world. One of the first tasks of the panel was to produce a set of general principles for the design and conduct of diagnostic evaluations that are harmonized with the current standards established by national and international agencies and by various initiatives to improve the standard of diagnostic evaluations. This will be followed by a series of disease-specific recommendations on how the necessary methodological standards can be fulfilled in the evaluation of diagnostics for diseases of public health importance in the developing world. The first in this series is the malaria guide.

Our aim is to provide a set of simple, user-friendly operational guidelines on the design and conduct of diagnostic trials, to support regulatory agencies in the consideration of registration applications, to provide procurement agencies and international health agencies with performance benchmarks, and to enable scientists in developing countries, especially those working on disease control in the public sector, to evaluate diagnostic tests in accordance with international standards.