Main

There has not been a medication approved for neonates that has significantly improved neonatal outcomes in over 25 years (1, 2) despite the plethora of drugs developed to treat disease processes in adults. In fact, this vulnerable population has often been excluded from clinical drug trials due to safety concerns and the absence of accurate and reproducible outcome measures (2, 3, 4), leaving physicians with no choice but to prescribe off-label medicines. Regulatory-approved evidence-based treatments and the development of safe formulations are vitally important in all clinical conditions unique to the neonatal population (2), including, but not limited to, neonatal abstinence syndrome, necrotizing enterocolitis, retinopathy of prematurity, and bronchopulmonary dysplasia.

To expedite and harmonize rigorous neonatal drug development and clinical research in these areas, there is an urgent need for biomarkers to better define disease states and treatment response. A biomarker is an indicator that can be used to objectively measure and reproducibly evaluate a biological condition or a process (5, 6). Classifications of biomarkers have been defined by the FDA-NIH Biomarker Working Group in the BEST (Biomarkers, EndpointS and other Tools) resource: diagnostic, monitoring, pharmacodynamic/response, predictive, prognostic, safety, and susceptibility/risk biomarkers (6). The focus of this report is on the biomarkers used to measure treatment response (pharmacodynamic and safety biomarkers), which indicate a physiological change in response to an intervention and can be used to define primary and secondary outcome measures in clinical trials. Monitoring biomarkers are serial measurements taken to evaluate changes in disease status during or after treatment (6). Pharmacodynamic biomarkers indicate the effectiveness of the intervention as a measurable change in physiological response (5, 6). Safety biomarkers reflect toxicity and may signal that an adverse event is likely to occur (5).

The objectives of this study are to determine how biomarkers are currently being used as clinical trial outcomes in a sample of published neonatal intervention studies, to evaluate the quality of the response biomarker reporting and identify the gaps in knowledge to inform future biomarker research initiatives.

Methods

A validated search strategy (Supplementary Material S1 online) for prospective, pediatric intervention studies published in 2014 was performed in March 2015 (see Figure 1 for inclusion and exclusion criteria). The goal of this exercise was to characterize the methodological features of published clinical trials, including population and outcomes. All trials were reviewed in duplicate with independent, verbatim data extraction, and were coded with trial design characteristics, including population, intervention, control group and outcome measures according to a previously described standardized data dictionary (available by request from the corresponding author). Participant age groups were defined by the time of recruitment according to the standards for research in child health age groups (7). The reported primary and secondary outcomes were classified as pertaining to death/survival, life impact, resource use and pathophysiological manifestation based on the OMERACT (Outcome Measures in Rheumatology) 2.0 filter (8). Outcomes that were classified as pathophysiological manifestation were subclassified into six subgroups: biomarkers, pain, physiological, psychosocial, behavioral, and other. The biomarkers were defined according to the NIH definition: a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention (e.g., blood glucose and urine cultures). Pain outcomes included any measure of pain relief or pain prevention. Physiological outcomes were defined according to an adapted NIH definition: a characteristic or variable that reflects how a patient feels, functions, or survives (e.g., measurements of disease progression, absence of illness, and bowel movements). Neonatal weight, head circumference, and body composition (lean mass and fat mass) were classified as physiological (growth and development) outcomes.
Psychological and behavioral outcomes included attitudes or responses, for example, motor and neurocognitive effects, depression assessment scores, and caregiver satisfaction. Outcomes classified as “other” included those that did not align with the aforementioned categories, for example, the use of an exogenous surfactant after prevention therapies or nonspecific (no details provided) “adverse events.”

Figure 1

Modified PRISMA flow diagram (23).

A subset of pediatric trials published in 2014 was selected for analysis in this review using predetermined criteria. A convenience sample of 1 year was selected to provide a rapid snapshot of the landscape of biomarker use as trial outcomes. The inclusion criteria were prospective intervention studies recruiting neonates (preterm and/or term) that also reported the use of a biomarker as a primary and/or secondary outcome. Biomarkers were classified as safety or pharmacodynamic response biomarkers depending on their purpose. Safety response biomarkers were used to measure toxicity (e.g., neuron-specific enolase levels to mark cerebral injury), while pharmacodynamic response biomarkers were used to measure the physiological change (e.g., measuring amino-acid profiles to evaluate the tolerance of parenteral nutrition) and included serial measurements (monitoring biomarkers). Sampling techniques were evaluated, including the type of sample, frequency of collection, and the volume collected (where appropriate). If a range of volumes was reported, the midpoint of the range was used (e.g., 3,000–5,000 μl was considered as 4,000 μl). Sample volumes reflect the total amount collected for analysis and may include more than one biomarker. The assay and instruments used for biomarker assessment were recorded, as were established reference ranges with source information. Categorical data are reported as percentage and number, and sample volume and the number of participants are reported as median and range.
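The data-handling rules described above (collapsing a reported volume range to a single value, and summarizing results as percentage and number or as median and range) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample values are hypothetical and are not drawn from the study data or its data dictionary.

```python
from statistics import median


def volume_midpoint_ul(low_ul, high_ul=None):
    """Return one volume in microliters; a reported range is collapsed to its midpoint."""
    if high_ul is None:
        return float(low_ul)
    return (low_ul + high_ul) / 2


def summarize(values):
    """Return (median, minimum, maximum) for a list of numeric values."""
    return median(values), min(values), max(values)


def pct(count, total):
    """Format a categorical result as 'percentage (count/total)'."""
    return f"{round(100 * count / total)}% ({count}/{total})"


# Hypothetical reported volumes: single values or (low, high) ranges, in microliters.
reported = [(200,), (3000, 5000), (500,), (4000,)]
volumes = [volume_midpoint_ul(*r) for r in reported]
med, lowest, highest = summarize(volumes)
print(f"median {med} ul (range {lowest}-{highest} ul)")
print(pct(59, 167))
```

Keeping the range-to-midpoint rule in a single function makes the assumption explicit and easy to change if a different convention (for example, using the upper bound) were preferred.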

Results

Our initial search strategy yielded 11,947 pediatric intervention studies published in 2014. Following abstract screening, 3,375 publications were downloaded in full text. This included 167 intervention trials recruiting term and/or preterm neonates, of which 35% (59/167) met our inclusion criteria, reported the use of response biomarkers, and were evaluated in this analysis. Fifty-eight percent (34/59) of the included trials recruited preterm neonates, 34% (20/59) recruited term neonates, and five trials recruited both preterm and term neonates. The methodological features of the included trials are summarized in Table 1. The median total number of recruited participants (all gestational ages) was 87 (range 9–1,100). The majority of publications (83%, 49/59) declared their funding source. The most common funding sources were government (32%, n=19), academic or research institutes (29%, n=17), and private funding including foundations (29%, n=17). Industry funding was reported by 9 studies, while 13 studies reported multiple funding sources.

Table 1 Characteristics of neonatal trials in 2014 reporting response biomarkers (N=59)

The 59 included trials reported a total of 275 response biomarkers as primary or secondary outcomes (Table 2). There were 133/275 unique response biomarkers that were used only once. A detailed list of all the reported response biomarkers can be found in Supplementary Material S2 online.

Table 2 Summary table of reported pharmacodynamic and safety response biomarkers

The most frequently reported category of response biomarkers was markers of homeostasis, which include a broad range of proteins and ions such as albumin, amino acids, and potassium (Supplementary Material S2 online). In terms of individual neonatal response biomarkers, however, oxygen saturation and heart rate measurements were the most commonly used and were reported in 22% (13/59) and 19% (11/59) of interventional studies, respectively. Yet, the methods of measurement varied considerably: methods of measuring oxygen saturation were described as oximeters (n=6; brand not specified), blood sampling (n=1), patient monitors (n=1), infrared spectroscopy (n=1), or were not described (n=4). The methods used to measure heart rate included ECG (n=2), oximeters (n=1), patient monitors (n=1), or were not described (n=7). Other common response biomarkers included serum creatinine and bilirubin, which were reported in seven and six trials, respectively (Table 2). Interestingly, there were 76 unique response biomarkers reported only once. Of note, most response biomarkers were reported as secondary outcomes (88%, 241/275) and were typically used to measure the pharmacodynamic response following an intervention (84%, 227/275).

Safety response biomarkers were reported in 27% (16/59) of neonatal intervention trials published in 2014 and accounted for only 18% (49/275) of all neonatal response biomarkers assessed in this study. Only one publication (9) included references to support its safety biomarker rationale and definition. This subset of safety response biomarkers was primarily used to define adverse drug reactions, for example, “sepsis was defined as clinical signs of infection and either a positive blood culture result or hs-CRP [high-sensitivity C-reactive protein] level >10 mg/l” (7) or “cholestasis was defined as a direct bilirubin concentration >20% of the total bilirubin concentration” (8).

With respect to the analytical technique, 59% (162/275) of published biomarkers included the analytical method or technique used for quantification. References for the selected analytical techniques were provided in only 21% (57/275) of cases, and only 7% (21/275) provided data on the limits of detection or quantification. Moreover, there was considerable variability not only in the technique used for quantification, but also in the sample type (e.g., arterial vs. cerebral oximetry) and in the cutoffs/definitions across studies. For example, oxygen saturation endpoints were reported as the number of intermittent hypoxia events per hour, seconds with <80% SaO2 per hour, time <85% SaO2 per hour, time below 90% SaO2, and time with arterial oxygen saturation within the target range (10, 11).

In terms of specimens collected for response biomarkers, blood samples were the most common (54%, 148/275) (Table 3). The volume collected was reported for only 17% (25/148) of these, with a median reported collection volume of 4,000 μl (range 200–4,000 μl; 17 biomarkers were evaluated in 3–5 ml of blood). Although venous blood sampling was reported in 53% (78/148), the source was not specified in 35% (52/148).

Table 3 Reported neonatal response biomarker sampling type

Discussion

Identification, validation, and a consistent approach to the use of response biomarkers are essential in both preclinical and clinical drug development (12). Recent reports highlight the importance of biomarkers in the drug development process with the opportunity to bridge clinical chemistry with patient-centered outcomes (13). Our surveillance of recent clinical trials illustrates the overall lack of harmonization in the selection, collection, measurement, and reporting of response biomarkers in neonatal intervention studies. Furthermore, even for commonly reported response biomarkers such as heart rate and oxygen saturation, we identified considerable variability in quantification techniques, definitions and cutoff values. Thus, it becomes increasingly difficult to establish normal ranges and the expected population variability when each study evaluates different response biomarkers with various analytical techniques, often for the same purpose (e.g., evaluating the effects on normal growth and development), and frequently without reporting complete methodological details.

Distinctions should be clearly made when investigators measure a biomarker as an outcome measure. Standardizing neonatal biomarker definitions presents an additional challenge, as significant overlap exists in biomarker classification. For instance, creatinine was used both as a safety response biomarker to indicate renal toxicity and as a pharmacodynamic response biomarker in a nutritional study as a marker of nutritional status. As a biomarker can be used in multiple contexts, such as a diagnostic tool, a predictive marker or a surrogate endpoint (5), it is important to provide a clear context and rationale to support the purpose of the biomarker's use as well as the reference values. Given the limited resource of published literature, it would be advantageous for investigators to report all the information available regarding response biomarker selection, sampling, handling, and analysis. This is evidenced by our finding that for 20% of the biomarkers described in the 2014 literature, there was limited information on the sampling medium or technique used, making it nearly impossible to replicate and validate the findings. Further hindering replication efforts, our results revealed that analytical techniques were not described for 40% of the reported response biomarkers. Thus, a transparent methodology and reporting guidelines are needed to correct this problem (13, 14, 15) and to ensure that sufficient information is collected from studies that typically enroll a relatively small number of neonatal patients compared with trials involving adult patients.

It is important to note that the nomenclature and analytical techniques can be a great source of variability in the interpretation and significance of response biomarkers (16, 17). As analytical equipment, methods (including assays or antibodies) and software change over time, it is essential that the methods for quantifying biomarkers are well described. Analytical performance characteristics related to precision and accuracy, such as the lower limit of detection (LOD) and test–retest reliability, are critical for interpretability. If neonatal response biomarkers are being evaluated on assays developed for adults, the LOD may not be sensitive enough to detect or quantify the change. Reporting on performance characteristics assures the reader that appropriately sensitive and specific techniques were used for quantification while addressing the potential limitations of analysis. The Standards for the Reporting of Diagnostic Accuracy Studies (STARD) initiative, whose statement was first published in 2003, was established to improve the reporting of diagnostic testing and foster study replication (18). STARD has made publicly available recommendations for reporting test methods, as summarized in Table 4. This may be a useful starting point for future initiatives to develop harmonized reporting guidelines for the use of response biomarkers (18). If approached with rigor, academic-led neonatal clinical research can provide the foundation of data required to support biomarker qualification by regulatory authorities, can yield useful evidence regarding surrogate endpoints, and may provide a framework for advancing a particular area of the disease (5). Incomplete reporting squanders the opportunities to bridge the gap between regulatory requirements and academic research.

Table 4 Test methods reporting from the Standards for the Reporting of Diagnostic Accuracy (18)

This study is limited due to the short time frame included. In addition, it did not include biomarkers for diagnostic, prognostic, and predictive purposes. That said, this cross-sectional 1-year analysis was meant to provide a snapshot of the quantity and quality of neonatal response biomarker selection and reporting to illustrate a common problem among all trials. We are also limited as to quantifying the use of monitoring biomarkers, as we did not prospectively plan to collect data on the number of times a response biomarker was measured. To improve the quality and promote the use of harmonized response biomarkers as pharmacodynamic and safety outcomes in neonatal intervention studies, we suggest the following avenues for improvement:

  1. Studies should provide a clear context, definitions and a rationale regarding the selection of response biomarkers.

  2. Researchers should ensure that the assay used is specific and sensitive. Studies should document the references for assay validation and methods.

  3. For replication purposes, studies should provide execution details including the procedures for sample collection, handling, storage, preparation, method of analysis, limits of detection, and the number of replicates.

  4. To foster the interpretability regarding the response of neonates and the number of neonates included in a mixed-age cohort study, neonatal subgroup analysis is encouraged, when appropriate. Neonatal subgroups should be clearly defined in the data-analysis plan.
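As one possible way to put the recommendations above into practice, the following Python sketch checks a single biomarker record against a list of reporting items and returns those left unfilled. The `BiomarkerReport` class, its field names, and the example values are all hypothetical: they were chosen to mirror the items listed above and do not come from any published guideline or from this study's data dictionary.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BiomarkerReport:
    """Hypothetical record of what a publication reports for one response biomarker."""
    name: str
    rationale: Optional[str] = None          # context and reason for selection
    definition: Optional[str] = None         # cutoff or outcome definition
    assay_reference: Optional[str] = None    # citation for assay validation
    sample_collection: Optional[str] = None  # sample type, source, volume
    handling_storage: Optional[str] = None   # handling, storage, preparation
    analysis_method: Optional[str] = None    # analytical technique or instrument
    limit_of_detection: Optional[str] = None
    replicates: Optional[int] = None


def missing_items(report: BiomarkerReport) -> List[str]:
    """Return the names of reporting items that were left unfilled."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


# Hypothetical example values, for illustration only.
example = BiomarkerReport(
    name="serum creatinine",
    rationale="safety marker of renal toxicity",
    analysis_method="enzymatic assay",
)
print(missing_items(example))
```

A structured record of this kind could be filled in during data extraction, so that gaps in reporting are counted in the same way for every trial.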

Disease-specific reviews of neonatal biomarkers have been conducted (9, 19, 20) and have concluded that validation of biomarkers is a critical next step for implementing routine measurement in clinical care and drug development. Future endeavors should establish which biomarkers have been validated in neonatal patients. Harmonized methods for neonatal response biomarker validation studies are needed. Publication reporting guidelines that harmonize the terminology and improve the reliability and interpretation of studies would also aid replication efforts. Ideally, gestational age-specific and disease-specific reference ranges for response biomarkers will need to be established and validated in large multicenter studies. The CALIPER initiative (21) has recognized the paucity of information available on the pediatric population. CALIPER (22) has published reference intervals for 40 biomarkers, closing some of the gaps in knowledge. Expanding this initiative to include preterm neonates and increasing global participation would capitalize on its network and experience to add value.

Conclusions

There is a lack of harmonization in the use of biomarkers as pharmacodynamics/efficacy endpoints, or to assess the risk/benefit in neonatal clinical trials. Many disease activity indices using biomarkers have been developed to evaluate the outcomes, but their biometric properties, such as responsiveness, reliability, and validity, have not been properly clinically validated. To advance neonatal drug development, the development and implementation of reporting guidelines would increase the utility of information in published neonatal intervention studies using biomarker outcomes.

Additional references

FDA Guidance for industry and FDA staff. Qualification process for drug development tools, 2014. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM230597.pdf

Advancing Adoption of Novel Safety Biomarkers into Drug Development through voluntary submission of data at US FDA, EMA, and PMDA, 2013 (http://www.fda.gov/downloads/ScienceResearch/SpecialTopics/CriticalPathInitiative/CriticalPathOpportunitiesReports/ucm113411.pdf).

Japan PMDA (https://c-path.org/wp-content/uploads/2013/11/walker-SOT-2013-poster.pdf).

Health Canada Biomarker Guidance Document 2016. (http://www.hc-sc.gc.ca/dhp-mps/alt_formats/pdf/prodpharma/applic-demande/guide-ld/ich/efficac/e16-step-4-etape-eng.pdf).