Introduction

The Clinical and Laboratory Standards Institute (CLSI) standard for the timing of newborn blood collection on filter paper for genetic disease screening has been at least 24 hours after birth, preferably within 24 to 48 hours.1,2 This suggested timing of specimen collection is based on newborns’ maturation progression, especially the stability of endocrine and metabolic systems, as well as empirical screening-performance reviews from newborn screening programs. As a result, most state newborn screening programs specify specimen collection timing between 24 and 48 hours after birth; this is sometimes included in regulations or statute.3,4,5 Some state programs also consider specimens collected before 24 hours of age unsatisfactory and require a repeat blood draw.6,7,8 As one of the largest newborn screening programs in the world, the Genetic Disease Screening Program (GDSP) of the California Department of Public Health is the only state program that recommends newborns be at least 12 hours old before blood is collected.9 This timing recommendation has been in effect since 1996, when the validity of tandem mass spectrometry screening results was demonstrated in specimens collected in the first 24 hours of life and was further supported by the proceedings of a 1995 conference on early hospital discharge and impact on newborn screening.10,11

Improving the timeliness of specimen collection has become an emerging issue for state newborn screening programs and has drawn media scrutiny.12 The recently enacted Newborn Screening Saves Lives Reauthorization Act of 2014 has also elevated this as a national issue.13 Early specimen collection could contribute to improving the timeliness of specimen processing nationwide and, more importantly, could expedite release of urgent positive results to practitioners.

Historically, there are two major concerns about collecting blood specimens before 24 hours after birth. One is the potential increase of false positives (normal screening results reported as positive results) caused by surging endocrine (especially thyroid-stimulating hormone levels) and metabolic imbalances as the result of underdeveloped biological systems and/or birth stress—an effect that is exaggerated in premature or sick babies.14,15,16,17,18 The increase of false positives is associated with perceived increased burden on parents/families and the health-care system. The other concern is the potential increase of false negatives (true cases reported as negative screening results).19,20 The primary concern is with disorders of amino acid metabolism, where early collection could mean that the newborn’s metabolism has not been functioning independently for long enough for the metabolic disorder to be evident from the values measured in the newborn screening specimen. An increase of both false-positive and false-negative rates could have adverse impact on newborn screening performance and the well-being of newborns and their families.

Few published studies have examined the efficacy and efficiency of newborn screening conducted on blood specimens collected when the newborn is less than 24 hours old.10,18,19,21 To understand the impact of early blood collection in newborn screening, we studied false-negative and false-positive rates in GDSP’s blood specimens collected from 12 to 23 hours of age in comparison to the specimens collected from 24 to 48 hours of age. To our knowledge, this is the first study on specimens collected between 12 and 23 hours of age conducted at a population level. The validation of the early collection policy will provide evidence that could support a more flexible blood collection standard that is valuable to state newborn screening programs facing mounting financial and health-care challenges such as repeat test fees and early postpartum discharge.

Materials and Methods

Data source

We analyzed screening data from the California GDSP on four types of genetic diseases: metabolic disorders detectable by tandem mass spectrometry (MS/MS); congenital adrenal hyperplasia (CAH); congenital hypothyroidism (CH); and cystic fibrosis (CF). Dried blood spot specimens are collected by heel stick and handled according to the CLSI guidelines, with the exception that blood can be collected as early as 12 hours after birth.1 GDSP currently screens for 48 metabolic disorders (fatty acid oxidation disorders, organic acid disorders, and amino acid disorders including urea cycle disorders) with MS/MS. CAH screening consists of two tiers. The first tier is an immunofluorescence assay that measures 17-hydroxyprogesterone (17-OHP) levels, with different cutoff ranges corresponding to different birth weight ranges. Newborns with highly elevated results are reported as positive and referred to a state-approved endocrine center for diagnostic evaluation. If a specimen’s 17-OHP value is moderately elevated but not high enough for immediate reporting (classified as “questionable”), a second-tier MS/MS test that measures 17-OHP, androstenedione, and cortisol is performed on the specimen to determine its positive status. CH screening is also conducted with an immunofluorescence assay, which measures thyroid-stimulating hormone levels. GDSP also uses a multitier method to screen for CF. When a specimen’s immunoreactive trypsinogen (IRT) level, measured by immunofluorescence assay, is elevated, the specimen is further tested using the California cystic fibrosis transmembrane conductance regulator mutation panel.22 A specimen with only one mutation identified on this panel is then sent for cystic fibrosis transmembrane conductance regulator gene sequencing. For all the conditions tested, newborns with positive results are referred to a regional specialty-care follow-up center that coordinates the confirmatory diagnosis process; if a condition is confirmed, the center provides ongoing clinical care for patients.
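The first-tier CAH decision described above can be sketched in code. This is a minimal illustration only: the cutoff values, their units, the birth weight bands, and the names `CutoffBand`, `EXAMPLE_BANDS`, and `classify_first_tier` are all hypothetical placeholders, because the text states only that lower birth weights have higher 17-OHP cutoffs and does not give the cutoffs used by GDSP. Treating a value "at or above" a cutoff as elevated is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Tier1Result(Enum):
    NEGATIVE = "negative"
    QUESTIONABLE = "questionable: run second-tier MS/MS"
    POSITIVE = "positive: refer to endocrine center"


@dataclass(frozen=True)
class CutoffBand:
    """17-OHP cutoffs for one birth weight range (all values hypothetical)."""
    min_weight_g: float      # inclusive lower bound of the birth weight range
    questionable_at: float   # 17-OHP at or above this is "questionable"
    positive_at: float       # 17-OHP at or above this is reported immediately


# Hypothetical bands, ordered from heaviest to lightest. Lower birth weights
# get higher cutoffs, as the text describes; the numbers are placeholders.
EXAMPLE_BANDS = [
    CutoffBand(min_weight_g=2500, questionable_at=30.0, positive_at=90.0),
    CutoffBand(min_weight_g=1500, questionable_at=60.0, positive_at=150.0),
    CutoffBand(min_weight_g=0, questionable_at=100.0, positive_at=200.0),
]


def classify_first_tier(ohp, birth_weight_g, bands=EXAMPLE_BANDS):
    """Classify a first-tier 17-OHP value against weight-specific cutoffs."""
    for band in bands:
        if birth_weight_g >= band.min_weight_g:
            if ohp >= band.positive_at:
                return Tier1Result.POSITIVE
            if ohp >= band.questionable_at:
                return Tier1Result.QUESTIONABLE
            return Tier1Result.NEGATIVE
    raise ValueError("birth weight is below every configured band")


# The same 17-OHP value can lead to different outcomes in different bands.
print(classify_first_tier(95.0, birth_weight_g=3200))
print(classify_first_tier(95.0, birth_weight_g=1200))
```

With these placeholder bands, the same measured value is reported immediately for a heavier newborn but falls below the "questionable" threshold for a very-low-birth-weight newborn, which is the stratification effect the text describes.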

All testing and demographic data associated with the specimen are stored in GDSP’s web-based Screening Information System, which enables GDSP staff to extract and query the data through an SQL-based data management system. As a state mandate, all clinically confirmed genetic disorders must be reported to the Newborn Screening Registry maintained by the GDSP. These reports are collected from pediatricians, newborn screening coordinators, and clinicians at the specialty-care centers mentioned above for the various categories of screened disorders.23 We used confirmed registry cases from 2006 to 2013 along with timing of blood specimen collection data in the Screening Information System for the false-negative analysis. (The only exception is the data from CF screening, which have been available since July 2007.) A confirmed case identified as “missed by newborn screening” in the registry was defined as a false negative. Although the details vary with the type of condition, the process of confirming a diagnosis after a case is referred to the appropriate specialty-care follow-up center is similar for all cases. For the false-positive analysis, we used initial screening interpretation results (positive or negative) and final resolution results (disease or no disease) to determine false-positive status and true-negative status of the newborns. Although we analyzed initial IRT positive results for CF screening in the current study, it should be noted that specimens with elevated IRT were not called out as positive to the primary-care provider; they were further tested with the California cystic fibrosis transmembrane conductance regulator mutation panel, as mentioned above, and are not counted as false positives.22 Only 1 year of initial screening data (2013) was used for the false-positive analysis because, based on a sample size calculation performed before the final data analysis, it provided sufficient power to detect significant differences between the two collection-timing groups. During the study period, some analyte cutoffs were modified as a result of adjustments to different testing kits and laboratory methods.
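The outcome definitions above can be summarized in a short sketch. The function names and the toy records are hypothetical, and the code is not the GDSP query logic. It only encodes the definitions given in the text: a false positive is a positive initial screen with a final resolution of no disease, and a false negative is a confirmed case missed by screening.

```python
from collections import Counter


def classify_outcome(screen_positive, has_disease):
    """Combine the initial screening result with the final resolution."""
    if screen_positive:
        return "true_positive" if has_disease else "false_positive"
    return "false_negative" if has_disease else "true_negative"


def false_positive_rate(records):
    """False positives divided by all non-disease specimens (FP + TN)."""
    counts = Counter(classify_outcome(s, d) for s, d in records)
    non_disease = counts["false_positive"] + counts["true_negative"]
    if non_disease == 0:
        raise ValueError("no non-disease specimens in the records")
    return counts["false_positive"] / non_disease


def false_negative_rate(records):
    """Missed cases divided by all confirmed cases (FN + TP)."""
    counts = Counter(classify_outcome(s, d) for s, d in records)
    confirmed = counts["false_negative"] + counts["true_positive"]
    if confirmed == 0:
        raise ValueError("no confirmed cases in the records")
    return counts["false_negative"] / confirmed


# Toy records of (screen_positive, has_disease); purely illustrative.
toy = [(True, True), (True, False), (False, False), (False, False), (False, True)]
print(false_positive_rate(toy), false_negative_rate(toy))
```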

Statistical analysis

The focus of the study was to examine whether there were significant differences between early collection (12 to 23 hours after birth) and standard collection (24 to 48 hours) on false-negative and false-positive rates. Cases missed by newborn screening and cases detected through newborn screening were cross-tabulated by two collection-timing groups (early collection and standard collection). For false-positive analysis, we cross-tabulated the number of false positives for the initial screening test and all non-disease specimens (false positives plus true negatives) by collection-timing groups. To illustrate how GDSP takes prematurity into consideration when testing 17-OHP, we further analyzed CAH screening-performance difference between early- and standard-collection groups by birth weight group. As mentioned, each birth weight group has its corresponding 17-OHP cutoffs. (Lower-birth-weight groups have significantly higher cutoffs.) Specimens collected before 12 or after 48 hours of age were excluded from the analysis.
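A minimal sketch of the grouping and cross-tabulation step follows. The helper names and the toy specimens are hypothetical. The sketch assumes age at collection is recorded in hours and that any age from 12 up to (but not including) 24 hours belongs to the early group; how the study handled fractional hours is not stated.

```python
from collections import defaultdict


def collection_group(age_hours):
    """Return 'early', 'standard', or None for excluded specimens."""
    if 12 <= age_hours < 24:
        return "early"
    if 24 <= age_hours <= 48:
        return "standard"
    return None  # collected before 12 or after 48 hours: excluded


def cross_tabulate(specimens):
    """Count false positives and true negatives per collection-timing group.

    `specimens` is an iterable of (age_hours, is_false_positive) pairs for
    non-disease specimens only.
    """
    table = defaultdict(lambda: {"false_positive": 0, "true_negative": 0})
    for age_hours, is_false_positive in specimens:
        group = collection_group(age_hours)
        if group is None:
            continue
        key = "false_positive" if is_false_positive else "true_negative"
        table[group][key] += 1
    return dict(table)


# Toy input; the 10-hour and 50-hour specimens are dropped.
toy = [(10, False), (13, True), (20, False), (24, False), (48, True), (50, True)]
print(cross_tabulate(toy))
```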

We used χ² tests to assess differences in frequency distributions between categories. Two-tailed P values of 0.05 or less were considered statistically significant. For the false-negative comparison, Fisher’s exact test was used because of the small sample sizes.
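For readers who want to see the two tests on a 2 × 2 table, the following is a standard-library sketch. The study itself used SAS/STAT, so this code is a stand-in and not the authors' implementation. The χ² version shown is the Pearson statistic without continuity correction, which is an assumption because the text does not specify the variant. In the usage lines, the CAH denominators (44 and 155 confirmed cases) are back-calculated from the reported 4.55% and 6.45% false-negative rates and should be read as an inference, and the χ² counts are purely hypothetical.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, two-sided P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# CAH false negatives: rows are early and standard collection, columns are
# missed and detected cases. Denominators are inferred from reported rates.
print(fisher_exact_two_sided(2, 42, 10, 145))

# Hypothetical counts (not study data): false positives vs. true negatives.
print(chi_square_2x2(30, 9970, 20, 19980))
```

The Fisher implementation enumerates every table with the observed margins, which is practical only for small counts such as the missed-case tables; for the large false-positive tables the χ² approximation is the appropriate choice.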

All data queries were conducted as of January 2015 through Microsoft’s SQL Server 2008 R2 (Redmond, WA), and all analyses were performed with SAS/STAT software version 9.3 of SAS system for Windows (SAS Institute, Cary, NC).

Results

In 2013, GDSP collected 488,681 initial newborn blood specimens. More than two-thirds of these specimens (68.08%) were collected between 24 and 48 hours after birth. There were 106,390 newborns with specimens collected from 12 to 23 hours (21.77%; Table 1).

Table 1 Newborn screening count by age at collection in 2013 (N = 488,681)

From 2006 to 2013, the Newborn Screening Registry, maintained by GDSP, recorded a total of 4,228 confirmed cases among newborns whose blood specimens were collected either from 12 to 23 hours or from 24 to 48 hours after birth. Overall, as shown in Table 2, a very small percentage of confirmed cases were missed by newborn screening (1.09% in the early-collection group and 1.74% in the standard-collection group). The only condition with a relatively high false-negative rate in our study population was CAH (2 cases in the early-collection group, 4.55%, and 10 cases in the standard-collection group, 6.45%). For all cases that were not detected through newborn screening, there were no significant differences between the two blood collection-time groups. Most missed cases (7 of 11) of metabolic disorders detectable by MS/MS were urea cycle disorders, due primarily to the absence of a low citrulline cutoff and of argininosuccinic acid from the MS/MS panel. There were no significant differences between the two age collection groups for urea cycle disorders (data not shown).

Table 2 Newborn screening false-negative rates by genetic conditions by age at specimen collection, 2006–2013 (n = 4,228)

Statistically significant differences between blood collection-time groups were found in the false-positive analysis for each of the disorder categories. Table 3 illustrates that, compared with the standard-collection group, specimens collected between 12 and 23 hours of age had a significantly higher false-positive rate for CH (0.10 vs. 0.01%) and a moderately higher percentage of elevation for IRT (1.85 vs. 1.54%). By contrast, the early-collection group had a lower false-positive rate for the combined group of metabolic disorders detectable by MS/MS (0.11 vs. 0.18%). For MS/MS disorders, false-positive rates were significantly lower in the early-collection group for amino acid and fatty acid oxidation disorders. The rates were similar for the organic acid disorders (P = 0.46), whereas the false-positive rates were higher in the early-collection group for the urea cycle disorders (P = 0.02).

Table 3 Newborn screening false positives by genetic conditions and age at specimen collection, 2013

For CAH, false-positive rates were higher in the lower-birth-weight groups, regardless of the timing of collection. For the group with birth weight less than 2,500 g (representing 71% of all false positives), the false-positive rate was lower in the early-collection group (Table 4). Only for the group with birth weight of more than 2,500 g was the false-positive rate higher in the early-collection group.

Table 4 CAH false positives by birth weight and age at specimen collection, 2013

Discussion

Newborn screening programs have traditionally used a standard 24- to 48-hour time frame for collecting newborn specimens. However, our analysis shows that specimens collected between 12 and 23 hours of age in California performed similarly in general compared with specimens collected between 24 and 48 hours, with a few manageable exceptions. Although the small number of false negatives limits the strength of statistical inference, the data indicate that earlier specimen collection (within 12 to 23 hours of age) did not yield more cases missed by newborn screening across the analyzed disease categories. This finding suggests that specimens collected after 12 hours but before 24 hours of age are at least as effective as specimens collected from 24 to 48 hours after birth. Based on 2013 California GDSP data, earlier specimen collection does generate 96 extra CH false-positive specimens, balanced against 74 fewer MS/MS-positive specimens and 42 fewer CAH-positive specimens. For the initial CH positives, the follow-up regimen is relatively less burdensome for families because the retesting of serum thyroid-stimulating hormone and free T4 is often conducted by the primary-care provider and the families do not have to travel to the regional specialty-care center for follow-up services. By contrast, the earlier collection policy actually reduced the number of MS/MS-positive newborns, who require a more complicated follow-up regimen. For CAH, 71% of the overall false positives occurred in birth weight groups less than 2,500 g, where the false-positive rate for early collection was lower. Thus, the burden of increased initial false positives for CH needs to be balanced against the reduced false positives for metabolic disorders and CAH. For CF screening, an increase in the number of IRT elevations leads to an increase in the number of CF mutation panels run, which adds to the workload of the state laboratory and to overall newborn screening cost. However, in California, the vast majority of specimens with initial elevated IRT were never called out as positives following the mutation panel analysis, so there was minimal burden of follow-up on the physicians and families. Other programs using IRT alone as a CF screening marker could expect additional false positives and associated follow-up burden in the early-collection group.
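The balance described above can be made explicit with a small calculation that uses only the three 2013 figures quoted in the text. The variable names are illustrative, and the sum is unweighted: it treats each positive specimen as equal, although the text notes that follow-up burden differs by condition. IRT elevations are left out because they were not called out as positives.

```python
# Change in initial positive specimens attributed to early collection (2013),
# as quoted in the text. Positive values are extra positives.
changes = {"CH": +96, "MS/MS": -74, "CAH": -42}

net_change = sum(changes.values())
print(net_change)  # -20
```

On this simple count, early collection is associated with 20 fewer initial positive specimens across the three categories, before any weighting for how burdensome each follow-up regimen is.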

Evaluations of both false-negative and false-positive rates are complex and involve multiple factors extending beyond the timing of specimen collection. For example, sensitivity of 17-OHP testing could be low in a moderate (less severe) form of CAH or among newborns whose mothers used corticosteroids during pregnancy or in the neonatal period (thus suppressing adrenal function), resulting in a relatively high percentage of false negatives among all diagnosed cases for CAH.24,25 Some evidence also suggested that missed cases were almost inevitable for some inherited metabolic diseases despite the best effort of newborn screening.26 In our analysis, we observed that testing in neonatal intensive care units (NICUs) was the main factor linked to the higher false-positive occurrence for CAH (data not shown), a possible combined effect of prematurity and other health conditions. This relationship contributed to the higher positive rates in the 24–48-hours group, as the percentage of infants tested in NICUs was significantly higher in this group. Future research using both population-level data and other clinical information may identify other factors that can improve screening performance.

The current study used multiple years of data from a state program to demonstrate that, overall, earlier blood collection timing (after 12 hours of age) is reliable and efficient at a population level. The specificity and sensitivity of the screening performance did not show alarming differences between the earlier blood collection group and the standard 24–48-hours group.

The present study is the first large-scale population-based investigation to report the validity and efficiency of using dried blood spots collected from 12 to 23 hours after birth for newborn screening. An early study by Coody et al.27 questioned the efficacy of newborn screening specimens collected before 24 hours in response to the dramatic increase in early hospital discharge before 24 hours that began in the 1970s and accelerated during the first half of the 1990s.28,29 Other published studies focused on the impact of early hospital discharge (before 24 hours) on adverse newborn outcomes among healthy newborns compared with late-preterm newborns.30,31,32 In 1996, the Newborns’ and Mothers’ Health Protection Act was passed and mandated that health insurance must cover hospital stays through 48 hours after vaginal delivery. Despite this legislation, early discharge is still common, and in California slightly more than 20% of our specimens are collected from 12 to 23 hours.33 Thus, the validity of newborn screening specimens collected before 24 hours is still an important issue today.

By allowing blood specimen collection at 12 to 23 hours after birth, state newborn screening programs can benefit through timely release of urgent positives. The Advisory Committee on Heritable Disorders in Newborns and Children (ACHDNC) of the US Department of Health and Human Services (HHS) currently recommends “presumptive positive results for time-critical conditions should be immediately reported to the child’s healthcare provider but no later than the fifth day of life.”34 An earlier blood collection policy could lead to earlier transport to the laboratory and testing, and thus could help state programs reach the recommended timeliness goal.

Ideally, our study should examine only the conditions known to be associated with timing of the blood collection. The nature of the data (having only a very small number of false-negative cases) as well as the lack of definitive knowledge on the relationship between blood collection timing and screening outcomes limited our methodologies. A well-controlled experimental design could minimize the effect of other factors on screening performance. Using statistical modeling to control for demographic and other birth-related factors such as gender, race/ethnicity, nursery type, and cutoff changes could be useful to further analyze screening performance once a large number of false-negative samples are available.

A potential challenge facing the implementation of early blood collection is how to establish appropriate protocols for premature and/or sick newborns in NICUs. Many factors common in NICUs such as prematurity, transfusion, the use of total parenteral nutrition, and carnitine supplementation could artificially increase or decrease the testing results.35,36,37 The CLSI suggested serial specimen collection whereby a first specimen is collected on admission to the NICU and a repeat specimen would be obtained during 48 to 72 hours of life if the first specimen was collected within the first day; for preterm newborns and NICU-admitted newborns with positive results, a final specimen is collected either at 28 days or at discharge, whichever comes first.38 Implementing the CLSI guidelines for premature newborns to interpret CAH screening results using the final specimen is very likely to yield different screening performance. Currently, California’s screening practice uses birth weight to stratify 17-OHP cutoffs. In this study, we showed that the number of false positives in the early-collection group across birth weight strata was small enough to not burden the system with extra follow-up. On implementation of the CLSI guidelines, many low-birth-weight newborns tested in NICUs would have their specimen collection postponed, resulting in a lower false-positive rate. Furthermore, our experience and that of others in managing screening performance for MS/MS detectable metabolic conditions suggested that timely cutoff review and adjustment, including the addition of analyte ratios and adopting available new analytical technologies, are imperative to successfully control false-negative and false-positive results.39

Enhanced laboratory technologies, standards, and state regulations have led to improved sensitivity and specificity for the newborn screening of most genetic conditions. Blood specimens collected from 12 to 23 hours of life provide newborn screening programs satisfactory screening information with similar efficacy and efficiency compared to specimens collected from 24 to 48 hours, which is the current standard. Allowing blood collection as early as 12 hours of age could have benefits not only as a protocol for early postpartum discharge but also as an important factor to help accelerate the reporting of urgent positive results and improve the timeliness of diagnosis and treatment of affected newborns.

Disclosure

The authors declare no conflict of interest.