## INTRODUCTION

The “psychosine hypothesis” was advanced in 1972 by Kunihiko Suzuki1 to explain a central paradox about the pathogenesis of Krabbe leukodystrophy (OMIM 245200): Krabbe disease (KD) is a lysosomal disorder, but there is no accumulation of galactosyl-ceramide, the deficient enzyme’s presumably pathogenetic substrate.1,2 The amounts present are much lower in KD brains than in the brains of unaffected children.1,2 There is also a striking and rapid disappearance of the white matter–producing oligodendroglial cells early in the course of KD.2

Suzuki consequently hypothesized that galactosylsphingosine, or psychosine (PSY), accumulates in KD and has cytotoxic effects that explain the critical paradox about its unique pathological and biochemical character.2 This elegant hypothesis remains the prevailing explanation of the pathogenesis of KD.

Only recently has the possibility emerged that PSY, in addition to its apparently key role in pathogenesis, is a diagnostically useful biomarker that may predict the symptoms of KD.3,4,5 Identification of a predictive biomarker could aid in resolution of the current newborn screening (NBS) controversy for KD.6,7 The largest experience with screening for KD is that of New York State, but after 10 years of screening almost 2 million infants, it was found that the screening algorithm had a high false positive rate resulting in an extremely low positive predictive value (PPV) of 1.4%.6 Wasserstein et al. concluded that based upon the New York experience, improvements in NBS protocols are needed before widespread adoption of Krabbe NBS.6,7 Other studies, however, have shown that incorporating PSY into NBS protocols may substantially reduce the false positive rate.3,4,5 Furthermore, KD screening has spread to several other states,8 reinforcing the urgent need for an improved newborn screening model.

Psychosine can be determined in dried blood spots, and it has been reported recently to be elevated in several newborn spots from infants who developed early infantile Krabbe disease (EIKD).3,5 However, equivocal values were noted in some high-risk and later-onset patients.5 PSY has not been established as a stand-alone test to predict KD symptoms without consideration of additional diagnostic tests.5 Krabbe NBS still requires enzyme determination and additional tests, comprising either the second tier of study of the original newborn spots analyzed for KD, or secondary additional diagnostic testing performed by physicians who later see the infants after a positive newborn test.4,5,8

The first tier of Krabbe newborn screening is analysis of the activity of galactocerebrosidase (GalC), the enzyme considered to be deficient in all cases of KD.8,9 However, GalC measured in leukocytes, as determined during the New York KD screening experience, did not predict onset or type of symptoms.10 Even though improvements in the GalC leukocyte assay may be feasible,11 there is no current evidence that GalC activity in leukocytes alone can predict symptoms with sufficient accuracy to obviate the significant false positive problem in Krabbe NBS.

Because of the high false positive rate of newborn screening for KD, we previously investigated the potential for a bivariate analysis of PSY and GalC, as measured on the newborn screen dried blood spot specimens, to improve the false positive rate. Under the assumption that loge(GalC) and loge(PSY) have a bivariate normal distribution, a tool based on bivariate normal limits (BVNL) was developed from aggregate statistics previously reported in the literature. The tool showed great potential for highly sensitive and specific identification of EIKD cases.4 Lab-specific standardization of loge(GalC) broadened the successful application of the tool to data from multiple labs. Without the benefit of simultaneously measured GalC and PSY from a normative sample, however, it was not possible to test the bivariate normal assumption, and estimation of the distribution’s parameters was necessarily ad hoc in nature. Furthermore, the questions of whether lab-specific standardization of loge(PSY) and/or time-specific standardization of either biomarker are necessary for general application remain open.

The goals of the current study are to finalize development and validation of the BVNL-based diagnostic tool for NYS based on simultaneous measurements of PSY and GalC from a normative sample and to continue investigating questions relevant to the more general applicability of the tool. The finalized tool could potentially prevent the anxiety experienced by parents whose infants screen positive but who are not destined to develop infantile KD, as well as the expense and potential discomfort of additional diagnostic testing of these infants.4,6,8 We report here on the development and accuracy of the finalized BVNL tool based on analysis of dried blood spots (DBS), including newborn spots that have been accumulated as part of the expansion of the World Wide Registry (WWR) for KD.12,13,14

## MATERIALS AND METHODS

### Bloodspots and patient data

The WWR contains detailed genetic and phenotypic information regarding 198 KD patients.12,13,14 A de-identified version of the WWR data has been contributed to the Longitudinal Pediatric Data Resource (LPDR) of the Newborn Screening Translational Research Network (NBSTRN).15,16 This study has been approved by the University at Buffalo’s IRB (IRB 030-385325). Each patient in the World Wide Registry has a signed consent form on file. Residual DBS were obtained, where available, from the newborn screening programs of states where registrants of the WWR were born. Fifteen of the newborn DBS were available for retrieval. Each of these DBS was analyzed for both PSY and GalC. These included the DBS of 12 infants who later developed EIKD, defined as onset of symptoms by age 6 months,12,17 and 3 later-onset cases who developed symptoms prior to 30 months. We shall refer to these 15 cases as early childhood KD.

The analyses also included 174 DBS from New York State’s newborn screening program. Of these, 166 were from normal newborns who were born in October 2016 and 8 were from newborns considered to be at high risk for KD, according to the current New York State protocol. These 8 infants were defined as high-risk due to low GalC enzyme levels on diagnostic testing consistent with the development of EIKD. However, they did not have genotypes that predicted the early onset of symptoms.6,9 All 8 remained symptom-free during the 6 to 24 month follow-up period.6 Three additional DBS were obtained from patients evaluated at the Program for the Study of Neurodevelopment in Rare Disorders at Children’s Hospital of Pittsburgh of the University of Pittsburgh Medical Center (UPMC). These DBS were from symptomatic, early childhood, KD patients who developed symptoms before the age of 4 years.18

In summary, the samples analyzed in the current study consisted of 166 control newborn blood spots, 15 DBS from newborns who later developed KD, 8 high-risk New York cases, and 3 early childhood cases whose blood was collected early in their symptomatic course.

Additional normative samples consisting of either GalC or PSY measurements made at various times were available to address the need for time-specific standardization of either loge(GalC) or loge(PSY). Determinations of GalC enzyme activity and PSY were performed for each DBS as described previously using liquid chromatography and mass spectrometry methodologies.3,19,20

All DBS analyses were conducted in the Wadsworth Laboratory of the New York State Department of Health using published protocols.5 Some cases had additional determinations at Thomas Jefferson University that were performed on blood drawn as part of subsequent diagnostic testing. The results of these analyses are presented in Table 1.

### Statistical analyses

Normal probability plots of the natural logarithms of PSY and GalC observations in the normative sample (n = 166) provided necessary (although, technically, not sufficient) evidence for the bivariate normality assumption. The points in both plots followed a straight-line pattern consistent with the assumption that each has a normal distribution. Thus, development of the proposed NBS tool was based on multivariate normal distribution theory21 and formulas for (1–p)100% prediction regions.22 Prediction regions (ellipses) were calculated from univariately standardized loge(GalC) and loge(PSY) values of the 166 normal newborns. The prediction ellipse for p = 10−6 was then used to develop a BVNL-based newborn screening tool as previously described.4

Tolerance for false positive rate (fp) dictated thresholds for the definitions of “low” GalC, “high” PSY, and our choice of p when developing a BVNL NBS tool. Thresholds of −2.9 and 2.9 for standardized loge(GalC) and loge(PSY), respectively, and p = 10−6 were chosen in an attempt to control fp at approximately 10−7 and specificity at (1–10−7)100%.
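The decision rule implied by these thresholds can be sketched as follows. This is a minimal illustration, not the authors' R implementation. It assumes that a newborn tests positive only when standardized loge(GalC) is below −2.9, standardized loge(PSY) is above 2.9, and the point lies outside the prediction ellipse. It also assumes the standard F-based prediction ellipse for one new bivariate observation, and a correlation derived from the covariance matrix reported for the 166 normal newborns, with the off-diagonal entry read as −0.22. All function names are hypothetical.

```python
import math

# Correlation implied by the reported covariance matrix of the normative
# sample, reading the off-diagonal entry as -0.22 (assumption: symmetric).
R = -0.22 / math.sqrt(0.80 * 0.26)

GALC_LOW = -2.9   # "low" standardized loge(GalC)
PSY_HIGH = 2.9    # "high" standardized loge(PSY)


def ellipse_cutoff(p, n):
    """Squared Mahalanobis radius of a (1-p)100% prediction ellipse for one
    new bivariate normal observation, given a normative sample of size n."""
    m = n - 2
    # Closed-form upper-p quantile of the F(2, m) distribution.
    f_quantile = (m / 2.0) * (p ** (-2.0 / m) - 1.0)
    return 2.0 * (n + 1) * (n - 1) / (n * m) * f_quantile


def mahalanobis_sq(z_galc, z_psy, r=R):
    """Squared Mahalanobis distance of a standardized point from (0, 0)."""
    return (z_galc ** 2 - 2.0 * r * z_galc * z_psy + z_psy ** 2) / (1.0 - r * r)


def bvnl_positive(z_galc, z_psy, p=1e-6, n=166, r=R):
    """True when the point is in the low-GalC / high-PSY corner AND
    outside the prediction ellipse (assumed reading of the BVNL rule)."""
    outside = mahalanobis_sq(z_galc, z_psy, r) > ellipse_cutoff(p, n)
    return z_galc < GALC_LOW and z_psy > PSY_HIGH and outside


if __name__ == "__main__":
    # Hypothetical standardized values, for illustration only.
    for point in [(0.0, 0.0), (-3.5, 3.5), (-6.0, 8.0)]:
        print(point, bvnl_positive(*point))
```

Under these assumptions, a point such as (−3.5, 3.5) crosses both univariate thresholds yet stays inside the ellipse and is therefore not flagged, which is the behavior that distinguishes the elliptical rule from joint univariate cutoffs.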

The (1–10−7)100% BVNL tool, so defined, was applied to appropriately standardized observations (see below) to test the 166 normal newborns, 15 KD cases from the WWR, 8 “high risk” but disease-free newborns, and 3 KD cases measured postsymptomatically for KD. The specificity of the tool for correctly identifying non-KD cases was calculated separately for the samples of 166 normal newborns and 8 “high-risk” non-KD cases as the percentage of each sample that tested negative for KD. The sensitivity of the tool was calculated separately from results of its application to the 12 EIKD and 3 later-onset WWR patients as the percentage of each sample that tested positive. The sensitivity of the tool when testing postsymptomatic cases was similarly calculated from the results of its application to the three early childhood KD patients from UPMC.

A newborn screening program that predicts KD for all newborns falling outside the (1–10−7)100% BVNL should expect approximately 1 falsely predicted early childhood KD in every 10 million normal newborns. To determine whether this targeted fp and associated specificity were approximately achieved, 100 million observations were generated from the standardized version of the following multivariate normal distribution:

$$\begin{pmatrix} \log_e(\mathrm{GalC}) \\ \log_e(\mathrm{PSY}) \end{pmatrix} \sim \mathrm{MVN}\left[ \begin{pmatrix} 0.67 \\ -1.00 \end{pmatrix}, \begin{pmatrix} 0.80 & -0.22 \\ -0.22 & 0.26 \end{pmatrix} \right],$$

where the mean vector and covariance matrix are sample estimates from the sample of 166 normal newborns measured simultaneously in October 2016. The prediction rule then was applied to the simulated observations. Treating generated points as observations from the actual standardized distribution of normal newborns, the false positive rate of the BVNL tool was calculated as the percentage of these 100 million simulated data points that fell outside the (1–10−7)100% BVNL defined above, i.e., that were falsely predicted to develop KD. The specificity of the tool was then estimated as 1.0 minus the calculated fp. The calculated fp was compared with the nominal fp = 10−7 to illustrate that the targeted nominal rate is indeed approximately achieved.
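The simulation can be illustrated with a scaled-down Monte Carlo sketch. It is not the authors' code. It assumes the same decision rule as the text (low GalC, high PSY, and outside the prediction ellipse), the F-based prediction ellipse, and a correlation computed from the covariance matrix with the off-diagonal entry read as −0.22. Because the observations are standardized, only the correlation affects the result. The function names, the seed, and the sample size of 200,000 are illustrative choices; at this size the run will usually produce no false positives at all, so it demonstrates the procedure rather than reproducing the 100-million-draw estimate.

```python
import math
import random

# Correlation implied by the reported covariance matrix (off-diagonal
# entry read as -0.22; this is an assumption).
R = -0.22 / math.sqrt(0.80 * 0.26)


def prediction_cutoff(p, n):
    """Squared Mahalanobis radius of the (1-p)100% prediction ellipse."""
    m = n - 2
    f_quantile = (m / 2.0) * (p ** (-2.0 / m) - 1.0)  # F(2, m) quantile
    return 2.0 * (n + 1) * (n - 1) / (n * m) * f_quantile


def mahalanobis_sq(z1, z2, r=R):
    return (z1 ** 2 - 2.0 * r * z1 * z2 + z2 ** 2) / (1.0 - r * r)


def draw_standardized(n_draws, r=R, seed=2016):
    """Yield standardized (loge GalC, loge PSY) pairs with correlation r."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - r * r)
    for _ in range(n_draws):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        yield u, r * u + s * v


def fraction_outside_ellipse(n_draws, cutoff, seed=2016):
    """Share of simulated points outside an ellipse (sanity check)."""
    hits = sum(1 for z1, z2 in draw_standardized(n_draws, seed=seed)
               if mahalanobis_sq(z1, z2) > cutoff)
    return hits / n_draws


def estimate_fp(n_draws, cutoff, low=-2.9, high=2.9, seed=2016):
    """Share of simulated normal newborns flagged by the assumed rule."""
    hits = 0
    for z1, z2 in draw_standardized(n_draws, seed=seed):
        if z1 < low and z2 > high and mahalanobis_sq(z1, z2) > cutoff:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    n = 200_000
    # With known parameters, a 99% ellipse has radius^2 = -2*ln(0.01),
    # so about 1% of points should fall outside it.
    print(fraction_outside_ellipse(n, -2.0 * math.log(0.01)))
    print(estimate_fp(n, prediction_cutoff(1e-6, 166)))
```

A vectorized implementation would be needed to reach 100 million draws in reasonable time; the loop above is kept simple for readability.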

The fp calculated from the simulation results, the average prevalence reported in articles reviewed by Foss et al.,23 and the sensitivity estimated in the current study were substituted into the formula for PPV24 to obtain an estimate of the PPV of our proposed tool.
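For reference, the usual PPV formula combines sensitivity, false positive rate, and prevalence as shown in the sketch below. The function name is hypothetical. The prevalence in the usage example is the death-certificate estimate for EIKD of 1 in 244,000 cited in the Discussion, used here only for illustration; it is not the average prevalence from Foss et al. that the study substituted, so the result differs somewhat from the reported PPV.

```python
def positive_predictive_value(sensitivity, false_positive_rate, prevalence):
    """PPV = P(disease | positive test), by Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Illustrative inputs: sensitivity 100%, fp of 1.1 per 10 million,
    # and an assumed prevalence of 1 in 244,000.
    ppv = positive_predictive_value(1.0, 1.1e-7, 1.0 / 244_000)
    print(f"{ppv:.3f}")
```

With these illustrative inputs the PPV comes out at about 0.97. The calculation also shows why a very small false positive rate is essential for a rare disorder: the false positive term is weighted by nearly the entire screened population.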

To test the need for temporally specific standardization, the data were categorized according to when the sample was analyzed at the Wadsworth lab. Group 1 consists of the 166 joint measurements of GalC and PSY, from October 2016. Group 2 consists of 30 joint measurements from spring 2017 (GalC in March and PSY in May). Group 3 consists of 1453 GalC measurements from November 2017. Lastly, group 4 consists of 57,796 GalC measurements from May 2009. To assess whether the mean of the distribution of these biomarkers varied from time to time, analysis of variance was performed on GalC measures across the four normative groups and on PSY measures across groups 1 and 2.
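The comparison of group means rests on the one-way analysis of variance F statistic, which can be sketched as below. The function name and the toy numbers are hypothetical and are not study data. The sketch returns only the F statistic and its degrees of freedom; obtaining a P value additionally requires the F distribution, which a statistical package would supply.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Toy loge(GalC)-like values for three measurement periods.
    toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    print(one_way_anova_f(toy_groups))
```

A large F relative to its degrees of freedom indicates that the group means differ by more than within-group scatter would explain, which is the pattern that motivates time-specific standardization.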

The 166 normal newborns’ loge(GalC) and loge(PSY) measures and the simulated observations were standardized using group 1 means and standard deviations. Otherwise the statistics used to standardize observations were based on these comparisons and the best measurement time match for each of our study samples. Specifically, the 15 KD cases, 8 high-risk NYS NBS, and UPMC patients’ loge(GalC) measures were standardized using group 3 loge(GalC) statistics, and their loge(PSY) measures were standardized using the weighted mean and pooled standard deviation across the two normative samples of loge(PSY) in groups 1 and 2. Observations from the three UPMC patients’ simulated observations were standardized using group 1 statistics.
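The standardization step can be sketched as follows, assuming the conventional sample-size-weighted mean and pooled standard deviation. The function names are hypothetical. In the usage example, the group 1 loge(PSY) mean (−1.00) and variance (0.26) are taken from the distribution reported above, whereas the group 2 mean and standard deviation are invented placeholders, because those values are not given in the text.

```python
import math


def weighted_mean(means, sizes):
    """Sample-size-weighted mean across normative groups."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


def pooled_sd(sds, sizes):
    """Pooled standard deviation across normative groups."""
    numerator = sum((n - 1) * s ** 2 for s, n in zip(sds, sizes))
    return math.sqrt(numerator / (sum(sizes) - len(sizes)))


def standardize(value, mean, sd):
    """Convert a loge-scale measurement to a z-score."""
    return (value - mean) / sd


if __name__ == "__main__":
    sizes = [166, 30]                  # groups 1 and 2
    means = [-1.00, -0.90]             # second value is a placeholder
    sds = [math.sqrt(0.26), 0.50]      # second value is a placeholder
    ref_mean = weighted_mean(means, sizes)
    ref_sd = pooled_sd(sds, sizes)
    # Hypothetical PSY value on the original scale, log-transformed first.
    print(standardize(math.log(2.0), ref_mean, ref_sd))
```

The pooled standard deviation as written reflects within-group spread only, which is appropriate when, as reported for loge(PSY), the group means do not differ materially.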

## RESULTS

### Temporal change in GalC enzyme activity determined in dried blood spots

The time-specific samples are described in Table 2. Loge(GalC) in normal newborns varied significantly across time-varying samples (P < 0.0001), while loge(PSY) did not (P = 0.2634).

### BVNL classification of normal newborn bloodspots

Figure 1 illustrates results of the application of the proposed NBS tool to the 166 normal newborns from New York State. All 166 data points fall within normal limits; i.e., false positive rate = 0% and specificity = 100%.

### BVNL results for Krabbe patients

Figure 1 also illustrates the results from the application of the BVNL NBS tool to measures from newborn DBS of the 15 newborns who developed early childhood KD. The BVNL tool correctly identified each of the 12 EIKD patients and each of the 3 later-onset early childhood patients. The ages of symptom onset varied from 1 to 29 months. Thus, the tool was 100% sensitive for early identification of early childhood KD.

All three postsymptomatic cases followed in Pittsburgh were correctly diagnosed by the BVNL tool (sensitivity = 100%; see Table 3). These children developed symptoms prior to 4 years of age.

### BVNL results for high-risk New York cases

None of the eight high-risk cases fell outside the (1–10−7)100% BVNL ellipse (Fig. 2) and thus none were incorrectly predicted to develop early childhood KD. All eight were below the univariate normal limit for loge(GalC) and five were above the univariate normal limit for loge(PSY). All have remained free of Krabbe symptoms for over 2 years of observation.

### Specificity, sensitivity, false positive rate, and positive predictive value

Table 3 summarizes the data presented thus far and examines the accuracy of the BVNL tool for newborn KD screening. In terms of predicting symptoms from the newborn blood spots of infants destined to develop symptoms in early childhood, the tool demonstrated 100% sensitivity, as did psychosine alone (Fig. 1). Results provided from analysis of the 166 controls indicate 100% specificity, whereas the stand-alone PSY test had an unacceptable false positive rate (Fig. 2).

Results of BVNL classification of points in the simulation sample of 100 million observations generated from the estimated bivariate normal distribution for normal newborns are presented in Table 3. The estimated rate of false positives was 1.1 per 10 million normal newborns, which translates to an estimated specificity of approximately 99.99999%. Thus, if each of the approximately 4 million normal newborns born each year in the United States were screened, the anticipated rate of false positives would be only about 1 every 2.5 years. The PPV would be 98.5%.

## DISCUSSION

The results above establish the validity of bivariate analysis of PSY and GalC enzyme levels measured on newborn blood spots and show that this analysis can predict with high accuracy the onset of symptoms of KD in infancy or early childhood. For children enrolled in the WWR identified as having high risk and for whom the exact age of onset is known, symptoms appeared between 1 and 29 months of age (Fig. 1). Early and accurate diagnosis of Krabbe symptoms presents a critical advance in management of this disorder because the optimal outcomes of hematopoietic stem cell transplantation result only if this treatment is applied presymptomatically.18,25 The efficacy of treatment will be enhanced by the earliest possible diagnosis.

The proposed BVNL tool has a PPV of 98.5% (Table 3). It would therefore virtually eliminate the false positive problem that has plagued KD newborn screening since its inception in New York State in 2006.4,6 This predictive potential also exceeds that of 80% reported previously as a potential outcome of newborn screening for lysosomal disorders.26

Demonstrating that use of a newborn diagnostic tool based upon these two parameters is feasible will require a prospective study of a greater number of patients and normal newborns in states that screen for KD, because this disorder is quite rare. In the published New York State experience, only 5 cases were confirmed after screening over 2 million children.6 After screening almost 3 million newborns in NY State, an additional late infantile case has been identified and the false positive rate has improved, but it remains a concern for this very rare disorder (J.J. Orsini, personal communication). Analysis of death certificates suggested a frequency of EIKD of 1 in 244,000 in the United States.27 At the current time, only a few states are screening all newborns for Krabbe disease.8 Field testing of the BVNL tool may therefore be challenging, but based upon the current results, it is also timely.

High-risk cases such as those shown in Fig. 2 are extremely challenging for clinicians. Their low initial GalC enzyme levels provoke concern, but they do not appear to be at risk of EIKD even though they may develop later-onset variants.12,13,17,28 It has been estimated that there are over 140 distinct variants in the human gene encoding the GalC enzyme, with only a few predicting late-onset KD; genotype/phenotype correlation therefore remains elusive.17,29

Of these eight cases, only two were heterozygotes for the 30-kb deletion accompanied by a probable allele of late onset.9,29 The remainder had other putative late-onset alleles combined with variants of unknown significance. The BVNL tool’s analyses of their newborn blood spots predicted correctly that they would remain free of symptoms of early childhood KD. Nevertheless, it is prudent to acknowledge the possibility that they may represent late-onset cases.6,12,13

Figure 2 also illustrates an advantage of the BVNL tool over analysis of either PSY or GalC as univariate measures, or in a combined analysis lacking the elliptical portion of the BVNL. All of these cases were below the cutoff for GalC, so univariate analysis of the enzyme alone would cause false assignment of risk. In addition, five of these eight high-risk cases were also above the upper limit for PSY (Fig. 2). Therefore, either separate or joint consideration of GalC or PSY relative to univariate thresholds of −2.9 and 2.9, respectively, for these five cases would result in an incorrectly assigned risk of developing EIKD symptoms and place them at risk for receiving unneeded treatment. One could define more stringent univariate thresholds to solve this problem for these eight newborns but at the risk of identifying fewer true cases. The general point here is that the BVNL can be expected to perform better (i.e., lower false positive rate given the same sensitivity) than simple application of univariate thresholds jointly, because the BVNL takes advantage of additional information about the bivariate distribution of the biomarkers.

It should be emphasized that the tool can be adjusted to conform to different tolerances for false positives by appropriate choice of thresholds and p when defining the BVNL. Obviously, it is desirable that the tool have 100% sensitivity, so that no child who needs treatment will fail to receive it, and 0% false positive rate (100% specificity), so that no child who will not get KD will be subjected to unnecessary treatment. While this goal is not theoretically possible to attain in the entire population, our (1–10−7)100% BVNL-based tool achieved it in the samples of 166 normal newborns and 18 early childhood KD cases in this study (see Table 3).

How might this tool be used in practice? Knowing that a child will develop KD in infancy or early childhood would result either in early treatment, or in extremely vigilant follow-up.5,6 At the current time, the decision for early treatment also considers other diagnostic information, such as genotype and the results of such ancillary tests as magnetic resonance imaging (MRI), spinal fluid protein, and evoked responses. But it is difficult to acquire this information in a timely fashion. Few of the children followed as part of the New York experience had the full recommended diagnostic panel.6 There is clearly a need for fewer and more accurate tests and for better prediction from newborn blood spot data.

If the diagnostic paradigm after a positive newborn screen for KD could be simpler and more accurate, it would be reasonable to expect less parental anxiety and fewer painful medical procedures for the infants.6,7,30 In addition, improved and earlier prediction of KD symptoms would result in substantial savings of health-care expenses. The cost of the New York State KD newborn screening program has been significant.30 Screening for all lysosomal disorders in the United States was estimated to cost $28–$140 million annually with a false positive rate of at least 0.1%.26 It would therefore be reasonable to expect that considerable savings would result from reducing the high false positive rate of Krabbe screening.

Can the BVNL tool challenge or augment current NBS KD paradigms? Postanalytical platforms for lysosomal NBS have been defined successfully.26,31 These involve an analysis of six to ten enzymes.26 Advantages of the BVNL tool include the absence of any requirement to share patient profiles and reference data26 and free availability to anyone with access to the R statistical program.32 Because there is considerable variability between state newborn programs,33,34 it is conceivable that some states would not be able to provide six or ten enzyme results for multiplex analysis. The BVNL tool would, in contrast, rely on rapid second-tier analysis of newborn blood spots for only two biomarkers, while allowing state screening programs more flexibility and control over Krabbe screening decisions.

It is apparent that the successful application of a statistical tool that relies on determining GalC enzyme activities, such as the BVNL analysis presented here, will over time require up-to-date standardization of loge(GalC) observations as noted in our previous study4 (Table 2). The experience of the New York State NBS program indicates that certain features of blood spot acquisition, such as age of the patient and birth weight, as well as changes in reagent lots, can cause variation in enzyme levels at NBS (unpublished data). The reasons for the temporal drift in the enzyme activity are unknown.

The New York State NBS program currently collects normative samples of GalC measures. In addition, while the current data suggest that infants who are destined to develop early childhood KD will have standardized values that fall in the high-risk region of the BVNL, no data exist yet about the course of these measures as the disease progresses.

The psychosine determinations used here for development of the BVNL tool did not reflect temporal variation. These were determined from normal newborn blood spots and performed at the Wadsworth Laboratory of the New York State Department of Health. However, it must be acknowledged that other laboratories have reported predictive PSY values that differ from those used here.4,5,26

Broader use of the tool would consequently require restandardization of PSY. The cost of PSY determination may also vary among states.

The oldest patient who developed symptoms with high risk determined from BVNL newborn blood spot analysis was 29 months old at symptom onset, but we do not know the late age limit for symptom development that can be predicted. The data do indicate that none of the asymptomatic high-risk patients from NY State had high-risk BVNL status. Presumably these high-risk patients are at risk for emergence of symptoms in later childhood or even adulthood.6,7 Could periodic reapplication of the BVNL tool at intervals as they get older predict symptoms in these possible late-onset cases? Years of follow-up on these children may be necessary to answer this question.

Successful application of the BVNL tool would presumably also facilitate development and testing of experimental therapies, for example, those combining transplantation with gene therapy.35

The results presented here support application of the BVNL analysis of PSY and GalC prospectively to second-tier analysis of blood spots after a positive KD screen in states that are routinely screening for this illness. If the potential of the tool to predict symptoms is confirmed by its prospective use in more infants, this would be a significant step forward in fulfilling goals that are presumably shared by all who care for children who either have KD or are at risk of developing it. These goals include accurately diagnosing KD, ensuring the prompt treatment of children who need it, and protecting those not at imminent risk from the cost, morbidity, and potential mortality of unnecessary treatment.5,6,7,30