Population-based newborn screening has much to offer in the prevention of disease. At a time when there is an ever-growing list of potentially treatable, perhaps even curable, disorders and promising new technologies that can detect them, there is a common perception that we can conquer all disease if only we would screen, and screen now. But we have much to learn: about disorders, about their associated disease, and about their fit within current and future newborn screening. Wasserstein et al.1 describe a well-conceived and well-implemented study that offers insights about newborn screening for five lysosomal storage disorders (LSDs). Two of the five LSDs are currently included on the federal Recommended Uniform Screening Panel (RUSP), with a suggestion from the Health and Human Services (HHS) advisory committee that makes recommendations for the RUSP that more data be collected; two have been rejected because of insufficient data; and one has not been considered by the committee. The Wasserstein paper1 brings forward two possible paradigm shifts for newborn screening: one is a way by which data generation and evaluation might be accomplished for new disorders under consideration for newborn screening, and the other is a recognition that any disorder under consideration needs more stringent case definitions or more specific assays. Both are of major importance.

The longstanding success of population-based newborn screening stems from a thoughtful combination of well-documented scientific and medical advances with a public health approach.2,3,4,5 Use of time-tested criteria6 for including particular disorders in newborn screening contributes to public trust in the state’s authority to require that parents have their newborns tested. Is there something so different about the disorders likely to come up for consideration in the future that we should change our approach? I think not, and the impetus to screen for LSDs provides a good example of why.

LSDs are rare disorders, and our worldwide knowledge about presymptomatic treatment is limited. Although all patients diagnosed with these disorders have an enzyme deficiency, enzyme activity is not necessarily strongly related to prognosis. Likewise, while certain genotypes may be associated with pathogenicity, most genotype–phenotype correlations are still weak. Still, we know that we can find newborns who have these disorders by combining enzyme assays with sequencing. What more is needed? Some might say that we have all the data we need: there is a reliable assay and screening algorithm to find newborns who have these LSDs, there is a promised benefit of early treatment, and de-identified studies give us sufficient information about the US distribution of genotypes among samples with low enzyme activity.7,8 Some might say that committee deliberations for Pompe disease and mucopolysaccharidosis type I (MPS I) are done, that the capability to screen for other LSDs has been demonstrated, and that we should move forward and not look back. Others might say that the Wasserstein study offers a new, high-quality data set on newborn screening for these disorders that was not previously available, and that a reconsideration of their appropriateness for population-based screening may be warranted.

The Wasserstein study offers a high-quality data set because it was carefully designed. Six New York City hospitals recruited families who consented to have their newborns screened for LSDs. Newborn screening specimens from consenting families were clearly marked and sent separately to the New York newborn screening laboratory to ensure appropriate processing. The high-throughput newborn screening laboratory processed all specimens for routine testing first and then tested only the consented specimens for LSDs.
Infants whose screen indicated a high risk of an LSD were referred to a study investigator, who performed clinical evaluation and confirmatory testing. The chain of custody was clear, the process was standardized, and immediate clinical referrals were followed through effectively. Most importantly (and unlike previous data from de-identified testing, which provided information on some genotype frequencies), this study has a long-term follow-up component; data from this study’s cohort will continue to accrue. A prospective pilot study that evaluates both short- and long-term clinical outcomes is necessary to learn whether and when the identified infants become symptomatic with the disorders for which they were screened.

There are other models that allow the medical community to obtain critical population-based prospective data. As Wasserstein et al.1 note, Massachusetts has been carrying out successful, consented statewide pilot studies for other disorders since the 1990s,9 which have facilitated several expansions of the state’s mandated newborn screening panel. However, experience shows that most state newborn screening programs have not adopted, or have been unable to adopt, the Massachusetts model. Both the Wasserstein and Massachusetts models incorporate short- and long-term follow-up of clinical outcomes, and both have advantages and challenges. Wasserstein’s limited-hospital model offers controlled recruitment, education, and specimen collection, whereas the Massachusetts model offers universal access and a sense of the feasibility of statewide implementation, from specimen collection through subspecialty referral. Both models recognize and respect the spirit of “Part A: Boundaries Between Practice & Research” of the Belmont Report,10 as well as other ethical guidelines. Both offer a way by which data generation and evaluation might be accomplished for new disorders under consideration for newborn screening. The Wasserstein model is an ethical alternative to a statewide pilot.

The Wasserstein data reveal that the predominant yield from screening for these five LSDs is a set of infants who appear to have disorders with phenotypes “that typically manifest during adulthood.” Given what we know about the phenotype distribution among clinically presenting LSDs, this observation is to be expected. All the infants identified by Wasserstein et al. meet the conventional case definition of the disorder for which they were screened. Whether any or all of them will develop disease is unknown.

The authors appropriately question the value of screening for disorders with late- or later-onset phenotypes when screening and presymptomatic diagnostic assays cannot easily differentiate early- from late-onset phenotypes, and when the documented benefit and risk of screening that identifies all phenotypes are limited or absent. They acknowledge that conventional newborn screening criteria and recommendations about childhood testing from the American Academy of Pediatrics (AAP) and the American College of Medical Genetics and Genomics (ACMG) generally advise against testing children for adult-onset disorders. The data prompting the authors’ questions, and the questions themselves, are extremely important and require further study. The system by which we evaluate clinical outcomes will be critical, challenging, and dependent on the accuracy of the language we use to build our knowledge base. Wasserstein and colleagues should be able to stratify the spectrum of LSD disease among the infants they identify through screening. Other LSD screening studies will contribute by following suit, and a better understanding of what comprises these disease states will grow. Regions that do not study LSD screening will contribute by carefully documenting natural histories for comparison. Without such stratification and comparison, we will never know whether good outcomes are attributable to early treatment or simply reflect the natural history of a late-onset disorder, nor will we ever learn of or appreciate the bad outcomes of early treatment in infants with late-onset phenotypes. The study of LSDs has implications for many disorders that will be considered for population-based newborn screening.
The issue at hand need not be early versus late; it could just as well be severe versus mild, primary versus secondary target, classic versus nonclassic, or type 1 versus type 4. If we cannot first characterize an infant’s disorder, how will we know the extent of benefit from newborn screening?

Studies that require collaboration between clinical and public health investigators will help us better understand what should be included in mandatory newborn screening panels and how these conditions should be described in our policies. Decisions about whether to screen should weigh the public trust needed for success in public health and the transparency that studies afford.

Research done with consent, careful documentation of outcomes so that the observations and conclusions of others can be reproduced or dismissed, critical thinking about the issues at hand… are these new paradigm shifts? Not really. What Wasserstein et al.1 have presented is science done well.