Introduction

In the twenty-first century, the debate about the relative utility of population-based versus targeted approaches to prevention, particularly with regard to the role of genomics in prevention, has come to center around the dual concepts of precision public health and precision medicine.1,2,3,4 Precision public health has been defined as “the application and combination of new and existing technologies, which more precisely describe and analyze individuals and their environment over the life course, in order to tailor preventive interventions for at-risk groups and improve the overall health of a population”5 and is a natural extension of the field of public health genomics,1 while precision medicine has been defined as an approach to interventions tailored to subcategories of disease, often defined by genomics.3 In both cases, genomics has come to be seen as an important, but insufficient component differentiating these precision approaches from classical medical and public health approaches to prevention.6

In this review, we present evidence from diverse disciplines and populations to identify the current and emerging role of genomics in prevention as a part of traditional and precision approaches to both medical care and public health, as well as key challenges and potential untoward consequences of increasing the role of genomics in prevention. We do not seek to adjudicate a “best” approach or otherwise resolve the underlying tensions between these approaches, but highlight them for further consideration in the readers’ own work. We acknowledge that genomics is merely one part of a precision approach to prevention, and refer readers to recent comprehensive reviews of the competing roles of genomics in precision medicine and precision public health, which are beyond the scope of this review.5,7,8,9 The underlying tension between the personalization that genomics promises (“the n-of-1”) and the population-based approach to prevention (“the n-of-many”) that underlies many public health success stories is not unique to genomics. In the nineteenth century, it was the germ theory and tuberculosis control that catalyzed this debate.10 In the twentieth century, Geoffrey Rose11,12 raised the specter of the “prevention paradox,” whereby interventions targeting high-risk individuals may have little population impact when most cases arise from low-risk individuals, and, conversely, programs of benefit to populations may accrue little benefit to specific individuals.
Today, in the twenty-first century, this debate centers on the role of genomics in improving population health.2,13,14,15,16 Indeed, part of the precision prevention response to the challenges laid forth by Rose has been the emergence of the so-called pseudo-high-risk prevention strategy attempting to leverage more and more data to spread the benefits of the high-risk strategy to more and more subpopulations of variable, but elevated, risk.17 That this debate is even imaginable in the context of genomics is a testament to relatively recent technological advancements in sequencing technology and intellectual property law,18 which have both led to the dramatic reduction in cost of genomic sequencing19 and the rapid advancement of our understanding of the genome and its contribution to human disease since the Human Genome Project.20,21,22,23,24,25,26,27
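
The arithmetic behind Rose's prevention paradox can be made concrete with a short worked example. The sketch below is illustrative only: the group sizes, baseline risks, and intervention effects are hypothetical assumptions chosen for clarity, not estimates drawn from the literature cited here.

```python
# Worked illustration of Rose's prevention paradox.
# Every number below is a hypothetical assumption chosen for illustration.

def expected_cases(size, risk):
    """Expected number of cases in a group of `size` people at absolute `risk`."""
    return size * risk

low_cases = expected_cases(95_000, 0.02)   # large group, low individual risk
high_cases = expected_cases(5_000, 0.10)   # small group, high individual risk
total_cases = low_cases + high_cases

# Fraction of all cases that arise among low-risk individuals.
share_from_low_risk = low_cases / total_cases

# High-risk strategy: assume risk is halved, but only in the high-risk group.
prevented_high_risk = 0.50 * high_cases
# Population strategy: assume a modest 15% risk reduction for everyone.
prevented_population = 0.15 * total_cases

print(f"Cases from low-risk group: {share_from_low_risk:.0%}")
print(f"Prevented, high-risk strategy: {prevented_high_risk:.0f}")
print(f"Prevented, population strategy: {prevented_population:.0f}")
```

Under these assumptions, roughly four-fifths of cases arise in the low-risk group, and the modest population-wide intervention prevents more cases (about 360) than the intensive high-risk intervention (about 250), even though it lowers any single low-risk person's absolute risk by only 0.3 percentage points.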

The positive view of genomic prevention is tempered by several challenges and potential untoward consequences of increasing the role of (and thus the allocation of scarce resources to) genomics in public health as well as a host of ethical, legal, and social challenges. As is common in pediatric settings, there is a paucity of data specific to children, diseases of childhood, and the diagnosis of adult-onset conditions in presymptomatic children and adolescents. Where this is the case and adult analogs are illustrative, we make reference to these. As three physicians practicing at a quaternary care referral center in the United States with significant delegated public health responsibilities, we take as our perspective the fragmented public health and medical system of the United States. While certain specific examples may not hold in all locales, the general themes illustrated should have wide applicability throughout the developed world.

Pediatric genomics is not just adult genomics done early

An oft-repeated phrase in pediatrics is that children are not little adults.28 The differences between children and adults also underlie differences in the applications of genomics to child health and prevention. Among these differences is the developmental nature of childhood, where disease phenotypes evolve against the context of normal (or abnormal) physiologic development. Children have more frequent and near-universal contact with the infrastructure necessary to support public health genomics in the form of newborn screening (NBS) and regular health maintenance visits for screenings and immunizations. Thus, while there are ample opportunities to engage pediatric populations in public health genomics, there will be significant challenges in interpretation, a key theme that will recur throughout this review. This challenge is greatest in settings where variants are rare and only asymptomatic or potentially presymptomatic individuals have been identified.29 Clarifying these issues will require population-scale research enterprises with technical, implementation, and ethical/legal/social implications aims to combine genomic data with environmental and social determinants of health. A key way in which precision public health activities can support the role of genomics in prevention is to “use [this] population level data to better identify how individuals can be aggregated into larger groups.”7 Current efforts to address these issues include the UK Biobank30 as well as the 21st Century Cures Act including the NIH-funded “All of Us” study.31 Unfortunately, the UK Biobank is recruiting middle-aged individuals and the All of Us study, while seeking to recruit a diverse cohort, is currently limiting recruitment to adults. While studies that recruit adults may help to identify long-term outcomes, their data on childhood outcomes of interest will be limited by their retrospective view of these outcomes. 
Furthermore, they will be biased by excluding children with life-limiting disorders, who do not reach sufficient age to be included in the so-called inclusive adult cohorts. The net effect of these is to limit the utility of these landmark studies for the improvement of pediatric preventative interventions.

From an ethical perspective, children are incapable of consent, but may, in certain contexts, provide assent.32 Instead, we rely on parents, with guidance from public health and medical authorities, to act in the best interests of their child. This requires having an understanding of the benefits as well as the risks of participation in genomics initiatives. The challenge for the parent of a healthy child is not merely the discovery of susceptibility to later onset disease (the closing of the so-called “open future”33) but also the knowledge that genetic testing today may expose their child to discrimination tomorrow. At the federal level in the United States, the Genetic Information Nondiscrimination Act currently protects individuals from discrimination in employment and health insurance (but not life or disability insurance) on the basis of genetic testing results.34 There is, however, no guarantee that these protections will be in effect in two decades when a child born today will enter the workforce. Integrating genomics into pediatric prevention will require parents to confront this possibility in a way that current efforts do not.

The paradigm of NBS

NBS began with Guthrie cards for phenylketonuria.35 The success of this program in saving lives and preventing or mitigating severe disability led to the expansion of screening to a handful of other treatable conditions.36,37 In the 1990s, the availability of tandem mass spectrometry allowed states to add dozens of conditions to their screening programs in a cost-effective manner.37,38 In response to this changing technological landscape, the Maternal and Child Health Bureau commissioned, and the American College of Medical Genetics (ACMG) authored, guidelines to establish a Recommended Uniform Screening Panel, or RUSP. This initially included 29 primary conditions and a further 25 secondary conditions,39,40 and today has grown to include 35 primary conditions and 26 secondary conditions, including organic acid, fatty acid oxidation, amino acid, endocrine, hemoglobin, and other disorders, most of which have a genetic basis. Despite the significant increase in the number of conditions screened for, tandem mass spectrometry has allowed for cost containment with estimated screen-specific costs (i.e., excluding follow-up and treatment costs) on the order of US$50–100 per child born.41

In recent years, investigators have asked whether or not DNA can augment/improve NBS and some states have used DNA in various ways to improve their screening programs.42,43,44 However, these genotyping assays will inevitably miss rare or population-specific variants that may not be well represented. An alternative approach is to use sequencing. The state of California successfully demonstrated the use of sequencing as a third-tier screen for cystic fibrosis (after immunoreactive trypsinogen and a specific variant panel) to reduce the number of referrals by 2/3 compared to two-tier screening.45 The challenges of offering this at the level of the state newborn screen include maintaining the high throughput of the lab if too many novel variants or variants of uncertain significance (VUS) are discovered and educating primary providers about sequencing results they will now be seeing in the context of NBS. An ancillary benefit is that when the state lab provides molecular confirmation, it guarantees equitable access to a confirmed diagnosis, an outcome of particular value at a time when there are significant disparities in access to genomic medicine.46

As sequencing approaches to specific conditions become available, it is logical to ask if next-generation sequencing (NGS) approaches can be used to allow these to scale beyond one or two disorders, much the way tandem mass spectrometry allowed biochemical screening to scale. NBS programs are currently developing and/or evaluating sequencing panels targeting genes of interest for NBS programs.47,48,49 A limitation of such targeted panels is that they need to be redesigned and revalidated every time a new gene is added to them. This limitation also raises the cost of retrospective analyses in either the research or validation contexts where one might wish to go back to old cases and determine if variants in a given gene not on the then contemporaneous panel were present. In this context, the question has arisen as to whether or not exome sequencing (ES) or genome sequencing (GS) can fulfill the primary service mandates of NBS programs by (a) safely reducing the number of referred cases,50,51 (b) expanding the range of conditions amenable to screening to include those with a Mendelian basis, but not well detected by other analytes or enzymes assayed by current NBS methods,52,53 and (c) simultaneously providing data on off-target genes to facilitate the future expansion of this valuable public health program as new gene–disease associations become known.54 Data from recent studies looking at the role of NGS and NBS suggest that current exome-based methods are likely to result in both false positives and false negatives with turnaround times far in excess of biochemical screening.48,55,56 Additionally, ES/GS will inevitably result in larger numbers of VUS than targeted NGS panels.56 If reported, these would quickly overwhelm the screening system.54,57 If suppressed, there will certainly be some missed cases. 
Data from symptomatic genetic testing suggest that these burdens will fall hardest on minority populations already subject to health disparities58 but also support the importance of recognizing the impact of population-specific variants.59 While this has not been assessed, the available data suggest that a more immediate role for ES/GS in NBS is thus to supplement abnormal or equivocal biochemical screening in order to more efficiently reduce the time it takes to close a case as opposed to supplanting biochemical screening altogether.48

ES/GS in symptomatic patients

ES/GS has more commonly been used in symptomatic children, either as a first-line diagnostic tool60,61,62,63 or to help end the so-called diagnostic odyssey.64,65 The hope is that some of the detected disorders have treatments or a sufficiently known natural history to allow more traditional screenings, which may prevent later morbidities.66,67,68,69,70,71 Additionally, it has become clear that early diagnosis allows recognition of futile interventions and prevention of iatrogenic harms.72 However, because the primary indications to date for ES/GS in symptomatic children have been in rare diseases, the scale of benefit can be difficult to appreciate. While it is true that rare diseases, each with a prevalence of <200,000 persons in the United States (~1 in 1650 Americans), are individually rare (many with incidences of 1 in 10,000 live births or less), they collectively affect millions.73 The diagnostic yield in ambulatory pediatric populations varied from 25 to 45% in a recent review74 with diagnostic yield in critical care settings at the high end of that range.72 Despite the high costs of ES/GS, the benefits of the avoided morbidities and unnecessary or harmful interventions can be massive (on the order of hundreds of thousands or millions of dollars), permitting ES/GS to be cost effective overall.72,75

When ES/GS is done in symptomatic individuals, it is possible to detect variants in genes unrelated to the primary purpose of testing. These findings have been designated secondary findings by the ACMG.76,77 In particular, current guidelines for the reporting of secondary findings emphasize actionability with a goal of preventing morbidity and mortality.76,77,78,79 The current list of ACMG secondary findings emphasizes cardiovascular and oncologic disease. The reader will note that many secondary findings are for adult-onset disorders, and the testing for and disclosure of these findings in children have provoked controversy78,80,81 because they may preclude an open future.33 However, this criticism is not universal and has been challenged of late.82

ES/GS for presymptomatic detection of “adult”-onset Mendelian disorders

Given that secondary findings are deemed important enough to report regardless of the reason for testing, it becomes natural to ask whether and in what circumstances to test for such conditions in an asymptomatic population.83 The most robust data here come from the adult cancer literature. This is not surprising because cancer is relatively common and guidelines from the National Comprehensive Cancer Network (NCCN) drive care.84 However, recent data support the finding that many individuals with pathogenic variants in genes predisposing to serious adverse health outcomes would not meet the current NCCN criteria for testing; indeed, 1–10% of unselected adult populations have an actionable, presymptomatic finding on genomic sequencing63,85,86 and a similar percentage of pediatric cancers are associated with cancer predisposition syndromes.87 If genomic medicine is deployed at scale in pediatric populations to accrue the potential benefits described in this review, these “adult-onset” conditions are potentially discoverable. A key limitation is that the ACMG secondary findings list was never designed with population screening in mind88 as the penetrance of many disorders in unselected populations is not known.89 Understanding the population health implications of a potential scaling of these findings from individuals to populations represents a key precision public health endeavor.5

These debates are not just about prevention in adults. Several of the disorders on the secondary findings list have screening guidelines based on taking the earliest age of onset in the family and starting several years earlier. This can easily fall into the pediatric age range. Indeed, it is increasingly recognized that the childhood/adult-onset dichotomy is simplistic and more nuanced age-based triage may be appropriate.90 Furthermore, identification of serious predisposition syndromes in the child who is far from the age of earliest screening or intervention informs other family members (i.e., parents) of their potential risk. This allows those individuals to pursue their own genetic testing and begin appropriate intervention or screening to reduce morbidity or mortality. This is an example of cascade testing. If a relative is found to have the same variant, that person’s first- and/or second-degree relatives are offered specific testing. In this way, the benefits of presymptomatic genetic diagnosis cascade across the population91 even to individuals who do not present for population-based screening. Cascade testing has been demonstrated to be cost effective in multiple conditions.92,93 Importantly, the pediatric patient whose genomic testing started this cascade of testing may benefit from the improved health of his or her parents by avoiding loss of that parent to early mortality,82 a recognized adverse childhood experience.94
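
The cascade described above can be sketched as a breadth-first traversal of a pedigree. The example below is a minimal sketch under stated assumptions, not a description of any program's actual workflow: the pedigree, the names, and the set of carriers are hypothetical; testing is offered only to first-degree relatives (the text also allows second-degree relatives); and everyone offered testing is assumed to accept.

```python
from collections import deque

# Hypothetical pedigree: each person maps to their first-degree relatives.
FIRST_DEGREE = {
    "child": ["mother", "father", "sibling"],
    "mother": ["child", "sibling", "maternal_aunt", "maternal_grandfather"],
    "father": ["child", "sibling", "paternal_uncle"],
    "maternal_grandfather": ["mother", "maternal_aunt", "great_uncle"],
}

# Hypothetical ground truth: who actually carries the familial variant.
CARRIERS = {"child", "mother", "maternal_grandfather"}

def cascade(proband, relatives, carriers):
    """Offer testing outward from the proband, expanding only from positives.

    Returns (offered, positive): everyone offered testing, in order, and
    everyone found to carry the variant (starting with the proband).
    """
    offered, positive = [], [proband]
    seen = {proband}
    queue = deque([proband])
    while queue:
        person = queue.popleft()
        for relative in relatives.get(person, []):
            if relative in seen:
                continue
            seen.add(relative)
            offered.append(relative)
            if relative in carriers:      # a positive result extends the cascade
                positive.append(relative)
                queue.append(relative)
    return offered, positive

offered, positive = cascade("child", FIRST_DEGREE, CARRIERS)
print("Offered testing:", offered)
print("Found positive:", positive)
```

In this hypothetical family, the father tests negative, so the cascade never reaches the paternal uncle, whereas the mother's positive result carries testing on to the maternal grandfather and, through him, to the great-uncle.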

Realization of the benefits of cascade testing secondary to the testing of pediatric probands in a socially acceptable manner faces certain challenges. First among these is the need to contact family members. Analysis of the various ethical arguments comparing proband-initiated and direct contact of at-risk family members suggests that direct contact by healthcare providers and/or public health programs is ethically permissible in specific, limited circumstances.95 Cascade testing requires access to contact information on at-risk family members, which, in the United States, is generally not available unless provided by the proband. This preserves the proband’s right to privacy, but also limits the reach of cascade testing as some probands will decline to share the results of genomic testing. In contrast to adult probands, the pediatric proband has the built-in disclosure of the proband’s results to at least some at-risk family members: the parents who provided consent for the initial testing on behalf of the minor proband. Parents in these cases act on behalf of themselves and all their other minor children at risk. It is worth noting that, in the case of adult probands or more distant relatives of minor probands, the threshold for an ethical breach of confidentiality is high,95 and that, in the United States, there is no legal “duty to warn” an at-risk family member regarding a genetic diagnosis. Whether this legal doctrine prevails over time in the United States or other jurisdictions is unclear.96

All debate on the nature and efficacy of contact in cascade testing, however, is predicated on accurate assessment of actionability of the identified variant. This requirement is challenged in the case of the asymptomatic child with an asymptomatic parent who share a novel variant in an otherwise actionable gene. When the natural history of the novel variant is unknown, the benefits of cascade testing are essentially unquantifiable and the potential for iatrogenesis rises. Again, resolution of these cases benefits from large, population-based approaches, which can leverage diverse, “big data” sources on genomic, environmental, and social determinants of health.5

Pharmacogenetics and pharmacogenomics

In addition to using genomic data to inform more classical health prevention efforts, the related fields of pharmacogenetics and pharmacogenomics hold promise in the prevention of iatrogenic harms.97 The goal of this branch of genomic medicine is to use genetic and genomic information to predict drug response and guide drug selection (or avoidance). These goals are supported by the Clinical Pharmacogenetics Implementation Consortium (CPIC).98,99,100 In addition to standardized nomenclature,100 CPIC promulgates guidelines for a number of gene–drug pairs, primarily drugs used in adult populations. While there are conflicting data on the utility of this testing to guide warfarin dosing101 and evidence for a variety of other agents is mixed,102,103,104 systematic review suggests that, at least with regard to cost-effectiveness, much of the conflicting evidence is due to testing costs and availability of results at the time of prescribing. These results do not reflect the rapidly falling price of testing or the potential to obtain pharmacogenetic and pharmacogenomic results in the context of ES/GS obtained for other reasons (e.g., reanalyzing data captured during NBS or as part of a diagnostic work-up), thus having them available at the time of prescribing.103 In general, the data for pediatric pharmacogenomics are limited compared to adult populations.105 In contrast to broadly prescribed drugs like anticoagulants in adults, the best available data in pediatrics are in transplant106,107 and oncology.108,109 Interestingly, because the pediatric agents of interest from a pharmacogenetic and pharmacogenomic perspective are primarily prescribed by sub-specialists at tertiary or quaternary care centers, pediatric pharmacogenomics has the opportunity for better electronic health record integration than adult pharmacogenomics.110 This nexus suggests that, in the United States, pediatric pharmacogenetics and pharmacogenomics may overcome concerns about the challenges of re-consent111 when repurposing genomic information in this way by having this (re)consent obtained in the context of a broader clinician–patient–family relationship.

Polygenic risk scores

We have thus far been concerned with highly penetrant, Mendelian disorders, but these collectively affect perhaps 10% of all people.55 Far more prevalent are the common, complex diseases such as diabetes. Historically these were studied using genome-wide association studies (GWAS).112 However, despite some early success finding common alleles that confer significant risk of disease,113 subsequent efforts, although numerous, have yielded disappointing results.114 Because of the large number of GWAS done over the years, it has recently been recognized that rather than looking for single risk alleles for common disease, there is now sufficient power to test the classical theory of polygenic risk.115 The first such test, against a phenotype of schizophrenia, was successful in 2009116 and has subsequently been followed by hundreds of other such analyses.117 While it is often the case that such polygenic risk scores demonstrate superior test characteristics to other classifiers based on genomic data, their actual clinical utility in driving preventative care above and beyond that informed by good primary care and knowledge of family history is unclear.118 Key barriers include increased uncertainty when moving from population estimates to individual patient risk assessment and the importance of ancestry or genomic background in applying polygenic risk scores developed in one population to a member of another population.119 Furthermore, because the number of subjects needed to develop such models is large and the availability of pediatric subjects is limited relative to adult subjects and adult diseases, the number of robust polygenic risk scores for pediatric disease is likely to remain low for some time and/or focus on adult-onset conditions with childhood risk factors.120
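
In its simplest and most widely used form, a polygenic risk score is a weighted sum of an individual's risk-allele counts, with weights taken from GWAS effect sizes. The sketch below illustrates that calculation and the ancestry barrier noted above. It is a minimal illustration, not a validated score: the variant names, weights, and reference distributions are hypothetical, and treating a missing genotype as zero risk alleles is a simplifying assumption.

```python
# Hypothetical per-variant effect sizes (for example, log odds ratios from a GWAS).
WEIGHTS = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}

def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per variant).

    Missing genotypes are counted as 0 copies, a simplifying assumption.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())

def standardize(score, ref_mean, ref_sd):
    """Express a raw score in standard deviations from a reference population mean."""
    return (score - ref_mean) / ref_sd

# Hypothetical individual: two copies of variant_a, one of variant_b, none of variant_c.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
raw = polygenic_risk_score(person, WEIGHTS)

# The same raw score read against two hypothetical reference populations.
z_population_1 = standardize(raw, ref_mean=0.10, ref_sd=0.09)
z_population_2 = standardize(raw, ref_mean=0.19, ref_sd=0.09)

print(f"raw score: {raw:.2f}")
print(f"z vs. population 1: {z_population_1:.1f}")
print(f"z vs. population 2: {z_population_2:.1f}")
```

The identical raw score sits about one standard deviation above the mean of the first hypothetical reference population and at the mean of the second, which is one way to see why a score developed in one population cannot simply be transferred to a member of another.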

Ongoing controversies, threats, and future directions

In describing the various ways in which genomics can contribute to prevention in the twenty-first century, we have not addressed several important concerns, which may affect the realization of this vision for the future of prevention. First and foremost are concerns about privacy, both for the tested individual and that individual’s relatives. Whether these concerns are well founded or not, they can have an enormous impact on faith in preventative health systems and/or uptake of services. Recent examples include objections to prolonged retention of dried blood spot cards in NBS121 as well as the use of genealogic DNA databases to catch criminals.122 This distrust is particularly problematic because much of the promise of genomics as a tool for improving population health by improving prevention is predicated on having data on a sufficiently diverse sampling of the population that interpretation remains a tractable problem as uptake of genomic services increases. Unfortunately, for too long, what we have considered a “normal” or “reference” genome has obscured the spectrum of normal genotypic variation in diverse and under-represented populations.123,124 This has already led to appreciable harm to minority populations.58

Whereas problems of lack of diversity in medicine and biomedical research long predate the new approaches to prevention discussed in this review, the related problems of variant reinterpretation and the extent of the duty to re-contact have several facets that are unique to genomics. Genomic variants are classified by laboratories according to the criteria from ACMG as benign or likely benign, uncertain significance, likely pathogenic and pathogenic.125 Over time, as knowledge accrues, these interpretations may change.126,127,128,129 Additionally, with ES/GS, new phenotypic features may suggest a need for reanalysis focused on new phenotypes or the discovery of new gene–disease pairs may necessitate reanalysis focused on old sequencing data.130,131,132,133,134 However, if this is the case, healthcare systems need to thoughtfully consider their policies for re-contact, both from a laboratory capacity perspective and from a clinician and patient expectations perspective.128,129,133,134,135,136 The ACMG has published points to consider to help guide this discussion.137
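
The reinterpretation problem can be framed as a comparison between the classification a laboratory originally reported and the classification it would assign today. The sketch below is a hypothetical illustration: the variant identifiers and data are invented, and flagging a change as relevant to re-contact when it moves a variant into or out of the likely pathogenic/pathogenic tiers is an assumed policy for the example, not one taken from the ACMG points to consider.

```python
# The five classification tiers named in the text, ordered benign to pathogenic.
ACMG_TIERS = (
    "benign",
    "likely benign",
    "uncertain significance",
    "likely pathogenic",
    "pathogenic",
)
# Assumed policy for this sketch: these tiers are treated as clinically actionable.
ACTIONABLE = {"likely pathogenic", "pathogenic"}

def find_reclassified(reported, current):
    """Return (variant, old, new, crosses_actionable) for each changed variant.

    `reported` maps variant -> tier at the time of the original report;
    `current` maps variant -> tier today (variants absent here are unchanged).
    """
    changes = []
    for variant, old in reported.items():
        new = current.get(variant, old)
        if old not in ACMG_TIERS or new not in ACMG_TIERS:
            raise ValueError(f"unknown classification for {variant}")
        if new != old:
            crosses = (old in ACTIONABLE) != (new in ACTIONABLE)
            changes.append((variant, old, new, crosses))
    return changes

# Hypothetical stored report and hypothetical current interpretations.
reported = {
    "GENE1:variant_1": "uncertain significance",
    "GENE2:variant_2": "likely pathogenic",
    "GENE3:variant_3": "benign",
}
current = {
    "GENE1:variant_1": "likely pathogenic",
    "GENE2:variant_2": "pathogenic",
}

for variant, old, new, crosses in find_reclassified(reported, current):
    flag = "consider re-contact" if crosses else "no change in actionability"
    print(f"{variant}: {old} -> {new} ({flag})")
```

Even this toy version makes the capacity question visible: the comparison must be rerun for every stored variant each time interpretations are updated, and each flagged change implies clinician and patient contact.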

Data sharing has been crucial to efforts at harmonizing variant interpretation and supporting laboratory efforts to remain abreast of variant reinterpretations. Key resources supporting these efforts have been ClinVar,21,22,23,24,25 a public database of variants with their interpretations, and ClinGen,26,27,138,139 an expert curation of variant knowledge. Additionally, given the importance of having diverse samples to understand the normal distribution of alleles across humanity, data from resources such as ExAC140 and gnomAD141 have aided tremendously in the recognition of genomic variants that are common in various populations. Despite the help of these tools in fulfilling the promise of genomics, much of the work of applying specific genomic findings to patient care falls upon the limited number of genetics specialists, while most preventative care is coordinated by primary care providers (PCPs) who, while interested in the role of genomics in prevention, are also underprepared for this paradigm shift.142,143,144,145 Given the vastness of the genome, the limitations of the genetics workforce146 and the limited time available for primary care appointments,147 it is not realistic for a PCP, the most likely member of the medical team to direct preventative care, to manually check whether each of a patient’s genomic variants related to preventative care has been reclassified. Clinicians and public health workers will need access to accurate, up-to-date interpretations at the point of care if any of the roles for genomics in prevention in the twenty-first century is to come to pass. All of this suggests that as patients move between health systems, their genomic information needs to follow them. The difficulty is that there are no good standards for representing these data in medical records and thus transmission of these data cannot piggyback on existing standards for the secure transmission of medical data. 
Recent efforts of the eMERGE consortium have begun to address these issues;148 however, none of the developed solutions has been demonstrated to handle genome-scale data across a health system while also promoting portability between systems.149

While next-generation sequencing approaches can generate massive amounts of data, particularly with regard to single-nucleotide variation, small insertions and deletions, and, increasingly, copy number variation150 and mobile element insertions (long interspersed nuclear element and short interspersed nuclear element),151 there are a variety of other forms of genomic variation that are not well characterized by current short read methods, including low-complexity repetitive regions (e.g., GC-rich regions), long-range phasing of variants, and complex structural variation, as well as epigenetic modifications. Emerging long read and single-molecule technologies may address these concerns, but will bring their own challenges of interpretation.152 However, even as we increasingly generate data on a wider array of genomic variation, interpretation (a necessary prerequisite for application to prevention) will continue to be a challenge, particularly outside coding regions.153

Conclusions

In this review, we have enumerated several means by which genomics may contribute to prevention of morbidity and mortality in the twenty-first century in both medical and public health contexts. Each of these methods faces barriers to fulfilling its promise. Importantly, the vast majority of these barriers cannot be addressed simply by making genomic sequencing faster or cheaper. Rather, they represent important choices societies need to make about the provision of public health services as well as clinical and public health informatics challenges. It remains unclear, nearly 20 years into the twenty-first century, whether the allocation of scarce resources to the integration of genomics into prevention will represent a diversion of resources from successful, broad, population-based preventative measures or, more optimistically, a renaissance in prevention and precision public health.