Introduction

Mapping the relationship between human genetic and phenotypic variation is a challenging task for human biology. When such variation is so extreme as to disrupt normal functioning, it becomes a major challenge for medicine as well. In an effort to address this, and as part of its mission to advance mental health research, the United States’ National Institute of Mental Health (NIMH) supports a broad research program in human genetics. The overarching goal is to uncover psychiatric disease mechanisms grounded in human biology.

It is clear that for psychiatric disorders, and other complex disorders, there is no simple unitary mapping between a gene and disease. For any given individual, genetic risk is a unique combination of many variants, from common to rare, that are often shared across psychiatric diagnoses. Even in the case of rare disorders and highly penetrant mutations, the cumulative genetic inheritance of the individual has a strong influence on the type and severity of the clinical presentation [1, 2]. Thus, when basic researchers ask, as they often do, “Which gene should I study for [insert psychiatric disorder],” the answer is not as straightforward as the question implies, because the question belies the underlying complexity. Indeed, the study of single genes, in isolation, may be inadequate for understanding the pathogenesis of common psychiatric disorders and should be supplanted by approaches addressing the complex interplay of genomic and other risk factors in shaping phenotypic outcome [3].

In an effort to enhance its psychiatric genetics research program and prioritize follow-up studies, the NIMH convened the Genomics Workgroup of the National Advisory Mental Health Council (NAMHC), composed of geneticists and neuroscientists. Their recommendations emphasize the primacy of rigorous statistical support from properly designed, well-powered studies for pursuing genetic variants reliably associated with disease [4]. In light of these recommendations, here we provide broad guiding principles for investigators to consider when conducting studies motivated in whole or in part by an association between human DNA sequence variation and a psychiatric disorder or related trait. When evaluating proposed studies, we weigh these points in the context of reviewer comments, the existing literature, and current investments in related projects. Following the NAMHC report, the statistical strength and robustness of the underlying genetic discovery weigh heavily in our funding considerations, as does the suitability of the proposed experimental approach. Importantly, discovery in human genetics is proceeding rapidly, and genetic risk factors identified through various rigorous study designs and analytic methods across multiple cohorts and studies are more likely to stand the test of time and warrant deeper investment.

From genetic architecture

A critical lesson from the past decade of human genetics research is that regardless of whether variants are common or rare, occur in coding or noncoding regions, or alter single base pairs or large segments of the genome, strict statistical methodology is required to identify robust associations with a disease or trait. Deciding which variants to pursue for biological follow-up thus requires understanding what the initial genetic discovery does and does not tell us about the relationship to disease.

A now-common approach for genetic discovery is the genome-wide association (GWA) study. These studies identify statistical relationships between common single nucleotide variants across the entire genome and a phenotype of interest. Critically, they implicate regions of the genome (loci) and do not necessarily pinpoint the causal variant(s), gene(s), or mechanism(s) underlying the association. When assessing GWA findings from the literature, caution is warranted because published studies may simply list genes within risk regions, and this may give the false impression that these are the relevant disease genes. For complex traits, hundreds of regions and potentially thousands of genes may be involved, and identifying them requires tens to hundreds of thousands of subjects [5,6,7,8,9]. Approaches that take into account the cumulative effects of these numerous and subtle changes in gene function may thus be more informative with regard to disease mechanisms than those focusing on individual, associated variants or genes [10].
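One common way to summarize the cumulative effect of many small-effect variants is an additive polygenic score. The following is a minimal sketch of that idea, not a method prescribed here: the function name, the variant identifiers, and the effect sizes are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float],
                    effect_sizes: Dict[str, float]) -> float:
    """Additive score: sum of (risk-allele dosage x per-allele effect size).

    Only variants present in both dictionaries contribute.
    """
    shared = sorted(dosages.keys() & effect_sizes.keys())
    return sum(dosages[v] * effect_sizes[v] for v in shared)


# Hypothetical per-allele effect sizes (e.g., log odds ratios from a GWA study).
effects = {"variant_a": 0.05, "variant_b": -0.03, "variant_c": 0.02}

# Hypothetical individual: number of risk alleles carried (0, 1, or 2).
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_score(person, effects)
print(f"polygenic score: {score:.3f}")
```

No single term dominates the sum; the score is informative only in aggregate, which is why such approaches do not depend on resolving any one locus to a causal gene.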

In another approach, rare variant association studies identify genes across the genome that harbor more mutations in a group of subjects (e.g., cases with autism spectrum disorder, ASD) than in a well-matched comparison group (e.g., siblings of ASD cases) or than expected from background mutation rates [11,12,13]. Due to limited power, sequencing studies of rare protein-coding (“exome”) variants often implicate genes and specific mutation categories (e.g., loss-of-function or missense mutations), but seldom individual variants. Given that each of us carries hundreds of protein-altering mutations, some of which are damaging, rigorous statistical methods are required to guard against biologically interesting yet irrelevant associations [14]. Rare variant discovery methods differ in their assumptions, power, error rates, and how they leverage genetic and nongenetic information to identify genome-wide statistical associations [15,16,17,18]. With any approach, a highly rigorous statistical framework is essential [19]. In some cases, more liberal statistical thresholds may be appropriate to identify larger sets of genes to be interrogated collectively, either bioinformatically or experimentally, where the biological signal prevails over the noise. With limited resources, however, a conservative approach relies on the strictest statistical criteria to select genes of high confidence for deep, targeted experimental follow-up (see Supplemental information).
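To make the comparison against background mutation rates concrete, the sketch below tests whether the number of de novo mutations observed in one gene exceeds the number expected by chance, using a one-sided Poisson test. It is an illustrative simplification, not one of the cited methods: the cohort size, the per-gene mutation rate, and the Bonferroni correction over roughly 20,000 protein-coding genes are assumptions chosen for the example.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    The tail is summed directly (rather than as 1 - CDF) so that very small
    p-values are not lost to floating-point cancellation.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    # First term P(X = observed), computed in log space for stability.
    term = math.exp(-expected + observed * math.log(expected)
                    - math.lgamma(observed + 1))
    total = 0.0
    k = observed
    while term > total * 1e-17:  # stop once further terms are negligible
        total += term
        k += 1
        term *= expected / k
    return min(total, 1.0)


# Hypothetical inputs (assumptions for illustration only).
n_trios = 5000            # sequenced parent-child trios
per_gene_rate = 2e-6      # loss-of-function de novo rate per gene copy
n_genes_tested = 20000    # approximate number of genes tested exome-wide

# Each child carries two copies of the gene.
expected_count = 2 * n_trios * per_gene_rate
observed_count = 5

p_value = poisson_upper_tail(observed_count, expected_count)
threshold = 0.05 / n_genes_tested  # Bonferroni-corrected significance level
print(f"p = {p_value:.2e}, threshold = {threshold:.1e}, "
      f"significant = {p_value < threshold}")
```

A more liberal threshold would admit more genes for collective interrogation, at the cost of more false positives among them, which is the trade-off described above.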

In some cases, symptoms of rare diseases and syndromes may overlap with those of more common and etiologically complex disorders. For example, individuals with Fragile X or velocardiofacial syndromes often have features of autism or schizophrenia, respectively. Although these syndromes have well-defined genetic causes, the manifestation of different symptoms is likely influenced by other genetic factors. Thus, mutations in FMR1 and copy number variants (CNVs) of 22q11 that cause these syndromes should not be considered synonymous with autism or schizophrenia. In fact, the more common neurological feature for both these syndromes is intellectual disability [20, 21]. Even in the case of fairly penetrant variants such as CNVs, unbiased genetic associations are required to establish a definitive link between these variants and a psychiatric diagnosis [22]. This holds true for individual genes within large, multigenic CNVs. Currently, there is no robust statistical genetic evidence that these CNVs can be resolved to individual risk genes [17], nor are there consensus functional experiments for determining which gene or genes within or near an associated CNV are driving disease risk.

Overall, there are many factors to consider when assessing the strength of evidence for a genetic association, and this assessment may be difficult without familiarity with statistical or quantitative genetics (see Supplemental information for an FAQ). We recognize that there are still only a handful of causal variants and genes with a high level of statistical confidence across psychiatric disorders, but we expect this number to grow with increases in sample size, better functional annotation, and improved methods for causal inference [23]. We provide some suggestions (Box 1a) for prioritizing where to focus experimental efforts to gain relevant biological traction.

To biology

Once a reliable association is identified, the crucial and more difficult step is transforming that genetic discovery into biological insight. Although true genetic associations must, by necessity, signal a causal effect, that effect may be indirect and far removed mechanistically from the trait of interest [24]. This has important implications for the selection, design, and interpretation of experimental paradigms. Psychiatric disorders lack specific markers, and genetic variants often influence many traits (i.e., pleiotropy) [25]. This reality limits the ability to draw causal inferences about disease mechanisms. It is thus critical to clearly define the research question and carefully select the most appropriate experimental system to address that question (Box 1b). To illustrate these points, we provide and evaluate five examples of hypothetical mental health studies that aim to follow up genetic findings biologically.

(1) Embracing pleiotropy and polygenicity

Analyses of well-powered GWA and exome studies show that implicated genes as a whole are preferentially expressed in region X of the human brain relative to other regions. Region X consists of a mixed population of cells that project to different downstream brain regions. Our preliminary data indicate that well-powered GWA and sequencing studies from related diseases also show an enrichment, albeit a weaker one, in region X. Our data further indicate an enrichment of orthologous genes in brain region X of species Y. Here we use our method of isolating cells based on their projection targets in species Y to better resolve the genetic enrichment observed across these diseases into projection-specific cell types. We will further develop transcriptional signatures of these cells and validate our findings in human brain. This study will increase our understanding of how genetic risk across related diseases maps onto specific cells and circuits in the brain.

This proposed study does not focus on any one gene or gene variant, but rather follows up on well-powered analyses that take into account many potential genes associated with disease. Furthermore, given the known overlap in the causes and symptoms of psychiatric disorders, it does not focus on any one disease. It takes a finding based on human genetics and human tissue, identifies a similar biological signal in another species, then leverages that experimental system to further refine that signal and validate it back in humans. Using genetic information in this manner may reveal common biological mechanisms that would otherwise be missed by single gene and single disease studies.

(2) Getting back to basics

Gene X has been fine-mapped as a likely causal gene within a genome-wide significant disease risk locus and replicated across multiple cohorts. (Alternatively, gene X has a genome-wide significant increased mutation burden in cases versus controls across multiple cohorts). Here we propose to further understand the function of gene X in the brain. Our preliminary data show that in both an experimental organism and humans gene X is most highly expressed in a specific cell type known to be important for a particular process. We will transcriptionally profile this cell type across development in the organism to identify the time course of gene X expression and determine if it coincides with the maturation of the relevant cellular process. We will also transcriptionally profile wildtype and conditional, cell-type selective gene X knockout cells, assess the impact on the cellular process and identify potential molecular mechanisms via rescue of differentially expressed targets. This study will advance our understanding of gene X biology.

Although this proposed study is based on a statistically robust genetic finding, it is intended as a basic science study into the biology of gene X. The strength of the study depends on how much and how well it will advance our understanding of gene X and/or fundamental principles of biology. These findings may or may not be relevant for understanding how human genetic variation in gene X contributes to variation in disease risk, but they will contribute basic knowledge whose ultimate impact will be determined in the future.

(3) Updating poor priors

A recent genome-wide association study identified 12p34 as a genome-wide significant disease risk locus. Interestingly, one of the three genes within the risk locus is gene X, which is part of a pathway long hypothesized to be involved in disease pathogenesis. How gene X is involved in this pathway is still not fully understood. Our preliminary data show that manipulating expression of gene X produces changes in the pathway similar to those observed in patients. Here we will use a series of loss- and gain-of-function experiments to characterize how the function of gene X influences the function of the pathway. This study will advance our understanding of gene X and how it contributes to disease.

This proposed study is based on the location of gene X within a genome-wide significant risk region. There is, however, no evidence that the association with disease is operating via that gene as opposed to the other two genes within the region or possibly other more distant genes. Such evidence would require a statistical and/or functional fine-mapping procedure [26, 27]. The information that gene X is part of a disease-relevant pathway may appear to provide a strong prior for focusing on that particular gene. Yet, in the absence of a formal unbiased assessment of how many pathways have been associated with disease, how many genes are within each of those pathways, how these genes are distributed across the genome, or how many risk regions and genes overlap with those regions, it is not possible to determine whether this is a chance occurrence or not. There may be other lines of evidence that indicate gene X and its pathway are important to study in terms of fundamental biology, but in this example the evidence of disease relevance is circumstantial and may in fact detract from the potential basic science value of the study.
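A rough calculation shows why such an overlap can easily arise by chance. The sketch below computes the hypergeometric probability that at least one gene from a set of hypothesized disease pathways falls within the associated risk regions. All counts are hypothetical, and the model assumes every gene is equally likely to lie in a risk region, ignoring gene length, clustering, and linkage structure, so it illustrates the reasoning and is not a substitute for a formal assessment.

```python
from math import comb


def prob_at_least_one_overlap(n_genes: int, n_pathway: int,
                              n_in_loci: int) -> float:
    """P(at least one pathway gene lies in a risk locus) under random placement.

    n_genes   : total genes considered
    n_pathway : genes belonging to previously hypothesized pathways
    n_in_loci : genes located within genome-wide significant risk regions
    """
    if not (0 <= n_pathway <= n_genes and 0 <= n_in_loci <= n_genes):
        raise ValueError("counts must lie between 0 and n_genes")
    # Hypergeometric probability of zero overlap: every gene in the risk
    # loci is drawn from the non-pathway genes.
    p_none = comb(n_genes - n_pathway, n_in_loci) / comb(n_genes, n_in_loci)
    return 1.0 - p_none


# Hypothetical counts (assumptions for illustration only).
total_genes = 20000     # approximate number of protein-coding genes
pathway_genes = 200     # genes across all pathways ever proposed for the disease
genes_in_loci = 300     # e.g., 100 risk loci averaging three genes each

p_overlap = prob_at_least_one_overlap(total_genes, pathway_genes, genes_in_loci)
print(f"P(at least one pathway gene in a risk locus) = {p_overlap:.3f}")
```

Under these assumed counts, finding a pathway gene inside a risk locus is the expected outcome, not a surprising one, so the coincidence alone carries little evidential weight.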

(4) Studying rare variants

Gene X has a genome-wide significant increased burden of heterozygous loss-of-function mutations in cases versus controls across multiple cohorts using various analytic methods. In order to understand disease pathogenesis and pathophysiology, here we propose to characterize the effects of gene X knockout at the molecular, cellular, and behavioral levels in an experimentally tractable organism. Our preliminary data show disease-like deficits in behavior that are rescued by expressing gene X in adult animals. This study will identify the molecular and cellular mechanisms by which gene X causes disease and identify potential therapeutic targets.

Although this proposed study is based on a statistically robust genetic finding, there are limitations to the experimental approach. The application proposes characterization of an animal knockout (null allele) that may or may not mimic heterozygous loss-of-function mutations observed in patients. The goal is to explore disease pathogenesis and pathophysiology and not the basic biology of the gene, but the proposal assumes that because mutants exhibit “disease-like” behavioral deficits, changes at the molecular or cellular level will be de facto relevant for understanding human disease mechanisms. Such changes must be interpreted in a wider biological and clinical context. For example, how many null mutations in other genes not associated with the disease cause similar changes in the organism? This problem may be mitigated by collectively studying multiple disease-associated genes to look for common patterns of effects. Yet that design would require an appropriate set of well-matched control genes assessed for similar changes. Critically, “disease-like” behavior in an animal does not establish a causal link among a gene, molecular/cellular effect, and disease, nor does the absence of such behavior refute it.

It is important to keep in mind that loss-of-function mutations in particular genes are often associated with multiple neurological and psychiatric phenotypes in people, are not fully penetrant, and are modified by polygenic effects [2, 28]. This lack of specificity at both the genetic and phenotypic levels should guide and constrain the interpretation of phenotypes observed in mutant organisms. In general, broad characterizations of mutant animals are less informative than studies that first take into account wider biological and clinical factors to formulate well-defined research questions (Box 1).

(5) Studying common variants

Gene X has been fine-mapped as a likely causal gene within a genome-wide significant disease risk locus and replicated across multiple cohorts. In order to understand disease pathogenesis and pathophysiology, here we propose to characterize the effects of gene X knockout at the molecular, cellular, and behavioral levels in an experimentally tractable organism. Our preliminary data show disease-like deficits in behavior that are rescued by expressing gene X in adult animals. This study will identify the molecular and cellular mechanisms by which the gene causes disease and identify potential therapeutic targets.

This proposed study is similar to the previous example except that it is based on a common variant association mapped to a gene. In addition to the limitations noted above, the approach has the added issue of attempting to recapitulate, with a null allele, what are often subtle, cell-type specific regulatory effects of common noncoding variants. These risk variants in humans are typically of low effect and operate in the context of hundreds of other such variants. The biological effect of a null allele in an experimental system is thus not necessarily related to a single risk variant in the context of this polygenic effect in humans. It is for this reason and those mentioned above that studies attempting to create a genetic ‘model of disease’ are potentially more problematic than those focused on either the basic biology of likely causal variants, or their associated genes, within a risk locus or a more integrative approach that accounts for a multitude of risk variants [29].

Summary

The recent and rapid progress in psychiatric genetics has simultaneously created a set of opportunities and challenges (Box 1c). Previously, when genetic clues were scarce, investments in experimental studies tolerated potential false alarms to avoid possible misses for much-needed hits. Now, however, with a steady stream of reliable genetic findings, false positives risk diverting precious resources. The availability of large-scale, multidimensional genetic and genomic datasets, combined with incentivized motivated reasoning, confirmation bias, and apophenia [30], poses the danger of generating appealing but unsupported biological narratives. An emphasis on the strength of statistical support for genetic discoveries serves as an important check and a guide for resource investment. This does not imply that genetics is not informed by prior knowledge and biological data, but such information needs to be formally and rigorously incorporated into the whole body of evidence [14, 31, 32], not applied in an ad hoc manner. Systematically integrating statistical and biological evidence reduces the risk of introducing bias. We hope that such an approach, coupled with appropriate experimental systems, will at best reveal disease-relevant mechanisms and at the least advance our understanding of fundamental biology.