The Use of Whole Exome Sequencing in a Cohort of Transgender Individuals to Identify Rare Genetic Variants

Approximately 0.5–1.4% of natal males and 0.2–0.3% of natal females meet DSM-5 criteria for gender dysphoria, with many of these individuals self-describing as transgender men or women. Despite recent improvements both in social acceptance of transgender individuals as well as access to gender affirming therapy, progress in both areas has been hampered by poor understanding of the etiology of gender dysphoria. Prior studies have suggested a genetic contribution to gender dysphoria, but previously proposed candidate genes have not yet been verified in follow-up investigation. In this study, we expand on the topic of gender identity genomics by identifying rare variants in genes associated with sexually dimorphic brain development and exploring how they could contribute to gender dysphoria. To accomplish this, we performed whole exome sequencing on the genomic DNA of 13 transgender males and 17 transgender females. Whole exome sequencing revealed 120,582 genetic variants. After filtering, 441 variants in 421 genes remained for further consideration, including 21 nonsense, 28 frameshift, 13 splice-region, and 225 missense variants. Of these, 21 variants in 19 genes were found to have associations with previously described estrogen receptor activated pathways of sexually dimorphic brain development. These variants were confirmed by Sanger Sequencing. Our findings suggest a new avenue for investigation of genes involved in estrogen signaling pathways related to sexually dimorphic brain development and their relationship to gender dysphoria.

Whole exome sequencing. Genomic DNA was extracted from the blood of all enrolled subjects and treated with RNAse to remove residual RNA. DNA (2-3 μg/subject) was sent to the Yale Center for Genome Analysis for WES. DNA was sheared to a mean fragment length of 220 bp using focused acoustic energy (Covaris E220; Woburn, Massachusetts). Fragments were then blunt ended and phosphorylated using T4 DNA polymerase and T4 polynucleotide kinase. Custom adapters were ligated to each fragment using T4 DNA ligase before DNA fragments were PCR amplified.
The amplified DNA was then heat-denatured and mixed with biotinylated DNA probes (IDT xGen Exome Panel; Coralville, Iowa). Hybridizations were performed at 65 °C for 16 hours. The captured fragments were PCR amplified and then purified with AMPure XP beads to generate each subjects' DNA library. The Illumina NovaSeq. 6000 S4 platform was used to perform paired-end sequencing on these DNA libraries, with reads of 2 × 100 bp. Burrows-Wheeler Aligner (BWA) was used to map sequence reads to the genome. The Yale Exome Pipeline was used to call exome-wide variants, and Annovar and Variant Effect Predictor were used for variant annotation.
filtering of variants. Annotated variants were filtered using the following criteria: 1) not present in 88 in-house control exomes from non-transgender individuals (17 males and 71 females), 2) frequency less than 0.01 in the ExAC, 1000 Genome, and Yale databases, 3) not listed in the dbSNP, 4) ACMG Class 3 and 4 variants: includes frameshift, splice-site (intronic variants occurring 1-2 bps from the intron/exon junction), splice-region (exonic or intronic variants occurring 1-3 bps or 3-8 bps from the intron/exon junction respectively), nonsense variants as well as missense variants with Combined Annotation Dependent Depletion (CADD) score ≥20 (Fig. 1). Variants with a CADD ≥20 are predicted to be in the top 1% of those most likely to affect the function of a given gene when compared to wild-type 34 . Selection of candidate genes for sanger sequencing. After filtering, candidate variants were selected for Sanger Sequencing confirmation if they involved genes related to pathways of sexually dimorphic brain development. Because sex-specific brain development is still poorly understood in humans, we began by performing a thorough literature review of prior research involving sexually dimorphic brain development in animal models Scientific RepoRtS | (2019) 9:20099 | https://doi.org/10.1038/s41598-019-53500-y www.nature.com/scientificreports www.nature.com/scientificreports/ ( Supplementary Fig. 1). We then focused our analysis on variants of genes associated with those developmental pathways that had the potential to be conserved in humans. The selection process was as follows: (1) Data was collected regarding the descriptions, known functions, and ontologies of all genes involved by the variants that remained after the initial filtering process. Three databases were used to collect this data including NCBI Reference Sequence, Gene Ontology Consortium, and Online Mendelian Inheritance in Man (OMIM) 35-38 ; (2) the information gathered from these databases was compared to known pathways of sexually dimorphic brain development in rodent models 28,33 ; (3) genes were selected as candidates if their descriptions, functions, or ontologies suggested an association with these pathways; (4) variants that both remained after filtering and involved the above candidate genes were selected for confirmation via Sanger Sequencing. Sanger sequencing. In each instance, DNA primers were selected using NCBI Primer-BLAST.
Amplification of DNA was performed using polymerase chain reaction under the following conditions: samples www.nature.com/scientificreports www.nature.com/scientificreports/ were first heated to 95 °C for 5 minutes to allow DNA denaturation. They were then subjected to 30 cycles of the following conditions to allow primer annealing: one minute at 95 °C, 30-60 seconds at 55 °C, and 30-60 seconds at 72 °C. Finally, samples were heated to 72 °C for 7 minutes to allow DNA extension. After PCR amplification, the presence of DNA fragments of the anticipated size was confirmed with agarose gel electrophoresis, visualized under UV light. Confirmed PCR products were then subjected to ethanol precipitation, before and after a sequencing reaction was performed. Products of the sequencing reaction were then analyzed with an ABI 310 Sequencer and individually confirmed by the investigators.

Results
For this preliminary study, 30 transgender individuals were enrolled, including 13 transgender males (i.e. natal females transitioning to male) and 17 transgender females (i.e. natal males transitioning to female). WES with a coverage of 98.36% and average read depth of 75 demonstrated 120,582 genetic variants. After filtering, 441 variants in 421 genes remained for further consideration. This included 21 nonsense, 28 frameshift, 13 splice-region, and 225 missense variants. The remaining 154 intragenic variants appeared in non-coding regions of the genes, defined as the 5′UTR, 3′UTR or intronic variants located >8 bp away from a splice junction. These variants from non-coding regions were not considered for further analysis. From the 421 genes that remained after filtering, 19 were found to have associations with pathways of sexually dimorphic brain development and were thus selected as candidate genes. Twenty-one variants were noted in these candidate genes and confirmed with Sanger Sequencing. nonsense variants. After filtering, twenty-one nonsense variants, of those originally identified by WES, remained (Table 1). With the exception of a variant of DIAPH2, which is localized to the X chromosome, each variant was observed in only a single subject, and all were heterozygous. The variant DIAPH2, c.G736T (p.G246X), was noted in three transgender males and one transgender female, with the transgender female reported as hemizygous and the transgender males reported as heterozygous. frameshift variants. Of the frameshift variants identified by WES, there were 28 that remained after filtering ( Table 2). In general, all frameshift variants were heterozygous, including a variant of PPP2R3B, c.356dupC (p.P119fs), located on chromosome X, in one transgender male. One exception to this was noted in a single transgender male, who was homozygous for a variant of PRAMEF13, c.1291_1292insA (p.A431fs).

Splice-region variants.
Of the splice-region variants initially identified by WES, 13 remained after filtering (Table 3). All splice-region variants were heterozygous and noted in only a single subject.

Missense variants.
The majority of variants present after filtering were missense variants with a CADD score ≥ 20. In total, 225 variants were noted involving 213 genes (Supplemental Table 2  When specifically considering missense variants of genes located on the sex chromosomes, in transgender females there were seven hemizygous variants noted in six genes located on the X chromosome: ASMT, CXORF57, GTPBP6, P2RY8, PLCXD1, and RGAG1. None of these genes has been shown to escape X-linked inactivation 39 . All but one variant was noted in only a single individual. Two transgender females were noted to have two concomitant hemizygous PLCXD1 variants, c.T512C (p.I171T) and c.G295A (p.G99R). In transgender males, four variants were noted in genes on the X chromosome: ATRX, GTPBP6, PPP2R3B, and ZXDA. All of these were heterozygous and observed in only a single subject. After filtering, there were no remaining variants of genes located on the Y chromosome.
Genes related to pathways of sexual differentiation in the brain. The variants described above were compared to the previously described pathways of sexual differentiation in the brain, and 21 variants involved in 19 genes related to these pathways: AKR1C3, BOK, CDH8, CDK12, CTNNA2, DNER, DSCAML1, EGF, EFHD2, GRIN1, KCNK3, MAP4K3, PIK3CA, PPARGC1B, RIMS3, RIMS4, SPHK1, SYNPO, and TNN (Table 4). These included 17 missense and 4 nonsense variants, and each was confirmed via Sanger Sequencing. Each of these variants was heterozygous found in a single i n d iv i d ual.

Discussion
The factors that lead to gender dysphoria are poorly understood, but given that prior research has suggested a significant heritable component, we sought to explore what the possible genetic contribution could be 22 . Somewhat counterintuitively, the ER initiated developmental pathways occurring in the VMN, mPOA, AVPV, and the arcuate nucleus are active in natal males rather than in natal females 33 . This is because, during the perinatal period, the testes initiate a rapid but transient surge in serum testosterone, which can readily be aromatized to estradiol, thus driving the ER activated neurodevelopmental pathways that result in male pattern behavior 33  www.nature.com/scientificreports www.nature.com/scientificreports/ during this same perinatal period, the ovaries are quiescent in natal females, and the resulting lack of ER stimulation drives sex-specific neurodevelopment toward the path of feminization 33 .
In a series of animal model studies confirming these pathways, researchers exposed developing rodents to substrates that would either induce ER activated pathways in females (i.e. exposure to testosterone or estradiol) or disrupt those pathways in males. In each of these models, investigators were able to induce cross-sex neurodevelopment and resulting cross-sex behavior in rodents, scored by examination of lordosis and proceptive behavior in males and by mounting and thrusting events in females 42 . Interestingly, despite the fact that cross-sex behavioral changes only manifested at sexual maturity, they only occurred if neurodevelopmental alterations were induced during the perinatal period 28 . This suggested that sex-specific neurodevelopment must be initiated during a short perinatal window, but that the resulting behavioral effects only emerge with the awakening of the hypothalamic-pituitary-gonad axis (i.e. puberty). This short span of neurodevelopmental malleability was dubbed the "critical period", and it corresponds to the spontaneous surge in testosterone production that occurs in natal males 33 . Of note, similar patterns have been observed in at least nine other mammalian species including non-human primates 41 . Overall, the above evidence suggests that, in many species and by proxy of estrogen, testosterone's presence or absence during the perinatal period drives diverging pathways of permanent sex-specific neurodevelopment, which later result in sex-specific behavior.
Though sex-specific brain development has not yet been thoroughly evaluated in humans, the above pattern is consistent with the average developmental timeline of gender dysphoria, which often significantly worsens at the time of puberty, and there is ample evidence to suggest that sex-hormone exposure in the prenatal period does affect sex-specific behavior in humans. The preponderance of data has been obtained through research involving natal females with classic congenital adrenal hyperplasia. These individuals are exposed to elevated levels of androgens prior to birth, with varying degrees external genital virilization. Several studies evaluating behavior patterns in natal female children with congenital adrenal hyperplasia have demonstrate that they are more likely to engage in male typical play and less likely to engage in female typical play 41 . In addition, the rate of gender dysphoria is significantly increased when comparing natal females with congenital adrenal hyperplasia to unaffected natal females, at 3.0% and ~0.2% respectively 2,27,41 . When considering natal males, there is less available data, but there have been case reports in which natal males expressing gender dysphoria were later diagnosed with congenital hypogonadotropic hypogonadism, a condition in which levels of endogenous androgens are significantly diminished throughout development and adult life 22,43 . Given the above data, the conservation of mammalian estrogenic metabolic pathways, as well as the corollaries between sex-specific developmental and behavioral patterns, we believe that evaluation of variants of genes associated with these developmental patterns constitute one initial avenue for exploring the origins of both gender dysphoria and gender identity in general. the medial preoptic area. In general, ER activation during neurodevelopment results in sex specific differences in brain volume or dendritic spine density, but notably, both the intermediary pathways occurring after ER activation and the final effects on brain volume and density are specific to each region 33 . Within the mPOA, ER activation causes increased activity of cyclooxygenase-2 leading to increased levels of presynaptic prostaglandin-E2. Prostaglandin-E2 is released from the presynaptic neuron, where postsynaptic prostaglandin receptor activation, in conjunction with astrocyte signaling, leads to increased mobilization and activation of dendritic AMPA-type glutamate receptors, resulting in increased dendritic spine formation 28 . In rodents, increased number and density of dendrites in the mPOA is associated with male pattern behavior at maturity (i.e. mounting and intromission) 33 . As mentioned above, when AMPA receptor agonists were given exogenously to neonatal females during the 10-day critical period after birth, this resulted in adult females that demonstrated male pattern sexual behavior. In total, nine candidate genes with confirmed variants (by Sanger sequencing) were identified that could affect this pathway 44 . One gene, SPHK1, has been shown to modulate cyclooxygenase-2 acetylation in neurons, another, DNER, has been shown to mediate neuron-glia interaction, and seven others, CDH8, CTNNA2,  the ventromedial nucleus. In the VMN, stimulation of ER on presynaptic neurons results in presynaptic release of glutamate, which in turn stimulates postsynaptic NMDA receptors. Activation of these receptors causes an influx of calcium into the neuron and activation of MAP kinase, leading to an increased number of dendritic spines in the VMN 33 . As in the mPOA, increased number and density of dendrites is associated with male pattern behavior in animal models. Four Sanger confirmed variants were noted in genes associated with this pathway. RIMS3 and RIMS4 both play an important role in presynaptic vesicle fusion and neurotransmitter release. GRIN1 is an NMDA receptor and MAP4K3 is a MAP kinase, both of which are expressed in the brain (Table 4) 53-55 . the anteroventral periventricular nucleus. In the AVPV, ER activation leads to increased levels of TNF-α, which thereby decreases BCL-2 activity and increases levels of BAX/BAD. This shift leads to neuronal apoptosis and lower overall AVPV volume in males. Decrease in AVPV volume is thought to prevent adult male rodents from exhibiting the LH surge characteristic of females 33 . One confirmed candidate gene, BOK, has been associated with the apoptosis response to BAX/BAD (Table 4) 56 .
the arcuate nucleus. In the arcuate nucleus, neuronal ER activation during development leads to increased levels of the activated chloride channel pNKCC1, leading to an influx of Clinto the neuron 31 . This has a polarizing effect, due to the resulting Clconcentration gradient across the neuronal cellular membrane, with the higher concentration now present within the neuron. GABAAR is a bidirectional Clchannel that is also present on the cellular membrane of developing neurons 30 . When activated by GABA, GABAAR allows Cl − to flow down its concentration gradient toward equilibrium, resulting in a neuronal depolarization that in turn opens L-type voltage-gated calcium channels, allowing the influx of calcium into the neuron. Potentially via calcium activated calmodulin kinase or mitogen activated protein kinase, this influx of calcium causes intraneuronal CREB to be phosphorylated to pCREB 57 . Through a still poorly understood mechanism, increased intraneuronal pCREB leads to decreased dendritic spine number in the arcuate nucleus of the developing male rat 29,33 . One confirmed candidate gene, KCNK3, transcribes a transmembrane K+ channel that has been shown to be upregulated in a compensatory fashion in neurons where GABAAR activity was experimentally disrupted (Table 4) 58 .
Variant effects in transgender males compared to transgender females. Given that gender dysphoria in male and female individuals are generally considered to be distinct phenotypes, we felt that it was important to not only analyze the combined results of all enrolled individuals but also to analyze the results when considering transgender males and transgender females separately. It has been shown that ER activation has differing and sometimes opposing developmental effects in different brain regions and in different sexes. This leaves open the possibility that genes associated with those developmental pathways could contribute to gender dysphoria in both natal sexes. Along those lines, one variant of interest is DIAPH2: c.G736T (p.G246X), which was heterozygous in three transgender males and hemizygous in one transgender female. DIAPH2 is located on the X chromosome and disruption of this gene has been associated with premature ovarian failure in natal females 59 .
Other ACMG class 3 and 4 variants. In addition to the candidate genes noted above, WES revealed multiple other unique ACMG class 3 and 4 variants that we could not link to known pathways leading to sexual dimorphism in the brain. We felt that it was important to include these variants for two reasons. First, as previously mentioned, the genetic contributors and neurodevelopmental pathways contributing to gender identity are, as of yet, poorly understood. Second, there have been many cases in which new and unexpected functions have been discovered in genes with seemingly unrelated activity described previously. Though only candidate genetic variants were confirmed by Sanger Sequencing, each of the genes listed throughout this article was suggested by WES to have a variant that was not present in either the ExAC or dbSNP databases, and was also not present in 88 in-house non-transgender controls.

Special considerations in gender identity genetic research. Several factors add layers of complexity
to the investigation of the genetic contribution to gender dysphoria and transgender identity. First, like many human traits, gender identity is unlikely to result from the variation of a single gene. Notably, even the binary categories of "transgender male" and "transgender female" are not sufficient to describe all members of the gender-expansive community, with some individuals for example self-describing as gender non-binary or agender. The broad spectrum that characterizes human gender identity suggests that, rather than being tied to variation within a single gene, an individual's gender identity is more likely the result of a complex interplay between multiple genes as well as environmental and societal factors. The above concept, in which variation of multiple genes can additively contribute to the final formation of complex human traits, is termed the Polygenic Threshold Model, and it has been cited by prior investigators as the most likely mechanism by which gender identity develops in humans 22 . These investigators noted that this model is both "inherently destigmatizing" in its presentation of a wide phenotypic spectrum and that it contradicts the idea that a specific complement of genetic variants could be used to identify or predict an individual's gender identity. The inherent challenge to genetic research within this framework is that, rather than approaching investigation as the search for a single so-called "transgender gene", the investigation must rather strive to understand the complexities of gender development through the lens of genetics. This requires the acknowledgement that the genetic milieu contributing to gender identity may be completely different from one individual to the next. While, in some individuals, a single genetic variant may be sufficient to result in gender dysphoria, it does not follow that that particular variant would be necessary or sufficient to cause gender dysphoria in the population at large. It is for that reason that newer modalities of genetic research, such as genome wide association studies and whole genome/exome sequencing, are more likely to be contributive to this field in a positive way. Technologies such as these, WES in the case of our study, allow for the evaluation of patterns of genetic variation that exceed the search for individual genetic variants. A second factor that must be considered when conducting genetic research involving the transgender community is the potential societal impact that could result from the findings. This requires not only input from experts within the field of genetics and research ethics, but importantly must also consider the attitudes, opinions, and beliefs that exist within the transgender community. Of note, one of the primary reasons that we began this study was in response to inquiries by transgender patients from our clinics. Surprisingly, upon searching for formalized studies assessing these views, we were unable to find any. For that reason we have now begun a study, which is now in the early stages of data collection, precisely to assess those values. As should be true in all fields of research, but even more crucially in this area, investigators must recognize that the reporting of their findings must be made with great care and with precise language so that the results cannot be misconstrued, exaggerated, or used to negatively impact the transgender community at large.
Complicating each of the above issues is a third challenge inherent to research involving the genetic contribution to transgender identity. In general, genetic research focuses on the etiologies of pathologic traits. Because of this, the terminology currently accepted for description of genetic variants reflects an assumption that a trait caused by a potential variant is pathologic. Of note, the World Health Organization no longer classifies transgender identity as a pathologic trait, stating that the "evidence is now clear that [gender dysphoria] is not a mental disorder 60,61 ". When investigating the etiology of a trait that we do not consider to be pathologic, such as transgender identity, extreme care must be taken in the descriptions of candidate variants and their potential relation to that phenotype. For that reason, though we do utilize the American College of Medical Genetics recommendations for the description of variants, we use this terminology with the understanding that terms such as "deleterious", "pathogenic", or "likely pathogenic" simply describe the predicted functional effect that a particular variant will have when compared to wild-type, rather than the phenotype associated with that change. And though a full discussion of this topic is outside the scope of this article, we feel that it is worthwhile to consider the potential benefits of adopting a modified version of the current variant classification system, that could be utilized in genetic studies evaluating transgender identity and other non-pathologic states. Similar to the current classification system, the modified system would allow investigators to convey varying degrees of likelihood that a functional genetic change would be imbued by a particular variant, but without the inherently negative connotation associated with terms such as "pathogenic".
Limitations of this study. A primary limitation of this study was that it included only 30 subjects, though this does constitute a larger sample size than the majority of prior studies utilizing WES to study gender dysphoria. It is for that reason that we consider the above findings to be preliminary in nature. We acknowledge that for any conclusions to be drawn regarding the extent to which a specific genetic variant contributes to gender dysphoria, segregation and in-vitro analysis will be essential. However, we felt that it was important to report this preliminary data to provide a new framework (i.e. consideration of variants affecting sexually dimorphic brain development) for gender identity research. We continue to enroll new patients and will continue to report significant findings as they become available.
In addition, we are unable to characterize the extent of the majority of subjects' transition processes, as this information was not collected as part of the enrollment process. However, we did make certain that each subject met the clinical criteria for gender dysphoria before enrollment. Moving forward, we may include questions assessing the timing and extent of transition as part of the enrollment process. Finally, this study was limited in that whole exome, rather than whole genome, sequencing was utilized to identify variants. This was primarily due to cost, and may be addressed in the future as the cost of whole genome sequencing continues to fall.
In summary, our study has identified genetic variants in 19 candidate genes that may be involved in pathways of gender development in the brain. These variants, first identified through WES, were then confirmed with Sanger sequencing, and each was categorized as class 3 or 4 using the ACMG classification system. None were found in the ExAC or dbSNP databases, or in 88 non-transgender controls, suggesting that they are not common polymorphisms. The new candidate genes have each been shown to have some relation to key contributors to the currently known pathways of sex-specific brain development in rodent models. Though these neurodevelopmental pathways have not been characterized in humans to the degree that they been described in animals, we believe that genes involving these pathways constitute a reasonable avenue for investigation into the genetic contribution to gender dysphoria in humans.

Data availability
All data generated or analyzed during this study are included in this published article (and its Supplementary Information File).