Introduction

Schizophrenia (SZ) is a devastating chronic psychotic illness affecting 0.5–1.0% of the population worldwide. The inherited component to risk for SZ is estimated at 64% (Lichtenstein et al, 2009). Genome-wide association studies (GWAS) with sample sizes exceeding 50 000 persons reported more than 100 genetic loci containing common alleles conveying only minor influences (odds ratios <1.2) on risk (Ripke et al, 2014). Rare copy number variations increase disease risk substantially, but only in a small fraction of patients diagnosed with SZ (PDS) (Lee et al, 2012 for review).

Somatic brain DNA variations have been found to contribute to the incidence of early-onset Alzheimer’s disease (Beck et al, 2004), brain malformations (Evrony et al, 2012), and Sturge–Weber syndrome (Shirley et al, 2013; Nakashima et al, 2014). Thus, one novel approach to identifying SZ risk alleles is the study of somatic mutations within brain DNA sequence.

One source of somatic brain DNA variation is long interspersed element-1 (L1) retrotransposons (Muotri et al, 2005, 2010; Coufal et al, 2009). Full-length L1s are ~6 kb mobile DNA elements with a promoter, 5′-UTR, 3′-UTR, and two open reading frames encoding an RNA-binding protein and an endonuclease/reverse transcriptase fusion protein (Mathias et al, 1991; Feng et al, 1996). L1s multiply and accumulate in the host genome by a ‘copy-and-paste’ mechanism; they are estimated to comprise ~17% of the human genome (Lander et al, 2001; Beck et al, 2010). Most L1s have lost their ‘genomic mobility’ due to truncations or mutations (Ostertag and Kazazian, 2001), but ~100 full-length ‘competent’ L1s, mostly from the human-specific L1 (L1Hs) Ta subfamily, are capable of retrotransposition in the human genome (Beck et al, 2010; Brouha et al, 2003).

Kazazian et al (1988) first demonstrated that a germ line L1 retrotransposition event caused human disease. Since then, >125 cases of germ line L1-mediated gene disruption have been reported to cause Mendelian diseases (Goodier, 2014; Hancks and Kazazian, 2016). Complex diseases such as neurofibromatosis type I (Vogt et al, 2007), chronic granulomatous disease (Roos et al, 2010), primary immunodeficiency (de Boer et al, 2014), colon cancer, and choroideremia (Hancks and Kazazian, 2016) have been caused by somatic L1 retrotransposition into various genetic loci. These somatic L1-mediated diseases provide a rationale for the hypothesis that brain somatic L1 retrotransposition may contribute to psychiatric disease risk.

Bundo et al (2014) studied L1 burden in gDNA from dorsolateral prefrontal cortex (DLPFC) and liver of three PDS and three control individuals by whole-genome sequencing. Gene Ontology (GO) analysis yielded significant over-representation of SZ brain-specific L1s in genes for the GO terms ‘synapse’ (Bonferroni-corrected p=3 × 10⁻⁹), ‘synapse part’ (p=4.4 × 10⁻⁵), and ‘plasma membrane part’ (p=1.5 × 10⁻⁵). Many of the genes harboring L1s have been implicated previously in GWAS of PDS (Bundo et al, 2014; Ripke et al, 2014).

We studied postmortem DLPFC from PDS and normal controls. DLPFC tissue was chosen because PDS have working memory deficits and semantic and verbal memory loss, dysfunctions that involve the DLPFC (Gur et al, 2001; Carter et al, 2009), and because brain imaging has revealed abnormal DLPFC activity in PDS (Weinberger et al, 1986; Manoach et al, 2000; Carter et al, 1998; Curtis et al, 1999). Because this study used a different methodology (L1 amplicon sequencing rather than whole-genome sequencing), it serves as both a confirmation and an extension of the results published by Bundo et al (2014).

Materials and methods

Study Subjects

All postmortem brain samples were obtained from the University of Pennsylvania BioBank brain collection. We studied 36 PDS (mean age=79±10 years; 64% female; 34 European-American and 2 African-American) and 26 controls (mean age=67±14 years; 54% female; 18 European-American, 7 African-American, and 1 multiracial). Supplementary File S1 contains brain tissue characteristics, including postmortem interval (PMI), weight, RNA integrity number (RIN), and pH, as well as complete subject demographics, including age at onset and chlorpromazine equivalents. Before death, the diagnosis of SZ was established for all patients after examination by a psychiatrist. All subjects had at least one full antemortem clinical assessment, as described (Arnold et al, 1995). Assessments included DSM-IV diagnosis, the Brief Psychiatric Rating Scale, medication history, the Scale for the Assessment of Negative Symptoms, and the Scale for the Assessment of Positive Symptoms. Control individuals had no history of psychiatric illness and had a non-neurological cause of death.

Blood DNA Samples from PDS and Controls

De-identified gDNA from EBV-transformed lymphoblastoid cell lines of European-American subjects who met DSM-IV criteria for SZ (n=1316) and of controls (n=2303) was acquired from the Rutgers University Cell and DNA Repository (Infinite Biologics, Piscataway, NJ, USA). Before use in genotyping experiments, gDNA was diluted to 20 ng/μl in sterile water.
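The 20 ng/μl working concentration follows from the standard C1V1 = C2V2 dilution relation. The sketch below computes the stock and water volumes for one sample; the stock concentration and final volume in the example are hypothetical, not values from this study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Volumes (ul) of gDNA stock and sterile water for a C1*V1 = C2*V2 dilution."""
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical example: a 150 ng/ul stock diluted to 20 ng/ul in 100 ul
stock, water = dilution_volumes(150.0, 20.0, 100.0)
print(f"{stock:.2f} ul stock + {water:.2f} ul sterile water")
```

At 20 ng/μl, the 40 ng of template used per 5 μl Taqman reaction corresponds to 2 μl of diluted gDNA.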

Neuronal Nuclei Isolation

L1s mobilize more frequently during differentiation of neurons than during differentiation of non-neuronal cell types (Coufal et al, 2009), and L1s can mobilize in post-mitotic neurons (Macia et al, 2017). We therefore enriched for neuronal genomes before analyzing L1 retrotransposition. Neuronal and non-neuronal nuclei were isolated from frozen postmortem DLPFC tissue (Jiang et al, 2008). From each individual DLPFC sample, 10–15 mg (wet weight) of tissue was used to create seven pools for analysis: four pools each comprising 8–10 PDS samples and three pools each comprising 8–10 control samples (Supplementary File S1). The samples in each pool were thawed and lysed together by homogenization in ice-cold 0.32 M sucrose solution containing 0.5% Triton X-100; the lysate was layered onto a 1.8 M sucrose cushion, and nuclei were pelleted by ultracentrifugation (~107 000 g, 2.5 h, 4 °C). Pelleted nuclei were suspended in 1 × phosphate-buffered saline containing 3 mM MgCl2, labeled with an AlexaFluor-488-conjugated anti-NeuN antibody (Millipore, Temecula, CA, USA), counterstained with 4′,6-diamidino-2-phenylindole (DAPI; Thermo-Fisher Scientific, Waltham, MA, USA), and strained through a 30 μm filter-cap tube (Becton-Dickinson, Franklin Lakes, NJ, USA). Labeled nuclei were sorted into NeuN-positive (neuronal) and NeuN-negative (non-neuronal) populations on an AriaII fluorescence-activated cell sorter (Beckman-Coulter, Brea, CA, USA). Sorted nuclei were pelleted by centrifugation (2000 g, 30 min, 4 °C), lysed in 1 × proteinase K digestion buffer, and incubated with 0.857 μg/μl proteinase K at 56 °C for 16 h; gDNA was then purified by chloroform extraction and ethanol precipitation in the presence of 20 μg molecular biology grade glycogen (Roche, Indianapolis, IN, USA). All DNA samples were quantified by fluorimetry using the Quant-iT high-sensitivity double-stranded DNA kit (Thermo-Fisher Scientific).
Not all gDNA samples were assessed for high-molecular-weight integrity by gel electrophoresis; however, post-FACS gDNA samples prepared in our laboratory by these methods migrated above the 23 kbp fragment of a HindIII-digested lambda DNA size marker, without evidence of excessive degradation.
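Relative centrifugal force depends on rotor radius, so reproducing the ~107 000 g cushion spin or the 2000 g pelleting step on another instrument requires converting g to rpm with RCF = 1.118 × 10⁻⁵ × r × rpm², where r is the rotor radius in cm. The rotor radii below are hypothetical placeholders; the rotors used in this study are not specified.

```python
import math

# RCF = 1.118e-5 * r * rpm**2, with r the rotor radius in cm
RCF_CONSTANT = 1.118e-5


def rpm_for_rcf(rcf_g, radius_cm):
    """Rotor speed (rpm) that produces the requested relative centrifugal force."""
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


# Hypothetical radii; substitute the r_max (or r_av) of the rotor actually used
for label, rcf, radius in [("sucrose cushion spin", 107_000, 15.3),
                           ("sorted-nuclei pellet", 2_000, 10.0)]:
    print(f"{label}: {rpm_for_rcf(rcf, radius):,.0f} rpm at r = {radius} cm")
```

Because RCF scales with radius, the same rpm setting gives a different g-force on a rotor of different geometry, which is why the protocol is specified in g rather than rpm.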

L1 Amplification and Sequencing

Purified gDNA from NeuN-positive nuclei was subjected to L1-seq as described (Ewing and Kazazian, 2010), except that primary PCR used 25 ng of gDNA template for each of the eight hemi-specific degenerate primer reactions. The amplicons from each degenerate primer reaction were size-selected by gel electrophoresis and purified, and their DNA concentration was assessed with an Agilent high-sensitivity double-stranded DNA chip on an Agilent BioAnalyzer 2100. The L1 amplicons were then mixed in equimolar quantities to produce the amplicon library for each pool. Final library concentrations were assessed by KAPA Biosystems Illumina qPCR before clustering, and each pool library was sequenced in one lane of an Illumina HiSeq2000 using 100-nucleotide single-end reads.
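Equimolar mixing requires converting each BioAnalyzer mass concentration into molarity, using ~660 g/mol per base pair of double-stranded DNA. The sketch below assumes a mean amplicon size of 350 bp and a target of 50 fmol per reaction; both values and the concentrations shown are hypothetical.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_length_bp)


def equimolar_volumes(readings, fmol_each):
    """Volume (ul) of each amplicon prep that supplies fmol_each femtomoles.

    readings maps a reaction name to (concentration in ng/ul, mean size in bp).
    Since 1 nM = 1 fmol/ul, volume = fmol / nM.
    """
    return {name: fmol_each / ng_per_ul_to_nM(conc, size)
            for name, (conc, size) in readings.items()}


# Hypothetical BioAnalyzer readings for the eight degenerate-primer reactions
concs = [2.1, 1.8, 3.0, 2.4, 1.2, 2.9, 1.5, 2.2]
readings = {f"rxn{i}": (c, 350) for i, c in enumerate(concs, start=1)}
volumes = equimolar_volumes(readings, fmol_each=50)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} ul")
```

Working in molar rather than mass units keeps the eight reactions equally represented even when their size distributions differ.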

L1-seq Bioinformatics

L1-seq bioinformatics analysis was done using the most current version of the L1-seq software (Ewing and Kazazian, 2010; available at https://github.com/adamewing/l1seq). Briefly, sequencing reads were trimmed to remove 5′-primer sequence and 3′ low-quality bases and aligned to human genome build hg19 using bowtie2-2.1.0 (Langmead and Salzberg, 2012), and aligned reads were converted to .bam files with Samtools-0.1.19 (Li et al, 2009). Peaks consisting of aligned reads within a common 500 bp window were identified and used to calculate putative L1 insertion points. Predicted L1 insertions were compared across sample populations and, by comparison of L1 positions to a catalog of published L1 insertion data sets (Ewing and Kazazian, 2010; Sudmant et al, 2015), annotated as reference (a genetically ‘fixed’ L1 in the human reference (hg19) genome), known non-reference (KNR; a previously described ‘polymorphic’ L1 in the human population that is absent from the reference genome), or previously uncataloged (novel; an L1 not previously identified in the human population). Novel and KNR L1 insertions were then screened for high quality in Excel (Microsoft, Redmond, WA, USA). An L1 insertion was retained if its reads aligned well to the reference genome (‘mean mapq’ ≥30, ‘mappability’ >0.5) with >98% average match (‘mean matchpct’ >0.98), and if it had at least six total reads across pooled populations (‘total reads’ >5) and at least two unique reads in a ‘peak’ (‘unique alignments’ >1). All subsequent analyses were limited to intragenic L1 insertions, defined as a predicted L1 located within an intron or an exon, or within 500 bp of a transcription start site or 3′-UTR.
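The peak-grouping, quality-filtering, and intragenic steps described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not code from the L1-seq repository: the insertion point is taken as the most frequent read position in a peak, distinct read positions stand in for ‘unique alignments’, the ‘mean mapq’ cut-off is assumed to be ≥30, and ‘intragenic’ is approximated as the gene body extended by 500 bp on each side (covering both the transcription start site and the 3′ end). The reads and gene coordinates in the example are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

WINDOW_BP = 500  # reads within one 500 bp window are grouped into a peak
FLANK_BP = 500   # intragenic = gene body +/- 500 bp (simplification)


def group_into_peaks(reads, window_bp=WINDOW_BP):
    """Group (chrom, pos) read alignments into peaks spanning <= window_bp."""
    peaks, chrom_now, current = [], None, []
    for chrom, pos in sorted(reads):
        if current and (chrom != chrom_now or pos - current[0] > window_bp):
            peaks.append((chrom_now, current))
            current = []
        chrom_now = chrom
        current.append(pos)
    if current:
        peaks.append((chrom_now, current))
    return peaks


@dataclass
class Peak:
    chrom: str
    positions: list      # read positions in the peak
    mean_mapq: float     # alignment metrics assumed to be precomputed
    mappability: float
    mean_matchpct: float

    @property
    def total_reads(self):
        return len(self.positions)

    @property
    def unique_alignments(self):
        # Assumption: distinct positions approximate unique (non-duplicate) reads
        return len(set(self.positions))

    @property
    def insertion_point(self):
        # Assumption: the most frequent read position marks the insertion junction
        return Counter(self.positions).most_common(1)[0][0]


def passes_quality(p):
    # Cut-offs from the Methods; the mapq comparison (>= 30) is assumed
    return (p.mean_mapq >= 30 and p.mappability > 0.5 and p.mean_matchpct > 0.98
            and p.total_reads > 5 and p.unique_alignments > 1)


def is_intragenic(chrom, point, genes, flank_bp=FLANK_BP):
    """genes: iterable of (chrom, start, end) gene bodies."""
    return any(c == chrom and start - flank_bp <= point <= end + flank_bp
               for c, start, end in genes)


# Hypothetical reads: eight near chr1:1000 and one isolated read at chr1:5000
reads = ([("chr1", 1000)] * 4 + [("chr1", 1001)] * 2
         + [("chr1", 1003)] * 2 + [("chr1", 5000)])
peaks = group_into_peaks(reads)
chrom, positions = peaks[0]
peak = Peak(chrom, positions, mean_mapq=42.0, mappability=0.9, mean_matchpct=0.99)
print(len(peaks), peak.total_reads, peak.unique_alignments, peak.insertion_point)
print(passes_quality(peak), is_intragenic(chrom, peak.insertion_point, [("chr1", 1200, 9000)]))
```

In this toy input the isolated read at chr1:5000 forms its own one-read peak, which the ‘total reads’ >5 criterion would discard.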

LINE-1 Validations

Supplementary File S2 lists the L1 validations attempted (successful validations are highlighted in green). Because little NeuN-positive gDNA remained after L1-seq library construction, whole-genome amplification (WGA) by multiple displacement amplification was performed on 10 ng of each NeuN-positive and NeuN-negative gDNA sample using a Repli-G mini kit (Qiagen, Hilden, Germany) before PCR validation, which used GoTaq polymerase (Promega, Madison, WI, USA), 10 ng WGA DNA, and 0.2 μM primers in a 25 μl reaction. We used the Primer3 program (Rozen and Skaletsky, 2000) to design genome-specific ‘filled site’ (FS) and ‘empty site’ (ES) primers for confirmatory PCR (Ewing and Kazazian, 2010). Two initial primer pairings were used (Ewing and Kazazian, 2010): FS with ES, to amplify the genomic region surrounding a putative L1 insertion, and FS with L1HsTAILSP1AP2 (L1HsT, 5′-GGGAGATATACCTAATGCTAGATGACAC-3′), to amplify a region including the insertion. When necessary, we performed nested PCR using 2 μl of the initial PCR (FS-L1HsT primer pairing) as template with nested ‘FS’ and L1-specific (L1HsG, 5′-TGCACATGTACCCTAAAACTTAG-3′) primers. Gel-purified amplicons were cloned by ligation into pCRII-TOPO (Thermo-Fisher Scientific), ligation products were transformed into bacteria, and purified plasmid DNA containing putative L1 insertions was Sanger-sequenced using BigDye chemistry (Supplementary File S3). The resulting sequences were queried against hg19 using BLAT (Kent, 2002). To determine which individuals in a pool carried a confirmed L1 insertion, PCR was performed on bulk gDNA purified from a second DLPFC sample from each individual in the pool.
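As a quick sanity check on the two L1-specific primers given above, the sketch below reports their length, GC content, a rough melting temperature, and the reverse complement. The Tm uses the simple rule of thumb 64.9 + 41 × (nGC − 16.4)/N for oligonucleotides longer than 13 nt; this is not how Primer3 scores primers (Primer3 uses nearest-neighbor thermodynamics with salt and oligo corrections), so the values are approximate.

```python
def gc_count(seq):
    """Number of G or C bases in a primer."""
    return sum(base in "GC" for base in seq.upper())


def basic_tm(seq):
    """Rule-of-thumb Tm (degC) for oligos > 13 nt: 64.9 + 41 * (nGC - 16.4) / N."""
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(seq)


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


PRIMERS = {
    "L1HsT": "GGGAGATATACCTAATGCTAGATGACAC",
    "L1HsG": "TGCACATGTACCCTAAAACTTAG",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {100 * gc_count(seq) / len(seq):.1f}%, "
          f"Tm ~{basic_tm(seq):.1f} C, revcomp {reverse_complement(seq)}")
```

For these sequences the sketch gives 28 nt, ~42.9% GC, and ~58.5 °C for L1HsT, and 23 nt, ~39.1% GC, and ~51.7 °C for L1HsG.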

Taqman Genotyping of L1 in ERI3

We developed an allelic discrimination Taqman assay to detect ERI1 exoribonuclease family member 3 (ERI3) alleles with and without the validated L1. Briefly, two probes overlapping the L1 insertion point were used: a FAM-labeled probe to detect the L1-containing allele and a VIC-labeled probe to detect the normal, L1-lacking allele. Three primers were used: two genome-specific primers (forward and reverse) flanking the ERI3-L1 insertion point enabled detection of the L1-lacking allele, and a third primer (L1HsG) enabled detection of the L1-containing allele. Cycling parameters were denaturation at 95 °C for 10 min followed by 40 cycles of 95 °C, 15 s; 56 °C, 5 s; and 60 °C, 1 min. Each 5 μl reaction contained 40 ng gDNA template. Amplified reactions were post-read on an ABI 7900HT with two-cluster calling enabled.
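End-point allelic discrimination assigns each well a genotype from the relative FAM (L1 allele) and VIC (L1-lacking allele) signals. The sketch below classifies wells by the angle of the (VIC, FAM) point and derives the L1 allele frequency from the called genotypes. The signal floor and angle cut-offs are hypothetical, and the ABI 7900HT software uses its own clustering; the sketch keeps a homozygous-L1 class for completeness, although two-cluster calling, as used here, suits a rare allele for which homozygous carriers are not expected.

```python
import math


def call_genotype(fam, vic, min_signal=0.5, het_window=(25.0, 65.0)):
    """Call one well from normalized end-point FAM (L1 allele) and VIC
    (L1-lacking allele) signals. All thresholds here are hypothetical."""
    if math.hypot(fam, vic) < min_signal:
        return "no call"                        # too little signal to cluster
    angle = math.degrees(math.atan2(fam, vic))  # 0 = VIC only, 90 = FAM only
    if angle < het_window[0]:
        return "no L1 / no L1"
    if angle <= het_window[1]:
        return "L1 / no L1"
    return "L1 / L1"


def l1_allele_frequency(genotypes):
    """L1 allele frequency among called (diploid) samples."""
    dose = {"no L1 / no L1": 0, "L1 / no L1": 1, "L1 / L1": 2}
    called = [dose[g] for g in genotypes if g in dose]
    return sum(called) / (2 * len(called))


# Hypothetical (FAM, VIC) end-point readings for four wells
wells = [(0.1, 2.0), (1.5, 1.6), (0.05, 0.1), (0.2, 1.9)]
calls = [call_genotype(fam, vic) for fam, vic in wells]
print(calls)
print(f"L1 allele frequency: {l1_allele_frequency(calls):.3f}")
```

Wells without enough signal are excluded from the denominator, so the frequency is computed over called samples only.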

Droplet Digital PCR: Cellular Allele Frequency Determinations

L1Hs- and gene-specific primers designed to amplify 200–250 bp products were used for droplet digital PCR (ddPCR). The L1Hs-specific 5′-FAM-labeled probe was designed as described (White et al, 2014), except that it was double-quenched with a 3′-Iowa Black quencher and an internal ZEN quencher (Integrated DNA Technologies). Approximately 100 ng of gDNA was digested with XmnI in ddPCR master mix at 37 °C for 1 h before droplet formation using a QX100 droplet generator (Bio-Rad, Hercules, CA, USA). Droplets were cycled at 95 °C for 10 min; 40 cycles of 94 °C, 30 s and 60 °C, 1 min; then 98 °C, 10 min; followed by a 12 °C soak. Amplified droplets were read on a QX200 droplet reader (Bio-Rad) using absolute quantification of FAM (L1 allele probe, unknown) and VIC (RPPH1 allele probe, reference; Thermo-Fisher Scientific, Catalog No. 4403328) fluorescence. QuantaSoft v1.7.4 (Bio-Rad) was used for data analysis and graph generation.
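ddPCR quantification rests on Poisson statistics: if a fraction f of droplets is positive, the mean number of target copies per droplet is λ = −ln(1 − f). The sketch below applies this correction to the L1-junction (FAM) and RPPH1 (VIC) targets and takes their ratio, assuming both are read from the same wells. The droplet volume is an assumption (it cancels in the ratio), and the droplet counts in the example are hypothetical.

```python
import math

DROPLET_NL = 0.85  # assumed droplet volume (nl); it cancels out of the ratio


def copies_per_ul(positive, total, droplet_nl=DROPLET_NL):
    """Poisson-corrected target concentration in copies per ul of reaction."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    lam = -math.log(1.0 - positive / total)  # mean copies per droplet
    return lam / (droplet_nl * 1e-3)         # convert nl to ul


def l1_to_rpph1_ratio(l1_positive, rpph1_positive, accepted_droplets):
    """Ratio of L1-junction (FAM) to RPPH1 (VIC) copies from the same wells."""
    return (copies_per_ul(l1_positive, accepted_droplets)
            / copies_per_ul(rpph1_positive, accepted_droplets))


# Hypothetical droplet counts from one duplex well
ratio = l1_to_rpph1_ratio(l1_positive=1035, rpph1_positive=2000, accepted_droplets=15000)
print(f"L1 : RPPH1 = {ratio:.3f}")
```

Because RPPH1 is present at two copies per diploid genome, a heterozygous insertion carried by every cell gives a ratio near 0.5, as observed in Figure 2; a ratio well below 0.5 would instead suggest a mosaic insertion present in only a subset of cells.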

Database for Annotation, Visualization, and Integrated Discovery Analyses

L1-seq bioinformatics analysis generated lists of genomic L1 positions within the PDS and control populations. The Database for Annotation, Visualization, and Integrated Discovery (DAVID; version 6.7 or 6.8; Huang et al, 2009) was used to assess lists of genes harboring either novel or KNR L1 insertions for enrichment of GO terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.

Analyses were done with DAVID 6.7 because this is the version used by Bundo et al (2014), allowing direct comparison of our results with theirs. DAVID 6.8 analyses were also done (Supplementary Table S1 in Supplementary Information) because this version contains the most current Knowledgebase of GO terms and pathways. Default settings were used in each case. Supplementary File S4 contains the analyzed gene lists corresponding to the intragenic L1s found in the control or PDS populations (Supplementary File S5).

Statistics

DAVID uses a hypergeometric distribution to calculate fold enrichment and p-values (Huang et al, 2009; see Methods in Supplementary Information). ‘Raw’ p-values calculated by DAVID were adjusted for multiple testing by Bonferroni correction. Normality of data was assessed with a Kolmogorov–Smirnov goodness-of-fit test online at http://contchart.com/goodness-of-fit.aspx. To compare the lengths of genes harboring novel L1 insertions between groups, an unpaired t-test assuming equal variance was performed on the log10-transformed lengths, which were normally distributed, using GraphPad Prism version 4.00 for Windows (GraphPad Software, San Diego, CA, USA). All other statistical analyses comparing the numbers of L1 insertions within and among populations used χ2-tests, performed online at https://www.medcalc.org/calc/comparison_of_proportions.php or at http://www.quantpsy.org. All reported p-values are two-tailed.
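The enrichment statistics can be illustrated with a plain hypergeometric test. Fold enrichment compares the fraction of list genes annotated to a term with the background fraction, the p-value is the probability of observing at least that many annotated genes by chance, and Bonferroni correction multiplies each raw p-value by the number of terms tested. This is a sketch only: DAVID's default EASE score is a more conservative modification of Fisher's exact test (Huang et al, 2009), and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, pop_total, pop_hits, list_total):
    """P(X >= k): chance that >= k of list_total genes drawn from pop_total
    carry an annotation held by pop_hits genes in the background."""
    top = min(pop_hits, list_total)
    numerator = sum(comb(pop_hits, x) * comb(pop_total - pop_hits, list_total - x)
                    for x in range(k, top + 1))
    return numerator / comb(pop_total, list_total)


def enrichment(list_hits, list_total, pop_hits, pop_total, n_tests):
    """Fold enrichment, raw p-value, and Bonferroni-adjusted p-value for one term."""
    fold = (list_hits / list_total) / (pop_hits / pop_total)
    p_raw = hypergeom_sf(list_hits, pop_total, pop_hits, list_total)
    p_bonferroni = min(1.0, p_raw * n_tests)
    return fold, p_raw, p_bonferroni


# Hypothetical term: 30 of 194 listed genes vs 900 of 18 000 background genes, 100 terms tested
fold, p_raw, p_adj = enrichment(30, 194, 900, 18000, n_tests=100)
print(f"fold = {fold:.2f}, raw p = {p_raw:.2e}, Bonferroni p = {p_adj:.2e}")
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for genome-scale backgrounds, so no log-space arithmetic is needed at this size.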

Results

The numbers of total and intragenic KNR and novel L1 insertions that met our criteria are given in Table 1 (see also Supplementary Figure S1 in Supplementary Information). On the basis of previous estimates from a linear regression model of the number of non-reference L1Hs elements segregating in populations of various sizes (Ewing and Kazazian, 2010), we expected 863 and 901 total non-reference L1 insertions in our control (26 individuals) and PDS (36 individuals) populations, respectively. We observed 903 and 937 total L1 insertions in our control and PDS populations, respectively; these counts did not differ significantly from expectation (χ2=0.008, p=0.929). Comparison of the numbers of intragenic vs intergenic L1 insertions between populations revealed a statistically significant increase in the proportion of intragenic novel L1 insertions in DLPFC of PDS: the PDS population had ~15% more population-specific intragenic novel L1s than controls (Table 1). In contrast, the proportion of intragenic KNR L1 insertions did not differ significantly (Table 1). Thus, only intragenic novel L1 insertions were increased in the SZ cases studied here. The lengths of genes harboring novel L1 insertions did not differ between the control and PDS groups (unpaired t-test; p=0.851).
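The reported χ2 for observed vs expected totals can be checked with a 2×2 test of proportions. Assuming (the text does not state it) that the test compared how the observed (903, 937) and expected (863, 901) totals split between controls and PDS, a Pearson χ2 without continuity correction gives χ2 ≈ 0.0085 and p ≈ 0.93, in line with the reported values.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with df = 1
    return chi2, p


# Rows: observed vs expected totals; columns: control vs PDS (counts from the text)
chi2, p = chi2_2x2(903, 937, 863, 901)
print(f"chi2 = {chi2:.4f}, p = {p:.3f}")
```

For one degree of freedom, the χ2 upper tail equals erfc(√(χ2/2)), so no statistics library is needed; the small difference from the reported p is consistent with rounding of the published χ2.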

Table 1 Statistical Analyses of L1 Proportions (Intragenic vs Intergenic) Within Each Population

Our PCR confirmation rate for intragenic novel L1 insertions identified in the PDS cohort was ~50% (Supplementary File S2). Some validations were attempted on L1s that did not meet our final quality criteria; excluding these four L1s, none of which validated, raised our effective confirmation rate to ~64%. L1 insertions in ERI3, GRID2, KHDRBS2 (Figure 1a–c), NRG3, HTR2C, RYR2, SYNE1, SYN3, and ABCF1 (data not shown; Supplementary File S3) were confirmed by conventional PCR. ddPCR data (Figure 2) suggest that the GRID2, KHDRBS2, and SYNE1 L1s were inherited through the germ line or arose de novo very early in development in the affected individuals, because their L1 : RPPH1 ratios were ~50%. The GRID2 L1 insertion may be a private de novo mutation because it was detected in the DLPFC gDNA of only 1 of 33 PDS and was not observed in lymphoblastoid gDNA from 995 SZ cases by Taqman assay (data not shown). The ERI3 L1 insertion is germ line and was significantly more common (χ2=4.323, p=0.0376) in lymphoblastoid gDNA from SZ cases of European origin (minor allele frequency=1.9%, n=1316) than in controls (minor allele frequency=1.3%, n=2303).

Figure 1

PCR confirmation of novel L1 RTP insertions in various genes. Gel images show the ‘filled site’ (‘F’; L1HsT/FS or L1HsG/FS primer pair) and ‘empty site’ (‘E’; ES/FS primer pair) PCR amplicons. The L1 insertions were detected in both the NeuN-negative (−) and NeuN-positive (+) gDNA reactions. (a) ERI1 exoribonuclease family member 3 (ERI3). (b) Glutamate receptor, ionotropic, delta 2 (GRID2). (c) KH RNA-binding domain-containing, signal transduction-associated 2 (KHDRBS2). All L1s were confirmed in the pools in which each predicted L1 insert was initially detected. Lane ‘M’ is the NEB 100 bp marker. Lanes A, C, and E contain control pools; lanes B, D, F, and H contain SZ pools. Minus or plus signs indicate amplicons of gDNA from NeuN-negative (not sequenced, non-neuronal) or NeuN-positive (sequenced, neuronal) nuclei, respectively.


Figure 2

ddPCR allele frequency of L1s in various genes. Graphs show the absolute copy numbers of L1-containing alleles (blue squares) and RPPH1 (green squares), as well as their ratios (Gene-L1 : RPPH1, orange diamonds), detected in 100 ng of XmnI-digested DLPFC gDNA from various SZ samples. (a) The KHDRBS2-L1 was detected in SZ patient 063, but not in SZ patients 062 or 068. (b) The GRID2-L1 was detected in SZ patient 033, but not in SZ patients 035 or 040. (c) The SYNE1-L1 was detected in SZ patient 103, but not in SZ patients 052 or 070. In all cases, the allele frequency of the L1s was ~50%, the theoretical frequency for an L1 that is heterozygous in every genome analyzed. NTC, no template control.


Bundo et al (2014) reported 751 and 385 genes with brain-specific L1 insertions in their SZ and control populations, respectively. In comparing our PDS vs control cohorts, we identified 230 vs 249, and 194 vs 143 non-redundant genes harboring KNR and novel L1s, respectively (Supplementary File S4). A total of 36 vs 29 genes containing KNR L1s in our SZ and control cohorts, respectively, were among those genes identified with brain-specific L1 insertions in the PDS and the control populations of Bundo et al (2014; Table 2A and C). In all, 27 vs 7 genes containing novel L1 insertions in our PDS and control cohorts, respectively, were among those genes identified with brain-specific L1s in the SZ and control populations of Bundo et al (2014) (Table 2B and D). At least 15 of 36 genes containing KNR L1 insertions in the PDS cohorts of this study and Bundo et al (2014) have a genetic association with SZ, including disruption by CNVs (Lee et al, 2012; Table 2A). Similarly, at least 8 of the 27 genes containing novel L1 insertions identified in both PDS cohorts have an association with SZ (Table 2B). In contrast, at least 10 of the 29 and 1 of 7 genes containing KNR and novel L1 insertions, respectively, in the normal cohorts have an association with SZ (Table 2C and D). Notably, of 210 novel, intragenic L1 insertions identified in our analysis, only one (in an intron of NPAS1) was potentially identical by position (chr19:47540689 vs chr19:47542090) to a brain-specific L1Hs identified by Bundo et al (2014).
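Deciding whether two call sets contain the ‘same’ insertion comes down to comparing chromosome and position within some tolerance. The sketch below does this with a sorted-list search; the tolerance values are hypothetical, since the text does not state the window used. The example uses the NPAS1 pair reported above (1401 bp apart), which matches under a 2 kb tolerance but not under a 1 kb one.

```python
import bisect


def match_by_position(set_a, set_b, tolerance_bp):
    """Pair insertions from two call sets that lie on the same chromosome within
    tolerance_bp of each other. Each set is a list of (chrom, position)."""
    sorted_b = {}
    for chrom, pos in set_b:
        sorted_b.setdefault(chrom, []).append(pos)
    for positions in sorted_b.values():
        positions.sort()

    matches = []
    for chrom, pos in set_a:
        positions = sorted_b.get(chrom, [])
        i = bisect.bisect_left(positions, pos - tolerance_bp)  # first candidate in window
        while i < len(positions) and positions[i] <= pos + tolerance_bp:
            matches.append(((chrom, pos), (chrom, positions[i]), abs(positions[i] - pos)))
            i += 1
    return matches


# NPAS1 intronic pair from the text; the 2 kb and 1 kb tolerances are hypothetical
set_a = [("chr19", 47540689)]
set_b = [("chr19", 47542090)]
print(match_by_position(set_a, set_b, tolerance_bp=2000))  # one pair, 1401 bp apart
print(match_by_position(set_a, set_b, tolerance_bp=1000))  # no pair
```

The choice of tolerance directly controls how many insertions are called ‘shared’, so it should be reported alongside any cross-study overlap.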

Table 2 Genes Harboring L1s in Both Bundo et al (2014) and in the Present Study

Several genes harboring L1 insertions in the PDS cohorts of Bundo et al (2014) and of the present study were among the 350 genes within the 108 loci identified by the SZ GWAS (Ripke et al, 2014; Supplementary Figure S2 in Supplementary Information). The number of genes shared with the SZ GWAS did not differ significantly between our study and Bundo et al (2014) for either the PDS (χ2=0.126, p=0.723 for KNR; χ2=0.149, p=0.699 for novel) or the control (χ2=0.003, p=0.956 for KNR; χ2=0.092, p=0.762 for novel) populations. Of the 350 SZ candidate genes identified by GWAS (Ripke et al, 2014), only CSMD1 harbored L1 insertions in the PDS populations of both Bundo et al (2014) and the present study.

Significant GO terms identified by DAVID 6.7 are given in Table 3. For novel intragenic L1s, the GO term ‘cell projection’ was statistically significant only in the PDS group, regardless of whether genes shared between the two populations were included (Table 3A and B). Consistent with Bundo et al (2014), KNR intragenic L1 insertions in our study showed statistically significant enrichment for the GO term ‘plasma membrane part’ only in the PDS group (Table 3C). In contrast to Bundo et al (2014), the GO terms ‘synapse’ and ‘cell junction’ were significant in both the normal and PDS populations (Table 3C). No GO terms were significant when genes containing KNR L1 insertions shared by the two populations were excluded from the analysis (Table 3D). However, when population-specific lists of KNR and novel L1-containing genes were combined, ‘regulation of apoptosis’ was statistically significant only in the control population, whereas ‘synapse’ was significant only in the PDS population (Table 3E). DAVID 6.7 identified no significant KEGG pathways for either the normal or the PDS group. When gene lists were analyzed with DAVID 6.8, ‘postsynaptic membrane’ was statistically significant only in the PDS population (Supplementary Table S1 in Supplementary Information).

Table 3 DAVID 6.7 Analysis

Discussion

Our results further document a statistically significant difference in the distribution of L1 insertions in DLPFC neurons between PDS and controls. Similar to Bundo et al (2014), the present study revealed an increased L1 burden in DLPFC from PDS (Table 1; Supplementary Figure S1 in Supplementary Information). However, only the ratio of intragenic to intergenic novel L1 insertions, and not that of KNR insertions, was significantly greater in PDS than in controls (Table 1). These findings suggest that mutagenesis by L1 retrotransposition during embryogenesis, or specifically during neurogenesis, may contribute to the etiology of SZ. Alternatively, the novel intragenic L1s may be rare inherited insertions that disrupt gene function, leading to neurodevelopmental changes that increase risk for SZ.

One such example may be the L1 confirmed in ERI3 (Figure 1a). ERI3 interacts with the cellular prion protein during signal transduction (Spielhaupter and Schätzl, 2001), and other ERI family members are involved in the endogenous RNAi pathways of gene regulation (Thivierge et al, 2011; Duchaine et al, 2006). The pathophysiologic role of ERI3 in SZ is presently unclear. However, the relatively rare ERI3-L1 allele is polymorphic in the germ line, was verified by PCR in multiple SZ individuals (Figure 1a), and was associated with SZ in Taqman genotyping assays (χ2=4.323, p=0.0376; data not shown); together, these observations suggest that ERI3 disruption may increase risk for SZ.

Novel L1 insertions in GRID2 and KHDRBS2 were also confirmed (Figure 1b and c). GRID2 expression is associated with pre-pulse inhibition of the acoustic startle response, an endophenotype for SZ (Swerdlow et al, 2013). A novel L1 in the serotonin 2C receptor gene HTR2C was also confirmed (data not shown; Supplementary File S3). Serotonin 2C receptors oppose the inhibitory effect of serotonin 1A receptors on NMDA signaling in pyramidal neurons of prefrontal cortex (Yuen et al, 2008). Potential L1-mediated disruption of GRID2 and HTR2C expression in DLPFC may interfere with glutamate regulation and, consistent with the glutamatergic dysregulation hypothesis of SZ, increase risk for SZ.

ddPCR data (Figure 2) suggest that the KHDRBS2, GRID2, and SYNE1 L1 insertions may be polymorphic in the germ line. This is likely true of the KHDRBS2 L1 because, unlike the GRID2 L1, it was found in the 1000 Genomes Project populations (Sudmant et al, 2015). KHDRBS2 regulates alternative splicing of neurexins (Iijima et al, 2014), which are themselves implicated in SZ pathophysiology (Brown et al, 2011). Although we cannot exclude the possibility that the GRID2 L1 is an extremely rare polymorphic L1 in the human population, allelic discrimination experiments (data not shown) suggest that it could be a private, early developmental, somatic mutation in the PDS in whom it was identified. Inheritance of either putatively L1-disrupted allele could affect normal brain development through improper neurexin splicing (KHDRBS2-L1) or altered glutamatergic physiology (GRID2-L1). The confirmed L1 insertion in SYNE1 (Figure 2c) is likely polymorphic in the germ line because it was a KNR L1 (Table 2A). SYNE1 encodes a protein that interacts with Disrupted in Schizophrenia 1 (DISC1) (Costas et al, 2013) and has been implicated in bipolar disorder and other neuropsychiatric disorders (Xu et al, 2014). Indeed, SYNE1 was among 15 genes that showed overlapping associations with at least 5 of 6 neuropsychiatric disorders, including SZ (Lotan et al, 2014).

Several differences exist between our DAVID 6.7 GO results (Table 3) and those of Bundo et al (2014). Whereas some statistically significant GO terms were SZ-specific in both studies, some of the SZ-specific GO terms from Bundo et al (2014), for example, ‘synapse’ and ‘cell junction’, were enriched in our normal cohort (Table 3C). The ‘synapse’ GO term was also statistically significant in the controls of Bundo et al (2014) under their ‘less stringent’ criteria. In our analysis, SZ-specific GO terms included ‘cell projection’ (Table 3A and B), ‘plasma membrane part’, and ‘plasma membrane’ (Table 3C), as well as ‘synapse’ (Table 3E). DAVID 6.8 analysis revealed ‘postsynaptic membrane’ as an SZ-specific GO term (Supplementary Table S1 in Supplementary Information). Given the different methods used (whole-genome vs L1 amplicon sequencing) and the differences in sample populations, it is notable that our DAVID 6.7 and 6.8 GO term analyses are largely consistent with those of Bundo et al (2014). If the L1 insertions in genes within these ontologies can be confirmed to disrupt gene function, this would bolster the theory that risk for SZ may partially reside in improper cell migration and junctions, leading to anatomically aberrant synaptic structures.

Several genes within the ‘synapse’ GO term have been implicated in SZ genetics and/or pathophysiology. Whereas most genes harboring L1 insertions were found in both our normal and PDS populations, 3 of 16 genes within the ‘synapse’ term (Table 3C) were specific to PDS: GABRA4, RIMS2, and SYNE1 (see above). Potential L1-mediated disruption of GABRA4 would be consistent with a role for GABA receptor composition in SZ (Verdurand et al, 2013).

A recent meta-analysis of GWAS data sets yielded 108 genomic loci containing 350 genes (excluding the MHC locus genes) that were associated with SZ (Ripke et al, 2014). Notably, 11 of these 350 SZ GWAS-implicated genes (ASPHD1, CNTN4, CSMD1, DPYD, GATAD2A, IMMP2L, KCTD13, KLC1, NEK1, PITPNM2, and TBC1D5) contained L1 insertions in our PDS population, and 2 of the 11 (ASPHD1 and KCTD13) were absent from our control population (Supplementary Figure S2 in Supplementary Information). Most of these L1 insertions had been detected in previous L1 studies (Supplementary File S5); however, the L1 insertions within GATAD2A, KLC1, and PITPNM2 were novel, as was an L1 that ‘hit’ both ASPHD1 and KCTD13. Whether these L1 insertions disrupt gene expression or mark alleles with detrimental rare variants remains unknown.

Limitations of this study include the following: (1) limited power to detect SZ-associated L1 insertions owing to the relatively small sample size; (2) analysis restricted to DLPFC, although other brain regions may be involved in SZ; (3) the population allele frequencies of most detected L1 insertions remain unknown, making it difficult to firmly establish associations of each with SZ; (4) the functional significance of the intragenic L1 insertions remains unknown, so mechanistic predictions related to SZ phenotypes are currently speculative; (5) we analyzed only intragenic L1 insertions, but intergenic L1s could disrupt nearby gene function; (6) although no increase in relative genomic L1 content was observed when various antipsychotic medications were assayed in vitro (Bundo et al, 2014), we cannot exclude the possibility that certain medications taken by PDS in our cohort caused L1 mobilization in the DLPFC; and (7) the L1 insertions identified and confirmed in our study were found in both the glial (NeuN−) and neuronal (NeuN+) fractions, so we cannot exclude contributions of L1-mediated gene disruption in glial cells to the pathophysiology of SZ.

In summary, this study presents substantial evidence that PDS harbor L1 insertions in genes implicated in the pathophysiology of SZ. The convergence of genes identified here with those of other studies of SZ (GWAS, CNV, and so on), and consistencies between our results and those of Bundo et al (2014) lend credence to these results. However, these data require extension to independent groups of postmortem DLPFC samples from PDS. Whether novel L1 insertions are the cause or the result of putative alterations in brain development in PDS remains undetermined. If L1-mediated gene disruption during neurodevelopment increases risk for SZ, then genes identified with such disruptions may present new opportunities for the development of novel therapeutic approaches to treat this devastating disease.

Funding and disclosure

This work was supported by NIH/NIMH grants 1R21MH101065 and 1R01MH109260-01A1 to WHB, NIH/NIMH P50 grant MH096891 to CGH, and NIH/NIDA grant K01-036751 to RCC. ADE was supported by an ARC Discovery Early Career Researcher Award DE150101117 and the Mater Foundation. The authors declare no conflict of interest.