Introduction

Oxytocin is a nine-amino-acid peptide that acts as a hormone in peripheral tissues and as a neuromodulator in the brain. It was first studied for its roles in promoting uterine contractions during childbirth and stimulating milk ejection, and it is currently widely used for labor induction. Beyond these well-documented functions, recent studies have demonstrated that oxytocin also plays a critical role in regulating a wide range of social behaviors, including pair bonding, maternal care and the formation of social memory,1 although the underlying molecular mechanisms remain largely unknown. Oxytocin is synthesized in the hypothalamus, an almond-sized region deep inside the brain, and is either released into the bloodstream by exocytosis or diffuses to other brain regions. Oxytocin exerts its effects by activating its specific receptor, the oxytocin receptor (OXTR), a G-protein-coupled receptor that relays the signal to downstream effectors.2

Given the importance of oxytocin in modulating social behaviors, it has been proposed that dysfunction of the oxytocin signaling pathway underlies autism spectrum disorder (ASD), a severe early-onset neurodevelopmental disorder characterized by social impairments and communication difficulties.3,4 This hypothesis is supported by an accumulating body of evidence from animal and human studies.

In animal studies, both oxytocin knockout and OXTR knockout mice display impaired social interaction and preference for social novelty, as well as elevated aggression.5–7 Intriguingly, these abnormal behaviors can be reduced by administering oxytocin or an oxytocin agonist, suggesting that oxytocin may be a valid pharmacological therapy for social deficits in patients with ASD.8

In human genetic studies, significant associations between common OXTR variants and ASD have been observed in multiple populations.9–12 We previously reported positive associations with ASD in the Japanese population for two single-nucleotide polymorphisms (SNPs) within the third intron of OXTR, rs53576 and rs2254298.11 A recent meta-analysis of 3,941 individuals with ASD from 11 independent studies further supported the association between OXTR and ASD.13 OXTR has also been implicated in a broad range of traits, including affective temperaments, social recognition skills and psychological resources such as optimism, mastery and self-esteem.14–16

Although common variants of OXTR have been studied extensively, little attention has been paid to rare variants. It is not yet known whether any rare variants are specific to ASD and, if so, how frequently they occur and where they are located. We suggest that the third intron of OXTR should be prioritized for extensive screening, because this region has been strongly implicated by association studies and is known to contain regulatory elements.17 Two resequencing studies have been reported to date,18,19 but both examined only the coding regions. Because the OXTR gene spans 19.2 kb, sequencing its full length by traditional Sanger sequencing would be expensive and laborious given the large number of samples required.

The advent of high-throughput next-generation sequencing (NGS) technology has enabled a wide range of applications in genetics and other biomedical sciences. In this study, we used the Illumina MiSeq, a so-called ‘benchtop sequencer’, to resequence the entire OXTR gene. To further reduce running costs, we sequenced pooled samples rather than individual samples, and we used long-range PCR (LR-PCR) to amplify the whole OXTR gene in a single reaction, simplifying the workflow. Our strategy sensitively and reliably detects rare variants in about one hundred samples in a cost-effective way. We uncovered dozens of previously unreported variants in patients with ASD. Taken together with the results of a burden analysis, our findings suggest that rare variants may be an additional source of ASD risk.

Materials and Methods

Genomic DNA

DNA samples from 105 patients with ASD were used for variant screening. All subjects met the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition) diagnostic criteria for ASD, based on interviews and reviews of clinical records. The study was reviewed and approved by the Ethics Committee of the Faculty of Medicine, the University of Tokyo (approval no. 605).

Long-Range PCR

We first performed LR-PCR to amplify the OXTR gene. The primers were 5′-AGCCTCAGAGTTTCCACGTTCACT-3′ (forward) and 5′-GGCGCAGACAAGCAGAATCACTTT-3′ (reverse). The amplicon is 21 132 bp long (chr3: 8 790 600–8 811 731, hg19) and includes the full length of OXTR. The PCR reaction mixture (50 μl) contained 100 ng of genomic DNA, 25 μl of KOD FX Neo 2× buffer, 1 μl of KOD FX Neo (Toyobo, Osaka, Japan) and 0.2 μmol of each primer. Thermal cycling consisted of an initial 2 min at 94 °C, followed by 30 cycles of 15 s at 98 °C and 11 min at 68 °C. The LR-PCR products of all samples were checked by electrophoresis on 0.5% agarose gels.
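As a quick consistency check on the primer design, the short Python sketch below (not part of the original workflow) computes the amplicon length from the quoted hg19 coordinates, assuming they are 1-based and inclusive as in the UCSC browser display, together with the GC fraction of each primer.

```python
# Minimal sketch: sanity checks for the LR-PCR design described above.
# Assumes 1-based, inclusive hg19 coordinates (UCSC-style display).

FORWARD = "AGCCTCAGAGTTTCCACGTTCACT"
REVERSE = "GGCGCAGACAAGCAGAATCACTTT"


def amplicon_length(start: int, end: int) -> int:
    """Length of an interval given 1-based inclusive start and end coordinates."""
    return end - start + 1


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print(amplicon_length(8_790_600, 8_811_731))  # 21132
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), round(gc_fraction(primer), 2))
```

Both primers are 24 nt with 50% GC, and the inclusive coordinate span reproduces the stated 21 132 bp.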

DNA pooling

We measured the concentration of the LR-PCR products with the Qubit High-Sensitivity Assay Kit (Qubit fluorometer; Invitrogen, Carlsbad, CA, USA) using a 5 μl input. To ensure equal representation, 200 ng of LR-PCR product from each sample was added to the pool. The pooled DNA was then purified with AMPure magnetic beads according to the manufacturer’s protocol (Beckman Coulter, Brea, CA, USA).
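Because every LR-PCR product has the same length, adding an equal mass (200 ng) of each also gives approximately equimolar representation. The volume to pipette follows directly from the Qubit reading; the Python sketch below assumes concentrations in ng/μl, and the sample names and values are hypothetical.

```python
# Minimal sketch of the equal-mass pooling arithmetic described above.
# Sample names and concentrations are hypothetical placeholders.
from typing import Dict

TARGET_NG = 200.0  # ng of LR-PCR product contributed by each sample


def pooling_volumes(conc_ng_per_ul: Dict[str, float],
                    target_ng: float = TARGET_NG) -> Dict[str, float]:
    """Return the volume (μl) of each sample needed to contribute target_ng."""
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = target_ng / conc
    return volumes


if __name__ == "__main__":
    qubit = {"ASD_001": 40.0, "ASD_002": 25.0, "ASD_003": 50.0}  # ng/μl, hypothetical
    for sample, vol in pooling_volumes(qubit).items():
        print(f"{sample}: {vol:.1f} μl")
```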

MiSeq sequencing

We used the Nextera DNA Sample Preparation Kit (Illumina, San Diego, CA, USA) to construct libraries from the pooled LR-PCR products. Each of four index 1 primers (N702, N703, N704 and N705) was paired with the index 2 primer N504, so that each pool carried a unique index combination. Briefly, the input DNA (50 ng per pool) was simultaneously fragmented and tagged with adapter sequences by transposomes in a single reaction step. A subsequent PCR added index 1 and index 2 to the 5′ and 3′ ends of the fragments. Finally, the fragments were purified and sequenced. Paired-end sequencing was conducted on an Illumina MiSeq for 500 cycles (2×250) according to the manufacturer's protocol.
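With four pools sharing a single index 2 primer, the index 1 primer alone identifies each pool. The Python sketch below illustrates how reads could be assigned to pools during demultiplexing; the specific N70x-to-pool mapping is a hypothetical assumption, since the text does not state which index tagged which pool, and in practice this step is usually handled by the instrument software.

```python
# Minimal sketch of dual-index pool assignment (illustrative only).
from collections import Counter
from typing import Iterable, Optional, Tuple

INDEX2 = "N504"
# Hypothetical assignment: the text does not say which N70x primer tagged which pool.
INDEX1_TO_POOL = {"N702": 1, "N703": 2, "N704": 3, "N705": 4}


def assign_pool(index1: str, index2: str) -> Optional[int]:
    """Return the pool number for an (index 1, index 2) pair, or None if unrecognized."""
    if index2 != INDEX2:
        return None
    return INDEX1_TO_POOL.get(index1)


def count_reads_per_pool(index_pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Tally reads per pool; unrecognized index pairs are counted as 'undetermined'."""
    counts: Counter = Counter()
    for index1, index2 in index_pairs:
        pool = assign_pool(index1, index2)
        counts[pool if pool is not None else "undetermined"] += 1
    return counts


if __name__ == "__main__":
    reads = [("N702", "N504"), ("N705", "N504"), ("N701", "N504")]
    print(count_reads_per_pool(reads))
```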

Data analysis

Raw FASTQ files were retrieved and aligned to the hg19 reference using the Burrows–Wheeler Aligner (BWA),20 and the alignments were converted to sorted BAM files with SAMtools.21 Variants were called with SNVer, software specifically designed to detect variants in pooled NGS data, using default parameters.22 All variant positions refer to GRCh37 (hg19). Taking advantage of the paired-end reads, we removed variants called from only one strand as likely false positives. Variants were annotated with wANNOVAR (http://wannovar2.usc.edu/).23 Variants not registered in dbSNP build 138, the 1000 Genomes database or the NHLBI Exome Sequencing Project (http://evs.gs.washington.edu/) were regarded as novel. The burden analysis was performed with RAREMETAL (http://genome.sph.umich.edu/wiki/RAREMETAL).24 To search for putative splicing variants, we examined whether each variant was located within or near splice junctions. Furthermore, we evaluated the potential functional impact of nonsynonymous variants using two in silico prediction tools, Sorting Intolerant From Tolerant (SIFT) and Polymorphism Phenotyping v2 (PolyPhen-2).25,26
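The two post-calling rules above (discarding variants supported by only one strand, and labeling variants absent from all three reference databases as novel) can be sketched as simple filters. This Python sketch is illustrative only: the `Variant` record and its per-strand read counts are hypothetical field names, not SNVer's output format, and the known-variant set is assumed to be loaded elsewhere from dbSNP 138, 1000 Genomes and ESP.

```python
# Minimal sketch of strand filtering and novelty labeling (hypothetical record layout).
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int            # 1-based GRCh37 position
    ref: str
    alt: str
    fwd_alt_reads: int  # hypothetical field: alt-supporting reads on the forward strand
    rev_alt_reads: int  # hypothetical field: alt-supporting reads on the reverse strand

    @property
    def key(self) -> tuple[str, int, str, str]:
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_strand_filter(v: Variant, min_per_strand: int = 1) -> bool:
    """Reject variants whose alt allele is supported by reads from only one strand."""
    return v.fwd_alt_reads >= min_per_strand and v.rev_alt_reads >= min_per_strand


def is_novel(v: Variant, known: set[tuple[str, int, str, str]]) -> bool:
    """Novel means absent from the union of dbSNP 138, 1000 Genomes and ESP entries."""
    return v.key not in known


def filter_and_label(variants, known):
    """Drop one-strand calls, then pair each remaining variant with its novelty flag."""
    return [(v, is_novel(v, known)) for v in variants if passes_strand_filter(v)]


if __name__ == "__main__":
    calls = [
        Variant("chr3", 8_801_278, "G", "A", 150, 140),  # read counts are hypothetical
        Variant("chr3", 8_799_000, "C", "T", 30, 0),     # hypothetical one-strand call
    ]
    for v, novel in filter_and_label(calls, known=set()):
        print(v.chrom, v.pos, v.ref, v.alt, "novel" if novel else "known")
```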

Sanger sequencing validation

We validated the variants by Sanger sequencing on an ABI 3130xl Sequencer with the BigDye Terminator Reaction Kit ver 3.1 (Applied Biosystems, Foster City, CA, USA). Rather than using the LR-PCR products as templates, we PCR-amplified the regions of interest from the original genomic DNA with either FastStart Taq DNA Polymerase (Roche) or LA Taq polymerase (Takara, Japan). The PCR and sequencing primers are provided in Supplementary Information 1. We also performed Sanger sequencing of the same regions in 312 healthy individuals.

Results

Electrophoresis confirmed successful LR-PCR for 95 samples. The PCR products of these samples were divided into four pools for library construction and sequencing, containing 25, 26, 28 and 16 samples (pools 1–4, respectively). The MiSeq generated a total of 18.23 million paired-end reads (3.01 Gb of data). After alignment, the mean depth of coverage per individual was 752×. Coverage and depth are illustrated in Figure 1. Reads covered the entire OXTR region uniformly, including GC-rich regions such as the third exon. After variant calling and annotation, 127 variants were identified, of which 28 were novel. The novel variants are listed in Table 1, and a complete list of all variants detected in this study is provided in Supplementary Information 2.
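One plausible reading of "752× on the individual level" is the pool-level mean depth divided by the number of individuals in the pool; the exact calculation is not stated, so the Python sketch below is an assumption. It reads per-base depth lines in the three-column chrom, position, depth layout that `samtools depth` writes, and counts positions missing from the input as zero depth.

```python
# Minimal sketch: mean depth over a target region, then an assumed per-individual share.
from typing import Iterable


def mean_depth(depth_lines: Iterable[str], chrom: str, start: int, end: int) -> float:
    """Mean read depth over chrom:start-end (1-based inclusive).

    Expects tab-separated 'chrom  pos  depth' lines, as written by `samtools depth`.
    Positions absent from the input are treated as zero depth.
    """
    total = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != chrom:
            continue
        pos, depth = int(fields[1]), int(fields[2])
        if start <= pos <= end:
            total += depth
    return total / (end - start + 1)


def per_individual_depth(pool_depth: float, n_individuals: int) -> float:
    """Assumed definition: pool-level mean depth shared equally among individuals."""
    return pool_depth / n_individuals


if __name__ == "__main__":
    toy = ["chr3\t1\t10", "chr3\t2\t20", "chr3\t3\t30", "chr1\t2\t99"]
    print(mean_depth(toy, "chr3", 1, 4))     # 15.0
    print(per_individual_depth(18_800, 25))  # 752.0
```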

Figure 1

Sequencing depth, coverage and variants detected in this study. The top track shows the chromosomal location of OXTR, represented as a red line. The second track shows the gene structure of OXTR: boxes represent exons and lines represent introns; narrow boxes are UTRs and full-width boxes are coding regions. The arrow indicates the direction of transcription. The third track shows the depth and coverage of reads mapped to the target region. As the GC track shows, the region from exons 1 to 3 has a high GC content. The DHS track was retrieved from the ENCODE project; six high-confidence DHS regions with signal scores above 600 (maximum score 1000) are shown as orange boxes. The bottom two tracks show all variants and the novel variants detected in this study; variants too close to one another are displayed on separate lines. The ID number of each novel variant corresponds to the variant number in Table 1. Validated variants are highlighted in red.

Table 1 Novel variants identified in the current study

Because the third intron of OXTR has been implicated in multiple studies, and rs2254298 in this region showed the most robust association signal, we attempted to validate five novel variants near this SNP by Sanger sequencing. Four variants were confirmed: chr3: 8798395G>A, chr3: 8800614T>C, chr3: 8801278G>A and chr3: 8802373G>A. The fifth variant, chr3: 8798903T>A, lies in a homopolymeric region and could not be genotyped by Sanger sequencing. The Sanger sequencing results are shown in Figure 2a–d. MiSeq sequencing also identified a novel nonsynonymous variant in the third exon, p.R150S (c.C448A). Because this variant was detected only in pool 2, we first performed Sanger sequencing on the samples from this pool (n=26). This variant, together with two low-frequency variants in exon 3 (rs202023509 and rs202237352), was confirmed by Sanger sequencing (Figure 2e–g).

Figure 2

Electropherogram traces of variants confirmed by Sanger sequencing. The traces include four novel variants located in the third intron of OXTR (a–d) and three variants in exon 3 (e–g). The ID numbers of novel variants or rs numbers of known SNPs, their locations, nucleotide changes and resulting amino-acid changes are indicated in the center of each panel.

To determine whether the above five novel variants and two low-frequency exonic variants are also carried by healthy individuals, we sequenced an independent set of healthy control samples (n=312). Two variants, chr3: 8798395G>A and chr3: 8800614T>C, were also found in healthy individuals (4 and 3 heterozygous carriers, respectively), whereas the remaining five variants were not detected. For p.R150S, we further checked an in-house whole-exome sequencing data set of 418 control samples, in which the variant was not found. Based on these data, we performed the burden analysis and found that the overall burden of rare variants was significantly higher in individuals with ASD than in healthy subjects (P=0.002). No variant was found within 10 bp of any exon–intron junction. In the in silico analysis of nonsynonymous variants, SIFT predicted R150S and rs35062132 (R376G) to be damaging, with scores of 0.002 and 0.021, respectively, whereas the other two missense variants were predicted to be tolerated. PolyPhen-2 also predicted R150S to be potentially damaging. Further evolutionary analysis indicated that the R150 residue is highly conserved in vertebrates (Supplementary Information 3).
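The splice-site screen reported above (no variant within 10 bp of an exon–intron junction) amounts to an interval test around each internal exon boundary. The Python sketch below checks both sides of each junction; the exon coordinates in the usage example are hypothetical placeholders, not the OXTR exon positions, and the outer ends of the first and last exons are skipped because they are transcript ends rather than splice junctions.

```python
# Minimal sketch of a splice-junction proximity check (toy coordinates, not OXTR).
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def near_splice_junction(pos: int, exons: List[Exon], window: int = 10) -> bool:
    """True if pos lies within `window` bp of an internal exon-intron junction.

    Each junction is checked on both sides: the last `window` exonic bases and
    the first `window` intronic bases.
    """
    exons = sorted(exons)
    for i, (start, end) in enumerate(exons):
        # Junction on the lower-coordinate side of this exon (skipped for the first exon).
        if i > 0 and start - window <= pos <= start + window - 1:
            return True
        # Junction on the higher-coordinate side of this exon (skipped for the last exon).
        if i < len(exons) - 1 and end - window + 1 <= pos <= end + window:
            return True
    return False


if __name__ == "__main__":
    toy_exons = [(100, 200), (300, 400), (500, 600)]  # hypothetical
    for p in (95, 205, 250, 305, 605):
        print(p, near_splice_junction(p, toy_exons))
```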

Discussion

In this study, we screened both coding and non-coding regions of OXTR for variants in about one hundred patients with ASD. Rather than relying on traditional Sanger sequencing, we developed a strategy combining NGS, LR-PCR and DNA pooling. Although NGS is now routinely used for whole-exome and whole-genome sequencing (usually of several samples per run), its capacity for resequencing target regions in large sample sets has been less exploited. Many NGS resequencing studies use multiplex PCR with a pool of custom-synthesized primer sets to amplify the target regions.27,28 This strategy is suitable for sequencing multiple non-contiguous exons but is less convenient for long contiguous stretches of DNA. Here, we showed that LR-PCR can amplify a region as long as 21 kb. Combined with DNA pooling and NGS, our method can sensitively and cost-effectively detect variants with a minor allele frequency as low as 0.5%.
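The sensitivity argument can be made concrete with simple arithmetic. In a pool of n diploid individuals, a variant carried by a single heterozygote makes up 1/(2n) of the alleles: about 2.0%, 1.9%, 1.8% and 3.1% for pools of 25, 26, 28 and 16. Across all 95 sequenced individuals, one allele is 1/190, about 0.53%, which is one way to interpret the 0.5% figure. The Python sketch below also computes a binomial probability of observing at least k supporting reads; it assumes reads sample alleles independently and in proportion to input DNA, ignoring pooling imbalance and sequencing error, so it is an idealization rather than SNVer's statistical model.

```python
# Minimal sketch: idealized detectability of a singleton heterozygote in a DNA pool.
import math


def singleton_fraction(n_individuals: int) -> float:
    """Alt-allele fraction when exactly one of n diploid individuals is heterozygous."""
    return 1 / (2 * n_individuals)


def expected_alt_reads(depth_per_individual: float, n_individuals: int) -> float:
    """Expected alt-supporting reads for a singleton, assuming equal DNA input per individual."""
    return depth_per_individual * n_individuals * singleton_fraction(n_individuals)


def prob_at_least(k: int, depth: int, p: float) -> float:
    """Binomial P(X >= k) for X ~ Bin(depth, p), i.e. seeing at least k alt reads."""
    return 1.0 - sum(math.comb(depth, i) * p ** i * (1 - p) ** (depth - i) for i in range(k))


if __name__ == "__main__":
    for n in (25, 26, 28, 16):  # pool sizes reported in Results
        print(n, f"{singleton_fraction(n):.2%}", round(expected_alt_reads(752, n), 1))
    print(f"one allele among 95 individuals: {1 / 190:.2%}")
```

Under these assumptions, a singleton at the reported 752× per individual is expected to contribute about 752 / 2 = 376 alt reads regardless of pool size, because the pool-level depth and the allele fraction scale in opposite directions.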

Our primary interest was to identify novel variants within the third intron of OXTR, a region strongly implicated in ASD and in a range of personality traits. Our previous association study and haploblock analysis highlighted a 4.6 kb region (chr3: 8798181–8802851, hg19), within which we suspected that causal or susceptibility variants might be located.11 Because rs2254298 showed the strongest signal and is the best-replicated SNP in this region, we first focused on novel variants near this SNP, four of which were confirmed by Sanger sequencing. To examine whether these variants are potentially functional, we used DNase I hypersensitive site (DHS) data from the ENCODE project.29 DHSs are stretches of DNA that are accessible to transcription factors and other regulatory proteins, and they can serve as markers of putative regulatory elements. Of the four novel variants, chr3: 8801278G>A is located in a DHS (chr3: 8801161–8801495). Interestingly, a previous functional study showed that this DHS contains genomic elements that suppress OXTR expression.17 Given this evidence, we speculate that this variant may have functional consequences. Among known variants, several rare ones, including rs151308446 and rs74370440, are also located in DHSs. These variants might affect the binding affinity of interacting transcription factors or repressors.
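Checking whether a variant falls in a DHS is a point-in-interval test. The Python sketch below applies the score threshold used for Figure 1 (signal score above 600) and then tests the four validated intronic variants against the one DHS whose coordinates are given in the text. Coordinates are treated as 1-based inclusive; ENCODE BED files are 0-based and half-open, so start positions would need +1 when loading them. The score attached to that DHS in the example is a placeholder.

```python
# Minimal sketch of DHS overlap for validated variants (placeholder score).
from typing import List, Tuple

Region = Tuple[str, int, int]             # (chrom, start, end), 1-based inclusive
ScoredRegion = Tuple[str, int, int, int]  # (chrom, start, end, signal score 0-1000)


def high_confidence(records: List[ScoredRegion], min_score: int = 600) -> List[Region]:
    """Keep DHSs with a signal score above min_score (the Figure 1 threshold)."""
    return [(c, s, e) for c, s, e, score in records if score > min_score]


def overlapping_dhs(chrom: str, pos: int, regions: List[Region]) -> List[Region]:
    """Return every region that contains the 1-based position chrom:pos."""
    return [r for r in regions if r[0] == chrom and r[1] <= pos <= r[2]]


if __name__ == "__main__":
    # Only the DHS named in the text is listed; its score of 700 is a placeholder.
    dhs = high_confidence([("chr3", 8_801_161, 8_801_495, 700)])
    for pos in (8_798_395, 8_800_614, 8_801_278, 8_802_373):
        print(f"chr3:{pos}", bool(overlapping_dhs("chr3", pos, dhs)))
```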

In addition to the intronic variants, we identified and confirmed a rare missense variant, R150S, in patients with ASD. Residue R150, together with six other residues (N57, D85, D136, R137, N325 and Y329), forms a polar pocket that is indispensable for OXTR activation.30 The four central residues (aspartic acids and arginines) are polar and charged. Because R150S replaces a charged arginine with an uncharged serine, it is likely to abolish receptor activation. Based on mutagenesis experiments on other polar pocket residues,31,32 R150S is also likely to be a loss-of-function variant. OXTR has been shown to be haploinsufficient in a recent animal study.5 Heterozygous knockout mice, in which OXTR mRNA expression is reduced to 50%, display abnormal behaviors including impaired sociability and preference for social novelty, but show no deficits in cognitive flexibility or aggression. This suggests that R150S, if it is indeed a loss-of-function variant, could contribute to autistic symptoms. Intriguingly, the same variant was found in an independent cohort of 212 Japanese patients with ASD in another study.18 Combining those data with ours, the variant was found in 2 of 318 patients with ASD but in only 1 of 1397 healthy individuals, suggesting that it may be enriched in ASD. In addition, our burden analysis showed that individuals with ASD carried a higher overall burden of rare variants than healthy individuals, supporting the idea that rare variants may be another important component of ASD pathogenesis. Given the limited sample size of this study, larger studies will be required to test the association between rare variants and ASD more definitively.
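The RAREMETAL burden analysis is a score-based test, and the Python sketch below does not reproduce it. As a simpler, swapped-in illustration of the collapsing idea behind burden tests, it marks each individual as a carrier if they carry any qualifying rare allele, then compares carrier counts between cases and controls with a one-sided Fisher's exact test computed from the hypergeometric distribution. All inputs in the example are hypothetical and are not the study's data.

```python
# Minimal sketch of a collapsing burden test with a one-sided Fisher's exact test.
import math
from typing import Sequence


def count_carriers(genotypes: Sequence[Sequence[int]]) -> int:
    """Collapse rare sites: an individual is a carrier if any site has an alt-allele count > 0."""
    return sum(1 for person in genotypes if any(count > 0 for count in person))


def fisher_one_sided(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact P-value for an excess of carriers among cases.

    Conditional on the table margins, the number of case carriers follows a
    hypergeometric distribution; the P-value is its upper tail.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = math.comb(total, carriers)
    tail = sum(math.comb(n_cases, k) * math.comb(n_controls, carriers - k)
               for k in range(case_carriers, min(carriers, n_cases) + 1))
    return tail / denom


if __name__ == "__main__":
    # Hypothetical genotype matrices (rows = individuals, columns = rare sites).
    cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1]]
    controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
    a, b = count_carriers(cases), count_carriers(controls)
    print(a, b, fisher_one_sided(a, len(cases), b, len(controls)))
```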

In summary, we identified 28 novel variants in ASD subjects and provided a comprehensive distribution map of both common and rare OXTR variants in patients with ASD. We also demonstrated that our NGS-based strategy is highly sensitive and reliable for variant detection and could be applied to screening other genes. Our burden analysis suggested that the overall burden of rare variants is significantly higher in individuals with ASD than in healthy subjects; future studies with larger samples are warranted to confirm this finding. Functional studies are also needed to examine the effects of these rare variants. With newly available genome-editing tools such as TALEN and CRISPR/Cas9,33 it will be interesting to knock these variants into cell lines and examine the resulting endogenous changes.