Autism is a common neurodevelopmental disorder with a significant genetic component and locus heterogeneity. To date, 12 microsatellite genome screens have been performed using various data sets of sib-pair families (parents and affected children) resulting in numerous regions of potential linkage across the genome. However, no universal region or consistent candidate gene from these regions has emerged. The use of large, extended pedigrees is a recognized, powerful approach for identifying significant linkage results, as these families potentially contain more linkage information than sib-pair families. A genome-wide linkage analysis was performed on 26 extended autism families (65 affected, 184 total individuals). Each family had two to four affected individuals comprising either avuncular or cousin pairs. For analysis, we used a high-density single-nucleotide polymorphism genotyping assay, the Affymetrix GeneChip Human Mapping 10K array. Two-point analysis gave a peak heterogeneity LOD score (HLOD) of 2.82 at rs2877739 on chromosome 14q. Suggestive linkage evidence (HLOD>2) from two-point analysis was also found on chromosomes 1q, 2q, 5q, 6p, 11q and 12q. Chromosome 12q was the only region showing significant linkage evidence by multipoint analysis, with a peak HLOD=3.02 at rs1445442. In addition, this linkage evidence was enhanced significantly in the families with only male affected individuals (multipoint HLOD=4.51), suggesting a significant gender-specific effect in the etiology of autism. Chromosome-wide haplotype analyses on chromosome 12 localized the potential autism gene to a 4 cM region shared among the affected individuals across linked families. This novel linkage peak on chromosome 12q further supports the hypothesis of substantial locus heterogeneity in autism.
Autism (OMIM 209850 (Online Mendelian Inheritance in Man, http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=209850)) is a neurodevelopmental disorder characterized by three areas of abnormality: impairment in social interaction, impairment in communication and restricted and repetitive patterns of interest or behavior. Developmental abnormalities are apparent in the first 3 years of life and the characteristic impairments persist into adulthood. With improved detection and recognition of autism, resulting from a broadening of the diagnostic concept and systematic population approaches, a recent prevalence study reports that autistic disorder affects as many as one in 300 children in a US metropolitan area.1
Autism is one of the most heritable complex genetic disorders in psychiatry. A strong genetic component in autism is indicated by an increased concordance rate in monozygotic twins (60% for the narrow phenotype (diagnosed as autism) and 91% for broader phenotypes (diagnosed as autism spectrum disorder (ASD))) versus dizygotic twins (0% for narrow and 10% for broader phenotypes),2, 3 and a 75-fold greater risk to siblings of idiopathic cases, compared to the general population prevalence.4 Estimates of the number of genes involved in autism range from 2–10,5, 6 to 15 or more,7 to 100 loci.8
More than 12 genome-wide screens have been performed in autism.7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 Results from these screens indicate potential susceptibility genes spread across the entire genome. Although several promising regions have been indicated (e.g., on chromosomes 7, 15 and 17), no universally accepted gene has emerged. In addition, hundreds of association studies have been conducted on more than 130 autism candidate genes, based on location in a linkage peak and/or their potential biological function. A recent review of published autism studies reveals at least one study indicating a positive finding for virtually every chromosome.20
Collectively, these studies yield convincing evidence for multigenic inheritance with locus and allelic heterogeneity as well as epistasis playing a role in autism etiology. Heterogeneity can significantly reduce the power of replicating and detecting linkages,21 which could partly explain the unsuccessful search to date for autism genes. One approach to addressing genetic heterogeneity is to identify homogeneous subsets of the data using phenotypic parameters. The identification of phenotypic subtypes has been applied in autism studies and appears to have been successful in refining linkage results.22, 23, 24, 25
The use of highly selected extended families is another approach to improve the power of detecting linkage in the presence of genetic heterogeneity. Terwilliger and Goring26 point out that ascertaining extended families potentially increases the homogeneity of the data set. In addition, these families contain significantly more linkage information. All published genome screens with the exception of one15 restricted their analysis to an affected sib-pair design. Although the extended-family design is a powerful approach, there are also potential drawbacks in the use of cousin pairs, and hence bilineality, in linkage studies, particularly in psychiatric disorders, because of the possibility of assortative mating. Hodge et al.27 explored these difficulties using simulation studies and found that the loss in linkage information was small, especially when parental phase was known.
These genome screens also used microsatellite markers, with an approximate density of one marker per 10 cM. The feasibility of using high-density single-nucleotide polymorphisms (SNPs) for genome-wide linkage analysis, and the substantially greater linkage information they provide in complex diseases, have been affirmed.28, 29, 30, 31, 32 Those studies indicate that low genotyping error rates and map construction yield reliable and robust results. Moreover, the increased information content identified linkage signals that were not detected in low-resolution microsatellite scans.
In this study, a total of 26 extended families, including both avuncular and cousin pairs, were selected for genome-wide linkage analysis. We hypothesize that these extended families are a powerful data set for linkage study,26 and that use of a high-density SNP genotyping assay, the Affymetrix GeneChip Human Mapping 10K array, ensures that genome-wide, sufficiently informative markers will have the potential to identify significant linkage regions containing autism susceptibility loci.
Materials and methods
Families were ascertained at three sites: the Center for Human Genetics (CHG) at Duke University Medical Center, the WS Hall Psychiatric Institute and the Center for Human Genetics Research (CHGR) at Vanderbilt University. The Institutional Review Boards (IRBs) at the participating sites approved the study protocol. Families were ascertained using clinical referrals and through active recruitment in various lay organizations. After a full description of the study was given to the families, written informed consent was obtained from parents and from children who were able to give informed consent. For the current study, a total of 27 extended ASD families were available and, within these families, there were 68 individuals with autism. These highly selective pedigrees were chosen from a larger data set of 315 multiplex Caucasian families and are noteworthy in their extended family structure with both avuncular and cousin pairs occurring across multiple generations (Supplementary Figure 1).
All family history interviews were standardized to create a multigeneration pedigree of first, second and third-degree relatives. Identification of individuals with clear or even questionable affection status led to further investigation in the form of a personal interview and an examination. This information was used to create large pedigrees with extensive information about medical, developmental, behavioral and related disorders. By this approach, we ascertained a substantial number of extended families in which ASDs were present. Four of the families contained parents who qualified as having an ASD, on the basis of a clinical diagnosis of Asperger disorder (ASP). Unaffected status in siblings was based on the absence of obvious indicators of an ASD, as determined by systematic questioning during the family history interview.
The collaborative autism team from the Duke CHG and the WS Hall Psychiatric Institute contributed 17 families, and nine families were ascertained by the Vanderbilt CHGR. Probands for the study consisted of individuals between the ages of 3 and 21 years who were clinically diagnosed with autism using Diagnostic and Statistical Manual (DSM)-IV criteria. A consistent set of diagnostic criteria was applied to all families. For this study, the presumptive clinical diagnosis of autism was confirmed based on clinical evaluation using DSM-IV diagnostic criteria, the Autism Diagnostic Interview, Revised (ADI-R) and medical records. The ADI-R33, 34 is a validated, semi-structured diagnostic interview, which yields a diagnostic algorithm based on the DSM-IV criteria for autism. All participants met current diagnostic criteria for autism and were included only if they had a minimal developmental level of 18 months on the Vineland Adaptive Behavior Scale Score,35 or an IQ equivalent greater than 35. These minimal developmental levels assure that ADI-R results are valid and reduce the likelihood of including individuals with only severe mental retardation. A best estimate clinical research diagnosis for autism was determined by the clinicians at each of the research sites utilizing all available case materials. Subjects were excluded if there was evidence of developmental disorders with known phenotypic overlap with autism (e.g., Prader–Willi syndrome, Angelman syndrome, tuberous sclerosis complex, Rett Syndrome and fragile X syndrome), neurological, severe sensory disorders or motor disorders.
A high-density SNP genotyping assay using the Affymetrix GeneChip Human Mapping 10K array was performed at the Translational Genomics Research Institute. A total of 11 665 SNPs were genotyped on 192 samples; 11 563 have their sequence localization, of which 15 markers had only one allele. The genotype efficiency for each SNP was calculated before error checking. A total of 9581 markers had efficiencies >0.9, of which 76 markers had a minor allele frequency <0.01. Thus, a total of 9505 markers across the genome were used in the analysis.
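The marker filtering described above can be sketched as follows. This is only an illustration, not the pipeline used in the study: the `Marker` record, its field names and the toy markers are hypothetical, while the two thresholds (genotyping efficiency >0.9 and minor allele frequency of at least 0.01) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    # Hypothetical record; field names are illustrative only.
    name: str
    efficiency: float  # fraction of samples with a genotype call
    maf: float         # minor allele frequency (0.0 for a monomorphic marker)


def passes_qc(marker, min_efficiency=0.9, min_maf=0.01):
    """Keep markers with efficiency > 0.9 and minor allele frequency >= 0.01."""
    return marker.efficiency > min_efficiency and marker.maf >= min_maf


def filter_markers(markers):
    """Return only the markers that pass both quality-control thresholds."""
    return [m for m in markers if passes_qc(m)]


# Toy data, invented for illustration.
toy_markers = [
    Marker("snp_a", efficiency=0.98, maf=0.25),   # kept
    Marker("snp_b", efficiency=0.85, maf=0.30),   # dropped: low efficiency
    Marker("snp_c", efficiency=0.99, maf=0.0),    # dropped: monomorphic
    Marker("snp_d", efficiency=0.95, maf=0.005),  # dropped: rare minor allele
]
kept_names = [m.name for m in filter_markers(toy_markers)]

# Counts reported in the text: 9581 efficient markers minus 76 rare ones.
markers_analyzed = 9581 - 76
```

Applied to the counts reported above, the same logic gives 9581 − 76 = 9505 markers retained for analysis.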
Mendelian pedigree inconsistencies were identified using PEDCHECK.36 All of the genotype inconsistencies identified were zeroed out in the database. Further, inter- and intra-familial genetic relationships were verified at the beginning of the study using RELPAIR37, 38 with 10% of all genotyped markers across the genome. Incorporating the inconsistency results from PEDCHECK with the findings from RELPAIR, one family and one triad from another family were removed from analysis owing to multiple pedigree inconsistencies. In the end, 26 extended families with 65 affected individuals (male-to-female ratio: 55:10) and a total of 184 samples were included in the final analysis. This resulted in a total of 52 affected-relative pairs, including 26 cousin, 10 full-sibling, two half-sibling and 11 other pairs (second-cousin pairs, etc.). Finally, 17 of the 26 autism families contained only males affected with autism, and they were classified as male-only families.
The selected families were analyzed for linkage using a multianalytical approach. Both parametric and non-parametric linkage analyses were performed. Parametric two-point linkage analysis was performed using the FASTLINK program of the LINKAGE software package.39, 40 Analyses were performed using genotype data only from affected individuals with both dominant and recessive genetic models and allowed for a disease allele frequency of 0.001 for the dominant model and 0.01 for the recessive model.14 These two-point logarithm of odds (LOD) scores were used to calculate genetic heterogeneity LOD scores (HLOD) using HOMOG.41 Non-parametric two-point analysis was conducted using MERLIN.42 Parents who qualified as ASP were coded as unknown with respect to autism in the linkage analysis, given that only limited clinical data were available for them (e.g., no ADI-R data). However, analyses coding these parents as affected did not substantially change the linkage results.
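For readers unfamiliar with heterogeneity LOD scores, the standard admixture formulation can be sketched as below. Under this model a proportion α of families is assumed to be linked, and for per-family LOD scores Z_i at a fixed recombination fraction the HLOD is the sum over families of log10(α·10^Z_i + 1 − α), maximized over α. This is a minimal sketch under those assumptions, not a reproduction of HOMOG (which also maximizes over the recombination fraction); the function names and the toy LOD scores are hypothetical.

```python
import math


def hlod(family_lods, alpha):
    """Admixture HLOD for a given proportion of linked families (alpha)."""
    return sum(
        math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods
    )


def max_hlod(family_lods, steps=1000):
    """Grid search over alpha in [0, 1]; returns (best_alpha, best_hlod)."""
    best_alpha, best_value = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for k in range(steps + 1):
        alpha = k / steps
        value = hlod(family_lods, alpha)
        if value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha, best_value


# Invented per-family LOD scores at one marker: two families support
# linkage and two argue against it.
toy_lods = [1.2, 0.8, -0.5, -0.9]
alpha_hat, hlod_hat = max_hlod(toy_lods)
```

At α = 1 the expression reduces to the ordinary summed LOD score, so the maximized HLOD is never smaller than the homogeneity LOD or zero.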
Multipoint parametric and non-parametric linkage analyses were performed using the version of MERLIN43, 44 that allows for the adjustment of linkage disequilibrium (LD) between markers. In this study, r²=0.16 was set as the cutoff point for the LD level between markers for the multipoint analysis.45 The genetic marker distance was calculated by the local Ensembl program,46 based on 1 cM sex-average integrated maps from deCode Genetics.47 Marker allele frequencies were estimated from the data set using all individuals.48 As the sample was composed of pedigrees of varying sizes, we assessed identity-by-descent sharing (LOD*) between all pairs of affected individuals within a family using the Spairs sharing statistic49 and the exponential model,50 as implemented in MERLIN.43 A genome-wide simulation was conducted in the 17 male-only affected families for both multipoint parametric and non-parametric analysis to generate an empirical P-value for the marker with the maximal LOD score, employing the simulation option embedded in MERLIN43 with a total of 1000 replicates. The haplotyping function embedded in MERLIN43 was used to construct whole chromosome 12 haplotypes and to identify the minimal shared region across the male-only families sharing positive LOD scores across the peak linkage region.
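The logic of a simulation-based empirical P-value can be sketched as follows. This is an illustrative sketch only: it assumes that each replicate is summarized by its genome-wide maximum LOD score and that the P-value is the fraction of replicates reaching the observed maximum. The function name and the toy replicate values are hypothetical, and the sketch does not reproduce MERLIN's simulation procedure.

```python
def empirical_p_value(observed_max, simulated_maxima):
    """Fraction of simulated replicates whose maximum score is at least
    as large as the observed maximum."""
    n = len(simulated_maxima)
    if n == 0:
        raise ValueError("at least one simulated replicate is required")
    exceed = sum(1 for s in simulated_maxima if s >= observed_max)
    return exceed / n


# Invented example: 1000 replicates, exactly one of which reaches the
# observed score.
toy_replicates = [0.5] * 999 + [5.0]
p_value = empirical_p_value(4.51, toy_replicates)
```

With 1000 replicates the smallest non-zero value this estimator can return is 0.001. Some authors prefer the more conservative (exceed + 1)/(n + 1), which never returns zero.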
Results
Supplementary Figure 1 shows all pedigrees that were included in the final analysis. Out of a total of 184 sampled individuals, 65 had autism.
Table 1 shows the results from two-point parametric analysis for all markers with HLOD≥2.0 in the overall data set. A peak HLOD of 2.82 was found at rs2877739 on chromosome 14q12 under the dominant model. Suggestive linkage evidence (HLOD≥2) was found on chromosomes 1q23.2–23.3 (156–160 cM); 2q35–36.3 (211–230 cM); 5q23.2 (127 cM); 6p25.3 (16 cM); 6q15 (93 cM); 11q22.1 (101 cM); 12q14.2–14.3 (76–80 cM); and 12q22 (102 cM). For two-point non-parametric analysis, none of the markers gave LOD*≥2.0.
Figure 1 presents the results of the multipoint non-parametric and parametric analyses for the overall data set. For non-parametric multipoint analysis, a significant linkage peak (LOD*>3.0) was found on chromosome 12q at rs1445442 with LOD*=3.2. Suggestive linkage evidence (LOD*>2.0) was seen on chromosome 7p14.1 (63–64 cM). For multipoint parametric analysis, significant linkage evidence was found on chromosome 12q with the peak at rs1445442 (HLOD=3.02 under the recessive model). This region at 12q13.13–q15 (67–84 cM) was the only region showing significant genome-wide linkage evidence.
For chromosome 12, the overall data set was stratified by the gender of affected individuals in the families. Under both multipoint parametric and non-parametric analysis, the maximum LOD score increased in male-only affected families at rs146122 (Figure 2): HLOD=4.51 under the recessive model (genome-wide empirical P-value=0.001) and LOD*=4.22 (empirical P-value=0.0001). Using the one LOD rule,51 the confidence interval of the peak region is from 74.68 (rs717274) to 81.58 cM (rs1405467). Supplementary Figure 2 provides the genome-wide multipoint non-parametric and parametric plots in by-gender data sets. Coding the parents diagnosed as ASP as affected did not alter the LOD score (HLOD[rec]=4.52).
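The one LOD rule can be illustrated with a short sketch: starting from the peak, the interval is extended in both directions for as long as the LOD score stays within one unit of the maximum. The function name, positions and scores below are invented for illustration and are not the study data; the sketch works at marker resolution and does not interpolate between markers.

```python
def one_lod_interval(positions_cm, lod_scores, drop=1.0):
    """Return (left_cm, right_cm) of the contiguous region around the peak
    where the LOD score is at least (peak LOD - drop)."""
    if not lod_scores or len(positions_cm) != len(lod_scores):
        raise ValueError("positions and scores must be non-empty and aligned")
    peak = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    threshold = lod_scores[peak] - drop
    left = peak
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1
    return positions_cm[left], positions_cm[right]


# Invented multipoint profile along one chromosome (positions in cM).
toy_positions = [70, 72, 74, 76, 78, 80, 82, 84]
toy_scores = [1.0, 2.5, 3.6, 4.5, 4.0, 3.7, 3.2, 1.5]
interval = one_lod_interval(toy_positions, toy_scores)
```

In this toy profile the peak score is 4.5, so the threshold is 3.5 and the interval spans the markers from 74 to 80 cM.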
Eleven of the 17 male-only families consistently gave positive LOD scores throughout the linked region (67–84 cM). The remaining six male-only families and eight of the nine with female affected gave negative multipoint LOD scores throughout this region. Figure 3 displays the shared region on chromosome 12 (shared region defined as the haplotype transmitted from the same founder) between or among the affected in the male-only families. Based on the male-only families exhibiting positive multipoint LOD scores, the most likely minimal candidate region is from 75 to 79 cM. Figure 4 displays an example of a segregating haplotype and shared regions among affected individuals.
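Conceptually, the minimal candidate region is the overlap of the segments shared by affected individuals within each linked family. A minimal sketch of that intersection is given below, assuming each family contributes a single contiguous shared segment expressed in cM; the function name and the per-family intervals are hypothetical and were chosen only for illustration, and the sketch does not reproduce the haplotype reconstruction itself.

```python
def minimal_shared_region(family_regions):
    """Intersect per-family shared segments given as (start_cm, end_cm).

    Returns the overlapping interval, or None if the segments do not
    all overlap.
    """
    if not family_regions:
        raise ValueError("at least one family region is required")
    start = max(lo for lo, _ in family_regions)
    end = min(hi for _, hi in family_regions)
    return (start, end) if start <= end else None


# Invented shared segments for three hypothetical linked families.
toy_regions = [(70, 82), (75, 84), (67, 79)]
shared = minimal_shared_region(toy_regions)
```

Each additional linked family can only shrink or preserve the interval, which is why small families with spuriously positive scores could narrow the region incorrectly.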
Discussion
More than 12 genome screens have been performed in autism using microsatellites and data sets composed almost exclusively of affected sib-pair families (parents and their affected offspring). This is the first genome-wide screen report using both a high-resolution 10K SNP panel and extended families comprising avuncular and cousin pairs. It is also the first autism report employing chromosome-wide multipoint parametric linkage analysis and using chromosome-wide haplotype construction to look at the minimal shared region across affected individuals to narrow the linkage peak region. A significant genome-wide linkage peak was found at 12q14.2, which has been undetected by previous genome screens, although the results did not meet the corrected genome-wide significance in the entire data set.52, 53 This linkage evidence was significantly enhanced in families with only male affected individuals, resulting in an HLOD=4.51 under the recessive model with a genome-wide empirical P=0.001. That the linkage evidence is significantly enhanced in male-only affected families further supports previous evidence that the gender dichotomy is an important factor in the genetics of autism.54
Extended pedigrees can contain more genetic information than sib-pair families and can provide substantially more power to detect linkage, particularly when there is genetic heterogeneity. These extended families may well represent a gene or a few genes of relatively large effect. This can be an advantage in detecting linkage even with a small sample size such as the current study of only 26 families.55 Previous studies have demonstrated that small sample sizes can identify regions of interest in genomic screens for complex traits when the genetic effect is strong. For example, the initial genomic screen in late-onset Alzheimer's disease that established linkage to chromosome 19 and subsequently led to the identification of apolipoprotein E,56 a major susceptibility gene for late-onset Alzheimer's disease, included only 31 multiplex families.57 However, we cannot exclude the possibility that the identified linkage peak may be specific only for those autism families with an extended family history and, thus, may represent a small proportion of patients.
For genome-wide linkage screens, false-positive linkage evidence is a major concern. In high-density SNP panels, LD between SNPs becomes a new significant source of LOD score inflation. To address this issue, we employed a new version of MERLIN42 that performs chromosome-wide multipoint analysis while taking into account LD between markers, although it is unlikely to make a difference in these data because the majority of parents (8/105) are genotyped.
Inspection of the pedigrees (Supplementary Figure 1) found several examples of affected sib-pairs who shared the same two haplotypes from their parents. These data explain why the HLOD is more significant under the recessive model rather than the dominant model despite the hypothesized vertical transmission.
Defining the minimal linkage region for detailed evaluation of potential candidate genes in a complex trait such as autism is problematic. A number of factors complicate the effort including genetic heterogeneity, small family sizes and variable phenotypes. These studies, however, are critical because they guide the search for contributing susceptibility genes by identifying high-priority regions upon which to focus further efforts. Our approach to identify a region within which to begin our detailed molecular investigation included an examination of the 11 male-only families that contributed to the positive LOD score establishing linkage in the region. The remaining six male-only families, as well as eight of the nine families with female affected had negative multipoint LOD scores across the linkage region. Based on these criteria, we identified a 4 cM (75–79 cM) long segment as the most likely location of the chromosome 12 autism susceptibility gene. The approach, albeit important, is not infallible. Small positive scores, owing to limited family size, could represent false-positive results that might mask the true minimal candidate region. Thus, it is critical that this region be viewed as a starting point that can be sequentially extended to include additional genes if the contributing susceptibility variant is not found.
Within our most significant linkage peak region (most likely minimal shared region: 75–79 cM), from 60 710 030 bp (rs348691) to 64 239 801 bp (rs4129000), more than 19 known and predicted genes are found (Ensembl, June 2006, http://www.ensembl.org/Homo_sapiens/index.html). These include the AVPR1A (Arginine vasopressin receptor 1A (61 826 000–68 832 000)) gene, which has been recently reported as a strong candidate for involvement in autism susceptibility in association studies, and deserves continued scrutiny.58, 59 SRGAP1 (SLIT-ROBO GTPase-activating protein, RHO 1 (62 524 808–62 823 751 bp)), a member of the SLIT family of proteins known to be involved in neuronal guidance among other functions, and WIF1 (Wnt inhibitory factor precursor (63 730 674–63 928 374)), involved in developmental Wnt regulation, also lie within the critical region.
In conclusion, the novel linkage peak on chromosome 12q14 in this genome-wide screen further supports the hypothesis that there is substantial locus heterogeneity in autism, and that extended families, along with gender effects, may help delineate these different loci.
We thank the patients with autism, and the family members who agreed to participate in this study, as well as the personnel of the CHG at DUMC, for their input on this project. We also thank Drs Robert Delong and Gordon Worley for referring patients and their families to the study. We thank the Translational Genomics Research Institute (TGen) for their generosity and effort in performing the Affymetrix chip assays. This research was supported, in part, by National Institutes of Health (NIH) program project Grant NS26630, and Grants R01 NS36768, R01 AG20135 and R01 NS42165; by the National Alliance of Autism Research (NAAR); and by a gift from the Hussman Foundation. The research conducted in this study complies with current US laws. We also gratefully acknowledge the resources provided by the AGRE consortium and the participating Autism Genetic Resource Exchange (AGRE) families. The AGRE is a program of Cure Autism Now (CAN). This work used the core resources of the GCRC (MO1 RR-00095) and the CHGR at VUMC and the CHG at DUMC.