We performed a high-density, single nucleotide polymorphism (SNP), genome-wide scan on a six-generation pedigree from Utah with seven affected males, diagnosed with autism spectrum disorder. Using a two-stage linkage design, we first performed a nonparametric analysis on the entire genome using a 10K SNP chip to identify potential regions of interest. To confirm potentially interesting regions, we eliminated SNPs in high linkage disequilibrium (LD) using a principal components analysis (PCA) method and repeated the linkage analysis. Three regions met genome-wide significance criteria after controlling for LD: 3q13.2–q13.31 (nonparametric linkage (NPL), 5.58), 3q26.31–q27.3 (NPL, 4.85) and 20q11.21–q13.12 (NPL, 5.56). Two regions met suggestive criteria for significance: 7p14.1–p11.22 (NPL, 3.18) and 9p24.3 (NPL, 3.44). All five chromosomal regions are consistent with other published findings. Haplotype sharing results showed that five of the affected subjects shared more than a single chromosomal region of interest with other affected subjects. Although no common autism susceptibility genes were found for all seven autism cases, these results suggest that multiple genetic loci within these regions may contribute to the autism phenotype in this family, and further follow-up of these chromosomal regions is warranted.
Autism (MIM 209850) is a pervasive neurodevelopmental disorder, characterized by impairments in verbal and nonverbal communication and social interaction, and the presence of repetitive stereotyped behaviors and interests. Autism is a member of a spectrum of neurodevelopmental disorders (autism spectrum disorders (ASDs)), which also includes Asperger's syndrome, and pervasive developmental disorder, not otherwise specified (PDD-NOS). It is typically diagnosed within the first 3 years of life, and there is strong evidence that autism is highly heritable. The concordance rate in monozygotic twins is 70–90%,1, 2 and the autism rate in siblings is 3–5%, much higher than expected from the general population prevalence.3, 4 Despite the strong heritability, identification of the underlying genetic mechanisms for autism has been elusive.
Efforts to find autism predisposition genes have focused on genome-wide linkage scans, and a number of such genomic searches have been published to date.5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 However, no single chromosomal region has been consistently shown to be associated with autism. Numerous reasons for the lack of consistency of results exist including genetic heterogeneity, such that multiple genes are thought to be involved in the etiology of autism.9, 18 It has been suggested that as few as 3–4 genes19 and perhaps up to 15 predisposing genes9 may be involved, and the complexity of finding multiple genes is further increased as autism predisposition genes do not appear to be inherited in a simple Mendelian fashion. Furthermore, genetic and environmental interactions may also contribute to the autism phenotype.20, 21 In addition, the inclusion of heterogeneous phenotypes may explain the lack of consistency of results in the study of autism. Genome-wide linkage scans that stratify subjects into more homogeneous phenotypes have shown striking differences in linkage results, including differences by the strictness of the definition of autism,16, 22 sex,16, 23 developmental regression,16, 24 language delay25, 26 and repetitive behaviors.27, 28 The key to identification of autism susceptibility genes will be the reduction of both genetic and phenotypic heterogeneity through selection of well-characterized families.
While most linkage studies to date have utilized nuclear families composed of either trios (two parents and an affected offspring), or sibling pairs (concordant or discordant), large extended pedigrees offer many advantages for the localization of candidate genes for autism. The study of large extended pedigrees increases the likelihood that there will be greater phenotype homogeneity and reduced environmental heterogeneity by pedigree. Study of large pedigrees also increases the power of a study to detect disease predisposition genes because of reduced genetic heterogeneity by pedigree. While study of large autism pedigrees affords many advantages, because of the relatively low autism recurrence risk in relatives, finding large extended autism pedigrees is a challenge.
Here, we present results for a high-density single nucleotide polymorphism (SNP) genome-wide linkage analysis in a six-generation pedigree from Utah with seven affected males, which is one of the largest known autism pedigrees. Previously, we published linkage results using this same pedigree for a single chromosomal region (3q25–27).29 In our current study, we present results for a genome-wide two-stage linkage design. In stage 1, an initial linkage screen was performed to identify potential regions of interest. We confirmed potentially interesting regions in stage 2 by eliminating SNPs in high LD. Maximizing genetic and phenotypic homogeneity through study of a highly informative pedigree using an increased genetic-marker density increases the probability of both confirming and narrowing significant linkage regions previously published for autism, as well as further facilitating the identification of autism predisposition genes.
Materials and methods
The seven affected male subjects are all descendants of a single founding couple of Northern European ancestry (see Figure 1). To preserve confidentiality of the pedigree, most siblings are excluded. Diagnosis was based on Diagnostic and Statistical Manual of Mental Disorders, fourth edition, Text Revision (DSM-IV-TR) criteria. Developmental history and clinical observations were gathered using the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule-Generic (ADOS-G),30, 31 respectively, with two exceptions. One subject was unavailable for ADOS-G testing, and another subject's ADI-R was considered unreliable, as the informant was elderly and unable to recall subtle aspects of early development. However, in both cases there was also documentation of an earlier diagnosis and/or very early concerns about social development such that, along with current information gathered through our testing, DSM-IV-TR criteria for autistic disorder were met. Six of the seven affected subjects met DSM-IV-TR criteria for autistic disorder. One subject (subject no. 5) met DSM-IV-TR criteria for PDD-NOS. As one of the seven affected subjects was diagnosed with PDD-NOS, we define the entire pedigree as an ASD pedigree.
Other clinical characteristics of the seven affected subjects have been assessed; however, little similarity between cases was observed. Full-scale intelligence quotient (IQ) scores of affected subjects ranged from 41 to 124 for verbal IQ (VIQ) and 45 to 140 for performance IQ (PIQ) using IQ measures appropriate for age and level of functioning (Wechsler Adult Intelligence Scale (WAIS-III),32 Wechsler Intelligence Scale for Children (WISC-III)33 or Differential Abilities Scale (DAS)34). Three of the seven affected subjects showed language delay (that is, onset of single words after 24 months and/or onset of phrases after 33 months) as measured by items on the ADI-R. One of the affected subjects had a history of nonfebrile seizures, but the other six did not. None of the seven subjects showed developmental regression (that is, loss of previously gained language and social skills) as measured by items on the ADI-R. See Table 1 for details of these phenotypes.
We included 22 unaffected living relatives from the same pedigree to infer genotypes of deceased relatives or phase for genotyped cases. Phenotypes for the unaffected relatives were set to ‘unknown’ or missing for all analyses, as phenotype status of the deceased relatives was not recorded or possible to determine. Records were also not available in sibships of upper generations of the pedigree; it is unknown if there were additional affected members and hence anticipation could not be assessed. We verified pedigree structure for the 7 affected and 22 unaffected relatives using the Utah Population Database (UPDB), a computerized genealogy database that contains family history information for almost 4 million individuals who are, for the most part, descendants of the nineteenth century pioneers to Utah. Low rates of inbreeding have been previously reported within the UPDB.35, 36 Fragile X DNA testing was performed on all known autism cases and found to be negative. Karyotyping using a routine g-banded chromosomal analysis to rule out major chromosomal rearrangements was also performed on all cases; no large-scale chromosomal abnormalities were observed. All subjects signed an informed consent and this study was approved by the University of Utah Institutional Review Board.
We genotyped the 7 affected individuals and the 22 unaffected relatives using the Affymetrix 10K SNP panel. The 10K SNP panel contains 10 660 SNPs on a single array, and has an average distance between SNPs of 210 kb. SNP genotypes were obtained by following the Affymetrix protocol for the GeneChip Mapping 10K Xba Array.37 In brief, 250 ng of genomic DNA taken from peripheral blood was digested with the Xba1 restriction enzyme into fragments. This step is followed by ligation to Xba adaptors that place no restriction on fragment size. Using PCR reaction, a single generic primer is used to amplify adaptor-ligated DNA fragments. The amplified DNA product is then fragmented, labeled with biotin-ddATP and hybridized to the 10K chip. Following an 18-hour hybridization, the chip is washed, stained and scanned using an Affymetrix Fluidics Station FS450 and the Affymetrix GeneChip 3000 scanner. Affymetrix GCOS software was used to determine SNP genotypes for each locus.
Genotype error checking and mapping
Genotype error checking including checks for Mendelian inconsistencies was performed with Mega2,38 and as a second validation we used the program CheckErrors,39 which uses graphical modeling to calculate the posterior probability of genotype errors in pedigrees. All SNPs with genotype errors from either Mega2 or CheckErrors were eliminated from further study. We also eliminated all SNPs with a rare allele frequency ⩽0.10.
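The SNP exclusion rules described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory table of unphased genotype calls and a precomputed set of SNPs flagged by the error-checking programs; it does not reproduce Mega2 or CheckErrors, and the naive allele counting shown here ignores relatedness among pedigree members, which a real analysis would need to account for.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic SNP.

    genotypes: list of two-character calls such as 'AB'; None marks a missing call.
    Naive allele counting; relatedness between subjects is ignored (assumption).
    """
    counts = Counter()
    for call in genotypes:
        if call is None:
            continue
        counts.update(call)  # adds both alleles of the call
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no data or monomorphic
    return min(counts.values()) / total

def filter_snps(genotype_table, error_snps, maf_cutoff=0.10):
    """Keep SNPs with no flagged genotype errors and MAF above the cutoff."""
    kept = []
    for snp, genotypes in genotype_table.items():
        if snp in error_snps:
            continue  # flagged by an error-checking step
        if minor_allele_frequency(genotypes) <= maf_cutoff:
            continue  # rare allele frequency <= 0.10 is excluded
        kept.append(snp)
    return kept
```

The names `genotype_table`, `error_snps` and `maf_cutoff` are hypothetical and are used only for illustration.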
We used the genetic map provided by Affymetrix, based on the deCODE genetic map, for the linkage analysis. Base pair positions were obtained from the May 2004 human reference sequence (hg17) assembly and were used for graphs presented in the manuscript. For peak regions that spanned the centromere, we validated separately the chromosomal p and q arms for the LD assessment, because of suppressed recombination in the centromere region.
Stage 1: initial genome-wide linkage screen
An initial genome-wide linkage scan with no correction for LD was performed using the multipoint linkage software MCLINK.40 MCLINK is a Markov chain Monte Carlo method that allows for fully informative multilocus linkage analysis on large extended pedigrees. Using blocked Gibbs sampling, MCLINK generates inheritance matrices from haplotype chains for the markers being analyzed, and performs an approximate calculation of the log-likelihood function linkage statistics. MCLINK has been used previously to identify candidate genomic regions for a number of complex diseases.41, 42, 43, 44 We performed a nonparametric linkage analysis, as inheritance models for autism in general and specifically for broad spectrum of autism are unknown. Ideally, we could have derived the inheritance model parameters from our sample and used them for analysis. However, because we have a single pedigree, segregation parameters could not be resolved with any degree of accuracy and hence a model-free approach was utilized. Allele frequencies for the MCLINK analysis were estimated using a maximum likelihood gene counting method,45 which statistically infers allele frequencies based on the assumption that the pedigree founder alleles were randomly sampled from a wider population. For this study, allele frequencies as defined by Affymetrix for their 10K SNP panel on 40 Caucasian subjects were considered as the wider population allele frequencies.
Stage 2: assessment of linkage disequilibrium in regions with linkage NPL results ⩾2.0
Currently available multipoint linkage software assumes that genetic markers are in linkage equilibrium, which is a valid assumption if one uses distantly spaced microsatellite markers. However, using higher density, more closely spaced SNP markers may violate this assumption. For all regions from stage 1 with nonparametric linkage (NPL) scores ⩾2.0 and an additional ∼15–20 SNPs on either side of a peak region, we screened for potential biased results due to LD. As the number of unrelated individuals in our pedigree was limited, we used unrelated, parental genotype data (N=60) from Centre d'Etude du Polymorphisme Humain (CEPH) Utah trios in HapMap for LD assessment. The CEPH founders, who are also of Northern and Western European ancestry, are likely to be very similar to our Utah autism extended pedigree subjects, and are a valid resource for determining LD structure for our subjects. Using only Affymetrix SNPs that matched HapMap CEPH genotype data, we estimated the pairwise LD measure D′ between all pairs of SNPs in a region of interest. All markers with a D′ value ⩾0.70 and within 2 million base pairs of each other were considered for potential removal. The D′ threshold value of 0.70 has been used by others to eliminate SNPs in high LD.46, 47 Markers identified as being in high LD were phased using the software SNPHAP, and the resulting haplotypes were entered into a principal components analysis (PCA) using a modified version of the PCA tag-SNP method for candidate genes proposed by Horne and Camp.48 In brief, this PCA method extracts factors that account for at least 90% of the underlying genetic variation; each extracted factor is defined as an LD group. For each LD group, we retained the SNP with the highest factor loading as the SNP that best characterizes the variance for each group, and eliminated all other SNPs as providing redundant information.
Hence, SNPs that were included in the confirmation analysis (1) were included in HapMap data and (2) had either D′ values of <0.7 or if D′ was ⩾0.7, the SNP was selected from the PCA method. Linkage analysis was repeated using a reduced marker list following the same procedure as described previously.
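The pairwise screening step (D′ ⩾0.70 and within 2 million base pairs) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: haplotypes are assumed to be already phased and coded 0/1, D′ uses the standard normalization of the disequilibrium coefficient D by its maximum attainable value, and the subsequent SNPHAP phasing and PCA tag-SNP selection are not reproduced. All function and parameter names are hypothetical.

```python
from itertools import combinations

def d_prime(haplotypes):
    """Compute D' for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples coded 0/1,
    assumed to be phased already.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b
    # D is normalized by its maximum attainable magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one SNP is monomorphic
    return abs(d) / d_max

def high_ld_pairs(snps, positions, haplotypes, d_threshold=0.70, max_bp=2_000_000):
    """Return SNP pairs that are candidates for removal because of high LD.

    snps: SNP names; positions: base-pair positions in the same order;
    haplotypes: list of tuples with one 0/1 allele per SNP, same column order.
    """
    flagged = []
    for i, j in combinations(range(len(snps)), 2):
        if abs(positions[i] - positions[j]) > max_bp:
            continue  # only pairs within 2 Mb are screened
        pair_haps = [(h[i], h[j]) for h in haplotypes]
        if d_prime(pair_haps) >= d_threshold:
            flagged.append((snps[i], snps[j]))
    return flagged
```

In this sketch a flagged pair is only a candidate for removal; deciding which SNP of a high-LD group to retain is the role of the PCA step described above.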
As nonparametric linkage results are on a different scale than parametric LOD scores, we applied the NPL equivalent of the Lander and Kruglyak criteria49 for determining suggestive and significant results (see also Ray and Weeks50). For the initial LD screening analysis, a potentially interesting region was defined as having an NPL score ⩾2.00; LD assessment was performed for all of these regions. After elimination of SNPs in high LD, we applied the Lander and Kruglyak significance criteria to determine significance. Suggestive evidence of linkage was defined as an NPL score ⩾3.18 (P=0.00074) and significant evidence of linkage was defined as an NPL score ⩾4.08 (P=0.000022). Boundary regions for a peak after LD assessment were defined based on a ‘two NPL drop’ interval. For regions of interest at the beginning or end of the chromosomal p or q arms, where it was not feasible to require that a boundary SNP reach a ‘two NPL drop’ criterion, the first or last available SNP in the area defined the region of interest. As a secondary validation of significance, we also simulated 100 null genotype configurations using the software package SIMULATE51 for all potentially interesting regions (NPL⩾2.0). SIMULATE generates genotype data for linked markers that are unlinked to affection status, using the same pedigree structure, population allele frequencies and recombination fractions between markers as the original data. As analysis of SNP data can lead to biased linkage results because of LD between markers, only regions where we had previously removed high-LD markers were reanalyzed. For each reanalyzed region, we performed a nonparametric linkage analysis using MCLINK and compared the maximum NPL result from the simulated null genotype data to the maximum NPL result obtained from the observed data for the region. The reported P-value is the proportion of the 100 simulations in which the maximum NPL from the simulated data exceeded the maximum NPL from the observed data.
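The two significance checks described above can be illustrated with a short sketch. The suggestive and significant thresholds quoted in the text are numerically consistent with a one-sided standard normal tail probability; treating the NPL score as a standard normal deviate is an assumption made here for illustration, not a statement of how the P-values were actually derived. The simulation-based P-value is shown as a simple proportion, with hypothetical function names.

```python
import math

def npl_to_p(npl):
    """One-sided upper-tail probability, assuming the NPL score behaves as a
    standard normal deviate (an assumption made for this sketch)."""
    return 0.5 * math.erfc(npl / math.sqrt(2))

def empirical_p(observed_max, simulated_max):
    """Proportion of null simulations whose maximum NPL exceeds the observed maximum.

    simulated_max: one maximum NPL score per simulated null genotype configuration.
    """
    exceed = sum(1 for s in simulated_max if s > observed_max)
    return exceed / len(simulated_max)
```

With 100 simulations the smallest nonzero proportion is 0.01, so a region whose observed maximum is never exceeded would be reported as P<0.01 rather than as zero.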
All 7 affected individuals and the 22 unaffected relatives were genotyped on the premarket version of the Affymetrix 10K chip. Because of sample errors, two subjects were re-genotyped on a subsequent version of the 10K chip. From the 10 660 total SNPs on a 10K chip, we eliminated 407 SNPs because of differences in chip versions and 412 SNPs because of obvious mapping errors. We eliminated an additional 1634 SNPs because the minor allele frequency was ⩽0.10 and 195 SNPs because of genotype errors. The final total number of SNPs studied was 8012.
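The SNP counts reported above can be checked with a short tally. The category labels are paraphrased from the text and the variable names are hypothetical.

```python
# Tally of SNP exclusions reported for the 10K chip
total_snps = 10660
removed = {
    "differences in chip versions": 407,
    "obvious mapping errors": 412,
    "minor allele frequency <= 0.10": 1634,
    "genotype errors": 195,
}
remaining = total_snps - sum(removed.values())
print(remaining)  # 8012, matching the final number of SNPs studied
```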
The results from the initial multipoint genome-wide linkage scan obtained from MCLINK are displayed in Table 2 and Figure 2. There were 18 regions with an NPL score ⩾2.00. The maximum NPL statistic was 5.56 (empirical P-value 0.000000287) and was found on both chromosome 3p12.3–q21.3 and 20p12.1–q13.12.
LD assessment results
Linkage disequilibrium assessment was performed for the 18 chromosomal regions attaining an initial NPL score ⩾2.0. From the original set of Affymetrix 10K SNPs for each region of interest, we retained on average 87.3% (s.d. 4.8%) of SNPs after matching to HapMap data and on average 69.8% (s.d. 8.6%) of SNPs after matching to HapMap data and after elimination of SNPs in high LD. Table 2 shows the median intermarker distance for chromosomal regions of interest both prior to LD assessment and after removal of SNPs in high LD. The overall median intermarker distance for all regions of interest after matching HapMap data, but prior to LD assessment was 0.248 Mb (25th and 75th percentile 0.091–0.565 Mb). After elimination of high-LD SNPs, the overall median intermarker distance was 0.344 Mb (25th and 75th percentile 0.156–0.709 Mb).
The validation linkage results after removal of SNPs in high LD are presented in Table 2 and Figure 3. Five chromosomal regions had NPL scores ⩾3.18 after LD assessment. The maximum multipoint NPL score of 5.58 (empirical P=0.000000287) was obtained for chromosome 3q13.2–q13.31. Other regions attaining significant NPL scores (NPL⩾4.08) were 3q26.31–q27.3 and 20q11.21–q13.12. Regions attaining suggestive evidence for linkage (NPL⩾3.18) were 7p14.1–p11.22 and 9p24.3. We note that the simulation-generated results concurred with the observed data. For all five regions of interest meeting at least suggestive evidence of linkage, the observed NPL result exceeded all 100 simulation-generated NPL results (that is, P<0.01).
Table 3 contains the haplotype sharing information for the five chromosomal regions attaining at least suggestive evidence of linkage (NPL⩾3.18). The two regions of interest on chromosome 3 and the region on chromosome 20 were all shared among four affected subjects. The region on chromosome 7 was shared by three affected subjects. Two distinct haplotypes were shared by two different pairs of affected subjects on chromosome 9. It should be noted that the groups of individuals who shared a particular haplotype differed by chromosomal region, such that the group of individuals, for example, who shared the chromosome 3q13.2–q13.31 region differed from the group of individuals who shared the chromosome 3q26.31–q27.3 region. Individual 2 did not share any of the chromosomal regions of interest based on linkage evidence with his other affected relatives, whereas individual 6 shared all of the chromosomal regions of interest with other affected relatives.
We present results for a high-density, genome-wide SNP analysis in a single large pedigree with seven affected autism cases from Utah. Our six-generation ASD pedigree of all male cases represents one of the largest autism pedigrees known. Through study of this highly informative pedigree where cases are more likely to be genetically homogeneous, we have the potential to confirm previously published linkage results as well as narrow significant linkage regions with an end goal of facilitating the identification of autism susceptibility loci.
The most promising chromosomal regions to emerge from these results are the 3q13.2–q13.31, 3q26.31–q27.3 and the 20q11.21–q13.12 regions. The maximum multipoint NPL score was obtained for chromosome 3q13.2–q13.31 (NPL, 5.58; empirical P=0.000000287). Schellenberg et al.16 also observed a significant peak in the 3q13.31 region using microsatellite markers for families with two affected siblings. Their most significant finding (P=0.007) for the 3q13.31 region was for families without behavioral regression and a broad diagnosis of autism, which included 147 families where both affected siblings met criteria for the strict definition of autism and 53 families with other types of affected sibpairs (for example, a sibpair diagnosed with strict definition of autism and the second sibling diagnosed with PDD-NOS). Other subset analyses of their data were also found to be significant at the same 3q13.31 region: multipoint (P=0.03) and single point (P=0.02) analyses for families with a broad diagnosis of autism without consideration of behavioral regression, families with only male affected siblings (P=0.03) and families without behavioral regression and a strict diagnosis of autism (P=0.02). These results also concurred with a meta-analysis of five genome scans for a broad definition of autism-spectrum disorders that found interesting results for the 3q13.1–q21 region (P=0.021).22 Based on our current results and these two other studies, genetic variant(s) in the 3q13.2–q13.31 region may predispose to broad spectrum of autism in males without behavioral regression. As further corroborating evidence for the significance of the 3q13.1–q21 region, we also note that the Autism Genome Project Consortium recently published a copy number variant analysis in which they detected a copy number variant loss at chromosome 3q13.31 that was validated by familial clustering.17 Autism candidate genes of interest in the 3q13.2–q13.31 region include DRD3, GAP43, LSAMP and MAK3.
In an early publication, allele frequencies in the gene DRD3 were compared between subjects with autism and matched school children; however, no significant difference between groups was observed.52
The second highest peak was observed at 20q11.21–q13.12 (NPL, 5.56; empirical P=0.000000287). Schellenberg et al.16 also found suggestive linkage results at 20q11.21; however, only for the subset of families with male only affected siblings and a broad diagnosis of autism as defined above (P=0.03). Again, based on our current results and those of Schellenberg et al.,16 genetic variants in the 20q11.21–q13.12 region may predispose males to broad definition of autism. Autism candidate genes of interest in this region include NNAT, SLC32A1, POFUT1, DLGAP4 and PTPRT. However, none of these genes have been tested for an association with autism.
The 3q26.31–q27.3 region (NPL, 4.85; empirical P=0.000000636) was previously identified by our group,29 and replicates findings for 38 Finnish autism families to 3q25–27 (maximum multipoint LOD score=4.81), particularly their follow-up study of 34 families from Central Finland.53 The strongest results for the 3q25–27 region from the Finnish study were obtained for a broad autism spectrum diagnosis that included Asperger's syndrome. Our study of an ASD pedigree of Northern European descent confirms the Finnish results, and suggests that this locus might be specific for broad spectrum of autism. Further, as increased linkage evidence for the chromosome 3 region was not observed in a combined analysis of the same Finnish families and the large-scale collaborative efforts of the Autism Genetic Resource Exchange,54 this suggests that this particular mutation(s) may be limited to those with Northern European ancestry. We make particular note that our current study using high-density SNPs was able to narrow the region from 3q25–27 identified using microsatellite markers in the Finnish study to 3q26.31–q27.3. We also point out that in our previous study,29 we did not attempt to rigorously control for LD; hence, we reported our previous findings as within the general 3q25–27 region. Potential candidate genes for autism in this chromosomal region include NLGN1, FXR1, SOX2, HTR3C, HTR3D, HTR3E, AHSG, CAM-KIIN and EPHB3. FXR1 and NLGN1 were previously screened as potential autism candidate genes; however, neither showed a strong association with autism.29, 55, 56
For the two chromosomal regions with suggestive evidence of linkage, 7p14.1–p11.22 and 9p24.3, both were replicated by other groups. Wolpert et al.57 observed a single case with autism to have a de novo partial duplication of 7p11.2–p14.1. As noted by Wolpert et al.,57 the 7p11.2–p14.1 region is adjacent to the HoxA-1 gene, which has been suggested as a candidate gene for autism.58, 59, 60 A copy number variant gain was detected by the Autism Genome Project Consortium at chromosome 9p24.3.17
Based on our haplotype sharing results for the five chromosomal regions with at least suggestive evidence of linkage results (NPL⩾3.18), no single haplotype was consistently shared among all affected subjects, suggesting that no single chromosomal region, and hence no single gene, explained all of the autism cases observed. Rather, we observed that five of the affected subjects (that is, subjects 1, 3, 4, 5 and 6) shared more than a single chromosomal region of interest with other affected subjects. For these five affected subjects, sharing occurred across various combinations of regions, which most likely act through gene–gene interactions to result in the autism phenotype. The hypothesis of genetic heterogeneity in autism is not new.16, 61 However, what is interesting is that we observed genetic heterogeneity within a single large pedigree. While it is possible that more than a single autism predisposition gene may lie in one of the identified regions, our findings suggest that the number of potential genes and various combinations of genes that may be involved in autism within our pedigree are limited, and we have identified a relatively small number of regions to follow up. Furthermore, genetic heterogeneity within a single pedigree may also help explain the lack of consistency of results between autism studies. If genetic heterogeneity exists in a single large pedigree, it likely exists in other autism pedigrees and further emphasizes the challenges that will be faced to find autism predisposition genes. Despite the genetic heterogeneity observed, the reduced phenotypic and environmental heterogeneity in our large pedigree allowed us to detect three significant and two suggestive linkage peaks.
We did not observe haplotype sharing for individual 2 with the other affected subjects at any of the five chromosomal regions of interest. Individual 2 may be a sporadic autism case in the pedigree with de novo mutations and/or environmental causal factors, as he had the lowest IQ and is the only case to have seizures. It is also possible that this family is genetically heterogeneous and that, despite the relationship between affected individuals, autism arose independently in some branches of the genealogical tree. Lack of sharing with individual 2 could also indicate that not all genes contributing to autism within this pedigree were detected. Here, we focused on results attaining at least suggestive evidence of linkage. As shown in Table 2, there are other regions of the genome that are potentially interesting but have NPL scores less than 3.18. Furthermore, submicroscopic chromosomal abnormalities may also predispose to autism. A follow-up study to estimate copy number variants using higher density SNPs is planned. While it is possible that multiple genes across these other regions may act jointly in a low-penetrant polygenic manner to contribute to the autism phenotype, additional studies are required to understand their possible contribution to autism.
One of the most consistent chromosomal regions demonstrating significant linkage results for the autism phenotype is at chromosome 7q,9, 10, 13, 14, 16, 62 a region that was not significant in our analyses. Recent studies have shown that the peak on chromosome 7q is only significant in studies using subjects with a strict diagnosis of autism.16, 22 When the diagnosis is relaxed and ASDs are included in the phenotype, the linkage evidence is much weaker.16, 22 Despite only one of seven cases being diagnosed with PDD-NOS in our extended pedigree, our negative results at chromosome 7q emphasize the importance of genealogy for phenotype classification, particularly in pedigrees where there is only a single child in a nuclear family with ASD.
In our analysis, we controlled for the presence of LD among SNP markers, as LD has the potential to bias multipoint linkage results. The majority of previous studies that have examined the effect of LD on linkage analyses have found an inflation of both NPL and LOD scores,46, 63, 64 although a single study observed a slight decrease in NPL scores in the presence of LD.65 We observed 18 chromosomal regions attaining an NPL score ⩾2.0 when LD was ignored, and 16 of these regions resulted in NPL scores ⩾2.0 when LD was considered. We found that our methodology for eliminating high-LD SNPs was consistent and actually more conservative than that used by Sellick et al.,64 who also used a 10K SNP panel for study of a complex disease. We observed a median intermarker distance of 0.344 Mb (25th and 75th percentile 0.156–0.709 Mb) after elimination of high-LD SNPs, and Sellick et al.64 observed a median intermarker distance of 0.17 Mb (25th–75th percentile 0.06–0.38) after removal of high-LD SNPs by selecting a single SNP from clusters of SNPs with an r2>0.4. It is possible that LD may still have influenced our results, and it is also possible that we may have missed other potentially interesting regions by not assessing LD on the complete genome. Our high-LD elimination method is rather time intensive, and hence we opted to assess LD only in regions of potential interest defined at a conservative threshold. As all five chromosomal regions with at least suggestive evidence of linkage are consistent with other linkage analyses using microsatellite markers or analyses with copy number variants, we feel confident that these regions are valid peaks and warrant follow-up with fine mapping and candidate gene studies.
In conclusion, our genome-wide linkage analysis of a single high-risk ASD pedigree resulted in suggestive or significant linkage evidence for five distinct chromosomal regions, all of which show striking replication with other autism studies. Our replication of previous results adds credibility to previous findings and reinforces the soundness of our methodology for identifying regions of interest, and follow-up of these five chromosomal regions is warranted.
This work was supported by R01 MH069359, 5 U19 HD035476 (one of the NICHD Collaborative Programs of Excellence in Autism), the Utah Autism Foundation and by GCRC grant number M01-RR00064 from the National Center for Research Resources. Partial support for all datasets within the Utah Population Database (UPDB) was provided by the University of Utah Huntsman Cancer Institute. We thank Dr Sally Ozonoff for assistance with diagnoses of subjects, and our staff whose countless hours of work have made this study possible. We also greatly appreciate the time and effort given by the family members who participated in this study.
Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/
Utah Population Database (UPDB), http://www.hci.utah.edu/groups/ppr/