Introduction

Spinal muscular atrophy (SMA) is an autosomal recessive (AR) neuromuscular disease that occurs in 1 in 10 000 European births, making it the second most common lethal AR disease after cystic fibrosis in European populations. In US Whites, 1 in 35 are SMA carriers.1, 2 SMA is caused by the homozygous loss of survival motor neuron 1 (SMN1) gene function due to a deletion or possibly to a gene conversion event in 95% of cases;3 the remaining 5% of patients have intragenic mutations that inactivate the gene.4, 5, 6 Affected individuals exhibit progressive muscle weakness and paralysis as a result of degeneration and loss of motor neurons in the spinal cord. Four main types of SMA with a large range of severity have been described. SMA type I, the most severe, typically leads to death within the first 2 years of life, whereas individuals with SMA type IV do not experience symptoms until adulthood. The severity of SMA is modified by multiple factors, with the copy number of the homologous gene SMN2 being the major modifier.3, 7, 8, 9

The SMN1 gene is located on chromosome 5q13, distal (telomeric) to a nearly identical homologous gene, SMN2. SMN2 copy number varies from zero to three copies per chromosome, whereas SMN1 copy number ranges from zero to >2 per chromosome. All SMA patients have at least one copy of SMN2, and no individuals who are null for both SMN1 and SMN2 have been observed. It has been hypothesized, therefore, that the loss of both SMN genes is an embryonic lethal condition.10 Moreover, a single C>T base substitution in SMN2 disrupts proper splicing, so that the most prevalent SMN2 transcript is missing exon 7 and is therefore unstable, whereas SMN1 produces full-length, functional transcript. However, because SMN2 produces a small amount of full-length transcript that can partially ameliorate the SMA phenotype, SMN2 copy number is inversely correlated with disease severity and considered a modifier of SMA.11

Carrier screening is technically challenging because the SMA region contains many repeated elements that predispose to rearrangements; as a result, 2% of patients carry de novo deletions that originated in the gametes of a parent with two copies of SMN1.12 In addition, the high homology between SMN1 and SMN2 complicates interpretation of carrier testing results. Carrier and disease status for SMA is typically determined by direct quantification of SMN1 exon 7 dosage, but this method will not identify carriers of intragenic mutations and carriers with the ‘2+0’ genotype,1, 12 in which one chromosome has two copies of SMN1 in cis and the other has zero copies.

The more than 40 000-member Hutterite community lives in the northern United States and western Canada, and is divided into three major groups, the Schmiedeleut (S-leut), Lehrerleut (L-leut), and Dariusleut (D-leut). Marriages between the three leuts have been uncommon over the last 90 years, although all members of this founder population descended from fewer than 90 ancestors.13 SMA had previously been reported in the Hutterites,14 but the carrier frequency in the population was not known. Therefore, we sought to estimate the carrier frequency for SMA mutations, to determine whether a common founder mutation was responsible for SMA in the Hutterites, and to assess the utility of haplotype-based carrier screening in this population. Here we report the results of these studies in 1415 Hutterites.

Subjects and methods

Subjects

The study sample, including one child with SMA and four known obligate carrier parents, comprised 1415 S-leut Hutterites from South Dakota, aged 6–93 years at the time of our genetic studies.15, 16, 17, 18 These individuals are related to each other in a 13-generation pedigree, containing 3673 individuals, all of whom can be traced to 64 common founders. The child with SMA is a boy who had onset of symptoms in the second year of life. Three additional L-leut Hutterites from Montana with SMA were also studied. Two were females with onset of symptoms at 7 months of age and 25 years, respectively; one was male, with onset of symptoms at 16 months of age. All four Hutterites with SMA are still ambulatory (current ages 11, 4, 28, and 4 years, respectively).

Genotyping and analysis of SMA region haplotypes

All subjects were genotyped using the Affymetrix 500k, 5.0 or 6.0 SNP arrays (Affymetrix, Santa Clara, CA, USA). SNPs were excluded from analysis if the call rate was <90%, if the Hardy–Weinberg P was <0.001 (correcting for Hutterite population structure and inbreeding19) or if ≥5 Mendelian errors were generated. SNPs that had a minor allele frequency <0.05 or that were not present on all three chips were also excluded. A final set of 16 405 chromosome 5 SNPs was used for haplotype analysis.
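
The exclusion criteria above can be expressed as a simple per-SNP filter. The following is a minimal sketch, not the pipeline used in the study: the `SnpStats` record, its field names and the example SNPs are hypothetical, and the Hardy–Weinberg P value is assumed to have been computed (with the correction for population structure) beforehand.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (field names are illustrative only)."""
    name: str
    call_rate: float       # fraction of subjects with a genotype call
    hwe_p: float           # Hardy-Weinberg P value, assumed already corrected
    mendel_errors: int     # Mendelian errors observed in the pedigree
    maf: float             # minor allele frequency
    on_all_arrays: bool    # present on the 500k, 5.0 and 6.0 arrays


def passes_qc(snp: SnpStats) -> bool:
    """Apply the thresholds stated in the text; True means the SNP is kept."""
    if snp.call_rate < 0.90:
        return False
    if snp.hwe_p < 0.001:
        return False
    if snp.mendel_errors >= 5:
        return False
    if snp.maf < 0.05:
        return False
    return snp.on_all_arrays


# Hypothetical SNPs, each failing at most one criterion.
snps = [
    SnpStats("snp_a", 0.99, 0.40, 0, 0.21, True),
    SnpStats("snp_b", 0.85, 0.40, 0, 0.21, True),   # low call rate
    SnpStats("snp_c", 0.99, 0.40, 5, 0.21, True),   # too many Mendelian errors
    SnpStats("snp_d", 0.99, 0.40, 0, 0.03, True),   # minor allele too rare
]
kept = [s.name for s in snps if passes_qc(s)]
```

In this toy input only `snp_a` survives all of the filters.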

To identify the haplotype carrying the SMA mutation in the Hutterites, we first used the Runs of Homozygosity function in PLINK,20 which characterized the extent of homozygosity around SMN1 in one S-leut and three L-leut individuals with SMA. We allowed for a 2 Mb gap in SNP coverage in the SMN1/SMN2 region, because the Affymetrix arrays do not include SNPs in this region. No heterozygous SNPs and only five missing genotypes were permitted within the homozygous stretch; SNPs were required to be present at a minimum density of one SNP every 50 kb (excluding the SMN1/SMN2 region mentioned above).
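
The core of the homozygosity criterion (no heterozygous calls, a limited number of missing genotypes) can be illustrated independently of PLINK. The sketch below is an assumption-laden simplification rather than PLINK's algorithm: genotypes are coded as minor-allele counts (0, 1, 2) with `None` for missing, the function name `homozygous_run` is hypothetical, and the SNP-density and 2 Mb gap rules described above are not modelled.

```python
from typing import Optional, Sequence, Tuple


def homozygous_run(genotypes: Sequence[Optional[int]], focal: int,
                   max_missing: int = 5) -> Tuple[int, int]:
    """Return inclusive (start, end) indices of the longest window that
    contains `focal`, has no heterozygous call (1) and has at most
    `max_missing` missing calls (None). Falls back to (focal, focal)
    if no longer window qualifies."""
    if genotypes[focal] == 1:
        raise ValueError("focal SNP is heterozygous")
    # Heterozygous calls are hard boundaries on either side of the focal SNP.
    lo = focal
    while lo > 0 and genotypes[lo - 1] != 1:
        lo -= 1
    hi = focal
    while hi < len(genotypes) - 1 and genotypes[hi + 1] != 1:
        hi += 1
    # Prefix counts of missing calls within [lo, hi].
    missing = [0]
    for g in genotypes[lo:hi + 1]:
        missing.append(missing[-1] + (g is None))
    best = (focal, focal)
    for left in range(lo, focal + 1):
        right = hi
        # Shrink from the right until the missing-call budget is respected.
        while right >= focal and missing[right - lo + 1] - missing[left - lo] > max_missing:
            right -= 1
        if right >= focal and right - left > best[1] - best[0]:
            best = (left, right)
    return best


# Toy genotype vector: heterozygous calls at indices 0 and 6 bound the run.
toy = [1, 0, 2, None, 0, 2, 1, 0]
run_default = homozygous_run(toy, focal=4)
run_strict = homozygous_run(toy, focal=4, max_missing=0)
```

With the default budget the run spans indices 1 to 5; when no missing calls are tolerated it shrinks to indices 4 to 5.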

Phasing of subjects and haplotype-based carrier screening

Kong et al21 showed that if two distantly related individuals share at least one allele identical by state (IBS) at ≥1000 consecutive SNPs, one can conclude that the SNPs constitute a haplotype that was inherited identical by descent (IBD). We used this rationale to identify Hutterites who were IBD with the SMA carrier haplotype and thus likely to be carriers of the SMN1 deletion, and to exclude Hutterites who were not IBD with any part of the SMA haplotype and thus likely to be noncarriers. Custom Perl scripts were used to identify all subjects in our sample who were IBS≥1 at most or all of the SNPs across the region that was homozygous in the affected subjects. Individuals who were IBS≥1 at ≥95% of SNPs in the 22 Mb region were considered to be carriers and the remaining individuals were considered to be noncarriers. Haplotypes in both classes were visually verified for consistency with the SMA haplotype to allow identification of any individuals carrying or lacking a section of the haplotype as a result of a recombination event. This approach allowed us to phase the SMA carrier haplotype, when present, in each subject without having to phase the other haplotypes. We termed this method ‘pseudo-phasing.’
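
The IBS≥1 rule can be sketched in a few lines. This is an illustration in Python rather than the custom Perl scripts used in the study, which are not reproduced here; the function names are hypothetical, and skipping missing genotypes when computing the shared fraction is an assumption, because the text does not state how missing calls were handled.

```python
from typing import Optional, Sequence, Tuple

# Unphased allele pair at one SNP; None denotes a missing genotype.
Genotype = Optional[Tuple[str, str]]


def ibs1_fraction(haplotype: Sequence[str], genotypes: Sequence[Genotype]) -> float:
    """Fraction of typed SNPs at which the subject carries at least one copy
    of the allele found on the reference (SMA) haplotype."""
    shared = 0
    typed = 0
    for allele, genotype in zip(haplotype, genotypes):
        if genotype is None:
            continue  # assumption: missing calls are ignored
        typed += 1
        if allele in genotype:
            shared += 1
    return shared / typed if typed else 0.0


def call_carrier(haplotype: Sequence[str], genotypes: Sequence[Genotype],
                 threshold: float = 0.95) -> bool:
    """Provisional carrier call using the 95% threshold given in the text."""
    return ibs1_fraction(haplotype, genotypes) >= threshold


# Toy four-SNP haplotype and two hypothetical subjects.
sma_haplotype = ["A", "C", "G", "T"]
subject_1 = [("A", "G"), ("C", "C"), None, ("T", "A")]
subject_2 = [("G", "G"), ("C", "T"), ("G", "G"), ("A", "A")]
calls = (call_carrier(sma_haplotype, subject_1), call_carrier(sma_haplotype, subject_2))
```

A call made this way is only provisional: as described above, haplotypes were also inspected visually to catch recombinants that a simple threshold would misclassify.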

Carrier screening by qPCR

To determine whether there were SMA deletions on other haplotype backgrounds, we used a quantitative real-time PCR (qPCR) protocol,22 modified to use RNASP as an internal 2-copy control for each subject in a separate PCR reaction. Using this protocol, we were only able to confidently call genotypes for 594 of the 1335 samples tested, possibly due to the complicated genetic structure of the SMA locus. Therefore, we used these data only as an internal check for contrasting with results of the haplotype studies and to identify individuals for further studies using a competitive PCR assay.

SMN copy number determination by competitive PCR

To confirm carrier statuses determined by the haplotype-based screening method or qPCR, SMN1 and SMN2 copy numbers for a subset of the Hutterites were determined at the Ohio State University Molecular Pathology Laboratory using a competitive PCR assay that directly assesses SMN1 and SMN2 dosage.23 We used three different criteria to select samples for this test. First, we selected 52 representative subjects from families where the SMN1 copy number, as predicted by the haplotype analysis, was discordant with the copy number predicted by the qPCR assay. Second, we selected 12 subjects from families with small portions of the SMA haplotype and three subjects from families that carry part of the SMA haplotype, but not immediately surrounding the locus itself. Lastly, we selected one unaffected adult who was homozygous for the shared segment.

Minimal haplotype analysis

To determine the minimum number of SNPs required to determine carrier status, a ‘minimal SMA haplotype’ was selected from the 2066-SNP, 22 Mb SMA haplotype, using lasso regression implemented in the R package glmnet,24 after choosing the parameters α=1.0, λ=1.503 × 10^−3, by running a 10-fold cross-validation procedure. K-fold cross validation was used to minimize the effects of overfitting the model to our data. The full data set is randomly divided into K subsamples; K−1 subsamples are used to develop the model, and then the remaining subsample is used for validation of the model. Lasso regression uses SNPs as predictors of the phenotype (carrying at least one copy of the SMA mutation), while minimizing the number of SNPs in the model. Genotypes were coded as a single allele dosage value using PLINK, where the value indicated the number of copies of the minor allele. Missing genotypes were substituted with the mean dosage value at that SNP calculated from the entire genotyped population.
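
Two of the preprocessing steps described above, mean imputation of missing dosages and the random partition into K subsamples, can be sketched as follows. This is an illustration only: the lasso fit itself was done with glmnet in R and is not reproduced, the function names are hypothetical, and the shuffling scheme and seed are assumptions.

```python
import random
from typing import List, Optional


def impute_mean(dosages: List[Optional[float]]) -> List[float]:
    """Replace missing dosages (None) at one SNP with the mean of the
    observed dosages. Assumes at least one observed value."""
    observed = [d for d in dosages if d is not None]
    mean = sum(observed) / len(observed)
    return [mean if d is None else float(d) for d in dosages]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition subject indices 0..n-1 into k folds of near-equal
    size; each fold serves once as the validation subsample."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


# Minor-allele dosages at one SNP for four hypothetical subjects.
imputed = impute_mean([0, 2, None, 1])

# Ten folds over a sample the size of the study cohort.
folds = k_fold_indices(1415, 10)
```

For each fold, a model would be trained on the other nine folds and evaluated on the held-out one, and the value of λ would be chosen from the cross-validated error.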

Results

Identifying a shared haplotype in affected individuals

We expected the pathogenic mutations in the Hutterites with SMA to be inherited IBD from a common S-leut and L-leut ancestor because of the genetic isolation of the population. Consistent with this expectation, we identified a 22 Mb (2066 SNPs; rs1328254 to rs7709974; 49 596 616 to 72 430 491 bp) segment around the SMN1/SMN2 locus that was homozygous in the S-leut child from South Dakota with SMA (Figure 1a, individual A). The three L-leut Hutterites from Montana with SMA were homozygous for smaller segments (10 Mb; 61 091 141 to 72 430 491 bp) of the same haplotype, spanning the SMN1/SMN2 locus (Figure 1a, individuals B–D).

Figure 1

(a) Haplotypes of four Hutterites with SMA. SNPs are shown from left to right along the x-axis, with the SMN1/SMN2 locus shown by a vertical black line. No SNPs are present within 2 Mb at the locus. Orange indicates SNPs for which each subject is IBS≥1 to the SMA haplotype in individual A, blue indicates SNPs where the subject is IBS=0 and white indicates SNPs with missing data. Parenthetical labels on the right correspond to the position of each individual in the pedigree shown in panel b. Individual A is the S-leut SMA child who is homozygous for 22 Mb (2066 SNPs) across the SMN1/SMN2 locus. Individuals B–D are L-leut SMA individuals who share 10 Mb (783 SNPs) of the haplotype in individual A. This 10 Mb shared haplotype is marked above the figure and is seen as a contiguous block of orange around the SMN1/SMN2 locus. In contrast, the proximal (left) end of the 22 Mb region shows a mix of both blue and orange SNPs, indicating that it is not IBD to the SMA haplotype. (b) Pedigree of individuals with SMA. All affected individuals can be traced back eight generations to their most recent common ancestors, a couple born in the 1790s (individuals I.1 and I.2). Black-filled symbols are affected individuals in our study. Birth years are shown.

An eight-generation pedigree connects these four affected individuals to their most recent common Hutterite ancestors, a couple who were born in the 1790s (Figure 1b, individuals I.1 and I.2). The sharing of a 10 Mb haplotype (783 SNPs) between the four affected Hutterites supports a single founder origin for the SMA mutation in both leut. Using a competitive PCR assay, we could determine that all four SMA individuals carried zero copies of SMN1 and four copies of SMN2, indicating that the haplotype carrying the deletion of SMN1 in the Hutterites also carries two copies of SMN2.

Haplotype-based carrier screening

Pseudo-phasing of 1414 S-leut Hutterites across the SMA region revealed 177 individuals carrying the majority of the 10 Mb shared haplotype (Supplementary Figure S1), suggesting a carrier frequency of at least one in eight (12.5%) in the South Dakota Hutterites. The most recent common ancestors of the 177 carriers are three Hutterites who were born in the late 1700s, none of whom are founders of the Hutterite pedigree (not shown). Among the Hutterite founders, 13 are ancestral to all 177 carriers, and any one of these founders could have introduced the mutation into the population.

Carrier status of a representative subset of the subjects was confirmed by competitive PCR, and samples from a representative selection of haplotype-determined noncarriers were confirmed by competitive PCR to have two copies of SMN1 (Supplementary Figure S2). None of the noncarriers had more than two copies of SMN1, suggesting that the ‘2+0’ genotype is not present in the Hutterite population.

The carrier frequency determined by our haplotype-based method is possibly an underestimate of the true SMA mutation frequency in the Hutterites. One source for undercalling carriers would be recombination events that result in retention of a very small portion of the shared haplotype. We searched for such individuals by examining progressively smaller segments of the haplotype (200, 100 and 50 SNPs) proximal and distal to the SMA locus in subjects who did not share ≥95% of SNPs IBS≥1 across the entire full-length haplotype, but who were IBS≥1 at the SNPs immediately surrounding the locus.

We identified 12 individuals who had small segments that were IBS to the shared haplotype. Visual examination of the haplotypes revealed that four individuals (two were first-degree relatives) appeared to be carriers who had lost most of the haplotype through recombination events (Figure 2a) and four others carried most of the shared haplotype, but appeared to have had a recombination event near the SMN1 locus, resulting in the loss of the SMA deletion (Figure 2b). We also identified four individuals who were IBS≥1 to small portions of the shared haplotype, but we were uncertain as to whether they carried the deletion (Figure 2c). Competitive PCR assays in these 12 subjects confirmed that the SMN1 dosage in each of these individuals corresponded with the status predicted on the basis of visual examination of their haplotypes: the first group (Figure 2a) were carriers, and the second (Figure 2b) and third (Figure 2c) groups were noncarriers. The fact that the individuals in the third group are noncarriers indicates that being IBS for <3 Mb of the haplotype is not sufficient to be called a carrier.

Figure 2

Haplotypes reflecting recombination events involving the SMA locus. (a) Haplotypes of relatively unrelated SMA carriers (first-degree relatives not shown), who lost most of the original 10 Mb shared haplotype as a result of recombination events, but retained the shared haplotype immediately proximal and distal to the SMA locus. (b) Haplotypes of relatively unrelated SMA noncarriers (no first-degree relatives), who are IBD to large portions of the shared haplotype, but not immediately surrounding the SMA locus. (c) Haplotypes of noncarriers who have very small segments (<3 Mb) around the SMA locus that were IBS, but not IBD, with the shared haplotype.

Identifying carriers of a de novo SMA deletion

A second source of undercalling carriers using the haplotype-based approach is the occurrence of de novo mutations, which have been reported in 2% of SMA cases.12 In those cases, the mutations would likely occur on a different haplotype and be missed by our study. The qPCR assay22 revealed a father and two of his children who appeared to be carriers of an SMN1 deletion (Figure 3), although none carried the SMA haplotype. The competitive PCR assay determined that all three were in fact carriers. Further examination of this family revealed that the haplotype segregating with the deletion in these individuals was also present in other family members who were not carriers by competitive PCR, indicating that a de novo deletion arose in the germline of the paternal grandfather (individual II.1 in Figure 3) that was inherited by his son and two grandchildren.

Figure 3

Pedigree of family with a de novo deletion of SMN1. Individuals III.2, IV.2 and IV.5 (half-shaded symbols) are confirmed carriers by competitive PCR, but do not carry any of the haplotype shared by the affected individuals. These three carriers share an SNP haplotype that differs from that of the four affected individuals. This newly shared haplotype (filled in red/gray in figure) is also present in noncarriers in the same family (II.3, III.4, IV.5, etc.), indicating that the deletion arose within this family. As the noncarrier grandfather (II.1) carries the haplotype, an SMN1 deletion must have arisen in his germline and then been passed on by his son (III.2) to at least two of his grandchildren (IV.2, IV.5). Symbols for individuals with unknown carrier status are shaded in gray, symbols for carriers are half-filled, and symbols for noncarriers are white.

Although this was the only de novo mutation identified in this study, we obtained qPCR results for less than half of our sample (594/1335 samples). Therefore, it is possible that more de novo mutations are present in the population and that the carrier frequency estimated by the shared haplotype analysis is a lower bound (carrier frequency of 12.7 versus 12.9%, when including the three additional de novo carriers, for example). We estimate a de novo mutation rate of 8.4 × 10^−4, based on the discovery of one such mutation in 1188 meioses.
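
The arithmetic behind the rate estimate can be checked directly. The sketch below assumes that the 1188 meioses correspond to two parental meioses for each of the 594 subjects with a confident qPCR call; the text does not state this explicitly, and the function name is hypothetical.

```python
def de_novo_rate(n_events: int, n_subjects: int) -> float:
    """De novo events per meiosis, counting two meioses per screened subject
    (an assumption about how the denominator was obtained)."""
    meioses = 2 * n_subjects
    return n_events / meioses


# One de novo deletion among 594 subjects with usable qPCR results.
rate = de_novo_rate(1, 594)
```

Under this assumption the value is 1/1188, or about 8.4 × 10^−4 per meiosis.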

Identification of an asymptomatic subject homozygous for the SMN1 deletion

Two of the 1415 S-leut individuals in our study were homozygous for the SMA haplotype. One was a child diagnosed with SMA in early childhood, but the other was an ostensibly healthy adult woman who was 41 years old at the time of study. The homozygous deletion of SMN1 in this woman was confirmed by competitive PCR and, as expected, all four of her children were carriers. This asymptomatic woman is homozygous for the entire 22 Mb SMA haplotype, although she is not particularly closely related to the SMA child (kinship coefficient of 0.027 versus the S-leut average of 0.034).

Previous studies have reported asymptomatic relatives of SMA individuals who are homozygous for the SMN1 deletion.9, 25, 26, 27, 28, 29 The precise genetic modifiers that ameliorate the phenotype have not been determined in most cases, although a single base substitution in SMN2 (c.859G>C) enhances splicing of the gene, enabling it to partially or fully compensate for the lack of SMN1 in some cases.25 However, the unaffected homozygous woman in our study did not carry this modifier.

Identification of a minimal SMA haplotype

To further explore the utility of SNP genotyping for carrier screening in the Hutterites, we employed lasso regression to select the fewest SNPs (of the 2066 in the shared haplotype) required to accurately determine carrier status. The best model consisted of 26 SNPs that formed a ‘minimal SMA haplotype’ (Supplementary Table S1).

The 26-SNP model correctly predicted carrier status in 1409 of 1415 individuals (99.6%) (Figure 4a). The three subjects who were carriers of a de novo mutation were predicted by the model to be noncarriers, as expected, because the model will only detect individuals carrying the shared SMA haplotype. Two noncarriers were predicted to be carriers, because they carried most, but not all, of the SMA haplotype (individuals 1 and 2, Figure 4b). Finally, one additional carrier was predicted to be a noncarrier, because only a very small portion of the SMA haplotype remained after a likely recombination event (individual 3, Figure 4b). Examination of the full 2066-SNP haplotypes of these three subjects (Figure 4c) shows that all three experienced a recombination event breaking up the SMA haplotype. As a carrier screening test, the minimal SMA haplotype model has a specificity of 99.86% and a sensitivity of 99.71%, which are well within the range of SMA carrier tests using other methods.1, 30, 31, 32 If we further require carriers to be IBS≥1 at the nearest single SNPs flanking the SMN1/SMN2 locus, individual 2 would have been correctly classified as a noncarrier, increasing the specificity to 99.92%, although the sensitivity of this model remains unchanged.
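
For reference, sensitivity, specificity and overall accuracy are derived from the four cells of a confusion matrix as sketched below. The counts in the example are purely hypothetical and are not the study's data; the function name is likewise illustrative.

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard screening-test metrics from confusion-matrix counts.

    tp: carriers called carriers        fn: carriers called noncarriers
    tn: noncarriers called noncarriers  fp: noncarriers called carriers
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only to illustrate the calculation.
example = screening_metrics(tp=90, fp=20, tn=880, fn=10)
```

Because noncarriers greatly outnumber carriers in a screened population, a single misclassified carrier shifts sensitivity far more than a single misclassified noncarrier shifts specificity.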

Figure 4

Examples of haplotypes using 26 SNPs to predict carrier status in the ‘minimal haplotype’ model. (a) 26-SNP haplotypes of individuals correctly called as carriers or noncarriers by the model. (b) The 26-SNP haplotypes of the three individuals who were incorrectly called by the ‘minimal haplotype’ model. Two of the noncarriers (individual 1, 2) revealed a recombination event close to the SMA locus, but retained most of the haplotype (the region between 58–67 Mb). One individual (3) is a carrier, but retained only a small portion of the haplotype. Note that, if we required that all carriers be IBS≥1 at both of the SNPs immediately flanking the SMN1 locus, individual 2 would have been correctly called as a noncarrier, but individual 1 would still be called a carrier and individual 3 a noncarrier. (c) The full 2066-SNP haplotypes of the subjects shown in b. Locations of the 26 minimal haplotype SNPs are shown by dashed lines.

Discussion

In this study, we report evidence for a single founder origin for SMA in the S-leut and L-leut Hutterites and identify 26 SNPs that predict carrier status with high specificity and sensitivity. We further report a minimum SMA carrier frequency of one in eight in the South Dakota (S-leut) Hutterites, which is the highest ever reported for SMA and the highest carrier frequency reported to date for an AR mutation in the Hutterites of South Dakota. To determine how likely it is that a mutation present in a single Hutterite founder would reach a carrier frequency as high as 12.8%, we performed gene-dropping simulations in the South Dakota Hutterite pedigree. The results of 10 000 trials per founder indicated that the mean probability that a unique mutation originating in one of the 13 founders ancestral to all SMA carriers would reach this frequency or higher was 0.06 (range: 0.013–0.103).
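
The idea behind a gene-dropping simulation can be sketched on a toy pedigree. This is a generic illustration under stated assumptions, not the procedure or software used in the study: the pedigree, identifiers and function names are hypothetical, the chosen founder is assumed to carry exactly one mutant allele, and transmission is simple Mendelian segregation with no selection.

```python
import random
from typing import Dict, Optional, Sequence, Tuple

# id -> (father, mother), or None for a founder. Parents must be listed before
# their children (dicts preserve insertion order in Python 3.7+).
Pedigree = Dict[str, Optional[Tuple[str, str]]]


def drop_once(pedigree: Pedigree, founder: str,
              rng: random.Random) -> Dict[str, Tuple[bool, bool]]:
    """Drop one mutant allele from `founder` through the pedigree.
    Each person is a pair of booleans (True = mutant allele)."""
    alleles: Dict[str, Tuple[bool, bool]] = {}
    for person, parents in pedigree.items():
        if parents is None:
            alleles[person] = (person == founder, False)
        else:
            father, mother = parents
            # Each parent transmits one of its two alleles at random.
            alleles[person] = (rng.choice(alleles[father]),
                               rng.choice(alleles[mother]))
    return alleles


def prob_frequency_at_least(pedigree: Pedigree, founder: str,
                            sample: Sequence[str], threshold: float,
                            trials: int = 10000, seed: int = 0) -> float:
    """Fraction of trials in which the carrier frequency among `sample`
    (carriers have exactly one mutant allele) reaches `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        alleles = drop_once(pedigree, founder, rng)
        carriers = sum(1 for p in sample if sum(alleles[p]) == 1)
        if carriers / len(sample) >= threshold:
            hits += 1
    return hits / trials


# Toy pedigree: founders F1-F3, two children of F1 x F2, one grandchild.
toy_pedigree: Pedigree = {
    "F1": None, "F2": None, "F3": None,
    "C1": ("F1", "F2"), "C2": ("F1", "F2"),
    "G1": ("F3", "C1"),
}
p_both = prob_frequency_at_least(toy_pedigree, "F1", ["C1", "C2"], 1.0, trials=2000)
```

In this toy case each child of F1 is a carrier with probability 1/2, so the estimate for both being carriers should fall near 0.25. Repeating the calculation for each candidate founder gives a per-founder probability, analogous to the range quoted above.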

Earlier studies of SMA reported de novo mutations based on microsatellite haplotype analysis in families of affected individuals.12, 33, 34, 35 In contrast, we detected a de novo mutation event by contrasting phased-SNP haplotypes to SMN1 dosages determined by qPCR and examining the segregation pattern of the haplotype and mutation in additional close relatives. The segregation of the haplotype and SMN1 deletion in this family indicated that the event occurred during a paternal meiosis, as in most other de novo SMN1 deletions.12 The de novo mutation rate, μ, for SMA in the Hutterites is at least 8.4 × 10^−4, which is similar to the μ=1.1 × 10^−4 determined by Wirth et al.12

The expression of the SMA phenotype in the affected Hutterites is relatively mild and is consistent with an SMN2 copy number of four: all Hutterite SMA cases are homozygous for the same shared haplotype (0 SMN1, 2 SMN2 per chromosome). However, clinical variability was observed, including one asymptomatic homozygous adult with four SMN2 copies out of 1415 subjects (7.1 × 10^−4). In addition to SMN2 copy number, other modifying factors have been shown to influence the phenotypic variability of SMA. In fact, there are very rare families in which markedly different disease severity is present in affected siblings with the same SMN2 copy number. In one study, differences in splicing factor abundance allowed more full-length expression from the SMN2 gene and accounted for some of the variability observed between discordant sibs.26, 36 Another study reported families with unaffected SMN1-deleted females who had increased expression of the X chromosome gene plastin 3 (PLS3) compared with their SMA-affected siblings.9 Plastin 3 is important for axonogenesis and, therefore, was proposed as a protective modifier. It is possible that differential expression of PLS3 is modifying the SMA phenotype in the Hutterites, possibly accounting for the relatively early age of onset in the two affected boys and the milder phenotypes among females (including an asymptomatic woman), all of whom are homozygous for the same haplotype carrying 0 SMN1 and 2 SMN2 genes. Regardless of whether PLS3 has a modifying role in the Hutterites, the variable expression and penetrance of homozygosity for the SMN1 deletion, together with the remarkably similar environment, diet and lifestyle of community members, support the presence of additional genetic modifiers of SMA segregating in the population.

The American College of Medical Genetics recommends32 carrier screening for SMA in the general population (carrier frequency of 1 in 35), whereas the American College of Obstetricians and Gynecologists (ACOG) recommends screening only for those who request the test or who have a family history of the disease.37 The particular concerns of ACOG include the low frequency of SMA carriers in the general population and the lack of data on price and cost-effectiveness of a carrier screening program. They also noted that counseling of tested individuals is complicated by the fact that 3–5% of individuals have the ‘2+0’ genotype and will be incorrectly identified as noncarriers.1, 2, 38

Although the costs and benefits of offering SMA carrier screening to the general population are actively debated,2, 39 the case for offering screening to the Hutterite population is more straightforward for several reasons. First, the high carrier frequency (at least one in eight) would, by itself, justify population screening. Second, previous surveys of attitudes toward cystic fibrosis carrier screening in the Hutterites40 (unpublished results) indicate that the Hutterites value knowing their carrier status and that many would participate in a voluntary carrier screening program. Finally, the ‘2+0’ genotype appears to be absent in the Hutterites, reducing the potential for false negative results. Therefore, providing education and offering carrier screening is indicated and would likely be welcomed by members of this community.