Alzheimer’s disease (AD) is a serious brain disorder that manifests as memory loss and cognitive impairment. Age is the most prominent risk factor, and the number of affected individuals rises dramatically as the population ages. There are two types of AD: type I, with late onset (65 years and later), and type II, with early onset (before 65). The latter shows an autosomal dominant pattern of inheritance and accounts for <5% of all AD cases.

Initially, four genes were definitively implicated in the etiology of AD. Mutations in the genes encoding β-amyloid precursor protein (APP) and presenilin 1 and 2 (PSEN1, PSEN2) cause the rare type II form of the disease,1,2 though some studies have reported mutations in these genes in families with late-onset AD. Allele 4 of the APOE gene has been established as a susceptibility factor for the type I form of AD.3,4

Multiple genome-wide association studies (GWAS) have suggested that mutations in genes such as ARSB, CAND1, GRN, MAPT, MICA, PLAU, PICALM and many others may also affect AD development.5 Currently, the AD database includes >500 genes that could be associated with the risk of AD development;6 however, for almost half of them the data are contradictory.7

In this study, we sought to identify the genetic variation underlying AD in a Russian family by analyzing 249 genes that showed a positive association with AD according to the AlzGene database.6 Family N, from the Oryol region of Russia, has had cases of AD in at least three generations. A woman from the second generation of this family, like her mother, suffered from a severe form of dementia and died at the age of 72. One of her sons and her daughter also died from AD, at the ages of 79 and 78, respectively; two other sons were diagnosed with mild cognitive impairment (MCI). None of the fourth-generation family members has been diagnosed with AD or MCI so far (Figure 1 and Supplementary Table S1).

Figure 1

Pedigree of the studied family. Rectangles indicate male individuals and circles indicate female individuals. Symbols representing patients diagnosed with AD are filled in black; symbols representing patients diagnosed with MCI are filled in gray. Code names are shown for the patients analyzed in the study.

Two hundred and forty-nine Alzheimer-associated genes were chosen for the analysis (Supplementary Information). Target capture was performed using the SureSelect approach and covered the coding regions of the genes with 100-bp margins, the 5′ promoter regions (up to 1,500 bp upstream of the transcription start site) and the 3′ noncoding regions (up to 1,500 bp downstream of the mRNA polyadenylation site). Biotinylated RNA oligos complementary to the selected regions of the human genome were ordered from Agilent (Agilent Technologies, Santa Clara, CA, USA).
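The target definition above can be illustrated with a minimal sketch. It assumes a hypothetical `Transcript` record with 1-based inclusive coordinates and treats the rightmost or leftmost transcribed base as the polyadenylation site; none of these names come from the study, and the actual bait design was produced by the vendor's tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end); an assumption of this sketch


@dataclass
class Transcript:
    """Hypothetical transcript record used only for illustration."""
    tx_start: int                 # leftmost transcribed genomic base
    tx_end: int                   # rightmost transcribed genomic base
    coding_exons: List[Interval]  # coding parts of the exons


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def capture_intervals(t: Transcript, exon_margin: int = 100,
                      flank: int = 1500) -> List[Interval]:
    """Coding exons padded by exon_margin, plus flank bp on each side of the transcript.

    On the + strand the left flank is the promoter and the right flank is the
    3' region; on the - strand the roles swap, but the genomic intervals are
    the same, so strand is not needed to compute the union.
    """
    regions = [(max(1, s - exon_margin), e + exon_margin) for s, e in t.coding_exons]
    left = (max(1, t.tx_start - flank), t.tx_start - 1)
    right = (t.tx_end + 1, t.tx_end + flank)
    regions += [r for r in (left, right) if r[0] <= r[1]]
    return merge(regions)
```

Merging matters because the 100-bp margins of neighboring exons, or an exon margin and a flank, frequently overlap; without it the same bases would be counted twice when estimating target size.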

SureSelect Target Enrichment capture was performed according to the standard protocol. All seven samples were sequenced on the SOLiD 4 platform, and four samples (members of the third generation) were sequenced on both the SOLiD 4 and Illumina GAIIx platforms. Fifty-base pair reads were generated on both platforms.

The sequencing data generated by the GAIIx were mapped to the NCBI37 human reference genome (the version used for the 1000 Genomes project) with the BWA aligner. The sequencing data from the SOLiD platform were mapped to the same reference genome with the Bioscope software.

For SNV and indel calling, we followed the standard GATK ‘best-practices’ approach as described in Van der Auwera et al.8 Briefly, we first marked duplicates potentially originating from the same DNA fragment due to PCR amplification, that is, reads that had the same start and end coordinates after mapping. We next performed indel realignment to improve the original mapping by BWA/Bioscope. SNVs and indels were called simultaneously with the GATK HaplotypeCaller tool for all samples. The quality of the obtained SNVs was assessed by the GATK quality recalibration procedure, which is based on reference databases of confirmed SNVs and control sets of all potential variants. Only the variants that passed the filter were taken for further analysis.
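The duplicate criterion described above (identical start and end coordinates after mapping) can be sketched as follows. This is a simplified illustration, not the tool used in the study: the `Read` record is hypothetical, and both the grouping by strand and the rule of keeping the read with the highest mean base quality are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Read(NamedTuple):
    """Hypothetical mapped-read record used only for illustration."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str               # "+" or "-"; grouping by strand is an assumption
    mean_base_quality: float


def mark_duplicates(reads: Iterable[Read]) -> Set[str]:
    """Return names of reads flagged as duplicates.

    Reads sharing chromosome, start, end and strand are treated as copies of
    one DNA fragment; the copy with the highest mean base quality is kept
    (assumed tie-break rule) and the rest are flagged.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.start, read.end, read.strand)].append(read)

    duplicates: Set[str] = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.mean_base_quality)
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```

Flagged reads are typically ignored rather than deleted during variant calling, so that PCR copies of one fragment do not count as independent evidence for an allele.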

We confirmed that none of the patients had polymorphisms in the APOE, PSEN1, PSEN2 and APP genes, which are known to be the largest risk factors for AD.

In total, 14,819 SNVs and indels passed the GATK quality control procedure and had a non-reference genotype in at least one sample according to at least one sequencing platform. As all the diseased patients from the third generation (P3A, P3B, P3C, P3D) were genotyped on both the Illumina and SOLiD platforms, only the SNPs with the same genotype according to both sequencing technologies in every patient were considered. Among those, we considered 1,403 variants that had a non-reference genotype in all the samples diagnosed with AD (P3A, P3B, P3C, P3D, all four representing the third generation of the family). The variants were annotated with the ANNOVAR software according to their location with respect to genes and their effect on the protein product (synonymous, nonsense, missense).9 We considered only nonsynonymous variants affecting the protein sequence and discarded variants in introns, splice sites and promoters, although these might also be of interest. Variants found in dbSNP were assigned the appropriate rs numbers. Variants previously discovered in the 1000 Genomes project were annotated with their frequencies in the European population. Data on the clinical significance of SNVs were taken from the ClinVar and GWAS databases.10,11
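The cross-platform concordance filter described above can be sketched in a few lines. The data layout (one dictionary per platform mapping sample to variant to a VCF-style genotype string) and the function names are assumptions made for illustration, not the format used in the study.

```python
from typing import Dict, List, Optional, Tuple

Variant = Tuple[str, int, str, str]   # (chrom, pos, ref, alt); assumed key
Calls = Dict[str, Dict[Variant, str]]  # sample -> variant -> genotype such as "0/1"


def is_non_reference(genotype: Optional[str]) -> bool:
    """True if at least one called allele differs from the reference."""
    if genotype is None:
        return False
    alleles = genotype.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def shared_concordant_variants(illumina: Calls, solid: Calls,
                               affected: List[str]) -> List[Variant]:
    """Variants that, in every affected sample, have the same genotype on both
    platforms and a non-reference genotype."""
    candidates = set().union(*(illumina[s].keys() for s in affected))
    kept = []
    for variant in sorted(candidates):
        if all(
            illumina[s].get(variant) == solid[s].get(variant)
            and is_non_reference(illumina[s].get(variant))
            for s in affected
        ):
            kept.append(variant)
    return kept
```

Note that the genotype only has to agree between platforms within each patient; different patients may carry the variant in heterozygous or homozygous form.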

Among the 1,403 variants, we considered the 167 with a relatively low (<0.2) frequency in the European population. From these, we selected 11 nonsynonymous variants. We did not consider variants in genes that often harbor deleterious mutations in healthy individuals, such as HLA genes. Variants known to be associated with a clinical phenotype were considered to be of high interest. The variants of interest were validated with Sanger sequencing. The results of the described annotations are shown in Table 1.
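The prioritization steps above (frequency cutoff, nonsynonymous effect, exclusion of genes such as HLA) can be sketched as a single filter. The `AnnotatedVariant` record, the effect labels and the `HLA-` prefix rule are hypothetical. How variants absent from the 1000 Genomes data were handled is not stated in the text, so the `keep_unknown` switch is purely an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical effect labels; a real annotation tool uses its own vocabulary.
NONSYNONYMOUS_EFFECTS = {"missense", "nonsense"}


@dataclass
class AnnotatedVariant:
    """Hypothetical annotated-variant record used only for illustration."""
    gene: str
    effect: str                 # e.g. "missense", "nonsense", "synonymous"
    eur_freq: Optional[float]   # European allele frequency, None if unknown


def select_candidates(variants: Iterable[AnnotatedVariant],
                      max_freq: float = 0.2,
                      excluded_prefixes: Tuple[str, ...] = ("HLA-",),
                      keep_unknown: bool = True) -> List[AnnotatedVariant]:
    """Keep low-frequency nonsynonymous variants outside excluded gene families."""
    selected = []
    for v in variants:
        if v.eur_freq is None:
            if not keep_unknown:  # assumption: unknown frequency counts as rare
                continue
        elif v.eur_freq >= max_freq:
            continue
        if v.effect not in NONSYNONYMOUS_EFFECTS:
            continue
        if v.gene.startswith(excluded_prefixes):
            continue
        selected.append(v)
    return selected
```

Keeping the thresholds and the exclusion list as parameters makes it easy to check how sensitive the final candidate list is to each choice.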

Table 1 SNVs that passed quality filtering, have a low frequency (<0.2) in the European population and cause nonsynonymous changes in protein products

We found a rare SNP in the hemochromatosis (HFE) gene in all diseased patients. The variant is known to be associated with hemochromatosis according to the ClinVar database. The distribution of genotypes within the studied family corresponded to the AD status of all the patients: in particular, P4B and P4C were diagnosed as healthy and lacked the variant allele. Recent investigations emphasize that mutations in the transferrin (TF) and HFE genes, which are involved in iron metabolism, may enhance AD development.12,13 Mutations in these genes have been reported to be associated with iron accumulation in specific brain areas of patients with AD.14 HFE and TF mutations were shown to interact genetically, increasing the risk of AD in patients carrying both mutations.15

Here we report a case of AD with a mutated HFE gene in which the TF and APOE genes were not affected. The genotypes were checked in the NGS data (HFE and APOE) and validated with Sanger sequencing (HFE, APOE and TF). It was shown recently that the HFE C282Y (rs1800562) mutation on its own does not increase the chances of AD: the fraction of AD-affected individuals does not differ significantly between HFE C282Y-positive and wild-type cohorts.15 As HFE mutations were shown to cause AD cooperatively with mutations in TF or APOE, our data suggest that the family might carry a variant in a different gene that increases the risk of AD. This variant might affect a gene outside the scope of our targeted sequencing or lie in a non-coding DNA locus (splice site or promoter). Thus, broader exome sequencing might reveal a variant causing AD cooperatively with the mutation in HFE. As TF is involved in iron transfer, other genes participating in iron transfer, particularly the TF receptor (TFRC) gene, are the most interesting candidates for further sequencing.