Introduction

Inbreeding can lead to rare heritable illnesses conferred by homozygous recessive alleles. Reduced early survival among children of first-cousin marriages, together with similar observations in other organisms, points to an increased number of homozygous deleterious alleles in the genome.1,2 Inbreeding is highly prevalent worldwide, and differences in disease prevalence between populations can be partially attributed to differences in the extent of inbreeding.1,3

Conservative estimates of the prevalence of consanguineous marriages (defined as unions between individuals related as second cousins or closer) range from 1 to 10% among the 2,811 million people studied globally.3 Previous studies have found a strong association between the extent of inbreeding and reproductive health, childhood mortality, and rare Mendelian disorders.1,2,4,5 However, a multipopulation meta-analysis found only a moderate 1.1% increase in the infant death rate among the progeny of first cousins.6 In nonhuman species, the biological consequences of inbreeding are known to worsen with age.7,8 Studies examining the effects of inbreeding on late-onset complex diseases have found conflicting trends. Longer stretches of homozygosity have been observed in patients with breast, prostate, head and neck, and colorectal cancer,9,10 but these findings have not been consistently replicated.11,12,13,14 A recent study examining the concordance of Alzheimer disease (AD) raised the possibility that as much as 90% of early-onset AD cases result from autosomal recessive inheritance.15 Multiple risk loci for AD16 were detected in a consanguineous Israeli–Arab community from Wadi Ara (Israel). 
Unaffected healthy controls were found to be more inbred than cases in the Wadi Ara population, suggesting a higher frequency of protective alleles as a result of inbreeding.17 In an autopsy-confirmed AD data set comprising subjects from the Saguenay region of Quebec (Canada), subjects with late-onset AD carrying at least one APOE ɛ4 allele had higher levels of inbreeding (equivalent to first-cousin genomic sharing) than healthy controls.18 A recent study found larger and more numerous runs of homozygosity (ROHs) in Caribbean Hispanic patients with late-onset Alzheimer disease (LOAD) than in healthy controls.19 By contrast, two ROH studies of outbred Caucasian populations did not yield any significant associations with LOAD.20,21 In the current study, we estimated the level of inbreeding in Caribbean Hispanics, a population known to have higher rates of LOAD,22 and investigated the association between inbreeding and risk of LOAD. Evidence of such an association could help next-generation sequencing studies map disease-related genes.

Materials and Methods

Study population

Study participants were identified from two source populations of Caribbean Hispanic ancestry. The two parent studies are the Washington Heights-Inwood Columbia Aging Project (WHICAP)23 and the Estudio Familiar Influencia Genetica en Alzheimer (EFIGA) family study.24 The WHICAP study is a longitudinal study of a multiethnic cohort of elderly individuals residing in northern Manhattan (New York, New York). We recruited Medicare recipients who were at least 65 years of age, were free of dementia, and lived in three contiguous postal codes in northern New York City. The EFIGA study comprises a family-based study of Dominican families with multiple members affected by LOAD and a case–control study of unrelated patients with LOAD and similarly aged, unaffected, unrelated controls. Study participants were recruited from multiple sources, including clinics in the Dominican Republic and the Alzheimer’s Disease Research Center Memory Disorders Clinic at Columbia University in New York City. To augment family recruitment, we advertised in local newspapers and media in the Dominican Republic and New York. In addition, we recruited probands from the WHICAP study when the informant reported family members with dementia. Families were recruited as follows: once probands were identified, structured family history interviews were conducted to determine whether siblings and more distant relatives were affected with dementia. When probands had AD and other family members with dementia, we interviewed and neurologically evaluated all siblings and more distant relatives. We assessed and corrected for cryptic relatedness using genetic markers. The Caribbean Hispanic case–control study complements the EFIGA family study: its sampling frame was the same, but recruitment was restricted to affected and unaffected persons who were unrelated and had no family history of dementia. 
For these participants, we performed the same extensive medical, neurological, and neuropsychological evaluations at each visit. Clinical diagnoses were made at a consensus diagnostic conference by a panel of neurologists, neuropsychologists, and psychiatrists. LOAD was diagnosed according to the criteria established by the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association.25 Additional demographic and epidemiological information was available for all genotyped individuals (Table 1). To estimate inbreeding in Caribbean Hispanics, we selected age-matched unrelated cases and controls from the WHICAP and EFIGA studies and one case per family from the EFIGA family study.

Table 1 Demographic information

Genotyping

Previously, an Illumina HumanHap650Y SNP chip was used for a genome-wide association study (GWAS) of 1,094 Caribbean Hispanics.26 This study consisted primarily of whole-blood samples, with 0.16% of samples from saliva. DNA was extracted from blood using the Qiagen method and from saliva using the Oragene method. Samples were genotyped in batches corresponding to 96-well plates. Each plate contained one or two HapMap controls and, on average, two duplicated study samples. The DNA samples were genotyped at the Center for Inherited Disease Research on the Illumina HumanOmni1-Quad v1.0 H array (http://www.illumina.org), with genotypes called using GenomeStudio version 2011.1, Genotyping Module 1.9.4, and GenTrain version 1.0 (Illumina, San Diego, CA). Positions are reported on genome build 37 (hg19). We combined the previously published GWAS data with the samples genotyped on the Illumina HumanOmni1-Quad chip to create a large data set for estimating the degree of inbreeding in the Caribbean Hispanic population.

GWAS quality control

We used the quality assurance/quality control protocol described by Laurie et al.27 to ensure consistency of the data. We excluded samples with missing genotypes for more than 2% of the single-nucleotide polymorphisms (SNPs) in the GWAS panel, as well as SNPs with a genotype missingness rate above 5% or a minor allele frequency <0.05.
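
The exclusion rules above can be expressed as boolean filters on a genotype matrix. The sketch below is a minimal illustration, not the Laurie et al. protocol itself: it assumes genotypes coded as 0/1/2 allele counts with NaN for missing calls, applies the sample filter before the SNP filters (an ordering we chose), and uses a hypothetical helper named `gwas_qc`.

```python
import numpy as np


def gwas_qc(genotypes, max_sample_missing=0.02, max_snp_missing=0.05, min_maf=0.05):
    """Return boolean masks (samples_kept, snps_kept) for a genotype matrix.

    genotypes: (n_samples, n_snps) array of allele counts 0/1/2,
    with np.nan marking missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    # Sample filter: drop samples missing more than max_sample_missing of SNPs
    samples_kept = np.isnan(g).mean(axis=1) <= max_sample_missing
    g_kept = g[samples_kept]
    # SNP filters, computed on the retained samples only (our assumption)
    snp_missing = np.isnan(g_kept).mean(axis=0)
    freq = np.nanmean(g_kept, axis=0) / 2.0   # allele frequency per SNP
    maf = np.minimum(freq, 1.0 - freq)         # minor allele frequency
    snps_kept = (snp_missing <= max_snp_missing) & (maf >= min_maf)
    return samples_kept, snps_kept


# Hypothetical toy data: 4 samples x 4 SNPs
g = np.array([
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [np.nan, np.nan, 2, 0],  # this sample is missing 50% of its calls
    [2, 0, 2, 0],
])
samples, snps = gwas_qc(g)
print(samples.tolist())  # [True, True, False, True]
print(snps.tolist())     # [True, True, False, False]: SNPs 3 and 4 are monomorphic
```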

Statistical analyses

We pruned the genome-wide SNP data based on linkage disequilibrium using PLINK,28 retaining 177,997 tagging SNPs with pairwise r² < 0.3. We estimated the inbreeding coefficient of each sample using the GCTA software.29 GCTA gives two estimates of the relationship between the two haplotypes within an individual: one based on the variance of additive genetic values (the diagonal of the SNP-derived genetic relationship matrix) and the other based on SNP homozygosity (also implemented in PLINK).28 Here we report the second measure (on average, the two metrics gave similar results). When a GWAS sample has population structure, relatedness estimators that assume a homogeneous population can give severely biased estimates. We therefore used the Relatedness Estimation in Admixed Populations (REAP) software30 to estimate average inbreeding in the Caribbean Hispanic population while adjusting for admixed ancestry. REAP takes as input the ancestry proportions of each sample and estimates autosomal kinship coefficients and identity-by-descent sharing probabilities from SNP genotype data in samples with admixed ancestry. We estimated each sample's ancestry proportions with the ADMIXTURE software,31 which computes maximum likelihood estimates of individual ancestries from multilocus SNP genotype data, assuming in turn that Caribbean Hispanic admixture derives from two, three, four, and five parental populations. We then used the estimated parental population proportions and allele frequencies as input to REAP to compute admixture-adjusted inbreeding coefficients.
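
Why adjusting for admixture matters can be seen in a small simulation. The sketch below computes an excess-homozygosity estimate of F per individual, averaging the per-SNP term 1 - x(2 - x)/[2p(1 - p)] (x = allele count, p = allele frequency), which to our understanding is the formula GCTA reports as Fhat2. It does this twice: once with a single pooled frequency per SNP, and once with individual-specific frequencies p_ij = Σ_k q_ik P_kj built from ancestry proportions q (the kind of matrix ADMIXTURE writes to its Q file) and ancestral allele frequencies P. This substitution conveys the idea behind admixture-adjusted estimation but is not REAP's actual estimator; the helper names and all simulated values are hypothetical.

```python
import numpy as np


def fhat2(genotypes, freqs):
    """Excess-homozygosity inbreeding estimate per individual.

    genotypes: (n_individuals, n_snps) allele counts 0/1/2.
    freqs: shape (n_snps,) for one population-wide frequency per SNP, or
        (n_individuals, n_snps) for individual-specific frequencies.
    """
    x = np.asarray(genotypes, dtype=float)
    p = np.broadcast_to(np.asarray(freqs, dtype=float), x.shape)
    # x * (2 - x) is 1 for heterozygotes and 0 for either homozygote
    per_snp = 1.0 - x * (2.0 - x) / (2.0 * p * (1.0 - p))
    return per_snp.mean(axis=1)


def individual_freqs(ancestry, pop_freqs):
    """p_ij = sum_k q_ik * P_kj from ancestry proportions and ancestral frequencies."""
    return np.asarray(ancestry) @ np.asarray(pop_freqs)


# Toy simulation: two ancestral populations with different allele
# frequencies; individuals are NOT inbred (Hardy-Weinberg genotypes).
rng = np.random.default_rng(0)
n_per_pop, n_snps = 100, 5000
pop_freqs = rng.uniform(0.1, 0.9, size=(2, n_snps))
ancestry = np.repeat(np.eye(2), n_per_pop, axis=0)  # 100 from each population
p_ind = individual_freqs(ancestry, pop_freqs)
genotypes = rng.binomial(2, p_ind)

# Naive: one pooled frequency per SNP (assumes a homogeneous sample)
p_pooled = np.clip(genotypes.mean(axis=0) / 2.0, 1e-3, 1 - 1e-3)
f_naive = fhat2(genotypes, p_pooled)
# Adjusted: each individual's expected frequency given their ancestry
f_adjusted = fhat2(genotypes, p_ind)

print(f"naive mean F:    {f_naive.mean():.3f}")
print(f"adjusted mean F: {f_adjusted.mean():.3f}")
```

Because the simulated individuals are not inbred, the pooled-frequency estimate is inflated purely by population structure (a Wahlund effect), whereas the ancestry-adjusted estimate stays near zero.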

Results

Without accounting for admixture, the average inbreeding coefficient in the Caribbean Hispanics (computed using GCTA) was 0.018 (±0.048), suggesting substantial inbreeding; 1,372 (40.4%) of the 3,392 subjects had an inbreeding coefficient greater than 0.02 (Supplementary Figure S1 online). We then computed the inbreeding coefficient accounting for admixture from two, three, four, and five parent populations (Supplementary Figure S2 online). Traditional methods assume a homogeneous population, which can substantially inflate inbreeding estimates in admixed samples. REAP instead adjusts for ancestry-specific allele frequencies at each SNP to calculate admixture-adjusted inbreeding coefficients. Caribbean Hispanics are thought to derive from Caucasian, African, Asian, and American Indian ancestries, but the exact number of parental populations is unknown. Hence, we used three to five ancestral populations as input to adjust for admixture in Caribbean Hispanics (Supplementary Table S1 online). After adjustment for admixture, the average inbreeding coefficient decreased, ranging from 0.0034 (±0.019) for three parent populations to 0.002 (±0.018) for five parent populations. Assuming admixture from three parent populations, 329 (9.7%) of the 3,392 samples were highly inbred (F > 0.02). Supplementary Figure S3 online shows the admixture in Caribbean Hispanics originating from Caucasian, African, and Asian ancestries; 47.4% of the samples in the data set have predominantly Caucasian ancestry. We used the admixture-adjusted inbreeding coefficients obtained from REAP in all subsequent analyses. The mean inbreeding coefficient was highest among those of Caucasian ancestry (F = 0.0066; roughly equivalent to mating between second cousins once removed), followed by those with African ancestry (F = 0.0014; Table 2). 
Supplementary Figure S4 online shows the overall distribution of inbreeding coefficients by computed ancestry. Samples with predominantly African ancestry were less inbred than individuals with a substantial proportion of either of the other two ancestries. Among individuals of Caucasian ancestry, 14% had a high inbreeding coefficient (F > 0.02), compared with 4.8% of individuals of Asian ancestry and 4.8% of individuals of African ancestry.
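
To put these F values in perspective, the expected F of an offspring of related parents follows from Wright's path-counting formula, F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A), summed over common ancestors, where n1 and n2 count generations from each parent up to ancestor A. The sketch below assumes non-inbred common ancestors (F_A = 0); the helper name is hypothetical. It gives 1/128 ≈ 0.0078 for second cousins once removed, close to the Caucasian-ancestry mean of 0.0066, and places the F > 0.02 threshold between second cousins (1/64 ≈ 0.0156) and first cousins once removed (1/32 ≈ 0.031).

```python
from fractions import Fraction


def inbreeding_from_paths(paths):
    """Wright's path-counting inbreeding coefficient of an offspring.

    paths: one (n1, n2, f_ancestor) tuple per common ancestor, where n1 and n2
    are the generations from each parent up to that ancestor.
    """
    return sum(
        (Fraction(1, 2) ** (n1 + n2 + 1) * (1 + f_a) for n1, n2, f_a in paths),
        Fraction(0),
    )


# Each mating below shares two (non-inbred) common ancestors, hence two paths
relationships = {
    "first cousins": [(2, 2, 0)] * 2,
    "first cousins once removed": [(2, 3, 0)] * 2,
    "second cousins": [(3, 3, 0)] * 2,
    "second cousins once removed": [(3, 4, 0)] * 2,
}
for name, paths in relationships.items():
    f = inbreeding_from_paths(paths)
    print(f"{name:28s} F = {f} = {float(f):.4f}")
```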

Table 2 Inbreeding coefficient by ancestry of the samples

We then tested the association of inbreeding with age, LOAD status, and age at onset of the disease using logistic or linear regression models as appropriate (Table 3). For each sample, we included the proportions of Asian and African ancestry as covariates (with Caucasian ancestry as the reference). Age was weakly and inversely correlated with the inbreeding coefficient, but the correlation was not statistically significant.
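
As a minimal sketch of the linear models described above, the following fits ordinary least squares of age at onset on F, with the African and Asian ancestry proportions as covariates. The Caucasian proportion is left out because the three proportions sum to 1 and would be collinear with the intercept; omitting it is what treating Caucasian ancestry as the reference amounts to. The data are simulated and hypothetical, `ols_with_pvalues` is our own helper, and its p-values use a normal approximation to the t distribution that is adequate only for large samples.

```python
import math
import numpy as np


def ols_with_pvalues(y, covariates, names):
    """OLS with an intercept; returns {name: (beta, two-sided p-value)}.

    p-values use a normal approximation to the t distribution,
    which is reasonable only for large samples.
    """
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    result = {}
    for name, b, s in zip(["intercept"] + list(names), beta, se):
        p = math.erfc(abs(b / s) / math.sqrt(2))      # 2 * (1 - Phi(|z|))
        result[name] = (float(b), p)
    return result


# Hypothetical simulated data for 2,000 cases
rng = np.random.default_rng(1)
n = 2000
f = np.abs(rng.normal(0.0, 0.02, n))      # inbreeding coefficients
african = rng.uniform(0.0, 0.5, n)        # ancestry proportions; the Caucasian
asian = rng.uniform(0.0, 0.1, n)          # share is 1 - african - asian (reference)
onset = 78 - 30 * f + 2 * african + rng.normal(0.0, 6.0, n)  # age at onset, years

res = ols_with_pvalues(onset, [f, african, asian], ["F", "african", "asian"])
for name, (b, p) in res.items():
    print(f"{name:10s} beta = {b:8.3f}  p = {p:.3g}")
```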

Table 3 Association of inbreeding with age

Inbreeding was a significant predictor of LOAD after adjustment for age, sex, and population covariates (P = 0.034; Table 3). Including the presence or absence of the APOE ɛ4 allele as a covariate slightly strengthened the association of the inbreeding coefficient with LOAD (P = 0.03). This could imply that the level of inbreeding is correlated with APOE ɛ4 status and that the residual effect of inbreeding on LOAD risk after adjustment for APOE genotype could be attributable to other recessive loci. To examine the relationship between APOE and inbreeding and its bearing on AD, we regressed the inbreeding coefficient on APOE ɛ4 status separately in cases and controls. Inbreeding was associated with the number of ɛ4 alleles in unaffected subjects (Supplementary Table S2 online) but not in affected subjects. Moreover, the effect ran in opposite directions: greater inbreeding tended to accompany more ɛ4 copies in cases but fewer ɛ4 copies in controls. This prompted us to test a model including an interaction between APOE and inbreeding on LOAD status. The interaction term between inbreeding and APOE was significantly associated with LOAD status (P = 4.04 × 10⁻³).
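
To make the interaction model concrete, the sketch below fits a logistic regression of LOAD status on F, APOE ɛ4 carrier status (coded 0/1, our assumption), their product F × ɛ4, age, sex, and ancestry proportions, using Newton-Raphson (iteratively reweighted least squares) in NumPy, with Wald p-values from the inverse Fisher information. The simulated cohort, its effect sizes, and the helper `logistic_fit` are hypothetical and do not reproduce the estimates reported here; in practice a standard GLM routine would typically be used.

```python
import math
import numpy as np


def logistic_fit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).

    X must already include an intercept column. Returns (beta, standard errors).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))         # fitted probabilities
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))   # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


# Hypothetical simulated cohort; effect sizes are arbitrary
rng = np.random.default_rng(2)
n = 3000
f = np.abs(rng.normal(0.0, 0.02, n))   # inbreeding coefficient
apoe4 = rng.binomial(1, 0.3, n)        # APOE e4 carrier status (0/1)
age = rng.uniform(65, 90, n) - 75.0    # age, centered at 75 years
sex = rng.binomial(1, 0.6, n)
african = rng.uniform(0.0, 0.5, n)
asian = rng.uniform(0.0, 0.1, n)
logit = -0.5 + 10 * f + 1.0 * apoe4 + 30 * f * apoe4 + 0.07 * age + 0.1 * sex
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

names = ["intercept", "F", "apoe4", "F:apoe4", "age", "sex", "african", "asian"]
X = np.column_stack([np.ones(n), f, apoe4, f * apoe4, age, sex, african, asian])
beta, se = logistic_fit(X, y)
for name, b, s in zip(names, beta, se):
    p = math.erfc(abs(b / s) / math.sqrt(2))   # two-sided Wald p-value
    print(f"{name:10s} beta = {b:8.3f}  p = {p:.3g}")
```

The interaction is entered simply as an extra column, the elementwise product of F and carrier status, so its coefficient measures how much the log-odds slope for F differs between ɛ4 carriers and noncarriers.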

A higher inbreeding coefficient was associated with an increased risk of LOAD, which is consistent with findings in other complex diseases, including coronary heart disease, stroke, cancer, depression, asthma, type 2 diabetes, and gout.32 In contrast to studies of other complex diseases, inbreeding did not significantly affect age at onset of LOAD, although age at onset tended to be lower with increased inbreeding in this data set (Tables 3 and 4).

Table 4 Association of inbreeding with LOAD status

Discussion

Our estimates of the extent of inbreeding in Caribbean Hispanics were consistent with previous reports in this population.19 After accounting for admixture, we show that the extent of inbreeding is lower than that expected from second-cousin mating but greater than that of outbred populations in which consanguineous marriages occur at low frequency. Reported inbreeding coefficients range from 0.00004 to 0.00007 in Canada (Roman Catholics), from 0 to 0.0008 in the United States (Roman Catholics), from 0 to 0.003 in Latin America, and from 0.001 to 0.002 in southern Europe, and are approximately 0.005 in Japan.33 Compared with that of the Samaritans (F = 0.04), a 3,000-year-old genetic isolate comprising only 500 people, the inbreeding observed in Caribbean Hispanics is at an intermediate level.34 Despite the higher risk that inbreeding might confer for complex late-onset traits,7,8 its role in LOAD has not been well studied. In this study we demonstrate a statistically significant association between the extent of inbreeding and AD risk. This is consistent with the hypothesis that a substantial proportion of risk in complex diseases such as LOAD could be mediated through multiple recessive causal loci that become homozygous more often in inbred subjects. It is also consistent with our previous finding of larger and more numerous ROHs in LOAD patients (n = 559) than in controls (n = 554)19 from the same population. In that study, LOAD was associated with a larger genome-wide mean ROH size (P = 0.0039), and the association was stronger for familial LOAD (P = 0.0005); however, studies of Caucasian data sets have not reported an increased burden of ROH in AD.20,35 A likely explanation is that the substantial inbreeding in Caribbean Hispanics documented in this report increases the likelihood that affected subjects are homozygous for recessive alleles, resulting in longer and more numerous ROHs in the genome. 
Interestingly, that study also found that total ROH size was twice as large in the European Hispanic subset as in the African Hispanic subset, which is corroborated by our observation of a higher inbreeding coefficient in the Caucasian subset of the data (Table 2). The higher level of inbreeding in the Caucasian subset is likely to render larger regions of the genome homozygous than in the African subset of this population.
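
The link between F and homozygous stretches can be illustrated with the genomic measure F_ROH, the fraction of the genome covered by runs of homozygosity. The sketch below is a deliberately simple run finder for illustration only: any heterozygous call ends a run, missing calls are not handled, and the thresholds (`min_snps`, `min_length_bp`) and helper names are hypothetical. Dedicated tools such as PLINK's --homozyg instead scan sliding windows and tolerate a few heterozygous or missing calls, and the ROH results cited above come from a separate analysis that this sketch does not reproduce.

```python
import numpy as np


def find_rohs(positions, genotypes, min_snps=50, min_length_bp=1_000_000):
    """Return (start_bp, end_bp) for runs of consecutive homozygous SNPs.

    positions: sorted base-pair positions of SNPs on one chromosome.
    genotypes: allele counts (0/1/2) for one individual, no missing calls.
    A run is kept if it contains at least min_snps SNPs and spans at least
    min_length_bp. Any heterozygous call ends a run.
    """
    hom = np.asarray(genotypes) != 1
    runs, start = [], None
    for i, is_hom in enumerate(np.append(hom, False)):  # sentinel closes a final run
        if is_hom and start is None:
            start = i
        elif not is_hom and start is not None:
            end = i - 1
            long_enough = positions[end] - positions[start] >= min_length_bp
            if end - start + 1 >= min_snps and long_enough:
                runs.append((int(positions[start]), int(positions[end])))
            start = None
    return runs


def f_roh(runs, covered_length_bp):
    """Fraction of the covered genome lying in ROHs."""
    return sum(end - start for start, end in runs) / covered_length_bp


# Hypothetical chromosome: 1,000 SNPs spaced 10 kb apart, heterozygous
# everywhere except a homozygous stretch at SNP indices 200-499
positions = np.arange(1_000) * 10_000
genotypes = np.ones(1_000, dtype=int)
genotypes[200:500] = 0
runs = find_rohs(positions, genotypes)
print(runs)  # [(2000000, 4990000)]
print(round(f_roh(runs, positions[-1] - positions[0]), 4))
```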

The high degree of inbreeding and the presence of long ROHs, combined with the higher frequency of AD in the Caribbean Hispanic population compared with Caucasians,22 suggest that one or more recessive loci may mediate AD risk in this population. Low-frequency variants are hypothesized to confer greater disease risk than common variants and, collectively, to account for a substantial fraction of common-disease heritability.36 In inbred populations with few founders, such as the Caribbean Hispanics, large chromosomal segments recur among relatives, so otherwise rare alleles can be observed repeatedly in multiple individuals. This reduces false-positive findings due to sequencing errors, which can be difficult to identify in isolated cases from outbred populations.

Disclosure

The authors declare no conflict of interest.