Abstract
Linkage studies of complex genetic diseases have been largely replaced by genome-wide association studies, due in part to limited success in complex trait discovery. However, recent interest in rare and low-frequency variants motivates re-examination of family-based methods. In this study, we investigated the performance of two-point linkage analysis for over 1.6 million single-nucleotide polymorphisms (SNPs) combined with single variant association analysis to identify high impact variants, which are both strongly linked and associated with cardiometabolic traits in up to 1414 Hispanics from the Insulin Resistance Atherosclerosis Family Study (IRASFS). Evaluation of all 50 phenotypes yielded 83 557 000 LOD (logarithm of the odds) scores, with 9214 LOD scores ⩾3.0, 845 ⩾4.0 and 89 ⩾5.0, with a maximal LOD score of 6.49 (rs12956744 in the LAMA1 gene for tumor necrosis factor-α (TNFα) receptor 2). Twenty-seven variants were associated with P<0.005 as well as having an LOD score >4, including variants in the NFIB gene under a linkage peak with TNFα receptor 2 levels on chromosome 9. Linkage regions of interest included a broad peak (31 Mb) on chromosome 1q with acute insulin response (max LOD=6.37). This region was previously documented with type 2 diabetes in family-based studies, providing support for the validity of these results. Overall, we have demonstrated the utility of two-point linkage and association in comprehensive genome-wide array-based SNP genotypes.
Introduction
Family-based linkage analysis has largely been supplanted by genome-wide association studies, often using unrelated samples, following the limited success of linkage when applied to complex traits. Family-based analyses, however, have inherent strengths, which complement other approaches for identification of contributors to complex phenotypes.1, 2 Such analyses may be especially applicable to identifying low-frequency (minor allele frequency (MAF) 0.01–0.05) to rare (MAF<0.01) alleles with high impact.3, 4, 5, 6, 7, 8 We have implemented approaches in parallel, which use simple two-point linkage analysis and conventional association analysis to search for genetic variants with meaningful contributions to phenotypic variance of traits. Two-point linkage analysis considers each variant independently, unlike multipoint analysis, which integrates the information from multiple variants simultaneously. Therefore, two-point linkage does not have the same issues with inflation because of linkage disequilibrium (LD) between markers and can be used to test putatively impactful variants for linkage directly. The combined two-point linkage and association approach has the advantage of being able to directly align single-nucleotide polymorphism (SNP) results for the two analyses, pinpointing variants that show evidence of both linkage and association at the single SNP level. In prior studies, this has been applied to exome chip data, thus focusing on coding variants9 and characteristics of a functional SNP.10
Evaluation of association in the context of linkage has an extensive history,11, 12, 13 with association typically used to determine whether genetic variants residing under the linkage peak explain the observed signal. We have observed that instances of strong linkage and association together at a single locus (e.g. APOE with ApoB levels, CETP (cholesterol ester transfer protein) with high-density lipoprotein (HDL) levels, ADIPOQ with adiponectin levels)9, 10 represent variants or loci that have a striking impact on phenotype, reflected as explanation of a high proportion of the variance of the trait (3–60%). We have also observed this across a range of minor allele frequencies (1–45%), indicating that this approach can be informative for a full range of genetic variation. Other groups have used combined metrics of linkage and association to identify variants with large impact;11 however, that is a project currently undergoing evaluation separate from these analyses.
Here we have investigated the performance of these approaches in a contemporary genetic data set consisting of comprehensive genome-wide and exome chip data encompassing 1.6 million SNPs in 90 Hispanic families from the Insulin Resistance Atherosclerosis Family Study (IRASFS). Based on our prior work and recent evidence for the existence of high impact noncoding variants,14 we hypothesize this family-based method is applicable to the search for such variants.
Materials and methods
Samples and phenotype data
The samples used in this study are from the Hispanic cohorts of the IRASFS.15 Briefly, subjects were ascertained on the basis of large family size in San Luis Valley, Colorado and San Antonio, Texas. The sample consisted of 1425 individuals from 90 families, who were extensively phenotyped, including a frequently sampled intravenous glucose tolerance test, measures of blood lipids and inflammatory markers, anthropometric measures, as well as fat deposition measures by computed tomography and dual X-ray absorptiometry scans. Institutional Review Board approval was obtained at all clinical and analysis sites, and all participants provided informed consent.
Genotype data
SNP genotype data from three genotyping chips were used. Illumina OmniExpress and Illumina Omni1S chips were genotyped as part of the Genetics Underlying Diabetes in Hispanics (GUARDIAN) Consortium (N=1034 and 1038, respectively),16 and the Illumina HumanExome Beadchip was genotyped on a larger subset (N=1414)9 of the full IRASFS Hispanic cohorts. Genotyping of the Illumina HumanExome BeadChip v.1.0 (N=552) and v.1.1 (N=862) was performed at the Wake Forest Center for Genomics and Personalized Medicine Research, whereas the Illumina HumanOmniExpress BeadChip and Illumina Omni1S BeadChip were genotyped at the core genotyping laboratory at Cedars-Sinai Medical Center (Los Angeles, CA, USA). All genotypes were called separately by genotyping array using GenomeStudio (Illumina, San Diego, CA, USA). Sample and autosomal SNP call rates were ⩾0.98 (>0.99 SNP call rates for the OmniExpress and Omni1S chips), and Exome Chip SNPs with poor cluster separation (<0.35) were excluded. All data sets independently underwent Mendelian error checking using PedCheck17 to detect genotypes discordant in families for Mendelian inheritance, with resolution by removing all inconsistent genotypes. The total number of unique SNPs available for analysis following quality control was as follows: 81 559 from the Exome Chip, 668 758 from OmniExpress and 920 823 from the Omni1S chip, for a total of 1 671 140 SNPs.
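As an illustration of the kind of Mendelian consistency check described above, the following is a minimal sketch for a single autosomal SNP in one parent-offspring trio. It is not PedCheck's algorithm, which operates on full pedigrees; the function name and the genotype encoding (pairs of allele labels, with None for a missing genotype) are assumptions made for this example.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype could arise from the parents'.

    Genotypes are 2-tuples of allele labels, e.g. ("A", "G").
    None marks a missing genotype, which is treated as uninformative.
    """
    if father is None or mother is None or child is None:
        return True
    # Every unordered pair formed by one paternal and one maternal allele.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


print(is_mendelian_consistent(("A", "A"), ("A", "G"), ("A", "G")))  # True
print(is_mendelian_consistent(("A", "A"), ("A", "A"), ("A", "G")))  # False
```

In a pipeline of the sort described, genotypes flagged as inconsistent would be set to missing before linkage and association analysis.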
Imputation to the 1000 Genomes integrated reference panel (version 2) was performed using genotypes and samples from the OmniExpress data set (N=634K genotypes and 1034 individuals) using SHAPEIT18 for phasing and IMPUTE219 for imputation.
Analyses
SNPs were evaluated for both two-point family-based linkage and single SNP association using Sequential Oligogenic Linkage Analysis Routines (SOLAR)20 separately by genotyping platform. Both analyses used age, sex, body mass index (BMI) and study center as covariates. All phenotypes evaluated were transformed to approximate normality of the residuals if necessary (Supplementary Table 1). Additionally, because of the high impact of a low-frequency variant known to influence adiponectin levels in this population,3, 10 presence of the variant encoding the G45R missense mutation in ADIPOQ (rs200573126) was included as a covariate for analyses involving adiponectin. Visceral adipose tissue (VAT) area, visceral-to-subcutaneous tissue ratio (VSR), waist circumference and waist-to-hip ratio were run both with and without BMI as a covariate. However, subcutaneous adipose tissue area, percent body fat and body adiposity index were not adjusted for BMI. All association analyses included three admixture proportions as covariates. Existing admixture proportion estimates were available from previously genotyped exome chip data; estimates were computed by maximum-likelihood estimation of individual ancestries in ADMIXTURE21 assuming five ancestral populations (K=5) from exome chip-wide SNP data after pruning for LD to produce admixture estimates for the greatest number of samples. Of the five variables considered, three variables were selected as representing the variation in these Hispanic samples, as inclusion of additional postulated ancestral populations began isolating individual pedigrees.
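For readers less familiar with the LOD scale, a LOD score is the base-10 logarithm of the ratio of the likelihood under linkage to the likelihood under the null model of no linkage. The sketch below converts a pair of natural-log likelihoods into a LOD score; the function name and inputs are hypothetical and do not correspond to any SOLAR command or output format.

```python
import math


def lod_from_loglik(loglik_linked: float, loglik_null: float) -> float:
    """Convert two natural-log likelihoods into a LOD score.

    LOD = log10(L_linked / L_null) = (lnL_linked - lnL_null) / ln(10)
    """
    return (loglik_linked - loglik_null) / math.log(10)


# A likelihood ratio of 1000:1 in favour of linkage corresponds to LOD = 3.
example = lod_from_loglik(-1000.0, -1000.0 - math.log(1000))
print(round(example, 6))  # 3.0
```

On this scale, the thresholds of 3, 4 and 5 used throughout correspond to likelihood ratios of 10^3, 10^4 and 10^5 in favour of linkage.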
For validation of performance, genotypes imputed to the 1000 Genomes panel were also evaluated for linkage (and association) in two regions, which were selected for their linkage regions as well as being phenotypically of particular interest to our group: chromosome 1 for acute insulin response (AIR) to glucose and chromosome 7 for insulin sensitivity index (SI). Best guess genotypes from the imputed data were used in the linkage analysis because methods that account for imputation uncertainty have not been developed for linkage. These analyses used the same covariates as previously mentioned.
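The conversion from imputed genotype probabilities to best guess genotypes can be sketched as follows, assuming each imputed genotype is available as three probabilities, P(AA), P(AB) and P(BB). The function name and the optional min_prob cut-off are hypothetical; no probability threshold is stated for these analyses.

```python
def best_guess_genotype(probs, min_prob=0.0):
    """Return the most probable genotype as a count of B alleles (0, 1 or 2).

    probs    -- sequence (P(AA), P(AB), P(BB)) for one individual at one SNP
    min_prob -- hypothetical cut-off; calls below it are returned as None
    """
    if len(probs) != 3:
        raise ValueError("expected three genotype probabilities")
    best = max(range(3), key=lambda g: probs[g])
    if probs[best] < min_prob:
        return None  # too uncertain to call; treat as missing
    return best


print(best_guess_genotype((0.10, 0.70, 0.20)))                 # 1
print(best_guess_genotype((0.40, 0.35, 0.25), min_prob=0.9))   # None
```

Hard calls of this kind discard the imputation uncertainty, which is the limitation noted above for linkage analysis of imputed data.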
Results
The aim of this analysis was to test the utility of carrying out a combined linkage and association analysis in a contemporary data set made up of genome-wide association studies (GWAS) (Illumina OmniExpress and Omni1S) and exome chip data encompassing over 1.6 million SNPs. The combined performance was evaluated for a total of 50 quantitative traits from 7 phenotypic groups: glucose homeostasis, adiposity, lipids, biomarkers, hypertension, liver enzymes and liver fat, in 90 families from the IRASFS with an average family size of 15.4 individuals. Overall, 83 557 000 LOD (logarithm of the odds) scores and association P-values were calculated across the three genotyping sets.
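The total number of tests follows directly from the post-quality-control SNP counts reported in Materials and methods and the number of phenotypes, assuming every SNP was evaluated against every phenotype. A short check of the arithmetic:

```python
# SNP counts after quality control, as reported for each genotyping chip.
snps_per_chip = {
    "Exome Chip": 81_559,
    "OmniExpress": 668_758,
    "Omni1S": 920_823,
}
n_phenotypes = 50

total_snps = sum(snps_per_chip.values())
total_tests = total_snps * n_phenotypes  # one LOD score per SNP-phenotype pair

print(total_snps)   # 1671140
print(total_tests)  # 83557000
```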
Characteristics of the samples and genotyping are summarized in Table 1. The sample consisted of 1418 individuals from 90 families. Specifically, for the smallest genotyped sample (OmniExpress), sample sizes ranged from 786 (percent body fat) to 1034 (AIR), although larger sample sizes were available for SNPs present on the exome chip (up to 1256 for fibrinogen and albumin/creatinine ratio). Across all phenotypes, there were 9214 LOD scores ⩾3, 845 ⩾4 and 89 ⩾5. Of the variants with LOD scores ⩾5.0, 27 were linked to tumor necrosis factor-α (TNFα) receptor 2 levels, 13 to HDL levels, 24 to AIR, 13 to G45R-adjusted adiponectin levels and 3 to BMI-adjusted VSR. While a detailed summary of each trait analysis is impractical, following on our earlier observations,9, 10 we have initially focused on the patterns visible in linkage analysis followed by relating these results to association analysis results. In this report, we evaluated linkage and association with 50 cardiometabolic phenotypes (see Supplementary Table 1 for complete listing). Selected phenotypes, namely TNFα receptor 2 levels, HDL levels, AIR, adiponectin levels (adjusted for G45R, a high impact mutation identified previously in these samples3, 10) and VSR are summarized in Table 1. Overall, 12 phenotypes (from four phenotype groups: glucose homeostasis, lipids, adiposity and biomarkers) were represented in this category of LOD scores >5.0 (results are summarized in Table 2), where highest LOD scores are grouped by phenotype and chromosome. A complete summary of LOD scores >5 is presented in Supplementary Table 2.
Evaluation of loci with high LOD scores
The overall maximal LOD score of 6.49 was observed with rs12956744 with the biomarker TNFα receptor 2 levels (Table 3 and Figure 1a). This SNP is located in intron 1 (nearer the 5′ end) of LAMA1 (laminin subunit alpha-1 gene) on chromosome 18. Of note, three additional intronic variants in LAMA1 were also linked to TNFα receptor 2 levels with LOD>6, and nine SNPs overall were linked with LOD>3 (Table 3). Notably, one SNP (rs28569884) was also associated with TNFα receptor 2 levels (P-value=5.9 × 10−4; LOD=1.06). The variant rs28569884 (in intron 56) is distal to the striking linkage signal (146 kb apart), although there was another LOD score over 4 (rs4395154; LOD=4.47) just 13 kb away at the 3′ end of the LAMA1 gene (intron 62). LAMA1 is a very large gene, with 63 exons and 245 SNPs analyzed. Of these, 11 (4.4%) had nominally significant association (P-value<0.05) with TNFα receptor 2 levels. Comparatively, 9 variants had LOD scores >3 (3.7%) and 23 variants had LOD >1 (9.4%).
A major focus of our laboratory is identifying genetic contributors to metabolic measures of glucose homeostasis. The top linkage result of LOD=6.47 (Table 4) for AIR was rs28479408, an intronic variant located in SYCP2L (synaptonemal complex protein 2-like gene) on chromosome 6 (Figure 1b). Although this variant was not associated with AIR (P-value=0.71), six other SNPs in this gene were also linked (rs4713044, LOD=6.10; rs12190237, LOD=5.58; rs12214063, LOD=3.58; rs1767771, LOD=3.42; rs2153159, LOD=3.31; rs1632103, LOD=3.15) but not associated (P-values >0.5) (Table 4).
Strikingly, chromosome 1 had a broad linkage peak for AIR, with a maximal LOD score of 6.37 (rs2252384) in the region between FAM163A and TOR1AIP2 (located at ~179 Mb; 1q25.2; Figure 1b and Table 5). Chromosome 1 has a long history of linkage to diabetes, making this result all the more interesting.22, 23, 24, 25 Here, variants with LOD scores >3 spanned much of the proximal q arm of the chromosome, with the most concentrated linkage peak residing between 156 and 187 Mb, a region encompassing 357 RefSeq genes (1q22-31.1). Restricting attention to the LOD-1 interval around the peak narrowed the region to 1.57 Mb. Of the 343 variants within this region with LOD scores >3, 73 had P-values <0.05, with the best association signal occurring at rs6426957 (Chr1: 165 988 336; P-value=6.34 × 10−4, LOD=3.09, MAF=0.441; Supplementary Table 3). Notably, many variants within RASAL2 (RAS protein activator like 2 gene) showed nominal evidence of association (0.05>P-value>1.42 × 10−3) in addition to linkage (N=45 of 46 linked (LOD>3) SNPs; Tables 5 and 6). LOD scores at this gene ranged from 3.00 to 5.38.
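One common way to define a LOD-1 interval is to take the contiguous stretch of markers around the peak whose LOD scores lie within one unit of the maximum. The sketch below implements that definition; it is an assumption about the procedure rather than a description of how the interval reported here was derived, and the positions and LOD scores in the example are invented toy values.

```python
def lod_drop_interval(positions, lods, drop=1.0):
    """Return (start, end) of the contiguous run of markers around the peak
    whose LOD scores are at least (max LOD - drop)."""
    if len(positions) != len(lods) or not positions:
        raise ValueError("positions and lods must be non-empty and equal length")
    # Sort markers by position so that 'contiguous' is well defined.
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    pos = [positions[i] for i in order]
    lod = [lods[i] for i in order]
    peak = max(range(len(lod)), key=lambda i: lod[i])
    threshold = lod[peak] - drop
    left = peak
    while left > 0 and lod[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lod) - 1 and lod[right + 1] >= threshold:
        right += 1
    return pos[left], pos[right]


# Toy data (positions in Mb); not values from this study.
toy_positions = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
toy_lods = [1.0, 2.5, 4.2, 5.0, 4.1, 2.0]
print(lod_drop_interval(toy_positions, toy_lods))  # (12.0, 14.0)
```

Because two-point LOD scores at neighbouring SNPs can fluctuate with marker informativeness, an interval defined this way is sensitive to single low-scoring markers near the peak.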
Additional linkage results of interest include regions on chromosomes 7 and 12, which were linked to insulin sensitivity index (SI). Although these regions did not reach the magnitude seen for TNFα receptor 2 and AIR, the consistency of linkage in the region is compelling. On chromosome 7, the highest LOD score (5.11) was seen with rs1024591, an intergenic SNP over 300 kb from the nearest gene (a long intergenic noncoding RNA, LINC01372) (Supplementary Table 4). The linkage signal on chromosome 12 is made up of two distinct peaks (Figure 1c), one at ~53 Mb and the second at ~105 Mb (Supplementary Table 5). The LOD scores seen here are not as striking by magnitude (max LOD for each peak 4.27–4.28), but the consistency of LOD scores >3 into tight peaks is notable (Supplementary Table 5). The first peak consists of 14 variants with LOD scores >3, from 50.6 to 54.5 Mb, with multiple variants in the KRT8 (keratin 8 gene) and ESPL1 (extra spindle pole bodies like 1, separase) showing evidence for linkage, as well as single variants at the proximal end of the peak in LIMA1 (LIM domain and actin binding 1 gene), DIP2B (disco-interacting protein 2 homolog B gene) and SLC4A8 (solute carrier family 4, sodium bicarbonate cotransporter, member 8 gene). There was no evidence for association among linked variants at this linkage peak, although other, unlinked variants in the region showed nominal association (Supplementary Table 5).
The second linkage peak resides from 101 to 109 Mb on chromosome 12, and included 21 linked variants, which represented multiple signals from CHST11 (carbohydrate (chondroitin 4) sulfotransferase 11 gene), ACACB (acetyl-CoA carboxylase beta gene) and FOXN4 (forkhead box N4 gene), in addition to intergenic variants and genes implicated by a single variant, such as CMKLR1 (chemerin chemokine-like receptor 1 gene) (Supplementary Table 5). One of these linked variants showed nominal evidence of association, with a P-value of 5.50 × 10−3 (rs11114094 in SVOP (SV2-related protein gene); Table 6 and Supplementary Tables 3 and 5), although like the prior peak, other unlinked variants in the linkage region also demonstrated evidence of association.
Variants with evidence of both linkage and association
Using the linkage results as a search tool and prioritizing those with any evidence of association identified 1076 variants with P-values <0.05 as well as a LOD score ⩾3 (Supplementary Table 3). Twenty-seven variants were associated with P<0.005, as well as having a LOD score>4 (Table 6). NFIB was the primary gene implicated under a linkage peak with TNFα receptor 2 levels on chromosome 9, where there was also evidence of nominal association (P-values on the order of 2 × 10−4; Figure 1a and Supplementary Table 6). NFIB, which encodes nuclear factor I/B, is represented by 293 SNPs (135 from OmniExpress, 157 from Omni1S, 1 from exome chip), 289 of which were located in introns. Only one coding variant in this gene was polymorphic in the exome chip data set; this SNP (rs114558598; I24F) was neither linked (LOD=−0.005) nor associated (P-value=0.08). Ten common variants (0.27<MAF<0.49) within this gene (all intronic) had LOD scores >3. Overall, 68 NFIB variants had LOD scores >1, and 24 had LOD scores >2.
LPHN3 on chromosome 4 was a strong signal for LDL levels, with two intronic variants being both linked and associated (rs2343249; LOD=4.30; P-value=1.00 × 10−5 and rs9312078, LOD=3.02; P-value=8.20 × 10−5; Table 7 and Figure 1d). Both the linkage and association signals were confined to the gene region, with strong LD (r2>0.8) between the two top SNPs. There was further support throughout the gene-encoding region for both modest linkage and association with diminishing LD (Supplementary Figure 1). The strongest association result among LOD scores ⩾3 was with fibrinogen levels; rs1131878 from the OmniExpress chip, LOD=3.08 and P-value=1.99 × 10−6 (Supplementary Table 3). This SNP was located within the UGT2B4 gene, which encodes UDP glucuronosyltransferase 2 family polypeptide B4.
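The joint screen described in this section can be sketched as a simple filter on per-variant LOD scores and association P-values. The record type and function below are hypothetical names introduced for illustration; the four example records use LOD scores and P-values quoted in the text.

```python
from typing import NamedTuple


class VariantResult(NamedTuple):
    snp: str
    trait: str
    lod: float
    p_value: float


def linked_and_associated(results, min_lod, max_p, lod_inclusive=False):
    """Keep variants passing both a LOD threshold and a P-value threshold."""
    kept = []
    for r in results:
        lod_ok = r.lod >= min_lod if lod_inclusive else r.lod > min_lod
        if lod_ok and r.p_value < max_p:
            kept.append(r)
    return kept


# Results quoted in the text for four variants.
reported = [
    VariantResult("rs2343249", "LDL", 4.30, 1.00e-5),
    VariantResult("rs9312078", "LDL", 3.02, 8.20e-5),
    VariantResult("rs1131878", "fibrinogen", 3.08, 1.99e-6),
    VariantResult("rs28479408", "AIR", 6.47, 0.71),
]

# Stricter screen: LOD > 4 and P < 0.005.
strict = linked_and_associated(reported, min_lod=4.0, max_p=0.005)
# Broader screen: LOD >= 3 and P < 0.05.
relaxed = linked_and_associated(reported, min_lod=3.0, max_p=0.05, lod_inclusive=True)

print([r.snp for r in strict])   # ['rs2343249']
print([r.snp for r in relaxed])  # ['rs2343249', 'rs9312078', 'rs1131878']
```

The AIR variant rs28479408 illustrates the more common pattern in these data: strong linkage with no accompanying association, so it is excluded by both screens.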
Discussion
This study evaluated the utility of combining two-point linkage with association analysis in a data set comprised of array-based SNP genotyping totaling 1.6 million noncoding and coding variants in a family-based sample of Hispanics with extensive phenotype information. The aim of the study was to evaluate whether GWAS data in the context of linkage adds insight into the genetic origins of cardiometabolic traits, while using association analysis as a follow-up to determine likely candidate loci. This builds upon our prior evaluation of combined linkage and association using exome chip data in this cohort.9 Large-scale linkage analysis of SNP genotyping has been uncommon for complex phenotypes recently. To this end, we evaluated 50 phenotypes (46 distinct traits) related to glucose homeostasis, lipids, blood pressure, adiposity, liver fat and enzymes, and biomarkers. Given the breadth of genotypic data and the number of phenotypes, the results are extensive, but some noteworthy observations can be made. Broadly speaking, we believe the markedly denser genotypic data set reveals many insights into the genetic bases of the traits such as TNFα receptor 2, AIR and SI when compared with our prior study using the more limited data from the exome chip.
Relatively dense genotyping data provide visual evidence of linkage similar to conventional multipoint methods. In addition, while exome chip analysis primarily targets models where functional variants are exonic, the GWAS data sets can potentially address other models such as high impact noncoding variants, especially through linkage analysis. Here we have observed few examples where evidence for both linkage and association is apparent. An example is LPHN3 (Table 7 and Supplementary Figure 1), where LOD scores reached 4.30 with a P-value of 1.00 × 10−5, suggesting a true impact on LDL levels. Given the low density of coverage in GWAS data sets, which are designed to cover genomic regions through LD relationships, these data sets are unlikely to capture truly causal variants by chance. The ultimate test of whether this approach will be successful will require whole-genome sequencing data. Overall, these results incorporating two-point linkage and association analyses can identify meaningful signals that impact cardiometabolic traits, often in the absence of striking association alone. These conclusions are consistent with our prior work9, 10 in which we have shown that linkage evidence can be relatively strong, but association evidence only appears when the functional variant is also captured. The latter is unlikely in a GWAS data set. For these reasons, our main focus was on regions with evidence of linkage based on both the power of linkage methods and the 'far-sighted' ability of linkage to identify genetic relationships.4, 5, 6, 7, 9, 10
As noted above, several genomic regions had relatively strong evidence of linkage, but limited association results. Based on our logic, this would suggest the possibility of underlying, as yet unidentified functional variants. Thus, for the strongest linkage with TNFα receptor 2 levels (LOD=6.49), we would hypothesize that one or more high impact noncoding variants lie within the linkage region. LAMA1 is similar to LAMA5, which has previously been related to TNFRSF1B expression,26 making it plausible for LAMA1 to be related to TNFα receptor 2 levels.
Analysis of traits of interest to our laboratory (AIR, SI) also resulted in notable linkage peaks. It is tempting to scan these linked regions for biologically relevant genes. Genes located under a broad AIR linkage region on chromosome 1 (Figure 1b and Table 5) included FAM163A, also known as neuroblastoma-derived secretory protein (NDSP), TOR1AIP2 and RASAL2. FAM163A (aka NDSP) has been associated in methylation analysis for borderline personality disorder27 with overexpression observed in neuroblastoma.28, 29 TOR1AIP2 encodes torsin A-interacting protein 2, which is involved in the nuclear envelope.30, 31 Mutations in TOR1AIP1 have been shown to cause muscular dystrophy.32 RASAL2 (RAS protein activator like 2) has been implicated as an obesity susceptibility gene in both Chinese33 and Mexican populations,34 as well as having a role in the susceptibility of many cancers, including liver,35 thyroid,36 ovarian,37 breast37, 38 and lung.39
Genes under the SI linkage peaks also included interesting candidates. On chromosome 12, the most relevant gene with linkage in the distal linkage peak was CMKLR1 (chemerin chemokine-like receptor 1), which is believed to have a role in glucose homeostasis,40, 41, 42 obesity41, 43, 44 and diabetes development.45 Of note, a strong association signal (P-value=1 × 10−7) was also seen within this linkage peak in WSCD2 (WSC domain containing 2; 100 kb from CMKLR1) (Figure 1c).
Additional genes included LIMA1 (LIM domain and actin binding 1, also known as EPLIN and SREBP3), a tumor suppressor; DIP2B (disco-interacting protein 2 homolog B), replicated as a susceptibility locus for colorectal cancer;46 and SLC4A8, a sodium bicarbonate transporter, which may have a role in regulation of blood pressure with some variants in this gene having been previously implicated.47, 48 Further, KRT8 (keratin 8, type II), which is overexpressed in human liver disease, resides under the linkage peak on 12q.49 The linkage region on chromosome 7 contained only one putative gene, LOC102723427, about which there is no known information.
The most intriguing signal lies in LPHN3 and was both linked and associated with LDL levels at two separate variants. This gene encodes latrophilin 3 (recently renamed as ADGRL3;50 adhesion G-protein-coupled receptor L3), which is related to latrotoxin, the toxin produced by the black widow spider.51 There is evidence suggesting a role for latrophilin 3 (among other latrophilins) in binding to fibronectin leucine-rich transmembrane (FLRT) family members, which has been shown to promote the development of glutamatergic synapses.52, 53 Additionally, genetic variants in LPHN3 have been associated reproducibly with attention deficit hyperactivity disorder and other psychiatric conditions.54, 55, 56 LPHN3 is also being investigated as a pharmacogenetic target.57 Despite the lack of biological evidence directly supporting the link between LPHN3 variants and LDL cholesterol levels, cholesterol is crucially important in the brain, and further study may elucidate a mechanism by which genetic variants in LPHN3 impact plasma LDL levels.
We previously reported CETP linkage and association with HDL levels in exome chip data from this Hispanic sample.9 Linkage of CETP in this data set was stronger with LOD scores of up to 5.43, an increase of 1.14 over the previous top signal (Table 6 and Supplementary Table 2). The addition of GWAS data implicated additional linked variants (LOD>5, N=4) proximal to the coding region, perhaps occluding interpretation of the functional impact of this linkage result.
Here we assessed the impact of SNP density on insight into linkage relationships and conclude that dense SNP maps do reveal additional insight. We have extended this query further by evaluation of imputed genotype data in regions of particular interest because of evidence of strong linkage with glucose homeostasis-related phenotypes. Three regions were selected based on substantial linkage evidence and a particular interest in glucose homeostasis: chromosome 1 with AIR and chromosomes 7 and 12 with SI. Utilization of imputed data increases the number of markers capturing the region by 22-fold (18 411 directly genotyped markers, 406 K imputed markers). The maximal LOD score from the imputed AIR region was 6.45 at rs2252384 (the same SNP implicated in the directly genotyped data; Supplementary Figure 2). The slight increase in LOD score (from 6.37 to 6.45) can likely be attributed to more complete information following imputation of missing genotypes. For chromosome 7 with SI, a new best SNP rs2530421 had the maximum LOD score of 5.53 (compared with the prior best LOD of 5.11 at rs1024591). The imputed best SNP lies very near the original peak linkage, providing little additional guidance in refining the causal variant(s), given the high degree of correlation between the top-linked SNPs (r2=0.937). Evaluation of another linked region (chromosome 12 with SI) showed only modest increases in linkage signals, as could be expected because the information carried by these imputed markers is wholly derived from the genotyped markers, which had already been informative. Thus, inclusion of imputed genotypes marginally improved the maximal LOD scores when evaluated in this small number of examples. However, the improvements did not further refine the regions of interest (Supplementary Figure 2).
In conclusion, we have built upon our previous analysis of combined two-point linkage and association9 and evaluated utility of the approach in a data set composed of comprehensive genome-wide array-based SNP genotypes. As seen previously, there were few examples in these data where linkage and association both provided striking evidence at the same locus, which, based on our prior analysis,10 would implicate a likely ungenotyped causal variant. However, the GWAS plus exome chip design identified multiple additional regions of linkage, which were not seen in exome chip analysis alone. Positive, strong evidence of association with SNPs was not observed, suggesting that functional variants, if they are indeed captured by the linkage signal, have not been identified. To truly test the broad utility of this approach, whole-genome sequencing data will be necessary, which will incorporate the full spectrum of variant frequencies.
References
Ott, J., Kamatani, Y. & Lathrop, M. Family-based designs for genome-wide association studies. Nat. Rev. Genet. 12, 465–474 (2011).
Speed, D. & Balding, D. J. Relatedness in the post-genomic era: is it still useful? Nat. Rev. Genet. 16, 33–44 (2015).
Bowden, D. W., An, S. S., Palmer, N. D., Brown, W. M., Norris, J. M., Haffner, S. M. et al. Molecular basis of a linkage peak: exome sequencing and family-based analysis identify a rare genetic variant in the ADIPOQ gene in the IRAS Family Study. Hum. Mol. Genet. 19, 4112–4120 (2010).
Bowden, D. W. Will family studies return to prominence in human genetics and genomics? Rare variants and linkage analysis of complex traits. Genes Genomics 33, 1–8 (2011).
Wang, Q., Lu, Q. & Zhao, H. A review of study designs and statistical methods for genomic epidemiology studies using next generation sequencing. Front. Genet. 6, 149 (2015).
Teare, M. D. & Santibanez Koref, M. F. Linkage analysis and the study of Mendelian disease in the era of whole exome and genome sequencing. Brief. Funct. Genomics 13, 378–383 (2014).
Ott, J., Wang, J. & Leal, S. M. Genetic linkage analysis in the age of whole-genome sequencing. Nat. Rev. Genet. 16, 275–284 (2015).
Saad, M. & Wijsman, E. M. Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes. Genet. Epidemiol. 38, 1–9 (2014).
Hellwege, J. N., Palmer, N. D., Raffield, L. M., Ng, M. C., Hawkins, G. A., Long, J. et al. Genome-wide family-based linkage analysis of exome chip variants and cardiometabolic risk. Genet. Epidemiol. 38, 345–352 (2014).
Hellwege, J. N., Palmer, N. D., Brown, W. M., Ziegler, J. T., Sandy An, S., Guo, X. et al. Empirical characteristics of family-based linkage to a complex trait: the ADIPOQ region and adiponectin levels. Hum. Genet. 134, 203–213 (2015).
Voruganti, V. S., Kent, J. W. Jr, Debnath, S., Cole, S. A., Haack, K., Göring, H. H. et al. Genome-wide association analysis confirms and extends the association of SLC2A9 with serum uric acid levels to Mexican Americans. Front. Genet. 4, 279 (2013).
Nemesure, B. B., He, Q. & Mendell, N. Integration of linkage analyses and disease association studies. Genet. Epidemiol. 12, 653–658 (1995).
Heo, M., Leibel, R. L., Fontaine, K. R., Boyer, B. B., Chung, W. K., Koulu, M. et al. A meta-analytic investigation of linkage and association of common leptin receptor (LEPR) polymorphisms with body mass index and waist circumference. Int. J. Obesity Relat. Metab. Disord. 26, 640–646 (2002).
Weedon, M. N., Cebola, I., Patch, A. M., Flanagan, S. E., De Franco, E., Caswell, R. et al. Recessive mutations in a distal PTF1A enhancer cause isolated pancreatic agenesis. Nat. Genet. 46, 61–64 (2014).
Henkin, L., Bergman, R. N., Bowden, D. W., Ellsworth, D. L., Haffner, S. M., Langefeld, C. D. et al. Genetic epidemiology of insulin resistance and visceral adiposity. The IRAS Family Study design and methods. Ann. Epidemiol. 13, 211–217 (2003).
Palmer, N. D., Goodarzi, M. O., Langefeld, C. D., Wang, N., Guo, X., Taylor, K. D. et al. Genetic variants associated with quantitative glucose homeostasis traits translate to type 2 diabetes in Mexican Americans: the GUARDIAN (Genetics Underlying Diabetes in Hispanics) Consortium. Diabetes 64, 1853–1866 (2014).
O'Connell, J. R. & Weeks, D. E. PedCheck: a program for identification of genotype incompatibilities in linkage analysis. Am. J. Hum. Genet. 63, 259–266 (1998).
Delaneau, O., Marchini, J. & Zagury, J. F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2012).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
Almasy, L. & Blangero, J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62, 1198–1211 (1998).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Das, S. K., Hasstedt, S. J., Zhang, Z. & Elbein, S. C. Linkage and association mapping of a chromosome 1q21–q24 type 2 diabetes susceptibility locus in northern European Caucasians. Diabetes 53, 492–499 (2004).
Langefeld, C. D., Wagenknecht, L. E., Rotter, J. I., Williams, A. H., Hokanson, J. E., Saad, M. F. et al. Linkage of the metabolic syndrome to 1q23–q31 in Hispanic families: the Insulin Resistance Atherosclerosis Study Family Study. Diabetes 53, 1170–1174 (2004).
Wiltshire, S., Hattersley, A. T., Hitman, G. A., Walker, M., Levy, J. C., Sampson, M. et al. A genomewide scan for loci predisposing to type 2 diabetes in a UK population (the Diabetes UK Warren 2 Repository): analysis of 573 pedigrees provides independent replication of a susceptibility locus on chromosome 1q. Am. J. Hum. Genet. 69, 553–569 (2001).
Vionnet, N., Hani, E. H., Dupont, S., Gallina, S., Francke, S., Dotte, S. et al. Genomewide search for type 2 diabetes-susceptibility genes in French whites: evidence for a novel susceptibility locus for early-onset diabetes on chromosome 3q27-qter and independent replication of a type 2-diabetes locus on chromosome 1q21–q24. Am. J. Hum. Genet. 67, 1470–1480 (2000).
Adair-Kirk, T. L., Atkinson, J. J., Kelley, D. G., Arch, R. H., Miner, J. H. & Senior, R. M. A chemotactic peptide from laminin alpha 5 functions as a regulator of inflammatory immune responses via TNF alpha-mediated signaling. J. Immunol. (Baltimore, Md: 1950) 174, 1621–1629 (2005).
Prados, J., Stenz, L., Courtet, P., Prada, P., Nicastro, R., Adouan, W. et al. Borderline personality disorder and childhood maltreatment: a genome-wide methylation analysis. Genes Brain Behav. 14, 177–188 (2015).
Vasudevan, S. A., Russell, H. V., Okcu, M. F., Burlingame, S. M., Liu, Z. J., Yang, J. et al. Neuroblastoma-derived secretory protein messenger RNA levels correlate with high-risk neuroblastoma. J. Pediatr. Surg. 42, 148–152 (2007).
Vasudevan, S. A., Shang, X., Chang, S., Ge, N., Diaz-Miron, J. L., Russell, H. V. et al. Neuroblastoma-derived secretory protein is a novel secreted factor overexpressed in neuroblastoma. Mol. Cancer Ther. 8, 2478–2489 (2009).
Goodchild, R. E., Kim, C. E. & Dauer, W. T. Loss of the dystonia-associated protein torsinA selectively disrupts the neuronal nuclear envelope. Neuron 48, 923–932 (2005).
Vander Heyden, A. B., Naismith, T. V., Snapp, E. L., Hodzic, D. & Hanson, P. I. LULL1 retargets TorsinA to the nuclear envelope revealing an activity that is impaired by the DYT1 dystonia mutation. Mol. Biol. Cell 20, 2661–2672 (2009).
Kayman-Kurekci, G., Talim, B., Korkusuz, P., Sayar, N., Sarioglu, T., Oncel, I. et al. Mutation in TOR1AIP1 encoding LAP1B in a form of muscular dystrophy: a novel gene related to nuclear envelopathies. Neuromusc. Disord. 24, 624–633 (2014).
Cheung, C. Y., Tso, A. W., Cheung, B. M., Xu, A., Ong, K. L., Fong, C. H. et al. Obesity susceptibility genetic variants identified from recent genome-wide association studies: implications in a Chinese population. J. Clin. Endocrinol. Metab. 95, 1395–1403 (2010).
Leon-Mimila, P., Villamil-Ramirez, H., Villalobos-Comparan, M., Villarreal-Molina, T., Romero-Hidalgo, S., López-Contreras, B. et al. Contribution of common genetic variants to obesity and obesity-related traits in Mexican children and adults. PLoS ONE 8, e70640 (2013).
Stefanska, B., Cheishvili, D., Suderman, M., Arakelian, A., Huang, J., Hallett, M. et al. Genome-wide study of hypomethylated and induced genes in patients with liver cancer unravels novel anticancer targets. Clin. Cancer Res. 20, 3118–3132 (2014).
Xu, Y., Deng, Y., Ji, Z., Liu, H., Liu, Y., Peng, H. et al. Identification of thyroid carcinoma related genes with mRMR and shortest path approaches. PLoS ONE 9, e94022 (2014).
Feng, M., Bao, Y., Li, Z., Li, J., Gong, M., Lam, S. et al. RASAL2 activates RAC1 to promote triple-negative breast cancer progression. J. Clin. Invest. 124, 5291–5304 (2014).
Huang, Y., Zhao, M., Xu, H., Wang, K., Fu, Z., Jiang, Y. et al. RASAL2 down-regulation in ovarian cancer promotes epithelial-mesenchymal transition and metastasis. Oncotarget 5, 6734–6745 (2014).
Li, N. & Li, S. RASAL2 promotes lung cancer metastasis through epithelial-mesenchymal transition. Biochem. Biophys. Res. Commun. 455, 358–362 (2014).
Rourke, J. L., Muruganandan, S., Dranse, H. J., McMullen, N. M. & Sinal, C. J. Gpr1 is an active chemerin receptor influencing glucose homeostasis in obese mice. J. Endocrinol. 222, 201–215 (2014).
Ernst, M. C., Haidl, I. D., Zuniga, L. A., Dranse, H. J., Rourke, J. L., Zabel, B. A. et al. Disruption of the chemokine-like receptor-1 (CMKLR1) gene is associated with reduced adiposity and glucose intolerance. Endocrinology 153, 672–682 (2012).
Sell, H., Laurencikiene, J., Taube, A., Eckardt, K., Cramer, A., Horrighs, A. et al. Chemerin is a novel adipocyte-derived factor inducing insulin resistance in primary human skeletal muscle cells. Diabetes 58, 2731–2740 (2009).
Gruben, N., Aparicio Vergara, M., Kloosterhuis, N. J., van der Molen, H., Stoelwinder, S., Youssef, S. et al. Chemokine-like receptor 1 deficiency does not affect the development of insulin resistance and nonalcoholic fatty liver disease in mice. PLoS ONE 9, e96345 (2014).
Kim, S. H., Lee, S. H., Ahn, K. Y., Lee, D. H., Suh, Y. J., Cho, S. G. et al. Effect of lifestyle modification on serum chemerin concentration and its association with insulin sensitivity in overweight and obese adults with type 2 diabetes. Clin. Endocrinol. (Oxf.) 80, 825–833 (2014).
Roman, A. A., Parlee, S. D. & Sinal, C. J. Chemerin: a potential endocrine link between obesity and type 2 diabetes. Endocrine 42, 243–251 (2012).
Closa, A., Cordero, D., Sanz-Pamplona, R., Solé, X., Crous-Bou, M., Paré-Brunet, L. et al. Identification of candidate susceptibility genes for colorectal cancer through eQTL analysis. Carcinogenesis 35, 2039–2046 (2014).
Guo, L., Liu, F., Chen, S., Yang, X., Huang, J., He, J. et al. Common variants in the Na-coupled bicarbonate transporter genes and salt sensitivity of blood pressure: the GenSalt study. J. Hum. Hypertens. 30, 543–548 (2015).
Aalkjaer, C., Boedtkjer, E., Choi, I. & Lee, S. Cation-coupled bicarbonate transporters. Compr. Physiol. 4, 1605–1637 (2014).
Guldiken, N., Usachov, V., Levada, K., Trautwein, C., Ziol, M., Nahon, P. et al. Keratins 8 and 18 are type II acute-phase responsive genes overexpressed in human liver disease. Liver Int. 35, 1203–1212 (2015).
Hamann, J., Aust, G., Arac, D., Engel, F. B., Formstone, C., Fredriksson, R. et al. International Union of Basic and Clinical Pharmacology. XCIV. Adhesion G protein-coupled receptors. Pharmacol. Rev. 67, 338–367 (2015).
Martinez, A. F., Muenke, M. & Arcos-Burgos, M. From the black widow spider to human behavior: Latrophilins, a relatively unknown class of G protein-coupled receptors, are implicated in psychiatric disorders. Am. J. Med. Genet. B 156B, 1–10 (2011).
Jackson, V. A., del Toro, D., Carrasquero, M., Roversi, P., Harlos, K., Klein, R. et al. Structural basis of latrophilin-FLRT interaction. Structure (London, England: 1993) 23, 774–781 (2015).
O'Sullivan, M. L., de Wit, J., Savas, J. N., Comoletti, D., Otto-Hitt, S., Yates, J. R. et al. FLRT proteins are endogenous latrophilin ligands and regulate excitatory synapse development. Neuron 73, 903–910 (2012).
Fallgatter, A. J., Ehlis, A. C., Dresler, T., Reif, A., Jacob, C. P., Arcos-Burgos, M. et al. Influence of a latrophilin 3 (LPHN3) risk haplotype on event-related potential measures of cognitive response control in attention-deficit hyperactivity disorder (ADHD). Eur. Neuropsychopharmacol. 23, 458–468 (2013).
Ribases, M., Ramos-Quiroga, J. A., Sanchez-Mora, C., Bosch, R., Richarte, V., Palomar, G. et al. Contribution of LPHN3 to the genetic susceptibility to ADHD in adulthood: a replication study. Genes Brain Behav. 10, 149–157 (2011).
Arcos-Burgos, M., Jain, M., Acosta, M. T., Stanescu, H., Wallis, D., Domené, S. et al. A common variant of the latrophilin 3 gene, LPHN3, confers susceptibility to ADHD and predicts effectiveness of stimulant medication. Mol. Psychiatry 15, 1053–1066 (2010).
Bruxel, E. M., Salatino-Oliveira, A., Akutagava-Martins, G. C., Tovo-Rodrigues, L., Genro, J. P., Zeni, C. P. et al. LPHN3 and attention-deficit/hyperactivity disorder: a susceptibility and pharmacogenetic study. Genes Brain Behav. 14, 419–427 (2015).
Acknowledgements
This work was supported by grants R01 HG007112 (to DWB and CDL) and R01 DK087914 (to MCYN). The GUARDIAN study, which contributed the IRASFS GWAS genotypes to this project, is supported by grant R01 DK085175 (to LEW), and the IRASFS study was supported by HL060944, HL061019 and HL060919. The provision of GWAS genotyping data was supported, in part, by UL1TR000124 (CTSI) and DK063491 (DRC). The provision of exome chip data was supported, in part, by the Department of Internal Medicine at the University of Michigan, the Doris Duke Medical Foundation and R01 DK106621 (to EKS). Computational support was provided, in part, by the Center for Public Health Genomics at Wake Forest School of Medicine.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies the paper on Journal of Human Genetics website
Cite this article
Hellwege, J., Palmer, N., Dimitrov, L. et al. Genome-wide linkage and association analysis of cardiometabolic phenotypes in Hispanic Americans. J Hum Genet 62, 175–184 (2017). https://doi.org/10.1038/jhg.2016.103