eLD: entropy-based linkage disequilibrium index between multiallelic sites

Okada, Yukinori

doi:10.1038/s41439-018-0030-x


Software Report
Open access
Published: 22 October 2018


Yukinori Okada^1,2 (ORCID: orcid.org/0000-0002-0311-8472)

Human Genome Variation volume 5, Article number: 29 (2018)


Abstract

Quantification of linkage disequilibrium (LD) is a critical step in studies investigating human genome variations. Commonly used LD indices such as r² handle LD of biallelic variants for two sites. As shown in a previously introduced LD index of ε, normalized entropy difference of the haplotype frequency between LD and linkage equilibrium (LE) could be utilized to estimate LD of biallelic variants for multiple sites. Here, we developed eLD (entropy-based Linkage Disequilibrium index between multiallelic sites) as publicly available software to calculate ε of multiallelic variants for two sites. Application of eLD could dissect complex LD structures among multiple HLA genes (e.g., strong LD among HLA-DRB1, HLA-DQA1, and HLA-DQB1 in East Asians). Use of eLD is not restricted to haplotype-based LD; it is also applicable to genotype-based LD. Therefore, eLD enables estimation of trans-regional LD of SNP genotypes at two unlinked loci, such as the nonlinear LD between functional missense variants of ADH1B (rs1229984 [Arg47His]) and ALDH2 (rs671 [Glu504Lys]).

Linkage disequilibrium (LD) is defined as the nonrandom association of alleles at different loci¹. Quantitative assessment of LD in a population of interest is an important step in fine-mapping causal variants within the disease risk loci identified by genome-wide association studies (GWAS)². Population-specific features of LD are related to ethnically heterogeneous distributions of single-nucleotide polymorphisms (SNPs)³. The most widely used measurements of LD are r² and D′; both values quantify LD between biallelic variants (i.e., SNPs) for two sites, reflecting nonrandom distributions of the four haplotypes formed by pairwise combinations of the alleles. Specifically, r² can be interpreted as the squared Pearson’s correlation (R²) between allele distributions and is known to be proportional to χ² values of genotype–phenotype association statistics between two sites¹. LD values can easily be calculated using publicly available software (e.g., PLINK and vcftools), or obtained as pre-calculated values from websites (e.g., HaploReg and LocusZoom).
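To make the conventional indices concrete, the following is a minimal Python sketch of D, |D′|, and r² computed from the four haplotype frequencies of two biallelic sites, using the standard textbook formulas. The function name pairwise_ld and the example frequencies are illustrative assumptions; this is not code from PLINK, vcftools, or eLD.

```python
def pairwise_ld(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r2) from the four haplotype frequencies of two biallelic sites."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    p_A = p_AB + p_Ab  # frequency of allele A at the first site
    p_B = p_AB + p_aB  # frequency of allele B at the second site

    # D is the deviation of the observed AB frequency from its LE expectation.
    d = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / denom

    # D' rescales D by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max

    return d, d_prime, r2


# Illustrative (made-up) haplotype frequencies.
d, d_prime, r2 = pairwise_ld(0.4, 0.1, 0.1, 0.4)

# For a 2 x 2 haplotype table, the chi-square statistic equals n * r2,
# where n is the number of haplotypes (here an assumed sample of 1000).
n_haplotypes = 1000
chi2 = n_haplotypes * r2
print(f"D={d:.3f}  |D'|={d_prime:.3f}  r2={r2:.3f}  chi2={chi2:.1f}")
```

Both indices are defined only for two alleles per site, which is the limitation that the entropy-based index addresses.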

Nothnagel et al.⁴ previously demonstrated that r² can also be interpreted as normalized entropy in haplotype frequencies, and introduced a novel LD index named ε (see definition in Supplementary Information). ε represents the normalized entropy difference of the haplotype frequencies between LD and those expected under the null hypothesis of no LD (i.e., linkage equilibrium [LE]). The value of ε ranges between 0 and 1, with larger values indicating stronger LD. Application of ε enabled LD quantification of biallelic variants for multiple sites (Fig. 1)⁴, which was effective in selecting tag SNPs free from ambiguous definitions of LD blocks in an unbiased manner⁵.
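The exact definition of ε is given in the Supplementary Information and is not reproduced here. As a minimal sketch, assuming ε = (S_LE − S_obs) / S_LE, where S_obs is the Shannon entropy of the observed haplotype frequencies and S_LE is the entropy expected under LE (the sum of the per-site allele entropies), ε could be computed in Python as follows. The function names are illustrative and are not part of eLD.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def epsilon(haplotypes):
    """Normalized entropy difference for a list of haplotypes.

    Each haplotype is a tuple holding one allele per site, so the same
    function covers several biallelic sites or two multiallelic sites.
    Assumed definition: (S_LE - S_obs) / S_LE.
    """
    n_sites = len(haplotypes[0])
    # Entropy of the observed joint haplotype distribution.
    s_obs = entropy(Counter(haplotypes).values())
    # Under LE the sites are independent, so the joint entropy is the
    # sum of the per-site allele entropies.
    s_le = sum(
        entropy(Counter(h[i] for h in haplotypes).values())
        for i in range(n_sites)
    )
    if s_le == 0:
        raise ValueError("all sites are monomorphic")
    return (s_le - s_obs) / s_le


# Toy data: two biallelic sites in complete LD, and the same sites in LE.
complete_ld = [("A", "B")] * 50 + [("a", "b")] * 50
equilibrium = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")] * 25
print(epsilon(complete_ld), epsilon(equilibrium))
```

Under this assumed definition, two sites in complete LD with equally frequent alleles give ε = 0.5 and four equally frequent haplotypes give ε = 0, so the attainable maximum depends on the number of sites and alleles.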

We have recently extended ε to further quantify LD of multiallelic variants for two sites as described elsewhere (Fig. 1)⁶. Here, we developed eLD (entropy-based Linkage Disequilibrium index between multiallelic sites) as publicly available software to calculate the ε of multiallelic variants for two sites (see Software availability). Various multiallelic variants exist with important clinical impacts in terms of genotype–phenotype associations. Of these, polymorphisms of human leukocyte antigen (HLA) genes in the major histocompatibility complex (MHC) locus confer risk for a wide variety of human diseases. While elucidation of the complex LD structure of HLA genes has been challenging, application of ε clearly identified hidden LD relationships among the HLA genes⁶. For example, we observed relatively strong LD between HLA-C and HLA-B, among HLA-DRB1, HLA-DQA1, and HLA-DQB1, and between HLA-DPA1 and HLA-DPB1 (ε > 0.15; calculated using 4-digit classical alleles of a subset of the East Asian subjects [n = 300] enrolled in the original studies^6,7; Fig. 2). Since estimation of the haplotype frequency could be biased when its distribution is sparse, eLD implements an option to combine alleles with frequencies lower than a defined threshold (0.05 by default) into a single dummy allele.
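The pooling of rare alleles described above can be sketched as follows. This is an illustration under stated assumptions rather than the eLD implementation: the function name pool_rare_alleles, the dummy label "OTHER", and the toy allele counts are hypothetical, and only the 0.05 default threshold comes from the text.

```python
from collections import Counter


def pool_rare_alleles(alleles, threshold=0.05, dummy="OTHER"):
    """Replace every allele whose frequency is below `threshold` with one dummy allele.

    `alleles` holds one allele label per observed chromosome at a single site.
    """
    counts = Counter(alleles)
    n = len(alleles)
    # Alleles strictly below the threshold are treated as rare.
    rare = {allele for allele, count in counts.items() if count / n < threshold}
    return [dummy if allele in rare else allele for allele in alleles]


# Toy data: 100 chromosomes carrying made-up 4-digit-style allele labels.
site = ["01:01"] * 60 + ["04:05"] * 37 + ["15:01"] * 2 + ["09:01"] * 1
pooled = pool_rare_alleles(site)
print(Counter(pooled))
```

Pooling is applied to each site separately before the two-site haplotype table is built, which reduces the number of sparsely populated haplotype cells.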

One of the novel features of eLD is that it empirically estimates the value of ε under the null hypothesis of LE (ε_NULL), in addition to calculating the ε actually observed in a given data set (ε_Observed). eLD calculates ε_NULL using a permutation approach: the connections of the alleles between the two sites are randomly shuffled, and ε_NULL is estimated as the mean of the ε values obtained across iterations (1000 iterations by default). Since the baseline value of ε_NULL depends on the number of alleles at each site, calculating ε_NULL as well as ε_Observed helps to evaluate the relative strength of LD relationships at the observed sites.
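The permutation procedure can be illustrated with a short Python sketch. It assumes ε = (S_LE − S_obs) / S_LE, where S denotes Shannon entropy (the exact definition is in the Supplementary Information), and it uses hypothetical function names and toy data. Only the default of 1000 iterations is taken from the text.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def epsilon_two_sites(site1, site2):
    """Assumed definition of epsilon for paired allele lists at two sites."""
    s_obs = entropy(Counter(zip(site1, site2)).values())
    s_le = entropy(Counter(site1).values()) + entropy(Counter(site2).values())
    return (s_le - s_obs) / s_le


def epsilon_null(site1, site2, n_iter=1000, seed=0):
    """Mean epsilon after randomly re-pairing the alleles of the two sites."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(site2)
    total = 0.0
    for _ in range(n_iter):
        # Shuffling one site breaks any association while preserving
        # the allele frequencies at both sites.
        rng.shuffle(shuffled)
        total += epsilon_two_sites(site1, shuffled)
    return total / n_iter


# Toy data: 100 chromosomes at two biallelic sites in complete LD.
site1 = ["A"] * 50 + ["a"] * 50
site2 = ["B"] * 50 + ["b"] * 50
print("observed:", epsilon_two_sites(site1, site2))
print("null:", epsilon_null(site1, site2))
```

Because a finite sample almost never reproduces the LE expectation exactly, the permuted mean is slightly above zero, and it grows with the number of alleles per site. This is why the observed value is compared against it rather than against zero.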

Another feature of the software is that application of eLD is not restricted to haplotype-based LD; it is also applicable to genotype-based LD. Using eLD, one can estimate LD between loci where phasing of the haplotypes is theoretically difficult. As an illustrative example, we estimated trans-regional LD between two unlinked loci: ADH1B at 4q23 and ALDH2 at 12q24. ADH1B and ALDH2 harbor well-known functional missense variants at rs1229984 (Arg47His) and rs671 (Glu504Lys), respectively. Both of these SNPs have pleiotropic effects on a number of human complex traits, including dietary habits. Studies investigating natural selection pressure identified strong and significant positive selection on these missense variants in Japanese or other East Asian populations, which was closely linked to geographical heterogeneity in the allele frequency spectra of these SNPs even within a single population⁸. Here, using eLD, we calculated ε to estimate trans-regional LD between rs1229984 and rs671 (Fig. 3). We obtained genotypes for these SNPs from East Asian subjects within the 1000 Genomes Project (n = 504, phase 3 version 5), and found that ε_Observed (0.0053) was higher than ε_NULL (0.0024). As expected from natural selection pressure on these variants⁸, rs1229984AA-rs671AA genotypes and rs1229984GG-rs671GG genotypes had increased frequencies compared to those expected under LE (≥1.21-fold), while rs1229984GG-rs671GA genotypes had decreased frequencies (0.58-fold) compared to those expected under LE. While Pearson’s correlation between genotypes can also evaluate trans-regional LD, nonlinear relationships of genotypes (such as the reduced frequency of rs1229984GG-rs671GA) would not have been reflected in this measurement.
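The contrast between an entropy-based index and Pearson’s correlation can be shown with a small Python sketch on synthetic genotypes. These are not the 1000 Genomes Project data. The sketch assumes genotypes coded as 0/1/2 allele counts and ε = (S_LE − S_obs) / S_LE with S the Shannon entropy; all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def genotype_epsilon(g1, g2):
    """Assumed definition of epsilon applied to paired genotypes, not haplotypes."""
    s_obs = entropy(Counter(zip(g1, g2)).values())
    s_le = entropy(Counter(g1).values()) + entropy(Counter(g2).values())
    return (s_le - s_obs) / s_le


def fold_change_vs_le(g1, g2):
    """Observed frequency of each genotype pair divided by its LE expectation."""
    n = len(g1)
    c1, c2, joint = Counter(g1), Counter(g2), Counter(zip(g1, g2))
    return {
        pair: (count / n) / ((c1[pair[0]] / n) * (c2[pair[1]] / n))
        for pair, count in joint.items()
    }


def pearson_r(x, y):
    """Pearson's correlation coefficient between two numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic genotype pairs chosen so that the dependence is purely nonlinear.
pairs = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)] * 20
snp1 = [p[0] for p in pairs]
snp2 = [p[1] for p in pairs]

print("Pearson r:", pearson_r(snp1, snp2))
print("epsilon:", genotype_epsilon(snp1, snp2))
print("fold change vs LE:", fold_change_vs_le(snp1, snp2))
```

In this toy data set the correlation is zero although the genotypes are clearly dependent, whereas ε is positive and the fold changes show which genotype combinations are enriched.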

In summary, we developed software, which we named eLD, that quantifies the entropy-based LD index ε for multiallelic variants at two sites, such as LD between highly polymorphic HLA genes. eLD also enables estimation of trans-regional LD of SNP genotypes, such as functional variants of ADH1B and ALDH2. We note that normalized entropy has the potential to dissect complex dependencies among human genome variations (e.g., Y-chromosomal short tandem repeat [STR] marker selection⁹), and development of additional methodology is warranted.

Software availability

eLD is freely available at http://www.sg.med.osaka-u.ac.jp/tools.html with example data sets.

References

1. Slatkin, M. Linkage disequilibrium - understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet. 9, 477–485 (2008).
2. Schaid, D. J. et al. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).
3. Kanai, M. et al. Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set. J. Hum. Genet. 61, 861–866 (2016).
4. Nothnagel, M. et al. Entropy as a measure for linkage disequilibrium over multilocus haplotype blocks. Hum. Hered. 54, 186–198 (2002).
5. Nothnagel, M. & Rohde, K. The effect of single-nucleotide polymorphism marker selection on patterns of haplotype blocks and haplotype frequency estimates. Am. J. Hum. Genet. 77, 988–998 (2005).
6. Okada, Y. et al. Construction of a population-specific HLA imputation reference panel and its application to Graves’ disease risk in Japanese. Nat. Genet. 47, 798–802 (2015).
7. Okada, Y. et al. Risk for ACPA-positive rheumatoid arthritis is driven by shared HLA amino acid polymorphisms in Asian and European populations. Hum. Mol. Genet. 23, 6916–6926 (2014).
8. Okada, Y. et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat. Commun. 9, 1631 (2018).
9. Siegert, S. et al. Shannon’s equivocation for forensic Y-STR marker selection. Forensic Sci. Int. Genet. 16, 216–225 (2015).


Acknowledgements

We thank Prof. Michael Nothnagel for his thoughtful suggestions. This study was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (15H05911) and the Japan Agency for Medical Research and Development (AMED; 18gm6010001h0003 and 18ek0410041h0002).

Author information

Authors and Affiliations

1. Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan
2. Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, 565-0871, Japan


Corresponding author

Correspondence to Yukinori Okada.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Okada, Y. eLD: entropy-based linkage disequilibrium index between multiallelic sites. Hum Genome Var 5, 29 (2018). https://doi.org/10.1038/s41439-018-0030-x


Received: 09 September 2018
Revised: 18 September 2018
Accepted: 18 September 2018
Published: 22 October 2018
DOI: https://doi.org/10.1038/s41439-018-0030-x

This article is cited by

A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response
- Yang Luo
- Masahiro Kanai
- Soumya Raychaudhuri
Nature Genetics (2021)
GWAS of 165,084 Japanese individuals identified nine loci associated with dietary habits
- Nana Matoba
- Masato Akiyama
- Yukinori Okada
Nature Human Behaviour (2020)
Functional variants in ADH1B and ALDH2 are non-additively associated with all-cause mortality in Japanese population
- Saori Sakaue
- Masato Akiyama
- Yukinori Okada
European Journal of Human Genetics (2020)
Recessive Z-linked lethals and the retention of haplotype diversity in a captive butterfly population
- Ilik J. Saccheri
- Samuel Whiteford
- Arjen E. van’t Hof
Heredity (2020)
Genetic and phenotypic landscape of the major histocompatibilty complex region in the Japanese population
- Jun Hirata
- Kazuyoshi Hosomichi
- Yukinori Okada
Nature Genetics (2019)
