Mixed Linear Model Approaches of Association Mapping for Complex Traits Based on Omics Variants

Zhang, Fu-Tao; Zhu, Zhi-Hong; Tong, Xiao-Ran; Zhu, Zhi-Xiang; Qi, Ting; Zhu, Jun

doi:10.1038/srep10298

Published: 30 July 2015

Scientific Reports volume 5, Article number: 10298 (2015)

Abstract

Precise prediction of the genetic architecture of complex traits is impeded by the limited understanding of their genetic effects, especially gene-by-gene (GxG) and gene-by-environment (GxE) interactions. In the past decades, an explosion of high-throughput technologies has enabled omics studies at multiple levels (such as genomics, transcriptomics, proteomics and metabolomics). The analysis of large omics data sets, especially two-locus interaction analysis, is very time-intensive. Integrating diverse omics data and environmental effects in the analyses also remains a challenge. We propose mixed linear model approaches using GPU (Graphics Processing Unit) computation to simultaneously dissect various genetic effects. Analyses can be performed to estimate genetic main effects, GxG epistasis effects and GxE interaction effects on large-scale omics data for complex traits, and to estimate the heritability of specific genetic effects. Both mouse data analyses and Monte Carlo simulations demonstrated that genetic effects and environment interaction effects could be estimated without bias and with high statistical power by the proposed approaches.

Introduction

Both natural and experimental populations harbor an array of phenotypic variation because of the complicated genetic architecture underlying quantitative traits. It is well documented that the genetic basis of phenotypic variability consists of individual causal genes and interacting networks, with specific effects in multiple environmental conditions. Gene-by-gene (epistasis or GxG) and gene-by-environment (GxE) interactions, such as those underlying chicken comb type¹, animal coat color and the ABO blood group in humans, are confirmed to exist². Complex traits are controlled by multiple loci, which harbor polymorphisms that give rise to phenotypic variation in a population. Complex traits cannot be studied by testing a single locus at a time, especially when the contribution of each locus is small³. To understand the genetic architecture of variation in complex traits, we need system-level analyses that encompass genome-wide SNPs, transcripts, proteins and metabolites and that consider the effects of GxG and GxE interactions.

In the past decades, an explosion of new high-throughput technologies has enabled omics studies at multiple levels (such as genomics, transcriptomics, proteomics and metabolomics). At each level it is possible to construct interaction networks associated with complex traits (including diseases)⁴. These large-scale omics data provide a great opportunity for biological understanding, but integrating diverse omics data and environmental effects in the analyses has remained a challenge. New computational methods need to be developed to understand these complex heterogeneous omics data^5,6,7,8,9. The analysis of large omics data sets, especially two-locus interaction analysis, is computationally intensive. Heterogeneous computing environments that include graphics processing units (GPUs) can provide effective solutions for the analysis of large-scale data sets¹⁰, and CPU-GPU heterogeneous parallel computing is now very common.

Linkage analysis and association analysis are two genetic mapping approaches used to assess the relation between genotypic and phenotypic variation on a population scale. Taking advantage of conventional molecular markers, efficient statistical methods of QTL (Quantitative Trait Locus) mapping have become pervasive¹¹ since the landmark approach (interval mapping) developed by Lander and Botstein¹². Since then, several methods have been developed for detecting epistasis^13,14,15,16,17 and GxE interactions^18,19,20,21. Mixed linear model-based composite interval mapping (MCIM)^22,23,24 can detect both GxG and GxE interactions from experimental data involving multiple environments (or treatments). With the recent development of high-throughput genotyping technologies, however, genetic association analyses have become common tools for uncovering causal genetic variants and networks at the whole-genome level²⁵. In 1947, Fisher first used linkage disequilibrium (LD) information to map causal loci for human blood types²⁶. So far, many mapping studies of human diseases and complex traits by genetic association analysis have revealed plenty of novel loci and provided insight into the biology of diseases. Several methods have been published for exhaustive epistasis analysis^27,28,29,30,31. However, these methods cannot integrate omics data other than genome data. Because associating DNA (deoxyribonucleic acid) polymorphism with phenotypic variation omits all of the intermediate steps in the chain of causation from genetic perturbation to variation in quantitative traits, intermediate molecular variables such as transcript abundance could allow us to interpret the causal networks³². RNA expression microarrays have been combined with other experimental approaches to find the key mechanisms of complex traits³³. One such technique treats transcript abundance as a quantitative trait, the approach known as expression quantitative trait locus (eQTL) mapping³⁴. Other approaches identify significantly expressed transcripts underlying complex traits by using a Pearson correlation coefficient³⁵ or multiple linear regression³⁶, in which GxG and GxE interactions at the transcript level are ignored. Despite intensive efforts to explain the genetic variation of quantitative traits, which have identified a great number of genetic variants and transcripts for various complex traits, we still fall short of understanding the genetic architecture of complex traits.

In this study, mixed linear model approaches are proposed to identify genetic effects of individual loci, epistasis effects of pair-wise loci (Fig. 1a) and GxE interactions (Fig. 1b); the approaches are applicable to genome-wide association studies (GWAS). The statistical analysis consists of four steps: (1) a one-dimension search for individual loci; (2) an exhaustive two-dimension search for epistasis loci; (3) a stepwise search for fitting a full genetic model, including candidate loci with main effects, epistasis and GxE interaction; and (4) estimation of the effects of the individual and epistasis loci detected in the previous steps by Markov chain Monte Carlo (MCMC) via Gibbs sampling^24,37. All of these steps have been implemented in GPU-based mapping software named QTXNetwork. Exploiting the massively parallel nature of multiple GPUs, association analyses can be performed to detect loci in large-scale omics data for complex traits and to estimate variance components of genetic effects. QTXNetwork consists of three functional modules: quantitative trait locus (QTL)³⁸ for QTL analyses (Fig. 1c), quantitative trait SNP (QTS) for genome analyses and quantitative trait transcript/protein/metabolite (QTT/P/M) for transcriptome, proteome or metabolome analyses (Fig. 1d). Association analyses can also be conducted for networks among four omics variants (Quantitative Trait X for SNPs, Transcripts, Proteins and Metabolites) (Fig. 1e). By analyzing mouse data sets on anxiety and Monte Carlo simulations for linkage mapping of QTLs and association mapping of QTSs and QTTs, we demonstrate that unbiased estimates can be obtained for the genetic effects of causal genes. The QTXNetwork package can be downloaded from http://ibi.zju.edu.cn/software/QTXNetwork.

Results

Analysis of mouse data

We applied the proposed statistical methods for mapping QTLs, QTSs and QTTs to search for the genetic mechanism of anxiety in 71 BXD recombinant inbred (RI) strains of mice (n = 528 mice). Differences in the phenotypes are evident in the parental strains. For example, the maternal strain C57BL/6J exhibits lower anxiety and fewer stress-related effects than the paternal strain DBA/2J, which exhibits greater fear-related responses³⁹. Animals of the 71 BXD RI strains, 60 to 120 days old, were used. These strains were derived by crossing the C57BL/6J (B6) and DBA/2J (D2) strains in the 1970s (BXD1-32; 26 strains) and 1990s (BXD33-42; 9 strains)⁴⁰. Genotypes of the BXD strains were generated at the University of Tennessee Health Science Center. A total of 3,795 markers covering 19 autosomes and one sex chromosome were genotyped, including 3,033 SNPs and 762 SSRs (Simple Sequence Repeats). Because many adjacent markers had identical strain distribution patterns, we selected 2,320 markers for the subsequent analysis (1,814 SNPs and 506 SSRs). The expression data comprised 46,643 transcripts in total. Because many of them showed little or no variation, we selected 4,193 transcripts with relatively large variance (coefficient of variation, CV > 1.0%).
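The transcript filter described above (keep transcripts with CV > 1.0%) can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names, the toy expression values, and the choice of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Return the CV in percent: 100 * SD / |mean| (population SD assumed)."""
    m = mean(values)
    if m == 0:
        return float("inf")  # CV is undefined for a zero mean; treat as maximal
    return 100.0 * pstdev(values) / abs(m)

def select_variable_transcripts(expression, cv_threshold=1.0):
    """Keep the names of transcripts whose CV (%) exceeds the threshold."""
    return [name for name, values in expression.items()
            if coefficient_of_variation(values) > cv_threshold]

# Hypothetical expression values for two transcripts across four samples.
expression = {
    "t_flat": [8.00, 8.01, 7.99, 8.00],
    "t_variable": [7.0, 8.5, 9.2, 6.8],
}
selected = select_variable_transcripts(expression)
```

In this toy input only `t_variable` passes the filter, because `t_flat` varies by far less than 1% of its mean.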

Anxiety-related behavior was examined in the closed quadrants of an elevated zero maze, a standard tool for testing anxiety⁴¹, under five conditions: 1) animals acutely restrained and receiving ethanol; 2) animals acutely restrained and receiving saline; 3) animals receiving only a saline injection; 4) animals receiving only an ethanol injection; and 5) animals not restrained or receiving any injection. Acutely restrained animals were placed in an immobilization tube for 15 minutes. Animals receiving injections were given either ethanol (1.8 g/kg) or saline and were returned to their home cages. The activities of the test session were recorded in the closed quadrants.

As shown in Fig. 2 and Table 1, three QTLs were detected by linkage analysis on chromosomes 1 and 11: Q₁ (within 25.2 Mb ~ 27.1 Mb) and Q₂ (within 169.1 Mb ~ 169.8 Mb) on chromosome 1, and Q₃ (within 44.6 Mb ~ 53.9 Mb) on chromosome 11. These three loci were confirmed by QTS association analysis at precise locations (Q₁ at 27.1 Mb, Q₂ at 169.1 Mb and Q₃ at 52.8 Mb). Two additional QTSs were also discovered on chromosome 11 (Q₄ at 35.3 Mb and Q₅ at 36.5 Mb). QTS mapping thus located loci at the exact positions of the identified SNPs and had higher power than QTL mapping. Of the three loci detected by both QTL and QTS mapping, only one was confirmed by QTT mapping (Q₂ at 169.1 Mb), but another was revealed nearby (Q₆ at 155.5 Mb). QTT mapping can only discover transcript loci that are expressed at the time of sampling.

Table 1

As shown in Table 2, the epistatic pair QQ₁ was identified with similar predicted effects by both QTL mapping (D1Mit291 × rs3659789) and QTS mapping (D1Mit291 × rs3717220). Compared with QTL mapping, QTS mapping yielded higher statistical significance. No transcript epistasis corresponding to QQ₁ was detected on chromosome 1, possibly because there was no significant association at the transcript level at the time when the tissue used for mRNA extraction was collected. Another transcript epistasis, QQ₂ (ILM100060136 × ILM1740047), was detectable only by QTT mapping.

Table 2

Monte Carlo simulations

A simulation study with 200 replications was conducted. The BXD mouse genetic map was used to generate three simulated populations for mapping QTLs, QTSs and QTTs. First, we generated a simulated population for QTS mapping with 200 RIL genotypes, each consisting of 2,320 SNPs covering 2,037.6 cM. Five QTSs (denoted Q₁, Q₂, Q₃, Q₄ and Q₅) were assumed to control the simulated trait. Three of the five QTSs were involved in three pairs of two-way interactions, denoted QQ₁ for Q₁ × Q₃, QQ₂ for Q₁ × Q₄ and QQ₃ for Q₃ × Q₄. All 200 genotypes were evaluated in three environments. The individual SNPs and their interactions were set to account for a total heritability of 20%. Detailed genetic information is listed in Table S2 and Table S3. Across the 200 simulations, we recorded the significant individual QTLs/QTSs and pair-wise epistasis QTLs/QTSs. Power (%) was calculated as the percentage of replications in which a true locus was significantly detected. The mean and standard error (SE) of the estimated genetic effects were also calculated to assess the unbiasedness of the estimates.
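The summary statistics defined above (power, mean estimate and SE across replications) can be sketched as follows. The function name and the toy replicate values are hypothetical, and the SE is taken here as the standard deviation of the estimates across the replications in which the locus was detected, which is an assumption because the text does not give its formula.

```python
from statistics import mean, stdev

def summarize_locus(replicate_estimates):
    """Summarize one simulated locus across replications.

    replicate_estimates holds one entry per replication: the estimated
    effect if the locus was significantly detected, otherwise None.
    """
    detected = [e for e in replicate_estimates if e is not None]
    power = 100.0 * len(detected) / len(replicate_estimates)
    mean_effect = mean(detected) if detected else None
    # Assumed convention: SE = SD of the estimates over detecting replications.
    se = stdev(detected) if len(detected) > 1 else None
    return power, mean_effect, se

# Hypothetical results from five replications (the study used 200).
power, mean_effect, se = summarize_locus([3.5, 3.7, None, 3.6, 3.8])
```

Comparing `mean_effect` with the simulated parameter value gives a direct check of bias, while `power` counts only the replications in which the locus passed the significance threshold.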

A second simulation population was generated for mapping QTLs, including 506 microsatellite markers drawn from the entire 2,320 markers within each observation sample. Other parameters had the same settings as described above. A third simulation population was created for mapping QTTs, including 200 genotypes, with each composed of 2,320 transcript loci, using the same map as the mouse genetic map. Four transcript loci (denoted Q₁, Q₂, Q₃ and Q₄) were supposed to control the phenotype variation. Meanwhile, three pairs of two-loci combinations (denoted QQ₁, QQ₂ and QQ₃) between the four transcript loci were assumed to be associated with the simulated trait. The 200-genotype individuals were tested in three environments. The total heritability was equal to 20%. Detailed information is listed in Table S4 and Table S5. Power of detecting loci and estimated genetic effects with their standard error (SE) were also calculated as for QTLs/QTSs mapping.

The Monte Carlo simulations demonstrated that the mixed linear model approaches could robustly estimate positions and effects for QTLs, QTSs and QTTs. The simulation results for mapping QTLs and QTSs are listed in Table S2 and Table S3. Both the QTL and the QTS mapping approaches obtained efficient and unbiased estimates of the locations and genetic effects of loci with high power (>82.5% for individual loci and >87.0% for pair-wise epistasis loci). For example, Q₁ (heritability 2.33%) and Q₄ (heritability 3.63%) had a statistical power of 100% with both methods. Loci with relatively small heritability were more likely to be identified by QTS association analysis. The individual locus Q₅ (heritability 1.31%) had the smallest heritability among the simulated loci; it was detected with a statistical power of 90.5% by QTS association analysis, but only 82.5% by QTL linkage analysis. Similarly, for locus Q₂, with a heritability of 1.77%, QTS association analysis had higher statistical power (100%) than QTL linkage analysis (95%). Furthermore, positions and genetic effects could be estimated more precisely by QTS association analysis. For a locus with a relatively large effect, both methods yielded unbiased estimates. However, there were obvious differences between the two approaches in estimating the genetic effects and positions of loci with relatively small heritability. For locus Q₅, the smaller standard error (SE) of the estimated position indicated that QTS association analysis could define a more precise position than QTL linkage analysis. Because the position is identified more precisely, the effects estimated by the QTS association method may be closer to the true parameters. The estimated additive and additive-by-environment interaction effects of locus Q₅ were also relatively accurate with QTS mapping. Likewise, the closer estimate and smaller SE of the general additive effect of individual locus Q₄ showed that QTS mapping (estimate 3.66, SE = 0.56) could obtain more accurate estimates than QTL mapping (estimate 3.13, SE = 1.13).

Detailed simulation results for mapping QTTs are listed in Table S4 and Table S5. Association analysis of QTTs could also efficiently detect the causal transcript loci and provide unbiased estimates of positions, genetic main effects and GxE interaction effects. Individual transcript loci were detected with statistical power higher than 83.0%, and the power for detecting epistasis was 100% in all cases. The estimates of genetic effects and environment interaction effects were close to the parameter settings, with very small SEs for individual transcript loci as well as for two-transcript interactions. Because QTT association analysis could identify the transcript loci efficiently, we could obtain unbiased estimates of QTT main effects and QTT-by-environment interaction effects.

GPU Accelerating Performance

We used three GPU servers to test the performance. The first consisted of two NVIDIA GTX480 cards with a 3.33 GHz (gigahertz) Intel® Core™ i7 X 980 CPU and 12 GB (gigabytes) of DDR3 host memory. The second consisted of four NVIDIA GTX680 cards with a 2.40 GHz Intel® Xeon® E5645 CPU and 48 GB of DDR3 host memory. The third consisted of four NVIDIA Tesla K20c cards with a 2.40 GHz Intel® Xeon® E5645 CPU and 48 GB of DDR3 host memory. We compared the running time of three implementation versions, measuring the time of the whole procedure, including input, one-dimension search, two-dimension search, effect estimation and output. Multi-GPU computing was implemented in the two-dimension search: we first divided all SNP pairs into parts according to the number of GPUs and assigned each part to one GPU, and each GPU then finished its tasks in loops. The speed-ups of the GPU implementations over the single-thread CPU implementation are summarized in Table S6. The speed-up increases with the number of SNPs. Given the same GPU architecture, the speed-up is nearly proportional to the number of GPUs. A speed-up of more than 250 times was achieved with four Tesla K20c cards. We used bit compression in QTS mapping to save memory, and we also tested the performance of the GPU implementation with bit compression. Table S7 shows the speed-up of the GPU implementation with compression over the single-thread CPU implementation. Table S6 and Table S7 show that compression increased performance rather than decreasing it. This was mainly because (1) we used bitwise operations instead of arithmetic operations to compress and decompress the data, and (2) one GPU memory access retrieves more data after compression. Therefore, one memory access can serve more GPU threads and the number of memory accesses decreases.

We have also used the newly developed software to analyze publicly available human and plant data, detecting major genetic variation due to dominance and epistasis for human BMI⁴², and due to epistasis and its environment interaction for cotton yield⁴³.
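The bit compression described for QTS mapping can be illustrated with a short sketch that packs genotype codes into machine words using only bitwise operations. The 2-bit encoding, the 32-bit word size and all names below are assumptions for illustration; the source does not specify the layout used in QTXNetwork.

```python
# Hypothetical 2-bit encoding of the genotype coefficients 1 (QQ), 0 (Qq), -1 (qq).
ENCODE = {1: 0b00, 0: 0b01, -1: 0b10}
DECODE = {bits: code for code, bits in ENCODE.items()}
GENOTYPES_PER_WORD = 16  # sixteen 2-bit genotypes fit in one 32-bit word

def pack(genotypes):
    """Pack genotype codes into 32-bit words with shifts and ORs."""
    words = []
    for start in range(0, len(genotypes), GENOTYPES_PER_WORD):
        word = 0
        for offset, g in enumerate(genotypes[start:start + GENOTYPES_PER_WORD]):
            word |= ENCODE[g] << (2 * offset)
        words.append(word)
    return words

def unpack(words, n):
    """Recover the first n genotype codes with shifts and masks."""
    genotypes = []
    for index in range(n):
        word = words[index // GENOTYPES_PER_WORD]
        bits = (word >> (2 * (index % GENOTYPES_PER_WORD))) & 0b11
        genotypes.append(DECODE[bits])
    return genotypes

genotypes = [1, -1, 0, 1] * 10   # 40 genotype codes
packed = pack(genotypes)         # 3 words instead of 40 separate values
restored = unpack(packed, len(genotypes))
```

With this layout a single 32-bit load delivers 16 genotypes, which is the mechanism the text gives for compression reducing the number of memory accesses.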

Discussion

Traditionally, linkage analysis is used to detect causal individual QTLs and epistasis, and linkage mapping has discovered many QTLs affecting various quantitative traits. Because of the recent development of high-throughput genotyping technologies and the identification of highly dense SNPs⁴⁴, SNP markers are now commonly used in genome research⁴⁵, bioinformatics and bio-computation studies⁴⁶, genetic studies of complex traits⁴⁷ and human population genetics⁴⁸. Compared with linkage analysis, association analysis based on SNP markers has several advantages. First, QTS association mapping can be applied in different populations. QTL linkage mapping relies on determining the probabilities of three genotypes (QQ, Qq and qq), assuming linkage between the flanking markers and the unobserved loci. However, in artificially generated lines such as recombinant inbred lines (RILs) or doubled haploid lines (DHLs) derived from two parental lines, abundant recombination may eliminate linkage over generations. In addition, it may be difficult to infer the probabilities of the three genotypes when mapping QTLs in populations derived from multiple parental lines. QTS association analysis relies on the retention of adjacent DNA variants over many generations. As a result, QTS association analysis is appropriate for detecting loci in natural populations and complicated experimental designs.

For advanced populations, such as recombinant inbred lines (RILs) and near-isogenic lines (NILs), the linkage between the flanking markers and unobserved loci is reduced, because additional generations increase the recombination frequency⁴⁹. This may decrease the statistical power of linkage analysis for detecting QTLs, because the reduced linkage affects the prediction of the three genotype probabilities. In contrast, because of the high density of SNP markers and the use of observed genotypes, association methods can detect QTSs efficiently, even QTSs with small heritability. The simulation results show that association analysis has higher statistical power than linkage analysis, especially for loci with small heritability, such as Q₂ and Q₅ in Table S2. As shown in Fig. 2, the higher peaks suggest that candidate loci may be detected with more certainty by QTS association mapping. Furthermore, the candidate gene regions identified by QTL mapping may be large, encompassing hundreds or even thousands of genes. By contrast, association analysis, drawing on historical recombination, may narrow the trait-associated regions to a single gene or gene fragment. In the Monte Carlo simulations, the individual QTL Q₅ in Table S2 had the smallest heritability, and QTS association analysis obtained a smaller SE of the estimated position than QTL linkage analysis. In addition, in the analysis of the mouse data on chromosome 11, QTS association mapping detected two significant SNPs in the region of the QTL mapped by linkage analysis. These results indicate that QTS association analysis has advantages over linkage analysis in the efficiency and accuracy of mapping loci.

Discovered loci such as QTSs can subsequently be used to predict phenotypic values and QTS effects in an independent population, which typically improves the classification of phenotypic values over random decision-making. In public health, it is useful to determine whether individuals belong to an at-risk group. Owing to the accuracy of locus position and effect estimation and the ease of discovering loci with low heritability, QTS association analysis can improve genetic prediction.

Transcript association, in turn, can detect causal transcript loci efficiently. In contrast to QTS association analysis and QTL linkage analysis, the genotypic variants in QTT association are continuous gene expression data. The high statistical power and unbiased estimation indicate that QTT association is also a useful approach for mapping individual transcripts and pair-wise interactions that are significantly associated with quantitative traits. In addition, the approach could be extended to mapping quantitative trait proteins (QTPs) and quantitative trait metabolites (QTMs)⁵⁰. By combining the results of transcript association with the QTSs mapped by association analysis, we can further understand the function of the candidate genes. Although several loci were detected by linkage and association analyses, the mechanism by which they affect the quantitative trait may remain unknown. QTT association analysis can help address this problem. For example, in the case of mouse anxiety, we found three individual loci by both QTL linkage mapping and QTS association mapping. Transcript association mapping showed that only one of them was associated with anxiety at the gene expression level. Thus, combining intermediate molecular phenotypes with QTS mapping is a useful approach for understanding biologically causal networks. Moreover, with other intermediate molecular variation, such as proteins and metabolites, we can further explore the “black box” of complex traits.

Methods

Mixed linear model

For mapping quantitative trait SNPs (QTSs) or quantitative trait transcripts/proteins/metabolites (QTTs/QTPs/QTMs), mixed linear model approaches can be used to detect loci significantly associated with phenotypic variation^51,52,53,54,55. When the quantitative variation of transcripts, proteins and metabolites is used as the independent variable for association analyses among these three omics genotypic variants, other types of QTXs can be identified. The names of all 16 types of QTXs detectable by association mapping are listed in Table S1.

Mixed-model approaches for QTL mapping^24,37 can deliver unbiased estimates of genetic effects (additive, dominance, epistasis and their environment interactions) for detected loci, based on a genetic model with genetic main effects as fixed effects and environment interaction effects as random effects. For analyzing large numbers of candidate omics variants by association mapping, we propose a genetic model that sets all genetic effects as random variables. For mapping SNPs in a homozygous population and transcripts/proteins/metabolites in a homozygous/heterozygous population, the phenotypic value $y_{hk}$ of the k-th subject in the h-th environment can be expressed by the following mixed linear model:

$$y_{hk} = \mu + e_h + \sum_{i} q_i u_{ik} + \sum_{i<j} qq_{ij} u_{ijk} + \sum_{i} qe_{hi} u_{hik} + \sum_{i<j} qqe_{hij} u_{hijk} + \varepsilon_{hk} \qquad (1)$$

where $\mu$ is the population mean; $e_h$ is the fixed effect of the h-th environment; $q_i$ is the i-th locus effect with coefficient $u_{ik}$ (1 for QQ, -1 for qq, and 0 for Qq in QTS mapping and using expression values in QTT/P/M mapping); $qq_{ij}$ is the epistasis effect of locus i × locus j with coefficient $u_{ijk}$ (1 for QQ × QQ and qq × qq, -1 for QQ × qq and qq × QQ in QTS mapping and using expression values in QTT/P/M mapping); $qe_{hi}$ is the environment interaction effect of the i-th locus in the h-th environment with coefficient $u_{hik}$; $qqe_{hij}$ is the epistasis × environment interaction effect of locus i × locus j in the h-th environment with coefficient $u_{hijk}$; and $\varepsilon_{hk}$ is the residual effect of the k-th individual in the h-th environment.
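The coding rules stated for Equation (1) can be made concrete with a small sketch that builds the coefficients from genotypes and sums the model terms for one subject in one environment. The function names and effect values are hypothetical. Two points are assumptions rather than statements from the source: the epistasis coefficient is computed as the product of the additive codes (so it is 0 when either locus is Qq), and the environment interaction terms reuse the main-effect coefficients.

```python
# Additive coding stated in the text: 1 for QQ, -1 for qq, 0 for Qq.
ADDITIVE_CODE = {"QQ": 1, "Qq": 0, "qq": -1}

def additive_coefficient(genotype):
    return ADDITIVE_CODE[genotype]

def epistasis_coefficient(genotype_i, genotype_j):
    # Product of additive codes: +1 for QQ x QQ and qq x qq,
    # -1 for QQ x qq and qq x QQ (0 for heterozygotes is an assumption).
    return additive_coefficient(genotype_i) * additive_coefficient(genotype_j)

def expected_phenotype(genotypes, mu, env_effect, q, qq, qe, qqe):
    """Sum the model terms for one subject in one environment (residual omitted)."""
    value = mu + env_effect
    for locus, effect in q.items():          # locus main effects
        value += effect * additive_coefficient(genotypes[locus])
    for (a, b), effect in qq.items():        # pair-wise epistasis effects
        value += effect * epistasis_coefficient(genotypes[a], genotypes[b])
    for locus, effect in qe.items():         # locus x environment effects
        value += effect * additive_coefficient(genotypes[locus])
    for (a, b), effect in qqe.items():       # epistasis x environment effects
        value += effect * epistasis_coefficient(genotypes[a], genotypes[b])
    return value

# Hypothetical subject with two loci, evaluated in one environment.
subject = {"A": "QQ", "B": "qq"}
y_hat = expected_phenotype(
    subject, mu=10.0, env_effect=1.0,
    q={"A": 2.0, "B": 0.5}, qq={("A", "B"): 1.5},
    qe={"A": 0.3}, qqe={("A", "B"): -0.2},
)
```

For QTT/P/M mapping the same structure would apply with expression values in place of the integer genotype codes.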

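The mixed linear model can be presented in matrix notation:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \sum_{v} \mathbf{U}_v \mathbf{e}_v + \mathbf{e}_{\varepsilon} \sim MVN\left(\mathbf{X}\mathbf{b},\ \sum_{v} \sigma_v^2 \mathbf{U}_v \mathbf{R}_v \mathbf{U}_v^{T} + \mathbf{I}\sigma_{\varepsilon}^2\right)$$

where $\mathbf{y}$ is an n × 1 column vector of phenotypic values and n is the number of sample observations; $\mathbf{b}$ is a column vector of μ and environment effects; $\mathbf{X}$ is the known incidence matrix relating to the fixed effects; $\mathbf{U}_v$ is the known coefficient matrix relating to the v-th random vector $\mathbf{e}_v$; $\mathbf{R}_v$ is the kinship coefficient matrix relating to the v-th random vector $\mathbf{e}_v$; and $\mathbf{e}_{\varepsilon}$ is an n × 1 column vector of residual effects.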

To identify the susceptible individual and epistasis loci, we can conduct two-step approaches:

1. Individual locus detection. To test the significance of the i-th individual locus, we used the following mixed linear model
where the parameters are defined as in Equation (1). We performed the F-test step by step based on Henderson method III⁵⁶. The locus with the maximum F-value²⁴ exceeding a predefined critical value (experiment-wise error rate < 0.05) is considered a candidate individual SNP or transcript.
2. Epistasis loci detection. To search all possible epistatic interactions after s individual loci have been selected in the first step, we conduct an exhaustive two-dimension (2D) genome scan with the following statistical model

where the parameters have the same definitions as in Equation (1). The F-test is performed on all possible pairs. Pairs of loci with a maximum F-value larger than the predefined threshold (experiment-wise error rate < 0.05) are considered candidate epistasis loci.
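The two scans above can be sketched with a deliberately simplified statistic. The code below uses the ordinary F statistic of a single-predictor regression in one environment; it is a stand-in, not Henderson method III, and the 2D scan tests each interaction term alone rather than conditioning on the s loci selected in step 1. All names and the toy data are hypothetical.

```python
from itertools import combinations

def f_statistic(x, y):
    """F statistic (1 and n-2 d.f.) for a simple regression of y on x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0:
        return 0.0                      # monomorphic locus carries no signal
    ss_reg = sxy ** 2 / sxx
    ss_res = syy - ss_reg
    return float("inf") if ss_res <= 0 else ss_reg / (ss_res / (n - 2))

def scan_1d(loci, y):
    """One-dimension search: one test per locus."""
    return {name: f_statistic(codes, y) for name, codes in loci.items()}

def scan_2d(loci, y):
    """Exhaustive two-dimension search: one test per pair of loci."""
    results = {}
    for a, b in combinations(sorted(loci), 2):
        interaction = [u * v for u, v in zip(loci[a], loci[b])]
        results[(a, b)] = f_statistic(interaction, y)
    return results

# Hypothetical homozygous genotypes (1 = QQ, -1 = qq) for eight individuals.
loci = {
    "m1": [1, 1, 1, 1, -1, -1, -1, -1],
    "m2": [1, -1, 1, -1, 1, -1, 1, -1],
    "m3": [1, 1, -1, -1, 1, 1, -1, -1],
}
y = [5.1, 4.9, 5.3, 5.0, 3.0, 3.2, 2.9, 3.1]
profile_1d = scan_1d(loci, y)
profile_2d = scan_2d(loci, y)
best_locus = max(profile_1d, key=profile_1d.get)
```

Each entry of `profile_1d` and `profile_2d` is independent of the others, which is why the paper can assign one test to one GPU thread.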

After selecting the candidate individual and pair-wise loci, a full statistical model as in Equation (1) is used to estimate variance components and genetic effects by mixed linear model approaches. Variance components in the following equations can be estimated by MINQUE(1) method (Minimum Norm Quadratic Unbiased Estimation setting prior values as 1)

where

Genetic effects can be predicted by an Adjusted Unbiased Prediction (AUP) method⁵⁷
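As a hedged sketch of the variance-component step, the textbook MINQUE(1) normal equations for a mixed linear model with kinship matrices can be written as below. The notation is an assumption for illustration and may differ from the form used in QTXNetwork: the index u (or v) runs over all random components including the residual, for which $\mathbf{U} = \mathbf{R} = \mathbf{I}$, and the superscript minus denotes a generalized inverse.

```latex
% Textbook MINQUE(1): every prior variance value is set to 1 (assumed notation)
\mathbf{V}_{(1)} = \sum_{u} \mathbf{U}_u \mathbf{R}_u \mathbf{U}_u^{T},
\qquad
\mathbf{Q}_{(1)} = \mathbf{V}_{(1)}^{-1}
  - \mathbf{V}_{(1)}^{-1}\mathbf{X}
    \left(\mathbf{X}^{T}\mathbf{V}_{(1)}^{-1}\mathbf{X}\right)^{-}
    \mathbf{X}^{T}\mathbf{V}_{(1)}^{-1}

% One linear equation per random component u, solved jointly for all \hat{\sigma}_v^2
\sum_{v} \operatorname{tr}\!\left(
    \mathbf{Q}_{(1)}\mathbf{U}_u\mathbf{R}_u\mathbf{U}_u^{T}
    \mathbf{Q}_{(1)}\mathbf{U}_v\mathbf{R}_v\mathbf{U}_v^{T}\right)
  \hat{\sigma}_v^{2}
= \mathbf{y}^{T}\mathbf{Q}_{(1)}\mathbf{U}_u\mathbf{R}_u\mathbf{U}_u^{T}
  \mathbf{Q}_{(1)}\mathbf{y}
```

Because the prior values are fixed at 1, the system is linear in the variance components and needs no iteration, which suits a model with many candidate loci.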

In detecting individual and epistasis loci associated with phenotypic variation, multiple hypothesis tests are conducted among the candidate genotypes. To control the experiment-wise type I error, permutation testing is applied. Because the statistical model in the two-locus detection process also contains parameters for the putative individual loci, we randomly shuffle the order of the parameters to be tested. We used 2,000 permutations to calculate the critical P-value for controlling the experiment-wise type I error. Stepwise selection was performed on all the significant peaks selected from the F-statistic profile that met the significance level (experiment-wise error rate < 0.05)^24,37. The effects of the individual and epistasis loci detected in the previous steps are estimated by the following mixed model equations via Markov chain Monte Carlo (MCMC)^24,37:
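One common way to obtain an experiment-wise threshold by permutation is the max-statistic procedure sketched below. It is offered as an assumption-laden illustration, not as the paper's exact scheme: here the phenotype vector is shuffled relative to the genotypes, the largest F statistic over all loci is recorded in each permutation, and the 95th percentile of those maxima is used as the critical value. The simple regression F statistic, the function names and the simulated toy data are all hypothetical.

```python
import random

def f_statistic(x, y):
    """F statistic (1 and n-2 d.f.) for a simple regression of y on x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0:
        return 0.0
    ss_reg = sxy ** 2 / sxx
    ss_res = syy - ss_reg
    return float("inf") if ss_res <= 0 else ss_reg / (ss_res / (n - 2))

def permutation_threshold(loci, y, n_perm=2000, alpha=0.05, seed=0):
    """Critical value controlling the experiment-wise error rate at alpha."""
    rng = random.Random(seed)
    shuffled = list(y)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype association
        max_stats.append(max(f_statistic(x, shuffled) for x in loci))
    max_stats.sort()
    return max_stats[min(n_perm - 1, int((1 - alpha) * n_perm))]

# Toy data: 40 individuals, 5 loci, only the first locus affects the trait.
rng = random.Random(1)
n = 40
loci = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(5)]
y = [2.0 * loci[0][k] + rng.gauss(0.0, 1.0) for k in range(n)]

threshold = permutation_threshold(loci, y)
observed = f_statistic(loci[0], y)
is_significant = observed > threshold
```

Taking the maximum over all tests in each permutation is what makes the threshold experiment-wise rather than per-test.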

In the current study, a chain length of 200,000 and a thinning interval of 10 cycles were used for each parameter to be estimated, after the chain had reached its equilibrium distribution.
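The effect of these settings on the number of retained draws can be checked with a short sketch. The helper name and the placeholder chain are hypothetical, and the burn-in length is left as a parameter because the text does not state it.

```python
def thin_chain(chain, thin=10, burn_in=0):
    """Discard the first burn_in draws, then keep every thin-th draw."""
    return chain[burn_in::thin]

# Placeholder chain of 200,000 post-equilibrium draws (integers stand in for samples).
chain = list(range(200_000))
retained = thin_chain(chain, thin=10)
n_retained = len(retained)
```

With a chain length of 200,000 and a thinning interval of 10, 20,000 draws per parameter remain for computing posterior means and standard errors.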

GPU Computing Implementation

We implemented the mixed linear model approaches with a CPU-GPU heterogeneous parallel computing architecture. The design of QTT/P/M mapping is similar to that of QTS mapping. To illustrate how the computation is performed, we take QTS mapping as an example; the computational flow chart is shown in Fig. S1. GPU computing was used for the one-dimension search for individual loci and the two-dimension search for epistasis loci, which are the most time-consuming steps of the whole statistical analysis. The other, less time-consuming analysis steps and the input/output procedures ran on the CPU. In addition, a self-adaptive load-balancing method and a matrix compression method for the coefficient matrix of the mixed linear model were used. To hide GPU latency, the number of warps (32 threads per warp) running on each SM (Streaming Multiprocessor) should be as large as possible. In general, the grid size should be at least three times the number of SMs, and each block should contain more than four warps. In both the one-dimension and the two-dimension search we used a one-to-one model: one candidate locus test or one interaction test is performed by one GPU thread.
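The launch-configuration rules of thumb above can be turned into a small host-side calculation. This is a sketch under stated assumptions: the function name is hypothetical, 8 warps per block is an arbitrary choice that satisfies "more than four warps", and the SM count of 13 is an illustrative value passed in by the caller rather than a figure from the source.

```python
import math

WARP_SIZE = 32  # threads per warp, as stated in the text

def launch_config(n_tests, n_sm, warps_per_block=8):
    """Return (grid_size, threads_per_block) for a one-thread-per-test kernel.

    The grid is at least 3 * n_sm blocks, following the rule of thumb in
    the text; threads whose index is >= n_tests must simply return early.
    """
    threads_per_block = warps_per_block * WARP_SIZE
    blocks_needed = math.ceil(n_tests / threads_per_block)
    grid_size = max(blocks_needed, 3 * n_sm)
    return grid_size, threads_per_block

# One-dimension search: 2,320 loci, one test per thread.
config_1d = launch_config(2320, n_sm=13)
# Two-dimension search: all 2,320 * 2,319 / 2 = 2,690,040 pairs.
config_2d = launch_config(2320 * 2319 // 2, n_sm=13)
```

For the one-dimension search the minimum-grid rule dominates, whereas for the two-dimension search the number of pairs alone already demands thousands of blocks, which is consistent with the 2D scan benefiting most from the GPU.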

In the one-dimension search, the significance of one locus was analyzed by one GPU thread. In this step, several optimization techniques (Divide and Conquer, Coalesced Memory Access and Matrix Compression) were used. The framework is shown in Fig. S2.

In the two-dimension search, one pair of loci was tested by one GPU thread. With high-throughput data, the number of pairs can be very large, so we implemented the interaction scan on multi-GPU platforms. Fig. S3 shows the framework of the single-GPU implementation and Fig. S4 shows the framework of the multi-GPU implementation. In the two-dimension scan, data structures such as the phenotype vector, the permutation matrix and the coefficient matrices must be copied from host memory to GPU global memory. Each interaction test has a different coefficient matrix, and all of these coefficient matrices must be copied to GPU global memory. We used bit compression to compress these matrices, which saved a large amount of memory space and transfer time. In addition to bit compression, OpenMP and Divide and Conquer were used.
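The one-pair-per-thread scheme and the division of pairs among GPUs can be sketched as an index calculation. The functions below are hypothetical: they use a static, equal split of the linear pair index range (the paper mentions a self-adaptive load-balancing method, which is not reproduced here) and a simple row-major mapping from a linear index to a pair of loci.

```python
def n_pairs(n_loci):
    """Number of unordered pairs (i, j) with i < j."""
    return n_loci * (n_loci - 1) // 2

def pair_from_index(k, n_loci):
    """Map a linear index k in [0, n_pairs) to the pair (i, j), i < j, row-major."""
    i = 0
    row_length = n_loci - 1          # pairs (0, 1) ... (0, n_loci - 1)
    while k >= row_length:
        k -= row_length
        i += 1
        row_length -= 1
    return i, i + 1 + k

def split_pairs(n_loci, n_gpus):
    """Assign each GPU a contiguous half-open range [start, stop) of pair indices."""
    base, extra = divmod(n_pairs(n_loci), n_gpus)
    ranges, start = [], 0
    for gpu in range(n_gpus):
        size = base + (1 if gpu < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# 2,320 markers on four GPUs, as in the mouse analysis and the K20c server.
ranges = split_pairs(2320, 4)
all_pairs_small = [pair_from_index(k, 4) for k in range(n_pairs(4))]
```

Each GPU thread would take the global index within its device's range and call the equivalent of `pair_from_index` to find the two loci it must test.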

Additional Information

How to cite this article: Zhang, F.-T. et al. Mixed Linear Model Approaches of Association Mapping for Complex Traits Based on Omics Variants. Sci. Rep. 5, 10298; doi: 10.1038/srep10298 (2015).

Acknowledgements

This research was supported in part by grants from the National Basic Research Program of China (973) (2011CB109306, 2010CB126006), the National Natural Science Foundation of China (30470916), Microsoft Research Asia and NVIDIA China. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors wish to thank Dr. Pen Wang for his help in developing the GPU-based software, and Drs. Robert Anholt and Jian Yang for reading the manuscript and for their constructive criticism. The genotype and phenotype data of the BXD recombinant inbred (RI) strains of mice were provided by Drs. Robert Williams and Lu Lu (Department of Anatomy and Neurobiology, University of Tennessee Health Science Center).

Author information

Zhang Fu-Tao and Zhu Zhi-Hong contributed equally to this work.

Authors and Affiliations

Institute of Bioinformatics, Zhejiang University, Hangzhou, China
Fu-Tao Zhang, Zhi-Hong Zhu, Xiao-Ran Tong, Zhi-Xiang Zhu, Ting Qi & Jun Zhu

Contributions

J.Z. designed the methods. F.T.Z., Z.H.Z. and J.Z. wrote the manuscript. Z.H.Z. analyzed the data. Z.H.Z., F.T.Z., X.R.T., Z.X.Z. and T.Q. implemented the software.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 23 September 2014
Accepted: 07 April 2015
Published: 30 July 2015
DOI: https://doi.org/10.1038/srep10298
