In multilocus genetic association studies of complex diseases, a powerful and highly efficient tool for analyzing linkage disequilibrium (LD) between markers, haplotype distributions, and the many chi-square and p values that arise with large numbers of samples has long been sought. To obtain meaningful results directly from raw data, we developed a robust and user-friendly software platform that provides a series of efficient tools for association analysis. The platform has been evaluated on several sets of real data.
At present, gene-based association study is a widely used method for investigating the genetic etiology of complex diseases. Because there are huge numbers of candidate genes, susceptibility polymorphism sites, and replication studies in different samples, researchers urgently need a user-friendly software platform with highly efficient algorithms. To meet this need, we recently developed such a platform, which we have named “SHEsis” and made available at http://www.nhgg.org/analysis/ (Fig. 1).
Linkage disequilibrium (LD): We calculated Lewontin's D' (|D'|) [1] and r² between each pair of genetic markers.
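For readers unfamiliar with the two LD measures, the following minimal sketch shows how |D'| and r² are conventionally computed for a pair of biallelic markers from one haplotype frequency and two allele frequencies. The function name `ld_stats` and its parameters are illustrative assumptions, not part of the SHEsis platform.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic loci (illustrative helper).

    p_ab: frequency of the haplotype carrying allele 1 at both loci
    p_a:  frequency of allele 1 at the first locus
    p_b:  frequency of allele 1 at the second locus
    """
    # Raw disequilibrium coefficient D
    d = p_ab - p_a * p_b

    # D_max depends on the sign of D (Lewontin's normalization)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is D^2 scaled by the product of all four allele frequencies
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2
```

With a monomorphic marker both denominators vanish, and the sketch returns 0 for both measures rather than raising an error; that convention is a choice made here for simplicity.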
Haplotype analysis: A Full-Precise-Iteration (FPI) algorithm was used for haplotype reconstruction and frequency estimation in randomly chosen samples. For two SNP loci, the FPI algorithm is based on the equation N(11)=2N(11/11)+N(12/11)+N(11/12)+P[(11/22)|(XX)]*N(XX). Here, N(11) is the number of “11” haplotypes; N(12/11) is the number of samples that carry the “12” haplotype on one chromosome and “11” on the other, and so on; and N(XX) is the number of samples that carry the heterozygous “1/2” genotype at both loci, for which the haplotype phase is ambiguous. Because N(11/22) depends in turn on N(11) and N(22), the solution can be found by iteration.
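The iteration can be illustrated for two biallelic loci with the sketch below. It is not the SHEsis implementation: it is a generic expectation-maximization style update that follows the counting equation above, under the reading that each single-heterozygote term contributes one unambiguous “11” haplotype. The function name, the genotype encoding (number of copies of allele “2” at each locus), and the convergence settings are assumptions made for illustration.

```python
def estimate_haplotypes(geno_counts, tol=1e-10, max_iter=1000):
    """Estimate two-locus haplotype frequencies by iteration (sketch).

    geno_counts maps (g1, g2) -> number of samples, where g1 and g2 are
    the number of copies (0, 1 or 2) of allele "2" at locus 1 and locus 2.
    Returns a dict of frequencies for haplotypes "11", "12", "21", "22".
    """
    base = {"11": 0.0, "12": 0.0, "21": 0.0, "22": 0.0}
    n_xx = 0  # double heterozygotes: phase is ambiguous
    for (g1, g2), n in geno_counts.items():
        if g1 == 1 and g2 == 1:
            n_xx += n
            continue
        # With at most one heterozygous locus, both haplotypes are known.
        a = ["1", "2"] if g1 == 1 else ["2" if g1 == 2 else "1"] * 2
        b = ["1", "2"] if g2 == 1 else ["2" if g2 == 2 else "1"] * 2
        for x, y in zip(a, b):
            base[x + y] += n

    total = 2 * sum(geno_counts.values())
    if total == 0:
        raise ValueError("no samples supplied")

    p = 0.5  # current estimate of P[(11/22) | (XX)]
    freqs = {}
    for _ in range(max_iter):
        # Split the ambiguous samples between phases 11/22 and 12/21.
        counts = {
            "11": base["11"] + p * n_xx,
            "22": base["22"] + p * n_xx,
            "12": base["12"] + (1 - p) * n_xx,
            "21": base["21"] + (1 - p) * n_xx,
        }
        freqs = {h: c / total for h, c in counts.items()}
        # Re-estimate the phase probability from the new frequencies.
        same = freqs["11"] * freqs["22"]
        cross = freqs["12"] * freqs["21"]
        new_p = same / (same + cross) if same + cross > 0 else 0.5
        if abs(new_p - p) < tol:
            break
        p = new_p
    return freqs
```

For example, `estimate_haplotypes({(0, 0): 10, (2, 2): 10, (1, 1): 20})` assigns essentially all ambiguous samples to the 11/22 phase, because no unambiguous sample carries a “12” or “21” haplotype.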
Case-control study: A Monte Carlo simulation test, a conventional chi-square test, and an odds ratio test were implemented for alleles and genotypes at single loci and for multilocus haplotypes.
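The three tests can be sketched for a single 2x2 allele table (cases and controls by allele 1 and allele 2) as follows. The text does not specify how the Monte Carlo simulation is carried out, so the label-permutation scheme below is an assumption, as are the function names, the Woolf confidence interval, and the default number of simulations.

```python
import math
import random


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (1 df, no continuity correction) and p value.

    a, b: counts of allele 1 and allele 2 in cases
    c, d: counts of allele 1 and allele 2 in controls
    """
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / den if den else 0.0
    # Survival function of the chi-square distribution with 1 df
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def odds_ratio(a, b, c, d):
    """Odds ratio with an approximate 95% confidence interval (Woolf).

    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(or_) - 1.96 * se)
    high = math.exp(math.log(or_) + 1.96 * se)
    return or_, low, high


def monte_carlo_p(a, b, c, d, n_sim=10000, seed=0):
    """Empirical p value from shuffling alleles between cases and controls."""
    rng = random.Random(seed)
    observed, _ = chi_square_2x2(a, b, c, d)
    alleles = [1] * (a + c) + [0] * (b + d)  # 1 = allele 1, 0 = allele 2
    n_case = a + b
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(alleles)
        sa = sum(alleles[:n_case])   # allele 1 among simulated cases
        sb = n_case - sa             # allele 2 among simulated cases
        sc = (a + c) - sa            # allele 1 among simulated controls
        sd = (b + d) - sb            # allele 2 among simulated controls
        stat, _ = chi_square_2x2(sa, sb, sc, sd)
        if stat >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_sim + 1)
```

The simulated p value is most useful when expected cell counts are small and the asymptotic chi-square approximation becomes unreliable, as is often the case for rare haplotypes.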
RESULTS AND DISCUSSION
This platform supports all commonly used chromosomal genetic markers and can analyze correctly formatted input in seconds. It is user-friendly and easy to operate.
For haplotype analysis, we created the FPI algorithm, which reconstructs ambiguous haplotypes and estimates haplotype frequencies in a given random sample set. The more samples are included in the analysis, the more accurate the estimates become. The platform estimates haplotype frequencies separately in cases and in controls and automatically reports both a result for each single haplotype and a global result. Haplotypes with low frequencies can be pooled to give a single result in the case-control analysis.
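Pooling of low-frequency haplotypes before the case-control comparison can be sketched as below. The frequency threshold, the choice to measure frequency in cases and controls combined, and the function name are all illustrative assumptions; the text does not state how the platform defines a low-frequency haplotype.

```python
def pool_rare_haplotypes(case_counts, control_counts, min_freq=0.03, label="rare"):
    """Merge haplotypes whose combined frequency is below min_freq (sketch).

    case_counts, control_counts: dicts mapping haplotype -> estimated count.
    Returns two new dicts in which all rare haplotypes share one key.
    """
    total = sum(case_counts.values()) + sum(control_counts.values())
    if total == 0:
        return {}, {}

    pooled_case, pooled_control = {}, {}
    for hap in sorted(set(case_counts) | set(control_counts)):
        n_case = case_counts.get(hap, 0)
        n_control = control_counts.get(hap, 0)
        # Keep common haplotypes under their own name, pool the rest
        key = hap if (n_case + n_control) / total >= min_freq else label
        pooled_case[key] = pooled_case.get(key, 0) + n_case
        pooled_control[key] = pooled_control.get(key, 0) + n_control
    return pooled_case, pooled_control
```

The pooled tables can then be tested like any other haplotype, which avoids comparing categories whose expected counts are too small for a reliable test.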
To assess the reliability of our platform, we evaluated it on data from several published papers [2-6] and obtained exactly the same results, with robust performance. Compared with the tools used in the cited papers [2-6], our platform is easier to use and can analyze more complicated data. It could therefore become a powerful and useful tool for studies of complex diseases in the near future. The SHEsis software platform is now freely available at the website address given above, and we will continue to improve its power and capacity.
This work was supported by the Major State Basic Research Development Program of China and the National High Technology Research and Development Program of China.