Genome-wide analysis correlates Ayurveda Prakriti

The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as “Prakriti”. To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found 52 SNPs (p ≤ 1 × 10−5) were significantly different between Prakritis, without any confounding effect of stratification, after 106 permutations. Principal component analysis (PCA) of these SNPs classified 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which represent its power in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India’s traditional medicine has a genetic basis; and its Prakriti-based practice in vogue for many centuries resonates with personalized medicine.

Association analysis was performed using plink software 26 . Since the present study has no cases and controls (patients and healthy), we considered one Prakriti as case and the remaining two Prakritis as controls, and performed association analyses in three combinations: Vata vs. Kapha and Pitta (V vs. PK); Pitta vs. Kapha and Vata (P vs. VK); Kapha vs. Pitta and Vata (K vs. VP). Prior to association analysis, 3,890; 4,153 and 4,124, respectively, markers were removed from 791186 markers, which were not in Hardy-Weinberg equilibrium (HWE) i.e. p-value < 0.001 in controls of V vs. PK, P vs. VK and K vs. VP; respectively. The three combination association results were further used to identify the SNPs that were significant. Considering the fact that none of the samples represents 100% single Prakriti, we did not expect very low p-value in the association analysis. In this scenario, truly associated loci may co-exist with false positive markers and can be identified by permutation analysis. As expected, we observed that SNPs having approximately same p-value in the extreme tail of theoretical distribution failed to achieve 10 6 permutations (Table 1). For example, rs2939743 having p-value 7.61 × 10 −5 dropped at 142717 th  permutation while rs10197747 having p-value 2.50 × 10 −5 achieved 10 6 permutations, which of course revealed that rs2939743 is false positive. Similarly, we found 52 true positive SNPs achieved 1 million simulations with theoretical p-value ≤ 1 × 10 −5 (details are given in Table 1; Figure S6). It is well known that some markers differ in allele frequency more across ancestral population, compared to other set of markers. Moreover, natural selection might be the reason for this phenomenon because it acts locus-specific manner 21 . We speculate that the above so-called true positive loci might be artifacts of population stratification because of high probability of false positive results at the p-value, which observed in association analysis. Hence, we performed extensive statistical analyses to control these confounding factors and/or population stratification. Prevailing methods include genomic control and EIGENSTRAT to find such confounding effect of stratification. Genomic control uses uniform inflation factor to correct stratification, which is not sufficient for those SNPs having high frequency differences between ancestors 21 . Hence, we proceeded with EIGENSTRAT and found p-value did not change drastically (Table S9). To further confirm, we used variance component model (implemented in EMMAX) 27 and mixed-linear model of association analysis (implemented in GCTA) 28 , which can correct sample structure in association, but have different statistics comparative to eigenstrat. Intriguingly, even with this analysis, we did not observe any drastic change in the p-value (Table S9). This has proved that these 52 SNPs were genuine characteristics of Prakriti and not derived from ancestry. Moreover, we also explored the allele frequency differences between centers; however, we did not find any significant difference for these 52 SNPs (Table S10). We further explored the power of 52 SNPs in Prakritis genetic differentiation ( Figure S1). In principal component analysis, 19 SNPs were excluded with r 2 > 0.75 and, as expected, we found striking separation of subjects according to their Prakriti ( Fig. 2A). On eigenvector-1 (eigenvalue = 18.168248) Pitta significantly differentiated against Vata and Kapha (p-value = 1.11022 × 10 −16 , 4.44089 × 10 −16 , respectively); while on eigenvector 2 (eigenvalue = 15.890861) Kapha was significantly different compared to Vata and Pitta (p-value = 3.33067 × 10 −16 and ~0 respectively).
To examine the statistical power of these 52 markers for categorizing the samples with unknown Prakriti, we generated a statistical model (see methods). Initially, we applied it on 205 samples and found 23.9% (49 out of 205) were explained by the proposed model (Table S11). Further, we applied it on 297 Indian (population) samples and found 37 individuals (5 Austro-Asiatic; 22 Dravidian; 8 Indo-European and 2 Tibeto-Burman) satisfying the model. According to the model, 7 individuals were Vata, 20 were Pitta and 11 were Kapha. Interestingly, Indian population samples, which belong to one Prakriti were from different ancestry (Table S12), suggesting that these makers could separate the Prakritis, irrespective of their ancestry. To confirm the proposed model, we projected these 37 individuals on eigenvector of Prakriti samples and found that these individuals clustered with Prakriti as predicted in the model (Fig. 2B). It suggests that the cluster is based on Prakriti, and is not due to the ancestry of samples. That would also suggest that the phenotypic variations have a genetic basis, which would be shared by Prakritis of Ayurveda.
Further, we used these 52 markers to find the genotype-phenotype correlations. We observed that 2 markers (rs10518915 and rs986846) were associated with two different Prakriti; rs10518915 with Vata and Pitta, while rs986846 with Kapha and Vata. This observation prompted us to believe that different alleles of the same locus might be influencing different Prakriti (Table 1). In order to correlate the functional relevance of these SNPs, we divided them into genic and non-genic. The SNPs, which are within 10 kb of gene, were considered genic; while others as non-genic 29,30 . We found 28 were genic SNPs, of which 12 were in Vata (7 genes), 11 in Pitta (7 genes), and 6 in Kapha (7 genes) ( Table 1). To correlate the function of these genes with respect to the characteristics of Prakritis, we searched in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Reactome event and found PGM1 gene associated with the Pitta phenotype. In Ayurveda, characteristics of Pitta include digestion, metabolism and energy production. Interestingly, we found PGM1 gene is in the center of many metabolic pathways i.e. glycolysis or gluconeogenesis (hsa00010); pentose phosphate pathway (hsa00030); galactose metabolism (hsa00052); purine metabolism (hsa00230) and; starch and sucrose metabolism (hsa00500) ( Figure S7). Our finding suggests that the function of the gene directly correlates with the role of Pitta in metabolism as described in Ayurvedic literature.
In addition, we have checked the PGM1 gene markers in Affymetrix data set and found 4 markers (rs2269241, rs2269240, rs2269239,and rs2269238) were associated with Pitta Prakriti and all are in strong Linkage Disequilibrium (LD) ( Figure S8). Therefore, to find the functionally relevant variants, we sequenced the whole exons and UTRs of the PGM1 gene in 78 individuals using Ion Torrent PGM (Life Technologies, USA). We found 23 variations in the gene, of which 8 were novel (Table S13). Interestingly, one non-synonymous; c.1258T > C (p.Tyr420His) (rs11208257) variant was present in the LD block and found in association with Pitta Prakriti (p-value-7.049 × 10 −3 ). The frequency of the mutant allele "C" was 5.8% in Pitta and 20% in Kapha Prakriti (Table S14). This result prompted us to replicate the marker (rs11208257) in additional samples. We genotyped this marker (rs11208257) for 665 Prakriti individuals (299 Vata, 164 Pitta and 202 Kapha) using Sanger sequencing method. Initially, we analyzed the distribution of the genotype among participating centres and found "U" samples (collected from Udupi centre) were not in HWE (p-value -0.04) (Table S15). Hence, we excluded 169 "U" samples from the analysis. Association analysis revealed that allelic and genotype distribution of the marker rs11208257 is significantly different in Pitta Prakriti against Vata and Kapha with p-value-2.06 × 10 −2 ; p-value-6.16 × 10 −3 , respectively. Further, we explored the association between P vs. V and P vs. K; and found significant p-value -7.61 × 10 −3 and 2.35 × 10 −2 , respectively. The results would therefore suggest that Vata differs more from the Pitta Prakriti than Kapha (Table S16). We further screened 1108 randomly selected Indians and 992 HapMap samples and found that the frequency of mutant allele "C" was 17.9% among Indians, 15.5-17.6% in the Europeans, 14.5-18.8% in East Asians, 42% in Mexican, 15.3% in admixed Indians (GIH) and 12.8-28.3% in Africans. Indians have comparable frequency with Europeans and GIH (Table S17). Interestingly, we found Pitta has less frequency of mutant "C" allele, and Vata and Kapha have comparable frequency with overall Indian population. To explore the functional relevance of the variant, we used SIFT software and found that the mutation is damaging with 0.01 score and thus substitution at this position may affect the protein function. Our data suggest that the SNP (rs11208257) in PGM1 gene is linked with one of the main features (energy production), which is more homogenous and constant in Pitta than with Vata and Kapha, and a genotype correlation exists for the characteristics of Prakriti classification.
In conclusion, our preliminary study suggests that the Prakriti classification, as a foundation for the practice of Ayurveda, has a genetic basis and does provide clues for further studies.

Selection of subjects and Prakriti assessment. Selection of subjects and evaluation of the Prakriti
(the human classification of Indian ancient medicine) were carried out at three centres; 1. Institute of Ayurveda and Intergrative Medicine (IAIM), Bangalore, Karnataka; 2. Sinhgad College of Engineering (SCE) Pune, Maharashtra; and 3. Shri Dharmasthala Manjunatheshwara College of Ayurveda (SDMCA) Udupi, Karnataka. This study was approved by Institutional Ethics Committees (IECs) of all the collaborative centres and the methods were carried out in accordance with the approved guidelines. We have screened normal and healthy male subjects, who were between 20-30 years. Although several Ayurvedabased studies have included both male and female subjects 7,8,15,16 , we have excluded female subjects from this study to minimize the confounding variations. Prakriti of an individual is determined based on defined anatomical, physiological psychological and behavioural characteristics. During actual assessment of Prakriti, the Ayurvedic physician needs to factor in these characteristics. One such aspect is the cyclical hormonal changes that occur in women, particularly the menstrual cycle. The hormonal fluctuations result in numerous physical and psychological disturbances, which occur in the premenstrual and menstrual phases. Existing evidence suggest that about 97% of young nulliparous women experience varying degrees of such disturbances 31 . These elicitable and visible features can confound or obscure the Prakriti assessment process. For example, premenstrual irritability occurring in a woman of Kapha Prakriti is confounding, since Kapha Prakriti individuals normally possess low irritability. Although the Ayurvedic physicians routinely enquire about the menstrual habits of patients while assessing the Prakriti, it would have been difficult for us to make similar enquiries to young, healthy women who volunteered to join this study. The health status of an individual was assessed based on the Ayurvedic criteria, that include; normal desire for food, easy digestion of ingested food, excretion of feces, excretion of urine, excretion of flatus, functioning of sensory organs, comfortable sleep, easy awakening, and attainment of strength, bright complexion and longevity. Subjects with smoking habit, diabetes, hypertension and other chronic diseases were excluded from the study. Blood pressure (BP) was measured for each subject and BP > 130/90 mm of Hg were excluded from the study. Chronic systemic diseases such as rheumatoid Scientific RepoRts | 5:15786 | DOi: 10.1038/srep15786 arthritis, cancer, etc., and subjects having recent history of acute ailments such as fever due to infections were also excluded.
We followed three steps for the Prakriti assessment of each subjects. In the first stage, senior Ayurvedic physicians assessed the Prakriti of the subjects, applying classical Ayurveda parameters of Prakriti determination. In second stage, the same subjects were assessed using Ayusoft, a Prakriti software (www. ayusoft.cdac.in), which contains a comprehensive questionnaire, which had been developed based on the information from original Ayurvedic literature. In the third stage, another team of Ayurvedic physicians, who were not aware of the outcomes of assessment by senior physicians and Ayusoft, compared the Prakritis analysis. Subjects with ≥ 60% of single Prakriti dominance and having concordance in all the three stages were selected for the genome-wide analysis. Quantitative analysis of Prakriti was performed using Ayusoft along with traditional ayurvedic measures for the Prakriti assessment. The reason for considering ≥ 60% of a particular Prakritias a dominant was mainly due to feasibility and concordance. Single dosha Prakriti with high percentage of one dosha rarely exist, hence most of the individuals possess dual-dosha Prakriti 12 . Therefore, we have considered subjects with ≥ 60% as single dosha dominant-Prakriti. Subjects ≥ 60% of one dominant Prakriti were selected and blood was drawn after obtaining their informed written consent. A total of 3,416 healthy individuals were screened for their Prakriti, as per the details given above. From the total, 971 subjects who showed a predominant Prakriti of ≥ 60% were included in the analysis ( Figure S1).
High throughput genotyping, their quality control criteria and resequencing. Genotyping.
DNA was isolated from the blood samples using standard protocol 32 . We randomly selected 262 Prakriti individuals for genotyping, using Genome-Wide Human SNP Array from Affymetrix (6.0), following manufacturer's protocols. About 250 ng of genomic DNA was digested with Nsp I and Sty I restriction enzymes, followed by ligation of Nsp/Styadaptors, using T4 DNA ligase. PCR was performed using the primers that are specific to these adopters. After checking the amplicons on 2% agarose gel, they were purified with deep-well plate using magnetic beads and the fragments were eluted using EB buffer, followed by quantification and fragmentation. The fragmented PCR products (< 180 bp) were end-labeled using labeling kit. Labeled fragments were hybridized onto the Affymetrix (6.0) SNP arrays using hybridization cocktail. Hybridization was performed in hybridization oven for about 18 hrs at 50 °C. After hybridization, arrays were washed, stained, scanned and analyzed using Affymetrix Genotyping Console 2.0 and GeneChip ® Operating Software (GCOS). The samples which passed the quality controls i.e. call rate > 95% and CQC > 0.4 were considered. Affymetrix power tool (apt-geno-qc) was used for calculation of dm (dynamic model) value. The samples having dm.all_qc< 0.83 were removed from further analysis and genotypes were fetched with Birdsuite software from Broad Institute 17 ( Figure S1).
Detection of technical artifacts. In order to validate the Affymetrix data set, we randomly selected 48 markers (Table S3) from Affymetrix array and genotyped 48 individuals, who were already genotyped by Affymetrix array, using custom-designed VeraCode GoldenGate Genotyping Assay System (Illumina, San Diego, USA). Genotyping was performed according to the manufacturer's (Illumina, San Diego, USA) instructions. The genotypes obtained by both the platforms were compared and checked for accuracy ( Figure S1).
Targeted resequencing. We sequenced the whole-exons and UTRs of PGM1 gene ( Figure S1) for randomly selected 43 Pitta and 35 Kapha individuals using Ion Torrent (Life Technologies, USA), following protocols of the manufacturer. Primer sequences were manufactured specifically for use with Ion AmpliSeq kits. The costume Ion AmpliSeq TM primer contains 35 amplicons in a single pool. For preparing amplicon libraries, about 10 ng of DNA was amplified (PCR) using AmpliSeq TM primer pools and Ion AmpliSeq TM HiFi master mix (Ion AmpliSeq kit version 2.0 Beta). The amplified products were pooled and treated with 2 μ l of FuPa reagent. The amplicons were then ligated with adapters from the Ion Xpress TM barcoded adapters 17-64 kit according to the manufacturer's instructions (Ion Torrent). After ligation, the amplicons were purified by Agencourt ® AMPure ® XP Reagent and additional amplification was performed to complete linkage between adapters and amplicons. In order to determine the library concentration, an Agilent 2100 Bioanalyzer high-sensitivity DNA kit (Agilent, Santa Clara, CA) was used to visualize the size range of the libraries. Equimolar concentrations of all the libraries were pooled and diluted. Using Ion One Touch TM 200 Template Kit v@ DL (Life Technologies, USA), emulsion PCR was carried out according to the manufacturer's instructions. Ion Spheres (ISPs) were recovered according to the Ion Sphere Particles 200 recovery protocol. Sequencing was done following the Ion PGM TM 200 Sequencing Kit Protocol (version 6; Ion Torrent). The 318 sequencing chip was loaded and run on an Ion Torrent PGM (Ion Torrent). Base calling and alignment were performed using the Torrent Suite 3.0 software (Ion Torrent). In order to find the significant variation in the PGM1 whole exome data, we performed association analysis using plink software 26 and variations are annotated on EnsEMBL-BioMart.
Sanger sequencing. To validate and replicate the Pitta associated SNP (rs11208257), Sanger sequencing was carried out for 496 Prakriti samples (246 Vata, 116 Pitta and 134 Kapha) along with randomly selected 1108 Indian samples. Pair of primers (Forward primer: 5′ -GCACGTTTCTTACAGCAGCT-3′ and Reverse primer: 5′ -ACCTTACCTTGTACCCCAGC-3′ ) were designed, synthesized and PCR was Population stratification. Principal component analysis was done with Eigensoft Package 21 . Convertf was used for converting plink ped file to Eigenstrat format. We pruned the SNPs on the basis of their Linkage Disequilibrium (r2 > 0.75) before running PCA by using Eigensoft's killr 2 option. Ten eigenvectors were fetched. To find the ancestry of 245 Prakriti samples, we used 297 known ancestry of Indian population dataset (previously published) and performed the PCA. Stratification was checked and 40 outlier samples were excluded with cutoff sigma value ≥ 0.6 (default value) on 1-10 eigenvectors in 10 iterations ( Figure S1). Association analysis. Plink was used for association analysis 26 . Imputed Beagle file were converted into plink ped file. Association analysis was performed for the Prakritis. Since there are no case control groups in the present study, we compared one Prakriti against the other two Prakritis (Vata vs. Pitta and Kapha, Pitta vs. Vata and Kapha, and Kapha vs. Vata and Pitta) and calculated p-value from theoretical distribution. In order to exclude the markers, which could be in association by chance, we also performed adaptive permutation approach (empirical distribution) for maximum 10 6 iteration withplink and considered only those markers who achieved maximum 10 6 permutations and have p-value ≤ 1 × 10 −5 in theoretical distribution ( Figure S1).
Addressing issue of population stratification as possible confounder in association analysis. Even subtle stratification can cause spurious association; hence we used EIGENSTRAT software 21 for correcting association chi-square value on 10 eigenvector and to find its confounding effect. Initially we excluded 385, 404 SNPs with r2 > 0.75 and calculated eigenvector with remaining 405, 782 SNPs with SMARTPCA. Further we used these same 10 eigenvector for correction of chi-square value with EIGENSTRAT ( Figure S1).
To address this issue, we also used EMMAX and GCTA tools 27,28 . Both statistical methods consider genetic structure in association analysis. Hence, we expected major changes in p-value of 52 SNPs. First, we generated IBS matrix implemented in EMMAX and then used it with 10 eigenvector (generated with SMARTPCA) as covariate to calculate p-value with variance component model (implemented in EMMAX). To calculate mixed-model association p-value (implemented in GCTA), first, we calculated