Major depressive disorder (MDD) is one of the most common psychiatric disorders, with approximately 15% of the lifetime prevalence. Although the etiology underlying MDD remains unclear, based on the etiological studies, MDD is a multifactorial disorder with multiple genetic and environmental factors that have relatively small effects1. Based on this evidence, many genetic studies have sought to identify loci that are significantly (P < 5.00 × 10−8) associated with MDD, but only a small number of loci that met the criteria have been identified, likely because of an under-powered sample size2,3,4,5.

Very recently, however, the Psychiatric Genomics Consortium (PGC) carried out meta- and mega-analyses (PGC2) of genome-wide genomic data for MDD in which 130,664 MDD cases and 330,470 controls were assessed, identifying 44 MDD-associated loci with genome-wide significance6. In the MDD genome-wide association study (GWAS), including the PGC2 MDD GWAS, the majority of the subjects were of European ancestry. Whereas GWASs targeting non-European samples were limited, as there is only one large scale GWAS for MDD in patients of non-European ancestry that was carried out on Han Chinese women by the China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE) consortium7. In addition, the sample size being limited, we previously carried out a genome–wide environmental interaction study (Fujita Health University depressive state GWEIS) and identified novel risk loci for psychological distress in the Japanese population8. To further assess the susceptibility loci and genes associated with MDD/psychological distress, large and multi-ethnic datasets are essential. In this context, for large GWAS datasets, a web-based data collection is expected to meet these needs9.

In the present study, we carried out a GWAS in an East Asian (Japanese) population in which over ten thousand participants filled out web-based questionnaires on their medical/health conditions, and the questionnaires included a case-finding instrument for MDD, the 6-item Kessler screening scale (K6 scale) for psychological distress10,11. This GWAS aimed to (1) detect the loci or pathway for the psychological distress/MDD; (2) to assess whether there are overlaps between the loci identified in the present GeneQuest (GQ) psychological distress GWAS and those found in previous GWASs including the CONVERGE MDD study, PGC2 MDD study, the Genetics of Personality Consortium (GPC2) neuroticism study, and Fujita Health University depressive state GWEIS; and (3) to evaluate genetic correlations between the present psychological distress GWAS results and those of the CONVERGE and PGC2 MDD GWASs. To the best of our knowledge, this is the first large-scale GWAS for MDD/psychological distress using data from direct-to-consumer (DTC) genetic tests carried out in a population of non-European-ancestry.

Materials and methods

Study cohort

All participants were the customers of Japanese Direct to Consumer (DTC) genetic testing service, HealthData Lab (Yahoo! Japan Corporation, Tokyo, Japan). The customers purchased or were complementarily provided the genetic testing kit via the internet. They read an agreement, signed a consent form, and registered their IDs on the web page. In the consent form, they signed consent forms agreeing to use their data for research. The participants collected their saliva using Oragene®·DNA(OG-500) (DNA Genotek, Ottawa, Canada), a saliva collection kit. They sent the saliva collection kit and signed consent form to the DTC genetic testing service provider, and a few weeks later, the analysis results were provided via the HealthData Lab web site. We also obtained a second form of consent for the usage of their data in this study by offering an opt-out through the web site at this stage. For participant confidentiality, all individuals were irreversibly anonymized. We approached 11,091 subjects and 11,089 subjects (99.98%) did not opt-out.

Genotyping assay

DNA was extracted from saliva samples according to the manufacturer’s instructions. Degeneration and concentration of the obtained genomic DNA was performed by gel electrophoresis with PicoGreen® (Thermo Fisher Scientific, Waltham, MA), respectively. When the genomic DNA was degraded or in low concentration, we sent a new saliva collection kit to the customer and requested an additional saliva collection.

Genotyping was performed at RIKEN GENESIS. We used two platforms, the Illumina HumanCore-12 Custom BeadChip and HumanCore-24 Custom BeadChip (Illumina, San Diego, CA) platforms. The Illumina HumanCore-12 Custom BeadChip contained 302,072 markers, including about 2000 our selected markers. The Illumina HumanCore-24 Custom BeadChip contained 309,725 markers, including approximately 3000 our selected markers. When the sample call rate (percentage of SNPs successfully genotyped by sample) for ordinal SNPs, those contained in the Illumina HumanCore platform, was under 85 %, we sent a new saliva collection kit to the customer and re-genotyped. For this study, we selected 296,675 SNPs contained in both genotyping platforms for analyses.

Phenotype measurement

To determine a participant’s depression phenotype, we used questionnaire responses, which were collected on the DTC genetic testing service web page. The K6 scale was used to assess the psychological distress-related phenotype: (1) “During the past month, how often did you feel so sad that nothing could cheer you up?”, (2) “During the past month, how often did you feel hopeless?”, (3) “During the past month, how often did you feel restless or fidgety?”, (4) “During the past month, how often did you feel nervous?”, (5) “During the past month, how often did you feel that everything was an effort?”, and (6) “During the past month, how often did you feel worthless?”; Answers: Never (score: 0), Not very often (score: 1), Some of the time (score: 2), Most of the time (score: 3), All the time (score: 4). A previous study reported that the sensitivity and specificity of the K6 scale for screening psychological distress were 100 and 68.7%, respectively, with a cut-off point of 5 in the Japanese population12. Therefore, we used this cut-off line to dichotomize the samples into two groups, “psychologically distressed group (score: 5~24)” and “non-psychologically distressed group (score: 0~4)”, and subjects with past medical history were removed from “non-psychologically distressed group”.

Quality control

The QC was assessed for each sample at both the individual and SNP levels. At the individual level, subjects with inconsistent sex information between X chromosomal SNP genotypes and the questionnaire, or low call rates (<0.95), were excluded. In addition, kinship was examined with a pairwise identify-by-descent (IBD) estimation. Individual pairs estimated to be kin (p(IBD = 0) < 0.05) were excluded. At the SNP level, SNPs with low call rates (<0.95), low Hardy-Weinberg equilibrium exact test P values (<0.001), or low minor allele frequencies (MAFs; <0.01) were excluded. These QC procedures were performed with PLINK v1.07. Finally, 10,892 individuals and 229,276 SNPs from 11,091 individuals remained.

Population stratification

To estimate genetic ancestry, we applied the principal component analysis in EIGENSOFT v6.1.32,3. To identify continental ancestry, we downloaded HapMap phase 3 individual-level genotypes and merged the HapMap4 genotypes with our genotypes. We merged our genotypes with the HapMap data from 4 populations, CEU (Utah residents with Northern and Western Europe ancestry from CEPH collection), YRI (Yoruba in Ibadan, Nigeria), JPT (Japanese in Tokyo, Japan), and CHB (Han Chinese in Beijing, China). The eigenvectors were generated by EIGENSOFT, and the top two of eigenvectors were plotted with R-statistics software v3.2.1. To minimize the ancestry biases, 219 subjects with non-Japanese and Ryukyu ancestry were excluded from further analyses.


Haplotype phasing and imputation were performed using SHAPEIT v213 and IMPUTE2 v2.3.214 with a 1000 Genomes reference panel (phase 3)15. Only samples that had passed quality control assessment were imputed. After the imputation, SNPs were assessed using an information threshold of 0.3 and a minor allele frequency threshold of 0.01, giving a final total of 8,567,708 SNPs for analysis.

Genome-wide association analysis (GWAS)

We used logistic regression to test the association between phenotype and allele in each SNP, assuming an additive model. Covariates included age, sex, and the top five eigenvectors. Odds ratio and 95 % confidence intervals were calculated for effect allele. A Manhattan plot and quantile-quantile plot were constructed using the R software package qqman v0.1.2. The significance level was set at P < 5.00 × 10−8 and the suggestive significance level was set at P < 1.00 × 10−5, which is the Bonferroni-correlated threshold for the number of independent variants among the HapMap phase 3 genotyped SNPs. All statistical analyses were performed using PLINK v1.07.

Gene-set enrichment analysis

To identify biological pathways or gene sets associated with psychological distress, we carried out a pathway analysis using meta-analysis gene-set enrichment of variant associations (MAGENTA) ( MAGENTA implements gene-set enrichment analysis on GWAS data by assessing pathway annotations in web-based databases, including the Data bases: The Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Protein Analysis Through Evolutionary Relationships (PANTHER), BioCarta, Reactome, and Ingenuity databases. Since the 75th percentile cut-off demonstrates greater power than 95th percentile cut-off in interpreting complex diseases with high polygenesis, we used the 75th percentile as the cut-off value for our interpretation16.

Risk profile score analyses

Risk profile score (RPS) analyses were carried our as previously described17. Briefly, the statistical analysis software package PRSice v1.23 was used. The P threshold (PT) for selecting the “risk” SNPs was set sequentially at 0.001, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5. SNPs were selected if their P values were between 0 and the chosen value of PT. The variance for the RPS was estimated using Nagelkerke’s R2 from a logistic regression mode. We set the type I error rate at 0.001, which is suggested as a conservative threshold18.

Genetic correlation analysis

Linkage disequilibrium (LD) score regression analysis was carried out to examine the genetic correlation, as previously described17, using pre-computed LD scores for East Asian (Japanese and Chinese) results19. To compare Japanese and European results, we used Popcorn (version 0.9.7) and examined the trans population genetic effect correlation (the correlation coefficient for the per-allele SNP effect sizes, ρge), and the genetic impact correlation (the correlation coefficient for the population-specific allele variance normalized SNP effect sizes, ρgi)20.


Population sociodemographic features are shown in Table 1. To dichotomize the sample set, we used the K6 scale21 to screen subjects with psychological distress and set the cut-off at 5 point. Subjects with a score of 5 or more (psychologically distressed group, N = 3981) were classified as cases, and those with score of 0–4 and no past medical history of MDD (non-psychologically distressed group, N = 6349) were used as controls.

Table 1 Cohort demographics for the GeneQuest data sets for psychological distress assessed using K6

First, we assessed the loci for the psychological distress/MDD. Quantile–quantile (QQ) and Manhattan plots for psychological distress were assessed with a K6 scale (cut-off point of 5) and are listed in Fig. 1. λgc is 1.020 (95%CI: 1.018–1.022). In this SNP-based association analysis, we detected an association with a genome-wide significant level (P < 5.00 × 10−8) at rs6073833 (P = 7.60 × 10−9), a region upstream of WAP four-disulfide core domain 11 (WFDC11) at 20q13.12 (Fig. 2, Table 2). Additional loci with suggestive association levels are shown in Table 2 and Supplementary Table 1. We then assessed the gene-set enrichment for psychological distress, calculated with MAGENTA software16. Supplementary Table 3 lists biological pathways and gene sets with a nominal P75% cutoff less than 0.01, among which there are biological pathways with a false discovery rate (FDR) less than 0.05, such as platelet-derived growth factor (PDGF) signaling (Nominal P75% cutoff = 5.00 × 10−4; FDR75% cutoff = 2.72 × 10−2) and 14-3-3 signaling (Nominal P75% cutoff = 3.30 × 10−3; FDR75% cutoff = 4.65 × 10−2). Biological pathways with an FDR less than 0.1 included nuclear transcription factor peroxisome proliferator-activated receptor (PPAR) signaling (Nominal P75% cutoff = 3.50 × 10−3; FDR75% cutoff = 6.63 × 10−2) and interleukin-6 (IL-6) signaling (Nominal P75% cutoff = 7.30 × 10−3; FDR75% cutoff = 8.61 × 10−2), both of which may be associated with MDD22,23,24.

Fig. 1
figure 1

Association analysis for imputed SNPs. a Quantile-Quantile plot for GeneQuest GWAS for psychological distress assessed using K6. Horizontal and vertical axes indicate expected the P values under a null distribution and the observed P values respectively. b Manhattan Plots show –log10 P of the genotyped SNPs. The red line corresponds to P = 5.00 × 10−8. Gene labels are annotated for the genes closest to the significant SNPs

Table 2 Lists of SNPs, and their closest genes, showing genome-wide significant association, or suggestive significant association, with psychological distress (assessed by the K6 scale; cut-off point: 5)

Next, we evaluated whether the loci observed in the present GeneQuest (GQ) psychological distress GWAS were replicated in the results based on the previous GWASs. In particular, we assessed the extent to which loci with a P value less than 5.00 × 10−5 from the present GWAS were associated with phenotypes in the previous studies including PGC2 MDD (excluding 23andMe data)6, and GPC2 neuroticism GWAS25,26,27 (Supplementary Table 3). In this replication analysis, we faild to find association loci with a P value less than 0.05 in the PGC2 MDD GWAS. On the other hand, there were four loci that showed P values less than 0.05 in the GPC2 neuroticism study, with the same direction of effect. This may suggest risk loci shared between the psychological distress and neuroticism.

Fig. 2
figure 2

Regional plot of association signals in the newly identified loci. The –log10 P of the genotyped SNPs for psychological distress is shown on the left y axis. The recombination rates expressed in centimorgans (cM) per Mb (Megabase) (blue line) are shown on the right y axis. Position in Mb is on the x axis. The most associated SNP is shown as a purple diamond

Finally, we evaluated shared genetic components between the present psychological distress GWAS results and those of the CONVERGE and PGC2 MDD GWASs based on risk profile score (RPS) analysis and genetic correlation analysis. In the RPS analysis, we set discovery (training) datasets obtained from PGC2 and CONVERGE, whereas training dataset from the current GQ GWAS, in which 229,276 SNPs, were obtained before imputation (Fig. 3). We found a significant replication in the RPS scores of the GQ case/control status based on the CONVERGE (the highest R2 = 0.0013, best P= 0.001, P-value threthold = 0.0004) dataset, but no replication based on the PGC2 MDD (the highest R2 = 0.00044, best P= 0.049, P-value threthold = 0.00695) was observed. We also assessed genetic correlations between the GQ psychological distress results and other GWAS results including CONVERGE MDD, PGC2 MDD, and GPC2 neuroticism results (Table 3). For CONVERGE results, similar ancestry (East Asian) was evaluated, thus LD score regression (LDSC) analysis19 was suitable. However, for the GWASs based on European populations (PGC2 MDD and GPC2 neuroticism GWASs), it is difficult to calculate the genetic correlation between different populations with LDSC analysis. Therefore, we used recently developed software, Popcorn, to analyze trans-population genetic correlation. Based on the LDSC analysis for CONVERGE, we observed a non-significant trend (P = 0.0529) for the estimate of the genetic correlation (rg: 0.368). However, based on the Popcorn analysis, significant correlations were observed between our Japanese samples and the PGC2 MDD GWASs (P for |ρge| > 0: 4.15 × 10−4, P for |ρgi| > 0: 3.28 × 10−7, P for |ρge| < 1: 2.68 × 10−7, P for |ρgi| < 1: 9.27 × 10−10; ρge = 0.394, ρgi = 0.358), though there were no significant correlations with the GPC2 neuroticism GWAS (P for |ρge| > 0: 0.0787, P for |ρgi| > 0: 0.0817, P for |ρge| < 1: 0.179, P for |ρgi| < 1: 0.243; ρge = 0.512, ρgi = 0.544).

Fig. 3
figure 3

Risk profile score (RPS) analysis. The y-axis presents the Nagelkerke pseudo R2 that indicates the variance explained in psychological distress for each P value threshold shown. GeneQuest psychological distress (pruned data) was used as the target and the CONVERGE MDD or PGC2 MDD was used as the discovery

Table 3 Genetic correlation assay for GeneQuest psychological distress GWAS


In the present GWAS, loci associated with psychological distress were assessed in a Japanese population and a genome-wide significant association with psychological distress was observed at 20q13.12. The linkage disequilibrium block (LDB) to which the top-hit SNP belongs contains a few genes, including WFDC11, 10B, and 9, and we cannot identify the causal gene(s) for psychological distress. However, there are studies reported that association of WFDC11 with psychiatric disorder. In particular, WFDC11 expression was up-regulated in autism28 and WFDC11 methylation levels were down-regulated in schizophrenia29. WFDCs were first identified as protease inhibitors, but the WFDC11 target molecules have not been identified. While their precise molecular functions have not been fully elucidated, recent studies demonstrated that proteins with a WFDC domain may be involved in inflammation-related signaling30,31,32. For example, WFDC12 is known to induce down-regulation of the pro-inflammatory signal in monocytes32. This is consistent with the hypothesis that immune dysregulation-related states, such as chronic inflammation and immunosuppression, might be involved in the pathogenesis of MDD33,34,35. It is, however, unclear whether WFDC11, 10B, and 9 are involved in the pathophysiology of psychological distress via its regulatory roles in inflammation and/or immune state36. Assessing the molecular functions of the WFDCs, in inflammation models of depression in rodents may help reveal whether WFDC11 plays a role in the pathophysiology of psychological distress/MDD. The top-ranked gene-sets detected by gene set enrichment analysis included those involved in IL-6 and PPAR signaling, both of which have been associated with MDD/psychological distress. In particular, IL-6 is one of the most consistent inflammatory markers elevated in MDD23,24. PPAR-γ agonists were reported to have antidepressant properties, and PPAR is thought to represent a potential molecular target for MDD treatment22. This supports the external validity of the study. Significant enrichments were also detected for the PDGF and 14-3-3 signaling. While there are currently no reports indicating their significant associations with psychological distress/MDD, those signaling pathways may be involved in the pathophysiology of psychological distress/MDD.

To replicate our top-hit and confirm the shared genetic components with MDD or neuroticism, we consulted GWAS results based on other datasets from similar and different populations. We assessed whether there are replications of risk loci found in the present and previous studies. No nearby locus for our top-hit showed an association (P < 0.05) with MDD in the PGC2 study (Supplementary Table 4) or the CONVERGE study (data not shown), which was performed on an East Asian population (Han Chinese population). Thus, no replication was obtained for the top-hit. These discrepancies may be due to the differences in genetic background among studies and/or in environmental exposures. On the other hand, several top-hit loci in SNPs from the WFDC cluster display weak associations with neuroticism (Supplementary Table 5), suggesting that risk locus may be shared between psychological distress and neuroticism.

In association analysis, where we assessed the extent to which psychological distress case-control status in the present GWAS can be predicted by MDD RPSs. The best P for the CONVERGE MDD GWAS was barely significant, while that for the PGC2 MDD GWAS failed to reach the significant level. This may be due to the influences of LD differences and/or unique risks for GQ subjects. For the genetic correlation analysis, that non-significant trend for the genetic correlation with CONVERGE was observed, while PGC2 MDD GWAS from EUR showed a significant correlation. This might be derived from the sample size differences as PGC2 analyzed the largest sample size, but CONVERGE targeted less than 1/5~1/107,25,26,27. However, we assume that our psychological distress GWAS shared the genetic component for MDD with the EAS or EUR populations. A larger sample size is essential for conclusive results, but our sample captured an appropriate phenotype related to MDD.

There are a some limitations, that are noted, to our present study. First, given the expected small effect sizes of the associated variants for MDD or psychological distress6, the sample size of approximately ten thousand may not be sufficient for precisely assessing risk variants. Indeed, top-hit locus reported in the PGC2 MDD study did not show significant overlap with the loci identified in the present GWAS, and the effect sizes of the detected top-hit loci in the preset study were larger in either direction when compared to those reported in the PGC2 MDD study. In addition, the effect of environmental factors was not assessed in the present study, although gene-environment interactions are thought to play a major role in the susceptibility and pathogenesis of depressive state/MDD. These factors may reduce the power of the study. Second, there can be a selection bias in participant recruitment, as the subjects voluntarily participated in the project, a preference for novelty may be higher in the participants than in the general population. Thus, there can be biased genotype distribution in the subjects, which may also affect the power of the GWAS. Additionally, the trait selection was carried out with self-rating questionnaires, not with psychiatric consultation. In summary, we identified a locus as a novel susceptibility region associated with psychological distress in the Japanese population. Further replication is necessary to confirm the present findings and to uncover the genetic landscape for psychological distress traits.