A web-based survey on various symptoms of computer vision syndrome and the genetic understanding based on a multi-trait genome-wide association study

A variety of eye-related symptoms due to the overuse of digital devices is collectively referred to as computer vision syndrome (CVS). In this study, a web-based survey about mind and body functions, including eye strain, was conducted on 1998 Japanese volunteers. To investigate the biological mechanisms behind CVS, a multi-trait genome-wide association study (GWAS), a multivariate analysis on individual-level multivariate data, was performed based on the structural equation modeling methodology assuming a causal pathway for a genetic variant to influence each symptom via a single common latent variable. Twelve loci containing lead variants with a suggestive level of significance were identified. Two loci showed relatively strong signals and were associated with TRABD2B relative to the Wnt signaling pathway and SDK1 having neuronal adhesion and immune functions, respectively. By utilizing publicly available eQTL data, colocalization between GWAS and eQTL signals for four loci was detected, and a locus on 2p25.3 showed a strong colocalization (PPH4 > 0.9) on retinal MYT1L, known to play an important role in neuronal differentiation. This study suggested that the use of multivariate questionnaire data and multi-trait GWAS can lead to biologically reasonable findings and enhance our genetic understanding of complex relationships among symptoms related to CVS.

With the growth of digital devices, there are many complaints nowadays about a variety of eye-related symptoms such as eye strain, blurred vision, and double vision, and it is collectively referred to as digital eye strain or computer vision syndrome (CVS). In addition, it is also known to be associated with non-ocular symptoms such as headaches, and shoulder and neck pain. The American Optometric Association has defined CVS as a combination of eye and vision problems linked to the use of visual display terminals (VDTs) 1 . The diagnosis is carried out based on a comprehensive examination including visual acuity measurements, testing for refractive error and focusing movement in addition to medical inquiries about symptoms the patient has experienced. Although CVS is associated with a variety of symptoms, Sheedy et al. 2 stated that the underlying problem can be identified by the location of the symptoms and categorized into the two groups of internal and external symptoms. According to the general review by Blehm et al. 3 , there are four types of symptoms (asthenopic, ocular surface-related, visual, and extra-ocular) and they can be subdivided into three main potential pathophysiological causes (ocular surface mechanisms, accommodative mechanisms, and extraocular mechanisms). Numerous studies have been conducted so far to address health and safety issues for VDT users, but the efficacy of proposed treatments for minimizing CVS-related symptoms have not been adequately demonstrated yet 3,4 . A better understanding of the underlying physiology behind CVS is important to enable more accurate diagnosis and treatment. Table 1. Characteristics of participants in this study. This is a relatively older population, but it can be seen to cover all age groups to some extent. a It is simulated to responses before and after the VDT task by setting conditions of eye use. b Based on the dry eye-related quality-of-life score QOL (DEQS) questionnaires 42 , participants who were likely positive for dry eye syndrome were screened at a cutoff of > 15 DEQS 43 . SD standard deviation. Male, n (%) 24 (Table 2). Each item was measured on a scale of one to seven in each of the two conditions: "When you feel like you have used your eyes" and "When you feel like you have not used your eyes very much". These two conditions were set to simulate the answers before and after the VDT task, and the difference (delta) of these two conditional answers was used in the following analysis. In addition to that, self-reported medical history was obtained on the presence or absence of dry eye syndrome, presbyopia and other common eye diseases. This study received approval from the institutional review board of DeNA Life Science, Inc. and was implemented in accordance with the Declaration of Helsinki.
Genotyping and quality control. Participants collected their own saliva samples using the kit sent to them when they used the service before participation in this study, and DNA extraction and genotyping were performed at the laboratory of DeNA Life Science, Inc. Infinium OmniExpress-24 or Human OmniExpress-24 BeadChip (Illumina, Inc.), both of which additionally contained approximately 30,000 custom probes, was used. The stringent quality control (QC) procedures of genotyped results were applied with call rate ≥ 95%, minor allele frequency (MAF) ≥ 0.01, identity by descent < 0.1875, and Hardy-Weinberg equilibrium (p value > 1.0e−6). The gender on the declaration was confirmed to match the genotyped one. All the above procedures were performed using PLINK (ver. 1.9) 13 . In order to stratify the population, PCA computed the first two principal components based on a sample set of 1000 Genomes Project phase 3 (1KGP; N = 2504) and then projected study samples onto the 2D subspace. Samples falling outside the Japanese cluster were excluded ( Supplementary  Fig. S1). Eventually, 1966 subjects (999 male) with a total of 540,916 single nucleotide polymorphisms (SNPs) on autosomes ended up passing the QC filter. GRCh37/hg19 (GCF_000001405.13) was the reference for the genome construction information.

Statistical analysis.
To implement the multi-trait GWAS based on the SEM methodology, gwsem package (ver. 0.1.17) in R, which was developed by Verhulst et al. 14 , was introduced. Age, sex, and the top 2 eigenvectors from PCA were added into the model as external covariates. The number of principal components to be incorporated was determined based on examination of the scree plot of their eigenvalues. After running the SEM-based multi-trait association analysis on all variants, p values were subsequently calculated, adjusting each test statistic by a genomic inflation factor. Genotype imputation was conducted on variants which were located within ± 500 kb of GWAS hits, and SEM-based GWAS was rerun on estimated genotypes. Eagle2 (ver. 2.4.1) 15 was used for the phasing step referring to the East Asian population (EAS; N = 504) of 1KGP, and then imputation was performed using Minimac3 (ver. 2.0.1) 16 . The accuracy index of imputation performance, correlation between reference and predicted genotypes (R squared), was 0.942 ± 0.116 for common variants (MAF ≥ 0.05; see Supplementary Fig. S2). Imputed variants with R squared < 0.7 were removed for QC purpose. GWAS hits meeting a suggestive level of significance (p value < 1.0e−5) were considered lead variants, and the locus was defined as a region containing highly correlated variants (R squared > 0.6) with each lead variant in the 1KGP EAS population. Colocalization between GWAS summary statistics and expressed quantitative trait locus (eQTL) signals was analyzed based on a Bayesian statistical methodology proposed by Giambartolomei et al. 17 We utilized GSE115828 data (N = 523) 18 , the only publicly available retinal tissue eQTL data from the Gene Expression Table 2. List of CVS-related questionnaires and statistics on responses. "Eye strain", "blurred vision", and "back and shoulder pain" have also been reported as common symptoms in digital device users from other studies 44, 45 . In contrast, the prevalence of "headache" in this study were relatively lower, but this corresponds to another study on Japanese population 46 .

Symptom category
Questionnaire item  19 was analyzed as well. In addition, two data sources for eQTL data (GSE53351 20 and hum0099.v1 21 ) were added, available from GEO and the National Bioscience Database Center and consisting of more than 100 healthy Japanese subjects. The former is whole blood data (N = 301) and the latter is eQTL data from whole blood cells and five immune cell populations (CD4 + T cells, CD8 + T cells, B cells, NK cells, and monocytes) of 105 subjects. Note that previous studies 19, 22 have demonstrated that a sample size of about 100 individuals is sufficient to detect major eQTL effects in various cell types collected in a single population. Posterior probability of H 4 (PPH 4 ) of colocalization was calculated for each ± 100 kb of the identified loci containing lead variants using the coloc package in R.

Results
Model specification via cause-and-effect graph. It is assumed that the 14 questionnaire items are observed symptoms due to a potential syndrome, that is CVS, and multiple items will be observed in conjunction and correlated to some extent. The results of 1998 respondents were used to calculate pairwise partial correlation coefficients between items, and positive correlations among all items were verified ( Supplementary Fig. S3). The PCA calculated from the eigenvalue vectors of the partial correlation matrix showed that all items were in the same direction on the first PC, and they appeared to form clusters corresponding to the four categories to which each item belonged on the second PC, though not completely (Fig. 1a). Next, covariance selection 23 , a graphical modeling technique, was applied to derive a partial correlation network consisting of only well-fitting edges between items by stepwise removing edges with low partial correlation coefficients (Fig. 1b). Although q5 "watery eyes" was a singleton that did not share edges with any of the items, it could be seen as forming one large network as a whole. Since the above suggests that each of the four categories captures its own characteristics among belonging items, the items were integrated (averaged) and the representative values of each category were used in the following analysis. In addition, we constructed a factor analysis model with one common factor for all categories (Fig. 2). Using OpenMx package (ver. 2.17.3) in R to calculate goodness of model fit, the Chi-squared statistic was 22.5 (degrees of freedom = 9; p value = 7.4e−3) and the root mean square error of  Multi-trait GWAS. SEM is a form of graphical modeling that uses graphs to assume causalities between observed and latent variables and estimates the strength of the causalities. We introduced a latent variable into the model, where the causal effect of each variant on the questionnaire items (observed variables) was transmitted via the latent variable (Fig. 2). Comprehensive association analysis on each variant was performed to infer the strength of association between the variant and latent factor, and the statistical significance was calculated as a p value. A quantile-quartile (Q-Q) plot was generated between the actual observed p values and the p values as theoretically expected, showing no significant population stratification ( Supplementary Fig. S4). The output of GWAS can be represented by a diagram called a Manhattan plot, where each variant's strength of association (− log10 of p value) is relative to the genomic position (Fig. 3). Although there were no hits with the genome-wide significance level (5.0e−8), 12 peaks with variants at the top that met the suggestive level of significance were detected (Table 3). For each of the peaks, regional association plots were generated using LocusZoom (ver. 1.4) 25 , and they all formed a single peak at each close location on the genome ( Supplementary Fig. S5). When GWAS was performed again with the addition of other covariates (daily VDT hours, glasses and/or contact lens use, and eye drop use), there was no noticeable change in the relative trend, although the overall signal weakened slightly. We performed single-trait GWAS on each of the four categories using gwsem, and several peaks detected by the multi-trait GWAS were identified as well (Supplementary Fig. S6). As an example, the peak upstream of the sidekick cell adhesion molecule 1 (SDK1) gene on 7p22.2 are seen in the "ocular surface-related" and "visual" categories, while the others also show a less significant association. Furthermore, since similarly consistent peaks can be found using PLINK, which is a gold standard tool for GWAS, there is little concern about bias due to the statistical processing method.

GWAS and eQTL colocalization analysis. None of the lead variants detected from multi-trait GWAS
were located on a protein coding region. We examined the influence of detected variants on gene expression levels by utilizing publicly available eQTL data sources. In order to identify genes for which the expression level is affected by each variant, we focused on colocalization between the distribution of GWAS and eQTL signals. In other words, the p value of the GWAS result for each variant was compared with that of eQTL on a genome scale to see if they both show significant p values at the same position. Calculating a posterior probability (PPH 4 ) when GWAS and eQTL are associated and share a single causal variant, combinations of locus, genes and expression tissue with > 0.5 of PPH 4 were comprehensively explored. As a result, four combinations were detected (Table 4). When using a criteria > 0.75 of PPH 4 strongly suggesting GWAS and eQTL signals are due to a common causal variant 26 , a strong colocalization (PPH 4 = 0.906) between GWAS signals at the locus (1,625,705-1,627,501) ± 100 kb on chromosome 2 and eQTL signals on the myelin transcription factor 1 like (MYT1L) gene in retinal tissue was detected. As seen in the regional association plots generated by the locuscomparer package (ver. 1.0.0) 27 in R, the p values distribution in GWAS and eQTL signals showed a similar shape and the high degree of correlation was visually confirmed as well (Fig. 4). This suggests that rs9677043, which has the www.nature.com/scientificreports/ strongest p value for both GWAS and eQTL, is a highly possible causal variant. Note that rs9677043 is located approximately 166 kb downstream of the MYT1L gene. We analyzed the association of rs9677043 with other self-reported diseases (presbyopia, dry eye syndrome, severe myopia, cataract, and glaucoma), but no significant associations were found.

Discussion
The MYT1L gene, the expression level of which showed a strong colocalization (PPH 4 > 0.9) in retina, is a paralog of MYT1 and encodes a member of the zinc finger superfamily of transcription factors. It is mainly expressed in neural tissue and is known to play an important role in neuronal differentiation by specifically suppressing the expression of non-neuronal genes 28 . The eQTL data showed rs9677043, detected as a possible causal variant, has a positive effect on MYT1L expression levels in retinal tissue. The path coefficient β estimated by SEM was 0.132, showing that the mutation of rs9677043 worsens the observed symptoms. This suggests that the differentiation of neurons in retinal tissue has a potential to be associated with CVS.  ,120 have relatively strong GWAS signals (p value < 1.0e−6), and they were associated with the TraB domain containing 2B (TRABD2B) and SDK1 genes according to the position on the genome, respectively. TRABD2B is a metalloproteinase having the function of a negative regulator of the Wnt signaling pathway, which has been reported to regulate the transdifferentiation of retinal nerves into the ciliary body 29 . On the other hand, SDK1 encodes a protein in the immunoglobulin superfamily taking a critical part of the immune response, and in addition, is thought to work in selective synapse formation between retinal neurons because of its ability to adhere specifically to another SDK1 on the surface of other cells 30 . All of these genes are associated with retinal nerves or a wide range of neural tissues. As chronic inflammation is reported to be one factor in neuronal dysfunction in the retina 31,32 , immune response also might provide clues to investigate the physiology of CVS.
We will describe the limitations of this study from the following four of viewpoints: (1) self-reported questionnaires via a web browser, (2) sufficiency of the level of statistical significance, (3) impact of racial differences in eQTL analysis, (4) need for validation study using different sample groups.
There is no doubt that web-based questionnaires are less clinically reliable than diagnosis by a doctor. Furthermore, it has been pointed out that there are some problems, such as the bias caused by the web-mediated process, compared to traditional laboratory questionnaire interviews 33 . On the other hand, one of the DTC genetic testing services, 23andMe in the US, has uncovered numerous biological findings utilizing self-reported medical history and web-based questionnaires from a large group of customers 34 . This is a new research approach that is driven by mass data and this study is a continuation of that trend.
The ordinary GWAS has established a very strict criteria-a genome-wide significance level. This study instead used the suggestive significance level to give a biological interpretation. The main reasons for the low statistical power may be the subjective way in which the trait was assessed, the wide spectrum of traits, and the www.nature.com/scientificreports/ small number of samples. Considering the original purpose of GWAS, we should find its meaning and value in the discovery of unknown important molecular mechanisms or new markers for diagnosis. In fact, other studies have led to an extended list of associated variants, which can provide a resource for functional studies, by using more exploratory criteria 35 . This requires sufficiently narrowing the window for the possibility of false positives, and this study used colocalization with eQTL to give additional validity to biological certainty. It has been noted that the expression levels of many genes are variable between human populations, and it is mainly explained by differences in genotype frequencies 20 . We used HaploReg (ver. 4.1) 36 to check rs9677043 detected from the European retinal eQTL data, and the MAF in the European and East Asian populations were 0.53 and 0.47, respectively. Although there is no large difference compared to this study (0.464), it is desirable to obtain retinal tissue eQTL data from the Japanese population, considering the fact that transcriptional regulation at the gene level cannot be explained by a single variant alone.
GWAS can be positioned as a "forward genetic" experiment to identify the genetic basis of a trait and provide an opportunity for exploratory hypothesis generation. In order to strengthen the hypothesis, it is ideal to conduct external validation on different sample groups, and this study is no exception. On the other hand, the common study design of examining the functional role of genetic variation is to knock out a specific gene in a model organism. Such an experiment based on "reverse genetics" is necessary for hypothesis verification. In recent years, national biobanks and DTC genetic testing services provide a large-scale cohort linked to genetic and rich phenotypic information, mainly obtained from electronic medical records. An approach of extending GWAS to the entire set of phenotypic entities is called the phenome-wide association study (PheWAS) 37 , and it allows "reverse genetic" experiments to be virtually conducted on human subjects 38,39 . The proof-of-principle for the hypotheses generated in this study is expected by making use of cohorts in the future. Besides, the Mendelian randomization approach is an extension of PheWAS to test for causal relationships between phenotypes, and the SEM methodology has been used in many cases 40,41 .
Last of all, we will discuss the benefits of the use of SEM for multi-trait GWAS. The path coefficients estimated by SEM reflect the strength of each causal relationship and quantify the influence of a variant on the target phenotype via intermediate variables (latent factors) and predefined causal paths. For instance, the positive path coefficient from an intermediate variable to an observed variable suggests that an increase in the intermediate variable directly leads to an increase in the observed variable. In the case of this study, the coefficients (γ) from the four categories (asthenopic, ocular surface-related, visual, and extra-ocular) to the intermediate variable were 0.842 ± 0.024, 0.657 ± 0.018, 0.787 ± 0.027, and 0.823 ± 0.022, respectively, and "asthenopic" and "extra-ocular" were found to be relatively strong symptoms for CVS. Thus, we believe that SEM has the potential to enhance our understanding of complex phenotypes, such as syndromes, by genetically unraveling the relationship between observed symptoms. SEM is expected to be a core technique for performing a series of association analyses from GWAS to PheWAS.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding authors on reasonable request.