Validation of the Korean Stroop Test in Diagnosis of Minimal Hepatic Encephalopathy

The burden of minimal hepatic encephalopathy (MHE) is significant, but no universal diagnostic criteria have been established. We aimed to validate the Korean Stroop Test for MHE screening. Patients with chronic hepatitis B-related liver cirrhosis were recruited prospectively from 13 centers. The Korean Stroop Test consisted of two Stroop-off states (color and word) and two Stroop-on states (inhibition and switching). The accuracy-adjusted psychomotor speed (rate correct score) of each test was analyzed. The sex- and age-adjusted rate correct scores of these tests were summarized as the Korean Stroop score (K-Stroop score). MHE was diagnosed when the Portosystemic Encephalopathy Syndrome Test (PHES) score was below −4. A total of 220 liver cirrhosis patients and 376 healthy controls were enrolled. The prevalence of MHE was 20.6% in cirrhosis patients. The rate correct scores and the K-Stroop score differed significantly among healthy controls, cirrhosis patients without MHE, and cirrhosis patients with MHE. The area under the receiver operating characteristic curve (AUC) of the K-Stroop score was 0.74 (95% confidence interval: 0.66–0.83, P < 0.001). Female sex and the K-Stroop score were significant predictors of MHE. The Korean Stroop Test is simple and valid for MHE screening.

(2019) 9:8027 | https://doi.org/10.1038/s41598-019-44503-w | www.nature.com/scientificreports

… used 1 . However, at least one-third of cirrhotic patients in Korea have been reported to exhibit MHE 3,4 . Patients with MHE have been reported to have a poor prognosis: they experience poor quality of life and have a higher risk of traffic violations and accidents 5,6 . Furthermore, an MHE episode can predict the development of overt HE, death, and hospitalization 7 . Therefore, it is important to test for MHE in at-risk patients, even when they have no overt symptoms or signs of cognitive dysfunction 1 . The many available tests for MHE can be broadly categorized as paper-pencil tests, computerized tests, and neurophysiological tests 2 . However, no universal diagnostic criteria have yet been established. The Portosystemic Encephalopathy Syndrome Test (PHES) is the most widely validated test worldwide and is considered the gold standard for the diagnosis of MHE 8,9 . However, paper-pencil tests are burdensome as screening tools for asymptomatic patients in real-life practice. An ideal screening test for MHE should be simple, have an objective outcome, take little time, be mobile-based, be independent of specialist interpretation, and be free from copyright restrictions and fees. It should also have local normative values and validation data. The Stroop task has been analyzed in several studies of MHE [10][11][12] . Recently, a smartphone-based Stroop test, the EncephalApp, was developed and validated for the screening and diagnosis of covert HE in the United States [13][14][15][16] . Although the EncephalApp is free of charge, language differences make it difficult to use outside the United States. Furthermore, normative data and validation in the Korean liver cirrhosis (LC) population are needed before the EncephalApp can be applied in clinical practice.
Therefore, we developed the Korean Stroop (K-Stroop) Test and aimed to validate it for the screening and diagnosis of MHE in Korea.

Results
Baseline characteristics. A total of 376 healthy controls were enrolled in this study. Approximately half of the control group was male (n = 190). The mean age was 43 years and the mean duration of education was 14 years. Healthy controls were evenly distributed across the subgroups defined by sex and 10-year age interval (Supplementary Table S1).
A total of 220 LC patients were enrolled, of whom 67% were male (n = 148) (Table 1). The mean age of the LC patients was 54 years and the mean duration of education was 11 years; both differed significantly from those of healthy controls (both P < 0.001). The model for end-stage liver disease (MELD) score of LC patients was 9 ± 3. The prevalence of MHE based on the conventional PHES in LC patients was 20.5%. LC patients with MHE had fewer years of education and lower serum albumin levels than those without MHE. However, sex, age, platelet count, prothrombin time, ALT, total bilirubin, sodium, ammonia, and MELD did not differ between the two groups.
Correlation of the K-Stroop Test results with demographic data in healthy controls. In healthy controls, age and years of education were significantly negatively correlated (r = −0.43, P < 0.001). The rate correct score (RCS) of each test was calculated by the formula described in the Methods section (Table 2). The RCS for Color Off (RCS-C), Word Off (RCS-W), Inhibition On (RCS-I), and Switching On (RCS-Sw) were negatively correlated with age (all P < 0.001) and positively correlated with years of education (all P < 0.001) in healthy controls. The K-Stroop score showed a nonsignificant or very weak correlation with age and years of education. There were no significant sex differences in the RCS values of the four tests or in the K-Stroop score.
Correlation of the K-Stroop Test results with clinical parameters in LC patients. In LC patients, age and years of education were significantly negatively correlated (Table 3). As in healthy controls, the RCS of each test in LC patients showed a significant negative correlation with age and a weak positive correlation with years of education. There was no significant sex effect on the RCS values of the four tests or on the K-Stroop score in LC patients. The RCS of each test was positively correlated with the PHES score. The K-Stroop score did not show …
… The AUC for RCS-I was the highest, at 0.80 (95% C.I. 0.73–0.88, P < 0.001). For RCS-I, the cut-off with the highest Youden's index was 2.08, with 80.0% sensitivity and 73.1% specificity. For the K-Stroop score, the cut-off with the highest Youden's index for the diagnosis of MHE was 0.5, with 82.6% sensitivity and 53.1% specificity. A K-Stroop cut-off of 1.5 showed 52.2% sensitivity and 79.7% specificity.
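The cut-offs above were chosen by maximizing Youden's index, J = sensitivity + specificity − 1. The following is a minimal Python sketch of that selection rule, assuming a simple scan over the observed score values; `youden_index` and `best_cutoff` are hypothetical helper names, the toy data are invented for illustration, and this is not the authors' SPSS procedure. Applied to the reported operating points, J is about 0.531 for RCS-I at 2.08, 0.357 for the K-Stroop cut-off of 0.5, and 0.319 for the cut-off of 1.5.

```python
from typing import Optional, Sequence, Tuple


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1 (range -1 to 1)."""
    return sensitivity + specificity - 1.0


def best_cutoff(
    scores: Sequence[float],
    labels: Sequence[int],
    higher_is_positive: bool = True,
) -> Tuple[Optional[float], float]:
    """Scan observed score values and return (cutoff, J) with the largest J.

    labels: 1 = MHE by the reference test, 0 = no MHE.
    higher_is_positive=True: test-positive when score >= cutoff
        (e.g. the K-Stroop score).
    higher_is_positive=False: test-positive when score <= cutoff
        (e.g. an RCS, where fewer correct responses per second is worse).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both MHE and non-MHE subjects")
    best: Tuple[Optional[float], float] = (None, -1.0)
    for c in sorted(set(scores)):
        positive = [(s >= c) if higher_is_positive else (s <= c) for s in scores]
        tp = sum(1 for p, y in zip(positive, labels) if p and y == 1)
        tn = sum(1 for p, y in zip(positive, labels) if not p and y == 0)
        j = youden_index(tp / n_pos, tn / n_neg)
        if j > best[1]:
            best = (c, j)
    return best


# Reported operating points (sensitivity, specificity) from the Results.
print(round(youden_index(0.800, 0.731), 3))  # RCS-I, cut-off 2.08
print(round(youden_index(0.826, 0.531), 3))  # K-Stroop score, cut-off 0.5
print(round(youden_index(0.522, 0.797), 3))  # K-Stroop score, cut-off 1.5

# Invented toy data: integer K-Stroop scores and MHE labels.
print(best_cutoff([0, 0, 1, 2, 3, 4], [0, 0, 0, 1, 1, 1]))
```

Because the K-Stroop score is an integer from 0 to 4, a reported cut-off of 0.5 is equivalent to a score of at least 1, and a cut-off of 1.5 to a score of at least 2.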
Predictive factors for the presence of MHE in LC patients. Sex, age, years of education, RCS-I, K-Stroop score, and MELD were entered into logistic regression analysis. In univariate analysis, female sex, years of education, RCS-I, and K-Stroop score were significant predictors of MHE in LC patients (Table 6). In multivariate analysis, female sex remained significant, with odds ratios ranging from 2.79 to 2.91. Both RCS-I and the K-Stroop score were significant predictors of MHE in LC patients independent of MELD.

Analysis for test-retest reliability and comparison of devices (smartphone vs. tablet).
The four RCS values were analyzed in 63 healthy controls who repeated the K-Stroop Test after a short interval, to assess test-retest reliability. Results were compared between visit 1 and visit 2 for each device (smartphone and tablet) and showed no significant differences overall. In addition, the K-Stroop Test results were compared between smartphone and tablet at each visit (visit 1 and visit 2), and again showed no significant differences.
To assess learning effects, we compared the RCS across the familiarization trial, the 1st test, and the 2nd test at visit 1, and observed significant within-subject differences as repetitions progressed for each test (all P < 0.001). RCS-Sw showed a significant difference between the 1st and 2nd tests (Table 7).

Discussion
Epidemiologic studies of MHE in Korea are limited because normative and validated data for appropriate diagnostic tools are lacking. Seo et al. reported a prevalence of MHE in LC patients of 25.6% based on the Korean version of the conventional PHES, with prevalences of 20.2%, 42.9%, and 60.0% in Child-Pugh classes A, B, and C, respectively 4 . The authors provided normative data for the Korean version of the conventional PHES based on 200 healthy Korean subjects 4 . However, widespread use of this version in real-life practice is hindered because approval from the copyright holder of the conventional PHES must be obtained before use. In 2017, Jeong et al. adapted the conventional PHES into a new 'copyleft' paper-pencil test 3 . They established normative reference data based on 315 healthy subjects and validated the results in a small group of cirrhotic patients 3 . Based on this new Korean paper-pencil test, the prevalence of MHE was estimated to be 37.5% 3 . Although MHE is asymptomatic, it is common and carries poor prognostic implications in LC patients. Nevertheless, there is no consensus on which test should be used to diagnose MHE in real-life practice. Furthermore, recent guidelines suggest that either a computerized test or a neurophysiological test should be used alongside the paper-pencil test (i.e., the PHES) in multicenter studies. However, neurophysiological tests such as electroencephalography, critical flicker frequency, and evoked potentials are difficult to use widely, as they are expensive, time-consuming, and dependent on specialist interpretation. Therefore, alternative computerized tests are required to carry out multicenter studies of MHE in Korea.
In this context, the EncephalApp is a computerized test that has been validated for the diagnosis of MHE in the United States. It is based on the Stroop effect, which is universally present in individuals who can read. Theoretically, the increased latency and number of errors in response to incongruent stimuli arise because the reading response is stronger than the naming response 17 . Nevertheless, there are obstacles to using the original EncephalApp as a diagnostic tool for MHE in Korea, as Korean normative reference data and validation data are lacking. The K-Stroop Test was adapted from the EncephalApp and developed to meet the needs of both real-life practice and multicenter studies: it is computer-based, readily accessible via the web, less time-consuming, and free from copyright issues.
In the EncephalApp, On Time + Off Time, rather than accuracy, best discriminated patients with MHE among LC patients, with an AUC of 0.91 14,15 . These results may leave clinicians uncertain about how the Stroop task should be used to diagnose MHE, because response times were more discriminating than the number of errors, irrespective of whether the Stroop state was "Off" or "On". We assume this confusion may be related to the characteristics of the original EncephalApp. The EncephalApp assesses accuracy by repeating a run until the subject correctly answers ten problems in a row. Errors are therefore reflected mainly in the extra time needed to repeat the run after a mistake, rather than being scored separately as accuracy. As a result, a subject must solve at least 100 problems to complete the EncephalApp, even if no mistakes are made. In contrast, we compared the RCS of each test, which weights reaction time by accuracy. As a result, the K-Stroop Test required only 40 problems to differentiate patients with MHE.
The RCS of each test was significantly correlated with age and years of education in both healthy controls and LC patients. The K-Stroop score is the number of tests (range 0 to 4) in which the result lies more than 1.5 standard deviations (SD) below the mean of the corresponding healthy-control subgroup, defined by sex and 10-year age interval. Unlike the EncephalApp, which proposed fixed time cut-offs regardless of sex and age, we analyzed the data based on the mean and SD values collected from Korean healthy controls. We could not adjust for years of education; however, the K-Stroop score showed only a very weak correlation with years of education (r = −0.22, P = 0.001), so we assume that the effect of education was largely accounted for. This may be related to the tendency of older subjects to have fewer years of education and vice versa, so that adjusting for age partly adjusts for education.
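The scoring rule described above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' implementation: the record layout (dicts keyed by `sex`, `age`, and the four RCS labels), the helper names, the use of the sample SD, and pooling 19-year-olds into the 20-29 band are all assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

TESTS = ("RCS-C", "RCS-W", "RCS-I", "RCS-Sw")
Norms = Dict[Tuple[str, str], Dict[str, Tuple[float, float]]]


def age_band(age: int) -> str:
    """10-year age interval, e.g. 43 -> '40-49'.

    Assumption: 19-year-olds are pooled into the 20-29 band.
    """
    lower = max(20, (age // 10) * 10)
    return f"{lower}-{lower + 9}"


def build_norms(controls: List[dict]) -> Norms:
    """Mean and sample SD of each RCS per (sex, age band) subgroup of controls."""
    groups: Dict[Tuple[str, str], List[dict]] = {}
    for c in controls:
        groups.setdefault((c["sex"], age_band(c["age"])), []).append(c)
    return {
        key: {t: (mean([m[t] for m in members]), stdev([m[t] for m in members]))
              for t in TESTS}
        for key, members in groups.items()
    }


def k_stroop_score(subject: dict, norms: Norms, n_sd: float = 1.5) -> int:
    """Count of the four RCS values lying more than n_sd SDs below the
    subgroup mean (lower RCS is worse), giving an integer from 0 to 4."""
    ref = norms[(subject["sex"], age_band(subject["age"]))]
    return sum(1 for t in TESTS if subject[t] < ref[t][0] - n_sd * ref[t][1])


# Invented example: three male controls aged 40-49 with one value per control.
controls = [{"sex": "M", "age": a, **{t: v for t in TESTS}}
            for a, v in [(41, 2.0), (45, 2.2), (48, 2.4)]]
norms = build_norms(controls)  # mean 2.2, SD 0.2, so abnormal below about 1.9
patient = {"sex": "M", "age": 44,
           "RCS-C": 1.8, "RCS-W": 1.95, "RCS-I": 1.5, "RCS-Sw": 2.5}
print(k_stroop_score(patient, norms))
```

Scoring against subgroup norms rather than a single fixed threshold is what makes the score sex- and age-adjusted; the invented patient here is abnormal on two of the four tests.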
Interestingly, both the PHES and K-Stroop scores showed very weak correlations with MELD (r = −0.15 for the PHES score and 0.16 for the K-Stroop score), which differs from previous reports of a higher prevalence of MHE in Child-Pugh class C than in class A. One possible explanation is that the prevalence of MHE has been underestimated in asymptomatic patients with good liver function, because clinicians seldom consider the possibility of MHE in this population.
The K-Stroop and PHES scores were significantly negatively correlated, but agreement between the tests was poor at the K-Stroop cut-off of 1.5 (highest specificity, 79.7%) (kappa 0.29, P < 0.001). This may be because HE is associated with multidimensional dysfunction, whereas the PHES measures only two (psychomotor speed and visuospatial demand) of seven domains of cognitive function (psychomotor speed, working memory, verbal memory, visuospatial ability, visual memory, language, reaction time, and motor function) [18][19][20][21] . Furthermore, the PHES itself yields false-positive and false-negative results, even though it is the conventional gold standard for the diagnosis of MHE 22 . Therefore, some patients may show significantly poorer psychometric performance than healthy controls on the Stroop test but not on the PHES. The clinical significance of an abnormal Stroop result requires long-term follow-up.
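Cohen's kappa corrects the observed agreement between two binary classifications (here, MHE by PHES versus an abnormal K-Stroop score) for the agreement expected by chance from the marginal totals. A minimal Python sketch of the standard computation follows; the function name is illustrative, and the 2 × 2 counts in the example are invented and are not the study data.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square contingency table.

    table[i][j] = number of subjects placed in category i by test A
    (e.g. PHES) and in category j by test B (e.g. K-Stroop score > 1.5).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: sum over categories of the product of marginal proportions.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Invented counts: rows = PHES (MHE, no MHE), columns = K-Stroop (abnormal, normal).
example = [[20, 10],
           [5, 15]]
print(round(cohens_kappa(example), 3))
```

In this invented table the two tests agree on 70% of subjects, yet kappa is only about 0.4, because half of that agreement would be expected by chance alone.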
Although the AUCs of RCS-W and RCS-I were similar, we speculate that the Inhibition On test, which corresponds to the classic Stroop condition, would have better discriminating power than the Word Off test alone. In our multivariate analysis, both RCS-I and the K-Stroop score were significant predictors of MHE in LC patients regardless of liver function. Therefore, we suggest that RCS-I can be used for rapid screening with a cut-off of 2.08, regardless of sex and age. In patients with a high suspicion of MHE, results can then be judged by the K-Stroop score, which is adjusted for sex and age. Given its high specificity, a cut-off of 1.5 can be used for the diagnosis of MHE. Unlike previous studies, in which sex was not a significant risk factor for MHE, female sex was an additional predictive factor for MHE in our study; this requires further evaluation.
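The two-step approach suggested above can be written as a small decision rule. This Python sketch is illustrative only: it assumes that an RCS-I below 2.08 counts as screen-positive (a lower RCS means fewer correct responses per second) and that a K-Stroop score above 1.5 (that is, 2 or more) supports the diagnosis. The function name and return labels are hypothetical, and the rule is not a validated clinical algorithm.

```python
from typing import Optional

RCS_I_SCREEN_CUTOFF = 2.08  # RCS-I cut-off proposed for rapid screening
K_STROOP_DX_CUTOFF = 1.5    # K-Stroop cut-off proposed for high specificity


def two_step_mhe_assessment(rcs_i: float, k_stroop: Optional[int] = None) -> str:
    """Hypothetical rule: RCS-I screen first, then the sex/age-adjusted K-Stroop score.

    Assumes RCS-I values below the screening cut-off are abnormal.
    """
    if rcs_i >= RCS_I_SCREEN_CUTOFF:
        return "screen negative"
    if k_stroop is None:
        return "screen positive: obtain K-Stroop score"
    if k_stroop > K_STROOP_DX_CUTOFF:
        return "MHE likely"
    return "screen positive, K-Stroop not diagnostic"


print(two_step_mhe_assessment(2.40))     # above the screening cut-off
print(two_step_mhe_assessment(1.70))     # needs the adjusted score
print(two_step_mhe_assessment(1.70, 3))  # abnormal on three of four tests
```

Ordering the steps this way uses the sex- and age-independent RCS-I cut-off as a quick first filter and reserves the normative lookup for patients who screen positive.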
The K-Stroop Test showed good test-retest reliability. A learning effect was found in the RCS for each test in healthy controls; thus, a familiarization process is required for respondents to understand and practice the tests. The type of device used, smartphone or tablet, had no effect on the outcomes of the K-Stroop Test. This is the first trial assessing the Stroop test for the diagnosis of MHE in LC patients outside the United States. Additionally, this is the first version of the Stroop test to be validated in LC patients with or without MHE in Korea. Normative data for the K-Stroop Test are essential to diagnose MHE in real-life settings and to carry out multicenter studies. However, this study had several limitations. First, the definition of healthy controls was not strict: we relied on the self-reported answers of healthy controls who did not meet the exclusion criteria, and laboratory findings were not available for them. Second, we compared the K-Stroop Test with the PHES, which is only a provisional gold standard. Ultimately, these tests should help not only in the diagnosis of MHE but also in predicting prognosis. Therefore, a longitudinal study comparing the prognoses of patients diagnosed with MHE by the PHES only, by the K-Stroop Test only, and by both tests is required. Third, the K-Stroop Test does not give respondents the opportunity to practice and understand each step of the test before being assessed. This can lower its discriminating power, especially for the Switching On test. Fourth, the positions of the response options (red, yellow, green, and blue) were fixed for every test question.
We recently updated the K-Stroop Test (accessible via http://encephalopathy.or.kr) to address these limitations and plan to establish normative data for the updated version in the near future. In conclusion, the mobile-based K-Stroop Test is a simple, handy, objective, and valid method for screening and diagnosing MHE in real-life practice. The K-Stroop Test may serve as a diagnostic tool for MHE assessment in multicenter studies alongside paper-pencil tests in Korea.

… Table S2). MHE was diagnosed when the PHES score was below −4. Normative PHES data for the Korean population were adopted from a recent Korean study 4 .

Study endpoints. The primary endpoint was to validate the K-Stroop Test for screening of MHE in LC patients. Secondary endpoints were to establish normative data for the K-Stroop Test in Korea and to study test-retest reliability and inter-device correlation in healthy controls.

Methods
Liver cirrhosis patients. LC was diagnosed by either of the following criteria: (1) biopsy-proven LC or …
Healthy controls. We prospectively recruited healthy volunteers over the same period from a single center to establish normative data for the K-Stroop Test. Adults aged 19 to 65 years were included. Exclusion criteria were: (1) uncontrolled chronic disease within the past 6 months, (2) suspicious symptoms or a previous history of dementia, (3) use of neurological or psychiatric medications, (4) alcohol intake of more than 210 g per week for men or 140 g per week for women in the past two years, and (5) color blindness. Healthy controls also completed the K-Stroop Test after providing informed consent. Participants were divided into 10 subgroups according to sex and 10-year age interval (20-29, 30-39, 40-49, 50-59, 60-69 … Fig. S3).
The "Color Off" test presented colored symbols ("####") and instructed participants to choose the color of the symbols. The "Word Off" test presented color words in Korean in black lettering and instructed participants to choose the named color. The Stroop "On" states consisted of an "Inhibition On" test and a "Switching On" test, which required naming responses to incongruent stimuli to provoke Stroop interference. The "Inhibition On" test presented a color word in a mismatched color and instructed participants to choose the color in which the word was printed (e.g., the correct answer to the word "red" printed in green is "green"). The "Switching On" test alternately presented Inhibition On trials and Switching trials. Each Switching trial presented a color word, printed in either a matched or a mismatched color, inside a box and instructed participants to choose the color named by the word itself, irrespective of its print color (reading response). The four response options, red, yellow, green, and blue, were in fixed positions. The K-Stroop Test was administered on an 8-inch tablet computer.
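The inclusion and exclusion criteria for healthy controls amount to a simple eligibility check. The Python sketch below encodes them under stated assumptions: the `Volunteer` record and its field names are hypothetical, alcohol intake is represented by a single average weekly figure over the past two years, and "more than" is read as strictly greater than the sex-specific limit.

```python
from dataclasses import dataclass


@dataclass
class Volunteer:
    """Hypothetical screening record for a prospective healthy control."""
    sex: str                     # "M" or "F"
    age: int                     # years
    alcohol_g_per_week: float    # assumed average intake over the past two years
    uncontrolled_chronic_disease: bool = False  # within the past 6 months
    dementia_symptoms_or_history: bool = False
    neuro_psych_medication: bool = False
    color_blind: bool = False


ALCOHOL_LIMIT_G_PER_WEEK = {"M": 210, "F": 140}


def is_eligible(v: Volunteer) -> bool:
    """Adults aged 19-65 with none of the five exclusion criteria."""
    if not 19 <= v.age <= 65:
        return False
    if v.alcohol_g_per_week > ALCOHOL_LIMIT_G_PER_WEEK[v.sex]:
        return False
    return not (v.uncontrolled_chronic_disease
                or v.dementia_symptoms_or_history
                or v.neuro_psych_medication
                or v.color_blind)


print(is_eligible(Volunteer("M", 40, 200)))  # within the male limit
print(is_eligible(Volunteer("F", 40, 150)))  # above the female limit
```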

Interpretation of the Korean Stroop Test. The Stroop test is a test of psychomotor speed and cognitive flexibility. It measures the Stroop effect, i.e., slower and more error-prone responses to incongruent stimuli (e.g., a color word printed in a mismatched color) 17 . To combine the speed and accuracy information provided by the test, we analyzed the RCS in the Color Off, Word Off, Inhibition On, and Switching On tests 23 . The RCS was calculated using a published formula and can be interpreted as the number of correct responses per second 24 .
Test-retest reliability and comparison of devices: smartphone versus tablet. Among the healthy controls, 63 individuals voluntarily participated in the test-retest reliability protocol. They repeated the K-Stroop Test after 22 ± 16 days (range: 14-114 days). At each visit, they started with a familiarization exercise to minimize learning effects. To compare results across devices, half of the participants were tested first on a smartphone and then on a tablet at both visit 1 and visit 2, and the other half were tested first on a tablet and then on a smartphone.
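The RCS used throughout the analysis combines speed and accuracy into a single rate. The Python sketch below assumes the definition commonly used in the speed-accuracy literature, namely the number of correct responses divided by the sum of the response times of all trials (correct and incorrect), in seconds. Whether the K-Stroop software uses exactly this form, including the time unit, is an assumption, and the trial data are invented.

```python
from typing import Sequence


def rate_correct_score(correct: Sequence[bool], rt_seconds: Sequence[float]) -> float:
    """Correct responses per second of total response time.

    correct[i]: whether trial i was answered correctly.
    rt_seconds[i]: response time of trial i in seconds (all trials count).
    """
    if len(correct) != len(rt_seconds) or not correct:
        raise ValueError("need one response time per trial")
    total_time = sum(rt_seconds)
    if total_time <= 0:
        raise ValueError("total response time must be positive")
    return sum(correct) / total_time


# Invented example: 10 trials of 0.5 s each, one of them answered incorrectly.
answers = [True] * 9 + [False]
times = [0.5] * 10
print(rate_correct_score(answers, times))
```

Because incorrect trials add time but no correct responses, errors and slowing both lower the score, which is why a single RCS can stand in for separate time and error counts.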
Biochemical analysis. Blood samples were collected from LC patients on the day of PHES testing. Tests included a complete blood count, prothrombin time, serum albumin, aspartate aminotransferase (AST), alanine aminotransferase (ALT), total bilirubin, creatinine, and ammonia.
Statistical analysis. Continuous and categorical variables are expressed as mean ± standard deviation and number (%), respectively. Categorical variables were compared using the chi-square test or Fisher's exact test, and continuous variables using Student's independent t-test. Pearson's correlation test was used to assess correlations between the K-Stroop Test results and demographic and clinical parameters such as age, education, serum ammonia, MELD, and PHES score. Receiver operating characteristic (ROC) curves of the K-Stroop results were compared to identify the parameter most accurate for diagnosing MHE, as defined by the conventional PHES, in LC patients. Variables significant in univariate logistic regression were entered into multivariate logistic regression analysis. To analyze the learning effect and inter-device compatibility (smartphone vs. tablet), repeated-measures ANOVA was used to compare the results of the familiarization trial, the 1st test, and the 2nd test. Statistical significance was set at P < 0.05. Statistical analysis was performed using SPSS 21.0 software (SPSS, Inc., an IBM Company, Chicago, IL, USA).
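The learning-effect comparison uses a one-way repeated-measures ANOVA with trial (familiarization, 1st test, 2nd test) as the within-subject factor. The following standard-library Python sketch computes the F statistic and its degrees of freedom for a complete subjects × conditions table. It omits the sphericity corrections (e.g. Greenhouse-Geisser) that SPSS can report, the function name is illustrative, and the numbers in the example are invented. A P value can be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_effect, df_error)`.

```python
from typing import List, Sequence, Tuple


def rm_anova_oneway(data: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    data[i][j] = value (e.g. an RCS) of subject i under condition j.
    Returns (F, df_effect, df_error).
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with >= 2 subjects and conditions")
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    # Between-subject variation is removed from the error term.
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj
    df_effect, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_cond / df_effect) / (ss_error / df_error)
    return f_stat, df_effect, df_error


# Invented RCS values: rows = subjects; columns = familiarization, 1st, 2nd test.
rcs: List[List[float]] = [[1.0, 2.0, 4.0],
                          [2.0, 3.0, 3.0],
                          [3.0, 5.0, 5.0]]
print(rm_anova_oneway(rcs))
```

Removing the subject sum of squares from the error term is what distinguishes the repeated-measures design from an ordinary one-way ANOVA: stable differences between individuals no longer dilute the learning effect.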

Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.