Legibility Assessment of Visual Word Form Symbols for Visual Tests

For a reliable visual test, it is important to evaluate the legibility of the symbols, which depends on several factors. Previous studies have compared the legibility of Latin optotypes. This study developed a visual function test based on identification visual capacity for a Chinese reading population. The legibility of word symbols was assessed with three methods: (1) identification contrast thresholds of the character sets, (2) patterns of confusion matrices obtained from analysis of the frequency of incorrect stimulus/response pairs, and (3) pixel ratios of bitmap images of Chinese characters. Characters of similar legibility in each character set were then selected. The contrast thresholds of the final five character sets and of the Tumbling E and Landolt C optotypes were evaluated. No significant differences in contrast threshold were found among the five selected character sets (p > 0.05), but their contrast thresholds were significantly higher than those of the E and C optotypes. Our results indicate that combining multiple methods that capture the influence of stimulus properties is useful for investigating the legibility of visual word symbols.

chart) 21 . A fixed size and various contrasts of letters are used for the Pelli-Robson chart, and Robson et al. found only small differences between an acuity chart and a contrast chart 22 . However, Elliott, Whitaker, and Bonette reported that the legibility of Sloan letters assessed at the contrast threshold was different from that assessed at the acuity threshold 16 . Alexander et al. reported that the Sloan letter optotypes showed relatively greater interletter variability of threshold log contrast for small letters than for large letters, or of threshold logMAR for high-contrast letters 18 . These results show the importance of contrast in optotype legibility. Therefore, it is necessary to consider performance near the contrast threshold when assessing legibility.
To the best of our knowledge, there is no standardized equivalent test to measure acuity with traditional Chinese characters. The purpose of this study was first to develop a method to investigate the legibility of Chinese characters, a type of visual word symbol, and then to group visual word symbols of similar legibility so as to develop a visual function test that assesses identification visual capacity for a Chinese reading population. To this end, based on the properties of symbols, multiple methods (identification contrast threshold, patterns of confusion matrices, and pixel ratio) were used to estimate the legibility of Chinese characters presented to the fovea. For comparison with current established tests, we also compared the legibility of Chinese characters with those of the E and C optotypes to examine whether there was a significant difference in legibility between Chinese characters with a simple structure and Latin optotypes. We used contrast threshold, instead of size threshold, as a critical criterion to determine the legibility. The rationale behind this choice was the importance of contrast threshold in legibility assessment, as discussed above 16 , and the need for clinical application in the assessment of contrast-related functional vision 24,25 .

Results
Character selection. The legibility of a character was evaluated with contrast threshold, patterns of confusion matrices, and pixel ratio. The contrast threshold of each character (Fig. 1) was the average of the contrast at the last five reversals in all 4 repetitions of the same condition for each participant. The mean identification contrast threshold for all 43 characters was 0.77 ± 0.08 log unit. The mean contrast thresholds of the five character sets were 0.81 ± 0.06 log unit (Group I), 0.76 ± 0.06 log unit (Group II), 0.74 ± 0.06 log unit (Group III), 0.68 ± 0.07 log unit (Group IV), and 0.84 ± 0.11 log unit (Group V). The error bars in Fig. 1 represent 1 standard deviation (SD). Characters with relatively larger variation in contrast threshold among the participants were excluded from the final test. The pixel ratio, which indicates stroke density, was the total number of pixels composing the strokes divided by the number of pixels in the whole character image (Table 1).

Figure 1.
Identification contrast thresholds for the original five character sets (panels (a) to (e); stage one, character selection). The contrast threshold of each character was the average of the contrast at the last five reversals in all 4 repetitions of the same condition. Means of contrast threshold were mostly located between 0.7 log unit and 0.9 log unit. The error bars represent 1 standard deviation (SD).
Table 3. Confusion matrices of the Group B character sets (first stage). Note: Percentages over 15% are underlined. X-axis, stimuli; Y-axis, incorrect responses in percentage frequency (%).
Tables 2-6 show the confusion matrices averaged across participants. The confusion matrix was computed by counting the types of incorrect response in the last five reversals of each run. Each column represents a stimulus presented to the participants. The rows show the probability of each character being selected by the participants. Each cell gives the probability of selecting a particular character given a stimulus. Cells with a confusion rate exceeding 15% are underlined because they indicate character pairs that participants are more likely to confuse. Highly confusable characters are more valuable for testing the limits of contrast vision than ones that are easily distinguished, so these character pairs were considered good confusion pairs.
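The tabulation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the trial data and placeholder labels are hypothetical, the function names `confusion_percentages` and `confusable_pairs` are invented here, and normalizing by the number of presentations of each stimulus is an assumption, since the text does not state the denominator.

```python
from collections import Counter, defaultdict


def confusion_percentages(trials):
    """Return {stimulus: {response: percent}} for incorrect responses only.

    `trials` is a list of (stimulus, response) pairs. Percentages are taken
    relative to the number of presentations of each stimulus (an assumption).
    """
    presented = Counter(stim for stim, _ in trials)
    errors = defaultdict(Counter)
    for stim, resp in trials:
        if resp != stim:
            errors[stim][resp] += 1
    return {
        stim: {resp: 100.0 * n / presented[stim] for resp, n in errors[stim].items()}
        for stim in presented
    }


def confusable_pairs(matrix, cutoff=15.0):
    """List (stimulus, response, percent) cells whose rate exceeds the cutoff."""
    return sorted(
        (stim, resp, pct)
        for stim, column in matrix.items()
        for resp, pct in column.items()
        if pct > cutoff
    )


# Hypothetical trials with placeholder labels (not data from the study).
trials = (
    [("A", "A")] * 6 + [("A", "B")] * 3 + [("A", "C")] * 1
    + [("B", "B")] * 8 + [("B", "A")] * 2
)
matrix = confusion_percentages(trials)
pairs = confusable_pairs(matrix)
for stim, resp, pct in pairs:
    print(f"stimulus {stim} answered as {resp}: {pct:.0f}%")
```

With these made-up trials, only the A/B cells exceed the 15% cutoff, which is the kind of cell that would be underlined in the tables.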
Based on these results, we were able to select the characters for the final test. We applied three criteria: (1) Good pairs of characters should produce more than 15% error in the confusion matrix; thus, characters that produced confusion of less than 15% were excluded. (2) Characters with contrast thresholds deviating from the mean by more than 1.5 standard deviations (SD) were excluded to ensure the homogeneity of performance for the whole data set. (3) We kept only the three to four characters in each character set that had the most similar pixel ratios. The excluded characters are marked by asterisks in Table 1. In the end, 16 characters (shown in bold in Table 1) were selected from the original five character sets composed of 43 characters.
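As an illustration of criterion (2), the following Python sketch flags characters whose threshold lies more than 1.5 SD from the mean. The threshold values and labels are hypothetical, the helper name `within_sd_limit` is invented here, and the use of the sample SD computed over the listed characters is an assumption, since the text does not specify which mean and SD were used.

```python
from statistics import mean, stdev


def within_sd_limit(thresholds, limit=1.5):
    """Map each character to True if its threshold is within `limit` SDs of the mean."""
    m = mean(thresholds.values())
    s = stdev(thresholds.values())  # sample SD (assumption)
    return {name: abs(t - m) <= limit * s for name, t in thresholds.items()}


# Hypothetical contrast thresholds in log units (not data from the study).
thresholds = {"c1": 0.76, "c2": 0.78, "c3": 0.75, "c4": 0.77, "c5": 0.95}
flags = within_sd_limit(thresholds)
kept = [name for name, ok in flags.items() if ok]
print(kept)
```

In this made-up example the character `c5` deviates from the mean by more than 1.5 SD and would be excluded, while the other four are retained.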
Comparison with alphabetic optotypes. We then compared the contrast thresholds of the characters in our final test set with those of the Tumbling E and Landolt C optotypes. Figure 2 shows the contrast thresholds averaged across characters in each final character set and those of the Tumbling E and Landolt C optotypes. The average contrast threshold of the five newly selected character sets was 0.73 ± 0.02 log unit. Mean contrast thresholds for individual character sets, which ranged from 0.70 (for Group II) to 0.75 (for Group I) log unit, were similar among the Chinese character sets. The identification contrast thresholds of the Latin optotypes were 0.35 (SD = 0.10) and 0.44 (SD = 0.04) log units for the Tumbling E and Landolt C, respectively. We used the independent t test to compare the mean contrast thresholds between different groups. No significant differences in contrast threshold were found among the five selected character sets (p > 0.05). However, the independent t test indicated significant differences between our character sets and the Latin optotypes (p < 0.05).
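For readers unfamiliar with the independent t test, a minimal Python sketch of the statistic is given below. It assumes the equal-variance (pooled) form of the test, which the text does not confirm, and the two samples are hypothetical values chosen for illustration. The function name `pooled_t` is invented here, and the two-tailed critical value of about 2.145 for 14 degrees of freedom is taken from standard t tables.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Return (t, df) for an independent two-sample t test with pooled variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical thresholds in log units for eight observers (not study data).
character_set = [0.70, 0.72, 0.75, 0.74, 0.73, 0.71, 0.76, 0.73]
optotype = [0.30, 0.35, 0.40, 0.33, 0.38, 0.36, 0.34, 0.34]

t, df = pooled_t(character_set, optotype)
T_CRIT_DF14 = 2.145  # two-tailed critical value at alpha = 0.05, df = 14
print(f"t = {t:.2f}, df = {df}, significant = {abs(t) > T_CRIT_DF14}")
```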

Discussion
The main focus of this study was to select characters for the development of a visual test for the Chinese reading community by measuring the legibility of traditional Chinese characters. We applied the identification contrast threshold, confusion matrix and pixel ratio to identify a set of characters suitable for a clinical test. Sixteen characters were selected for the test from 43 assessed characters 26 .
We also compared the legibility of the 16 traditional Chinese characters with that of the Landolt C and Tumbling E optotypes, which are used in most clinical eye examinations. The results showed that the contrast threshold of Chinese characters, even those with simple structures, was significantly higher than those of the Tumbling E (2.3%) and Landolt C (2.8%) optotypes. A possible reason for this effect is the difference in spatial complexity between our stimuli and the E or C optotypes. Pelli et al. measured the identification efficiency for various types of visual word forms and checkerboard patterns 27 . They found that efficiency decreases with spatial complexity; in particular, the efficiency of identification is lower for Chinese characters than for Latin letters 27 . They argued that the difference in efficiency results from the likelihood of forming "goodness" (in Gestalt theories of visual perception) in characters. Our result is consistent with theirs and thus may reflect the same difference in visual processing between complex characters and simple letters.
We did not directly compare the legibility of our characters with that of the Sloan letters, for the latter are not generally used in clinical settings in Taiwan, where this study was performed. Nevertheless, a previous study has compared them 5 . Zhang et al. showed that an observer could identify a Sloan letter at a smaller size than they could the simplest Chinese characters (e.g., 山, 人, 力) with a smaller stroke frequency 5 . Our characters had higher spatial complexity than the easiest character group in the study by Zhang et al. 5 , so the legibility of our characters could be lower than that of the Sloan letters.
Thus, the legibility of our characters was lower than those of the Landolt C and Tumbling E and is very likely to be lower than that of the Sloan letters. This is not surprising, for our characters are spatially more complex than the Latin optotypes. The participants may have needed to expend more effort to identify these characters than to identify the Latin optotypes. It should be noted that the main purpose of this test was not to replace the Landolt C and Tumbling E, which are good tools for assessing visual acuity, but to assess a patient's functional vision, which depends strongly on the cultural background of the patient. Thus, we believe that the test we developed here should be more relevant than Latin optotypes to the functional vision of a patient in Taiwan. In addition, our methods for determining legibility could be adopted in other countries whose primary language does not use Latin letters.
The use of several configurations of similar Chinese characters was modelled after the Sloan letter set 18 , which contains various types of strokes and subgroups of letter sets, such as C, D, and O, which tend to be confused at the acuity or contrast threshold. Our study also showed similar results in the confusion matrix of the final character set in the experiment comparing the characters with alphabetic optotypes (Table 7). The results showed that the characters formed close relations of confusion only near the contrast threshold with characters from their original character set (marked with underline).
The results on these 16 Chinese characters of similar legibility have been applied to a new macular visual function test, named the Macular Multi-Function Assessment (MMFA) 26 . In that study, Chinese characters were adopted by the MMFA at 4 contrast levels (80%, 25%, 10%, and 5% contrast) to assess macular visual function in patients with Type II diabetes and in controls 26,28 . The results showed that the scores of the MMFA and Early Treatment Diabetic Retinopathy Study (ETDRS) contrast acuity charts displayed high levels of agreement and similar discriminative ability 28 . In addition, most of the MMFA scores showed significant differences between the diabetic group and the control group 26 . Therefore, the results suggest that the methods used in our study to examine the legibility of Chinese characters would be useful and reliable for clinical applications that use visual word form symbols as optotypes instead of the Landolt C or Tumbling E.
Unlike the Landolt C, Tumbling E, or even the Sloan letters, which are relatively simple and have been extensively studied for decades 1,4,29,30 , the generality of our findings for different ophthalmological or neurological diseases has yet to be tested. It is possible that patients with different diseases would yield different response patterns to our character set. After all, different diseases may produce impairment of different visual functions. Further studies are needed to investigate the validity of our test in characterizing character identification performance in various visual disorders.

Conclusions
Traditional Chinese characters have high spatial complexity, so it is not easy to use a single method to examine and present the legibility of Chinese characters. We applied multiple criteria based on identification contrast thresholds, analysis of patterns of confusion matrices, and comparisons of pixel ratios to examine the legibility of logographic visual word symbols. Based on this work, we were able to construct a test for the visual assessment of persons with Chinese as their first or only language.

Methods
Participants. Eight participants (5 female, 3 male; mean age 29.8 ± 5.6 years) took part in this experiment. All participants were recruited from the campus of National Taiwan University and had normal or corrected-to-normal visual acuity (20/20). All of the procedures were reviewed and approved by the Institutional Review Board of the Taipei City Hospital, and all tests were conducted in accordance with the tenets of the Declaration of Helsinki. Informed consent was obtained from each participant after they understood the procedures of this study.

Inclusion of characters. Given the sheer number of Chinese characters and their wide range of spatial complexity, it was impractical to include all Chinese characters in this study. Thus, some principles for character selection were developed.
The spatial complexity of a Chinese character basically depends on (1) stroke number and (2) configuration 17,31,32 . Strokes are the basic features of a character, such as dots, lines, slants, and hooks. The configuration is the spatial relation between character components, such as left-right (e.g., 江, /jiang/), surrounding (e.g., 田, /tian/), or top-down (e.g., 古, /gu/) 17 . Configuration is important for classifying the spatial compositions of Chinese characters, as a skilled Chinese reader often cannot identify a character when its components are in the wrong configuration and is more likely to confuse characters of the same configuration than those of different configurations 17,31-33 . The development of the Sloan letter chart also took configuration into consideration and thus includes letters of various forms, such as vertical, oblique, and curved contours 4 .
The characters were selected from the Report on the Survey of Characters and Words Frequently used by Elementary School Children in Taiwan 34 . That report lists 5,021 commonly used characters. First, only characters with fewer than 10 strokes were included in this study. The reason for this criterion was to ensure that the difficulty level would be appropriate for most readers and thus suitable for clinical applications, as Taiwanese elementary school children in grades 1 and 2 should be able to recognize most of the high-frequency characters with 1 to 10 strokes 35 . Second, only characters within a category, defined by character configuration and spatial similarity (see below), containing at least five characters were included. We used only the Ming typeface of the characters. This typeface is considered to be the most legible on VDTs and is a popular typeface 17 .
In total, five character sets composed of 43 characters, shown in Table 1, in four configuration patterns (single, left-right, surrounding, and top-down) were chosen for this study. The number of strokes of each character ranged from 3 to 7 (mean = 4.98, SD = 1.28). The characters in every set had similar numbers of strokes, forms, and configurations. The pixel ratios, defined as the number of pixels of all strokes divided by that of the whole character image (159 × 159 pixels) for each individual character, are shown in Table 1.
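The pixel ratio defined above is straightforward to compute from a binary bitmap. The following Python sketch is an illustration only: it uses a small hypothetical 5 × 5 bitmap rather than a 159 × 159 character image, assumes stroke pixels are coded as 1 and background as 0, and the function name `pixel_ratio` is invented here.

```python
def pixel_ratio(bitmap):
    """Stroke pixels divided by all pixels in the image.

    `bitmap` is a list of rows; nonzero entries are stroke pixels (assumption).
    """
    total = sum(len(row) for row in bitmap)
    stroke = sum(1 for row in bitmap for px in row if px)
    return stroke / total


# Hypothetical 5 x 5 bitmap of a cross shape (not a character from Table 1).
cross = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
ratio = pixel_ratio(cross)
print(f"pixel ratio = {ratio:.2f}")
```

Here 9 of the 25 pixels belong to strokes, giving a ratio of 0.36.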
For the purpose of investigating character legibility, the character image was scaled to a 1 × 1 degree visual angle to achieve basic comfortable visibility for all of the characters 36 . The characters were presented on a display that had a mean background luminance of 89.2 cd/m² at a viewing distance of 50 cm.
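To make the stimulus size concrete, the physical extent that subtends a given visual angle follows from the standard relation size = 2 × distance × tan(angle / 2). The Python sketch below applies it to the 1 degree angle and 50 cm viewing distance stated above; the function name `size_for_visual_angle` is hypothetical, and the result is a geometric estimate rather than a value reported in the study.

```python
import math


def size_for_visual_angle(angle_deg, distance_cm):
    """Physical size (cm) that subtends `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


size_cm = size_for_visual_angle(1.0, 50.0)
print(f"1 degree at 50 cm corresponds to about {size_cm:.3f} cm")
```

Under this relation, a character subtending 1 degree at 50 cm is roughly 0.87 cm on a side.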
Apparatus. Stimuli were displayed on a ViewSonic monitor (G90fB 19″) driven by a MacBook Pro with an Intel HD Graphics 3000 display card. The stimuli were generated by Psykinematix software with the Mono 10.8 bit bit-stealing method to reach 10 bits of contrast resolution 37 . The gamma correction was performed with the Psykinematix software and Eye-one Display 2 together. The monitor resolution was 1280 (H) × 1024 (V), and the refresh rate was 85 Hz.
Protocol of legibility investigation. This protocol comprised two stages. The first stage was the investigation of Chinese character legibility by the psychophysical method. This information was then used to further screen the characters for the final character set. The second stage was the comparison of legibility between the selected characters and the E and C optotypes.
The first stage contained five conditions, one for each of the five character sets (Table 1). Each condition was repeated four times. The order of the 20 runs (5 conditions × 4 repetitions) was randomized. Every run contained 5 to 11 interleaved staircases, depending on the number of characters in each character set. The initial Weber contrast of a character was set at 80%. Participants were instructed to view the characters binocularly and to maintain their attention on the fixation mark at the center of the display. A character was then shown in the center of the display. The participants were instructed to press the corresponding button on the keyboard to indicate which character they perceived. Each character was presented for 250 ms, which was sufficiently long for the participants to identify the character 38 . Auditory feedback was provided to indicate correct or incorrect responses. Three participants took part in the first stage.
In the second stage, characters of similar legibility, according to the results of the first stage, were combined into a new character set. For comparison, we also measured the contrast threshold for the Landolt C and Tumbling E optotypes for eight participants, including the three who had participated in the first stage. The protocol of the psychophysical method of the second stage session was the same as that of the first stage session. The independent t test was used to compare the mean contrast thresholds between different groups.
Participants were given ample time to study the characters before the experiment. Practice trials were given at the beginning of the experiment to decrease the learning effect. Experiments were conducted in a dark room. Contrast thresholds were measured using interleaved multiple 3-down, 1-up staircase procedures. That is, the contrast of a test character decreased after three correct responses and increased after one wrong response. This procedure yields a threshold level of 79.4% correct responses. The contrast decrease rate was 50% before the first reversal and 12.5% after the first reversal, and the increase rate was 25%. Each run was terminated after 6 reversals. In total, there were at least 37 trials and on average 62.73 trials for each run. The standard error of the last 5 reversals, averaged across all trials, was about 0.63%, showing the stability of our measurements.
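The staircase rule described above can be sketched for a single, non-interleaved staircase in Python. This is a simplified illustration, not the Psykinematix implementation: the function name `run_staircase` and the deterministic observer are hypothetical, the rates are assumed to be multiplicative changes in contrast, a reversal is assumed to be recorded at the contrast where the direction changed, and the threshold is assumed to be the arithmetic mean of the last five reversal contrasts. The 79.4% convergence level follows from requiring three consecutive correct responses with probability 0.5, that is, p = 0.5^(1/3).

```python
from statistics import mean


def run_staircase(respond, start=80.0, max_reversals=6):
    """Run a 3-down, 1-up staircase; `respond(contrast)` returns True if correct."""
    contrast = start
    correct_streak = 0
    last_direction = None  # "down" or "up"
    reversals = []
    trials = 0
    while len(reversals) < max_reversals:
        trials += 1
        if respond(contrast):
            correct_streak += 1
            if correct_streak < 3:
                continue  # contrast changes only after three correct responses
            correct_streak = 0
            direction = "down"
        else:
            correct_streak = 0
            direction = "up"
        if last_direction is not None and direction != last_direction:
            reversals.append(contrast)  # contrast at which the direction changed
        last_direction = direction
        if direction == "down":
            rate = 0.5 if not reversals else 0.125  # 50% before first reversal
            contrast *= 1.0 - rate
        else:
            contrast *= 1.25  # 25% increase after an error
    return reversals, trials


def observer(contrast):
    """Hypothetical noise-free observer that is correct at or above 5% contrast."""
    return contrast >= 5.0


reversals, trials = run_staircase(observer)
threshold = mean(reversals[-5:])  # average of the last five reversals
target_level = 0.5 ** (1.0 / 3.0)  # proportion correct tracked by 3-down, 1-up
print(f"threshold = {threshold:.2f}% after {trials} trials")
print(f"convergence level = {target_level:.3f}")
```

With this idealized observer the estimate settles close to the 5% cutoff; a real observer responds probabilistically, so the staircase instead converges on the contrast that yields about 79.4% correct responses.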