Quantification of expected information gain in visual acuity and contrast sensitivity tests

We make use of expected information gain to quantify the amount of knowledge obtained from measurements in a population. In the first application, we compared the expected information gain in the Snellen, ETDRS, and qVA visual acuity (VA) tests, as well as in the Pelli–Robson, CSV-1000, and qCSF contrast sensitivity (CS) tests. For the VA tests, ETDRS generated more expected information gain than Snellen. Additionally, the qVA test with 15 rows (or 45 optotypes) generated more expected information gain than ETDRS, whether scored with VA threshold alone or with both VA threshold and VA range. Regarding the CS tests, CSV-1000 generated more expected information gain than Pelli–Robson, and the qCSF test with 25 trials generated more expected information gain than CSV-1000, whether scored with AULCSF or with CSF at six spatial frequencies. The active learning-based qVA and qCSF tests have the potential to generate more expected information gain than traditional paper chart tests. Although we have specifically applied it to compare VA and CS tests, expected information gain is a general concept that can be used to compare measurements in any domain.


Computing the expected information gain
Two equivalent methodologies exist for calculating the expected information gain. The first approach hinges on the reduction of uncertainty regarding the property being measured, often referred to as the "truth", following the measurement. In contrast, the second approach is grounded in the disparity between the uncertainty associated with all conceivable measurement outcomes within a population and the projected remaining uncertainty after the execution of a measurement. While the first approach is a direct derivation from the concept of expected information gain, the second approach, in practice, tends to be more straightforward to implement. Both approaches are presented in the following section, and additional verification of their equivalence can be found in Appendix 1.
The first approach to calculating expected information gain begins by considering the property to be measured, referred to as the "truth". This property is represented by a random variable X, with a probability density function denoted as P(x) (Fig. 1a). We also consider all possible outcomes of the measurement, represented by a random variable Y, with its own probability density function P(y) (Fig. 1c). Following each measurement, the outcome is represented by a probability distribution P(y|x) (Fig. 1b). By applying Bayes' rule, we can derive the posterior distribution of x, denoted as P(x|y) (Fig. 1d), using the formula P(x|y) = P(y|x)P(x)/P(y), where P(y) = ∫ P(y|x)P(x) dx. Shannon entropy is then employed to quantify the level of uncertainty associated with X before any measurement:

(1) H(X) = −∫ P(x) log₂(P(x)) dx.

Following a measurement, the expected entropy of X is determined as:

(2) H(X|Y) = −∫ P(y) ∫ P(x|y) log₂(P(x|y)) dx dy,

and subsequently, the expected information gain is computed (Fig. 1e):

(3) IG(X|Y) = H(X) − H(X|Y).

This approach quantifies the reduction in uncertainty regarding the property X as a result of the measurement outcomes represented by Y.
The second approach to compute expected information gain also begins with the probability density functions P(x), P(y), and P(y|x) (Fig. 1a–c). Using Shannon entropy, we first assess the level of uncertainty associated with Y based on the distribution of all possible measurement outcomes:

(4) H(Y) = −∫ P(y) log₂(P(y)) dy.

Next, we determine the expected residual uncertainty of Y after a measurement:

(5) H(Y|X) = −∫ P(x) ∫ P(y|x) log₂(P(y|x)) dy dx.

Finally, the expected information gain (Fig. 1e) is computed as:

(6) IG(Y|X) = H(Y) − H(Y|X).

This second approach assesses how much uncertainty remains in Y after obtaining the measurement outcomes represented by P(y|x). In essence, it quantifies the reduction in uncertainty associated with Y as a result of measuring X. Both approaches yield equivalent results and provide valuable insights into the information gained through measurements in different ways.
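The equivalence of the two approaches can be verified numerically on a small discrete example. The following sketch is illustrative only; the distributions and variable names are ours, not taken from the paper's code.

```python
import numpy as np

# A toy problem: 3 possible "truths" x with prior P(x), and 3 possible
# measurement outcomes y with conditional outcome distributions P(y|x).
px = np.array([0.5, 0.3, 0.2])
py_given_x = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.3, 0.7],
])  # rows index x, columns index y

py = px @ py_given_x              # P(y) = sum_x P(y|x) P(x)
pxy = py_given_x * px[:, None]    # joint P(x, y)
px_given_y = pxy / py             # Bayes' rule: P(x|y) = P(y|x)P(x)/P(y)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Approach 1: IG(X|Y) = H(X) - expected posterior entropy H(X|Y)
h_x = entropy(px)
h_x_given_y = -np.sum(pxy[pxy > 0] * np.log2(px_given_y[pxy > 0]))
ig1 = h_x - h_x_given_y

# Approach 2: IG(Y|X) = H(Y) - expected residual entropy H(Y|X)
h_y = entropy(py)
h_y_given_x = np.sum(px * np.array([entropy(row) for row in py_given_x]))
ig2 = h_y - h_y_given_x

assert abs(ig1 - ig2) < 1e-12     # the two routes agree
```

The final assertion is the discrete analogue of the equivalence proved in Appendix 1.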

A practical illustration of expected information gain
We present a practical illustration of expected information gain using two rulers, considering both the first (Fig. 2a,d) and second (Fig. 2b,c) approaches.

First approach: For a ruler with unit Δ, the probability distribution of measuring an object with length x is P(y|x) = U(x − Δ/2, x + Δ/2), where U(m, n) is a uniform distribution with boundaries m and n. If this ruler is used to measure objects with lengths between 0 and L with equal probability, the probability distribution for object lengths is P(x) = U(0, L). The entropy of X before any measurement is H(X) = log₂(L). The outcome distribution can also be determined: P(y) ≈ U(0, L). The expected entropy of X after considering Y is H(X|Y) = log₂(Δ), so that IG(X|Y) = log₂(L) − log₂(Δ) = log₂(L/Δ).

Second approach: The entropy of all possible outcomes is H(Y) = log₂(L), and the expected residual uncertainty of Y after considering X is H(Y|X) = log₂(Δ). The expected information gain is then IG(Y|X) = H(Y) − H(Y|X) = log₂(L/Δ).

Consider two one-foot-long rulers, one with a one-inch unit and the other with a 1/16-inch unit. Using both approaches, the expected information gains are calculated as log₂(12/1) = 3.58 bits and log₂(12/(1/16)) = 7.58 bits for the two rulers, respectively. These values correspond to 12 and 192 distinct length classes, consistent with the number of units on the rulers. These calculations demonstrate how expected information gain can be applied to assess the knowledge gained from measurements using different rulers with varying units of measurement. It provides a quantitative measure of the information acquired through these measurements, which can be valuable in various contexts.
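The ruler arithmetic above reduces to a one-line formula, sketched here in Python (the function name is ours, for illustration):

```python
import math

def ruler_information_gain(length_range_inches, unit_inches):
    """Expected information gain of a ruler: IG = log2(L / unit)."""
    return math.log2(length_range_inches / unit_inches)

coarse = ruler_information_gain(12, 1)       # one-foot ruler, one-inch unit
fine = ruler_information_gain(12, 1 / 16)    # one-foot ruler, 1/16-inch unit

# 2**IG recovers the number of distinct length classes: 12 and 192.
assert round(2 ** coarse) == 12
assert round(2 ** fine) == 192
```

Evaluating the two calls gives approximately 3.58 and 7.58 bits, matching the values in the text.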

Expected information gain for a unidimensional measurement with a normal outcome distribution
We next show how to derive the expected information gain for a unidimensional measurement with a normal outcome distribution and provide an intuitive interpretation. In this case, the probability distribution for measuring y given x is defined as P(y|x) = N(µ(x), σ²), where µ(x) is the expected value of the measurement given x. If both x and y are uniformly distributed in an interval of length L, then

IG(Y|X) = log₂(L) − log₂(√(2πe) σ) = log₂(L/(4.13σ)),

corresponding to N(Y|X) = L/(4.13σ) classes. Intuitively, these values indicate the following: (1) L/(4.13σ) classes: this represents the number of distinct classes or intervals into which the measurement outcomes can be categorized. Each class has a size of approximately 4.13σ, which corresponds to the 98% confidence interval of the outcome distribution. In other words, this is a way to quantitatively express the granularity or resolution of the measurement. (2) The 98% confidence interval: the size of each class, 4.13σ, represents the range within which an observed measurement is likely to fall with a high level of confidence (98% confidence interval). This interval provides a measure of uncertainty associated with the measurement outcome.
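The factor 4.13 follows from the differential entropy of a normal distribution, H = log₂(σ√(2πe)), since √(2πe) ≈ 4.13. A minimal numerical check (the values of L and σ are illustrative):

```python
import math

# Differential-entropy route to IG for a normal outcome distribution:
# H(Y) = log2(L) for uniform outcomes, H(Y|X) = log2(sigma * sqrt(2*pi*e)).
L, sigma = 1.3, 0.117   # e.g. a 1.3 logMAR score range, sigma in logMAR

h_y = math.log2(L)
h_y_given_x = math.log2(sigma * math.sqrt(2 * math.pi * math.e))
ig = h_y - h_y_given_x

# sqrt(2*pi*e) is the 4.13 in the text, and IG = log2(L / (4.13 * sigma)).
assert abs(math.sqrt(2 * math.pi * math.e) - 4.13) < 0.01
assert abs(ig - math.log2(L / (4.13 * sigma))) < 0.01
```

With these example values the expected information gain is about 1.4 bits.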

Overview of the current study
Expected information gain, which quantifies the knowledge gained through measurement, is not limited to unidimensional examples. It can be effectively used to compare measurements with any level of dimensionality. In this study, we extended its application to the assessment of visual acuity (VA) and contrast sensitivity (CS) tests, which inherently involve measurements with different dimensionalities. By applying expected information gain to VA and CS tests, we aimed to provide a quantitative basis for comparing these tests, considering their varying optotypes, outcome dimensionalities, and their ability to provide valuable knowledge.
Visual acuity is a crucial measure of visual function and is widely used for diagnosing and managing visual diseases, evaluating the effectiveness of treatments, and establishing professional standards [23][24][25]. The gold standard test, the ETDRS chart 26, consisting of rows of five equal-sized optotypes, is used to generate a unidimensional VA threshold score for each patient. A newer test, the qVA test 27,28, has been introduced; it presents three equal-sized optotypes in each trial and generates a two-dimensional score, incorporating VA threshold and VA range. Importantly, this added dimension provides a more comprehensive assessment of visual acuity, and not considering it could lead to an incomplete evaluation 28.
The contrast sensitivity function is an increasingly important measure in clinical research and clinical trials as it offers a more comprehensive characterization of spatial vision compared to VA 29,30. Multiple instruments have been developed to measure CS 20,31,32, each using different optotypes and producing scores with varying dimensionalities. For example, the Pelli-Robson test 32 uses unfiltered Sloan letter stimuli and generates a unidimensional CS score at one spatial frequency, while the CSV-1000 test 33 uses windowed sinewave grating stimuli and provides a four-dimensional CS score at four spatial frequencies. Another test, the qCSF 20,34, uses filtered Sloan letter stimuli and generates both a unidimensional area under the log CSF (AULCSF) score and a six-dimensional CS score at six spatial frequencies. This diversity in CS tests and their outcome dimensions poses challenges when comparing their effectiveness. While many studies have assessed the accuracy, test-retest variability, sensitivity, and specificity of VA and CS tests 28,[35][36][37][38][39][40][41][42][43][44][45][46][47], comparing them has proven difficult due to differences in optotypes, optotype arrangements, and outcome dimensionalities. Existing metrics often focus solely on the uncertainty of measurement outcomes and do not quantify the knowledge gained through these measurements. This study aimed to address these challenges by employing the concept of expected information gain to compare VA and CS tests. Computer simulations were used to calculate and compare the expected information gain and the number of classes derived from the Snellen chart 48, ETDRS chart 26, and qVA test 27 for VA assessment, as well as the Pelli-Robson chart 32, CSV-1000 chart (Vector Vision, Houston, Texas) 33, and qCSF test 20,34, in populations with uniform distributions and in populations with distributions based on experimental data. This approach provides a quantitative and knowledge-based method for comparing these essential vision tests.

Apparatus
All the simulations and analyses were conducted on a Dell computer with Intel Xeon W-2145 @ 3.70 GHz CPU (8 cores and 16 threads) and 64 GB installed memory (RAM) with Matlab R2019a (MathWorks Corp., Natick, MA, USA) and R (R Core Team, 2020).

Simulated observers
We conducted two simulations using the Snellen, ETDRS, and qVA tests (Fig. 3). In Simulation 1, we simulated 1386 observers from a uniform distribution of VA threshold θ_Threshold and VA range θ_Range, with θ_Threshold ∈ [−0.3, 1.0] logMAR sampled every 0.02 logMAR, and log₁₀(θ_Range) ∈ [−1.0, 0] sampled every 0.05 log₁₀ units. In Simulation 2, we simulated 1386 observers from the population distribution of VA threshold and VA range derived from an existing qVA dataset of 14 eyes tested with Bangerter foils 28,49. The original experiment was conducted at the Ohio State University. Written consent was obtained from all the participants before the experiment. The study protocol was approved by the institutional review board of human subject research of The Ohio State University and adhered to the tenets of the Declaration of Helsinki.

The probability of correctly identifying m of the M optotypes in a row is:

(9) P(m|M, θ_VA) = f(m, M, d′(s; θ_VA)),

where f(·) is derived from signal detection theory by considering chart design, i.e., the number of optotypes in a row and whether optotypes in each row are sampled from the 10 Sloan letters with or without replacement.

Snellen chart
The Snellen chart (Fig. 3a) has 11 rows, with 1, 2, 3, 4, 5, 6, 7, 8, 8, 8, and 9 optotypes per row and optotype size descending from 1.0 to −0.3 logMAR. Each simulated observer was tested with the standard procedure 50. The probability of correctly identifying m optotypes in a row is determined by Eq. (9) with varying M across rows. Starting from the top row, the observer must correctly identify at least half of the optotypes in a row before proceeding to the next row. If the observer cannot identify the optotype in the top row, the VA score is 1.1 logMAR; otherwise, the VA score equals the size of the optotypes in the last row with at least 50% correct identification. The VA score can therefore take 12 potential values. For each simulated observer x_i, we repeated the test 1000 times to obtain the distribution of test scores P(y_j|x_i), where j = 1, …, 12.
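The scoring rule above can be sketched as a short simulation. This is an illustrative sketch only: the per-optotype psychometric function here is a simple logistic stand-in (the paper's Eq. (9) is derived from signal detection theory), and the linearly spaced row sizes are an approximation of the chart.

```python
import numpy as np

rng = np.random.default_rng(0)
# Snellen layout: optotypes per row, sizes descending from 1.0 to -0.3 logMAR
# (approximated here as linearly spaced).
row_counts = [1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9]
row_sizes = np.linspace(1.0, -0.3, 11)

def p_correct(size, va_threshold, slope=8.0, guess=0.1):
    """Illustrative per-optotype psychometric function (logistic stand-in
    for the paper's signal-detection-based Eq. (9))."""
    return guess + (1 - guess) / (1 + np.exp(-slope * (size - va_threshold)))

def snellen_score(va_threshold):
    """One simulated run of the Snellen procedure; returns 1 of 12 scores."""
    score = 1.1  # score for failing the top row
    for n, size in zip(row_counts, row_sizes):
        correct = rng.random(n) < p_correct(size, va_threshold)
        if correct.sum() * 2 < n:   # fewer than half correct: stop
            break
        score = size                # last row passed so far
    return round(float(score), 2)
```

Repeating `snellen_score` 1000 times per simulated observer yields the score distribution P(y_j|x_i) used in the analysis.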

ETDRS chart
The ETDRS chart (Fig. 3b) has 14 five-optotype rows, with optotype size descending from 1.0 to −0.3 logMAR. Each simulated observer was tested with the standard procedure 26,50. The probability of correctly identifying m optotypes in a row is determined by Eq. (9) with M = 5. Four different termination rules were simulated. Starting from the top row, the test could stop after the observer makes three, four, or five mistakes in identifying the optotypes in a row, or continue until the observer is tested with the entire chart. The VA score is computed as 1.1 − 0.02n, where n is the number of correctly identified optotypes, with 71 potential values. For each simulated observer x_i, we repeated the test 1000 times to obtain the distribution of test scores P(y_j|x_i), where j = 1, …, 71.
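The ETDRS letter-scoring rule is simple enough to state directly in code (a minimal sketch; the function name is ours):

```python
def etdrs_va_score(n_correct):
    """ETDRS letter scoring: VA = 1.1 - 0.02 * n_correct logMAR.

    With 14 rows of 5 optotypes there are 70 letters, so n_correct runs
    from 0 (score 1.1) to 70 (score -0.3): 71 potential scores in total.
    """
    assert 0 <= n_correct <= 70
    return round(1.1 - 0.02 * n_correct, 2)

assert etdrs_va_score(0) == 1.1
assert etdrs_va_score(70) == -0.3
assert len({etdrs_va_score(n) for n in range(71)}) == 71
```

Each correctly identified letter improves the score by 0.02 logMAR, which is why the score space is finer-grained (71 values) than the Snellen chart's 12.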

qVA test
The qVA (Fig. 3c) is a Bayesian active learning visual acuity test 27. Its stimulus space consists of optotypes of 91 linearly spaced sizes from −0.5 to 1.3 logMAR, with a 0.02 logMAR resolution. Starting with a weak prior distribution of VA threshold and VA range in a two-dimensional space that has 700 linearly spaced VA thresholds (between −0.5 and 1.3 logMAR) and 699 log-linearly spaced VA ranges (between 0.1 and 1.5 logMAR), it uses an active learning procedure to test the observer with the optimal stimulus in each trial and generates the posterior distribution of VA threshold and VA range. Each simulated observer was tested with 5, 15, or 30 rows (corresponding to 15, 45, or 90 optotypes). The probability of correctly identifying m optotypes in a row is determined by Eq. (9) with M = 3. We computed the mean VA threshold and mean VA range from their posterior distributions in each test and quantized them into 86 and 56 discrete scores, respectively, with a 0.02 logMAR resolution, for a total of 4816 potential combinations. For each simulated observer x_i, we repeated the test 1000 times to obtain the two-dimensional distribution of test scores P(y_j|x_i), with j = 1, …, 4816. We also computed the distribution of VA threshold scores P(y_Threshold,j|x_i), where j = 1, …, 86, by marginalizing P(y_j|x_i) over the VA range dimension.

Information gain
We first computed P(y_j) from P(y_j|x_i) for each test:

P(y_j) = (1/I) Σᵢ P(y_j|x_i),

where I = 1386 in both simulations. We then computed the total entropy of each test in a population:

H(Y) = −Σⱼ P(y_j) log₂(P(y_j)),

where j = 1, …, J, with J = 12, 71, 86, and 4816 for the Snellen, ETDRS, VA threshold from qVA, and VA threshold and VA range from qVA, respectively. The expected residual entropy was computed as:

H(Y|X) = −(1/I) Σᵢ Σⱼ P(y_j|x_i) log₂(P(y_j|x_i)),

and, finally, the expected information gain can be obtained:

IG(Y|X) = H(Y) − H(Y|X).
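The three computations above can be sketched in a single function that takes the I × J matrix of score distributions (an illustrative sketch; names are ours):

```python
import numpy as np

def expected_information_gain(p_y_given_x):
    """IG(Y|X) = H(Y) - H(Y|X) for an I x J matrix whose row i is the
    score distribution P(y_j | x_i), assuming equally probable observers."""
    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    p_y = p_y_given_x.mean(axis=0)          # P(y_j) = (1/I) sum_i P(y_j|x_i)
    h_y = h(p_y)                            # total entropy
    h_y_given_x = np.mean([h(row) for row in p_y_given_x])  # residual entropy
    return h_y - h_y_given_x
```

As a sanity check, a perfectly reliable test that assigns each of 8 observers a unique score yields log₂(8) = 3 bits, while a test whose score distribution is identical for every observer yields 0 bits.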

Simulated observers
We conducted two simulations using the Pelli-Robson chart, CSV-1000 chart, and qCSF test (Fig. 4). In Simulation 1, we simulated 1911 observers from a uniform distribution of peak gain θ_PG, peak spatial frequency θ_PF, and bandwidth θ_BH, with log₁₀(θ_PG) ∈ [0.3, 2.3] sampled every 0.10 log₁₀ units, log₁₀(θ_PF) ∈ [−0.3, 0.9] sampled every 0.10 log₁₀ units, and log₁₀(θ_BH) ∈ [0.3, 0.6] sampled every 0.05 log₁₀ units. In Simulation 2, we simulated 1911 observers from the population distribution of CSF parameters derived from two existing qCSF datasets, one consisting of 112 eyes tested in three luminance conditions 38 and the other of 14 eyes tested with Bangerter foils 28, using a hierarchical Bayesian model 49,51. The original experiments were conducted at the Ohio State University. Written consent was obtained from all the participants before the experiments. The study protocol was approved by the institutional review board of human subject research of The Ohio State University and adhered to the tenets of the Declaration of Helsinki.

Letter and grating contrast sensitivity functions
The letter CSF, which specifies contrast sensitivity S_letter(f) for filtered letters of different sizes at center spatial frequency f, can be described with a log parabola function with three parameters:

log₁₀(S_letter(f|θ_CSF)) = log₁₀(θ_PG) − log₁₀(2) × ((log₁₀(f) − log₁₀(θ_PF)) / (θ_BH log₁₀(2)/2))²,

where θ_PG is the peak gain, θ_PF is the peak spatial frequency (cycles/degree), and θ_BH is the bandwidth (octaves) at half-height. For an observer with peak gain θ_PG, peak spatial frequency θ_PF, and bandwidth θ_BH, the probability of correct identification of a bandpass-filtered optotype with contrast c and center spatial frequency f is described with a Weibull psychometric function 34:

(16) P(correct|c, f) = g + (1 − g − λ)(1 − exp(−(c × S_letter(f|θ_CSF))^b)),

where g is the guessing rate, λ = 0.04 is the lapse rate, and b determines the steepness of the psychometric function. Because they both use a 10-alternative forced identification task, g = 0.1 and b = 4.05 for the Pelli-Robson chart and qCSF test.
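The observer model can be sketched as follows. This is an illustrative implementation under stated assumptions: the log-parabola constants and the threshold convention of the Weibull function are common parameterizations from the qCSF literature, and may differ in detail from the paper's exact equations.

```python
import math

def letter_cs(f, peak_gain, peak_freq, bandwidth):
    """Log-parabola letter CSF (a common parameterization, assumed here).
    peak_gain: linear sensitivity; peak_freq: c/d; bandwidth: octaves at
    half-height. Sensitivity halves at peak_freq * 2**(+/- bandwidth/2)."""
    log_s = math.log10(peak_gain) - math.log10(2) * (
        (math.log10(f) - math.log10(peak_freq))
        / (bandwidth * math.log10(2) / 2)
    ) ** 2
    return 10 ** log_s

def p_correct(c, f, theta, g=0.1, lam=0.04, b=4.05):
    """Weibull psychometric function for 10-AFC letter identification,
    sketched after Eq. (16); the exact form is an assumption."""
    s = letter_cs(f, *theta)
    return g + (1 - g - lam) * (1 - math.exp(-((c * s) ** b)))
```

By construction, `letter_cs` returns the peak gain at the peak frequency and half that value one half-bandwidth (in octaves) away, and `p_correct` rises from the guessing rate g toward 1 − λ as contrast increases.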
The grating CSF, which specifies contrast sensitivity S_grating(f) for gratings of the same size at spatial frequency f, needs to be corrected for the increased number of cycles with increasing spatial frequency 54 for the grating stimuli used in the CSV-1000. For the yes/no task in the first column of the CSV-1000 test, we used a high-threshold model: the simulated observer responds "yes" if the stimulus contrast exceeds the threshold 1/S_grating(f|θ_CSF), and "no" otherwise. For the two-alternative forced choice task in the CSV-1000, we replaced S_letter(f|θ_CSF) with S_grating(f|θ_CSF) and set g = 0.5 and b = 3.06 in Eq. (16) to compute the probability of making a correct response.

Pelli-Robson chart
The Pelli-Robson chart (Fig. 4a) consists of 16 optotype triplets of the same size and log-linearly spaced contrast between 0.56 and 100% 33. At a viewing distance of 3 m, the center frequency of the optotypes is 3 c/d. The probability of correctly identifying each optotype in the chart is determined by Eq. (16). Starting from the top row, the test proceeds to the next triplet only if the observer correctly identifies at least two of the three optotypes in the current triplet. The CS of the observer is determined by the lowest contrast c_lowest at which they correctly identify at least two of the three letters in a triplet: S_letter(3 c/d) = −log₁₀(c_lowest), with 17 potential contrast sensitivity scores. For each simulated observer x_i, we repeated the test 1000 times to obtain the distribution of test scores P(y_j|x_i), where j = 1, …, 17.
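The Pelli-Robson procedure can be sketched as a short loop. This is illustrative only: `p_identify` is whatever per-letter model the caller supplies (e.g. the Weibull of Eq. (16)), and failing the very first triplet is collapsed onto the score 0.0 here as a simplification of the chart's 17 distinct scores.

```python
import numpy as np

rng = np.random.default_rng(1)
# 16 triplets, contrast log-linearly spaced from 100% down to 0.56%.
triplet_contrasts = np.logspace(0, np.log10(0.0056), 16)

def pelli_robson_score(p_identify):
    """One simulated run. p_identify(c) is the caller-supplied probability
    of identifying a single letter at contrast c."""
    score = 0.0  # simplification: failing the first triplet scores 0.0
    for c in triplet_contrasts:
        n_correct = (rng.random(3) < p_identify(c)).sum()
        if n_correct < 2:              # need at least 2 of 3 to continue
            break
        score = -np.log10(c)           # CS at the lowest passed contrast
    return round(float(score), 2)
```

A hypothetical perfect observer (`p_identify` always 1.0) passes every triplet and scores −log₁₀(0.0056) ≈ 2.25 log units.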

CSV-1000 chart
The CSV-1000 chart (Fig. 4b) consists of CS tests at four spatial frequencies: 3, 6, 12, and 18 cycles/degree. Each test consists of 17 stimuli arranged in nine columns, with a single high-contrast vertical sinewave grating in the first column and two test patches in each of the remaining eight columns, of which only one contains a vertical sinewave grating. The gratings are arranged with decreasing contrast from left to right, with contrast from −0.70 to −2.08, −0.91 to −2.29, −0.61 to −1.99, and −0.17 to −1.55 log₁₀ units in the four rows. Going through all four rows starting from the top, the observer is first required to perform a yes/no task on the first column in each row. If the observer cannot see the stimulus in the first column, the test stops for that row and the observer receives the lowest CS score for that spatial frequency. If the observer can see the stimulus in the first column, they proceed to identify the location of the patch that contains the grating in each column with a three-alternative forced choice response: top, bottom, or blank. We treat blank as an incorrect response. The lowest contrast at which the observer correctly identifies the location of the grating determines the CS in the test: S(f) = −log₁₀(c_lowest(f)). The result is a four-dimensional CS score sampled at four spatial frequencies. Because there are 10 potential CS scores at each spatial frequency, there are a total of 10⁴ potential CS functions. For each simulated observer x_i, we repeated the test 1000 times to obtain the distribution of test scores P(y_j|x_i), where j = 1, …, 10⁴.

qCSF test
The qCSF (Fig. 4c) is a Bayesian active learning contrast sensitivity test 20,34. Its stimulus space consists of 128 log-linearly spaced contrasts (from 0.002 to 1.0) and 19 log-linearly spaced spatial frequencies (from 1.19 to 30.95 c/d). Although a four-parameter truncated log parabola has been used in the qCSF test 20, we removed the truncation parameter in the simulations because we did not score the simulated observers at very low spatial frequencies. Starting with a weak prior distribution of peak gain, peak frequency, and bandwidth in a three-dimensional space that has 60 log-linearly spaced peak gains (from 1.05 to 1050), 40 log-linearly spaced peak frequencies (from 0.1 to 20 c/d), and 27 log-linearly spaced bandwidths (from 1 to 9 octaves), it uses an active learning procedure to test the observer with the optimal stimuli in each trial and computes the posterior distribution of the three CSF parameters in the aforementioned three-dimensional space. In each trial, three filtered optotypes with the same center spatial frequency but four, two, and one times the optimal contrast (capped at 0.9) are presented. The observer could be tested with 15, 25, or 50 trials. For each simulated observer x_i, we repeated the test 1000 times to obtain distributions of the unidimensional AULCSF P(y_AULCSF,j|x_i) and the six-dimensional CSF score P(y_CSF,j|x_i) at six spatial frequencies (1, 1.5, 3, 6, 12, and 18 c/d). Sampling the scores at 0.05 log₁₀ resolution, j = 1, …, 57 for y_AULCSF,j, and j = 1, …, 253,492 for y_CSF,j.

Visual acuity tests
The VA threshold and VA range distributions of the observers in the two simulations are shown in Fig. 5a,b. Figure 5c shows distributions of the test scores P(y_j|x_i) of one representative simulated observer x_i in the Snellen, ETDRS (3-mistake rule), and qVA tests, with results from the qVA test scored as VA threshold alone, and as both VA threshold and VA range. Figure 5d,e show the distributions of the test scores P(y_j) of the populations in Simulations 1 and 2, respectively. Because a uniform X distribution is used in Simulation 1, the corresponding P(y_j)'s from the tests are nearly uniform. On the other hand, the P(y_j)'s in Simulation 2 are more concentrated because the population is more concentrated. The total entropy H(Y), the expected residual entropy H(Y|X), the expected information gain IG(Y|X), the expected number of classes N(Y|X), the average number of optotypes tested, the expected information gain per optotype tested, and the ratio of expected information gain versus the log₂ of the number of optotypes tested for the three tests in the two simulations are listed in Table 1. The expected information gain IG(Y|X) is also plotted in Fig. 5f. As expected, H(Y) in Simulation 2 is less than the corresponding H(Y) in Simulation 1 for all the tests because of the more concentrated P(y_j) resulting from a narrower range of simulated observers. As a result, IG(Y|X) and N(Y|X) in Simulation 2 are less than those in Simulation 1.
To check the validity of the simulations, we also estimated the expected information gain IG(Y|X) of the Snellen and ETDRS tests from their reported test-retest variabilities (TRV = 1.96σ), with the assumption that the outcome distributions are normal and have the same TRV for observers with different acuities. For both tests, the outcome scores cover −0.3 to 1.0 logMAR, so L = 1.3 logMAR. For the Snellen chart, the typical TRV is 0.23 logMAR 35, with σ = TRV/1.96 = 0.117 logMAR, and the expected information gain is 1.4 bits. For the ETDRS chart, the typical TRV is 0.11 logMAR 35, with σ = TRV/1.96 = 0.056 logMAR, and the expected information gain is 2.5 bits. These estimated values are largely consistent with our results in Simulation 1.
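The TRV-based estimates can be reproduced with the normal-outcome formula IG = log₂(L/(4.13σ)) (a minimal sketch; the function name is ours):

```python
import math

def ig_from_trv(score_range, trv):
    """Expected information gain from test-retest variability, assuming
    normal outcome distributions with the same sigma = TRV/1.96 at every
    acuity: IG = log2(L / (4.13 * sigma))."""
    sigma = trv / 1.96
    return math.log2(score_range / (4.13 * sigma))

L = 1.3  # -0.3 to 1.0 logMAR
assert abs(ig_from_trv(L, 0.23) - 1.4) < 0.05   # Snellen
assert abs(ig_from_trv(L, 0.11) - 2.5) < 0.05   # ETDRS
```

The two calls reproduce the 1.4-bit and 2.5-bit estimates quoted in the text.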
For the qVA test, we also computed expected information gain from an existing dataset with 14 eyes tested in four Bangerter foil conditions 28. In this dataset, VA threshold is between −0.15 and 0.68 logMAR, and VA range is between 0.12 and 0.63 logMAR. We computed the total entropy, expected residual entropy, and expected information gain based on the posterior distributions from single tests rather than distributions of test scores from repeated tests. In the qVA test, the posterior distributions from single tests are broader than those derived from repeated tests until about 45 optotypes are tested, and converge to those from repeated tests afterwards 28. The expected information gain based on VA threshold alone is 1.6, 2.6, and 2.9 bits after qVA tests with 15, 45, and 90 optotypes, respectively, and the expected information gain based on VA threshold and range is 2.1, 3.2, and 3.6 bits after qVA tests with 15, 45, and 90 optotypes, respectively. These results are largely consistent with those from Simulation 2.
In both simulations, ETDRS generated more expected information gain than Snellen. Scored with VA threshold alone or with both VA threshold and VA range, qVA with 15 rows (or 45 optotypes) generated more expected information gain than ETDRS. In terms of expected information gain per optotype tested, the qVA test with 45 optotypes, scored with VA threshold alone or with both VA threshold and VA range, was more efficient than the Snellen chart, which was in turn more efficient than the ETDRS chart. The different efficiencies were caused by different test designs and the distributions of acuity behavior used in the study. Interestingly, the ratio of expected information gain versus the log₂ of the number of optotypes tested in qVA was essentially constant across test lengths and was greater than the corresponding scores in ETDRS and Snellen. The results suggest that the expected information gain increased linearly with the log₂ of the number of optotypes in the qVA test.

Contrast sensitivity tests
The peak gain, peak spatial frequency, and bandwidth distributions of the observers in the two simulations are shown in Fig. 6a,b. Figure 6c shows distributions of the test scores P(y_j|x_i) of one representative simulated observer x_i in the Pelli-Robson, CSV-1000, and qCSF tests, with results from the qCSF test scored as AULCSF, and as CSF at six spatial frequencies. Figure 6d,e show the distributions of the test scores P(y_j) of the populations X in Simulations 1 and 2, respectively. Because a uniform X distribution is used in Simulation 1, the corresponding P(y_j)'s from the tests are more uniform. On the other hand, the P(y_j)'s in Simulation 2 are more concentrated because the observers are sampled in a narrower range. The total entropy H(Y), the expected residual entropy H(Y|X), the expected information gain IG(Y|X), the expected number of classes N(Y|X), the average number of optotypes tested, the expected information gain per optotype tested, and the ratio of expected information gain versus the log₂ of the number of optotypes tested for the three tests in the two simulations are listed in Table 2. The expected information gain IG(Y|X) is also plotted in Fig. 6f. As expected, H(Y) in Simulation 2 is less than the corresponding H(Y) in Simulation 1 for all the tests because of the more concentrated P(y_j) resulting from a narrower range of simulated observers. As a result, IG(Y|X) and N(Y|X) in Simulation 2 are less than those in Simulation 1.
To check the validity of the simulations, we also estimated the expected information gain IG(Y|X) of the Pelli-Robson test from its reported test-retest variability (TRV = 1.96σ), with the assumption that the outcome distributions are normal and have the same TRV for all the observers. For this test, the outcome scores cover 0 to 2.25 log₁₀ contrast sensitivity, with a typical TRV between 0.15 and 0.20 log₁₀ units in the normal population 55. Therefore, L = 2.25 log₁₀ CS, σ = TRV/1.96 is between 0.077 and 0.10 log₁₀ CS, and the expected information gain is between 2.4 and 2.8 bits. These estimated values are largely consistent with the results in Simulation 1.
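The same normal-outcome approximation, IG = log₂(L/(4.13σ)) with σ = TRV/1.96, reproduces the Pelli-Robson range quoted above (an illustrative check):

```python
import math

# L = 2.25 log10 CS; TRV of 0.20 and 0.15 log10 units bracket the
# 2.4- and 2.8-bit estimates in the text.
L = 2.25
for trv, expected_bits in [(0.20, 2.4), (0.15, 2.8)]:
    ig = math.log2(L / (4.13 * trv / 1.96))
    assert abs(ig - expected_bits) < 0.1
```

Larger test-retest variability (a wider outcome distribution) directly shrinks the number of distinguishable CS classes and hence the information gained.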
For the qCSF test, we also computed expected information gain from two existing datasets, one with 112 subjects tested binocularly in three luminance conditions 38 and the other with 14 eyes tested monocularly in four Bangerter foil conditions 28. In these datasets, peak gain is between 0.92 and 2.27 log₁₀ CS, peak spatial frequency is between 0.21 and 3.9 c/d, and bandwidth is between 1.8 and 5.7 octaves. We computed the total entropy, expected residual entropy, and expected information gain based on the posterior distributions from single tests rather than distributions of test scores from repeated tests. In the qCSF test, the posterior distributions from single tests are broader than those derived from repeated tests until about 25 trials are tested, and converge to those from repeated tests afterwards 34. The expected information gain based on AULCSF is 2.0, 2.4, and 2.9 bits after qCSF tests with 15, 25, and 50 trials, respectively, and the expected information gain based on CSF at six spatial frequencies is 4.1, 4.6, and 5.3 bits after qCSF tests with 15, 25, and 50 trials, respectively. Again, the results from the datasets are largely consistent with those from Simulation 2.
In both simulations, CSV-1000 generated more expected information gain than the Pelli-Robson test. Scored with AULCSF or with CSF at six spatial frequencies, qCSF with 25 trials generated more expected information gain than CSV-1000. In terms of expected information gain per optotype tested, the CSV-1000 test was the most efficient in Simulation 1, and qCSF with 15 trials was the most efficient in Simulation 2. The variations in efficiency were caused by different test designs and the distributions of the populations used in the study. Interestingly, the ratio of expected information gain versus the log₂ of the number of optotypes tested in qCSF was essentially constant across test lengths and was greater than the corresponding scores for Pelli-Robson in Simulation 1 and for both Pelli-Robson and CSV-1000 in Simulation 2. The results suggest that the expected information gain increased linearly with the log₂ of the number of optotypes in the qCSF test.

Discussion
In this study, we introduced a concept from information theory, called expected information gain (or mutual information), to quantify the amount of new knowledge that can be obtained from measurements in a population. This concept allows us to compare measurements with different dimensionalities and assess the potential advantages of new measurements that generate distinct or higher dimensional data compared to the current gold standard. Importantly, it focuses on the actual knowledge gained through the measurements, surpassing the mere quantification of measurement uncertainties. We demonstrated two equivalent approaches for computing expected information gain: (1) Reduction of uncertainty: this approach gauges the reduction in uncertainty regarding the "truth" (the property to be measured) after the measurement. (2) Difference in uncertainty: this approach calculates the difference between the uncertainty associated with all possible measurement outcomes in a population and the expected residual uncertainty after the measurement.
In both approaches, the key idea is that expected information gain quantifies the increase in knowledge achieved through measurement.This knowledge gain is greater when there is more uncertainty initially and/or

Appendix 1: Proof of equivalence
It is known that information gain is symmetric, that is, IG(Y|X) = IG(X|Y). We can prove this by substituting P(x|y) with its Bayesian computation in Eq. (2):

IG(X|Y) = H(X) − H(X|Y) = ∫∫ P(x, y) log₂(P(x|y)/P(x)) dx dy = ∫∫ P(x, y) log₂(P(x, y)/(P(x)P(y))) dx dy,

which is symmetric in X and Y, so the same expression also equals H(Y) − H(Y|X) = IG(Y|X).

Appendix 2: Multi-optotype acuity behavioral psychometric functions
From the log(d′) acuity behavioral psychometric function for a single optotype in Eq. (8) (Fig. 7a), we can derive the percent correct psychometric function for a single optotype for the N-alternative forced-choice optotype identification (N-AFC) task based on signal detection theory (Fig. 7b), with N = 10 in the Snellen, ETDRS, and qVA tests 21.
We then take into account the chart design (i.e., optotype sampling with or without replacement) to derive the probability of correctly identifying different numbers of optotypes in each row of a test. The multi-optotype psychometric functions are shown for the qVA (Fig. 7c) and ETDRS (Fig. 7d) tests. Specifically, the three optotypes in each test row are sampled from 10 Sloan letters with replacement in the qVA test, while the five optotypes in each test row are sampled from the 10 Sloan letters without replacement in the ETDRS test.
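For with-replacement sampling, if responses to the optotypes in a row are assumed independent with a common single-optotype probability correct p(s), the number identified correctly in a row follows a binomial distribution. A sketch under that assumption (the helper name is illustrative; without-replacement sampling, as in ETDRS rows, would require a different computation):

```python
from math import comb

def row_correct_probs(p_single, n_optotypes=3):
    """Probability of correctly identifying k = 0..n optotypes in a row,
    assuming independent, identically distributed responses (optotypes
    sampled with replacement), i.e. a binomial distribution."""
    return [comb(n_optotypes, k)
            * p_single**k * (1 - p_single)**(n_optotypes - k)
            for k in range(n_optotypes + 1)]

# e.g. a single-optotype percent correct of 0.9 at some optotype size,
# for a 3-optotype qVA-style row
probs = row_correct_probs(0.9, n_optotypes=3)
print(probs)  # [P(0 correct), P(1), P(2), P(3)]; sums to 1
```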

Figure 1. (a) Probability distribution of a quantity x occurring in population X, P(x). (b) Probability distribution of obtaining measurement y given that x is the true value being measured, P(y|x). (c) Probability distribution of obtaining a measurement y from all potential measurement outcomes Y regardless of the underlying true value, P(y). (d) Posterior distribution of X, P(x|y), following a measurement outcome y. (e) Expected information gain IG(X|Y) is the difference between H(X) and the expected posterior entropy H(X|Y). Expected information gain IG(Y|X) is the difference between H(Y) and the expected residual entropy H(Y|X). IG(X|Y) = IG(Y|X).

Figure 2. An illustration of expected information gain from measurement of the length of pencils using a ruler with a given unit. (a) A uniformly distributed pencil length X, P(x). (b) Outcome distribution from measuring the length of a pencil with true length x0, P(y|x0). (c) A uniform distribution of outcomes Y, P(y). (d) Posterior distribution of x based on measurement outcome y0, P(x|y0).

For each simulated observer, the discriminability (d′) for an optotype of size s is described by the VA behavioral function (Patent No. US 10758120B2) 27, where θ_VA = (θ_VA_Threshold, θ_VA_Range), θ_VA_Threshold is the VA threshold, corresponding to the optotype size at d′ = 2, θ_VA_Range is the VA range of the behavioral function, that is, the range of optotype sizes that covers performance levels from d′ = 1 to d′ = 4, and ω = log10(35) − log10(1.25).

Figure 3. (a) A Snellen chart (image courtesy of Precision Vision, Inc.). (b) An ETDRS chart (image courtesy of Precision Vision, Inc.). (c) A subset of potential stimuli in qVA.

Figure 5. Distributions and information gain in visual acuity tests. (a,b) Distributions of VA threshold and VA range of the simulated observers in Simulations 1 and 2. (c) From left to right: distributions of the test scores P(y_j|x_i) of one representative simulated observer x_i in the Snellen, ETDRS (3-mistake rule), and qVA (45 optotypes) tests, with results from the qVA test scored as VA threshold alone, and as both VA threshold and VA range. (d,e) Distributions of the test scores P(y_j) of the populations in Simulations 1 and 2, respectively. (f) Expected information gain IG(Y|X) of the various tests in Simulation 1 (left) and Simulation 2 (right). For ETDRS, results from the 3-, 4-, 5-mistake and whole-chart rules are shown. For qVA, results from testing with 15, 45, and 90 optotypes are shown. The blue bars represent IG(Y|X); the gray bars represent residual entropy H(Y|X).

Figure 6. Distributions and information gain in contrast sensitivity tests. (a,b) Distributions of peak gain, peak spatial frequency, and bandwidth of the observers in Simulations 1 and 2. (c) From left to right: distributions of the test scores P(y_j|x_i) of one representative simulated observer x_i in the Pelli-Robson, CSV-1000, and qCSF (25 trials) tests, with results from the qCSF test scored as AULCSF, and as CSF at six spatial frequencies. (d,e) Distributions of the test scores P(y_j) of the populations in Simulations 1 and 2, respectively. (f) Expected information gain IG(Y|X) of the various tests in Simulation 1 (left) and Simulation 2 (right). For qCSF, results from testing with 15, 25, and 50 trials are shown. The blue bars represent IG(Y|X); the gray bars represent residual entropy H(Y|X).

Figure 7. Multi-optotype acuity behavioral psychometric functions. (a) The d′ psychometric function for a single optotype for an observer with VA threshold = 0.2 logMAR and VA range = 0.25 logMAR. (b) The percent correct psychometric function for a single optotype in a 10-AFC task for the observer in (a). (c) The multi-optotype psychometric functions of the observer in the qVA test, i.e., the probabilities of correctly identifying 0, 1, 2, or 3 of the three optotypes in a test row as functions of the optotype size of the row. (d) The multi-optotype psychometric functions of the observer in the ETDRS test, i.e., the probabilities of correctly identifying 0, 1, 2, 3, 4, or 5 of the five optotypes presented in a test row as functions of the optotype size of the row.

Table 1. Entropy, information gain, and number of classes from visual acuity tests.

Table 2. Entropy, information gain, and number of categories from contrast sensitivity tests.