Rapid online assessment of reading ability

Yeatman, Jason D.; Tang, Kenny An; Donnelly, Patrick M.; Yablonski, Maya; Ramamurthy, Mahalakshmi; Karipidis, Iliana I.; Caffarra, Sendy; Takada, Megumi E.; Kanopka, Klint; Ben-Shachar, Michal; Domingue, Benjamin W.

doi:10.1038/s41598-021-85907-x

Download PDF

Article
Open access
Published: 18 March 2021

Rapid online assessment of reading ability

Jason D. Yeatman^1,2^na1,
Kenny An Tang^1,2^na1,
Patrick M. Donnelly^3,4,
Maya Yablonski^1,2,5,
Mahalakshmi Ramamurthy^1,2,
Iliana I. Karipidis⁶,
Sendy Caffarra^1,2,7,
Megumi E. Takada^1,2,
Klint Kanopka²,
Michal Ben-Shachar^5,8 &
…
Benjamin W. Domingue²

Scientific Reports volume 11, Article number: 6396 (2021) Cite this article

8748 Accesses
13 Citations
30 Altmetric
Metrics details

Subjects

Abstract

An accurate model of the factors that contribute to individual differences in reading ability depends on data collection in large, diverse and representative samples of research participants. However, that is rarely feasible due to the constraints imposed by standardized measures of reading ability which require test administration by trained clinicians or researchers. Here we explore whether a simple, two-alternative forced choice, time limited lexical decision task (LDT), self-delivered through the web-browser, can serve as an accurate and reliable measure of reading ability. We found that performance on the LDT is highly correlated with scores on standardized measures of reading ability such as the Woodcock-Johnson Letter Word Identification test (r = 0.91, disattenuated r = 0.94). Importantly, the LDT reading ability measure is highly reliable (r = 0.97). After optimizing the list of words and pseudowords based on item response theory, we found that a short experiment with 76 trials (2–3 min) provides a reliable (r = 0.95) measure of reading ability. Thus, the self-administered, Rapid Online Assessment of Reading ability (ROAR) developed here overcomes the constraints of resource-intensive, in-person reading assessment, and provides an efficient and automated tool for effective online research into the mechanisms of reading (dis)ability.

On the relations between letter, word, and sentence-level processing during reading

Article Open access 22 October 2022

The metrics of reading speed: understanding developmental dyslexia

Article Open access 19 February 2024

Rapid online assessment of reading and phonological awareness (ROAR-PA)

Article Open access 04 May 2024

Why is learning to read almost effortless for some children and a continuous struggle for others? Understanding the mechanisms that underlie individual differences in reading abilities, and contribute to reading disabilities, is an important scientific challenge with far reaching practical implications. There is near consensus on the importance of foundational skills like phonological awareness for reading development^1,2,3. However, there are a myriad of other sensory, cognitive and linguistic abilities whose link to reading is fiercely debated^{4,5,6,7,8,9,10,11}. For example, hundreds of papers, published over the course of four decades, have debated the causal link between deficits in the magnocellular visual pathway and developmental dyslexia^{6,12,13,14,15,16,17,18}. Similarly contentious is the link between rapid auditory processing and developmental dyslexia^{19,20,21,22,23}. More recently, the role of various general learning mechanisms (e.g., statistical learning, sensory adaptation) in reading development has become a topic of great interest^{24,25,26,27,28}.

One reason these debates persist is that most findings reflect data from relatively small samples collected in a single laboratory and, therefore, findings are not necessarily representative of the population^29,30,31. There are two major factors that limit the feasibility of collecting data in large, diverse, representative samples. First, experiments measuring specific sensory and cognitive processes typically depend on software installed on computers in a laboratory. Second, standardized measures of reading ability typically require a trained test administrator to administer the test to each research participant. These two factors: (a) make it time consuming and costly to recruit and test large samples, (b) prohibit including research subjects that are more than a short drive from a university, (c) bias samples towards university communities and (d) create major barriers for the inclusion of under-represented groups.

Over the past decade, advances in software packages for running experiments through the web-browser have catalyzed many areas of psychological research to pursue larger, more diverse and representative samples. For example, software packages such as jsPsych^32,33, Psychopy³⁴, Gorilla³⁵ and others, make it feasible to present stimuli and record behavioral responses in the web-browser with temporal precision that rivals many popular software platforms for collecting data in the laboratory³⁶. However, online experiments have yet to make major headway in the developmental and educational spheres, specifically in reading and dyslexia research. Standardized testing of reading ability still depends on individually administered tests. Thus, even though a researcher might be able to accurately measure processing speed or visual motion perception in thousands of subjects through the web-browser, they would still need to individually administer standardized reading assessments to each participant.

Our goal here was to develop an accurate, reliable, expedient and automated measure of single word reading ability that could be delivered through a web-browser. Specifically, we sought to design a web experiment to approximate scores on the Woodcock-Johnson Letter Word Identification (WJ-Word-ID) test³⁷. The WJ-Word-ID is one of the most widely used standardized measures of reading ability in research (and practice). Like other test batteries that contain tests of single word reading ability (e.g., The NIH Toolbox³⁸, Wide Range Achievement Test³⁹, Wechsler Individual Achievement Test⁴⁰), the WJ-Word-ID requires that an experimenter or clinician present each research participant with single printed words for them to read out loud. Participant responses are manually scored by the test administrator as correct or incorrect based on accepted rules of pronunciation. The words are organized in increasing difficulty such that the number of words that a participant correctly reads is also a measure of the difficulty of words that the participant can read. Due to excellent psychometric properties, and widely accepted validity, the Woodcock-Johnson and a compendium of similar tests are at the foundation of thousands of scientific studies of reading development. Unfortunately, tests like the Woodcock-Johnson are not yet amenable to automated administration through the web-browser because speech recognition algorithms are still imperfect, particularly for children pronouncing decontextualized words and pseudowords.

In considering a suitable task for a self-administered, browser-based measure of reading ability, the lexical decision task (LDT) is a good candidate for practical reasons. Unlike naming or reading aloud, the LDT can be: (a) scored automatically without reliance on (still imperfect) speech recognition algorithms, (b) completed in group (or public) settings such as a classroom and (c) administered quickly as each response only takes, at most, 1–2 s. The LDT has a rich history in the cognitive science literature as means to probe the cognitive processes underlying visual word recognition. It is broadly assumed that some of the same underlying cognitive processes are at play when participants make a decision during a two-alternative forced choice (2AFC) lexical decision as during other word recognition tasks (e.g. naming^41,42). However, there are also major differences between the processes at play during a lexical decision versus natural reading^43,44. For example reading aloud requires mapping orthography to a phonetic and motor outputs with many possible pronunciations (as opposed to just two alternatives in an LDT). Moreover, selection of the correct phoneme sequence to pronounce a word, or even the direct mapping from orthography to meaning (as is posited in some models⁴⁵), certainly does not require an explicit judgement of lexicality. Thus, from a theoretical standpoint there is reason to be optimistic that carefully selected LDT stimuli will tap into the key mechanisms that are measured by other, validated measures of reading ability (e.g., WJ-Word-ID). However, concerns about face validity would need to be dispelled through extensive validation against standardized measures since there are also clear differences between lexical decisions and reading aloud.

There is some empirical support for LDT performance being linked to reading ability. For example, Martens and de Jong demonstrated that response times (RT) on the LDT differed between children with dyslexia compared to children with typical reading skills⁴⁶. Specifically, they found larger word-length and lexicality effects in children with dyslexia compared to children with typical reading skills. These effects seem to track reading ability, as opposed to being a specific marker of dyslexia: reading matched controls; i.e., younger children matched to the children with dyslexia in terms of raw reading scores showed similar word-length and lexicality effects as found in the older children with dyslexia. Similar to our goal here, Katz and colleagues examined the LDT in combination with a Naming Task, as predictors of standardized reading measures⁴⁷. They found that average response times on the LDT (when combined with the Naming Task) correlated with a variety of reading ability composite scores, including the Woodcock Johnson Basic Reading Skills⁴⁷. However, their study primarily looked at young adults (median age of 21.5 years), and focused on RT as the independent variable, since most participants were at ceiling in terms of accuracy. It is not clear how well these effects would generalize to children learning to read. Moreover, effect sizes were relatively small, suggesting that RT on the LDT may not be the best predictor of reading ability, or that the range of stimuli was not broad enough. More recently, Dirix, Brysbaert and Duyck (2019) examined the relationship between LDT RT and eye tracking measures during natural reading and came to the conclusion that RT varies substantially across contexts and is not a stable measure of word recognition⁴⁴. Taken together, there is some evidence that (a) LDT performance is related to reading ability but (b) RT does not have suitable psychometric properties to construct a standardized measure. However, to our knowledge, no previous study has examined LDT response accuracy as a measure of word recognition ability in the context of development.

Here we sought to develop an implementation of the LDT in the web-browser, along with a suitable list of words and pseudowords, to efficiently and accurately measure reading ability across a broad age range, spanning first grade through adulthood. Specifically, we explore whether correct/incorrect responses to words of varying difficulty levels can serve as an accurate and reliable measure of reading ability. We, first, report results from a study in which we developed, validated and optimized a browser-based LDT measure of reading ability (Study 1). In Study 1 we employed a long list of words and pseudowords, spanning a broad range of lexical and orthographic properties. The goals of Study 1 were to, first, assess the feasibility of using a browser-based LDT to estimate reading ability and, second, leverage the item level data in combination with item response theory (IRT) to construct a collection of short-form tests with optimal psychometric properties. We then report a second study (Study 2) validating the optimized measure from Study 1 to ensure that scores are reliable and consistent with standardized in-person measures: (a) for young children (6–7 year-olds) and (b) across a broad sample spanning early childhood through adulthood. We release the Rapid Online Assessment of Reading (ROAR), and the associated data and code, for other researchers to use (https://github.com/yeatmanlab/ROAR-LDT-Public).

Results: study 1

Lexical decision task performance and reading ability

To examine whether response times (RT) recorded in the online LDT replicated classic findings in the literature we fit linear mixed effects models to log transformed RT data for all correct trials. Each model included random intercepts for each word, random intercepts for each subject and slopes for each within-subject effect. We examined six different models containing fixed effects of (a) lexical status (real word vs. pseudoword), (b) log transformed lexical frequency (real words only), (c) log transformed bigram frequency, (d) word length, (e) the interaction effect of word length and participant reading ability as measured with the WJ-Word-ID and (f) lexical status, bigram frequency, word length and their interactions. As expected, we found: (a) RT was longer for pseudowords compared to real words (β = 0.11, SE = 0.0098, t = − 11.35) (similar to Ref.⁴⁸); (b) responses to real words were faster with increasing lexical frequency (β = − 0.021, SE = 0.0022, t = − 9.84) (similar to Refs.^48,49,50); (c) responses to pseudowords were slower with increasing bigram frequency (β = 0.035, SE = 0.0062, t = 5.65) (similar to Refs.^48,51,52) and responses to real word were not affected by bigram frequency (β = − 0.0033, SE = 0.0079, t = − 0.43); (d) responses to words and pseudowords were slower with increasing stimulus length (β = 0.031, SE = 0.0046, 3% increase per letter, t = 6.6) (similar to Ref.⁴⁸); and (e) there was a significant length by reading ability interaction (β = − 0.010, SE = 0.0036, t = 2.76) indicating that the effect of word length on reading speed diminishes as reading ability improves (similar to Refs.^46,48). When lexical status, bigram frequency, word length and their interactions were entered into a single model (f), all the main effects were significant; the only significant interaction was between lexical status and bigram frequency indicating that bigram frequency influenced RT for pseudo but not real words (β = − 0.032, SE = 0.0057, t = − 5.87).

Stimuli for the LDT spanned a broad range of lexical and orthographic properties (Supplementary Fig. 1), with the goal of providing a sufficiently large difficulty range for detecting associations between accuracy on the LDT and reading ability measured with the WJ-Word-ID. Indeed, overall accuracy, calculated as the proportion of correct responses on the 500 LDT trials, was highly correlated with WJ-Word-ID (Fig. 1 left panel, r = 0.91, p < 0.00001, disattenuated r = 0.94). Accuracy for pseudowords (250 trials) had a slightly higher correlation with WJ-Word-ID (r = 0.86, p < 0.00001) than did accuracy for real words (r = 0.74, p < 0.00001), and this difference in correlation approached significance based on William’s test (p = 0.08; Fig. 1 right panel).

In contrast, median RT across correct responses in the LDT was not correlated with WJ-Word-ID (r = − 0.06, p = 0.69). No correlations were found when median RT was calculated separately for words and pseudowords (r = − 0.13, r = − 0.08, respectively), even though RTs followed the expected patterns in other analyses (e.g., lexicality effects, see preceding paragraph). However, when the RT analysis was limited to the subset of the subjects who performed above 70% correct (N = 28 with WJ-Word-ID scores), a significant negative correlation was found between RT and WJ-Word-ID (r = − 0.53, p < 0.005). The correlation between WJ-Word-ID and median RT for real words (r = − 0.53, p < 0.005) was similar to the correlation with median RT for pseudowords (r = − 0.52, p < 0.005), and this difference was not significant according to William’s test (p = 0.79). Thus, response accuracy is a much stronger predictor of reading ability than RT (William’s test t = − 5.72, p < 0.00005), and RT is only informative in relatively high performing subjects.

The impact of participant characteristics on the LDT-WJ correlation was examined through mediation and moderation analyses. The relation between LDT accuracy and WJ-Word-ID raw scores was not moderated by WJ-Word-ID standard scores, age, or number of months since testing (t = 0.71, p = 0.48; t = 0.12, p = 0.91; t = 1.24, p = 0.22). These results were further confirmed by a mediation analysis (CI [− 0.001, 0.00], p = 0.69; CI [− 0.001, 0.00], p = 0.87; CI [− 0.0002, 0.00], p = 0.49). Together, these analyses suggest that LDT accuracy is a good approximation of standard reading measures for children with reading profiles ranging from severely impaired to exceptional, and ages between 6 and 18 years. It also suggests that LDT accurately predicts reading ability measured within the past 12 months.

Performance across different measures of reading ability is typically highly correlated. Indeed, different tests are frequently used interchangeably in different studies. For example, WJ-Word-ID scores and TOWRE-SWE scores are typically highly correlated (r = 0.93 in our sample), even though the former is an untimed test while the latter is a timed test that combines real word reading accuracy and speed. In fact, it is common to use a threshold on either test to group subjects into dyslexic versus control groups in many research studies^21,53,54,55. The average Pearson correlation between the timed (TOWRE), untimed (WJ), real word (Word-ID and SWE) and pseudoword (Word Attack and PDE) standardized reading measures was r = 0.83 (range: r = 0.72 to r = 0.93; Fig. 1 right panel). Given the correlation between LDT accuracy and WJ-Word-ID, it is not surprising that LDT accuracy was also significantly correlated with all the standardized reading measures (Fig. 1 right panel). The correlations between LDT accuracy and Vocab (r = 0.64) and Matrix Reasoning (r = 0.56) were also nearly identical to the correlations between other reading measures and verbal abilities (r = 0.61 to 0.73) and reasoning abilities (r = 0.34–0.60, Fig. 1 right panel). Thus, LDT response accuracy seems a suitable measure of reading ability as it behaves in a similar manner to the other established, standardized reading measures.

Item response theory analysis of responses to individual words and pseudowords

Next we turn to the challenge of constructing a LDT with suitable psychometric properties for applicability in a research setting. The word list used in Study 1 was constructed to broadly sample different types of words and pseudowords. In doing so, our goal was to use item response theory (IRT) to select the optimal subset of stimuli that measure reading ability reliably and efficiently across the full continuum of reading abilities. Our approach was to remove items that were: (a) not correlated with overall test performance and WJ-Word-ID performance, (b) not well fit by the Rasch model and (c) examine item disrcimination based on the two parameter logistic (2PL) model. We then constructed an optimized set of three short stimulus lists, matched in terms of difficulty.

Step 1: Remove words that are not predictive of overall performance

As a first step, we computed the correlations between item responses, LDT accuracy calculated on the full 500 item test, and WJ-Word-ID scores. Performance on some stimuli, like the pseudoword “insows”, was highly correlated with overall test performance (r = 0.61, p < 0.00001) and WJ-Word-ID score (r = 0.60, p = 0.000035). This means that a participant’s response to that pseudoword was highly informative of their reading ability. Other stimuli, like the pseudoword “timelly”, and the real words “napery” and “kind”, provided no information about overall test performance, as indicated by correlations near (or below) zero. We removed 71 items that did not surpass a lenient threshold of r = 0.l0. Figure 2 shows words that were rejected (red) and retained (blue) based on this criterion.

Step 2: Remove words that are not well fit by the Rasch model

The Rasch model (1 parameter logistic with a guess rate fixed at 0.5)⁵⁶ was fit to the response data for the 429 items that remained after Step 1 for all 118 subjects, using the MIRT package in R⁵⁷. Items were removed if the infit or outfit statistics were outside the range of 0.6–1.4, a relatively lenient criterion intended to remove items with unpredictable responses⁵⁸. The model was iteratively refitted until all items met this criterion. The process resulted in the removal of 129 items with response data that did not fit well with the predictions of the Rasch model (Fig. 2 right panel, red items).

Step 3: Examining item discrimination with the two parameter logistic model

Next we fit the two parameter logistic (2PL) model (guess rate fixed at 0.5) to the response data, for the 300 items that were retained after Step 2. The 2PL model fits item response functions with a difficulty parameter (b) as well as a slope parameter (a); we sought to remove any remaining items that were inefficient at discriminating different levels of reading ability based on the slope of the item response function. The right panel of Fig. 2 shows the item difficulty and discrimination parameters from the 2PL model. We removed 1 item with a low slope (a < 0.7). Of the remaining 299 items, 114 were real words and 185 were pseudowords.

To better understand the lexical and orthographic properties of the items that were selected for the final test, we compared the retained- versus rejected-items based on word length and log frequency of their orthographic neighbors (see Supplementary Fig. 2). We found a significant effect for word length (Kruskall–Wallis H(3) = 26.6, P < 0.0001): Retained real words were significantly longer than rejected real words (p < 0.0001) and retained pseudowords (p = 0.050). We expected that the log frequency of the orthographic neighbors would align with the word length effect reported above (i.e., the final set of retained words would have less frequent orthographic neighbors compared to the rejected words). Our analysis confirmed this expectation (n = 359, Kruskall–Wallis H(3) = 14.5, p = 0.0023). Indeed, pairwise comparisons confirmed that Retained real words had significantly less frequent orthographic neighbors compared to the rejected real words (p < 0.0004). Comparisons between retained and rejected items for bigram frequency, trigram frequency and number of morphologically complex words did not lead to any significant difference (all ps < 0.05). Therefore, real words that are longer and have less frequent orthographic neighbors are retained as a result of the IRT analysis indicating their utility for measuring reading ability.

Step 4: Test optimization

Finally, we sought to create an efficient and reliable test that was composed of three short word-lists with the following properties: (a) quick to administer through the web-browser, (b) balanced in number of real words and pseudowords, (c) matched in terms of difficulty and (d) optimally informative across the range of reading abilities. Based on feedback from subjects in Study 1, we surmised that a list of about 80 items was a suitable length for maintaining focus. Thus, out of the 299 items that remained after Steps 1–3, we constructed three lists with items that were equated in difficulty based on b parameters (estimated with the Rasch model). We further ensured that: (a) items spanned the full difficulty range to maximize test information for the lowest and highest performing participants and (b) real and pseudowords were matched in terms of length on each list. Since only 114 real words were retained after the optimization procedure described above, 76-item lists were generated with 38 pseudowords and 38 real words.

Figure 3 shows the test information function for each of the three lists. The composite of the three word lists (234 item; scores combined across lists) was extremely reliable with a reliability of 0.97. The composite LDT has a reliability comparable to those of the WJ and TOWRE composite indices: WJ Basic Reading Skills, average reliability = 0.95³⁷, and TOWRE Total Word Reading Efficiency, average reliability = 0.96⁵⁹. Performance on the individual word lists was also highly reliable, with an ICC of 0.95 (Fig. 3 upper right; about the same or higher than the individual TOWRE and WJ test forms: WJ Word Attack reliability = 0.90 and WJ Letter Word Identification reliability = 0.94³⁷; TOWRE Sight Word Efficiency reliability = 0.91 and TOWRE Phonemic Decoding Efficiency reliability = 0.92⁵⁹). Performance on each individual word list was highly correlated with WJ-Word-ID scores (Fig. 3 bottom panel; List A: r = 0.88; List B: r = 0.85; List C: r = 0.87) suggesting that administering a single 80-item lexical decision task (~ 2–3 min) would be a suitable measure of reading ability for some research purposes.

Results: study 2

Validation of optimized lexical decision task in young children

We created a new version of the browser-based LDT (version 2) based on the optimized word lists from Study 1. To improve precision in young children, twelve new words were selected from grade level text (1st and 2nd grade) and pseudoword pairs were created for each new real word (see “Methods” and Supplementary Table 1 for details). 24 children between the ages of 5y11m and 7y10m (K—2nd grade) were recruited to test whether the browser-based LDT was an accurate measure of reading ability in young children. Each child completed version 2 of the LDT and the WJ-Word-ID was administered to each child over Zoom. All children were able to follow the instructions (which are narrated by a voice actor) and complete the task in their web-browser without assistance from a parent or researcher. Performance on the LDT was highly correlated with WJ-Word-ID in this young sample (r = 0.96, Fig. 4).

Validation of optimized lexical decision task in children and adults

After confirming that young children could complete the browser-based LDT, we collected data on a broad age range (N = 116; 5y11m–42y7m) and administered the WJ-Word-ID to 84 of these subjects. As expected based on Study 1, the optimized LDT was highly correlated with WJ-Word-ID scores (r = 0.90). Moreover, the measure was highly reliable with an intraclass correlation⁶⁰ of 0.91 for individual lists and 0.97 for the composite of the three lists.

Examination of stimulus list and block order effects

The version 2 stimulus lists were designed to be used interchangeably. Figure 4 shows performance on the three stimulus lists (N = 118); a mixed effects model confirmed that accuracy (i.e., number correct) did not differ significantly across the three lists (List B vs. List A: β = − 0.33, SE = 0.52, t = − 0.63, p = 0.53; List C vs. List A: β = 0.40, SE = 0.52, t = 0.77, p = 0.44). Moreover, the probability of a correct response did not change over the 252 trials (β = − 0.013, SE = 0.02, z = − 0.64, p = 0.52) and proportion correct did not differ significantly between the first, second or third block of trials (Block 2 vs. Block 1: β = − 0.76, SE = 0.52, t = − 1.47, p = 0.14; Block 3 vs. Block 1: β = − 0.39, SE = 0.52, t = − 0.75, p = 0.45).

Discussion

Our primary goal was to evaluate the suitability of a browser-based lexical decision task as a measure of reading ability. Lexical decision is commonly used to interrogate the mechanisms of word recognition, and previous studies have shown: (a) differences in task performance in dyslexia, (b) changes in task performance over development and (c) relationships to various measures of reading ability. Thus, lexical decision was a strong candidate for an automated measure of reading ability. Results from Study 1 revealed that lexical-decision accuracy was a very accurate predictor of reading ability as measured by the WJ-Word-ID (disattenuated r = 0.94). Moreover, reliability analysis confirmed that the lexical decision task has excellent psychometric properties. Thus, by selecting the optimal list of words to span the continuum of reading ability, it might be possible to design a quick, valid and reliable measure of reading ability that could be administered to readers of all ages through the web-browser. To this end, we used IRT to select individual words with suitable measurement properties and created three short versions of the LDT that were matched in terms of difficulty. We validated this optimized version of the LDT (version 2) in a second study with an independent sample (Study 2) and confirmed the reliability and validity of this measure in children as young as 6 years of age. In the future, leveraging the IRT analysis to implement the LDT as a computer adaptive test would likely lead to an even more efficient and accurate measure of reading ability.

Through the process of optimizing the items to be included in the version 2 short-form test, we made four interesting observations that are worth noting (and all the data are publicly available for other researchers who may be interested in further item-level analyses). First, accuracy is a much better predictor of reading ability than RT, at least for children learning to read in English. Previous studies using RT have found statistically significant, but relatively weak relationships between RT and reading ability⁴⁷, a finding we replicate here (Results, page 3). RT is likely a useful measure for participants at the high end of the reading continuum and for children learning to read in transparent orthographies. Indeed, it is well established that accuracy quickly reaches a ceiling in transparent orthographies and individual differences are more pronounced in timed measures^61,62. Moreover, with a larger sample, RT might be useful for making more fine-grained discriminations of automaticity among skilled readers. Second, pseudowords are better at discriminating different levels of reading ability than are real words (as indicated by higher correlation with reading ability and more pseudowords being retained in the IRT analysis). This might reflect greater between-subject variability in performance on pseudowords compared to real words (Supplementary Fig. 2). But it is important to keep in mind that properties of the real words included in a LDT influence responses to pseudowords and vice versa^52,63,64. Third, we investigated the orthographic properties of the items that were selected by the IRT analysis. Retained real words were longer and had lower frequency of orthographic neighbors than removed real words, an effect that was not observed for pseudowords (see Supplementary Fig. 1). Finally, including items that span a wide range of difficulty (Figs. 2, 3) is critical to a lexical-decision based measure of reading ability.

In summary, we have demonstrated that a browser-based lexical decision task is an efficient, accurate, and reliable measure of reading ability, producing metrics that are highly consistent with results of standardized, in-person reading assessment. Selection of stimuli spanning a broad difficulty range is important for the psychometric properties of the LDT. We make the Rapid Online Assessment of Reading (ROAR) openly available for other researchers studying reading development (https://github.com/yeatmanlab/ROAR-LDT-Public).

Methods

Browser-based lexical decision task (version 1)

Data and analysis code are available at: https://github.com/yeatmanlab/ROAR-LDT-Public. A visual-presentation 2AFC lexical decision task (LDT) was implemented in the Psychopy experiment builder³⁴, converted to Javascript, and uploaded to Pavlovia, an online experiment platform⁶⁵.

The LDT was split into five blocks, each consisting of 100 trials (50 real words, 50 pseudowords) and each block was introduced as a quest in the magical world of Lexicality. To engage a wide range of subjects, the LDT was embedded in a fun story, animated environment, and a game that involved collecting golden coins. Instructions were narrated by a character in the game and there was a series of 10 practice trials with feedback to ensure that each participant understood the LDT. After a practice trial with an incorrect response, the character would return, explain why the response was incorrect, and remind the participant the rules of the LDT. This ensured that even the youngest participants understood the rules of the LDT and remained engaged through the duration of the task.

Prior to the five blocks of the LDT, subjects completed a simple response time (RT) task. This task was used as a baseline measure, in case there was too much noise in the response time data (e.g., due to different devices being used). In each trial of the simple RT task, participants saw a fixation cross, then a triangle or a square flashed briefly for 350 ms, followed by another fixation cross. Participants were instructed to use the keyboard to respond, by pressing “left arrow” to indicate that a triangle was presented, or “right arrow” to indicate that a square was presented.

The five blocks of the LDT followed a similar format: a word or pseudoword (Arial font) would flash briefly for 350 ms in the center of the screen, followed by a fixation cross. Participants pressed “right” for a real word or “left” for a pseudoword. Stimuli were presented in the center of the screen and scaled to be 3.5% of the participants’ vertical screen dimension.

The task was gamified in order to be more engaging for the younger participants. If the participants achieved ten consecutive correct answers, a small, one-second animation would play showing their party of adventurers defeating a group of monsters, with a small “+ 10 gold” icon appearing on top. After each block, participants unlocked new characters and were shown the amount of gold that they had collected during the last block.

Word and pseudoword stimuli (version 1)

250 real words and 250 pseudowords were selected to span a large range of lexical and orthographic properties, with the goal of finding stimuli spanning a large range of difficulty. The distribution of lexical and orthographic properties of the stimuli is shown in Supplementary Fig. 1. Pseudowords were matched in orthographic properties to real words using Wuggy, a pseudoword generator⁶⁶. Wuggy generates orthographically plausible pseudowords that are matched in terms of sub-syllabic segments, word length, and letter-transition frequencies. Matched real word/pseudoword pairs were kept within the same block, such that each block contained 50 real words and 50 pseudowords roughly matched in terms of orthographic properties and stimulus order was randomized. Additionally, 18 real words were paired with 18 orthographically implausible pseudowords to create some easier items to ensure that the LDT would not have floor effects for young children and those with dyslexia (e.g. “wndo”). These orthographically implausible words had low trigram frequencies and can be appreciated from the bump at the lower end of the trigram frequency distribution in the lower panel of Supplementary Fig. 1.

Participant recruitment and consent procedure (study 1)

Each adult participant provided informed consent under a protocol that was approved by the Institutional Review Board at Stanford University or The University of Washington and all methods were carried out in accordance with these guidelines. Each participant under 18 years of age provided assent and a parent or legal guardian provided informed consent.

Participants were recruited from two research participant databases: (1) The Stanford University Reading & Dyslexia Research Program (http://dyslexia.stanford.edu) and (2) The University of Washington Reading & Dyslexia Research Program (http://ReadingAndDyslexia.com). Each database includes children and adults who have (1) enrolled and consented (and/or assented) to being part of a research participant pool, (2) filled out extensive questionnaires on demographics, education history, attitudes towards reading and history/diagnoses of learning disabilities, (3) been validated through a phone screening to ensure accuracy of basic demographic details entered in the database. Many of the children and adults have participated in one or more study at Stanford University or the University of Washington and, as part of these studies, have undergone an in-person assessment session including standardized measures of reading abilities (Woodcock-Johnson IV Tests of Achievement, Test of Word Reading Efficiency—2), verbal abilities [Welschler Abbreviated Scales of Intelligence II (WASI-II), Vocabulary subtest] and general reasoning abilities (WASI-II, Matrix Reasoning subtest). Scores on standardized tests were analyzed if the testing had occurred within 12 months of completing the LDT task (scores were adjusted based on the expected improvement since the time of testing). A WASI IQ score less than 70 has been used as an exclusion criteria in previous studies with this subject pool and, therefore, subjects with low IQ were excluded from the present study (when scores were available).

Participants in the database were emailed a brief description of the study and a link to a digital consent form and the online LDT task. If participants provided consent, their LDT performance was linked to their assessment data in the research database (questionnaires and standardized tests). Participants also had the option to complete the online experiment and remain anonymous. A total of 120 children and adults between the ages of 6.27 years and 29.15 (M = 11.61, SD = 3.67) participated in Study 1. These subjects spanned the full range of reading abilities as measured by the Woodcock Johnson Basic Reading Skills (WJ-BRS) standard scores (M = 97.7, SD = 18.1, min = 57, max = 142) and TOWRE Index standard scores (M = 92.4, SD = 18.9, min = 55, max = 134). Many subjects were recruited for studies on dyslexia and, based on the WJ-BRS, 25% of the subjects met a typical criterion for dyslexia (standard score more than 1 SD below the population mean). 37.5% of subjects were below the 1 SD cutoff in terms of TOWRE Index. We treat reading ability as a continuous measure throughout as opposed to imposing a threshold that defines a group of participants as individuals with dyslexia. Demographic information for the participants can be found at: https://github.com/yeatmanlab/ROAR-LDT-Public/blob/main/data_allsubs/demographic_all_newcodes.csv

Data analysis (study 1)

All analysis code and data is publicly available at: https://github.com/yeatmanlab/ROAR-LDT-Public, with a README file that documents how to reproduce each figure and statistic reported in the manuscript. Data analysis was conducted in R⁶⁷. Correct/incorrect responses and log transformed response times (RTs) for each item were concatenated across subjects into a large table. Two subjects were identified as outliers and removed from further analysis based on the following procedure: First, median RTs were calculated for each participant. Then, participants were excluded if their median RT was more than 3 standard deviations below the sample mean. The two subjects who met this criterion performed the LDT at near chance accuracy level, suggesting that their extremely fast RTs were indicative of random guessing (lack of experiment compliance). These participants are displayed in the figures as open circles for reference. For the analysis of RT data, responses shorter than 0.2 s or longer than 5 s were removed. Then, quartiles and the interquartile range (IQR) of the RT distribution were calculated per participant. Responses that were longer than 3 times the IQR from the third quartile, or shorter than 3 times the IQR from the first quartile, were further excluded (see Refs.^68,69 for similar outlier exclusion procedures). These steps resulted in the exclusion of 4.6% of total trials. Analysis of RT data was conducted using the lme4 package in R^67,70.

Browser-based lexical decision task (version 2)

A shorter version of the browser-based LDT (252 trials in Study 2 vs. 500 trials in Study 1) was implemented based on the three, optimized 76-item lists that were designed in Study 1. Each list is presented in a block of trials and the order of the lists is randomized as are the order of the stimuli within each list (but stimuli were not mixed across lists). Otherwise, all the details of the experiment were maintained from version 1.

Twelve new words were selected from grade level text (1st and 2nd grade) in order to include easy vocabulary words for young elementary school children (and English language learners). Pseudoword pairs were created for each new real word by rearranging the letters and maintaining orthographic regularities (as in version 1). These new items (4 real words, 4 pseudowords) were added to each list for a final length of 84 items per list. The three stimulus lists for Study 2 are shown in Supplementary Table 1.

Participant recruitment (study 2)

N = 116 children and adults between the ages of 5y11m–42y7m were recruited through: (1) partnerships with local school districts, (2) research participant databases (http://dyslexia.stanford.edu), and (3) ongoing studies. One of the goals of Study 2 was to validate the Rapid Online Assessment of Reading (ROAR) LDT in young children (< 8 years of age). Thus, a specific effort was made to recruit participants who had just finished kindergarten (n = 20). Each participant who completed the ROAR was invited to participate in an assessment session conducted over Zoom in which we administered the Woodcock Johnson Word ID (n = 84). These data were used to confirm the results of Study 1, and validate version 2 of the ROAR in an independent sample.

References

Snowling, M. Dyslexia as a phonological deficit: Evidence and implications. Child Adolesc. Ment. Health 3, 4–11 (1998).
Article Google Scholar
Wagner, R. K. & Torgesen, J. K. The nature of phonological processing and its causal role in the acquisition of reading skills. Psychol. Bull. 101, 192–212 (1987).
Article Google Scholar
Vellutino, F. R., Fletcher, J. M., Snowling, M. J. & Scanlon, D. M. Specific reading disability (dyslexia): What have we learned in the past four decades?. J. Child Psychol. Psychiatry 45, 2–40 (2004).
Article PubMed Google Scholar
Pennington, B. F. From single to multiple deficit models of developmental disorders. Cognition 101, 385–413 (2006).
Article PubMed Google Scholar
Pennington, B. F. et al. Individual prediction of dyslexia by single vs. multiple deficit models. J. Abnorm. Psychol. 121, 212–224 (2012).
Article PubMed Google Scholar
Vidyasagar, T. R. & Pammer, K. Dyslexia: A deficit in visuo-spatial attention, not in phonological processing. Trends Cogn. Sci. 14, 57–63 (2010).
Article PubMed Google Scholar
Goswami, U. Sensory theories of developmental dyslexia: Three challenges for research. Nat. Rev. Neurosci. 16, 43–54 (2015).
Article CAS PubMed Google Scholar
Ramus, F. Developmental dyslexia: Specific phonological deficit or general sensorimotor dysfunction?. Curr. Opin. Neurobiol. 13, 212–218 (2003).
Article CAS PubMed Google Scholar
Bosse, M. L., Tainturier, M. J. & Valdois, S. Developmental dyslexia: The visual attention span deficit hypothesis. Cognition 104, 198–230 (2007).
Article PubMed Google Scholar
Joo, S. J., Donnelly, P. M. & Yeatman, J. D. The causal relationship between dyslexia and motion perception reconsidered. Sci. Rep. 7, 4185 (2017).
Article ADS PubMed PubMed Central CAS Google Scholar
Joo, S. J., White, A. L., Strodtman, D. J. & Yeatman, J. D. Optimizing text for an individual’s visual system: The contribution of visual crowding to reading difficulties. Cortex 103, 291–301 (2018).
Article PubMed Google Scholar
Stein, J. F. The current status of the magnocellular theory of developmental dyslexia. Neuropsychologia 130, 66–77 (2019).
Article PubMed Google Scholar
Stein, J. F. & Walsh, V. To see but not to read; the magnocellular theory of dyslexia. Trends Neurosci. 20, 147–152 (1997).
Article CAS PubMed Google Scholar
Lovegrove, W. J., Bowling, A., Badcock, D. & Blackwood, M. Specific reading disability: Differences in contrast sensitivity as a function of spatial frequency. Science 210, 439–440 (1980).
Article ADS CAS PubMed Google Scholar
Hulme, C. The implausibility of low-level visual deficits as a cause of children’s reading difficulties. Cogn. Neuropsychol. 5, 369–374 (1988).
Article Google Scholar
Flint, S. & Pammer, K. It is the egg, not the chicken; dorsal visual deficits present in dyslexia are not present in illiterate adults. Dyslexia 0, 1–15 (2018).
Skottun, B. C. The magnocellular deficit theory of dyslexia: The evidence from contrast sensitivity. Vis. Res. 40, 111–127 (2000).
Article CAS PubMed Google Scholar
Skottun, B. C. The need to differentiate the magnocellular system from the dorsal stream in connection with dyslexia. Brain Cogn. 95, 62–66 (2015).
Article PubMed Google Scholar
Merzenich, M. M. et al. Temporal processing deficits of language-learning impaired children ameliorated by training. Science 271, 77–81 (1996).
Article ADS CAS PubMed Google Scholar
Tallal, P. Auditory temporal perception, phonics, and reading disabilities in children. Brain Lang. 9, 182–198 (1980).
Article CAS PubMed Google Scholar
O’Brien, G. E., McCloy, D. R., Kubota, E. C. & Yeatman, J. D. Reading ability and phoneme categorization. Sci. Rep. 8, 16842 (2018).
Article ADS PubMed PubMed Central CAS Google Scholar
O’Brien, G. E., McCloy, D. R. & Yeatman, J. D. Categorical phoneme labeling in children with dyslexia does not depend on stimulus duration. J. Acoust. Soc. Am. 146, 245 (2019).
Article ADS PubMed PubMed Central Google Scholar
Rosen, S. Auditory processing in dyslexia and specific language impairment: Is there a deficit? What is its nature? Does it explain anything?. J. Phon. 31, 509–527 (2003).
Article Google Scholar
Lieder, I. et al. Perceptual bias reveals slow-updating in autism and fast-forgetting in dyslexia. Nat. Neurosci. 22, 256–264 (2019).
Article CAS PubMed Google Scholar
Ahissar, M., Lubin, Y., Putter-Katz, H. & Banai, K. Dyslexia and the failure to form a perceptual anchor. Nat. Neurosci. 9, 1558–1564 (2006).
Article CAS PubMed Google Scholar
Perrachione, T. K. et al. Dysfunction of rapid neural adaptation in dyslexia. Neuron 92, 1383–1397 (2016).
Article CAS PubMed PubMed Central Google Scholar
Gabay, Y., Thiessen, E. D. & Holt, L. L. Impaired statistical learning in developmental dyslexia. J. Speech Lang. Hear. Res. 56, 934–945 (2015).
Article Google Scholar
Elleman, A. M., Steacy, L. M. & Compton, D. L. The role of statistical learning in word reading and spelling development: More questions than answers. Sci. Stud. Read. 23, 1–7 (2019).
Article PubMed Google Scholar
Yarkoni, T. The generalizability crisis. psyarxiv.com (2020).
Button, K. S. et al. Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013).
Article CAS PubMed Google Scholar
Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 1–9 (2017).
Article Google Scholar
de Leeuw, J. R. jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behav. Res. Methods 47, 1–12 (2015).
Article ADS PubMed Google Scholar
Pinet, S. et al. Measuring sequences of keystrokes with jsPsych: Reliability of response times and interkeystroke intervals. Behav. Res. Methods 49, 1163–1176 (2017).
Article CAS PubMed Google Scholar
Peirce, J. & MacAskill, M. Building Experiments in PsychoPy. (SAGE, 2018).
Anwyl-Irvine, A. L., Massonnié, J., Flitton, A., Kirkham, N. & Evershed, J. K. Gorilla in our midst: An online behavioral experiment builder. Behav. Res. Methods 52, 388–407 (2020).
Article PubMed Google Scholar
Bridges, D., Pitiot, A., MacAskill, M. R. & Peirce, J. The timing mega-study: Comparing a range of experiment generators, both lab-based and online. (2020).
Schrank, F. A., McGrew, K. S., Mather, N., Wendling, B. J. & LaForte, E. M. Woodcock-Johnson IV tests of achievement. (2014).
Gershon, R. C. et al. NIH toolbox for assessment of neurological and behavioral function. Neurology 80, S2–S6 (2013).
Article PubMed PubMed Central Google Scholar
Wilkinson, G. S. Wide range achievement test—revision 3. Vol. 20 (Jastak Association, 1993).
Wechsler, D. Wechsler individual achievement test—Second UK edition. The Psychological Corporation (2005).
Balota, D. A., Yap, M. J. & Cortese, M. J. Visual Word Recognition: The Journey From Features to Meaning (A Travel Update). 285–375 (Elsevier, 2006).
Seidenberg, M. S. & McClelland, J. L. A distributed, developmental model of word recognition and naming. Psychol. Rev. 96, 523–568 (1989).
Article CAS PubMed Google Scholar
Zoccolotti, P., De Luca, M., Di Filippo, G., Marinelli, C. V. & Spinelli, D. Reading and lexical-decision tasks generate different patterns of individual variability as a function of condition difficulty. Psychon. Bull. Rev. 25, 1161–1169 (2018).
Article PubMed Google Scholar
Dirix, N., Brysbaert, M. & Duyck, W. How well do word recognition measures correlate? Effects of language context and repeated presentations. Behav. Res. Methods 51, 2800–2816 (2019).
Article PubMed Google Scholar
Coltheart, M., Curtis, B., Atkins, P. & Haller, M. Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychol. Rev. 100, 589–608 (1993).
Article Google Scholar
Martens, V. E. G. & de Jong, P. F. The effect of word length on lexical decision in dyslexic and normal reading children. Brain Lang. 98, 140–149 (2006).
Article PubMed Google Scholar
Katz, L. et al. What lexical decision and naming tell us about reading. Read. Writ. 25, 1259–1282 (2012).
Article PubMed Google Scholar
Di Filippo, G., de Luca, M., Judica, A., Spinelli, D. & Zoccolotti, P. Lexicality and stimulus length effects in Italian dyslexics: Role of the overadditivity effect. Child Neuropsychol. 12, 141–149 (2006).
Article PubMed Google Scholar
Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H. & Yap, M. Visual word recognition of single-syllable words. J. Exp. Psychol. Gen. 133, 283–316 (2004).
Article PubMed Google Scholar
Keuleers, E., Lacey, P., Rastle, K. & Brysbaert, M. The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behav. Res. Methods 44, 287–304 (2012).
Article PubMed Google Scholar
Balota, D. A. Word frequncy, repetition, and lexicality effects in word recongition tasks: beyond measures of Central Tendency. J. Exp. Psychol. Gen. 128, 32–55 (1999).
Article CAS PubMed Google Scholar
Ratcliff, R., McKoon, G. & Gomez, P. A diffusion model account of the lexical decision task. Psychol. Rev. 111, 159–182 (2004).
Article PubMed PubMed Central Google Scholar
Shaywitz, B. A. et al. Disruption of posterior brain systems for reading in children with developmental dyslexia. Biol. Psychiatry 52, 101–110 (2002).
Article PubMed Google Scholar
Kubota, E. C., Joo, S. J., Huber, E. & Yeatman, J. D. Word selectivity in high-level visual cortex and reading skill. Dev. Cogn. Neurosci. 36, 100593 (2019).
Article PubMed Google Scholar
Fletcher, J. M., Lyon, G. R., Fuchs, L. S. & Barnes, M. A. Learning Disabilities: From Identification to Intervention. (Guilford Press, 2006).
Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests. (University of Chicago Press, 1980).
Chalmers, R. P. et al. mirt: A multidimensional item response theory package for the R environment. J. Stat. Softw. 48, 1–29 (2012).
Article Google Scholar
Wu, M. & Adams, R. J. Properties of Rasch residual fit statistics. J. Appl. Meas. 14, 339–355 (2013).
PubMed Google Scholar
Torgesen, J. K., Wagner, R. & Rashotte, C. TOWRE 2: Test of Word Reading Efficiency. (Pearson Clinical Assessment, 2011).
McGraw, K. O. & Wong, S. P. Forming inferences about some intraclass correlation coefficients. Psychol. Methods 1, 30–46 (1996).
Article Google Scholar
Ziegler, J. C. & Goswami, U. Reading acquisition, developmental dyslexia, and skilled reading across languages: A psycholinguistic grain size theory. Psychol. Bull. 131, 3–29 (2005).
Article PubMed Google Scholar
Marinelli, C. V., Horne, J. K., McGeown, S. P., Zoccolotti, P. & Martelli, M. Does the mean adequately represent reading performance? Evidence from a cross-linguistic study. Front. Psychol. 5, 903 (2014).
Article PubMed PubMed Central Google Scholar
Rastle, K., Kinoshita, S., Lupker, S. J. & Coltheart, M. Cross-task strategic effects. Mem. Cognit. 31, 867–876 (2003).
Article PubMed Google Scholar
Kinoshita, S. & Mozer, M. C. How lexical decision is affected by recent experience: Symmetric versus asymmetric frequency-blocking effects. Mem. Cognit. 34, 726–742 (2006).
Article PubMed Google Scholar
Peirce, J. et al. PsychoPy2: Experiments in behavior made easy. Behav. Res. Methods 51, 195–203 (2019).
Article PubMed PubMed Central Google Scholar
Keuleers, E. & Brysbaert, M. Wuggy: A multilingual pseudoword generator. Behav. Res. Methods 42, 627–633 (2010).
Article PubMed Google Scholar
Team, R. C. R: A language and environment for statistical computing. (2013).
Beyersmann, E. et al. Morpho-orthographic segmentation without semantics. Psychon. Bull. Rev. 23, 533–539 (2016).
Article PubMed Google Scholar
Ferrand, L. et al. Comparing word processing times in naming, lexical decision, and progressive demasking: evidence from chronolex. Front. Psychol. 2, 306 (2011).
Article PubMed PubMed Central Google Scholar
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting Linear Mixed-Effects Models using lme4. (2014).

Download references

Acknowledgements

We would like to thank the Pavlovia and PsychoPy team for their support on the browser-based experiments. This work was funded by NIH NICHD R01HD09586101, research grants from Microsoft and Jacobs Foundation Research Fellowship to J.D.Y.

Author information

These authors contributed equally: Jason D. Yeatman and Kenny An Tang.

Authors and Affiliations

Division of Developmental-Behavioral Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
Jason D. Yeatman, Kenny An Tang, Maya Yablonski, Mahalakshmi Ramamurthy, Sendy Caffarra & Megumi E. Takada
Stanford University Graduate School of Education, Stanford, CA, USA
Jason D. Yeatman, Kenny An Tang, Maya Yablonski, Mahalakshmi Ramamurthy, Sendy Caffarra, Megumi E. Takada, Klint Kanopka & Benjamin W. Domingue
Institute for Learning and Brain Science, University of Washington, Seattle, WA, USA
Patrick M. Donnelly
Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
Patrick M. Donnelly
The Gonda Multidisciplinary Brain Research Center, Bar Ilan University, Ramat-Gan, Israel
Maya Yablonski & Michal Ben-Shachar
Department of Psychiatry and Behavioral Sciences, School of Medicine, Center for Interdisciplinary Brain Sciences Research, Stanford University, Stanford, CA, USA
Iliana I. Karipidis
Basque Center on Cognition, Brain and Language, San Sebastian, Spain
Sendy Caffarra
Department of English Literature and Linguistics, Bar-Ilan University, Ramat-Gan, Israel
Michal Ben-Shachar

Authors

Jason D. Yeatman
View author publications
You can also search for this author in PubMed Google Scholar
Kenny An Tang
View author publications
You can also search for this author in PubMed Google Scholar
Patrick M. Donnelly
View author publications
You can also search for this author in PubMed Google Scholar
Maya Yablonski
View author publications
You can also search for this author in PubMed Google Scholar
Mahalakshmi Ramamurthy
View author publications
You can also search for this author in PubMed Google Scholar
Iliana I. Karipidis
View author publications
You can also search for this author in PubMed Google Scholar
Sendy Caffarra
View author publications
You can also search for this author in PubMed Google Scholar
Megumi E. Takada
View author publications
You can also search for this author in PubMed Google Scholar
Klint Kanopka
View author publications
You can also search for this author in PubMed Google Scholar
Michal Ben-Shachar
View author publications
You can also search for this author in PubMed Google Scholar
Benjamin W. Domingue
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.D.Y. and K.A.T. designed the study. K.A.T., P.M.D. and M.E.T. collected the data. J.D.Y., K.A.T., P.M.D., M.Y., M.R., I.I.K., S.C., K.K., M.B. and B.W.D. analyzed the data. All authors contributed to the writing of the manuscript.

Corresponding author

Correspondence to Jason D. Yeatman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Yeatman, J.D., Tang, K.A., Donnelly, P.M. et al. Rapid online assessment of reading ability. Sci Rep 11, 6396 (2021). https://doi.org/10.1038/s41598-021-85907-x

Download citation

Received: 19 August 2020
Accepted: 08 March 2021
Published: 18 March 2021
DOI: https://doi.org/10.1038/s41598-021-85907-x

This article is cited by

Rapid online assessment of reading and phonological awareness (ROAR-PA)
- Liesbeth Gijbels
- Amy Burkhardt
- Jason D. Yeatman
Scientific Reports (2024)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

On the relations between letter, word, and sentence-level processing during reading

The metrics of reading speed: understanding developmental dyslexia

Rapid online assessment of reading and phonological awareness (ROAR-PA)

Results: study 1

Lexical decision task performance and reading ability

Item response theory analysis of responses to individual words and pseudowords

Step 1: Remove words that are not predictive of overall performance

Step 2: Remove words that are not well fit by the Rasch model

Step 3: Examining item discrimination with the two parameter logistic model

Step 4: Test optimization

Results: study 2

Validation of optimized lexical decision task in young children

Validation of optimized lexical decision task in children and adults

Examination of stimulus list and block order effects

Discussion

Methods

Browser-based lexical decision task (version 1)

Word and pseudoword stimuli (version 1)

Participant recruitment and consent procedure (study 1)

Data analysis (study 1)

Browser-based lexical decision task (version 2)

Participant recruitment (study 2)

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information.

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Rapid online assessment of reading and phonological awareness (ROAR-PA)

Comments

Search

Quick links