Abstract 91 Poster Session III, Monday, 5/3 (poster 364)

INTRODUCTION: Given the vast array of testing tools used for measuring ability or achievement in the pediatric behavioral sciences, Item Response Theory (IRT) offers potential advantages over Classical Test Theory (CTT). The major assumption of all test theory is that measurements of psychological variables contain error, and that the observed score, derived from observed responses, is an imprecise indicator of the underlying trait. The goal of testing is to obtain an accurate measure of true ability for the content domain tested. The coefficients of reliability, test indices, and true score estimates used in CTT are sample dependent and may vary substantially among samples of patients. An assumption of CTT is that measurement errors are uncorrelated with ability. IRT, by contrast, takes into account the fact that measurement precision may vary with a patient's ability level. In IRT, item parameters are considered invariant provided the parameter estimates are accurate. The major restriction to widespread applicability is that as the number of parameters to be estimated increases, larger sample sizes and test lengths are needed for reasonably accurate estimation. XCALIBRE (1997), a software program developed strictly for IRT applications, offered promise for parameter estimation by employing both Bayesian and marginal maximum likelihood techniques. If successful, stable estimates could be obtained with fewer patients and shorter tests, greatly expanding the usage of this promising theory.

METHODS: This Monte Carlo study was designed to investigate the efficacy of parameter estimation in small data sets by fixing the "c" (lower asymptote) parameter in two three-parameter logistic (3PL) models while leaving one 3PL model unmodified. Small sample sizes (N = 50, 100, 200, and 500) and test lengths (n = 25, 50, 75, and 100) were compared across varied test conditions and distributions of ability. Estimated parameter values were compared to the generated true parameter values and evaluated using the product-moment correlation, the root mean squared error (RMSE), and an averaged bias index.

RESULTS: Only the data set with 500 patients and 50 evaluation items provided results comparable to the industry standard of 1000 patients and 50 evaluation items. Overall, the models were more similar than different in parameter recovery. Correlations for item difficulty ranged from the low to mid .90s. Ability parameter correlations were near perfect, at 0.99 to 1.00. Correlations for item discrimination, however, were lower, ranging from 0.50 to 0.60. Estimation of "c" in the unmodified model was highly problematic, with very small and widely varying correlations.

CONCLUSION: Holding the lower asymptote constant did not yield better recovery in smaller data sets. Bayesian priors did not improve the recovery of "c" in the unmodified model. Item discrimination was still only moderately well estimated. "Artificial" constraints are not the answer for the 3PL model when faced with small sample sizes.
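As a point of reference for the INTRODUCTION's claims about observed scores and error, the standard CTT decomposition is shown below in textbook notation; this is background material, not part of the original abstract.

```latex
% Observed score X decomposes into true score T and error E,
% with error assumed uncorrelated with the true score:
X = T + E, \qquad \operatorname{Cov}(T, E) = 0
% Reliability is the share of observed-score variance due to true scores,
% which is why CTT reliability coefficients are sample dependent:
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```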
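A minimal sketch of the kind of data generation described in METHODS, assuming the usual 3PL response function with the 1.7 scaling constant. The function name, the prior distributions for a, b, and c, and the fix_c option are illustrative assumptions for this sketch, not XCALIBRE's actual generating settings.

```python
import numpy as np

def simulate_3pl(n_patients, n_items, fix_c=None, rng=None):
    """Generate dichotomous (0/1) responses under a 3PL model.

    If fix_c is given, the lower asymptote is held constant across
    items (the 'modified' condition); otherwise c varies by item
    (the unmodified 3PL condition).
    """
    rng = np.random.default_rng(rng)
    theta = rng.normal(0.0, 1.0, n_patients)   # patient abilities
    a = rng.lognormal(0.0, 0.3, n_items)       # item discriminations
    b = rng.normal(0.0, 1.0, n_items)          # item difficulties
    c = (np.full(n_items, float(fix_c)) if fix_c is not None
         else rng.uniform(0.05, 0.30, n_items))  # lower asymptotes
    # 3PL probability of a correct response; theta[:, None] broadcasts
    # to an (n_patients, n_items) probability matrix
    p = c + (1 - c) / (1 + np.exp(-1.7 * a * (theta[:, None] - b)))
    responses = (rng.uniform(size=p.shape) < p).astype(int)
    return responses, theta, a, b, c

# e.g., the smallest design cell in the study: N = 50 patients, n = 25 items
X, theta, a, b, c = simulate_3pl(50, 25, fix_c=0.20, rng=1)
```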
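The recovery criteria named in METHODS are straightforward to compute once true and estimated parameters are aligned. A sketch using the standard definitions follows; note that "averaged bias" is taken here as mean(estimate - truth), one common convention the abstract does not spell out.

```python
import numpy as np

def recovery_stats(true_vals, est_vals):
    """Compare estimated item/ability parameters with their true values."""
    true_vals = np.asarray(true_vals, dtype=float)
    est_vals = np.asarray(est_vals, dtype=float)
    r = np.corrcoef(true_vals, est_vals)[0, 1]         # product-moment correlation
    rmse = np.sqrt(np.mean((est_vals - true_vals) ** 2))
    bias = np.mean(est_vals - true_vals)               # averaged bias index
    return {"r": r, "rmse": rmse, "bias": bias}
```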