Psychological measures aren’t toothbrushes

Most psychological measures are used only once or twice. This proliferation of measures, and the variability it introduces, threatens the credibility of research. The Standardisation Of BEhavior Research (SOBER) guidelines aim to ensure that psychological measures are standardised and, unlike toothbrushes, reused by others.

PsycInfo is a large database of research in the social and behavioural sciences with over 5,000,000 peer-reviewed records, according to the APA. Among many other things, records have a field for "test and measures" where the names of measures used in an empirical study are recorded. If a record of a measure also exists in PsycTests, it is often (though not consistently) cross-referenced in this field.
We used the PsycTests database to generate a master list of measures used in psychological research by looking up tests via their specific test DOI. This way, different measures with the same generic name (such as "job satisfaction scale") could be distinguished. We then counted the publications using each measure in each year from 1993 to 2022. For Figure 1, we simply aggregated counts across years and plotted how often measures tended to be used. However, 27,000 tests recorded in the APA's PsycTests database (56% of those published between 1993 and 2022) have no matches recorded in the PsycInfo database at all. Figure 1 therefore likely undercounts the number of measures that were used only once.
For Figure 2A, we generated cumulative counts of distinct measures and constructs. In one specification, we excluded revisions and translations from our count of new measures; in another, we included them. For Figure 2B, we computed a measure of fragmentation for each year, separately for measures and for constructs. Lumping translations and revisions with the original measure on which they were based made little difference to the results.
Our measure of fragmentation is normalised Shannon entropy, sometimes referred to as H_rel (Eq. 1; Wilcox, 1973). The maximal Shannon entropy of a set of n tests that were used at least once is log(n), so it rises as the number of tests grows. So as not to restate the result that the number of tests increases over time, we normalise entropy by log(n). The maximum is therefore 1 and would be reached if, in each year, all tests were used an equal number of times across publications. The minimum is 0 and would be reached if, in a given year, all publications used the same measure.
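Concretely, the fragmentation measure described above can be sketched in a few lines. This is a minimal illustration rather than our analysis code; the function name and the input format (a list of per-measure publication counts for one year) are assumptions for the example.

```python
import math

def normalised_entropy(usage_counts):
    """Normalised Shannon entropy (H_rel) of measure usage in one year.

    usage_counts: per-measure publication counts; unused measures (zeros)
    are dropped, since only tests used at least once enter the calculation.
    Returns a value in [0, 1]: 1 if all measures are used equally often,
    0 if a single measure accounts for all publications.
    """
    counts = [c for c in usage_counts if c > 0]
    n = len(counts)
    if n <= 1:
        # With one (or no) measure in use, entropy is 0 by convention.
        return 0.0
    total = sum(counts)
    # Shannon entropy of the usage distribution ...
    h = -sum((c / total) * math.log(c / total) for c in counts)
    # ... normalised by its maximum, log(n).
    return h / math.log(n)
```

For example, four measures used five times each give a fragmentation of 1, while a year in which every publication used the same measure gives 0.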

Nonredundancy
- Are correlations with other measures reported?
- How were the other measures selected (e.g., based on face validity)?
- Was the correlation analysis conducted on the same sample used for the focal analyses or on a separate validation sample?
- Were correlation coefficients corrected for measurement error (e.g., through latent variable modelling)?
- Does the correlation analysis show that the new measure has incremental validity over conceptually similar measures?

Protocol adherence
- Do the authors report having adhered to a published measurement procedure?
- To what extent did the authors actually adhere to it (see measure modifications)?

Comprehensive Reporting
- Were all elements of the measurement procedure described/reported?
  - With respect to implementation, e.g.,
    - The text of the questionnaire items
    - The code/software for generating stimuli
    - Instructions for participants
    - Other measurement characteristics (duration of experiment, decision rules, etc.)
  - With respect to scoring, e.g.,
    - Sum vs. latent scores
    - Average reaction time vs. composite of reaction time and accuracy
    - Were factors allowed to correlate in the CFA?
    - Was measurement invariance accounted for?
- Were summary statistics (i.e., means and standard deviations, not only standardised effect sizes) reported?

Measure modifications
- Did the authors report having made modifications to how each measure was previously used (or as specified by a published measurement protocol)?
- Were modifications transparently described and justified?
- Was validity evidence provided for the modified procedure?
- Were sensitivity analyses reported?
- Were the possible implications of modifications and comparability to other implementations discussed?

Preregistration and Registered Reports
- Was the study pre-registered?
- If yes, does the pre-registration include a section on the measure(s) to be used?
- If yes, does it include information about nonredundancy, protocol adherence, and modifications as outlined above?
- If yes, were the deviations from the pre-registration transparently documented?