Evaluation of ASR transcription performance compared to the human-generated reference transcription. Each box denotes the 25th and 75th percentile. Box center-lines denote the median. Whiskers denote the minimum and maximum values, excluding any outliers. Outliers, denoted by diamonds, are defined as any point further than 1.5× the interquartile range from the 25th or 75th percentile. Sample sizes are listed in Table 2. NS not significant means the difference is not statistically significant. a Comparison of word overlap (i.e., word error rate). Lower word error rate is better. b Comparison of semantic similarity (i.e., semantic distance). Lower semantic distance is better.