Speech characteristics yield important clues about motor function: Speech variability in individuals at clinical high-risk for psychosis

Background and hypothesis: Motor abnormalities are predictive of psychosis onset in individuals at clinical high risk (CHR) for psychosis and are tied to its progression. We hypothesize that these motor abnormalities also disrupt speech production (a highly complex motor behavior) and predict that CHR individuals will produce more variable speech than healthy controls, and that this variability will relate to symptom severity, motor measures, and psychosis-risk calculator scores. Study design: We measure variability in speech production (variability in consonants, vowels, speech rate, and pausing/timing) in N = 58 CHR participants and N = 67 healthy controls. Three different tasks are used to elicit speech: diadochokinetic speech (rapidly-repeated syllables, e.g., papapa…, pataka…), read speech, and spontaneously-generated speech. Study results: Individuals in the CHR group produced more variable consonants and exhibited greater speech rate variability than healthy controls in two of the three speech tasks (diadochokinetic and read speech). While there were no significant correlations between speech measures and remotely-obtained motor measures, symptom severity, or conversion risk scores, these comparisons may be underpowered (in part due to challenges of remote data collection during the COVID-19 pandemic). Conclusion: This study provides a thorough and theory-driven first look at how speech production is affected in this at-risk population and speaks to the promise of, and challenges facing, this approach moving forward.


Full text of the Rainbow Passage
When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow. The rainbow is a division of white light into many beautiful colors. These take the shape of a long round arch, with its path high above, and its two ends apparently beyond the horizon. There is, according to legend, a boiling pot of gold at one end. People look, but no one ever finds it. When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow. Throughout the centuries people have explained the rainbow in various ways. Some have accepted it as a miracle without physical explanation. To the Hebrews it was a token that there would be no more universal floods. The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain. The Norsemen considered the rainbow as a bridge over which the gods passed from earth to their home in the sky. Others have tried to explain the phenomenon physically. Aristotle thought that the rainbow was caused by reflection of the sun's rays by the rain. Since then physicists have found that it is not reflection, but refraction by the raindrops which causes the rainbows. Many complicated ideas about the rainbow have been formed. The difference in the rainbow depends considerably upon the size of the drops, and the width of the colored band increases as the size of the drops increases. The actual primary rainbow observed is said to be the effect of superimposition of a number of bows. If the red of the second bow falls upon the green of the first, the result is to give a bow with an abnormally wide yellow band, since red and green light when mixed form yellow. This is a very common type of bow, one showing mainly red and yellow, with little or no green or blue.

Diadochokinetic Speech Samples
From each participant's two diadochokinetic speech samples, we automatically extracted: (1) the voice-onset-time for every syllable-initial voiceless stop consonant, (2) the duration of every vowel, (3) the first and second formants for every vowel (at 20% and midpoint of the vowel), as well as (4) the speech rate for each individual Diadochokinetic trial.
To do so, we used DDKtor (Segal et al., 2022; https://github.com/MLSpeech/DDKtor) to automatically obtain the VOTs and vowel durations, given hand-selected windows of analysis which corresponded to the individual Diadochokinetic trials. We then used FastTrack (Barreda, 2021; https://github.com/santiagobarreda/FastTrack) to obtain formants for every vowel that was identified by DDKtor. Finally, we obtained speech rate measures for each DDK trial directly from the DDKtor output. This was calculated as the number of syllables the participant produced in the trial divided by the amount of time it took them to produce it (i.e., the time elapsed from the start of the first syllable to the end of the last syllable, as found by DDKtor). We followed all recommendations and default settings given by the creators of the tools. In particular, following DDKtor recommendations, we excluded the shortest 2% and longest 5% of VOTs and vowel durations from our analyses, and we double counted extremely long syllables (i.e., syllables with vowels that were at least twice as long as that speaker's average vowel duration) in speech rate calculations. FastTrack requires specifying a plausible range for the maximum analysis frequency value, which it uses to identify the best value below which it should search for F1-F5. Following FastTrack default recommendations, we set this range to 4500-6500 Hz for male speakers and 5000-7000 Hz for female speakers.
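The speech-rate and duration-trimming steps described above can be sketched as follows. This is a minimal illustration: the function names, tuple-based syllable intervals, and timings are ours, not DDKtor's, and the double counting of extremely long syllables is omitted.

```python
def ddk_speech_rate(syllable_intervals):
    """Speech rate for one DDK trial: number of syllables divided by the
    time from the start of the first syllable to the end of the last."""
    n = len(syllable_intervals)
    start = syllable_intervals[0][0]
    end = syllable_intervals[-1][1]
    return n / (end - start)

def trim_extremes(durations, low_pct=0.02, high_pct=0.05):
    """Drop the shortest 2% and longest 5% of durations, per the DDKtor
    recommendations followed in the text."""
    srt = sorted(durations)
    lo = int(len(srt) * low_pct)
    hi = int(len(srt) * high_pct)
    return srt[lo:len(srt) - hi] if hi else srt[lo:]

# Hypothetical trial: five (start, end) syllable intervals spanning one second.
rate = ddk_speech_rate([(0.0, 0.15), (0.2, 0.35), (0.4, 0.55), (0.6, 0.75), (0.85, 1.0)])
```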
Although participants were asked to produce a specific number of syllables and trials, they sometimes deviated from this number. To ensure our measures were based on a sufficient, and relatively equal, number of syllables/trials across participants, we excluded AMR (papapa/tatata/kakaka) trials in which participants produced fewer than 10 syllables (when they were supposed to produce 15) and SMR (pataka/katapa) trials in which participants produced fewer than 15 syllables (when they were supposed to produce 30). In addition, we excluded any extra trials participants produced (i.e., we capped the number of analyzed trials at 2 for each type of AMR trial (papapa, tatata, kakaka) and 10 for each type of SMR trial (pataka, katapa)).
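The inclusion rules above can be sketched as a simple filter. The trial encoding below (tuples of trial type, task kind, and syllable count) is hypothetical; the thresholds are those stated in the text.

```python
MIN_SYLLABLES = {"AMR": 10, "SMR": 15}  # exclude trials with fewer syllables
MAX_TRIALS = {"AMR": 2, "SMR": 10}      # cap on analyzed trials per trial type

def filter_trials(trials):
    """trials: ordered list of (trial_type, kind, n_syllables) tuples,
    e.g., ("papapa", "AMR", 15). Returns the trials retained for analysis."""
    kept, counts = [], {}
    for trial_type, kind, n_syl in trials:
        if n_syl < MIN_SYLLABLES[kind]:
            continue  # participant produced too few syllables
        if counts.get(trial_type, 0) >= MAX_TRIALS[kind]:
            continue  # extra trial beyond the per-type cap
        counts[trial_type] = counts.get(trial_type, 0) + 1
        kept.append((trial_type, kind, n_syl))
    return kept
```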

Read Speech Samples
From each participant's read speech sample, we extracted: (1) the voice-onset-time of every word-initial stop that preceded a vowel (voiced and voiceless), (2) the duration of every vowel with primary stress, (3) the first and second formants for all vowels with primary stress (at 20% and midpoint of the vowel), and (4) local speech rates.
To do so, we first obtained a transcript for each audio file by manually editing the Rainbow Passage transcript to incorporate any participant speech errors or disfluencies. All meta comments on the task (e.g., "I think that's how you pronounce that") were removed. We then force-aligned the transcript to the audio files using the pretrained Montreal Forced Aligner (McAuliffe et al., 2017; https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner).
Given the output phone alignments, we obtained the VOTs of every word-initial stop consonant that preceded a vowel using AutoVOT, automatic software for VOT detection (Adi et al., 2016; https://github.com/mlml/autovot). To best fit our data, we retrained separate voiceless and voiced classifiers using ~100 manually-annotated voiceless VOTs and ~100 manually-annotated voiced VOTs directly from our data. AutoVOT requires a window of analysis for every stop: per AutoVOT recommendations, we used the stop consonant boundaries output by the Montreal Forced Aligner, extended by 31ms in both directions for voiceless stops and by 11ms in both directions for voiced stops. We obtained vowel durations directly from the Montreal Forced Aligner output phone alignments, by extracting the duration of all relevant vowels. Finally, we obtained formants (F1 and F2 at 20% and 50%) of each relevant vowel by applying FastTrack to all vowels bearing primary stress, as identified by the Montreal Forced Aligner. As with the DDK data, we set the frequency analysis range to 4500-6500 Hz for male speakers and 5000-7000 Hz for female speakers.
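The window-extension step can be illustrated as follows; the function name and the clamp at zero are ours, while the 31 ms and 11 ms extensions are those given in the text.

```python
EXTENSION_S = {"voiceless": 0.031, "voiced": 0.011}  # seconds

def autovot_window(stop_start, stop_end, voicing):
    """Extend the forced aligner's stop boundaries symmetrically in both
    directions to form the AutoVOT window of analysis."""
    ext = EXTENSION_S[voicing]
    # Clamp at zero so a stop near the file onset keeps a valid window.
    return (max(0.0, stop_start - ext), stop_end + ext)
```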
Finally, we obtained a measure of the local speech rate of each phrase the participants produced. We followed Stuart-Smith et al. (2015) in defining local speech rate as the number of syllables per second in a phrase, where a phrase is any interval between silences that were at least 150ms long.
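The phrase segmentation and rate computation can be sketched as follows, assuming word intervals annotated with syllable counts (a stand-in for the aligner output; names and timings are illustrative):

```python
MIN_SILENCE_S = 0.150  # a phrase boundary is a silence of at least 150 ms

def phrases_from_silences(word_intervals):
    """Group (start, end, n_syllables) word intervals into phrases,
    splitting whenever the gap between consecutive words is >= 150 ms."""
    phrases, current = [], []
    for w in word_intervals:
        if current and w[0] - current[-1][1] >= MIN_SILENCE_S:
            phrases.append(current)
            current = []
        current.append(w)
    if current:
        phrases.append(current)
    return phrases

def local_speech_rates(word_intervals):
    """Local speech rate: syllables per second within each phrase."""
    rates = []
    for phrase in phrases_from_silences(word_intervals):
        n_syl = sum(w[2] for w in phrase)
        duration = phrase[-1][1] - phrase[0][0]
        rates.append(n_syl / duration)
    return rates
```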

Spontaneous Procedural Description: Peanut Butter and Jelly
Just as for the read speech sample, we extracted: (1) the voice-onset-time of every word-initial stop that preceded a vowel, (2) the duration of every vowel with primary stress, (3) the first and second formants for all vowels with primary stress (at 20% and 50% of the vowel), and (4) local speech rates. The process for extracting these values was identical to that for the Rainbow Passage, with the exception that the speech transcript was created from scratch, rather than by editing an existing transcript that participants read.

Variability and overlap in vowel formants
In addition to the vowel duration measure discussed in the main text, we also analyzed variability in the first two vowel formants (the lowest two frequencies with significant concentrations of acoustic energy around them; at a high level, these indicate which vowel was produced, e.g., /i/, /u/, etc.; Hillenbrand et al., 1995; Peterson & Barney, 1952).
• Formant dispersion at 20% of the vowel (all speech samples): The first measure of vowel formant variability was calculated in two-dimensional (Formant 1, Formant 2) space as:

Formant Dispersion = (1/n) Σ_{i=1}^{n} √((F1_i − F1_m)² + (F2_i − F2_m)²)

That is, we calculate the distance between each vowel token i (i = 1...n), with formants (F1_i, F2_i), and the center of the vowel space, taken as the median F1 and F2 values across tokens: (F1_m, F2_m). We then average these n values to arrive at an average formant dispersion (Niziolek & Kiran, 2018). We chose to measure this early in the vowel (at 20% of its duration), as past work has shown that motor control issues are more likely to affect early production (i.e., before there is time to reach the target pronunciation). If anything, we would expect this choice to exaggerate differences in production related to motor control issues.
• Change in formant dispersion between 20% and 50% of the vowel (all speech samples): The second measure of vowel formant variability was calculated as the difference between the average formant dispersions at 20% and 50% of the vowels:

∆Formant Dispersion = Formant Dispersion at 20% − Formant Dispersion at 50%

This measure provides a glimpse into how much the variability in formants changes over the course of the vowel, or how much the pronunciation needs to be adjusted to reach the target pronunciation by the midpoint of the vowel (Niziolek & Kiran, 2018). Unlike the other measures, this was not log-transformed in our analyses, as it can be negative or zero.
• Overlap in vowel categories (read and spontaneous speech samples only): With increased variability in how vowels are produced, we would expect to see increased overlap between different vowel categories. To test this, we studied how much overlap there was between vowel tokens from different categories, using a measure of phonetic competition (sometimes also referred to as "repulsive force"; McCloy et al., 2015; Wright, 2004; Xie & Myers, 2018). More specifically, this is calculated as the sum of inverse squared distances between all pairs of vowel tokens coming from different vowel categories:

Phonetic Competition = Σ_{i,j: /i/ ≠ /j/} 1 / ((F1_i − F1_j)² + (F2_i − F2_j)²)

where i and j represent two vowel tokens being compared, /i/ and /j/ represent their respective vowel categories, and F1_i and F2_i represent the first and second formants of vowel token i, respectively. At a high level, phonetic competition measures how much different vowel categories overlap in their productions, with higher phonetic competition values corresponding to greater overlap. Because participants only produced one vowel type in the diadochokinetic speech task, this measure could only be calculated for the read and spontaneous speech samples.
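Under the definitions above, the three vowel-space measures can be sketched as follows, assuming each vowel token is an (F1, F2) pair in Hz and taking the median as the vowel-space center (our reading of the dispersion formula; values are illustrative):

```python
import math

def formant_dispersion(tokens):
    """Mean Euclidean distance from each (F1, F2) token to the median
    (F1, F2) point of the set."""
    def median(xs):
        xs = sorted(xs)
        m = len(xs) // 2
        return xs[m] if len(xs) % 2 else (xs[m - 1] + xs[m]) / 2
    c1 = median([t[0] for t in tokens])
    c2 = median([t[1] for t in tokens])
    return sum(math.hypot(t[0] - c1, t[1] - c2) for t in tokens) / len(tokens)

def delta_dispersion(tokens_20, tokens_50):
    """Change in formant dispersion between 20% and 50% of the vowel."""
    return formant_dispersion(tokens_20) - formant_dispersion(tokens_50)

def phonetic_competition(tokens, categories):
    """Sum of inverse squared (F1, F2) distances over all pairs of tokens
    drawn from different vowel categories."""
    total = 0.0
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if categories[i] != categories[j]:
                d2 = (tokens[i][0] - tokens[j][0]) ** 2 + (tokens[i][1] - tokens[j][1]) ** 2
                total += 1.0 / d2
    return total
```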

Pausing and Timing Measures
• Coefficient of variation of syllable duration (diadochokinetic speech samples only): In addition to coefficients of variation at the consonant and vowel level individually, we also calculated the coefficient of variation over Consonant-Vowel syllable durations (e.g., "pa", "ta", "ka") in the Diadochokinetic speech task.
• Coefficient of variation of intersyllable duration (diadochokinetic speech samples only): Intersyllable duration is the amount of time between the end of one syllable and the start of the next. This provides a measure of how much variability there is in the duration of pauses in the Diadochokinetic Speech task (Lozano-Goupil et al., 2022). If there is low variability, then the participants are producing the syllables at a nearly constant rate; if there is high variability, then they are not.
• Total number of pauses (normalized by number of words produced; read and spontaneous speech samples only): We calculated the number of pauses, defined as any silence longer than 150ms. We divided this number by the total number of words the participant produced, to account for differences in total speech sample duration (this could differ even in the Read speech, as participants sometimes stumbled over words, repeated fragments, overlooked words, etc.). Note we excluded this measure for the Diadochokinetic Speech task, as the number of pauses was pre-specified by the nature of the task. Pausing has previously been studied in the high-risk group by Sichlinger et al. (2019) and Stanislawski et al. (2021).
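The pausing/timing measures above reduce to a few small computations. A sketch (we use the sample standard deviation for the coefficient of variation; the text does not specify sample vs. population):

```python
import statistics

def coefficient_of_variation(durations):
    """CV = standard deviation / mean, a scale-free variability measure."""
    return statistics.stdev(durations) / statistics.mean(durations)

def intersyllable_durations(syllable_intervals):
    """Gaps between the end of each syllable and the start of the next."""
    return [nxt[0] - cur[1]
            for cur, nxt in zip(syllable_intervals, syllable_intervals[1:])]

def pauses_per_word(silence_durations, n_words, min_pause=0.150):
    """Number of silences of at least 150 ms, normalized by word count."""
    return sum(1 for s in silence_durations if s >= min_pause) / n_words
```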
Note: We also studied a number of other speech measures that we do not discuss or report detailed results for here. In particular, we also studied pairwise overlap measures between individual stop categories (e.g., /p/-/t/), finding null results, mirroring the vowel overlap measures presented here. With regards to validation measures (i.e., non-speech motor measures), we also studied the SMAP-R scale (Sensorimotor and Activity Psychosis-Risk score; Damme et al., 2021), but only had this measure for ~30 participants and found null results, mirroring the results reported here. Finally, in post-hoc exploratory analyses, we also studied mean and variance speech measures, which we report on in SM 2.4 and SM 2.5.

Inspection of automated tool performance on our data
The creators of the automated tools we use report high levels of reliability. However, to further validate these tools, we examined the predicted speech measure values on our collected speech samples and compared them against established norms. Overall, the predicted speech measure values matched expected average values, providing further evidence that these tools are reliable on our data.

DDKtor: Voice-Onset-Times
As reported in the main text, the DDKtor software matches human annotations of diadochokinetic segment duration with correlations of r=0.85-0.90 and matches human annotation of diadochokinetic speech rate with correlations of r=0.94-0.97. The plotted density distributions for voiceless voice-onset-times (output from DDKtor on our collected diadochokinetic speech samples; Figure S1) match expectations, and provide additional evidence of DDKtor's reliability.

AutoVOT: Voice-Onset-Times
As reported in the main text, the AutoVOT software, which predicts voice-onset-times, parallels inter-rater reliability rates, with ~90% of its predicted voice-onset-times falling within 10-15ms of gold-standard human annotation. Figure S2 shows the predicted voice-onset-times in our collected read (top) and spontaneous (bottom) speech samples. As with DDKtor, this plot provides further evidence of AutoVOT's reliability, as the density plots match expectations based on previously-established norm voice-onset-time values.

FastTrack: Formants
FastTrack is reported to have an average error of ~20Hz, with 98.9% of vowels having errors of less than 5% from the human-annotated value. We inspected the performance of FastTrack on our data by visualizing the extracted values and comparing the average value extracted from our data to the average formant values reported for American English speakers by Hillenbrand et al. (1995). Figure S3 shows the predicted F1/F2 formant values by speech task.

Correlations with clinical/motor/risk variables
Next, we correlated the measures that consistently differentiated the CHR vs HC groups (i.e., variability in voiceless stop voice-onset-times and variability in speech rates) with SIPS scores, finger tapping scores, and risk scores, but generally found no significant correlations in either the AMR or SMR speech samples.

Clinical High-Risk vs. Healthy Control group differences
Results are presented in Figure S7. In addition to the analyses reported in the main text, we found that voiced consonant production, vowel production, and pausing did not differentiate Clinical High-Risk vs. Healthy Controls in our data. In particular, none of the following speech measures significantly predicted group status in the Rainbow Passage task: coefficient of variation of voiced stop consonants /b, d, g/ (β = -0.03, s.e. = 0.07, t = -0.43, p = 0.667), coefficient of variation of vowel duration (β = 0.01, s.e. = 0.02, t = 0.82, p = 0.417), average formant dispersion at 20% of the vowel (β = 0.03, s.e. = 0.03, t = 0.99, p = 0.327), change in formant dispersion between 20% and 50% of the vowel (β = -0.08, s.e. = 2.75, t = -0.03, p = 0.977), overlap between vowel categories, calculated as the phonetic competition between different vowel types (β = 0.12, s.e. = 0.26, t = 0.46, p = 0.643), and number of pauses per word produced (β = 0.08, s.e. = 0.05, t = 1.51, p = 0.133).

Figure S7: Other than those reported in the main text, we found no significant CHR vs HC group differences in speech measures in the reading task. This graph presents vowel and pausing/timing measures. Each black dot represents one participant; the white dot represents the average value across participants.

Correlations with clinical/motor/risk variables
Next, we tested whether the two speech measures that showed group differences (coefficient of variation in voiceless voice-onset-times and coefficient of variation in speech rate) correlated with clinical, non-speech motor, and risk variables.We predicted that increased variability in speech production would correlate with worse symptom severity, increased variability on other motor measures, and with a heightened risk of conversion to psychosis.However, just as in the Diadochokinetic speech task analyses, we found that these speech measures mostly did not correlate with motor, clinical, or risk measures.

Clinical High-Risk vs. Healthy Control group differences
Results are presented in Figure S10. In addition to the null results reported in the main text, we found that voiced consonant production, vowel production, and pausing did not differentiate Clinical High-Risk vs. Healthy Controls in the procedural description task data. In particular, none of the following speech measures significantly predicted group status in the Peanut Butter & Jelly task: coefficient of variation of voiced stop consonants /b, d, g/ (β = 0.15, s.e. = 0.11, t = 1.36, p = 0.177), coefficient of variation of vowel duration (β = 0.01, s.e. = 0.03, t = 0.36, p = 0.718), average formant dispersion at 20% of the vowel (β = -0.01, s.e. = 0.03, t = -0.3, p = 0.768), change in formant dispersion between 20% and 50% of the vowel (β = -2.44, s.e. = 3.48, t = -0.7, p = 0.484), overlap between vowel categories, calculated as the phonetic competition between different vowel types (β = 0.09, s.e. = 0.12, t = 0.76, p = 0.449), and number of pauses per word produced (β = 0.07, s.e. = 0.08, t = 0.84, p = 0.403). Because none of the acoustic speech tasks showed significant results, we did not test for correlations with non-speech motor/clinical/risk measures.

Figure S10: We found no significant CHR vs HC group differences in any speech measures in the spontaneous procedural description task. This graph presents vowel and pausing/timing measures. Each black dot represents one participant; the white dot represents the average value across participants.

Results studying mean speech measures
The main text focused on coefficient of variation measures. In exploratory, post-hoc analyses, we also studied whether we observed group differences in mean speech measures. In particular, we tested whether there were CHR vs. HC group differences in average voice-onset-times, average vowel durations, average speech rates, and, in the diadochokinetic speech tasks, average syllable and intersyllable durations. Past work has found that individuals with, and at risk for, psychosis speak more slowly than healthy controls, so we might expect to find a similar effect here, as well as evidence that CHR participants produce longer consonants, vowels, syllables, and intersyllable durations. However, we found no evidence of CHR vs. HC group differences in speech rate or consonant/vowel/syllable/intersyllable durations. We present these results by speech task.

Categorization accuracy results
For each of the speech measures that significantly differed between CHR and HC participants, we also ran logistic regression models predicting CHR vs. HC status from the acoustic measure, to test the categorization accuracy/discriminability of the speech measures. In contrast to the previous sections, we present these results by speech measure.

Voiceless voice-onset-time
We find that the degree of variability in the voice-onset-time of voiceless consonants predicts CHR vs. HC status with 65.38% accuracy in the Diadochokinetic-AMR subtask, 60.58% accuracy in the Diadochokinetic-SMR subtask, and 56.67% accuracy in the read subtask (with a 50% probability cut-off in the logistic regression). We present the full ROC curve (with sensitivity and specificity values) in Figure S17. While the categorization results are considered inadequate (Hosmer et al., 2013; Mandrekar, 2010), we would not expect speech measures on their own to be able to classify CHR vs. HC, so the fact that the measures perform above chance is in itself promising. This suggests that, combined with other measures, speech measures could potentially be useful in identifying individuals at high risk for psychosis.
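The categorization analysis amounts to a one-predictor logistic regression with a 50% probability cut-off. A self-contained sketch on synthetic data (the study presumably used standard statistical software; the gradient-descent fit and all values below are ours, not the study's):

```python
import math

def fit_logistic(x, y, lr=0.5, steps=10000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi
            g1 += (p - yi) * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def accuracy(x, y, b0, b1, cutoff=0.5):
    """Proportion classified correctly at the given probability cut-off."""
    correct = 0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        correct += int((p >= cutoff) == bool(yi))
    return correct / len(x)

# Synthetic example: higher variability values tend to go with group 1 (CHR).
x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
acc = accuracy(x, y, b0, b1)
```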

Speech rate
We find that the coefficient of variation in speech rate predicts CHR vs. HC status with 60.58% accuracy in the Diadochokinetic-AMR subtask, 56.73% accuracy in the Diadochokinetic-SMR subtask, and 56.67% accuracy in the read subtask (with a 50% probability cut-off in the logistic regression). We present the full ROC curve (with sensitivity and specificity values) in Figure S18. Again, the fact that the categorization is above chance suggests that speech measures could be helpful in identifying high-risk individuals, in conjunction with other complementary measures.

Combining speech and other measures
Finally, we test how well combining speech measures with other motor measures categorizes group status. Specifically, we fit a logistic regression model that predicts CHR vs. HC status from a combination of the speech measures that were found to significantly differ by group (voiceless voice-onset-time and speech rate coefficients of variation) and finger-tapping coefficients of variation (dominant hand). We find that this combination of speech and non-speech motor measures predicts group status with 75.32% accuracy in the Diadochokinetic-AMR subtask, 66.23% accuracy in the Diadochokinetic-SMR subtask, and 70.11% accuracy in the read speech task (see Figure S19 for the ROC curve). With the exception of the Diadochokinetic-SMR subtask, this improves over a model that simply uses non-speech motor measures (i.e., finger-tapping coefficients of variation) to predict CHR vs. HC status, which achieves 66.23% accuracy on the Diadochokinetic-AMR subtask, 66.23% accuracy on the Diadochokinetic-SMR subtask, and 64.37% accuracy on the read speech task. Overall, this suggests that, when combined with other clinical and motor measures, speech measures have diagnostic value, and this should be further studied in the future.

Results comparing the in-person and remote subgroups
Due to space constraints, the main text only presented some of the graphs comparing the subset of participants tested in-person prior to the pandemic and the subset tested remotely during the pandemic. Here, we present the full set of results, in which we often observe quite large CHR vs. HC group differences in the In-Person subgroup and reduced group differences in the Remote subgroup. For each of the two speech measures that showed significant group differences overall, we tested whether they also showed significant group differences within both the in-person and remote subsets, across the three tasks that showed significant group effects (i.e., the Diadochokinetic-AMR, Diadochokinetic-SMR, and Read speech tasks).

Figure S20: Comparison of variability in voiceless voice onset time between Clinical High-Risk (left bar in each subplot) and Healthy Control participants (right bar in each subplot), broken down by whether the participants were recorded in-person prior to the pandemic (left plot in each row) or remotely during it (right plot in each row). (A) shows results for the Diadochokinetic-AMR (papapa/tatata/kakaka) speech task; (B) shows results for the Diadochokinetic-SMR (pataka/katapa) speech task; and (C) shows results for the Read speech task. Each black dot represents one participant; the white dot represents the average value across participants. Overall, we observe starker group differences in the in-person subgroup, relative to the remote subgroup.

Each black dot represents one participant; the white dot represents the average value across participants. Overall, compared to the variability in consonant production, results are qualitatively more similar across the in-person and remote groups (with the exception of the Diadochokinetic-SMR subtask, where we again observe a starker group difference in the in-person subgroup, relative to the remote one).

Analyses studying the relationship between speech measures and demographics
Past work has found that some measures intended to capture speech/language differences actually tapped into sociodemographic factors. We tested whether this explained the group differences in speech variability by studying the relationship between the speech measures and all sociodemographic factors reported in Table 3 of the main text (age, sex, race, ethnicity (Hispanic vs. not), and first language), within the Diadochokinetic-AMR, SMR, and Read speech tasks (i.e., those that showed significant group differences). To do so, for each sociodemographic factor, for each subtask, for each speech measure, we fit a linear regression model predicting the speech measure from two predictors: the sociodemographic factor and group (CHR vs HC), without interactions. When studying variability in voiceless stop production, we additionally included a third predictor, average speech rate, as a control. Our results suggest that sociodemographics cannot explain the group differences in speech measure variability observed in the main paper.
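The control analysis above corresponds to an ordinary least squares regression predicting the speech measure from the sociodemographic factor and group, with no interaction. A minimal sketch on synthetic data (the solver and all values below are ours, not the study's):

```python
def ols(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

# Design: intercept, sociodemographic factor (here, age), group (1 = CHR).
X = [[1, a, g] for a, g in [(20, 0), (22, 0), (24, 0), (20, 1), (22, 1), (24, 1)]]
y = [1.0, 1.1, 1.2, 1.5, 1.6, 1.7]  # synthetic measure: age slope 0.05, group effect 0.5
coef = ols(X, y)  # coef[2] is the group coefficient after controlling for age
```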

Race
To study the potential role of racial identity, we focused our analyses on pairwise comparisons between (i) individuals who self-identified as Black, (ii) individuals who self-identified as Asian, and (iii) individuals who self-identified as white. This was done to ensure sufficiently-sized comparison groups. Even with this decision, our conclusions are still limited by sample size (e.g., some N's near 10), and future work should follow up on this question using samples large enough to allow for fully-powered, and more nuanced, analyses.
Results are visually presented in Figure S24. Across all three tasks and all pairwise comparisons, we find no relationship between racial identity and variability in speech measures. In particular, within the pairwise comparison of Asian and white participants, we find no relationship between racial identity and either variability in voiceless voice-onset-times (AMR: β = 0.04, s.e. = 0.06, t = 0.59, p = 0.558, SMR: β = 0.07, s.e. = 0.07, t = 1.11, p = 0.272, Read: β = 0.03, s.e. = 0.04, t = 0.64, p = 0.527) or variability in speech rate.

We found no relationship between participant ethnicity (Hispanic vs. Not Hispanic) and speech measures in our sample. The top row shows results for the variability in voiceless consonant production measure, while the bottom row shows results for the variability in speech rate measure. The leftmost column corresponds to the Diadochokinetic-AMR (papapa/tatata/kakaka) speech task; the middle column corresponds to the Diadochokinetic-SMR (pataka/katapa) speech task; the rightmost column corresponds to the Read speech task. Each black dot is one participant; the white dot is the average across participants.

Analyses studying the relationship between non-speech motor measures vs. symptom severity and risk of conversion
Past work has established that motor symptoms (e.g., finger tapping scores) correlate with symptom severity and risk of conversion scores in individuals at clinical high-risk for psychosis. To better understand why we did not observe correlations between speech measures and our validation measures, we checked for these relationships in our sample. Surprisingly, we found no correlations between finger tapping scores and either symptom severity or risk of conversion scores (Figure S27).

Clinical High-Risk vs. Healthy Control group differences
We found qualitatively similar, though statistically different, group difference results to those reported in the main text. We found that the coefficient of variation of voiceless consonants significantly predicted CHR vs HC status in the Diadochokinetic SMR ("pataka"/"katapa") speech samples, but was just above the significance threshold in the Diadochokinetic AMR ("papapa"/"tatata"/"kakaka") and Read speech samples. We found that the coefficient of variation of speech rate still significantly predicted group status (CHR vs. HC) in the Diadochokinetic AMR and Diadochokinetic SMR speech samples, but was just above the significance threshold in the Read speech samples. Finally, we additionally found that the coefficient of variation in intersyllable duration significantly predicted CHR vs HC status in the Diadochokinetic SMR speech samples, but not in the Diadochokinetic AMR speech samples. Because of the large number of graphs and their similarity to those reported in the main text, we provide details on the statistical analyses in Table S2. As in the main text, we then tested whether these two speech measures were related to symptom severity, non-speech motor measures, and risk of conversion variables, as measures of clinical, convergent, and predictive validity. Given that we reported results for Read speech in the main text and its group difference results were just above the significance threshold with the two participants removed, we also report correlation results for Read speech here.
The consonant results (coefficient of variation of voiceless VOTs) were qualitatively similar to those reported in the main text (Table S3): we found no significant relationships between the speech measure and any of the clinical/motor/risk variables examined in any of the speech samples, with the exception of one significant positive correlation between variability in consonant duration and SIPS-RC risk of conversion scores in the Diadochokinetic-AMR subtask only. The speech rate results were identical to those reported in the main text (Table S4). In particular, as in the main text, we still observed a significant positive relationship between variation in speech rate and variation in finger tapping in the non-dominant hand in the read speech samples.
We observed no other significant correlations between the speech measure and clinical/motor/risk variables.Overall, as in the main text, we generally failed to find evidence of clinical, convergent, and predictive validity of the studied speech measures.

Figure S1: Density distribution of voiceless stop voice-onset-times (VOTs) in the diadochokinetic (DDK) speech samples, as output from the DDKtor software.

Figure S2: Density distribution of voiceless and voiced stop voice-onset-times (VOTs) in the read (top) and spontaneous (bottom) speech samples, as output from the AutoVOT software.

Figure S4: CHR vs. HC group comparisons for vowel and pausing/timing speech measures in the (A) Diadochokinetic AMR and (B) Diadochokinetic SMR subtasks. In addition to the significant results reported in the main text, we found that average formant dispersion at 20% of the vowel was significantly higher in the CHR group than in the HC group in the Diadochokinetic-AMR subtask, but we observed no other significant group differences. Each black dot represents one participant; the white dot represents the average value across participants.

Figure S5: Variability in voiceless stop production vs. SIPS (symptom) scores, finger-tapping (motor) scores, and SIPS-RC risk of conversion scores in the (A) Diadochokinetic-AMR subtask and (B) Diadochokinetic-SMR subtask. Variability in voiceless stop production was positively correlated with risk scores in the AMR subtask, but showed no significant correlations with any other measures in either subtask. Each point represents one participant; the line represents the line of best-fit, with shaded regions showing standard errors of the regression fit.
Figure S6: Variability in speech rate vs. SIPS (symptom) scores, finger-tapping (motor) scores, and SIPS-RC risk of conversion scores in the (A) Diadochokinetic-AMR subtask and (B) Diadochokinetic-SMR subtask. Variability in speech rate showed no significant correlations with any validation measures in either subtask. Each point represents one participant; the line represents the line of best-fit, with shaded regions showing standard errors of the regression fit.

Figure S8: We observed no significant correlations between variability in voiceless voice-onset-time production and clinical/non-speech motor/risk variables in the Read speech task. Each point represents one participant; the line represents the line of best-fit, with shaded regions showing standard errors of the regression fit.
Figure S9: With the exception of finger tapping in the non-dominant hand, we observed no significant correlations between variability in speech rate and clinical/non-speech motor/risk variables in the Read speech task. Each point represents one participant; the line represents the line of best-fit, with shaded regions showing standard errors of the regression fit.

Figure S14: We observed no significant group differences in mean speech measure values in either the (A) Diadochokinetic AMR or (B) Diadochokinetic SMR subtasks. Each black dot represents one participant; the white dot represents the average value across participants.
Figure S17: Specificity/sensitivity trade-off for the logistic regression categorization model predicting CHR vs. HC status from the coefficient of variation of voiceless voice-onset-time.
Figure S18: Specificity/sensitivity trade-off for categorization model predicting CHR vs. HC status from the coefficient of variation of speech rate.
Figure S19: Specificity/sensitivity trade-off for the categorization model predicting CHR vs. HC status from both speech measures and non-speech motor measures (finger-tapping coefficient of variation in the dominant hand).
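As background for the specificity/sensitivity trade-offs shown in Figures S17-S19, such a curve can be traced by sweeping a decision threshold over classifier scores. The sketch below uses simulated coefficient-of-variation values (the group means, spreads, and sample sizes are invented for illustration, not the study's data) and thresholds the measure directly; with a single predictor, logistic regression yields probabilities that are a monotone transform of that predictor, so the resulting trade-off curve is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated coefficient-of-variation values for the two groups
# (hypothetical means/spreads; CHR drawn with higher average variability)
hc = rng.normal(loc=0.20, scale=0.05, size=67)    # healthy controls
chr_ = rng.normal(loc=0.28, scale=0.07, size=58)  # clinical high-risk

scores = np.concatenate([hc, chr_])
labels = np.concatenate([np.zeros(len(hc)), np.ones(len(chr_))])

# Each candidate threshold yields one (sensitivity, specificity) pair:
# sensitivity = proportion of CHR participants correctly flagged,
# specificity = proportion of HC participants correctly cleared.
thresholds = np.sort(scores)
sensitivity = np.array([(scores[labels == 1] >= t).mean() for t in thresholds])
specificity = np.array([(scores[labels == 0] < t).mean() for t in thresholds])
```

Raising the threshold trades sensitivity for specificity; plotting one against the other (or sensitivity against 1 - specificity, the ROC curve) displays the full trade-off rather than a single operating point.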

Figure S21: Comparison of variability in speech rate between Clinical High-Risk (left bar in each subplot) and Healthy Control participants (right bar in each subplot), broken down by whether participants were recorded in person before the pandemic (left plot in each row) or remotely during it (right plot in each row). (A) shows results for the Diadochokinetic-AMR (papapa/tatata/kakaka) speech task; (B) shows results for the Diadochokinetic-SMR (pataka/katapa) speech task; and (C) shows results for the Read speech task. Each black dot represents one participant; the white dot represents the average value across participants. Overall, compared to variability in consonant production, results are qualitatively more similar across the in-person and remote groups (with the exception of the Diadochokinetic-SMR subtask, where we again observe a starker group difference in the in-person subgroup relative to the remote one).

Figure S22: We found no consistent relationship between speech measures and age in our sample. The top row shows results for the variability in voiceless consonant production measure, while the bottom row shows results for the variability in speech rate measure. The leftmost column corresponds to the Diadochokinetic-AMR (papapa/tatata/kakaka) speech task; the middle column corresponds to the Diadochokinetic-SMR (pataka/katapa) speech task; the rightmost column corresponds to the Read speech task. Each point represents one participant; the lines represent the lines of best-fit by participant group (CHR vs. HC), with shaded regions showing standard errors of the regression fit.

Figure S23: We found no relationship between participant sex and speech measures in our sample. The top row shows results for the variability in voiceless consonant production measure, while the bottom row shows results for the variability in speech rate measure. The leftmost column corresponds to the Diadochokinetic-AMR (papapa/tatata/kakaka) speech task; the middle column corresponds to the Diadochokinetic-SMR (pataka/katapa) speech task; the rightmost column corresponds to the Read speech task. Each black dot represents one participant; the white dot represents the average value across participants.
Figure S25: We found no relationship between participant ethnicity (Hispanic vs. Not Hispanic) and speech measures in our sample. The top row shows results for the variability in voiceless consonant production measure, while the bottom row shows results for the variability in speech rate measure. The leftmost column corresponds to the Diadochokinetic-AMR (papapa/tatata/kakaka) speech task; the middle column corresponds to the Diadochokinetic-SMR (pataka/katapa) speech task; the rightmost column corresponds to the Read speech task. Each black dot represents one participant; the white dot represents the average value across participants.
Figure S26: We found no relationship between participants' first language (English vs. Other) and speech measure variability. The top row shows results for the variability in voiceless consonant production measure, while the bottom row shows results for the variability in speech rate measure. The leftmost column corresponds to the Diadochokinetic-AMR (papapa/tatata/kakaka) speech task; the middle column corresponds to the Diadochokinetic-SMR (pataka/katapa) speech task; the rightmost column corresponds to the Read speech task. Each black dot represents one participant; the white dot represents the average value across participants.

Table S2: Summary of Clinical High-Risk vs. Healthy Control group difference analysis outcomes with two clinical-high-risk participants excluded. Bold denotes a significant difference (p < 0.05); italics denotes a trending difference (p < 0.10).


Table S3: Summary of correlational analyses relating the coefficient of variation in voiceless VOTs to symptom, motor, and risk variables with two participants excluded from the analyses.

Table S4: Summary of correlational analyses relating the coefficient of variation in speech rates to symptom, motor, and risk variables with two participants excluded from the analyses.