Testosterone therapy masculinizes speech and gender presentation in transgender men

Voice is one of the most noticeably dimorphic traits in humans and plays a central role in gender presentation. Transgender males seeking to align internal identity and external gender expression frequently undergo testosterone (T) therapy to masculinize their voices and other traits. We aimed to determine the importance of changes in vocal masculinity for transgender men and to determine the effectiveness of T therapy at masculinizing three speech parameters: fundamental frequency (i.e., pitch) mean and variation (fo and fo-SD) and estimated vocal tract length (VTL) derived from formant frequencies. Thirty transgender men aged 20 to 40 rated their satisfaction with traits prior to and after T therapy and contributed speech samples and salivary T. Similar-aged cisgender men and women contributed speech samples for comparison. We show that transmen viewed voice change as critical to transition success compared to other masculine traits. However, T therapy may not be sufficient to fully masculinize speech: while fo and fo-SD were largely indistinguishable from cismen, VTL was intermediate between cismen and ciswomen. fo was correlated with salivary T, and VTL associated with T therapy duration. This argues for additional approaches, such as behavior therapy and/or longer duration of hormone therapy, to improve speech transition.

and stable within-participant reductions in f o after a minimum of ~ 3-4 months on T therapy [32][33][34][35] . Accordingly, participants typically perceived their own voices to be lower in pitch 32 and participants with lower f o reported a higher likelihood of being perceived as male over the phone 35 . Although T therapy often plays a significant role in female-to-male transition-noted by f o values that are largely indistinguishable from those of cismen for many speakers 36 -not all individuals experience enough f o lowering to reach the typical range of cismen. A meta-analysis found that 21% of patients fail to reach the f o range of cismen after one year on T therapy 37 , by which time voice f o lowering has typically reached an asymptote regardless of dose regimen 38 . Not surprisingly, an estimated 12-16% of patients are not fully satisfied with their vocal transition 32,37 .
Fundamental frequency (f o ) is often reported to be the most heavily weighted cue for listeners in determining speaker gender identity 39,40 . However, studies of experimentally manipulated speech, in which f o and formant frequencies are varied independently, reveal that listeners rely on more than just f o when making judgments of speaker gender. For example, listeners can correctly identify ciswomen speakers as women even when their f o values are artificially lowered to the range of cismen 41 . When f o and formant frequencies are altered to the same perceptual degree (based on empirically derived just-noticeable-differences), listeners appear to rely more on formant information when making judgments about speaker body size, masculinity, and physical dominance 28,42 (cf. Hillenbrand & Clark 15 ). Thus, speech features other than f o may impact the externally perceived gender identity of individuals undergoing the female-to-male transition, and these additional features, such as formant frequencies or the related measure of VTL, can potentially be used as outcome measures to assess efficacy of transition strategies and increase patient satisfaction. Yet, aside from one case study 43 and an unpublished dissertation 44 , no studies on transgender men have investigated changes in formant measures with T therapy.
The present research has four central research questions. First, how important is masculinization of speech parameters relative to other traits for those undergoing a female-to-male transition? Second, is T therapy effective at masculinizing acoustic properties of speech that drive gender perception: f o (mean and variation, f o -SD) and formant-based estimates of VTL? Research suggests that both are critical to perception of gender; however, little research exists on formant changes in transmen. Third, do higher salivary T levels and a longer duration of T therapy contribute to more masculine speech parameters? Finally, how do transmen rate their satisfaction with speech changes compared to other changes that occurred during T therapy?

Results
How important is masculinization of speech parameters for transmen undergoing T therapy? Voice masculinity was rated by participants as one of the traits they were least satisfied with prior to transition compared with all other traits (see Fig. 1A); 77% of participants rated voice masculinity as a "1" or a "2" on the 1-to-7 scale (1 indicated that they were extremely unhappy with the trait). Furthermore, when asked to rank the importance of observing change relative to other traits, change in voice masculinity was ranked as most important (see Fig. 1B); 83% of participants ranked it as a "6" or "7" (7 indicated that it was very important to see changes).
Is T therapy effective at masculinizing acoustic properties of speech that drive gender perception? Table 1); however, seven transmen (23%) had greater f o -SD than the highest value in the cismen group. Transmen's estimated VTL was significantly longer than that of ciswomen (Cohen's d = 1.0), but shorter than that of cismen (Cohen's d = 1.4). Further, 23% of the VTL estimates in transmen were shorter than the lowest value in our cismen sample. See Table 1 for the mean, standard deviation, and range of speech parameters for the transmen, cismen, and ciswomen groups. See Fig. 2 for visual comparisons of the three groups.
Do higher salivary T levels and a longer duration of T therapy contribute to more masculine speech parameters? Two individuals with very high T levels (2,889 and 794 pg/mL) were identified using Grubbs' outlier test 45 and excluded from the following analyses. Salivary T was significantly correlated with lower mean f o (r = -0.37, p = 0.05), but not with f o -SD (r = -0.14, ns) or VTL (r = 0.03, ns). See Fig. 3. Testosterone levels were significantly correlated with time on T therapy; individuals who had been on therapy for a longer duration had higher T levels (r = 0.39, p < 0.05). See Fig. 4. Therefore, multiple regression models were used to examine the independent contributions of circulating T and time on T therapy to each masculine speech parameter. Salivary T significantly predicted lower f o (β = -0.46, SE = 14.39, t = -2.23, p = 0.04), but T therapy duration did not (β = 0.19, SE = 5.48, t = 0.90, p = 0.35). As a second step, we added the age at beginning of T therapy to the model. Only salivary T was associated with lower f o ; however, it did not reach conventional significance levels (β = -0.38, SE = 14.53, t = -1.83, p = 0.08). In a separate model predicting VTL, T therapy duration was a significant predictor (β = 0.46, SE = 0.25, t = 2.30, p = 0.03), while salivary T (β = -0.09, SE = 0.67, t = -0.41, p = 0.69) and age at beginning T therapy (β = 0.28, SE = 0.03, t = 1.49, p = 0.15) were not. The model predicting f o -SD was not significant.
How do transmen rate their satisfaction with self-perceived speech changes compared to other changes that occurred during T therapy? When asked to rate perceived amount of change following T therapy, voice masculinity was rated as the most changed among the surveyed traits with 72% of participants indicating that voice masculinity was "6" or a "7" on the 1-to-7 scale (7 indicated an enormous amount of change observed in the trait). When asked to rate satisfaction with perceived changes, voice masculinity was rated the highest among all survey traits with 77% indicating a "1" or a "2" (1 indicated that they were extremely satisfied with the changes). See Fig. 5.

Discussion
Voice masculinization is particularly important to transgender individuals undergoing a female-to-male transition; compared with eight other masculinity traits, participants indicated that they were least satisfied with their voice prior to transition and ranked it highest in priority for seeing change. Further, after T therapy (which was effective at masculinizing f o and f o -SD in our participants), transmen were most satisfied with their vocal masculinization compared with other traits. Given its importance in a female-to-male transition and the growing number of individuals undertaking this treatment 46,47 , the need for evidence-based research on voice masculinization is high.
Our results show that, on average, T therapy is effective at masculinizing f o and f o -SD. Transmen's f o values (mean and range) are comparable to those of cismen and statistically significantly lower than ciswomen's f o values. While we do not have recordings of these men prior to T therapy, we can assume that their f o was close to the average for ciswomen and that their f o has since changed by 3.5 standard deviations (or more 28 ), which is nearly 80 Hz. Research suggests that 50% of listeners can detect shifts as low as 1.2 semitones (e.g., 7 Hz for a 100 Hz voice 42 ); therefore, these changes likely have a strong impact on perception of gender. Overall, the current findings are consistent with previous results documenting substantial changes in f o with T therapy in transmen [32][33][34][35][36] and are suggestive of putative anatomical changes resulting from the action of T on the lengthening and thickening of vocal folds, similar to those occurring during puberty in natal males [19][20][21][22][23][24] . To understand the nature of these structural changes, future studies should use imaging techniques to objectively quantify vocal fold length and thickness at regular intervals during T therapy.
T therapy may not be sufficient for achieving formant frequencies that are indistinguishable from those of cismen. Our results showed that transmen's estimated VTL was significantly longer than that of ciswomen but shorter than that of cismen. Of our participants, 23% had VTL estimates that fell below the range of our cismen sample, suggesting that T therapy alone does not fully masculinize larynx position. Despite research indicating that both f o and VTL contribute to gendered voice perception [13][14][15][16] , only one other published study on transmen's voice changes has examined VTL or formants 43 . This motivates development of additional treatments, such as behavioral therapy, to increase objective speech masculinity by increasing vocal tract length 48 .
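The just-noticeable-difference arithmetic cited in the f o discussion above (1.2 semitones corresponding to roughly 7 Hz at 100 Hz) can be checked with a short sketch. It assumes the standard equal-tempered definition of a semitone (12 per octave); the function names are illustrative and are not taken from the study.

```python
import math


def semitones_between(f_ref_hz: float, f_hz: float) -> float:
    """Signed interval from f_ref_hz to f_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / f_ref_hz)


def hz_shift_for_semitones(f_ref_hz: float, semitones: float) -> float:
    """Size in Hz of an upward shift of `semitones` starting at f_ref_hz."""
    return f_ref_hz * (2.0 ** (semitones / 12.0) - 1.0)


# A 1.2-semitone shift at a 100 Hz voice, as in the example in the text.
jnd_hz = hz_shift_for_semitones(100.0, 1.2)
print(f"{jnd_hz:.1f} Hz")  # prints "7.2 Hz"
```

Because the semitone scale is logarithmic, the same 1.2-semitone step corresponds to a larger shift in Hz at higher starting frequencies, which is why the example is tied to a 100 Hz voice.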
Previous studies on transmen's speech changes have shown that most changes have occurred prior to 9 months of continuous T therapy [32][33][34][35]49 ; however, these studies did not examine changes in estimated VTL. This study is the first, to our knowledge, to demonstrate statistical differences in VTL between samples of transmen and cisgender speakers [see Cler et al. 43 for a single, detailed case study and Papp 44 for an unpublished dissertation].
Incomplete masculinization of VTL (as well as f o -SD) may partly explain why 17% of our participants reported that they were 'neutral' to 'extremely dissatisfied' with changes in their vocal masculinity. This accords with previous studies showing 12-16% of patients are not fully satisfied with their vocal transition and 25% were still sometimes perceived as female on the phone 32,37 . Further, 31% expressed interest in further masculinizing their speech through additional treatments like behavioral voice therapy 32 . Despite the need for behavioral voice therapy among transmen, only one published study has examined its efficacy 48 . This is in contrast to transfeminine individuals where it has been shown to help individuals express their gender identity through speech, reduce gender dysphoria, and improve mental health and quality of life [50][51][52][53] .
Some of the acoustic properties of speech that drive gender perception were associated with features of T therapy. We found a significant inverse association between salivary T and f o ; however, the results appear largely driven by 3 data points (see Fig. 3). Given the small sample size, it is unclear whether these individuals represent the normal range of variation. The association between current salivary T and f o makes theoretical sense given the longer-term association between T administration and f o change in transmen, the presence of androgen receptors on the vocal folds 54,55 , and the associations between T and f o during puberty [18][19][20][21][22][23] . In addition, several studies have found links between salivary T and masculine vocal parameters in cismen 25,26,28 (cf. Arnocky et al. 29 ), and one study of cismen showed within-individual diurnal decreases in salivary T were associated with increases in f o 27 . Given the strong empirical and theoretical support for an association between T and f o , it is surprising that two previous studies on female-to-male speech changes 35,36 did not find an association between serum T levels and f o .
Although there was not a significant association between salivary T and estimated VTL, T therapy duration was statistically significantly associated with VTL: longer T therapy durations were associated with longer estimated VTLs. This finding may suggest a longer-term relationship between T therapy and VTL. However, an alternative explanation is that this association reflects the confounding effect of time since transition given its close association with duration of T therapy. That is, even without formal voice training, transmen may be implicitly learning how to manipulate their vocal tract over time to achieve longer VTLs. Clinical studies on the relationships among dosing regimen, biological T availability, and speech parameters among transmen are necessary.
In summary, we see two important implications of these findings. First, a voice with a low pitch is a central aspect of masculine gender presentation because it is easily observable, highly sexually dimorphic, and difficult for anyone other than an adult male to approximate. Vocal fold dimorphism is one of the largest anatomical sex differences observed in humans (approximately 5 standard deviations 28,56 ) and greater than in any other extant ape 57 . Cisgender men and women differ by 60% in vocal fold length 30 but only 8% in height 58 . Because vocal sexual dimorphism is extensive and, importantly, features little overlap in gender-typical vocal ranges, it is extremely difficult to speak in a voice consistent with the opposite sex, particularly in a sustained fashion 31 . These facts help explain why transmen are so dissatisfied with their f o prior to T therapy. Similarly, our participants were also highly dissatisfied with body fat distribution, which is also very dimorphic 58,59 , easily observable, and difficult to change without hormonal therapy.
A second implication of these findings is that more research on speech changes in transgender males is necessary. The studies that have been published are limited by small sample sizes [32][33][34]49 , a lack of a control group for comparisons 32,33,35 , and a focus on only f o 34,49 . Additional research on T dosing regimens and on the efficacy of behavioral voice therapy is particularly necessary. Better evidence-based treatments for transmen have health and safety repercussions. Transgender individuals are disproportionately targets of violence, and being viewed as one's gender is likely a critical component of safety [60][61][62][63] . Approximately 20-47% of transgender individuals have been physically or sexually assaulted and an additional 34-46% have been verbally threatened or harassed 62,64 .
In contrast to voice masculinization, participants did not place high importance on seeing an effect of T therapy on the non-physical trait "psychological masculinity", highlighting participants' dissociation between their own perception of gender and outward display of gender prior to therapy 65 . This incongruence is a source of extreme distress, which is associated with higher levels of depression, anxiety, substance abuse, and suicidal ideation and attempts among the transgender population-particularly those that have not begun to transition 5,7,9,[65][66][67][68] . Receiving hormone treatment significantly improves mental health, social health, and physical health outcomes in transgender populations 2,7,9,66 . Vocal congruence contributes to these improvements; Watt et al. 2 showed that more masculine voices significantly contributed to improved well-being and mental health in female-to-male transgender patients.
To summarize, this research was designed with several goals in mind. First, we aimed to quantify the importance of voice change-relative to other masculine traits-for transgender men undergoing testosterone therapy. No previous studies have explored this question, in spite of the strong interest in voice change among the transmasculine population. Our results show that voice masculinization is of central importance to transgender individuals undergoing the female-to-male transition compared with eight other masculine traits. Second, we asked whether T therapy was effective at masculinizing three gendered speech parameters. Our results show that, on average, T therapy is effective at masculinizing fundamental frequency mean and variation (f o and f o -SD); however, transmen's formant-based measure of vocal tract length (VTL) was significantly shorter than cismen. This study is the first, to our knowledge, to demonstrate statistical differences in VTL between samples of transmen and cisgender speakers. Third, we examined the association between salivary testosterone and vocal parameters. We found a significant inverse association between salivary T and fundamental frequency but no association with VTL. T therapy duration, however, was statistically significantly associated with VTL. These findings point to the need for more research on speech changes in transgender males-of particular importance are transition strategies that affect formant frequencies, which have largely been ignored in previous research.

Method
Voice Recording. All speakers were asked to recite, in their normal speaking voice, the first sentence of the Rainbow Passage, numbers 1 to 10, and vowel sounds in the order /ε/ /i/ /ɒ/ /o͡ʊ/ /u/. Cis- and transmen's voice samples were collected in a quiet room using an Audio-Technica AT4041 Cardioid Condenser Microphone connected to a Focusrite Scarlett 2i2 audio interface.
For the ciswomen sample, participants were asked to recite the same vowel sounds in the identical order described above. Voice samples were recorded in a sound-attenuated room using a Neewer NW-700 condenser microphone with a 48 V phantom power supply and a pop filter. Participant voices were recorded using Goldwave version 6.31 software in mono with a sampling rate of 44.1 kHz and 16-bit quantization. The voice recordings were saved as high-quality uncompressed wav files.
For all samples, recordings were analyzed using Praat version 5.3 69 for mean f o and standard deviation in f o across the utterance (f o -SD) using the 'voice report' function. Pitch floor was set to 75 Hz and pitch ceiling was set to 300 Hz for cismen and transmen, and 100 Hz and 500 Hz for ciswomen, which are the recommended parameters for men's and women's voices, respectively. Otherwise, default settings were used. We used the Rainbow Passage for analyses involving f o and f o -SD; however, between-group comparisons (i.e., transmen vs. cismen and transmen vs. ciswomen) were similar when either vowels or counting were used for analyses. We also computed the coefficient of variation of f o (f o -CV) 70 . Transmen were not significantly different from either ciswomen or cismen in f o -CV; therefore, we only report f o -SD below.
In order to estimate VTL, formants were calculated using the acoustic recordings of the vowel sounds /ε/ and /ɒ/. All other vowels were omitted from VTL estimates due to narrow constrictions of the vocal tract (/i/) or lip rounding (/o͡ʊ/ and /u/) that complicate the relationship between VTL and formant frequencies 71 . Praat was used to generate a wide-band spectrogram of the acoustic signal, which was then used to calculate the first four formants using the standard formant tracking software in Praat. For each vowel, these automated formants were visually inspected, and the settings of the tracking software were adjusted until the formants aligned with the spectral representation of the signal. Formant values were calculated over the central stable part of the vowel. Third (F3) and fourth (F4) formant values for each participant were calculated by averaging the values from both vowels. These formant values were then used to calculate VTL estimates via Eq. (1) 72 , which shows an inverse linear relationship between formants and VTL that is derived from modeling the vocal tract as a uniform tube that is closed at one end (i.e., the vocal folds) and open at the other (i.e., the mouth):
$$\mathrm{VTL} = \frac{(2n - 1)\,c}{4 F_n} \tag{1}$$
In Eq. (1), n is the formant number, F n is the formant frequency (in Hz), and c is the speed of sound. Finally, VTL estimates from F3 and F4 were averaged to give a single VTL estimate for each participant. F3 and F4 were used because higher formants tend to be more stable and provide a better estimate of VTL 73 .
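The VTL estimation step described above can be sketched in a few lines. This is a minimal illustration that assumes the standard quarter-wavelength formula for a uniform tube closed at one end, VTL = (2n - 1)c / (4 F n ), and a speed of sound of 35,000 cm/s. Neither the constant nor the function names come from the study, and the formant values in the example are hypothetical.

```python
# Assumed speed of sound in warm, humid air, in cm/s (not stated in the text).
SPEED_OF_SOUND_CM_S = 35000.0


def vtl_from_formant(n: int, formant_hz: float,
                     c: float = SPEED_OF_SOUND_CM_S) -> float:
    """VTL in cm from the n-th formant, assuming a uniform tube closed at one end."""
    if n < 1 or formant_hz <= 0:
        raise ValueError("n must be >= 1 and formant_hz must be positive")
    return (2 * n - 1) * c / (4.0 * formant_hz)


def estimate_vtl(f3_hz: float, f4_hz: float,
                 c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Average the VTL estimates obtained from F3 and F4, as described in the text."""
    return (vtl_from_formant(3, f3_hz, c) + vtl_from_formant(4, f4_hz, c)) / 2.0


# Hypothetical formant values chosen so that both formants imply the same length.
print(estimate_vtl(2500.0, 3500.0))  # prints 17.5
```

In a perfectly uniform tube, every formant would imply the same length, as in this contrived example. With real speech the per-formant estimates differ, which is the reason for averaging across formants.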
Saliva collection and analysis. Participants collected saliva samples in a 2-mL cryovial immediately upon waking the morning after they visited the lab to provide questionnaires and voice samples. Mean sample provision start time was 8:44 AM ± 1 h 59 min. Samples were refrigerated immediately after collection and then brought to the research team the following day where they were inspected using the Blood Contamination in Saliva Scale 74 . They were then stored in a -80 °C freezer until they were shipped overnight on dry ice for analysis. Samples were assayed in duplicate for free T using commercially available enzyme-linked immunoassay kits (DRG International, NJ, USA); average intra- and inter-assay coefficients of variation were 6.84% and 8.97%, respectively. For further detail on collection and assay of saliva samples, see Arnocky et al. 29 .
Data analysis. The following variables were natural-log transformed to address skew and increase normality: salivary T, time on T therapy, and f o . A univariate analysis of variance (ANOVA) was used to compare groups (transmen, cismen, and ciswomen) for each of the speech parameters (f o , f o -SD, and VTL). Initially, age was added to the model as a covariate and was then removed because it did not affect the analysis outcome. Linear regressions were used to examine associations between salivary T and speech parameters in transmen.
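As a rough illustration of the transformation step described above, the sketch below natural-log transforms two positive-valued variables and computes a Pearson correlation between them. It is not the authors' analysis code: the function names are hypothetical, and the numbers are made up for illustration and are not study data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("samples must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_transformed_r(raw_x: Sequence[float], raw_y: Sequence[float]) -> float:
    """Natural-log transform both variables, then correlate them."""
    return pearson_r([math.log(x) for x in raw_x],
                     [math.log(y) for y in raw_y])


# Made-up values: salivary T (pg/mL) and mean f o (Hz) for four speakers.
salivary_t = [60.0, 90.0, 140.0, 210.0]
mean_fo = [130.0, 120.0, 112.0, 105.0]
print(round(log_transformed_r(salivary_t, mean_fo), 2))
```

The log transform requires strictly positive inputs, which holds for hormone concentrations, therapy durations, and frequencies in Hz.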