Introduction

Hearing impairment is defined as the “inability to hear specific frequencies at the appropriate intensity” (Elzouki et al., 2011, p. 602). This disability affects language development and imposes limitations on the individual’s social interactions and performance in the community (Kotby et al., 2008). Hearing impairment, one of the most prevalent sensory disabilities, is a serious abnormality that affects the production process and causes articulatory/speech disorders (Al-Dhoun, 2016). In 2012, the World Health Organization suggested that 360 million people in the world suffer from disabling hearing impairment, which means 5.3% of the world’s population suffers from hearing impairment (Mulwafu et al., 2016). Studies in the field of speech, hearing, and communication disorders have tackled several causes of hearing impairment such as cryptogenic deafness (complete deafness in one or two ears), infectious causes, middle ear disease, cerumen impaction, ototoxicity (functional impairment of the tissues of the inner ear caused by therapeutic agents), noise exposure, and other undetermined causes (Mulwafu et al., 2016). Permanent childhood hearing impairment is hereditary in 50% of the cases, nongenetic in 25% of the cases, and unknown in 25% of the cases (Korver et al., 2011).

Concerning levels of hearing impairment, several descriptive labels were endorsed in the audiology literature, although they have never been standardized (Clark, 1981). The exact cut-off points for each level vary slightly according to different audiologists. Following Goodman’s (1965) classification system, the scale of hearing impairment is summarized in Table 1 as the most widely recognized by audiologists.

Table 1 Hearing impairment scale (From Goodman, 1965).

Individuals with hearing impairment have difficulty perceiving sounds because of an abnormality in the anatomic section of the ear responsible for auditory function. Three types of hearing loss have been reported: conductive hearing loss, sensorineural hearing loss, and mixed hearing loss. The first type occurs when the conveyance of sound through the external ear and/or middle ear is mechanically disrupted. The second occurs either when the hair cells in the inner ear fail to convey neural impulses within the cochlea or when there is a problem in transmitting nerve impulses along the acoustic division of the eighth cranial nerve. The third involves damage to both the middle and inner ear structures (Elzouki et al., 2011). Consequently, individuals with hearing impairment may experience problems with vowel production and articulation due to the lack of auditory feedback (see Oster, 1990; Jafari et al., 2016).

Young people with hearing impairment may face difficulties in learning a spoken language, which can negatively impact their educational performance and social interactions (Kotby et al., 2008). Recently, several studies have investigated the vowel production of hearing-impaired speakers (e.g., Nicolaidis and Sfakiannaki, 2007; Löfqvist et al., 2010; Baudonck et al., 2011; Jafari et al., 2016a, 2016b). However, little is known about hearing-impaired speakers in Jordan. Concerning the speech production of Jordanian hearing-impaired speakers, Al-Dhoun (2016) has studied the phonetic and phonological disorders of four hearing-impaired (HI) children in North Jordan. She investigated syllable repair processes, including consonant deletion and gemination, syllable deletion, final consonant substitutions, fronting, glottal replacement, stopping, de-emphasis, and distortion. However, her analysis did not examine the phonetic and phonemic inventory of vowels. Therefore, the present study aims to address this gap in the literature by examining the phonetic and phonemic vowel inventory of JA-speaking children with hearing impairment.

Previous research has demonstrated that cochlear implant (CI) users have significantly better speech intelligibility than conventional hearing aid (HA) users (e.g., Chin et al. 2003; Van Lierde et al. 2005; Horga and Liker, 2006; Kant et al., 2012). In addition, vowel articulation plays a crucial role in understanding the acoustic properties of speech (Vorperian and Kent, 2007). The present study, therefore, aims to compare the acoustic characteristics of vowel production among CI, HA, and normal-hearing (NH) speakers of Jordanian Arabic (JA). Specifically, Praat software (Boersma and Weenink, 2022) will be used to acoustically analyze the following JA vowels: long closed front /i:/, long closed back /u:/, long open central /a:/, long mid-front /e:/, long mid-back /o:/, rising and fronting /aj/, and rising and backing /aw/. The study will examine errors in both monophthongs and diphthongs. The findings of this research could provide valuable insights for speech-language therapists in Jordan in deciding on treatment options for their patients. Studies on hearing impairment can raise awareness among hearing-impaired individuals and enhance their understanding, ultimately increasing their likelihood of successful treatment (Korver et al., 2011). Given the above discussion, this study attempts to answer the following research questions:

  1. How do the acoustic characteristics of vowels produced by the CI, HA, and NH groups differ?

  2. What is the F1–F2 planar area for the three groups (CI, HA, and NH)?

  3. What types of errors do the participants make when producing long vowels and diphthongs?

Background

The Arabic-speaking population is diglossic, which means that speakers use two varieties of the language: colloquial Arabic (the commonly spoken variety) and Modern Standard Arabic (which is used in formal speeches and writing). Modern Standard Arabic has a well-defined vowel inventory, whereas the number of vowels in colloquial dialects varies widely from dialect to dialect, since there is a wide variety of such dialects with different typological classifications (e.g., Bedouin vs. sedentary, Eastern (mašriq) dialects vs. Western (maġrib) dialects, etc.). For details on the different classifications of the Arabic dialects, see Mashaqba (2015, pp. 25–45). This study considers JA as spoken in Amman, the capital city; for details on the linguistics of Ammani-JA, see Al-Wer (2007), Huneety et al. (2021, 2023), Mashaqba et al. (2020, 2021, 2022). The next subsection overviews the vowel inventory of the dialect under investigation.

Vowel inventory in Jordanian Arabic

The vowel inventory of JA is made up of 10 vowel phonemes: three short vowels (the open mid /a/, the close front /i/, and the close back /u/), five long vowels (the open mid /a:/, the close front /i:/, the close back /u:/, the mid-front /e:/, and the mid-back /o:/), and two diphthongs (the rising and fronting /aj/ and the rising and backing /aw/). Long vowels are also contrastive in quality with their short counterparts. The long vowel /a:/ is fronter in articulation than the short /a/, and the long /i:/ and /u:/ are closer in articulation than the short /i/ and /u/. Notably, the long vowels /e:/ and /o:/ are derived historically from the Classical Arabic diphthongs /aj/ and /aw/. For instance, the Classical Arabic *bajt ‘house’ and *lawn ‘color’ are realized as /be:t/ and /lo:n/ in JA, respectively. The long vowels /e:/ and /o:/ are stable and do not fluctuate with /i:/ and /u:/. The contrasts /e:/-/i:/ and /u:/-/o:/ are firmly established and there is no phonetic overlap between these pairs of vowels; see for instance words like gūm ‘Stand up!’ vs. gōm ‘people’, and dēn ‘debt’ vs. dīn ‘religion’ (see Mashaqba, 2015). However, some phonetic variation in vowel distribution is observed across dialects due to dialectal differences and phonological processes. For example, the short vowel /a/ is used in urban JA (e.g. katab ‘to write’), while the short vowel /i/ is used in Bedouin JA (kitab ‘to write’) due to a process of vowel raising in an open syllable. Although these phonetic differences in vowel distribution are not contrastive, they reflect the identity that distinguishes each dialect.

Formant frequencies

The acoustic description of a vowel depends on three main formant frequencies: the first (lowest) formant (F1), the second formant (F2), and the third formant (F3). F1 and F2 are extensively studied in the literature (Horga and Liker, 2006; Liker et al., 2007; Nicolaidis and Sfakiannaki, 2007; Hocevar-Boltezar et al., 2008; Baudonck et al., 2011; Kant et al., 2012; Jafari et al., 2016, among others). F3 is also important for speaker normalization and vowel identification (Vorperian and Kent, 2007). F1 is related to the height of the tongue, indicating the up-down displacement, while F2 is related to the frontness and backness of the tongue, indicating the posterior–anterior displacement (Vorperian and Kent, 2007; Ladefoged and Johnstone, 2010). In JA, front vowels have a high F2, back vowels have a low F2, and mid/central vowels have a high F2. Rounding the lips helps in lowering F2 (Mashaqba, 2015, p. 78). The fundamental frequency (F0) refers to “the lowest component frequency of the speech signal, generated by the vibration of the vocal folds” (Ogden, 2009, p. 175).

Natour et al. (2011) conducted measurements of formant frequencies in normal adult males, adult females, and children in JA, and compared their results with formant frequency measurements of other ethnicities. Figure 1 displays the average formant values for long vowels produced by JA-speaking children, as plotted on an acoustic vowel chart (Natour et al., 2011).

Fig. 1 Acoustic characteristics of long vowels by JA-speaking children.

Vowel space area

Vowel production is limited within the vowel space area (VSA), an abstract area in the oral cavity that represents the possible configurations of the vocal tract that produce vowels (Jafari et al., 2016; Ogden, 2009). Acoustically, it is formed by the Euclidean distances between the F1 and F2 values of the corner vowels in the F1–F2 plane (Kent and Kim, 2003; Vorperian and Kent, 2007; Jafari et al., 2016). VSA has been investigated in the speech of children with developmental disorders, such as stuttering, dysarthria, and hearing loss. The VSA is a useful metric and can provide a sensitive indicator of articulation development after cochlear implantation (Hocevar-Boltezar et al., 2008). Smaller VSAs are associated with reduced excursions of the articulators in terms of tongue height or advancement, while larger VSAs imply greater articulatory freedom (Neel, 2008). The literature has reported a compression of VSA and a centralization of F1 and F2 in the vowels produced by individuals with hearing impairment, indicating a reduction of the phonological space (Horga and Liker, 2006; Liker et al., 2007; Ibertsson et al., 2008). Notably, VSA in individuals with cochlear implants is significantly smaller (and more frontal) than that in NH individuals (Ibertsson et al., 2008; Liker et al., 2007). Comparing normative VSA data with data on individuals with hearing impairment is valuable because there is a correlation between vowel space size development and speech intelligibility. A small VSA is an index of unintelligible speech or speech disorder because speech disorders are characterized by ‘reductions in articulatory working space’ (Neel, 2008, p. 575). Remarkably, the literature has reported that CI children have a greater VSA after implantation, as F1 increased in /i/ and /u/ (Hocevar-Boltezar et al., 2008; Horga and Liker, 2006).

Vowel production in hearing-impaired speech

Hearing-impaired speech is characterized by a deficiency in consonant and vowel production (Al-Dhoun, 2016; Jafari et al., 2016). Regarding vowel production, CI users demonstrate better differentiation of vowels and greater intelligibility compared to hearing-impaired individuals who use conventional HAs. For the most part, the acoustic analysis indicated that CI users exhibit improved speech production skills and outperformed conventional HA users (Horga and Liker, 2006). The order of sound acquisition in hearing-impaired children’s speech production is similar to that in NH children, although the acquisition process is slower in hearing-impaired children (Blamey et al., 2001). Additionally, speech perception and production in hearing-impaired children using CIs are highly correlated after implantation (Blamey et al., 2001). Literature reported that acoustic characteristics produced by a large percentage (46–97%) of CI users are within the range of speech production of NH children (Uchanski and Geers, 2003).

A correlation between articulatory errors and overall speech intelligibility has been reported (Gold, 1980). Numerous studies have demonstrated that the speech of hearing-impaired children is typically less intelligible (John and Howarth, 1965; Markides, 1970; Monsen, 1976b; Gold, 1980). Intelligibility scores tend to increase as hearing loss decreases. Factors such as the age of deafness onset, the degree of hearing loss, and how effectively HAs are used are important determiners of speech intelligibility in hearing-impaired individuals (Markides, 1970). Furthermore, studies have reported a significant correlation between language measures and the age of intervention in the oral language production ability of children with HI. Delayed intervention can have detrimental effects on speech production in HI individuals, even those with milder hearing impairment (Ramkalawan and Davis, 1992).

Research has shown that F2 transitions in the speech of hearing-impaired adolescents are reduced in time and frequency, potentially contributing to low speech intelligibility (Monsen, 1976a). After cochlear implantation, the prelingual children exhibited significant changes in their speech, particularly in the /u/ vowel, and a noticeable but not statistically significant change in the F1/F2 ratio of /i/ and /u/ vowels. These changes led to better phonological differentiation between vowels and an expansion of the VSA (Hocevar-Boltezar et al., 2008). CI users have a smaller and more frontal vowel space. There is a small but steady improvement in the speech production of CI users over time, with early implantation playing an important role in successful speech acquisition (Geers et al., 2008; Liker et al., 2007). Significant differences have been found in F1 and F2 of the vowel /e/ and in F3 of the vowel /u/. It was also found that CIs provide users with auditory feedback, enabling them to monitor their speech sound production (Kant et al., 2012).

Hearing-impaired children have shown a tendency toward greater inter-vowel distances and vowel space. They tend to over-articulate vowels as they lack auditory feedback and compensate for it with proprioceptive feedback (Baudonck et al., 2011). In HI children who use CIs, mean values of F1 have increased, and significant differences in the mean values of F1 in the vowels /i/ and /a/ and in F2 in the vowels /a/ and /o/ have been reported. These changes have led to a centralized vowel space for HI children using CIs. Increase in F1 value may be attributed to the tendency toward over-articulation in their vowel production (Jafari et al., 2016).

Table 2 summarizes numeric data from several studies that have examined F1 and F2 (and F0 and F3, when available) in both HI and NH individuals from different languages and ethnicities. The table also includes vowel formant frequencies in NH children from Jordan (Natour et al., 2011), which are used for comparison with the results of the current study.

Table 2 Mean fundamental and formant frequencies (Hz) of the vowels.

It is worth noting that common errors in hearing-impaired speech involving vowels and diphthongs include diphthongization, neutralization, and substitution (e.g., Kotby et al., 1996; Markides, 1970; Smith, 1975; Mathur, 2019).

Methods and procedure

Participants

For this study, three groups of participants were recruited: 20 CI, 20 HA, and 20 NH participants, with an equal number of males (n = 10) and females (n = 10) in each group. All participants were native speakers of JA and resided in Amman with their families, who belonged to the lower or lower-middle classes with limited income. The participants were selected through convenience sampling. As reported in their audiological reports, the hearing-impaired participants had severe to profound hearing impairment, using either analog HAs or CIs. Participants were between 7 and 11 years old, with a mean age of 9 years and 3 months. According to reports from their school/center doctors, they had received CIs or HAs at an average age of 4 years and 7 months. They had no other health or sensory problems or mental disabilities. They were drawn from Al Nouras Speech & Language Center, and Al Amal School for Sign Language, an all-age school. The NH participants were drawn from several elementary schools in Amman. A speech pathologist confirmed that NH participants had no voice, articulation, or phonological disorders. Demographic information for the participants is presented in Table 3.

Table 3 Demographic characteristics of participants.

Data collection procedure

The study examined monosyllabic words with long vowels (e.g. /fi:l/ ‘elephant’) and monosyllabic/disyllabic words with diphthongs in medial and final positions (e.g. /ḍaww/ ‘light’ and /laj.mu:n/ ‘lemon’). A picture-naming task was primarily used to investigate long vowels and diphthongs. The participants were asked to identify familiar objects in the pictures, such as fruits, vegetables, animals, and body parts. In cases where a participant was unable to identify an object, the researchers provided an explanation. Also, participants were asked to name objects found on the recording site and to name friends and family members. Participants who were 7 years and older were asked to read a list of proper nouns containing vowels and diphthongs. The researchers used Praat software for spectral analysis and to measure the acoustic properties of the vowels.

The researchers focused on long vowels for their clear steady-state position and lack of merging with other vowels, allowing for convenient extraction of fundamental frequency and formant frequencies (Waffa and Refaat, 1998). All samples were recorded using a lapel microphone (a small microphone that attaches to speakers’ clothing) with a sensitivity of −52 ± 5 dB and a frequency range of 30 Hz–15,000 Hz in a separate, quiet room at the school or therapy center to ensure high recording quality. Data were recorded directly into Praat. Each participant had a recording session lasting 4–6 min. In addition, participants were asked to narrate a story to obtain some spontaneous speech samples.

Acoustic analysis

Formant frequencies

The participants were asked to produce each word twice, and the clearer production was chosen. The researchers used Praat software (version 6.1.05), selected each word, and located the vowel through visual inspection of the waveform. To ensure vowel stability, formant frequency values were extracted from the center of the vowel (midpoint) to avoid the influence of the neighboring consonants. Following Berisha et al. (2014), formant frequencies were obtained by calculating 14-pole, 128-point linear predictive coding (LPC) spectra with 16 ms (256-point) Hamming-windowed frames. Spectral peaks were estimated using three-point parabolic interpolation of the LPC spectrum. LPC order was in the range of 10–16. The speech samples were downsampled to 16 kHz before the LPC analysis.
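The LPC step described above can be sketched as follows. This is a minimal illustration, not the authors' script and not Praat's internal algorithm: it assumes the autocorrelation method with a Levinson-Durbin recursion (the text does not say which LPC variant was used), and the function names `lpc_coefficients`, `lpc_spectral_peaks`, and `estimate_formants` are hypothetical.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1, i.e. A(z) = 1 + a1 z^-1 + ... .
    """
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a


def lpc_spectral_peaks(a, fs, n_points=128):
    """Peaks (Hz) of the LPC spectrum 1/|A|, sampled at n_points
    frequencies in [0, fs/2) and refined by three-point parabolic
    interpolation on the log-magnitude spectrum."""
    A = np.fft.fft(a, 2 * n_points)[:n_points]
    log_mag = -20.0 * np.log10(np.abs(A))
    bin_hz = fs / (2.0 * n_points)
    peaks = []
    for k in range(1, n_points - 1):
        if log_mag[k] > log_mag[k - 1] and log_mag[k] >= log_mag[k + 1]:
            y0, y1, y2 = log_mag[k - 1], log_mag[k], log_mag[k + 1]
            offset = 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)
            peaks.append((k + offset) * bin_hz)
    return peaks


def estimate_formants(frame, fs=16000, order=14, n_points=128):
    """Hamming-window one frame (e.g. 256 samples = 16 ms at 16 kHz),
    fit the LPC model, and return its spectral peaks in Hz."""
    windowed = frame * np.hamming(len(frame))
    return lpc_spectral_peaks(lpc_coefficients(windowed, order), fs, n_points)
```

With a 256-sample frame centred on the vowel midpoint, the lowest peaks returned by `estimate_formants` would be candidate values for F1, F2, and F3. Spurious or merged peaks are possible with any fixed LPC order, so such candidates would still need checking against the spectrogram.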

Vowel space

To determine the vowel space, the mean F1 and F2 values for the /i:/, /a:/, and /u:/ vowels were taken as representative formant values. These three vowels are the corner vowels in JA and represent extreme articulatory positions of the tongue. The vowel space was plotted with F1 on the y-axis and F2 on the x-axis to provide a graphic representation of the articulation space (as will be shown in Fig. 6 in the section “Vowel space”). The researchers utilized a triangular VSA metric (henceforth VSA) to measure the size of the acoustic vowel space as a triangular (rounded V-shape) area defined by the corner vowels. The area of the resulting triangles in the CI, HA, and NH groups was calculated using the following equation (Liu et al., 2003; Skodda et al., 2012; Weisstein, 2014):

$$\text{Area} = \frac{\left| F1_{i:}\left( F2_{a:} - F2_{u:} \right) + F1_{a:}\left( F2_{u:} - F2_{i:} \right) + F1_{u:}\left( F2_{i:} - F2_{a:} \right) \right|}{2}$$
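As a worked illustration of this equation, the short sketch below evaluates the triangular area for one set of corner-vowel means. The function name `triangular_vsa` and the formant values are hypothetical placeholders chosen only to show the arithmetic; they are not data from this study.

```python
def triangular_vsa(f1_i, f2_i, f1_a, f2_a, f1_u, f2_u):
    """Area (Hz^2) of the triangle whose vertices are the (F1, F2)
    means of /i:/, /a:/ and /u:/, following the equation above."""
    return abs(
        f1_i * (f2_a - f2_u)
        + f1_a * (f2_u - f2_i)
        + f1_u * (f2_i - f2_a)
    ) / 2.0


# Hypothetical corner-vowel means in Hz (illustrative only)
example_area = triangular_vsa(
    f1_i=300, f2_i=2200,   # /i:/
    f1_a=800, f2_a=1500,   # /a:/
    f1_u=350, f2_u=1100,   # /u:/
)
print(example_area)  # 257500.0
```

Because the expression is wrapped in an absolute value, the result does not depend on the order in which the three vowels are listed.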

Statistical analysis

To perform the statistical analysis, SPSS for Windows (version 21) was utilized. The mean and standard deviation for F0, F1, F2, and F3 of the JA long vowels /i:/, /e:/, /a:/, /u:/, /o:/ in the CI, HA, and NH groups were calculated. The results of the CI and HA groups were compared to those of the NH group. One-way repeated-measures ANOVA was used to determine whether there were significant differences between the means of the groups, with the significance level set at p < 0.05. Also, a post hoc Tukey honestly significant difference (HSD) test was conducted to determine where the differences occurred between the groups.
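For readers unfamiliar with the test statistic, the sketch below shows how an F ratio for a simple one-way ANOVA across three groups is formed. It is an illustration under stated assumptions only: it treats the groups as independent (the plain between-groups form), it does not reproduce the SPSS procedure, p-values, or the Tukey HSD step used in the study, and the function name `one_way_anova_f` and the toy numbers are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a between-groups one-way ANOVA.

    `groups` is a list of lists of measurements, e.g. one formant of one
    vowel for the CI, HA and NH groups.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Toy numbers only, to show the computation
f_ratio, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_ratio, df_b, df_w)  # 3.0 2 6
```

A large F ratio indicates that the group means differ by more than the within-group spread would suggest; a post hoc test is then needed to say which pairs of groups differ.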

Results

In this section, the acoustic and statistical values of vowels and diphthongs produced by the CI, HA, and NH groups are presented in six major focus areas: F0, F1, F2, F3, VSA, and errors involving diphthongs.

Measurements of formant frequencies among the CI, HA, and NH groups

To determine the significance of the difference in formant frequency measurements between the CI, HA, and NH groups, a one-way ANOVA test for dependent measures was conducted, as shown in Table 4.

Table 4 Formant frequencies (Hz) of long vowels among the CI, HA, and NH groups.

Table 4 displays the significant differences that were found among measurements of the five long vowels in the three groups. To determine which specific groups differed, a Tukey honestly significant difference post hoc test was conducted, and the results are presented in Table 5.

Table 5 Tukey HSD post hoc test.

F0

Table 4 shows that the F0 measurements of the CI group are very close to those of the NH group for all vowels, particularly for the long vowel /a:/. This is further illustrated in Fig. 2, where the mean values of /a:/ for the CI and NH groups are almost equal.

Fig. 2 F0 in the CI, HA, and NH groups.

Table 4 reveals that high vowels in the CI and NH groups exhibit a higher F0 compared to low vowels; see Waffa and Refaat (1998) for similar results. However, the results for the HA group show that F0 for the long mid vowel /o:/ is higher than that for the long high vowels /i:/ and /u:/, while the F0 for the long mid vowel /e:/ is lower than that for the long low vowel /a:/. These results demonstrate that HA participants, unlike CI participants, may have difficulty maintaining the normal intrinsic F0 distinction between high and low vowels. The results also show significant differences in F0 for /o:/ and /e:/ between the CI and HA groups, and between the HA and NH groups. However, no significant differences are observed for the three corner vowels: /i:/, /u:/, and /a:/. Similar results can be found in Boone (1966) and Monsen (1979).

F1

In the NH group, the F1 measurements of /i:/ and /a:/ represent the minimum and maximum, respectively. The order of the vowels in the NH group, from minimum to maximum F1, is /i:/, /u:/, /e:/, /o:/, and /a:/, a pattern also observed in the HA group. However, in the CI group, the order of the vowels from minimum to maximum F1 is /u:/, /i:/, /e:/, /o:/, and /a:/. Furthermore, the HA group exhibits a significantly higher F1 for /u:/ (p < 0.01), and a significantly lower mean value of F1 for /a:/ (p < 0.05). This decrease in F1 can be ascribed to reduced jaw movements (see also Liker et al., 2007). These findings differ from those reported by Jafari et al. (2016), who found that the F1 of /a:/ in hearing-impaired children was significantly higher than that of NH children. On the other hand, our findings align with Liker et al. (2007), who found that the F1 of /a:/ in hearing-impaired children was significantly lower than that of NH children. These findings support the notion that /a:/ requires minimal tongue movement and minimal auditory feedback (Ozbić and Kogovsek, 2008).

In comparison, the HA group exhibits higher F1 values for all other vowels compared to the NH group. This can be attributed to the notion that HA children are unable to produce vowels distinctly because of a lack of kinesthetic feedback, which refers to tongue position while producing the vowel (Jafari et al., 2016). This lack of distinction between vowels can result in an increase in F1 (Mahmoudi et al., 2011). As F1 is closely related to changes in jaw position and height (Lindblom and Sundberg, 1971), the exaggerated visual feedback received by hearing-impaired individuals during speech therapy may further contribute to the increase of F1 observed in the HA group (Jafari et al., 2016).

The F1 values for /i:/, /o:/, and /u:/ in the CI group are slightly higher than those of the NH group, while the F1 values for /e:/ and /a:/ are slightly lower in the CI group compared to the NH group. However, overall, the F1 values in the CI group are similar to those in the NH group, and no significant differences are observed between these two groups, as shown in Fig. 3.

Fig. 3 F1 in the CI, HA, and NH groups.

F2

In the HA group, F2 was significantly lower (more back) in /i:/, /e:/, and /a:/ and significantly higher (fronter) in /u:/ and /o:/. In the CI group, F2 was lower in /i:/ and /e:/ compared to the NH group, with the decrease being statistically significant for /e:/ (p < 0.05). These findings are consistent with those reported by Jafari et al. (2016) for /i:/ and /e:/. In addition, F2 was higher in /a:/, /o:/, and /u:/ compared to the NH group, with the increase being statistically significant for /u:/ (p < 0.01) and /o:/ (p < 0.01). These results are in line with those reported by Jafari et al. (2016) for /a:/ and /o:/, and with Horga and Liker (2006) and Baudonck et al. (2011) for /a:/.

The observed decrease in F2 for /i:/ and /e:/ in the CI individuals aligns with the findings of Subtelny et al. (1992), and with Boone’s (1966) assumption that hearing-impaired speakers maintain their tongue in a low-back position. Conversely, the CI and HA groups have higher F2 values for /u:/ and /o:/, which is consistent with the results reported by Subtelny et al. (1992) for CI individuals and Stein’s (1981) assumption that hearing-impaired individuals fronted the back vowels.

F2 values cluster within the range of 1161–2211 Hz for the NH group, 1352–1997 Hz for the CI group, and 1401–1899 Hz for the HA group. This indicates that the HA group has more limited horizontal tongue movements, which may be an indicator of vowel neutralization. This observation supports the notion that the HA group may tend to ‘produce all vowels approaching the pattern for the neutral vowel /ǝ/’ (Osberger and McGarr, 1982, p. 236). Previous research has consistently identified vowel neutralization as a common error in hearing-impaired speech (Markides, 1970; Smith, 1975; Monsen, 1976b; Jafari et al., 2016; Mathur, 2019). Notably, although significant differences in F2 values were observed between the CI and NH groups for three vowels in our study, the F2 values in the CI group still remain closer to those in the NH group than those in the HA group do, as shown in Fig. 4.

Fig. 4 F2 in CI, HA, and NH groups.
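The narrowing of the F2 range reported above can be made explicit with a few lines of arithmetic. The sketch below uses only the range endpoints stated in the text; the variable names are illustrative.

```python
# F2 range endpoints (Hz) as reported in the text for each group
f2_range_hz = {"NH": (1161, 2211), "CI": (1352, 1997), "HA": (1401, 1899)}

# Width of each group's F2 range
f2_span_hz = {group: high - low for group, (low, high) in f2_range_hz.items()}

# Each width expressed as a fraction of the NH width
f2_span_vs_nh = {group: span / f2_span_hz["NH"] for group, span in f2_span_hz.items()}

print(f2_span_hz)  # {'NH': 1050, 'CI': 645, 'HA': 498}
```

On these figures, the CI group covers roughly 61% of the NH F2 range and the HA group roughly 47%, which is the horizontal compression the text describes.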

Auditory feedback plays a crucial role in vowel production as it provides sensory information that helps individuals locate their tongue position in the mouth. This feedback enables NH children to improve their muscle control during the production process (Poissant et al., 2006). The present study demonstrates that F2 values reveal more significant differences than F1 values in this comparison between hearing-impaired and NH speakers. This finding can be attributed to the reduced perception of F2 among hearing-impaired individuals. Hearing-impaired speakers have difficulty perceiving F2 because of the difficulty in perceiving the posterior–anterior displacement of the tongue in the oral cavity. In contrast, F1 variation is more noticeable for individuals with hearing impairments because the up-down displacement of the tongue (which is related to jaw movements) is easier to perceive (Nicolaidis and Sfakiannaki, 2007).

F3

The F3 measurements for the CI group are similar to those of the NH group, particularly for /u:/ and /e:/. Also, Fig. 5 shows that the NH group has greater variations than the CI and HA groups. The one-way ANOVA test results indicate significant differences between the HA and NH groups for /i:/, between the CI and HA groups and between the HA and NH groups for /e:/, between the HA and NH groups for /a:/, and between the CI and HA groups for /u:/. However, no significant differences were observed between the CI and NH groups.

Fig. 5 F3 in the CI, HA, and NH groups.

Vowel space

Figure 6 displays the vowel spaces for the CI, HA, and NH groups. The researchers calculated the F1–F2 planar area for /i:/, /u:/, and /a:/ as they have the most extreme formant frequency values among JA vowels (see the “Vowel space” subsection under “Acoustic analysis”).

Fig. 6 Vowel production in F2–F1 space for CI, HA, and NH groups.

The F1–F2 planar area is 97.309 Hz² for the CI group, 43.446 Hz² for the HA group, and 220.812 Hz² for the NH group. There is evidence that the distances in F1 and F2 between /i:/, /u:/, and /a:/ are shorter in the HA group than in the CI group, and shorter in the CI group than in the NH group. Consequently, the vowel space is vertically and horizontally more compressed in the HA group. This lack of expansion in the VSA can be ascribed to the lack of excursions of the articulators in terms of tongue height or advancement (Neel, 2008). In essence, the results for the F1–F2 planar area demonstrate that the vowel space in the NH group is larger than that in the CI and HA groups, indicating a reduction in articulatory space (see also Angelocci et al., 1964) and a lack of differentiation between vowels in hearing-impaired speech, which is “characterized by reductions in articulatory working space” (Neel, 2008, p. 575). However, it is worth noting that the vowel space in the CI group is still greater than that in the HA group, confirming that the age at implantation affects the outcome (in this study CI implantation age was 2;5–2;6). This finding suggests that the articulatory movements in the HA group are smaller than those in the CI group.
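To put the three areas on a common footing, they can be expressed relative to the NH group. The sketch below uses the values exactly as printed above; because only ratios are computed, the outcome does not depend on the unit or on how the printed separator is read. The variable names are illustrative.

```python
# F1–F2 planar areas exactly as printed in the text
vsa_as_printed = {"CI": 97.309, "HA": 43.446, "NH": 220.812}

# Each group's area as a fraction of the NH area
vsa_relative_to_nh = {
    group: area / vsa_as_printed["NH"] for group, area in vsa_as_printed.items()
}

for group, ratio in vsa_relative_to_nh.items():
    print(f"{group}: {ratio:.1%} of the NH vowel space area")
```

On these figures, the CI vowel space is about 44% of the NH area and the HA vowel space about 20%, so the CI area is a little more than twice the HA area.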

The findings support our hypothesis that the CI group’s results are more similar to those of the NH group due to their ability to exercise auditory control over their own and others’ speech production, leading to improved speech control. As such, the CI group was reported to develop genuine listening skills as the cochlea performs a sound analysis necessary for cortical perception of speech. Intriguingly, the effect of auditory feedback on speech is reflected in improved voice and pronunciation after three months of implantation (Horga and Liker, 2006, p. 211), enabling the CI group to differentiate between distinctive vowels and expanding their VSA. Similar results have been reported cross-linguistically (e.g., Geers et al., 2003; Jafari et al., 2016).

Errors involving long vowels and diphthongs

The present study identified instances of monophthongization (1a–1e) and raising (1f, 1g), with a particular focus on monophthongization.

(1)  NH production    Hearing-impaired production    Gloss
(a)  lay.mu:n         le:.mu:n                       ‘lemon’
(b)  kaw.θar          ko:.θar                        Proper name (F)
(c)  lay.la           li:.la                         Proper name (F)
(d)  mays             mi:s                           Proper name (F)
(e)  qaws             ka:.wis                        ‘arch’
(f)  be:t             bi:t                           ‘house’
(g)  fo:ʔ             fu:ʔ                           ‘above’

The term diphthong is reserved for sequences of a low vowel followed by the semivowel /w/ or /y/, as in rawdˁa ‘kindergarten’ and laymu:n ‘lemon’ (Mashaqba, 2015). In JA, a diphthong may undergo monophthongization, a process in which a diphthong is reduced or coalesced into a monophthong. Monophthongization was the most common error in the vowel system of hearing-impaired speech in this study. This phenomenon has also been reported for other languages, such as Kannada (Ramadevi, 2006) and Hindi (Mathur, 2019).

The data presented above indicate that the rising-fronting diphthong /ay/ and the rising-backing diphthong /aw/ are pronounced as the long mid vowels /e:/ and /o:/, respectively. In some cases, these long mid vowels are raised further to the long closed vowels /i:/ and /u:/, respectively.

Figure 7 shows the shift from the open central vowel /a/ to the mid-closed back semivowel /w/ in the spectrogram of the word /kaw.θar/ ‘proper name’. This shift is indicated by the decrease in F1 from /a/ to /w/: F1 is higher for /a/ than for /w/, indicating that /a/ is more open than /w/. F2, which reflects the front-back status of the vowel, is higher in /a/ than in /w/, indicating the shift from central to back. The spectrogram in Fig. 8 shows the shift from the open central vowel /a/ to the mid-closed front semivowel /y/. This shift is likewise indicated by the decrease in F1 from /a/ to /y/: F1 is higher for /a/ than for /y/, indicating that /a/ is more open than /y/. In contrast, F2 is lower in /a/ than in /y/, indicating the shift from central to front. The monophthongized vowels /o:/ and /i:/ produced by hearing-impaired speakers show no such transition in F1 or F2, unlike the diphthongs /aw/ and /ay/ (see Figs. 9 and 10). In the monophthongized vowels, F1 and F2 appear as parallel lines that stay almost the same distance apart, whereas in the diphthongs F1 and F2 are close together at the onset and drift wide apart at the offset.

Fig. 7 Spectrogram of /kaw.θar/ ‘proper name’ showing the /aw/ diphthong produced by an NH child.

Fig. 8 Spectrogram of /lay.mu:n/ ‘lemon’ showing the /ay/ diphthong produced by an NH child.

Fig. 9 Spectrogram of /ko:.θar/ ‘proper name’ showing monophthongized long mid /o:/ by a hearing-impaired child.

Fig. 10 Spectrogram of /li:.mu:n/ ‘lemon’ showing monophthongized long closed /i:/ by a hearing-impaired child.
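The contrast between diphthongal and monophthongized productions described for Figs. 7–10 can be approximated numerically by comparing formant values at vowel onset and offset. The sketch below is only an illustration under stated assumptions: the function names, the 200 Hz threshold, and the formant tracks are hypothetical and are not taken from this study's measurements.

```python
from typing import Sequence


def formant_excursion(track: Sequence[float]) -> float:
    """Offset value minus onset value of one formant track, in Hz."""
    return track[-1] - track[0]


def looks_diphthongal(
    f1: Sequence[float],
    f2: Sequence[float],
    threshold_hz: float = 200.0,  # hypothetical cut-off, not from the study
) -> bool:
    """True when F1 or F2 moves by more than threshold_hz from onset to offset."""
    largest_move = max(abs(formant_excursion(f1)), abs(formant_excursion(f2)))
    return largest_move > threshold_hz


# Hypothetical tracks (Hz) sampled at four points across the vowel.
ay_f1, ay_f2 = [750.0, 600.0, 450.0, 350.0], [1400.0, 1700.0, 2000.0, 2200.0]
aw_f1, aw_f2 = [750.0, 600.0, 480.0, 400.0], [1400.0, 1200.0, 1000.0, 900.0]
e_f1, e_f2 = [450.0, 445.0, 440.0, 440.0], [2000.0, 2005.0, 2010.0, 2010.0]

print(looks_diphthongal(ay_f1, ay_f2))  # True: F1 falls, F2 rises
print(looks_diphthongal(aw_f1, aw_f2))  # True: F1 and F2 both fall
print(looks_diphthongal(e_f1, e_f2))    # False: both formants stay flat
```

A steady, near-parallel pair of tracks is classed as monophthongal, while a fall in F1 combined with a rise or fall in F2 is classed as diphthongal, mirroring the spectrographic description above.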

Conclusions, clinical implications, and recommendations

Acoustic analysis was employed to detect subtle differences in the vowel production of hearing-impaired speakers. The results showed significant differences in F0, F1, F2, and F3 between hearing-impaired and NH speakers. It was concluded that HA users fail to vary their tongue position along the high-low dimension (F1) and the front-back dimension (F2), which is thought to be due to the lack of auditory feedback. In contrast, CI users have difficulty varying their tongue position only along the front-back dimension (F2). This difficulty is attributed to the reduced perception of F2 among hearing-impaired speakers in general, together with the lack of visual feedback on tongue position along the front-back axis in the oral cavity.

According to the results of the F1–F2 planar area, the vowel space of the HA group was found to be less expanded than that of the CI group, as evidenced by values of 97.309 Hz² for the CI group, 43.446 Hz² for the HA group, and 220.812 Hz² for the NH group. These results confirm that vowel production in JA speakers differs significantly between the CI, HA, and NH groups and that the CI group performs better than the HA group, with results closer to those of the NH group. Additionally, the study observed that hearing-impaired speech is characterized by a change in vowel quality or formant values, with CI users demonstrating better articulation and differentiation of distinctive vowels than their HA counterparts. This difference may be attributed to the auditory feedback that CI users receive after implantation, which allows them to monitor their speech sound production.

The findings of this study have important implications for linguistic studies and related areas, particularly in the domain of phonetics and phonology. The results suggest that cochlear implantation can lead to improvements in the perception and production of speech sounds and can help to restore the natural vowel space found in NH individuals. The findings can also contribute to the development of effective interventions for hearing-impaired individuals. In therapy sessions with HA users, speech therapy clinicians should focus more on the up-down displacement of the tongue, which is associated with jaw movements, to help these users maintain F1 values similar to those of NH speakers. For CI and HA users, speech therapists should consider tracing the acoustic cues of vowel production during therapy sessions, particularly regarding F2, until the results are statistically similar to those of NH speech. Future work should focus on the temporal duration of vowels produced by hearing-impaired children using a larger sample from different regions in Jordan. Additionally, future research should investigate the impact of gender on the vowel production of hearing-impaired individuals.