Numerical investigation of effects of incisor angle on production of sibilant /s/

The effects of the inclination angle of the incisor on the speech production of the fricative consonant /s/ was investigated using an implicit compressible flow solver. The hierarchical structure grid was applied to reduce the grid generation time for the vocal tract geometry. The airflow and sound during the pronunciation of /s/ were simulated using the adaptively switched time stepping scheme, and the angle of the incisor in the vocal tract was changed from normal position up to 30°. The results showed that increasing the incisor angle affected the flow configuration and moved the location of the high turbulence intensity region thereby decreased the amplitudes of the sound in the frequency range from 8 to 12 kHz. Performing the Fourier transform on the velocity fluctuation, we found that the position of large magnitudes of the velocity at 10 kHz shifted toward the lip outlet when the incisor angle was increased. In addition, separate acoustic simulations showed that the shift in the potential sound source position decreased the far-field sound amplitudes above 8 kHz. These results provide the underlying insights necessary to design dental prostheses for the production of sibilant fricatives.


Scientific Reports
| (2021) 11:16720 | https://doi.org/10.1038/s41598-021-96173-2 www.nature.com/scientificreports/ formed by the geometry downstream of the constriction. However, the effects of geometrical differences resulting from dental prosthesis, e.g., the incisor positions and angles, on the flow and sound generation are still unclear. Therefore, in this study, we conducted numerical flow simulations of the vocal tract geometry of /s/ at different incisor angles to examine the origin of sound changes using the inclination angles of the incisor reported by Runte 1 . To explore the cause of the sound changes, both the airflow for /s/ and the sound in the vocal tract geometry were predicted using numerical simulations solving the three-dimensional compressible Navier-Stokes equation [18][19][20] . One difficulty with numerical flow simulations is maintaining high-quality computational grids for the complex flow channel in the vocal tract geometry. To reduce the grid generation time for the vocal tract geometry, we applied the hierarchical structure grid 21 in the simulations. By further developing the proposed methodology, this simulation technology will enable us to predict the effects of dental prostheses on the production of sibilant fricatives for patients prior to prosthetic surgery.

Methods
Vocal tract geometry. To simulate the phenomena involved in the pronunciation of /s/, the geometry of a vocal tract replica was constructed from CT images 12 . The subject was a 32-year-old Japanese male who self-reported no speech disorders with a normal dentition of angle class I, without inter-teeth spacing. The CT images were taken while the subject sustained the pronunciation of /s/ for 9.6 s without vowel context, and the image resolution was 0.1 × 0.1 × 0.1 mm 3 . The surfaces of the vocal tract geometry were extracted based on brightness values using the software itk-SNAP 22 . Since the resolution of the CT scan is 0.1 mm and the tips of the incisors were slightly smoothed through the vocal tract extraction from the CT scan, we have confirmed that the extracted vocal tract geometry reproduced the subject's pronunciation of /s/ up to 14 kHz by constructing an oral replica 12 . The ethics committee of the graduate school of Osaka University certified this study (H26-E39). Figure 1a shows the vocal tract geometry, including the throat, tongue, hard palate, incisors, and lips. The x 1 is defined as the anterior-posterior direction; the x 2 is defined as the inferior-superior direction; x 3 is defined as the transverse direction. The initial inclination angle of the incisor to the maxillary plane was 108° for this subject. For the inclined cases, the variation from + 10° to + 30° was resulted in 118° to 138° based on this original incisor angle, and the region of the inclination incisor is marked in red color (− 10 mm < x 3 < 10 mm). The geometries from the top and side views are shown in Fig. 1b,c, respectively. We confirmed that the exclusion of the upstream vocal tract geometry on the simulation of /s/ was negligible for the main acoustic characteristics of /s/ 14 . According to Runte 1 , by comparing the sound of /s/ generated by the human subject with the subject's vocal tract replica made of plaster using a 3D printer, the frequency characteristics of /s/ were produced up to 16 kHz with the maximum discrepancy of 8 dB. It indicates that the solid wall condition is valid to investigate the sound mechanisms.
To investigate the effect of the inclination angle of the upper incisor, the incisor position (− 17.7 mm < x 1 < − 12.2 mm) was raised from its original position (Fig. 2a, 0°) up to 30° (Fig. 2b). Figure 2c shows the mid-sagittal plane in the modified geometry (x 3 = 0) with the incisor angle increased in 10° increments from 0° to 30°, the length of the incisor is kept as the same as the original geometry. As shown in Fig. 2a, the overjet (horizontal overlap) and overbite (vertical overlap) of the incisor angle in the original model are 2.3 mm and 0.3 mm, respectively. When the inclination angle increases, the x 1 distance between the upper and lower incisors becomes longer; therefore, the overjet increases from 2.3 to 2.5 mm. Conversely, the x 2 distance becomes wider; therefore, the overbite decreases from 0.3 to − 1.7 mm. These values are within the range of clinical measurements 8 . The sidewalls of each tooth were modified to a smooth appearance to prevent the formation of small gaps between the teeth that lead to the instability of flow simulation.
Governing equations. The governing equations are the compressible Navier-Stokes equations: where t is the time, x i indicates three directions in Cartesian coordinate system (i = 1, 2, 3), and the conservative vector U is where the flux vectors F i are ρ is the density, u i indicates the velocity components (i = 1, 2, 3), and δ ij is the Kronecker delta.
The total energy e is calculated as and µA ij is the stress term, with (1), we applied the following numerical framework. The second-order-accurate implicit lower-upper symmetric Gauss-Seidel scheme (LUSGS) is adopted for time integration. The Roe scheme with a preconditioning method and dual time stepping is applied, and the discretized form of Eq. (1) with artificial time step Δτ is where Г is the preconditioning matrix proposed by Weiss and Smith 23 , U p is the primitive form [P, u 1 , u 2 , u 3 , T], τ is artificial time and t is physical times, the superscripts k and n are the iteration numbers in artificial time step and the proceeding step of real time, respectively. The quantities associated with the artificial time term of the (k + 1)th iteration are transferred approximately to quantities of the (n + 1)th time step in real time when the term ∂U p /∂τ converges to zero. Then, Eq. (9) will be reduced to the original Navier-Stokes equation including the transient term.
Finally, Eq. (1) can be rearranged as To accelerate the convergence speed, Lian et al. 24 proposed the solution-limited time stepping (SLTS) method by adaptively adjusting the CFL number and determine Δτ in the governing equation. When adopting LUSGS method, the estimation value ΔQ est defined as and ΔQ ref is defined as www.nature.com/scientificreports/ where ΔP sur is the maximum difference between the pressure at surrounding points and P global , V global denotes the global value that ensures the reference values are always greater than 0, c is the speed of sound, and γ is the heat capacity ratio. Equation (12) provide a criterion for determining whether the calculation is stable or not. The factor [α 1 α 2 α 3 α 4 ] is the coefficient of the maximum allowable fractional change according to Lian 24 . Under the SLTS method, the larger physical time step and the faster speed of convergence can be achieved. However, because of the Newton linearization error of the term ∂U p /∂τ , SLTS method is not suitable for aeroacoustic simulations even when the convergence criteria are satisfied. Hence, we applied the adaptively switched time stepping scheme (ASTS) 25 to reduce the computational cost and maintain the accuracy in the aeroacoustic simulation.
In the calculation of R k on the right-hand side of Eq. (10), the terms involving F i in Eq. (3) can be divided into an inviscid term F inviscid , and a viscous term F viscous , as shown below: When employed the Roe scheme in Eq. (14), F inciscid term will be discretized into where F d is the Roe dissipation term, which is composed of jumps of properties of work fluids. For the reconstruction of F R and F L , the fifth-order monotone upstream-centered scheme for conservation laws (MUSCL) 26 without a limiter function to prevent turbulent fluctuations from attenuating. Aside from the inviscid term, the derivative terms in A ij in the viscous term of Eq. (3) are calculated using the second-order central difference. The detail of the current framework can be found in previous study 18-20 . Computational conditions. To simulate the complex geometry, e.g., a realistic human oral cavity, the immersed boundary method with the hierarchical structure grid 21 was applied for the grid spacing. As the grid configuration, the computational domain is divided according to the hierarchical structure system proposed by Nakahashi 21 . Using a hierarchical structure grid can shorten the working time required to build computational grids and simultaneously provide better load balancing and higher performance for parallel computations.
After testing the space resolution, the minimum grid size was set as 0.05 mm around the upper incisors to keep the accuracy of the immersed boundary around the turbulent region. To simulate the sound waves www.nature.com/scientificreports/ propagating through the lip outlet, the far-field region was set outside of the vocal tract model. The total grid numbers of the 0-30° cases were 8.7 × 10 7 , 7.2 × 10 7 , 7.8 × 10 7 , and 6.7 × 10 7 , respectively. The three-view diagram of overall computational domain with the boundary condition and tracking point for the current model are shown in Fig. 3. The inlet was set to a uniform velocity condition to simulate the pronunciation of /s/. The uniform inlet velocity was set to 1.5 m/s, which resulted in a physiological flow rate of 330 cm 3 /s 12 . The Reynolds number was 5632, based on the maximum velocity ( − |u| max = 50.8 m/s ) inside the oral cavity in the original geometry. To keep the flow in the computational domain from being polluted by reflecting pressure waves, an absorbing boundary condition was used as the outlet condition. The absorbing boundary condition used in the current study is based on JB Freund 27 and extend by Li 20 which is adjusted for the low flow speed simulation. The time step was set to 10 −6 s, so that the CFL number was 7.8, which fulfilled the condition for the ASTS method 26 . The physical time of the performed simulations was 0.015 s and required parallel computing with 1152 cores on 32 nodes for 30 h. Table 1 summarizes the computational parameters for the simulations. Fast Fourier transform (FFT) using the Hann window was applied to the waveforms sampled 100 mm from the lip outlet (x 1 = 100 mm) to analyze the far-field sound spectrum. The FFT sampling frequency was 50 kHz with 256 points averaged five times. The sound pressure level (SPL) was calculated based on the reference pressure P ref = 20 × 10 −6 Pa. In addition to the sound spectrum, the magnitudes of the velocity fluctuations at each frequency were calculated via FFT on each grid to identify the highest contribution position for the potential sound source 23 .

Results and discussion
To ascertain the accuracy of the simulation in the present framework, the result was compared with experimental measurements. The acoustic pressure was collected 100 mm from the lip outlet (x 1 = 100 mm), and the frequency spectra of the sound was calculated via FFT, as shown in Fig. 4a. We have confirmed that the velocity magnitude around the tracking point is small enough to be neglected, which is less than 0.05 m/s. In the experiment, the vocal tract replica was constructed using a 3D printer (Objet30Pro, Stratasys, USA; accuracy: ± 0.1 mm) and a constant flow was input to the model using a compressor (YC-4RS, Yaezaki, Tokyo, Japan). The sound generated by the model was measured using a microphone (type 4939, Bruel & Kjaer, Naerum, Denmark) at a distance of x 1 = 100 mm in an anechoic chamber (with a volume of 8.1 m 3 ). The pronunciation of /s/ was recorded with an actual subject for 18 times of utterances. The subject sustained /s/ for 3 s without vowel context, and before the signal was calculated via FFT, some portions of signals after the onset and before the offset of /s/ are removed. Therefore, inclusion of the effect of the onset and offset was avoided.
The sibilant sound is characterized as a broadband noise above 4 kHz. At a frequency of 3 kHz, SPL rapidly increased, and the first characteristic peak reached around 5 kHz. This result shows good agreement with the measurements for the actual subject and oral replica, and the same characteristics of sibilant /s/ sound was observed by Runte 1 . Accordingly, this result suggests that the present framework of the direct aeroacoustic computation can provide aeroacoustic predictions with reasonable accuracies.
To investigate the effects of the incisor angle, the incisor angle was varied from 0° to 30°; the far-field SPL spectra for these angles are shown in Fig. 4b. According to Runte 1 , the changes of incisor angle of denture led to a different noise band range. In this study, amplitudes from 8 to 12 kHz were decreased by the increase of incisor angle. This means that the upper boundary frequency of noise was decreased and the noise band range became smaller with the inclined incisors. While the noise band range of 0° was approximately 4 kHz to 12 kHz, the noise band range of 30° was 4 kHz to 8 kHz. This result is consistent with that found by Runte 1 . Besides, according to Snow 28 , general audible frequency range for male and female speech are up to 7 kHz, and 9 kHz, respectively, and the characteristic peak of the sibilant sound at around 4 kHz was observed for all teeth angles. Therefore, the sounds generated by all the cases can be characterized and recognized as sibilant fricative /s/. However, the decreasing amplitudes in the frequency range 8 kHz to 12 kHz might affect the recognition of the sound. Figure 5 shows the normalized instantaneous velocity magnitude |u|/ − |u| max and root mean square (RMS) value of the velocity fluctuations |u| rms / − |u| max from the 0° to 30° models on the mid-sagittal plane (x 3 = 0). As shown in Fig. 5a,c,e,g, the instantaneous velocities of all cases are accelerated at the narrow channel between the tongue and the hard palate (the sibilant groove) (− 25 mm < x 1 < − 16 mm). Downstream of the sibilant groove, the flow became turbulent at the region between the teeth and the lower lip (− 15 mm < x 1 < − 11 mm) as a result of the jet flow leaving the sibilant groove. However, because the exit of the sibilant groove became wider in superior direction with the inclined incisors, the reduction of the occlusion made the mainstream flow faster and the flow reached more distant positions. Hence, the high RMS region shown in Fig. 5b,d,f,h moved from the cavity between the lower incisor and the lip to the tip of the lower lip with increasing incisor angle. Meanwhile, the turbulence intensity was not affected by the raised incisor angle. According to Lighthill's analogy 22 , the aeroacoustic sound source is primarily produced by the time variation in the space derivatives of the Reynolds stress tensor, which means that the sound source likely appeared in the region of high RMS values. Therefore, the different flow configurations caused by the raised incisor angle might be considered as the reason for the difference in the acoustic field.
To identify the cause of the different SPL values around 10 kHz between 0° and 30° in Fig. 4b, the positions of the potential sound sources for the 0° and 30° models were calculated. The magnitudes of the velocity fluctuations at specific frequencies (5 kHz and 10 kHz) were calculated via FFT on each grid. The magnitudes of the velocity fluctuations inside the vocal tract for the 0° model along the x 1 -x 2 -plane are shown in Fig. 6a at 5 kHz (x 3 = 1.1 mm) and Fig. 6b at 10 kHz (x 3 = − 7.7 mm), respectively. At 5 kHz, the maximum value is located behind the incisor, which is the exit of the sibilant groove. As seen in the flow configurations (Fig. 5) Fig. 6c,d. The maximum value for 5 kHz appeared at the exit of the sibilant groove, which is the same as the position for the 0° model at 5 kHz. These results indicate that the velocity fluctuations downstream of the constriction formed the characteristic peak at 5 kHz for both the 0° and 30° models. Conversely, at 10 kHz, the jet flow of the 30° model passed along the surface of the incisor and the maximum value of the velocity fluctuations appeared above the lower lip, which was at the position (58.5, − 7.7, − 4.5).
To determine the relationship between the positions of the velocity fluctuations, i.e., the assumed aeroacoustic sound sources, and the far-field SPL spectra, instead of the base simulation with constant velocity inlet from the throat, the acoustic simulations with acoustic monopole sources were conducted for the 0° and 30° cases. In the previous acoustic studies, monopole to quadrupole sound sources were used to emulate the sound generated by the turbulent flow 29,30 . Therefore, for the simplicity, the monopole sources composed of white noise were applied in the current study at Point 1 (49.1, − 8.4, − 7.7) and Point 2 (58.5, − 7.7, − 4.5), which corresponded to the positions of maximum velocity fluctuations at 10 kHz for both models.
A comparison of the far-field sound spectra in Fig. 7 shows an obvious amplitude difference between the two sound source positions in the frequency range of 4-8 kHz. When the acoustic source is located at Point 1, SPL shows the characteristics of the sibilant sound for both the 0° and 30° models. Conversely, SPL generated by the sound source at Point 2 consisted of a broad noise with amplitudes lower than those generated at Point 1. This is because the acoustic source at Point 1 was located inside the modeled vocal tract geometry and the sound wave was resonated by the front oral cavity. The distance from the glosso-palatal constriction to anterior lip surface www.nature.com/scientificreports/ was 1.39 cm, which was close to the quarter of wavelength around 6 kHz. Therefore, SPL around 6 kHz was larger and the sound generated by Point 1 still had the characteristics of the sibilant sound in the far field. In contrast, because the acoustic source at Point 2 was located almost entirely outside of the vocal tract, the source did not  www.nature.com/scientificreports/ couple strongly with the resonator, and no significant sound could be caught in the entire frequency range at the far-field. For this reason, the far-field SPL around 10 kHz was smaller in the 30° case. Consequently, the shift in the aeroacoustic source position affected the resonance of the sound waves and influenced the far-field SPLs of /s/. These findings and the framework of the current simulation model can clarify the effects of geometrical differences resulting from dental prosthesis, e.g., the incisor positions and angles, on the flow as well as the sound generation. Besides, it can be used to help design dental prostheses at the same time predict the outcomes of surgical procedures for the production of sibilant fricatives.

Conclusions
To investigate the effect of the inclination angle of the incisor on the speech production of the sibilant /s/, numerical flow simulations of a vocal tract geometry with different incisor angles was conducted. On the basis of the far-field SPL spectrum, increasing the incisor angle from 0° to 30° had no influence on the characteristic peak of the sibilant sound at 4 kHz. However, increasing the incisor angle reduced the amplitude of the sound in the frequency range from 8 to 12 kHz. In the flow field, the turbulence intensity kept the same level and the maximum velocity occurred at the sibilant groove in all cases, while the high RMS value region moved from the cavity between the teeth and the lower lip to the tip of the lower lip when the inclination angle increased. www.nature.com/scientificreports/ By conducting acoustic simulations with a monopole source at the potential sound source positions of the 0° and 30° cases, we found that the acoustic source position affects the resonance of the sound wave and influences the far-field SPL spectrum. Specifically, if the sound source position was located closer to the exit of the vocal tract, i.e., the lips, the source didn't couple strongly with the resonator, and no significant frequency could be caught at far-field. Because the flow channel downstream of the sibilant groove became wider when the incisor angle was increasing from 0° to 30°, the large velocity fluctuation region was shifted and the amplitude of the far-field sound around 10 kHz was reduced. Consequently, the slight change of the geometry affected less to the turbulent intensity, but changed the flow configuration and shifted the position of the potential sound source,  www.nature.com/scientificreports/ thereby influenced the performance on the acoustics field. These results provide the underlying insights necessary to design dental prostheses for the production of sibilant fricatives.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.