Acoustic-level and language-specific processing of native and non-native phonological sequence onsets in the low gamma and theta-frequency bands

Acoustic structures associated with native-language phonological sequences are enhanced within auditory pathways for perception, although the underlying mechanisms are not well understood. To elucidate processes that facilitate perception, time–frequency (T–F) analyses of EEGs obtained from native speakers of English and Polish were conducted. Participants listened to same and different nonword pairs within counterbalanced attend and passive conditions. Nonwords contained the onsets /pt/, /pət/, /st/, and /sət/, all of which occur in both the Polish and English languages with the exception of /pt/, which never occurs in word onset in English. Measures of spectral power and inter-trial phase locking (ITPL) in the low gamma (LG) and theta-frequency bands were analyzed from two bilateral, auditory source-level channels, created through source localization modeling. Results revealed significantly larger spectral power in LG for the English listeners to the unfamiliar /pt/ onsets from the right hemisphere at early cortical stages, during the passive condition. Further, ITPL values revealed distinctive responses in high and low-theta to acoustic characteristics of the onsets, which were modulated by language exposure. These findings, language-specific processing in LG and acoustic-level and language-specific processing in theta, support the view that multiscale temporal processing in the LG and theta-frequency bands facilitates speech perception.

www.nature.com/scientificreports/
Figure 1. Waveforms/spectrograms illustrating sample stimulus presentation trials. The figure displays the presentation sequence for sample trials. These sample trials illustrate different nonword pairs within the match-to-sample experimental design that highlight phonemic-level processing. Different trials were also presented in the reverse order, with three-syllable words preceding two-syllable words. Trials were separated by a fixed two-second interval, and nonwords within each pair were separated by a fixed interval of 250 ms. Beneath each waveform is shown the total word duration (far right) and the duration from word onset to the burst for /t/ (vertical dotted line).
tions of each of the 35 nonwords. These 70 productions were presented as first words in the same pairs and the different pairs for a total of 140 presentations for each onset type (e.g., /sət/). The stimuli were produced using a penultimate stress pattern, as evident in the Fig. 1 waveforms. The predominant stress pattern in the Polish language is penultimate stress, and the English language includes words that have penultimate stress (e.g., petunia). Stimuli were produced by a bilingual Polish-English male speaker in his mid-twenties. He came to the United States with his family at six years of age and attended school in New York. We identified the speaker of the stimuli as being a dominant English-language speaker. Specifically, an error analysis identified that the speaker reduced the vowel in /pət/, such that native-Polish speakers could not distinguish a small number of nonword productions that began with /pət/ from nonwords that began with /pt/. These nonwords were not included in the current stimulus set. Even though the speaker was dominant in English, the Polish participants, who learned English late in life, judged his Polish pronunciations to be native-like. 
This is consistent with the speaker's report that he spoke only Polish in his family home and attended one full day of Polish school each week through 12th grade. Also, we previously demonstrated, and replicated with the current participant groups, that both speakers of Polish and English could differentiate words that began with /st/ versus /sət/, but only native-Polish speakers could distinguish words that began with /pt/ versus /pət/. Further, these behavioral results were confirmed by a late auditory evoked potential (AEP) component of conscious processing 9 .
Experimental design. Each of the 48 participants was tested twice, engaging in an attend condition during one experimental session and a passive condition during the alternate experimental session. During the passive condition, participants were instructed to only listen to the stimuli. During the attend condition, participants listened to the nonword pairs and performed a behavioral task to the second word in the word pairs, using a button press response. The participants were asked to determine whether the second word in the nonword pairs had two or three syllables. This task determined whether nonwords containing the onsets /pt/ versus /pət/ and /st/ versus /sət/ could be differentiated by English and Polish listeners.
Attend and passive conditions were counterbalanced such that 12 English and 12 Polish adults engaged in the passive condition during session one and the attend condition during session two. The alternate 12 English and 12 Polish participants engaged in the listening conditions in the reverse order. A minimum of two months (mean 5 months) separated the testing sessions to reduce the effects of stimulus and task repetition 33 .
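The counterbalancing scheme described above can be illustrated with a minimal Python sketch. The function name `assign_orders`, the participant identifiers, and the use of random assignment within each language group are assumptions made for illustration; the text specifies only that 12 participants per language group completed each session order.

```python
import random


def assign_orders(participants, seed=0):
    """Split each language group evenly between the two session orders.

    participants: list of (participant_id, language) tuples.
    Random assignment within a group is an assumption of this sketch.
    """
    rng = random.Random(seed)
    orders = {}
    for language in sorted({lang for _, lang in participants}):
        ids = [pid for pid, lang in participants if lang == language]
        rng.shuffle(ids)
        half = len(ids) // 2
        for pid in ids[:half]:
            orders[pid] = ("passive", "attend")   # session one, session two
        for pid in ids[half:]:
            orders[pid] = ("attend", "passive")
    return orders


# Hypothetical identifiers for 24 English and 24 Polish participants.
group = [(f"EN{i:02d}", "English") for i in range(24)]
group += [(f"PL{i:02d}", "Polish") for i in range(24)]
orders = assign_orders(group)
```

With 24 participants per language group, this yields 12 participants per group in each session order, matching the design.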

Data acquisition.
Data was collected at The Graduate Center, City University of New York. The EEGs were recorded using Net Station 4.5.1 (Electrical Geodesics Inc.) in a sound treated, electrically shielded booth, using a 64-channel sensor net. The EEG was collected between 0.1 and 100 Hz with a sampling rate of 500 Hz, and referenced to the vertex electrode Cz. Stimuli were presented free field (62 dB SPL) through left and right diagonally-positioned speakers (Realistic Minimus-7) using Eprime 1.1. EEGs were analyzed using BESA Research 7.1 and BESA Statistics 2.1.
EEG pre-processing to obtain AEPs. AEPs were used to create a source localization model, which is described below. Bad channels, identified through visual inspection, were interpolated using spherical spline interpolation 34 . Then eye blink and ECG (electrocardiogram) artifact activity were reduced using spatial filtering. Both artifact topographies were modeled by applying principal component analysis (PCA) over averaged blink/ECG events, recognized by a pattern search. For eye blinks, the first PCA component, which explained almost all template variance, was used to reduce eye blink activity. To reduce ECG activity, all components that explained more than 10% of the variance in the averaged template were used. Spatial filtering was performed using the surrogate approach 35 . Data was segmented into epochs of 1500 ms (−500 to 1000 ms relative to stimulus onset). Epochs were scanned for additional artifacts. Trials in which the EEG signal exceeded 120 μV were excluded, as were trials that contained amplitude jumps (gradient criterion) exceeding 75 μV. On an individual subject basis, additional channels were interpolated as deemed necessary. In total, an average of 4-5 channels were interpolated (of the 64 total channels) per language group and condition. The average number of accepted trials per word onset type, language group, and condition was as follows:
Source localization model. A source localization model 36,37 was applied as a spatial filter, thereby transforming the EEG data into source space (BESA Research 7.1). This process transformed the 64-channel EEG into brain source-level channels, derived through mathematical calculation. Source-level channels were used to enhance efforts to clarify acoustic-level and language-specific processing of phonological sequences through time-frequency analyses. 
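The epoch-rejection criteria from the pre-processing step (120 μV amplitude, 75 μV gradient) can be sketched as follows. This is an illustration only, not the BESA implementation: the function name `reject_epoch` is hypothetical, the amplitude criterion is assumed to be peak-to-peak within a channel, and the gradient criterion is assumed to be the difference between adjacent samples.

```python
def reject_epoch(epoch, amp_limit=120.0, grad_limit=75.0):
    """Return True if an epoch should be excluded.

    epoch: list of channels, each a list of samples in microvolts.
    Assumptions of this sketch: amplitude is measured peak-to-peak per
    channel, and the gradient is the jump between neighbouring samples.
    """
    for channel in epoch:
        if max(channel) - min(channel) > amp_limit:
            return True  # amplitude criterion exceeded
        jumps = (abs(b - a) for a, b in zip(channel, channel[1:]))
        if any(j > grad_limit for j in jumps):
            return True  # gradient criterion exceeded
    return False


# Hypothetical two-channel epochs.
clean = [[0.0, 10.0, -10.0, 5.0], [2.0, 4.0, 1.0, 0.0]]
jumpy = [[0.0, 80.0, 100.0, 90.0], [2.0, 4.0, 1.0, 0.0]]
accepted = [e for e in (clean, jumpy) if not reject_epoch(e)]
```

Only the first epoch is retained here, because the second contains an 80 μV jump between adjacent samples.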
The source localization model was derived from AEPs, using all participant files in both listening conditions to the phonological-sequence onsets (/pt/, /pət/, /st/, and /sət/) combined in a grand average (~ 53,000 trials) (BESA Research 7.1). A principal component analysis (PCA) identified three peaks of activity, which were consistent with the global field power (GFP) waveforms, shown in Fig. 2a 38,39 . Five dipole sources were localized at the time windows 90-120 ms, 170-200 ms, and 270-300 ms, which correspond to the timing of the three PCA peaks. The first two peaks coincide with the N1 and P2 components of the AEP. These first two time windows, 90-120 ms and 170-200 ms, localized bilateral dipole sources in the region of auditory cortex (AC); hence, we use the terms ACN1-R, ACN1-L and ACP2-R, ACP2-L (R = right, L = left) to describe the four source channels. Primary sources for the N1 component originate from the superior temporal plane (STP) (primary auditory cortex, the lateral region of Heschl's gyrus and planum temporale) with additional smaller contributions from medial-posterior superior temporal gyrus (STG) [40][41][42] . The second bilateral source, identified at the P2 time window, appears posterior to that found at the N1 time window, as evident in the head model, sagittal view of Fig. 2e. Figure 2d illustrates that the N1 and P2 peak activity evident in the source waveforms (SWFs) were only partially separated. This was unavoidable, as the waveforms reflect composite EEG responses to the time-varying acoustic changes within the naturally produced onsets within nonwords. The time window at the third PCA peak (270-300 ms) localized a fifth dipole, a central dipole in the region of the cingulate gyrus (CG) (Fig. 2e). We were confident that the fifth source was central, because attempts to fit bilateral dipoles at this time window resulted in the bilateral dipoles fitting adjacent to a midline location. 
Orientations for these five dipoles were fitted for each participant's data in the same time intervals as mentioned above, following the approach described by Hoechstetter et al. 43 . A regularization constant of 1% was used during the entire dipole fitting procedure. Goodness of fit for the five-dipole model between 50 and 500 ms post-word onset for each of the four onsets, separately, in each language group and listening condition was greater than 92%. To explain possible activity not time-locked to the stimuli, a regional source dipole was added to each brain region (BR), except for those brain regions that were within 3 cm of the modeled five-dipole sources (BESA Research 7.1) 44 . This resulted in a final model containing five dipole sources and twelve regional sources. It is worth mentioning that these additional sources do not hamper the five-dipole model because the crosstalk between sources is minimal when a modest regularization of 1% is used. Therefore, sources should contain almost no signal provided that no activity originates from the corresponding brain areas 45 . This approach can be particularly useful for time-frequency analysis, which examines both evoked and induced brain activity. The Talairach coordinates and orientations for  Table 1). Figure 2c shows the final model, five dipoles + BR regional sources, and an overlay of residual variance, with all localized sources illustrated in Fig. 2e. Goodness of fit for the final model (50-500 ms post word-onset) for each of the four onsets, separately, in each language group and listening condition was greater than 98%. Also, a video display (Supplementary Video 1) in supplementary materials illustrates the change in the degree of activity from each dipole in the left and right hemispheres between 0 and 400 ms.
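The goodness-of-fit percentages reported above can be read as the complement of residual variance. The sketch below assumes the common definition, in which residual variance is the squared difference between measured and model-predicted scalp data divided by the squared measured data, summed over channels and time points. The function name and toy data are hypothetical, and the exact computation in BESA Research may differ in detail.

```python
def goodness_of_fit(measured, modeled):
    """Goodness of fit in percent: 100 * (1 - residual variance).

    measured, modeled: lists of channels, each a list of samples covering
    the fitting interval (e.g., 50-500 ms post word-onset).
    """
    residual = sum((d - m) ** 2
                   for d_ch, m_ch in zip(measured, modeled)
                   for d, m in zip(d_ch, m_ch))
    total = sum(d ** 2 for d_ch in measured for d in d_ch)
    return 100.0 * (1.0 - residual / total)


# Hypothetical one-channel example: residual 1, total 25.
gof = goodness_of_fit([[3.0, 4.0]], [[3.0, 3.0]])
```

A model that reproduces the measured data exactly gives 100%, and a model that predicts zero everywhere gives 0%.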

Source-waveform (SWF) analyses.
Using the final source localization model, consisting of five dipoles + BR regional sources, SWFs were created. SWFs are averaged amplitude values, time and phase-locked to the stimulus, from each source-level channel (1-55 Hz; 1500 ms epochs as per AEP pre-processing). SWFs are similar to AEP waveforms, but at the source level, showing the activity of the brain region that is modelled by the source. In the current study, SWFs are inverted relative to AEP positive and negative deflections. SWFs were analyzed, in part, to assess the source localization model. For example, to ascertain whether the model's auditory source-level channels ACN1-R, ACN1-L, ACP2-R, ACP2-L would effectively assess sensory-level processing of the stimuli, we compared the SWFs from all English and Polish participants combined to each onset in the passive compared to the attend conditions. Within-language group comparisons were also conducted for SWFs in response to four onset contrasts /pt-pət/, /st-sət/, /pt-st/, and /pət-sət/.

Time-frequency (T-F) analyses.
For T-F analyses, single-trial raw EEG signals were segmented (1500 ms) and transformed into the time-frequency domain using complex demodulation 46 , with 1 Hz wide frequency bins and 50 ms time resolution (BESA Research 7.1). Complex demodulation is based on the convolution of the EEG signal with a series of sine and cosine waves. These calculations are performed in overlapping 50 ms bins; thus, activity is slightly smeared at the edges and may appear to precede the stimulus. A procedure summary can be found in BESA Research resources (https://www.besa.de/wp-content/uploads/2014/05/BESA-Connectivity-1.0-User-Manual.pdf, Sect. 8.1, pp. 14). Information pertaining to the width of the time-frequency resolution can be found using the following resource: (http://wiki.besa.de/index.php?title=Time_Frequency_Resolution_In_BESA). Spectral power and inter-trial phase locking (ITPL) values in response to speech were examined in the LG and theta-frequency bands. Temporal spectral evolution (TSE) values identified the percentage of change in spectral power (amplitude) relative to baseline and reflect both induced (random-phase) and evoked (phase-locked) responses to the stimulus. ITPL values measure the consistency with which the phase aligns to a stimulus across trials within a specific frequency band. A value of 0 indicates random phase alignment and a value of 1 reflects perfect alignment across trials. Language-group effects within LG were reported with the upper limit set at 55 Hz, as a 60 Hz notch filter affected data between 56 and 64 Hz.
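To make the TSE and ITPL definitions above concrete, the following Python sketch computes both measures at a single time-frequency point. It is a simplified illustration under stated assumptions, not the BESA procedure: `demod_coeff` uses a plain moving average as the low-pass step of complex demodulation, the simulated 6 Hz trials are hypothetical, and all function names are invented for this example.

```python
import cmath
import math
import random


def demod_coeff(signal, fs, freq, center, half_win):
    """Complex amplitude of `signal` at `freq` (Hz) around sample `center`.

    Multiplying by exp(-i*2*pi*f*t) shifts `freq` down to 0 Hz; averaging
    over a short window acts as a crude low-pass filter (assumed here).
    """
    lo = max(0, center - half_win)
    hi = min(len(signal), center + half_win + 1)
    acc = sum(signal[k] * cmath.exp(-2j * math.pi * freq * k / fs)
              for k in range(lo, hi))
    return 2 * acc / (hi - lo)


def itpl(coeffs):
    """Inter-trial phase locking: length of the mean unit phase vector."""
    units = [c / abs(c) for c in coeffs if abs(c) > 0]
    return abs(sum(units)) / len(units)


def tse_percent(power, baseline_power):
    """Percent change in power relative to the baseline interval."""
    return 100.0 * (power - baseline_power) / baseline_power


def simulate_trials(n_trials, phase_locked, rng, fs=500, freq=6.0, n=500):
    """Hypothetical data: a 6 Hz sinusoid plus Gaussian noise per trial."""
    trials = []
    for _ in range(n_trials):
        phase = 0.3 if phase_locked else rng.uniform(0, 2 * math.pi)
        trials.append([math.cos(2 * math.pi * freq * k / fs + phase)
                       + rng.gauss(0, 0.2) for k in range(n)])
    return trials


rng = random.Random(0)
for label, locked in (("phase-locked", True), ("random phase", False)):
    trials = simulate_trials(100, locked, rng)
    coeffs = [demod_coeff(t, 500, 6.0, center=250, half_win=50) for t in trials]
    mean_power = sum(abs(c) ** 2 for c in coeffs) / len(coeffs)
    print(label, "ITPL:", round(itpl(coeffs), 3),
          "mean power:", round(mean_power, 3))
```

In this toy example both trial sets carry similar power at 6 Hz, but only the phase-locked set yields an ITPL near 1. This mirrors the distinction drawn in the text: TSE reflects both induced and evoked activity, whereas ITPL is sensitive only to phase consistency across trials.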
As a preliminary test to ascertain whether the model's auditory source-level channels ACN1-R, ACN1-L, ACP2-R, ACP2-L would effectively assess sensory-level processing of the stimuli, we compared the TSE and ITPL values from all English and Polish participants combined to each onset in the passive compared to the attend conditions. To address the research questions, T-F analyses in response to the phonological sequence onsets /pt/, /pət/, /st/, and /sət/ were conducted on two levels, between-language group and within-language group comparisons.
Statistical analyses. SWFs and T-F data were analyzed using cluster analysis in combination with permutation testing, designed to minimize the possibility of type 1 error due to multiple comparisons 47-49 (BESA Statistics 2.1). Initially, preliminary t-tests (two-tailed) comparing data points from two conditions were used to identify clusters of data showing significant effects that extended across multiple data points. A cluster was formed from neighboring points in the time domain for SWFs, or in the time-frequency matrix for T-F data, that had a p-value lower than the alpha threshold (p < 0.05). A cluster value was then obtained for each identified cluster by summing the t-values for each data point within the cluster. Permutation testing followed, in which the data was randomly re-shuffled 1000 times. For each of the 1000 permutations, t-tests were repeated on each data point and new clusters and their values were again obtained. This yielded a distribution of cluster values across permutations. This distribution was used to identify clusters in the data that were non-interchangeable between Polish and English listener groups (between-language group comparisons), non-interchangeable for contrasting onset sequences (within-language group comparisons), or non-interchangeable for the passive and attend conditions. To obtain statistical significance, we accepted clusters for which more than 950 permuted tests out of 1000 had lower maximum cluster values (p < 0.05).
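The cluster-permutation logic described above can be sketched in Python for the simplest case: a between-group comparison along one dimension (time). This is an illustrative sketch under stated assumptions, not the BESA Statistics implementation. The function names are hypothetical, the critical t-value is passed in by the caller (for example, roughly 2.01 for two groups of 24, an approximation), and only time-domain neighbors are clustered, whereas the T-F analyses cluster across both time and frequency.

```python
import math
import random


def t_values(a, b):
    """Pooled-variance independent-samples t at each time point.

    a, b: lists of per-participant time series of equal length.
    """
    n_a, n_b = len(a), len(b)
    out = []
    for k in range(len(a[0])):
        xa = [row[k] for row in a]
        xb = [row[k] for row in b]
        ma, mb = sum(xa) / n_a, sum(xb) / n_b
        ss = sum((x - ma) ** 2 for x in xa) + sum((x - mb) ** 2 for x in xb)
        se = math.sqrt(ss / (n_a + n_b - 2) * (1 / n_a + 1 / n_b))
        out.append((ma - mb) / se if se > 0 else 0.0)
    return out


def cluster_sums(t, t_crit):
    """Sum t over runs of neighbouring supra-threshold points of one sign."""
    clusters, current, sign = [], 0.0, 0
    for v in t:
        s = (v > t_crit) - (v < -t_crit)  # +1, -1, or 0
        if s != 0 and s == sign:
            current += v                  # extend the open cluster
        else:
            if sign != 0:
                clusters.append(current)  # close the previous cluster
            current, sign = (v if s != 0 else 0.0), s
    if sign != 0:
        clusters.append(current)
    return clusters


def cluster_permutation_test(a, b, t_crit, n_perm=1000, seed=0):
    """Return (cluster value, p) pairs for the observed clusters.

    p is the fraction of permutations whose maximum absolute cluster value
    is at least as large as the observed cluster value.
    """
    rng = random.Random(seed)
    observed = cluster_sums(t_values(a, b), t_crit)
    pooled = a + b                        # new list; inputs stay untouched
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(pooled)               # re-assign group labels at random
        perm = cluster_sums(t_values(pooled[:len(a)], pooled[len(a):]), t_crit)
        null_max.append(max((abs(c) for c in perm), default=0.0))
    return [(c, sum(m >= abs(c) for m in null_max) / n_perm) for c in observed]
```

A cluster is accepted at p < 0.05 when fewer than 5% of permutations produce a maximum cluster value at least as large, which corresponds to the "more than 950 of 1000" criterion in the text. For within-group contrasts, which compare the same participants across onsets, a paired variant (for example, randomly swapping condition labels within each participant) would be the natural analogue. That variant is not shown here.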
Statistical analyses of SWFs, bandpass filtered between 1 and 55 Hz, were conducted at each time point within the interval 0-400 ms. For statistical analyses of TSE and ITPL values, we assessed a low-frequency range between 2 and 29 Hz and a high-frequency range between 30 and 55 Hz for the time interval 0-400 ms.
Human subject testing. Experimental protocols were approved by the Internal Review Boards at The Graduate Center, City University of New York, New York, New York, USA and St. John's University, Queens, New York, USA. All research participants provided written informed consent in accordance with the Institutions' established protocols.

Results
Sensory processing from bilateral auditory source-level channels. Prior to time-frequency analyses, SWFs from the Polish and English participant groups combined in response to each of the four onsets within nonwords were compared in the passive versus the attend conditions. We reasoned that if sensory processing can be determined from bilateral auditory source-level channels, then responses from English and Polish listeners combined should be fairly similar from these source channels for both listening conditions, the passive and attend conditions. Also, within-language group comparisons of SWFs in response to the onset contrasts (/pt/ versus /pət/, /st/ versus /sət/, /pt/ versus /st/, and /pət/ versus /sət/) from bilateral auditory source-level channels should reflect sensory processing differences to the contrasting onsets. As expected, SWF responses to the speech stimuli were highly similar from bilateral auditory source-level channels in the attend and passive listening conditions. One significant difference was found from an auditory source. From the ACN1-L source a significant difference was found to the /sət/ onset with a larger amplitude for the attend compared to the passive condition at a late time interval (p = 0.015, attend mean 2.306, passive mean 0.49, latency 330-400 ms). This is consistent with our earlier work with this same participant group, in which attention for task (attend condition compared to passive condition) resulted in a negative shift of the AEP waveform, but the AEP waveform morphology that was specific for each onset sequence (/pt/, /pət/, /st/ and /sət/) and reflected sensory processing of the acoustic characteristics of the stimuli remained unchanged 30 .
Within-language group comparisons found significant differences in SWFs to each onset contrast /pt-pət/, /st-sət/, /pt-st/, and /pət-sət/ from auditory source-level channels for each language group and listening condition. This suggests sensory-level processing of the spectro-temporal characteristics within the contrasting onset sequences from bilateral auditory source-level channels (Supplementary Table 2).
To confirm sensory processing from bilateral auditory source-level channels, the same comparisons were conducted on time-frequency data. Comparisons of spectral power and ITPL values in the two listening conditions from all Polish and English participants combined failed to find significant effects from the model's sources ACN1-R, ACN1-L, ACP2-R, or ACP2-L. However, numerous significant effects were found when contrasting phonological sequence onsets within each language group, as described in detail below.
Time-frequency analyses: between-language group comparisons. Between-language group comparisons (Polish, English) of spectral power (TSE) and ITPL values were analyzed in LG and theta to each onset sequence /pt/, /pət/, /st/, and /sət/, individually, from four-dipole sources (ACN1-R, ACN1-L, ACP2-R, ACP2-L) for the passive and attend conditions. Between-language group comparisons found TSE and ITPL values to the onset sequences to be highly similar with one exception, that was in TSE values in LG as described next.
Spectral power (TSE), low gamma. A significant difference in spectral power was found between the Polish and English listeners in response to the /pt/ onset, the phonological sequence onset that occurs only in the Polish language, from an auditory source-level channel in the right hemisphere (ACN1-R). The values were significantly different in the LG band between 31-40 Hz, 0-400 ms for the passive condition (p = 0.035, Polish mean − 0.072, English mean 0.028). The maximum effect for this response was found at 35 Hz, 150 ms post-onset. Figure 3 illustrates the significant increase in spectral power for the English compared to the Polish listeners, from the ACN1-R source, for an extended time interval and frequency range. For comparison, the figure also shows TSE values from the ACN1-L source, which did not differ between Polish and English listeners for the /pt/ onsets. Additional comparisons of TSE values in LG for the passive condition failed to find significant language group differences to the onset sequences /pət/, /st/ and /sət/, which occur in both the Polish and English languages. Also, no significant differences were found to any of the four onset sequences in the attend condition.
Spectral power (TSE), theta. No significant language group differences in spectral power were found to any of the four onset sequences in the theta band for either the passive or attend conditions.
ITPL, low gamma. Language group differences in ITPL values were not found in LG to the four onset sequences for either listening condition.
ITPL, theta. No significant language group differences in ITPL values were found in the theta band to the four onset sequences for either listening condition.

Time-frequency analyses: within-language group comparisons.
Within-language group comparisons of TSE and ITPL values in response to the onset sequences that differed by a single phoneme (/pt-pət/, /st-sət/, /pt-st/, /pət-sət/) were conducted for the passive and attend conditions. The purpose was to assess differences in sensory processing for each onset contrast and to determine whether sensory processing was comparable across language groups.
Spectral power (TSE), low gamma. Within-language group comparisons of the onset contrasts from bilateral auditory source-level channels found no significant differences in spectral power in LG for either language group for the passive or attend conditions.
Spectral power (TSE), theta. For the attend condition, TSE values in response to the /st-sət/ contrast differed significantly for the Polish group in the high-theta band from the ACN1-L source (p = 0.009, /st/ mean 0.041, /sət/ mean − 0.077, 5-10 Hz, max 9 Hz, 0-400 ms) (Fig. 4). Comparison of TSE values in theta did not find additional significant effects to the contrasts for either language group or listening condition.
ITPL, low gamma. Within-language group comparisons of the onset contrasts from auditory source-level channels did not reveal significant differences in ITPL values in LG for either language group or listening condition.

ITPL, theta.
Overall, for both language groups, the largest number of significant differences in ITPL values were found in response to the onset contrasts that differed most in spectro-temporal characteristics. That is, a larger number of significant differences in ITPL values were found for the /pt-st/ and /pət-sət/ contrasts as compared to the /pt-pət/ and /st-sət/ contrasts (Table 1a, b). For example, as illustrated in Fig. 1, the /p/ phoneme has low-frequency energy concentration and a short duration, whereas the /s/ phoneme has high-frequency energy concentration and a longer duration of ~ 100 ms compared to the /p/ phoneme. Also, the significant differences in ITPL values to the /pt-st/ and /pət-sət/ onset contrasts were remarkably similar for the Polish and English language groups in time and frequency ranges, as well as in the direction of the mean ITPL values for each onset sequence contrast. Within Table 1, the more positive ITPL mean value for each onset sequence contrast is highlighted in bold text, illustrating consistency in the direction of the mean values for specific time and frequency ranges across language groups. These findings demonstrate replicable patterns of acoustic-level processing to onset sequences in the theta-frequency band across two independent participant groups. In contrast, whereas acoustic-level processing patterns appeared similar across language groups in response to the /pt-pət/ contrast (Fig. 5a), ITPL values differed significantly only for the Polish group, in the passive condition (Table 1c). No significant differences were found for either group in the attend condition. Also, ITPL values differed significantly for the /st-sət/ contrast for both groups in the passive condition (Fig. 5b), but only for the English group in the attend condition. 
Table 1d shows that the ITPL effect for the English group in the attend condition to the /st-sət/ contrast was significant from the ACP2-R source-level channel, with a trend from the ACP2-L channel. These ITPL values to the /pt-pət/ and /st-sət/ comparisons, which were contrasted within the experimental design (Fig. 1), may reflect frequency of exposure to the contrasting phonotactic structures in each language, which we discuss below. Further, Fig. 5 illustrates that the significant differences in ITPL values to the /pt-pət/ comparison were reflected in dissociated low and high-theta frequency ranges. The dissociated responses in low and high-theta were accompanied by an inversion in the direction of the ITPL mean value, with the larger ITPL values for the /pət/ onsets compared to /pt/ onsets in low-theta and the larger ITPL values to the /pt/ onsets compared to the /pət/ onsets in high-theta (Fig. 5a, Table 1c). This dissociated pattern of response in low and high-theta is visible for both language groups in response to the /pt-pət/ contrast, although the effect reached significance only for the Polish group. A similar dissociated theta pattern is evident for the /st-sət/ comparison (Fig. 5b). In contrast to this, ITPL values to the /pt-st/ and /pət-sət/ contrasts showed significant differences for particular time intervals across a broader theta range.

Discussion
At cortical levels, acoustic structures associated with linguistic units, including the syllable and phoneme, entrain at oscillatory rates associated with the temporal structure of the linguistic unit, with the rate of acoustic change for the syllable and phoneme tracked in theta and LG frequency bands, respectively 12,13,16,17 . Activations are nested within this hierarchical process, such that oscillations at lower frequencies modulate those at higher frequencies [19][20][21] . The nature of this integrated, coordinated process is envisioned to support perception.
The current study assessed the association between sensory processing in LG and theta and language-specific perception of nonword onsets from two auditory source-level channels, each in the left and right hemispheres, in native-English and native-Polish speakers. All nonwords are phonotactically legal, and thus possible words in the English and Polish languages, with the exception of nonwords that begin with /pt/, which never occur in the English language. A principal finding of the study was that spectral power differed between the language groups in response to the /pt/ onsets in LG, between 31 and 40 Hz, 0-400 ms from an auditory source-level channel in the right hemisphere (ACN1-R). This singular language-group effect in LG from an early cortical stage of processing, supports the argument that temporal processing facilitates perception 13,20,22,26 . Further, ITPL values in low and high-theta sub-bands that differentiated the acoustic characteristics of the onset sequences /pt-pət/ and /st-sət/ appeared to be modulated by language experience. We discuss these findings below.
Language-specific processing in LG. We questioned whether language group differences in spectral power or ITPL values in LG would be selective to the /pt/ onset that occurs only in the Polish language and whether the response would be bilateral or restricted to one hemisphere. The English listeners compared to the Polish showed increased spectral power to the /pt/ onsets from the right hemisphere (ACN1-R) for an extended time and frequency interval, without associated differences in the left hemisphere. The Asymmetric Sampling in Time (AST) model posits left-hemispheric asymmetry for rapidly changing phonemic-level signals 13,16,50 ; thus, differences in sensory processing for the language groups would be predicted for the left hemisphere. Although counterintuitive, our finding may be consistent with the AST model. The larger power shown by the English listeners to the /pt/ onsets from the right hemisphere source might reflect sensory processing of a novel phonotactic pattern 51,52 . The apparent left asymmetry for the Polish, but not the English listeners, to the /pt/ onsets is consistent with this view. In an earlier AEP study examining these same research participants, we analyzed the P1-N1-P2 complex to the onset sequences from fronto-central electrode sites (0.1-30 Hz) in counterbalanced, attend and passive conditions 30 . Current source density (CSD) maps in that study illustrated that the English compared to the Polish listeners had an increased response to the /pt/ onsets in the passive condition. This AEP effect, which reflects largely phase-locked theta processing, was found at fronto-central electrode sites at a late stage of cortical processing between 400 and 900 ms. 
The current results revealed a similar-response direction with increased sensory processing to the /pt/ onsets for the English compared to the Polish participants for the passive condition, but T-F analyses expanded the previous results by revealing a spectral power effect to /pt/ in LG from the right hemisphere, beginning at an early stage of cortical processing. Sensory processing of syllables has been demonstrated in the left hemisphere auditory cortex in the LG frequency band 22,53 and it has been argued that activity within LG (28-40 Hz) and theta has a foundation in the evolution of the perception and production of speech 50 . Taken together, the finding of increased power by the English listeners to the unfamiliar /pt/ onset in the right hemisphere might suggest endogenous processing to a novel unfamiliar acoustic-phonetic signal.
Language-specific processing in LG to native and non-native phonological sequences in two language groups, to the best of our knowledge, has not been previously demonstrated. Therefore, we suggest that presenting an experimental context that highlights phonemic-level contrasts 54 as well as a passive-listening condition 29,30 may be beneficial for disentangling automatic patterns of sensory processing in LG, at least for EEG 31,32 .
Language-specific processing in theta. We then questioned whether language-specific processing would be evident in theta-band activity, which demarcates, primarily, syllabic-level structures 20,21,24,26 . Studies examining language-specific processing have shown mixed effects as to whether theta-band oscillations enhance native-language perception 12,23 . In the present study, ITPL values in theta in response to onset contrasts appeared to reflect acoustic-level and language-specific processing. ITPL values across single trials in response to the /pt-pət/ onset contrast showed a distinctive pattern in low and high-theta for both Polish and English listeners. However, the differences in ITPL values to the /pt/ versus the /pət/ onsets reached significance only for the Polish listeners in the passive condition (ACP2-L). The /pt-pət/ onset contrast occurs only in the Polish language. The common word "bird", learned early in life, begins with /pt/ in the Polish language (i.e., "ptak"), whereas the /pVt/ onset is common in word onset in both languages [1][2][3]55 . In contrast, the /pt/ onset has zero probability of occurrence in the English language 1 . Thus, the ITPL differences in low and high-theta, found only for the Polish listeners, might reflect more variable phase locking by the English listeners to an unfamiliar contrast. Theta oscillatory activity plays a critical role in speech perception with ongoing phase shaping syllable perception 56,57 and syllable identity 58 . 
In light of these findings, the suggestion that language input during development might modulate phase synchrony is reasonable.
Further, a language-specific effect in theta is consistent with studies that have examined the T-complex, an AEP component (phase-locked, mostly theta-band activity) from head surface electrodes overlying lateral posterior temporal brain regions 40 . T-complex modulations have been found to reflect developmental language 59-62 , as well as the amount of acoustic input associated with native-language phonemes and phonotactic patterns 10,63 . Between-language-group analyses failed to find differences in theta to the /pt/ onsets. Acoustic-level processing for the phonological sequences was highly similar across language groups in theta. This may have obscured language-specific effects that may originate from select higher-level brain regions 22,53 .
[Figure caption, opening truncated] ... red masking depicts larger ITPL values in response to /pt/ in high-theta and blue masking depicts larger ITPL in response to /pət/ in low-theta, reflecting phase locking in select time-frequency bands for different phonological sequences. The bottom row in (a) shows a similar ITPL pattern for the English group to each onset sequence; however, differences were not significant. (b) Images show the ITPL values to the /st-sət/ comparison from the ACP2-L source, for the Polish (top) and the English (bottom) groups, also for the passive condition. For each group, ITPL values differed significantly only in high-theta (Polish: max 8 Hz, 200-400 ms, p = 0.042; English: max 6, 6-10 Hz, 0-400 ms, p = 0.014). Notice in the difference images on the far right that, for both language groups, larger ITPL values in high-theta are in response to the /sət/ onset syllable (blue masking). This is a different pattern than for the /pt-pət/ contrast, which found larger ITPL values in high-theta to the consonant cluster /pt/. See text.
ITPL values to the /st-sət/ contrast followed a similar pattern of sensory processing across language groups. However, the differences in high-theta to the /st-sət/ contrast reached significance only for the English group in the attend condition from the ACP2-R source, with a trend found from the ACP2-L source. Whereas ITPL values were not significantly different to the /st-sət/ contrast for the Polish group in the attend condition, TSE values from the ACN1-L source to the /st-sət/ contrast were significantly different only for the Polish group. These spectral power effects may reflect cognitive effects of attention, rather than exogenous sensory processing to the onsets, which is evident through replicable ITPL patterns across language groups. We speculate that the combination of responses to the /st-sət/ contrast may reflect the rare occurrence of the /sVt/ onsets in the Polish language 55 . In the English language, both /st/ and /sVt/ are common. Consider the syllable /sVt/ in running speech: phrases such as "set it up", "set it on", and "set it out" include the schwa /ə/ within the onset /sVt/ syllable. Our previous neurophysiological studies of Polish and English listeners' responses to these onsets support this frequency of occurrence data. We have demonstrated a larger late-latency positive component, an AEP component of conscious processing, to the /st-sət/ contrast for the English compared to the Polish listeners 9 , which we have replicated in our lab with this current population of English and Polish listeners. These findings suggest that in addition to the presence (or absence) of a phonotactic pattern in a language, the frequency of occurrence of a phonotactic pattern in a language may modulate sensory processing.
Acoustic-level processing in low and high-theta, within syllabic-unit tracking. We questioned whether sensory processing of the physical characteristics of phonological-level structures would be evident in LG and/or theta-frequency bands. Temporal processing models predict that acoustic structures associated with phonemic units are tracked in the LG band; however, comparison of onset sequences within each language group (within-language group analyses) failed to find significant spectral power or ITPL effects in LG (between 30 and 55 Hz) from auditory source-level channels. Again, the signals may be small or from select brain regions, and thus undetectable from composite EEG signals.
Peaks in ITPL values within select theta sub-bands differentiated acoustic-level processing to the phonological sequence onsets. This was particularly apparent for the /pt-pət/ and /st-sət/ comparisons. The addition of the vowel, however, did not yield consistent ITPL peaks in either high or low-theta; thus, acoustic features of the vowel were not strictly associated with phase-locking values within a particular theta sub-band. Rather, each phonological sequence onset showed a particular ITPL signature pattern that was replicable across language groups, with ITPL peaks selective within theta sub-bands at particular time intervals. The onset consonants /p/ and /s/, within the /pt-st/ and /pət-sət/ comparisons, are highly contrastive in spectral energy. In response to these contrasts, we found theta activation across a broader frequency range, spanning low to high-theta.
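For readers less familiar with the measure, the following is a minimal sketch of how inter-trial phase locking is conventionally defined: the length of the average unit phase vector across trials at a given frequency, ranging from 0 (random phase) to 1 (identical phase). It is not the analysis pipeline used in this study. The function names (`phase_at`, `itpl`, `make_trials`), the sampling rate, the window length, the trial count, and the synthetic 8 Hz signals are all assumptions introduced for illustration, and a single-frequency Fourier projection stands in for a full time-frequency decomposition.

```python
import cmath
import math
import random


def phase_at(signal, freq_hz, srate_hz):
    """Phase of `signal` at `freq_hz`, from a single-frequency Fourier projection."""
    acc = 0j
    for n, x in enumerate(signal):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / srate_hz)
    return cmath.phase(acc)


def itpl(trials, freq_hz, srate_hz):
    """Inter-trial phase locking: length of the mean unit phase vector (0 to 1)."""
    vectors = [cmath.exp(1j * phase_at(t, freq_hz, srate_hz)) for t in trials]
    return abs(sum(vectors) / len(vectors))


def make_trials(n_trials, freq_hz, srate_hz, n_samples, jitter_rad, rng):
    """Synthetic trials: a sinusoid whose starting phase is jittered across trials."""
    trials = []
    for _ in range(n_trials):
        phi = rng.uniform(-jitter_rad, jitter_rad)
        trials.append([math.cos(2 * math.pi * freq_hz * n / srate_hz + phi)
                       for n in range(n_samples)])
    return trials


rng = random.Random(0)
SRATE = 250.0    # Hz; hypothetical sampling rate
N_SAMPLES = 250  # 1 s analysis window; illustrative only

# Consistent phase across trials (small jitter) versus fully variable phase.
locked = make_trials(50, 8.0, SRATE, N_SAMPLES, jitter_rad=0.2, rng=rng)
variable = make_trials(50, 8.0, SRATE, N_SAMPLES, jitter_rad=math.pi, rng=rng)

itpl_locked = itpl(locked, 8.0, SRATE)
itpl_variable = itpl(variable, 8.0, SRATE)
print(f"ITPL, consistent phase: {itpl_locked:.2f}")
print(f"ITPL, variable phase:   {itpl_variable:.2f}")
```

Because the measure discards amplitude, it is sensitive to the consistency of response timing across trials rather than to response size, which is why phase locking and spectral power can dissociate.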
At rest, a predominance of low-theta activation (peaking ~ 5.5 Hz) has been localized to the right hemisphere of auditory cortex and a predominance of high-theta activation (peaking ~ 7.5 Hz) has been localized to the left hemisphere of auditory cortex 22 . It was suggested that this asymmetry in theta-band activation might support syllable tracking in the right hemisphere and monitoring of fine-grained structure for phoneme integration in the left hemisphere. Further, high-theta activity was localized from the superior temporal plane and superior temporal gyrus, purportedly reflecting two stages of processing, which would allow for both phoneme integration and more abstract representation 22,53 . Syllable-level tracking, which has been repeatedly demonstrated in theta, could not be identified in the present study using a match-to-sample experimental design. Rather, our findings revealed acoustic-level processing of structures corresponding to phonological sequences within select time-frequency bands within theta, which may be involved in integration and coordination of phonological information within a hierarchical nested network.
Attention effect on acoustic-level and language-specific processing. Finally, we examined the effects of attention on acoustic-level and language-specific processing. Spectral power differences in LG and ITPL differences in theta in response to the /pt/ onset, which occurs only in the Polish language, and its contrast with the /pət/ onset, were found during the passive-listening condition. In the attend condition, participants were attending to the stimuli to prepare a button press response to the second word in the nonword pairs.
The language-specific effects found here began at an early stage of cortical processing, an automatic processing stage that is modulated less by selective attention 30,64,65 . This result expands on previous AEP studies that have shown automatic patterns of native-language speech perception to be elicited during tasks for which attention to auditory stimuli was reduced 29,30 . Automatic processing of lexico-semantic content has also been demonstrated within a non-attend compared to an attend condition 66 . Attention to speech stimuli enhances sensory processing 28,67 , but neural patterns developed in response to speech during critical periods of maturation 68 may be concealed by additional processing associated with attention and other cognitive demands. Consistent with the Automatic Selective Perception model 31,32 , activation within the attention network may have masked automatic patterns of language-specific processing within the LG and theta-frequency bands.

Conclusion
In summary, comparison of time-frequency measures in native-English and native-Polish speakers found a single significant effect from auditory source-level channels, a spectral power effect in LG to one of four phonological sequence onsets, the /pt/ onset that occurs in the Polish but not the English language. Phonological sequence onsets were also contrasted in each language group, separately. Whereas patterns of evoked activity were comparable across the language groups, phase locking appeared to be modulated by the frequency of occurrence of the onset contrasts in the native language. Our findings, language-specific processing in LG and acoustic-level and language-specific processing in theta, are consistent with temporal processing models that posit phonemic-level segmentation in LG and hold that coordinated activations between theta and LG support speech perception.

Data availability
Time-frequency data used in this study can be found at the Harvard Dataverse Public Repository: https://doi.org/10.7910/DVN/6SJ1MO.