Auditory processing (AP) refers to neural coding, synthesis, and analysis of sounds by both the peripheral and central auditory system and other multimodal regions of the brain. Skills important to AP include spatial localization and lateralization, discrimination, grouping, temporal aspects of hearing (eg, resolution, integration, and ordering) and effective hearing in the presence of competing or degraded signals.1 An auditory processing disorder (APD) occurs when there is a functional deficit in one or more of these skills; as such, APDs are phenotypically heterogeneous.

The most common clinical presentation of an APD is disproportionate difficulty understanding speech in degraded listening situations, such as background noise or reverberant rooms, despite normal hearing sensitivity.2, 3, 4 Developmental APD presents in childhood with no other identified etiological or risk factors such as neurologic damage (acquired APD) or peripheral hearing loss (secondary APD), and can persist into adulthood.5, 6 In the case of a child, caregivers may notice that the child appears to hear, but is not listening.7 The presentation of a developmental APD may be intertwined with that of other communication and learning disorders such as language delay, dyslexia, and problems with attention.8, 9 Although any causal relationships of these sometimes overlapping disorders are unclear, there are data to suggest that there may be a shared etiology in some cases.10

There is no gold standard for the diagnosis of APDs, nor are there any pathognomonic features. APD is typically diagnosed based on reduced performance on one or more tests designed to assess AP skills. However, there is no consensus on the components of an appropriate test battery, nor are there definitive criteria for interpretation of these tests.11 Hind et al.12 estimate the prevalence of isolated APD in the general population of the United Kingdom to be 0.5–1%. In combination with prevalence statistics for other learning disorders (learning disability,13 attention deficit disorder,14 and intellectual disability15), which may co-occur with APD in 30–70% of the cases,16, 17, 18 we estimate the overall prevalence of APD to be ~10%. However, without a consensus definition and standardized diagnostic criteria, it is difficult to determine actual prevalence.19 One of the challenges in assessing AP skills is the overlapping need for auditory perception and cognitive processes such as attention, memory, and decision-making, as well as the requirement for verbal labeling or other language-based responses.20 Complementary to direct measures of AP are questionnaires formulated to assess communication and listening difficulties that parallel and objectively capture parent or teacher concerns, such as the Children’s Communication Checklist (CCC-2).21

Identification of specific heritable AP traits would provide a foundation to determine the genetic and physiopathogenic underpinnings, clarify relationships with other neurocognitive disorders, and inform therapeutic interventions for APD. Twin studies are a powerful approach to evaluate and estimate the genetic, environmental, and stochastic contributions to a specific trait. Twins raised together experience essentially the same environment. Monozygotic (MZ; identical) twins are genetically identical, whereas dizygotic (DZ; fraternal) twins share on average 50% of their genome. By comparing the correlation in traits between MZ twin pairs with the correlation between DZ twin pairs, an estimate of the degree of variation that can be ascribed to shared genes, known as heritability (h2), can be calculated.22
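As a rough illustration of the MZ versus DZ comparison described above, Falconer's classical approximation takes heritability as twice the difference between the two correlations. This is a simplified sketch and not the model-fitting procedure used in this study; the function name and the example correlations are hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Classical Falconer approximations from co-twin correlations.

    r_mz, r_dz: trait correlations within MZ and DZ twin pairs.
    Returns rough fractions of variance; values can fall outside [0, 1]
    when the simple additive assumptions do not hold.
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic variance (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared environment
    e2 = 1.0 - r_mz            # unique environment (plus measurement error)
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, chosen only for illustration
est = falconer_estimates(r_mz=0.70, r_dz=0.35)
print(est)
```

By construction the three fractions sum to 1, which is a quick sanity check on any pair of input correlations.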

We previously examined heritability of speech-based AP skills and estimated that ~73% of the variance in dichotic listening and 46% of the variance in time-compressed speech understanding were attributable to genetic variation in adults.23 Given the considerable cognitive and linguistic demands of these tests, concern arises as to whether the heritability estimates for dichotic and time-compressed speech reflect those demands rather than the auditory aspects of the tests. The aim of the present study differs from this previous work in that it examines genetic and environmental contributions to phenotypic variance in temporal and spectral processing of non-speech sounds. Here, we report four non-speech measures of spectral and temporal AP with heritability estimates (h2) ranging from 0.61 to 0.74 in our twin cohort, providing evidence of substantial genetic influence on variance of this trait. These non-speech AP skills are important factors for accurate and efficient coding and recognition of the dynamic features of auditory signals fundamental to speech perception and segregation of speech from background sounds during language development.24, 25

Materials and methods


We recruited 192 twin pairs, aged 6 years 0 months to 11 years 11 months, comprising 122 MZ twin pairs (60 males and 62 females; mean age 9.47 years) and 70 same-sex DZ twin pairs (30 males and 40 females; mean age 8.83 years) attending the Annual Twins Days Festival in Twinsburg, OH, USA in 2009 and 2010. Zygosity was determined by molecular genetic analyses as described below. Age and sex distributions for the MZ and DZ twin groups were comparable (age: P=0.09, t=1.731, df=94; sex: P=0.55, χ2=0.357, df=1). We obtained written, informed consent from a parent, and either written or verbal assent from each participant. This study protocol (00-DC-0073) was approved by the Combined Neuroscience Institutional Review Board, National Institutes of Health, Bethesda, MD, USA.

Enrollment into the protocol required participation by both twins who each had to meet all inclusion but no exclusion criteria. All participants were required to be native speakers of American English, and have a negative history of significant head trauma, brain surgery, or ear surgery other than tympanostomy tubes. Eligibility required passing otoscopic, tympanometric, and hearing screenings at the time of the study. Eligibility criteria were reviewed with the parents and a brief otologic history was obtained for each participant prior to participation. Otoscopic examination was conducted by an otolaryngologist (AJG) to rule out evidence of active outer or middle ear disease and occluding cerumen. Middle ear function was screened using a GSI-38 immittance bridge (Grason Stadler Inc., Eden Prairie, MN, USA) to rule out significant negative middle ear pressure (<−200 daPa) and reduced peak static compliance (<0.3 ml). Air-conducted pure-tones were screened using a Maico 41 audiometer (MAICO Diagnostics, Eden Prairie, MN, USA) at 20 dB HL for 1000, 2000, 3000, and 4000 Hz delivered via Ear Tone ER-3A insert earphones (Etymotic, Inc., Elk Grove Village, IL, USA).

Test environment

All study tests were administered in a quiet room located in a building adjacent to the Twins Days festival site. Testing was conducted in private cubicles to ensure minimal visual and auditory distraction. Ambient noise levels were monitored continuously during the study test sessions using a Larson Davis Laboratories (Depew, NY, USA) Model 700 dosimeter, which showed a time-weighted average of 54.1 dBA and a peak signal of 86.2 dBA. Testing was conducted or supervised by licensed audiologists (CB, LH, KK, MM, AR, or CZ). All testers were formally trained in administration of the IMAP test battery and related study procedures.

Test battery of AP and cognition

Participants were tested during a 1-hour session using five non-speech measures of AP, one speech-in-noise test, and three measures of nonverbal cognitive skills and short-term memory from the previously described IMAP protocol.26

The test battery was presented using laptop computers running customized software (MRC Institute of Hearing Research, System for Testing Auditory Responses (IHR-STAR), Nottingham, UK)27 that generated test stimuli in a randomized order of administration for the AP tests, and ensured that test protocols were followed.26 All auditory stimuli were presented through Sennheiser HD25 headphones (Wedemark, Germany). AP and cognitive tests were interleaved. The tester provided positive reinforcement, in the form of verbal praise and stickers, as needed during each session in order to maintain motivation for the child.

To evaluate temporal processing we used backward masking without a temporal gap between the target tone and the masker (BM), backward masking with a 50-ms temporal gap between the target tone and masker (BM50), and frequency discrimination (FD) (Figures 1a, b and e). To assess spectral processing, we used simultaneous masking without (SM) and with a spectral notch (SMN) surrounding the target frequency (Figures 1c and d). The test paradigm employed a 3-alternative, 3-interval forced choice adaptive staircase (3-down, 1-up) strategy,28 by which target stimuli varied based on correct and incorrect responses of the child according to methods previously described.19, 20, 26
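The 3-down, 1-up rule named above can be sketched as follows. This is a minimal illustration under stated assumptions and not the IHR-STAR implementation: the starting level, step size, trial count, and the simulated listener are all hypothetical. A 3-down, 1-up rule is known to converge on roughly the 79% correct point of the psychometric function.

```python
import random


def run_staircase(respond, start_level=90.0, step=5.0, n_trials=30, n_down=3):
    """Run one adaptive track and return the level presented on each trial.

    respond(level) -> bool: True if the listener chose the target interval.
    The level drops after n_down consecutive correct responses and rises
    after every incorrect response.
    """
    level = start_level
    correct_run = 0
    levels = []
    for _ in range(n_trials):
        levels.append(level)
        if respond(level):
            correct_run += 1
            if correct_run == n_down:
                level -= step      # harder after three correct in a row
                correct_run = 0
        else:
            level += step          # easier after any error
            correct_run = 0
    return levels


def simulated_listener(threshold, rng, n_intervals=3):
    """Hypothetical listener: always correct at or above threshold, guessing below."""
    def respond(level):
        if level >= threshold:
            return True
        return rng.random() < 1.0 / n_intervals  # chance level for 3 intervals
    return respond


rng = random.Random(0)  # fixed seed so the sketch is repeatable
track = run_staircase(simulated_listener(threshold=60.0, rng=rng))
estimate = sum(track[-3:]) / 3  # mean level over the last three trials
print(track, estimate)
```

Passing the listener as a callback keeps the tracking rule separate from the response model, so the same function could be driven by real button presses.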

Figure 1

Schematic diagrams depicting a single trial, comprising three presentation intervals, for each of the individual auditory processing tests in the IMAP test battery. The task for each trial is to detect the interval containing the target tone. (a–d) The three boxes designate successive sound presentation intervals separated by standard interstimulus intervals (ISI). The heavy horizontal bar represents the 20-msec target tone (arrow) and shading represents frequency bands of masking. (a) Backward masking (masker occurs immediately after the target tone) without a temporal gap between the masker and target tone (BM). (b) Backward masking with a 50-msec gap between the masker and target tone (BM50). (c) Simultaneous masking (masker and target tone occur at the same time) (SM). (d) Simultaneous masking with a 400 Hz spectral notch in the masker (SMN). (e) Frequency discrimination (FD); the heavy horizontal bar represents a 200-msec tone that occurs in each of the presentation intervals; two of the tones are at 1000 Hz (standard) and the third is a target tone presented at a higher frequency. In all of these examples (a–e) the presentation interval containing the target stimulus is shown in the middle, but the target could occur randomly at any interval. msec, millisecond; Hz, Hertz.

The speech-in-noise test involved repetition of recorded VCV nonsense syllables in 3-band, single-male-talker-weighted, idealized speech-modulated noise9 using matched procedures to the non-speech AP tests.

Cognitive tests comprised standardized measures of nonverbal reasoning and included nonverbal IQ (NVIQ) from the Matrices Reasoning subtest of the Wechsler Abbreviated Scale of Intelligence,29 working memory for forward and backward digit span, Wechsler Intelligence Scale for Children – Fourth Edition,30 and phonological processing and memory measured by nonsense word repetition (NWR) subtest of the Developmental Neuropsychological Assessment.31

While the twins were participating in the study tests, an accompanying parent completed the CCC-2 (US Edition), a standardized 70-item questionnaire used to assess a child’s communication and social interaction abilities.21 Parents also completed a brief questionnaire regarding their child’s hearing, developmental, and otologic history. The same parent completed questionnaires for both twins and the full set of questionnaires was completed for one child at a time.

Zygosity determination

Buccal swab samples were collected from each twin participant and DNA was extracted using a standard protocol.32 The DNA was PCR-amplified for short tandem repeat (STR) markers and analyzed until genotypes at a minimum of 14 genetically unlinked marker loci could be scored for each twin pair. Twins were considered monozygous if they had concordant genotypes for all marker loci or dizygous if they had discordant genotypes for at least five STR markers.23
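The decision rule above can be written as a short sketch. The genotype representation, the marker names, and the "undetermined" and "insufficient markers" outcomes are assumptions added for illustration; the text specifies only the 14-locus minimum, full concordance for monozygosity, and at least five discordant markers for dizygosity.

```python
def classify_zygosity(twin1, twin2, min_loci=14, min_discordant=5):
    """Classify a twin pair from STR genotypes.

    twin1, twin2: dict mapping marker name -> genotype, given as a pair of
    allele sizes (order within the pair is ignored).
    """
    shared = [m for m in twin1 if m in twin2]  # loci scored in both twins
    if len(shared) < min_loci:
        return "insufficient markers"
    discordant = sum(
        1 for m in shared if sorted(twin1[m]) != sorted(twin2[m])
    )
    if discordant == 0:
        return "MZ"
    if discordant >= min_discordant:
        return "DZ"
    return "undetermined"  # 1-4 discordant loci: outcome not specified in the text


# Hypothetical marker names and allele sizes
markers = [f"STR{i:02d}" for i in range(1, 15)]
twin_a = {m: (10, 12) for m in markers}
twin_b = dict(twin_a)
print(classify_zygosity(twin_a, twin_b))  # concordant at all 14 loci -> "MZ"
```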

Data analysis

Pretreatment of data

Thresholds for individual AP measures were calculated by averaging the target level or frequency in the last three trials of each track. Two derived AP scores, temporal resolution (TR; TR=BM−BM50) and frequency resolution (FR; FR=SM−SMN), were calculated. This subtraction process was designed to eliminate the influence of non-sensory factors such as memory on performance.20 Raw scores for both the individual and derived AP measures were corrected for age by simple linear regression.
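A minimal sketch of the two pretreatment steps described above, the derived scores and the age correction. It assumes that "corrected for age" means taking residuals from a least-squares line of score on age, which the text does not spell out; all thresholds and ages are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def age_corrected(scores, ages):
    """Residual of each score after removing the linear trend with age."""
    slope, intercept = fit_line(ages, scores)
    return [s - (slope * a + intercept) for s, a in zip(scores, ages)]


# Hypothetical thresholds (dB) for four children
ages = [6.5, 8.0, 9.5, 11.0]
bm = [70.0, 62.0, 58.0, 50.0]
bm50 = [55.0, 50.0, 49.0, 42.0]

tr = [a - b for a, b in zip(bm, bm50)]  # temporal resolution, TR = BM - BM50
bm_adj = age_corrected(bm, ages)        # age-corrected BM scores
print(tr, bm_adj)
```

The same subtraction applies to frequency resolution (FR = SM − SMN), and the residuals sum to zero by construction.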

The summed responses for each raw subscale score of the CCC-2 were converted to age-based standard scores and a generalized communication composite (GCC) score was calculated based on the first eight subscales.21, 33 Cognitive tests were scored according to standard methods.29, 30, 31

Comparisons between twin groups (MZ and DZ) for age and sex were based on twin pairs. A χ2-test was used to compare groups for sex, and an independent sample t-test was used to compare groups for age. Comparisons of performance on the AP and cognitive tests between twin types were conducted using a mixed model with zygosity as the fixed factor, age as a covariate for the AP measures, and pairs as a random factor.

Twins modeling and heritability estimates

For each of the age-corrected AP measures, we calculated Pearson correlations between co-twin pairs within the MZ and DZ groups. As this involves arbitrarily assigning the siblings in each twin pair to two groups (ie, A and B), we averaged correlations from over 500 random assignments of all twin pairs. The values obtained in this way were very close to the intraclass correlation, which quantifies the degree to which, in this case, siblings’ performance resembles one another. Reported P-values were computed from the average correlations.
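The random-assignment averaging described above can be sketched as follows. The seed, the exact number of iterations, and the example scores are assumptions; only the idea of averaging Pearson correlations over repeated random A/B assignments comes from the text.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_random_assignment_r(pairs, n_iter=500, seed=0):
    """Average co-twin correlation over random A/B assignments.

    pairs: list of (score_twin1, score_twin2) tuples, one per twin pair.
    """
    rng = random.Random(seed)
    rs = []
    for _ in range(n_iter):
        group_a, group_b = [], []
        for t1, t2 in pairs:
            if rng.random() < 0.5:   # flip which twin lands in group A
                t1, t2 = t2, t1
            group_a.append(t1)
            group_b.append(t2)
        rs.append(pearson(group_a, group_b))
    return mean(rs)


# Hypothetical age-corrected scores for five twin pairs
pairs = [(1.2, 1.0), (-0.5, -0.1), (0.3, 0.8), (2.0, 1.5), (-1.1, -0.9)]
r_avg = mean_random_assignment_r(pairs)
print(r_avg)
```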

We then applied genetic model-fitting techniques using Mx structural equation modeling software (Version 1.70a)34 to obtain estimates of the contribution of additive (A) and dominant (D) genetic components and shared (C), and unique (E) environmental factors influencing test performance (Figure 2).22 Genetic modeling allows quantitative decomposition of the total variance of the observed trait into contributions from these four factors (A, C, D, and E), which provide the fractions a2, c2, d2, and e2 of the total variance, respectively. By iterative comparisons of combinations of these factors, the most compatible and parsimonious model is determined and estimates of heritable and environmental influence can be made.
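For reference, the standard expectations of the classical twin design that underlie this decomposition can be written as below, assuming a standardized phenotype P with unit variance. This is the textbook formulation and is offered as a sketch; the exact parameterization used in the Mx scripts is not given in the text.

```latex
\begin{align}
\operatorname{Var}(P) &= a^2 + d^2 + c^2 + e^2 = 1 \\
\operatorname{Cov}_{\mathrm{MZ}}(P_{T1}, P_{T2}) &= a^2 + d^2 + c^2 \\
\operatorname{Cov}_{\mathrm{DZ}}(P_{T1}, P_{T2}) &= \tfrac{1}{2}a^2 + \tfrac{1}{4}d^2 + c^2
\end{align}
```

With data from twins reared together only, c2 and d2 cannot be estimated in the same model, which is why candidate models such as ACE, ADE, AE, and CE are fitted and compared rather than a single ACDE model.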

Figure 2

Path diagram for modeling of heritability and shared environment components of variance. Phenotypic variability is divided into additive genetic (A), dominant genetic (D), shared environmental (C), and unique (E) environmental components, which provide the fractions, a2, d2, c2, and e2 of the total variance, respectively. AP, auditory processing; T1, twin 1; T2, twin 2.

Standard hierarchic χ2-tests in combination with Akaike’s Information Criterion (AIC = χ2 − 2df) were used to select the best fitting model.22 The selected model reflects the best balance of goodness-of-fit and parsimony. As our data set contained missing values for individual tests, the genetic models were fitted using the full-information maximum likelihood method, avoiding the need to discard subjects for whom the data on their co-twin were missing. This method also avoided randomly assigning twins to one of two groups as is necessary when a covariance matrix is submitted to the Mx software.
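The AIC comparison can be illustrated with a short sketch. The fit statistics below are invented for illustration and are not results from this study; the function names are likewise hypothetical. The lowest AIC marks the model with the best balance of fit and parsimony.

```python
def aic(chi2, df):
    """Akaike's Information Criterion in the form used above: chi2 - 2*df."""
    return chi2 - 2 * df


def best_model(fits):
    """Return the model name with the lowest AIC.

    fits: dict mapping model name -> (chi2, df).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fit statistics for four candidate models
fits = {
    "ADE": (3.1, 3),
    "AE": (3.4, 4),
    "CE": (12.8, 4),
    "E": (40.2, 5),
}
for name, (chi2, df) in fits.items():
    print(name, round(aic(chi2, df), 2))
print("selected:", best_model(fits))
```

In this invented example, dropping D from ADE barely worsens the fit while saving a parameter, so AE has the lowest AIC.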

As a complement to this univariate analysis, multivariate analysis was conducted to provide further insight into the nature of the genetic and environmental factors influencing the AP measures using a common pathway model (Figure 3) fitted with a single common latent factor for the four masking measures (BM, BM50, SM, SMN), and for all five non-speech AP measures (FD, BM, BM50, SM, and SMN). The common pathway model assumes that variation in each of the observed measures is derived from a common latent factor (in our case, AP), and the model estimates genetic and environmental contributions to the variation of this factor. In addition, this model takes additional measure-specific genetic influences into account.

Figure 3

Diagram of the ADE common pathway model for multivariate analysis showing specific genetic and environmental influences. The model assumes that the observed measures, frequency discrimination (FD), backward masking (BM), backward masking with a 50 ms gap (BM50), simultaneous masking (SM), and simultaneous masking with a spectral notch (SMN) (boxes) are derived from a common latent factor, auditory processing (AP) (ovals). The model estimates the genetic (Ac and Dc) and environmental contributions (Ec) to the variation in AP. In addition, the model takes into account measure-specific variance of traits (represented by the small circles below the observed measures), and splits this into a unique environmental part and an additive genetic part that correlates between twin pairs. Other models (ACE, AE, and CE) are defined analogously. MZ, monozygotic; DZ, dizygotic; A, additive genetic contributions; D, dominant genetic contributions; E, unique environmental contributions; T1, twin 1; T2, twin 2.


Both twin groups performed similarly on AP and cognitive tests. Each component test of the IMAP battery was completed by the majority of participants (Table 1) and 95% completed the entire test battery. We compared test performance between the MZ and DZ twin groups in toto in order to identify any differences in performance between the two groups. Performance on tests of BM, BM50, SMN, and FD was not significantly different between the MZ and DZ groups (P>0.05). There was a significant difference between twin groups for results of the SM test (P=0.006). Overall group performance by our twin cohort (Table 1; Supplementary Figure S1a–e) was comparable to that previously reported for all of the non-speech AP measures (Supplementary Figure S1f–j).20 Performance on the two derived AP scores, TR and FR, was not significantly different between the MZ and DZ groups (P>0.05; Table 1). Results for the VCV speech-in-noise test, NVIQ, NWR, and short-term memory (forward and backward recall of digits), as well as the GCC score of the CCC-2 parental questionnaire, did not show a significant effect of zygosity (P>0.05; Table 1). These findings suggest that group differences in AP and cognitive test performance between the MZ and DZ twins were not confounding factors for heritability assessment.

Table 1 Comparison of MZ and DZ performance on the IMAP tests

In order to determine the degree to which twin pairs co-varied on the same trait, performance correlations were evaluated for both MZ and DZ co-twin pairs for all raw AP test scores, the two derived AP scores, TR and FR, cognitive tests, and the parental questionnaire-derived GCC score (Table 2). Correlations between MZ co-twins were significantly greater than zero (P<0.05) for all measures of AP, whereas significant correlations in performance between DZ co-twin pairs were not observed for any of the AP results with the single exception of BM. A significant correlation was found between both MZ and DZ co-twins for overall memory, forward recall of digits, and GCC scores. A significant correlation was found for MZ, but not DZ, co-twins for VCV, NWR, and backward recall of digits. There were no significant correlations of NVIQ between either MZ or DZ twin pairs. Correlations between MZ co-twins on the raw AP measures ranged from 0.357 to 0.784 and were substantially larger than those between DZ co-twins, which ranged from −0.118 to 0.344. These findings showed that MZ twin pairs performed similarly on tests of AP, whereas DZ pairs did not, suggesting a heritable component to these abilities.

Table 2 Correlations (r) between co-twins and estimates of heritable (h2) or shared environment (c2) components of variance based on best fitting models for age-adjusted data

Spectral and temporal AP skills were strongly influenced by genetic factors. Model-fitting of the genetic (A, D) and environmental (C, E) components was used to determine the most compatible and parsimonious model of the factors influencing performance on each study test. The AE model, which attributes variance in a trait to additive genetic and unique environmental contributions, provides an estimate of the heritability (h2) of the trait. This was the best fitting model for all five individual non-speech AP measures corrected for age by simple linear regression (Table 2 and Supplementary Table S1). Based on our data we estimate heritability for performance on the non-speech AP tests to range from 0.32 to 0.74. The CE model, which attributes variance to shared and unique environmental components, provides an estimate of the shared environmental contribution (c2) to variance in a trait. The CE model was the best fitting model for VCV and the derived measure of temporal resolution, TR, with estimated environmental contributions to these measures of 0.47 and 0.26, respectively. The best fitting model for the derived measure of frequency resolution, FR, was ADE (additive genetic, dominant genetic, and unique environmental components).

The AE model was the best fitting model for NWR with an estimated heritable contribution of 0.45 (Table 2 and Supplementary Table S1). The CE model was the best fitting model for NVIQ, and working memory with estimated environmental contributions to these measures of 0.20 for NVIQ, and 0.42, 0.43, and 0.27 for total, forward, and backward digit scores, respectively. The ACE model, which predicts simultaneous contributions from environmental and additive genetic factors, was the best fitting for the questionnaire-derived GCC score (Table 2 and Supplementary Table S1). Based on the results of modeling, we conclude that the spectral and temporal AP skills we evaluated are strongly influenced by genetic factors, whereas the cognitive skills we tested were more influenced by environmental factors.

Multivariate analysis supports heritability of non-speech AP skills. As a final step, we looked at two versions of the common pathway model (Figure 3); one included all four of the masking measures (BM, BM50, SM, SMN) and the other included the masking measures and FD. Based on both the χ2-statistic and Akaike’s Information Criterion derived from the common pathway model, the genetic and environmental influences contributing to the common latent factors were best described by the AE model, which is in agreement with the univariate analysis. Ranking of the models remained the same with and without measure-specific genetic factors for both the four masking measures and all five non-speech AP measures (Table 3), providing further evidence for the AE model. Variation in the common latent factor was found to be mainly due to genetic variation (86.7%) (Table 4). The common pathway model was also used to compute genetic and environmental correlation matrices for the individual AP measures. Genetic correlations were large (0.417–0.934), whereas environmental correlations were small (0.054–0.189) (Supplementary Table S2). These data corroborate and supply further evidence for the robust heritability of the AP skills observed using the univariate model and suggest that spectral and temporal processing of sound is reliant on genetic factors.

Table 3 Summary and comparison of multivariate model fit for a single common latent factor for the four masking measures (BM, BM50, SM, SMN) and all five non-speech AP tests (four masking measures plus FD) derived from the common pathway model
Table 4 Latent heritability estimates for all AP measures and variance decomposition of observed measures


In order to discover any of the molecular neurogenetic causes of APD, it is essential to first identify AP traits that are demonstrably heritable and can be reliably measured. Heritability estimates (h2) for four of the non-speech measures of spectral and temporal AP (BM, BM50, SMN, and FD) ranged from 0.61 to 0.74 in our twin cohort, providing evidence of substantial genetic influence on variance of these traits. These estimates use AP scores corrected for age by simple linear regression. We have also conducted corresponding analyses based on raw AP scores (Supplementary Table S3) and for AP data corrected for both age and sex (Supplementary Table S4). Results show only minor quantitative differences, whereas all qualitative conclusions are unchanged. In particular, the selected genetic models remain the same showing that our results are insensitive to the particular method of data correction.

Our estimates of heritability for AP skills are comparable in magnitude to those of other hearing-related phenotypes including dichotic listening (~0.73)23 and tune deafness (~0.71–0.80),35 as well as related cognitive disabilities such as dyslexia (0.44–0.75),36 phonological processing (~0.72),37 and late language emergence (0.42–0.44).38 These AP measures (BM, BM50, SMN, and FD) also appear to be reliable and reproducible as evidenced by the similarity between our current data and those obtained through a different subject recruitment paradigm in the United Kingdom20 (Supplementary Figure S1).

Multivariate modeling of all five AP measures versus the four AP measures incorporating masking alone suggests that the genetic correlations of FD to the four masking measures are somewhat lower than the genetic correlations between those measures (Supplementary Table S2). This implies that FD has more specific genetic contributions than the other measures.

The best fitting model for TR was the CE model, indicating a shared environmental contribution to variance in performance. The best fitting model for FR was the ADE model, which comprises additive genetic, dominant genetic, and unique environmental effects. However, the correlation between DZ co-twins for FR was negative, which is implausible, so the validity of the FR results is doubtful. Taken together, these findings suggest either that the sensory aspects of perception isolated by the derived measures are less subject to inherited influences, or that the results simply reflect the variability of the derived measures.

Performance on the speech-in-noise test was influenced more by environmental than genetic factors. Environmental influences accounted for ~47% of the variance in performance on the VCV test. This test required the child to repeat recorded VCV nonsense syllables spoken by an adult male speaker with a UK English accent in the presence of background noise. Although this finding was not anticipated, we hypothesize that the UK-accented English presented to American-accented English-speaking subjects added to the complexity of this task, so that it was no longer just recognition of the nonsense syllable in noise, but also resolution of accent differences.

There are a number of genes connected with neuronal migration that are associated with other complex neurodevelopmental disorders, including dyslexia (eg, KIAA0319L),39 language (eg, CNTNAP2),40 and autism (eg, CNTNAP2),41 which may merit investigation for their influence on AP. There are likely common genetic factors linking these phenotypically complex disorders that may influence current nosological classifications and our understanding of underlying etiology. In addition, genes that regulate development of the cochlea, the auditory nerve, and central auditory pathways may influence the accurate representation and efficient processing of sounds.42

In the quest to identify genetic factors that contribute to differences in AP abilities, the non-speech AP skills of BM, BM50, FD, and SMN show evidence of genetic influence with heritability estimates of 0.72, 0.61, 0.74, and 0.67, respectively. These AP measures have potential application to both human and animal models. We hypothesize that the heritability of these AP skills will translate to non-twin populations. Other population-based measures of heritability can test this hypothesis and further refine our heritability estimates.

It is important to acknowledge that current clinical test batteries for identification of APDs rely on a variety of speech and non-speech tests that are dominated by measures based on speech perception.43 The tests used in the current study investigate basic, non-speech auditory perception and are not the only skills that contribute to or underlie an APD. In combination with heritable speech-based AP traits of dichotic listening and time-compressed speech,23 these skills may serve as phenotypic measures in families segregating variation in such traits and in case–control genetic association studies that will help generate an understanding of at least one etiology of APD at the molecular and cellular levels.