We report a difference between humans and macaque monkeys in the functional organization of cortical regions implicated in pitch perception. Humans but not macaques showed regions with a strong preference for harmonic sounds compared to noise, measured with both synthetic tones and macaque vocalizations. In contrast, frequency-selective tonotopic maps were similar between the two species. This species difference may be driven by the unique demands of speech and music perception in humans.
Data are available at the following repository: https://neicommons.nei.nih.gov/#/toneselectivity. We are releasing raw scan data (formatted as NIfTI files), anatomical scans and the corresponding FreeSurfer reconstructions, preprocessed surface data, and timing information indicating the onset of each stimulus block. We also provide the underlying data for all statistical contrast maps and ROI analyses (that is, all data figures) for Figs. 1c,d, 2c–f and 3a–c,e,f and Supplementary Figs. 1–5, 7, 8, 9c,d and 10–13.
Our custom MATLAB code consists mainly of wrappers around FSL and FreeSurfer commands. MATLAB routines are available at https://github.com/snormanhaignere/fmri-analysis. The commit corresponding to the state of the code at the time of publication is tagged as HumanMacaque-NatureNeuro.
Journal peer review information: Nature Neuroscience thanks Frederic Theunissen, Kerry Walker and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Lafer-Sousa, R., Conway, B. R. & Kanwisher, N. G. J. Neurosci. 36, 1682–1697 (2016).
Van Essen, D. C. & Glasser, M. F. Neuron 99, 640–663 (2018).
de Cheveigné, A. Oxf. Handb. Audit. Sci. Hear. 3, 71 (2010).
Patterson, R. D., Uppenkamp, S., Johnsrude, I. S. & Griffiths, T. D. Neuron 36, 767–776 (2002).
Penagos, H., Melcher, J. R. & Oxenham, A. J. J. Neurosci. 24, 6810–6815 (2004).
Norman-Haignere, S., Kanwisher, N. & McDermott, J. H. J. Neurosci. 33, 19451–19469 (2013).
Baumann, S., Petkov, C. I. & Griffiths, T. D. Front. Syst. Neurosci. 7, 11 (2013).
Petkov, C. I., Kayser, C., Augath, M. & Logothetis, N. K. PLoS Biol. 4, e215 (2006).
Petkov, C. I. et al. Nat. Neurosci. 11, 367–374 (2008).
Romanski, L. M. & Averbeck, B. B. Annu. Rev. Neurosci. 32, 315–346 (2009).
McPherson, M. J. & McDermott, J. H. Nat. Hum. Behav. 2, 52 (2018).
D’Amato, M. R. Music Percept. Interdiscip. J. 5, 453–480 (1988).
Schwarz, D. W. & Tomlinson, R. W. J. Neurophysiol. 64, 282–298 (1990).
Fishman, Y. I., Micheyl, C. & Steinschneider, M. J. Neurosci. 33, 10312–10323 (2013).
Bendor, D. & Wang, X. Nature 436, 1161–1165 (2005).
Miller, C. T., Mandel, K. & Wang, X. Am. J. Primatol. 72, 974–980 (2010).
Mesgarani, N., Cheung, C., Johnson, K. & Chang, E. F. Science 343, 1006–1010 (2014).
Overath, T., McDermott, J. H., Zarate, J. M. & Poeppel, D. Nat. Neurosci. 18, 903–911 (2015).
Leaver, A. M. & Rauschecker, J. P. J. Neurosci. 30, 7604–7612 (2010).
Norman-Haignere, S. V., Kanwisher, N. G. & McDermott, J. H. Neuron 88, 1281–1296 (2015).
Lafer-Sousa, R. & Conway, B. R. Nat. Neurosci. 16, 1870–1878 (2013).
Norman-Haignere, S. V. et al. J. Neurosci. 36, 2986–2994 (2016).
Semal, C. & Demany, L. Music Percept. Interdiscip. J. 8, 165–175 (1990).
Pressnitzer, D., Patterson, R. D. & Krumbholz, K. J. Acoust. Soc. Am. 109, 2074–2084 (2001).
Pfingst, B. E., Laycock, J., Flammino, F., Lonsbury-Martin, B. & Martin, G. Hear. Res. 1, 43–47 (1978).
Heffner, R. S. Anat. Rec. A. Discov. Mol. Cell. Evol. Biol. 281A, 1111–1122 (2004).
Shera, C. A., Guinan, J. J. & Oxenham, A. J. Proc. Natl Acad. Sci. USA 99, 3318–3323 (2002).
Walker, K. M., Gonzalez, R., Kang, J. Z., McDermott, J. H. & King, A. J. eLife 8, e41626 (2019).
Sumner, C. J. et al. Proc. Natl Acad. Sci. USA 115, 11322–11326 (2018).
Joris, P. X. et al. Proc. Natl Acad. Sci. USA 108, 17516–17520 (2011).
Small, A. M. Jr & Daniloff, R. G. J. Acoust. Soc. Am. 41, 506–512 (1967).
Schroeder, M. IEEE Trans. Inf. Theory 16, 85–89 (1970).
Pressnitzer, D. & Patterson, R. D. In Proc. 12th International Symposium on Hearing (eds Breebaart, D. J. et al.) 97–104 (Shaker, 2001).
Norman-Haignere, S. & McDermott, J. H. Neuroimage 129, 401–413 (2016).
Moore, B. C. J., Huss, M., Vickers, D. A., Glasberg, B. R. & Alcántara, J. I. Br. J. Audiol. 34, 205–224 (2000).
Herculano-Houzel, S. Front. Hum. Neurosci. 3, 31 (2009).
Leite, F. P. et al. Neuroimage 16, 283–294 (2002).
Zhao, F., Wang, P., Hendrich, K., Ugurbil, K. & Kim, S.-G. Neuroimage 30, 1149–1160 (2006).
Gagin, G., Bohon, K., Connelly, J. & Conway, B. fMRI signal dropout in rhesus macaque monkey due to chronic contrast agent administration. https://www.abstractsonline.com/Plan/ViewAbstract.aspx?sKey=c1451d63-ca65-4a44-afcc-ce1132062d6e&cKey=efbbc764-4eda-4422-9f70-f6d03b2e2eed&mKey=54c85d94-6d69-4b09-afaa-502c0e680ca7 (Society for Neuroscience, 2014).
Jenkinson, M. & Smith, S. Med. Image Anal. 5, 143–156 (2001).
Greve, D. N. & Fischl, B. Neuroimage 48, 63 (2009).
Kay, K., Rokem, A., Winawer, J., Dougherty, R. & Wandell, B. Front. Neurosci. 7, 247 (2013).
Nichols, T. E. & Holmes, A. P. Hum. Brain Mapp. 15, 1–25 (2002).
Triantafyllou, C., Polimeni, J. R. & Wald, L. L. Neuroimage 55, 597–606 (2011).
Efron, B. The Jackknife, the Bootstrap and Other Resampling Plans (SIAM, 1982).
Loftus, G. R. & Masson, M. E. Psychon. Bull. Rev. 1, 476–490 (1994).
Hauser, M. D. Anim. Behav. 55, 1647–1658 (1998).
Boersma, P. Praat, a system for doing phonetics by computer. Glot International 5 http://dare.uva.nl/search?arno.record.id=109185 (2002).
Gockel, H. E., Moore, B. C. J., Carlyon, R. P. & Plack, C. J. J. Acoust. Soc. Am. 121, 373–382 (2007).
Oxenham, A. J., Micheyl, C., Keebler, M. V., Loper, A. & Santurette, S. Proc. Natl Acad. Sci. USA 108, 7629–7634 (2011).
Kawahara, H. & Morise, M. Sadhana 36, 713–727 (2011).
McDermott, J. H., Ellis, D. P. & Kawahara, H. In Proc. SAPA-SCALE Conference (Citeseer, 2012).
Popham, S., Boebinger, D., Ellis, D. P. W., Kawahara, H. & McDermott, J. H. Nat. Commun. 9, 2122 (2018).
The authors thank G. Gagin and K. Bohon for their help in training and scanning animals M1, M2 and M3. They also thank K. Schmidt, D. Yu, T. Haile, S. Eastman and D. Leopold for help in scanning animals M4 and M5. This work was supported by the National Institutes of Health (grant EY13455 to N.K. and grant EY023322 to B.R.C.), the McDonnell Foundation (Scholar Award to J.H.M.), the National Science Foundation (grant 1353571 to B.R.C. and Graduate Research Fellowship to S.N.-H.), the NSF Science and Technology Center for Brains, Minds, and Machines (CCF-1231216) and the Howard Hughes Medical Institute (LSRF Postdoctoral Fellowship to S.N.-H.). The animal work was performed using resources provided by the Neurophysiology Imaging Facility Core (NIMH/NINDS/NEI), as well as the Center for Functional Neuroimaging Technologies at MGH (grant P41EB015896) and a P41 Biotechnology Resource grant supported by the National Institute of Biomedical Imaging and Bioengineering (MGH). The experiments conducted at MGH also involved the use of instrumentation supported by the NIH Shared Instrumentation Grant Program and/or High-End Instrumentation Grant Program (grant S10RR021110). The work was also supported by the Intramural Research Program at the NEI, NIMH, and NINDS.
Integrated supplementary information
Same as Fig. 1, but cluster-corrected and showing data from all humans and macaques tested (two-sided, voxel-wise p < 0.01, cluster-corrected to p < 0.05; statistics computed via a permutation test across stimulus blocks). Outlines of tonotopic areas were derived from the uncorrected maps shown in Fig. 1c. Tonotopic regions drawn with unfilled outlines indicate clusters that were too small to reach significance after correction.
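The captions state that voxel-wise statistics were computed with a permutation test across stimulus blocks. A minimal Python sketch of that idea for a single voxel follows. It assumes the test statistic is the difference in mean block response between harmonic tones and noise; the statistic used in the authors' MATLAB code may differ (for example, a regression-based contrast), and all function and variable names here are hypothetical.

```python
import random


def permutation_test(tone_blocks, noise_blocks, n_perm=10000, seed=0):
    """Two-sided permutation test on the tones-minus-noise difference at one voxel.

    tone_blocks, noise_blocks: per-block responses (e.g. % signal change).
    Condition labels are shuffled across the pooled blocks to build a null
    distribution of mean differences.
    """
    rng = random.Random(seed)
    n_tone = len(tone_blocks)
    observed = sum(tone_blocks) / n_tone - sum(noise_blocks) / len(noise_blocks)
    pooled = list(tone_blocks) + list(noise_blocks)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled_tone, shuffled_noise = pooled[:n_tone], pooled[n_tone:]
        diff = (sum(shuffled_tone) / len(shuffled_tone)
                - sum(shuffled_noise) / len(shuffled_noise))
        if abs(diff) >= abs(observed):  # two-sided: compare magnitudes
            n_extreme += 1
    # Counting the observed labelling as one permutation keeps p above zero.
    return observed, (n_extreme + 1) / (n_perm + 1)


# Toy block responses for one voxel (illustrative values, not real data).
tones = [1.2, 1.5, 1.1, 1.4, 1.3, 1.6]
noise = [0.4, 0.6, 0.5, 0.3, 0.7, 0.5]
diff, p = permutation_test(tones, noise)
print(f"tones - noise = {diff:.2f}, p = {p:.4f}")
```

Shuffling whole blocks, rather than individual time points, respects the fact that the block is the unit over which conditions were assigned.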
Compared with Fig. 1c,d, these maps use a more liberal voxel-wise significance threshold (two-sided p < 0.05 instead of p < 0.01; statistics computed via a permutation test across stimulus blocks) to reveal any voxels in macaques that might respond preferentially to tones vs. noise. Maps are shown for all humans and macaques tested. Outlines of tonotopic areas are the same as in Fig. 1.
The values plotted here and in Supplementary Fig. 4 were used to compute the selectivity values plotted in Fig. 2c–f. Plotted responses are averaged separately over the two lowest and the two highest frequency ranges (and across harmonic tones and noise in both cases). Responses are plotted as a function of ROI size (percent of sound-responsive voxels). a,b, Group data averaged across subjects (N indicates the number of humans or macaques). c,d, Individual-subject data. Each human subject (columns) was reliability-matched to each monkey (rows) by subsampling runs (N indicates the number of runs). For macaques, we used all available data. Error bars reflect one standard error (median and central 68%) of the bootstrapped sampling distribution. Bootstrapping was performed across runs for individual-subject analyses, and across both subjects and runs for group data.
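The bootstrap error bars described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes each run contributes one response value, uses a nearest-rank percentile, and reports the median together with the 16th and 84th percentiles of the bootstrap distribution (the central 68%, which spans roughly ±1 standard error when that distribution is approximately normal). For group data it resamples subjects and then runs within each resampled subject. Function names are hypothetical.

```python
import random
import statistics


def summarize(samples):
    """Median and central 68% interval (16th to 84th percentile) of samples."""
    s = sorted(samples)

    def pct(q):
        # Nearest-rank percentile on the sorted bootstrap samples.
        return s[round(q / 100 * (len(s) - 1))]

    return pct(50), pct(16), pct(84)


def bootstrap_runs(run_values, n_boot=10000, seed=0):
    """Individual subject: resample runs with replacement and average each resample."""
    rng = random.Random(seed)
    k = len(run_values)
    return summarize(
        statistics.fmean(rng.choices(run_values, k=k)) for _ in range(n_boot)
    )


def bootstrap_subjects_and_runs(subject_runs, n_boot=10000, seed=0):
    """Group data: resample subjects, then runs within each resampled subject."""
    rng = random.Random(seed)

    def one_resample():
        subjects = rng.choices(subject_runs, k=len(subject_runs))
        return statistics.fmean(
            statistics.fmean(rng.choices(runs, k=len(runs))) for runs in subjects
        )

    return summarize(one_resample() for _ in range(n_boot))


# Toy per-run responses for one subject (illustrative values).
median, lo, hi = bootstrap_runs([0.8, 1.1, 0.9, 1.3, 1.0, 1.2])
print(f"median {median:.3f}, 68% interval [{lo:.3f}, {hi:.3f}]")
```

Reporting percentiles rather than a standard deviation keeps the error bars valid even when the bootstrap distribution is skewed.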
Responses to harmonic tones and noise were averaged across all frequency ranges. Conventions and error bars are the same as in Supplementary Fig. 3.
Voxels were selected as responding preferentially to low (a) or high (b) frequencies (two lowest ranges vs. two highest ranges, as in all other frequency contrasts). We then measured the response of the selected voxels to harmonic tones and noise in independent data, averaged across all frequency ranges tested (left two panels). Tone vs. noise selectivity (right two panels) was quantified using a standard index ([preferred – nonpreferred] / [preferred + nonpreferred]). The number of selected voxels was varied as in other ROI analyses. Error bars reflect one standard error (median and central 68%) of the bootstrapped sampling distribution. Bootstrapping was performed across runs for individual subject analyses, and across both subjects and runs for group data.
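The ROI procedure and selectivity index above can be sketched as follows, assuming per-voxel response values from two independent data splits. Voxels are ranked by a selection contrast in one split (here, low minus high frequency), the top fraction is kept (ROI size as a percentage of sound-responsive voxels), and responses and the index are then measured in the other split. Function and variable names are hypothetical. Note that the index is bounded between -1 and 1 only when both responses are non-negative.

```python
def selectivity_index(preferred, nonpreferred):
    """Standard index: (preferred - nonpreferred) / (preferred + nonpreferred)."""
    return (preferred - nonpreferred) / (preferred + nonpreferred)


def roi_selectivity(select_contrast, test_pref, test_nonpref, roi_percent):
    """Select voxels in one data split, measure them in an independent split.

    select_contrast: per-voxel selection contrast from split A.
    test_pref, test_nonpref: per-voxel responses from split B
        (e.g. harmonic tones and noise).
    roi_percent: ROI size as a percentage of the voxels provided.
    """
    n_vox = len(select_contrast)
    n_keep = max(1, round(n_vox * roi_percent / 100))
    # Rank voxels by the selection contrast, strongest first.
    order = sorted(range(n_vox), key=lambda v: select_contrast[v], reverse=True)
    roi = order[:n_keep]
    pref = sum(test_pref[v] for v in roi) / n_keep
    nonpref = sum(test_nonpref[v] for v in roi) / n_keep
    return pref, nonpref, selectivity_index(pref, nonpref)


# Hypothetical 4-voxel example.
low_minus_high = [5.0, 1.0, 4.0, 0.0]  # selection contrast, split A
tones_b = [2.0, 9.0, 2.0, 9.0]         # tone responses, split B
noise_b = [1.0, 9.0, 1.0, 9.0]         # noise responses, split B
print(roi_selectivity(low_minus_high, tones_b, noise_b, roi_percent=50))
```

Measuring responses in data not used for selection avoids the upward bias that arises when the same noisy data both pick and evaluate the voxels.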
Same format as Fig. 2a,b. a, Reliability (Pearson correlation) of stimulus-driven fMRI responses in humans and macaques per block of data. Error bars show 1 standard deviation across subsampled sets of runs. b, Reliability of the entire human and macaque datasets. Monkey data were slightly more reliable than human data because of the large amount of data collected, so we did not subsample the human data.
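One simple way to compute a reliability value like the one in panel a is a split-half Pearson correlation. The sketch below assumes runs are divided into even- and odd-indexed halves and that each run is a flat list of responses in a fixed order (for example, every voxel-by-condition value). The caption does not describe how the per-block reliability was derived, so treat this as an illustrative stand-in with hypothetical names.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(runs):
    """Correlate the response profile averaged over even-indexed runs with
    the profile averaged over odd-indexed runs.

    runs: list of runs; each run is a flat list of responses in a fixed order.
    """
    half_a = [sum(v) / len(v) for v in zip(*runs[0::2])]
    half_b = [sum(v) / len(v) for v in zip(*runs[1::2])]
    return pearson(half_a, half_b)


# Two toy runs with identical response profiles give a reliability of 1.
print(split_half_reliability([[0.2, 0.9, 0.5], [0.2, 0.9, 0.5]]))
```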
a, b, Response of tone/noise-selective ROIs to harmonic tones and noise averaged across frequency. Left plots show responses collapsed across sound intensity for all ROI sizes. Right plots show responses broken down by intensity for a fixed ROI size (1% of sound-responsive voxels). c, d, Response of frequency-selective ROIs to the two lowest and the two highest frequency ranges, averaged across harmonic tones and noise. Same format as panels a, b. Error bars reflect one standard error (median and central 68%) of the bootstrapped sampling distribution across runs for each subject.
Similar format to Fig. 3a–c, which shows tone vs. noise selectivity. a, Maps of voxels that respond preferentially to the two lowest or the two highest frequency ranges, averaged across harmonic tones and noise and across all sound intensities presented (number of blocks per frequency range: M4 = 1,116, M5 = 1,104; maps plot two-sided voxel-wise p-values; statistics computed via a permutation test across stimulus blocks). b, ROI analyses of frequency selectivity. Human data from Experiment IA were used for comparison. c, Intensity dependence of frequency-selective ROIs in macaques. Responses are shown for an ROI of fixed size (top 1% of sound-responsive voxels). Error bars reflect one standard error (median and central 68%) of the bootstrapped sampling distribution. Bootstrapping was performed across runs for individual-subject analyses, and across both subjects and runs for group data.
Supplementary Figure 9 Reliability-matched ROI analyses using data from all 5 macaques tested in Experiment II.
Fig. 3f shows ROI analyses for two macaques whose response reliability was similar to that of humans. Here we show results from all five macaques and subsample the human data to match reliability. a, Reliability (Pearson correlation) of stimulus-driven fMRI responses from each human and macaque subject per block of data (same format as Fig. 2a). Error bars show 1 standard deviation across subsampled sets of runs. b, Reliability of macaque and human data, as well as of subsampled human data chosen to best match the reliability of the macaques (same format as Fig. 2b). c,d, ROI analysis of voxels preferentially responsive to voiced (c) or noise-vocoded calls (d). Left plots show group-averaged responses to voiced and noise-vocoded calls (averaged across two matched sound intensities: 70 and 75 dB). Right plots show a measure of selectivity applied to group or individual-subject responses (same format as Fig. 3f). Error bars reflect one standard error (median and central 68%) of the bootstrapped sampling distribution. Bootstrapping was performed across runs for individual-subject analyses, and across both subjects and runs for group data.
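Reliability matching by subsampling runs might look like the following sketch. It assumes a split-half reliability measure, tries every number of human runs from 2 upward, draws random subsets of that size, and keeps the run count whose mean reliability is closest to a target value (for example, one macaque's reliability); the standard deviation across subsets corresponds to the kind of error bar described in panel a. The function names, the number of draws and the even/odd split are assumptions, not the published procedure.

```python
import math
import random
import statistics


def split_half(runs):
    """Pearson r between profiles averaged over even- and odd-indexed runs."""
    a = [statistics.fmean(v) for v in zip(*runs[0::2])]
    b = [statistics.fmean(v) for v in zip(*runs[1::2])]
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def match_reliability(human_runs, target, n_draws=50, seed=0):
    """Find the number of human runs whose subsampled reliability best matches target.

    Returns (n_runs, mean reliability, SD across random subsets of runs).
    """
    rng = random.Random(seed)
    best = None
    for k in range(2, len(human_runs) + 1):
        rels = [split_half(rng.sample(human_runs, k)) for _ in range(n_draws)]
        mean, sd = statistics.fmean(rels), statistics.pstdev(rels)
        if best is None or abs(mean - target) < abs(best[1] - target):
            best = (k, mean, sd)
    return best


# Synthetic example: a fixed response profile plus independent noise in each run.
gen = random.Random(1)
profile = [math.sin(i / 3) for i in range(40)]
runs = [[s + gen.gauss(0, 1.0) for s in profile] for _ in range(12)]
print(match_reliability(runs, target=0.6))
```

Matching on reliability, rather than on the number of runs, equates measurement quality across species whose scan sessions differ in length and noise level.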
Supplementary Figure 10 Maps of voxels preferentially responsive to voiced vs. noise-vocoded calls using various thresholds.
The maps are either uncorrected at a p < 0.01 (a) or p < 0.05 (b) threshold, or cluster-corrected for multiple comparisons (c; family-wise error rate p < 0.05). Voxel-wise p-values are two-sided, and statistics were computed via a permutation test across stimulus blocks. Subjects are ordered by the reliability of their fMRI responses in auditory cortex (Supplementary Fig. 9a). Tonotopic outlines are shown for all but two human subjects, for whom tonotopic data were not available. All human subjects showed significant voicing-selective voxels in an ROI analysis, even though the clusters in H5 did not survive correction (p < 0.05 in every subject for all but the two largest ROI sizes).
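Permutation-based cluster correction is commonly done by comparing each observed cluster with the null distribution of the largest cluster in each permuted map. The sketch below shows that standard max-cluster approach on a 1D array, a simplified stand-in for a cortical surface where neighbours would be defined by mesh adjacency. Whether the paper used exactly this cluster statistic is an assumption, the names are hypothetical, and the toy null maps are placeholders for maps recomputed after shuffling block labels.

```python
def cluster_sizes(mask):
    """Sizes of runs of consecutive True values in a 1D boolean mask."""
    sizes, run = [], 0
    for on in mask:
        if on:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes


def cluster_corrected_pvalues(stat_map, null_maps, stat_threshold):
    """Family-wise corrected p-value for each supra-threshold cluster.

    stat_map: observed voxel-wise statistic (1D stand-in for a surface).
    null_maps: statistic maps recomputed after shuffling block labels.
    A cluster's p-value is the fraction of null maps whose largest cluster is
    at least as large (the +1 terms count the observed map).
    """
    def supra(m):
        return [abs(s) > stat_threshold for s in m]

    null_max = [max(cluster_sizes(supra(m)), default=0) for m in null_maps]
    return [
        (size, (1 + sum(n >= size for n in null_max)) / (1 + len(null_max)))
        for size in cluster_sizes(supra(stat_map))
    ]


observed = [0, 3, 3, 3, 0, 3, 0]
nulls = [[0, 3, 0, 0, 0, 0, 0]] * 19  # toy null maps (placeholders)
print(cluster_corrected_pvalues(observed, nulls, stat_threshold=2))
# → [(3, 0.05), (1, 1.0)]
```

Using the maximum cluster size per permutation is what controls the family-wise error rate: a cluster survives only if it is larger than what chance produces anywhere in the map.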
Supplementary Figure 11 Response of ROIs preferentially responsive to voiced (a) or noise-vocoded calls (b) from individual humans (left) and macaques (right).
We plot the response to voiced and noise-vocoded calls, averaged across two sound intensities (70 and 75 dB). Each human subject was matched in reliability to each monkey by subsampling runs (N indicates the number of runs). For monkeys we used all of the available data. Error bars reflect one standard error (median and central 68%) of the bootstrapped sampling distribution. Bootstrapping was performed across runs.
Supplementary Figure 12 Intensity dependence of voxels preferentially responsive to voiced or noise-vocoded calls.
Group (a,b) and individual-subject ROIs (c,d) preferentially responsive to voiced (a,c) or noise-vocoded calls (b,d). Each plot shows the response to voiced and noise-vocoded calls for all of the sound intensities tested in that group or subject. Results are shown for an ROI of fixed size (top 1% of sound-responsive voxels), but responses were similar for other ROI sizes. Human subjects were individually matched to each monkey subject in reliability by subsampling runs (N indicates the number of runs). Error bars reflect one standard error (median and central 68%) of the bootstrapped sampling distribution. Bootstrapping was performed across runs for individual-subject analyses, and across both subjects and runs for group data.
Supplementary Figure 13 Effect of using dichotic vs. diotic noise on tone-selective responses in humans.
a, Maps showing voxels preferentially responsive to harmonic tones compared with either dichotic or diotic noise (same format as Fig. 1d; maps plot two-sided voxel-wise p-values computed via a permutation test across stimulus blocks). The type of noise had little effect on the maps. b, ROI analyses showing the average response to harmonic tones, dichotic noise and diotic noise across the four human subjects tested. Voxels were identified using a harmonic tones > dichotic noise contrast, and their responses to tones, dichotic noise and diotic noise were then measured in independent data. Responses to dichotic and diotic noise were very similar; the diotic-noise curve is hard to see because the dichotic-noise curve largely overlaps it. Error bars reflect one standard error (median and central 68%) of the bootstrapped sampling distribution across subjects (N = 4) and runs.
Supplementary Figure 14 Sensitivity of fMRI measurements across auditory cortex in humans and macaques.
For each voxel, this figure plots the test–retest error for repeated measurements of the response to the same stimulus vs. silence, in units of percent signal change (measured separately for each stimulus condition and then averaged across conditions). Smaller errors indicate greater sensitivity. Data from Experiments IA and IB were used to compute the maps. In macaques, test–retest responses were measured using a split-half analysis applied to all of the runs (dividing the dataset in half and averaging the runs within each half). In humans, fewer runs were used in some cases to match the reliability of each macaque tested.