Brain responses to repetition-based rule-learning do not exhibit sex differences: an aggregated analysis of infant fNIRS studies

Studies have repeatedly shown sex differences in some areas of language development, typically with an advantage for female over male children. However, the tested samples are typically small and the effects do not always replicate. Here, we used a meta-analytic approach to address this issue in a larger sample, combining seven fNIRS studies on the neural correlates of repetition- and non-repetition-based rule learning in newborns and 6-month-old infants. The ability to extract structural regularities from the speech input is fundamental for language development; it is therefore highly relevant to understand whether this ability shows sex differences. The meta-analysis tested the effect of Sex, as well as of other moderators, on infants’ hemodynamic responses to repetition-based (e.g. ABB: “mubaba”) and non-repetition-based (e.g. ABC: “mubage”) sequences in both anatomically and functionally defined regions of interest. Our analyses did not reveal any sex differences at birth or at 6 months, suggesting that the ability to encode these regularities is robust across sexes. Interestingly, the meta-analysis revealed other moderator effects. In newborns, we found a greater involvement of the bilateral temporal areas compared to the frontal areas for both repetition and non-repetition sequences. Further, non-repetition sequences elicited greater responses in 6-month-olds than in newborns, especially in the bilateral frontal areas. When analyzing functional clusters of HbR timetraces, we found a larger right-left asymmetry in brain responses for newborn boys compared to girls, which may be interpreted in terms of a larger right-left asymmetry in cerebral blood flow in boys than in girls early in life. We conclude that extracting repetition-based regularities from speech is a robust ability with a well-defined neural substrate present from birth and that it does not exhibit sex differences.

Second, and relatedly, the effect sizes reported for the "female advantage" in single studies are typically very small 4, and when taken together with other predictors, sex accounts for only a minor share of the observed variance (e.g. 1-2% in 5). Sex differences are, therefore, generally investigated by means of meta-analyses, which extend the sample sizes and thus the statistical power of any single study 19.
Third, available findings are unclear about the developmental origins of sex differences in language. For instance, while Maccoby and Jacklin 20 reported them to occur mostly after the age of 11 years, Bornstein et al. 16 found them to be significant only until 6 years of age. Zambrana et al. 21 reported girls to have better language comprehension abilities than boys at 18 months and to a lesser extent at 36 months of age, while Le Normand et al. 22 found that girls produced more words than boys up to 3 years of age, but not after that. Importantly, studies often fail to distinguish between sex differences in language development and differences in overall maturational speed between sexes. Furthermore, age has often been analysed using different age ranges and age groups across different studies, making comparisons more difficult 6; in fact, some authors have suggested that, rather than using age ranges, studies on sex differences should focus on very specific and relevant time points 12.
Fourth, in many studies, potential sex and gender differences are simply overlooked, as they are not statistically tested for in any systematic way. This masks possible differences, limits the generalizability of findings and may be a particularly important hindrance to interventions, e.g. in language pathologies, if differences in how the two sexes respond to treatment remain unexplored. Various funders, journals and research consortia now, therefore, recommend or even require that the sex and gender dimension be taken into account in any research involving humans and animals (e.g. Sex and Gender Equity in Research guidelines 23). Yet, in neuroimaging studies of language development, sex and gender differences have very rarely been explicitly tested.
For all these reasons, the study of sex differences in language abilities has so far yielded inconsistent results, especially in development. Neuroimaging studies carried out very early in life have the potential to meaningfully contribute to this research question while circumventing some of the above-mentioned methodological challenges. By testing very young infants, i.e. those with no or very little experience with the environment, the contribution of socio-cultural factors can be reduced, thus affording the opportunity to get closer to the biological basis of any sex difference. Furthermore, by measuring brain function associated with language processing, rather than its behavioral reflections, which are very challenging to test in young infants, neuroimaging studies have the potential to provide a more sensitive measure of sex differences and their developmental trajectory.
The current work, therefore, addresses the question of early sex differences in language processing by conducting a meta-analysis of neuroimaging studies carried out with functional near-infrared spectroscopy (fNIRS) on newborn and 6-month-old infants with the aim of investigating whether one of the earliest language abilities, the extraction of linguistic regularities from speech, shows sex differences.
We have chosen to investigate this ability as it is a very basic mechanism that is in place already at birth and that has been found to be foundational for learning grammar and thus for language development. At birth, infants have been shown to be able to extract structural regularities from speech. In particular, they can learn and generalize repetition-based rules, i.e. the identity relation (A = A), which allows them to discriminate between artificial grammars with ABB (e.g. "mubaba", "penana" etc.), AAB (e.g. "babamu", "nanape" etc.) and ABA (e.g. "bamuba", "napena" etc.) structures and distinguish them from random controls (ABC: "mubage", "penaku" etc.; 24). Repetition-based rule learning is considered a basic mechanism of language acquisition 25 and has been investigated with both behavioral 26 and neuroimaging methods, especially fNIRS 27, but the question whether it shows any sex difference has never been investigated. Testing whether any sex difference may be observed for such an elemental mechanism, and at birth, would offer an important contribution to the understanding of sex differences in language processing and especially to the hypotheses about their biological basis. No currently available evidence points to the existence of a sex difference in this ability, especially at such an early age, but as pointed out before, sex differences are typically not tested for in NIRS studies, and even when tested in a single study, they may not be detected due to small sample sizes and high individual variability in the data. This, however, may lead to undetected effects and less inclusive, potentially inequitable results, which may falsely generalize findings found predominantly for one sex/gender to the other or may fail to generalize effects tested only in one gender to both. Therefore, the availability of a highly coherent dataset of fNIRS studies with young infants, tested in very similar paradigms, offers a unique opportunity to test for potential sex differences.
fNIRS is a non-invasive technique for functional imaging of brain hemodynamics 28. Relying on the different absorption spectra of oxygenated and deoxygenated hemoglobin (HbO and HbR, respectively) in the near-infrared region of the electromagnetic spectrum (650-900 nm), fNIRS measures the relative changes of oxygenation in the human brain at rest or in response to stimulation. As fNIRS is non-invasive, portable, silent and relatively robust with respect to motion artifacts, it is widely employed in developmental cognitive neuroscience research, and in particular in research on early speech perception and language development 29.
In the current work, we have gathered seven published and unpublished fNIRS studies, carried out on newborns and six-month-old infants in three laboratories, and explored whether patterns of neural responses underlying infants' rule learning exhibited sex differences. The seven studies (Table 1) used very similar paradigms, whereby sequences generated by artificial grammars following repetition-based (e.g. ABB: "mubaba", "penana" etc.; AAB: "babamu", "nanape" etc.) and/or non-repetition-based (ABC: "mubage", "penaku" etc.) regularities were presented in blocks to infants (Fig. 1a), while NIRS measures were obtained from the bilateral temporal, frontal and parietal areas covering the language network (Fig. 1b).
Meta-analyses are especially valuable for this type of research question, as they offer greater sample sizes and statistical power and are thus better able to detect small effects. They have been successfully conducted over behavioral data 31,32, but very few, to date, have been conducted on developmental fNIRS data 33,34, despite the pressing need in the developmental fNIRS community for efforts aimed at addressing issues of replicability 35. To our knowledge, this is the first meta-analysis to address sex differences in neural mechanisms during early human development, measured with fNIRS. We have gathered all the existing studies that addressed this issue and that possessed the relevant demographic information.
While the number of studies included in our analysis is not particularly large, it is now increasingly recognized in the literature that even small-scale meta-analyses, aggregating over as few as two studies, help gain significant insight by increasing sample sizes, variability covered and representativeness 36. In fact, several authors argue that, whenever a series of conceptually comparable studies is available, internal meta-analyses should be routinely carried out, as they allow moving away from focusing on p-values of individual studies and identifying potential moderators of variability 37-40. This latter point is especially relevant for our case, as sex differences in language development are small, and thus require large sample sizes to be statistically detectable. We are thus running this comprehensive analysis with the specific goal of increasing sample size and thus being able to detect the effect of sex, if present.
In particular, our analysis includes a cohort of newborns (n = 91) that is much larger than the usual sample size of single studies 33. The study of this developmental time point is particularly relevant, as the impact of environmental factors immediately at birth is still smaller than later in development. We can thus more readily probe the biological basis of sex differences in language development. Secondly, including studies with six-month-old infants (n = 59) elucidates the developmental trajectory of putative sex differences underlying rule learning. Since rule learning is foundational for later language acquisition, the question whether its neural correlates show sex differences early in development is highly relevant for our understanding of differences found in later linguistic performance.

Table 1. Main characteristics of the studies included in the present meta-analysis. All studies employed two auditory conditions, one containing adjacent repeated syllables, the other containing random non-repeated syllables; Trials refers to the number of trials administered for each condition.

This paper takes two complementary methodological approaches. First, it employs state-of-the-art meta-analytic techniques to estimate the overall magnitude of the effect size of neural responses to two different linguistic rules, repetition-based ("R") and non-repetition-based ("N") regularities, across studies, and compares them across sexes. Second, it statistically compares effect sizes across sexes using a linear mixed effects model. A schematic illustration of the methodology is provided in Fig. 2. The analyses concentrate on the effects of Sex and of within-subject variables inherent to the NIRS data, such as Hemisphere or Region-of-Interest. We abstract away from other factors that the meta-analysis could potentially address, such as cross-lab variability, as they have been reported elsewhere 34.

Results
We performed analyses using two different and complementary approaches: a meta-analytical approach, carried out on study-level effect sizes, and a mixed-effects modeling approach, carried out on infant-level effect sizes (Fig. 2). The meta-analytic framework estimates the variability of effect sizes across studies. It can be conducted even when only group-level averages, but not individual participant data, are available and when procedures or data types are not standardized. When trial-level data for each participant are available, it is also possible to compute individual effect sizes and fit a mixed-effects model, yielding a more sensitive measure of within-study variability. We thus calculated both study-level and infant-level effect sizes, both across and between sexes, and analysed the former with meta-analytic regression, and the latter with mixed-effects linear models.

Meta-analysis
Using meta-analytic regression, we fitted intercept-only random-effects models to the study-level effects, in order to estimate the magnitude of the effects of interest within Sexes, Regions-of-interest (ROIs) and Ages. Regions-of-interest were chosen anatomically, covering the classical language network: the bilateral temporal areas (channels 3, 6 in the LH and 17, 19 in the RH, Fig. 1b), known to be responsible for auditory processing, and the bilateral frontal areas (channels 2, 5 in the LH and 13, 15 in the RH), responsible for the computation of structure and higher-order linguistic/sequential representations, following Gervain et al. 24,44. In addition to estimating the magnitude of the effects, we analysed whether they showed any moderating effect of sex, for each region. Other variables that could potentially moderate the effect sizes, such as the position of the repetition or the different
Figure 5 shows the forest plots for this effect. Full details are reported in the Supplementary Material (Table S1).

Moderator analysis: the effect of sex
To answer our research question, we assessed whether female and male infants' responses to repetition and non-repetition sequences were significantly different. For each contrast, ROI and age group, meta-regression models were fitted that included sex as a moderator. In none of the models was the effect of sex found to be statistically significant at α = 0.05. Figures 3, 4 and 5 show the effects broken down by sex for each study to illustrate that effect sizes were very similar for male and female infants. Full details can be found in the Supplementary Material.
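As an illustration of the kind of computation behind a random-effects estimate and a sex comparison, the following is a minimal sketch rather than the analysis actually run. It assumes a DerSimonian-Laird estimator of between-study variance (the text does not state which estimator was used) and replaces the meta-regression with a simple z-test on the difference between two pooled subgroup estimates. The function names and all numbers in the usage lines are hypothetical.

```python
import math

def random_effects(ds, vs):
    """Pool study-level effect sizes ds with sampling variances vs.

    Assumption: DerSimonian-Laird estimate of the between-study
    variance tau^2; the paper does not name its estimator.
    Returns (pooled effect, standard error, tau^2).
    """
    w = [1.0 / v for v in vs]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * di for wi, di in zip(w, ds)) / sw
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, ds))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(ds) - 1)) / c)       # truncated at zero
    w_re = [1.0 / (v + tau2) for v in vs]          # random-effects weights
    pooled = sum(wi * di for wi, di in zip(w_re, ds)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2

def subgroup_z(d_female, v_female, d_male, v_male):
    """z statistic for the difference between two pooled estimates.

    A simplified stand-in for a meta-regression with Sex as moderator.
    """
    pf, se_f, _ = random_effects(d_female, v_female)
    pm, se_m, _ = random_effects(d_male, v_male)
    return (pf - pm) / math.sqrt(se_f ** 2 + se_m ** 2)

# Hypothetical study-level effect sizes and variances, for illustration only.
pooled, se, tau2 = random_effects([0.42, 0.30, 0.55], [0.10, 0.12, 0.09])
z = subgroup_z([0.40, 0.35], [0.20, 0.22], [0.38, 0.41], [0.21, 0.19])
```

A |z| below 1.96 in this simplified comparison would correspond to no significant difference at α = 0.05.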

Linear mixed effects model
To better assess whether Sex, Region-of-Interest and Hemisphere have statistically significant effects, we also ran linear mixed effects models for the three comparisons (R vs. 0, N vs. 0, R vs. N) and the two age groups separately, using infant-level effect sizes. (We also ran the analyses for the three comparisons with Age as a factor in addition to Sex, ROI and Hemisphere. The results are qualitatively very similar to the ones reported here, and are shown in the Supplementary Material.) We built all possible models by incrementally adding the factors Sex, ROI and Hemisphere and their interactions to the model as fixed effects. The model with the best fit to the data, i.e. achieving the lowest AIC, was retained as the final model for each analysis. All tested models with their respective AIC are listed in the Supplementary Material.

Moreover, in addition to using the anatomically defined ROIs described above, since no a priori hypotheses were available as to whether sex differences may be observed in different anatomical regions of the brain or in hemodynamic response patterns, we also employed functionally defined ROIs. To do this, we carried out a data-driven functional localization analysis in order to identify ROIs that significantly activate for repetition and non-repetition sequences in each study. These functional ROIs were identified using cluster-based permutation tests involving t-tests 45 over oxygenated hemoglobin (HbO) concentrations, separately for all three contrasts (R vs. 0, N vs. 0, R vs. N).
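The AIC-based model selection described above can be sketched as follows. This is only an illustration of the enumeration and selection logic: it does not fit any mixed model, and the response name `d_infant`, the formula syntax and the log-likelihood values in the usage lines are hypothetical. Interaction terms are only generated among factors whose main effects are already in the model, which is one possible reading of "incrementally adding" factors and interactions.

```python
from itertools import combinations

FACTORS = ("Sex", "ROI", "Hemisphere")

def candidate_models(factors=FACTORS):
    """Enumerate fixed-effect structures.

    Each model is a tuple of terms: a subset of main effects plus any
    subset of the interactions among the included main effects.
    The empty tuple is the intercept-only model.
    """
    models = [()]
    for k in range(1, len(factors) + 1):
        for mains in combinations(factors, k):
            inters = [":".join(c) for r in range(2, k + 1)
                      for c in combinations(mains, r)]
            for j in range(len(inters) + 1):
                for chosen in combinations(inters, j):
                    models.append(mains + chosen)
    return models

def to_formula(terms, response="d_infant"):
    """Render a term tuple as a formula string (hypothetical naming)."""
    return f"{response} ~ " + " + ".join(("1",) + tuple(terms))

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def select_best(fits):
    """fits maps a formula to (log_likelihood, n_params); lowest AIC wins."""
    return min(fits, key=lambda f: aic(*fits[f]))

# Hypothetical fit results for two candidate models.
fits = {
    to_formula(("ROI",)): (-120.4, 4),
    to_formula(("Sex", "ROI", "Sex:ROI")): (-119.9, 6),
}
best = select_best(fits)
```

In this toy example the simpler model is retained, because the small gain in log-likelihood does not offset the penalty for two extra parameters.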

Anatomically derived regions
Figure 6 shows grand average responses for repetition and non-repetition sequences.
Oxygenated hemoglobin (HbO). In newborns for the R vs. 0 contrast, the best fitting model included the fixed effect of ROI. This model yielded a significant main effect of ROI (F(1, 249) = 13.18, p < 0.001) with greater activation in the temporal ROIs than in the frontal ones. For the N vs. 0 and R vs. N contrasts, the best fitting models also included the fixed effect of ROI, but did not yield significant effects. Figures 7, 8 and 9 illustrate distributions of the effects for the three contrasts, respectively.
In the 6-month-olds for the R vs. 0 contrast, the best fitting model included the fixed effect of Hemisphere, but did not yield any significant effect. For the N vs. 0 contrast, the best fitting model included the main effect of ROI, and yielded a significant effect of ROI (F(1, 135) = 4.90, p < 0.05) due to greater activation in the temporal than the frontal areas. No significant effects were found for the R vs. N contrast. Figures 7, 8 and 9 illustrate distributions of the effects for the three contrasts, respectively.
Importantly, no significant effect of Sex was found in any of the models.

Functionally derived regions of interest
The functional localization analysis was used to identify one ROI per hemisphere for each study. Table 2 and Fig. 10 summarize the obtained ROIs (when the analysis identified no significant cluster, the average across all channels of that hemisphere was used in the statistical analyses). Effect sizes computed on these functional ROIs were analysed with linear mixed effects models similarly to anatomically defined ROIs.
HbO. In newborns, the best fitting model for the R vs. 0 contrast included only a fixed effect of Hemisphere, and it yielded no significant effects. For both N vs. 0 and R vs. N, the best fitting models included the fixed effect of Sex, but did not yield any significant effect.
For 6-month-olds, the best fitting models for the R vs. 0 and N vs. 0 comparisons included Sex as a fixed factor, but did not yield any significant effect. The best fitting model for the R vs. N comparison included a fixed factor for Hemisphere, and it yielded a significant main effect (F(1, 46) = 6.42, p < 0.05), carried by a greater involvement of the RH compared to the LH. Figure 11 illustrates the distributions of the effects for the three contrasts in the two age groups.
HbR. In newborns, the best fitting model for the R vs. N contrast included fixed factors for Sex and Hemisphere as well as their interaction. The interaction was found to be significant (F(1, 83) = 4.61, p < 0.05), carried by a greater, i.e. more negative, effect in males in the RH than in females (estimate Males-Females = −0.19, p < 0.05). No other significant effects were found in any of the other HbR analyses (Fig. S8).
Figures 12, 13 and 14 show the grand averages of hemodynamic responses for the three contrasts (R vs 0, N vs 0 and R vs N, respectively) in the functionally-derived regions of interest.

Discussion
The current study contributes to testing whether hitherto unexplored sex differences exist in early language abilities through a meta-analysis of fNIRS studies with newborns and six-month-olds, focusing on the ability to extract repetition- and non-repetition-based regularities from speech. Specifically, we tested whether brain responses to repetition and non-repetition regularities measured with fNIRS differed between males and females at birth and at 6 months of age. We investigated 7 fNIRS studies with very similar experimental designs, stimuli and setups. We tested sex differences using both a meta-analytic approach and linear mixed effects models. NIRS data was sampled from both anatomically and functionally defined regions of interest. In particular, we carried out separate analyses for three contrasts of interest, R vs 0 (repetition-based sequences against baseline), N vs 0 (non-repetition-based sequences against baseline) and R vs N (repetition-based against non-repetition-based sequences), because they reflect three different brain mechanisms, conceptually independent of one another. The R vs 0 contrast is a test for the ability of repetition sequences to elicit a significant brain response with respect to the zero baseline. Similarly, the N vs 0 contrast tests the ability of non-repetition-based, or diversity-based, sequences to produce a significant brain response. These two mechanisms, repetition- and diversity-based rule learning, build on different cognitive mechanisms: the former is thought to be in place already at birth, while the latter is thought to develop at a later stage 27. They also hold different roles for subsequent language acquisition. Infants' ability to learn repetition-based rules is instrumental for learning abstract patterns involved in grammar, while their ability to learn diversity-based structures has been related to the beginning of word learning, an ability that starts at around 6 months of age. The R vs N comparison, by contrast, reflects whether and how the infant brain is able to detect differences between two types of sequences. We observed no sex differences in any of the analyses when comparing repetition-based sequences to baseline or to non-repetition sequences in newborns or in 6-month-olds.
The only result involving a sex difference in our analyses was found for hemispheric lateralization. We detected a larger right-left asymmetry for the differential response to repetition and non-repetition sequences in boys than in girls at birth. A closer look at the average effect sizes reveals that, qualitatively, this right-left asymmetry occurs for both types of sequences, even if it is not statistically significant. This suggests that this may be a general difference of hemispheric asymmetry between boys and girls, and not so much a language-specific result. Indeed, male infants have been reported to have a larger right-left asymmetry in cerebral blood flow than female infants at birth 46. In addition to the effect sizes over which the statistics were conducted, the hemodynamic responses themselves also support this physiological interpretation (Figs. 12, 13). Moreover, this result is found uniquely for HbR, which is known to be a less reliable measure than HbO in infants 47,48.
Overall, our results suggest that there are no sex differences early in life in the infant brain's ability to extract basic structural regularities from the speech input. While language is an area where sex differences have been reported in behavioral studies, they have not been investigated systematically in the neural correlates of speech perception and language processing abilities. Our study is the first to show, on a large sample of newborns and 6-month-old infants, that such differences cannot be observed for the ability to extract structural regularities from speech based on repetitions. Considering how fundamental this ability is for learning grammar, since reduplication is a productive morphosyntactic process in the majority of the world's languages 49, finding a robust ability that is uniform across the two sexes is not unexpected and may serve as a solid foundation for language development. Whether sex and gender differences arise in this ability later in development or whether this is an area of language where such differences never emerge remains an open question.

While not observing sex differences in rule-learning abilities, we obtained other interesting effects. In newborns, we found that the temporal region was more strongly activated by repetition sequences than the frontal area in both hemispheres (Figs. 6, 7), consistent with findings from single studies 24,44 and with the overall maturational patterns of the newborn brain 50. The meta-analytic findings confirm that bilateral temporal areas in the newborn brain exhibit large responses to both initial repetitions (AAB) and final repetitions (ABB), thus distinguishing them from random sequences, and in fact, no hemispheric lateralization was found for the differential response to repetition vs. non-repetition sequences (R vs N) either in anatomically defined (Figs. 6, 9) or in functionally defined ROIs (Figs. 11, 14).
At 6 months, responses to the non-repetition sequences increased in the bilateral frontal area as compared to birth, although effect sizes remained greater in the bilateral temporal areas, as in newborns (Figs. 6, 8). The fact that the infant brain starts to encode non-repetition-based sequences by 6 months of age could indicate infants' emerging ability, at this age, to encode and represent diversity in linguistic stimuli, as suggested by de

Figure 10. Results of the functional localization analysis: each channel is color coded to represent the number of studies in which it was found to show significant activation, for each of the three contrasts. Results of the analysis from each single study are listed in Table 2.

compared to girls, which we attribute to the well-documented larger right-left asymmetry in cerebral blood flow in boys than in girls early in life. Additionally, in newborns, we found a greater involvement of the bilateral temporal areas compared to the frontal area for both repetition and non-repetition-based sequences. We also found that non-repetition sequences elicited greater responses in 6-month-olds than in newborns, especially in the bilateral frontal areas. As rule learning is foundational for language acquisition, it may not be surprising that it is robust across sexes at the onset of language experience.

Participants
We aggregated seven studies conducted in three laboratories on young, typically developing infants' processing of repetition- and non-repetition- (i.e. diversity-) based regularities tested with NIRS. These studies were identified using a Google Scholar search with the terms "repetition-based regularity", "NIRS", "infant". Papers including more than one study were considered separate studies. Of the 43 hits, those that did not meet the selection criteria (e.g. studies with atypical infants, behavioral studies etc.) were discarded, leaving 12 published studies. Nine further unpublished studies from the last author's laboratory were added. Of these, studies for which information on participants' sex was not available were discarded. The final sample comprised 7 studies with a total sample size of 150 infants (72 M, 78 F; 91 newborns, 59 six-month-olds). Information about included studies and their characteristics is given in Table 1.

Materials
All included studies share the use of two artificial grammars, repetition-based and non-repetition-based bisyllabic or trisyllabic auditory sequences. The specific structures employed in each study are described in Table 1. Details about the materials and stimuli can be found in the respective publications.

Procedure
Infants were tested with a CW-NIRS device (brand, wavelengths and sampling frequencies listed in Table 1), while sound stimuli were administered through two loudspeakers. Eight to ten sources and eight detectors were placed on infants' heads bilaterally with a 3 cm source-detector distance, forming 10-12 channels per hemisphere (Fig. 1b). Details about the specific procedures of each study can be found in the corresponding publications.

Data analysis
fNIRS pre-processing
fNIRS data was processed largely in the same way as in the original studies. Briefly, light intensities were first converted to optical densities and then to HbO and HbR concentration changes, using the modified Beer-Lambert Law with the following absorption coefficients ($\mu_a$, in mm$^{-1}$·mM$^{-1}$): $\mu_a$(HbO, 695 nm) = 0.0955, $\mu_a$(HbO, 760 nm) = 0.1496, $\mu_a$(HbO, 830 nm) = 0.2320, $\mu_a$(HbO, 850 nm) = 0.2526; $\mu_a$(HbR, 695 nm) = 0.4513, $\mu_a$(HbR, 760 nm) = 0.3865, $\mu_a$(HbR, 830 nm) = 0.1792 and $\mu_a$(HbR, 850 nm) = 0.1798. The product of the optical pathlength and the differential pathlength factor was set to 1, so that the resulting concentrations were expressed in mM × mm. Then, data was bandpass filtered between 0.01 and 0.7 Hz, using an FFT digital filter. Single blocks were rejected if the light intensity reached the saturation value, if they contained motion artifacts, or both. Artifacts were defined as concentration changes larger than 0.1 mM × mm over 0.2 s 24,44. This procedure was performed on each channel independently. Channels with fewer than 20% valid blocks were discarded entirely from the analysis (M: 3.36, STD: 3.9). For the non-rejected channels, for each experiment, an average of 40.1% of blocks were discarded for poor data quality (STD: 17.9%). Rejection was carried out in batch for all infants and all studies, before the statistical analyses were performed. For the non-rejected blocks, a baseline was linearly fit between the means of the 5 s preceding the onset of the block and the 5 s preceding the onset of the next one. This pre-processing routine has been shown to yield an accurate recovery of the infant hemodynamic response 54.
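To make two of the steps above concrete, the sketch below inverts the modified Beer-Lambert Law for a pair of wavelengths using the absorption coefficients quoted in the text, and flags motion artifacts with the 0.1 mM × mm over 0.2 s criterion. It is a simplified illustration, not the original pipeline. Several details are assumptions: optical density is taken as the negative natural logarithm of the intensity ratio, the wavelength pair (695, 830 nm) and the 10 Hz sampling rate in the usage lines are only examples, and all function names are hypothetical.

```python
import math

# Absorption coefficients from the text, in mm^-1 mM^-1.
MU = {
    ("HbO", 695): 0.0955, ("HbO", 760): 0.1496,
    ("HbO", 830): 0.2320, ("HbO", 850): 0.2526,
    ("HbR", 695): 0.4513, ("HbR", 760): 0.3865,
    ("HbR", 830): 0.1792, ("HbR", 850): 0.1798,
}

def optical_density_change(intensity, reference):
    """Delta OD relative to a reference intensity (natural log assumed)."""
    return -math.log(intensity / reference)

def mbll(dod1, dod2, wl1, wl2, pathlength=1.0):
    """Solve the 2x2 system dOD(wl) = (mu_HbO*dHbO + mu_HbR*dHbR) * L.

    pathlength is the product of optical pathlength and DPF, set to 1
    as in the text, so results are in mM x mm.
    """
    a, b = MU[("HbO", wl1)], MU[("HbR", wl1)]
    c, d = MU[("HbO", wl2)], MU[("HbR", wl2)]
    det = a * d - b * c
    hbo = (d * dod1 - b * dod2) / (det * pathlength)
    hbr = (a * dod2 - c * dod1) / (det * pathlength)
    return hbo, hbr

def has_artifact(signal, fs, threshold=0.1, window=0.2):
    """True if the signal changes by more than threshold within window s.

    fs is the sampling frequency in Hz (study-specific, see Table 1).
    """
    step = max(1, round(window * fs))
    return any(abs(signal[i + step] - signal[i]) > threshold
               for i in range(len(signal) - step))

# Illustrative use with made-up optical density changes and signal.
hbo, hbr = mbll(0.004, 0.009, 695, 830)
noisy = has_artifact([0.0, 0.01, 0.02, 0.25, 0.26], fs=10.0)
```

Applying `has_artifact` per channel and per block would mirror the channel-wise rejection described in the text.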
After pre-processing, channel-wise block averages were computed for each condition.Finally, grand averages were obtained by computing the average and standard errors of repetition blocks and non-repetition blocks across all studies.

Calculation of effect sizes
Two complementary analytic frameworks were employed: meta-analysis and mixed-effects modelling. For the meta-analysis, the effect size was calculated for each study, then its weighted average and variability across studies were estimated. This allowed us to define a standardized, overall effect size for the neural manifestation of the rule-learning mechanism. Within this framework, sex differences could be analyzed at the group level. The linear mixed effects model conducted over individual-level effect sizes investigated how sex moderates the response to repetition and non-repetition sequences and interacts with other factors, like age and brain regions. This approach has been described in detail in 34.

In particular, for each study and for each participant, activation was computed by averaging the amplitude of the response across trials of the same condition in a time window starting at the onset of the stimulus and lasting up to 15 s after the end of the stimulation block. Activation was computed for (i) the repetition condition with respect to the zero baseline (R vs. 0), (ii) the non-repetition condition with respect to the zero baseline (N vs. 0), as well as (iii) comparing the repetition and non-repetition conditions (R vs. N), as the difference between the mean activations for the repetition and the non-repetition conditions (Fig. 2).
For each of the three comparisons (R vs. 0, N vs. 0 and R vs. N), the infant-level effect size $d_{infant}$ was computed by dividing the mean response by the standard deviation of responses across trials 55. The study-level effect size $d_{study}$ was computed by dividing the average of the individual responses by their standard deviation across participants. The effect size sampling variances, referring to the extent to which effect sizes are expected to vary from study to study 56, were computed as $V_d = 2/n + d_{study}^2/(4n)$, with $n$ being the number of participants 57; i.e. effect sizes were weighted by the number of participants in a study 31.
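The three quantities defined above can be written out as a short sketch. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function names and the numbers in the usage lines are hypothetical.

```python
from statistics import mean, stdev

def d_infant(trial_responses):
    """Infant-level effect size: mean response / SD across trials.

    Assumption: sample SD (n - 1 denominator).
    """
    return mean(trial_responses) / stdev(trial_responses)

def d_study(infant_responses):
    """Study-level effect size: mean of individual responses / SD across infants."""
    return mean(infant_responses) / stdev(infant_responses)

def sampling_variance(d, n):
    """V_d = 2/n + d^2 / (4n), with n the number of participants."""
    return 2.0 / n + d ** 2 / (4.0 * n)

# Made-up activations (mM x mm) for one infant's trials and one study's infants.
di = d_infant([0.012, 0.020, 0.004, 0.016])
ds = d_study([0.010, 0.018, -0.002, 0.013, 0.007])
vd = sampling_variance(ds, n=5)
```

Because the variance shrinks with n, larger studies receive more weight when the study-level effect sizes are pooled.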
Individual and meta-analytic effect sizes were calculated for each channel and hemoglobin component. Subsequent analyses were carried out separately for the three contrasts of interest outlined above (R vs 0, N vs 0 and R vs N), as they address different theoretical questions. While the R vs N contrast more directly represents infants' ability to discriminate repetition-based from non-repetition-based sequences, comparisons of each condition against the baseline allow us to estimate the ability of the brain to represent and process the two types of regularities independently of one another. This is relevant because the abilities to represent repetition-based and non-repetition-based, i.e. diversity-based, sequences do not emerge at the same developmental time, and constitute different underlying abilities. Most importantly, brain responses to non-repetitions, measured through the N vs 0 contrast, are larger in 6-month-olds than in newborns, and this has been recently proposed to support the beginning of word learning, an ability that indeed starts emerging at around 6 months of age 27.

Selection of ROIs
Regions of interest over which activations are calculated are typically defined either anatomically or functionally. We have implemented both approaches since no a priori hypotheses were available as to whether sex differences may be observed in different anatomical regions of the brain or in hemodynamic response patterns.
Anatomically defined regions of interest. Four regions of interest were chosen covering the classical language network: the bilateral temporal areas (channels 3, 6 in the LH and 17, 19 in the RH, Fig. 1b), known to be responsible for auditory processing, and the bilateral frontal areas (channels 2, 5 in the LH and 13, 15 in the RH), responsible for the computation of structure and higher-order linguistic/sequential representations, following Gervain et al. 24,44. For each participant, effect sizes were averaged across channels within each ROI. This approach allowed us to test whether sex differences exist in the responses to repetition-based structures in the classical speech and language areas.

Functionally defined regions of interest. Since different studies may show effects in different brain areas, for instance due to differences in the stimuli, the headgear used or the ages tested, we also carried out a data-driven functional localization analysis in order to identify clusters of channels that significantly activate for repetition and non-repetition sequences in each study.
These functional ROIs were identified using cluster-based permutation tests involving t-tests 45 over oxygenated hemoglobin (HbO) concentrations, i.e. the chromophore that shows stronger effects in infants 29. Separate permutation tests were conducted for all three contrasts (R vs. 0, N vs. 0, R vs. N). Statistical significance was assessed against the null distribution of t values obtained by randomly relabelling data (1000 iterations), as is now standard in infant NIRS studies 30,58,59. For each study, the strongest cluster (i.e. the one having the largest t value) was selected in each hemisphere. The functionally defined ROIs were used in the linear mixed effects model.
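The logic of a cluster-based permutation test can be sketched as follows. This is a simplified illustration under several stated assumptions, not the procedure actually run: relabelling is implemented as a random sign flip of each participant's data, clusters are formed from adjacent channels whose one-sample t value exceeds a fixed threshold, the cluster statistic is the summed t value, and only positive clusters are considered. The adjacency map, threshold and toy data are hypothetical, and the selection of one cluster per hemisphere is omitted.

```python
import math
import random
import statistics

def t_values(data):
    """One-sample t per channel; data[i][c] = participant i, channel c.
    Assumes no channel is constant across participants."""
    n = len(data)
    ts = []
    for c in range(len(data[0])):
        col = [row[c] for row in data]
        ts.append(statistics.mean(col) / (statistics.stdev(col) / math.sqrt(n)))
    return ts

def strongest_cluster(ts, adjacency, threshold):
    """Return (summed t, channels) of the strongest supra-threshold cluster."""
    supra = {c for c, t in enumerate(ts) if t > threshold}
    best_mass, best_members = 0.0, []
    seen = set()
    for start in sorted(supra):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # depth-first search over neighbouring channels
            c = stack.pop()
            members.append(c)
            for nb in adjacency.get(c, ()):
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        mass = sum(ts[c] for c in members)
        if mass > best_mass:
            best_mass, best_members = mass, sorted(members)
    return best_mass, best_members

def cluster_permutation_test(data, adjacency, threshold=2.0, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed_mass, channels = strongest_cluster(t_values(data), adjacency, threshold)
    exceed = 0
    for _ in range(n_perm):
        # Assumption: relabelling = flipping the sign of a participant's responses.
        flipped = [[s * v for v in row]
                   for row, s in ((r, rng.choice((-1.0, 1.0))) for r in data)]
        null_mass, _members = strongest_cluster(t_values(flipped), adjacency, threshold)
        if null_mass >= observed_mass:
            exceed += 1
    return channels, observed_mass, (exceed + 1) / (n_perm + 1)

# Hypothetical toy data: 10 participants, 4 channels in a row (0-1-2-3).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
data = []
for i in range(10):
    sign = 1.0 if i % 2 == 0 else -1.0
    data.append([1.0 + 0.1 * (i % 3), 0.8 + 0.1 * (i % 2),
                 sign * (0.5 + 0.01 * i), -sign * (0.3 + 0.02 * i)])
channels, mass, p_value = cluster_permutation_test(data, adjacency)
print(channels, p_value)
```

In this toy example, channels 0 and 1 carry a consistent positive response and form the selected cluster, while channels 2 and 3 average out to roughly zero.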

Statistical analysis
Meta-analysis. Study-level effect sizes were analysed by fitting a meta-analytic random-effects model with the metafor R package 60,61. Models were fit using restricted maximum likelihood (REML). One model was fit over the entire dataset, and separate models were fit for each of the four anatomically defined ROIs. After establishing the overall effect sizes for all infants pooled, similar analyses were run with Sex as a moderator to test for sex differences.
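To make the idea of random-effects pooling concrete, the following Python sketch computes a pooled effect size from study-level effect sizes and their sampling variances. Note that it swaps in the closed-form DerSimonian-Laird estimator of between-study variance rather than the REML estimation used in the analyses reported here, so it will not reproduce the metafor results exactly; the function name and the input values are hypothetical.

```python
import math

def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.
    Returns (pooled effect, standard error, tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical study-level effect sizes and sampling variances (k = 7).
d_studies = [0.42, 0.10, 0.35, 0.55, 0.20, 0.48, 0.30]
v_studies = [0.09, 0.11, 0.08, 0.12, 0.10, 0.09, 0.10]
pooled, se, tau2 = random_effects_dl(d_studies, v_studies)
ci = (pooled - 1.96 * se, pooled + 1.96 * se)     # approximate 95% CI
print(pooled, se, tau2, ci)
```

A moderator such as Sex would be tested by extending this model to a meta-regression, which is not covered by the sketch.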

Linear mixed effects model
Anatomically defined regions of interest. Linear mixed effects models were fit over individual-level effect sizes separately for newborns and 6-month-olds, as there are known developmental changes in the processing of non-repetition patterns 27. The random-effects structure consisted of random intercepts for participant, study and lab. Candidate fixed effects included Sex (female/male), ROI (temporal/frontal) and Hemisphere (LH/RH), as well as their interactions. They were incrementally included in the fixed-effects structure and the resulting models were compared. For each contrast and age group, the best fitting model was chosen based on the AIC (Akaike information criterion) value.

Functionally defined regions of interest. Models were again fit separately for the two age groups. The planned random-effects structure consisted of random intercepts for participant, study and lab. Candidate fixed effects included Sex (female/male) and Hemisphere (LH/RH), as well as their interaction. Model selection was performed by selecting the best fitting model based on the AIC value. For both analyses, models were fit using the lmer function from the lme4 R package 60,62, with denominator degrees of freedom estimated with the Kenward-Roger method.
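The AIC-based selection step can be illustrated with a short Python sketch. It does not fit any mixed-effects model: the model formulas are written in lme4-style notation purely as labels, and the log-likelihoods and parameter counts are hypothetical placeholders for values that would come from the fitted models.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def select_best(candidates):
    """Return the label of the candidate with the lowest AIC.
    candidates maps a label to (log-likelihood, number of parameters)."""
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical incrementally built candidates; the numbers are placeholders.
RANDOM = "(1|participant) + (1|study) + (1|lab)"
candidates = {
    f"d ~ 1 + {RANDOM}": (-412.3, 5),
    f"d ~ ROI + {RANDOM}": (-405.1, 6),
    f"d ~ ROI + Sex + {RANDOM}": (-404.9, 7),
    f"d ~ ROI * Sex + {RANDOM}": (-404.6, 8),
}
best = select_best(candidates)
print(best, aic(*candidates[best]))
```

Because each added parameter costs 2 AIC units, a term such as Sex is retained only if it improves the log-likelihood by more than 1.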

Figure 2. Schematic illustration of the methodology employed. (Pre-processing) Within each study, subject-wise block-averages are computed, as well as study-level grand-averages; in the plots, magenta and cyan indicate repetition trials (HbO and HbR, respectively), red and blue indicate non-repetition trials (HbO and HbR, respectively). (Calculation of effect sizes) After pre-processing the data, infant-level and study-level effect sizes are computed for each study as described in Section "Calculation of effect sizes", where Activation refers to the R vs. 0, N vs. 0 and R vs. N contrasts computed as the average of the HRF along its time course. (Statistical analyses) Infant-level effect sizes are analysed through mixed-effects linear models, investigating the effects of Sex, Age, Regions of Interest and Hemisphere; the plots reported are examples of such analyses. Study-level effect sizes are analysed through meta-analytic methods, aimed at estimating effects within each subset of data and at investigating the moderating effects of Sex.

Figure 3. Forest plots of the effect sizes and corresponding confidence intervals obtained for responses elicited by repetition-based sequences compared to baseline. The bottom row, labelled as "Overall RE Model (k = 7)", reports the results of the meta-analysis carried out within each anatomical region across age groups and sexes. Summary 'diamonds' show the summary estimates of each group, based on the results of the model, with the center of the diamond corresponding to the estimate and the left/right edges indicating the confidence interval limits. The estimates for the two sexes are indicated for each study. The corresponding HbR forest plot is shown in Fig. S1.

Figure 4. Forest plots of effect sizes and corresponding confidence intervals obtained for responses elicited by non-repetition-based sequences compared to baseline. The corresponding HbR forest plot is shown in Fig. S2.

Figure 6. Grand average hemodynamic responses to repetition and non-repetition sequences across all studies in the four anatomically defined regions of interest by age. The top panel shows the overall average; the middle and bottom panels show males and females, respectively.

Figure 7. Box plots of infant-level effect sizes as a function of age, anatomically defined ROIs and hemisphere for responses elicited by repetition-based sequences compared to baseline. Box plots display the median value of the distribution and its first and third quartiles (hinges); whiskers extend to 1.5 times the interquartile range from each hinge.

Figure 8. Box plots of infant-level effect sizes as a function of age, anatomically defined ROIs and hemisphere for responses elicited by non-repetition-based sequences compared to baseline.

Figure 9. Box plots of infant-level effect sizes as a function of age, anatomically defined ROIs and hemisphere for responses elicited by repetition-based sequences compared to non-repetition-based sequences.

Figure 11. Box plots of infant-level effect sizes for the functionally defined ROIs in each hemisphere for the three contrasts in newborns (top panel) and 6-month-olds (bottom panel).

Figure 13. Grand average hemodynamic responses across all studies in the functionally defined regions of interest, for the N vs. 0 contrast.

Figure 1. (a) A typical experimental design, with repetition-based (AAB, in this specific example) and non-repetition-based regularities (ABC) being presented to infants in blocks (figure adapted from 24). (b) The optode arrangement employed in the studies included 8 or 10 sources (red dots) and 8 detectors (blue dots), forming a total of 20 or 24 channels. Grey dots with a red outline indicate sources that were not present in studies 2 and 5. Anatomically relevant regions described in Sect. 2.2.3 are the bilateral frontal area (LH: channels 2, 5; RH: channels 13, 15) and the bilateral temporal area (LH: channels 3, 6; RH: channels 17, 19). The anatomical localization of the array is described in detail in 30.

Table 2. The channels constituting significant clusters in each study and each contrast (R vs. 0, N vs. 0, R vs. N).