Eye tracking in an everyday environment reveals the interpersonal distance that affords infant-parent gaze communication

The unique morphology of human eyes enables gaze communication at various ranges of interpersonal distance. Although gaze communication contributes to infants’ social development, little is known about how infant-parent distance affects infants’ visual experience in daily gaze communication. The present study conducted longitudinal observations of infant-parent face-to-face interactions in the home environment of 5 infants aged from 10 to 15.5 months. Using head-mounted eye trackers worn by parents, we evaluated infants’ daily visual experience of 3138 eye contact scenes recorded from the infants’ second-person perspective. The results of a hierarchical Bayesian statistical analysis suggest that certain levels of interpersonal distance afforded smooth interaction with eye contact. Eye contact was unlikely to be exchanged when the infant and parent were too close or too far apart. The number of continuing eye contacts showed an inverse U-shaped pattern with interpersonal distance, regardless of whether the eye contact was initiated by the infant or the parent. However, the interpersonal distance was larger when the infant initiated the eye contact than when the parent initiated it, suggesting that interpersonal distance affects the infant’s and parent’s social look differently. Overall, the present study indicates that interpersonal distance modulates infant-parent gaze communication.


Data Processing

Locomotor Status
The infants' behavioural data were coded from the respective parents' video for each observation day. We identified 3 types of infant movements (crawling, cruising, and walking) and recorded each behaviour using one-zero sampling [1] with 15-s intervals. Crawling was defined as a series of steps in a prone posture, with the infant on hands and knees or on hands and feet. Cruising was defined as a series of steps in an upright posture with support (from a caregiver or furniture). Walking was defined as a series of steps in an upright posture without support. Steps were defined as alternating leg movements that changed the infant's location on the floor. Steps could be omnidirectional (infants often stepped backward, sideways, or in place). These definitions are based on those in Adolph et al. [2]. A second coder independently coded 20% of the parent-perspective video, with 92% intercoder agreement.
Throughout each infant's entire observation period, the first observation day when the percentage of walking became larger than that of crawling was defined as "acquisition of walking", and we categorised the infant's locomotor status before acquisition of walking as "crawler" and otherwise as "walker" (Fig. S1).
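The categorisation rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name, the input layout (one pair of crawling and walking percentages per observation day, in chronological order) and the example values are all hypothetical.

```python
from typing import List, Tuple


def classify_locomotor_status(days: List[Tuple[float, float]]) -> List[str]:
    """Label each observation day as "crawler" or "walker".

    `days` holds (percent_crawling, percent_walking) per observation day,
    in chronological order (hypothetical input layout).
    """
    onset = None
    for idx, (crawl, walk) in enumerate(days):
        if walk > crawl:  # first day on which walking exceeds crawling
            onset = idx
            break
    # Days before the onset are "crawler"; the onset day and all later days are "walker".
    return [
        "crawler" if onset is None or idx < onset else "walker"
        for idx in range(len(days))
    ]


# Hypothetical percentages for four observation days.
example_days = [(60.0, 0.0), (40.0, 10.0), (20.0, 35.0), (30.0, 25.0)]
status = classify_locomotor_status(example_days)
```

Note that, under this rule, a later day on which crawling again exceeds walking (the last day in the example) is still labelled "walker".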
To assess whether infant age was related to locomotor status, we set locomotor status on each observation day as the response variable and infant age as the explanatory variable in a generalised linear mixed model with a binomial error structure. To account for individual differences, we set infant identity as a random intercept. The effect of infant age was tested with the statistical packages lme4 [3] and car [4] in R 3.5.0 [5]. The likelihood ratio test revealed a significant effect of infant age (χ²(1) = 362.4, p < 0.0001): as infants grew older, their locomotor status was more likely to be "walker".

Eye Contact Session
We defined an eye contact session (EC session) as a series of eye contact bouts (EC bouts) separated by short inter-eye-contact-bout intervals (IEIs) and used it as an independent observation unit. To determine the IEI criterion at which a new EC session starts, we performed parameter estimation using a statistical model based on the method described by Langton et al. [6]. We applied a single gamma distribution model and a 2-process gamma distribution model and evaluated which model was more parsimonious based on the widely applicable information criterion (WAIC) [7].
First, let us assume that the frequency of the observed IEIs $x$ follows a single gamma distribution with shape parameter $\alpha_1$ and rate parameter $\beta_1$. The probability density of the IEIs $x$ is defined as

$$f(x \mid \alpha_1, \beta_1) = \frac{\beta_1^{\alpha_1}}{\Gamma(\alpha_1)} x^{\alpha_1 - 1} e^{-\beta_1 x},$$

where $\Gamma(\alpha_1)$ is the gamma function. Next, let us assume that the frequency of the observed IEIs $x$ follows a mixture of two gamma distribution components. One component, with shape parameter $\alpha_1$ and rate parameter $\beta_1$, represents the IEIs within each EC session, and the other component, with shape parameter $\alpha_2$ and rate parameter $\beta_2$, represents the IEIs between adjacent EC sessions. The probability density of the IEIs $x$ is defined as

$$f(x \mid p_1, \alpha_1, \beta_1, \alpha_2, \beta_2) = p_1 \frac{\beta_1^{\alpha_1}}{\Gamma(\alpha_1)} x^{\alpha_1 - 1} e^{-\beta_1 x} + (1 - p_1) \frac{\beta_2^{\alpha_2}}{\Gamma(\alpha_2)} x^{\alpha_2 - 1} e^{-\beta_2 x},$$

where $\Gamma(\alpha_1)$ and $\Gamma(\alpha_2)$ are the gamma functions and $p_1$ is the mixture ratio of the component with shape parameter $\alpha_1$ and rate parameter $\beta_1$. The models were fitted using the Hamiltonian Monte Carlo engine Stan 2.17.0 [8] in R 3.5.0 [5]. The number of iterations was set to 5500 and the number of burn-in samples to 500, with four chains. The value of Rhat for all parameters was below 1.1, indicating convergence across the four chains [9].
The best-performing model was the mixture of two gamma distributions (Table S2). One gamma distribution component represented the IEIs within each EC session ($\alpha_1 = 1.02$; $\beta_1 = 0.061$), and the other represented the IEIs between adjacent EC sessions ($\alpha_2 = 0.806$; $\beta_2 = 0.007$).
We set the IEI criterion at the intersection of the two estimated gamma distributions, that is, the interval at which the 2 processes occurred with the same probability: 44.3 s (Fig. S2). An EC session was then defined as a run of consecutive EC bouts separated by IEIs shorter than 44.3 s.
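The two steps above, locating the crossing point of the two components and grouping bouts into sessions, can be sketched as follows. This is a hedged illustration using only the Python standard library, not the authors' implementation. The shape and rate values are the rounded estimates reported above; the mixture ratio `P1_ASSUMED` is a placeholder because no value is given in this section, so the crossing point computed here is not expected to reproduce 44.3 s. Measuring an IEI from the offset of one bout to the onset of the next is also an assumption, and the bout times are invented.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log density of a gamma distribution in the shape/rate parameterisation."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1.0) * math.log(x) - rate * x


def log_ratio(x, p1, a1, b1, a2, b2):
    """Log of (weighted within-session density) / (weighted between-session density)."""
    return (math.log(p1) + gamma_logpdf(x, a1, b1)) - (math.log(1.0 - p1) + gamma_logpdf(x, a2, b2))


def crossing_point(p1, a1, b1, a2, b2, upper=1000.0, tol=1e-9):
    """Bisection for the IEI beyond which the between-session component dominates.

    Assumes a1 > a2 and b1 > b2, so log_ratio peaks at x_peak and decreases after it.
    """
    x_peak = (a1 - a2) / (b1 - b2)
    lo, hi = x_peak, upper
    if log_ratio(lo, p1, a1, b1, a2, b2) <= 0.0 or log_ratio(hi, p1, a1, b1, a2, b2) >= 0.0:
        raise ValueError("no sign change between the peak and the upper bound")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_ratio(mid, p1, a1, b1, a2, b2) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def group_into_sessions(bouts, threshold):
    """Group (onset, offset) bouts, sorted by onset, into sessions by IEI threshold."""
    sessions = []
    for onset, offset in bouts:
        # IEI is taken as onset of this bout minus offset of the previous one (assumption).
        if sessions and onset - sessions[-1][-1][1] < threshold:
            sessions[-1].append((onset, offset))
        else:
            sessions.append([(onset, offset)])
    return sessions


P1_ASSUMED = 0.5  # placeholder mixture ratio, not a reported estimate
A1, B1, A2, B2 = 1.02, 0.061, 0.806, 0.007  # rounded estimates from the text
threshold = crossing_point(P1_ASSUMED, A1, B1, A2, B2)

# Hypothetical bout times in seconds, grouped with the criterion reported in the text.
example_bouts = [(0.0, 2.0), (10.0, 12.0), (70.0, 71.0), (80.0, 83.0)]
sessions = group_into_sessions(example_bouts, 44.3)
```

Working on the log scale keeps the comparison numerically stable for long intervals, where both densities become very small.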

Removal of Data
The purpose of this study was to evaluate how infant-parent distance affects gaze communication in everyday life. To evaluate eye contact when each member of the dyad was in a natural spatial location, we excluded EC bouts for which the infant's movements were constrained by the parent or by environmental objects such as enclosures. For example, infants were sometimes placed in playpens so that the parent could do light housekeeping without interruption. Infants were also sometimes held or carried by their parents during social interactions. We excluded EC bouts in such situations because infants could not adjust the interpersonal distance. Moreover, to detect the effect of locomotor status clearly, we controlled for the effect of the parent's posture by also excluding EC bouts that occurred when the parent was not sitting on the floor, because a previous study reported that infants' social looks decrease when parents are standing or walking compared with when they are sitting on the floor [10]. For these reasons, 18.3% of the total EC bouts were excluded.
Several EC bouts were also excluded from the analysis because they could not be coded. In some EC bouts, the interpersonal distance was so small that the infant's face exceeded the visual angle of the scene camera. In others, we could not code the initiator because the infant and parent looked at each other's faces simultaneously. For these reasons, 12.8% of the total EC bouts were excluded.
Moreover, obvious outlier EC sessions were excluded from the dataset. The distance of each EC session was calculated as the mean distance of the EC bouts in that session, and an EC session was marked as an outlier if its distance was more than 3 SD from the mean value. Thus, 2.1% of the total EC sessions (including 1.2% of the total EC bouts) were excluded.
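As an illustration of the outlier rule, the following Python sketch computes a per-session distance and flags sessions beyond 3 SD. It is a sketch under stated assumptions rather than the authors' procedure: the use of the sample standard deviation, the two-sided criterion, the function names and the example distances (in arbitrary units) are all hypothetical.

```python
from statistics import mean, stdev


def session_distance(bout_distances):
    """Distance of an EC session: mean distance of the EC bouts it contains."""
    return mean(bout_distances)


def split_outliers(session_distances, n_sd=3.0):
    """Return (kept, outliers), flagging values more than n_sd SDs from the mean.

    Uses the sample SD and a two-sided criterion (both are assumptions).
    """
    centre = mean(session_distances)
    spread = stdev(session_distances)
    kept, outliers = [], []
    for value in session_distances:
        if abs(value - centre) > n_sd * spread:
            outliers.append(value)
        else:
            kept.append(value)
    return kept, outliers


# Hypothetical session distances: 30 typical sessions and one extreme session.
example_distances = [1.0 + 0.05 * (i % 5) for i in range(30)] + [8.0]
kept, outliers = split_outliers(example_distances)
```

With very few sessions no value can lie more than 3 SD from the mean, so a rule of this kind only has an effect on reasonably large samples.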
For these reasons, we removed 1504 EC bouts and finally analysed 1206 EC sessions, including a total of 3138 EC bouts.

Data Analysis
We conducted two main statistical analyses and one additional analysis using hierarchical Bayesian models. The core of the hierarchical Bayesian model is a generalised linear mixed model (GLMM) [11] that estimates the effects of the various factors on the response variable. Analysis 1 was intended to estimate the factors affecting how many times gaze communication was exchanged between the infant and parent, and the response variable was the number of EC bouts exchanged in each EC session. Analysis 2 was intended to estimate the factors affecting how many times each member of the dyad initiated eye contact with their partner, and the response variable was the number of infant-led or parent-led EC bouts in each EC session. Analysis 3 was intended to estimate the factors affecting the ratio of eye contacts initiated by the infant to those initiated by the parent, and the response variable was the proportion of infant-led EC bouts in each EC session.
To determine the most parsimonious model, we compared models using the widely applicable information criterion (WAIC) [7]. In each model selection procedure, we started from a model that included all likely explanatory variables, then excluded explanatory variables sequentially and calculated the WAIC for each resulting model. Finally, we selected the model with the smallest WAIC value as the best model among the relevant candidate models.
Models were fitted using the Hamiltonian Monte Carlo engine Stan 2.17.0 [8] in R 3.5.0 [5]. The number of iterations was set to 6000 and the number of burn-in samples to 1000, with four chains. The values of Rhat for all parameters were below 1.1, indicating convergence across the four chains [9]. We chose conservative, weakly informative priors for the hyperpriors of some random effects. This made our models sceptical of large effects and helped ensure convergence.
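For readers unfamiliar with WAIC, the following Python sketch shows one common way to compute it from a matrix of pointwise log-likelihoods (posterior draws by observations) and to pick the candidate with the smallest value. It is not the authors' code. The deviance scale (multiplication by -2) and the variance-based penalty are assumptions about the variant used, and the toy log-likelihood values and model names are invented.

```python
import math
from statistics import variance


def waic(log_lik):
    """WAIC on the deviance scale from log_lik[s][n] (draw s, observation n)."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters (variance form)
    for n in range(n_obs):
        column = [log_lik[s][n] for s in range(n_draws)]
        top = max(column)  # subtract the maximum for a stable log-mean-exp
        lppd += top + math.log(sum(math.exp(v - top) for v in column) / n_draws)
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic)


# Toy pointwise log-likelihoods for two hypothetical candidate models
# (2 posterior draws x 2 observations each).
candidates = {
    "model_a": [[-1.0, -2.0], [-1.0, -2.0]],
    "model_b": [[-1.5, -2.5], [-1.5, -2.5]],
}
waic_values = {name: waic(ll) for name, ll in candidates.items()}
best_model = min(waic_values, key=waic_values.get)
```

In practice the pointwise log-likelihoods would come from the fitted model's posterior draws, with one column per EC session.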

Analysis 1: Number of Eye Contact Bouts Exchanged
The response variable of the statistical model was the number of EC bouts exchanged in each EC session. A series of models with up to four explanatory variables (fixed effects; Table S3) was fitted, and the most parsimonious model was determined based on the WAIC.
First, let us assume that the observed number of EC bouts exchanged in each EC session $k$, that is, $N_k$, between the infant-parent dyad $i$ on observation day $j$ follows a negative binomial distribution with mean $\mu_k$, where $\mu_k$ represents the mean number of EC bouts exchanged in each EC session. The log link function is applied to $\mu_k$ such that the factors in the linear predictor affect $\mu_k$ multiplicatively. The linear predictor of the maximum model is defined as

$$\log \mu_k = \beta_0 + \beta_1 \, \mathrm{age}_{ij} + \beta_2 \, \mathrm{walker}_{ij} + \beta_3 \, \mathrm{distance}_k + \beta_4 \, \mathrm{distance}_k^2 + r_i,$$

where $\beta_0$ is the intercept, and the set of $\beta_*$ from $\beta_1$ to $\beta_4$ represents the coefficients of the explanatory variables (fixed effects). The explanatory variable $\mathrm{age}_{ij}$ represents the age in months of infant $i$ on observation day $j$, and the explanatory variable $\mathrm{walker}_{ij}$ indicates whether the locomotor status of infant $i$ was walker or not on observation day $j$. The explanatory variable $\mathrm{distance}_k$ represents the interpersonal distance of eye contact session $k$. To account for differences between dyads, we set the dyad identity as a random intercept $r_i$. We chose conservative, weakly informative priors for the hyperprior of the random effect $r_i$. When the best model includes the effect of interpersonal distance squared, we can predict the mean number of EC bouts as a quadratic curve over interpersonal distance. When we were able to draw predictions as quadratic curves from the selected model, we calculated the extremal value $D$ using the MCMC samples.
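The extremal value of a quadratic curve can be obtained per posterior draw. If the distance part of the linear predictor is written as b_dist * d + b_dist_sq * d^2, setting its derivative to zero gives d = -b_dist / (2 * b_dist_sq), which is a maximum when b_dist_sq is negative. The Python sketch below applies this to each draw and summarises the result. It is an illustration only: the coefficient names, the quantile helper and the toy draws are hypothetical and do not come from the fitted models.

```python
from statistics import mean


def vertex(b_dist, b_dist_sq):
    """Distance at which b_dist * d + b_dist_sq * d**2 is extremal."""
    return -b_dist / (2.0 * b_dist_sq)


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def summarise_extremum(draws):
    """Posterior mean and 95% interval of D from (b_dist, b_dist_sq) draws."""
    d_values = sorted(vertex(b, b_sq) for b, b_sq in draws)
    return {
        "mean": mean(d_values),
        "lower": quantile(d_values, 0.025),
        "upper": quantile(d_values, 0.975),
        # Share of draws whose curve is an inverse U (negative squared term).
        "share_inverse_u": sum(1 for _, b_sq in draws if b_sq < 0.0) / len(draws),
    }


# Toy posterior draws of (b_dist, b_dist_sq); hypothetical values.
toy_draws = [(2.0, -1.0), (2.2, -1.0), (1.8, -1.0), (2.0, -1.0)]
summary = summarise_extremum(toy_draws)
```

Computing the extremum draw by draw, rather than from the posterior means of the coefficients, carries the joint uncertainty of both coefficients into the interval for D.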

Analysis 2: Number of Infant-led and Parent-led Eye Contact Bouts
The response variable of the statistical model was the number of infant-led or parent-led EC bouts in each EC session. A series of models with up to nine explanatory variables (fixed effects; Table S3) was fitted, and the most parsimonious model was determined based on the WAIC.
First, let us assume that the observed number of infant-led or parent-led EC bouts in each EC session $k$, that is, $N_k$, between the infant-parent dyad $i$ on observation day $j$ follows a Poisson distribution with mean $\lambda_k$, where $\lambda_k$ represents the mean number of infant-led or parent-led EC bouts in each EC session. The log link function is applied to $\lambda_k$ such that the factors in the linear predictor affect $\lambda_k$ multiplicatively. The linear predictor of the maximum model is defined as

$$\log \lambda_k = \beta_0 + \beta_1 \, \mathrm{age}_{ij} + \beta_2 \, \mathrm{walker}_{ij} + \beta_3 \, \mathrm{distance}_k + \beta_4 \, \mathrm{distance}_k^2 + \mathrm{initiator}_k \left( \beta_5 + \beta_6 \, \mathrm{age}_{ij} + \beta_7 \, \mathrm{walker}_{ij} + \beta_8 \, \mathrm{distance}_k + \beta_9 \, \mathrm{distance}_k^2 \right) + r_{i1} + r_{i2} \, \mathrm{initiator}_k + r_k,$$

where $\beta_0$ is the intercept, and the set of $\beta_*$ from $\beta_1$ to $\beta_9$ represents the coefficients of the explanatory variables (fixed effects). The explanatory variables $\mathrm{age}_{ij}$, $\mathrm{walker}_{ij}$ and $\mathrm{distance}_k$ are the same as in the statistical model of Analysis 1. The explanatory variable $\mathrm{initiator}_k$ is a dummy variable that indicates whether $N_k$ is the number of infant-led EC bouts ($\mathrm{initiator}_k = 1$) or parent-led EC bouts ($\mathrm{initiator}_k = 0$). To account for the individual differences of the infant and parent separately, we set dyad identity as the random intercept $r_{i1}$ and as the random slope $r_{i2}$ on the fixed effect initiator. We also set eye contact session identity as the random intercept $r_k$ to correct for overdispersion. We chose conservative, weakly informative priors for the hyperpriors of the random effects $r_{i1}$ and $r_{i2}$.
If $\lambda_{k(\mathrm{parent})}$ is the mean number of parent-led EC bouts in each EC session $k$, the linear predictor of the maximum model is defined as

$$\log \lambda_{k(\mathrm{parent})} = \beta_0 + r_{i1} + \beta_1 \, \mathrm{age}_{ij} + \beta_2 \, \mathrm{walker}_{ij} + \beta_3 \, \mathrm{distance}_k + \beta_4 \, \mathrm{distance}_k^2 + r_k.$$

If $\lambda_{k(\mathrm{infant})}$ is the mean number of infant-led EC bouts in each EC session $k$, the linear predictor of the maximum model is defined as

$$\log \lambda_{k(\mathrm{infant})} = \beta_0 + \beta_5 + r_{i1} + r_{i2} + (\beta_1 + \beta_6) \, \mathrm{age}_{ij} + (\beta_2 + \beta_7) \, \mathrm{walker}_{ij} + (\beta_3 + \beta_8) \, \mathrm{distance}_k + (\beta_4 + \beta_9) \, \mathrm{distance}_k^2 + r_k.$$

When the best model includes the effect of interpersonal distance squared and any interaction effect, we can draw the predictions of the mean number of infant-led EC bouts and parent-led EC bouts as two different quadratic curves over interpersonal distance. When we were able to draw two different predictive curves from the selected model, we calculated the extremal value of each quadratic curve using the MCMC samples. The extremal value of the predictive curve for infant-led EC bouts was defined as $D_{(\mathrm{infant})}$, and the extremal value of the predictive curve for parent-led EC bouts was defined as $D_{(\mathrm{parent})}$. We took the difference of the two extremal values, $\Delta D$, as follows:

$$\Delta D = D_{(\mathrm{infant})} - D_{(\mathrm{parent})}.$$
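The comparison of the two extremal values can be sketched per posterior draw in Python. This is a hedged illustration, not the authors' code. It assumes that the parent-led curve uses the main distance and distance-squared coefficients, that the infant-led curve adds the corresponding initiator interaction coefficients, and that the difference is taken as infant minus parent. The variable names and toy draws are hypothetical.

```python
from statistics import mean


def vertex(b_dist, b_dist_sq):
    """Distance at which b_dist * d + b_dist_sq * d**2 is extremal."""
    return -b_dist / (2.0 * b_dist_sq)


def delta_d(draw):
    """Difference of extremal distances (infant-led minus parent-led) for one draw.

    `draw` is (b_dist, b_dist_sq, b_int_dist, b_int_dist_sq): the main distance
    terms and their interactions with the initiator dummy (assumed layout).
    """
    b_dist, b_dist_sq, b_int_dist, b_int_dist_sq = draw
    d_parent = vertex(b_dist, b_dist_sq)
    d_infant = vertex(b_dist + b_int_dist, b_dist_sq + b_int_dist_sq)
    return d_infant - d_parent


# Toy posterior draws; hypothetical values.
toy_draws = [
    (2.0, -1.0, 1.0, 0.0),
    (2.0, -1.0, 0.0, 0.0),
    (4.0, -2.0, 2.0, 0.0),
]
differences = [delta_d(draw) for draw in toy_draws]
mean_difference = mean(differences)
# Share of draws in which the infant-led curve peaks at a larger distance.
share_positive = sum(1 for value in differences if value > 0.0) / len(differences)
```

The share of draws with a positive difference gives a direct posterior summary of how consistently the infant-led curve peaks at a larger distance than the parent-led curve.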

Analysis 3: Proportion of Infant-led Eye Contact Bouts
In Analysis 2, the effect of interpersonal distance differed between infant-led and parent-led EC bouts: the interpersonal distance at which infant-led eye contact occurred most was larger than that for parent-led eye contact. To confirm this tendency, we conducted an additional analysis in which the response variable was the proportion of infant-led EC bouts in each EC session (Analysis 3).

Supplementary Methods
The response variable of the statistical model was the proportion of infant-led EC bouts in each EC session. A series of models with up to four explanatory variables (fixed effects; Table S3) was fitted, and the most parsimonious model was determined based on the WAIC. First, let us assume that the observed number of infant-led EC bouts $Y_k$ out of the total number of EC bouts $N_k$ in EC session $k$ between the infant-parent dyad $i$ on observation day $j$ follows a binomial distribution with parameter $q_k$, the ratio of infant-led EC bouts in each EC session. The logit link function is applied to $q_k$. The linear predictor of the maximum model is defined as

$$\mathrm{logit}(q_k) = \beta_0 + \beta_1 \, \mathrm{age}_{ij} + \beta_2 \, \mathrm{walker}_{ij} + \beta_3 \, \mathrm{distance}_k + \beta_4 \, \mathrm{distance}_k^2 + r_i + r_k,$$

where $\beta_0$ is the intercept, and $r_k$ is the random intercept for EC session $k$. To account for differences between dyads, we set dyad identity as the random intercept $r_i$. The set of $\beta_*$ from $\beta_1$ to $\beta_4$ represents the coefficients of the explanatory variables (fixed effects). The explanatory variables $\mathrm{age}_{ij}$, $\mathrm{walker}_{ij}$ and $\mathrm{distance}_k$ are the same as in the statistical model of Analysis 1. We chose conservative, weakly informative priors for the hyperprior of the random effect $r_i$.

Supplementary Results
The best model included the effects of age in months and interpersonal distance (Table S4); the 95% credible interval of each effect's parameter is given in Table S5. The positive effect of age in months (β1) suggests that the proportion of infant-led EC bouts in each EC session was likely to increase with the infant's age in months. This is consistent with the result of Analysis 2 that infant-led, but not parent-led, EC bouts increased with the infant's age in months.

Figure S2. Distribution of the inter-eye-contact-bout intervals (IEIs) (x-axis: inter-eye-contact-bout interval in seconds; y-axis: density), with the fitted curve of the 2-process gamma distribution model (black line) and its 2 gamma distribution components representing IEIs within the same eye contact session (EC session) (purple line) and IEIs between adjacent EC sessions (pink line). We determined the IEI criterion (dashed vertical line) used to define the EC session from the estimated parameters of the 2-process model.

Figure S5. Predictions of the best model in Analysis 3. The observed data (coloured dots), posterior mean (coloured lines) and 95% credible interval (grey areas) of the proportion of infant-led EC bouts in each EC session are shown in each subplot. The colour of the dots and lines represents the infant's age in months, and the size of the dots represents the total number of EC bouts in each EC session. Each subplot shows one participant on one observation day. The subplots along the row dimension represent the longitudinal change in the gaze communication of each infant-parent dyad. The subplots along the column dimension represent individual differences in the gaze communication of infants of the same age in months.