Remote sensing of emperor penguin abundance and breeding success

Emperor penguins (Aptenodytes forsteri) are under increasing environmental pressure. Monitoring colony size and population trends of this Antarctic seabird relies primarily on satellite imagery recorded near the end of the breeding season, when light conditions levels are sufficient to capture images, but colony occupancy is highly variable. To correct population estimates for this variability, we develop a phenological model that can predict the number of breeding pairs and fledging chicks, as well as key phenological events such as arrival, hatching and foraging times, from as few as six data points from a single season. The ability to extrapolate occupancy from sparse data makes the model particularly useful for monitoring remotely sensed animal colonies where ground-based population estimates are rare or unavailable.

distribution and the offset number are free parameters.We chose an exponential distribution as the prior for both parameters to ensure minimization of error distribution width during the sampling process.Table 1 contains all free parameters of the model, the numerical range, prior distribution, and a brief description.We used the 511 counts of adult individuals to optimize the parameters of our model.We sampled the 208 parameters (16 per season, 10 seasons at Point Géologie, 3 seasons at Atka Bay) of the model with the No-U-Turn-Sampler 45 for 200 tuning and 200 draw iterations in 8 independent chains.The parameters and therefore sampling statistics for different seasons and colonies are independent by model definition.The tuning samples were discarded after the sampling process.We use the pooled 1600 draw samples to compute means and errors for all parameters and other result plots.We computed a R-hat statistics 46 of less than 1.02 for each of the parameters.Consequently, the sampling has converged.Sup.Fig 2 shows a trace plot containing the trace and kernel density estimate for all parameters.
We used MCMC sampling and Bayesian statistics to infer the parameters of the windchill model while only using the measured areas and meteorological data as inputs.We applied noninformative normal distributions as priors for the linear factors and the transition temperature (cT = 1 / b0, cW, cR, cH, Tc).Again, we modeled the error distribution of the density (⍴) as a lognormal distribution with standard deviation σ.Analogous to the phenological model, we choose an exponential distribution as the prior for σ.

Bay colonies.
In each panel, the left part shows the parameters for Pointe Géologie colony (PG) and the right part for Atka Bay colony (AB).The points show the parameter value for each year, while the violin plots show the kernel density estimation for the parameter.

Supplementary Table 1
This table contains the numerical values of the best estimate for the parameters of the phenological model.

Supplementary Note 2
We calculate the R2 score as follows from two arrays x, and y containing the true and respective predicted values: We calculate the statistical significance of inter-colony variability as follows: We calculate the average parameter value for each season/colony combination.Then we pool the Atka Bay (AB) and Pointe Géologie (PG) values and perform a two-sided Mann-Whitney U statistics (using the python package scipy version 1.7.3).If the resulting p-value is smaller than 0.05, we accept the parameter as colony-variant.
We calculate the statistical significance of inter-year variability as follows: For each parameter set from the sampling chain, we calculate the pairwise differences of annual parameter values.This yields 45 distributions of parameter differences for the 10 years from Pointe Géologie and 3 distributions for the 3 years from Atka Bay.In these distributions zero equals no difference between years and a symmetric distribution around zero indicates no inter year variance.We count the number of samples left of zero and right of zero.We use Bonferroni-corrected significance values of 0.05/45 for Pointe Géologie and 0.05/3 for Atka Bay.If less than a significant portion of samples are on one side of zero, we accept the pair to be significantly different.If one of the inter year pairs for one colony is significantly different, we call the parameter inter-year variable at the respective colony.
We perform this test independently for Atka Bay and Pointe Géologie parameter sets.Duration of first foraging trip of females:sfem ∼ U (cmax,smax) With those parameters, the model provides a function for the number of breeding males NM(t), breeding females NF(t), chicks NC(t), and non-breeders NNB(t) observed at the colony at a time t.These functions are based on presence/absence patterns of individuals, which again are based on a set of 26 events, when individuals can either enter or leave the colony.The numbers of individuals are therefore the sum of 26 presence functions Pi(t) weighted by 26 participation factors fi to account for the fact that not all animals participate in all events (see equations 2, 1, 3, and 4).We assume that each event timing is normally distributed with a mean ti and a width ∆ti.The presence function Pi(t), i.e. the fraction of individuals of a type (male breeder, female breeder, chick, or non-breeder) that have undergone an event and therefore entered or left the colony, is therefore an error function with the two parameters ti and   :   () = ( −
The 26 x 4 participation factors fi, 26 mean timings ti, and 26 distribution widths ∆ti are however not free parameters, but governed by knowledge about the species' breeding cycle and thereby reduced to the 14 original model parameters.In the following we define every factor, mean timing and distribution width and how it is calculated from the original parameters.
(1)   () =  ∑  ,   () The presence and absence of one individual would be described by a sharp step function (see Sup. Fig. 8A).However, as we are monitoring a whole colony, the step is likely "blurred" by the individual variation in timing (see Sup. )) (see Sup. Fig. 8D).The gaussian shape parameters ti and ∆ti for each of the 26 events are calculated from the original 14 model parameters as follows.
• t1 = t0+m is the first female departure, with m the duration of courtship.
• t2 = t1 + b is the first female return and first male departure, with b the duration of female absence (incubation).
• t3 = t2 + s0 is the first male return with s0 the time spent at sea by the male breeders during their first foraging trip.
• t4 = t3 + c0 is the second female departure with c0 the time spent at the colony by both breeding partners.

D
of this normal distribution describes the fraction of individuals that has already entered the colony because of the event at ti.The time spent at sea si and at colony ci during each foraging turnover is controlled by the model parameters smin,smax,cmin, and cmax, so that the time at sea and at colony perform a linear decrease throughout the breeding season.
( The special cases s1,s2,s3, and c0 arise from the first female departure being reportedly shorter than the male absence, and the first two times spent at colony being equally long: 2 contains a recursive definition of every event timing.Table 3 contains an explicit formula for every event timing.The distribution widths are governed by the two model parameters ∆t0 and ∆b, the width of arrival timing, and the width of the female return timing (see equation 7).We assume that only those two events are significantly impacted by external triggers (e.g.large storms) that desynchronize the behavior of individuals within a colony.We further assume, that a desynchronization in the beginning of the breeding season can not be recovered within one breeding season leading to a cumulative addition of the width parameters.
(7)   =  0   ≤ 1 √ 0 2 +  2 As not all types of individuals (male breeders, female breeders, chicks, nonbreeders) and also not always all individuals of that type, but a fraction, participate in every event, the presence function Pi needs to be multiplied by a scaling factor fX,i with i ∈ [0,1,...,25] and X noting one of M,F,C, or NB for the individual type.These factors are shown in table 1 and follow the breeding pattern as known from literature.For example, the male breeders participate in arrival (fM,0 = 1), stay at the colony, while females leave at t1 (fM,1 = 0).After the incubation phase (t2) all males leave (fM,2 = −1) and so on.The factors do have to follow several mathematical constraints: • |fX,i| ≤ 1 At max 100% of individuals of one type can participate in an event ≤ 1    [0,1, . . .,25] The number of present individuals at any time must lie between 0% and 100% of all individuals of that type.These constraints are met by our definition of each factor as in table 1.To estimate the free parameters of the model defined above, we use the Bayesian inference python package PYMC3 with the No-U-Turn Sampler (NUTS).The inference process takes the model structure and the measured data (number of observed adults N ˆj and the respective dates tˆj for each observation j) as input, and outputs a series of random samples from the distribution of the inferred parameters.These (posterior) parameter distributions can then be used to compute the mean inferred parameter value and to quantify the uncertainty tied to the parameter estimate.To carry out the inference process, PYMC3 needs to know how likely the model could generate the observed data given certain parameter values.The probability of generating a data set given a model and its parameter values is called likelihood function.The likelihood is the probability density of the observation error distribution, but with the distribution parameters as variables (see equation 8).We assume that the error of manually estimating the number of penguins is a relative error (if one counts 1,000 penguins and misses the true count by 10, they will likely miss by 100 on average if they count 10,000 penguins).Such a relative error is implemented by assuming that estimated penguin counts follow a log-normal distribution.The log-normal distribution has one additional shape parameter σ, which corresponds to the width of the distribution.The shape parameter encodes the size of estimation error that researchers make when counting penguins manually.Since we do not know the accuracy of the penguin count a-priori, the shape parameter represents an additional free parameter of the model and it is estimated from the data.As is common for such parameters in Bayesian inference, we choose a prior distribution that favors small values (Exponential Distribution), still allows a wide range of estimation errors.The total likelihood of all observations j is their product (see equation 9).Therefore the optimization of the log-normal distributed error function is essentially a minimization of the sum of squared differences of the log values The ratio of participating individuals for each of the 26 events are defined as follows: The event timing for each individual event is defined as follows: () =  ∑  ,   ()   () =  ∑  ,   ()   () =  ∑  ,   () 25 =0 =  ∑  , ( −    ) 25 =0 Fig. 8 B & C), resulting in a smooth sigmoidal function Pi, that describes the fraction of individuals that have already undergone the event.We assume that the distribution of individual event timing follows a Gaussian distribution with mean ti and width ∆ti, leading to an error function shaped presence function (  ( 8: Presence/absence patterns for one "entering" event at time ti: A) The presence of one individual is a step response.B) Different individuals will enter at different times, creating a jitter on the presence/absence function.C) We assume that those different entering times follow a normal distribution with mean ti and width ∆ti.D) The cumulative distribution function No individuals are at the colony after the end of the breeding season.

Supplementary Figure 2 Sup. Fig. 2.: Plot of kernel density estimates and traces from the sampling process.
Every row shows one parameter of the sampling process.The left column shows the kernel density estimates: On the X-axis the parameter value on the Y-axis the probability of that value.The right column shows the trace of the sampling process (after the removing the tuning samples): The X-axis shows the sample index from 0-2400.TheY-axis shows the parameter value associated with that sample.Different colors indicate different seasons.

Supplementary Figure 3 Sup. Fig. 3.: Correlation of model parameters with breeding success. The
panels show the average predicted parameter value on X-axis and the ratio of fledging chicks to the number of breeders from manual observations on the Y-axis.Points show PG, squares show Atka Bay data.Colors indicate the year.The black line shows a linear regression fit performed on the data in the plot.We find significant correlation for the parameters cmin, smax, and smin indicated by a red outline of the plot box.

Supplementary Figure 4 Sup. Fig. 4.: Model predictions for satellite-based data. The
plots show the total number of individuals per breeding season and colony.The points show the total number of The vertical solid line shows the first sunrise after mid winter at the respective colony locations.According to our findings for PG and AB we used this date to align the phenological patterns of the other colonies.Colors indicate the colony and data source (ground or satellite-based) Expanding these definitions leads to the following formulas for the timing of each event. 0 +  +  +   +   +   +    5 =  0 +  +  +   +   +   +   +    6 =  0 +  +  +   +   + 2   + 2   +    7 =  0 +  +  +   +   + 2   + 2   + 2    8 =  0 +  +  +   +   + 3   +