Modelling menstrual cycle length in athletes using state-space models

The ability to predict an individual’s menstrual cycle length to a high degree of precision could help female athletes to track their period and tailor their training and nutrition correspondingly. Such individualisation is possible and necessary, given the known inter-individual variation in cycle length. To achieve this, a hybrid predictive model was built using data on 16,524 cycles collected from a sample of 2125 women (mean age 34.38 years, range 18.00–47.10, number of menstrual cycles ranging from 4 to 53). A mixed-effect state-space model was fitted to capture the within-subject temporal correlation, incorporating a Bayesian approach for process forecasting to predict the duration (in days) of the next menstrual cycle. The modelling procedure was split into three steps (1) a time trend component using a random walk with an overdispersion parameter, (2) an autocorrelation component using an autoregressive moving-average model, and (3) a linear predictor to account for covariates (e.g. injury, stomach cramps, training intensity). The inclusion of an overdispersion parameter suggested that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$26.36\%$$\end{document}26.36% \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$[23.68\%,29.17\%]$$\end{document}[23.68%,29.17%] of cycles in the sample were overdispersed. The random walk standard deviation for a non-overdispersed cycle is \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$27.41 \pm 1.05$$\end{document}27.41±1.05 [1.00, 1.09] days while under an overdispersed cycle, the menstrual cycle variance increase in 4.78 [4.57, 5.00] days. To assess the performance and prediction accuracy of the model, each woman’s last observation was used as test data. The root mean square error (RMSE), concordance correlation coefficient and Pearson correlation coefficient (r) between the observed and predicted values were calculated. The model had an RMSE of 1.6412 days, a precision of 0.7361 and overall accuracy of 0.9871. In conclusion, the hybrid model presented here is a helpful approach for predicting menstrual cycle length, which in turn can be used to support female athlete wellness.

The availability of mobile apps developed to track the menstrual cycle is growing as they are becoming increasingly popular for contraception purposes, fertility awareness and exercise planning. These apps can be grouped broadly as calendar-based, basal body temperature (BBT), or symptothermal [1][2][3] . Calendar apps generally use simple algorithms based on empirical measurements to predict cycle phase length 4 ; BBT apps describe a woman's menstrual variation through her basal body temperature rise 5 and symptothermal apps measure parameters such as cervical mucus changes, bleeding period and so on 2 .
The mobile app that generated the data used in the study is called FitrWoman. It is a free calendar-based app that enables users to track their menstrual cycle and symptoms, and provides relevant information about wellness, nutrition and exercise, based on the athlete's predicted menstrual cycle phases and length. The user inputs daily information on 25 symptom variables such as flow, bloating, constipation, injury, illness, irritability and weakness. The target audience is female athletes who wish to track their menstrual cycle to improve their performance and understanding of their individual cycle.
As a woman's body may respond and adapt differently throughout their cycle, different planning and preparation over the menstrual cycle phases 6-8 might be required. McNulty et al. 9 observed through meta-analysis that exercise performance might be trivially reduced during the early follicular phase of the menstrual cycle when compared to the other phases.
As few apps are accurate in terms of menstrual cycle length prediction 10 , the development of an appropriate, exact parametric model for one-step-ahead forecast cycle length is required. Such a model should take into www.nature.com/scientificreports/ account the between and within-woman variability to identify menstrual cycle patterns and how each symptom could affect cycle length, alongside the implications of significant alterations in cycle length. According to several studies [11][12][13][14] , the menstrual cycle length can be classified into two groups 'standard' and 'menstrual dysfunction', where a cycle length greater than 35 days is classified as 'menstrual dysfunction' and otherwise as standard. Many statistical models have been proposed in the literature to describe these different groups of menstrual cycles 2,[15][16][17][18] . Generally, cycle length related to the 'standard' group can be analysed using classical statistical approaches. In contrast, the mixture of standard and non-standard cycles can be analysed using a mixture distribution accounting for the significant symmetric distribution and the component corresponding to the heavy right tail 14,15 . To account for the within-individual variability, we focused on the dynamic aspect of menstrual cycles over time, as discussed by Bortot et al. (2010) 16 , who derived a predictive distribution based on individual repeated measurements using a state-space model formulation. According to these authors 16 , statespace models under a Bayesian approach have the advantage of incorporating between subject information to compensate for the relatively large number of subjects with a low quantity of repeated measurements and to make predictions for women not included in the sample.
It is well-established that having a regular menstrual cycle is a 'vital sign' demonstrating that the body is likely to be in an adaptive state and is tolerating the physical and psychological stressors that are being placed on it 19 . Significant elongations in cycle length are associated with adverse health and fertility outcomes 20-23 , therefore gaining a better understanding of the interrelating risk factors for cycle length extension is important.
In this paper, the first objective was to develop an appropriate parametric state-space formulation for the marginal distribution of standard menstrual cycles for female athletes. In addition, symptom variables were included in the model's linear predictor to evaluate how the individual reported symptoms might affect an athlete's menstrual cycle duration. The second aim was to develop a one-step-ahead forecasting interval approach, based on a state-space formulation, to describe the experimental and state process while considering both between and within-woman variability.

Results and discussion
Results from the state-space models, state-space mixed-effects models and linear mixed-effects models (LMM), fitted using the available data, are summarised in Table 1. In general, the Bayesian information criteria (BIC) suggests that the random walk models fitted better than the LMM when modelling menstrual cycle length, in agreement with the results reported by Bortot at al. (2010) 16 while contradicting the results of 2 who report an R 2 = 0.99 when fitting a simple linear regression.
The inclusion of r ij to model overdispersed cycle lengths was fundamental to describe menstrual cycle dynamics as evidenced by the BIC criteria where a reduction of 56.22% compared to y ij = m ij + ǫ ij , and 56.62% compared to y ij = β 0 + b 0i + (β 1 + b 1i )Age ij + ǫ ij is evident, as shown in Fig. 1. Additionally, the inclusion of a moving average (MA) parameter was necessary to capture the dynamism of shorter cycles followed by longer cycles and vice-versa. In summary, a random walk with a random variable to capture overdispersion r ij plus a MA(1) model demonstrated the best fit to the data.
To assess model performance, we compared the forecasts of these models using the RMSE of one-step-ahead predictions, CCC and Pearson correlation coefficient evaluated on the test group. Table 1 demonstrates that better forecast predictions were made using a random walk rather than an LMM and that there was little difference between the random walk models in terms of forecasting. As a consequence, the BIC criteria can be used to select the error structure. After selecting the trend and error structures, the next stage of the analysis was the selection of potentially useful explanatory variables. The set of 28 available represented a variety of reported symptoms by the i-th woman, including an interval-based variable representing a woman's body mass index (Kg/m 2 ) (  24 . In this analysis, underweight classes I and II were classified as 1.6810 0.7171 0.7326 17,305.01  Fig. 1. The selected state-space model summary with posterior means and 95% credibility intervals for the population parameters (after predictor selection) is presented in Table 3.As the model parameterisation facilitates the interpretation of the role played by the explanatory variables, our analysis reveals important insights on how some symptoms affect menstrual cycle length.
We found that the overall menstrual cycle length without any reported symptoms was around 27.41 [27.33, 27.50] days, which is in agreement with Guo et al (2006) 15 and Bull et al. (2019) 2 . Additionally, the reporting of injury, stomach cramps and flow amount was associated with increased menstrual cycle length. In contrast, the reporting of tender breasts was associated with decreased cycle length. For example, if a woman reported tender breasts ten times over her cycle, as a consequence, her predicted menstrual cycle length is estimated to reduce, on average, by 0.154 × 10 = 1.54 days.
Self-track symptoms quality depends on both user engagement, app design and unambiguous language to describe the level of a symptom. Consequently, to make it more consistent, filtering the original database based on the scientific literature is a critical way to reduce bias in the covariates used to fit the model, as described by Li et al. (2020) 14 .
The estimated value of π suggests that the probability of a non-standard (overdispersed) menstrual cycle length occurring in this population of interest is 0.2636. Consequently, we can infer that 26.36% [23.68%, 29.17%] of cycles in the sample are overdispersed. Furthermore, while a non-overdispersed cycle had a standard deviation (SD) of σ η = 1.0417 [0.9971, 1.0875], the SD of an overdispersed cycle increases where σ w = 4.7803 [4.5738, 5.0007], which represents a 4-fold increment. According to Najmabadi et al. (2020) 25 , between and within-variability in cycle characteristics should be emphasised as an important health indicator to assess behavioural, metabolic, and environmental factors. Therefore, the inclusion of θ and σ w play an essential role in the proposed model, as illustrated in Fig. 2. This Figure shows the probability that the proposed model (3) considers an observation as overdispersed where the results clearly demonstrate that r ij = ij w ij is capturing menstrual cycles with overdispersion.
Using this model, knowledge and understanding can be gleaned as to how symptom variables affect the menstrual cycle, which is essential for individual athletes, coaches and healthcare professionals. Furthermore, these results can improve the forecasting intervals, helping women to know more about their bodies and cycles based on symptoms during a particular phase of their cycles. Further work is needed to translate these findings into www.nature.com/scientificreports/ recommendations. Although information relating to follicular and luteal phases was not available in the data, a strong linear correlation between menstrual cycle length and follicular phase has been reported [26][27][28] . Where the correlation tended to increase with age. To predict ovulation time, further studies, which include both luteal and follicular phases and basal body temperature (BBT), are needed to extend the proposed model 2 .
Although an ARMA(1,1) model was not needed in this analysis, we have demonstrated that some women have a positive lag-one autocorrelation while others have a negative lag-one autocorrelation. These results contradict the findings of 16,29 who report a small general negative autocorrelation for a woman's profile. In order to better investigate the variability of an autoregressive coefficient, we modified the state-space formulation to  www.nature.com/scientificreports/ accommodate this source of random variation by assuming that However, the normality assumption for φ 0i was not justified as the normal Q-Q plot suggested a distribution with heavy tails and asymmetry; as a consequence, 80% of points were outside of the 95% simulated envelopes for this random effect ( Figure S1).
We also observed that some women had a long cycle followed by a short cycle and vice versa, as observed by Bortot et al. (2010) 16 . However, we found while θ = −0.0915 with CI 95% : [−0.1563, −0.0320] the estimate of the same parameter described by Bortot et al. (2010) 16 was −0.61 [−0.77, −0.45] . It appears that the sample of female athletes that these analyses are based on had more regular menstrual cycles than a sample of 1,798 women observed from clients of the Catholic Marriage Advisory Council of England and Wales. Although we have a higher number of women in our sample than in 16 , the time series in their sample were longer (up to 109 measurements) compared with up to 55 measurements in this sample. In order to account for the betweensubject variability, we included a random effect in the moving-average coefficient given by θ i = θ 0 + θ 0i , with θ 0i ∼ N 0, σ 2 θ . However, we observed the same problem as reported when considering the autoregressive coefficient where more than 70% of points were outside of the 95% simulated envelopes, lower asymmetry compared with φ 0i and heavy tails ( Figure S2). Therefore, to avoid bias in individual forecasting predictions, these random effects were dropped from the model. Further work is needed to accommodate individual estimation for the autocorrelation and moving-average coefficients to improve model performance at the individual level.
The analysis workflow was as follows: we initially checked the Bayesian assumptions and the posterior distribution using suitable plots of the Markov Chain Monte Carlo (MCMC) draws from the posterior distribution and Gelman-Rubin diagnostic and autocorrelation plots of all model parameters. Figure 3a shows the iterates of β 0 , π , θ 0 , σ η , σ w , and σ ǫ after a burn-in of 10,000 simulated iterations, which indicates convergence of the chains and stationary distributions, as the samples appear to be randomly sampled from the same region of the y-axis rarely venturing outside that area. The autocorrelation and Gelman-Rubin statistics 30 were used to assess model convergence. The results suggest that the autocorrelation does not drop dramatically from lag 0 to 50 ( Figure S3), indicating a moderate to high autocorrelation among samples. To reduce the impact of this problem, we stipulated a thinning of 50. On the other hand, the Gelman-Rubin statistic based on three chains showed all upper 95% confidence intervals were exactly equal to 1, meaning the chains had converged. Figure 3b shows the posterior densities obtained for estimated parameters derived from 3 Markov chains with 3000 samples per chain, leading to a computational time of around 23 hours executed on Dell Inspiron 17 7000 with 10 th Generation Intel Core TM i7 processor, 1.80GHz × four-processor speed, 16GB random access memory (RAM) plus 20GB of swap space, 64-bit integers, and the platform used is a Linux Mint 19.2 Cinnamon system version 5.2.2-050202-generic. In summary, the posterior distribution has been well characterised by the drawn samples as no unexpected peaks or strange shapes in the posterior density were observed that could signify poor model convergence. As a final assessment, the autocorrelation function, as well as the standardized residual against the athlete's age, were checked (Fig. 4). No serious discrepancies nor patterns that warrant attention were observed in both graphs.
Once the assumptions were verified, we evaluated the agreement between the fitted and observed values and forecast intervals. Figure 5 shows the fitted curves for menstrual cycle length of six women, their 95% credible interval, and the one-step-ahead point forecast with 80%, 95% and 99% forecast intervals. We observed that the random walk with overdispersion parameter and MA(1) model performed well in describing the complex dynamics of menstrual cycle length over time. This conclusion is underpinned by CCC's residual diagnostic and high values and Pearson correlation between fitted and observed values by the woman. These results also show that linear or linear mixed-effects models should not be applied to explain the variability of menstrual cycle length. They generally do not follow the necessary assumptions of linearity-however, a study done in 2019 by Bull    The necessity of including r ij = ij w ij in our model to describe cycle length is demonstrated in Fig. 6 where the improvement in the point estimates, credible and forecasting intervals when r ij = ij w ij was and was not included in the model is given.
The results show that the improvement in the Pearson and concordance correlation coefficients when r ij was included in the model was mainly for women who had more overdispersed cycles, resulting in better forecast predictions, and narrower corresponding credible intervals.
Finally, to evaluate the one-step-ahead point forecast prediction we generated prediction using a test set comprised of 1,029 women, each of whom had at least 3 repeated measurements. The results are shown in Fig. 4.
As there are not the same number of repeated measurements for each woman, this makes the forecasting prediction evaluation difficult as the number of women who drop out of the test set increases over time. With this in mind, we found that RMSE values could be two times higher than those presented in Table 1, suggesting that these models are not working well for some women in the test group. The same conclusion is evident when considering the CCC and Pearson correlation coefficients. As the CCC can be written as CCC = r × C b , where r represents a measure of precision and C b a measure of accuracy 32 , we can conclude that our model has high accuracy, with the potential to increase as the number of women with repeated measurements increases. The lower precision reported for the test set suggests that the explanatory variables used in the model may not be enough to explain the variability in the data. Including additional variables such as those that capture information on polycystic ovary presence, daily diet, country of origin,may improve model forecasts in general.

Limitations
The limitation of this study is that it is based on observational data which depends on users logging their information on the app. As a consequence, the models proposed are not intended to elucidate the causal pathway of reported symptoms on cycle length.

Conclusion
State-space models, incorporating a probability π as a random effect at the subject level in the random walk component. are a valuable approach for predicting menstrual cycle length. They could be used to support female athlete wellness and optimize performance. For this reason a random walk with an overdispersion parameter Figure 6. Age versus fitted menstrual cycle length for six women with more than 40 repeated measurements with addition of 95% credible interval (dashed line), 80%, 95%, and 99% forecast intervals for their next cycle, and observed menstrual cycle length (points) using the model y ij = m ij + γ ij + c ij , when dropping the term r ij from the model. The estimated concordance correlation coefficient (CCC), and Pearson correlation coefficient (r) between fitted and observed values are reported for each woman. Accuracy ( C b ) can be obtained using C b = CCC /r. We also found that reporting injury, stomach cramps, tender breasts, and flow amount had a significant effect on menstrual cycle length amongst female athletes using the FitrWoman app. Although accurate forecast predictions are reported, improvements in the variables collected and enhancements to the model are still needed, such as considering a random effect for the moving-average coefficient θ 0 , to improve forecast precision.

Methods
Data characteristics. The sample was comprised of female athletes using the FitrWoman app 33 who had given their consent for the use of their data for research purposes. The sample size contains data on 16,524 cycles collected from 2,125 women (Fig. 7a), whose mean (sd) age was 34.38 (7.05) years (range 18 to 47 years); mean (sd) weight 62.75 (9.16) Kg (range 42.18 to 100.23 Kg); mean (sd) height 165.88 (6.89) cm (range 152.4 to 186.0 cm); with several repeated measurements per woman ranging from 4 to 53 cycles. There was approximately 60% of information missing for height and weight where the 95% quantile of the sample distribution, based on 893 women, was between 153.0 and 180.0 cm for height and 48.3 and 86.11 Kg for weight. A bivariate density plot for weight and height given age is shown in Fig. 7b in order to visualise the relationship between anthropometry and age in the sample.
Menstrual cycle length is assumed to be normally distributed as the data represent standard cycles 15 , where the shortest cycle length record was 18 days and the longest was 43 days. The sample mean and variances are 27.62 and 3.51 days, respectively. As some women contributed more than one sequence to the database, we decided to consider only the first sequence available because we don't know the reasons that caused this temporary dropout. The inclusion of the following sequences might bias the analysis, as also discussed by Bortot et al. (2010) 16 . Figure 7c shows profiles for six women with a blue line representing a fitted mixed-effects linear regression model. It can be observed that the inclusion of a random intercept and slope plays an essential role as each woman's cycle can be affected by different non-observed explanatory variables. However, the conditional R 2 was equal to 0.40, implying that the linear mixed-effects regression is a good approximation for some profiles, but not for all of them, differing from the results presented by Bull  It is clear, based on our sample, that each woman's specific trend must be accounted for in terms of their within-subject temporal dependence and the between subject variability across women. Figure 7c,d show that for some women a short cycle can be followed by a long cycle and vice-versa, suggesting the need for a moving-average model. Furthermore, Fig. 7d shows that cycle length for some women has a positive autocorrelation. In contrast, others have a negative autocorrelation suggesting the need for an autoregressive moving-average model incorporating individual random effects for the autocorrelation and the moving-average coefficients. Finally, Fig. 7e shows a table containing the reported proportion of reported symptoms, where in most cases symptoms did not happen or were not reported.
As a consequence of possible missing data due to non reporting of symptoms, the effect of symptoms on cycle length may be biased towards the null hypothesis of no association between symptom and cycle length (i.e. a type II error). Despite this possible bias and loss in power, the p values obtained from statistical methods fitted to data subject to random error or misclassification are still valid [34][35][36] .  where M i,J i +1 y i,J i +1 is fully specified and is a vector of unknown fixed-effect and variance components parameters. In order to accommodate the within-subject temporal correlation between repeated measures and the between-subject variability a random walk state-space model and mixed-effects state-space model was used, incorporating a Bayesian approach for process forecasting to predict the duration, in days, of the next menstrual cycle. Each prediction is accompanied by a corresponding interval forecast as point prediction is of limited value without an accompanying measure of uncertainty 37 . We assumed that cycle length are independent and that menstrual cycles tend to decrease over time as a woman ages 15,16 . In addition, we combined the Bayesian approach and forecasting proceses to include covariates where model validation procedures were used to compare model adequacy.
State space models for cycle length. The state-space formulation is an attractive choice due to its flexibility to work with discrete response variables and temporal dependency amongst observations. At the same time, the mixed-effects model can be used to account for between-subject variability. As the observed event is the difference, in days, between the interval from the first day of one bleeding episode up to and including the day before the next bleeding episode, observed cycle lengths can be modelled as discrete random variables. Let Y ij be a continuous random variable, where y ij is a realisation of Y ij , which represents the observed cycle length. Furthermore, let O ij be a discrete random variable, where o ij is a realisation of O ij which represents the cycle length in days as a continuous process, that is, y ij = o ij + ε ij . As we have no way to estimate the error term ε ij (observation process), we assume that o ij = ⌊y ij ⌉ is a good approximation for y ij , where ⌊.⌉ indicates rounding. Thus, the true non-observed continuous cycle length y ij can be generated by the random walk state-space model: www.nature.com/scientificreports/ where y ij is the menstrual cycle length for the i-th woman at j-th cycle; m it is a random walk model that allows an individual trend in the series with η ij assumed to be normally distributed with mean 0 and variance σ 2 η . We assumed an ARMA(1,1) model for γ it , where φ is the autoregressive parameter; θ is the moving average parameter; and ǫ ij is assumed to be normally distributed with mean 0 and variance σ 2 ǫ (process error). Furthermore, c ij captures the information provided by additional symptoms predictors ( C ij ) that may have useful roles in understanding and forecasting cycle length, where α k represents the k-th fixed effect parameter. Finally, r ij is a random effect term used to account for extra-variability (overdispersion) of some menstrual cycle lengths measured on i-th woman at cycle j, which could be classified as outliers. Consequently, under model (2), y ij has probability π of being an overdispersed menstrual cycle (non-standard) for the j-th cycle measured on i-th woman, where its additional magnitude is given by r ij (Fig. 8).
In this way, m ij can be interpreted as the trend for a standard cycle. In contrast, m ij + r ij can be interpreted as the trend for a non-standard cycle, where r ij is an overdispersion parameter at the subject level for measures which induce extra-variability, as discussed in 38  (2) www.nature.com/scientificreports/ Fitting a separate linear regression for each woman will result in a subject-specific intercept that may account for variability due to non-observed variables likely to affect their first observed menstrual cycle. In contrast, a mixed model incorporating random slopes assumes that each woman has a different menstrual cycle length trend relative to her age. To verify if the random walk model proposed has the necessary flexibility to capture differing trends, it was compared to a linear mixed-effects model 16 . In that case, m ij = β 0 + b 0i + (β 1 + b 1i )Age ij , where β 0 and β 1 are the (marginal) intercept and slope, respectively; b 0i and b 1i are the random effects for the intercept and slope for the i-th woman at Age ij , respectively, where it is assumed that and Age ij represents a woman's age.
Bayesian implementation and choice of prior distribution. A Bayesian analysis combines information from observed data with prior distribution for the model's parameters in order to generate a posterior distribution. In this analysis the inverse-gamma(κ, κ) is a natural candidate for the prior distributions and are often used for random walk state-space models and variance components of mixed effect models. Such a choice of prior is attractive as it can be considered as non-informative within the conditionally conjugate family, when κ is set to a low value such as 0.1 3 : A likelihood ratio test was used to test whether the presence of correlations between the random effects in these models played a crucial role. Based on a 95% credible interval for the variance component σ b01 for the proposed mixed-effects model there was sufficient evidence that the random effects are plausibly mutually independent and a term to capture the correlation structure between the intercept and slope could be removed from the model.
Model selection procedure. The model selection procedure used to compare candidate models involved a balance between forecast accuracy and the Bayesian Information Criterion (BIC). Forecast accuracy was calculated based on RMSE, CCC and Pearson Correlation Coefficient while the BIC was calculated using the following formulation 41 : where N is the total number of observations; and p is the number of parameters estimated by the model. The procedure was split into three steps namely the time trend component, the autocorrelation component, and an additional linear predictor as a function of available explanatory variables.
The first step was to account for a possible trend, by identifying the most appropriate error structure for the model, which in our case consisted of a comparison of a random walk model or a linear mixed effect model (Fig. 9). The second step involved the inclusion of temporal dependence among observations, as evident in some women in the sample, where an ARMA model was considered as shown in the Fig. 9. The third and final step involved the inclusion of explanatory variables to account for their (possible) relationship with cycle length. This was achieved using the posterior distribution on the parameter α k to select all those variables that did not have the null value for their parameter contained in their corresponding 95% credibility interval.
A novel use of train and test set data was used to validate model performance and to estimate the one-step ahead forecasting prediction accuracy as a function of the number of cycles reported. The complete sample of 2125 was used for model validation by treating the last observed cycle length as test data. The procedure is illustrated in Fig. 10 where the last observed cycle length (red dot) is 'held back' as test data and the remaining data (blue dots) were used as training data. The forecast performance was calculated using the RMSE and CCC between the observed and predicted cycle lengths and used jointly with the BIC criteria in the model selection process.
Once the model was selected, it was then sequentially tested using i) the complete data as training data and ii) a random sample of 1029 (approximately half the complete data) as test data. As the number of cycles reported varied from 2 to 12, one-step ahead forecasting prediction accuracy was calculated for each of these scenarios by treating the last observed cycle in each scenario as test data. As the number of athletes in the test set decreased with increasing reported cycle lengths, individuals that had fewer observed cycle lengths for the cycle length scenario under consideration were included in the training set to account for this attrition.
The forecast error for an observed value and its forecast was computed as www.nature.com/scientificreports/ where the training data are given by y i1 , y i2 , . . . , y iJ i and the test data by y i,J i +1 (i.e. one-step ahead prediction for each woman), see Fig. 10. The forecast accuracy was measured by the root mean square error (RMSE), concordance correlation coefficient 32 , and the Pearson correlation coefficient between the observed response in the test data and corresponding predicted cycle length value.
Posterior computation. Markov Chain Monte Carlo (MCMC) was used to generate samples from the posterior distribution for the random walk and mixed-effects state-space models using a Gibbs sampler algorithm 40 , as this approach is widely used to obtain parameter estimates from a posterior distribution. The convergence of the MCMC algorithm was checked by multiple comparisons of MCMC chains with different starting points. The normality assumptions were checked using suitable residual plots and quantile-quantile plots with simulate envelopes 42 . The one-ahead predictive distribution of F i,J i +1 Y i,J i +1 was derived through draws from the posterior distribution. Consequently, the κ-step ahead predictive distribution was obtained by running the Kalman filter sequentially. All analysis were implemented in R including runjags 43 , coda 44 , hnp 42 , and ggplot2 45 packages.
Ethics approval. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, co-funded by the European Regional Development Fund in partnership with Orreco. All methods were carried out in accordance with relevant guidelines and regulation. In particular the data that support this study were made available by ORRECO. Upon first use, all FitrWoman app users provide informed consent by agreeing to their anonymised data being used with third parties for research purposes. However, restrictions apply to the availability of these data used under license for the current study. In order to use the Fitrwoman app each participant must agree to the following conditions: Without prejudice to the foregoing, ORRECO shall have an exclusive, royalty free, perpetual licence to use and retain the User Data and all other information arising from the provision of the Services:-(i) for research purposes, (ii) in order to improve the standard of service provided by ORRECO in the future; (iii) in order to validate ORRECO's proprietary algorithms or intervention programmes; (iv) to analyse and report anonymously on patterns in User Data by reference to their age, sex, ethnicity, discipline, field, training schedule, performance, results or such other data sets as ORRECO may decide; and (v) in order to develop similar or new services, provided that in each case the identity of the User and any personal data comprised within the  www.nature.com/scientificreports/ User Data shall be kept, removed or anonymised. Anonymised data shall be sent to third party processors to be analysed to uncover patterns and trends and to further sports science research. The FitrWoman app is compliant with the General Data Protection Regulation laws (GDPR 2016/679). All experimental protocols and ethical use of data were approved by the ethics committee of the Insight Centre for Data Analytics, National University of Ireland Galway, Ireland.