Abstract
The ability to predict an individual’s menstrual cycle length to a high degree of precision could help female athletes to track their period and tailor their training and nutrition correspondingly. Such individualisation is possible and necessary, given the known interindividual variation in cycle length. To achieve this, a hybrid predictive model was built using data on 16,524 cycles collected from a sample of 2125 women (mean age 34.38 years, range 18.00–47.10, number of menstrual cycles ranging from 4 to 53). A mixedeffect statespace model was fitted to capture the withinsubject temporal correlation, incorporating a Bayesian approach for process forecasting to predict the duration (in days) of the next menstrual cycle. The modelling procedure was split into three steps (1) a time trend component using a random walk with an overdispersion parameter, (2) an autocorrelation component using an autoregressive movingaverage model, and (3) a linear predictor to account for covariates (e.g. injury, stomach cramps, training intensity). The inclusion of an overdispersion parameter suggested that \(26.36\%\) \([23.68\%,29.17\%]\) of cycles in the sample were overdispersed. The random walk standard deviation for a nonoverdispersed cycle is \(27.41 \pm 1.05\) [1.00, 1.09] days while under an overdispersed cycle, the menstrual cycle variance increase in 4.78 [4.57, 5.00] days. To assess the performance and prediction accuracy of the model, each woman’s last observation was used as test data. The root mean square error (RMSE), concordance correlation coefficient and Pearson correlation coefficient (r) between the observed and predicted values were calculated. The model had an RMSE of 1.6412 days, a precision of 0.7361 and overall accuracy of 0.9871. In conclusion, the hybrid model presented here is a helpful approach for predicting menstrual cycle length, which in turn can be used to support female athlete wellness.
Introduction
The availability of mobile apps developed to track the menstrual cycle is growing as they are becoming increasingly popular for contraception purposes, fertility awareness and exercise planning. These apps can be grouped broadly as calendarbased, basal body temperature (BBT), or symptothermal^{1,2,3}. Calendar apps generally use simple algorithms based on empirical measurements to predict cycle phase length^{4}; BBT apps describe a woman’s menstrual variation through her basal body temperature rise^{5} and symptothermal apps measure parameters such as cervical mucus changes, bleeding period and so on^{2}.
The mobile app that generated the data used in the study is called FitrWoman. It is a free calendarbased app that enables users to track their menstrual cycle and symptoms, and provides relevant information about wellness, nutrition and exercise, based on the athlete’s predicted menstrual cycle phases and length. The user inputs daily information on 25 symptom variables such as flow, bloating, constipation, injury, illness, irritability and weakness. The target audience is female athletes who wish to track their menstrual cycle to improve their performance and understanding of their individual cycle.
As a woman’s body may respond and adapt differently throughout their cycle, different planning and preparation over the menstrual cycle phases^{6,7,8} might be required. McNulty et al.^{9} observed through metaanalysis that exercise performance might be trivially reduced during the early follicular phase of the menstrual cycle when compared to the other phases.
As few apps are accurate in terms of menstrual cycle length prediction^{10}, the development of an appropriate, exact parametric model for onestepahead forecast cycle length is required. Such a model should take into account the between and withinwoman variability to identify menstrual cycle patterns and how each symptom could affect cycle length, alongside the implications of significant alterations in cycle length.
According to several studies^{11,12,13,14}, the menstrual cycle length can be classified into two groups ‘standard‘ and ‘menstrual dysfunction‘, where a cycle length greater than 35 days is classified as ‘menstrual dysfunction‘ and otherwise as standard. Many statistical models have been proposed in the literature to describe these different groups of menstrual cycles^{2,15,16,17,18}. Generally, cycle length related to the ‘standard‘ group can be analysed using classical statistical approaches. In contrast, the mixture of standard and nonstandard cycles can be analysed using a mixture distribution accounting for the significant symmetric distribution and the component corresponding to the heavy right tail^{14,15}. To account for the withinindividual variability, we focused on the dynamic aspect of menstrual cycles over time, as discussed by Bortot et al. (2010)^{16}, who derived a predictive distribution based on individual repeated measurements using a statespace model formulation. According to these authors^{16}, statespace models under a Bayesian approach have the advantage of incorporating between subject information to compensate for the relatively large number of subjects with a low quantity of repeated measurements and to make predictions for women not included in the sample.
It is wellestablished that having a regular menstrual cycle is a ’vital sign‘ demonstrating that the body is likely to be in an adaptive state and is tolerating the physical and psychological stressors that are being placed on it^{19}. Significant elongations in cycle length are associated with adverse health and fertility outcomes^{20,21,22,23}, therefore gaining a better understanding of the interrelating risk factors for cycle length extension is important.
In this paper, the first objective was to develop an appropriate parametric statespace formulation for the marginal distribution of standard menstrual cycles for female athletes. In addition, symptom variables were included in the model’s linear predictor to evaluate how the individual reported symptoms might affect an athlete’s menstrual cycle duration. The second aim was to develop a onestepahead forecasting interval approach, based on a statespace formulation, to describe the experimental and state process while considering both between and withinwoman variability.
Results and discussion
Results from the statespace models, statespace mixedeffects models and linear mixedeffects models (LMM), fitted using the available data, are summarised in Table 1. In general, the Bayesian information criteria (BIC) suggests that the random walk models fitted better than the LMM when modelling menstrual cycle length, in agreement with the results reported by Bortot at al. (2010)^{16} while contradicting the results of^{2} who report an \(R^2=0.99\) when fitting a simple linear regression.
The inclusion of \(r_{ij}\) to model overdispersed cycle lengths was fundamental to describe menstrual cycle dynamics as evidenced by the BIC criteria where a reduction of \(56.22\%\) compared to \(y_{ij}=m_{ij}+\epsilon _{ij}\), and \(56.62\%\) compared to \(y_{ij}=\beta _0+b_{0i}+\left( \beta _{1}+b_{1i}\right) Age_{ij}+\epsilon _{ij}\) is evident, as shown in Fig. 1. Additionally, the inclusion of a moving average (MA) parameter was necessary to capture the dynamism of shorter cycles followed by longer cycles and viceversa. In summary, a random walk with a random variable to capture overdispersion \(r_{ij}\) plus a MA(1) model demonstrated the best fit to the data.
To assess model performance, we compared the forecasts of these models using the RMSE of onestepahead predictions, CCC and Pearson correlation coefficient evaluated on the test group. Table 1 demonstrates that better forecast predictions were made using a random walk rather than an LMM and that there was little difference between the random walk models in terms of forecasting. As a consequence, the BIC criteria can be used to select the error structure. After selecting the trend and error structures, the next stage of the analysis was the selection of potentially useful explanatory variables. The set of 28 available represented a variety of reported symptoms by the ith woman, including an intervalbased variable representing a woman’s body mass index (Kg/\(\hbox {m}^2\)) (Table 2), classified as discussed by Corbel at al. (2004)^{24}. In this analysis, underweight classes I and II were classified as ’severely’ and ’very severely underweight’ while obese classes I, II, and III represented moderately, severely and very severely obese, respectively. The sample of women had a reported BMI of between 14.44 and 54.25, with a mean of 22.85 Kg/\(\hbox {m}^2\); the absolute frequency is shown as a histogram in Fig. 1.
The selected statespace model summary with posterior means and 95% credibility intervals for the population parameters (after predictor selection) is presented in Table 3.As the model parameterisation facilitates the interpretation of the role played by the explanatory variables, our analysis reveals important insights on how some symptoms affect menstrual cycle length.
We found that the overall menstrual cycle length without any reported symptoms was around \(27.41 \left[ 27.33,27.50\right]\) days, which is in agreement with Guo et al (2006)^{15} and Bull et al. (2019)^{2}. Additionally, the reporting of injury, stomach cramps and flow amount was associated with increased menstrual cycle length. In contrast, the reporting of tender breasts was associated with decreased cycle length. For example, if a woman reported tender breasts ten times over her cycle, as a consequence, her predicted menstrual cycle length is estimated to reduce, on average, by \(0.154 \times 10 = 1.54\) days.
Selftrack symptoms quality depends on both user engagement, app design and unambiguous language to describe the level of a symptom. Consequently, to make it more consistent, filtering the original database based on the scientific literature is a critical way to reduce bias in the covariates used to fit the model, as described by Li et al. (2020)^{14}.
The estimated value of \(\pi\) suggests that the probability of a nonstandard (overdispersed) menstrual cycle length occurring in this population of interest is 0.2636. Consequently, we can infer that \(26.36\%\) \([23.68\%,29.17\%]\) of cycles in the sample are overdispersed. Furthermore, while a nonoverdispersed cycle had a standard deviation (SD) of \(\sigma _{\eta }=1.0417\) [0.9971, 1.0875], the SD of an overdispersed cycle increases where \(\sigma _{w} = 4.7803\) [4.5738, 5.0007], which represents a 4fold increment. According to Najmabadi et al. (2020)^{25}, between and withinvariability in cycle characteristics should be emphasised as an important health indicator to assess behavioural, metabolic, and environmental factors. Therefore, the inclusion of \(\theta\) and \(\sigma _{w}\) play an essential role in the proposed model, as illustrated in Fig. 2. This Figure shows the probability that the proposed model (3) considers an observation as overdispersed where the results clearly demonstrate that \(r_{ij}=\lambda _{ij}w_{ij}\) is capturing menstrual cycles with overdispersion.
Using this model, knowledge and understanding can be gleaned as to how symptom variables affect the menstrual cycle, which is essential for individual athletes, coaches and healthcare professionals. Furthermore, these results can improve the forecasting intervals, helping women to know more about their bodies and cycles based on symptoms during a particular phase of their cycles. Further work is needed to translate these findings into recommendations. Although information relating to follicular and luteal phases was not available in the data, a strong linear correlation between menstrual cycle length and follicular phase has been reported^{26,27,28}. Where the correlation tended to increase with age. To predict ovulation time, further studies, which include both luteal and follicular phases and basal body temperature (BBT), are needed to extend the proposed model^{2}.
Although an ARMA(1,1) model was not needed in this analysis, we have demonstrated that some women have a positive lagone autocorrelation while others have a negative lagone autocorrelation. These results contradict the findings of^{16,29} who report a small general negative autocorrelation for a woman’s profile. In order to better investigate the variability of an autoregressive coefficient, we modified the statespace formulation to accommodate this source of random variation by assuming that \(\phi _{i}=\phi _{0}+\phi _{0i}\), with \(\phi _{0i} \sim N\left( 0,\sigma ^2_{\phi }\right)\). However, the normality assumption for \(\phi _{0i}\) was not justified as the normal QQ plot suggested a distribution with heavy tails and asymmetry; as a consequence, \(80\%\) of points were outside of the 95% simulated envelopes for this random effect (Figure S1).
We also observed that some women had a long cycle followed by a short cycle and vice versa, as observed by Bortot et al. (2010)^{16}. However, we found while \({\hat{\theta }}=0.0915\) with \(\text{ CI}_{95\%}: \left[ 0.1563, 0.0320\right]\) the estimate of the same parameter described by Bortot et al. (2010)^{16} was \(0.61\) \([0.77,0.45]\). It appears that the sample of female athletes that these analyses are based on had more regular menstrual cycles than a sample of 1,798 women observed from clients of the Catholic Marriage Advisory Council of England and Wales. Although we have a higher number of women in our sample than in^{16}, the time series in their sample were longer (up to 109 measurements) compared with up to 55 measurements in this sample. In order to account for the betweensubject variability, we included a random effect in the movingaverage coefficient given by \(\theta _{i}=\theta _{0}+\theta _{0i}\), with \(\theta _{0i} \sim N\left( 0,\sigma ^2_{\theta }\right)\). However, we observed the same problem as reported when considering the autoregressive coefficient where more than \(70\%\) of points were outside of the 95% simulated envelopes, lower asymmetry compared with \(\phi _{0i}\) and heavy tails (Figure S2). Therefore, to avoid bias in individual forecasting predictions, these random effects were dropped from the model. Further work is needed to accommodate individual estimation for the autocorrelation and movingaverage coefficients to improve model performance at the individual level.
The analysis workflow was as follows: we initially checked the Bayesian assumptions and the posterior distribution using suitable plots of the Markov Chain Monte Carlo (MCMC) draws from the posterior distribution and GelmanRubin diagnostic and autocorrelation plots of all model parameters. Figure 3a shows the iterates of \(\beta _0\), \(\pi\), \(\theta _0\), \(\sigma _{\eta }\), \(\sigma _{w}\), and \(\sigma _{\epsilon }\) after a burnin of 10,000 simulated iterations, which indicates convergence of the chains and stationary distributions, as the samples appear to be randomly sampled from the same region of the yaxis rarely venturing outside that area. The autocorrelation and GelmanRubin statistics^{30} were used to assess model convergence. The results suggest that the autocorrelation does not drop dramatically from lag 0 to 50 (Figure S3), indicating a moderate to high autocorrelation among samples. To reduce the impact of this problem, we stipulated a thinning of 50. On the other hand, the GelmanRubin statistic based on three chains showed all upper 95% confidence intervals were exactly equal to 1, meaning the chains had converged. Figure 3b shows the posterior densities obtained for estimated parameters derived from 3 Markov chains with 3000 samples per chain, leading to a computational time of around 23 hours executed on Dell Inspiron 17 7000 with 10\(^{ \text{ th }}\) Generation Intel\(\circledR\) \(\hbox {Core}^{\text{ TM }}\) i7 processor, 1.80GHz \(\times\) fourprocessor speed, 16GB random access memory (RAM) plus 20GB of swap space, 64bit integers, and the platform used is a Linux Mint 19.2 Cinnamon system version 5.2.2050202generic. In summary, the posterior distribution has been well characterised by the drawn samples as no unexpected peaks or strange shapes in the posterior density were observed that could signify poor model convergence. As a final assessment, the autocorrelation function, as well as the standardized residual against the athlete’s age, were checked (Fig. 4). No serious discrepancies nor patterns that warrant attention were observed in both graphs.
Once the assumptions were verified, we evaluated the agreement between the fitted and observed values and forecast intervals. Figure 5 shows the fitted curves for menstrual cycle length of six women, their 95% credible interval, and the onestepahead point forecast with 80%, 95% and 99% forecast intervals. We observed that the random walk with overdispersion parameter and MA(1) model performed well in describing the complex dynamics of menstrual cycle length over time. This conclusion is underpinned by CCC’s residual diagnostic and high values and Pearson correlation between fitted and observed values by the woman. These results also show that linear or linear mixedeffects models should not be applied to explain the variability of menstrual cycle length. They generally do not follow the necessary assumptions of linearity–however, a study done in 2019 by Bull et al. (2019)^{2} appears to use linear models to explain cycle length observed from an extensive database of cycles collected through an app. The authors show an \(R^2=0.987\) without any discussion as to whether the model assumptions are likely to be fulfilled; a high \(R^2\) value does not necessarily imply that a regression model provided an adequate fit to the data^{31}.
The necessity of including \(r_{ij}=\lambda _{ij}w_{ij}\) in our model to describe cycle length is demonstrated in Fig. 6 where the improvement in the point estimates, credible and forecasting intervals when \(r_{ij}=\lambda _{ij}w_{ij}\) was and was not included in the model is given.
The results show that the improvement in the Pearson and concordance correlation coefficients when \(r_{ij}\) was included in the model was mainly for women who had more overdispersed cycles, resulting in better forecast predictions, and narrower corresponding credible intervals.
Finally, to evaluate the onestepahead point forecast prediction we generated prediction using a test set comprised of 1,029 women, each of whom had at least 3 repeated measurements. The results are shown in Fig. 4.
As there are not the same number of repeated measurements for each woman, this makes the forecasting prediction evaluation difficult as the number of women who drop out of the test set increases over time. With this in mind, we found that RMSE values could be two times higher than those presented in Table 1, suggesting that these models are not working well for some women in the test group. The same conclusion is evident when considering the CCC and Pearson correlation coefficients. As the CCC can be written as \(\text{ CCC } = \text{ r } \times \text{ C}_{\text{ b }}\), where r represents a measure of precision and \(\text{ C}_{\text{ b }}\) a measure of accuracy^{32}, we can conclude that our model has high accuracy, with the potential to increase as the number of women with repeated measurements increases. The lower precision reported for the test set suggests that the explanatory variables used in the model may not be enough to explain the variability in the data. Including additional variables such as those that capture information on polycystic ovary presence, daily diet, country of origin,may improve model forecasts in general.
Limitations
The limitation of this study is that it is based on observational data which depends on users logging their information on the app. As a consequence, the models proposed are not intended to elucidate the causal pathway of reported symptoms on cycle length.
Conclusion
Statespace models, incorporating a probability \(\pi\) as a random effect at the subject level in the random walk component. are a valuable approach for predicting menstrual cycle length. They could be used to support female athlete wellness and optimize performance. For this reason a random walk with an overdispersion parameter and an MA(1) model was selected to describe the complex dynamics of menstrual cycle length over time, which resulted in high values of CCC and Pearson correlation between observed and fitted values. Moreover, the importance of incorporating an overdispersion parameter to capture the variability of nonstandard cycles was demonstrated. The data suggested that \(26.36\%\) \([23.68\%,29.17\%]\) of cycles are overdispersed. The random walk standard deviation for a nonoverdispersed cycle is \(\sigma _{\eta }=1.0417\) [0.9971, 1.0875] days which increased to \(\sigma _{w} = 4.7803\) [4.5738, 5.0007] days for nonstandard cycles.
We also found that reporting injury, stomach cramps, tender breasts, and flow amount had a significant effect on menstrual cycle length amongst female athletes using the FitrWoman app. Although accurate forecast predictions are reported, improvements in the variables collected and enhancements to the model are still needed, such as considering a random effect for the movingaverage coefficient \(\theta _{0}\), to improve forecast precision.
Methods
Data characteristics
The sample was comprised of female athletes using the FitrWoman app^{33} who had given their consent for the use of their data for research purposes. The sample size contains data on 16,524 cycles collected from 2,125 women (Fig. 7a), whose mean (sd) age was 34.38 (7.05) years (range 18 to 47 years); mean (sd) weight 62.75 (9.16) Kg (range 42.18 to 100.23 Kg); mean (sd) height 165.88 (6.89) cm (range 152.4 to 186.0 cm); with several repeated measurements per woman ranging from 4 to 53 cycles. There was approximately 60% of information missing for height and weight where the 95% quantile of the sample distribution, based on 893 women, was between 153.0 and 180.0 cm for height and 48.3 and 86.11 Kg for weight. A bivariate density plot for weight and height given age is shown in Fig. 7b in order to visualise the relationship between anthropometry and age in the sample.
Menstrual cycle length is assumed to be normally distributed as the data represent standard cycles^{15}, where the shortest cycle length record was 18 days and the longest was 43 days. The sample mean and variances are 27.62 and 3.51 days, respectively. As some women contributed more than one sequence to the database, we decided to consider only the first sequence available because we don’t know the reasons that caused this temporary dropout. The inclusion of the following sequences might bias the analysis, as also discussed by Bortot et al. (2010)^{16}.
Figure 7c shows profiles for six women with a blue line representing a fitted mixedeffects linear regression model. It can be observed that the inclusion of a random intercept and slope plays an essential role as each woman’s cycle can be affected by different nonobserved explanatory variables. However, the conditional \(R^2\) was equal to 0.40, implying that the linear mixedeffects regression is a good approximation for some profiles, but not for all of them, differing from the results presented by Bull et al. (2019)^{2}, who used a simple linear regression model and obtained an \(R^2=0.99\). This may have happened because the number of linear profiles observed by Bull et al. (2019)^{2} is suppressing the nonlinear profiles in their sample. It is clear, based on our sample, that each woman’s specific trend must be accounted for in terms of their withinsubject temporal dependence and the between subject variability across women.
Figure 7c,d show that for some women a short cycle can be followed by a long cycle and viceversa, suggesting the need for a movingaverage model. Furthermore, Fig. 7d shows that cycle length for some women has a positive autocorrelation. In contrast, others have a negative autocorrelation suggesting the need for an autoregressive movingaverage model incorporating individual random effects for the autocorrelation and the movingaverage coefficients. Finally, Fig. 7e shows a table containing the reported proportion of reported symptoms, where in most cases symptoms did not happen or were not reported.
As a consequence of possible missing data due to non reporting of symptoms, the effect of symptoms on cycle length may be biased towards the null hypothesis of no association between symptom and cycle length (i.e. a type II error). Despite this possible bias and loss in power, the p values obtained from statistical methods fitted to data subject to random error or misclassification are still valid^{34,35,36}.
Statistical analysis
Let \(Y_{ij}\) be a random variable, representing the length of menstrual cycle, where \(y_{ij}\) represents the observed cycle length for the ith woman, \(i=1,2,\ldots ,I\) for her jth menstrual cycle where \(j=1,2,\ldots , J_{i}\). The main objective is to derive the onestep ahead predictive distribution given by
Consequently, we are interested in evaluating \(F_{i,J_{i}+1}\left( Y_{i,J_{i}+1}\right)\) under a parsimonious parametric model, that is,
where \(M_{i,J_{i}+1}\left( y_{i,J_{i}+1}\right)\) is fully specified and \(\varvec{\Theta }\) is a vector of unknown fixedeffect and variance components parameters. In order to accommodate the withinsubject temporal correlation between repeated measures and the betweensubject variability a random walk statespace model and mixedeffects statespace model was used, incorporating a Bayesian approach for process forecasting to predict the duration, in days, of the next menstrual cycle. Each prediction is accompanied by a corresponding interval forecast as point prediction is of limited value without an accompanying measure of uncertainty^{37}. We assumed that cycle length are independent and that menstrual cycles tend to decrease over time as a woman ages^{15,16}. In addition, we combined the Bayesian approach and forecasting proceses to include covariates where model validation procedures were used to compare model adequacy.
State space models for cycle length
The statespace formulation is an attractive choice due to its flexibility to work with discrete response variables and temporal dependency amongst observations. At the same time, the mixedeffects model can be used to account for betweensubject variability. As the observed event is the difference, in days, between the interval from the first day of one bleeding episode up to and including the day before the next bleeding episode, observed cycle lengths can be modelled as discrete random variables. Let \(Y_{ij}\) be a continuous random variable, where \(y_{ij}\) is a realisation of \(Y_{ij}\), which represents the observed cycle length. Furthermore, let \(O_{ij}\) be a discrete random variable, where \(o_{ij}\) is a realisation of \(O_{ij}\) which represents the cycle length in days as a continuous process, that is, \(y_{ij} = o_{ij} + \varepsilon _{ij}\). As we have no way to estimate the error term \(\varepsilon _{ij}\) (observation process), we assume that \(o_{ij} = \lfloor y_{ij} \rceil\) is a good approximation for \(y_{ij}\), where \(\lfloor . \rceil\) indicates rounding. Thus, the true nonobserved continuous cycle length \(y_{ij}\) can be generated by the random walk statespace model:
where \(y_{ij}\) is the menstrual cycle length for the ith woman at jth cycle; \(m_{it}\) is a random walk model that allows an individual trend in the series with \(\eta _{ij}\) assumed to be normally distributed with mean 0 and variance \(\sigma ^2_{\eta }\). We assumed an ARMA(1,1) model for \(\gamma _{it}\), where \(\phi\) is the autoregressive parameter; \(\theta\) is the moving average parameter; and \(\epsilon _{ij}\) is assumed to be normally distributed with mean 0 and variance \(\sigma _{\epsilon }^ {2}\) (process error). Furthermore, \(c_{ij}\) captures the information provided by additional symptoms predictors (\(C_{ij}\)) that may have useful roles in understanding and forecasting cycle length, where \(\alpha _{k}\) represents the kth fixed effect parameter. Finally, \(r_{ij}\) is a random effect term used to account for extravariability (overdispersion) of some menstrual cycle lengths measured on ith woman at cycle j, which could be classified as outliers. Consequently, under model (2), \(y_{ij}\) has probability \(\pi\) of being an overdispersed menstrual cycle (nonstandard) for the jth cycle measured on ith woman, where its additional magnitude is given by \(r_{ij}\) (Fig. 8).
In this way, \(m_{ij}\) can be interpreted as the trend for a standard cycle. In contrast, \(m_{ij}+r_{ij}\) can be interpreted as the trend for a nonstandard cycle, where \(r_{ij}\) is an overdispersion parameter at the subject level for measures which induce extravariability, as discussed in^{38} when modelling the reported number of cases of COVID19 where the inclusion of \(r_{ij}\) allowed for the flexible modelling approach needed.
The statespace representation of the model (2) using the definition described by Brockwell & Davis (2002)^{39} is given by
with initial value \(m_{i1} \sim N\left( \beta _{0}, \sigma _{\eta }^{2}\right)\) for the local level model and \(\gamma _{ij}=\sum _{t=0}^{j1}\phi ^{t}\epsilon _{i,jt}\), with \(j\ge 1\). The linear Gaussian statespace model defined by equation 3 are generated efficiently using the Kalman filter recursions^{40}.
Fitting a separate linear regression for each woman will result in a subjectspecific intercept that may account for variability due to nonobserved variables likely to affect their first observed menstrual cycle. In contrast, a mixed model incorporating random slopes assumes that each woman has a different menstrual cycle length trend relative to her age. To verify if the random walk model proposed has the necessary flexibility to capture differing trends, it was compared to a linear mixedeffects model^{16}. In that case, \(m_{ij} = \beta _0 + b_{0i}+ \left( \beta _{1} + b_{1i}\right) Age_{ij}\), where \(\beta _0\) and \(\beta _1\) are the (marginal) intercept and slope, respectively; \(b_{0i}\) and \(b_{1i}\) are the random effects for the intercept and slope for the ith woman at \(Age_{ij}\), respectively, where it is assumed that
and \(Age_{ij}\) represents a woman’s age.
Bayesian implementation and choice of prior distribution
A Bayesian analysis combines information from observed data with prior distribution for the model’s parameters in order to generate a posterior distribution. In this analysis the inversegamma\((\kappa ,\kappa )\) is a natural candidate for the prior distributions and are often used for random walk statespace models and variance components of mixed effect models. Such a choice of prior is attractive as it can be considered as noninformative within the conditionally conjugate family, when \(\kappa\) is set to a low value such as \(0.1^3\):
A likelihood ratio test was used to test whether the presence of correlations between the random effects in these models played a crucial role. Based on a 95% credible interval for the variance component \(\sigma _{b01}\) for the proposed mixedeffects model there was sufficient evidence that the random effects are plausibly mutually independent and a term to capture the correlation structure between the intercept and slope could be removed from the model.
The choice of Prior distribution for fixed effect parameters is given by \(\beta _{0} \sim N\left( \mu _{\beta _{0}}, \sigma ^{2}_{\beta }\right)\), with \(\mu _{\beta _{0}} \sim \text{ Uniform }\left( 24,32\right)\); \(\beta _{1} \sim N\left( \mu _{\beta _{1}}, \sigma ^{2}_{\beta }\right)\), with \(\mu _{\beta _{1}} \sim \text{ Uniform }\left( 2,2\right)\); \(\phi _{0},\theta _{0} \sim {\mathcal {N}}\left( \mu _{ar},\sigma ^2_{ar}\right)\), with \(\mu _{ar} \sim {\mathcal {N}}\left( 0,100\right)\); and we assumed \(\alpha _{k} \sim {\mathcal {N}}\left( 0, 100\right)\), which is a vague normal density prior. All assumptions were checked to make sure that results were not sensitive to specific choices of prior parameters.
Model selection procedure
The model selection procedure used to compare candidate models involved a balance between forecast accuracy and the Bayesian Information Criterion (BIC). Forecast accuracy was calculated based on RMSE, CCC and Pearson Correlation Coefficient while the BIC was calculated using the following formulation^{41}:
where N is the total number of observations; and p is the number of parameters estimated by the model. The procedure was split into three steps namely the time trend component, the autocorrelation component, and an additional linear predictor as a function of available explanatory variables.
The first step was to account for a possible trend, by identifying the most appropriate error structure for the model, which in our case consisted of a comparison of a random walk model or a linear mixed effect model (Fig. 9). The second step involved the inclusion of temporal dependence among observations, as evident in some women in the sample, where an ARMA model was considered as shown in the Fig. 9. The third and final step involved the inclusion of explanatory variables to account for their (possible) relationship with cycle length. This was achieved using the posterior distribution on the parameter \(\alpha _k\) to select all those variables that did not have the null value for their parameter contained in their corresponding 95% credibility interval.
A novel use of train and test set data was used to validate model performance and to estimate the onestep ahead forecasting prediction accuracy as a function of the number of cycles reported. The complete sample of 2125 was used for model validation by treating the last observed cycle length as test data. The procedure is illustrated in Fig. 10 where the last observed cycle length (red dot) is ‘held back’ as test data and the remaining data (blue dots) were used as training data. The forecast performance was calculated using the RMSE and CCC between the observed and predicted cycle lengths and used jointly with the BIC criteria in the model selection process.
Once the model was selected, it was then sequentially tested using i) the complete data as training data and ii) a random sample of 1029 (approximately half the complete data) as test data. As the number of cycles reported varied from 2 to 12, onestep ahead forecasting prediction accuracy was calculated for each of these scenarios by treating the last observed cycle in each scenario as test data. As the number of athletes in the test set decreased with increasing reported cycle lengths, individuals that had fewer observed cycle lengths for the cycle length scenario under consideration were included in the training set to account for this attrition.
The forecast error for an observed value and its forecast was computed as
where the training data are given by \(\left\{ y_{i1},y_{i2},\ldots , y_{iJ_i}\right\}\) and the test data by \(\left\{ y_{i,J_i+1}\right\}\) (i.e. onestep ahead prediction for each woman), see Fig. 10. The forecast accuracy was measured by the root mean square error (RMSE), concordance correlation coefficient^{32}, and the Pearson correlation coefficient between the observed response in the test data and corresponding predicted cycle length value.
Posterior computation
Markov Chain Monte Carlo (MCMC) was used to generate samples from the posterior distribution for the random walk and mixedeffects statespace models using a Gibbs sampler algorithm^{40}, as this approach is widely used to obtain parameter estimates from a posterior distribution. The convergence of the MCMC algorithm was checked by multiple comparisons of MCMC chains with different starting points. The normality assumptions were checked using suitable residual plots and quantilequantile plots with simulate envelopes^{42}. The oneahead predictive distribution of \(F_{i,J_{i}+1}\left( Y_{i,J_{i}+1}\right)\) was derived through draws from the posterior distribution. Consequently, the \(\kappa\)step ahead predictive distribution was obtained by running the Kalman filter sequentially. All analysis were implemented in R including runjags^{43}, coda^{44}, hnp^{42}, and ggplot2^{45} packages.
Ethics approval
This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, cofunded by the European Regional Development Fund in partnership with Orreco. All methods were carried out in accordance with relevant guidelines and regulation. In particular the data that support this study were made available by ORRECO. Upon first use, all FitrWoman app users provide informed consent by agreeing to their anonymised data being used with third parties for research purposes. However, restrictions apply to the availability of these data used under license for the current study. In order to use the Fitrwoman app each participant must agree to the following conditions: Without prejudice to the foregoing, ORRECO shall have an exclusive, royalty free, perpetual licence to use and retain the User Data and all other information arising from the provision of the Services: (i) for research purposes, (ii) in order to improve the standard of service provided by ORRECO in the future; (iii) in order to validate ORRECO’s proprietary algorithms or intervention programmes; (iv) to analyse and report anonymously on patterns in User Data by reference to their age, sex, ethnicity, discipline, field, training schedule, performance, results or such other data sets as ORRECO may decide; and (v) in order to develop similar or new services, provided that in each case the identity of the User and any personal data comprised within the User Data shall be kept, removed or anonymised. Anonymised data shall be sent to third party processors to be analysed to uncover patterns and trends and to further sports science research. The FitrWoman app is compliant with the General Data Protection Regulation laws (GDPR 2016/679). All experimental protocols and ethical use of data were approved by the ethics committee of the Insight Centre for Data Analytics, National University of Ireland Galway, Ireland.
References
Regidor, P. A., Kaczmarczyk, M., Schiweck, E., GoeckenjanFestag, M. & Alexander, H. Identification and prediction of the fertile window with a new webbased medical device using a vaginal biosensor for measuring the circadian and circamensual core body temperature. Gynecol. Endocrinol. 34, 256–260. https://doi.org/10.1080/09513590.2017.1390737 (2018).
Bull, J. R. et al. Realworld menstrual cycle characteristics of more than 600,000 menstrual cycles. npj Digit. Med. 2, 83. https://doi.org/10.1038/s4174601901527 (2019).
Symul, L. et al. Assessment of menstrual health status and evolution through mobile apps for fertility awareness. npj Digit. Med. 2, 64. https://doi.org/10.1038/s4174601901394 (2019).
Ali, R., Gürtin, Z. B. & Harper, J. C. Do fertility tracking apps offer women useful information about their fertile window?. Reproductive BioMedicine Online 00, 1–10. https://doi.org/10.1016/j.rbmo.2020.09.005 (2020).
Scherwitzl, E. B., Hirschberg, A. L. & Scherwitzl, R. Identification and prediction of the fertile window using NaturalCycles. Eur. J. Contraception Reprod. Health Care 20, 403–408. https://doi.org/10.3109/13625187.2014.988210 (2015).
Schoene, R. B., Robertson, H. T. & Pierson, D. J. Respiratory drives and exercise in menstrual cycles of athletic and nonathletic women. J. Appl. Physiol. Respir. Environ. Exerc. Physiol. 50, 1300–1305. https://doi.org/10.1152/jappl.1981.50.6.1300 (1981).
Sung, E. et al. Effects of follicular versus luteal phasebased strength training in young women. SpringerPlus 3, 668. https://doi.org/10.1186/219318013668 (2014).
Freemas J. A. et al. Exercise Performance Is Impaired during the Midluteal Phase of the Menstrual Cycle. Med Sci Sports Exerc. 53(2), 442–452. https://doi.org/10.1249/MSS.0000000000002464 (2021).
McNulty, K. L. et al. The effects of menstrual cycle phase on exercise performance in eumenorrheic women: A systematic review and metaanalysis. Sports Med. 50, 1813–1827. https://doi.org/10.1007/s40279020013193 (2020).
Duane, M., Contreras, A., Jensen, E. T. & White, A. The performance of fertility awarenessbased method apps marketed to avoid pregnancy. J. Am. Board Fam. Med. 29, 508–511. https://doi.org/10.3122/jabfm.2016.04.160022 (2016).
Harlow, S. D. & Matanoski, G. M. The association between weight, physical activity, and stress and variation in the length of the menstrual cycle. Am. J.Epidemiol. 133, 38–49. https://doi.org/10.1093/oxfordjournals.aje.a115800 (1991).
Harlow, S. D. & Zeger, S. L. An application of longitudinal methods to the analysis of menstrual diary data. J. Clin. Epidemiol. 44, 1015–1025 (1991).
Harlow, S. D., Lin, X. & Ho, M. J. Analysis of menstrual diary data across the reproductive life span Applicability of the bipartite model approach and the importance of withinwoman variance. J. Clin. Epidemiol. 53, 722–733. https://doi.org/10.1016/S08954356(99)002024 (2000).
Li, K. et al. Characterizing physiological and symptomatic variation in menstrual cycles using selftracked mobilehealth data. npj Digital Med. 3, 1–13. https://doi.org/10.1038/s4174602002698 (2020). arXiv:1909.11211.
Guo, Y., Manatunga, A. K., Chen, S. & Marcus, M. Modeling menstrual cycle length using a mixture distribution. Biostatistics 7, 100–114. https://doi.org/10.1093/biostatistics/kxi043 (2006).
Bortot, P., Masarotto, G. & Scarpa, B. Sequential predictions of menstrual cycle lengths. Biostatistics 11, 741–755. https://doi.org/10.1093/biostatistics/kxq020 (2010).
Fukaya, K., Kawamori, A., Osada, Y., Kitazawa, M. & Ishiguro, M. The forecasting of menstruation based on a statespace modeling of basal body temperature time series. Stat. Med. 36, 3361–3379. https://doi.org/10.1002/sim.7345 (2017). arXiv:1606.02536.
Lieberman, J. L., De Souza, M. J., Wagstaff, D. A. & Williams, N. I. Menstrual disruption with exercise is not linked to an energy availability threshold. Medi. Sci. Sports Exerc. 50, 551–561. https://doi.org/10.1249/MSS.0000000000001451 (2018).
Diaz, A., Laufer, M. R. & Breech, L. L. Menstruation in girls and adolescents: Using the menstrual cycle as a vital sign. Pediatrics 118, 2245–2250. https://doi.org/10.1542/peds.20062481 (2006).
Sunni, L. et al. The utility of menstrual cycle length as an indicator of cumulative hormonal exposure. J. Clin. Endocrinol. Metab 97(10), E1871–E1879. https://doi.org/10.1210/jc.20121350 (2012).
Gizzo, S. et al. Menstrual cycle length: A surrogate measure of reproductive health capable of improving the accuracy of biochemical/sonographical ovarian reserve test in estimating the reproductive chances of women referred to ART. Reprod. Biol. Endocrinol. 13, 1–11. https://doi.org/10.1186/s1295801500241 (2015).
Mountjoy, M. et al. International Olympic Committee (IOC) Consensus statement on relative energy deficiency in sport (reds): 2018 update. Int. J. Sport Nutrit. Exerc. Metabol. 28, 316–331. https://doi.org/10.1123/ijsnem.20180136 (2018).
Melin, A. K., Heikura, I. A., Tenforde, A. & Mountjoy, M. Energy availability in athletics: Health, performance, and physique. Int. J. Sport Nutri. Exerc. Metabol. 29, 152–164. https://doi.org/10.1123/ijsnem.20180201 (2019).
Corbel, M. J., Tolari, F. & Yadava, V. K. Appropriate bodymass index for Asian populations and its implications. The Lancet 363, 157–163 (2004). https://doi.org/10.1016/S01406736(03)152683.
Najmabadi, S. et al. Menstrual bleeding, cycle length, and follicular and luteal phase lengths in women without known subfertility: A pooled analysis of three cohorts. Paediatric Perinatal Epidemiol. 34, 318–327. https://doi.org/10.1111/ppe.12644 (2020).
Treloar, A. E., Boynton, R. E., Behn, B. G. & Brown, B. W. Variation of the human menstrual cycle through reproductive life. Int. J. Fertil. 12, 77–26 (1967).
Chiazze, L., Brayer, F. T., Macisco, J. J., Parker, M. P. & Duffy, B. J. The length and variability of the human menstrual cycle. JAMA 203, 377–380. https://doi.org/10.1001/jama.1968.03140060001001 (1968).
Vollman, R. F. The menstrual cycle. Major Probl. Obstet. Gynecol. 7, 1–193 (1977).
Colombo, B. & Bassi, F. Studi in onore di Giampiero Landenna. Studi in onore di Giampiero Landenna 111–126 (1996).
Gelman, A. & Rubin, D. B. Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–472 (1992).
Montgomery, D., Peck, E. A. & ViningG, G. Introduction to linear regression analysis 5th edn. (Wiley, London, 2012).
Lin, L. I. A Concordance correlation coefficient to evaluate reproducibility. Biometrics 45, 255–268 (1989).
FitrWoman app. https://www.fitrwoman.com/. (2018).
Stefanski, L. A. The effects of measurement error on parameter estimation. Biometrika 72, 583–592. https://doi.org/10.1093/biomet/72.3.583 (1985).
Lagakos, W. S. Effects of mismodelling and mismeasuring explanatory variables on tests of their association with a response variable. Stat. Med. 7, 257–274 (1988).
Buonaccorsi, J. P., Laake, P. & Veierod, M. B. On the effect of misclassification on bias of perfectly measured covariates in regression. Biometrics 61, 831–836. https://doi.org/10.1111/j.15410420.2005.00336.x (2005).
Christoffersen, P. F. Evaluating interval forecasts. Int. Econ. Rev. 39, 841–862 (1998).
Oliveira, Td. P. & Moral, Rd. A. Global shortterm forecasting of Covid19 cases. Scientific Reports 11, 1–9. https://doi.org/10.1038/s4159802187230x (2021).
Brockwell, P. & Davis, R. Introduction to time series and forecasting 2nd edn. (Springer, New York, 2002).
Carter, A. C. K. & Kohn, R. On gibbs sampling for state space models. Biometrika 81, 541–553 (1994).
Bengtsson, T. & Cavanaugh, J. E. An improved akaike information criterion for statespace model selection. Comput. Stat. Data Anal. 50, 2635–2654. https://doi.org/10.1016/j.csda.2005.05.003 (2006).
Moral, R. A., Hinde, J. & Demétrio, C. G. Halfnormal plots and overdispersed models in R: The hnp package. J. Stati. Softw. 81(10). https://doi.org/10.18637/jss.v081.i10 (2017).
Denwood, M. J. runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. J. Stati. Softw. 71. https://doi.org/10.18637/jss.v071.i09 (2016).
Plummer, M., Best, N., Cowles, K. & Vines, K. CODA: convergence diagnosis and output analysis for MCMC. R News 6, 7–11 (2006).
Wickham, H. ggplot2: Elegant graphics for data analysis 2nd edn. (Springer, New York, 2016).
Acknowledgements
The authors are grateful to The Insight Centre for Data Analytics, National University of Ireland Galway and Orreco, for supporting this research project. We extend our thanks to the Science Foundation Ireland (SFI) under grant number SFI/12/RC/2289, cofunded by the European Regional Development Fund.
Author information
Affiliations
Contributions
T.P.O. and J.N. conceived and implemented the modelling framework and wrote the manuscript with input from all coauthors. G.B., B.M. and C.P. designed the observational study and data collection. All the authors commented and approved the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
de Paula Oliveira, T., Bruinvels, G., Pedlar, C.R. et al. Modelling menstrual cycle length in athletes using statespace models. Sci Rep 11, 16972 (2021). https://doi.org/10.1038/s41598021959601
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598021959601
Further reading

A proficient approach to forecast COVID19 spread via optimized dynamic machine learning models
Scientific Reports (2022)

Tracking Womenâs Reproductive Data in Sport: Practical Limitations, Perils and Pitfalls
Sports Medicine (2022)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.