Confirmation of ovulation from urinary progesterone analysis: assessment of two automated assay platforms

Urinary concentrations of the major progesterone (P4) metabolite pregnanediol-3-glucuronide (PDG) are used to confirm ovulation. We aimed to determine whether automated immunoassay of urinary P4 was as efficacious as PDG to confirm ovulation. Daily urine samples from 20 cycles in 14 healthy women in whom ovulation was dated by ultrasound, and serial weekly samples from 21 women in whom ovulation was unknown were analysed. Daily samples were assayed by two automated P4 immunoassays (Roche Cobas and Abbott Architect) and PDG ELISA. Serial samples were assayed for P4 by Architect and PDG by ELISA. In women with detailed monitoring of ovulation, median (95% CI) luteal phase increase was greatest for PDG, 427% (261–661), 278% (187–354) for P4 Architect and least for P4 Cobas, 146% (130–191), p < 0.0001. Cobas P4 also showed marked inaccuracy in serial dilution. Similar ROC AUCs were observed for individual threshold values and two-sample percent rise analyses for P4 Architect and PDG (both >0.92). In serial samples classified as (an)ovulatory by PDG, P4 Architect gave ROC AUC 0.95 (95% CI 0.89 to 1.01), with sensitivity and specificity for confirmation of ovulation of 0.90 and 0.91 at a cutoff of 1.67 μmol/mol. Automated P4 may potentially be as efficacious as PDG ELISA but research from a range of clinical settings is required.

We therefore aimed to determine whether daily measurement of creatinine-corrected urinary progesterone using an automated progesterone assay could be used for reliable confirmation of ovulation in a cohort in whom ovulation had already been reliably identified (confirmatory cohort). We also aimed to explore whether a P4 threshold value for one sample, or a percent rise between two samples (one follicular and one luteal) was the more discriminatory in the confirmatory cohort. Furthermore, in weekly samples from an additional cohort, in whom no further information on ovulation status was known (exploratory cohort), we aimed to explore the sensitivity and specificity of weekly P4 in confirming ovulation (threshold value and two sample percent rise) with PDG as the referent.

Materials and Methods
Human subject recruitment. Ethical approval was obtained from South East Scotland Research Ethics Committee (Ref: 09/S1101/67). The study conformed to the principles outlined in the Declaration of Helsinki. The study consisted of two cohorts: a confirmatory cohort, evaluating the ability of urine P4 to confirm ovulation diagnosed by ultrasound, relative to PDG, and an exploratory cohort, comparing P4 and PDG in a likely real-world setting, where true ovulation status is unknown, daily testing is not possible and weekly urinary PDG is used clinically.
The confirmatory cohort comprised 14 healthy women aged 27 to 43 years with self-reported regular menses, who were controls in a study in which timing of ovulation was characterized using gold-standard techniques 13 . All provided informed consent. Inclusion criteria were: reproductive age (between menarche and menopause), no steroidal contraception or other hormonal medication, intrauterine device use or fertility treatment, normal physical examination, and renal and liver function and electrolytes within normal limits. Participants were excluded if they could not complete the required sampling regimen or became pregnant.
The exploratory cohort comprised 21 women attending a reproductive endocrinology service in Edinburgh, undergoing a standard clinical assessment. None were taking hormonal contraceptives or fertility treatment, all were of reproductive age and all had completed the required sampling regimen. Informed consent was not required for the exploratory cohort, since the investigation was part of their routine investigations. Investigators were blinded to the findings of history, examination and other investigations. Seven healthy men aged 28 to 61 years provided single urine aliquots for assessment of linearity and dilution recovery.
Sample size. Since confirmatory and exploratory cohorts were assessed as part of other research or clinical activities, they may be considered convenience samples. Although sample sizes (confirmatory: 20 cycles from 14 women, exploratory: 42 cycles from 21 women) were smaller than in previous studies 1,11,14–16 , assessments were in greater detail (e.g. daily and weekly urine sampling, respectively, see 'Capability in confirming ovulation'), hence we anticipated the sample size would be sufficient to confirm the ability of urinary P4 to identify ovulation and explore its diagnostic utility versus PDG.
Creatinine and LH assays. LH was measured in serum and urine by in-house ELISA using two different anti-human LH beta subunit mouse monoclonal antibodies (Medix Biochemica, Kauniainen, Finland), as described elsewhere 17 . While LH may be unstable in urine at −20 °C 18 , a measured peak in urine LH was intended to be supportive of a serum measurement and TVUS, both within 2-3 days. Urine creatinine was determined using the creatininase/creatinase specific enzymatic method utilizing a commercial kit (Alpha Laboratories Ltd, Eastleigh, UK) adapted for use on a Cobas Fara centrifugal analyser (Roche Diagnostics Ltd, Welwyn Garden City, UK) 19 .
Urine steroid assays. Urine samples were stored at −20 °C until steroid analysis. PDG measurement and two automated P4 immunoassays were undertaken on each sample. PDG was measured in duplicate by competitive PDG ELISA. A 96-well plate was coated with 100 μL of pre-precipitated donkey anti-rabbit IgG serum (0.2 μg per well; Scottish Antibody Production Unit, Carluke, UK) in ELISA coating buffer for 14 hours at 4 °C and washed twice with 50 mM TRIS buffer containing 137 mM NaCl and 0.05% Tween 20 (wash buffer). The plate was blocked with 220 μL 10 mM phosphate buffer and 0.5% w/v bovine serum albumin (BSA) and washed twice with wash buffer. Then 20 μL of sample was added with 80 μL PDG-HRP (in-house reagent) diluted 1 in 200,000 in PBS 0.1% BSA (assay buffer) and shaken for 2 minutes. Next, 50 μL rabbit anti-PDG antibody (in-house reagent) diluted 1 in 40,000 in assay buffer was added and incubated in a shaker at 30 °C for 2 h. The plates were washed 5 times with wash buffer and 120 μL 3,3′,5,5′-tetramethylbenzidine (TMB) was added. After 12-15 minutes, 80 μL 2 M H2SO4 stop solution was added and the plate read on a plate reader at 450 nm.
Automated P4 chemiluminescent microparticle immunoassay (Abbott Laboratories, Lake Bluff, Illinois, USA): P4 was measured on the Abbott Architect c8000 automated analyser, using a proprietary serum assay kit (Architect System Progesterone, Abbott Ireland Diagnostics Division, Longford, Ireland) according to the manufacturer's instructions. The analytical sensitivity was quoted as ≤0.3 nmol/L. No significant cross-reactants are quoted by the manufacturer.
Comparison and correlation between assay methods. A total of 536 daily, early morning urine samples were assayed using PDG ELISA, Architect and Cobas P4.
Cross-reactivity of P4 assays for PDG. Since neither P4 assay quoted cross-reactivity with PDG, we measured P4 using both assays in three spiked male urine samples (100 nM PDG) and three unspiked samples. Mean progesterone concentrations were compared between spiked and unspiked samples to give percent cross-reactivity.
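The comparison of spiked and unspiked means can be sketched as follows. The exact formula is not stated in the text, so this sketch assumes that percent cross-reactivity is the apparent gain in measured P4 divided by the PDG concentration added; the function name and the example readings are hypothetical.

```python
from statistics import mean

def percent_cross_reactivity(spiked_p4, unspiked_p4, spike_conc):
    """Apparent P4 gained from the spike, as a percentage of the PDG added.

    Assumed formula (not given in the source). All three arguments must be
    expressed in the same units, e.g. nmol/L.
    """
    if spike_conc <= 0:
        raise ValueError("spike_conc must be positive")
    return 100.0 * (mean(spiked_p4) - mean(unspiked_p4)) / spike_conc

# Hypothetical readings (nmol/L) for three spiked and three unspiked aliquots;
# 100.0 corresponds to the 100 nM PDG spike described in the text.
example = percent_cross_reactivity([3.0, 3.5, 2.5], [2.0, 2.5, 1.5], 100.0)
```

With these invented readings the apparent gain is 1 nmol/L against a 100 nmol/L spike.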
Freeze-thaw stability. There was one freeze-thaw cycle between each assay, in the order PDG, Cobas, Architect. PDG is known to be stable following up to 10 freeze-thaw cycles 11 , but the stability of urinary progesterone measured by automated assay has not previously been demonstrated. We examined the effect of up to five freeze-thaw cycles on hormone concentration. An aliquot of male urine was spiked with 240 nmol/L P4 and divided into 18 aliquots. Three aliquots were each subjected to 0, 1, 2, 3, 4 or 5 freeze-thaw cycles. P4 was then measured using both assays and the percentage decrease from the index samples (0 cycles) calculated.
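A minimal sketch of the percentage-decrease calculation, assuming the triplicate aliquots at each cycle count are averaged before comparison with the 0-cycle mean (the averaging step is an assumption; the readings below are invented, not study data):

```python
from statistics import mean

def percent_decrease_by_cycle(readings):
    """Map each freeze-thaw cycle count to its percent decrease from cycle 0.

    `readings` maps cycle count -> list of replicate P4 concentrations.
    """
    baseline = mean(readings[0])
    return {
        cycles: 100.0 * (baseline - mean(values)) / baseline
        for cycles, values in sorted(readings.items())
    }

# Hypothetical triplicate readings (nmol/L) for 0, 1 and 5 cycles.
demo = percent_decrease_by_cycle({
    0: [240.0, 238.0, 242.0],
    1: [236.0, 240.0, 238.0],
    5: [228.0, 230.0, 226.0],
})
```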
Assay precision. Standard samples supplied by the manufacturers spanning the low, middle and high range were measured with both P4 methods at four different time points within each run (Architect 4 runs, Cobas 5 runs). Runs were carried out on separate days and the reagent lot was varied to simulate normal operating procedures.
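For illustration, within-run and between-run imprecision from such a design can be estimated with one-way ANOVA variance components. This is a sketch assuming a balanced design (equal replicates per run) and the usual variance-component formulas; it is not necessarily the exact procedure used by the statistical software, and the function name and values are hypothetical.

```python
from math import sqrt
from statistics import mean

def precision_cv(runs):
    """Return (within-run CV %, between-run CV %) for balanced runs.

    `runs` is a list of runs, each a list of replicate results for one
    control level; every run must hold the same number of replicates.
    """
    k = len(runs)
    n = len(runs[0]) if runs else 0
    if k < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need at least 2 balanced runs with 2+ replicates")
    run_means = [mean(r) for r in runs]
    grand = mean(run_means)
    # Mean squares from one-way ANOVA with run as the factor.
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(runs, run_means) for x in r
    ) / (k * (n - 1))
    ms_between = n * sum((m - grand) ** 2 for m in run_means) / (k - 1)
    # Between-run variance component, truncated at zero.
    var_between = max(0.0, (ms_between - ms_within) / n)
    return 100.0 * sqrt(ms_within) / grand, 100.0 * sqrt(var_between) / grand

# Hypothetical control results: two runs of two replicates each.
cv_within, cv_between = precision_cv([[10.0, 12.0], [14.0, 16.0]])
# 10% is the acceptance limit stated in the statistical analysis.
acceptable = cv_within <= 10.0 and cv_between <= 10.0
```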
Linearity and dilution recovery. Seven male samples were spiked with 12.5 mmol/L P4 and were diluted serially with unspiked male urine 2, 4 and 8-fold. These samples were tested on both P4 methods, to assess if any disparity diminished with subsequent dilutions.
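Dilution recovery can be illustrated as observed concentration over expected concentration at each dilution. The sketch assumes the unspiked diluent contributes negligible P4, so the expected value is simply the neat value divided by the dilution factor; that assumption, the function name and the numbers are ours.

```python
def dilution_recovery(neat, observed_by_factor):
    """Percent recovery at each dilution factor.

    `neat` is the undiluted measured concentration; `observed_by_factor`
    maps dilution factor (2, 4, 8, ...) -> measured concentration.
    Assumes the diluent itself contains negligible analyte.
    """
    if neat <= 0:
        raise ValueError("neat concentration must be positive")
    return {
        factor: 100.0 * observed / (neat / factor)
        for factor, observed in sorted(observed_by_factor.items())
    }

# Hypothetical readings in arbitrary concentration units.
recovery = dilution_recovery(80.0, {2: 40.0, 4: 22.0, 8: 9.0})
```

A recovery close to 100% at every dilution would indicate linear behaviour; values that drift with dilution suggest a matrix effect.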
Capability in confirming ovulation. Daily early morning urine samples were collected across 20 menstrual cycles from the confirmatory cohort. Ovulation was confirmed by the appearance and disappearance of a dominant follicle on transvaginal ultrasound (TVUS), performed every 2-3 days. The precise day of ovulation was determined from the surge in daily urinary LH, corroborated by serum LH (measured every 2-3 days). Samples from ovulation day −10 to −3 were categorized as follicular, and ovulation day +3 to +10 as luteal.
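The phase windows above can be expressed as a small helper. Days are counted relative to the day of ovulation (day 0); samples outside both windows are left unclassified. The function name is hypothetical.

```python
def classify_phase(day_relative_to_ovulation):
    """Return 'follicular', 'luteal' or None for a sample day.

    Windows follow the text: days -10 to -3 are follicular and
    days +3 to +10 are luteal; other days are not categorized.
    """
    d = day_relative_to_ovulation
    if -10 <= d <= -3:
        return "follicular"
    if 3 <= d <= 10:
        return "luteal"
    return None

phases = {d: classify_phase(d) for d in (-11, -10, -3, 0, 3, 10, 11)}
```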
In order to compare the rates of confirmation of ovulation for P4 Architect with PDG ELISA, a further eight weekly samples per woman were assayed from the exploratory cohort (in whom the presence or absence of ovulation was otherwise undetermined). These women provided one urine sample every seven days for eight weeks, starting on a random day of the cycle.
Statistical Analysis. Non-normally distributed data were log transformed prior to analysis. Correlation of P4 assays with PDG was performed by Pearson's correlation analysis. Values obtained by Cobas and Architect were also compared using Passing-Bablok regression. A Bland-Altman plot was used to check graphically for systematic bias and heterogeneity across the range of values. For assay precision, a coefficient of variation (CV) within or between assays of 10% or less was considered acceptable. We estimated within and between series imprecision using analysis of variance (ANOVA). For linearity and dilution recovery, the correlation between observed and expected values was compared using Pearson's test.
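As a brief illustration of the Bland-Altman quantities, the sketch below computes the mean difference (bias) between two methods and the conventional 95% limits of agreement (bias ± 1.96 SD of the differences). It works on raw paired values; whether the study applied it to raw or log-transformed values is not stated, and the data are invented.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of (a - b); limits are bias +/- 1.96 * SD of the
    paired differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equal-length series with 2+ pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread

# Hypothetical paired results from two assays on the same samples.
bias, lower, upper = bland_altman([5.0, 6.0, 7.0, 9.0], [4.0, 4.0, 6.0, 6.0])
```

A positive bias here means the first method reads higher on average than the second.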
To assess the performance in confirming ovulation, all three assays were compared graphically by plotting the median, 10th and 90th percentile concentrations by day relative to ovulation. Follicular and luteal P4 concentrations by each assay were compared using paired samples t tests. Percent luteal change for all three assays was calculated using the median ratio of all combinations of follicular and luteal creatinine-corrected concentrations, and compared by one-way ANOVA. The sensitivity and specificity of PDG were compared with those of the closest-correlating P4 assay using (1) daily urine samples (ovulation confirmed above a threshold concentration) and (2) a percent rise between pairs of samples (ovulation confirmed above a certain percent rise). Receiver operating characteristic (ROC) curves were constructed and compared.
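The two quantities described here can be sketched in a few lines. The percent rise is assumed to be (luteal − follicular) / follicular × 100 for every follicular-luteal pairing, which is one reading of "median ratio of all combinations"; the AUC is computed as the probability that a luteal value exceeds a follicular value (equivalent to the Mann-Whitney statistic), which may differ from the software's method. Names and data are hypothetical.

```python
from itertools import product
from statistics import median

def median_percent_rise(follicular, luteal):
    """Median percent rise over all follicular x luteal pairings."""
    rises = [100.0 * (l - f) / f for f, l in product(follicular, luteal)]
    return median(rises)

def roc_auc(negatives, positives):
    """AUC as P(positive > negative), counting ties as one half."""
    score = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for n, p in product(negatives, positives)
    )
    return score / (len(negatives) * len(positives))

# Hypothetical creatinine-corrected concentrations.
rise = median_percent_rise([1.0, 2.0], [4.0, 6.0])
auc_separated = roc_auc([1.0, 2.0], [4.0, 6.0])
auc_overlap = roc_auc([1.0, 3.0], [2.0, 4.0])
```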
To compare the ovulation detection rate of P4 with PDG in the exploratory cohort, the diagnostic threshold concentrations and ratios calculated by the ROC curves in the confirmatory cohort were applied to these weekly samples. Where PDG exceeded the threshold concentration, this cycle was deemed ovulatory. The rise in PDG or P4 was assigned as week 3 for graphical purposes. Other cycles were deemed anovulatory. Luteal percentage changes were calculated as the difference between a sample and each of the other seven weekly samples (56 combinations for each woman). The sensitivity and specificity of P4 relative to PDG (the referent) were calculated for anovulatory and ovulatory cycles. The peak samples from each 4-sample consecutive series were analysed by ROC curve.
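The 56 combinations arise because each of the eight weekly samples is compared with each of the other seven (8 × 7 ordered pairs). A hedged sketch of that pairing, together with sensitivity and specificity against a referent classification, follows; which sample serves as the denominator is an assumption, and all names and values are hypothetical.

```python
from itertools import permutations

def pairwise_percent_changes(weekly):
    """Percent change for every ordered pair of distinct weekly samples.

    Assumes all concentrations are positive; the first sample of each
    pair is used as the denominator.
    """
    return [100.0 * (b - a) / a for a, b in permutations(weekly, 2)]

def sensitivity_specificity(test_positive, referent_positive):
    """Sensitivity and specificity of a test against a referent."""
    pairs = list(zip(test_positive, referent_positive))
    tp = sum(1 for t, r in pairs if t and r)
    fn = sum(1 for t, r in pairs if not t and r)
    tn = sum(1 for t, r in pairs if not t and not r)
    fp = sum(1 for t, r in pairs if t and not r)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical eight weekly creatinine-corrected values for one woman.
changes = pairwise_percent_changes([1.0, 1.2, 3.0, 3.5, 1.1, 0.9, 2.8, 3.2])
# Hypothetical per-cycle calls: P4 (test) versus PDG (referent).
sens, spec = sensitivity_specificity(
    [True, True, False, False, True],
    [True, True, True, False, False],
)
```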
A p-value < 0.05 was considered statistically significant. Statistical analyses were undertaken using Analyse-It version 2.2 (Leeds, UK) and SPSS Statistics for Mac version 23.0 (IBM, New York, USA).

Results
Comparison and correlation between assay methods. Bland-Altman analysis demonstrated positive bias towards Cobas values with a trend towards greater discrepancy at lower concentrations of P4 than by Architect (Fig. 1). Passing-Bablok plots similarly demonstrated higher values by Cobas at lower concentrations, with higher values by Architect at higher concentrations. The correlations between PDG and P4 by Cobas and P4 by Architect were r = 0.454 and r = 0.708, respectively, both p < 0.0001. (Fig. 4). The optimal individual sample thresholds identified to confirm ovulation for P4 and PDG were 1.14 μmol/mmol and 0.208 mmol/mmol, respectively, yielding sensitivity and specificity of 0.88 to 0.99 with no meaningful differences between single threshold concentration or luteal percent rise, or between assays (Table 2). There was no significant difference between P4 and PDG ROC AUCs for threshold concentration (Table 3). The optimal percent luteal rises for P4 and PDG were 165% and 195%, respectively.

Discussion
In a sample of women in whom ovulation was carefully characterized (confirmatory cohort) we found that measurement of urinary P4 using an automated assay reproducibly demonstrated comparable relative concentration changes to PDG. Compared with PDG, the Architect P4 assay demonstrated satisfactory receiver operator characteristics and positive and negative predictive values. In women in whom ovulation was otherwise undefined (exploratory cohort), P4 Architect was closely comparable to PDG concentration as the referent, with an AUC CI that included unity and sensitivity and specificity of 90% and 91% respectively. The luteal percent change method estimated a marginally lower sensitivity than single threshold concentration; however, this was likely due to the impartial analytical approach (each weekly sample was compared with seven other samples). We were unable to demonstrate any difference between a single threshold value or percent luteal rise in confirmation of likely ovulation.
The Architect method demonstrated a greater luteal rise for P4 than reported by Stanczyk et al. 12 ; however, this was still significantly less than was seen for PDG. A specific ELISA for PDG remains superior in confirming ovulation using urine samples to either automated P4 assay. In the exploratory cohort, the sensitivity and specificity of P4 were calculated relative to PDG, as a gold-standard technique such as TVUS had not been undertaken. This confirmed the potential of urinary P4 analysis using the Architect system for confirmation of ovulation in a clinical setting. The manual PDG assay however requires several hours to perform with overnight plate coating and/or antibody incubation 11,20 . Such a time investment will carry cost implications. Recent studies advocating the use of a PDG threshold concentration to confirm ovulation utilized a time-resolved fluorimetric immunosorbent assay, but details of the assay were not described 9,10,14,15 . Time-resolved fluorimetry requires more specialised equipment than the competitive TMB-based ELISA, hence it is likely this technique will be predominantly utilized by specialised reproductive laboratories. Liquid chromatography and tandem mass spectrometry represent an accurate alternative 21 ; however, the cost is likely to be prohibitive in the general laboratory. Autoanalyzers such as those tested here are in widespread use for plasma/serum in general biochemistry laboratories and using them for confirmation of ovulation, where available, would be of great practical value, reducing direct and indirect costs and improving efficiency.
Table 3. Differences between areas under receiver operator characteristics curves between progesterone (P4) and pregnanediol (PDG) single-sample threshold and two-sample percent rise to confirm ovulation (difference and non-directional p value (two-tailed)), in cohort of women in whom ovulation had been confirmed. PDG, pregnanediol glucuronide; P4, progesterone; CI, confidence interval.
Autoanalyser use is practical and less resource intensive than a manual ELISA, and so improves efficiency. Autoanalysers are also less likely to be associated with human error. The Abbott and Cobas P4 assays are not developed or marketed for urine but this analysis suggests that the Abbott assay shows good characteristics and may potentially be of clinical value in this context, after further validation in larger cohorts. Although Architect P4 demonstrated only a moderate correlation with PDG (r = 0.71), it shows potential for clinical application, since the identification of change from follicular to luteal concentrations is robust (90% sensitive and 91% specific in this exploratory cohort of 21 women). It may also prove a useful tool for population-based research studies, where large numbers of samples need to be analysed. The Cobas demonstrated a matrix effect for measurement of P4 in urine, overestimating concentrations and thus limiting the assay's ability to differentiate between follicular and luteal samples. Cobas also demonstrated a poor percentage recovery, an effect which was reduced by serial dilutions with phosphate buffered saline. Architect by comparison was unaffected by matrix effect in urine and showed good recovery, and thus was chosen for further comparisons.
As far as we are aware, this is the first time PDG ELISA has been compared with an automated assay of P4 in urine for the confirmation of ovulation. Strengths of our study include the detailed assessment of ovulation and excellent adherence to a daily urine sampling regimen. Our study has several weaknesses. An important limitation is that the confirmatory population was relatively small, although statistical significance was achieved for the key comparison of fold increase in luteal versus follicular P4. Ultrasound, blood and urine hormone measurements in this cohort provide more detail than previous studies and represent a gold standard of ovulation determination, hence we feel this sample size was sufficient to confirm the ability of daily urinary Architect P4 to identify ovulation. Nevertheless, these data should be interpreted with caution, and substantially larger sample sizes are required for determination of reference ranges for threshold or cutoff values. Our assays were not contemporaneous, with one freeze-thaw cycle between each of them. While significant degradation of steroids was not detected, in future researchers should aim to run the assays concurrently 11,22 . Larger studies including TVUS in both ovulatory and anovulatory women are required to determine the best sampling strategy to confirm ovulation. It would also be necessary to determine the efficacy of automated P4 assessment versus PDG in a range of ovulatory patterns before recommending this test for widespread clinical use, for which sufficient reliability has not yet been demonstrated.

Data Availability
The datasets generated and analysed during the current study are available from the corresponding author.