Introduction and objectives
It is estimated that approximately 15% of couples cannot conceive after 1 year of unprotected intercourse. Of those cases, 20% are solely related to the male factor and in another 30–40%, the male factor is contributory.1 Some of these cases are due to an endocrinopathy that may be treatable. The discovery of low testosterone with elevated prolactin, for instance, is likely due to a prolactinoma, and it may be possible to restore fertility with bromocriptine or cabergoline therapy. Hypogonadotropic hypogonadism due to Kallmann syndrome is also important to identify, as it may be treatable with human chorionic gonadotropin and recombinant follicle-stimulating hormone (FSH), and as the Kallmann gene has been cloned, such a diagnosis has implications for the patient's future offspring. In addition to identifying reversible causes of endocrinopathy, examining the parameters that best predict an endocrinopathy may lead to better understanding of the underlying pathophysiology of infertility.
The conventional evaluation for men with infertility has evolved over the years. Recently, Sharlip et al.1 published best practice policies on the evaluation of subfertile men. The authors recommended endocrine screening only with FSH and testosterone levels in men with sperm counts less than 10 million/ml or other evidence of endocrinopathy such as physical findings or decreased libido complaints. This recommendation was based on our observation of a low probability of endocrinopathy in men with sperm count greater than 10 million/ml.2
Since detection of endocrinopathies affecting male fertility may provide significant prognostic information to infertile couples as well as uncover significant underlying medical conditions, we investigated whether we could improve upon the current clinical guidelines using more advanced statistical models. Neural networks are a form of statistical modeling that provides a better curve fit for nonlinear data compared to traditional statistical techniques. Investigators have previously used neural networks to analyze infertility data. To our knowledge, these methods have not yet been utilized to predict the presence of endocrinopathy based solely on clinical or semen analysis data.3, 4
Materials and methods
In this paper, we use the term 'endocrinopathy' to mean the presence of an abnormality in the serum hormonal panel (testosterone, FSH, luteinizing hormone (LH) or prolactin), without necessarily implying a primary endocrine cause of infertility.
A retrospective analysis of 1525 patients attending two fertility centers for infertility evaluations was performed. A complete medical history and physical examination were obtained on all patients. Testicular volume was determined using either a Seager or Prader orchidometer. All patients underwent endocrine testing including FSH and testosterone. At one center, follow-up studies including LH and prolactin were obtained only if the initial endocrine studies were abnormal. At the other center, prolactin and LH were obtained for all patients in the initial panel. A minimum of two semen analyses were obtained from all patients after a 2- to 3-day period of abstinence and evaluated by trained laboratory technicians. Seminal volume, sperm density, percent motility, forward progression and percentage of normal morphologic forms were recorded. Patients who had undergone prior fertility workup or vasectomy were excluded. The final study population comprised 1035 men, 385 from the first center and 650 from the second, in whom semen analysis and hormone levels were known.
The four models investigated were linear and quadratic discriminant function analysis (LDFA and QDFA), logistic regression (LR) and a neural network. Neural networks were implemented using 'neUROn' (Neural computational environment for UROlogical numericals), a suite of C++ programs developed by the investigators and cross-compiled using Microsoft Visual C++ version 6 (Microsoft Corp., Redmond, WA, USA) and GNU C++ (Cygwin port) version 2.95 (Red Hat Corp., Raleigh, NC, USA). The training method was canonical off-line back propagation with weight decay, with the weight decay term chosen to be 5 × 10⁻⁵. All transfer functions were sigmoidal, allowing for odds ratios to be easily computed at the output node, and the error function was chosen to be cross-entropy for feature extraction using Wilk's generalized likelihood ratio test (GLRT).
The data set was randomly divided into a 'training' set of 777 exemplars and a 'test' set of 258 exemplars. The proportion of exemplars with endocrinopathies was kept similar in both sets using a randomization algorithm, which preserved initial outcome frequencies in each. The test set was excluded from training and used only for cross-validation ('n1/n2' method). Multiple random sets of initial conditions (connection weights) were derived, and the training set was iteratively applied to the neural computational system. When overlearning was observed by divergence of training and test set errors, hidden nodes were removed to reduce network topology. A single hidden layer with four hidden nodes was determined to represent an optimal topology which maintained acceptable goodness-of-fit without overlearning. We considered the network to be trained to completion when the error was observed to be oscillating at a local error minimum, and the error gradient was less than 1 × 10⁻⁶.
Decision boundaries of the various models were visualized with software developed by us with Microsoft Visual C++ version 6. The chosen algorithm calculated the model's output for each pixel in the plot, and then identified the boundary pixels as those having an adjacent pixel with a different outcome.
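The boundary-detection step used for the decision boundary plots can be illustrated with a minimal Python sketch. This is not the original C++ visualization software; the function name `boundary_pixels`, the use of pixel centres, and the choice of four-connected neighbours are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple


def boundary_pixels(
    predict: Callable[[float, float], bool],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    width: int,
    height: int,
) -> List[Tuple[int, int]]:
    """Return (column, row) pixels that have a 4-connected neighbour of the other class.

    `predict` is any two-input classifier (hypothetical here), for example a
    model evaluated at a fixed motility value.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    # Evaluate the model once per pixel, mapping each pixel centre to feature values.
    grid = [
        [
            predict(
                x0 + (col + 0.5) * (x1 - x0) / width,
                y0 + (row + 0.5) * (y1 - y0) / height,
            )
            for col in range(width)
        ]
        for row in range(height)
    ]
    boundary: List[Tuple[int, int]] = []
    for row in range(height):
        for col in range(width):
            for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                n_row, n_col = row + d_row, col + d_col
                inside = 0 <= n_row < height and 0 <= n_col < width
                if inside and grid[n_row][n_col] != grid[row][col]:
                    boundary.append((col, row))
                    break  # one differing neighbour is enough
    return boundary
```

Because a pixel is marked whenever any neighbour differs, the boundary produced this way is two pixels thick (one on each side of the class change); a plotting routine could keep only one side if a thinner line is preferred.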
We employed Wilk's GLRT to determine which input features were significant to the model's outcome in a reverse regression analysis. We also modeled the data set using LR and LDFA and QDFA to compare the nonlinear computational method of neural computation with traditional linear statistical modeling tools. Receiver operating characteristic (ROC) curves were generated by plotting sensitivity vs (1-specificity) for all possible thresholds. We computed ROC area under the curve (AUC) statistically where possible using the method described by Wickens and compared threshold-independent true and false-positive and -negative rates statistically according to the method described by DeLong.5, 6
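As an illustration of how an ROC curve is traced over all possible thresholds, the following Python sketch computes the (1-specificity, sensitivity) points and an empirical area by the trapezoidal rule. This is a generic nonparametric estimate offered as an assumption-laden sketch; it is not necessarily the procedure described by Wickens, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold.

    `labels` uses 1 for endocrinopathy and 0 for no endocrinopathy.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only when the score changes, so tied scores share one threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

For example, `auc_trapezoid(roc_points(model_scores, outcomes))` would give the empirical AUC for one model on the test set, where `model_scores` and `outcomes` are placeholder names for the model outputs and the observed diagnoses.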
To use Wilk's GLRT for feature extraction, the full network was trained to a strict local error minimum, and the cross-entropy error was calculated. Subtracting each input node sequentially created feature-deficient networks. These subnetworks were retrained to a strict local error minimum, and the cross-entropy error for each subnetwork was recorded. The P-value for rejecting the null hypothesis (that the full network and feature-deficient network are equivalent) is obtained from a χ² probability distribution with degrees of freedom equal to the number of nodal connections removed by generating the feature-deficient subnetwork.7
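The comparison of a full and a feature-deficient network can be sketched in Python as follows. The sketch assumes that each error is the total (summed, not averaged) cross-entropy in natural-log units, so that twice the error difference serves as the likelihood ratio statistic; that scaling, and the function names, are assumptions rather than details taken from the neUROn software. The χ² tail probability uses the closed form that holds for even degrees of freedom, which covers the case of one input node removed from a network with four hidden nodes (four connections).

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; valid for even df only.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def glrt_p_value(error_full: float, error_reduced: float, connections_removed: int) -> float:
    """P-value for the null hypothesis that the reduced network fits as well as the full one.

    Assumes both errors are summed cross-entropy values in nats.
    """
    statistic = 2.0 * (error_reduced - error_full)
    # A retrained subnetwork can occasionally fit slightly better; clamp at zero.
    return chi2_sf_even_df(max(statistic, 0.0), connections_removed)
```

A small P-value indicates that removing the input feature worsened the fit by more than chance would explain, which is the sense in which a feature is called significant to the model.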
Results
Average and median testicular volumes, semen parameters and hormonal panels for the study population are listed in Table 1. For the purpose of describing the data set further, Table 2 lists the incidence of endocrinopathy in the study population. Over 90% of the population had a normal endocrine (eugonadotropic) evaluation. The most common endocrinopathy seen was germ cell failure, seen in 7.8%, followed by complete testicular failure (0.8%), hyperprolactinemia (0.4%), hypogonadotropic hypogonadism (not from hyperprolactinemia) (0.2%), Leydig cell tumor (0.2%) and androgen resistance (0.1%). These terms are defined in Table 3.
Table 1 - The average and median clinical, seminal and hormonal characteristics of the infertile men included in the study (n=1035).
Table 2 - The final diagnoses of the study subjects, with the great majority of men having no evidence of endocrinopathy.
Table 3 - Definitions of the diagnostic categories used to describe an endocrine cause for infertility in men.
Testis volume, sperm density and motility percentage data were compiled. To illustrate how these variables affect the likelihood of endocrinopathy, we generated decision boundary plots based on the neural network model (Figure 1). Figure 1a shows how the likelihood of endocrinopathy increases with decreasing sperm count and decreasing testis volume, while Figure 1b shows that endocrinopathy is less likely with higher motility values. Four models were constructed, and ROC areas were calculated. ROC AUC for each of the four models in the test set is shown in Table 4. The neural network produced the highest ROC AUC of 0.948. LR produced an ROC AUC close to that of the neural network at 0.925. Wilk's regression was performed on each variable as described above to determine the significance of each, and the P-values for each are shown in Table 5. With a P-value of 7.13 × 10⁻¹³, testis volume added significant predictive information to the model. Removal of testis volume during reverse regression analysis decreased the ROC AUC from 0.948 to 0.908, a value similar to that obtained by Sigman with an LR model employing only sperm count.2 Both QDFA and LDFA demonstrated less accuracy, with ROC AUC of 0.854 and 0.637, respectively. To determine whether these differences in ROC AUC were statistically significant, DeLong's test was employed.6 The P-value between LR and a four-hidden-node neural network was 0.23, not significant at the P<0.05 level. By contrast, the P-values for LDFA and QDFA, each compared with the neural network, were both <0.05, suggesting that the neural network was superior to these discriminant function models and that the difference was significant.
Figure 1.
(a) Effect of testicular volume and sperm count on likelihood of endocrinopathy, with motility kept constant at its mean of 45. Lighter gray indicates higher likelihood of endocrinopathy. The decision boundary, where the odds of endocrinopathy and nonendocrinopathy are equal, is indicated by the white line. The boundary would be slightly different at various values of motility. (b) Effect of motility and testis volume with the count kept constant at 10 million/ml, a value suggested as a cutoff for endocrine evaluation.2
Table 4 - The ROC areas under the curve for the four models used to predict endocrinopathy based on testicular volume, sperm count and sperm motility, with logistic regression and the neural network outperforming discriminant function analysis methods.
Table 5 - The relative importance of the input variables used to develop the neural network model with the P-value representing the likelihood that the variable is statistically important to the model's accuracy.
Discussion
Endocrinopathy is a relatively uncommon cause of male infertility, accounting for less than 3% of infertile men.2 Yet it remains an important diagnosis to consider despite its rarity, as it can be reversible and may signify a life-threatening condition. Reproductive endocrine abnormalities may be etiologic endocrinopathies, that is, endocrine disorders that cause infertility, or they may be endocrine abnormalities that reflect testicular dysfunction. Etiologic endocrine disorders, such as hyperprolactinemia due to a pituitary adenoma, may be treatable, allowing for an improvement in fertility. In contrast, endocrine abnormalities reflective of impaired testicular function, such as an isolated elevation of FSH, may not be directly treatable but have prognostic and diagnostic value. Thus, both types of endocrine disorders should be diagnosed during the evaluation of the infertile male.
In our prior investigation, endocrinopathy was rarely noted in men with a sperm density greater than 10 million/ml. For that reason, we suggested using a threshold of less than 10 million/ml as an indication for endocrine evaluation. Sperm count was the most significant single factor in predicting the presence of endocrinopathy. We reported an ROC AUC of 0.902 using sperm count alone.2 In the four models investigated in this study, the addition of testis volume and sperm motility percentage significantly improved the model's ability to predict an endocrinopathy, as evidenced by the higher ROC AUC shown in Table 4. The assumption made at the beginning of the investigation was that all three of these variables would be important to the model. However, given the P-value of sperm motility seen in Table 5, it is likely that motility is not a significant predictor of endocrinopathy in subfertile males.
We posit the following formula, based on our analysis of the data using an LR model with continuous variables, to calculate the probability of endocrinopathy:

where x is testis volume in ml, y is sperm count in million/ml, and z is motility percentage.
Finally, the probability of endocrinopathy can be computed as follows:

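How an LR model of this kind turns the three inputs into a probability can be sketched in Python, assuming the standard logistic regression form in which a linear score is passed through the logistic function. The fitted coefficients are not reproduced in this sketch, so they are left as parameters the caller must supply; the function and parameter names are hypothetical.

```python
import math


def endocrinopathy_probability(
    testis_volume_ml: float,            # x in the text
    sperm_count_million_per_ml: float,  # y in the text
    motility_percent: float,            # z in the text
    intercept: float,
    coef_volume: float,
    coef_count: float,
    coef_motility: float,
) -> float:
    """Probability from a logistic regression with caller-supplied coefficients.

    No coefficient values are assumed here; they must come from a fitted model.
    """
    score = (
        intercept
        + coef_volume * testis_volume_ml
        + coef_count * sperm_count_million_per_ml
        + coef_motility * motility_percent
    )
    # Evaluate the logistic function in a form that avoids overflow for large |score|.
    if score >= 0.0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_score = math.exp(score)
    return exp_score / (1.0 + exp_score)
```

The odds of endocrinopathy follow directly as p / (1 - p), which is the quantity a clinician would compare against a chosen threshold.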
A user-friendly version of the four-hidden-node neural network model can be accessed by clinicians at www.urocomp.org, and a screenshot of the model running in an Internet browser on the Windows XP operating system can be seen in Figure 2. We envision clinicians using this model to tailor the diagnostic algorithm to each individual patient. The model reports the odds of endocrinopathy over a wide range of testicular volumes and sperm counts, allowing each clinician to customize his or her own threshold values for ordering an endocrine evaluation.
Figure 2.
This model can be made available online via the Internet as well as on Palm and Pocket PC handheld devices.
Analyzing individual variables for significance may provide new insights into the pathophysiology of endocrinopathy. The reverse regression analysis revealed that testis volume appears to be predictive of an endocrinopathy (Table 5). This easily measured clinical variable may be an effective surrogate for sperm count in predicting endocrinopathy in the future, but further study is needed.
It is also interesting to note that sperm motility was not shown to be predictive of an endocrinopathy. With further investigation, motility might even be shown to be unrelated to the endocrine parameters investigated in this study. The method by which a technician determines motility in the laboratory may also play a role in its significance.
Conclusion
We developed a highly accurate neural network model to predict the presence of endocrinopathy in men undergoing an infertility evaluation. Sperm count as well as testis volume was found to be highly predictive of an endocrinopathy in the subfertile male. This model is available for clinical use on the World Wide Web at the following address: http://www.urocomp.org.
References
- Sharlip ID, Jarow JP, Belker AM, Lipshultz LI, Sigman M, Thomas AJ et al. Best practice policies for male infertility. Fertil Steril 2002; 77: 873–882.
- Sigman M, Jarow JP. Endocrine evaluation of infertile men. Urology 1997; 50: 659–664.
- Niederberger CS, Lipshultz LI, Lamb DJ. A neural network to analyze fertility data. Fertil Steril 1993; 60: 324–330.
- Lamb DJ, Niederberger CS. Artificial intelligence in medicine and male infertility. World J Urol 1993; 11: 129–136.
- Wickens TD. Elementary Signal Detection Theory. Oxford University Press: New York, 2002.
- DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44: 837–845.
- Golden RM. Mathematical Methods for Neural Network Analysis and Design. MIT Press: Cambridge, 1996.
Computational models for detection of endocrinopathy in subfertile males
International Journal of Impotence Research Original Article
