Introduction

Enzyme deficiencies along the pathway from ammonia to urea constitute the “urea cycle disorders” and result in hyperammonemic encephalopathy. Ammonia, after conversion to carbamylphosphate, normally enters the urea cycle via ornithine transcarbamylase (OTC). In OTC deficiency (OTCD), carbamylphosphate drains into the pyrimidine synthesis pathway and enhances production of orotate and orotidine. Orotic aciduria therefore sets OTCD apart from other urea cycle disorders (Bachmann and Colombo 1980).

OTCD is an X-chromosomal disorder with an estimated incidence of 1:14,000 (Brusilow and Horwich 2001). In a typical male patient, encephalopathy develops soon after birth, when placental clearance of ammonia has ended. Female carriers have variably reduced enzyme activity and are at risk during metabolic stress. Usually, however, the medical history of carriers is unremarkable (Maestri et al. 1998), and the excretion of orotate is only slightly increased (Bachmann and Colombo 1980).

Since male OTCD patients do not reproduce, Haldane’s (1935) equilibrium consideration predicts that about one-third of male patients carry a de novo mutation if mutation rates are equal in both sexes. However, the mutation rate is much higher in spermatogenesis than in oogenesis (Tuchman et al. 1995). Thus, the mothers of four-fifths rather than two-thirds of all male patients are carriers, while in four-fifths of heterozygous females the mutation occurred de novo (McCullough et al. 2000). The rate of detection of OTCD mutations is limited to 80% (Tuchman et al. 2002). Thus, there is a considerable fraction of OTCD families in which carriership in females is possible but uncertain.

Oxypurinol ribonucleotide, an allopurinol metabolite, enhances the excretion of orotate and orotidine by inhibiting orotidine-5′-monophosphate decarboxylase. Therefore, the “allopurinol test” has been proposed as a tool for the detection of OTCD carriers (Hauser et al. 1990; Maestri et al. 1998). Recent analyses have shown, however, that the reliability of this test is limited (Bonham et al. 1999; Oexle et al. 2002; Grunewald et al. 2004). The rate of type II errors (carriership despite a negative test) in mothers of male patients may be as high as 23–34% (Oexle et al. 2002; Grunewald et al. 2004). Other biochemical parameters, such as serum glutamine concentration, also do not definitively distinguish carriers from noncarriers. However, biochemical data can impart useful information. Using logistic regression and Bayesian analysis, I show here how these data can be used and how they can be combined with genetic information in estimating carrier risk in OTCD.

Combination of information

The joint probability P(C,D) of the two states C (e.g., “carrier”) and D is the product of the probability of one state times the conditional probability of the other. Thus, P(C,D) = P(D) P(C|D) = P(C) P(D|C) and, analogously, \( P(\bar{\mathrm{C}},\mathrm{D}) = P(\mathrm{D})\,P(\bar{\mathrm{C}}|\mathrm{D}) = P(\bar{\mathrm{C}})\,P(\mathrm{D}|\bar{\mathrm{C}}) \), where the state \( \bar{\mathrm{C}} \) (noncarrier) is the complement of state C. Dividing P(C,D) by \( P(\bar{\mathrm{C}},\mathrm{D}) \) and taking the logarithm yields

$$ \log\left(\frac{P(\mathrm{C}|\mathrm{D})}{P(\bar{\mathrm{C}}|\mathrm{D})}\right) = \log\left(\frac{P(\mathrm{D}|\mathrm{C})}{P(\mathrm{D}|\bar{\mathrm{C}})}\right) + \log\left(\frac{P(\mathrm{C})}{P(\bar{\mathrm{C}})}\right), $$
(1)

where \( P(\mathrm{C}) = 1 - P(\bar{\mathrm{C}}) \) is the prior probability of state C, \( P(\mathrm{C}|\mathrm{D}) = 1 - P(\bar{\mathrm{C}}|\mathrm{D}) \) is the posterior probability of state C given the observed data D, and \( \log(P(\mathrm{D}|\mathrm{C})/P(\mathrm{D}|\bar{\mathrm{C}})) \) is the logarithm of odds (lod).

The mutually independent data values D1 and D2 (independent given the carrier status) may both influence the probability of C. After information D1 has been taken into account, the prior probabilities P(C) and \( P(\bar{\mathrm{C}}) \) must be replaced by the posterior probabilities P(C|D1) and \( P(\bar{\mathrm{C}}|\mathrm{D}_1) = 1 - P(\mathrm{C}|\mathrm{D}_1) \), respectively. Applying Eq. 1 to D2 in the next step then gives

$$ \log\left(\frac{P(\mathrm{C}|(\mathrm{D}_1,\mathrm{D}_2))}{P(\bar{\mathrm{C}}|(\mathrm{D}_1,\mathrm{D}_2))}\right) = \log\left(\frac{P(\mathrm{D}_2|\mathrm{C})}{P(\mathrm{D}_2|\bar{\mathrm{C}})}\right) + \log\left(\frac{P(\mathrm{D}_1|\mathrm{C})}{P(\mathrm{D}_1|\bar{\mathrm{C}})}\right) + \log\left(\frac{P(\mathrm{C})}{P(\bar{\mathrm{C}})}\right). $$
(2)

The lods of mutually independent data D1,..., Dr may thus be added,

$$ \log\left(\frac{P((\mathrm{D}_1,\ldots,\mathrm{D}_r)|\mathrm{C})}{P((\mathrm{D}_1,\ldots,\mathrm{D}_r)|\bar{\mathrm{C}})}\right) = \sum_{i=1}^{r}\log\left(\frac{P(\mathrm{D}_i|\mathrm{C})}{P(\mathrm{D}_i|\bar{\mathrm{C}})}\right). $$
(3)
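Eqs. 1 to 3 can be illustrated with a minimal Python sketch, assuming the data are independent given carrier status: the prior is converted to log-odds, the natural-log lods are added, and the sum is transformed back into a probability. The function names are hypothetical; the second function only demonstrates that stepwise updating gives the same result as adding the lods.

```python
import math


def log_odds(p):
    """Natural logarithm of p / (1 - p)."""
    return math.log(p / (1.0 - p))


def prob_from_log_odds(z):
    """Inverse of log_odds: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def posterior(prior, likelihood_ratios):
    """Eqs. 1 and 3: posterior log-odds = prior log-odds + sum of lods.

    Each entry of likelihood_ratios is P(D_i|C) / P(D_i|not C); the D_i are
    assumed to be independent given carrier status.
    """
    z = log_odds(prior) + sum(math.log(lr) for lr in likelihood_ratios)
    return prob_from_log_odds(z)


def posterior_sequential(prior, likelihood_ratios):
    """Eq. 2 applied repeatedly: each posterior becomes the next prior."""
    p = prior
    for lr in likelihood_ratios:
        p = prob_from_log_odds(log_odds(p) + math.log(lr))
    return p
```

For example, a prior of 0.5 (odds 1) combined with a single likelihood ratio of 3 gives posterior odds of 3, i.e. a posterior probability of 0.75.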

Logistic regression

Logistic regression may be applied to describe how a random variable X influences a binary alternative. Consider a proband who, depending on X, has the probabilities P(C|X) and \( P(\bar{\mathrm{C}}|X) = 1 - P(\mathrm{C}|X) \) of being a carrier or not, respectively. In logistic regression the “logit”, i.e. the natural logarithm of \( P(\mathrm{C}|X)/P(\bar{\mathrm{C}}|X) \), is modelled as a linear function of X,

$$ \log _{{\text{e}}} {\left( {\frac{{P({\text{C}}|X)}} {{1 - P({\text{C}}|X)}}} \right)} = a_{0} + a_{1} X, $$
(4)

where a0 and a1 are constants. Exponentiating both sides and solving for P(C|X) gives P(C|X) as a function of X,

$$ P({\text{C}}|X) = \frac{{{\text{exp}}(a_{0} + a_{1} X)}} {{1 + {\text{exp}}(a_{0} + a_{1} X)}}, $$
(5)

with an S-shaped graph running from P(C|X) → 0 for small X to P(C|X) → 1 for large X (for a1 > 0). The coefficients a0 and a1 are estimated by fitting this curve to empirical data points derived from a group of probands whose carrier status is known, i.e. whose P(C|X) is 1 (obligate carriers) or 0 (noncarriers). The fitting procedure (maximum likelihood) treats the status of each proband as a Bernoulli outcome with the probability given by Eq. 5; the best fit is the one that yields the highest joint probability (likelihood L) of the observed statuses. Maximizing L with respect to a0 and a1, i.e. solving ∂L/∂a0 = 0 and ∂L/∂a1 = 0, yields the estimates of a0 and a1. For the calculation (Newton–Raphson method), commercial software (e.g. SAS/STAT release 8.1; SAS, Cary, NC) or free Internet programs (Pezzullo 2005) may be used.
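A minimal pure-Python sketch of the Newton–Raphson maximum-likelihood fit of Eq. 4 is given below. It is an illustration under stated assumptions, not the software used here: `fit_logistic` and `sigmoid` are hypothetical names, and the example data are invented and deliberately overlapping. With completely separated groups (every carrier value above every noncarrier value) the likelihood has no finite maximum, and the iteration will not converge.

```python
import math


def sigmoid(z):
    """Eq. 5, written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, tol=1e-10, max_iter=100):
    """Maximum-likelihood estimates of a0, a1 in logit P(C|X) = a0 + a1*X.

    ys are 1 (carrier) or 0 (noncarrier). Each Newton-Raphson step solves
    I(a) * delta = U(a), where U is the score (gradient of log L) and
    I is the information matrix (negative Hessian of log L).
    """
    a0 = a1 = 0.0
    for _ in range(max_iter):
        u0 = u1 = 0.0              # score components
        i00 = i01 = i11 = 0.0      # information matrix entries
        for x, y in zip(xs, ys):
            p = sigmoid(a0 + a1 * x)
            w = p * (1.0 - p)
            u0 += y - p
            u1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        if det <= 0.0:
            raise RuntimeError("singular information matrix")
        # Solve the 2 x 2 system for the Newton step
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        a0 += d0
        a1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            return a0, a1
    raise RuntimeError("no convergence, e.g. completely separated groups")


# Hypothetical, overlapping example data (NOT from the literature):
# x = a biochemical marker, y = 1 for obligate carriers, 0 for noncarriers.
xs = [1.4, 2.0, 2.5, 3.0, 3.5, 0.8, 1.0, 1.2, 1.6, 2.2]
ys = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
a0, a1 = fit_logistic(xs, ys)
```

At the returned estimates both score equations, i.e. the partial derivatives of log L with respect to a0 and a1, vanish to within rounding, which is the condition stated above.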

Logistic regression yields the posterior log-odds, from which the lod can be obtained, since the prior probability of carriership P(C) in the proband group is given by the fraction \( N_{\mathrm{C}}/N \) of carriers among the N probands on which the regression is based. With Eq. 1 the lod is

$$ \log_{\mathrm{e}}\left(\frac{P(X|\mathrm{C})}{P(X|\bar{\mathrm{C}})}\right) = \log_{\mathrm{e}}\left(\frac{P(\mathrm{C}|X)}{1 - P(\mathrm{C}|X)}\right) - \log_{\mathrm{e}}\left(\frac{N_{\mathrm{C}}}{N - N_{\mathrm{C}}}\right). $$
(6)

The last term is zero if half of the probands are carriers, i.e.

$$ {\text{log}}_{{\text{e}}} {\left( {\frac{{N_{{\text{C}}} }} {{N - N_{{\text{C}}} }}} \right)}{\text{ = log}}_{{\text{e}}} {\text{1 = 0}}{\text{.}} $$
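Eq. 6 amounts to a one-line correction. The Python sketch below (hypothetical function names) removes the prior log-odds of the regression sample from a fitted logit and then, as an assumed use case, re-applies Eq. 1 with a proband-specific prior P(C).

```python
import math


def lod_from_logit(logit, n_carriers, n_total):
    """Eq. 6: subtract the prior log-odds of the regression sample,
    whose carrier fraction is n_carriers / n_total."""
    return logit - math.log(n_carriers / (n_total - n_carriers))


def posterior_with_prior(logit, n_carriers, n_total, prior):
    """Re-apply Eq. 1 with a proband-specific prior P(C)."""
    z = lod_from_logit(logit, n_carriers, n_total) + math.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + math.exp(-z))
```

For a balanced sample (\( N_{\mathrm{C}} = N - N_{\mathrm{C}} \)) the correction term is zero and the logit is returned unchanged.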

Logistic regression of allopurinol test results

Only a few allopurinol test results of individual obligate OTCD carriers are available in the literature. Results of four obligate carriers and four individuals from the general population can be extracted from Sebesta et al. (1994). Peak orotidine excretions following 300 mg allopurinol were 14.6, 15.3, 23.1, and 36.4 mmol/(mol creatinine) in carriers (Table 3 in Sebesta et al. 1994) and 1.7, 3.9, 4.3, and 6.4 mmol/(mol creatinine) in noncarriers (Fig. 2 in Sebesta et al. 1994). The authors observed that their own results as well as the results of Hauser et al. (1990) were skewed slightly towards large values. They therefore proposed a logarithmic transformation. Thus, using the natural logarithms of the allopurinol test results of Sebesta et al. (1994), logistic regression yields

$$ \log _{{\text{e}}} {\left( {\frac{{P({\text{C}}|\log _{{\text{e}}} (x))}} {{1 - P({\text{C}}|\log _{{\text{e}}} (x))}}} \right)} = - 107.75 + 47.54\;\log _{{\text{e}}} (x){\text{,}} $$
(7)

where x is the peak orotidine excretion in mmol/(mol creatinine). Since \( N_{\mathrm{C}} = N - N_{\mathrm{C}} = 4 \), the last term of Eq. 6 is zero, and the same linear function also gives the lod.

Carriership and noncarriership are equally probable at x = 9.65; odds of 1:100 and 100:1 are reached at x = 8.76 and x = 10.63, respectively. The steepness of this regression curve results from the small and completely separated proband groups; for completely separated groups, the maximum-likelihood estimates of a0 and a1 are not finite, so any reported coefficients depend on when the iterative fit stops. This separation does not represent the real situation, however. Grunewald et al. (2004) observed a large overlap between carriers and noncarriers, with 2 of 22 carriers having negative results and 6 of 20 noncarriers having positive results. Unfortunately, these authors did not provide the individual test results.
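The threshold values quoted above follow from inverting Eq. 7. The short Python sketch below (hypothetical function names) reproduces them; it is meant only to check the arithmetic, not for risk estimation, since Eq. 7 rests on a small, unrepresentative sample.

```python
import math

# Coefficients of Eq. 7 (fitted to four carriers and four noncarriers).
A0, A1 = -107.75, 47.54


def carrier_probability(x):
    """P(C|x) from Eq. 7 for a balanced sample; x = peak orotidine
    excretion in mmol/(mol creatinine), x > 0."""
    z = A0 + A1 * math.log(x)
    return 1.0 / (1.0 + math.exp(-z))


def excretion_at_odds(odds):
    """Invert Eq. 7: the x at which P(C|x) / (1 - P(C|x)) equals `odds`."""
    return math.exp((math.log(odds) - A0) / A1)


for odds in (0.01, 1.0, 100.0):
    print(odds, round(excretion_at_odds(odds), 2))
```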

Deriving odds from normal distributions

Individual biochemical data, such as plasma glutamine concentrations, are not available for sufficiently large groups of OTCD carriers. Only means and standard deviations (SD) are reported in the literature. Nonetheless, the odds may be derived if an assumption regarding the data distributions is made. Here, I assume normal distributions, i.e. the probability density function,

$$ f(x) = \frac{1} {{\sigma {\left( {{\sqrt {2\pi } }} \right)}}}\;{\text{exp}}{\left( {\frac{{ - (x - \mu )^{2} }} {{2\sigma ^{2} }}} \right)}, $$
(8)

in carriers and noncarriers with the expected means \( \mu_{\mathrm{C}} \) and \( \mu_{\bar{\mathrm{C}}} \) and variances \( \sigma^2_{\mathrm{C}} \) and \( \sigma^2_{\bar{\mathrm{C}}} \), respectively. Using the probability density functions \( f_{\mathrm{C}}(x) \) and \( f_{\bar{\mathrm{C}}}(x) \) in carriers and noncarriers, respectively, the lod can be calculated as

$$ \log_{\mathrm{e}}\left(\frac{P(x|\mathrm{C})}{P(x|\bar{\mathrm{C}})}\right) = \log_{\mathrm{e}}\left(\frac{f_{\mathrm{C}}(x)}{f_{\bar{\mathrm{C}}}(x)}\right) = \log_{\mathrm{e}}\left(\frac{\sigma_{\bar{\mathrm{C}}}}{\sigma_{\mathrm{C}}}\right) + \frac{(x - \mu_{\bar{\mathrm{C}}})^2}{2\sigma_{\bar{\mathrm{C}}}^2} - \frac{(x - \mu_{\mathrm{C}})^2}{2\sigma_{\mathrm{C}}^2}. $$
(9)

The odds are 1 (and lod = 0) at the position x0 where \( P(x_0|\mathrm{C}) = P(x_0|\bar{\mathrm{C}}) \), i.e. where x has the same probability density in carriers (C) and in noncarriers (\( \bar{\mathrm{C}} \)). Setting Eq. 9 to zero thus yields a quadratic equation with the solutions

$$ x_0 = A \pm \sqrt{A^2 - \frac{\sigma_{\mathrm{C}}^2\mu_{\bar{\mathrm{C}}}^2 - \sigma_{\bar{\mathrm{C}}}^2\mu_{\mathrm{C}}^2 + 2\sigma_{\mathrm{C}}^2\sigma_{\bar{\mathrm{C}}}^2\log_{\mathrm{e}}(\sigma_{\bar{\mathrm{C}}}/\sigma_{\mathrm{C}})}{\sigma_{\mathrm{C}}^2 - \sigma_{\bar{\mathrm{C}}}^2}}, \quad A = \frac{\sigma_{\mathrm{C}}^2\mu_{\bar{\mathrm{C}}} - \sigma_{\bar{\mathrm{C}}}^2\mu_{\mathrm{C}}}{\sigma_{\mathrm{C}}^2 - \sigma_{\bar{\mathrm{C}}}^2}, $$
(10a)
$$ x_0 = \frac{\mu_{\mathrm{C}} + \mu_{\bar{\mathrm{C}}}}{2} \quad \text{if}\ \sigma_{\mathrm{C}} = \sigma_{\bar{\mathrm{C}}}. $$
(10b)

The first derivative at x0 yields the linear approximation in the vicinity of x0,

$$ \log_{\mathrm{e}}\left(\frac{P(x|\mathrm{C})}{P(x|\bar{\mathrm{C}})}\right) \approx (x - x_0)\,\frac{\partial}{\partial x}\log_{\mathrm{e}}\left(\frac{f_{\mathrm{C}}(x)}{f_{\bar{\mathrm{C}}}(x)}\right)\Bigg|_{x = x_0} = (x - x_0)\left(\frac{x_0 - \mu_{\bar{\mathrm{C}}}}{\sigma_{\bar{\mathrm{C}}}^2} - \frac{x_0 - \mu_{\mathrm{C}}}{\sigma_{\mathrm{C}}^2}\right) = b_0 + b_1 x, $$
(11)

where the slope b1 is the bracketed derivative evaluated at x0, and the intercept is b0 = −b1 x0.
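Eqs. 9, 10a/10b and 11 translate directly into a short Python sketch under the same normality assumption; the function names are hypothetical. It returns both roots of Eq. 10a, because for unequal variances the two normal densities always cross twice.

```python
import math


def lod_normal(x, mu_c, sd_c, mu_n, sd_n):
    """Eq. 9: log_e(f_C(x) / f_notC(x)) for two normal densities
    (carriers: mu_c, sd_c; noncarriers: mu_n, sd_n)."""
    return (math.log(sd_n / sd_c)
            + (x - mu_n) ** 2 / (2.0 * sd_n ** 2)
            - (x - mu_c) ** 2 / (2.0 * sd_c ** 2))


def crossing_points(mu_c, sd_c, mu_n, sd_n):
    """Eqs. 10a/10b: all x0 at which the two densities are equal."""
    vc, vn = sd_c ** 2, sd_n ** 2
    if vc == vn:                      # Eq. 10b (assumes mu_c != mu_n)
        return [(mu_c + mu_n) / 2.0]
    a = (vc * mu_n - vn * mu_c) / (vc - vn)
    b = (vc * mu_n ** 2 - vn * mu_c ** 2
         + 2.0 * vc * vn * math.log(sd_n / sd_c)) / (vc - vn)
    root = math.sqrt(a * a - b)       # real for unequal variances
    return sorted([a - root, a + root])


def linear_lod(x0, mu_c, sd_c, mu_n, sd_n):
    """Eq. 11: intercept b0 and slope b1 of the tangent to Eq. 9 at x0."""
    b1 = (x0 - mu_n) / sd_n ** 2 - (x0 - mu_c) / sd_c ** 2
    return -b1 * x0, b1


# Plasma glutamine (umol/l), Maestri et al. (1998): carriers 702 +/- 167,
# noncarriers 568 +/- 84.4.
low, high = crossing_points(702.0, 167.0, 568.0, 84.4)
b0, b1 = linear_lod(high, 702.0, 167.0, 568.0, 84.4)
```

With the glutamine values of Maestri et al. (1998), the upper root lies near 668 μmol/l, between the two means, and gives b0 ≈ −10.20 and b1 ≈ 0.0153; the second crossing lies near 376 μmol/l, far in the lower tail.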

Odds of the plasma glutamine concentration

Consider, for instance, the plasma glutamine concentrations in OTCD carriers (mean ± SD = 702 ± 167 μmol/l) and noncarriers (568 ± 84.4 μmol/l) as determined by Maestri et al. (1998). Using these values as estimates of μ and σ of the assumed normal distributions with probability density functions \( f_{\mathrm{C}}(x) \) and \( f_{\bar{\mathrm{C}}}(x) \), the linear approximation above (see Eqs. 10a and 11) yields

$$ \log_{\mathrm{e}}\left(\frac{P(x|\mathrm{C})}{P(x|\bar{\mathrm{C}})}\right) = b_0 + b_1 x = -10.20 + 0.015\,x, $$
(12)

in the vicinity of a plasma glutamine concentration of x0 = 668 μmol/l. The odds \( P(x|\mathrm{C})/P(x|\bar{\mathrm{C}}) \) equal the quotient \( P(\mathrm{C}|x)/P(\bar{\mathrm{C}}|x) = P(\mathrm{C}|x)/(1 - P(\mathrm{C}|x)) \) if the prior probability is 50%, i.e. if \( P(\mathrm{C}) = 0.5 = P(\bar{\mathrm{C}}) \) (see Eq. 1). The posterior probability is then P(C|x) = odds/(1 + odds). Figure 1 displays the probability density functions \( f_{\mathrm{C}}(x) \) and \( f_{\bar{\mathrm{C}}}(x) \) and, assuming P(C) = 0.5, the posterior probability P(C|x) as derived from Eq. 9 and as approximated by Eq. 12. The approximation deviates considerably for concentrations outside the interval 668 ± 70 μmol/l.
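The comparison shown in Fig. 1 can be reproduced numerically with a short Python sketch, assuming, as in the text, P(C) = 0.5 and the means and SDs of Maestri et al. (1998). `posterior_exact` uses Eq. 9 and `posterior_approx` the rounded coefficients of Eq. 12; both names are hypothetical.

```python
import math

MU_C, SD_C = 702.0, 167.0   # carriers, Maestri et al. (1998)
MU_N, SD_N = 568.0, 84.4    # noncarriers
B0, B1 = -10.20, 0.015      # rounded coefficients of Eq. 12


def posterior_exact(x):
    """P(C|x) for P(C) = 0.5, with the lod taken from Eq. 9."""
    lod = (math.log(SD_N / SD_C)
           + (x - MU_N) ** 2 / (2.0 * SD_N ** 2)
           - (x - MU_C) ** 2 / (2.0 * SD_C ** 2))
    return 1.0 / (1.0 + math.exp(-lod))


def posterior_approx(x):
    """P(C|x) for P(C) = 0.5, with the linear lod of Eq. 12."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))


for x in (500.0, 600.0, 668.0, 740.0, 800.0):
    print(x, round(posterior_exact(x), 3), round(posterior_approx(x), 3))
```

Because Eq. 12 uses rounded coefficients, its curve crosses 0.5 at 10.20/0.015 = 680 μmol/l rather than exactly at x0; the gap between the two curves widens with distance from x0.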

Fig. 1

Probability density functions \( f_{\mathrm{C}}(x) \) and \( f_{\bar{\mathrm{C}}}(x) \) of plasma glutamine concentration in ornithine transcarbamylase deficiency (OTCD) carriers and noncarriers, respectively, assuming normal distributions (mean ± SD values as determined by Maestri et al. 1998). Posterior carrier probability P(C|x) as derived from Eqs. 3 to 9, assuming a prior probability of 50%. Approximation \( P(\mathrm{C}|x)_{\mathrm{approx}} \) of this posterior probability as determined for the interval comprising the intersection of \( f_{\mathrm{C}}(x) \) and \( f_{\bar{\mathrm{C}}}(x) \) at x0 = 668 μmol/l (see Eq. 12)

Associated random variables

Biochemical variables, such as glutamine and ammonia concentrations, are not always independent. If two associated random variables X and Y are used in carrier risk calculation, the logistic regression must include both of them, resulting in

$$ \log _{{\text{e}}} {\left( {\frac{{P({\text{C}}|(X,Y))}} {{1 - P({\text{C}}|(X,Y))}}} \right)} = d_{0} + d_{1} X + d_{2} Y. $$
(13)

Equation 13 is identical to the sum of the single-variable regressions (Eq. 4) of the two variables if both variables are independent and each regression is performed on equal numbers of carriers and noncarriers, i.e. \( P(\mathrm{C}) = 0.5 = P(\bar{\mathrm{C}}) \), as follows immediately from the considerations above (see Eqs. 2 and 4).

Example

Consider the case of a woman who lost a son to OTCD. Molecular genetic examination of this son’s DNA was not performed. Analysis of the woman’s DNA does not reveal an OTCD gene mutation. She has a second son who is healthy. Her plasma glutamine concentration is 600 μmol/l, i.e. below the value where the likelihood ratio (odds) is 1 (see Eq. 12). What is her risk of being a carrier (“C”)? The prior probability P(C) in mothers of male OTCD patients is 0.8 (McCullough et al. 2000). In the present example, this probability is modified by three independent pieces of information, i.e. by (1) the nondetection “\( \bar{\mathrm{M}} \)” of a mutation in the mother’s DNA, (2) the existence “U” of an unaffected son, and (3) the plasma glutamine concentration x. Independence means that the detectability of a carrier’s mutation, her plasma glutamine level, and the probability that her son is unaffected do not influence each other. Equation 2 therefore yields

$$ \log_{\mathrm{e}}\left(\frac{P(\mathrm{C}|(\bar{\mathrm{M}},\mathrm{U},x))}{P(\bar{\mathrm{C}}|(\bar{\mathrm{M}},\mathrm{U},x))}\right) = \log_{\mathrm{e}}\left(\frac{P(\bar{\mathrm{M}}|\mathrm{C})}{P(\bar{\mathrm{M}}|\bar{\mathrm{C}})}\right) + \log_{\mathrm{e}}\left(\frac{P(\mathrm{U}|\mathrm{C})}{P(\mathrm{U}|\bar{\mathrm{C}})}\right) + \log_{\mathrm{e}}\left(\frac{P(x|\mathrm{C})}{P(x|\bar{\mathrm{C}})}\right) + \log_{\mathrm{e}}\left(\frac{P(\mathrm{C})}{P(\bar{\mathrm{C}})}\right), $$
(14)

where \( P(\bar{\mathrm{M}}|\mathrm{C}) \) is 0.2, i.e. 1 minus the mutation detection rate in OTCD of 80% indicated by Tuchman et al. (2002). \( P(\bar{\mathrm{M}}|\bar{\mathrm{C}}) \) is 1 since noncarriers do not have OTCD mutations. P(U|C), the probability that a carrier of this X-chromosomal disorder has an unaffected son, is 0.5, while \( P(\mathrm{U}|\bar{\mathrm{C}}) \) is almost 1 since the incidence of OTCD in the general population is low. Equation 12 yields \( \log_{\mathrm{e}}(P(x|\mathrm{C})/P(x|\bar{\mathrm{C}})) = -10.20 + 0.015 \times 600 = -1.20 \). With Eq. 14

$$ \log_{\mathrm{e}}\left(\frac{P(\mathrm{C}|(\bar{\mathrm{M}},\mathrm{U},x))}{1 - P(\mathrm{C}|(\bar{\mathrm{M}},\mathrm{U},x))}\right) = \log_{\mathrm{e}}\left(\frac{0.2}{1}\right) + \log_{\mathrm{e}}\left(\frac{0.5}{1}\right) - 1.20 + \log_{\mathrm{e}}\left(\frac{0.8}{0.2}\right) = -2.12, $$
(15)

so that the posterior probability is \( P(\mathrm{C}|(\bar{\mathrm{M}},\mathrm{U},600)) = e^{-2.12}/(1 + e^{-2.12}) = 0.11 \). Hence, by including the genetic and biochemical data, the carrier risk has been reduced from 80% to 11%.
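The arithmetic of Eqs. 14 and 15 can be checked with a few lines of Python. All numerical inputs are those given in the text; the variable names are illustrative only.

```python
import math


def log_odds(p):
    """Natural logarithm of p / (1 - p)."""
    return math.log(p / (1.0 - p))


prior = 0.8                   # mothers of male OTCD patients (McCullough et al. 2000)
lr_no_mutation = 0.2 / 1.0    # P(no mutation found | C) / P(no mutation found | not C)
lr_unaffected = 0.5 / 1.0     # P(unaffected son | C) / P(unaffected son | not C)
lod_glutamine = -10.20 + 0.015 * 600   # Eq. 12 at x = 600 umol/l

# Eq. 14: add the natural-log lods to the prior log-odds
z = (log_odds(prior) + math.log(lr_no_mutation)
     + math.log(lr_unaffected) + lod_glutamine)
p = 1.0 / (1.0 + math.exp(-z))  # back-transform to a probability
print(round(z, 2), round(p, 2))
```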

Discussion

Unambiguous diagnosis (or exclusion) of OTCD carriership cannot be derived from biochemical data since carriers and noncarriers overlap. This is true for plasma glutamine concentration (Fig. 1) as well as orotidine and orotate excretion after allopurinol load (Grunewald et al. 2004). Nonetheless, the biochemical data may be used in OTCD carrier risk estimation as I have shown here. Similar logistic regression analyses of biochemical data have been performed before in other disorders, e.g. logistic regression of the logarithm of creatine kinase activity in carriers and noncarriers of Duchenne muscular dystrophy (Percy et al. 1981, 1987).

The concrete quantitative results provided here should be considered as examples only, however. The proband groups used for logistic regression of the allopurinol test data were too small and did not sufficiently represent the overlap between carriers and noncarriers. Therefore, Eq. 7 should not be used in carrier risk estimation. It is meant as a suggestion of how one could analyse representative proband groups such as those tested by Grunewald et al. (2004) if individual data were available.

The odds derived for plasma glutamine concentration (Eq. 12) may be more realistic. However, the assumption that this concentration is normally distributed in carriers and noncarriers of OTCD needs to be validated. If the assumption does not hold, an unusually low glutamine concentration in a possible carrier, for instance, may yield an incorrectly low carrier probability when used according to Eqs. 12 and 14. Logistic regression of glutamine concentrations in individual probands would provide a more reliable basis. Even then, however, the caveat of Percy et al. (1987) would apply, which states that, in order to prevent misclassification of carriers as normal individuals, no logistic carrier probability should be lower than \( 1/N_{\mathrm{C}} \) (where \( N_{\mathrm{C}} \) is the number of obligate carriers on which the logistic regression was based).

Frequently, direct mutation detection or indirect marker analysis may indicate a carrier status sufficiently well. However, such analyses are not always definite. Indeed, only 80% of all OTCD gene mutations can be detected (Tuchman et al. 2002). Nonetheless, in order to achieve adequate carrier risk estimation, nondefinite information derived from molecular genetic analyses must be taken into account. How this may be done has been shown in the example above where the nondetection of an OTCD gene mutation in a possible carrier has been combined with pedigree information and biochemical data. Indirect genetic analysis using DNA markers produces odds that depend on the recombination probability between marker and gene. These odds could also be combined with other odds as shown in Eq. 14.

Pieces of information are not necessarily independent of each other. This certainly applies to biochemical data; indeed, allopurinol test results and plasma glutamine concentration in OTCD carriers may be correlated. To adequately account for different variables and their possible associations, multiple logistic regression should be performed (Percy et al. 1981; Pezzullo 2005). For that purpose, all variables of interest have to be measured in sufficiently large groups of individual carriers and noncarriers.