Introduction

Despite the emergence of newer interventions for bipolar disorder, lithium remains a first-line treatment in all phases of illness. US and international guidelines emphasize the importance of considering lithium treatment based on proven efficacy, understanding of long-term risks, and potential benefit in reducing liability for suicide (Goodwin et al, 2003; Yatham et al, 2005, 2013). To date, there is little evidence that any intervention is consistently superior to lithium, particularly for prevention of recurrence of mood episodes (BALANCE Investigators et al, 2010).

On the other hand, concerns persist about the long-term safety of lithium treatment, particularly renal toxicity. It is recognized that a subset of individuals treated with lithium will develop chronic renal failure attributable to interstitial fibrosis, which appears to be irreversible (Grunfeld and Rossier, 2009; Presne et al, 2003). The prevalence of lithium-associated chronic renal failure has been challenging to establish, with estimates ranging from 1.2% (Bendz et al, 2010) to 21% (Lepkifker et al, 2004), the latter in a cohort of individuals treated with lithium for at least 4 years. (Such estimates are further complicated by the range of definitions for renal insufficiency, although most adopt a threshold of glomerular filtration rate less than 60 ml/min per 1.73 m², or stage 3 chronic kidney disease (Shine et al, 2015).)

To date, very little is known about predictors of lithium-associated renal failure, other than age and/or longer treatment duration (Bocchetta et al, 2015; Presne et al, 2003). This lack of predictability likely contributes to underuse of lithium, because it makes it even more difficult to weigh benefits and risks for an individual patient. Some risk factors for nephropathy in general are well established. Among them is increased age (Coresh et al, 2007), at least in part because of decreased glomerular filtration rate (Weinstein and Anderson, 2010). Other major risk factors are diabetes mellitus and hypertension (Coresh et al, 2007), as well as medication interactions (Dennison et al, 2011). These latter risk factors have also been identified in small cohorts of lithium-treated patients (Lepkifker et al, 2004). In an effort to develop a clinically useful prediction tool to estimate risk for renal failure, as well as to identify potentially novel risk factors, we queried electronic health records (EHRs) to identify and analyze a large cohort of lithium-treated patients with and without renal failure.

Materials and methods

Overview and Definition of Cases

The present study contrasted individuals drawn from an EHR who did or did not develop new onset of renal insufficiency, defined as stage 3 chronic kidney disease, in the presence of documented lithium treatment. Patients with renal insufficiency following a lithium prescription were considered cases. Those with no history of renal insufficiency despite lithium prescription were considered controls.

Sociodemographic and clinical data were drawn from the Partners HealthCare EHR, which spans two large academic medical centers, Massachusetts General Hospital and Brigham and Women’s Hospital, in addition to community and specialty outpatient clinics. We identified any patients aged 18 years or older with at least one lithium prescription between 2006 and 2013 based on e-prescribing data.

Outcome Definition

Among individuals with lithium exposure, renal failure or insufficiency was identified based on either ICD9 code for acute renal failure (ICD-9 586.*) or estimated glomerular filtration rate decreasing below 60 ml/min and not subsequently improving. Estimated glomerular filtration rate is calculated from serum creatinine levels using the Modification of Diet in Renal Disease formula (Levey et al, 1999). Patients with a history of renal insufficiency before index lithium prescription were excluded. A data mart containing all clinical data was generated with the i2b2 server software (i2b2 v1.6, Boston, MA, USA; Murphy et al, 2007), a computational framework for managing human health data (Murphy et al, 2009, 2010). The Partners Institutional Review Board approved all aspects of this study (protocol 2011-P-002231). No informed consent was required, as this project is a retrospective health-care utilization/clinical study involving thousands of patients and multiple years of data in which consent could not feasibly be obtained from all subjects, in accordance with 45 CFR 46.116.
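The laboratory-based part of this outcome definition can be illustrated with a minimal sketch. It assumes the four-variable form of the Modification of Diet in Renal Disease equation re-expressed for standardized creatinine assays (coefficient 175); the text above does not state which variant was applied, so the coefficient is left as a parameter. The function names are hypothetical, and reading "not subsequently improving" as "no later value at or above 60" is likewise an assumption.

```python
from typing import List, Optional


def egfr_mdrd(creatinine_mg_dl: float, age_years: float, female: bool,
              black: bool, coefficient: float = 175.0) -> float:
    """Four-variable MDRD estimate of GFR in ml/min per 1.73 m^2.

    The coefficient 175 is an assumption (standardized-creatinine form);
    the originally published abbreviated equation used 186.
    """
    egfr = coefficient * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def first_sustained_decline(egfrs: List[float],
                            threshold: float = 60.0) -> Optional[int]:
    """Position of the first value below threshold that is never followed
    by a value at or above threshold; None if there is no such value.

    Values are assumed to be in chronological order.
    """
    for i, value in enumerate(egfrs):
        if value < threshold and all(v < threshold for v in egfrs[i:]):
            return i
    return None
```

For a chronologically ordered series such as 75, 58, 62, 55, 50, the transient dip to 58 is ignored and the sustained decline is dated to the fourth measurement.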

Identification of Controls Using Risk Set Sampling

For each case patient, the index date was defined as the first point in the patient’s health record with evidence of renal insufficiency. Each case patient was matched 1 : 3 to control patients using risk set sampling (Langholz and Goldstein, 1996). This sampling strategy randomly matches each case with control patients selected from the risk set, that is, all patients with a documented lithium exposure at the time of the renal insufficiency event of the case patient. For example, consider a case with a renal insufficiency event in March 2008: the risk set for this patient is all control patients with lithium exposure in March 2008. Matching was 1 : 3 rather than 1 : 1 in order to take advantage of the greater number of available controls to maximize statistical power.
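The sampling step described above might be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run: each patient is reduced to a hypothetical tuple of first exposure date, last exposure date, and renal event date; only patients who never have a renal event are eligible as controls; and controls are drawn independently for each case, so the same control may be matched to more than one case.

```python
import random
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (first exposure, last exposure, renal event or None)
Patient = Tuple[date, date, Optional[date]]


def sample_risk_sets(patients: Dict[str, Patient], k: int = 3,
                     seed: int = 0) -> Dict[str, List[str]]:
    """For each case, draw up to k controls exposed to lithium on the case's index date."""
    rng = random.Random(seed)
    matches: Dict[str, List[str]] = {}
    for case_id, (_, _, event) in patients.items():
        if event is None:
            continue  # not a case
        # Risk set: never-case patients whose exposure window covers the event date.
        risk_set = [
            pid for pid, (start, end, other_event) in patients.items()
            if other_event is None and start <= event <= end
        ]
        # Draw without replacement within this risk set (fewer than k if the set is small).
        matches[case_id] = rng.sample(risk_set, min(k, len(risk_set)))
    return matches
```

Each sampled control would then inherit the index date of its matched case, as described in the following paragraph.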

The control patient was assigned the same index date as its matched case for the purposes of contrasting time-varying variables. All cross-sectional clinical variables (eg, concomitant medications) other than lithium level were defined based on nearest available data at or before this visit. The Partners’ EHR includes only medication prescription, not confirmation of dispensation, for outpatient records by prior agreement between the hospital system and the pharmacy data provider. Clinical covariates were defined, including the presence or absence of a diagnosis of schizophrenia or schizoaffective disorder at any point, based upon ICD9 code and/or documentation in the problem list. Age-adjusted Charlson comorbidity index (Charlson et al, 1994), a validated measure of overall burden of medical illness, was also calculated.

Cohort Description and Model Development

As the aim of this analysis was to develop and validate a risk stratification tool, the cohort of cases and controls was randomly divided into a training data set, comprising two-thirds of the subjects, and a testing data set, comprising one-third. Randomization was stratified to ensure balanced representation of cases and controls in each data set. Primary analyses contrasted cases and controls on sociodemographic and clinical features in the training data set, using conditional logistic regression to calculate crude odds ratios for renal failure as well as adjusted odds ratios incorporating other potential predictor variables. (Odds ratios were calculated without adjustment, and then with adjustment for sociodemographic and clinical features identified in the training data set.) We examined the discrimination of the logistic regression model using 10-fold cross-validation in the training set, as well as in the full testing set. For discrimination, we examined area under the receiver operating characteristic curve (AUC) as well as sensitivity and specificity; the matched case–control design precludes reliable estimates of positive and negative predictive value. To estimate model calibration in the testing set, we examined Hosmer–Lemeshow goodness of fit and plotted calibration curves for risk quintiles.
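Two pieces of this workflow, the stratified two-thirds/one-third split and the AUC, can be sketched in a few lines. The sketch does not reproduce the conditional logistic regression itself; it assumes outcomes coded 1 for cases and 0 for controls and a risk score already produced by some fitted model. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_fraction: float = 2 / 3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split row indices into training and testing sets, separately within cases and controls."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for outcome in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == outcome]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control, counting ties as one half (pairwise form of the AUC)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of a few thousand subjects; a rank-based implementation would be preferable for larger data.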

Investigation of Medication-Specific Risk

We also examined possible medication-related features for association with renal insufficiency in the full cohort; they were not incorporated in risk models as they represent time-varying elements of treatment. These included lithium preparation (citrate, carbonate standard release, or carbonate sustained release), lithium dosing frequency (multiple daily dosing vs daily dosing), mean and most recent lithium level, and concomitant psychotropic medications (first-generation antipsychotics, second-generation antipsychotics, and newer antidepressants). For medication preparation and dosing, we utilized the most recent prescription before the index date (ie, when renal insufficiency was identified or during the matched exposure period in controls). For lithium level, we measured mean lithium level before index date, censoring the 90-day period before index date in case lithium level was influenced by early and unrecognized decline in renal function. In sensitivity analysis, we also examined most recent lithium level within 365 days but omitting the prior 90 days. To capture possible nonlinear effects of lithium level, values were categorized a priori as <0.6, 0.6–0.8, 0.8–1, and >1 mEq/l. (See Supplementary Methods for additional analyses including lithium toxicity.)
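The handling of lithium levels described above can be made concrete with a short sketch. It assumes levels arrive as (draw date, value in mEq/l) pairs, and it assigns boundary values to the higher-numbered interval except at 1.0 (0.6 falls in 0.6–0.8, 0.8 falls in 0.8–1, and only values above 1.0 fall in >1); the text does not specify boundary handling, so this is an assumption, as are the function names.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def mean_level_before_index(levels: List[Tuple[date, float]], index_date: date,
                            censor_days: int = 90) -> Optional[float]:
    """Mean of lithium levels drawn more than censor_days before the index date."""
    cutoff = index_date - timedelta(days=censor_days)
    eligible = [value for drawn, value in levels if drawn < cutoff]
    return sum(eligible) / len(eligible) if eligible else None


def level_category(level: float) -> str:
    """Assign a level (mEq/l) to one of the four a priori categories."""
    if level < 0.6:
        return "<0.6"
    if level < 0.8:
        return "0.6-0.8"
    if level <= 1.0:
        return "0.8-1"
    return ">1"
```

A patient with no level drawn outside the censored window returns None and would be excluded from the lithium-level analysis.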

Results

Cohort Description and Model Development

In all, we identified 1445 patients who met criteria for renal insufficiency during follow-up, and matched them 1 : 3 to 4306 lithium-treated controls using risk set sampling. Duration of documented lithium exposure ranged from 1 day to more than 9 years (median 178 days, mean 501 days). Median duration of documented lithium treatment based on e-prescribing among cases was 158 days (IQR=60–841) vs 180 days (IQR=60–539) for controls (z=−0.89; p=0.37). The full cohort was randomly split into a training (N=3834) and testing (N=1917) data set; characteristics of the two cohorts are presented in Supplementary Table 1.

In the training set, cases and controls differed significantly on nearly all baseline features, including age, sex distribution, and lifetime comorbidity (Table 1). Table 2 presents odds ratios for renal insufficiency adjusted for all other baseline clinical features. In regression models, terms significantly (adjusted p<0.01) associated with renal insufficiency included older age, female sex, history of smoking, history of hypertension, overall burden of medical comorbidity as measured by Charlson index, and lifetime diagnosis of schizophrenia or schizoaffective disorder. (In sensitivity analysis excluding ICD9-defined renal failure, results were essentially unchanged (not shown); likewise, incorporating duration of documented lithium treatment did not meaningfully change results (Supplementary Methods).)

Table 1 Comparison of Lithium-Associated Renal Failure Cases with Controls, Training Data Set
Table 2 Multiple Logistic Regression Model of Baseline Clinical and Demographic Features Associated with Renal Failure (N=3850)

The ability of the regression model to discriminate individuals who develop renal insufficiency was examined by comparing observed to predicted outcomes. In 10-fold cross-validation in the original training set, the resulting logistic model yielded an AUC of 0.81; likewise, in the independent testing data set, AUC was 0.81. In the testing data, 81% of subjects were correctly classified; sensitivity for renal insufficiency was 45% and specificity was 92%. With sensitivity constrained at 80%, specificity was 68%. The model was well calibrated in the testing set (Figure 1; Hosmer–Lemeshow χ² (3 df)=2.45; p=0.48), with 359/483 (74%) of renal insufficiency cases among the top two quintiles. Recognizing that Charlson score requires longitudinal data that may be unavailable and that insurance type may limit generalizability, we re-fit the model in the training set omitting those two variables (Supplementary Table 2); the resulting AUC was 0.81 in the testing set.
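The calibration statistic reported above can be sketched as follows, assuming the conventional Hosmer–Lemeshow construction in which subjects are ranked by predicted risk, divided into equal-sized groups (five for quintiles), and assigned degrees of freedom equal to the number of groups minus two. The p-value helper uses the closed form of the chi-square upper tail that holds only for 3 degrees of freedom. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def hosmer_lemeshow(outcomes: Sequence[int], predicted: Sequence[float],
                    groups: int = 5) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) across risk groups.

    Assumes every group has mean predicted risk strictly between 0 and 1.
    """
    ranked = sorted(zip(predicted, outcomes))
    n = len(ranked)
    statistic = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # predicted number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        statistic += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return statistic, groups - 2


def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
```

Applied to the reported statistic, chi2_sf_3df(2.45) evaluates to approximately 0.48, in agreement with the p-value given in the text.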

Figure 1 Calibration curve for renal failure model, testing data set. The curve compares the proportion of individuals in each risk quintile with observed renal failure, to the proportion predicted to develop renal failure.

Investigation of Medication-Specific Risk

We also examined whether concomitant psychotropic medications, or specific lithium preparation or dosing, might influence renal failure risk. As these are modifiable, time-varying elements of treatment that might moderate risk, they were examined in the full cohort rather than utilized for prediction. Table 3 shows crude (univariate) odds ratios for each element of treatment in the full cohort, as well as odds ratios for partially adjusted (for elements of the clinical model derived above) and fully adjusted (clinical, plus all other treatment variables) models. Significantly reduced risk was observed among individuals receiving once-daily lithium dosing, but not those receiving extended-release preparations. In fully adjusted models, treatment with standard antipsychotics was associated with increased renal failure risk, whereas newer antidepressants were associated with reduced risk. Notably, although univariate analyses suggested a protective effect for atypical antipsychotics, fully adjusted models indicate that this effect is likely to arise from confounding.

Table 3 Crude and Adjusted Odds Ratios for Association of Elements of Treatment, Including Lithium Preparation and Frequency as well as Concomitant Medications, with Renal Failure (N=5751)

Finally, mean lithium levels were also examined in regression models, excluding the 90 days before index visit. Mean levels were available for 2650 subjects, including 926 (35%) with subsequent renal failure. To allow for nonlinear effects, lithium level was binned in 0.2 mEq/l increments, with levels less than 0.6 compared with greater values. In univariate analyses, odds ratios for 0.6–0.8, 0.8–1.0, and >1.0 were 1.41 (95% CI=1.17–1.70), 1.98 (95% CI=1.58–2.48), and 2.23 (95% CI=1.55–3.20), respectively. In fully adjusted models, odds ratios were 1.42 (95% CI=1.14–1.77), 2.03 (95% CI=1.56–2.65), and 2.20 (95% CI=1.43–3.38), respectively. (Supratherapeutic lithium level was also significantly associated with risk; Supplementary Results.)
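For readers less familiar with how a crude odds ratio and its confidence interval are obtained, a minimal sketch follows. It uses the unmatched 2 × 2 table formula with a Wald interval on the log scale, which is a simplification: the estimates reported here derive from regression models on matched data, so this function would not reproduce them exactly. The function name and the counts in the usage note are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_wald(exposed_cases: int, exposed_controls: int,
                    unexposed_cases: int, unexposed_controls: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Wald confidence interval (95% when z = 1.96)."""
    odds_ratio = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    # Standard error of the log odds ratio from the four cell counts.
    se = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                   + 1 / unexposed_cases + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

With purely illustrative counts of 20 exposed cases, 10 exposed controls, 10 unexposed cases, and 20 unexposed controls, the odds ratio is 4.0, and the interval is symmetric around it on the log scale.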

Discussion

In this analysis of more than 5700 lithium-treated patients, including more than 1400 with renal insufficiency in the context of lithium treatment, we identified robust associations with risk. These include known general clinical risk factors for renal failure, including hypertension and diabetes, as well as novel ones, including at least one lifetime diagnosis of schizophrenia. A model incorporating sociodemographic and clinical features yielded an AUC of 0.81 in a testing cohort, suggesting the potential informativeness of integrating these features to stratify risk. For example, to achieve 80% sensitivity for renal failure (ie, 80% of those who develop renal failure would be predicted to develop renal failure), specificity is 68% (ie, 68% of those who do not develop renal failure would be predicted to not develop renal failure). Although these values cannot substitute for ongoing monitoring, they may at least assist in estimating relative degrees of risk.
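The trade-off between sensitivity and specificity described above amounts to choosing a cut-off on the predicted risk. A minimal sketch, assuming outcomes coded 1 for renal failure and 0 otherwise, and classifying a subject as high risk when the score is at or above the cut-off (function names are hypothetical):

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(labels: Sequence[int], scores: Sequence[float],
                              target: float = 0.80) -> float:
    """Highest observed score usable as a cut-off with sensitivity >= target."""
    for t in sorted(set(scores), reverse=True):
        sensitivity, _ = sensitivity_specificity(labels, scores, t)
        if sensitivity >= target:
            return t
    raise ValueError("no threshold reaches the target sensitivity")
```

Choosing the highest cut-off that still meets the sensitivity target keeps specificity as large as possible at that target.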

Dosing and Concomitant Medications as Risk Factors

We hypothesized that once-daily lithium, vs more frequent dosing, might be associated with lesser risk of renal toxicity. More than 20 years ago, another nephrotoxin, gentamicin, was shown to be safer when dosed once daily (Prins et al, 1993). The safety of single daily dosing for lithium has been suggested based on small cohorts and short-term investigations, but not examined in large cohorts (Carter et al, 2013). Here, we observed a significantly decreased risk of renal insufficiency among those receiving once-daily lithium, an effect independent of preparation; indeed, sustained vs standard release lithium was not associated with decreased risk. Absent randomized treatment assignment, we cannot exclude the possibility of confounding: clinicians may be more apt to prescribe lithium in divided doses to individuals with greater risk for nephrotoxicity. However, none of the known risk factors for renal disease included in our regression models explains the observed risk. Given that once-daily dosing may have other advantages in terms of toxicity, with no known impact on efficacy or pharmacokinetics (Abraham et al, 1992; Perry et al, 1981), and has the additional likely advantage of improved adherence (Bae et al, 2012; Laliberte et al, 2013), we would argue that once-daily dosing should represent the default approach to treatment.

We also report evidence of increased risk associated with use of typical antipsychotic medications, but not atypical antipsychotics, as well as a possible protective effect for newer antidepressants. Given the increasing use of antipsychotics among mood disorder patients, and the particular risks associated with second-generation antipsychotics, further study to understand or disprove these potential risk or protective factors will be important. One possible alternate explanation for the observed effects might be that treatments act as a proxy for unobserved illness severity or psychosis, although their persistence after incorporation of prior schizophrenia or schizoaffective diagnosis makes this somewhat less likely. Similarly, although this effect might be explained by older patients being more likely than younger patients to receive typical antipsychotics, the association persisted in spite of controlling for age. Conversely, the first-generation antipsychotic risk may explain the risk observed in individuals with a history of schizophrenia or schizoaffective disorder, if these individuals received treatment with such medications in the past.

Finally, we observed increasing risk with greater lithium trough levels, such that odds of renal failure are more than twofold greater among individuals with lithium levels of 0.8 mEq/l or greater. Our result adds to an abundant literature suggesting risks of dosing lithium beyond this level are significant, and without clear additional benefit compared with lower doses. (Supplementary analyses confirming elevated risk with even one lithium level exceeding 1.2 mEq/l may indicate another reason to aim for lower doses where feasible.) More generally, although the optimal clinical dose for efficacy remains disputed 25 years after the last large randomized trial to address this question (Gelenberg et al, 1989; Nolen and Weisler, 2013; Perlis et al, 2002), this finding underscores the importance of considering the long-term health and safety of a patient when making decisions about dosing. As with the other risks identified, only randomized trials can fully exclude the possibility of confounding.

Limitations

By design, we did not intend to estimate the absolute risk associated with lithium exposure, which is more appropriate to a cohort design (Shine et al, 2015), or to distinguish the etiology of renal failure, which may be multifactorial. The former remains a subject of debate, with most but not all studies finding a modest but nonzero increase in risk, particularly with long-term treatment (Bendz et al, 2010; Bocchetta et al, 2015; Lepkifker et al, 2004; Shine et al, 2015). The decision to focus on all-cause renal failure here recognizes that multiple factors may contribute to nephropathy, and that it may sometimes be difficult to distinguish etiology. Moreover, even in cases where histopathology strongly suggests a particular etiology, it is possible that lithium treatment may contribute to increased risk. Therefore, we elected to simply estimate risk for nephropathy in general among lithium-treated patients, recognizing that some of the identified risk factors will not be specific to lithium per se. Likewise, for concomitant diagnoses such as hypertension, it is possible that the treatments rather than the diagnosis per se contribute to the observed risk. Nonetheless, it seems advisable that clinicians be aware of all risk factors for kidney disease in the patients they treat with lithium.

A further important caveat in interpreting these results is the relatively short follow-up duration. Previous reports have examined long-term follow-up, in some cases up to 30 years (Lepkifker et al, 2004), although in cohorts an order of magnitude smaller; a recent investigation of lithium exposure vs unexposed individuals also included fewer exposed individuals (Shine et al, 2015). One consequence of short follow-up is that some controls may be misclassified, in that they might develop renal failure with longer observation. We note that given the relatively low rate of renal failure among lithium-treated patients, misclassification would be low even in an unscreened set of controls. Moreover, any remaining misclassification would be expected to bias associations toward the null; as such, it is possible that we underestimate the risk associated with some of the variables identified, but misclassification should not lead to inflation of risk in this context.

In aggregate, these results suggest that it is possible to stratify risk for renal failure among lithium-treated patients. We would emphasize that discrimination—and, in particular, specificity—is not sufficient to justify discontinuing or avoiding lithium treatment in higher-risk individuals. However, our model may be useful in identifying individuals requiring more frequent surveillance for renal function, and perhaps in developing strategies, including dosing strategies, to reduce nephropathy risk. In light of the marked efficacy of lithium for many bipolar disorder patients, approaches to making lithium treatment safer and more acceptable to patients merit further emphasis and investment of resources.

Funding and disclosure

RHP has served on scientific advisory boards or received consulting fees from Genomind LLC; Healthrageous; Pfizer; Perfect Health; Proteus Biomedical; PsyBrain; and RID Ventures. He receives royalties through Massachusetts General Hospital from Concordant Rater Systems (now UBC). Over the past 3 years JFR has held equity holdings in Medavante and PsyBrain. MO is a consultant to Eli Lilly, Inc. and Otsuka America. VMC, AMR, THM, AW, AC, and JWS report no competing interests. This work is supported in part by the National Institute of Mental Health and the National Human Genome Research Institute (P50 MH106933). The sponsor of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.