Systematic assessment of prescribed medications and short-term risk of myocardial infarction – a pharmacopeia-wide association study from Norway and Sweden

Wholesale, unbiased assessment of Scandinavian electronic health-care databases offers a unique opportunity to reveal potentially important, undiscovered drug side effects. We examined the short-term risk of acute myocardial infarction (AMI) associated with drugs prescribed in Norway or Sweden. We identified 24,584 and 97,068 AMI patients via the patient and cause-of-death registers and linked them to the prescription databases in Norway (2004–2014) and Sweden (2005–2014), respectively. A case-crossover design was used to compare, for each drug individually, the drugs dispensed 1–7 days before the date of AMI diagnosis with those dispensed in a 15–21-day control window, while controlling for the receipt of other drugs. A BOLASSO approach was used to select drugs that acutely either increase or decrease the apparent risk of AMI. We found 48 drugs to be associated with AMI in both countries. Some antithrombotics, antibiotics, opioid analgesics, adrenergics, proton-pump inhibitors, nitroglycerin, diazepam, metoclopramide and acetylcysteine were associated with a higher risk of AMI, whereas angiotensin-II antagonists, calcium-channel blockers, angiotensin-converting-enzyme inhibitors, serotonin-specific reuptake inhibitors, allopurinol, mometasone, metformin, simvastatin and levothyroxine were inversely associated. The results were generally robust across different sensitivity analyses. This study confirms previous findings for certain drugs. Based on known effects or indications, some other associations could be anticipated. However, the inverse associations of hydroxocobalamin, levothyroxine and mometasone were unexpected and need further investigation. This pharmacopeia-wide association study demonstrates the feasibility of a systematic, unbiased approach to pharmacological triggers of AMI and other diseases with acute, identifiable onsets.


Table S1
Relative risks of acute myocardial infarction within 7 days after the drug was dispensed. Univariable and BOLASSO results for drugs selected by BOLASSO approach in Norway.

Table S2
Relative risks of acute myocardial infarction within 7 days after the drug was dispensed. Univariable and BOLASSO results for drugs selected by BOLASSO approach in Sweden.

Table S3
Relative risks of acute myocardial infarction within 14 days after the drug was dispensed, selected by BOLASSO approach in both countries.

Table S4
Relative risks of acute myocardial infarction within 7 days after the drug was dispensed, selected by BOLASSO approach in both countries. AMI diagnosed in hospital only.

Table S5
Relative risks of acute myocardial infarction within 7 days after the drug was dispensed. Univariable results for drugs selected by the BOLASSO approach in both countries. Sub-group analyses according to age in Norway.

[Table: drug names with numbers of exposed patients in the case window and in the control window (columns: Drug names; Case window (exposed), N; Control window (exposed), N); the first listed drug class is beta-blocking agents.]
A The case-crossover design

A.1 The likelihood function under the case-crossover design

In our analysis, the subjects are the patients who experienced the event (cases), and each patient serves as both its own case and its own control, in different time periods. For the whole analysis, we used one case and one control window for each patient. Following [6], our design is a non-localizable and, more specifically, a unidirectional design, because the windows are placed after observing the event for each patient and not a priori. Assuming that events follow a proportional hazards model, and that events are rare, the hazard of the ith subject at time t is

$$\lambda_{it} = \lambda_{0it}\, e^{X_{it}\beta}, \qquad (1)$$

where $X_{it}$ is the exposure vector of subject i at time t and $\beta = (\beta_1, \ldots, \beta_p)$ is a vector of coefficients. The baseline function $\lambda_{0it}$ is assumed to vary among subjects, carrying characteristics of the ith subject such as age, or other characteristics which may change with time t.
In other words, Eq. (1) is the probability that the ith subject will experience the event exactly at time t, given its exposure vector and characteristics.
Assume first that the exposure of each subject is observed throughout the time set {1, 2, ..., T}. Then, by conditioning on the occurrence of exactly one event in {1, 2, ..., T}, and again assuming that events are rare, the contribution to the likelihood from the ith subject is ([8, 6])

$$L_i(\beta) = \frac{\lambda_{0it_i}\, e^{X_{it_i}\beta}}{\sum_{t=1}^{T} \lambda_{0it}\, e^{X_{it}\beta}}. \qquad (2)$$

The complete likelihood is then the product of these contributions, namely

$$L(\beta) = \prod_{i=1}^{N} \frac{\lambda_{0it_i}\, e^{X_{it_i}\beta}}{\sum_{t=1}^{T} \lambda_{0it}\, e^{X_{it}\beta}}. \qquad (3)$$

Since we do not know the exposure $X_{it}$ for all t ∈ {1, 2, ..., T}, but only in a referent window $W_i$, we follow Lumley and Levy [8] and replace the denominators in Eq. (3) by the simplification

$$\sum_{s \in W_i} \lambda_{0is}\, e^{X_{is}\beta},$$

where s runs only over the referent window. In our application, $W_i = \{t_i, s_i\}$, where $t_i$ and $s_i$ are the case and the control windows of the ith subject, respectively. The factors $\lambda_{0it_i}$ and $\lambda_{0is}$ cancel, because they are assumed to be (approximately) constant within the referent window. Thus, we replace Eq. (3) by

$$L_W(\beta) = \prod_{i=1}^{N} \frac{e^{X_{it_i}\beta}}{\sum_{s \in W_i} e^{X_{is}\beta}}. \qquad (4)$$

Here, $X_{it_i}$ is a vector of length p corresponding to the exposure to each drug in the case window, while $X_{is_i}$ is a vector of the same length corresponding to the exposures in the control window. Both vectors contain the values 0 and 1 for non-exposure and exposure, respectively.
It should be noted that Eq. (4) is not an actual likelihood because of the reduction of the sum in the denominator of (3). Thus, as noted by Lumley and Levy [8], the resulting estimates may be biased. This bias, called overlap bias, will however usually be small (see [8]).
On the other hand, Eq. (4) has the form of the likelihood of a conditional logistic regression (CLR) and can hence be analyzed using standard statistical packages that allow CLR. Note that the window $W_i$ in Eq. (4) consists of two time points, $t_i$ (case) and $s_i$ (control); more controls can be included by adding points to the window.

A.2 Generalized linear model
Our model can be represented differently in order to show that it is a generalized linear model of logistic type. Consider the matrices $X_t = (X_{1t_1}, \ldots, X_{Nt_N})^T$, corresponding to the case windows, and $X_s = (X_{1s_1}, \ldots, X_{Ns_N})^T$, corresponding to the control windows, both of dimension N × p, where $X_{it_i}$ and $X_{is_i}$ are defined above. Consider also the corresponding response vectors $Y_t = (Y_{1t_1}, \ldots, Y_{Nt_N})^T$ and $Y_s = (Y_{1s_1}, \ldots, Y_{Ns_N})^T$ for the cases and controls, respectively. These are both N-dimensional; the response vector for the cases contains only the element 1 and the response vector for the controls contains only the element 0, because we restrict attention to patients with exactly one event in the given time set. The likelihood L(β) can hence be written in the form

$$L(\beta) = \prod_{i=1}^{N} \frac{e^{X_i\beta}}{1 + e^{X_i\beta}}, \qquad (5)$$

where $X_i = X_{it_i} - X_{is_i}$; that is, we subtract the control vector from the case vector for each patient. As will be seen below, for a case-crossover design with one case and one control window, the likelihood can be written in the form of the unconditional likelihood for a binary logistic regression with no intercept and constant response equal to one (see Avalos et al. [3]). The fact that the response is constant and equal to one becomes clearer when we subtract the response vector $Y_s$ from $Y_t$. Note that the data matrix X now consists of the values −1, 0 and 1. Furthermore, for a specific component $\beta_j$ of the vector β, $e^{\beta_j}$ is the relative risk of the event if the jth component of $X_i$ is increased by 1, which means that the corresponding medicine is taken in the case window and not in the control window.
Letting $p_i$ denote the contribution of the ith individual to the likelihood in Eq. (5), that is,

$$p_i = \frac{1}{1 + e^{-X_i\beta}} = \frac{e^{X_i\beta}}{1 + e^{X_i\beta}},$$

the likelihood can be written in the form

$$L(\beta) = \prod_{i=1}^{N} p_i^{y_i} (1 - p_i)^{1 - y_i}, \qquad (6)$$

since $y_i = 1$ for all i. The log-likelihood is of the form

$$\ell(\beta) = \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right], \qquad (7)$$

which corresponds to the likelihood of logistic regression ([9, 10]). Finally, the log-likelihood function given in Eq. (7) is a concave function of the β coefficients. This can easily be seen by rewriting it in the form

$$\ell(\beta) = \sum_{i=1}^{N} \left[ y_i X_i\beta - \log\!\left(1 + e^{X_i\beta}\right) \right]$$

and noting that $-\log(1 + e^{x})$ is a concave function.
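The reduction above can be checked numerically. The sketch below (in Python rather than the R/glmnet setup used in the study, and on hypothetical toy data) fits the no-intercept logistic likelihood with constant response 1 on exposure differences for a single drug; here the score equation can be solved by hand, giving σ(β) = 2/3 and hence β = log 2, i.e. a relative risk of 2.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch (assumed toy data, not the study's): the
# case-crossover likelihood in Eq. (5) is that of a no-intercept
# logistic regression with constant response 1, fitted on the
# exposure differences X_i = X_case - X_control in {-1, 0, 1}.

X = np.array([[1.0], [1.0], [-1.0], [0.0]])  # one drug, four patients

def neg_log_lik(beta):
    # negative of Eq. (7) with y_i = 1 for all i
    xb = X @ beta
    return -np.sum(xb - np.log1p(np.exp(xb)))

beta_hat = minimize(neg_log_lik, np.zeros(1)).x
# Setting the score to zero by hand gives sigma(beta) = 2/3, so beta = log 2.
print(np.exp(beta_hat[0]))  # relative risk e^beta, close to 2
```

The printed value is the relative risk $e^{\hat\beta_j}$ discussed above: the drug appears in the case window but not the control window twice as often as the reverse, net of the tied (0) pattern.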

A.3 The lasso method and glmnet
In this section we describe how we estimated the coefficients in β using the lasso (least absolute shrinkage and selection operator) method ([11]). More specifically, we estimated the coefficients under the lasso penalty by minimizing the penalized negative log-likelihood based on Eq. (7), that is,

$$\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} |\beta_j| \right\}.$$

The estimation was done via a coordinate descent algorithm using the glmnet package ([5]) in the statistical software R. We chose the binomial family with logit link function in the settings of the glmnet function. Furthermore, λ was chosen by generating a sequence of 100 λ values and selecting the best one using 10-fold cross-validation. All of these functions are provided by the package.
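To make the penalized objective concrete, the following is a minimal proximal-gradient (ISTA) sketch of the same objective in Python, on simulated difference data. This is an illustrative substitute, not the coordinate-descent implementation of glmnet used in the study; the data, step size and λ values are assumptions.

```python
import numpy as np

# Minimal proximal-gradient (ISTA) sketch of the lasso-penalized
# logistic objective above, on simulated toy data (not the study's).

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.choice([-1.0, 0.0, 1.0], size=(n, p))  # exposure differences
y = np.ones(n)                                  # constant response

def grad_nll(beta):
    # gradient of the (averaged) negative log-likelihood
    p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return -X.T @ (y - p_hat) / n

def soft_threshold(z, t):
    # proximal operator of the L1 penalty
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logistic(lam, step=0.5, iters=2000):
    beta = np.zeros(p)
    for _ in range(iters):
        beta = soft_threshold(beta - step * grad_nll(beta), step * lam)
    return beta

print(lasso_logistic(lam=0.0))  # unpenalized fit
print(lasso_logistic(lam=1.0))  # strong penalty sets every coefficient to 0
```

The two calls illustrate the role of λ: with λ = 0 the fit is the plain maximum-likelihood estimate, while a sufficiently large λ shrinks every coefficient exactly to zero, which is the variable-selection behavior exploited by bolasso.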
The glmnet function for the binomial model requires a two-level response. In our case, however, the response is constant and equal to one. To overcome this problem and allow the use of the glmnet function, we added N rows of zeros to the data matrix X and N new observations, all with the value zero, to the response vector Y. We demonstrate below that this does not change the likelihood function; since the response vector now contains both zeros and ones, β can be estimated with the glmnet function.

To see why this claim is true, let j index the added rows of the extended data matrix X, while i indexes the rows of the original data matrix. Then $p_j = 1/(1 + e^{0}) = 1/2$, while $p_i = 1/(1 + e^{-X_i\beta})$ as before. Hence the likelihood in Eq. (6), using the extended data matrix, becomes

$$L_{\mathrm{ext}}(\beta) = \prod_{i=1}^{N} p_i \prod_{j=1}^{N} (1 - p_j) = \left(\tfrac{1}{2}\right)^{N} \prod_{i=1}^{N} p_i,$$

which equals the original likelihood in Eq. (6) multiplied by a constant independent of the parameters. Hence, the estimated coefficients are not affected by the modification. We chose N additional observations with zero response because, as discussed in Appendix A.4, we use bootstrapping to sample from the data matrix X, and we found that N extra observations are enough to ensure both 0 and 1 responses in each bootstrap sample. If we instead added, say, only one extra observation, some bootstrap samples would most probably contain only 1 as a response, and the glmnet function would not work. As already noted, however, the actual number of extra observations with zero response generally does not affect the estimated coefficients.
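The invariance of the estimates under zero-padding can be verified directly. The sketch below (Python with assumed toy data, standing in for the R/glmnet workflow) fits the logistic likelihood once on the original all-ones response and once on the padded data, and obtains the same coefficient.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the zero-padding trick described above, on assumed toy
# data: appending N all-zero rows with response 0 multiplies the
# likelihood by (1/2)^N, leaving the minimizer unchanged.

X = np.array([[1.0], [1.0], [-1.0], [0.0]])  # difference vectors, response 1
N = X.shape[0]
X_ext = np.vstack([X, np.zeros((N, 1))])     # N padded rows of zeros
y_ext = np.concatenate([np.ones(N), np.zeros(N)])

def nll(beta, X, y):
    # generic no-intercept logistic negative log-likelihood
    xb = X @ beta
    return -np.sum(y * xb - np.log1p(np.exp(xb)))

b_orig = minimize(nll, np.zeros(1), args=(X, np.ones(N))).x
b_pad = minimize(nll, np.zeros(1), args=(X_ext, y_ext)).x
print(b_orig[0], b_pad[0])  # equal up to optimizer tolerance
```

Each padded row contributes the constant term $-\log(1/2)$ to the negative log-likelihood, so the gradient, and hence the minimizer, is unchanged.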

A.4 Bolasso
The lasso sometimes allows a few irrelevant variables to enter the model ([3]). Further, according to Tibshirani [11], standard errors for the lasso estimates are difficult to obtain. Both problems can be reduced by bootstrapping. Tibshirani [11], Avalos et al. [3] and Avalos et al. [2] suggested a bootstrap version of the lasso, called bolasso, which reduces the uncertainty of the lasso estimates. Bolasso draws many bootstrap samples and estimates the parameters for each sample using the lasso approach.
In our analysis we ran a bootstrap on the lasso model described in Section A.3, using 1000 bootstrap samples. For each bootstrap sample, a new λ sequence was generated by the glmnet package and the optimal λ was chosen using 10-fold cross-validation. The model was then fitted by glmnet using the current bootstrap sample.
The β coefficients frequently selected by the lasso across the bootstrap samples are retained, while the others are set to zero [3]. Avalos et al. [3] suggested using Akaike's information criterion ([1]), $\mathrm{AIC} = -2\ell(\hat\beta) + 2k$, for estimating the frequency threshold of bolasso. Here $\ell(\hat\beta)$ is the log-likelihood evaluated at the estimated coefficients of each candidate model, and k is the total number of non-zero elements of the β vector. The AIC thus provides an estimate of the allowed number of times that a coefficient may have been set to zero among the bootstraps. We compute the AIC for the different candidate models and choose the model with the lowest AIC value.
The output from the bolasso in our analysis is a matrix of dimension p × B. Each column of the matrix corresponds to one bootstrap replicate, that is, the estimated vector of coefficients from that bootstrap sample; each row corresponds to the values that a specific coefficient took across the bootstrap samples. For each row of the matrix we counted the number of zeros, producing a frequency vector of length p. For each value in the frequency vector we created a model by checking which coefficients had been set to zero fewer times than the chosen threshold. For those coefficients, their mean across the bootstraps was taken as the estimate, that is,

$$\hat{\beta}_j = \frac{1}{B} \sum_{b=1}^{B} \beta^{*}_{bj},$$

where $\beta^{*}_{bj}$ is the bth bootstrap estimate of coefficient $\beta_j$ and B is the total number of bootstrap replicates ([4]); the other coefficients are simply set to zero. Next, the AIC was computed for the corresponding model, with k equal to the number of non-zero estimates in the newly computed β vector. Finally, this is done for all thresholds in the sequence, and the threshold giving the lowest AIC is chosen as optimal. For that optimal threshold we again check which coefficients were set to zero fewer times than the threshold and compute $\hat{\beta}_j$ as before, setting the others to zero. This vector of coefficients is the bolasso estimate of the coefficients for the objective function ([3]).
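The thresholding-and-averaging step can be sketched in a few lines. The example below (Python, on an assumed hand-made p × B coefficient matrix rather than the study's 1000-replicate output) counts zeros per row, keeps coefficients selected often enough, and averages them across bootstraps.

```python
import numpy as np

# Sketch of the bolasso thresholding-and-averaging step on an
# assumed p x B matrix of bootstrap lasso estimates
# (p = 3 coefficients, B = 4 bootstrap replicates).

coefs = np.array([
    [0.9, 1.1, 1.0, 0.8],   # selected in every bootstrap
    [0.0, 0.2, 0.0, 0.0],   # selected only once
    [0.0, 0.0, 0.0, 0.0],   # never selected
])
p, B = coefs.shape

zero_counts = (coefs == 0.0).sum(axis=1)  # times each coefficient was zero
threshold = 2                              # example frequency threshold

beta_hat = np.where(
    zero_counts < threshold,               # kept coefficients ...
    coefs.mean(axis=1),                    # ... get their bootstrap mean
    0.0,                                   # the others are set to zero
)
print(zero_counts)  # [0 3 4]
print(beta_hat)     # [0.95 0.   0.  ]
```

In the full procedure this computation is repeated for every candidate threshold, the AIC is evaluated for each resulting model, and the threshold with the lowest AIC is kept.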
Finally, we can compute the standard error of each coefficient as

$$\widehat{SD}_j = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left( \beta^{*}_{bj} - \hat{\beta}_j \right)^2},$$

as well as 95% confidence intervals based on the central limit theorem, $(\hat{\beta}_j - 1.96\,\widehat{SD}_j,\; \hat{\beta}_j + 1.96\,\widehat{SD}_j)$ ([4]). However, according to Lockhart et al. [7], confidence intervals obtained from bootstrap estimates are not optimal for the lasso.
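As a worked example, the bootstrap standard error and normal-approximation interval can be computed directly from a row of the coefficient matrix. The values below are assumed for illustration, not taken from the study.

```python
import numpy as np

# Bootstrap standard error and normal-approximation 95% CI for one
# coefficient, computed from an assumed vector of bootstrap estimates.

boot = np.array([0.9, 1.1, 1.0, 0.8])  # beta*_{bj} over B = 4 replicates
beta_hat = boot.mean()                  # bolasso point estimate
sd_hat = boot.std(ddof=1)               # 1/(B-1) inside the square root
ci = (beta_hat - 1.96 * sd_hat, beta_hat + 1.96 * sd_hat)
print(beta_hat, sd_hat, ci)
```

Note `ddof=1`, which matches the B − 1 denominator in the formula above; NumPy's default `ddof=0` would divide by B instead.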