Development of A Machine Learning Algorithm to Classify Drugs Of Unknown Fetal Effect

Boland, Mary Regina; Polubriaginof, Fernanda; Tatonetti, Nicholas P.

doi:10.1038/s41598-017-12943-x

Article
Open access
Published: 09 October 2017

Scientific Reports volume 7, Article number: 12839 (2017)


Abstract

Many drugs commonly prescribed during pregnancy lack a fetal safety recommendation – called FDA ‘category C’ drugs. This study aims to classify these drugs into harmful and safe categories using knowledge gained from chemoinformatics (i.e., pharmacological similarity with drugs of known fetal effect) and empirical data (i.e., derived from Electronic Health Records). Our fetal loss cohort contains 14,922 affected and 33,043 unaffected pregnancies and our congenital anomalies cohort contains 5,658 affected and 31,240 unaffected infants. We trained a random forest to classify drugs of unknown pregnancy class into harmful or safe categories, focusing on two distinct outcomes: fetal loss and congenital anomalies. Our models achieved an out-of-bag accuracy of 91% for fetal loss and 87% for congenital anomalies outperforming null models. Fifty-seven ‘category C’ medications were classified as harmful for fetal loss and eleven for congenital anomalies. This includes medications with documented harmful effects, including naproxen, ibuprofen and rubella live vaccine. We also identified several novel drugs, e.g., haloperidol, that increased the risk of fetal loss. Our approach provides important information on the harmfulness of ‘category C’ drugs. This is needed, as no FDA recommendation exists for these drugs’ fetal safety.

Introduction

In the late 1950s thalidomide, an approved sedative, was promoted as a new modern treatment for morning sickness¹. Thousands of pregnant women began taking the drug resulting in a dramatic increase in spontaneous abortions (i.e., ‘miscarriages’), and congenital abnormalities². By mid-1961 it became clear that thalidomide was the culprit behind the observed increase in malformations. This led to the drug’s removal from the market³ and a permanent usage ban among women who may become pregnant. Afterwards, stringent guidelines were implemented for drugs targeted at pregnant females.

Over the years the number of medications taken by pregnant women has grown. Concern over this ‘epidemic of prescribing’ among pregnant women began in the 1970s⁴. A Danish study found that 44.2% of women received prescriptions for at least one medication during pregnancy⁵. Anti-inflammatory drugs were commonly prescribed medications in pregnancy despite studies showing increased risk of miscarriage or fetal loss⁶. However, in many cases the effects that specific pharmacologics have on fetal outcome remain unknown. The Food and Drug Administration (FDA) lists pharmacological drugs with unknown fetal outcomes as category C (‘risk not ruled out’) while those with known teratogenic properties (such as thalidomide) are listed as category X (‘contraindicated in pregnancy’). An estimated 37.8% of pregnant women on medications received at least one FDA category C drug⁷ without having clear guidance over the potential fetal risks these medications incur. Therefore, detailed study of these enigmatic drugs is greatly needed.

Many pediatric-based research networks exist, including PEDSnet⁸. Unfortunately, these large pediatric-based clinical data research networks are insufficient for investigating the effect of maternal drug exposure on the developing fetus as they lack linked maternal-fetal records. At the same time, traditional methods that utilize post-market reporting systems to identify agents responsible for fetal anomalies and/or loss are limited, especially with regard to sample size (e.g., parents must report the anomaly to a registry)⁹. Further, many studies focus on a drug’s effect on fetal development among drugs such as doxycycline (a category D drug that is known to be harmful during fetal development)¹⁰. Recruiting pregnant women for participation in prospective trials is challenging even for non-pharmacological interventions, which often results in small sample sizes that may be underpowered to assess fetal risk¹¹,¹². EHRs were used previously to study birth-related effects¹³ with machine learning algorithms showing promise¹⁴,¹⁵. Many birth-related elements are available within EHRs even if sometimes access is limited¹⁶. Therefore, an EHR system containing linked maternal and fetal information would be the ideal dataset for an algorithm that classifies FDA category C drugs (i.e., drugs with unknown fetal effect) into harmful and safe bins.

This study aims to systematically investigate fetal outcomes, both fetal loss and congenital anomalies, following pharmacological exposure to category C drugs. This will provide both pharmacologists and physicians with a much-needed initial classification of these ‘unknown fetal effect’ drugs. Because fetal loss and congenital anomalies are two distinct outcomes, we perform two separate retrospective cohort studies.

Results

Clinical Cohorts

We extracted females with live-born births at Columbia University Medical Center (CUMC) - New York Presbyterian Hospital (NYPH) or CUMC-NYPH where data on maternal drug exposure was captured in the Electronic Health Record (EHR) system. This means that the female had at least one prescription recorded in the EHR within a 1.3-year period prior to the child’s birthdate. Infants with congenital anomalies were identified as those having a diagnosis within 90 days of life. The resulting dataset contained 31,240 pregnancies resulting in a live birth without a congenital anomaly and 5,658 pregnancies with a congenital anomaly (either major or minor). This cohort is referred to as the ‘congenital anomaly’ cohort while the cohort containing the subset with minor anomalies is referred to as the ‘minor congenital anomaly’ cohort. Of pregnancies with a recorded anomaly, 1,588 had a minor anomaly. Demographics of all pregnant females in both cohorts are given in Table 1. We obtained approval for this study from CUMC’s Institutional Review Board.

Table 1 Demographics of Pregnant Females Included in Study

For the ‘fetal loss’ cohort, we selected patients with any recorded fetal loss/death as indicated by a diagnosis within the International Classification of Diseases, 9th edition (ICD-9) range 630–639. For controls, patients were selected with no prior fetal loss in their records and having at least one single live birth recorded at NYPH. This resulted in a dataset of 14,922 pregnancies with fetal loss and 33,043 pregnancies without fetal loss. The most frequent fetal loss diagnoses are provided in Table S1.

Pharmacological Drug Dataset

The FDA pregnancy categories and their descriptions along with the distinct number of drugs belonging to each are given in Table S2. The most common category was category ‘C: Risk Not Ruled Out’ followed by the lower risk category ‘B: No Risk in Other Studies’. We also extracted the ATC first-level categories for all distinct drugs included in our analysis. The most common categories were ‘Alimentary Tract and Metabolism’ followed by ‘Dermatologicals’ (Table S3). Drugs that were commonly prescribed with legal termination were identified as these could bias our fetal loss results. Supplementary Dataset 1 contains a list of all drugs where at least 2% of women were first prescribed the drug the same day as a legal termination. These are ‘drugs typically prescribed with legal termination’. Two drugs used in chemical abortions: Mifepristone (200 MG) and Misoprostol (0.2 MG)¹⁷ were commonly first prescribed to women at CUMC-NYPH with legal termination (15.1% and 14.6% respectively).
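The 2% rule for flagging ‘drugs typically prescribed with legal termination’ can be illustrated with a minimal sketch. The record layout, function name and example data below are hypothetical and are not taken from the study; only the 2% threshold and the same-day criterion come from the text.

```python
from collections import defaultdict

def termination_linked_drugs(first_rx, termination_dates, threshold=0.02):
    """Return {drug: fraction} for drugs whose first prescription fell on the
    same day as a legal termination in at least `threshold` of exposed women.

    first_rx: iterable of (patient_id, drug, first_prescription_date)
    termination_dates: dict mapping patient_id -> set of termination dates
    (both layouts are assumptions made for this illustration)
    """
    exposed = defaultdict(int)
    same_day = defaultdict(int)
    for patient, drug, rx_date in first_rx:
        exposed[drug] += 1
        if rx_date in termination_dates.get(patient, set()):
            same_day[drug] += 1
    flagged = {}
    for drug, n in exposed.items():
        fraction = same_day[drug] / n
        if fraction >= threshold:
            flagged[drug] = fraction
    return flagged

# Toy data: dates are ISO strings, drug names are placeholders.
example_rx = [
    ("p1", "drug_a", "2010-03-01"),
    ("p2", "drug_a", "2010-04-02"),
    ("p3", "drug_a", "2010-05-03"),
    ("p4", "drug_a", "2010-06-04"),
    ("p1", "drug_b", "2010-01-15"),
    ("p2", "drug_b", "2010-02-16"),
]
example_terminations = {"p1": {"2010-03-01"}}
flagged_example = termination_linked_drugs(example_rx, example_terminations)
```

In this toy input one of four women first received `drug_a` on the day of a termination, so it is flagged, while `drug_b` is not.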

Classifying FDA Category C Drugs As ‘Harmful’ or ‘Safe’

Logistic Regression

We constructed a logistic regression model with a binary outcome variable representing a not-known-to-be-harmful pregnancy classification (FDA category A or B), hereafter referred to as ‘safe’ versus a severe pregnancy classification (FDA category D or X), hereafter referred to as ‘harmful’. For both congenital anomaly models (i.e., all anomalies, and minor anomalies), we added all possible features that could inform the model. The odds ratios along with their 95% confidence intervals (CIs) are shown (Figure S1).
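The binary outcome described above can be sketched as a simple mapping from FDA category to training label, with category C drugs held out as unlabeled. This is an illustrative sketch only: the function name and dictionary layout are assumptions, and `example_drug_b` is a placeholder rather than a drug from the study.

```python
# Mapping stated in the text: A or B -> 'safe', D or X -> 'harmful'.
LABELS = {"A": "safe", "B": "safe", "D": "harmful", "X": "harmful"}

def split_by_label(drug_categories):
    """Split drugs into labeled training examples and unlabeled category C drugs.

    drug_categories: dict mapping drug name -> FDA pregnancy category letter.
    Returns (labeled, unlabeled), where labeled maps drug -> 'safe'/'harmful'
    and unlabeled is a list of category C drug names.
    """
    labeled, unlabeled = {}, []
    for drug, category in drug_categories.items():
        if category in LABELS:
            labeled[drug] = LABELS[category]
        elif category == "C":
            unlabeled.append(drug)
        else:
            raise ValueError(f"unexpected FDA category {category!r} for {drug}")
    return labeled, unlabeled

example_labeled, example_unlabeled = split_by_label({
    "thalidomide": "X",      # category X per the Introduction
    "doxycycline": "D",      # category D per the Introduction
    "example_drug_b": "B",   # placeholder, not from the study
    "haloperidol": "C",      # category C per the Discussion
})
```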

Random Forest Classification of Category C Drugs

A random forest model was built using the proportion with anomaly (for the congenital anomaly cohort) or the proportion with fetal loss (for the fetal loss cohort) at each trimester of exposure. The model was run with 1000 trees and we constructed a multi-dimensional scaling (MDS) component plot to illustrate the separation among drugs achieved using only the proportion with anomaly/fetal loss. Fifty-seven medications were classified as harmful and 206 safe in the fetal loss cohort. Eleven medications were classified as harmful and 181 safe in the congenital anomalies cohort. Figure 1 shows the separation between the known-harmful drugs (category D or X) in bright red from the safe drugs in light blue (category A or B). The separation between the harmful and safe drugs is most evident for the fetal loss cohort. We also separated out drugs that are used in legal termination to show where in the various plots those drugs appear. In most cases drugs prescribed during legal terminations cluster with known harmful (category D or X) drugs.

We observed a clear relationship between the first MDS component and the proportion of women experiencing fetal loss following first trimester drug exposure (Fig. 2). Some of this effect could be due to legal termination, so we identify those drugs empirically determined to be involved in legal termination procedures. This showed that the relationship was not solely due to drugs involved in legal termination. We also investigated the relationship between the MDS scaling components and proportions of offspring with a congenital anomaly across all three trimesters, not just the first trimester. These are shown in Figures S2–S7.

Next we ran the random forest model with all potentially informative features using 2000 trees. Each feature’s contribution to the model’s performance was assessed using Mean Decrease in Accuracy (MDA). Features with high MDA are more important in contributing to the model’s performance. We found that the number of individuals exposed to a drug at a given trimester was highly informative in predicting whether a drug was harmful or safe (Figure S8). Interestingly, the proportion born with an anomaly following maternal drug exposure during a given trimester was not as informative in the model because the known FDA class affects the exposure pattern.

Certain ATC drug classes were found to be very informative in the model (Figure S8). These include nervous system drugs (ATC: N), systemic hormonal preparations excluding sex hormones (ATC: H), anti-neoplastic and immune-modulating agents (ATC: L), genito-urinary system and sex hormones (ATC: G) and respiratory system (ATC: R) drugs. The ordering of the specific categories’ importance varied by model, with nervous system drugs being the most informative in both the congenital anomalies (major and minor) and the minor anomalies only models.

Importantly, a binary indicator variable for whether a drug affected a vitamin-related gene (from DisGeNET) was consistently more informative than a drug being a prenatal supplement/mineral/vitamin (Figure S8). Additionally, knowing whether a drug affected or inhibited a Mendelian gene was more informative than knowing whether a drug affected a vitamin-related gene. This indicates the importance of drugs’ Mendelian gene inhibition status.

The out-of-bag (OOB) estimated error rate was 9.36% for the fetal loss model (containing 235/499 drugs with known non-C FDA class), and 12.90% for both the congenital anomalies model (containing 186/378 drugs with known non-C FDA class) and the minor anomalies model (also containing 186/378 drugs with known non-C FDA class). The estimated accuracy was 90.64% for the fetal loss model, and 87.10% for both anomalies models. The null accuracy was 71.06% for fetal loss and 75.27% for congenital anomalies. Our models outperformed the null with p-values of 4.95 × 10⁻⁴ and 0.0465 respectively. Supplementary Datasets 2–4 contain the prediction results for all category C drugs along with features (one dataset per outcome).
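The relationship between these figures can be checked with a short sketch. Accuracy is taken as 100% minus the OOB error, and the null accuracy is assumed here to be the share of the majority class among the labeled drugs. The majority-class counts of 167 and 140 are back-calculated from the reported percentages and are not stated in the text; the function names are illustrative.

```python
def accuracy_from_oob_error(oob_error_pct):
    """Estimated accuracy (%) implied by an out-of-bag error rate (%)."""
    return round(100.0 - oob_error_pct, 2)

def null_accuracy(n_majority, n_total):
    """Accuracy (%) of always predicting the majority class (assumed definition)."""
    return round(100.0 * n_majority / n_total, 2)

fetal_loss_accuracy = accuracy_from_oob_error(9.36)   # 90.64
anomaly_accuracy = accuracy_from_oob_error(12.90)     # 87.1

# Majority-class counts below are inferred, not reported in the article.
fetal_loss_null = null_accuracy(167, 235)             # 71.06
anomaly_null = null_accuracy(140, 186)                # 75.27
```

The p-values reported above are not reproduced here, because the test used to compare each model with the null is not specified in this section.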

Drugs predicted to be harmful in the fetal loss model are displayed in Table 2 (Overall OOB accuracy: 90.64%) and Table 3 shows drugs predicted to be harmful in the congenital anomaly model (Overall OOB accuracy: 87.10%). All known harmful drugs (D or X) had a model probability_harmful above 50% while all not-known-to-be-harmful (or safe) drugs (A or B) had a model probability_harmful below 50% (Fig. 3). Category C drugs, where the FDA gives no pregnancy recommendation, had probabilities of being harmful across a large spectrum (Fig. 3). All 192 category C drugs included in the congenital anomalies cohort were also in the fetal loss cohort (fetal loss model contained 264 category C drugs). This allowed us to compare the probability that a drug was harmful in increasing the risk of fetal loss and congenital anomalies. These two probabilities were highly correlated (r = 0.63, p < 0.001). Drugs like rubella virus vaccine were predicted harmful in increasing the risk of fetal loss and also congenital anomalies (Fig. 4). Some drugs, like Fentanyl and Benzocaine were only predicted harmful in one model. These drugs require further investigation to determine if there is a mechanistic reason for this difference.

Table 2 Category C Drugs Predicted to be Harmful (D or X): Fetal Loss Cohort.

Table 3 Category C Drugs Predicted to be Harmful (D or X): Congenital Anomalies Cohort.

For the fetal loss model, there was a clear and intuitive relationship between the proportion with fetal loss during the first trimester and the model’s prediction that the drug was harmful (Fig. 5). Nervous system drugs in general (ATC: N) were more likely to be classified as being harmful drugs. However, a couple of nervous system medications were classified as safe (similar to category A or B drugs) by the model. Two of these drugs, Citalopram 10 MG and Levetiracetam 500 MG, are shown versus a predicted harmful drug, Haloperidol 5 MG, in Fig. 5. Citalopram, like haloperidol, is an anti-depressant; however, citalopram is a selective serotonin reuptake inhibitor (SSRI) whereas haloperidol is not. Levetiracetam is an anti-seizure medication used in treating epilepsy. Importantly, not all nervous system category C medications were predicted as harmful to the fetus by our model.

On the other hand, congenital anomalies are often better described in the literature, which affects treatment patterns. This was evidenced in our models as well. Overall, we did not find a clear increase of anomaly risk among our predicted harmful vs. safe medications in the anomaly models (Fig. 6), except for our Non-Steroidal Anti-Inflammatory Drugs (NSAIDs) and rubella live vaccine findings, which showed a clear increase in anomaly risk following first trimester exposure. Importantly, our model classifies category C medications as being harmful if their features (including exposure rates and anomaly rates) are similar to known harmful medications (D or X). As a result, several medications that were prescribed during the pre-conception period but not during the first and second trimesters were detected by our algorithm as harmful, while other medications were predicted to be harmful due to the increased risk of anomalies observed in our dataset. We distinguish these two types of findings in Table 3. One NSAID – Ketorolac Tromethamine – was not predicted as harmful by our model for two dosage levels. We compare this to two dosages of Naproxen, both predicted as harmful by our model, to illustrate the increased first trimester risk of anomalies for Naproxen 250 MG versus another NSAID (Fig. 6). Importantly, not all NSAIDs were predicted as harmful by our model, but only those that increased the risk of anomalies.

Discussion

Our models successfully identified category C drugs that are likely to be harmful and those likely to be safe for fetal loss or congenital anomalies. This information is important as no prior recommendation for a drug’s effect during pregnancy was provided. This is especially true for two similar medications (e.g., two NSAIDs) with one predicted as safe (e.g., ketorolac) and the other as harmful (e.g., naproxen).

Drugs Predicted Harmful in Congenital Anomaly Model

We predicted 11 distinct medications (eight distinct drugs) to be harmful in the congenital anomalies model. We employed a machine-learning algorithm to predict drugs that were harmful based on anomaly rates and usage patterns for drugs with known FDA pregnancy classifications. This machine learning approach predicts a drug to be harmful if one of the following conditions is met: a.) drug exposure results in a high proportion of anomalies; b.) drug usage was greatly restricted during pregnancy (i.e., females were exposed during the pre-conception period at much higher rates than during pregnancy); or c.) the drug was similar to known harmful drugs in terms of mechanism (e.g., ATC classification, targets proteins involved in Mendelian diseases, targets known vitamin-related genes/proteins). We distinguish drugs by type in Table 3 for clarity of interpretation.

Non-Steroidal Anti-Inflammatory Drugs (NSAIDs)

Two predicted harmful drugs (four distinct medications) were NSAIDs, namely ibuprofen and naproxen. Several studies report an increased risk of anomalies, specifically cardiac anomalies, among infants exposed to naproxen, ibuprofen, or combinations of NSAIDs¹⁸,¹⁹. In most studies of the fetal effects of NSAIDs, naproxen and/or ibuprofen were associated with the greatest number of congenital anomalies¹⁸,²⁰. For both drugs, we observed the highest risk among first and second trimester exposures, with higher risk among naproxen users than ibuprofen users (Table 3). Furthermore, exposure to NSAIDs was greatly restricted during the third trimester (Fig. 6), which is consistent with current recommendations²¹. Ketorolac was classified as safe by our model and had a lower rate of anomalies, especially following first trimester exposure, when compared to other NSAIDs (Fig. 6). Ketorolac is a COX-2 specific inhibitor and has been used safely in neonates and infants²²,²³,²⁴.

Live Rubella Vaccine

Maternal exposure to live rubella vaccine was classified as harmful in both our congenital anomaly and fetal loss models (Table 2) with increased risk of fetal loss and an increased risk of anomalies. This is consistent with the literature on the harms of rubella exposure during early pregnancy²⁵,²⁶,²⁷. Please note that the rubella vaccine was the only vaccine predicted to be harmful during pregnancy by our model. Prior studies demonstrate that increases in both anomalies and fetal loss were observed in women infected with rubella during pregnancy²⁸,²⁹, which we confirm in this study. However, some conflicting evidence does exist regarding the fetal harm of rubella exposure³⁰. We observed that 96.4% of pregnancies exposed to rubella vaccination in the first trimester (83 exposed during the first trimester in the fetal loss cohort) resulted in a fetal loss (Table 2). This indicates the severity of first-trimester rubella exposure on fetal outcomes, underscoring the importance of avoiding rubella vaccination prior to conception. It should be noted that live rubella vaccine is not indicated in pregnancy and exposure often occurred in the pre-conception and first-trimester period of the pregnancy, indicating that the prescribing clinician was likely not aware that a pregnancy had taken place.

Prescribing Pattern Drop-offs During Pregnancy – Predicted Harmful in Congenital Anomaly Model

Two drugs were rarely prescribed during the entire pregnancy – Benzocaine mucosal spray and Hydromorphone Hydrochloride (Dilaudid), but were prescribed during the pre-conception period. This sudden drop-off in prescribing caused our algorithm to detect these drugs as harmful given that a similar drop-off in prescribing was observed in known harmful drugs. Hydromorphone Hydrochloride is an opioid and therefore was likely not prescribed during pregnancy given the harm that opioids have on developing fetuses³¹. The other medication – benzocaine mucosal spray – has been linked to development of methemoglobinemia in infants and because safer category B medications are available many physicians consider it contra-indicated during pregnancy³²,³³. Our machine learning approach did not know this information a priori, but it was able to learn this from clinician usage patterns (i.e., dramatic drop-off of prescribing during pregnancy). Several other medications were rarely used early on in the pregnancy (first and second trimester), including several opioids, and those also increased the risk of fetal loss in our fetal loss model. This was likely the reason for their contra-indication earlier on during pregnancy.

Drugs Predicted Harmful in Fetal Loss Model

Drugs That May Inadvertently Induce Fetal Loss: DHCR7 Mechanism

First trimester haloperidol exposure increased the risk of fetal loss. Haloperidol injection increased risk of fetal loss from 22.2% in the 3–6 months prior to conception to 78.6% following first trimester exposure. Nine pregnancies were exposed in the 3–6 month pre-conception period while 28 pregnancies were exposed during the first trimester, of which 22 resulted in fetal loss. Haloperidol increases the expression of 7-dehydrocholesterol reductase (DHCR7), an enzyme important in the conversion of 7-dehydrocholesterol to cholesterol³⁴. While exposure to pharmacological DHCR7 inhibitors increases the risk of fetal anomalies, the effects of drugs that merely increase the gene’s expression are less well known³⁴. Drugs increasing DHCR7 expression are not known to increase fetal loss; however, increasing DHCR7 volume would lower the amount of available 7-dehydrocholesterol used to produce vitamin D³⁴. Therefore, drugs increasing DHCR7 expression could inadvertently lower maternal vitamin D levels. Patients on haloperidol have been shown to have elevated levels of 7-dehydrocholesterol³⁵, which is curious as increasing DHCR7 expression would be expected to lower 7-dehydrocholesterol and elevate cholesterol (by increasing the conversion rate). Therefore, the harmful effects we observed for haloperidol could be due to the elevated 7-dehydrocholesterol levels and not a reduction in vitamin D. Further mechanism-based studies are required. In this study, we compared haloperidol to two other nervous system medications (ATC category: N), one an SSRI, citalopram, and the other an epilepsy medication, levetiracetam. Haloperidol 5 MG tablet greatly increased the risk of fetal loss when compared to these two other nervous system FDA category C medications, which were both predicted as ‘safe’ by our model (Fig. 5C and 5D). This is important, as it might be possible for pregnant women to switch their anti-depressant medication following pregnancy.
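The haloperidol injection percentages quoted above follow directly from the exposure counts, as the short check below shows. The first-trimester counts (22 losses among 28 exposed) are given in the text; the pre-conception numerator of 2 losses among 9 exposed is inferred from the reported 22.2% and is an assumption. The helper name is illustrative.

```python
def percent_with_outcome(n_outcome, n_exposed):
    """Percentage of exposed pregnancies with the outcome, to one decimal place."""
    if n_exposed <= 0:
        raise ValueError("n_exposed must be positive")
    return round(100.0 * n_outcome / n_exposed, 1)

# Reported in the text: 28 exposed in the first trimester, 22 fetal losses.
first_trimester_pct = percent_with_outcome(22, 28)

# Inferred: 2 of 9 is the only count out of nine that rounds to 22.2%.
preconception_pct = percent_with_outcome(2, 9)
```

With only nine pre-conception exposures, the baseline percentage rests on very few events and should be read with that in mind.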

Drugs Treating Symptoms of Fetal Loss

Some drugs predicted as harmful in the fetal loss cohort could have been prescribed to treat conditions leading up to a spontaneous abortion. For example, excessive bleeding often occurs during a spontaneous abortion, but a miscarriage can take several days. A drug used to treat severe bleeding following childbirth, or miscarriage, is Methylergonovine Maleate. All forms of Methylergonovine Maleate (3 different types listed in Table 2) had high rates of miscarriage following first trimester exposure, ranging from 97.9% to 100% of those exposed during that trimester. Typically, Methylergonovine Maleate would not be prescribed during the first trimester, unless something was wrong (e.g., excessive bleeding, which is indicative of a miscarriage). Therefore, this is likely a treatment-of-the-fetal-loss type of result. Other drugs related to fluids, including potassium chloride and calcium gluconate (ATC: A category drugs in Table 2), are likely used during fetal loss as women experience nausea while experiencing a miscarriage and would require fluids.

Genetic Targets of Drugs More Predictive Than Classification

Prenatal vitamin supplementation is important in reducing the overall disease risk of adverse fetal effects with supplementation linked to lower rates of leukemia, pediatric brain tumors and neuroblastoma³⁶. We restricted our analyses to identification of congenital anomalies diagnosed within the first 90 days of life. Therefore, we did not investigate complex outcomes such as childhood cancers or autism. However, vitamin exposure during the prenatal period is widely considered to be important in predicting fetal outcome. All models showed that knowing whether or not a drug affected a vitamin-related protein was more important than just knowing that a drug was a prenatal supplement (Figure S8). This is important because it shows that a drug’s mechanism of action and how it interfaces with vitamin-related mechanisms is extremely important in determining fetal outcome. This was known for specific drugs³⁴, but not across a larger cohort of fetal drug exposures. This knowledge can inform future fetal toxicity studies.

Rationale for Using Logistic Regression and Random Forest

In this paper, we employed two statistical approaches: logistic regression and random forest. The logistic regression model was used only on drugs with known fetal effect (either harmful: D or X or safe: A or B). This model allowed Odds Ratios to be computed for various features included in the model among known drugs (Figure S1). This information was already known. For example, in the fetal loss model, drugs that were respiratory system drugs (ATC: R) were likely to be safe drugs whereas drugs in the systemic hormonal preparations class (ATC: H) were likely to be category D or X. We were really interested in understanding the drugs with unknown fetal effect (i.e., the category C drugs). For the purpose of classifying these unknown drugs, we developed a random forest classifier on the known drugs and then applied it to the unknown drugs to assign a probability that a drug was harmful or safe based on the information learned from the other drugs. This random forest classifier also allowed us to easily rank the importance of the features included in the model (Figure S8).

Limitations

Our method identifies drugs predicted to be harmful given their prescribing patterns (e.g., low exposure during pregnancy), anomaly rates (e.g., proportion of exposed with an anomaly) and other chemoinformatics factors important in determining fetal outcome (e.g., affecting proteins involved in vitamin-related processes). Further study is needed to confirm drug predictions, especially for drugs that are predicted as safe to ensure that they are not harmful to the developing fetus. Some drugs may be predicted as harmful because they are prescribed during high-risk pregnancies, which are at increased risk of complications during delivery. High-risk pregnancies are known to be at a higher-risk of congenital anomalies³⁷. An example of this type of finding may be Dinoprostone (or Cervidil) predicted as harmful in our congenital anomaly model. Dinoprostone is a cervical implant used to induce labor. These are often used during high-risk pregnancies³⁸.

Another limitation is our exclusive use of medications recorded in EHRs. Others have investigated non-hormonal category X drugs and their prescribing patterns among pregnant women in a decision support context³⁹. They found that the medication information was not of sufficient quality to construct an EHR-based alert for pregnant women³⁹. We were unable to conduct a detailed chart-review for all 36,000 pregnancies to determine the accuracy of medications across the various FDA categories and drug types. Our validation of several findings on predicted harmful drugs with the literature on their effects helps to confirm our findings. However, we recognize this as a limitation of our work.

Conclusion

In conclusion, we developed a machine learning approach that predicts drugs to be either harmful or safe in two outcome models – fetal loss and congenital anomalies. We achieved an OOB estimated accuracy of 90.6% for fetal loss and 87.1% for congenital anomalies. Some drugs were predicted as harmful because physicians stopped prescribing them upon pregnancy diagnosis – this dramatic drop-off in exposure rates triggered the algorithm to detect the drug as harmful (since a similar pattern is observed among drugs that are known to be harmful). Other drugs were predicted as harmful because of the increase in anomalies observed following exposure. Many medications predicted to be harmful by our algorithm have documented harmful effects, including naproxen, ibuprofen and rubella live vaccine. Additionally, we found that first trimester exposure to haloperidol, a drug that interferes with the DHCR7-cholesterol-vitamin D pathway, increased the risk of fetal loss. We also compare haloperidol to other nervous system medications that do not increase the risk of fetal loss to the same extent. Our approach provides much needed information for pharmacologists and prescribers interested in understanding drugs’ fetal effects and prescribing patterns in EHRs.

Materials and Methods

Clinical Cohorts

Maternal Prescription Exposure and Fetal Outcome: Live Birth

We obtained records on all infants born at the Columbia University Medical Center (CUMC) - New York Presbyterian Hospital (NYPH) healthcare system who had mothers listed in the Electronic Health Record (EHR) system. These links were created in the EHR system upon delivery to facilitate maternal-fetal care post-delivery. The EHR system contains billing information collected during routine clinical care. This information includes prescription information, diagnoses, laboratory tests and results, procedures, radiological reports and clinical free text notes. In this study, we have used only the diagnosis codes and prescription information contained within the system along with the mother-infant links. We retained all mother-infant pairs with at least one medication prescribed before birth and up to 15 months prior. Pregnant women with no medication information (e.g., not even a vitamin supplement) in the EHR system are most likely missing their medication records. Therefore, these women were not included in our analysis and only women with at least one prescribed medication, which includes vitamins, were included. We excluded all multiple infant pregnancies (e.g., twins, triplets) as these pregnancies are considered high-risk. We also excluded all pregnancies with any chromosomal abnormality diagnosed within the first three months of life (0–90 days of life). Presence of chromosomal abnormality was determined using the International Classification of Diseases, 9th edition (ICD-9) range 758–758.9.
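The 15-month prescription window can be sketched as a date check. This is an illustration under stated assumptions: calendar months are used, the window start is inclusive and the birth date itself is excluded, none of which is specified in the text. The function names are hypothetical.

```python
import calendar
from datetime import date

def subtract_months(d, months):
    """Return the date `months` calendar months before `d`.

    If the target month is shorter, the day is clamped to its last day.
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def in_exposure_window(rx_date, birth_date, months=15):
    """True if the prescription falls within `months` months before birth.

    Boundary handling (inclusive start, exclusive birth date) is an assumption.
    """
    return subtract_months(birth_date, months) <= rx_date < birth_date

example_birth = date(2015, 5, 31)
example_window_start = subtract_months(example_birth, 15)
```

A mother-infant pair would be retained if at least one of the mother’s prescriptions satisfies `in_exposure_window`.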

We identified infants with congenital anomalies as those having a congenital anomaly ICD-9 diagnosis, i.e., 740–759 (with 758–758.9 excluded), occurring within the first 90 days of life. Only one anomaly diagnosis was necessary for identification, although some infants had multiple anomalies. We identified minor anomalies using criteria established by the New York State Department of Health; only ICD-9 codes within the 740–759 range were used⁴⁰. For comparison purposes, the reported background rate of major congenital anomalies is 3% while the rate of minor congenital anomalies is 15% of live-born infants⁴¹.
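The inclusion rules above can be sketched in code. This is a minimal illustration, not the authors' implementation: the function names, the `(code, age_in_days)` input layout and the returned labels are assumptions made for the example, and only the ICD-9 ranges and the 0–90 day window come from the text.

```python
from typing import Iterable, Optional, Tuple


def icd9_to_float(code: str) -> Optional[float]:
    """Parse a numeric ICD-9 code such as '745.4'; V and E codes return None."""
    try:
        return float(code.strip())
    except ValueError:
        return None


def classify_infant(diagnoses: Iterable[Tuple[str, int]], window_days: int = 90) -> str:
    """Label one infant from (icd9_code, age_in_days_at_diagnosis) pairs.

    Returns 'excluded' for a chromosomal abnormality (758-758.9),
    'anomaly' for any other code in 740-759, otherwise 'unaffected'.
    Only diagnoses made within the first `window_days` days of life count.
    """
    has_anomaly = False
    for code, age_days in diagnoses:
        value = icd9_to_float(code)
        if value is None or not (0 <= age_days <= window_days):
            continue
        if 758 <= value < 759:
            return "excluded"  # chromosomal abnormality removes the pregnancy
        if 740 <= value < 760:
            has_anomaly = True  # a single anomaly code is sufficient
    return "anomaly" if has_anomaly else "unaffected"
```

For example, `classify_infant([("745.4", 10)])` labels the infant as having an anomaly, while the same code recorded at day 120 falls outside the window and is ignored.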

Maternal Prescription Exposure and Fetal Outcome: Fetal Loss

All pregnancies ending in fetal loss were identified at CUMC-NYPH. Fetal loss in this study includes spontaneous abortion (i.e., ‘miscarriages’), legal/elective termination and any other forms of fetal loss/death recorded within the ICD-9 range 630–639. Because we are interested in fetal outcomes following pharmacological exposure, we only included females with at least one medication prescribed up to 15 months before fetal loss. A female may have more than one fetal loss code recorded on separate dates (often during the course of a single hospital visit); therefore we collapsed dates to the month level. For our control population, we used women with a successful fetal outcome (e.g., single live birth) recorded at CUMC-NYPH with at least one medication prescribed up to 15 months prior to birth, who had no diagnosis of fetal loss recorded at CUMC-NYPH and whose infant was without chromosomal abnormality. According to the CDC, 17.0% of conceptions resulted in miscarriage and 18.4% ended in legal termination in 2008⁴². Because we define fetal loss to include both spontaneous abortion and legal termination, we expect a background rate of 35.4%.
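As an illustration of the steps described above, the following sketch flags ICD-9 codes in the 630–639 range, collapses event dates to the month level, and reproduces the expected background rate as the sum of the two CDC percentages. The function and constant names are hypothetical and the input layout is assumed.

```python
from datetime import date
from typing import Iterable, List, Tuple


def is_fetal_loss_code(code: str) -> bool:
    """True for numeric ICD-9 codes in the 630-639 range (e.g. '634.91')."""
    try:
        value = float(code.strip())
    except ValueError:
        return False
    return 630 <= value < 640


def fetal_loss_months(events: Iterable[Tuple[str, date]]) -> List[Tuple[int, int]]:
    """Collapse one woman's fetal loss codes to unique (year, month) pairs."""
    return sorted({(d.year, d.month) for code, d in events if is_fetal_loss_code(code)})


# Expected background rate: miscarriage plus legal termination (CDC, 2008).
MISCARRIAGE_PCT = 17.0
LEGAL_TERMINATION_PCT = 18.4
BACKGROUND_FETAL_LOSS_PCT = round(MISCARRIAGE_PCT + LEGAL_TERMINATION_PCT, 1)
```

Two fetal loss codes recorded a few days apart in the same calendar month therefore count as a single event.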

Pharmacological Drug Information

The FDA pregnancy categories for all drugs included in our study were extracted from uptodate.com⁴³ and drugs.com⁴⁴. While the FDA has recently updated this labeling system and moved away from the A-X categorization schema⁴⁵, we chose to use it in our study because it allows researchers and physicians to easily identify drugs with unknown fetal effects (the category C drugs). If a particular drug combination was not listed with its own FDA pregnancy category designation, we assigned it the most severe pregnancy category among the drugs in the combination. We also mapped each drug to its first-level class within the Anatomical Therapeutic Chemical (ATC) classification system, which categorizes drugs based on their organ system effects. For each drug, we extracted the Mendelian genes either inhibited or affected (regardless of mechanism) using the Online Mendelian Inheritance in Man (OMIM) (URL: https://www.omim.org/). Because drugs targeting genes involved in vitamin processes may affect fetal risk (either protective or injurious), we also identified drugs that target at least one vitamin-related gene as noted on DisGeNET, a disease-gene association network (URL: http://www.disgenet.org/).
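The rule for unlisted drug combinations can be written as a one-line lookup. The sketch below assumes a severity ordering of A < B < C < D < X; the text does not state how the categories were ranked, so this ordering (and the names `SEVERITY` and `combination_category`) is an assumption for illustration only.

```python
from typing import Iterable

# Assumed ordering from least to most severe; not stated in the text.
SEVERITY = {"A": 0, "B": 1, "C": 2, "D": 3, "X": 4}


def combination_category(component_categories: Iterable[str]) -> str:
    """Return the most severe FDA pregnancy category among a combination's components."""
    return max(component_categories, key=SEVERITY.__getitem__)
```

Under this ordering, a combination of a category B drug and a category D drug would be treated as category D.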

We are interested in finding drugs that increase or decrease the risk of fetal loss following prenatal exposure. However, some medications are used to induce legal termination or to treat subsequent conditions (e.g., hemorrhage, excessive bleeding, pain). These drugs could bias our analyses; therefore, we identified drugs given to women where the first prescription of the drug was the same day as the legal termination. We calculated the proportion of legal terminations where a given prescription drug was first prescribed out of those terminations where prescription information was available. All drugs with at least 2% frequency were labeled as ‘drugs typically prescribed with legal termination’.
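A minimal sketch of this labeling step follows. The record layout (a termination date plus a mapping from drug to the date of its first prescription) and the function name are hypothetical; the 2% threshold and the denominator (terminations with prescription information available) follow the description above.

```python
from collections import Counter
from datetime import date
from typing import Dict, List


def termination_linked_drugs(terminations: List[dict], threshold: float = 0.02) -> Dict[str, float]:
    """Proportion of terminations in which each drug was first prescribed that same day.

    Each record has 'date' (termination date) and 'first_rx' (drug -> date of the
    drug's first prescription). Records with an empty 'first_rx' have no
    prescription information and are left out of the denominator.
    """
    with_rx = [t for t in terminations if t["first_rx"]]
    if not with_rx:
        return {}
    counts: Counter = Counter()
    for t in with_rx:
        for drug, first_date in t["first_rx"].items():
            if first_date == t["date"]:
                counts[drug] += 1
    n = len(with_rx)
    return {drug: c / n for drug, c in counts.items() if c / n >= threshold}
```

Drugs returned by this function would carry the ‘drugs typically prescribed with legal termination’ label.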

Statistical Analysis

Identifying Trimester of Drug Exposure

For pregnancies that resulted in a single live birth, we used the average gestation period (i.e., 38 weeks) as reported by the Centers for Disease Control and Prevention (CDC)⁴⁶. We then divided the 38-week pregnancy into three equal-sized periods (12.67 weeks each) as ‘trimesters’. For pregnancies that resulted in fetal loss, we used the average time to fetal loss. The CDC reported that 91.6% of legal terminations occur within 13 weeks gestation, with many other forms of fetal loss occurring prior to 13 weeks as well⁴⁷. Therefore, for the fetal loss cohort an exposure during pregnancy could only have occurred during the first trimester (i.e., one 12.67 week period). We also investigated two pre-conception periods (each 3 months long) in which exposures could occur, for both the fetal loss and congenital anomaly cohorts. This was to investigate the presence or absence of a drug pre-conception effect.
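For the live birth cohort, the period assignment described above can be sketched as follows. This is an illustration under stated assumptions: conception is taken as the birth date minus 38 weeks, each pre-conception period is approximated as 90 days, and the period labels and function name are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

GESTATION_DAYS = 38 * 7              # average gestation of 38 weeks = 266 days
TRIMESTER_DAYS = GESTATION_DAYS / 3  # about 88.67 days (12.67 weeks)
PRECONCEPTION_DAYS = 90              # assumption: "3 months" taken as 90 days


def exposure_period(rx_date: date, birth_date: date) -> Optional[str]:
    """Map a prescription date to one of five exposure periods for a live birth.

    Returns None when the prescription falls on or after the birth date, or
    earlier than the two pre-conception periods.
    """
    conception = birth_date - timedelta(days=GESTATION_DAYS)
    offset = (rx_date - conception).days
    if offset >= GESTATION_DAYS or offset < -2 * PRECONCEPTION_DAYS:
        return None
    if offset < -PRECONCEPTION_DAYS:
        return "preconception_2"  # roughly 3 to 6 months before conception
    if offset < 0:
        return "preconception_1"  # the 3 months just before conception
    return f"trimester_{int(offset // TRIMESTER_DAYS) + 1}"
```

With these constants the five periods span 446 days, which is consistent with the 15-month look-back used to define the cohorts.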

Classifying Category C Drugs Into Harmful and Non-Harmful Pregnancy Categories

We only investigated drugs with at least 50 exposed pregnancies across all five exposure periods (e.g., first trimester, second trimester) to minimize statistical anomalies due to small sample sizes. We excluded all drugs classified as FDA pregnancy category N (i.e., Not Classified) or drugs that were ‘Not Listed’. For visualization purposes, we performed Multi-Dimensional Scaling (MDS) on the proportion of fetal loss (or the proportion with congenital anomaly, depending on the model) per exposure period to illustrate the relationship between adverse fetal outcomes and FDA pregnancy category. We also visualized only drugs known to be prescribed with legal termination to determine where in each of the visualizations those drugs appeared.
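The drug-level filter can be sketched as below. The sketch assumes that the 50-pregnancy minimum applies to the total summed over the five exposure periods, which is one reading of the text; the record layout and function name are hypothetical.

```python
from typing import Dict, List

EXCLUDED_CATEGORIES = {"N", "Not Listed"}


def eligible_drugs(drugs: List[Dict], min_pregnancies: int = 50) -> List[str]:
    """Keep drugs with a usable FDA category and enough exposed pregnancies.

    Each record has 'name', 'category' and 'exposed' (exposure period -> number
    of exposed pregnancies). Counts are summed over periods (an assumption).
    """
    return [
        d["name"]
        for d in drugs
        if d["category"] not in EXCLUDED_CATEGORIES
        and sum(d["exposed"].values()) >= min_pregnancies
    ]
```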

Logistic Regression

We first fit logistic regression models to predict a binary pregnancy category: either ‘Detrimental to Fetus – D or X’ or ‘Not Harmful to Fetus – A or B’. We built three models: one for fetal loss, a second for congenital anomalies, and a third for minor congenital anomalies only. This allowed us to determine Odds Ratios (OR) and significance in a full model. The full model includes 29 features: 14 features, one for each ATC classification; 5 features indicating the number exposed during each exposure period (3 trimesters plus two 3-month pre-conception periods); 5 features indicating the proportion of exposed pregnancies with an anomaly in each exposure period; and 5 binary indicator variables for whether Mendelian genes were inhibited (from OMIM), whether Mendelian genes were affected (from OMIM), whether vitamin genes were affected (from DisGeNET), whether the drug could be used as a prenatal supplement (e.g., vitamin, mineral, glucose), and whether the drug was a treatment for nicotine abuse (since exposure to smoking during the prenatal period is a known risk factor for fetal loss and anomalies). For the fetal loss model, we had only 25 features: because the majority of fetal losses occurred during the first trimester, we did not include variables for the second and third trimesters (either number exposed or proportion affected).
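The feature counts above can be checked with a short sketch that enumerates the feature names. The names themselves are hypothetical labels chosen for the example; the 14 first-level ATC groups are the standard ATC anatomical main groups, and the grouping of features follows the description in the text.

```python
from typing import List

ATC_LEVEL1 = list("ABCDGHJLMNPRSV")  # the 14 first-level ATC groups
PERIODS = ["preconception_2", "preconception_1", "trimester_1", "trimester_2", "trimester_3"]
BINARY_FLAGS = [
    "omim_gene_inhibited",
    "omim_gene_affected",
    "vitamin_gene_affected",
    "prenatal_supplement",
    "nicotine_abuse_treatment",
]


def feature_names(periods: List[str]) -> List[str]:
    """List model features: ATC indicators, per-period counts and proportions, flags."""
    return (
        [f"atc_{c}" for c in ATC_LEVEL1]
        + [f"n_exposed_{p}" for p in periods]
        + [f"prop_affected_{p}" for p in periods]
        + BINARY_FLAGS
    )


full_model = feature_names(PERIODS)            # congenital anomaly models
fetal_loss_model = feature_names(PERIODS[:3])  # no second or third trimester
```

Dropping the second and third trimesters removes two count features and two proportion features, which accounts for the difference between 29 and 25.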

Random Forest Classifier

We constructed a random forest model to classify both fetal loss and congenital anomalies (separately) with 2000 trees using all possible features. Out-Of-Bag (OOB) error rates were estimated to assess the quality of each model. Features were ranked using the Mean Decrease in Accuracy (MDA), with more informative features having higher MDAs. The fitted models allowed us to assign each drug a probability of being harmful (similar to a category D or X drug) or safe (similar to a category A or B drug). We compared a drug’s probability of being harmful from each model for drugs with known FDA status and those with no recommendation (i.e., FDA category C drugs). The analysis was implemented in R version 3.3.0.
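To make the out-of-bag idea concrete, the sketch below estimates OOB vote fractions and OOB accuracy for a bagged ensemble of one-split decision stumps. This is a deliberately simplified stand-in for a random forest, written in plain Python rather than the R code used in the study: it omits full-depth trees, per-split feature subsampling and MDA ranking, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Row = Sequence[float]


def fit_stump(X: List[Row], y: List[int]) -> Callable[[Row], int]:
    """Fit a one-split tree: the (feature, threshold, side) with lowest training error."""
    majority = int(sum(y) * 2 >= len(y))
    best_err, best = len(y) + 1, None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for above in (0, 1):  # label predicted when row[j] > t
                err = sum((above if row[j] > t else 1 - above) != label
                          for row, label in zip(X, y))
                if err < best_err:
                    best_err, best = err, (j, t, above)
    if best is None:  # no usable split: fall back to the majority class
        return lambda row: majority
    j, t, above = best
    return lambda row: above if row[j] > t else 1 - above


def oob_harm_probabilities(X: List[Row], y: List[int], n_trees: int = 200,
                           seed: int = 0) -> List[Optional[float]]:
    """Fraction of 'harmful' (label 1) votes per row, using only trees that did not train on it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]  # per row: [harmful votes, total OOB votes]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            if i not in in_bag:
                votes[i][0] += tree(X[i])
                votes[i][1] += 1
    return [h / total if total else None for h, total in votes]


def oob_accuracy(probs: List[Optional[float]], y: List[int]) -> float:
    """Share of rows whose OOB majority vote matches the known label."""
    scored = [(p, label) for p, label in zip(probs, y) if p is not None]
    return sum((p >= 0.5) == bool(label) for p, label in scored) / len(scored)
```

In the setting described above, rows would be drugs with a known category (A or B as 0, D or X as 1), and a model trained on all of them would then be applied to category C drugs to obtain their vote fractions.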

References

1. Dally, A. Thalidomide: was the tragedy preventable? The Lancet 351, 1197 (1998).
2. Kim, J. H. & Scialli, A. R. Thalidomide: the tragedy of birth defects and the effective treatment of disease. Toxicological Sciences 122, 1–6 (2011).
3. Smithells, R. Thalidomide and malformations in Liverpool. The Lancet 279, 1270–1273 (1962).
4. Hill, R. M. Drugs ingested by pregnant women. Clinical Pharmacology & Therapeutics 14, 654–659 (1973).
5. Olesen, C. et al. Drug use in first pregnancy and lactation: a population-based survey among Danish women. European journal of clinical pharmacology 55, 139–144 (1999).
6. Nielsen, G. L., Sorensen, H. T., Larsen, H. & Pedersen, L. Risk of adverse birth outcome and miscarriage in pregnant users of non-steroidal anti-inflammatory drugs: population based observational study and case-control study. BMJ 322, 266–270 (2001).
7. Andrade, S. E. et al. Prescription drug use in pregnancy. American journal of obstetrics and gynecology 191, 398–407 (2004).
8. Khare, R. et al. A longitudinal analysis of data quality in a large pediatric data research network. J Am Med Inform Assoc, https://doi.org/10.1093/jamia/ocx033 (2017).
9. Cranor, C. F. Do You Want to Bet Your Children’s Health on Post-Market Harm Principles-An Argument for a Trespass or Permission Model for Regulating Toxicants. Vill. Envtl. LJ 19, 251 (2008).
10. Muanda, F. T., Sheehy, O. & Bérard, A. Use of antibiotics during pregnancy and the risk of major congenital malformations: A population based cohort study. British Journal of Clinical Pharmacology (2017).
11. Faherty, L. J. et al. Movement patterns in women at risk for perinatal depression: use of a mood-monitoring mobile application in pregnancy. Journal of the American Medical Informatics Association (2017).
12. Gordon, M., Henderson, R., Holmes, J. H., Wolters, M. K. & Bennett, I. M. Participatory design of ehealth solutions for women from vulnerable populations with perinatal depression. Journal of the American Medical Informatics Association 23, 105–109 (2016).
13. Boland, M. R., Shahn, Z., Madigan, D., Hripcsak, G. & Tatonetti, N. P. Birth month affects lifetime disease risk: a phenome-wide method. Journal of the American Medical Informatics Association 22, 1042–1053 (2015).
14. Woolery, L. K. & Grzymala-Busse, J. Machine learning for an expert system to predict preterm birth risk. Journal of the American Medical Informatics Association 1, 439–446 (1994).
15. Mani, S. et al. Medical decision support using machine learning for early detection of late-onset neonatal sepsis. Journal of the American Medical Informatics Association 21, 326–336, https://doi.org/10.1136/amiajnl-2013-001854 (2014).
16. Meyerhoefer, C. D. et al. A mixed methods study of clinical information availability in obstetric triage and prenatal offices. Journal of the American Medical Informatics Association 24, e87–e94 (2017).
17. Schaff, E. A. et al. Low-dose mifepristone 200 MG and vaginal misoprostol for abortion. Contraception 59, 1–6 (1999).
18. Ericson, A. & Källén, B. A. J. Nonsteroidal anti-inflammatory drugs in early pregnancy. Reproductive Toxicology 15, 371–375, https://doi.org/10.1016/S0890-6238(01)00137-X (2001).
19. Ofori, B., Oraichi, D., Blais, L., Rey, E. & Bérard, A. Risk of congenital anomalies in pregnant users of non‐steroidal anti‐inflammatory drugs: A nested case‐control study. Birth Defects Research Part B: Developmental and Reproductive Toxicology 77, 268–279 (2006).
20. Hernandez, R. K., Werler, M. M., Romitti, P., Sun, L. & Anderka, M. Nonsteroidal antiinflammatory drug use among women and the risk of birth defects. American Journal of Obstetrics and Gynecology 206, 228.e221–228.e228, https://doi.org/10.1016/j.ajog.2011.11.019 (2012).
21. Bloor, M. & Paech, M. Nonsteroidal anti-inflammatory drugs during pregnancy and the initiation of lactation. Anesthesia & Analgesia 116, 1063–1075 (2013).
22. Torres, M. & Nieves, J. A. Progress in congenital cardiac care for newborns and infants: the emerging role of “off-label” medications. Newborn and Infant Nursing Reviews 9, 18–30 (2009).
23. Buck, M. L. & Rudis, M. Clinical experience with ketorolac in children. Annals of Pharmacotherapy 28, 1009–1013 (1994).
24. Moffett, B. S., Wann, T. I., Carberry, K. E. & Mott, A. R. Safety of ketorolac in neonates and infants after cardiac surgery. Pediatric Anesthesia 16, 424–428 (2006).
25. Cooper, L. Z. & Krugman, S. Clinical manifestations of postnatal and congenital rubella. Archives of Ophthalmology 77, 434–439 (1967).
26. Webster, W. S. Teratogen update: congenital rubella. Teratology 58, 13–23 (1998).
27. Swan, C., Tostevin, A., Moore, B., Mayo, H. & Black, G. B. Congenital Defects in Infants following Infectious Diseases during Pregnancy. With special reference to the Relationship between German Measles and Cataract, Deaf-Mutism, Heart Disease and Microcephaly, and to the Period of Pregnancy in which the Occurrence of Rubella is followed by Congenital Abnormalities. Medical journal of Australia 2, 201–210 (1943).
28. Rudolph, A. J. et al. Transplacental rubella infection in newly born infants. JAMA 191, 843–845 (1965).
29. Naeye, R. L. & Blanc, W. Pathogenesis of congenital rubella. JAMA 194, 1277–1283, https://doi.org/10.1001/jama.1965.03090250011002 (1965).
30. Ergenoglu, A. M. et al. Rubella vaccination during the preconception period or in pregnancy and perinatal and fetal outcomes. The Turkish journal of pediatrics 54, 230 (2012).
31. Brennan, M. C. & Rayburn, W. F. Counseling about risks of congenital anomalies from prescription opioids. Birth Defects Research Part A: Clinical and Molecular Teratology 94, 620–625 (2012).
32. Lee, K. C., Korgavkar, K., Dufresne, R. G. & Higgins, H. W. Safety of cosmetic dermatologic procedures during pregnancy. Dermatologic Surgery 39, 1573–1586 (2013).
33. Peterson, H. d. C. Acquired methemoglobinemia in an infant due to benzocaine suppository. New England Journal of Medicine 263, 454–455 (1960).
34. Boland, M. & Tatonetti, N. Investigation of 7-dehydrocholesterol reductase pathway to elucidate off-target prenatal effects of pharmaceuticals: a systematic review. The pharmacogenomics journal 16, 411–429 (2016).
35. Korade, Ž. et al. Effect of psychotropic drug treatment on sterol metabolism. Schizophrenia Research, https://doi.org/10.1016/j.schres.2017.02.001 (2017).
36. Goh, Y., Bollano, E., Einarson, T. & Koren, G. Prenatal multivitamin supplementation and rates of pediatric cancers: a meta‐analysis. Clinical Pharmacology & Therapeutics 81, 685–691 (2007).
37. Sunitha, T. et al. Risk factors for congenital anomalies in high risk pregnant women: A large study from South India. Egyptian Journal of Medical Human Genetics 18, 79–85, https://doi.org/10.1016/j.ejmhg.2016.04.001 (2017).
38. Rozenberg, P. et al. A randomized trial that compared intravaginal misoprostol and dinoprostone vaginal insert in pregnancies at high risk of fetal distress. American Journal of Obstetrics and Gynecology 191, 247–253, https://doi.org/10.1016/j.ajog.2003.12.038 (2004).
39. Strom, B. L. et al. Detecting pregnancy use of non-hormonal category X medications in electronic medical records. Journal of the American Medical Informatics Association 18, i81–i86, https://doi.org/10.1136/amiajnl-2010-000057 (2011).
40. NYSDOH. Congenital Malformations Registry - Summary Report. Appendix 1: Classification of Codes. https://www.health.ny.gov/diseases/congenital_malformations/2002_2004/appendices.htm Accessed on 11/30/2016 (2007).
41. Stevenson, R. E., Solomon, B. D. & Everman, D. B. Human malformations and related anomalies. (Oxford University Press, 2015).
42. CDC, Ventura, S. J., Curtin, S. C., Abma, J. C. & Henshaw, S. K. Estimated Pregnancy Rates and Rates of Pregnancy Outcomes for the United States, 1990–2008. National Vital Statistics Reports https://www.cdc.gov/nchs/data/nvsr/nvsr60/nvsr60_07.pdf (2012).
43. UpToDate. Accessed in December, 2016 and January 2017 uptodate.com (2017).
44. Drugs.com. Accessed in December, 2016 and January 2017 drugs.com (2017).
45. Boothby, L. A. & Doering, P. L. FDA labeling system for drugs in pregnancy. Annals of Pharmacotherapy 35, 1485–1489 (2001).
46. CDC. Measuring Gestational Age in Vital Statistics Data: Transitioning to the Obstetric Estimate. National Vital Statistics Reports http://www.cdc.gov/nchs/data/nvsr/nvsr64/nvsr64_05.pdf (2015).
47. CDC. CDCs Abortion Surveillance System FAQs: Abortion Surveillance—Findings and Reports. Reproductive Health https://www.cdc.gov/reproductivehealth/data_stats/abortion.htm (2017).

Acknowledgements

MRB is supported by generous funding from the Perelman School of Medicine, University of Pennsylvania. MRB was supported by the National Center for Advancing Translational Sciences, National Institutes of Health through TL1TR001875 from Jul. 2016 – Jun. 2017 and by R01 GM107145. FP is supported by AHRQ R01H5021816. NPT is supported through the following awards: R01 GM107145, OT3 TR002027, and an award from the Herbert and Florence Irving Foundation. The content is solely the responsibility of the authors and does not represent the official views of the NIH.

Author information

Authors and Affiliations

Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, USA
Mary Regina Boland
Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA
Mary Regina Boland
Center of Excellence in Environmental Toxicology, University of Pennsylvania, Philadelphia, USA
Mary Regina Boland
Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, USA
Mary Regina Boland
Department of Biomedical Informatics, Columbia University, New York, USA
Mary Regina Boland, Fernanda Polubriaginof & Nicholas P. Tatonetti
Department of Medicine, Columbia University, New York, USA
Mary Regina Boland, Fernanda Polubriaginof & Nicholas P. Tatonetti
Department of Systems Biology, Columbia University, New York, USA
Mary Regina Boland, Fernanda Polubriaginof & Nicholas P. Tatonetti
Observational Health Data Sciences and Informatics, Columbia University, New York, USA
Mary Regina Boland & Nicholas P. Tatonetti

Authors

Mary Regina Boland
Fernanda Polubriaginof
Nicholas P. Tatonetti

Contributions

Conceived Study Design: M.R.B., N.P.T. Provided information, insights to study design: M.R.B., F.P., N.P.T. Wrote Paper: M.R.B. Reviewed, Edited, and Approved Final Manuscript: M.R.B., F.P., N.P.T.

Corresponding authors

Correspondence to Mary Regina Boland or Nicholas P. Tatonetti.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplemental Information

Supplemental Dataset 1

Supplemental Dataset 2

Supplemental Dataset 3

Supplemental Dataset 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Boland, M.R., Polubriaginof, F. & Tatonetti, N.P. Development of A Machine Learning Algorithm to Classify Drugs Of Unknown Fetal Effect. Sci Rep 7, 12839 (2017). https://doi.org/10.1038/s41598-017-12943-x

Received: 05 July 2017
Accepted: 08 September 2017
Published: 09 October 2017
DOI: https://doi.org/10.1038/s41598-017-12943-x
