## Introduction

Altered physiology and variable abundance of drug metabolizing enzyme and transporter (DMET) proteins in special populations (paediatrics, pregnant women and diseased patients) predispose them to safety or efficacy risks. This necessitates clinical dose optimization of narrow therapeutic index drugs, especially in these categories. It is difficult to perform clinical trials in many of these populations for ethical and logistical reasons1. To fill this gap, in vitro-in vivo extrapolation (IVIVE)-linked physiologically based pharmacokinetic (PBPK) modelling has become a promising tool to predict drug pharmacokinetics (PK) in special populations, based on healthy adult data2. Regulatory agencies, such as the U.S. Food and Drug Administration (US FDA), European Medicines Agency (EMA), and Pharmaceuticals and Medical Devices Agency (PMDA), now encourage pharmaceutical companies to use PBPK modelling for safer and more efficient clinical drug development3,4.

However, PBPK models are data hungry and require physiological data, such as the abundance of DMET proteins. Particularly, IVIVE of drug disposition using PBPK modelling systems, e.g., GastroPlus, Simcyp and PK-Sim, can be refined further by use of the reported physiological DMET protein levels. Quantitative DMET protein information can be predicted using activity and mRNA data as surrogates. However, poor correlation between mRNA versus protein levels and non-selectivity of activity assays limit their use in IVIVE-PBPK modelling. In addition, cross-reactivity of antibodies across DME isoforms is a limitation of Western blotting technique. Therefore, the use of liquid chromatography-tandem mass spectrometry (LC-MS/MS) based quantitative proteomics, which is considered more precise, reliable and efficient, is rapidly emerging to establish inter-individual abundance of DMET proteins5,6,7. Additionally, availability of multiple organ banks or tissues from commercial sources requires a high-throughput technique such as LC-MS/MS proteomics for simultaneous DMET quantification. There are multiple benefits of such studies, which include: i) establishment of association of DMET abundance with age, sex, genotype, ethnicity and disease, and ii) prediction of in vivo PK profiles from in vitro data using PBPK modelling and simulation (M&S)8,9,10,11.

The unavailability of DMET proteomics data from large sample cohorts is a limitation in this field, which generally requires compilation and interpretation of data collected from different laboratories. However, extensive heterogeneity is observed when results of different laboratories are evaluated. These inter-laboratory variations are attributed to biological as well as methodological differences. True inter-individual variability in the samples due to demographic differences is covered under biological heterogeneity, whereas methodological heterogeneity mainly arises from differences in technique, e.g., quality of samples, analytical method variability, etc. The term statistical heterogeneity accounts for combined biological and methodological heterogeneity12. Such inter-laboratory heterogeneity is studied by meta-analysis of DMET abundance data through calculation of weighted mean and percent coefficient of variation (%CV) values13,14,15,16,17. With the availability of tissue samples and LC-MS/MS proteomics, more DMET data have become available over the last few years, which necessitates continuous updating of meta-analysis results.

We set out to compile all the available literature data on DMET protein abundances, and to offer the repository to users in the form of an Excel spreadsheet. The repository, so developed, was meant to be uploaded into an existing user-friendly open-access online QPrOmics database (http://qpromics.uw.edu/qpromics/data/) of the University of Washington, Seattle, USA. We also planned to update the meta-analysis of the individual DMET protein abundance data in the repository, especially of non-cytochrome P450 (non-CYP) enzymes, such as uridine 5′-diphospho-glucuronosyltransferases (UGTs), carboxylesterases (CESs) and flavin-containing monooxygenases (FMOs). The %CV values, to be calculated using three different methods, were to be integrated into a PBPK model for describing variability in the PK of lamotrigine, a model drug, in the adult population. After validating the healthy adult lamotrigine PBPK model against multiple clinical PK data sets in the literature, the model was to be extrapolated to paediatric and hepatic impairment (HI) populations, by integrating changes in protein abundance due to age and HI, respectively. All these planned activities were successfully accomplished. The details are discussed herein.

## Results

### Compilation of DMET protein abundance data

We compiled literature reported protein abundance data of 55 DMEs (30 Phase I and 25 Phase II) and 104 membrane transporters (67 uptake and 37 efflux) (Fig. 1 and Supplementary Table S1). Wherever available, detailed demographic information, such as age, sex, ethnicity, genotype and disease conditions was also collated. The information in Supplementary Table S1 includes mean or median values of abundance, standard deviation (SD), range (minimum to maximum), %CV, abundance units, analytical method, relative or absolute quantification method, and references to the data source. The repository has been uploaded online as an open access user-friendly QPrOmics database at http://qpromics.uw.edu/qpromics and it is being updated regularly. The search from database is possible through either protein name, gene name or UniProt ID, which can be further refined by selecting the tissue of interest from various organs, viz., liver, intestine, kidney, brain and lungs. The search output information from website can be downloaded as an Excel spreadsheet (representative example shown in Supplementary Fig. S1).

### Assessment of heterogeneity in the abundance of non-CYP enzymes through meta-analyses

In the case of three non-CYP enzymes, viz., UGT1A4, UGT2B7 and CES1, the heterogeneity constant (H2) value was more than 1.5 in the fixed effect (FE) model and below 1.2 in the random effect (RE) model. H2 values above 1.5 indicate a heterogeneity concern. Also, the FE model showed high to medium heterogeneity based on heterogeneity index (I2) data (Table 1). Considering this, we concluded that the inter-laboratory variability superseded true biological variability for these enzymes (Table 1). For UGT2B10, medium heterogeneity was observed for both FE and RE models. However, the result was statistically insignificant (P-value > 0.05) based on the chi-squared (χ2) distribution of the coefficient of heterogeneity (Q). The heterogeneity was low for all other enzymes (Table 1).

### Determination of weighted mean and percent coefficient of variation values

The weighted mean values of abundance (pmol/mg microsomal protein) of non-CYP enzymes were determined to be in the following order: 1252.93 (CES1) > 75.21 (UGT2B7) > 49.43 (UGT2B4) > 46.77 (UGT1A4) > 41.50 (UGT2B15) > 35.97 (UGT1A1) > 29.71 (UGT1A6) > 29.33 (FMO3) > 27.38 (UGT1A9) > 25.01 (UGT1A3) > 24.63 (FMO5) > 14.72 (UGT2B10) > 5.84 (UGT2B17). The data for individual studies are shown in Supplementary Table S2.

In general, the value of %CV or 95% confidence interval (CI), estimated using method II, was higher in comparison to methods I and III. Another observed advantage of method II was that it remained uninfluenced even when the studies were few in number. Also, it gave a 95% CI range that captured the observed variability. The meta-analysis results, including the weighted mean and %CV values for all non-CYP enzymes considered in the studies, are shown in Table 2. The forest plots in Figs 2 and 3 provide a visual representation of the inter-laboratory variability across the mean for individual enzymes.

### Prediction of the pharmacokinetics of lamotrigine in healthy adults and special populations

Figure 4 shows the predicted intravenous (IV) and peroral (PO) PK profiles of lamotrigine in healthy adults. The model was validated across various dosage forms (tablet, solution and capsule) and for an ascending dose of a capsule dosage form, and the resultant overlapping profiles are included in the same figure. The simulated lamotrigine plasma exposure parameters for both IV and PO studies were within the acceptance criteria (Table 3). Further, predicted results of lower and higher 95% CI values around the geometric weighted mean abundance, calculated using method II in adults, reasonably captured the variability in the observed data.

The validated model was further extrapolated to predict PK in paediatric and HI populations, considering all the physiological changes including adjusted maximum velocity of the kinetic reaction (Vmax) and renal plasma clearance (CLR) values. The predicted results were within the range of observed clinical data (Table 3 and Fig. 4).

## Discussion

PBPK M&S is a key component of model‐informed drug development (MIDD) and model‐informed precision dosing (MIPD)18. Its applications range from first-in-human (FIH) dose selection and clinical study design to dosing recommendations regarding drug interactions and pharmacogenetic effects in product labeling4. In that respect, M&S is aimed at the reduction and/or replacement of human/animal studies2,19. Another benefit is that it facilitates benefit/risk assessment, thereby enhancing the likelihood of regulatory success. It is for these reasons that PBPK modelling is currently being widely encouraged by regulatory agencies, such as the US FDA20, EMA21 and PMDA3. For example, PBPK approaches have been included in regulatory guidance on drug-drug interactions (DDIs)22,23,24, paediatrics25, HI26, renal impairment (RI)27 and pharmacogenetics28,29 as a means to guide clinical study design and labeling decisions. Between 2008 and 2017, the FDA’s Office of Clinical Pharmacology (OCP) received 130 investigational new drug (IND) applications and 94 new drug applications (NDAs) containing PBPK analyses30. The utility of PBPK analyses in these regulatory submissions was primarily to assess enzyme-based DDIs (60%), followed by applications in paediatrics (15%), DDI with transporter (7%), HI (6%), RI (4%), absorption including food effect (4%), and pharmacogenetics (2%)30.

The successful application of PBPK modelling, which meets regulatory expectations, requires integration of drug-specific properties (molecular weight, pKa, logP, pH dependent solubility, apparent permeability, fraction of drug unbound in plasma (fup), blood to plasma drug concentration ratio (B:P), etc.) with various physiology parameters (cardiac output, specific organ volume, tissue compositions, DMET abundance, transit times for luminal contents, etc.)31. With their help, the drug’s PK profile can be predicted and the same can be extrapolated across special populations, like paediatrics, pregnant women, maternal-fetal, HI and RI, etc.

Fortunately, drug-specific properties such as solubility, permeability, and enzyme and transport kinetics are extensively evaluated during early drug development. Further, for existing drugs, many databases of drug-specific properties are available, and some of them are freely accessible, such as regulatory labels and DrugBank. A good amount of information can also be accessed through material safety data sheets (MSDS).

The compilations and databases outlining body anatomy and physiology (e.g., tissue weights, blood flows to organs, tissue composition, etc.) in healthy and a few special populations have been curated over the past several years. While individual physiology databases have been developed for Japanese32,33, Chinese34, and Indian populations35, a geographically broader compilation came from a 5-year effort conducted under the aegis of the International Atomic Energy Agency (IAEA), which accounted for the characteristics of populations in Bangladesh, China, India, Japan, Republic of Korea, Pakistan, Philippines, and Vietnam36. Similarly, the National Center for Health Statistics (NCHS) designed a program, named the National Health and Nutrition Examination Survey (NHANES), to assess the health and nutritional status of adults and children in the United States37,38. In the same way, Valentin compiled information on age- and gender-related differences in the anatomical and physiological characteristics of Western Europeans and North Americans, published earlier by the International Commission on Radiological Protection (ICRP) in 197539. Another such attempt was made by Thompson et al., who compiled data from reported studies, including age-specific and clearance-related parameters in healthy and disease states40.

The clearance of drugs is primarily governed by DMET proteins, and hence the abundance of the latter has direct bearing on prediction of drug’s PK profiles and extrapolation to special populations. This is because abundance of DMET protein varies with demographic, biological and genetic factors, such as age, sex, ethnicity, disease condition and genotype1. This necessitates the availability of a repository containing quantitative information of DMET proteins. Therefore, the primary goal of the present study was to develop an online public repository that compiled the literature reported data on DMET proteins in various human tissues. Another target was to collate the information on the effect of associated covariates.

While compiling the DMET abundance data, we observed vast inter-laboratory variability, which was higher than the anticipated biological variability41. This highlighted the need to derive more robust conclusions by performing meta-analyses, which provide a good assessment of heterogeneity and yield weighted mean and %CV values that can be integrated into PBPK modelling to predict the variability in PK13,14,15,16,42.

To assess heterogeneity as a part of the meta-analyses, both FE and RE models were applied in the present study. Our results for UGT enzymes based on the FE model (Table 1) were consistent with the previously published meta-analysis studies on the same set of enzymes14. However, we observed that for two enzymes, where heterogeneity was evident through high I2 in the FE model, the RE model displayed no or low statistical heterogeneity. This indicated that the FE model was perhaps a simpler model for describing statistical heterogeneity in this set of reported DME abundance data. A critical analysis of individual DME abundance studies showed a major role of methodological heterogeneity, in terms of sample source, procurement and storage (frozen versus fresh tissue), sectioning procedure, sample preparation, and the technique of analysis. Among the latter, conventional immunoquantification-based methods like Western blotting, enzyme-linked immunosorbent assay (ELISA) and microarray can be considered less selective and lower in throughput than LC-MS/MS based quantification. However, inter-laboratory variability is also observed with LC-MS/MS proteomics, which can be attributed to the use of different peptides, differential protein extraction recovery, digestion efficiency and other methodological factors5,41,43. Therefore, harmonization of the protein abundance determination protocol across laboratories is warranted.

The meta-analysis of UGT1A4 was in congruence with the large inter-individual variability in observed clinical PK of lamotrigine, which was successfully captured in the model by making use of %CV values of UGT1A4 protein abundance, obtained through method II (Fig. 4 and Table 3). The high inter-individual variability of this particular DME has been held responsible for adverse effects of lamotrigine, such as benign rashes, gastrointestinal disturbances and multi-organ failure associated with Stevens-Johnson syndrome44,45. In other reports, the observed variability has been attributed to underlying factors, such as age, gender, weight, co-medication, and state of renal and hepatic function46,47,48,49,50, highlighting the possibility of population effects.

Also, in the case of lamotrigine, therapeutic drug monitoring (TDM) is resorted to for dose adjustment. A serum concentration of 2.5–15 µg/mL is considered to be effective and safe51,52,53. The drug is metabolized mainly by the glucuronidation pathway, which accounts for 86% of the dose, and around 4% is converted to unidentified metabolites54. The remaining 10% of the drug dose is excreted unchanged in the urine54. The DMEs reported to be involved in lamotrigine metabolism are UGT1A346, UGT1A446 and UGT2B755. However, more recent reports observed that UGT2B7 had no role in lamotrigine glucuronide formation46. Of UGT1A3 and UGT1A4, the latter has a ten-fold higher intrinsic clearance of the drug than the former46. The neonatal level of UGT1A4 is ~50-fold lower than the adult level56. Further, UGT1A4 abundance in alcoholic and hepatitis C virus (HCV) cirrhotic liver samples has been reported to be 12- and 4-fold lower than in healthy liver samples, respectively9.

To describe UGT1A4-mediated variability in lamotrigine PK, we developed and validated a whole-body PBPK model of the drug using GastroPlus software. The predicted results captured, in particular, the elimination phase (Fig. 4), which is directly affected by the variability in UGT1A4. However, the absorption phase was not well captured. High variability was also observed in lamotrigine absorption in the clinic. Its primary cause remains to be ascertained.

For prediction of PK profile of lamotrigine in special population, age- and disease-dependent UGT1A4 abundance was integrated into the PBPK model, which well predicted the drug’s PK, even in the selected population. The differential protein abundance of UGT1A3 was not considered here, because of its small contribution to the metabolic clearance of lamotrigine and the lack of data. The extrapolated model well captured the PK parameters in children, including early and middle childhood and the obtained results were comparable to clinically reported PK values. In the case of HI population, while the PK parameters (AUC) were reasonably predicted, the simulated profile of lamotrigine was visually different than the observed clinical data48. This was perhaps due to the unknown etiology of the liver disease and its influence on the observed PK data reported in the clinical study. Moreover, the protein abundance of UGT1A4 is known to vary between alcoholic and HCV cirrhotic liver tissues9.

To summarize, a comprehensive repository of DMET protein abundance data was developed. Meta-analysis was successfully carried out on the compiled information to estimate overall variability (%CV) of protein abundance, and the influence of the latter on variability in PK profiles was established, taking lamotrigine as a model drug. The developed model was extrapolated to predict PK of lamotrigine in paediatric and HI populations.

## Methods

### In silico tools

Numerical values of abundance were extracted from the reported figures using GetData Graph Digitizer version 2.25 (http://getdata-graph-digitizer.com/). MySQL open source relational database management system (Cupertino, CA, USA) was used as a platform for the QPrOmics database. All the simulations for PBPK model development of lamotrigine were carried out in GastroPlus version 9.6 (Simulations Plus, Inc., Lancaster, California, USA). Figures for visual representation of statistical and simulation data were created using Excel 2016 (Microsoft, Redmond, WA); the same software was used for meta-analyses, and its built-in statistical function ‘CHIDIST’ was used to calculate P-values.

### Compilation of published DMET protein abundance data

Human DMET protein abundance data in different tissues, with demographic details, were collected from published literature, which was searched through online search engines like PubMed, Google Scholar, Microsoft Academic, ScienceDirect, etc. The keyword combinations used for the search were: drug metabolizing enzyme/transporter + abundance/expression + words like quantification/quantitation, LC-MS, LC-MS/MS, liquid chromatography-high resolution mass spectrometry (LC-HRMS), proteomics, or Western blotting/immunoblotting. Terms such as quantity, concentration, content, quantification or measurement were also used as substitutes for the term “abundance/expression” to widen the search scope. The cross-references of individual articles were critically reviewed for any additional reported data. All information available up to January 2019 on tissue distribution, donor demographics (including age, gender, ethnicity, genotype, disease, smoking, alcohol consumption and medication) and analytical methods employed was collated.

### Meta-analyses of protein abundance data of non-CYP enzymes

To demonstrate the utility of the database, a systematic meta-analysis was performed as per the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline57 to establish the overall abundance of non-CYP enzymes. The inclusion of reported data in meta-analyses was based on the following pre-defined criteria: i) considering only individual microsomal samples, excluding data from pooled donor samples; ii) taking absolute protein abundance values that were quantified by LC-MS/MS or Western blotting (immunoblotting), and excluding LC-MS global proteomics, mRNA expression levels and enzyme activity data; iii) including studies reporting data in picomole (pmol) per mg protein unit, but excluding studies with abundance data in arbitrary, relative or non-standard units; and iv) adding only those proteins for which there was more than one report from different laboratories.

Accordingly, for the purpose of this publication, only 18 out of 220 studies were included in the meta-analyses (Supplementary Fig. S2). The selected studies covered protein abundance data of the following non-CYP enzymes: UGT1A1, UGT1A3, UGT1A4, UGT1A6, UGT1A9, UGT2B4, UGT2B7, UGT2B10, UGT2B15, UGT2B17, CES1, FMO3 and FMO5. The data were subjected to heterogeneity tests (weighting by inverse variance) to investigate intra- and inter-laboratory variability, which included determination of H2, I2, P-value and heterogeneity class13,14,17,58. Thereafter, the data were subjected to calculation of the weighted mean (weighting by sample size)13,14,17 and determination of %CV using three methods, I–III. Finally, the 95% geometric CI, calculated using %CV (method II), was incorporated into the adult PBPK model of lamotrigine to explain the variability in its PK.

### Assessment of heterogeneity across studies

Summary estimates of studies and coefficient of heterogeneity were determined by FE and RE meta-analysis, to assess heterogeneity in data of different studies. The basic assumption for FE model is that the studies conducted are virtually identical (e.g., same study design, experimental conditions, etc.). On the other hand, the RE model assumes that the observed results may vary from study to study and follow certain distribution pattern59.

For FE model, summary estimate (μF) of studies was calculated using Equation 113,17,58.

$${{\rm{\mu }}}_{{\rm{F}}}=\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{w}}}_{{\rm{j}}}\cdot {{\rm{X}}}_{{\rm{j}}}/\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{w}}}_{{\rm{j}}}$$
(1)

where, wj is the FE weight of the study j, calculated as inverse of variance [$${{\rm{w}}}_{{\rm{j}}}=1/{({{\rm{SD}}}_{{\rm{j}}})}^{2}$$]; and Xj represents the mean abundance of a particular non-CYP enzyme for individual study j.

Further, Equation 2 was used to determine heterogeneity by FE model58.

$${{\rm{Q}}}_{{\rm{F}}}=\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{w}}}_{{\rm{j}}}\cdot {({{\rm{X}}}_{{\rm{j}}}-{{\rm{\mu }}}_{{\rm{F}}})}^{2}$$
(2)

where, QF [Cochran χ2-based Q test60] represents the coefficient of heterogeneity for the FE model. In RE model, summary estimate (μR) was calculated using Equation 3.

$${{\rm{\mu }}}_{{\rm{R}}}=\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{w}}}_{{\rm{j}}}^{\ast }\cdot {{\rm{X}}}_{{\rm{j}}}/\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{w}}}_{{\rm{j}}}^{\ast }$$
(3)

where, $${{\rm{w}}}_{{\rm{j}}}^{\ast }$$ is the RE weight of the study j, calculated by formula $${{\rm{w}}}_{{\rm{j}}}^{\ast }=1/({{\rm{w}}}_{{\rm{j}}}^{-1}+{\hat{{\rm{\tau }}}}^{2})$$. Herein $${\hat{{\rm{\tau }}}}^{2}$$ is the between-study heterogeneity estimator, which was obtained using QF and degrees of freedom (df; calculated as k-1, where k is the number of studies) by Equation 461.

$${\hat{{\rm{\tau }}}}^{2}=\frac{{{\rm{Q}}}_{{\rm{F}}}-{\rm{df}}}{{\sum }_{{\rm{j}}=1}^{{\rm{k}}}{{\rm{w}}}_{{\rm{j}}}-(\frac{{\sum }_{{\rm{j}}=1}^{{\rm{k}}}{{{\rm{w}}}_{{\rm{j}}}}^{2}}{{\sum }_{{\rm{j}}=1}^{{\rm{k}}}{{\rm{w}}}_{{\rm{j}}}})}$$
(4)

The coefficient of heterogeneity of RE meta-analysis (QR)58 was estimated when QF > df (Equation 5). Otherwise $${\hat{{\rm{\tau }}}}^{2}$$ value was considered as zero, thus implying that RE meta-analysis would lead to same results as those obtained in FE meta-analysis.

$${{\rm{Q}}}_{{\rm{R}}}=\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{w}}}_{{\rm{j}}}^{\ast }\cdot {({{\rm{X}}}_{{\rm{j}}}-{{\rm{\mu }}}_{{\rm{R}}})}^{2}$$
(5)

Further, heterogeneity was determined through H2 and I2 indices for both FE and RE models using Equations 6 and 7, respectively.

$${{\rm{H}}}^{2}=\frac{{\rm{Q}}}{{\rm{df}}}$$
(6)
$${{\rm{I}}}^{2}( \% )=\frac{{{\rm{H}}}^{2}-1}{{{\rm{H}}}^{2}}\,\cdot 100$$
(7)

where, Q is the coefficient of heterogeneity, defined as QF and QR for FE and RE models, respectively. As noted in the Results section, H2 values more than 1.5 generally cause considerable heterogeneity concern, while values below 1.2 are of little concern for heterogeneity49. The I2 index provides a percentage of overall variability between individual studies, with values 0%, ~25%, ~50% and ~75% classified as none, low, medium and high heterogeneity, respectively13,14. When I2 is negative, it is set to zero.
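The heterogeneity workflow of Equations 1 to 7 can be sketched in a few lines of Python. This is an illustrative sketch only, not the Excel implementation used in the study; the function names and the three study means and SDs in the example are hypothetical, chosen so that the FE model shows high heterogeneity while the RE model shows none.

```python
def pooled_estimate(weights, means):
    """Weighted summary estimate (form of Equations 1 and 3)."""
    return sum(w * x for w, x in zip(weights, means)) / sum(weights)


def cochran_q(weights, means, mu):
    """Coefficient of heterogeneity Q (form of Equations 2 and 5)."""
    return sum(w * (x - mu) ** 2 for w, x in zip(weights, means))


def i2_percent(h2):
    """I2 index (Equation 7); negative values are set to zero."""
    return max(0.0, (h2 - 1.0) / h2 * 100.0) if h2 > 0 else 0.0


def heterogeneity(means, sds):
    """FE and RE heterogeneity statistics for k >= 2 studies."""
    df = len(means) - 1
    w = [1.0 / sd ** 2 for sd in sds]           # FE weights, inverse variance
    mu_f = pooled_estimate(w, means)
    q_f = cochran_q(w, means, mu_f)
    # Equation 4; tau^2 is taken as zero when Q_F does not exceed df
    denom = sum(w) - sum(wj ** 2 for wj in w) / sum(w)
    tau2 = (q_f - df) / denom if q_f > df else 0.0
    w_star = [1.0 / (1.0 / wj + tau2) for wj in w]  # RE weights
    mu_r = pooled_estimate(w_star, means)
    q_r = cochran_q(w_star, means, mu_r)
    h2_f, h2_r = q_f / df, q_r / df             # Equation 6
    return {
        "mu_F": mu_f, "Q_F": q_f, "tau2": tau2, "mu_R": mu_r, "Q_R": q_r,
        "H2_F": h2_f, "I2_F": i2_percent(h2_f),
        "H2_R": h2_r, "I2_R": i2_percent(h2_r),
    }


# Hypothetical example: three laboratories, means and SDs in pmol/mg protein
result = heterogeneity([30.0, 60.0, 90.0], [10.0, 10.0, 10.0])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these made-up inputs the FE model gives a large H2, whereas the RE weights absorb the between-study variance through the τ2 term, which mirrors the pattern reported for UGT1A4, UGT2B7 and CES1.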

Further, P-values for determining the statistical significance of analysis were calculated using chi-squared (χ2) distribution of the Q and df values13,14.
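The P-value step corresponds to the right-tail probability of the χ2 distribution, which is what Excel's CHIDIST returns. A minimal standard-library sketch is given below, assuming integer degrees of freedom and using the classical closed-form series for the χ2 tail; the function name `chi2_sf` is hypothetical and this is not the code used in the study.

```python
import math


def chi2_sf(q, df):
    """Right-tail probability P(X >= q) for a chi-squared variable.

    Valid for integer df >= 1; uses the finite closed-form series
    for even and odd degrees of freedom.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    if q <= 0:
        return 1.0
    half = q / 2.0
    if df % 2 == 0:
        # exp(-q/2) * sum_{i=0}^{df/2 - 1} (q/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(q/2)) plus a finite correction series
    chi = math.sqrt(q)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= q / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + \
        math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Hypothetical example: Q = 18 with df = 2 (three studies)
p_value = chi2_sf(18.0, 2)
print(f"P = {p_value:.2e}", "significant" if p_value < 0.05 else "not significant")
```

Here a P-value above 0.05 would indicate that the observed heterogeneity is not statistically significant, as was reported for UGT2B10.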

### Calculation of weighted means and coefficient of variation

The meta-analysis was carried out considering weighting by sample size13,17,62. The weighted mean (WM) was calculated using Equation 8:

$${\rm{WM}}=\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{n}}}_{{\rm{j}}}\cdot {{\rm{X}}}_{{\rm{j}}}/{\rm{N}}$$
(8)

where, N is the number of samples in all studies $$({\rm{N}}=\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{n}}}_{{\rm{j}}})$$ and nj is the number of samples in study j.

Amongst the three methods I-III used for calculation of %CV, variance (ν) and WM in method I were correlated as per Equation 9.

$$\% \mathrm{CV}=\frac{{\rm{Overall}}\,{\rm{SD}}}{{\rm{WM}}}\cdot 100=\frac{\sqrt{{\rm{\nu }}}}{{\rm{WM}}}\cdot 100$$
(9)

where, ν was calculated62 in accordance with Equation 10:

$$\nu =\sum _{{\rm{j}}=1}^{{\rm{k}}}{{\rm{n}}}_{{\rm{j}}}\cdot {({{\rm{X}}}_{{\rm{j}}}-{\rm{WM}})}^{2}/{\rm{N}}$$
(10)

In method II, %CV was determined through overall SD and WM using Equation 1163.

$$\% \mathrm{CV}=\frac{{\rm{Overall}}\,{\rm{SD}}}{{\rm{WM}}}\cdot 100=\frac{\sqrt{{\rm{Overall}}\,{\rm{sum}}\,{\rm{of}}\,{\rm{squares}}/{\rm{N}}}}{{\rm{WM}}}\cdot 100$$
(11)

where, the overall sum of squares was calculated considering standard deviation (SDj), Xj and nj of individual study j, and WM employing Equation 12.

$${\rm{Overall}}\,{\rm{sum}}\,{\rm{of}}\,{\rm{squares}}=\sum _{{\rm{j}}=1}^{{\rm{k}}}\,[({({{\rm{SD}}}_{{\rm{j}}})}^{2}+{({{\rm{X}}}_{{\rm{j}}})}^{2})\cdot {{\rm{n}}}_{{\rm{j}}}]-{\rm{N}}\cdot {({\rm{WM}})}^{2}$$
(12)

In method III, %CV was calculated as weighting by sample size using Equation 1313,14,15:

$$\% \mathrm{CV}=\frac{{\sum }_{{\rm{j}}=1}^{{\rm{k}}}{{\rm{n}}}_{{\rm{j}}}\cdot {{\rm{CV}}}_{{\rm{j}}}}{{\rm{N}}}\cdot 100$$
(13)
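The sample-size-weighted mean and the three %CV methods (Equations 8 to 13) can be compared side by side with a short Python sketch. The function names and the two-study data set are hypothetical, and CVj in method III is assumed to be the fractional ratio SDj/Xj of each study.

```python
import math


def weighted_mean(means, ns):
    """Sample-size-weighted mean (Equation 8)."""
    return sum(n * x for n, x in zip(ns, means)) / sum(ns)


def cv_method_1(means, ns):
    """Method I: between-study variance only (Equations 9 and 10)."""
    wm, total_n = weighted_mean(means, ns), sum(ns)
    variance = sum(n * (x - wm) ** 2 for n, x in zip(ns, means)) / total_n
    return math.sqrt(variance) / wm * 100.0


def cv_method_2(means, sds, ns):
    """Method II: overall sum of squares (Equations 11 and 12)."""
    wm, total_n = weighted_mean(means, ns), sum(ns)
    sum_sq = sum((sd ** 2 + x ** 2) * n for x, sd, n in zip(means, sds, ns))
    sum_sq -= total_n * wm ** 2
    return math.sqrt(sum_sq / total_n) / wm * 100.0


def cv_method_3(means, sds, ns):
    """Method III: sample-size-weighted study CVs (Equation 13)."""
    total_n = sum(ns)
    return sum(n * (sd / x) for x, sd, n in zip(means, sds, ns)) / total_n * 100.0


# Hypothetical example: two studies, abundance in pmol/mg protein
means, sds, ns = [40.0, 60.0], [10.0, 20.0], [10, 10]
print("WM:", weighted_mean(means, ns))
print("%CV I:", round(cv_method_1(means, ns), 2))
print("%CV II:", round(cv_method_2(means, sds, ns), 2))
print("%CV III:", round(cv_method_3(means, sds, ns), 2))
```

In this made-up case method II gives the largest %CV because its sum of squares combines the within-study and between-study components, whereas method I uses only the spread of study means and method III only the within-study CVs.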

Lower and higher 95% CIs around the geometric weighted mean (WM) were calculated taking z value as 1.96 by using Equations 14 and 1564:

$${\rm{CI}}=\exp \,[\mathrm{ln}({\rm{WM}})\pm {\rm{z}}\cdot \frac{{\rm{\sigma }}}{\sqrt{{\rm{k}}}}]$$
(14)
$${\rm{\sigma }}=\sqrt{\mathrm{ln}\,[{(\frac{ \% \mathrm{CV}}{100})}^{2}+1]}$$
(15)

where, σ is the standard deviation of the data on the natural log scale.
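Equations 14 and 15 can be illustrated with the following Python sketch. The weighted mean of 46.77 pmol/mg protein is the UGT1A4 value reported in the Results, while the %CV of 50 and the number of studies (k = 5) are hypothetical placeholders, as are the function names.

```python
import math


def log_scale_sd(cv_percent):
    """Standard deviation on the natural log scale (Equation 15)."""
    return math.sqrt(math.log((cv_percent / 100.0) ** 2 + 1.0))


def geometric_ci(wm, cv_percent, k, z=1.96):
    """Lower and upper 95% CI around the geometric weighted mean (Equation 14)."""
    half_width = z * log_scale_sd(cv_percent) / math.sqrt(k)
    centre = math.log(wm)
    return math.exp(centre - half_width), math.exp(centre + half_width)


# WM from the Results (UGT1A4); %CV and k are assumed for illustration
lower, upper = geometric_ci(wm=46.77, cv_percent=50.0, k=5)
print(f"95% CI: {lower:.2f} to {upper:.2f} pmol/mg protein")
```

Because the interval is built on the log scale, it is asymmetric around the weighted mean on the original scale, and the product of the two bounds equals the square of the weighted mean.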

### Development and validation of lamotrigine PBPK model and extension to special populations

#### Model development for healthy adult population

In the first step of PBPK model development for lamotrigine in GastroPlus, drug-specific physicochemical properties and system-specific input parameters were compiled, which are listed in Table 4.

The next step involved development of a whole-body PBPK model for a healthy adult of 30 years of age and 70 kg weight. The adult physiology was created using the Population Estimates for Age Related (PEAR) physiology module within the simulator. In particular, intravenous plasma clearance (CLIV, L/h) and steady-state volume of distribution (Vss, L) values were taken from a reported PBPK model of lamotrigine, which was primarily focused on optimal profiling of lamotrigine formulations, drug disposition and drug-drug interactions65. All tissues were assumed to be perfusion-limited compartments. Only liver tissue was considered for metabolic clearance of the drug.

#### In vitro-in vivo extrapolation (IVIVE)

The third step was development of a mechanistic model for which CLIV and fraction of unchanged drug cleared through renal route (fCL,renal) data were used to estimate hepatic plasma clearance (CLH, 1.8 L/h) and CLR (CLR = fCL,renal × CLIV, 0.2 L/h). The net unbound intrinsic hepatic clearance (CLuint,H, L/h) was back-calculated from CLH by taking into account fup (0.45), and B:P (1) using “well-stirred” model66, as mentioned in Equation 16.

$${{\rm{C}}{\rm{L}}{\rm{u}}}_{{\rm{i}}{\rm{n}}{\rm{t}},{\rm{H}}}=\frac{{{\rm{Q}}}_{{\rm{H}},{\rm{B}}}\cdot {{\rm{C}}{\rm{L}}}_{{\rm{H}}}}{{{\rm{f}}{\rm{u}}}_{{\rm{p}}}\cdot ({{\rm{Q}}}_{{\rm{H}},{\rm{B}}}-{{\rm{C}}{\rm{L}}}_{{\rm{H}}}/{\rm{B}}:{\rm{P}})}$$
(16)
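As a worked illustration of Equation 16, the Python sketch below back-calculates CLuint,H from the values stated in the text (CLH = 1.8 L/h, fup = 0.45, B:P = 1). The hepatic blood flow QH,B is not given in this section, so the value of 90 L/h is an assumption for illustration only, and the function names are hypothetical.

```python
def clu_int_hepatic(cl_h, q_hb, fu_p, bp):
    """Unbound intrinsic hepatic clearance from the well-stirred model (Equation 16)."""
    denominator = fu_p * (q_hb - cl_h / bp)
    if denominator <= 0:
        raise ValueError("CL_H / B:P must be lower than hepatic blood flow")
    return q_hb * cl_h / denominator


def cl_hepatic(clu_int, q_hb, fu_p, bp):
    """Forward well-stirred model, used here only as a consistency check."""
    return q_hb * fu_p * clu_int / (q_hb + fu_p * clu_int / bp)


CL_H = 1.8        # L/h, from the text
CL_R = 0.2        # L/h, from the text
FU_P = 0.45       # from the text
BP = 1.0          # from the text
Q_HB = 90.0       # L/h, ASSUMED hepatic blood flow (not stated in this section)

f_cl_renal = CL_R / (CL_H + CL_R)  # assumes CLIV = CLH + CLR
clu = clu_int_hepatic(CL_H, Q_HB, FU_P, BP)
print(f"fCL,renal = {f_cl_renal:.2f}")
print(f"CLuint,H = {clu:.2f} L/h (with assumed QH,B = {Q_HB} L/h)")
```

Because CLH is small relative to hepatic blood flow, the result is only weakly sensitive to the assumed QH,B; lamotrigine behaves as a low-extraction drug in this calculation.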

In vivo unbound intrinsic hepatic clearance for individual DME isoforms ($${{\rm{CLu}}}_{{\rm{int}},{{\rm{DME}}}_{{\rm{j}}}}$$, L/h) was calculated using fraction metabolized by individual DME isoform ($${{\rm{f}}}_{{\rm{m}},{{\rm{DME}}}_{{\rm{j}}}}$$), CLuint,H values and the fraction of drug cleared through hepatic metabolism (fCL,metabolism,H = 1 − fCL,renal) in accordance with Equation 17.

$${{\rm{C}}{\rm{L}}{\rm{u}}}_{{\rm{i}}{\rm{n}}{\rm{t}},{{\rm{D}}{\rm{M}}{\rm{E}}}_{{\rm{j}}}}=\,\frac{{{\rm{f}}}_{{\rm{m}},{{\rm{D}}{\rm{M}}{\rm{E}}}_{{\rm{j}}}}\cdot {{\rm{C}}{\rm{L}}{\rm{u}}}_{{\rm{i}}{\rm{n}}{\rm{t}},{\rm{H}}}}{{{\rm{f}}}_{{\rm{C}}{\rm{L}},{\rm{m}}{\rm{e}}{\rm{t}}{\rm{a}}{\rm{b}}{\rm{o}}{\rm{l}}{\rm{i}}{\rm{s}}{\rm{m}},{\rm{H}}}}$$
(17)

In vitro intrinsic hepatic clearance of individual DME isoforms (in vitro $${{\rm{CL}}}_{{\rm{int}},{{\rm{DME}}}_{{\rm{j}}}}$$, µL/min/mg protein) was calculated using Equation 18.

$${\rm{I}}{\rm{n}}\,{\rm{v}}{\rm{i}}{\rm{t}}{\rm{r}}{\rm{o}}\,{{\rm{C}}{\rm{L}}}_{{\rm{i}}{\rm{n}}{\rm{t}},{{\rm{D}}{\rm{M}}{\rm{E}}}_{{\rm{j}}}}\,=\frac{{{\rm{C}}{\rm{L}}{\rm{u}}}_{{\rm{i}}{\rm{n}}{\rm{t}},{{\rm{D}}{\rm{M}}{\rm{E}}}_{{\rm{j}}}}}{{\rm{M}}{\rm{P}}{\rm{P}}{\rm{G}}{\rm{L}}\cdot {\rm{L}}{\rm{i}}{\rm{v}}{\rm{e}}{\rm{r}}\,{\rm{w}}{\rm{e}}{\rm{i}}{\rm{g}}{\rm{h}}{\rm{t}}\cdot 60\cdot {10}^{-6}}$$
(18)

where, MPPGL is the amount of microsomal protein per g of liver (mg/g; default GastroPlus value 38), liver weight is expressed in g (default GastroPlus value 1637.7 g), and the factor $$60\cdot {10}^{-6}$$ was used for unit conversion.

Thereafter, $${{\rm{V}}}_{max,{{\rm{DME}}}_{{\rm{j}}}}$$, which is the maximum velocity of the kinetic reaction for individual DME isoforms (pmol/min/pmol isoform), was calculated using Equation 19.

$$\mathrm{V}_{\max,\mathrm{DME}_{\mathrm{j}}}=\frac{\mathrm{In\ vitro}\ \mathrm{CL}_{\mathrm{int},\mathrm{DME}_{\mathrm{j}}}\cdot \mathrm{K}_{\mathrm{m},\mathrm{DME}_{\mathrm{j}}}\cdot \mathrm{fu}_{\mathrm{mic}}}{\mathrm{DME}_{\mathrm{j}}\ \mathrm{abundance}\cdot \mathrm{ISEF}_{\mathrm{DME}_{\mathrm{j}}}}$$
(19)

where $$\mathrm{K}_{\mathrm{m},\mathrm{DME}_{\mathrm{j}}}$$ is the in vitro Michaelis-Menten constant (µM), $$\mathrm{fu}_{\mathrm{mic}}$$ is the unbound fraction in microsomes (default GastroPlus value 1.0), $$\mathrm{DME}_{\mathrm{j}}$$ abundance is the abundance of individual DME isoforms in liver (default GastroPlus values 7.6 and 7.9 pmol/mg protein for UGT1A3 and UGT1A4, respectively), and $$\mathrm{ISEF}_{\mathrm{DME}_{\mathrm{j}}}$$ is the inter-system extrapolation factor for individual DME isoforms (default GastroPlus value 1.0), as mentioned in Table 4.
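A minimal sketch of Equation 19 is given below. The function name and the in vitro CLint and Km inputs are hypothetical; the UGT abundances and the defaults of 1.0 for fu_mic and ISEF are the GastroPlus values quoted above.

```python
# Default hepatic abundances quoted in the text (pmol/mg microsomal protein).
UGT_ABUNDANCE = {"UGT1A3": 7.6, "UGT1A4": 7.9}


def vmax_dme(in_vitro_cl_int, km_um, abundance_pmol_per_mg, fu_mic=1.0, isef=1.0):
    """Equation 19: Vmax in pmol/min/pmol isoform.

    in_vitro_cl_int:       uL/min/mg protein
    km_um:                 Michaelis-Menten constant in uM (equal to pmol/uL)
    abundance_pmol_per_mg: isoform abundance in pmol/mg protein
    """
    if abundance_pmol_per_mg <= 0 or isef <= 0:
        raise ValueError("abundance and ISEF must be positive")
    return in_vitro_cl_int * km_um * fu_mic / (abundance_pmol_per_mg * isef)


# Hypothetical CLint and Km, used only to show the call.
example_vmax = vmax_dme(in_vitro_cl_int=1.0, km_um=100.0,
                        abundance_pmol_per_mg=UGT_ABUNDANCE["UGT1A4"])
print(f"Vmax = {example_vmax:.2f} pmol/min/pmol UGT1A4")
```

Because µM is numerically pmol/µL, the product CLint × Km has units of pmol/min/mg protein, and dividing by the abundance (pmol/mg protein) gives pmol/min/pmol isoform.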

To address inter-laboratory variability in specific protein abundance, we adjusted the $$\mathrm{V}_{\max,\mathrm{DME}_{\mathrm{j}}}$$ values for individual enzymes using Equation 20, and the adjusted values were then entered in the Enzymes and Transporter module of GastroPlus.

$$\mathrm{Adjusted}\ \mathrm{V}_{\max,\mathrm{DME}_{\mathrm{j}}}=\mathrm{V}_{\max,\mathrm{DME}_{\mathrm{j}}}\cdot \mathrm{SF}_{\mathrm{CI}_{\mathrm{j}}}$$
(20)

where $$\mathrm{SF}_{\mathrm{CI}_{\mathrm{j}}}$$ is the scale factor for inter-laboratory variability of the specific DME, which was calculated using Equation 21.

$${{\rm{SF}}}_{{{\rm{CI}}}_{{\rm{j}}}}=\frac{{\rm{Weighted}}\,{\rm{lower}}\,{\rm{or}}\,{\rm{higher}}\,95 \% \,{\rm{CI}}\,{{\rm{DME}}}_{{\rm{j}}}\,{\rm{abundance}}\,{\rm{in}}\,{\rm{adults}}}{{\rm{Weighted}}\,{\rm{mean}}\,{{\rm{DME}}}_{{\rm{j}}}\,{\rm{abundance}}\,{\rm{in}}\,{\rm{adults}}}$$
(21)
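Equations 20 and 21 amount to bracketing Vmax with the confidence bounds of the weighted abundance. The sketch below illustrates this; the abundance and Vmax numbers are hypothetical and the function names are ours.

```python
def sf_ci(ci_bound_abundance, mean_abundance):
    """Equation 21: ratio of a weighted 95% CI bound to the weighted mean abundance."""
    if mean_abundance <= 0:
        raise ValueError("mean abundance must be positive")
    return ci_bound_abundance / mean_abundance


def adjust_vmax(vmax, scale_factor):
    """Equation 20: Vmax adjusted for inter-laboratory variability."""
    return vmax * scale_factor


# Hypothetical weighted abundances (pmol/mg protein) and Vmax.
sf_lower = sf_ci(ci_bound_abundance=8.0, mean_abundance=10.0)
sf_upper = sf_ci(ci_bound_abundance=12.0, mean_abundance=10.0)
vmax_range = (adjust_vmax(5.0, sf_lower), adjust_vmax(5.0, sf_upper))
print(vmax_range)
```

Running the model once with each bound gives a lower and an upper Vmax, and hence a range of simulated exposures rather than a single estimate.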

The oral absorption model was established using absorption parameters (intestinal permeability, solubility, diffusion coefficient and particle size) in the “Human-Fasted” gut physiology model (default GastroPlus values except for permeability and solubility), together with the disposition parameters optimized above. Intestinal metabolism was assumed negligible as the oral bioavailability of lamotrigine is 98%.

#### Model evaluation

In a further step, the predictive performance of the developed models was evaluated by comparing the simulated exposure parameters with literature-based observed clinical exposure parameters (Cmax and AUC), in accordance with the acceptance criteria suggested in the literature64. The lower and higher 99.998% CIs were calculated taking z value as 4.26 using Equations 14 and 15, and considering individual clinical PK data given in Supplementary Table S3.
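The correspondence between the z value of 4.26 and the 99.998% level can be checked numerically. The sketch below assumes a two-sided interval under a standard normal distribution, which is our reading of the text; it does not reproduce Equations 14 and 15.

```python
import math


def two_sided_coverage(z):
    """Probability that a standard normal variable lies within +/- z."""
    return math.erf(z / math.sqrt(2.0))


coverage = two_sided_coverage(4.26)
print(f"Two-sided coverage for z = 4.26: {100 * coverage:.3f}%")
```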

#### Extrapolation of model to paediatric and HI population

The last step involved extrapolation of the validated adult PBPK model to predict PK in different paediatric populations, which was done using the PEAR Physiology module of GastroPlus. The models were developed for three age groups (representing average age and weight in each group), viz., early childhood (4 years, 17.34 kg), children (7 years, 26.54 kg) and middle childhood (9 years, 34.45 kg).

Although the software assumes similar protein abundance in the adult, paediatric and HI populations, we accounted for altered protein abundance by adjusting the $$\mathrm{V}_{\max,\mathrm{DME}_{\mathrm{j}}}$$ values for individual enzymes using Equation 22; the adjusted values were then entered in the Enzymes and Transporter module of GastroPlus.

$${\rm{Adjusted}}\,{{\rm{V}}}_{{\rm{\max }},{{\rm{DME}}}_{{\rm{j}}}}={{\rm{V}}}_{{\rm{\max }},{{\rm{DME}}}_{{\rm{j}}}}\cdot {{\rm{SF}}}_{{{\rm{DME}}}_{{\rm{j}}}}\cdot {{\rm{SF}}}_{{\rm{MPPGL}}}$$
(22)

where, SF is the scale factor, which was calculated as altered abundance of enzyme and MPPGL due to effect of age and disease (Supplementary Table S4), by using Equations 23 and 24, respectively:

$$\mathrm{SF}_{\mathrm{DME}_{\mathrm{j}}}=\frac{\mathrm{Mean\ or\ 95\%\ CI}\ \mathrm{DME}_{\mathrm{j}}\ \mathrm{abundance\ in\ paediatric/HI\ population}}{\mathrm{Mean}\ \mathrm{DME}_{\mathrm{j}}\ \mathrm{abundance\ in\ healthy\ adults}}$$
(23)
$${{\rm{SF}}}_{{\rm{MPPGL}}}=\frac{{\rm{Mean}}\,{\rm{MPPGL}}\,{\rm{in}}\,{\rm{paediatric}}/{\rm{HI}}\,{\rm{population}}}{{\rm{Mean}}\,{\rm{MPPGL}}\,{\rm{in}}\,{\rm{healthy}}\,{\rm{adults}}}$$
(24)

These determinations were made for the different age groups and the HI population, especially for the abundance of UGT1A4 (refs. 9,56), and the results were incorporated into the model in order to capture the differential PK of lamotrigine in these populations. Our published protein abundance data for the UGT1A4 enzyme in alcoholic and HCV cirrhotic livers9 were used to develop two separate SFs for each of these populations. Further, the scaled $$\mathrm{CLu}_{\mathrm{int},\mathrm{DME}_{\mathrm{j}}}$$ value was obtained through IVIVE using Equation 25:

$$\mathrm{Scaled}\ \mathrm{CLu}_{\mathrm{int},\mathrm{DME}_{\mathrm{j}}}=\frac{\mathrm{Adjusted}\ \mathrm{V}_{\max,\mathrm{DME}_{\mathrm{j}}}\cdot \mathrm{DME}_{\mathrm{j}}\ \mathrm{abundance}\cdot \mathrm{ISEF}_{\mathrm{DME}_{\mathrm{j}}}}{\mathrm{K}_{\mathrm{m},\mathrm{DME}_{\mathrm{j}}}\cdot \mathrm{fu}_{\mathrm{mic}}}\cdot \mathrm{MPPGL}\cdot \mathrm{Liver\ weight}\cdot 60\cdot 10^{-6}$$
(25)

Default GastroPlus liver weight values for each age group were entered in Equation 25 to account for age-dependent change, viz., early childhood (592.12 g at 4 years), children (726.23 g at 7 years) and middle childhood (906.24 g at 9 years). For the Child-Pugh C class of the HI population, a liver weight of 867.97 g (30 years of age) was used.
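The scaling in Equations 22 and 25 can be sketched as follows. The liver weights are those quoted above; the Vmax, Km and scale factor values are hypothetical placeholders. Passing the adult default MPPGL (38 mg/g) to Equation 25 is our assumption, made because the change in MPPGL is already carried by SF_MPPGL in Equation 22.

```python
# Liver weights quoted in the text (g).
LIVER_WEIGHT_G = {
    "early childhood (4 y)": 592.12,
    "children (7 y)": 726.23,
    "middle childhood (9 y)": 906.24,
    "HI, Child-Pugh C (30 y)": 867.97,
}


def adjusted_vmax(vmax, sf_dme, sf_mppgl):
    """Equation 22: Vmax adjusted for altered enzyme abundance and MPPGL."""
    return vmax * sf_dme * sf_mppgl


def scaled_clu_int(adj_vmax, abundance, km_um, mppgl, liver_weight_g,
                   fu_mic=1.0, isef=1.0):
    """Equation 25: scaled in vivo CLu_int (L/h) for one DME isoform."""
    in_vitro_cl_int = adj_vmax * abundance * isef / (km_um * fu_mic)  # uL/min/mg
    return in_vitro_cl_int * mppgl * liver_weight_g * 60.0 * 1e-6


# Hypothetical Vmax, scale factors and Km, for illustration only.
adj = adjusted_vmax(vmax=10.0, sf_dme=0.8, sf_mppgl=0.9)
scaled = {
    group: scaled_clu_int(adj, abundance=7.9, km_um=100.0,
                          mppgl=38.0, liver_weight_g=weight)
    for group, weight in LIVER_WEIGHT_G.items()
}
for group, value in scaled.items():
    print(f"{group}: {value:.3f} L/h")
```

With all other inputs held constant, the scaled clearance is directly proportional to liver weight, so the ratio between any two groups equals the ratio of their liver weights.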

The renal plasma clearance in the paediatric and HI groups ($$\mathrm{CL}_{\mathrm{R,paediatric/HI}}$$) was calculated67 using $$\mathrm{SF}_{\mathrm{fu}\times \mathrm{GFR}}$$ (Supplementary Table S4), as described by Equations 26 and 27:

$${{\rm{CL}}}_{{\rm{R}},{\rm{paediatric}}/{\rm{HI}}}={{\rm{SF}}}_{{\rm{fu}}\times {\rm{GFR}}}\cdot {{\rm{CL}}}_{{\rm{R}}}$$
(26)
$${{\rm{SF}}}_{{\rm{fu}}\times {\rm{GFR}}}=\frac{{{\rm{fu}}}_{{\rm{paediatric}}/{\rm{HI}}}\cdot {{\rm{GFR}}}_{{\rm{paediatric}}/{\rm{HI}}}}{{{\rm{fu}}}_{{\rm{adult}}}\cdot {{\rm{GFR}}}_{{\rm{adult}}}}$$
(27)

where the $$\mathrm{fu}_{\mathrm{paediatric/HI}}$$ and $$\mathrm{GFR}_{\mathrm{paediatric/HI}}$$ values were obtained from GastroPlus.
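Equations 26 and 27 scale the adult renal clearance by the product of unbound fraction and glomerular filtration rate. The sketch below uses hypothetical fu, GFR and CL_R values; in the study these inputs come from GastroPlus and Supplementary Table S4.

```python
def sf_fu_gfr(fu_pop, gfr_pop, fu_adult, gfr_adult):
    """Equation 27: scale factor from unbound fraction and GFR."""
    return (fu_pop * gfr_pop) / (fu_adult * gfr_adult)


def renal_clearance(cl_r_adult, scale_factor):
    """Equation 26: renal clearance in the paediatric or HI population."""
    return scale_factor * cl_r_adult


# Hypothetical inputs; GFR units cancel as long as both use the same unit.
sf = sf_fu_gfr(fu_pop=0.5, gfr_pop=60.0, fu_adult=0.4, gfr_adult=120.0)
cl_r_population = renal_clearance(cl_r_adult=0.2, scale_factor=sf)
print(f"SF = {sf:.3f}, CL_R = {cl_r_population:.3f}")
```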

The model was then simulated for oral formulations at a dose of 2 mg/kg in the early childhood, children and middle childhood groups using the in-built paediatric models. For all these paediatric populations, the gastric emptying time (GET) was set to 0.75 h (ref. 68). The refined model was then used to predict PK in these three paediatric populations, and the predictions were compared with the reported data using the acceptance criteria described for the healthy adult population.

Similarly, the adult PBPK model was extrapolated to the diseased state model of HI population using PEAR Physiology module “Human, diseased, cirrhosis, Child-Pugh C”. The refined model was then used to predict PK in HI population and the predictions were compared with the reported observed data for the healthy adult population.