ZNF542P is a pseudogene associated with LDL response to simvastatin treatment

Statins are the most commonly prescribed cardiovascular disease drug, but their inter-individual efficacy varies considerably. Genetic factors uncovered to date have only explained a small proportion of variation in low-density lipoprotein cholesterol (LDLC) lowering. To identify novel markers and determinants of statin response, we used whole transcriptome sequence data collected from simvastatin and control incubated lymphoblastoid cell lines (LCLs) established from participants of the Cholesterol and Pharmacogenetics (CAP) simvastatin clinical trial. We looked for genes whose statin-induced expression changes were most different between LCLs derived from individuals with high versus low plasma LDLC statin response during the CAP trial. We created a classification model of 82 “signature” gene expression changes that distinguished high versus low LDLC statin response. One of the most differentially changing genes was zinc finger protein 542 pseudogene (ZNF542P), the signature gene with changes most correlated with statin-induced change in cellular cholesterol ester, an in vitro marker of statin response. ZNF542P knock-down in a human hepatoma cell line increased intracellular cholesterol ester levels upon simvastatin treatment. Together, these findings imply a role for ZNF542P in LDLC response to simvastatin and, importantly, highlight the potential significance of noncoding RNAs as a contributing factor to variation in drug response.

the change in gene expression after simvastatin treatment using RNA-seq data from the tails of the European American LDL response distribution, and testing for cross-ancestry replication in the tails of the African American LCL response distribution. Here we report identification of 82 "signature" genes whose statin-induced expression changes differed between the tails of the LDLC response distribution. From this analysis, zinc finger protein 542 pseudogene (ZNF542P) emerged as a novel candidate gene implicated in LDL simvastatin response.

Identification of signature genes in cell lines from European American and African American donors.
Our goals were to (i) identify genes whose simvastatin-induced change in expression levels differed between LCLs derived from high and low LDLC simvastatin responders (Supplementary Fig. S1), and (ii) validate the utility of signature genes in differentiating high and low responders in classification models. For this analysis, we first identified signature genes from LCLs established from the tails of the European American LDLC response distribution, and then we tested for replication in the extremes of the African American LDLC response distribution. The clinical characteristics of these populations are shown in Table 1 and Supplementary Table S1.
Gene expression measurements can be strongly influenced by experimental and other confounders that may impede the detection of the desired transcriptomic signal (namely genetically regulated expression differences underlying inter-individual variation in LDLC statin response). To minimize the effects of these confounders, we used a principal components (PCs) based correction method (details in Materials and Methods), where each PC served as a proxy for unknown sources of variation in the data 12,13. RNA-seq expression changes from both European American and African American LCLs were adjusted for a progressively increasing number of PCs, from the 1st to the 25th, to generate 25 datasets. Next, each dataset was split by ancestry (European American or African American), and 25 each of high and low responders from the European American population (Supplementary Fig. S1a) were used to identify differentially changing genes (signature genes) by empirical Bayes moderated t-statistics. Using the expression changes of the European American signature genes as predictors in radial-basis SVM classification models, we then trained the models and evaluated their ability to distinguish 12 high and 14 low African American responders (Supplementary Fig. S1b) whose LDLC statin response was as extreme as that of the European American subset. A comparison of the prediction performance of the top 100 signature genes demonstrated that, of the different PC corrections, the dataset that corrected for 15 PCs performed best. Notably, a receiver operating characteristic (ROC) analysis using this dataset yielded an area under the curve (AUC) of 0.82 (Fig. 1a).
To evaluate the effects of PC correction, we obtained the first 20 PCs from the 5, 10 and 15 PC corrected datasets, and we tested for correlations with several potential non-genetic covariates (Supplementary Fig. S2). With non-corrected data, most of the covariates showed significant associations with several major PCs. However, these associations became less significant as the number of adjusted PCs increased and were negligible after correction with 15 PCs.
Next, we sought to further refine the model by testing if varying the number of signature genes affected the prediction performance. Using the 15 PC corrected data, we again tested the prediction performance of the European American signature genes in the African American subset, testing varying numbers of the most differentially changing genes. From this analysis, we found that a dataset including the top 80 most differentially changing genes outperformed (AUC = 0.86) other gene sets with smaller or larger numbers of genes (Fig. 1b). To further refine the number of signature genes, we tested additional signature gene sets from 70 to 90 signature genes with a step size of 1 and observed the maximum AUC of 0.88 with 82 signature genes (Supplementary Fig. S3).
… from individuals with low LDLC simvastatin response. Consistent with this hypothesis, we found that in vitro simvastatin exposure reduced cellular cholesterol ester levels in LCLs from the high responders, whereas there was either no change or slightly increased cellular cholesterol ester in LCLs from the low responders (Fig. 2a). There was no difference in simvastatin-induced change in free cellular cholesterol levels between high and low responders (Fig. 2b), a finding that was not unexpected given that free cholesterol levels are tightly regulated to maintain homeostasis.
Identification of ZNF542P as a candidate gene. Next, we tested if simvastatin-induced expression differences of any of the 82 signature genes were associated with variation in simvastatin-induced change in cellular cholesterol ester. From this analysis, we found that ZNF542P was the only signature gene whose change was significantly correlated with cholesterol ester change after correction for multiple testing (Supplementary Table S3). As described above, ZNF542P was among the signature genes whose expression in response to simvastatin showed the greatest difference between the European American high and low responders. This was due to reduced expression levels in the low responders, with no effect in the high responders (Fig. 3b).

ZNF542P knock-down increases intracellular cholesterol ester upon statin treatment. Since ZNF542P has no known function, we tested if knock-down of ZNF542P altered intracellular cholesterol levels. Using an siRNA targeting ZNF542P, we achieved >80% knock-down in the Huh7 human hepatoma cell line (Supplementary Fig. S4). Under endogenous conditions, we observed a non-statistically significant trend of increased cellular cholesterol ester upon ZNF542P knock-down. Importantly, this increase became statistically significant when the cells were exposed to increasing concentrations of simvastatin, with a dose-dependent effect observed (Fig. 4). In cells incubated with 5 μM simvastatin, ZNF542P knock-down increased cholesterol ester (2.7 ± 0.48 fold, mean ± s.e.m., p = 0.007), in contrast with the reduced cellular cholesterol ester observed in the NTC siRNA treated cells (0.60 ± 0.22 fold, mean ± s.e.m., p = 0.03). ZNF542P knock-down did not alter intracellular total or free cholesterol under simvastatin or control treated conditions. In addition, ZNF542P knock-down did not alter transcript levels of the genes we tested that are involved in cholesterol synthesis (HMGCR, MVK and HMGCS1) or uptake (LDLR) in either simvastatin or control treated conditions (Supplementary Fig. S4).

Discussion
In 2013 the American Heart Association and American College of Cardiology released expanded guidelines for statin prescription, dramatically increasing the number of potential statin users 14,15. Since statins are not uniformly effective in preventing cardiovascular disease 14, and adverse effects such as myopathy and new-onset diabetes may be more prevalent than originally thought 16-18, identification of predictors of statin efficacy has become increasingly important from a public health perspective. To obtain a more comprehensive understanding of genetic markers contributing to variation in simvastatin LDLC response, we applied a radial-basis SVM classification model to whole transcriptome sequence data and identified a set of 82 signature genes whose simvastatin-induced change in expression levels could be used to differentiate between individuals with "high" vs. "low" LDLC response to statin treatment. ZNF542P emerged as an interesting novel candidate gene based on two factors: (1) it was one of the signature genes whose expression levels were most differentially changed between the high and low LDLC simvastatin responders, and (2) its simvastatin-induced expression difference was highly correlated with change in cellular cholesterol ester content.
Here, we report that changes in expression of the putative pseudogene ZNF542P in simvastatin-exposed LCLs were significantly associated with both changes in cellular cholesterol ester content and in vivo statin-induced changes in plasma LDLC and total cholesterol in the individuals from whom the LCLs were derived. Notably, these relationships were consistent with each other, as greater simvastatin-induced reductions in ZNF542P were correlated with smaller reductions in both cellular cholesterol ester content and plasma LDL cholesterol. Furthermore, these results are consistent with our finding that ZNF542P knock-down increased cholesterol ester levels upon simvastatin treatment. Cholesterol ester is a storage form of cholesterol that is found primarily within lipid droplets. The lack of effect of ZNF542P knock-down on cellular unesterified cholesterol content is not surprising given that this form of cholesterol is located primarily in cell membranes and is subject to tight homeostatic regulation.
To date, there have been no reports regarding the function of ZNF542P. Pseudogenes were once thought to be inactive gene sequences, evolutionary remnants from past gene duplication events. However, more recently, several reports have found that long non-coding RNAs (lncRNAs) transcribed from pseudogenes have functional effects on a variety of cellular processes, and they often regulate their protein-coding counterparts 19. For example, PTENpg1, a PTEN pseudogene, regulates PTEN transcript levels by preventing miRNAs from targeting the PTEN transcript 20. Flanked by ZSCAN5A (zinc finger and SCAN domain containing 5A) and ZNF582 (zinc finger protein 582), ZNF542P lies within a cluster of several ZNF genes that have been implicated in gene transcription. However, further study is necessary to determine the mechanism by which ZNF542P impacts intracellular cholesterol ester in the context of simvastatin treatment.
Although our findings implicate ZNF542P as a novel modulator of LDLC response to simvastatin treatment, other genes identified within the set of 82 signature genes also likely contribute to response. For example, the signature gene set contains solute carrier organic anion transporter family member 2B1 (SLCO2B1), a known statin transporter 21. A SNP in SLCO2B1 has been implicated in simvastatin pharmacokinetics 22. Thus, the relationship between the signature gene set expression changes and statin response may be driven by both pharmacodynamic and pharmacokinetic mechanisms.
In summary, we here identify a set of 82 signature genes whose simvastatin-induced changes in expression levels distinguish high versus low LDLC statin responders in both European American and African American populations. This gene set contains 13 noncoding RNAs, with ZNF542P specifically emerging as a novel candidate gene implicated in cholesterol metabolism and simvastatin response. A recent comprehensive survey of the human transcriptome identified over 91,000 expressed genes, of which 68% are classified as lncRNAs 23. Our findings highlight the need to further our understanding of the roles of these genes in mediating cellular processes as well as determinants of drug response.

Methods
Participants and clinical measures. The Cholesterol and Pharmacogenetics (CAP) clinical trial comprised 944 participants (609 self-identified whites and 335 self-identified blacks) treated with 40 mg/day simvastatin for 6 weeks (ClinicalTrials.gov identifier NCT00451828) 1. Plasma lipid measures were quantified twice before statin treatment, and after 4 and 6 weeks on-treatment, with delta LDL calculated as the difference between the average pre-treatment and the average post-treatment values. All experimental protocols were approved by the institutional review boards at Children's Hospital Oakland Research Institute and at UCLA and UCSF, where the clinical trial was performed. Lymphoblastoid cell lines (LCLs) were generated from each study participant as previously described 24. Informed consent for participation in the statin clinical trial and the use of cell lines was obtained from each study subject. All methods were performed in accordance with the relevant guidelines and regulations.
RNA-seq data generation and analysis. LCLs from CAP participants were exposed to 2 µM activated simvastatin (provided by Merck Inc., Whitehouse Station, NJ) or control buffer for 24 hours, and total RNA was extracted as previously described 10. Indexed, strand-specific, paired-end Illumina sequencing libraries were prepared by LabCorp (formerly Covance, Seattle, WA) as previously described 10. Sequences were aligned using TopHat2 25 and adjusted for library size and variance stabilized using DESeq2 26 as previously described 10. Quality control checks were performed as previously described 10, except that 7 samples in experimental batch 1 were included here but excluded previously.
RNA-seq expression normalization. Differential RNA-seq expression was determined by subtracting the control-treated variance stabilized data from the statin-treated variance stabilized data, and the resulting expression changes for each gene were quantile normalized and adjusted for experiment batch 1 using regression. To adjust expression differences for unmeasured confounders, principal component analysis (PCA) was adopted as previously described 12,13. Among the PCs of a covariance matrix between samples, sorted by the proportion of explained variation in the original matrix, up to 25 PCs were selected such that adding another PC would explain less than 0.5% of the variation. PCs 1 through 25 were progressively regressed out, and the residuals from each regression were quantile normalized and used as the change in expression level of each gene.
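The PC correction step can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the R tooling cited in this paper, and the function names `regress_out_pcs` and `n_pcs_above_threshold` are hypothetical. The sketch assumes a genes × samples matrix of expression changes and reads the 0.5% rule as "keep PCs that each explain at least 0.5% of the variance"; it omits the quantile normalization and batch adjustment steps.

```python
import numpy as np

def _sample_pcs(delta):
    """Center each gene, then return (centered matrix, singular values, sample-space PCs)."""
    delta = np.asarray(delta, dtype=float)
    centered = delta - delta.mean(axis=1, keepdims=True)
    # Rows of vt are orthonormal sample-space PCs, ordered by explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered, s, vt

def n_pcs_above_threshold(delta, min_fraction=0.005, max_pcs=25):
    """Count leading PCs that each explain at least min_fraction of the variance (assumed rule)."""
    _, s, _ = _sample_pcs(delta)
    fractions = s**2 / np.sum(s**2)
    return int(min(max_pcs, np.sum(fractions >= min_fraction)))

def regress_out_pcs(delta, n_pcs):
    """Return residuals after removing the top n_pcs sample-space PCs from every gene."""
    centered, _, vt = _sample_pcs(delta)
    pcs = vt[:n_pcs]                       # shape: n_pcs x samples
    # PCs are orthonormal, so the least-squares fit is a plain projection.
    fitted = centered @ pcs.T @ pcs
    return centered - fitted
```

Calling `regress_out_pcs(delta, k)` for k = 1 to 25 would give one residual matrix per correction level, mirroring the 25 datasets described above.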
Identifying signature genes. To identify signature genes from 25 each of high and low European American responders, we used empirical Bayes moderated t-statistics as in Kim et al. 7. Starting from a relative difference d(i), which is defined as …
Cross-ancestry prediction using SVM based classification. Radial-basis SVMs were used for training and predicting 12 high and 14 low African American responders in the SVM classification models. The performance of the models was evaluated by randomly splitting the data into 10 sets, with 9 assigned as the training set and the tenth as the testing set. The model was trained using the training set and applied to the testing set for prediction. This process was repeated 5000 times and the prediction power of the model was estimated based on the 5000 testing sets. The SVM function in the R package kernlab was used to implement the models with default parameter settings 27. For the SVM, the radial basis kernel was chosen due to its superior performance in the cross-validation results. The prediction performance was evaluated by ROC curve analysis and quantitated by AUC using the ROCR package in R 28.
Scientific Reports | (2018) 8:12443 | DOI:10.1038/s41598-018-30859-y
In vivo and in vitro association analysis. To measure statin-induced changes of in vivo clinical phenotypes for correlation analyses, delta log measures were calculated as the log (average value of each phenotype on treatment) minus the log (average of the two pre-treatment values of each phenotype). The distribution of the plasma LDLC change adjusted for age, race and smoking status is shown in Supplementary Figure S1. Plasma HDLC change was adjusted for race.
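The evaluation scheme in the cross-ancestry prediction procedure above (random 10-way splits and AUC) can be sketched as follows. This is an illustration in plain Python, not the kernlab/ROCR workflow used in the study. The helper names `ten_way_split` and `roc_auc` are hypothetical, and the SVM itself is left out: `roc_auc` takes whatever decision scores a classifier would produce.

```python
import random

def ten_way_split(n_samples, rng):
    """Shuffle sample indices into 10 sets; return (train, test) with 9 sets vs. 1 set."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    test = folds[-1]
    train = [i for fold in folds[:-1] for i in fold]
    return train, test

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Repeating `ten_way_split` many times (5000 in the text) and pooling the held-out scores before calling `roc_auc` is one plausible reading of the procedure; the exact pooling used by the authors is not stated here.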
For in vitro cholesterol measurements, lipids were extracted from the CAP LCLs with hexane and isopropyl alcohol (3:2, v/v), and dried under nitrogen. Intracellular total cholesterol and free cholesterol levels were quantified using the Amplex Red Cholesterol Assay Kit (Life Technologies) following the manufacturer's instructions, and normalized to total cellular protein content. For measurement of cholesterol esters, extracted lipids were incubated with esterase for up to 2 hrs, and cholesterol ester was calculated as total cholesterol minus free cholesterol. The change in cholesterol ester was calculated as the delta log of the cholesterol ester in the statin minus control treated cells.
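As a worked illustration of the arithmetic above, the following Python sketch computes cholesterol ester as total minus free cholesterol and then the delta log between statin and control treated cells. The function names are hypothetical, and the natural logarithm is an assumption, since the text does not state the base.

```python
import math

def cholesterol_ester(total, free):
    """Cholesterol ester as total minus free cholesterol (protein-normalized inputs)."""
    ester = total - free
    if ester <= 0:
        raise ValueError("cholesterol ester must be positive to take a logarithm")
    return ester

def delta_log_ester(total_statin, free_statin, total_control, free_control):
    """log(ester, statin) minus log(ester, control); natural log is assumed here."""
    statin = cholesterol_ester(total_statin, free_statin)
    control = cholesterol_ester(total_control, free_control)
    return math.log(statin) - math.log(control)
```

A negative value indicates that statin exposure lowered cholesterol ester relative to control.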
All in vivo and in vitro phenotypes were tested for association with the ZNF542P expression changes using Spearman rank correlation in R.
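For readers who want to see what the rank correlation computes, here is a minimal pure-Python sketch of Spearman's rho (Pearson correlation of average ranks). The study itself used R; the function names below are hypothetical, and the sketch returns only the coefficient, not a p-value. It will fail with a division by zero if either input is constant.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```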
Functional studies of ZNF542P. Huh7 cells grown in MEM with 10% FBS were reverse transfected with a Silencer Select siRNA (Life Technologies) targeting ZNF542P (n258919) or a non-targeting control (NTC, assay number AM6411, Life Technologies) using the siPORT transfection reagent as previously described 29. After 24 hrs, cell culture media was replaced with media supplemented with either 2 μM activated simvastatin or control buffer. Simvastatin was kindly provided by Merck. RNA was extracted using Qiashredders (Qiagen) and the PureLink RNA Mini Kit (Life Technologies), and cDNA was synthesized using the cDNA Archive Kit (Life Technologies). ZNF542P values were quantified by a TaqMan assay (n258919_asy, Life Technologies) and normalized to CLTPM as a loading control. HMGCR, HMGCS1, MVK and LDLR transcript levels were quantified as previously described 30. All qPCR reactions were performed in triplicate. Intracellular total cholesterol, free cholesterol, and cholesterol ester were quantified as above. All cultures were verified to be mycoplasma free using the MycoSensor qPCR Assay Kit (Agilent).
One-way ANOVA was used to identify statistically significant effects of ZNF542P knock-down on levels of cellular cholesterol and transcripts. For ANOVA p < 0.05, statistically significant differences between treatment conditions were identified using Tukey's multiple comparisons test with adjusted p-values reported.
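The one-way ANOVA test statistic can be written out in a few lines. The sketch below, in plain Python with the hypothetical name `one_way_anova_f`, returns only the F statistic and its degrees of freedom. Converting F to a p-value and running Tukey's test require an F distribution and studentized range distribution from a statistics library, which are not reproduced here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Each group here would hold the replicate measurements for one treatment condition, for example NTC or ZNF542P siRNA at a given simvastatin concentration.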
Data Availability. RNA-seq and clinical phenotype data used in this analysis are available from dbGaP (phs000481.v2.p1). All other datasets analyzed during the current study are available from the corresponding author on reasonable request.