A Circulating MicroRNA Signature Capable of Assessing the Risk of Hepatocellular Carcinoma in Cirrhotic Patients

With the availability of potent antiviral therapies, complete suppression of hepatitis B virus (HBV) replication and total eradication of hepatitis C virus (HCV) can now be achieved. Despite these advances, hepatocellular carcinoma (HCC) still develops in a substantial proportion of cirrhotic patients, suggesting that host factors remain critical. Dysregulation of miRNAs is noted in many cancers, and circulating miRNAs can be readily assayed. In this study, we aimed to develop a circulating miRNA signature to assess the risk of HCC in cirrhotic patients. We first discovered that HBV- and HCV-related cirrhotic patients had distinguishable circulating miRNA profiles. A cohort of 330 cirrhotic patients was then compared against a cohort of 42 early HCC patients with complete remission. A score comprising 5 miRNAs and a binary etiology variable was established that was capable of differentiating between these two groups (AUC = 72.5%, P < 0.001). The 330 cirrhotic patients were further stratified into high- and low-risk groups, and all patients were longitudinally followed for 752 (11–891) days. Of them, 19 patients developed HCC. The high-risk group had significantly higher cumulative HCC incidence (P = 0.038). In summary, a circulating miRNA-based score was developed that is capable of assessing HCC risks in cirrhotic patients.

Liver cirrhosis is a major sequela of chronic hepatitis in patients with a prolonged period of persistent necroinflammation. Once cirrhosis has developed, patients are at high risk of liver function decompensation and hepatocellular carcinoma (HCC). Because hepatitis B virus (HBV) and hepatitis C virus (HCV) are the most common etiologies of chronic hepatitis, these two viruses are the most important causes of liver cirrhosis, functional decompensation and HCC. Several predictive models have been built to estimate the risk of HCC in patients with chronic HBV or HCV infection. When applying these scoring systems, two aspects should be carefully evaluated. First, one must identify the clinical stage of hepatitis in which the models were built. Second, one must distinguish whether the models were established before or after the era of effective antiviral therapy.
Several virological predictors are associated with HCC risk in chronic hepatitis B 1,2. Accordingly, risk scores were proposed by research groups from different Asian areas [3][4][5][6]. However, because the patients' data were collected before antiviral treatment was available, the effect of antiviral treatment was not considered. Additionally, cirrhotic patients were either excluded at baseline or included as only a small proportion of the subjects. A subsequent study that assessed the accuracy of these scoring systems in patients receiving antiviral treatment found that the independent predictors for HCC included only older age, liver cirrhosis and virological
The classification of cirrhosis and HCC patients based on their miR-15a levels was moderately successful, with an area under the receiver operating characteristic curve (AUC) of 64.1% (P = 0.003, Fig. 2A). However, when we analyzed the time-to-HCC development in cirrhotic patients after longitudinal follow-up (N = 330), the patient strata of high and low miR-15a levels (each stratum N = 165) did not exhibit a significantly different cumulative incidence of HCC (P = 0.257, Fig. 2B).
We then incorporated all of the 12 miRNAs that exhibited significant differences in the univariate analysis into a logistic regression model for classifying cirrhotic and HCC patients. An AUC of 68.8% was achieved (P < 0.001, Fig. 2C). However, when the cirrhosis patients were stratified into the high-risk and low-risk groups by their estimated HCC risks in the logistic regression model (each group N = 165), no significant difference in the cumulative incidence of HCC was identified between the two groups (P = 0.261, Fig. 2D).
Scientific RepoRts | 7: 523 | DOI:10.1038/s41598-017-00631-9
To formulate a miRNA profile that distinguishes HBV monoinfection from HCV infection with or without HBV co-infection, the subjects were randomly divided into training (n = 220; including 61 HCV, 146 HBV and 13 co-infected) and validation (n = 110; including 32 HCV, 74 HBV and 4 co-infected) subsets. No significant difference was observed between the miRNA levels in the two subsets (Table S1). In the training subset, a total of 7 miRNAs, namely miR-21, miR-30c, let-7g, miR-15a, miR-122, miR-221 and miR-30b, reached P < 0.1 for the classification of etiology. Among them, 3 miRNAs had P < 0.05 (Table 4). The levels of the seven miRNAs with P < 0.1 were then analyzed using the generalized iterative modeling (GIM) algorithm (see Methods) to formulate a model. Six of the 7 miRNAs were chosen by the algorithm, and an etiology score was defined as follows:
Etiology score (terms in source order; the algebraic operators and signs joining them are not recoverable): miR-21, 0.4724, miR-30c, 2.7896, let-7g, 0.6248, miR-15a, miR-122, 0.3175, miR-30b, 3.1162, 1.6562
In the training subset, the etiology model could classify patients with distinct etiology, achieving an AUC of 61.1% and a significance level of 0.007 (Supplementary Figure S1). The constant term of the model equation was calibrated so that the optimum cut-off, which was determined by Youden's J statistic, occurred at a score of 0. Thus, a positive value of the etiology score indicated HCV-related cirrhosis (including co-infection), whereas a negative value indicated HBV-related cirrhosis. For prediction of HCV-related cirrhosis (including co-infection), the sensitivity was 67.57%, the specificity was 54.79%, the positive predictive value was 43.10%, and the negative predictive value was 76.92%. When the etiology model was tested in the validation subset, the score distributions of the HBV and HCV + co-infection groups remained significantly different (Mann-Whitney P = 0.017) (Fig. 3). In the validation subset, the sensitivity was 61.11%, the specificity was 60.81%, the positive predictive value was 43.14%, and the negative predictive value was 76.27%.
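The four performance figures reported for each subset follow from a 2 × 2 confusion table. The sketch below is a worked check rather than part of the original analysis: the counts (for example 50 true positives among the 74 HCV or co-infected training patients) are not stated in the text and are inferred here as the integers that reproduce the reported percentages given the stated group sizes. The function name `diagnostic_metrics` is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV and NPV (as percentages)."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }

# Counts are ASSUMPTIONS inferred from the reported percentages and group sizes
# (training: 74 HCV/co-infected, 146 HBV; validation: 36 HCV/co-infected, 74 HBV).
training = diagnostic_metrics(tp=50, fn=24, tn=80, fp=66)
validation = diagnostic_metrics(tp=22, fn=14, tn=45, fp=29)

for name, metrics in (("training", training), ("validation", validation)):
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

With these inferred counts, the training subset yields 67.57%, 54.79%, 43.10% and 76.92%, and the validation subset yields 61.11%, 60.81%, 43.14% and 76.27%, matching the values reported above.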
A miRNA signature for the prediction of subsequent HCC occurrence in cirrhotic patients who had no HCC at baseline. We then compared the miRNA levels in patients with (n = 42) and without (n = 330) HCC at baseline. A total of 16 miRNAs manifested significantly different levels in the univariate analysis, where the classification performance was assessed by the AUC and the Mann-Whitney U statistic (Table 5). Because the miRNA profiles differed significantly between chronic hepatitis B and C, a binary etiology variable, "HCV positive", was incorporated together with the 16 miRNAs into the multivariate analysis by the GIM algorithm. If a patient was anti-HCV antibody positive, he/she was either HCV monoinfected or HCV-HBV co-infected, and the value of the variable was 1. In contrast, if a patient was anti-HCV antibody negative, the value of the variable was 0. An HCC risk score was then generated for the optimal distinction between HCC and non-HCC cirrhotic patients (AUC = 72.5%, P < 0.001, Fig. 2E). By introducing the "HCV positive" variable, this model could be applied to both HBV- and HCV-related cirrhotic patients. Using the optimum cut-off determined by Youden's J statistic to predict HCC occurrence, the sensitivity was 66.67%, the specificity was 77.88%, the positive predictive value was 27.72%, and the negative predictive value was 94.83%. The constant term of the model equation was calibrated to enable a zero median value in the cirrhosis patient group (N = 330).
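The optimum cut-off by Youden's J statistic is the threshold that maximizes sensitivity + specificity − 1. The following is a minimal sketch of that selection, assuming that a score at or above the cut-off is called positive; the function name `youden_cutoff` and the toy data are illustrative and do not come from the study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 for the positive class (e.g. HCC), 0 for the negative class.
    A score >= cutoff is classified as positive (an assumed convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):  # every observed score is a candidate
        sensitivity = sum(s >= cut for s in pos) / len(pos)
        specificity = sum(s < cut for s in neg) / len(neg)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Toy data, for illustration only.
toy_scores = [0.1, 0.2, 0.3, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 1]
cutoff, j_stat = youden_cutoff(toy_scores, toy_labels)
print(cutoff, j_stat)
```

Subtracting the selected cut-off from every score is one way to shift a model's constant term so that the optimum cut-off falls at 0.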

HCC Risk
Finally, because all 330 cirrhotic patients had been longitudinally followed, the initial HCC risk score could be calculated at baseline, when the patients were enrolled. During the follow-up of 752 (11-891) days, 19 patients developed HCC. Comparing the distributions of HCC risk scores between those who did and did not develop HCC, a borderline significance level was obtained (P = 0.070, unpaired t-test with unequal variance). However, this was a cross-sectional case/control analysis in which the time information of HCC occurrence was not used. Therefore, we further analyzed the time-to-HCC development with respect to patient strata defined by the baseline HCC risk scores. The patients were divided into high-risk and low-risk groups (each N = 165). In total, 14 HCC development events were noted in the high-risk group, and 5 events were noted in the low-risk group. The high-risk group exhibited a significantly shorter time to HCC development than the low-risk group (P = 0.038, Fig. 2F). The average times to HCC were 839 ± 14 days and 860 ± 8 days in the high-risk and low-risk groups, respectively.
Table 3. Different circulating miRNA levels were observed in liver cirrhotic patients with different viral etiology. In particular, many miRNAs differed significantly between "HBV monoinfection" and "HCV + coinfection" patients.
Table 4. Univariate analysis of associations between miRNA levels and viral etiology. A total of 7 miRNAs reached a significance level of 0.1 (underscored). Among them, 3 miRNAs had P < 0.05 (shown in bold face).
As a benchmark, the same set of 16 miRNAs and the HCV-positive variable were jointly analyzed by the support vector machine (SVM) algorithm for the classification of the HCC and non-HCC groups. The resulting AUC was 68.3% (P < 0.001, Supplementary Figure S2A). In addition, the high-risk and low-risk patient strata defined by the SVM score (each stratum N = 165) did not exhibit a significant difference in cumulative HCC incidence (P = 0.227, Supplementary Figure S2B). This benchmark showed that the GIM HCC risk score outperformed the SVM score in both cross-sectional classification (AUC = 72.5% > 68.3%) and longitudinal analysis (P = 0.038 < 0.227).
Table 5. Univariate analysis of miRNA levels in association with HCC using the receiver operating characteristic curves. A total of 16 miRNAs had P < 0.05 (shown in bold face).
Improvement in the clinical-factor-based prediction model by incorporation of the miRNA score. Finally, we evaluated the well-established R.E.V.E.A.L. HCC model, which was effective in predicting HCC risk in non-cirrhotic, treatment-naïve, chronic hepatitis B patients 3,5. We employed the same risk score assignment for three host variables: age (score increased by 1 for every 5-year increment of age, starting from the minimum age of the cohort: 29), gender (male: score = 2, female: score = 0), and ALT level (≥45: score = 2; between 15 and 45: score = 1; <15: score = 0). The virological variables were not evaluated because our cirrhotic patients included both HBV- and HCV-infected patients. Additionally, all HBV patients received antiviral treatment, if needed, to suppress HBV-DNA to an undetectable level. This HCC risk score was a discrete score with integer values ranging from 0 to 14 (Supplementary Figure S3A). The median value was 8, which was used for patient stratification in the same manner as described in previous analyses. Two different cutoffs for patient stratification were evaluated. Comparing patients with scores > 8 and ≤ 8, no significant difference in the cumulative incidence of HCC was found (log-rank P = 0.116, Supplementary Figure S3B). Alternatively, when patients were stratified using scores ≥ 8 and < 8, the high-risk and low-risk groups demonstrated different cumulative incidences of HCC (log-rank P = 0.018, Supplementary Figure S3C). Of the 191 patients identified as high risk, 16 (8.38%) developed HCC in 2 years. To evaluate whether the simplified R.E.V.E.A.L. score and the miRNA HCC score were confounding variables with respect to HCC occurrence, we performed a multivariate logistic regression analysis on the two scores. Statistical significance was found for both the simplified R.E.V.E.A.L. score (adjusted Wald-test P = 0.005) and the miRNA HCC score (adjusted P = 0.002), suggesting that they were independently associated with HCC occurrence (Supplementary Table S2).
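The simplified R.E.V.E.A.L. score assignment described above can be sketched as a small function. This is an illustration under stated assumptions rather than the authors' implementation: the age points are assumed to be the number of completed 5-year increments above 29, "between 15 and 45" is assumed to mean 15 ≤ ALT < 45, and the function name is hypothetical.

```python
def simplified_reveal_score(age_years, is_male, alt):
    """Simplified R.E.V.E.A.L. score from age, gender and ALT level.

    Assumptions: age points = completed 5-year increments above age 29;
    ALT bands are >= 45 (2 points), 15 to < 45 (1 point), < 15 (0 points).
    """
    age_points = max(0, (int(age_years) - 29) // 5)
    gender_points = 2 if is_male else 0
    if alt >= 45:
        alt_points = 2
    elif alt >= 15:
        alt_points = 1
    else:
        alt_points = 0
    return age_points + gender_points + alt_points

# Example: a 64-year-old man with ALT 50 scores 7 + 2 + 2 = 11.
example_score = simplified_reveal_score(64, True, 50)
is_high_risk = example_score >= 8  # the ">= 8 versus < 8" stratification
print(example_score, is_high_risk)
```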
The regression formula can also be used for the calculation of a combined score:
$$\text{Combined score} = 1.201 \cdot (\text{miRNA HCC score}) + 0.217 \cdot (\text{simplified REVEAL score}) - 3.892 \quad (3)$$
The AUC of the combined score is 73.8%, a significant improvement over the simplified R.E.V.E.A.L. score (AUC = 66.4%, P = 0.034, Supplementary Figure S4A). In contrast, no significant difference was observed between the combined score and the miRNA score (AUC = 72.5%, P = 0.657, Supplementary Figure S4B). When the combined score was used for patient stratification, a significant difference in the cumulative incidence of HCC was observed between the high- and low-risk groups (P = 0.001, Supplementary Figure S4C).
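As a worked illustration of the combined score, the sketch below evaluates the regression formula with the coefficients 1.201, 0.217 and −3.892. Reading the result as the log-odds of a logistic regression, and therefore converting it to a probability with the logistic function, is an assumption made here for illustration; the function names are hypothetical.

```python
import math

def combined_score(mirna_hcc_score, simplified_reveal_score):
    """Linear predictor built from the two scores (regression coefficients)."""
    return 1.201 * mirna_hcc_score + 0.217 * simplified_reveal_score - 3.892

def logistic(x):
    """Map a log-odds value to a probability (ASSUMED interpretation)."""
    return 1.0 / (1.0 + math.exp(-x))

# A patient at the cohort medians: miRNA HCC score 0 (the score was calibrated
# to a zero median in the cirrhosis group) and simplified R.E.V.E.A.L. score 8.
median_patient = combined_score(0.0, 8)
print(round(median_patient, 3), round(logistic(median_patient), 3))
```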
We also explored the stratifications of patients using both the miRNA model and the simplified R.E.V.E.A.L. model (scores ≥ 8 versus < 8). Four distinct curves of the cumulative incidence of HCC were observed (log-rank P = 0.011, Supplementary Figure S5A). Furthermore, patients identified as the highest risk (in the high-risk group of both models; N = 98, 29.7% of all patients) manifested a distinct incidence curve from the other three groups (N = 232) (log-rank P = 0.001, Supplementary Figure S5B). Of these 98 patients, 12 (12.24%) developed HCC in 2 years. The combined model outperformed the simplified R.E.V.E.A.L. model as well as the miRNA-only model, which identified 165 patients as high risk, and 14 (8.48%) patients subsequently developed HCC.

Discussion
Few predictive models for HCC occurrence are available for HBV- or HCV-related cirrhotic patients 3,14,28. These models, however, included patients who had not received antiviral treatment (for HBV) or patients who had received interferon-based treatment (for HCV). Under these circumstances, virological factors are the key predictors. However, when the HBV models were validated in entecavir-treated patients, the only independent virological predictor was virological relapse 7. Now that tenofovir is available globally, virological relapse can be prevented in almost all HBV patients. Likewise, with more effective direct-acting antiviral agents for HCV, almost all HCV patients can be virologically cured. Thus, virological factors might not be included in future prediction models for antiviral-treated patients. Despite effective antiviral treatment, HCC still develops in a substantial proportion of cirrhotic patients. It is therefore critical to develop HCC prediction models for this group of patients with or without antiviral treatments.
Instead of including virological factors, miRNAs were selected as candidate predictors in this study. Numerous miRNAs are aberrantly expressed in HCC 29, and circulating miRNAs are readily detectable in serum or plasma 30 at levels quantifiable by qPCR 31. Given that HBV and HCV were the two most important etiologies of cirrhosis in our population, we first studied whether the circulating miRNA profiles differed between these two etiologies. The results showed that HBV- and HCV-related cirrhosis could be distinguished by specific miRNA profiling, suggesting that during the long courses of chronic hepatitis, HBV and HCV evoked different sets of miRNAs in the liver, both resulting in liver cirrhosis. Accordingly, a binary etiology variable was included in the prediction model for HCC risk assessment; therefore, this model could be used for both HBV- and HCV-related cirrhotic patients.
In this study, the model was built on 330 cirrhotic patients with no HCC developed at baseline. During the subsequent follow-up, some of these patients developed HCCs. To build a more accurate model, these HCC patients should be incorporated into the HCC group (n = 42). However, in the present study, we intended to perform a validation test using this cohort of patients for the established equation. Thus, these would-be HCC patients were included in the 330-patient cohort. Although the validation was successful, this was not a truly prospective study. An authentic prospective validation study should be conducted for a final conclusion.
To our knowledge, this is the first circulating miRNA-based model for HCC risk prediction in cirrhosis patients. Our studies provided supporting evidence for two interesting concepts. First, HBV and HCV evoked differential miRNA dysregulation during the long courses of chronic hepatitis toward liver cirrhosis. Second, dysregulation of miRNAs may have occurred prior to the development of HCC, and the baseline miRNA levels might be used for identifying high HCC-risk patients among liver cirrhotic patients. Combined with the conventional clinical predictors, including age, gender and baseline ALT levels, a subgroup of cirrhotic patients (~30%) was identified with a particularly high risk of HCC compared with other cirrhotic patients (P = 0.001).

Materials and Methods
Patients. This study was approved by the Institutional Review Board of Chang Gung Memorial Hospital, Taiwan (IRB No. 103-5039 C). Written informed consent was obtained from all patients, and the study was conducted in accordance with the Guidelines for Good Clinical Practice and the applicable laws and regulations. A total of 372 HBV- and/or HCV-related cirrhotic patients from three branches (Keelung, Linkou, and Kaohsiung Branches; located in the northern, central-northern, and southern parts of Taiwan, respectively) of Chang Gung Memorial Hospital were enrolled. Among them, 330 patients had liver cirrhosis but had not developed HCC at the time of recruitment (the liver cirrhosis group), whereas 42 patients had been diagnosed with early HCC at Barcelona Clinic Liver Cancer stage A (the HCC group). These HCC patients had been treated by either surgical removal or radiofrequency ablation and were in complete remission when recruited. Plasma samples were collected from these subjects for analysis of 28 circulating miRNAs, which were selected from a literature search. The liver cirrhosis group was further divided into training and validation subsets by a 2:1 randomization for evaluation of miRNA profiles capable of distinguishing viral etiologies (Fig. 1). All 330 cirrhotic patients were prospectively followed until development of HCC or the final date of follow-up on 2015/07/23, whichever came first. Patients who did not develop HCC by the end of the follow-up were considered right-censored data in the time-to-HCC analysis. The median follow-up period was 752 (11-891) days.
HCC was diagnosed by cytology or liver biopsy. Liver cirrhosis was diagnosed by either liver biopsy or ultrasound characteristics (coarse parenchyma and uneven surface) plus at least one of the following: (i) endoscopic visualization of esophageal varices, (ii) FibroScan value > 12 kPa, or (iii) aspartate transaminase (AST) to platelet ratio index > 1.
No HCV-related cirrhotic patient received antiviral treatment at the time of, or after, enrollment. All HBV-related cirrhotic patients had a serum HBV-DNA level < 500 IU/mL. In 34 patients who had HBV-DNA level > 2000 IU/mL before enrollment, life-long antiviral treatment was provided, so that when included, the HBV-DNA levels were < 500 IU/mL.

RNA Extraction.
To avoid miRNA degradation, 250 μL of plasma sample was mixed with 750 μL of TRIzol LS reagent (Thermo Fisher Scientific, Wilmington, DE, USA) immediately after centrifuge separation from blood cells. The RNA-containing mixture was transferred to a prepared PLG Heavy tube (BIOTOOLS, New Taipei City, Taiwan) for RNA extraction following the procedure provided by the manufacturer.

MicroRNA Detection.
A stem-loop RT-qPCR method was performed as described in our previous report 27. Briefly, a 10 μl RT reaction mixture containing miRNA-specific stem-loop RT primers (final concentration, 2 nM each), 500 μM dNTP, 0.5 μl MMLV HP RT (EPICENTRE Biotechnologies, Madison, WI), 0.5 μl RNaseOut (Invitrogen), and 80 ng total RNA was used for the RT reaction, which was performed at 16 °C for 30 min, followed by 50 cycles of 20 °C for 30 s, 42 °C for 30 s, and 50 °C for 1 s. The RT products were diluted 8-fold before qPCR. Next, 0.5 μl of diluted RT product was used as a template in a 6-μl PCR reaction mixture that contained 1× SYBR Master Mix (Applied Biosystems, Foster City, CA), 200 nM miRNA-specific forward primer, and 200 nM universal reverse primer. The conditions for qPCR were 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 63 °C for 32 s. An ABI 7900HT Fast Real-Time PCR system (Foster City, CA) was used for qPCR reactions. ABI 7900HT SDS 2.3 software was used to calculate the threshold cycle (Ct) and relative quantification. The ΔCt method was used to calculate expression levels normalized against U6. The miRNA expression level was calculated as POWER(2, ΔCt) × 10^6.
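The expression-level calculation can be illustrated with a short sketch. The text gives the level as POWER(2, ΔCt) × 10^6 with U6 as the reference but does not state the sign convention of ΔCt; the sketch assumes ΔCt = Ct(U6) − Ct(miRNA), so that a more abundant miRNA (lower Ct) yields a higher level. The function name and the Ct values are illustrative.

```python
def relative_expression(ct_mirna, ct_u6):
    """Expression level normalized against U6: 2**dCt * 10**6.

    ASSUMPTION: dCt = Ct(U6) - Ct(miRNA); the source does not state the sign.
    """
    delta_ct = ct_u6 - ct_mirna
    return 2.0 ** delta_ct * 1e6

# Illustrative Ct values: a miRNA detected 5 cycles later than U6.
level = relative_expression(ct_mirna=25.0, ct_u6=20.0)
print(level)
```

Under this convention, each additional cycle needed to detect the miRNA halves its computed expression level.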
Statistical Analysis. Cross-sectional clinical variables, such as etiologies or log-transformed miRNA levels, were evaluated using the area under the receiver operating characteristic curve (AUC), which was estimated by non-parametric empirical calculation using the SPSS statistical software version 21 (IBM, New York City, NY). The significance levels were evaluated using Mann-Whitney U statistics 32. Longitudinal analysis of time to HCC was performed by the Kaplan-Meier method. Statistical significance was evaluated using the non-parametric log-rank test. Classification by the support vector machine was performed with the radial basis function kernel, using the svm() function of the R statistical scripting language. Differences between the AUCs of two correlated, empirical ROC curves were evaluated by a bootstrap test with 2000 resamples, using the pROC package 32 of the R runtime environment.
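For readers who want to see how a two-group log-rank comparison of time to HCC works, the following is a minimal standard-library sketch of the usual log-rank chi-square statistic with one degree of freedom. It is not the software used in the study, the function name is hypothetical, and the follow-up data are invented for illustration.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 df.

    times: follow-up in days; events: 1 if HCC occurred, 0 if right-censored.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({s[0] for s in samples if s[1]}):  # distinct event times
        n = sum(1 for s in samples if s[0] >= t)        # at risk, both groups
        n_a = sum(1 for s in samples if s[0] >= t and s[2] == 0)
        d = sum(1 for s in samples if s[0] == t and s[1])
        d_a = sum(1 for s in samples if s[0] == t and s[1] and s[2] == 0)
        obs_minus_exp += d_a - d * n_a / n              # observed - expected
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))       # chi-square tail, 1 df

# Invented example: early events in one group, none in the other.
high = ([10, 20, 30, 40, 50, 60, 70, 80], [1] * 8)
low = ([800] * 8, [0] * 8)
chi2, p_value = logrank_test(high[0], high[1], low[0], low[1])
print(round(chi2, 2), p_value < 0.001)
```

Patients without HCC at the end of follow-up enter the calculation only through the at-risk counts, which is how right-censoring is handled.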

Generalized Iterative Modeling.
A multivariate modeling method, the generalized iterative modeling (GIM) method, was used to produce an algebraic biosignature model (M) for the optimum clinical classification in terms of the maximum AUC:
$$\hat{M} = \arg\max_{M} \mathrm{AUC}(M) \quad (4)$$
where the AUC of M equates to the non-parametric Mann-Whitney U statistic, normalized by the numbers of patients in the two distinct clinical classes, n1 and n2 33:
$$\mathrm{AUC}(M) = \frac{\text{Mann-Whitney } U \text{ statistic}}{n_1 \cdot n_2} \quad (5)$$
GIM is a generalization of previously published algorithms, GABA and HABA, which were specifically designed for analyzing discrete genomic information and therefore were restricted to Boolean algebra 34,35. The generalized algorithm can incorporate both continuous clinical variables and discrete genomic variables. Briefly, candidate biosignature models were produced by joining randomly selected clinical parameters with three basic algebraic operations: addition (+), subtraction (−) and multiplication (·). These models were then refined progressively to generate new models by the following computational operations: coefficient adjustment, adding or removing clinical variables, changing the algebraic operators between variables, and a crossover of two candidate models. Each model was assessed by its classification performance, gauged by the empirical AUC. Models with better performance were more likely to be retained in the subsequent computation.
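The relation between the empirical AUC and the Mann-Whitney U statistic can be made concrete with a short sketch: U counts the pairs in which a patient from one class scores higher than a patient from the other class (ties count one half), and dividing by n1 · n2 gives the AUC. The function name and toy scores are illustrative, and the quadratic-time loop is chosen for clarity rather than speed.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """Empirical AUC = U / (n1 * n2), with ties counted as 0.5."""
    u = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                u += 1.0
            elif p == q:
                u += 0.5
    return u / (len(scores_pos) * len(scores_neg))

# Toy scores: perfect separation gives 1.0; identical distributions give 0.5.
perfect = auc_mann_whitney([3, 4], [1, 2])
chance = auc_mann_whitney([1, 2], [1, 2])
print(perfect, chance)
```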
The entire process was iterated until quasi-optimal models were identified when the AUC did not increase any further after a predefined number of iterations. An optional input variable of the algorithm, the cost of model complexity c (c ≥ 0), was also introduced to penalize candidate models with many variables. In the computation of the HCC_Risk_Score, the value of c is 0.0005. The pseudocode of the GIM algorithm is as follows:

Input
T: a patient-by-variable matrix
L: a vector of patients' class labels
c: the cost of model complexity, c ≥ 0