A New Strategy for Analyzing Time-Series Data Using Dynamic Networks: Identifying Prospective Biomarkers of Hepatocellular Carcinoma

Time-series metabolomics studies can provide insight into the dynamics of disease development and facilitate the discovery of prospective biomarkers. To improve the performance of early risk identification, a new strategy for analyzing time-series data based on dynamic networks (ATSD-DN) in a systematic time dimension is proposed. In ATSD-DN, the non-overlapping ratio was applied to measure the changes in feature ratios during the process of disease development and to construct dynamic networks. Dynamic concentration analysis and network topological structure analysis were performed to extract early warning information. This strategy was applied to the study of time-series lipidomics data from a stepwise hepatocarcinogenesis rat model. A ratio of lyso-phosphatidylcholine (LPC) 18:1/free fatty acid (FFA) 20:5 was identified as the potential biomarker for hepatocellular carcinoma (HCC). It can be used to classify HCC and non-HCC rats, and the area under the curve values in the discovery and external validation sets were 0.980 and 0.972, respectively. This strategy was also compared with a weighted relative difference accumulation algorithm (wRDA), multivariate empirical Bayes statistics (MEBA) and support vector machine-recursive feature elimination (SVM-RFE). The better performance of ATSD-DN suggests its potential for a more complete presentation of time-series changes and effective extraction of early warning information.

from time-series data in metabolomics studies. Smilde et al. 13 combined analysis of variance (ANOVA) and simultaneous component analysis to study the variation caused by different factors such as time, doses or combinations, and then proposed ANOVA-simultaneous component analysis (ASCA) method to deal with time course problems. Nueda gave a time-series feature selection technique by calculating the leverage and the squared prediction error based on the ASCA model 14 . Tai et al. proposed multivariate empirical Bayes statistical time-series analysis (MEBA) method to rank the features by calculating the Hotelling's T 2 15 . Berk et al. 8 used smoothing splines mixed effects (SME) and an associated statistic functional test to detect the features with differences between groups. Subsequently, some data analysis platforms also have been established 16,17 to facilitate the study of time-series data. In our previous work 18 , we also proposed a weighted relative difference accumulation algorithm (wRDA) in which an adapted weight was assigned to every time point for extracting early information regarding complicated diseases. These dynamic methods worked successfully in metabolomics, however, all of them only considered individual metabolites without taking feature association into consideration.
Biological processes are intricate and the relationships among features (such as genes, metabolites and proteins) [19][20][21][22] are complicated and evolve with dynamic physiological processes. Thus, analyzing data from the perspective of networks could provide more information to understand the associations among features and discover important markers. Fang et al. 23 calculated the information gain (IG) of a ratio between two genes to construct a network. The genes with the largest degrees were regarded as the important factors related to lung cancer. Netzer et al. 24 also constructed a ratio network to select the nodes as biomarkers. If the ratio indicated a statistically significant difference between the classes (e.g., control and obesity groups), then there was an edge between the two corresponding features. Zuo et al. 25 used a low order partial correlation that could reduce spurious edges to infer the network. It is worth noting that most network methods were applied to find key information in static-omics data that discriminated between the different groups, rather than the tracking of features with dynamic differential changes.
In this study, a novel strategy for analyzing time-series data based on dynamic networks (ATSD-DN) in a systematic time dimension was developed. The non-overlapping ratio (NOR) was introduced to quantify the changes in feature ratios with the process of disease development, and provide a novel basis for network construction. Given that the ratio of two metabolites can be assumed to be the result of pathway reactions in which one metabolite is converted into another via single or multiple reaction pathways 26 , ATSD-DN constructed the networks based on the NOR changes of feature ratios along time points, which would facilitate the reflection of physiological or pathological changes. Dynamic concentration analysis and topological structure analysis were performed to analyze the networks and extract early warning information for the disease. Hepatocellular carcinoma (HCC) is one of the most lethal malignancies 27 , and liver cirrhosis is the major precancerous lesion in the majority of HCC cases 28 . However, until now, early detection of HCC has been a great challenge, especially for the discrimination of precancerous cirrhosis and small malignant HCCs 29,30 . Developing new effective methods for the discovery of new biomarkers for early warning of HCC is urgently needed. Due to similarities with histological and genetic features of patients, a diethylnitrosamine (DEN)-induced HCC model can be used to imitate the process of stepwise hepatocarcinogenesis [31][32][33] . Considering the important role of the liver in ensuring the homeostasis of lipids 11,34 , delineating the changes in lipid metabolism would be useful to provide unique insight into early hepatocarcinogenesis and identify novel diagnostic targets. Therefore, ATSD-DN was applied to the time-series lipid data from a rat HCC model induced by DEN administration to define the potential lipid biomarkers for early diagnosis of HCC and validate the performance of ATSD-DN.

Results
The workflow of the ATSD-DN strategy is given in Fig. 1. After filtering the non-informative features by static analysis, ATSD-DN constructed the networks. ATSD-DN provides two techniques: dynamic concentration analysis and topological structure analysis, each of the two network analysis techniques was performed independently to define the informative feature ratios. The PCA score plots based on the feature ratios defined by each network analysis technique alone were used to show the performance of each technique. Finally, the common feature ratios defined by both two techniques were selected and the corresponding performance analysis was also given.
The construction of dynamic networks. Time-series lipidomics data were analyzed to depict changes in lipid metabolism regarding the process of stepwise hepatocarcinogenesis. A histological examination confirmed that the DEN-induced hepatocarcinogenesis model was successfully produced in this study. The serial progression of hepatocarcinogenesis was divided into three stages: week 8 (hepatitis (H) stage, T 1 ), weeks 10-14 (cirrhosis (CIR) stage, T 2 -T 4 ) and weeks 16-20 (HCC stage, T 5 -T 7 ). The last week of each stage (i.e., T 1 , T 4 and T 7 ) was the typical time point of the corresponding liver disease stage, while the first weeks of the latter two stages (i.e., T 2 and T 5 ) were the interfacial points.
In three sub-problems of classification (H vs. CIR, H vs. HCC and CIR vs. HCC), 38 individual features were selected from the first process of noise filtering (i.e., static analysis) at typical time points by SVM-RFE 35 (Table  S1). The multivariate unsupervised PCA analyses were performed to show the discrimination between HCC (T 5 -T 7 ) and non-HCC (T 1 -T 4 ) samples (i.e., hepatitis and cirrhosis samples). The first two principal components captured 65.1% and 71.1% of the total variation from the PCA models based on original all features and these 38 individual features, respectively ( Figure S2A,B).
Subsequently, a total of 703 feature ratios were developed based on these 38 individual lipids. For each feature ratio, if the NOR value at two adjacent time points was greater than or equal to 0.85, the corresponding two individual lipids were linked with a red edge. If the NOR was less than or equal to −0.85, the edge was green. As only two time points were considered in each network construction and each time point had exactly the same samples, the sample probability p t was 0.5. Figure 2 shows the six networks along the 7 time points. In particular, each network can illustrate the changes in feature ratios at two continuous time points, instead of quantification at a single time point. Dynamic concentration and topological structure analyses. These NOR-based dynamic networks were firstly analyzed from the perspective of dynamic concentration. In Fig. 2, the color of the edges in each network DN-i indicates the change trend in the effective range for each feature ratio with increased (red) or decreased (green) results at two adjacent time points. To trace the continuous changes of the most important interfacial stage between pre-cancer CIR and early HCC, networks DN-4 (T 4 -T 5 ) and DN-5 (T 5 -T 6 ) representing the cases in which liver disease developed from pre-cancer cirrhosis to HCC and continued to deteriorate were first emphasized. Therefore, 44 edges with the same color in networks DN-4 and DN-5 were picked, and the corresponding ratios were retained to construct feature subset 1. The edges with the same colors in DN-4 and DN-5 represent continuous changes in the dynamics of the circulating metabolites from T 4 to T 6 . The PCA analysis was then performed based on the 44 feature ratios to show the discrimination between HCC (T 5 -T 7 ) and non-HCC (T 1 -T 4 ) samples. The score plot shows that the non-HCC and HCC samples could be separated well. A better performance of the PCA model was obtained that 95.6% of the total variation could be explained (Fig. 3A).
In Fig. 2, the dynamics of circulating metabolites could also be analyzed from the perspective of topological structure of networks. In ATSD-DN, the edges between two features represent the dynamics of circulating metabolites over time 26 . Therefore, the network with the most edges among the 6 networks may represent the largest difference in the dynamics of circulating metabolites, which implies physiological or pathological abnormalities. The network with the most edges could be a key stage along the time course and the key point for a particular biological process. The top nodes with the largest degrees in the network would be the key factors signaling the onset of the key stage. For this topological structure analysis, it can be observed that the edge number of network DN-4 (T 4 -T 5 ) (Fig. 2G) was the largest among the 6 networks that agreed with the development of HCC validated by the histological examination, indicating activated metabolic disturbance in the interfacial stage between CIR and HCC. Then, the top node with the largest degree (i.e., the number of edges) was chosen. Two nodes (free fatty acid (FFA) 20:5 and triacylglycerol (TAG) 56:9) were observed with the same largest degree in network DN-4. It is worth noting that FFA 20:5 was also the top one with the most accumulated degree in 6 networks (Table S2), indicating the continuous metabolic disturbance over time. As a result, 33 ratios associated with FFA 20:5 in network DN-4 was retained for subsequent analysis. The separation between non-HCC and HCC stages can also be obviously represented in the PCA score plot based on these 33 feature ratios with 96.9% of the total variation explained (Fig. 3B).

Definition and external validation of prospective biomarkers.
In the discovery set, the common 15 ratios were selected by both dynamic concentration and topological structure analyses (Table S3). In the PCA score plot based on these 15 ratios, the HCC samples could be clearly discriminated from non-HCC subjects with the highest percentage of the total variation explained (i.e., 99.1%; Fig. 3C).
For univariate evaluation, 4 of the 15 ratios showed significant difference between the model and age-matched control groups at the HCC stage (t-test, p < 0.05) and between T 4 and any time point at the HCC stage (paired t-test, p < 0.05) simultaneously. Detailed information of these 4 ratio candidates (lyso-phosphatidylcholine (LPC) 16:0/FFA 20:5, LPC 18:1/FFA 20:5, phosphatidylcholine (PC) 34:2/FFA 20:5 and LPC 20:3-isomer2/FFA 20:5) is given in Table 1, and the metabolic trajectories of them are presented in Fig. 3D-G. In the model group, their levels changed slightly at the pre-HCC stage and appeared to increase significantly in the early stage of HCC (T 5 ). A significant difference between the model and age-matched control groups was also observed at the HCC stage (T 5 -T 7 ). To further illustrate the ability of the 4 feature ratios to discriminate HCC and non-HCC samples, the receiver operating characteristic (ROC) curve was analyzed based on the results for the area under the curve (AUC) and the sensitivity and specificity at the best cut-off points ( Table 2). The AUC values of these 4 feature ratios were 0.940-0.980 in the discovery set.
To validate the performances of the 4 biomarker candidates, 36 sera from another 6 model rats with 6 monitoring time points (i.e., T 1 -T 6 ) were analyzed. These 6 rats were sacrificed for histological examination with the validation of HCC at week 18 (T 6 ). In this external validation set, the AUC values of these 4 candidates were 0.934-0.983 for the discrimination of T 1 -T 4 (pre-HCC stage) and T 5 -T 6 (HCC stage), confirming the potential of these 4 ratio biomarkers for HCC diagnosis. Considering the similar metabolic characteristics of these 4 candidates and clinical practicability, the feature ratio of LPC 18:1/FFA 20:5 was found to be the potential biomarker with the best AUC value for discrimination. The chromatograms and MS/MS data for LPC 18:1 and FFA 20:5 are provided in Figure S4.
Comparison with previous methods. To further evaluate the performance of ATSD-DN, this novel approach was compared with two time-series methods wRDA and MEBA, and a popular two-way technique SVM-RFE. The features with the top AUC values in the discrimination of HCC and non-HCC were retained from each method. Phosphatidylinositol (PI) 36:3 was selected by both wRDA and MEBA and TAG 56:8 was selected by SVM-RFE.
In the discovery set, 95.2% of HCC and 96.4% of non-HCC samples could be correctly diagnosed at the best cutoff value based on the results of ATSD-DN (i.e., LPC 18:1/FFA 20:5; Table 2). The AUC value of LPC 18:1/FFA 20:5 was 0.980, which was better than 0.898 of PI 36:3 defined by both wRDA and MEBA and 0.852 of TAG 56:8 defined by SVM-RFE ( Fig. 4A-C). Similar comparison results in the validation set are also presented in Fig. 4D-F (the corresponding AUC values were 0.972, 0.833 and 0.833, respectively). The better performance of ATSD-DN may suggest its potential for a more complete presentation of time-series changes.

Discussion
HCC is one of the most prevalent malignancies with a high mortality rate 27 . Early diagnosis could greatly improve the survival rate 36 . However, unapparent early symptoms and individual differences bring difficulties to early discrimination and seasonable treatment of HCC. Although ultrasonography and some typical tumor markers (e.g., α-fetoprotein) have been applied for clinical diagnosis and achieved some successes, they are far from ideal, with high false negative rates 29,30 . Developing new efficient methods such as discovering new biomarkers for the early screening of high risk populations is challenging and urgent. Dynamic metabolomics studies based on time-series data can trace the interfacial stage between pre-cancer cirrhosis and HCC and then facilitate the screening of biomarkers for early diagnosis. To identify the early warning signals of disease deterioration, a new strategy for analyzing time-series data based on dynamic networks in a systematic time dimension was proposed and applied in a prospective cohort study using a diethylnitrosamine (DEN)-induced rat hepatocarcinogenesis model. In this study, noise and irrelevant features were first removed based on the pre-screen. Then, the feature ratio of each of two individual metabolites was developed. The change in the effective range for each feature ratio at two adjacent time points was depicted by the NOR value, which provided the novel basis for network construction. Then, these dynamic networks were used to trace and define the feature ratios with continuous differential changes from two different methods.
In this time-series dataset, to trace the continuous changes of the interfacial stage between CIR and HCC, the networks DN-4 and DN-5 inferred by T 4 , T 5 and T 6 representing the cases in which liver disease developed from pre-cancer cirrhosis to HCC and continued to deteriorate were first emphasized. In Fig. 2, these NOR-based dynamic networks were firstly analyzed from the perspective of dynamic concentration. The edges with the same colors in DN-4 and DN-5 represent continuous changes in the dynamics of the circulating metabolites from T 4  to T 6 , which were picked to facilitate the discrimination between the pre-HCC and HCC stages. Moreover, it is known that there usually exists a key point in disease development that warns the deterioration of the disease. The discovery of this key point and related key information are of great importance to study the disease. In Monoglycerophospholipid LPC 18:1 can be formed via the hydrolysis of phosphatidylcholine (PC), which has an important role in cell signaling. FFA 20:5 (i.e., eicosapentaenoic acid) has been previously reported to improve steatohepatitis and inhibit the development of HCC 34,37 . The decrease in FFA 20:5 may indicate the risk of HCC. In this study, the combination of these two lipids using the biomarker pattern of the LPC 18:1/FFA 20:5 ratio was employed to improve the diagnostic performance. This ratio biomarker pattern would facilitate the magnification of metabolic differences for discrimination. Moreover, compared with traditional individual features or the combination of metabolites from a single pathway, this combination pattern reflects the imbalance of the lipid network from different perspectives of physiology, which would be more informative and robust for HCC risk assessment 38 . Further validation is still needed with a larger cohort of specimens.
To evaluate the efficacy of this new strategy, ATSD-DN was further compared with previous methods (wRDA, MEBA and SVM-RFE). As shown in Fig. 4, the ratio biomarker from ATSD-DN fulfills the best discrimination of HCC and non-HCC samples with the best AUC values in both discovery and validation sets. Based on the comparison results, the better performance of ATSD-DN suggests its great potential for the extraction of early warning information. The advantages of ATSD-DN are as follows: i) this novel strategy is better for the more complete presentation of time-series changes. Rather than screening differentially expressed variables at isolated time points, as in two-way analysis methods, ATSD-DN can be used to trace and define feature ratios with continuous differential changes in a systematic time dimension. ii) The introduction of NOR based on the repeated time series measure facilitates the quantification of changes at two continuous time points and provides a novel basis for network construction. Thus, each network in ATSD-DN presents changes in feature ratios at two continuous time points, which could better reflect the physiological and pathological changes. iii) ATSD-DN analyzes data from the perspective of networks which could possibly provide the insight into the complicated interplay of multiple molecules and be better to explore the development of diseases. Two ways of dynamic concentration and topological structure analyses can be flexibly selected to define the early warning information. iv) ATSD-DN is a data-driven learning method in which few parameters need to be set by the researchers.
It should be noticed that ATSD-DN traces the effective range of a feature ratio along the time points to examine the changes in the feature relationships, and time series repeated measures has been considered in the construction of network. Different from other time-series methods such as ASCA which explores the contributions of different factors or multi-factors, ATSD-DN aims to analyze the networks and extract early warning information for the disease by dynamic concentration analysis and topological structure analysis. In the analysis of metabolomics data, ATSD-DN focuses on the relationship of features to extract the early warning information, and it may ignore some metabolites which associate with the disease but have little relationship with others. Besides, it should be noticed that the present study based on the lipidomics analysis may drop some metabolites which their associate metabolites cannot be detected by the MS. The novel strategy which can combine the feature associations and independent features together should be further developed. In summary, ATSD-DN analyzes the time-series data from the perspective of networks to define the early warning biomarkers of complicated diseases. The application of ATSD-DN to the rat HCC metabolomics data demonstrated that it is an effective method for identifying potential metabolic biomarkers for early diagnosis. To improve the performance of early risk identification, more construction methods for dynamical networks can be employed in further studies.

Methods
To study the development of a disease and identify the early warning signals, both control and model samples were collected. Let C denote the control group, M denote the model group and T i denote a time point, 1 ≤ i ≤ N, where N is the number of time points. Usually, as time goes on, the model samples may suggest different stages of the disease. Let N s denote the number of the different disease stages along N time points.
ATSD-DN defines the prospective information of the disease deterioration based on the dynamic analysis of the networks along the time course. However, not all the features in the metabolic spectrum are involved in the network analysis. Non-informative features are filtered out by static analysis before network construction. ATSD-DN provides two independent techniques to identify the features of interest from the networks. Figure 1 shows the procedure for ATSD-DN.

Static analysis.
It is known that noise and irrelevant features are two factors affecting the efficient analysis of metabolomics data. Given that the model samples experience N s different biological stages, the features containing little discriminative information from each two-stage segment are noise or unrelated to the problem and should be removed. Thus, ATSD-DN separates the problem into N s (N s − 1)/2 binary sub-problems and selects the features with discriminative information for each sub-problem to construct the networks for further analysis.
A change in r ijt at the adjacent time points could reflect a change in the biological procedure. Thus, ATSD-DN traces the effective range of a feature ratio along the time points to examine the changes in the feature relationships. The effective range of r ijt is defined as follows 39 : where − er ijt and + er ijt are the floor and the ceiling of the effective range of r ijt . p t is the sample probability at time point T t in the corresponding network construction. For the effective range containing least two-thirds of the samples,  1, 2, …,  n). For a change in the effective range of a feature ratio between two time points, there exist three cases ( Figure S3). In the third case, the effective range of the feature ratio at one time point is included in the effective range at another time point ( Figure S3C). This is far from ideal to illustrate the changes in the assumed pathway reactions related to the disease development. Therefore, only the first two cases ( Figure S3A,B) are examined in ATSD-DN. Additionally, the changes in the effective range of the feature ratio at the adjacent time points T t and T t+1 (1 ≤ t < N) are depicted by the non-overlapping ratio (NOR), which is defined as follows: 1) . If |NOR(r ijt )| is large, it indicates that the feature ratio r ijt from time T t to time T t+1 changes greatly, suggesting the continuous metabolic disturbance for the assumed reaction between individual feature f i and f j . Thus, a network DN-t could be built based on T t and T t+1 . The network is presented using the rational visualization method of hive plots which is accessed at http://www.hiveplot.net/. Let the features be the vertices of DN-t. For every pair of features f i and f j , if |NOR(r ijt )| ≥ τ, then there is an edge between f i and f j in DN-t. NOR could also tell the direction of the feature ratio change. NOR(r ijt ) > 0 represents the feature ratio r ijt increasing along two adjacent time points, and NOR(r ijt ) < 0 represents r ijt decreasing. For simplicity, if NOR(r ijt ) ≥ τ, the edge between f i and f j in DN-t is colored red, and if NOR(r ijt ) ≤ −τ, the edge is colored green. If the edge between the two individual features stays red (or green) in consecutive networks, it implies that the feature ratio of these two individual features increases (or decreases) continually along the time points.
Network analysis. To define the prospective information for a complex disease, ATSD-DN analyzes the networks from two perspectives: dynamic concentration analysis and topological structure analysis.
Dynamic concentration analysis. Dynamic concentration analysis investigates the changes in the feature ratios during the course of disease development. As a biological process is always in motion, some signals must exist before a specific time point in a complex disease, such as a malignant tumor. To identify the signals, ATSD-DN focuses on certain time points (without loss of generality, it is assumed to be N e (0 < N e < N) time points) before the typical time point T s (1 < s ≤ N) of the disease. If the effective range of the ratio between the features along N e time points continues to change in the same direction (such as continuous increasing or decreasing), it indicates a continuous metabolic disturbance. Therefore, to identify the early warning signal for the specific time point of disease, the networks DN-i (s − N e ≤ i < s − 1) are examined, and the edges that remain the same color in DN-i are selected. The corresponding ratios are selected as the signals of the specific time point of the disease and constitute feature subset 1.
Topological structure analysis. The topological structures of the N-1 networks along N time points can also indicate the biological changes over time. If the edge number of DN-t (1 ≤ t < N) is large, it implies that many pathway reactions experience large changes in the reaction rate and the organism experiences a relatively drastic biological change. Thus, DN-t (1 ≤ t < N) with the most edges could be a key stage along the time course and may be the key point for a particular biological process. The nodes with the largest degrees in the network would be the key factors signaling the onset of the key stage. Thus, in topological structure analysis, ATSD-DN analyzes the edge numbers of N-1 networks along N time points and focuses on the one (DN-t, 1 ≤ t < N) that has the most edges. It ranks the nodes in DN-t according to their degrees in a descending order, and the top k ≥ 1 nodes are selected and the feature ratios corresponding to the edges associated with the k nodes are selected to constitute feature subset 2.
Each of the two network analysis techniques has its own merits for extracting early warning information. Therefore, they can be used flexibly to analyze the time-series data and to define the potential biomarkers independently. It is also possible to use them simultaneously to get the feature subset by union or intersection of feature subset 1 and feature subset 2.
The application of ATSD-DN to metabolomics data from a rat HCC model. ATSD-DN was applied to the time-series data to define the potential biomarkers for early diagnosis of HCC. The data include a discovery set and a validation set. ATSD-DN was performed on the discovery set to identify prospective information. The validation set was used to test the results of ATSD-DN on the discovery set.
Time-series data source. In this study, time-series data were obtained from the animal model with DEN-induced stepwise hepatocarcinogenesis. This animal experiment was conducted at the experimental animal center of Scientific RepoRts | 6:32448 | DOI: 10.1038/srep32448 Dalian Medical University (Dalian, China), in compliance with national guidelines for the care and use of laboratory animals. The study protocol was reviewed and approved by the institutional reviewer board of Dalian Medical University, Dalian, China. And the experiment was carried out in accordance with the approved guidelines.
This rat model has been described detailedly in our previous report 11,40 . Briefly, a total of 55 male Sprague-Dawley (S.D.) rats were enrolled in the present study at the age of 42 days (i.e., week 0). Then, after two weeks of adaptation, all rats were randomly divided into control (n = 10) and model (n = 45) groups, administrated with saline and DEN at 70 mg/kg body weight respectively via intraperitoneal injection. The injection was performed once a week between week 2 and week 11, and 14 rats from the model group died during the administration.
Histological examination was performed to monitor the progress of stepwise hepatocarcinogenesis based on the sacrifice of model rats, until all of the surviving animals (n = 10 for control and n = 7 for model groups) were finally sacrificed in week 20. Collected liver tissues were fixed in 10% buffered formalin and embedded in paraffin for histological examination, which confirmed that the DEN-induced hepatocarcinogenesis model was successfully produced in the present study.
The collection of time-series sera set was conducted from week 8 to week 20 once every 2 weeks (i.e., 7 monitoring time points). The discovery data included 10 rats from the control group and 7 rats from the model group. A total of 119 time-series sera were then collected from all 7 monitoring time points once every two weeks from week 8 to week 20. Thus, the number of the time points for the discovery set was 7; i.e., N = 7. In the model group, the first time point T 1 was week 8 (M8) and the 7th time point T 7 was week 20 (M20). Similarly, C8 and C20 were week 8 and week 20 in the control group.
Furthermore, 36 sera from another 6 model rats were used for validation. These 6 rats were sacrificed for histological examination with the affirmance of HCC at week 18. Therefore, their sera were collected from 6 monitoring time points (i.e., T 1 -T 6 ).

Profiling of lipids by LC-MS analysis.
Time-series serum samples were analyzed to perform a non-targeted lipidomics study using an ACQUITY ultra-performance liquid chromatography (UPLC) system (Waters, USA) coupled with a tripleTOF ™ 5600 plus mass spectrometer (AB Sciex, USA). Details regarding lipidomics analysis including serum preparation and instrument methods are provided in the Supplemental Information.
Data analysis. Based on the accurate m/z, retention behavior and MS/MS fragmentation pattern, lipid species were first identified with LipidView and PeakView software (AB Sciex, USA). Then, the quantitative information for detected lipids was extracted using MultiQuan software (AB Sciex, USA) with a mass width of ± 0.01 Da and retention time width of ± 0.15 min. Before statistical analysis, the relative abundance of all lipids was calculated by normalizing to the area of corresponding internal standards. Finally, a time-series dataset was exported to the ATSD-DN strategy.
Seven time points include three different stages of liver disease (N s = 3): hepatitis, cirrhosis and hepatocellular carcinoma. The features containing little discriminative information for every two-stage segment were removed. SVM-RFE was first applied on three binary sub-problems (H vs. CIR, H vs. HCC, CIR vs. HCC). Five-fold cross-validation was run fifty times for each sub-problem. In SVM-RFE, the kernel function and penalty factor were set as the liner kernel function and 1, respectively. The implementation of SVM was performed with LIBSVM (available at http://www.csie.ntu.edu.tw/~cjlin/libsvm). MEBA was from http://www.metaboanalyst.ca/ faces/Secure/upload/TimeUploadView.xhtml. All the algorithms were written in C++.
The selected feature subsets of the three sub-problems were united and used to infer the networks with τ = 0.85. T 7 is the typical HCC stage and T 4 is the typical CIR stage. It is known that HCC usually develops from CIR. Thus, N e = 3 time points before typical HCC (T s = 7) were studied to define the early warning information of for HCC by means of dynamic concentration analysis. Thus, DN-4 and DN-5 were inferred by these three time points. The feature ratios corresponding to the edges whose colors stay the same in DN-4 and DN-5 were selected to constitute feature subset 1.
The edge numbers of the 6 networks along the 7 time points were analyzed. The network that had the greatest number of edges was selected. Its nodes were ranked according to their degrees in descending order, and the top ranked node was selected. The ratios corresponding to the edges linked with the top ranked node were selected to constitute feature subset 2.
The compared methods. wRDA. The mean value and standard deviation were used to measure the differences for a feature between the control and model groups 18 . An adapted weight was assigned to each time point for extracting early information on complicated diseases. Subsequently, a false discovery rate (FDR) 41 was used to evaluate the selected feature subset. The lower the FDR, the better the selected features. In this study, the weights of non-HCC and HCC stages were 0.1 and 0.2, respectively. The top 30 features with the largest scores with FDR = 0% were constructed as the final feature subset.

MEBA.
A time-course analysis method based on multivariate empirical Bayes statistical which could evaluate the importance of the features by the Hotelling's T 2 15 . The top 30 features with the largest Hotelling's T 2 were constructed as the final feature subset.
SVM-RFE. This method has been widely applied to select discriminative features from the high-dimensional metabolomics data 35,[42][43][44][45][46] . It removes the least important features iteratively. In each iteration, the weight of each feature in the current feature subset is re-measured based on the contribution to the hyper-plane, and r% features with the smallest weights are removed. This process is repeated until the current feature subset is empty. The feature subset with the largest accuracy rate in the iteration is kept as the selected features subset.