A novel miRNA analysis framework to analyze differential biological networks

For understanding complex biological systems, a systems biology approach, involving both the top-down and bottom-up analyses, is often required. Numerous system components and their connections are best characterised as networks, which are primarily represented as graphs, with several nodes connected at multiple edges. Inefficient network visualisation is a common problem related to transcriptomic and genomic datasets. In this article, we demonstrate an miRNA analysis framework with the help of Jatropha curcas healthy and disease transcriptome datasets, functioning as a pipeline derived from the graph theory universe, and discuss how the network theory, along with gene ontology (GO) analysis, can be used to infer biological properties and other important features of a network. Network profiling, combined with GO, correlation, and co-expression analyses, can aid in efficiently understanding the biological significance of pathways, networks, as well as a studied system. The proposed framework may help experimental and computational biologists to analyse their own data and infer meaningful biological information.

Continued Even after determining targets for a specific miRNA, the identification of its functions is also a crucial task because each miRNA has the potential to target multiple different genes and consequently affect numerous BPs. To tackle this issue, scientists usually look for functions or pathways on which these miRNAs converge. A universal approach is to use enrichment analysis techniques for examining whether a given biological function unit is more frequently observed compared with the anticipation by random chance. Nonetheless, these enrichment techniques have been employed under the supposition that genes are selected consistently at random from a finite population referred as the gene universe, which may not be true in the case of target genes selected on the basis of query miRNAs.
Compared with protein-coding genes, most miRNA knockouts have very modest and subtle phenotypic effects 27 . One logical explanation is that multiple miRNAs may regulate their target genes cooperatively through a combinatorial or synergistic association. Therefore, examining the combined functions of target genes from a list of miRNAs showing synchronised changes appears to be more biologically significant. To get acquainted with such complex 'many-to-many' relationships between miRNAs and target genes, the best method is to use network visualisation methods. This approach associated with reliable enrichment analysis support can provide beneficial information that can help in gaining central insights into the miRNA regulatory machinery. The uncomplicated nature of this approach can identify major 'dramatis personae' from the network perspective by identifying those genes that are targeted by multiple miRNAs or that together regulate multiple genes of interest. However, such analysis and visualisation supports for NGS datasets are not available in current miRNA tools.
In this study, we have developed a novel miRNA analysis framework by using the transcriptomic datasets of Jatropha curcas L. as a sample input. This miRNA framework includes a six-step process where transcriptome data areannotated, followed by the miRNA identification and prediction of mRNA targets. Selected miRNA-mRNA interactions were considered for the construction of gene ontology (GO) inferred network, as a result of which differential expression analysis was performed using Pearson's correlation coefficient (PCC). The analysed nodes with a significant fragments per kilobase million (FPKM) value were used in further co-expression analysis.
To validate the proposed framework for various datasets, we contemplated two differential conditions of Jatropha. Various viral, fungal, and bacterial infections reduce the global average yield of J. curcas by 16% yearly 28 , the majority of which results from the occurrence of J. curcas mosaic virus, which causes leaf curling and reduction in fruit size 17 . Therefore, we considered the virus-infected (JV) tissue as one of the differential conditions. To identify molecular and cellular processes leading to phenotypic variations, we also considered the healthy (JH) tissue. Archit et al. reported the transcriptomic and molecular mechanistic understanding of JH and JV conditions 26 . The present study displays a meticulous comparative network analysis of differential miRNA expressions and disease-specific unique miRNAs in the virus-infected tissue to understand pathway complexity.

Results and Discussion
The miRNA analysis framework would broaden the horizon of the traditional miRNA target identification process and help in understanding the mechanism of action by using various network profiles. miRNA regulatory networks have numerous advantages over random networks because miRNAs are positioned upstream of gene signal transduction; therefore, alterations in miRNA expression are more sensitive and occur before changes in proteins 29 . In the present study, miRNA network analysis was performed using JH and JV transcriptome datasets from virus-host interactions to obtain regulatory nodes. The persistence of a large proportion of shared target proteins between JH and JV indicated that miRNA regulatory sub networks and viral infections are significantly interwoven in host cell networks. The overlapping of targeted genes involved in crucial cellular processes suggests that miRNAs play key roles in the process of viral infection.
miRNAs exert a substantial effect on targeted mRNA and gene expression. Variations in gene expression exert an effect on molecular pathways, which in turn affect cellular processes 30 . A total of 11 and 13 miRNAs are identified in JH and JV, respectively, of which 8 are common in both, namely miR-156, miR-157, miR-159, miR-319, miR-4995, miR-5021, miR-5658, and miR-f11908 (Table 1). To gain a deep insight into gene expressions, cellular processes, and associated pathways, controlling elements that are unique to particular conditions should be identified. Hence, we identified unique miRNAs in JH as miR-172, miR-414, and miR-529. In addition, the miR-2910, miR-2914, miR-477, miR-f11953, and miR-f12158 are identified uniquely in JV (Table 2). JH-specific miRNAs can be used as biomarkers to evaluate the resistance mechanism in healthy tissues, and JV-specific miR-NAs can point out targets that are compromised during a virus attack. miR-f11908, miR-f11953, and miR-f12158  Table 2. miRNA targets unique to healthy (JH) and diseased (JV) conditions. are novel miRNAs identified by the proposed miRNA analysis framework in J. curcas, and these miRNAs are also not experimentally validated in other plant species.
In accordance with the aforementioned results, mRNA targets respective to the miRNAs are also identified from the transcriptome. A total of 39 and 61 targets are predicted followed by KAAS annotation in JH and JV, respectively.
To quantify the target transcripts of respective miRNAs in JH and JV (Fig. 1A,B), selected transcripts were presented as nodes, and their interactions were represented through edges. Only 50 and 74 interacting nodes from JH and JV, respectively, were used for further construction of the bipartite network according to miRNA-mRNA target distribution. Here, the bipartite network has been constructed to represent the association between two groups without having any connexion within the same group. On the construction of these networks, some nodes showed a dominant effect compared with other nodes (Fig. 2A,B).
Virus has an asymptotic effect on the phenotypic appearance of plants through various cellular processes controlled by various genes involved in molecular pathways 31 . Analysis of the miRNA-mRNA target network shows the major contribution of miR-5021 and miR-5658 to the regulation of the expression of various transcripts in JH and JV. Some nodes, such as choline monooxygenase, histone H3, and ferrochelatase, showed an association with more than one miRNA. To understand the BPs, molecular functions (MFs), and cellular components (CCs) of selected transcripts in JH and JV, GO analysis was performed.
GO explains a set of clearly defined, ordered vocabularies with the aim of describing the BPs, MFs, and CCs of selected transcripts. Transcripts were clustered into different subgroups and assigned a node score value, which was further used to calculate the association within clusters. Apart from BLAST2GO (which used INTERPRO and PANTHER for analysis), we separately tested our miRNA target-associated GO terms by using GORILLA and WebGestalt GSAT. The results obtained were consistent with those of BLAST2GO, and no such biased results were found.
The transcripts involved in BPs in JH, such as the phosphatidylinositol metabolic process, transcription from RNA polymerase II promoter, isoprenoid biosynthesis process, oxidation reduction process, and positive regulation of the transcription elongation factor RNA polymerase II promoter, were filtered on the basis of a high GO node score (Fig. 3A). In JV, we observed more number of transcripts involved in the terpenoid biosynthesis process, aerobic respiration, tricarboxylic acid cycle, oxidative reduction process, citrate metabolic processes, organic substance biosynthetic process, and macromolecule biosynthetic processes (Fig. 3B). To cross-check and verify the role of the aforementioned BPs, we performed literature mining. According to the literature, processes involved in JH were more generalised compared with those involved in JV [32][33][34][35][36][37][38] . Additionally, we found that the processes mentioned in JV were involved in stress-mediated conditions to produce more energy so that plants can sustain in unfavourable conditions [39][40][41] .
To identify the elementary activities of genes at the molecular level, a node score was assigned to selected transcripts. Catalytic activity, transferase activity, transferase acyl activity, organic cyclic compound binding, heterocyclic binding, and nucleic acid binding were observed in JH (Fig. 4A). Response to oxidative stress, primary metabolic process, organic substance metabolic process, protein ubiquitination, transcription biosynthetic process, carbon fixation, oxidation-reduction process, and tricarboxylic acid cycle were observed in JV (Fig. 4B). A literature search of the stated MFs indicated that JH-associated functions were specific to normal conditions. However, functions relevant to JV appeared to be associated with stress and host-pathogen interactions 17,31,42 . A study of CCs in both the conditions revealed no such differential role of transcripts; however, intrinsic and integral components of the membrane were found to be uniquely present in JV (Fig. 5A,B), which indicated alteration in the cell membrane due to viral infection 43 .
PCC and Spearman's correlation coefficient were used to evaluate the association between differentially expressed miRNA targets. However, only PCC showed a strong association within transcripts according to biological relevance. Unique miRNAs and their targets were excluded from this analysis, and only common miRNA targets were selected to observe the overall behaviour of transcripts in JH and JV conditions. As shown in Fig. 1C, miRNA target transcripts showed a significant higher expression in the JV condition than in the JH condition, and the same highly expressed miRNA target transcripts were considered for the co-expression network analysis.  Table 3). To further analyse the role of co-expressed genes, they were mapped with KEGG to identify pathways in which all these genes were involved 44 . CMO-associated co-expressed genes were found to be involved in the biosynthesis of secondary metabolites and carbon metabolism; RPS5-related co-expressed genes in disease resistance response; RPL9-related co-expressed genes in ribosomal machinery, EIF5-related co-expressed genes in protein processing in the endoplasmic reticulum and ubiquitin-mediated proteolysis; MVA1-related co-expressed genes in the biosynthesis of secondary metabolites, fatty acid biosynthesis, carbon metabolism, and fatty acid metabolism; PPC-related co-expressed genes in the biosynthesis of secondary metabolites, carbon metabolism, glycolysis/gluconeogenesis, and galactose metabolism; PSAX-related co-expressed genes in photosynthesis and photosynthesis-antenna proteins; SUMO-related co-expressed genes in ribosome machinery and spliceosome; CYP707A1-related co-expressed genes in plant hormone signal transduction; and DXS-related co-expressed genes in the biosynthesis of secondary metabolites, photosynthesis-antenna proteins, prohrine and chlorophyll metabolism, proteosome, and the mRNA surveillance pathway. From the results of the co-expression network construction, it can be deduced that co-expressed genes are involved in various subcellular processes, which may be controlled or altered by miRNA regulation.
The current study represents a novel framework to unravel big data analytics in terms of transcriptomics, genomics, proteomics, metabolomics, and phenomics. Additionally, this framework can be employed to decipher the regulatory mechanism by controlling elements such as miRNA, transcription factor, and cis-regulatory elements. This framework can also be used for conducting differential and comparative analysis for multi-datasets and drug-target identification. To demonstrate one of the applications of this framework, the regulatory role of miRNA targets within healthy and diseased conditions in J.curcas is used as a sample to understand big data analytics in terms of RNA-seq transcriptomics. This was achieved by constructing a bipartite network, followed by correlation analysis, GO inferred network, and finally co-expression network, to show that comparative analysis can help in identifying regulatory genes in JH and JV. The proposed framework may help experimental and computational biologists to analyse their own data and infer meaningful biological information.

Conclusions
Through various plant miRNA identification and network construction techniques, a remarkable amount of information has been obtained, facilitating the construction of several biological networks. However, identifying intra-network nodes that cause variations in phenotypes remain a major challenge. With the application of transcriptome-wide strategies to elucidate biological networks in multiple sublevels, the effects on phenotypic variation can be understood. Here, we also propose future experimental validation of selected targets to confirm the regulatory roles of miRNA for predicted targets. Given the large-scale availability of transcriptome data, this framework can aid in comparative analysis to decipher the key driver nodes considerably affecting a phenotype. miRNA Identification. The annotation of high-quality reads was performed by comparing them against the non redundant database downloaded from the National Centre for Biotechnology Information, followed by the quantification of high-quality reads from JH and JV transcriptomes. miRNA identification was performed using in-house Perl scripts by using Zhang et al. 's algorithm 45 . The local database of mature miRNAs based on data obtained from the miRbase 46 and plant microRNA database 47 was constructed, and the in-house Perl script was used to identify precursor miRNAs from transcriptomes by using the parameters of sequence similarity of 100% and an e-value cutoff of 1e −5 . After removing redundant entries, miRNAs were classified into their respective families. To predict the secondary structure, we adopted the approach of the mfold software for sequences containing not more than 4 mismatches 48 . The parameters considered for miRNA identification were as follows: 1) selection of an RNA sequence as a candidate miRNA precursor, 2) RNA sequences should fold into an appropriate stem-loop hairpin secondary structure, 3) a mature miRNA sequence site is in one arm of the hairpin structure, 4) miRNAs should have less than seven mismatches with the opposite miRNA sequence in the other arm, 5) no loop or break in miRNA sequences,and 6) predicted secondary structures should have high negative minimum fold energies (less than or equal to −20 kcal/mol) 48 .

Methods
miRNA Target Prediction. The plant small RNA (psRNA) target (http://plantgrn.noble.org/psRNATarget/) analysis server was used to predict mRNA targets corresponding to identified miRNAs based on customised parameters 49 . Prediction analysis was performed using the option of user-submitted small RNA transcripts. The parameters considered for this analysis were as follows: 1)maximum expectation value = 3, 2) length for complementary scoring (upsize) = 20, 3) number of top target genes for each small RNA = 200, 4) target accessibility-allowed maximum energy to unpair the target site = 25, 5) flanking length around the target site for target accessibility analysis was 17 base pairs in upstream and 13 base pairs in downstream, and 6) the range of central mismatch leading to translational inhibition was 9-11 nucleotides. miRNA targets were classified into their respective miRNA families.
To cross-check psRNA-based predicted results, we performed target prediction analysis by using the TargetFinder Perl script downloaded from https://github.com/carringtonlab/TargetFinder. This script also provided the same results as those predicted using the psRNA target. Moreover, we manually compared miRNA and mRNA results by using shell scripting to evaluate the presence of false positive hits. However, all psRNA-based predictions were consistent with the results of other two methods. miRNA-mRNA Interaction Network Analysis. Advances in network biology indicate on the fact that cellular networks are ruled by universal laws and deal with a new conceptual framework that can potentially transform our view of biology and disease pathologies 50 . The framework for network analysis is shown in Fig. 6.
Interaction Matrix Construction. The adjacent matrix of the miRNA target network can be represented as follows: where i represents miRNA and j represents the association between miRNA and its targets.

Bipartite Network
Construction. An undirected graph where G = (V, E) in which V can be partitioned into two sets, V 1 and V 2 , such that (u, v) є E implies either u є V 1 and v є V 2 or v є V 1 and u є V 2 can be referred as a bipartite graph. In simple words, a network is called bipartite if its nodes can be divided into two groups in such a manner that nodes in one group are connected to nodes in the other group with no or sparse connexion existing within the same group. The directed bipartite network can be represented as shown in Fig. 1D. Three bipartite networks were constructed: healthy specific miRNA and its target, disease-specific miRNA and its target, and same miRNA in healthy and diseased conditions but different targets across healthy and diseased conditions. In addition, we examined the GO of selected nodes to understand the association between JH and JV.
GO enrichment inferred network. GO analysis deals with three components, namely BPs, MFs, and CCs.
BLAST2GO 51 was used to link selected transcripts to map with the GO database in terms of BPs, MFs, and CCs. Scoring function can be defined as follows: where • desc(g) represents all the descendant terms for a given GO term g • dist(g, ga) represents the number of edges between the GO term g and the GO term ga • g represents the element of GO, where GO is the whole set of all GO terms • gp(g) represents the number of gene outcomes given to a given GO term g The transcripts that belonged to the same category were clustered. A node score function was defined for all transcripts targeted by miRNAs in JH and JV tissues. Transcripts that had the same score were clustered in the same rectangle. Interconnection from one cluster to another cluster was performed on the basis of their respective association based on the node score.
Degree and Correlation Analysis. The degree of a node in an undirected graph is the number of connexions or edges a node has with other nodes, and it is defined as deg(i) = k(i) = |N(i)| where N(i) is the number of the neighbours of node i. The degree distribution p(k) reveals the fraction of vertices with degree k. To find the correlation between constructed bipartite networks, Pearson's correlation analysis was performed. PCC measures the linear correlation (r) between two variables. where ji and ki are the degrees of targets at both the ends of the ith connexion, and M represents total connexions in the network. A Perl script was used to calculate Pearson's correlation value for each pair of the identical transcript in healthy and diseased conditions on the basis of FPKM values by using R package DESeq 26 . Only those transcripts that were targeted by miRNAs in both the conditions were considered for analysis and further used for co-expression network construction.