For understanding complex biological systems, a systems biology approach, involving both the top-down and bottom-up analyses, is often required. Numerous system components and their connections are best characterised as networks, which are primarily represented as graphs, with several nodes connected at multiple edges. Inefficient network visualisation is a common problem related to transcriptomic and genomic datasets. In this article, we demonstrate an miRNA analysis framework with the help of Jatropha curcas healthy and disease transcriptome datasets, functioning as a pipeline derived from the graph theory universe, and discuss how the network theory, along with gene ontology (GO) analysis, can be used to infer biological properties and other important features of a network. Network profiling, combined with GO, correlation, and co-expression analyses, can aid in efficiently understanding the biological significance of pathways, networks, as well as a studied system. The proposed framework may help experimental and computational biologists to analyse their own data and infer meaningful biological information.
Complex networks theory plays a vital role in various disciplines, ranging from computer science, sociology, engineering, and physics to molecular and population biology. In the fields of biology and medicine, a network analysis may be applied for identifying a drug target, determining protein or gene function, designing effective strategies for treating various diseases, or diagnosing disorders early. Many different interaction networks, including gene regulatory interactions, transcriptional regulatory networks, and metabolic networks, emerge from the sum of transcriptomic interactions1,2,3. Network biology aims at inspecting molecular components to deduce meaningful information from large transcriptomic datasets. Generally, all metabolic networks depend on each other and form a ‘network of networks’, which is responsible for the behaviour of the biological system4,5. A major challenge in modern biology is to embark on an integrated theoretical and experimental programme to retrieve, comprehend, and model the parametric terms for the topological and dynamic properties of various networks that control the behaviour of the biological system, ultimately resulting in phenotypic changes.
miRNA are small regulatory noncoding molecules (approximately 22 nucleotides) that control the expression of genes at the transcript level. Gene expression alteration has a substantial effect on their respective mRNA targets and subsequently on the associated metabolic pathways6. miRNAs can exert an effect on most of the normal biological processes (BPs), such as immunity, metabolism, and development. Subsequently, alterations in miRNA targets cause disturbances in the molecular interaction and expression of genes. Recent studies have reported that miRNAs play crucial roles in host–gut microbiota interactions7, host–parasite interactions8, and transgenerational epigenetic inheritance9. Although various studies have identified correlations of several miRNAs with differential conditions on the basis of homology searches10, the molecular mechanism connecting miRNA and associated targets still remains unclear.
The identification of miRNA targets forms the main step towards better understanding miRNA functions11,12. Computational predictions have been proved to be very useful; however, they produce high false-positive rates13. During the past decade, high-throughput RNA sequencing has been considered a useful tool, which provided new insights into gene identification in many plant species14,15. For instance, RNA-seq and microarray experiments have been increasingly employed to determine the silencing effects of miRNAs at the gene expression level. The RNA-seq–based transcriptome scrutiny fastens the process of understanding plant systems in terms of biotic or abiotic stress, including virus infection, at the molecular level16. The mechanism of miRNA interaction between a virus and host plant was reported for various plant species including Jatropha17. The next-generation sequencing approaches substantially increase the quantity and quality of data on miRNA targets and their functional annotations.
Many online databases have been developed to handle emergent datasets, such as miRTarBase18, TarBase19, miRecords20, and starBase21. In the intervening time, an array of miRNA studies have generated a high quantity of data that associate miRNAs for epigenetic modifications, diseases, drug effects, and additional molecules with databases such asEpimiR22, Pharmaco-miR23, miR2Disease24, SM2miR25, and PhenomiR26. Mutually, these databases provide rich information to comprehend targets and potential functions for many specified miRNAs of interest. However, the target analysis and miRNA target network co-expression studies need to be deeply explored to understand associated partners and respective outcomes of gene regulatory interactions.
Even after determining targets for a specific miRNA, the identification of its functions is also a crucial task because each miRNA has the potential to target multiple different genes and consequently affect numerous BPs. To tackle this issue, scientists usually look for functions or pathways on which these miRNAs converge. A universal approach is to use enrichment analysis techniques for examining whether a given biological function unit is more frequently observed compared with the anticipation by random chance. Nonetheless, these enrichment techniques have been employed under the supposition that genes are selected consistently at random from a finite population referred as the gene universe, which may not be true in the case of target genes selected on the basis of query miRNAs.
Compared with protein-coding genes, most miRNA knockouts have very modest and subtle phenotypic effects27. One logical explanation is that multiple miRNAs may regulate their target genes cooperatively through a combinatorial or synergistic association. Therefore, examining the combined functions of target genes from a list of miRNAs showing synchronised changes appears to be more biologically significant. To get acquainted with such complex ‘many-to-many’ relationships between miRNAs and target genes, the best method is to use network visualisation methods. This approach associated with reliable enrichment analysis support can provide beneficial information that can help in gaining central insights into the miRNA regulatory machinery. The uncomplicated nature of this approach can identify major ‘dramatis personae’ from the network perspective by identifying those genes that are targeted by multiple miRNAs or that together regulate multiple genes of interest. However, such analysis and visualisation supports for NGS datasets are not available in current miRNA tools.
In this study, we have developed a novel miRNA analysis framework by using the transcriptomic datasets of Jatropha curcas L. as a sample input. This miRNA framework includes a six-step process where transcriptome data areannotated, followed by the miRNA identification and prediction of mRNA targets. Selected miRNA–mRNA interactions were considered for the construction of gene ontology (GO) inferred network, as a result of which differential expression analysis was performed using Pearson’s correlation coefficient (PCC). The analysed nodes with a significant fragments per kilobase million (FPKM) value were used in further co-expression analysis.
To validate the proposed framework for various datasets, we contemplated two differential conditions of Jatropha. Various viral, fungal, and bacterial infections reduce the global average yield of J. curcas by 16% yearly28, the majority of which results from the occurrence of J. curcas mosaic virus, which causes leaf curling and reduction in fruit size17. Therefore, we considered the virus-infected (JV) tissue as one of the differential conditions. To identify molecular and cellular processes leading to phenotypic variations, we also considered the healthy (JH) tissue. Archit et al. reported the transcriptomic and molecular mechanistic understanding of JH and JV conditions26. The present study displays a meticulous comparative network analysis of differential miRNA expressions and disease-specific unique miRNAs in the virus-infected tissue to understand pathway complexity.
Results and Discussion
The miRNA analysis framework would broaden the horizon of the traditional miRNA target identification process and help in understanding the mechanism of action by using various network profiles. miRNA regulatory networks have numerous advantages over random networks because miRNAs are positioned upstream of gene signal transduction; therefore, alterations in miRNA expression are more sensitive and occur before changes in proteins29. In the present study, miRNA network analysis was performed using JH and JV transcriptome datasets from virus–host interactions to obtain regulatory nodes. The persistence of a large proportion of shared target proteins between JH and JV indicated that miRNA regulatory sub networks and viral infections are significantly interwoven in host cell networks. The overlapping of targeted genes involved in crucial cellular processes suggests that miRNAs play key roles in the process of viral infection.
miRNAs exert a substantial effect on targeted mRNA and gene expression. Variations in gene expression exert an effect on molecular pathways, which in turn affect cellular processes30. A total of 11 and 13 miRNAs are identified in JH and JV, respectively, of which 8 are common in both, namely miR-156, miR-157, miR-159, miR-319, miR-4995, miR-5021, miR-5658, and miR-f11908 (Table 1). To gain a deep insight into gene expressions, cellular processes, and associated pathways, controlling elements that are unique to particular conditions should be identified. Hence, we identified unique miRNAs in JH as miR-172, miR-414, and miR-529. In addition, the miR-2910, miR-2914, miR-477, miR-f11953, and miR-f12158 are identified uniquely in JV (Table 2). JH-specific miRNAs can be used as biomarkers to evaluate the resistance mechanism in healthy tissues, and JV-specific miRNAs can point out targets that are compromised during a virus attack. miR-f11908, miR-f11953, and miR-f12158 are novel miRNAs identified by the proposed miRNA analysis framework in J. curcas, and these miRNAs are also not experimentally validated in other plant species.
In accordance with the aforementioned results, mRNA targets respective to the miRNAs are also identified from the transcriptome. A total of 39 and 61 targets are predicted followed by KAAS annotation in JH and JV, respectively.
To quantify the target transcripts of respective miRNAs in JH and JV (Fig. 1A,B), selected transcripts were presented as nodes, and their interactions were represented through edges. Only 50 and 74 interacting nodes from JH and JV, respectively, were used for further construction of the bipartite network according to miRNA–mRNA target distribution. Here, the bipartite network has been constructed to represent the association between two groups without having any connexion within the same group. On the construction of these networks, some nodes showed a dominant effect compared with other nodes (Fig. 2A,B).
Virus has an asymptotic effect on the phenotypic appearance of plants through various cellular processes controlled by various genes involved in molecular pathways31. Analysis of the miRNA–mRNA target network shows the major contribution of miR-5021 and miR-5658 to the regulation of the expression of various transcripts in JH and JV. Some nodes, such as choline monooxygenase, histone H3, and ferrochelatase, showed an association with more than one miRNA. To understand the BPs, molecular functions (MFs), and cellular components (CCs) of selected transcripts in JH and JV, GO analysis was performed.
GO explains a set of clearly defined, ordered vocabularies with the aim of describing the BPs, MFs, and CCs of selected transcripts. Transcripts were clustered into different subgroups and assigned a node score value, which was further used to calculate the association within clusters. Apart from BLAST2GO (which used INTERPRO and PANTHER for analysis), we separately tested our miRNA target-associated GO terms by using GORILLA and WebGestalt GSAT. The results obtained were consistent with those of BLAST2GO, and no such biased results were found.
The transcripts involved in BPs in JH, such as the phosphatidylinositol metabolic process, transcription from RNA polymerase II promoter, isoprenoid biosynthesis process, oxidation reduction process, and positive regulation of the transcription elongation factor RNA polymerase II promoter, were filtered on the basis of a high GO node score (Fig. 3A). In JV, we observed more number of transcripts involved in the terpenoid biosynthesis process, aerobic respiration, tricarboxylic acid cycle, oxidative reduction process, citrate metabolic processes, organic substance biosynthetic process, and macromolecule biosynthetic processes (Fig. 3B). To cross-check and verify the role of the aforementioned BPs, we performed literature mining. According to the literature, processes involved in JH were more generalised compared with those involved in JV32,33,34,35,36,37,38. Additionally, we found that the processes mentioned in JV were involved in stress-mediated conditions to produce more energy so that plants can sustain in unfavourable conditions39,40,41.
To identify the elementary activities of genes at the molecular level, a node score was assigned to selected transcripts. Catalytic activity, transferase activity, transferase acyl activity, organic cyclic compound binding, heterocyclic binding, and nucleic acid binding were observed in JH (Fig. 4A). Response to oxidative stress, primary metabolic process, organic substance metabolic process, protein ubiquitination, transcription biosynthetic process, carbon fixation, oxidation–reduction process, and tricarboxylic acid cycle were observed in JV (Fig. 4B). A literature search of the stated MFs indicated that JH-associated functions were specific to normal conditions. However, functions relevant to JV appeared to be associated with stress and host–pathogen interactions17,31,42. A study of CCs in both the conditions revealed no such differential role of transcripts; however, intrinsic and integral components of the membrane were found to be uniquely present in JV (Fig. 5A,B), which indicated alteration in the cell membrane due to viral infection43.
PCC and Spearman’s correlation coefficient were used to evaluate the association between differentially expressed miRNA targets. However, only PCC showed a strong association within transcripts according to biological relevance. Unique miRNAs and their targets were excluded from this analysis, and only common miRNA targets were selected to observe the overall behaviour of transcripts in JH and JV conditions. As shown in Fig. 1C, miRNA target transcripts showed a significant higher expression in the JV condition than in the JH condition, and the same highly expressed miRNA target transcripts were considered for the co-expression network analysis.
The analysis of PCC results revealed10genes, namely CMO (miR-156 and miR-157), RPS5 (miR-4995), RPL9 (miR-5658), EIF5 (miR-5658 andmiR-5021), MVA1, PPC, PSAX, SUMO, CYP707A1, and DXS (miR-5021), which were further considered inco-expression network construction (Supplementary Figs 11–12, Table 3). To further analyse the role of co-expressed genes, they were mapped with KEGG to identify pathways in which all these genes were involved44. CMO-associated co-expressed genes were found to be involved in the biosynthesis of secondary metabolites and carbon metabolism; RPS5-related co-expressed genes in disease resistance response; RPL9-related co-expressed genes in ribosomal machinery, EIF5-related co-expressed genes in protein processing in the endoplasmic reticulum and ubiquitin-mediated proteolysis; MVA1-related co-expressed genes in the biosynthesis of secondary metabolites, fatty acid biosynthesis, carbon metabolism, and fatty acid metabolism; PPC-related co-expressed genes in the biosynthesis of secondary metabolites, carbon metabolism, glycolysis/gluconeogenesis, and galactose metabolism; PSAX-related co-expressed genes in photosynthesis and photosynthesis-antenna proteins; SUMO-related co-expressed genes in ribosome machinery and spliceosome; CYP707A1-related co-expressed genes in plant hormone signal transduction; and DXS-related co-expressed genes in the biosynthesis of secondary metabolites, photosynthesis-antenna proteins, prohrine and chlorophyll metabolism, proteosome, and the mRNA surveillance pathway. From the results of the co-expression network construction, it can be deduced that co-expressed genes are involved in various subcellular processes, which may be controlled or altered by miRNA regulation.
The current study represents a novel framework to unravel big data analytics in terms of transcriptomics, genomics, proteomics, metabolomics, and phenomics. Additionally, this framework can be employed to decipher the regulatory mechanism by controlling elements such as miRNA, transcription factor, and cis-regulatory elements. This framework can also be used for conducting differential and comparative analysis for multi-datasets and drug-target identification. To demonstrate one of the applications of this framework, the regulatory role of miRNA targets within healthy and diseased conditions in J.curcas is used as a sample to understand big data analytics in terms of RNA-seq transcriptomics. This was achieved by constructing a bipartite network, followed by correlation analysis, GO inferred network, and finally co-expression network, to show that comparative analysis can help in identifying regulatory genes in JH and JV. The proposed framework may help experimental and computational biologists to analyse their own data and infer meaningful biological information.
Through various plant miRNA identification and network construction techniques, a remarkable amount of information has been obtained, facilitating the construction of several biological networks. However, identifying intra-network nodes that cause variations in phenotypes remain a major challenge. With the application of transcriptome-wide strategies to elucidate biological networks in multiple sublevels, the effects on phenotypic variation can be understood. Here, we also propose future experimental validation of selected targets to confirm the regulatory roles of miRNA for predicted targets. Given the large-scale availability of transcriptome data, this framework can aid in comparative analysis to decipher the key driver nodes considerably affecting a phenotype.
Two to three healthy and symptomatic virus-infected younger apical leaves were collected from different mature plants of the J. curcas genotype IC561235 from the experimental farm of Himalayan Forest Research Institute at Jwalaji, Himachal Pradesh, India. The raw data were generated using NextSeq. The raw reads were filtered using Trimmomatic (v 0.30) with a quality value of >20, and other contaminants, such as adapters, were also trimmed. The reference genome of J. curcas was downloaded from the Jatropha Genome Database (http://www.kazusa.or.jp/jatropha/). The Illumina NextSeq transcriptome data for both samples were separately mapped to the Jatropha reference genome by using BWA, version 0.7.5a (http://bio-bwa. sourceforge.net/), with default settings. The software package SAMtools (http://samtools.sourceforge.net/) was used to convert the sequence alignment/mapfile to the sorted binary alignment/map (BAM) file. The mapped reads ratio for the reference in each dataset was calculated by applying the flagstat command of SAMtools software to the BAM file. Differential raw reads from JH and JV leaf transcriptomes were retrieved from the publicly accessible repository Next Generation Sequencing and Analysis Resources http://188.8.131.52/download.php31.
The annotation of high-quality reads was performed by comparing them against the non redundant database downloaded from the National Centre for Biotechnology Information, followed by the quantification of high-quality reads from JH and JV transcriptomes. miRNA identification was performed using in-house Perl scripts by using Zhang et al.’s algorithm45. The local database of mature miRNAs based on data obtained from the miRbase46 and plant microRNA database47 was constructed, and the in-house Perl script was used to identify precursor miRNAs from transcriptomes by using the parameters of sequence similarity of 100% and an e-value cutoff of 1e−5. After removing redundant entries, miRNAs were classified into their respective families. To predict the secondary structure, we adopted the approach of the mfold software for sequences containing not more than 4 mismatches48. The parameters considered for miRNA identification were as follows: 1) selection of an RNA sequence as a candidate miRNA precursor, 2) RNA sequences should fold into an appropriate stem-loop hairpin secondary structure, 3) a mature miRNA sequence site is in one arm of the hairpin structure, 4) miRNAs should have less than seven mismatches with the opposite miRNA sequence in the other arm, 5) no loop or break in miRNA sequences,and 6) predicted secondary structures should have high negative minimum fold energies (less than or equal to −20 kcal/mol)48.
miRNA Target Prediction
The plant small RNA (psRNA) target (http://plantgrn.noble.org/psRNATarget/) analysis server was used to predict mRNA targets corresponding to identified miRNAs based on customised parameters49. Prediction analysis was performed using the option of user-submitted small RNA transcripts. The parameters considered for this analysis were as follows: 1)maximum expectation value = 3, 2) length for complementary scoring (upsize) = 20, 3) number of top target genes for each small RNA = 200, 4) target accessibility-allowed maximum energy to unpair the target site = 25, 5) flanking length around the target site for target accessibility analysis was 17 base pairs in upstream and 13 base pairs in downstream, and 6) the range of central mismatch leading to translational inhibition was 9–11 nucleotides. miRNA targets were classified into their respective miRNA families.
To cross-check psRNA-based predicted results, we performed target prediction analysis by using the TargetFinder Perl script downloaded from https://github.com/carringtonlab/TargetFinder. This script also provided the same results as those predicted using the psRNA target. Moreover, we manually compared miRNA and mRNA results by using shell scripting to evaluate the presence of false positive hits. However, all psRNA-based predictions were consistent with the results of other two methods.
miRNA–mRNA Interaction Network Analysis
Advances in network biology indicate on the fact that cellular networks are ruled by universal laws and deal with a new conceptual framework that can potentially transform our view of biology and disease pathologies50. The framework for network analysis is shown in Fig. 6.
Interaction Matrix Construction
The adjacent matrix of the miRNA target network can be represented as follows:
where i represents miRNA and j represents the association between miRNA and its targets.
Bipartite Network Construction
An undirected graph where G = (V, E) in which V can be partitioned into two sets, V1 and V2, such that (u, v) є E implies either u є V1 and v є V2 or v є V1 and u є V2 can be referred as a bipartite graph. In simple words, a network is called bipartite if its nodes can be divided into two groups in such a manner that nodes in one group are connected to nodes in the other group with no or sparse connexion existing within the same group. The directed bipartite network can be represented as shown in Fig. 1D. Three bipartite networks were constructed: healthy specific miRNA and its target, disease-specific miRNA and its target, and same miRNA in healthy and diseased conditions but different targets across healthy and diseased conditions. In addition, we examined the GO of selected nodes to understand the association between JH and JV.
GO enrichment inferred network
GO analysis deals with three components, namely BPs, MFs, and CCs. BLAST2GO51 was used to link selected transcripts to map with the GO database in terms of BPs, MFs, and CCs. Scoring function can be defined as follows:
desc(g) represents all the descendant terms for a given GO term g
dist(g, ga) represents the number of edges between the GO term g and the GO term ga
g represents the element of GO, where GO is the whole set of all GO terms
gp(g) represents the number of gene outcomes given to a given GO term g
The transcripts that belonged to the same category were clustered. A node score function was defined for all transcripts targeted by miRNAs in JH and JV tissues. Transcripts that had the same score were clustered in the same rectangle. Interconnection from one cluster to another cluster was performed on the basis of their respective association based on the node score.
Degree and Correlation Analysis
The degree of a node in an undirected graph is the number of connexions or edges a node has with other nodes, and it is defined as deg(i) = k(i) = |N(i)| where N(i) is the number of the neighbours of node i. The degree distribution p(k) reveals the fraction of vertices with degree k. To find the correlation between constructed bipartite networks, Pearson’s correlation analysis was performed. PCC measures the linear correlation (r) between two variables.
where ji and ki are the degrees of targets at both the ends of the ith connexion, and M represents total connexions in the network.
A Perl script was used to calculate Pearson’s correlation value for each pair of the identical transcript in healthy and diseased conditions on the basis of FPKM values by using R package DESeq26. Only those transcripts that were targeted by miRNAs in both the conditions were considered for analysis and further used for co-expression network construction.
Co-expression Network Reconstruction
A co-expression network is an undirected graph, with every node representing a gene or transcript and every edge representing the connection between these nodes. In this study, we used an in-house Perl script to calculate gene co-expression; we calculated various scores, assigned weights to each score, and finally generated a combined score. Out of total score i.e. 1, weights were assigned to every parameter; transcript abundance (score 0.4), GO (score = 0.3), PCC (score = 0.3). Based on combined score, we considered top 20 genes associated with query gene and plotted network in Cytoscape. Systematic workflow for co-expression analysis is shown in Fig. 7.
Change History: A correction to this article has been published and is linked from the HTML version of this paper. The error has been fixed in the paper.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Authors thank Dr. Sarika Jalan (Indian Institute of Technology, Indore) for helping in bipartite network reconstruction; Dr. Raghu M. Yennamalli (Jaypee University of Information Technology, Solan) for helping out in co-expression network reconstruction; Prabhat Thakur for technical help and serving in revision of the manuscript; Dr. Archit Sood (CSIR-Institute of Himalayan Bioresource Technology, Palampur) for Jatropha associated pathway analysis; Pawan Verma (Maharishi Markandeshwar University, Ambala) and Ira Vashisht (Jaypee University of Information Technology, Solan) for miRNA identification and target correlation analysis. Authors also acknowledge financial support from the Department of Biotechnology (DBT), Govt. of India, to RSC in the form of a R&D project on Jatropha curcas.