Inference of Large-scale Time-delayed Gene Regulatory Network with Parallel MapReduce Cloud Platform

Yang, Bin; Bao, Wenzheng; Huang, De-Shuang; Chen, Yuehui

doi:10.1038/s41598-018-36180-y


Article
Open access
Published: 12 December 2018

Bin Yang¹,
Wenzheng Bao ORCID: orcid.org/0000-0002-1471-5432²,
De-Shuang Huang³ &
…
Yuehui Chen⁴

Scientific Reports volume 8, Article number: 17787 (2018)

Abstract

Inference of gene regulatory networks (GRNs) is crucial for understanding intracellular physiological activity and biological function. The identification of large-scale GRNs has been a difficult and active topic in systems biology in recent years. To reduce the computational load of large-scale GRN identification, a parallel algorithm based on restricted gene expression programming (RGEP), named MPRGEP, is proposed to infer instantaneous and time-delayed regulatory relationships between transcription factors and target genes. In MPRGEP, the structure and parameters of the time-delayed S-system (TDSS) model are encoded into one chromosome. An original hybrid optimization approach based on the genetic algorithm (GA) and gene expression programming (GEP) is proposed to optimize the TDSS model within the MapReduce framework. Time-delayed GRNs (TDGRNs) with hundreds of genes are used to test the performance of MPRGEP. The experimental results show that MPRGEP infers gene regulatory networks more accurately than other state-of-the-art methods and achieves a convincing speedup.

Introduction

The gene regulatory network (GRN) is a primary and important biochemical network, which contains the regulatory relationships among genes, proteins and small molecules^1,2. Inferring and analyzing gene regulatory networks helps to understand intracellular physiological activity and biological function, the interactions in a pathway, and how changes in the organism arise^3,4,5. Time delay is a very important characteristic of biological regulation mechanisms, especially of the regulation process^6,7. The proteins translated from transcription factor (TF) genes regulate the target genes. This regulation process requires a time lag, which involves the regulation of protein translation, folding, nuclear transport, turnover, and the elongation of the target mRNA^8,9. The time-delayed factor is therefore critical to the gene regulation process. Inferring time-delayed GRNs (TDGRNs) is one of the major research hotspots in systems biology¹⁰.

Gene regulatory network modeling methods therefore need to take the time-delayed factor into account, and time-delayed versions of several GRN modeling methods have been proposed. Li et al. proposed a unified approach based on a time-delayed correlation algorithm for the design of the time-delayed gene expression matrix and the inference of TDGRNs¹¹. Ngom et al. proposed a new extension of the Bayesian network, namely the Max-Min high-order dynamic Bayesian network, to model the time lags between TFs and target genes¹². Chueh and Lu presented a new method based on time-delay Boolean networks to infer biological pathways¹³. Kordmahalleh et al. proposed a hierarchical recurrent neural network (HRNN) to identify TDGRNs from time-series gene expression data¹⁴.

To understand in depth the specific mathematical relationships between TFs and target genes, differential equation models have been proposed to infer GRNs^15,16,17,18. Some studies added the time-delayed factor to the differential equation model for TDGRN inference. Chowdhury et al. presented the time-delayed S-system (TDSS) model to identify both instantaneous and time-delayed interactions of a TDGRN simultaneously¹⁹. However, in Chowdhury’s method the differential evolution (DE) algorithm was used to optimize all parameters of the TDSS model, so the computing load is very large for large-scale GRNs. To reduce the computing load, we proposed restricted gene expression programming (RGEP) and particle swarm optimization (PSO) to evolve the TDSS model²⁰. This method selects TFs automatically and greatly reduces the number of parameters to be optimized. However, the execution time is still unacceptable for GRN inference with hundreds of genes^21,22,23. Parallel technology is urgently needed to decrease the computing cost of the algorithm.

The MapReduce framework, a parallel programming model, has been used for parallel computation over the past few years^24,25,26. Recently, many methods based on the MapReduce model have been widely applied in various fields, especially in bioinformatics^27,28,29,30,31,32. Hu et al. presented a modified variable-length associative sequential pattern discovery (VLASPD) method based on the MapReduce model for large-scale protein-protein interaction (PPI) forecasting³³. Abduallah et al. proposed a new MapReduce algorithm based on an information-theoretic approach to infer GRNs in a cloud environment³⁴. You et al. presented a parallel support vector machine (SVM) model based on the MapReduce framework to predict large-scale PPIs from protein sequence information³⁵.

To decrease the computing cost of large-scale TDGRN identification, this paper proposes a novel MapReduce-based parallel restricted gene expression programming (MPRGEP) algorithm for TDSS model identification. To evolve the structure and parameters of the TDSS model simultaneously, both are encoded as one chromosome in the MPRGEP algorithm. The chromosome population is split over the nodes of a cloud computing system according to the partition number. At each node, the sub-population is optimized iteratively by a novel hybrid evolutionary method based on gene expression programming and the genetic algorithm. The optimized sub-populations are then merged and saved as the offspring.

Method

MapReduce overview

Storage, preprocessing and analysis of biological high-throughput sequencing data have gradually become the main bottleneck of systems biology research^36,37,38. Hadoop provides a new solution for big data processing. Hadoop is an open-source distributed computing system based on the Hadoop Distributed File System (HDFS) and the MapReduce framework, and is applied to the storage, management and analysis of massive data^39,40,41. HDFS is a distributed file system used to store massive data. The MapReduce model is a software framework for processing big data in parallel. It consists of Map and Reduce operation units, as described in Fig. 1. In the Map phase, the input data are divided into m data blocks, and the computing nodes evaluate the Map function in parallel. The <key, value> output pairs of the Map function are stored on each computing node. In the Reduce phase, all the intermediate results are combined according to their key values to generate the final output, which is stored in HDFS.
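The Map, shuffle and Reduce steps described above can be illustrated with a minimal single-process sketch. It is not Hadoop code: the helper names `run_mapreduce`, `map_gene` and `reduce_mean` and the toy records are assumptions made for illustration only.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Run a Map phase, a shuffle by key and a Reduce phase in one process."""
    groups = defaultdict(list)
    for record in records:                  # Map phase: records are independent
        for key, value in map_fn(record):   # each call emits <key, value> pairs
            groups[key].append(value)       # shuffle: collect values by key
    # Reduce phase: combine all values that share the same key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_gene(record):
    """Hypothetical Map function: emit the gene name as key, its level as value."""
    gene, expression = record
    yield gene, expression


def reduce_mean(key, values):
    """Hypothetical Reduce function: average all values of one key."""
    return sum(values) / len(values)


samples = [("G1", 1.0), ("G2", 4.0), ("G1", 3.0)]
result = run_mapreduce(samples, map_gene, reduce_mean)
```

In a real Hadoop job the Map calls run on different nodes and the grouping by key is done by the framework; the sketch only shows the data flow.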

MapReduce-based restricted gene expression programming algorithm

Time-Delayed S-system

Because the time-delayed S-system offers high accuracy and flexibility and contains time-delayed factors, it is suitable for modeling time-delayed systems. The i-th nonlinear time-delayed differential equation in the TDSS model is described as follows⁴².

$$\frac{d{X}_{i}}{dt}={\alpha }_{i}\prod _{j=1}^{N}{X}_{j,t-{\tau }_{{g}_{ij}}}^{{g}_{ij}}-{\beta }_{i}\prod _{j=1}^{N}{X}_{j,t-{\tau }_{{h}_{ij}}}^{{h}_{ij}},\,i=1,2,\ldots N.$$

(1)

where ${X}_{j,t-{\tau }_{{g}_{ij}}}$ is the expression level of gene X_j at time point $t-{\tau }_{{g}_{ij}}$, ${\tau }_{{g}_{ij}}$ and ${\tau }_{{h}_{ij}}$ are the time-delayed factors, N is the total number of genes in the TDGRN, α_i and β_i are the rate constants of the production function and the consumption function, and g_ij and h_ij are the kinetic orders.
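As an illustration of Eq. (1), the following sketch evaluates the right-hand side for one gene on a discretely sampled time series. The function name `tdss_rate`, the array layout (rows are time points, columns are genes), integer delays measured in sampling steps, and the toy numbers are all assumptions; they are not taken from the paper.

```python
import numpy as np


def tdss_rate(X, t, i, alpha, beta, g, h, tau_g, tau_h):
    """Right-hand side of Eq. (1) for gene i at integer time index t.

    X has shape (time points, genes); g, h, tau_g and tau_h have shape (N, N).
    Assumes t - delay never falls before the first recorded time point.
    """
    n_genes = X.shape[1]
    production, consumption = alpha[i], beta[i]
    for j in range(n_genes):
        production *= X[t - tau_g[i, j], j] ** g[i, j]    # delayed production term
        consumption *= X[t - tau_h[i, j], j] ** h[i, j]   # delayed consumption term
    return production - consumption


# Toy two-gene example (hypothetical values)
X = np.array([[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]])
alpha = np.array([2.0, 1.0])
beta = np.array([1.0, 1.0])
g = np.array([[0.0, 1.0], [0.0, 0.0]])   # gene 2 activates gene 1
h = np.array([[1.0, 0.0], [0.0, 0.0]])   # gene 1 degrades in proportion to itself
tau_g = np.array([[0, 1], [0, 0]])       # activation acts with a delay of one step
tau_h = np.zeros((2, 2), dtype=int)
rate = tdss_rate(X, 2, 0, alpha, beta, g, h, tau_g, tau_h)
```

A kinetic order of zero makes the corresponding factor equal to one, which is how the model expresses the absence of a regulatory relationship.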

Chromosome of restricted gene expression programming

To better represent and evolve the TDSS model, a restricted version of GEP (RGEP) was presented⁴³. In RGEP, each chromosome contains only two genes. An example of an RGEP chromosome is shown in Fig. 2. The subtraction operator (−) connects the two genes. Each gene contains a head part and a tail part, which are created randomly using the function set (F) and the variable set (T).

$${I}_{1}=F\cup T=\{{}^{\ast }1,\,{}^{\ast }2,\,{}^{\ast }3,\,\ldots ,\,{}^{\ast }n\}\cup \{{x}_{1},\,{x}_{2},\,\ldots ,\,{x}_{m},\,R\}.$$

(2)

Where *n represents the multiplication of n operands, ${x}_{i}(i=1,\,2,\,\ldots m)$ represents the input variable and R denotes constant.

In each gene, the symbols of the head part are selected randomly from set I₁, whereas the tail part is created randomly from the variable set T only. The head length (h) is specified in advance for the problem to be solved, and the tail length (t) is calculated from $h$.

$$t=(n-1)\times h+1$$

(3)

Where n represents the largest number of the operands of functions in set F.
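A small worked example of Eq. (3) follows. The function names are assumptions, the head length of 4 is a hypothetical value, and n = 5 corresponds to a function set whose largest operator is *5.

```python
def tail_length(head_length, max_arity):
    """Tail length t from Eq. (3): t = (n - 1) * h + 1."""
    return (max_arity - 1) * head_length + 1


def gene_length(head_length, max_arity):
    """Total number of symbols in one gene (head plus tail)."""
    return head_length + tail_length(head_length, max_arity)


t = tail_length(4, 5)        # hypothetical head length h = 4, largest arity n = 5
length = gene_length(4, 5)
```

An expression tree with h function nodes of arity at most n has at most (n − 1)h + 1 leaves, so a tail of this length always supplies enough terminals to complete the tree.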

TDSS model has three kinds of parameters: rate constants (α_i and β_i), kinetic orders (g_ij and h_ij) and time-delayed factor (${\tau }_{{g}_{ij}}$ and ${\tau }_{{h}_{ij}}$), so we add these parameters into the chromosome in RGEP. As shown in Fig. 2, gene1 and gene2 are given α_i and β_i, respectively. In each gene, kinetic order (g_ij or h_ij) and time-delayed factor (${\tau }_{{g}_{ij}}$ or ${\tau }_{{h}_{ij}}$) need to be allocated to each terminal node.

Figure 3 shows the arithmetic expression trees (ETs) of the chromosome in Fig. 2. The decoded differential equation is as follows.

$$\frac{d{x}_{i}}{dt}={\alpha }_{i}{x}_{3,t-{\tau }_{{g}_{i1}}}^{{g}_{i1}}{x}_{1,t-{\tau }_{{g}_{i2}}}^{{g}_{i2}}{x}_{2,t-{\tau }_{{g}_{i3}}}^{{g}_{i3}}-{\beta }_{i}{x}_{2,t-{\tau }_{{h}_{i1}}}^{{h}_{i1}}{x}_{4,t-{\tau }_{{h}_{i2}}}^{{h}_{i2}}{x}_{1,t-{\tau }_{{h}_{i3}}}^{{h}_{i3}}{x}_{3,t-{\tau }_{{h}_{i4}}}^{{h}_{i4}}.$$

(4)
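To make the decoding step concrete, the sketch below reads one gene breadth-first, as is usual in gene expression programming, and returns the terminals that end up in its expression tree. Since every function is a multiplication, the product of these terminals is the decoded term. The gene shown is hypothetical (it is not the chromosome of Fig. 2), and the symbol spelling `"*k"` is an assumption.

```python
def decode_terminals(gene):
    """Return the terminals of the expression tree encoded by a gene.

    Symbols of the form "*k" are k-ary multiplications; every other symbol
    is treated as a terminal. Symbols are consumed level by level until
    every open argument slot of the tree has been filled.
    """
    def arity(symbol):
        return int(symbol[1:]) if symbol.startswith("*") else 0

    terminals = []
    open_slots = 1          # the root still has to be filled
    position = 0
    while open_slots > 0:
        symbol = gene[position]
        position += 1
        open_slots += arity(symbol) - 1   # fills one slot, opens `arity` new ones
        if arity(symbol) == 0:
            terminals.append(symbol)
    return terminals


# Hypothetical gene: head ["*2", "*2"], tail ["x3", "x1", "x2"]
used = decode_terminals(["*2", "*2", "x3", "x1", "x2"])
# Tail symbols that are not needed to complete the tree are ignored
short = decode_terminals(["*2", "x1", "x4", "x2", "x3"])
```

For the first gene the decoded product is x3·x1·x2, which has the same form as the production term of Eq. (4) before the kinetic orders and time delays are attached to each terminal.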

Hybrid evolutionary method

To search for the optimal TDSS model, an original hybrid optimization approach based on the genetic algorithm^44,45,46 and gene expression programming^47,48,49 is proposed in RGEP. Both the structure and the parameters of a TDSS model, which are shown in Fig. 3, need to be optimized. In our hybrid evolutionary method, the two genes of RGEP and the parameters are encoded into one chromosome, which is depicted in Fig. 4.

One chromosome contains three kinds of encoding. In Fig. 4, gene1 and gene2 use structure-based encoding, the rate constants (α_i and β_i) and kinetic orders (g_ij and h_ij) use real-based encoding, and the time-delayed factors (${\tau }_{{g}_{ij}}$ and ${\tau }_{{h}_{ij}}$) use binary-based encoding. A single evolution strategy cannot optimize all three encodings, so a hybrid evolutionary method is used to reproduce the chromosomes.

(1) Mutation. The mutation probability p_m is defined in advance. According to the encoding type, three mutation strategies are used, which are introduced as follows.

(1) Structure-based mutation

Single-point mutation. A symbol in the head part can be changed to any symbol selected randomly from set I₁. A symbol in the tail part can only be changed into a symbol from variable set T. Therefore, single-point mutation always creates legal offspring.
Single-gene mutation. One gene in a chromosome is selected at random and replaced by a new gene.
Change all the variables. All terminal symbols in the structure-coding region are replaced with other terminal symbols.

(2) Real-based mutation

For each real value $X$ in the real-coding region, a real value r is drawn randomly from the interval [0, 1]. If r < p_m, the real value X is mutated with the following equation.
$$X^{\prime} =X+\delta .$$
(5)
where δ is a Gaussian random value.

(3) Binary-based mutation

For each binary value in the binary-coding region, a real value r is drawn randomly from the interval [0, 1]. If r < p_m, the corresponding binary value is inverted.
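The three mutation strategies can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the list-based chromosome regions and the standard deviation `sigma` of the Gaussian noise are assumptions, since the text does not specify them.

```python
import random


def mutate_head_tail(gene, head_length, functions, terminals, rng=random):
    """Single-point structure mutation that keeps the gene legal."""
    mutated = list(gene)
    point = rng.randrange(len(gene))
    # Head positions may take any symbol; tail positions only terminals
    pool = functions + terminals if point < head_length else terminals
    mutated[point] = rng.choice(pool)
    return mutated


def mutate_real(values, p_m, sigma=0.1, rng=random):
    """Eq. (5): add Gaussian noise to each real value with probability p_m.

    sigma is an assumed standard deviation, not a value from the paper.
    """
    return [x + rng.gauss(0.0, sigma) if rng.random() < p_m else x
            for x in values]


def mutate_binary(bits, p_m, rng=random):
    """Invert each binary value with probability p_m."""
    return [1 - b if rng.random() < p_m else b for b in bits]


rng = random.Random(0)
flipped = mutate_binary([0, 1, 1], 1.0, rng)       # p_m = 1 inverts every bit
unchanged = mutate_real([0.5, 1.5], 0.0, rng=rng)  # p_m = 0 changes nothing
```

Restricting tail positions to terminals is what guarantees that a mutated gene still decodes to a complete expression tree.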

(2) Crossover. According to the encoding type, three crossover strategies are used. First, two parents (X and Y) are chosen with the crossover probability p_c, which is defined in advance.

(1) Structure-based crossover

Single-point recombination. A random point is selected in the structure-coding region, and the symbols after this point are exchanged between the two parents.
Single-gene recombination. Two randomly chosen genes, one from each parent, are swapped.

(2) Real-based crossover

The two parents (X and Y) are recombined with the following equations.
$$X^{\prime} =X+\gamma (X-Y).$$
(6)
$$Y^{\prime} =Y-\gamma (X-Y).$$
(7)
where $\gamma =0.99\,{\gamma }^{t}.$ γ is a variable related to the iteration number t. This strategy changes the individuals over a wide range in the early stage of optimization and protects the better individuals in the later stage.

(3) Binary-based crossover

Single-point crossover. A point in the binary-coding region is selected randomly. The binary symbols before the selected point are exchanged to create the new offspring.
Two-point crossover. Two points in the binary-coding region are selected randomly. The binary string between the two points is exchanged between the parents.
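The real-based and binary-based crossover operators can be sketched as follows. The function names and the list representation are assumptions, and γ is passed in as a plain argument because the sketch does not try to reproduce the schedule that ties γ to the iteration number.

```python
import random


def crossover_real(x, y, gamma):
    """Eqs. (6) and (7): X' = X + gamma (X - Y), Y' = Y - gamma (X - Y)."""
    x_new = [a + gamma * (a - b) for a, b in zip(x, y)]
    y_new = [b - gamma * (a - b) for a, b in zip(x, y)]
    return x_new, y_new


def crossover_single_point(x, y, rng=random):
    """Exchange the binary symbols before a randomly chosen point."""
    point = rng.randrange(1, len(x))
    return y[:point] + x[point:], x[:point] + y[point:]


def crossover_two_point(x, y, rng=random):
    """Exchange the binary string between two randomly chosen points."""
    i, j = sorted(rng.sample(range(len(x) + 1), 2))
    return x[:i] + y[i:j] + x[j:], y[:i] + x[i:j] + y[j:]


x_new, y_new = crossover_real([1.0, 2.0], [3.0, 0.0], 0.5)
```

One property that follows directly from Eqs. (6) and (7) is that X′ + Y′ = X + Y, so the operator moves the two parents in opposite directions by the same amount; a small γ therefore leaves both parents almost unchanged.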

(3) Selection method. The roulette sampling algorithm is used to select, according to the fitness values, the chromosomes to be copied into the next generation.
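A minimal sketch of roulette sampling is given below. It assumes that larger fitness values are better and that all fitness values are non-negative; the paper does not define the fitness function in this section, so both points, like the function name, are assumptions.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_select(population, fitness, count, rng=random):
    """Draw `count` individuals with probability fitness[i] / sum(fitness)."""
    cumulative = list(accumulate(fitness))   # running sums form the wheel
    total = cumulative[-1]
    picks = []
    for _ in range(count):
        r = rng.random() * total             # spin: a point in [0, total)
        index = bisect_right(cumulative, r)  # first slice that extends past r
        index = min(index, len(population) - 1)  # guard against rounding
        picks.append(population[index])
    return picks


chosen = roulette_select(["a", "b", "c"], [0.0, 1.0, 0.0], 5, random.Random(1))
```

Individuals with zero fitness occupy a slice of zero width and are never drawn, while the others are drawn in proportion to their share of the total fitness.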

Flowchart of time-delayed gene regulatory network inference

The inference flowchart of a TDGRN with $n$ genes $({G}_{1},\,{G}_{2},\,\ldots ,\,{G}_{n})$ is depicted in Fig. 5. A decomposition strategy is used: from G₁ to G_n, the regulatory relationships of each gene are identified by optimizing the corresponding TDSS model.

(1) Initialize the population $({p}_{1},\,{p}_{2},\,\ldots ,\,{p}_{m})$, in which each chromosome contains structure and parameters. The chromosome structure is described in Fig. 5.
(2) Calculate the fitness values of all the chromosomes. If the optimal model is found, stop; otherwise go to (3).
(3) Use the hybrid evolutionary method, introduced in the "Hybrid evolutionary method" section, to create the offspring. Select the crossover and mutation strategies according to the encoding type. Go to (2).

The regulatory relationships of each gene are obtained from its optimized TDSS model. Finally, the regulatory relationships of all genes constitute the gene regulatory network.

MapReduce-based hybrid evolutionary method

To infer large-scale gene regulatory networks and reduce the high computational load, we implement our hybrid evolutionary method on the Hadoop MapReduce framework. This framework distributes the evolutionary tasks to Map and Reduce modules. Figure 6 shows the hybrid evolutionary framework with the Hadoop MapReduce model.

(1) Input data. The input data are stored on the HDFS and contain two types of data. The first type is the chromosome information, including the structure and parameters. The second type is the fitness value of the corresponding chromosome.
(2) Map phase. Each computation node operates independently in the Map phase, without waiting for other nodes. The task of a computing node is to calculate the fitness value f_i of the i-th chromosome. The fitness values of all chromosomes are accumulated to obtain sum_f for the selection operation. According to the input file, the framework divides the chromosome population among the computation nodes (Mappers) to achieve parallel computing. To realize the crossover operation between chromosomes, we randomly divide the population into different partitions. Chromosomes with the same partition id can be recombined by the crossover operator. The number of partitions k is defined in advance. The partition id of a chromosome, $partition\_id$, is generated randomly and is set as the key output of the Map phase. The chromosome, its fitness f_i and the total fitness value sum_f are set as the value output of the Map phase.
(3) Reduce phase. The input data of the Reduce phase come from the Map phase. The Reduce phase can be executed after the corresponding Map nodes have finished. In the Reduce phase, the chromosomes with the same $partition\_id$ are collected into a group, which forms a sub-population. The optimization tasks of one sub-population are assigned to the same computational node (Reducer). With f_i and sum_f, the roulette sampling algorithm is used to create the offspring. The crossover and mutation operators are then applied to the individuals of the sub-population. The resulting sub-offspring and their fitness values are written to the output file of the Reduce phase in order to update the input data on the HDFS. If the number of iterations reaches the termination condition, the algorithm terminates; otherwise, it returns to the Map phase.
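The data flow of one iteration can be sketched in a single process as follows. This is not Hadoop code and not the authors' implementation: all function names are assumptions, `evolve_fn` is a placeholder for the selection, crossover and mutation applied inside a Reducer, and sum_f is computed in one place here, whereas a real job has to aggregate it across Mappers.

```python
import random
from collections import defaultdict


def map_phase(population, fitness_fn, k, rng=random):
    """Score every chromosome and emit <partition_id, (chromosome, f_i, sum_f)>."""
    scored = [(chrom, fitness_fn(chrom)) for chrom in population]
    sum_f = sum(f for _, f in scored)
    return [(rng.randrange(k), (chrom, f, sum_f)) for chrom, f in scored]


def shuffle(pairs):
    """Group the Map outputs by partition id, as the framework does before Reduce."""
    groups = defaultdict(list)
    for partition_id, value in pairs:
        groups[partition_id].append(value)
    return groups


def reduce_phase(sub_population, evolve_fn):
    """Evolve one sub-population; evolve_fn stands in for the hybrid operators."""
    return evolve_fn(sub_population)


def one_generation(population, fitness_fn, evolve_fn, k, rng=random):
    """One Map -> shuffle -> Reduce round over the whole population."""
    groups = shuffle(map_phase(population, fitness_fn, k, rng))
    offspring = []
    for partition_id in sorted(groups):      # each group would go to one Reducer
        offspring.extend(reduce_phase(groups[partition_id], evolve_fn))
    return offspring


# Toy run: integers stand in for chromosomes and the "evolution" copies them
population = list(range(10))
keep_all = lambda sub: [chrom for chrom, f_i, sum_f in sub]
next_population = one_generation(population, float, keep_all, 3, random.Random(0))
```

Because the partition ids are drawn again in every Map phase, chromosomes are regrouped at each iteration, so individuals from different sub-populations can still be recombined over time.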

Experiments

Our proposed parallel algorithm MPRGEP is implemented on the MapReduce framework. The Hadoop version is 2.6.2, and the Hadoop cluster consists of one master and 30 slaves. Each node has a 3.5 GHz Intel Xeon E5-1620 CPU and 4 GB of DDR2 memory and runs Linux CentOS 6.4 (64-bit). The nodes are connected by a local area network with a transmission speed of 1,000 Mbps. Three criteria, sensitivity (S_n), specificity (S_p) and speedup, are used to evaluate the performance of MPRGEP.

$${S}_{n}=\frac{TP}{TP+FN}.$$

(8)

$${S}_{p}=\frac{TN}{FP+TN}.$$

(9)

$$Speedup=\frac{runtime(Single\,node)}{runtime(cluster)}.$$

(10)

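where TP, FN, FP and TN denote the numbers of true positives, false negatives, false positives and true negatives, respectively, as presented in Fig. 7.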
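The three criteria in Eqs. (8) to (10) can be computed as in the sketch below. Representing a network as a set of (regulator, target) index pairs and counting all ordered pairs, including self-regulation, as candidate edges are assumptions of this sketch; the toy networks and runtimes are made-up values.

```python
def confusion_counts(true_edges, predicted_edges, n_genes):
    """Count TP, FN, FP and TN over all ordered (regulator, target) pairs."""
    all_pairs = {(i, j) for i in range(n_genes) for j in range(n_genes)}
    tp = len(true_edges & predicted_edges)
    fn = len(true_edges - predicted_edges)
    fp = len(predicted_edges - true_edges)
    tn = len(all_pairs - true_edges - predicted_edges)
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """Eq. (8): S_n = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Eq. (9): S_p = TN / (FP + TN)."""
    return tn / (fp + tn)


def speedup(runtime_single, runtime_cluster):
    """Eq. (10): single-node runtime divided by cluster runtime."""
    return runtime_single / runtime_cluster


# Toy three-gene example (hypothetical networks and runtimes)
true_net = {(0, 1), (1, 2)}
predicted_net = {(0, 1), (2, 0)}
tp, fn, fp, tn = confusion_counts(true_net, predicted_net, 3)
s_n = sensitivity(tp, fn)
s_p = specificity(tn, fp)
ratio = speedup(100.0, 25.0)
```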

Artificial dataset

In this part, the parameters of the experiments, which were selected empirically, are shown in Table 1. The function set used is {*2, *3, *4, *5}. The first artificial dataset is from a 30-gene time-delayed GRN, which is shown in Fig. 8^19,20. Kimura’s method (S-system model based on decomposition strategy and a cooperative coevolutionary algorithm)²¹, DBN (dynamic Bayesian network learned by the likelihood maximization)²² and TDSS (time-delayed S-system model based on PSO)²³ are also applied to the identification of the 30-gene artificial TDGRN. The averaged performance results of the four inference algorithms are presented in Table 2. Table 2 shows that MPRGEP has a higher sensitivity (S_n) than the other three methods, which reveals that our method can infer more true-positive regulatory relationships. MPRGEP also identifies fewer false-positive regulatory relationships.

Table 1 Parameters in this experiment.


Table 2 Experiment results for 30-gene artificial TDGRN.


The open-source software GeneNetWeaver 3.1 is used to generate three sub-networks of the yeast (S. cerevisiae) gene regulatory network with 50 genes, 100 genes and 150 genes, respectively. Time-delayed regulatory relationships are created randomly, and the time-delayed values are selected from [0, 3]. The three time-delayed gene regulatory networks are described in Table 3. Initial conditions are generated randomly. For each network, 10 time-series datasets are generated, and each dataset contains 21 time points from 0 to 20.

Table 3 Description of three time-delayed gene regulatory networks.


Our method is executed on a single machine and on a computing cluster with 20 computing nodes, respectively. The performances averaged over several runs are listed in Table 4. The inference results show that MPRGEP not only can handle large-scale time-delayed gene regulatory networks but also performs well in terms of S_n and S_p. Table 4 also reveals that the MapReduce framework reduces the running time of GRN inference, which makes it possible to identify large-scale GRNs with more genes.

Table 4 Performance of three TDGRNs by running MPRGEP.


To validate the parallel computing performance, the MPRGEP algorithm is used to identify the three time-delayed GRNs above on three computing clusters with 10, 20 and 30 nodes, respectively. The runtime and speedup are depicted in Figs 9 and 10. Figure 9 shows clearly that the running time rises as the number of genes rises and decreases as the number of computing nodes increases. Figure 10 shows that the speedup of our proposed parallel algorithm grows significantly as the number of computing nodes increases. The best speedup of MPRGEP is obtained when it is run on 30 computing nodes to infer the GRN with 150 genes. The speedup curve is not linear because of serial bottlenecks and infrastructure barriers in the MapReduce framework.

In MPRGEP, the computational tasks of the hybrid evolutionary algorithm are mainly concentrated in the Reduce phase. The sub-population with a given partition id is assigned to one Reducer for optimization. If the number of Reducers is fixed in advance, the number of partitions can affect the speed of the parallel computation. We run the experiments with three partition numbers: 1, 200 and 1000. The number of nodes in the computing cluster is set to 20. The running time is depicted in Fig. 11. The results show that the hybrid evolutionary algorithm performs best when the partition number is set to 200. When the partition number is 1, the single sub-population contains the whole population and is optimized in one Reducer, so the parallel strategy does not take effect. When the partition number is set to 1000, the number of sub-populations is too large. In this case more Reducers are needed, and the allocation and merging of resources take more time.

Real biology dataset

In this part, the dataset is taken from the Gene Expression Omnibus (GEO) at http://www.ncbi.nlm.nih.gov/geo/ (GEO accession: GSE30052)^34,50. This dataset contains 5,744 probe sets, 10,928 genes and 49 time points. To validate the parallel performance of MPRGEP, one subset containing 500 genes is extracted from this dataset. The experiment is executed on the computing cluster with 20 nodes. The parameters are again taken from Table 1. The running results are shown in Fig. 12, from which it can be seen that our method is clearly accelerated.

Discussion and Conclusion

With the rapid development of biotechnology, the gene regulatory networks to be inferred contain more and more genes, so advanced computational algorithms are needed to infer gene regulatory networks from gene expression data. This paper uses the time-delayed S-system model to represent instantaneous and time-delayed regulatory interactions in a time-delayed gene regulatory network. A novel MapReduce-based parallel restricted gene expression programming (MPRGEP) algorithm is used for TDSS model identification. The experimental results reveal that our parallel algorithm is promising in terms of accuracy and speedup when used to infer large-scale TDGRNs.

References

1. Kaern, M., Blake, W. J. & Collins, J. J. The engineering of gene regulatory networks. Annu Rev Biomed Eng. 5, 179–206 (2003).
2. Park, J., Ogunnaike, B., Schwaber, J. & Vadigepalli, R. Identifying functional gene regulatory network phenotypes underlying single cell transcriptional variability. Prog Biophys Mol Bio. 117, 87–98 (2015).
3. Schlitt, T. & Brazma, A. Current approaches to gene regulatory network modeling. BMC Bioinformatics. 8, S9 (2007).
4. Madhamshettiwar, P. B., Maetschke, S. R., Davis, M. J., Reverter, A. & Ragan, M. A. Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Med. 4, 41 (2012).
5. Yang, B. et al. HSCVFNT: Inference of Time-Delayed Gene Regulatory Network Based on Complex-Valued Flexible Neural Tree Model. Int. J. Mol. Sci. 19, 3178 (2018).
6. Parmar, K., Blyuss, K. B., Kyrychko, Y. N. & Hogan, S. J. Time-Delayed Models of Gene Regulatory Networks. Comput Math Methods Med. 2015, 1–16 (2015).
7. Wang, G., Yin, L., Zhao, Y. & Mao, K. Efficiently mining time-delayed gene expression patterns. IEEE Trans Syst Man Cybern B Cybern. 40, 400–11 (2010).
8. Orosz, G., Moehlis, J. & Murray, R. M. Controlling biological networks by time-delayed signals. Philos Trans A Math Phys Eng Sci. 368, 439–54 (2010).
9. Chaturvedi, I. & Rajapakse, J. C. Detecting robust time-delayed regulation in Mycobacterium tuberculosis. BMC Genomics. 10, S28 (2009).
10. Huang, T. et al. Using GeneReg to construct time delay gene regulatory networks. BMC Res Notes. 3, 142 (2010).
11. Li, X. et al. Discovery of Time-Delayed Gene Regulatory Networks based on temporal gene expression profiling. BMC Bioinformatics. 7, 26 (2006).
12. Li, Y., Chen, H., Zheng, J. & Ngom, A. The Max-Min High-Order Dynamic Bayesian Network for Learning Gene Regulatory Networks with Time-Delayed Regulations. IEEE/ACM Trans Comput Biol Bioinform. 13, 792–803 (2016).
13. Chueh, T. H. & Lu, H. H. S. Inference of Biological Pathway from Gene Expression Profiles by Time Delay Boolean Networks. PLoS One. 7, e42095 (2012).
14. Kordmahalleh, M. M., Sefidmazgi, M. G., Harrison, S. H. & Homaifar, A. Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network. BioData Min. 10, 29 (2017).
15. Cao, J., Qi, X. & Zhao, H. Modeling gene regulation networks using ordinary differential equations. Methods Mol Biol. 802, 185–97 (2012).
16. Gebert, J., Radde, N. & Weber, G. W. Modeling gene regulatory networks with piecewise linear differential equations. European Journal of Operational Research. 181, 1148–1165 (2007).
17. Sakamoto, E. & Iba, H. Identifying gene regulatory network as differential equation by genetic programming. Genome Informatics. 11, 281–283 (2000).
18. Wu, H., Lu, T., Xue, H. & Liang, H. Sparse Additive Ordinary Differential Equations for Dynamic Gene Regulatory Network Modeling. J Am Stat Assoc. 109, 700–716 (2014).
19. Chowdhury, A. R., Chetty, M. & Vinh, N. X. Incorporating time-delays in S-System model for reverse engineering genetic networks. BMC Bioinformatics. 14, 196 (2013).
20. Yang, B., Zhang, W., Wang, H. F., Song, C. D. & Chen, Y. H. TDSDMI: Inference of time-delayed gene regulatory network using S-system model with delayed mutual information. Computers in Biology and Medicine. 72, 218–225 (2016).
21. Kimura, S., Ide, K. & Kashihara, A. Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics. 21, 1154–1163 (2005).
22. Perrin, B. E. et al. Gene networks inference using dynamic Bayesian networks. Bioinformatics. 19, 138–148 (2003).
23. Yang, B., Zhang, W., Yan, X. F. & Liu, C. X. Reverse engineering of time-delayed gene regulatory network using restricted gene expression programming. Advances in Intelligent Systems and Computing. 420, 155–165 (2016).
24. Babu, S. Towards automatic optimization of MapReduce programs. ACM Symposium on Cloud Computing. 137–142 (2010).
25. Dean, J. & Ghemawat, S. MapReduce: A Flexible Data Processing Tool. Communications of the ACM. 53, 72–77 (2010).
26. Liu, Y. et al. MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning. Comput Intell Neurosci. 2015, 297672 (2015).
27. Vasciaveo, A. et al. A cloud-based approach for Gene Regulatory Networks dynamics simulations. 4th Mediterranean Conference on Embedded Computing. 72–76 (2015).
28. Langmead, B., Hansen, K. D. & Leek, J. T. Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biology. 11, R83 (2010).
29. Li, Z. et al. Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework. PLoS One. 10, e0116781 (2015).
30. Liao, R., Zhang, Y., Guan, J. & Zhou, S. CloudNMF: a MapReduce implementation of nonnegative matrix factorization for large-scale biological datasets. Genomics Proteomics Bioinformatics. 12, 48–51 (2014).
31. Kumar, M., Rath, N. K. & Rath, S. K. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier. J Biomed Inform. 60, 395–409 (2016).
32. Mohammed, E. A., Far, B. H. & Naugler, C. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends. BioData Min. 7, 22 (2014).
33. Hu, L., Yuan, X., Hu, P. & Chan, K. C. C. Efficiently predicting large-scale protein-protein interactions using MapReduce. Comput Biol Chem. 69, 202–206 (2017).
34. Abduallah, Y. et al. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach. Biomed Res Int. 2017, 1–8 (2017).
35. You, Z. H., Yu, J. Z., Zhu, L., Li, S. & Wen, Z. K. A MapReduce based parallel SVM for large-scale predicting protein-protein interactions. Neurocomputing. 145, 37–43 (2014).
36. Wade, J. T. Mapping Transcription Regulatory Networks with ChIP-seq and RNA-seq. Adv Exp Med Biol. 883, 119–34 (2015).
37. Finotello, F. & Di Camillo, B. Measuring differential gene expression with RNA-seq: challenges and strategies for data analysis. Brief Funct Genomics. 14, 130–42 (2015).
38. Liu, Y., Zhou, J. & White, K. P. RNA-seq differential expression studies: more sequence or more replication? Bioinformatics. 30, 301–4 (2014).
39. White, T. Hadoop: the definitive guide 15–362 (O’Reilly Media, Inc., 2009).
40. Shvachko, K., Kuang, H., Radia, S. & Chansler, R. The Hadoop Distributed File System. IEEE 26th Symposium on Mass Storage Systems and Technologies. 1–10 (2010).
41. Taylor, R. C. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinformatics. 11, S1 (2010).
42. Chowdhury, A. R., Chetty, M. & Vinh, N. X. Reverse Engineering Genetic Networks with Time-Delayed S-System Model and Pearson Correlation Coefficient. Lecture Notes in Computer Science. 8227, 624–631 (2013).
43. Yang, B., Liu, S. & Zhang, W. Reverse engineering of gene regulatory network using restricted gene expression programming. J Bioinform Comput Biol. 14, 1650021 (2016).
44. Herrera, F., Lozano, M. & Verdegay, J. L. Tackling Real-Coded Genetic Algorithms: Operators and Tools for Behavioural Analysis. Artificial Intelligence Review. 12, 265–319 (1998).
45. Goldberg, D. E. Genetic Algorithm in Search Optimization and Machine Learning 30–254 (Addison-Wesley Longman Publishing Co., Inc, 1989).
46. Gai, K., Qiu, M. & Zhao, H. Cost-Aware Multimedia Data Allocation for Heterogeneous Memory Using Genetic Algorithm in Cloud Computing. IEEE Transactions on Cloud Computing. 99, 1–1 (2016).
47. Ferreira, C. Gene Expression Programming: a New Adaptive Algorithm for Solving Problems. Computer Science. 21, 87–129 (2001).
48. Zhang, Y. et al. Using gene expression programming to infer gene regulatory networks from time-series data. Comput Biol Chem. 47, 198–206 (2013).
49. Tang, L., Yang, C. & Li, W. Adopting gene expression programming to generate extension strategies for incompatible problem. Neural Computing & Applications. 28, 1–16 (2016).
50. Chin, S. L., Marcus, I. M., Klevecz, R. R. & Li, C. M. Dynamics of oscillatory phenotypes in Saccharomyces cerevisiae reveal a network of genome-wide transcriptional oscillators. FEBS Journal. 279, 1119–1130 (2012).

Acknowledgements

This work was supported by the Natural Science Foundation of China (No. 61702445), Shandong Provincial Natural Science Foundation, China (No. ZR2015PF007), the PhD research startup foundation of Zaozhuang University (No. 2014BS13), and Zaozhuang University Foundation (No. 2015YY02).

Author information

Authors and Affiliations

School of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
Bin Yang
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
Wenzheng Bao
Institute of Machine Learning and Systems Biology, Tongji University, Shanghai, China
De-Shuang Huang
School of Information Science and Engineering, University of Jinan, Jinan, China
Yuehui Chen

Contributions

B.Y. conceived the method. Y.C. and D.H. designed the method. W.B. conducted the experiments and wrote the main manuscript text. All authors reviewed the manuscript.

Corresponding author

Correspondence to Wenzheng Bao.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yang, B., Bao, W., Huang, DS. et al. Inference of Large-scale Time-delayed Gene Regulatory Network with Parallel MapReduce Cloud Platform. Sci Rep 8, 17787 (2018). https://doi.org/10.1038/s41598-018-36180-y

Received: 09 August 2018
Accepted: 16 November 2018
Published: 12 December 2018
DOI: https://doi.org/10.1038/s41598-018-36180-y
