Protocol


Nature Protocols 2, 685 - 694 (2007)
Published online: 29 March 2007 | doi:10.1038/nprot.2007.96

Subject Categories: Genomics and proteomics | Isolation, purification and separation | Plant biology | Computational and theoretical biology

MetaNetwork: a computational protocol for the genetic study of metabolic networks

Jingyuan Fu1, Morris A Swertz1, Joost JB Keurentjes2,3 & Ritsert C Jansen1

We here describe the MetaNetwork protocol to reconstruct metabolic networks using metabolite abundance data from segregating populations. MetaNetwork maps metabolite quantitative trait loci (mQTLs) underlying variation in metabolite abundance in individuals of a segregating population using a two-part model to account for the often observed spike in the distribution of metabolite abundance data. MetaNetwork predicts and visualizes potential associations between metabolites using correlations of mQTL profiles, rather than of abundance profiles. Simulation and permutation procedures are used to assess statistical significance. Analysis of about 20 metabolite mass peaks from a mass spectrometer takes a few minutes on a desktop computer. Analysis of 2,000 mass peaks will take up to 4 days. In addition, MetaNetwork is able to integrate high-throughput data from subsequent metabolomics, transcriptomics and proteomics experiments in conjunction with traditional phenotypic data. This way MetaNetwork will contribute to a better integration of such data into systems biology.


Introduction

The genetic diversity of primary and secondary metabolites is incredibly high, notably in plants1; however, our understanding of such metabolism and its regulation is still limited2. In a recent paper3, we have made the first attempt to unravel the genetic architecture of metabolism in a model plant using "genetical metabolomics." This is a derivative of the strategy of genetical genomics4 that has been applied in recent years to the genetic study of gene expression data in a wide range of organisms5, 6, 7, 8, 9, 10, 11, 12, 13, 14. For transcriptome data, this strategy works as follows: determine gene expression (preferably genome-wide) in genetically different individuals, treat the transcript abundances of each gene over all individuals as a quantitative trait, use molecular markers to fingerprint the individuals, use quantitative trait locus (QTL) mapping to identify regulators (expression quantitative trait loci (eQTL)) and (re)-construct regulatory networks. For such network reconstruction, correlations of either transcript abundances11, 15, 16 or eQTL profiles11, 17 are applied. Keurentjes et al.3 developed and applied a similar strategy to metabolite abundance data.

Specifics of MetaNetwork

Similar to the approach used in gene expression studies, the genetic determinants of variation for metabolite abundance (mQTL) can be mapped. However, algorithms used for the analysis of transcript abundance have to be adapted to the specifics of metabolite abundance. In the work of Keurentjes et al.3, one-third of the segregating mass peaks were not present in the parental lines, presumably because of new allelic combinations. Likewise, many segregating mass peaks were absent in an appreciable proportion of the segregants, causing clear spikes at zero in the corresponding metabolite abundance distributions. Standard parametric approaches for QTL mapping (e.g., t-test12, ANOVA6, 7, 10, maximum likelihood13) assume that the residual variation follows a normal distribution, and departure from this assumption due to a spike can inflate type I and type II errors18. Standard non-parametric approaches for QTL mapping (Wilcoxon–Mann–Whitney test5, 14) can solve this problem, but they are less useful for multiple-QTL models18. A more suitable approach is to perform QTL analysis on the binary trait defined by whether an individual has a non-zero abundance, and on the quantitative trait for those individuals who have non-zero abundance. To combine these two analyses, MetaNetwork implements a two-part parametric model18 for mQTL mapping and outputs QTL profiles (-log10 P significance values plotted at marker positions along the genome).

Network reconstruction approaches based on the correlation of transcript abundance15, 16 may also be suitable for metabolite abundance. However, whereas transcripts are translated into molecules of another type (proteins), metabolites are transformed by enzymes into molecules of the same type (other metabolites). Therefore, if one metabolite is the precursor of another metabolite, an mQTL involved in the transformation will exert reversed effects for the precursor and its successor. Counterbalancing of positive and negative effects of multiple mQTLs may make it difficult to infer associations between metabolites from abundance correlations. Metabolites in the same pathway will show similar peaks in their QTL profiles, so that a correlation analysis based on QTL profiles may overcome this problem. MetaNetwork subsequently uses such correlations to determine associations between metabolites and to re-construct metabolic networks.

Challenges in MetaNetwork

Within the context of the genetical genomics experimental space, MetaNetwork encounters numerous challenges due to the size and the scope of the data set and the complexity of metabolic networks. Testing multiplicity is a general challenge in QTL mapping19. The genome-wide mapping of each of many (correlated) mass peaks can result in a large number of false positives and/or false negatives. MetaNetwork uses Storey's method20 to control the false discovery rate (FDR). Candidate gene multiplicity is another challenge: an mQTL may still harbor hundreds of candidate genes21. Incorrect connections between metabolites affected by different enzymes may be predicted if the genes for those enzymes appear to colocalize on the genome. Predicting or prioritizing candidates among the many potential genes in an mQTL region requires additional strategies such as fine mapping and/or follow-up laboratory experiments. Appropriate information can also be derived from presumably independent (in silico) information in databases with metabolic pathway information, such as KEGG22, MetaCyc23 or AraCyc24, or from data on eQTL studies, enzyme activity assays or phenotypic data on the same segregants. Mass peak multiplicity, that is, metabolites represented by multiple mass peaks, is another challenge25. For example, a metabolite with mass m can carry one or more charges, so that peaks can appear at masses m, m/2, m/3 and so on. Alternatively, isotopes of this metabolite have different numbers of neutrons, so that peaks appear at m + 1, m + 2, m + 3 and so on. Unfortunately, error-free assignment of different mass peaks to a single metabolite is still difficult with today's mass spectrometry methods26. However, MetaNetwork can provide important independent information to improve on this: it can predict possibly related peaks based on highly correlated mQTL profiles (r > 0.95).
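The idea of combining peak masses with mQTL profile correlations can be sketched in base R. This is only an illustration under stated assumptions, not the routine shipped with MetaNetwork: the function name "related_peaks", the mass tolerance "tol" and the limits on charge and isotope shift are hypothetical, and the mass relations follow the simplified description above (peaks at m, m/2, m/3 or at m + 1, m + 2, m + 3). Only the cutoff r > 0.95 is taken from the text.

```r
# Hypothetical helper for illustration; not part of the MetaNetwork package.
# mass:     numeric vector of peak masses
# profiles: numeric matrix of QTL profiles, one row per peak, one column per marker
related_peaks <- function(mass, profiles, r_min = 0.95, tol = 0.01,
                          max_charge = 3, max_isotope = 3) {
  r <- cor(t(profiles))  # peak-by-peak correlation of QTL profiles
  pairs <- which(upper.tri(r) & r > r_min, arr.ind = TRUE)
  out <- data.frame(peak1 = integer(0), peak2 = integer(0),
                    r = numeric(0), relation = character(0),
                    stringsAsFactors = FALSE)
  for (k in seq_len(nrow(pairs))) {
    i <- as.integer(pairs[k, 1])
    j <- as.integer(pairs[k, 2])
    hi <- max(mass[i], mass[j])
    lo <- min(mass[i], mass[j])
    relation <- NA_character_
    # Multiply charged ion: the lighter peak sits near hi / z
    for (z in 2:max_charge) {
      if (abs(lo - hi / z) < tol) relation <- paste("charge", z)
    }
    # Isotope: the heavier peak sits near lo + n
    for (n in 1:max_isotope) {
      if (abs(hi - lo - n) < tol) relation <- paste0("isotope +", n)
    }
    if (!is.na(relation)) {
      out <- rbind(out, data.frame(peak1 = i, peak2 = j, r = r[i, j],
                                   relation = relation,
                                   stringsAsFactors = FALSE))
    }
  }
  out
}

# Toy example with made-up masses and QTL profiles
p <- c(0, 1, 5, 2, 0, 0, 3)
profiles <- rbind(p,
                  2 * p,
                  p + c(0.1, 0, -0.1, 0, 0.1, 0, 0),
                  c(3, 0, 0, 1, 4, 0, 0))
mass <- c(400, 200, 401, 350)
res <- related_peaks(mass, profiles)
print(res)
```

In this toy data, peaks 1 and 2 are flagged as a possible doubly charged pair and peaks 1 and 3 as a possible isotope pair. Peaks 2 and 3 have highly correlated profiles but no matching mass relation, so they are not reported.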

Applications of MetaNetwork

To date, our MetaNetwork applications have been based on untargeted metabolite abundance data collected from recombinant inbred lines (RILs) of Arabidopsis thaliana plants using liquid chromatography–mass spectrometry technology3. It measures a large range of different metabolites mainly involved in secondary metabolism, including phenylpropanoids, flavonoids and glucosinolates27. Many of these metabolites show a spike in their abundance distribution and MetaNetwork was specifically developed to handle such data. However, the MetaNetwork protocol can equally well handle abundance data without spikes. Moreover, it can handle data obtained from other mass spectrometry techniques, such as gas chromatography–mass spectrometry28 that can detect polar primary metabolites.

In addition to mass spectrometry technologies for targeted or untargeted measuring amounts of metabolites3, 29, other high-throughput technologies for measuring amounts of other molecular entities, such as microRNAs, proteins and their post-translational modifications, are rapidly being developed30. The methodology described here is directly applicable to these and other quantitative types of data and helps biologists to understand how biological systems function.

Implementation of MetaNetwork

MetaNetwork is implemented in R, an open source software environment for statistical computing and graphics31. MetaNetwork is executed via a command line. However, users with little experience of command-line-driven applications and/or computer programming can easily run MetaNetwork using default parameter settings. An advanced user of R can change parameter settings or modify the underlying protocol, for example, by replacing the module for calculation of correlations by one for calculation of mutual information32, or the module for QTL analysis on RILs by one for QTL analysis on other types of segregating or natural populations. Future MetaNetwork releases will offer more options, for example, multiple QTL analysis33, 34 in the two-part model, combined analysis of metabolite abundance data with other types of biomolecular data11 and direct access of the R-tools to a metabolite abundance database. A seamless software infrastructure that supports MetaNetwork data management and analysis workflows is under development using code generation techniques35. For more implementation details, please consult the Supplementary Manual online.

Algorithm of MetaNetwork

The flowchart of the MetaNetwork protocol is shown in Figure 1. Given the scope of this manuscript, we will limit ourselves to the definition of the two main steps in the procedure: QTL mapping of metabolite abundances, and reconstruction of metabolic networks from correlations of QTL profiles. It should be noted that MetaNetwork does not offer data pre-processing; for example, alignment of mass peaks has to be performed by external applications such as METALIGN27.

Figure 1: MetaNetwork flowchart.

The shaded squares represent computational steps, where names of R functions are indicated in parentheses and the superscript numbers refer to steps in Box 1. The ellipses represent significance thresholds and the cylinders represent biological results, where the result names as R objects are indicated in braces. The solid line represents the step that is by default "on" in MetaNetwork and the dashed line represents the step that is by default "off" in MetaNetwork.


MetaNetwork detects the genetic determinants underlying variation in metabolite abundance with the help of a two-part QTL analysis. Part one tests whether the presence/absence of metabolites has a genetic basis: whether different genotype classes at a given marker differ in their numbers of non-zero observations. Part two tests whether quantitative variation in non-zero abundances has a genetic basis: whether the non-zero observations for each of these genotype classes at a given marker differ in mean abundance. The "P-value" of the QTL is computed as the product of the two "P-values" in the two parts. With binary data only (no quantitative data) or quantitative data only (no spike), the "P-value" of the missing part is set to one. These "P-values" are not yet corrected for multiple testing at many markers and also not for testing multiple metabolites. MetaNetwork can run simulation and FDR procedures20 to set an empirical threshold for the "P-values" at desired multiple-testing significance levels. MetaNetwork will output all relevant information such as the estimated effect of each mQTL, its support interval on the genome and the proportion of variance explained by it (see Box 1).
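The two-part test at a single marker can be sketched as follows. This is a minimal illustration and not the MetaNetwork implementation: the function name "two_part_p" is hypothetical, and the choice of Fisher's exact test for the binary part and Welch's t-test for the quantitative part is an assumption made here for simplicity. What is taken from the text is the structure: the two "P-values" are multiplied, and a part that cannot be tested contributes a "P-value" of one.

```r
# Hypothetical helper for illustration; not the MetaNetwork implementation.
# trait:    numeric abundances, one value per individual
# genotype: genotype class of each individual at one marker (e.g. "A"/"B")
# spike:    values at or below this threshold count as "absent"
two_part_p <- function(trait, genotype, spike = 0) {
  genotype <- factor(genotype)
  present <- trait > spike

  # Part one: is presence/absence associated with genotype?
  # (Assumption: Fisher's exact test on the genotype-by-presence table.)
  p_binary <- 1
  if (any(present) && any(!present)) {
    p_binary <- fisher.test(table(genotype, present))$p.value
  }

  # Part two: among individuals with the peak present, do the genotype
  # classes differ in mean abundance? (Assumption: Welch's t-test.)
  p_quant <- 1
  y <- trait[present]
  g <- genotype[present]
  n_present <- table(g)
  if (length(n_present) == 2 && all(n_present >= 2)) {
    p_quant <- t.test(y ~ g)$p.value
  }

  p <- p_binary * p_quant  # combined value as described in the text
  c(p_binary = p_binary, p_quant = p_quant,
    p_combined = p, score = -log10(p))
}

# Toy example: the peak is mostly absent in class A and abundant in class B
genotype <- rep(c("A", "B"), each = 10)
trait <- c(rep(0, 7), 5, 6, 7, 0, 10:18)
res <- two_part_p(trait, genotype)
print(res)
```

Repeating this over all markers would give one profile of -log10 scores per mass peak. Such uncorrected values still need the simulation and FDR procedures described above before they can be interpreted.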

MetaNetwork explores the associations between metabolites by comparing their QTL profiles based on correlations. A permutation procedure sets an empirical threshold for the correlation at a desired significance level. MetaNetwork generates files with network connections that can be visualized using Cytoscape, an open source software suite for visualization of biomolecular interactions36 (see Box 1).
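A rough sketch of profile-based association with a permutation threshold is given below. The function names are hypothetical and the permutation scheme is an assumption: here the marker order is shuffled independently within each profile and the largest absolute correlation is recorded per permutation, which may differ from what MetaNetwork does internally. Thresholding the absolute value of the correlation is likewise an assumption.

```r
# Hypothetical helpers for illustration; not MetaNetwork's own functions.
# profiles: numeric matrix of QTL scores, one named row per metabolite,
#           one column per marker

# Empirical threshold: the (1 - alpha) quantile of the maximum absolute
# correlation observed when marker order is shuffled within each profile.
profile_corr_threshold <- function(profiles, n_perm = 200, alpha = 0.05) {
  max_abs <- numeric(n_perm)
  for (b in seq_len(n_perm)) {
    shuffled <- t(apply(profiles, 1, sample))  # shuffle each row separately
    r <- cor(t(shuffled))
    max_abs[b] <- max(abs(r[upper.tri(r)]))
  }
  unname(quantile(max_abs, 1 - alpha))
}

# Edge list: metabolite pairs whose profile correlation exceeds the threshold
profile_edges <- function(profiles, threshold) {
  r <- cor(t(profiles))
  idx <- which(upper.tri(r) & abs(r) > threshold, arr.ind = TRUE)
  data.frame(from = rownames(profiles)[idx[, 1]],
             to = rownames(profiles)[idx[, 2]],
             r = r[idx],
             stringsAsFactors = FALSE)
}

# Toy profiles over 30 markers: m1 and m2 share a peak, m3 and m4 do not
x <- 1:30
profiles <- rbind(m1 = exp(-(x - 10)^2 / 8),
                  m2 = 0.8 * exp(-(x - 10)^2 / 8) + 0.05 * cos(x),
                  m3 = exp(-(x - 25)^2 / 8),
                  m4 = rep(c(0, 1), 15))
set.seed(1)
thr <- profile_corr_threshold(profiles, n_perm = 200)
edges <- profile_edges(profiles, thr)
print(thr)
print(edges)
```

Each row of the resulting edge list corresponds to one connection in the network. Writing such rows to a file for Cytoscape is left out of the sketch.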



Materials

Equipment

  • Computer operating systems: Windows XP, GNU Linux or Mac OS X
  • R (http://www.r-project.org): software environment for statistical computing and graphics. The R application (current version 2.4.1) and installation manual can be found at http://www.r-project.org. In this paper, we assume an application under Windows XP
  • Required R-packages: "qvalue" for FDR control. R packages can be installed via Packages | install package(s). Users can choose a mirror site close to their location and then select the package "qvalue" for installation. Please go to http://www.r-project.org for help if necessary
  • MetaNetwork package, user manual and example data files can be downloaded from http://gbic.biol.rug.nl/supplementary/2007/MetaNetwork and saved locally. Install the MetaNetwork package via Packages | install package(s) from local zip files: browse to the zip file of the MetaNetwork package
  • Cytoscape: open source software for visualizing biomolecular interaction networks. Cytoscape (current version 2.3.2) can be downloaded from http://www.cytoscape.org. Cytoscape requires Java version 1.4.2, which can be downloaded from http://java.sun.com/j2se/1.4.2/index.jsp


Procedure

  1. Preparing and starting. Prepare input files. Three kinds of information are required in QTL analysis: the genetic linkage map of molecular markers (markers, see Table 1); the genotypes of each individual at each marker position (genotypes, see Table 2); and the trait values (metabolite abundances) of each individual (traits, see Table 3). Optionally, the user can provide mass weight information for the mass peaks, to allow for a combined analysis of mass data and QTL profiles (peaks, see Table 4). The files should be formatted as comma separated values (CSV), for example, as "markers.csv," "genotypes.csv," "traits.csv" and "peaks.csv," respectively. Files can be formatted by using Microsoft's Excel via File | Save as, and choosing the file type "CSV (comma delimited) (*.csv)" from the pull-down menu of "Save as type."



  2. Load the MetaNetwork package by starting the R application and typing the command

    > library(MetaNetwork)

    This loads the functions of MetaNetwork and the required qvalue package.

  3. Change the working directory (optional). The default directory of R is most likely to be "C:/Program Files/R/R-2.4.1," where R is installed. Users can change it to the directory where the files from Step 1 are saved, for example, change to "C:/MetaAnalysis" using the command

    > setwd("C:/MetaAnalysis")

  4. Loading data (the order of Steps 4–7 does not matter). Load the marker data (see Table 1 for format) from a file into an R object using the function "loadData," for example, load file "markers.csv" into R object "markerData" using the command

    > markerData <- loadData("markers.csv")

    If the working directory was not set in Step 3, the full path of the file should be given. The same holds for Steps 5–7.

    > markerData <- loadData("C:/MetaAnalysis/markers.csv")

  5. Load the genotype data (see Table 2 for format) using the command

    > genotypeData <- loadData("genotypes.csv")

  6. Load the trait data (see Table 3 for format) using the command

    > traitData <- loadData("traits.csv")

  7. Optionally, load the peak data (see Table 4 for format). Load peak data to allow for a combined analysis of peak masses and QTL profiles using the command

    > peakData <- loadData("peaks.csv")

  8. Running the analysis. Run the "MetaNetwork" function on data from previous steps and with default settings using the command

    > MetaNetwork(markers=markerData, genotypes=genotypeData, traits=traitData, spike=4)

    The arguments "markers," "genotypes" and "traits" take values from the R objects "markerData," "genotypeData" and "traitData" loaded in Steps 4–6. Absence of a mass peak in a considerable number of individuals leads to signal intensities equal to or less than the detection limit and therefore causes a spike in the trait distribution at zero. The argument "spike" has to be specified to separate presence/absence (binary) from available trait abundance (quantitative) in the trait data, for example, here using a threshold of four times the local noise3. The order of arguments does not matter (see Table 5). The above command will run analysis steps A–E and G by default (see Box 1). These steps can be individually excluded from, or optional steps F and H can be included in, the analysis using the commands outlined in Box 1. During MetaNetwork analysis (see Box 1), a summary of the process (e.g., the progress of the procedure, generated R objects and output files and the computing time) will be displayed in the R Console (see Fig. 2) and saved in the file "output.txt" for future reference.

    Figure 2: The view of the R console for the MetaNetwork application.

    The procedures, R object names and file names for saving results and processing times are shown.




    Critical step: R objects exist only during the working period of the R Console. To serve later MetaNetwork analyses, R objects can be saved during closure of the R console.

  9. Visualization. QTL profiles visualization. The QTL likelihood along the genome (-log10 P calculated at each marker position) can be visualized in R with function "qtlPlot" using the command

    > qtlPlot(markers=markerData, qtlProfiles=qtlProfiles, qtlThres=qtlThres)

    where argument "markers" takes values from object "markerData" generated in Step 4; argument "qtlProfiles" is the QTL test statistic and takes the values in the object "qtlProfiles" generated in Step 8A (see Box 1) of MetaNetwork; argument "qtlThres" is the threshold for significant QTLs and takes the value from object "qtlThres" generated in Step 8B of MetaNetwork.

  10. Network visualization using Cytoscape. Launch Cytoscape and choose "File | Import | Network (multiple file types)" to load network file ("network.sif") and "File | Import | Edge Attributes" to load edge attributes file ("network.eda") generated in Step 8G (see Box 1). Different layout and visualization styles can be applied to view the network, for example, applying the threshold "corrThres" from Step 8F (see Box 1) as a filter to only show significant edges. For details, please see the Cytoscape manual (http://www.cytoscape.org).
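The critical step noted under Step 8 (R objects exist only during the working session) can also be handled explicitly with base R's "save" and "load" functions rather than relying on the prompt at closure. The sketch below uses a small stand-in matrix and a hypothetical file name; in a real session the saved objects would be MetaNetwork results such as "qtlProfiles" and "qtlThres".

```r
# Stand-in for a MetaNetwork result object (made-up values)
qtlProfilesDemo <- matrix(c(0.5, 4.2, 1.1, 0.3), nrow = 2,
                          dimnames = list(c("peak1", "peak2"),
                                          c("marker1", "marker2")))

# Hypothetical file name; any writable location will do
path <- file.path(tempdir(), "metanetwork_demo.RData")

# Save selected objects to disk
save(qtlProfilesDemo, file = path)

# Simulate a new session: drop the object, then restore it from the file
original <- qtlProfilesDemo
rm(qtlProfilesDemo)
load(path)  # re-creates qtlProfilesDemo under its original name

print(identical(original, qtlProfilesDemo))
```

To keep the entire workspace instead of selected objects, base R also offers "save.image()".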

Timing

Figure 2 shows the timing of the analysis of 24 metabolites from 162 RILs in Arabidopsis at 117 markers3, using a Windows XP PC with an AMD Athlon 64 CPU (2.20 GHz) and 1 GB of RAM. The computation time increases with the number of traits and markers: linearly for QTL mapping (Steps 8A and C), and quadratically for correlation (Steps 8D and E) and peak multiplicity finding (Step 8H). The computation time of QTL threshold simulation (Step 8B) and correlation threshold permutation (Step 8F) increases linearly with the number of simulations/permutations. The timing for optional Steps 8F and H is not shown: 10,000 permutations take 5,270 min (use of a computer cluster is suggested); peak multiplicity finding takes a few seconds. The total computation time for a default MetaNetwork analysis of 2,000 mass peaks is up to 4 days.


Troubleshooting

The most important sources of error and possible solutions are given in Table 6.



Anticipated results

MetaNetwork was used for the genetic study of approximately 2,000 mass peaks in 162 RILs of Arabidopsis generated from a cross between the distant accessions Landsberg erecta (Ler) and Cape Verde Islands (Cvi)3. These individuals have been genotyped at 117 markers that are nearly evenly distributed along the genome. The network correlations as predicted by the MetaNetwork protocol were verified against previous knowledge29, 37, 38, 39 for 18 aliphatic glucosinolates and six glycosylated flavonols, all products of secondary metabolism. We use this small data set as an example of the type of results that can be anticipated. All data are shipped with the package and can be loaded in R using

> data(markers)

> data(genotypes)

> data(traits)

Alternatively, users can load the data and test MetaNetwork with the single command

> example(MetaNetwork)

Mapping genetic determinants

The QTL likelihood along the genome as stored in "qtlProfiles" is visualized with the function "qtlPlot," loaded by > data(qtlProfiles) and visualized by > qtlPlot(markers,qtlProfiles,4.11). At the empirical -log10 P threshold of 4.11 (alpha=0.05, FDR=0.0003), the glucosinolate mQTLs map to two major loci, which were confirmed by a previous targeted study39: gene AOP at 9.0 cM of chromosome 4 is responsible for glucosinolate side-chain modification37, and gene MAM at 35 cM of chromosome 5 is responsible for chain elongation39. The observation that all glucosinolates have a QTL at MAM but only some of them have a QTL at AOP suggests that AOP acts downstream of MAM (Fig. 3a). The mQTL at MAM exerts the same sign of effect for all glucosinolates that are in the same branch of the network, whereas the mQTL at AOP exerts reversed effects on precursors and their successors. Six flavonols showed strong mQTLs at 88.6 cM of chromosome 1, where a previously unknown glycosyl transferase or regulator was suggested3 (Fig. 3b).

Figure 3: The visualization of metabolic QTL profiles and networks.

(a) The mQTL profiles for ten aliphatic glucosinolates before AOP catalysis (upper part) and eight after AOP catalysis (lower part). The mQTL at 303.3 cM on chromosome 4 is at the AOP locus. The mQTL at 409.4 cM on chromosome 5 is at the MAM locus. A positive (negative) sign indicates that individuals carrying the Cvi allele have higher (lower) abundance than individuals carrying the Ler allele. The different colors represent different carbon chain lengths (black 3C; red 4C; green 5C; blue 6C; light blue 7C). (b) The mQTL profiles for six glycosylated flavonols. The mQTL at 88.6 cM on chromosome 1 is a putative glycosyl transferase, catalyzing the production of flavonol dihexosides. The different colors represent different aglycone classifications (black: quercetin; red: kaempferol; green: isorhamnetin); different line types represent different glycosylation patterns (solid line: dihexoside; dashed line: hexoside). (c) The detected mQTLs explain a percentage of the total variation observed between the RILs: the percentage of variance explained for the binary presence/absence of metabolite is on the x axis; the percentage of variance for the non-zero quantitative metabolite abundance is on the y axis. The green dots represent MAM mQTLs for glucosinolates; the red dots represent AOP mQTLs for glucosinolates; the blue triangles represent mQTLs for flavonols. (d) Visualization of the metabolic network using Cytoscape. The nodes represent different metabolites and the edges represent significant correlations. Glucosinolates are presented in a different color based on their carbon chain length (gray, 3C; red, 4C; green, 5C; blue, 6C) and flavonols are presented in pink.


The mQTLs can underlie binary variation of presence/absence of the metabolite, quantitative variation of metabolite abundance or both types of variation in the segregants (Fig. 3c). Of the 52 detected mQTLs, 22 underlie only quantitative variation, seven predominantly underlie binary variation and the rest underlie both types of variation. For example, two flavonols showed mQTLs at 88.6 cM of chromosome 1 that underlie only quantitative variation, whereas the four other flavonols showed mQTLs at that position that underlie both binary and quantitative variation. Further interpretation of these mQTLs can be obtained from the QTL summary "qtlSumm," loaded by > data(qtlSumm).

A combined analysis of mass data and QTL profiles predicted that a single glucosinolate can have up to six mass peaks (1.2 on average, 6 glucosinolates had 3–6 mass peaks); a single flavonol can have up to four mass peaks.

Metabolic network (re)-construction

MetaNetwork computes the zero-order correlation "corrZeroOrder" and second-order partial correlation "corrSecondOrder" between pairs of metabolites, loaded by > data(corrZeroOrder) and > data(corrSecondCorr), respectively. Thirty-one second-order correlations were significant at a Bonferroni-corrected alpha=0.05 level ("corrThres"=0.14 from 20,000 simulations). These significant correlations are plotted using Cytoscape (Fig. 3d). Glucosinolates and flavonols are separated into two networks because they have different mQTLs.
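To illustrate the difference between a zero-order and a second-order partial correlation, the following base R sketch computes the correlation between two variables after regressing both on two conditioning variables. The function name "partial_cor" and the simulated data are hypothetical, and the sketch does not show how MetaNetwork selects conditioning metabolites or combines results across them.

```r
# Hypothetical helper for illustration; not MetaNetwork's own function.
# Partial correlation of x and y given the columns of Z, computed as the
# correlation of the residuals after regressing x and y on Z.
partial_cor <- function(x, y, Z) {
  Z <- cbind(1, as.matrix(Z))      # add an intercept column
  rx <- lm.fit(Z, x)$residuals
  ry <- lm.fit(Z, y)$residuals
  cor(rx, ry)
}

# Simulated example: x and y are both driven by z1 and z2,
# but have no direct relation to each other
set.seed(42)
n <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)
x <- z1 + z2 + 0.3 * rnorm(n)
y <- z1 + z2 + 0.3 * rnorm(n)

zero_order <- cor(x, y)                             # high: shared drivers
second_order <- partial_cor(x, y, cbind(z1, z2))    # near zero

print(c(zero_order = zero_order, second_order = second_order))
```

Conditioning on two variables is what makes the partial correlation "second-order". An association that disappears after conditioning, as in this example, points to an indirect rather than a direct link.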

The similarities between the reconstructed and known glucosinolate pathway validate the approach, and the dissimilarities may suggest (but do not prove) possible previously unknown steps in the formation of glucosinolates. In the constructed network for glucosinolates (left in Fig. 3d), edges for the known transformation between the methylthio group and the methylsulfinyl group were always observed. But novel edges between metabolites were also observed, for example, the edge linking 2-propenyl to 4-methylthiobutyl (but the biochemical linkage may be indirect, that is, due to coregulation by the same mQTL). The reverse additive effect of the AOP locus for 4-hydroxybutyl, 2-propenyl and 4-benzoyloxybutyl formation shows that regulation can be completely different for different growth stages3.

Except for one flavonol, all pairwise partial correlations among the other five flavonols remain significant (right in Fig. 3d). Colocation of mQTLs of these six flavonols suggests that the biochemical linkages are indirect, that is, variation in their abundance is attributable to a single locus affecting glycosylation of the basic flavonoid backbone3.

These results show how the combined genetic and metabolomic approach allows the (re)construction of metabolic pathways. It can provide an independent line of evidence to create new knowledge or to validate or modify current knowledge. Even an untargeted approach can therefore facilitate the annotation of metabolites and show that they play a role in existing or new pathways3. Although MetaNetwork can identify meaningful associations between metabolites, it can obviously not prove causality (i.e., that there are true biochemical linkages between highly correlated metabolites). Any output should therefore be treated as an independent source of information solely for the use of hypothesis formation and be used as guidelines for future experimental confirmation.

Although MetaNetwork is developed for and has been applied to metabolite data, its theoretical basis readily extends to other high-throughput quantitative measurements such as gene and protein expression. We expect that MetaNetwork will prove increasingly useful in elucidating systems genetics.

Note: Supplementary information is available via the HTML version of this article.




Acknowledgments

We thank Dr. Jan-Peter Nap for constructive comments on an earlier version of this paper, Bruno Tesson, Gonzalo Vera and Richard Scheltema for helping to develop the R-package, and Martijn Dijkstra and Rainer Breitling for helping to predict multiple peaks belonging to the same metabolite. This work was supported by grants from the Netherlands Organization for Scientific Research Program Genomics (050-10-029).

Competing interests statement: 

The authors declare no competing financial interests.


References

  1. Wink, M. Plant breeding: importance of plant secondary metabolites for protection against pathogens and herbivores. Theor. Appl. Genet. 75, 225–233 (1988).
  2. Baxter, J.D. & Webb, P. Metabolism: bile acids heat things up. Nature 439, 402–403 (2006).
  3. Keurentjes, J.J.B. et al. The genetics of plant metabolism. Nat. Genet. 38, 842–849 (2006).
  4. Jansen, R.C. & Nap, J.P. Genetical genomics: the added value from segregation. Trends Genet. 17, 388–391 (2001).
  5. Brem, R.B., Yvert, G., Clinton, R. & Kruglyak, L. Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755 (2002).
  6. Bystrykh, L. et al. Uncovering regulatory pathways that affect hematopoietic stem cell function using "genetical genomics". Nat. Genet. 37, 225–232 (2005).
  7. Chesler, E.J. et al. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet. 37, 233–242 (2005).
  8. Cheung, V.G. et al. Mapping determinants of human gene expression by regional and genome-wide association. Nature 437, 1365–1369 (2005).
  9. DeCook, R., Lall, S., Nettleton, D. & Howell, S.H. Genetic regulation of gene expression during shoot development in Arabidopsis. Genetics 172, 1155–1164 (2006).
  10. Hubner, N. et al. Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat. Genet. 37, 243–253 (2005).
  11. Keurentjes, J.J.B. et al. Regulatory network construction in Arabidopsis using genome-wide expression QTLs. Proc. Natl. Acad. Sci. USA 104, 1708–1713 (2007).
  12. Morley, M. et al. Genetic analysis of genome-wide variation in human gene expression. Nature 430, 743–747 (2004).
  13. Schadt, E.E. et al. Genetics of gene expression surveyed in maize, mouse and man. Nature 422, 297–302 (2003).
  14. Yvert, G. et al. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 35, 57–64 (2003).
  15. Bing, N. & Hoeschele, I. Genetical genomics analysis of a yeast segregant population for transcription network inference. Genetics 170, 533–542 (2005).
  16. Lan, H. et al. Combined expression trait correlations and expression quantitative trait locus mapping. PLoS Genet. 2, e6 (2006).
  17. Zhu, J. et al. An integrative genomics approach to the reconstruction of gene networks in segregating populations. Cytogenet. Genome Res. 105, 363–374 (2004).
  18. Broman, K.W. Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163, 1169–1175 (2003).
  19. Sabatti, C., Service, S. & Freimer, N. False discovery rate in linkage and association genome screens for complex disorders. Genetics 164, 829–833 (2003).
  20. Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
  21. Broman, K.W. Mapping expression in randomized rodent genomes. Nat. Genet. 37, 209–210 (2005).
  22. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
  23. Zhang, P. et al. MetaCyc and AraCyc. Metabolic pathway databases for plant research. Plant Physiol. 138, 27–37 (2005).
  24. Mueller, L.A., Zhang, P. & Rhee, S.Y. AraCyc: a biochemical pathway database for Arabidopsis. Plant Physiol. 132, 453–460 (2003).
  25. Dijkstra, M., Vonk, R.J. & Jansen, R.C. SELDI-TOF mass spectra: a view on sources of variation. J. Chromatogr. B 847, 12–23 (2007).
  26. Tikunov, Y. et al. A novel approach for nontargeted data analysis for metabolomics. Large-scale profiling of tomato fruit volatiles. Plant Physiol. 139, 1125–1137 (2005).
  27. de Vos, C.H. et al. Untargeted large-scale plant metabolomics using liquid chromatography coupled to mass spectrometry. Nat. Protoc. doi:10.1038/nprot.2007.95 (2007).
  28. Lisec, J., Schauer, N., Kopka, J., Willmitzer, L. & Fernie, A.R. Gas chromatography mass spectrometry-based metabolite profiling in plants. Nat. Protoc. 1, 387–396 (2006).
  29. Kliebenstein, D.J., Gershenzon, J. & Mitchell-Olds, T. Comparative quantitative trait loci mapping of aliphatic, indolic and benzylic glucosinolate production in Arabidopsis thaliana leaves and seeds. Genetics 159, 359–370 (2001).
  30. Hoheisel, J.D. Microarray technology: beyond transcript profiling and genotype analysis. Nat. Rev. Genet. 7, 200–210 (2006).
  31. Ihaka, R. & Gentleman, R. R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).
  32. Butte, A.J. & Kohane, I.S. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac. Symp. Biocomput. 5, 418–429 (2000).
  33. Jansen, R.C. Interval mapping of multiple quantitative trait loci. Genetics 135, 205–211 (1993).
  34. Jansen, R.C. Quantitative trait loci in inbred lines. in Handbook of Statistical Genetics (eds. Balding, D.J., Bishop, M. & Cannings, C.) 445–476 (John Wiley & Sons, New York, 2003).
  35. Swertz, M.A. & Jansen, R.C. Beyond standardization: dynamic software infrastructures for systems biology. Nat. Rev. Genet. 8, 235–243 (2007).
  36. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
  37. Kliebenstein, D.J., Lambrix, V.M., Reichelt, M., Gershenzon, J. & Mitchell-Olds, T. Gene duplication in the diversification of secondary metabolism: tandem 2-oxoglutarate-dependent dioxygenases control glucosinolate biosynthesis in Arabidopsis. Plant Cell 13, 681–693 (2001).
  38. Kliebenstein, D.J. et al. Genetic control of natural variation in Arabidopsis glucosinolate accumulation. Plant Physiol. 126, 811–825 (2001).
  39. Kroymann, J. et al. A gene controlling variation in Arabidopsis glucosinolate composition is part of the methionine chain elongation pathway. Plant Physiol. 127, 1077–1088 (2001).
  40. de la Fuente, A., Bing, N., Hoeschele, I. & Mendes, P. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20, 3565–3574 (2004).
  1. Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Kerklaan 30, NL-9751 NN Haren, The Netherlands.
  2. Laboratory of Genetics, Wageningen University, Arboretumlaan 4, NL-6703 BD Wageningen, The Netherlands.
  3. Laboratory of Plant Physiology, Wageningen University, Arboretumlaan 4, NL-6703 BD Wageningen, The Netherlands.

Correspondence to: Ritsert C Jansen1 e-mail: r.c.jansen@rug.nl
