INTRODUCTION

Since the first draft of the human genome was published in 2001 (Venter et al, 2001), genome-wide studies of human variability have gone from dream to reality, covering the entire spectrum of human disease, including neurological and psychiatric diseases. Following the initial draft, scientific debates arose regarding the number of genes encoded by the human genome and the percentage of noncoding or ‘junk’ DNA. At first, the coding elements were deemed to be a small percentage of the entire genome. The first general translation of the human genome, delivered in 2012 by the ‘Encyclopedia of DNA Elements’ (ENCODE) project (Encode Project Consortium, 2012), showed that 80% or more of the human genome is biologically active and that the noncoding DNA regions in fact have the potential to modulate human health and disease. These new data will provide a basis to increase the impact of genomics in human medicine and are highly synergistic with data from transcriptomic, proteomic, glycomic, and metabolomic initiatives. For instance, only one-third of the ENCODE-predicted protein-coding genes have been identified at the protein level. To meet the challenge of identifying all human proteins and assigning their functions, ENCODE has joined forces with the Chromosome-centric Human Proteome Project (C-HPP) (Paik and Hancock, 2012). Those combined efforts are expected to define the mechanisms by which polygenes, transcription factor networks, and single-nucleotide polymorphisms (SNPs) control the expression of groups of protein isoforms. The C-HPP will identify the expression of yet undetected proteins, including those encoded by newly identified coding regions. A third partner in the collaboration is the Genotype-Tissue Expression (GTEx) Project (http://www.genome.gov/gtex/). The GTEx project employs optimized protocols for obtaining and storing a large range of organs and tissues, and for analyzing genetic variation and expression. These tissues and organs are collected and stored through the National Cancer Institute’s cancer Human Biobank initiative on behalf of GTEx. Microarray results for frontal and temporal cortex, cerebellum, and pons are already available. The main goal of the collaboration is to integrate genomic and proteomic data into functional modules that will describe systems-wide interactions in biological processes, in health, and in disease, including neuropsychiatric disorders (Califano et al, 2012).

The responses of neurological cell populations within tissues (normal vs diseased, treated vs untreated, etc.), together with evidence that these cell populations are ‘drivers’ of disease, underscore the importance of understanding the detailed biology of cells and circuits at the molecular level. Utilizing an integrated biological approach to probe the response of cells and tissues will guide the rational design of therapeutic treatment. Because of the complexity of the approach, this research is most often accomplished through multidisciplinary and interinstitutional collaborations, as no single laboratory has expertise in all the technologies required. The C-HPP is an extreme example, because it brings together data sets acquired in hundreds of international laboratories.

Systems biology research is built on knowledge derived from global data sets measured in patterns of response of the transcriptome, proteome, glycome, lipidome, and metabolome. To obtain knowledge of the signaling pathways involved in the maintenance of the normal vs diseased state or treated vs untreated states, quantitative phosphoproteomic, glycomic, transcriptomic, lipidomic, and metabolomic data sets are collected on each sample. The data are compared with known, published results and pathways and then processed with innovative mathematical–computational algorithms tailored to define correlations between the data sets and graphical modeling to reconstruct pathways that are not yet defined in the scientific literature. The focus is to identify and characterize pathway reactions that will bridge the final gap and, for the first time, enable a mechanism to understand cellular responses, identify new targets, and design appropriate therapeutic interventions specifically targeting the disease.

We have applied this approach (He et al, 2010a, 2010b; Kroes et al, 2010; Nilsson et al, 2010) to the study of glioblastoma multiforme (GBM) and glioma-derived cancer stem-like cells (GSCs). The research is based on several innovative biological systems and technologies, and has resulted in novel insights into GBM and GSC biology. Our approach permits a hypothesis-driven, carefully planned experimental design to generate the data that enable identification of novel therapeutic targets and biomarkers. We measured global changes in intact polar lipids (gangliosides, phospholipids, sulfatides, etc.) through an analytical methodology that employed innovative chromatographic separations, efficient ionization techniques, and high-resolution mass spectrometry (MS) analysis. Observed changes in lipids (lipidomics) and other metabolites in normal vs treated GBMs and/or GSCs were correlated to changes in glycogene transcripts. Those data were further correlated with changes in the phosphoproteome of treated GSCs. Computational–mathematical analysis (Goerke et al, 2010; Meyer-Baese et al, 2012a, 2012b) further defined correlations in the large data sets (Figure 1). Although the subject of the study was within neuro-oncology, the same techniques are applicable to other diseases. However, experimental planning is more complicated in studies of neuropsychiatric illness, because those studies may include neurodevelopmental hypotheses or changes over a very long timespan. The appropriate samples, whether from human or from animal models, must be obtained in a standardized manner (Mattsson et al, 2012) and must be properly treated and stored, especially for biomolecular studies. Heat stabilization may be required at the time of sampling to preserve the integrity of neuropeptides, metabolites, and proteins, especially phosphoproteins (Ahmed and Gardiner, 2011).

Figure 1

Computational analysis of comprehensive proteomic, transcriptomic, and lipidomic data sets can yield new insights derived from correlations hidden in the data.

This review focuses upon how the new knowledge base provided by advances in genomics can be leveraged in other genome-wide studies in neuroscience, especially when those data sets are combined. The four major tissues (human-derived or from animal models) used in molecular studies of human CNS disease are cerebrospinal fluid, cell cultures, post-mortem brain, and spinal cord. Blood and urine samples are always part of the general medical analysis, but saliva and nasal secretions may prove to be valuable sample sources, especially for metabolite analysis. Biological tissues are a rich source of (poly)nucleotides, amino acid biopolymers (neuropeptides and proteins), lipids, and metabolites. Traditionally, bioanalytical methods in neuroscience have relied heavily on antibody-based detection of single or a few proteins by western blot or immunohistochemistry analysis. Although highly sensitive, antibody-based assays only recognize a linear or three-dimensional epitope of three to seven amino acids (Schechter, 1970) and often cross-react with molecules other than the one targeted. If the epitope is modified (such as by phosphorylation), the protein may go undetected. Furthermore, a single antibody likely will not deliver structural information outside the epitope and thus cannot resolve molecular isoforms that may have distinct biological functions. For these reasons, there has been a surge in interest in analytical techniques that can provide both high-confidence identification and structural assignments.

This review is organized into sections that describe the enabling technologies that are employed in integrated biological studies, including genome-wide determination of transcript, protein, lipid, and metabolite expression, and computational strategies to find novel correlations between large biological data sets. Examples of synergistic results obtained in neuroscientific and neuro-oncological investigations are provided. We present methods relevant to neuroimaging connectomics and the cell biology of neuropsychiatric disorders. Although we do not include a separate discussion of the biomarker field, it is our experience that putative biomarker identification is a natural outcome of integrated biological studies. Future perspectives on the potential impact of integrated biological approaches in the development of neurotherapeutics are described in each section.

ESSENTIAL CONSIDERATIONS IN EXPERIMENTAL DESIGN

The transcription and translation of genetic information into proteins, lipids, and metabolites comprise highly nonlinear processes. Intra- and extracellular environments are dynamic systems that constantly adjust to signals in their surroundings. Thus, understanding the micro- and macroheterogeneity of functional molecules is essential to the study of disease processes. Recent advances in analytical and computational technologies show promise for characterizing the full biochemical complement of neurological and other biological systems in health and disease. Importantly, studies of biomolecules on a global scale often result in very large data sets (terabytes) and require a high degree of skilled experimental planning, execution, and data analysis in order to obtain biologically relevant results.

The pitfalls in large-scale studies can be eliminated by thoughtful study design (Oberg and Vitek, 2009). Systematic data biases result from different treatments between sample states (different sampling methods or storage conditions, for instance), and such biases cannot be corrected by statistics. Data from adequate numbers of biological and technical replicates must be acquired. Every large-scale experiment has sources of technical variation, which can be ameliorated by statistical analysis. Although large-scale investigations of lipids and metabolites are still relatively rare, much discussion of statistical methods to correct for experimental biases in transcriptomic (Dupuy and Simon, 2007) and proteomic (Cappadona et al, 2012; Thompson et al, 2012) assays has been published. It is highly advisable to have a statistician participate in the design of large-scale studies, beginning at the planning stage.

The most important component of large-scale experimental design is a robust hypothesis, followed by the identification of sample populations, appropriate comparisons, and numbers of biological samples. There are special challenges associated with molecular research in the neuropsychiatric field. The privileged nature of human brain sampling, which most often occurs post mortem, the need to protect the samples from degradation to the greatest extent possible, and the challenges in making clinical diagnoses are especially prominent in this field. A strict control of factors that affect intersample variability is necessary. Once the biological samples are accumulated, experimental variations should be minimized by rigorous protocols to ensure that all samples are treated identically. Most samples are extracted and analyzed in a queue, injected sequentially into a mass spectrometer. When the samples are grouped and introduced sequentially, systematic quantitative differences may be observed that arise due to instrument sensitivity rather than real differences between samples. Block randomization of sample analysis is one way to overcome this issue.
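
The block randomization mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the group labels, the sample identifiers, the equal group sizes, and the function name `block_randomize` are all hypothetical and are not taken from any of the cited studies. Each block contains one sample from every group, and the order within each block is shuffled, so that slow changes in instrument sensitivity are spread across the sample states rather than confounded with them.

```python
import random


def block_randomize(samples_by_group, seed=0):
    """Return a run order in which every block holds one sample per group.

    samples_by_group maps a group label (e.g. 'control') to a list of
    sample IDs. This sketch assumes all groups have the same size.
    """
    rng = random.Random(seed)  # fixed seed so the run order is reproducible
    sizes = {len(ids) for ids in samples_by_group.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    # Shuffle within each group so that block membership is random too.
    shuffled = {g: rng.sample(ids, len(ids)) for g, ids in samples_by_group.items()}
    run_order = []
    for i in range(sizes.pop()):
        block = [(group, shuffled[group][i]) for group in shuffled]
        rng.shuffle(block)  # randomize the position of each group in the block
        run_order.extend(block)
    return run_order


# Hypothetical sample IDs for two sample states.
groups = {
    "control": ["C1", "C2", "C3"],
    "disease": ["D1", "D2", "D3"],
}
for position, (group, sample_id) in enumerate(block_randomize(groups), start=1):
    print(position, group, sample_id)
```

Unequal group sizes or additional blocking factors (batch, extraction day) would need a more elaborate scheme, which is one reason to involve a statistician at the planning stage.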

Quantitative workflows in proteomics, lipidomics, and metabolomics record patterns of analyte abundances in samples derived from distinct conditions or states. Typically, very large data sets are generated, and are first extracted and processed by the use of commercial or in-house-developed software tools. A list of quantified analytes may then be further examined with knowledge-based tools, such as Ingenuity Pathways Analysis, to find biological circuits that may be relatively up- or downregulated between sample states.

TRANSCRIPTOMICS

One important principle underpinning the systems biological approach to understanding complex neurological disorders is not merely the integration of each of the physical components and interactions of a system, but also the mapping of how information propagates through this system in response to perturbations. If we hope to understand the dynamic biology of complex neurological disorders, instead of looking at one transcript or protein at a time, which is historically how this has been approached, we need to understand the integration of thousands of proteins in a dynamically changing environment (Geschwind and Konopka, 2009; Villoslada et al, 2009; Kotaleski and Blackwell, 2010). Core transcriptomic changes can be interpreted at two levels, depending upon the overarching nature and purpose of the question at hand: (a) the hypothesis-generating phase, in which transcriptomics employed as a single technique is an unbiased approach that requires no prior understanding of the system and operates primarily at the level of gene expression patterns; and (b) the application or target-generation phase, which requires a systems approach and applies it in a way that can be used to identify better therapeutic targets or combinations of targets in the proper context for the treatment of disease. As many genes contribute to normal functions, research efforts are moving from the search for a ‘disease-specific gene’ towards the understanding of the biochemical and molecular functioning of gene families whose disrupted interaction in complicated networks can lead to the disease state.

Transcriptomics can address all (global approaches) or a segment (focused approaches) of the transcriptome, from normal or diseased single cells or tissues. Global approaches represent an important complement to the traditional reductionist single-gene studies. Whereas traditional single-gene studies are hypothesis-driven, both global and focused approaches can be hypothesis-generating, discovery-based, or hypothesis-driven, and therefore have the potential to yield novel insight into brain function and dysfunction. As detailed below, depending on the techniques employed, transcriptomics can be used as a snapshot to annotate a cell’s expression profile or fingerprint, including coding as well as noncoding transcripts; to query the structure of the genes that gave rise to them, including intricacies of exon/intron boundaries, transcription start sites, splicing patterns, and even gene fusion events; and to aid in mapping interactive networks. Networks are systems of interconnected entities. When networks are studied comprehensively, novel properties emerge that cannot be derived from the individual analysis of any of their components. Perhaps most familiarly, transcriptomics is used to determine how transcript expression patterns are altered under differing conditions (such as disease or drug treatment), and to identify and validate potential biomarkers or therapeutic targets.

MICROARRAY-BASED TRANSCRIPTOMICS

Focused Microarray Approaches

Focused microarrays are fabricated using a restricted number of carefully selected transcripts, and may include those targeting a particular biological pathway, a specific tissue, or even a scaled-down version of the transcriptome. Irrespective of their overall design (detailed below), this approach optimizes sensitivity and accuracy of the data, while controlling the cost of experiments, thus facilitating the repeated measurement and replicate experiments required in analyses of complex tissues. The reduced number of probes comprising these arrays can be stringently designed, thus controlling their biophysical characteristics critical to hybridization efficiency (melting temperature, secondary structure, and stability). As such, these arrays are able to accurately identify differentially expressed transcripts with high sensitivity to low levels of differential expression. This approach reliably identifies transcripts having as little as a 10% difference between RNA samples, reinforcing the applicability of this approach to the analysis of highly heterogeneous tissue sources. As with all transcriptomic platforms, the identified transcripts must be validated by an independent methodology (eg, higher throughput quantitative real-time PCR assays) (Canales et al, 2006).

We have utilized this focused microarray approach (FMA) platform to identify aberrant cell-surface glycosylation patterns in malignant human gliomas (Kroes et al, 2007a, 2007b; Moskal et al, 2009; Kroes et al, 2010). We developed a custom-fabricated, focused oligonucleotide microarray representing all of the known human glycogenes: transcripts encoding proteins responsible for glycan synthesis and glycan recognition, including glycosyltransferases, glycosylhydrolases, and other glycan processing enzymes, proteoglycans, and glycan-binding proteins. This platform was designed to provide comprehensive coverage of the glyco-associated gene family, the flexibility to add new sequences to the arrays as they become available, and to assure the most up-to-date coverage of these gene families. This platform has been rigorously evaluated in terms of dynamic range, discrimination power, accuracy, reproducibility, and specificity, which provides the ability to reliably measure even low levels of statistically significant differential gene expression. In particular, when combined with robust ontological analyses, qRT-PCR corroboration, and in vivo functional studies, these high-quality, comprehensively annotated, application-specific low-density microarray platforms provide an efficient strategy that can pinpoint gene expression changes and provide a robust database for pursuing the creation of novel therapeutics.

Global Microarray Approaches

The evolution of microarray platforms over the past two decades has resulted in multiple commercially available platforms for measuring genome-wide gene expression levels, including short oligonucleotide (25–30 base), long oligonucleotide (40–80 base), and cDNA (highly variable in length) arrays, in both single- and dual-channel formats. The continual increases in both the efficiency and economy of the oligonucleotide arrays seem to have made them the platform of choice for many studies. Performance characteristics between platforms, as well as inter- and intraplatform reproducibility of gene expression measurements, have been extensively evaluated over the years (Petersen et al, 2005; Microarray Consortium, 2006; Patterson et al, 2006). Overall, such detailed platform comparisons reveal mixed results. Several studies demonstrate concordance between platforms (Kane et al, 2000; Hughes et al, 2001; Taniguchi et al, 2001; Guckenberger et al, 2002; Yuen et al, 2002; Barczak et al, 2003; Dabrowski et al, 2003; Lee et al, 2003; Wang et al, 2003; Bloom et al, 2004; Järvinen et al, 2004; Lee et al, 2004; Parmigiani et al, 2004; Thompson et al, 2004; Ulrich et al, 2004), whereas others suggest discordance (Kothapalli et al, 2002; Kuo et al, 2002; Li et al, 2002; Carter et al, 2003; Lenburg et al, 2003; Rogojina et al, 2003; Tan et al, 2003; Mah et al, 2004). Although the explanation for these discrepancies is not altogether consistent, platform-specific differences in the statistical algorithms used to discriminate signal from noise and calculate transcript abundance were most relevant. In fact, the majority of the discrepancy appears to be due to the (in)ability to detect lower abundance transcripts. This, however, is an arena where FMA platforms appear to have significant benefit.

Although FMA, by definition, provides less coverage of the total transcriptome than a global array, this limitation can be partially overcome by the careful selection of the transcripts that are represented. Breadth of transcriptome coverage can be achieved, eg, using approaches that select representative genes within the defined ontology of the organism under study. Thus, this approach is clearly limited to organisms with adequate genome annotation.

As part of a larger program aimed at utilizing such an approach in appropriate animal models to give some insight into the molecular mechanisms underlying the emergence of neurological disorders, we have created and utilized such annotated arrays (Kroes et al, 2007a, 2007b, 2010). The genes comprising this oligonucleotide microarray were compiled from current databases and were strategically chosen to provide representation of greater than 90% of the major gene ontological categories. Downstream of the stringent data analysis, the incorporation of ontological algorithms, such as GOMiner (Zeeberg et al, 2003), Gene Set Enrichment Analysis (Subramanian et al, 2005, 2007), DAVID (Huang et al, 2008), or Expression Analysis Systematic Explorer (Hosack et al, 2003) provided statistical rigor to analyses of coregulation of multiple genes (gene sets) that are functionally related or related by involvement in given biological pathways. For a full description of the dynamic range, discrimination power, accuracy, reproducibility, and specificity of this platform, please see these complete references (Kroes et al, 2006, 2007a, 2007b). We have identified expression patterns and functional gene families involved in models of epileptogenesis (Corcoran et al, 2011), depression (Kroes et al, 2006, 2007a, 2007b), and autism (Moskal et al, 2011); all are underexamined areas for development of novel therapeutics. Such preclinical studies uncovering the role of NMDA receptor modulation in positive emotional learning and resilience to depression have reinforced the clinical development of GLYX-13, a functional partial agonist of the NMDA receptor currently in phase II clinical development (clinicaltrials.gov identifier NCT01684163) as an adjunctive therapy for major depressive disorder (Burgdorf et al, 2011, 2013).
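
To make the idea of a gene-set statistic concrete, the following Python sketch computes a one-sided hypergeometric P-value for over-representation of an ontological category among differentially expressed genes. It is a generic textbook test offered as an assumption-laden illustration, not the specific algorithm implemented by any of the tools named above, and the gene counts in the usage example are invented.

```python
from math import comb


def enrichment_p_value(n_background, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_background: genes represented on the array
    n_in_set:     background genes annotated to the category of interest
    n_selected:   genes called differentially expressed
    n_overlap:    differentially expressed genes that fall in the category
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_set, n_selected)
    # Sum the probabilities of observing n_overlap or more category genes.
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Invented numbers: 10,000 genes on the array, 200 in the category,
# 300 differentially expressed, 15 of which belong to the category.
p = enrichment_p_value(10_000, 200, 300, 15)
print(f"P(overlap >= 15) = {p:.3g}")
```

When many categories are tested, the resulting P-values would need correction for multiple comparisons.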

Direct Sequencing-Based Approaches

Although microarray technology has clearly dominated transcriptomics research, this situation appears to be changing rapidly with the recent technological advances in high-throughput RNA sequencing (RNAseq) (Janitz, 2008; Wang et al, 2009; Hawkins et al, 2010). The technology is still under active refinement, especially with respect to the bioinformatic challenges of storing, retrieving, and processing large amounts of data (Pepke et al, 2009; Martin and Wang, 2011). In general, a population of RNA is converted to a library of cDNA fragments with defined adaptors attached to one or both ends. Each molecule, with or without amplification, is then sequenced in a high-throughput manner to obtain short sequences, typically 30–400 bp, depending on the DNA-sequencing technology used. Following sequencing, the resulting reads are either aligned to a reference genome or reference transcripts, or assembled de novo without the genomic sequence, to produce a genome-scale transcription map that consists of the transcriptional structure and/or level of expression for each gene. RNAseq records the numerical frequency of a sequence in the library population, and thus provides an absolute measurement of transcript abundance rather than a relative measure.
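
The read counts described above are commonly scaled by library size before samples of different sequencing depth are compared. The Python sketch below shows one simple scaling, counts per million mapped reads; the gene names and counts are hypothetical, and the choice of this particular normalization is our assumption for illustration rather than a method prescribed by the cited work.

```python
def counts_per_million(read_counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("library contains no mapped reads")
    return {gene: 1e6 * n / total for gene, n in read_counts.items()}


# Hypothetical counts for the same three genes in two libraries that were
# sequenced to different depths.
library_a = {"GENE1": 500, "GENE2": 1_500, "GENE3": 8_000}
library_b = {"GENE1": 1_000, "GENE2": 3_000, "GENE3": 16_000}

cpm_a = counts_per_million(library_a)
cpm_b = counts_per_million(library_b)
for gene in library_a:
    print(gene, round(cpm_a[gene]), round(cpm_b[gene]))
```

In this toy case the raw counts differ twofold between libraries, but the scaled values agree, because the only difference is sequencing depth.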

RNAseq has some distinct advantages over microarrays (Courtney et al, 2010; Malone and Oliver, 2011). There is no prerequisite for genome sequence information in order to detect and ultimately evaluate transcripts; thus, RNAseq is most useful in organisms and situations where limited genome information exists. Unbiased sequence methods can detect novel transcripts, alternatively spliced isoforms, SNPs, and fusion transcripts, again in the absence of genome annotation. RNAseq provides deep coverage and base-scale resolution (Cloonan and Grimmond, 2008). However, this technology also comes with some distinct trade-offs. The amount of time to retrieve results for a given sample is far greater. It is 10–100 times more expensive when compared at the same resolution of accuracy (coverage vs cost). It may still suffer from unknown biases, such as those imposed by the required ligation steps. The ability to detect and quantify rare transcripts is obscured. By design, high-abundance transcripts (such as from housekeeping genes) are responsible for the majority of the sequencing data (5% of the genes give rise to 75% of the reads sequenced). As a consequence, it is difficult to measure the abundances of the remaining genes reliably, and the majority of transcript measurements that are made at this level are very noisy. In fact, some recent reports suggest that in a mammalian genome, 700 million reads would be required to obtain an accurate quantification of >95% of the expressed transcripts (Blencowe et al, 2009) but, as yet, there has not been a systematic analysis of how sequencing coverage affects differential expression calls (Oshlack et al, 2010). Although deep sequencing effectively enhances our view of the diversity of the transcriptome, the identification of true differential expression at a low-count range might not be so easy to achieve. More reads imply the detection of more genes, but also result in noisier data, which makes the assessment of differential expression increasingly difficult.
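
One way to see why low-count measurements are noisy is to treat read sampling as a Poisson process, in which the coefficient of variation of a count with expectation n is 1/sqrt(n). The Python sketch below applies this to the 5%/75% split quoted above. The Poisson model, the total of 20,000 expressed genes, and the 20 million read library are all assumptions made for illustration; real data typically show additional biological and technical variance.

```python
from math import sqrt


def mean_reads_per_gene(library_size, read_fraction, n_genes):
    """Average reads per gene for a class of genes sharing read_fraction."""
    return library_size * read_fraction / n_genes


def poisson_cv(expected_reads):
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / sqrt(expected_reads)


library_size = 20_000_000  # assumed number of mapped reads
n_genes = 20_000           # assumed number of expressed genes

# 5% of genes take 75% of reads; the other 95% share the remaining 25%.
abundant = mean_reads_per_gene(library_size, 0.75, 0.05 * n_genes)
remaining = mean_reads_per_gene(library_size, 0.25, 0.95 * n_genes)
print(f"abundant genes:  {abundant:.0f} reads, CV {poisson_cv(abundant):.2%}")
print(f"remaining genes: {remaining:.0f} reads, CV {poisson_cv(remaining):.2%}")

# A transcript that is only just detected has a very large sampling error.
for reads in (5, 20, 100):
    print(f"{reads:>4} reads -> CV {poisson_cv(reads):.0%}")
```

Under these assumptions, genes that become detectable only at greater depth enter the data set at low counts, where sampling error alone is a large fraction of the measured value.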

In sum, the diversity of transcriptomic platforms clearly provides investigators with many options. When sensitivity is not limiting, DNA microarrays seem to offer significant advantages. On the other hand, when sensitivity is everything, look toward the short-read sequencing technologies. These technologies are totally complementary: each strategy has its appropriate time and place for use. In the end, the use of multiple platforms can only lead to the generation of higher quality data.

GLOBAL ANALYSIS OF PROTEINS AND OTHER BIOPOLYMERS

Unlike transcriptomic experiments, which are readily performed in a chip format, sequence/structural analysis of proteins, peptides, lipids, and glycoconjugates relies heavily on MS. The application of large-scale MS studies has greatly increased in the past two decades because of the high sensitivity and structural specificity that MS and tandem MS (MS/MS) can provide. A mass spectrometer always contains two parts, a source that generates gas-phase ions and a mass analyzer. The increase in biological applications of MS is attributed to the discovery of new, ‘soft’ methods to ionize large biopolymers, the development of low-flow (nl/min) introduction of solubilized analytes into mass spectrometers, and improved figures of merit for speed of data acquisition, sensitivity, and resolution of the instruments themselves. For small molecules, such as metabolites, gas chromatography (GC) with electron ionization is one reliable ionization technique. For large analytes, such as peptides, proteins, and glycolipids, matrix-assisted laser desorption/ionization (MALDI) (Karas and Hillenkamp, 1988) and electrospray ionization (ESI) (Fenn et al, 1989) provide intact molecular ions for measurements of intact mass by MS or fragmentation by MS/MS. MALDI-MS is a technique that is particularly tolerant of molar excesses of salts and buffers typically found in biological samples. MALDI-MS has been applied to analyze not only extracted analytes but has been used successfully in molecular imaging studies of in situ peptides, proteins, and lipids (Seeley et al, 2011) (see Andrén et al, this issue).

When an ion source is coupled to an ultra-high-resolution mass analyzer, such as a Fourier transform ion cyclotron resonance mass spectrometer (FT-ICR MS) (Marshall et al, 1998; Marshall and Hendrickson, 2008), high mass resolution (resolving power of 400 000, defined from the mass spectral peak width at half-maximal peak height) and high mass accuracy (ppb mass deviations) can be routinely attained. For molecules <400 Da, exact molecular formulae and isotopic fine structures can be determined. Thus, FT-ICR MS is highly suitable for metabolomic studies. This type of mass spectrometer is often a shared resource in academic institutions because of its cost. For proteomic studies, lower-end instrumentation is often sufficient.
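
The two figures of merit quoted above follow from simple ratios, sketched here in Python. Resolving power is taken as m/Δm, where Δm is the peak width at half-maximal height, and mass error is the deviation of the measured from the theoretical value expressed in parts per million (1 ppm equals 1000 ppb). The m/z values in the example are invented for illustration and do not correspond to any measurement in the cited work.

```python
def resolving_power(mz, fwhm):
    """Resolving power m/delta-m, with delta-m the full width at half maximum."""
    return mz / fwhm


def mass_error_ppm(measured_mz, theoretical_mz):
    """Signed mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Invented example: a peak at m/z 400 that is 0.001 wide at half height.
print(f"resolving power: {resolving_power(400.0, 0.001):,.0f}")

# Invented example: measured vs theoretical m/z for a small molecule.
error = mass_error_ppm(300.00003, 300.00000)
print(f"mass error: {error:.2f} ppm ({error * 1000:.0f} ppb)")
```

At a fixed resolving power the absolute peak width scales with m/z, which is one reason exact formula assignment is most practical for small molecules.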

Proteomics

The human proteome is defined as the protein complement of the human genome. Proteomic studies benefit greatly from accurate and complete genome characterization. In comparative proteomic studies, global antibody-free characterization of the proteins expressed by cells or tissues in healthy or otherwise normal states, compared with proteins expressed in diseased states, can provide clues to the mechanisms of the underlying disease process. It is possible to identify putative biomarkers as a result of those studies, although extensive validation studies are required as a consequence. The proteomic samples are complex mixtures that can be simplified at the level of sampling by laser capture microdissection (Pérez-Manso et al, 2006) to enrich specific cell types from tissues. Subcellular fractionation or enrichment for proteins bearing a specific posttranslational modification (PTM) may be useful approaches, depending on the experimental hypothesis. An excellent overview with helpful facts on the application of proteomics to clinical materials was published recently by Tabb (2013).

Channelomics

The quest for new therapeutics that could ultimately lead to personalized treatments for psychiatric disorders remains a significant challenge. Large-scale genomic studies are likely to provide the initial data to identify potential disease mechanisms and disease-predictive biomarkers (Wong et al, 2011). The subsequent step is to capitalize on that new knowledge and evaluate systematically the role of relevant gene products in specific disease states through integrated biological technologies and informatics tools. This proposed pharmacogenetic pipeline has proven successful in other disease contexts, such as cancer (Pitteri and Hanash, 2010) and cardiovascular diseases (Dubois et al, 2011), but it is in its nascent stage in psychiatric disorders. Over the past few years, a phenomenal number of genes have emerged as relevant in psychiatric disorders, but very little is known about their exact role in psychiatric diseases. Some exemplary results will be discussed in the following paragraphs.

In neurons, voltage-gated sodium (Nav) and voltage-gated potassium (Kv) channels expressed at the axonal initial segment (AIS) drive the initiation and propagation of the action potential, the electrical signal underlying brain function and behaviors (Lai and Jan, 2006). Scaffolding and regulatory proteins that cluster with Nav and Kv channels at the AIS ensure the integrity of electrical signaling (Ogawa and Rasband, 2008; Grubb and Burrone, 2010a, 2010b), and interference with this protein network can lead to deficits in excitability and eventually neuronal death (Buffington and Rasband, 2012). The emerging association of mutations, copy number variants, or SNPs in genes coding for these ion-channel molecular complexes with psychiatric disorders is, in our opinion, one of the most exciting recent discoveries.

A large-scale genomic study in autistic patients identified two independent nonsense variants that interfere with the coding of Nav1.2 (Sanders et al, 2012), one of the Nav channel isoforms expressed at the AIS. Those results further support earlier studies that identified SNPs in the genes coding for Nav1.1 and Nav1.2 from another cohort of autistic patients (Weiss et al, 2003). Given the comorbidity of autism and epilepsy (Robinson, 2012), it is not surprising that the mutation in the Nav1.1-coding gene had been previously identified in a patient with juvenile myoclonic epilepsy (Weiss et al, 2003). Similar to Nav1.1, mutations in KCNQ2 (coding for Kv7.2, another channel expressed at the AIS) are recognized causes of human epilepsy and other excitability-driven inherited disorders (Singh et al, 1998; Mulley et al, 2003; Cooper, 2011; Mulley and Mefford, 2011). A new splice variant of the KCNQ2 gene has been identified in bipolar disorder and its aberrant phenotype described in cellular studies (Borsotto et al, 2007). Whether all these genetic modifications of Nav and Kv channels converge on common cellular phenotypes with heterogeneous clinical manifestations remains to be determined, but it is a plausible hypothesis.

Other results from the collection of studies implicate ankyrin-G in bipolar disorder (Leussis et al, 2012). Ankyrin-G is the main scaffolding protein of the AIS; it is required for clustering both Nav and Kv channels (Zhang and Bennett, 1998; Bennett and Lambert, 1999; Jenkins and Bennett, 2001; Brachet et al, 2010; Gasser et al, 2012) and is necessary for establishing and maintaining neuronal polarity (Grubb and Burrone, 2010a, 2010b; Galiano et al, 2012). Studies from different genetic cohorts have consistently identified the gene coding for ankyrin-G as a potential risk gene for bipolar disorder (Smith et al, 2009; Gella et al, 2011; Lett et al, 2011; Roussos et al, 2011; Tesli et al, 2011; Lee et al, 2012). Likewise, the gene coding for β-IV spectrin, a molecule associated with the ankyrin-G complex (Berghs et al, 2000) and intimately linked to both the Nav and Kv channel complexes (Komada and Soriano, 2002; Kosaka et al, 2008; Hund et al, 2010), was found to contain microdeletions in an autistic patient (Griswold et al, 2011). Furthermore, a missense mutation in the FGF14 gene, coding for a Nav channel regulatory protein expressed at the AIS (Lou et al, 2005; Laezza et al, 2007, 2009; Goetz et al, 2009; Shavkunov et al, 2012), is associated with cognitive impairment, neurodegeneration, and motor dysfunction (Van Swieten et al, 2003; Brusse et al, 2006), and SNPs in the same gene have been identified in schizophrenic patients (Need et al, 2009).

The biology of the mutations described above and their pathophysiological roles in psychiatric disorders are poorly understood. However, one could speculate that these ion channel macromolecular complexes are part of a nonredundant molecular network vital for neuronal communication, which, if aberrant, could lead to a wide range of neurological or psychiatric syndromes. A systematic quantitative proteomic analysis of expression levels and regulatory PTMs (eg, phosphorylation, glycosylation, and ubiquitination) of these ion channel complexes in health and disease could provide guidance to formulate new hypotheses on disease mechanisms and identify reliable biomarkers for clinical applications (Filiou et al, 2012; Filiou and Turck, 2012). Those results will require further validation and proof of scalability, especially for membrane-associated proteins such as ion channels. A first step toward this goal would be to utilize animal models to evaluate the role of specific gene products in disease contexts and to optimize proteomic techniques toward clinical applications (Holsboer, 2008). The biological experimental data sets would then be integrated into predictive informatics networks and contribute to elucidating the complex mechanisms of psychiatric disorders. This process will require many further steps performed in translational collaborations between the basic sciences and clinical settings in order to create personalized treatments.

Protein PTMs

More than 100 PTMs of proteins have been described, including disulfide bonds, phosphorylation, glycosylation, and myristoylation. A polypeptide transcript may be processed differently in different tissues, and its processing may depend on the developmental stage and health of the organism. Epigenomics, the study of histone PTMs on a global scale, is an expanding area of research in neuroscience, recently reviewed in this journal (Volume 38, Number 1, 2013). Conversely, the quantitative study of a specific PTM on a global scale can elucidate the role of that PTM in a disease state. For instance, large-scale studies of protein phosphorylation are frequently employed to measure changes in intracellular signaling pathways (Nilsson, 2012).

Protein Phosphorylation

Reversible protein phosphorylation is a well-studied mechanism in the regulation of cell signaling. This PTM can increase or decrease protein activity and can change protein subcellular localization or protein–protein binding characteristics. Important regulatory proteins are typically detected at substoichiometric concentrations in cells, which presents special challenges to their analysis.

Protein phosphorylation is by far the most-studied PTM in the control of cell signaling. It is a reversible modification of serine, threonine, and tyrosine, enzymatically catalyzed by kinases (phosphorylation, ‘writer’) and phosphatases (dephosphorylation, ‘eraser’). SH2-domain binding proteins recognize and bind to sites of tyrosine phosphorylation, providing a specific ‘reader’ protein that can modulate tyrosine phosphorylation-mediated signaling pathways (Table 1). Recently, quantitative phosphoproteomic analysis of hippocampus samples from Alzheimer’s disease (AD) patients compared with control samples found changes in several proteins involved in metabolism and signal transduction (Di Domenico et al, 2011). An integrated transcriptomic–phosphoproteomic study of neuronal retrograde injury responses in a rat model identified several hundred redundant signaling networks; those may provide a relatively robust response to neuronal damage (Michaelevski et al, 2010).

Table 1 Examples of Writers, Readers and Erasers as Key Control Proteins in Biological Systems

Protein Glycosylation

Unlike protein phosphorylation, protein glycosylation is not a single chemical entity. Glycan structures are among the most complex structures in the biochemical space. The coding density of sugars is significantly higher than that of amino acids or nucleotides (Laine, 1997). For a given saccharide of six sugar units, the number of possible structural isomers is 1.44 × 10^15; for a six-amino-acid-long peptide, it is 6.4 × 10^7; and for an oligonucleotide of six nucleotides, it is 4096. The branching nature of oligosaccharides, the substitution of sugars with phosphate or sulfate, and differing anomeric linkages all contribute to the structural diversity of carbohydrates, yielding rich complexity dubbed ‘the sugar code’ (Gabius et al, 2002, 2004) that is a key determinant of cell adhesion, receptor activation, and intracellular transduction. The complexity of carbohydrate epitopes on proteins (and lipids) modulates glycan–glycan interactions at the cell surface and intra- or extracellular glycan–protein interactions.
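The peptide and oligonucleotide figures quoted above follow from simple counting: a linear hexamer built from an alphabet of k units has k^6 possible sequences. The short Python check below reproduces those two numbers. The saccharide figure is not recomputed; it is taken as quoted from the text, because it additionally depends on branching, substitution, and anomeric linkage, and the function name is only illustrative.

```python
# Counting linear sequences of a fixed length over a fixed alphabet.
def linear_sequences(alphabet_size: int, length: int) -> int:
    """Number of distinct linear sequences: alphabet_size ** length."""
    return alphabet_size ** length

hexapeptides = linear_sequences(20, 6)        # 20 proteinogenic amino acids
hexanucleotides = linear_sequences(4, 6)      # 4 nucleotides

# Value quoted in the text for a hexasaccharide; NOT derived here.
hexasaccharide_isomers = 1.44e15

print(f"hexapeptides:    {hexapeptides:.1e}")
print(f"hexanucleotides: {hexanucleotides}")
print(f"saccharide/peptide ratio: {hexasaccharide_isomers / hexapeptides:.2e}")
```

On these numbers, the quoted saccharide isomer count exceeds the hexapeptide count by roughly seven orders of magnitude, which is the sense in which sugars have the higher coding density.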

Proteins that recognize an extended glycoepitope but lack enzymatic or transporter activities are called lectins. The name lectin is derived from the past participle of legere (Latin, to read), and lectins are indeed readers of the sugar code (Table 1). Dysfunctional lectins and lectin polymorphisms have been implicated in neurological and psychiatric disorders. Galectins are defined by their affinity for β-galactosides and modulate cell cycle, apoptosis, angiogenesis, and cell motility. Galectin-3 has been associated with inflammatory diseases. Neuroinflammation has an important role in age-associated cognitive disorders. A recent study of genetic variation in the gene that encodes galectin-3 (LGALS3) showed an association of certain LGALS3 SNPs with higher baseline C-reactive protein concentrations and subtle attention deficits (Trompet et al, 2012). Galectin-1 is another lectin whose glycan interactions have been shown to be essential in tempering microglial activation, brain inflammation, and neurodegeneration, with therapeutic implications for treatment of multiple sclerosis (Starossom et al, 2012). This lectin was recently implicated in the disease mechanism of the hereditary disorder giant axonal neuropathy (Mussche et al, 2012). In addition to binding glycoproteins and glycolipids, and acting as a noncovalent crosslinker, lectins can activate intracellular signaling cascades. For instance, the attachment of galectin-3 to branched N-glycans modulates focal adhesion remodeling through activation of focal adhesion kinase and phosphoinositide 3-kinase (Ono et al, 2001).

There is biochemical cross-talk between aberrant cell surface glycosylation patterns and intracellular phosphorylation cascades; recent investigations have demonstrated that modulation of glycogene function by gene therapy can suppress tumor growth in vivo and induce measurable changes in kinase-mediated signaling (Moskal et al, 2009; Kroes et al, 2010). Reversible serine and threonine O-glycosylation by N-acetyl glucosamine has been demonstrated to act as an off-switch for protein phosphorylation (Hart et al, 2011).

Proteoglycans and glycoproteins are enriched in the extracellular matrix (ECM). In addition to disrupted extracellular lectin–matrix interactions, dysregulation of ECM components is associated with neuropsychiatric disorders. In a genome-wide association study of 682 patients and 1300 control subjects, genetic variation in the neurocan gene was demonstrated to be linked to bipolar disorder (Cichon et al, 2011). A follow-up study of the main clinical factors associated with the neurocan risk allele found that the allele was most strongly linked to mania, especially overactivity (Miro et al, 2012). Furthermore, neurocan knockout mice showed hyperactivity and other mania-related symptoms that were decreased by lithium administration (Miro et al, 2012). The aberrant organization of the ECM in AD was studied by Morawski et al (2012). ECM components, such as brevican and aggrecan, form perisynaptic axonal coats and perineuronal nets, respectively. Loss of brevican was correlated with synaptic loss, but other ECM components resisted degradation.

GLYCOMICS

Glycans decorate cell surface glycoproteins and glycolipids; as previously mentioned, these oligosaccharide structures are some of the most complex and diverse structures in vertebrate cells. The advancement of glycomics has been challenged by the need to develop analytical tools to characterize glycans on proteins and lipids in large-scale studies. Previously, definitive structural assignment, including anomeric sugar linkages, required nuclear magnetic resonance (NMR) methods, which are highly specific in their output but, in general, lack sensitivity for global studies of cellular glycans. Advances in MS have recently enabled high-sensitivity analysis of both protein and lipid glycans. Low-flow liquid chromatography (LC) separations (nano-LC) coupled to low-flow ESI (micro- and nano-electrospray) (Emmett and Caprioli, 1994; Wilm and Mann, 1994) with high-resolution MS, FT-ICR MS, and MS/MS analysis greatly facilitate identification of glycans from extremely complex biological mixtures. Lower-resolution MS and other ionization methods can also be used in glycoconjugate analysis, albeit with lower confidence in identification. MALDI-MS analysis is fast, tolerant of salts, and uses little sample, but the MALDI process often causes the loss of labile saccharides (especially sialic acids), resulting in the loss of structural information. FT-ICR MS analysis provides the highest mass accuracy and resolution, which greatly aid in the identification of sugar residues, but the analysis time (scan speed) is slow compared with other MS analyses. The slower scan speed does not take full advantage of the chromatographic separation, but the high-resolution FT-ICR analysis permits identification of most overlapping peaks. Some direct glycomic applications with high-resolution FT-ICR MS will be discussed below.

To determine the importance of glycan changes in disease, it is necessary to coordinate many data sets. As proteins mediate the biosynthesis and degradation of complex glycoconjugates, it is necessary to merge transcriptomic, proteomic, glycomic, lipidomic, and even metabolomic data sets to chart pathways involved in altered glycoconjugate phenotypes in disease. The integrated biological approach has been successfully developed through the correlation of phenomic, transcriptomic, phosphoproteomic, and lipidomic data sets (He et al, 2007, 2010a, 2010b; Kroes et al, 2010; Nilsson et al, 2010; Nilsson, 2012). Computational mathematical algorithms described in the next section were developed (Goerke et al, 2011; Meyer-Baese et al, 2012a, 2012b) to extract useful correlations from these massive data sets to generate biochemical pathways to drive intelligent experimental design and to elucidate therapeutic targets. The goal of these studies is to further understand glycan changes to generate new therapeutic targets and identify novel biomarkers for disease treatment and early disease diagnosis.

Analysis of glycoproteins requires not only the structural identity of the attached glycan but also the site of glycan attachment to the protein. Collisionally induced fragmentation in a mass spectrometer will often result in the removal of the glycan, resulting in the loss of information about the attachment site. Coupling a softer fragmentation technique, such as electron capture dissociation (ECD) (Zubarev et al, 1998), with a heating fragmentation technique, such as infrared multiphoton dissociation (IRMPD) (Little et al, 1994), as demonstrated by Hakansson et al (2001), provides complementary data: ECD identifies the site of glycosylation attachment, and IRMPD provides carbohydrate structural information. Altered protein glycosylation has been detected in AD (Sihlbom et al, 2004, 2005). CSF from individual patients with AD was separated by narrow range two-dimensional (2D) gel electrophoresis, in-gel-digested with trypsin, and analyzed by electrospray FT-ICR MS. Glycopeptides were fragmented with IRMPD, which produces abundant fragment ions through dissociation of glycosidic linkages. MS/MS with high mass accuracy facilitates the structural determination of site-specific N-linked glycosylation and possible identification of AD biomarkers. These analyses can be performed on glycans enzymatically released from the glycoproteins or on intact glycopeptides. The released glycans are collected from the peptides/proteins and further chromatographically purified on-line before high-resolution MS and MS/MS analysis (Wang et al, 2010). These analyses are information-rich, typically producing hundreds of glycan structures from a single sample.

LIPIDOMICS

About 50% of the brain and spinal cord is composed of lipid material. Bioidentical lipids have even been examined as therapeutics; for instance, intracerebroventricular administration of GM1 ganglioside to early onset Alzheimer patients showed positive effects, such as improved motor performance and neuropsychological assessments (Augustinsson et al, 1997; Svennerholm et al, 2002). Lipid profiling or lipidomics is often bundled into metabolomics, but lipids represent a distinct chemical and functional group of biomolecules. Lipids are enriched in the plasma membrane and lipid rafts. Lipids also are important in the compartmentalization of the cell by formation of organellar membranes. Polar lipids, such as gangliosides, are cell-surface glycoconjugates that modulate cell adhesion, migration, and signal transduction through interactions with growth factor tyrosine kinases and other kinase pathway members. Lipids are also responsible for altering membrane properties (eg, membrane permeability) and directly interact with effector proteins (eg, G-proteins) (Chun, 2005; Piomelli, 2005). Lipids are implicated as key factors in many neuroactive processes, such as second messenger signaling, potassium channel flux, cannabinoid receptor activity, and NMDA receptor function (Piomelli et al, 2007). Lipidomic data from neurological and psychiatric model systems can elucidate activation of enzymatic, receptor, and signaling pathways.

Polar lipid analysis has been documented in GBM and GSC cultures (He et al, 2007, 2010a, 2010b) and correlated with phenotypic, transcriptomic, and functional proteomic changes (He et al, 2010a, 2010b). Lipid profiling identifies not only the abundant phospholipids, but also the glycolipids in mammalian cells. Phospholipids are major components of the cell membrane, and their compositions and modifications affect the membrane fluidity and phospholipid-mediated signaling pathways. The methodology described by He et al (2007) was applied to identify phosphatidylinositol (PI), phosphatidylethanolamine, phosphatidylcholine (PC), sphingomyelin (SM), and many other polar lipids with a wide range of acyl chain lengths and varying numbers of hydroxylations and unsaturations. Changes in phospholipid composition not only modulate the membrane fluidity, but also affect phospholipid-mediated signal transduction. For example, phosphorylated PI, including phosphatidylinositol 3,4-bisphosphate [PI(3,4)P2] and phosphatidylinositol 3,4,5-trisphosphate [PI(3,4,5)P3], has been shown to affect cell proliferation, survival, and movement (Di Paolo and De Camilli, 2006). The presence of a double bond results in a ‘kink’ in the phospholipid fatty acyl chain. Thus, cells with a high concentration of unsaturated phospholipids may possess a more fluid cell membrane relative to cells with more saturated ones, because unsaturated phospholipids pack much more loosely. Moreover, differentiated tumor cells show an increase in PC and SM, which are predominantly localized on the outer plasma membrane and carry positive charges (Yamaji-Hasegawa and Tsujimoto, 2006). Together, these changes will alter membrane stability, modulate cell surface charge, and alter signal transduction via intracellular pathways.

MS analysis is the analytical detector of choice in lipid analysis. The primary methodology employed for lipid analysis involves a simple solvent/solvent extraction of the lipids and partitioning of the polar and nonpolar lipids into an aqueous and organic phase, respectively. Because of the complexity of the lipid mixture, a chromatographic separation is necessary before MS analysis. Reverse-phase chromatography is often used for the separation of the lipids, but is not the optimal choice because of the risk of irreversible binding of lipids to that resin. He et al (2007) optimized separation of the lipids over a chromatographic resin that focusses on π–π stacking (such as phenyl–hexyl), which greatly improves the recovery of polar lipids from the chromatographic column. Complex polar lipid mixtures are reasonably resolved chromatographically with phenyl–hexyl resins based on variations of oligosaccharide and aglycone moieties. Nano-LC was used to further increase sensitivity and was introduced by high-sensitivity microelectrospray (Emmett and Caprioli, 1994; Emmett et al, 1998) to a high-resolution FT-ICR MS to further aid in the detection of the complex lipid mixture (Schaub et al, 2007, 2008). Negative-ion micro-electrospray provided better sensitivity compared with positive-ion mode for analyzed polar lipids (Levery, 2005). The lipid analysis methodology described above has been applied to analyses of cell culture, tissue, blood, and CSF samples. MALDI-MS imaging is showing great potential to analyze lipids in their histological context, which provides complementary data to information gained through analysis of total extracts (Sparvero et al, 2010; Hanada et al, 2012).

Accurate mass measurement (typically better than 1 ppm) and LC retention time trends, together with tandem mass spectra generated by collision-induced dissociation of precursor ions where available, enable the determination of chemical composition and proposed structural assignment of 600–800 glycolipids and phospholipids per sample. A Kendrick mass analysis algorithm aids in assignment of structural families of polar lipids (Hughey et al, 2001). High mass accuracy and high resolving power of high-field FT-ICR MS (Senko et al, 1997; Marshall et al, 1998; Levery, 2005; Schaub et al, 2008) greatly improve the resolution and assignment of glycolipids and phospholipids in the complex mixture of polar lipid extracts from biological samples.
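As an illustration of how Kendrick mass analysis groups members of a structural family, the sketch below rescales masses so that a CH2 unit weighs exactly 14 and then compares Kendrick mass defects (nominal Kendrick mass minus Kendrick mass). Compounds that differ only in the number of CH2 units share the same defect. The two fatty acids and the hydroxylated analog are illustrative choices, not compounds from the cited studies, and the function names are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
CH2 = MASS["C"] + 2 * MASS["H"]          # about 14.01565 u on the IUPAC scale

def monoisotopic_mass(formula):
    """Mass of a composition given as {element: count}."""
    return sum(MASS[element] * count for element, count in formula.items())

def kendrick_mass_defect(mass):
    """Rescale so CH2 = 14 exactly, then return nominal mass minus Kendrick mass."""
    kendrick_mass = mass * 14.0 / CH2
    return round(kendrick_mass) - kendrick_mass

def ppm_error(measured, theoretical):
    """Mass measurement error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

# Illustrative compounds: two homologs (differ by 2 CH2) and one with an extra O.
palmitic = monoisotopic_mass({"C": 16, "H": 32, "O": 2})
stearic = monoisotopic_mass({"C": 18, "H": 36, "O": 2})
hydroxy_palmitic = monoisotopic_mass({"C": 16, "H": 32, "O": 3})

for name, mass in [("C16H32O2", palmitic), ("C18H36O2", stearic),
                   ("C16H32O3", hydroxy_palmitic)]:
    print(f"{name}: mass={mass:.5f}  KMD={kendrick_mass_defect(mass):.5f}")
```

In this toy case the two homologs fall on the same Kendrick mass defect, whereas the hydroxylated species is shifted, which is the property used to sort peaks into families before structural assignment.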

METABOLOMICS IN NEUROPHARMACOLOGY

To elucidate the etiology of neuropsychiatric disorders, metabolomic data can be quite complementary to proteomics. For example, Zhang et al (2011) studied high and low anxiety-related behavior in mice by this combined approach. The integrated study suggested that the inositol pathway was implicated in mice with a genetic predisposition to anxiety. Taken together, their findings described proteins, metabolites, and pathways that could be related to the anxiety phenotype. Metabolites vary in size, polarity, charge, structure, and elemental composition, and they reside as a highly complex mixture in a biological matrix; thus, comprehensive metabolite profiling is a daunting task that cannot be performed by a single analytical technique. Metabolomic profiling in neuroscience is discussed at length by Wood (2013) elsewhere in this issue. This section is meant to encourage metabolomic analysis and to inform the reader of the available methodologies and the challenges of each.

NMR spectroscopy analysis provides the most detailed structural information for metabolites, but its application has limits of sensitivity and complexity (Steuer, 2006). One-dimensional (1D) NMR spectroscopy is often used for metabolite profiling, because the analysis is fast (minutes), but the fast scan time results in low resolution and low signal-to-noise ratios for metabolites. 2D NMR greatly increases resolution of metabolites, but the tradeoff is long acquisition times (several hours) per sample (Motta et al, 2010). NMR sensitivity is improving with the development of microcoil probes and the ability to couple NMR detection with LC concentration/separation of biological metabolites, but for analysis of metabolites at biological concentrations, MS is currently the method of choice (Rubakhin et al, 2011).

For MS analysis of metabolites, GC-MS is the gold standard (Griffin and Kauppinen, 2007; Wibom et al, 2010; Garcia and Barbas, 2011). Even though GC-MS is characterized by high sensitivity, the separation of metabolites is greatly improved by 2D GC (GC × GC) analysis, because often more than one compound occupies a peak as it elutes from the GC. The second dimension separation in GC × GC often reveals lower-level components that were masked in a 1D separation. Recently, GC × GC separations have been applied increasingly to the analysis of metabolites because of the complexity of the biological samples (Mohler et al, 2006; Ralston-Hooper et al, 2011). The added separation afforded by GC × GC decreases the complexity of the MS analysis of small metabolites. Both GC-MS and GC × GC-MS techniques are limited to the analysis of volatile compounds or compounds derivatized chemically to make them volatile. On the other hand, LC separations before MS analysis allow analysis of compounds without derivatization (Urban et al, 2010; Vuckovic and Pawliszyn, 2011) and, similar to GC, have the ability to separate chiral molecules under optimal conditions (Oh et al, 2011). The nonpolar component of metabolomics has been largely overlooked. Many nonpolar analytes once derivatized can be analyzed by GC or GC × GC MS, but not all nonpolar metabolites can be detected. Supercritical fluid chromatography (SFC) coupled to MS analysis adds another dimension to the analysis of these complex nonpolar metabolite mixtures (Farrell et al, 2009). In a very simplistic description, SFC separations combine attributes of both GC and LC separations, because the separations are performed in a gas that is in a supercritical or enhanced fluidity state (Taylor, 2008). Molecules that are not volatile and thus intractable by GC can often be dissolved in an enhanced fluidity gas and be separated by SFC. The same can be said of nonpolar compounds that are insoluble in aqueous solvents and cannot be separated by conventional reverse-phase LC. In addition, SFC is readily interfaced with MS through atmospheric ionization sources and, with short separation times, is considered a high-throughput analysis (Farrell et al, 2009).

The application of SFC-MS for metabolomics has been demonstrated, but currently SFC-MS is underutilized in metabolomic analysis. Most of the methodologies described above involve different separation techniques coupled with MS, but all of the technologies are complementary and, when used together, will provide the most complete metabolomic data sets to date. Another factor involved in the MS analysis of metabolites is the efficient ionization of the compounds before MS detection. Multiple ionization modes are needed to detect comprehensively the diverse chemical compounds in the complex biological metabolome, including positive and negative ionization electrospray, chemical ionization in both positive and negative modes, and photoionization. Once metabolites are separated and ionized, elemental composition assignment is the first step of analysis. NMR analysis is superior to MS, because it provides exact structural information for compounds; however, the sensitivity of the technique can be insufficient for the analysis of complex biological mixtures. Low-resolution MS analysis necessitates MS/MS in most cases to provide elemental composition. Ultrahigh-resolution high mass accuracy MS (FT-ICR MS) often can assign elemental composition based on accurate mass, but sub-ppm mass accuracy alone may be insufficient for elemental assignment because of the large number of possible atomic configurations for a given nominal mass. Recently, accurate mass coupled with ultrahigh-resolution isotopic fine structure information (as shown in Figure 2) has demonstrated that compounds can be correctly identified (Marshall et al, 1998; Shi et al, 1998; He et al, 2004; Marshall and Hendrickson, 2008; Miura et al, 2010). By using the hetero-atom content of the metabolite ion, nominal spacing of isotopic intensities, and ultrahigh mass accuracy, metabolites’ elemental composition can be determined unambiguously (Daltonics B, 2012).
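To make the accurate-mass argument concrete, the following sketch enumerates CHNOS compositions whose protonated ion falls within a chosen ppm window of a measured m/z. It uses the [M+H]+ value quoted for S-methyl-L-cysteine in the Figure 2 legend (136.04267). The element limits, the crude hydrogen-count filter, and the function names are assumptions made for illustration; this is a brute-force search, not the assignment software used by the authors.

```python
# Monoisotopic masses (u); PROTON is the mass of H+ (hydrogen minus an electron).
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.00727646688

def formula_string(counts):
    """Render {element: count} as a compact formula, e.g. C4H9NO2S."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts.items() if n > 0)

def candidate_formulas(mz, ppm, max_atoms):
    """List (formula, ppm_error) for neutral CHNOS compositions matching [M+H]+."""
    neutral = mz - PROTON
    tolerance = mz * ppm * 1e-6
    hits = []
    for c in range(max_atoms["C"] + 1):
        for n in range(max_atoms["N"] + 1):
            for o in range(max_atoms["O"] + 1):
                for s in range(max_atoms["S"] + 1):
                    heavy = (c * MASS["C"] + n * MASS["N"]
                             + o * MASS["O"] + s * MASS["S"])
                    h = round((neutral - heavy) / MASS["H"])
                    # Crude (assumed) chemical filter on the hydrogen count.
                    if h < 0 or h > 2 * c + n + 2:
                        continue
                    mass = heavy + h * MASS["H"]
                    if abs(mass - neutral) <= tolerance:
                        counts = {"C": c, "H": h, "N": n, "O": o, "S": s}
                        error = (neutral - mass) / mass * 1e6
                        hits.append((formula_string(counts), error))
    return hits

LIMITS = {"C": 10, "N": 5, "O": 5, "S": 2}    # assumed search space
narrow = candidate_formulas(136.04267, ppm=1.0, max_atoms=LIMITS)
wide = candidate_formulas(136.04267, ppm=50.0, max_atoms=LIMITS)
print("1 ppm: ", narrow)
print("50 ppm:", [formula for formula, _ in wide])
```

Widening the window from 1 to 50 ppm can only lengthen the candidate list, and the list grows further at higher masses; isotopic fine structure then supplies the hetero-atom evidence needed to choose among candidates.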

Figure 2

Ultrahigh-resolution isotopic fine structure of a metabolite, S-methyl-L-cysteine. Spectra were collected with a 12 T FT-ICR MS with a mass resolving power of 340 K. The exact mass of S-methyl-L-cysteine [M+H]+ is 136.04267. The zoomed inset shows the first 13C isotopic fine structure. The natural isotope abundances are: 15N, 0.36%; 33S, 0.76%; 13C, 1.1%; and 2H, 0.015%. The exact mass, isotopic fine structure, and isotope abundances are used to assign elemental compositions unambiguously.


In summary, the acquisition of comprehensive metabolite data sets requires an enhanced metabolomics toolbox of complementary analytical technologies and methods. There is no single assay that permits the detection and identification of all metabolites in a biological sample; thus, several analytical techniques must be employed to perform global metabolomic analysis. Because of the complexity of these analyses, a targeted metabolite approach is often adopted for specific experiments.

COMPUTATIONAL METHODS FOR INTEGRATIVE NEUROBIOLOGY

Computational approaches have facilitated quantitative analysis and data interpretation of large data sets stemming from EEG, MEG, PET, CT, MRI, and fMRI, showing the response of the brain to placebo and drug effects for substance use and cognitive disorders. The computational techniques range from (a) neural dynamical models describing synchronization to (b) multivariate pattern recognition techniques for emerging spatial and temporal maps as a response to a certain task or analyzing impulse control in connection to drug abuse, and finally to (c) graph theoretic approaches to elucidate the brain network topology and contrast structural and functional connectivity deficiencies associated with disorders (Stam et al, 2010; Fornito et al, 2012; Giessing and Thiel, 2012).

The latter, also known as ‘imaging connectomics’ (Fornito et al, 2012), represents a novel trend in neuroimaging techniques. It is based on analysis of graph networks, whose topologies aid in identifying areas of malfunctioning brain network architectures as targets for cognitive enhancing drugs. Existing studies, which are few, have shown that graph networks represent a powerful tool in the development of data-driven biomarkers of cognitive enhancing drug effects, on both local and global scales. They also provide a framework to understand the dynamics of neurodegenerative diseases. A new approach in drug development, network pharmacology, aims at addressing the whole brain instead of targeting specific areas, and represents a new route in therapeutics. Although neuroimaging scientists have recently embraced graph theoretic approaches, integrated biological studies in neuropharmacology based on large graphs are sparse (limited to only two studies of visualization in AD and Parkinson’s disease) (Paananen and Wong, 2009; You et al, 2010). Large-scale biological data analysis and integration in neuropharmacology based on graph networks must overcome two major challenges: the high dimensionality of the data compared with neuroimaging and the disparate data sources to be integrated within the same network. Advances in graph theoretical analysis in the context of statistics and the development of visualization and navigation tools in large graphs have been combined to overcome these major challenges.
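A minimal sketch of the kind of graph construction discussed above is given below, assuming that nodes are measured variables (brain regions, proteins, or metabolites) and that an edge is drawn when the absolute Pearson correlation between two variables exceeds a threshold. The data, the threshold of 0.8, and the function names are illustrative assumptions; published connectomics pipelines use more elaborate edge definitions and statistics.

```python
import numpy as np

def adjacency_from_correlation(data, threshold):
    """Binary, undirected graph: rows of `data` are nodes, columns are samples."""
    r = np.corrcoef(data)
    adjacency = (np.abs(r) >= threshold).astype(int)
    np.fill_diagonal(adjacency, 0)            # no self-loops
    return adjacency

def degree(adjacency):
    """Number of edges attached to each node (a local measure)."""
    return adjacency.sum(axis=1)

def clustering_coefficient(adjacency):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    k = degree(adjacency)
    # diag(A^3) counts closed walks of length 3, i.e. twice the triangles at a node.
    closed_walks = np.diag(np.linalg.matrix_power(adjacency, 3))
    possible = k * (k - 1)
    return np.divide(closed_walks, possible,
                     out=np.zeros(len(k)), where=possible > 0)

# Toy data: three mutually correlated variables and one uncorrelated variable.
data = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [1.0, -1.0, 0.0, -1.0, 1.0],
])
adjacency = adjacency_from_correlation(data, threshold=0.8)
print(degree(adjacency))
print(clustering_coefficient(adjacency))
```

Degree and clustering are local measures; global measures such as characteristic path length can be computed from the same adjacency matrix.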

The first steps in this direction can be found in several approaches to mathematical modeling that have been developed to infer relationships between large data sets derived from complex biological systems. Biological systems are highly nonlinear and, thus, mathematical approaches are either qualitative, such as the graph network analysis based on correlations, or quantitative, such as the nonlinear multi-mode autonomous deterministic or stochastic system. However, one major issue needs to be addressed in order to facilitate the application of these techniques: the feature size of the measured data is far larger than the sample size, a phenomenon known as the ‘curse of dimensionality’ in data mining. Effective feature reduction in the case of qualitative approaches, or model order reduction for quantitative techniques, becomes imperative in order to reduce the computational complexity and enhance the generalization capabilities of the proposed model. The field of transcriptomics has benefited greatly from the development of computational models to determine gene regulatory networks (GRNs) (Hecker et al, 2009), but the techniques are applicable to proteomic and glycomic data sets as well. In the following section, we will present some modern techniques that pertain to nonlinear quantitative and qualitative model reduction.
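The dimensionality problem can be illustrated with a small principal component analysis, used here only as a generic example of feature reduction and not as the specific method of the cited studies. With 6 samples and 500 measured features (synthetic random numbers standing in for omics data), the centered data matrix has rank at most 5, so five components carry all of the variance regardless of how many features were measured.

```python
import numpy as np

def pca(x, n_components):
    """PCA by singular value decomposition; rows are samples, columns are features."""
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)           # fraction of variance per component
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 500))                 # synthetic: 6 samples, 500 features
scores, explained = pca(x, n_components=2)

print(scores.shape)                           # samples described by 2 features
print(np.round(explained, 3))                 # last entry is numerically zero
```

The reduced score matrix can then feed a correlation network or a quantitative model with far fewer parameters to estimate than the original feature set would require.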

Approaches to Model GRNs

GRNs represent gene circuits in which genes regulate their own activity and that of other genes through protein expression. Many GRNs are described by complex models that are difficult to analyze and control (Hecker et al, 2009). The large-scale nature of these systems and the complexity of the underlying models call for reduced-order models to facilitate their analysis. Balanced truncation is a popular method for model reduction, as it is relatively simple and yields a qualitatively accurate system. The interpretation of most balancing techniques (Scherpen, 1993) is based on the concepts of past input energy (controllability) and future output energy (observability). A GRN system is controllable if a given state can be reached under the influence of an input signal, and it is observable if the outputs carry enough information to reconstruct all possible states of the system. The idea behind transforming a system into balanced form is to make it easy to detect and remove state components of the initial system that contribute little to its input-output behavior, which yields a reduced-order model. For linear systems, finding a balancing coordinate transformation via solutions of the controllability and observability Lyapunov equations (Hamilton, 1994) is straightforward. For nonlinear systems, however, the corresponding equations are almost impossible to solve and balancing becomes, in general, a challenging task. Carleman bilinearization (Mohler, 1973; Brockett, 1976; Meyer-Baese et al, 2012a, 2012b) facilitates the representation of a nonlinear system in bilinear form. The idea behind the Carleman bilinearization procedure is to approximate the nonlinear function in the GRN by a Taylor expansion. This procedure preserves a certain contribution of the nonlinearity of the system in the resulting simplified form. The achieved reduction represents a simple estimate for the additional parameters employed and is, at the same time, computationally inexpensive for deterministic GRNs under the assumption of a bounded input.
The resulting reduced model has potential applications in reverse engineering and robust biosynthetic GRN design.
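For the linear case, the balancing step described above can be sketched in a few lines. The following is a minimal illustration using the standard square-root method, not the procedure of the cited studies; the three-gene cascade (matrices A, B, C) and the function name balanced_truncation are hypothetical, and the system is assumed to be stable, controllable, and observable so that both Gramians are positive definite.

```python
import numpy as np
from scipy.linalg import cholesky, solve_continuous_lyapunov, svd


def balanced_truncation(A, B, C, r):
    """Reduce a stable linear system dx/dt = Ax + Bu, y = Cx to order r."""
    # Controllability Gramian P solves A P + P A^T + B B^T = 0.
    P = solve_continuous_lyapunov(A, -B @ B.T)
    # Observability Gramian Q solves A^T Q + Q A + C^T C = 0.
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    # Square-root balancing: factor both Gramians, then take an SVD.
    Lp = cholesky(P, lower=True)  # P = Lp Lp^T
    Lq = cholesky(Q, lower=True)  # Q = Lq Lq^T
    U, hsv, Vt = svd(Lq.T @ Lp)   # hsv: Hankel singular values, descending
    scale = np.diag(hsv[:r] ** -0.5)
    T = Lp @ Vt[:r].T @ scale          # maps reduced state to full state
    T_inv = scale @ U[:, :r].T @ Lq.T  # left inverse of T
    return T_inv @ A @ T, T_inv @ B, C @ T, hsv


# Hypothetical linearized three-gene cascade: input -> gene 1 -> gene 2 -> gene 3.
A = np.array([[-1.0, 0.0, 0.0],
              [1.0, -2.0, 0.0],
              [0.0, 1.0, -3.0]])
B = np.array([[1.0], [0.0], [0.0]])  # the input acts on gene 1 only
C = np.array([[0.0, 0.0, 1.0]])      # only gene 3 is measured

Ar, Br, Cr, hsv = balanced_truncation(A, B, C, r=2)
print("Hankel singular values:", hsv)
print("Reduced state matrix:\n", Ar)
```

States associated with small Hankel singular values are both hard to reach and hard to observe, which is why they are the ones removed. For a nonlinear GRN, a sketch like this would apply only after linearization or bilinearization.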

Approaches to Model Phosphoprotein-Mediated Networks

Advances in MS techniques have facilitated the identification and quantification of very large numbers of phosphoproteins and corresponding phosphorylation sites. These identified and quantified phosphorylation sites provide a wealth of information about the biochemical status of cells and tissue, and play an increasing role in elucidating the mechanisms and importance of novel therapeutics in cancer and other diseases. However, the extensive data sets do not automatically provide deep insight into the phosphoproteome or, most importantly, into how the acquired knowledge can be used in reverse engineering and drug design. Quantitative data analysis methods, such as statistical data analysis and predictive analytical techniques, are essential to derive knowledge from the data.

For large data sets at the protein level, network-based perspectives offer a solution to this problem, and many node-edge graphs are stored in public databases. Most protein networks are assembled from the literature or inferred purely from data and cannot predict the response of cells to aberrant signals. Only a few attempts have been described to specify a pathway as a logical model that emulates the signal input/output flow through gates and is optimized against experimental data. However, for site-specific phosphorylation data sets, both the theoretical and computational integration tools are very limited. The analysis of phosphoproteomic data pertinent to systems biology and disease informatics is either data-driven or topology-driven. Data-driven analysis requires no a priori information about the signaling topology and includes standard techniques from chemometrics, such as principal component analysis and partial least squares regression. Topology-driven techniques are based on ordinary differential equations, fuzzy logic, and Boolean logic, and employ canonical pathways as a scaffold. The only theoretical study for phosphoproteomic data constructs large signaling pathways using an adaptive perturbation approach (Melas et al, 2012), with pathway optimization accomplished via integer linear programming. Two computational tools are available in the form of advanced bioinformatics platforms: NetworKIN (Linding et al, 2008) and PhosphoSiteAnalyzer (Bennetzen et al, 2012). NetworKIN predicts kinase families based on detailed linear motif analyses by employing an artificial neural network and scoring matrices. The kinase families responsible for the phosphorylation of a given substrate are predicted with high accuracy and sensitivity. PhosphoSiteAnalyzer is currently the most advanced tool and extends the capabilities of NetworKIN by using it to obtain kinase predictions for several phosphorylation sites. 
A large-scale phosphoproteomic data analysis is accomplished based on integrated bioinformatics modules. These modules perform a statistical data analysis and extract relevant biological information.
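To make the topology-driven Boolean approach mentioned above more concrete, the following minimal sketch encodes a toy signaling scaffold as logic gates and iterates it to a steady state. The node names (ligand, inhibitor, receptor, kinase, substrate_p) and the rules are purely hypothetical and do not correspond to any pathway, data set, or tool cited here.

```python
# Hypothetical scaffold: ligand -> receptor -> kinase -> phosphorylated substrate,
# with an inhibitor that blocks the kinase. Each rule is a Boolean gate.
RULES = {
    "receptor": lambda s: s["ligand"],
    "kinase": lambda s: s["receptor"] and not s["inhibitor"],
    "substrate_p": lambda s: s["kinase"],
}


def steady_state(inputs, max_steps=20):
    """Synchronously update all nodes until the network state stops changing."""
    state = {**inputs, **{node: False for node in RULES}}
    for _ in range(max_steps):
        updated = dict(state)
        for node, rule in RULES.items():
            updated[node] = rule(state)  # every gate reads the previous state
        if updated == state:
            return state
        state = updated
    raise RuntimeError("no fixed point reached within max_steps")


for ligand in (False, True):
    for inhibitor in (False, True):
        result = steady_state({"ligand": ligand, "inhibitor": inhibitor})
        print(f"ligand={ligand}, inhibitor={inhibitor} -> "
              f"substrate phosphorylated: {result['substrate_p']}")
```

In a fitted logical model, the gates themselves would be selected by optimization against measured phosphorylation data; here they are fixed by hand for illustration only.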

Future research initiatives in theoretical phosphoproteomic modeling must focus on mathematical formulations that allow the prediction of therapeutic targets or of a desired response to a given stimulus. Computational bioinformatic tools should provide improved solutions for facilitating complex kinase-substrate network analysis in a user-friendly, integrative manner.

Identification of Novel Interactions Based on Compressed Graph Networks

The analysis of transcriptomic, proteomic, and metabolomic data sets with existing knowledge-based tools, such as IPA, can only reveal interconnected pathways that are already represented in the scientific literature. New models are needed to ‘connect the dots,’ making hidden connections between data sets obvious. As physicist Richard Feynman expressed it, ‘There is a rhythm and a pattern between the phenomena of nature which is not apparent to the eye, but only to the eye of [mathematical] analysis.’ The Gaussian Graph Model (GGM) technique is one approach to create interrelated ‘omics’ graph networks and determine novel correlations (Mueller et al, 2011). Reproducible quantitative changes in (phospho)proteins, transcripts, and metabolites in response to disease or other perturbations can contain correlations that cannot be derived by intuitive reasoning. Visualization by generation of Gaussian Graph Maps is a powerful technique to find novel correlations and can be applied to derive new knowledge about the function of activators or inhibitors in brain-derived tissues or cells.

To identify the effects of drug treatment on lipid levels in glioma cells, lipidomic data sets from control and treated cells were acquired (He et al, 2007). In contrast to simple Pearson’s correlation networks, the partial correlations of GGMs identify only direct correlations, whereas indirect associations are eliminated, as shown in Figure 3. By implementing disease-driven GGM, we revealed GBM-specific lipid correlations to advance biomedical research on novel gene therapies. A summary of our results is found in Figure 4.
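The difference between Pearson’s correlations and the partial correlations underlying a GGM can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the analysis of Mueller et al (2011): three synthetic variables form a chain a → b → c, and partial correlations are obtained by inverting the sample covariance matrix. Real omics data with more variables than samples would require a regularized (shrinkage) covariance estimate instead of a plain inverse.

```python
import numpy as np


def partial_correlations(X):
    """Partial correlation matrix for data X (rows: samples, columns: variables)."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)  # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Synthetic chain a -> b -> c: a and c are related only through b.
rng = np.random.default_rng(0)
n = 2000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
X = np.column_stack([a, b, c])

pearson = np.corrcoef(X, rowvar=False)
pcor = partial_correlations(X)
print("Pearson correlation a-c:", round(pearson[0, 2], 2))
print("Partial correlation a-c given b:", round(pcor[0, 2], 2))
```

The Pearson correlation between a and c is high because both depend on b, whereas their partial correlation is close to zero, so a GGM would draw edges a–b and b–c but not a–c.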

Figure 3

The raw lipidomic data form a matrix of samples over variables. (a) The samples are the individual perturbations, which are grouped into control samples and the sample(s) of interest to the study, here simply called ‘disease’. Partial correlations of all variables are obtained and later evaluated with respect to statistical significance. (b) To investigate whether a significant partial correlation is specific for the disease sample, partial correlations (as in a) were calculated for the entire data set, as well as for data sets in which one sample at a time was left out. Unless a correlation is significant in all Gaussian Graph Models (GGMs), it is considered disease-specific. This figure is reproduced with the permission of the publisher (Mueller et al, 2011).

Figure 4

Lipids specifically regulated when a glioma cell was effectively perturbed. (a) Relative numbers of disease-specific and -unspecific lipid–lipid partial correlations in the Gaussian Graph Models (GGMs). Analysis of the entire data set is named ‘conventional’ GGM with respect to disease specificity. (b) Disease-relevant GGM, which is associated with the combined perturbation by WT-p53 adenoviral transfection before SN38 chemotherapy in U87 glioma cell lines. (c) Modularity matrix calculated by using the lipid species as class labels for the GGM in b. PE, phosphatidylethanolamine; PG, phosphatidylglycerol; PI, phosphatidylinositol; PS, phosphatidylserine. This figure is reproduced with the permission of the publisher (Mueller et al, 2011).

We have previously applied graph-clustering techniques to determine correlations in the glycomic and glycotranscriptomic responses of glioma stem cells to pharmacological treatment with a STAT3 phosphorylation inhibitor or fetal bovine serum (Goerke et al, 2011), yielding characteristic fingerprints of glioma stem cell responses that could not have been determined by searches of the scientific literature. We developed a novel technique to replace standard correlation networks to better reveal the topological properties reflecting the response to therapeutic interventions (Steuer et al, 2003; Steuer, 2006). Traditional correlation networks are built by computing the Pearson’s correlation coefficients between the nodes of the graph, which correspond to the distinct measurements, and establishing an edge if the correlation value exceeds a specified threshold.
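The traditional construction described above can be written down directly. The following minimal sketch uses synthetic data and is not the authors’ pipeline; the function name correlation_network is hypothetical, and thresholding the absolute value of the coefficient (so that strong negative correlations also form edges) is an assumption of this example.

```python
import numpy as np


def correlation_network(X, threshold):
    """Return the Pearson correlation matrix and a Boolean adjacency matrix.

    X has one row per sample and one column per measured analyte (node).
    """
    corr = np.corrcoef(X, rowvar=False)
    adjacency = np.abs(corr) > threshold  # edge if |r| exceeds the threshold
    np.fill_diagonal(adjacency, False)    # no self-loops
    return corr, adjacency


# Synthetic data: analyte 1 tracks analyte 0 closely, analyte 2 is independent.
rng = np.random.default_rng(1)
x0 = rng.normal(size=200)
x1 = x0 + 0.1 * rng.normal(size=200)
x2 = rng.normal(size=200)
X = np.column_stack([x0, x1, x2])

corr, adjacency = correlation_network(X, threshold=0.9)
print(adjacency.astype(int))
```

The choice of threshold determines the density of the resulting graph, and any thresholded Pearson network still retains indirect associations.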

Our paradigm is based on graph-clustering techniques as an alternative to standard correlation networks for the analysis of therapeutic outcomes for GBM cells. Global multivariate approaches become imperative to comprehend the potential dysregulation of normal cellular responses in disease, to characterize how cells respond to various therapeutic interventions, and to identify causal relationships involving multiple biological analytes. Figure 5 shows an example of such an emerging graph network structure.

Figure 5

The clustered glycolipids in treatment with WP1193 (threshold of 0.9). The edges are colored on a linear HSV scale (from blue=cold to red=hot) by the absolute value of the edge weight, and the size of each node is proportional to the sum of its incident absolute edge weights. In this difference graph, a node is large if it differs strongly in its correlations to other nodes in the WP1193 treatment. Small nodes roughly preserve their correlation behavior in this treatment. This figure is reproduced with the permission of the publisher (Goerke et al, 2010).

Algorithms that are robust and efficient for large-scale experiments and drug design are an active field of development in mathematics and are likely to become standard tools for biomedical scientists seeking novel correlations between dysregulated pathways in neurological and psychiatric disease.

FUTURE DIRECTIONS AND CLINICAL IMPLICATIONS

The recent advancements in human genomic research are essential to other global biological studies, but not completely sufficient to describe complex underlying phenomena associated with neurological and psychiatric disorders. The application of integrative biological techniques in the realm of neuropsychiatry and neuropharmacology is increasing in number and frequency in the scientific literature. The success of these endeavors will require new collaborations between basic science and clinical teams. Through the acquisition and integration of quantitative data sets from two or more biomolecular subsets, synergies arise and promise to provide the field with new therapeutic targets and biomarkers of disease and drug response.

FUNDING AND DISCLOSURE

Dr Priebe is the founder and a shareholder of Moleculin, IntertechBio, and Houston Pharmaceuticals. He is also the cofounding scientist, shareholder, and Scientific Advisory Board member of Reata Pharmaceuticals, and has received financial compensation. Dr Moskal is the founder of Naurex. He has founders’ shares of stock in the company. He receives financial compensation as a consultant. Dr Kroes has been a consultant for Naurex for the past 3 years, and has received financial compensation and stock. Dr Nilsson is a former employee of Pfizer Global Research and Development (2007–11). Drs Conrad, Emmett, and Meyer-Baese have no conflicts of interest to disclose.