Trends in the characteristics of human functional genomic data on the gene expression omnibus, 2001–2017

Liu, Daniel D.; Zhang, Lanjing

doi:10.1038/s41374-018-0125-5

Article
Published: 11 September 2018

Laboratory Investigation volume 99, pages 118–127 (2019)

Abstract

The gene expression omnibus (GEO) is the world’s largest public repository of functional genomic data. Despite its broad use in secondary genomic analyses, the temporal trends in the characteristics of genomic data on GEO, including experimental procedures, geographic origin, funder(s), and related disease, have not been examined. We identified 75,376 Series deposited to the GEO during 2001–2017 and built a database of all human genomic data (39,076 Series, 51.8% of all Series). Using the associated publications, we obtained funding information and identified the related disease area. Of the Series with classified disease areas, the two most common were cancer (n = 12,688, 32.5%) and immunologic diseases (n = 2,393, 6.1%), while the percentages of all other disease areas were below 5%, including neurological diseases (n = 1,733, 4.4%), infectious diseases (n = 1,225, 3.1%), diabetes (n = 828, 2.1%), and cardiovascular diseases (n = 299, 0.8%). In recent years, there has been a significant increase in the use of high-throughput sequencing (HTS), protein array and multiple-platform technologies, as well as in the proportion of North American deposits. Compared to those from other regions, North American deposits appeared to lead the shift from array-based to HTS technologies (odds ratio [OR] = 3.39, 95% confidence interval [CI]: 3.23–3.55, P = 9.40E−323), and were less likely to focus on a major disease area (OR = 0.64, 95% CI: 0.61–0.67, P = 5.02E−107), suggesting a greater emphasis on basic science in North America. Furthermore, the Series utilizing HTS were less likely to be disease-classified than those using other technologies (OR = 0.39, 95% CI: 0.37–0.41, P = 1.00E−322), suggesting a preferential use or adoption of HTS in basic science settings. Finally, funding from the NHGRI, NCI, NIEHS, and NCRR resulted in a higher number of GEO Series per grant than other NIH institutes, demonstrating different preferences for genomic studies among awardees of NIH institutes. Our findings demonstrate geographic, technological, and funding disparities in the trends of GEO deposit characteristics.

Introduction

The gene expression omnibus (GEO) is the world’s largest public repository of functional genomic data, founded and run by the US-based National Center for Biotechnology Information (NCBI) within the National Library of Medicine at the National Institutes of Health (NIH) [1]. Together with its European counterpart, ArrayExpress [2], GEO is central to fostering reproducibility and open access in genomic research [3].

GEO data are classified into four entity types: Platform (GPL), Sample (GSM), Series (GSE), and DataSet (GDS) [1]. Platform (GPL) records detail the specific technology or technologies used to obtain data of a given sample. Sample (GSM) records describe the experimental output of one individual sample. Series (GSE) records consist of a group of related Samples within an experiment. Finally, DataSet (GDS) records are the Series that have been curated by GEO staff, normalized to be biologically and statistically comparable.

Buried within the metadata of GEO deposits, however, lie broader trends in the research ecosystem. Open-access genomic databases on human samples are critical for future advances in oncology and medicine, and have expanded significantly in the past decade [4,5,6,7,8,9,10]. However, to date, there have been no in-depth analyses of the trends in functional genomic data on GEO or ArrayExpress, despite their growing importance and volume. Such information could prove especially useful for research on genomic medicine [11], public health [12, 13], and science funding and policy [14].

Here, we developed a database of human Series (GSE) and their associated metadata, and identified the temporal trends in genomic data growth on GEO. Only some of these metadata were readily available from the GEO browser; the disease of interest was extracted from experiment summaries, and funding data were extracted from the associated publications. Probing this database yielded several new insights into the technology, geographic origin, and research focus of functional genomic studies. Most prominently, we observed a rapid adoption of high-throughput sequencing (HTS) in North America, alongside a shift toward basic research in humans.

Materials and methods

Metadata extraction

We identified and included human GEO Series using the organism keyword “Homo sapiens”, without any other search criteria, in July 2017, and again in January 2018 for updates. Metadata on all human GEO Series (GSE) were downloaded from the GEO repository browser, including accession codes, title, Series type, release date, and associated curated GDS. Geographic origin (i.e., the corresponding author’s affiliation on the record) and experimental summaries were extracted from each Series’ accession display page using a custom web scraper. For Series with one or more associated publications, further metadata were extracted from MEDLINE, a bibliographic database indexed by the National Library of Medicine. We extracted the grant numbers from the GR field and the medical subject headings from the MH field. Only the Series uploaded on or before 31 December 2017 were included in the analyses.
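As an illustration of the MEDLINE step only, the sketch below reads the GR and MH fields from one record in the MEDLINE display format, where each field starts with a tag of up to four characters followed by "- " and continuation lines are indented by six spaces. It is not the authors' scraper; the function name `parse_medline_record` and the sample record (PMID, title, and grant numbers) are hypothetical.

```python
from collections import defaultdict


def parse_medline_record(text):
    """Collect the values of every field in one MEDLINE-format record."""
    fields = defaultdict(list)
    tag = None
    for line in text.splitlines():
        if line[4:6] == "- ":
            # A new field: the tag occupies the first four columns.
            tag = line[:4].strip()
            fields[tag].append(line[6:].strip())
        elif tag and line.startswith("      "):
            # A continuation line belongs to the most recent field value.
            fields[tag][-1] += " " + line.strip()
    return dict(fields)


# Hypothetical record, for illustration only.
sample = """PMID- 00000000
TI  - A hypothetical title that wraps
      onto a second line.
GR  - R01 CA000000/CA/NCI NIH HHS/United States
GR  - P30 ES000000/ES/NIEHS NIH HHS/United States
MH  - Humans
MH  - Gene Expression Profiling
"""

record = parse_medline_record(sample)
grants = record.get("GR", [])   # grant strings from the GR field
mesh = record.get("MH", [])     # medical subject headings from the MH field
```

Records without grant information simply yield an empty `grants` list, which matches the observation that funding data exist only for Series with indexed publications.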

Data curation

From the raw metadata, certain fields were extracted to facilitate analyses. The Series type indicates both the general application (e.g., expression profiling or SNP genotyping) and the technology used (e.g., array or HTS). Due to the large number of such combinations, we separated the application and technology for individual analysis.
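A minimal sketch of this separation is shown below, assuming that Series types follow the GEO pattern "application by technology" (for example, "Expression profiling by array") and that a Series with several types lists them separated by semicolons. The function names, the "unspecified" label, and the multiple-platform rule are assumptions made for illustration, not the authors' code.

```python
def split_series_type(series_type):
    """Split a Series type string into parallel lists of applications and technologies."""
    applications, technologies = [], []
    for part in series_type.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the last " by " so the application keeps any earlier text.
        application, separator, technology = part.rpartition(" by ")
        if separator:
            applications.append(application)
            technologies.append(technology)
        else:
            # Types such as "Other" name no technology.
            applications.append(part)
            technologies.append("unspecified")
    return applications, technologies


def is_multiple_platform(series_type):
    """Assumed rule: more than one distinct technology means multiple platforms."""
    _, technologies = split_series_type(series_type)
    return len(set(technologies)) > 1


apps, techs = split_series_type(
    "Expression profiling by array; Expression profiling by high throughput sequencing"
)
```

How narrower labels such as "SNP array" or "protein array" are grouped under broader technology classes is a further choice that the text does not specify.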

We classified each Series into one of six broad disease areas using a keyword-based classification strategy: cancer, cardiovascular diseases, diabetes, immunologic diseases, infectious diseases, and neurologic diseases. Briefly, we scanned each Series’ summary for keywords relating to each disease classification (Supplementary Table 1), and assigned the Series to the area with the greatest number of keyword hits. Those with no keyword hits were categorized as “unclassified.”
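The keyword strategy can be sketched as follows. The keyword lists here are hypothetical placeholders (the actual lists are in Supplementary Table 1), and two details are assumptions because the text does not state them: hits are counted as case-insensitive substring matches, and ties go to the area listed first.

```python
# Hypothetical keyword lists for illustration; not the study's Supplementary Table 1.
KEYWORDS = {
    "cancer": ["cancer", "tumor", "carcinoma", "leukemia"],
    "cardiovascular diseases": ["cardiovascular", "heart failure", "atherosclerosis"],
    "diabetes": ["diabetes", "diabetic", "insulin resistance"],
    "immunologic diseases": ["autoimmune", "lupus", "rheumatoid arthritis"],
    "infectious diseases": ["infection", "virus", "bacterial"],
    "neurologic diseases": ["alzheimer", "parkinson", "epilepsy"],
}


def classify_summary(summary, keywords=KEYWORDS):
    """Return the disease area with the most keyword hits, or "unclassified"."""
    text = summary.lower()
    hits = {
        area: sum(text.count(word) for word in words)
        for area, words in keywords.items()
    }
    # max() returns the first area with the highest count (assumed tie-break).
    best_area = max(hits, key=hits.get)
    return best_area if hits[best_area] > 0 else "unclassified"


label_a = classify_summary("Expression profiling of breast cancer tumor samples")
label_b = classify_summary("Chromatin accessibility in cultured fibroblasts")
```

Because every Series without a keyword hit falls into "unclassified", that group mixes basic science studies with diseases outside the six areas, as the Results note.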

From the grant numbers, we parsed out the specific National Institutes of Health (NIH) institute(s) funding each grant, or recorded “other” for non-NIH funders. During the data analysis, if a Series was funded by more than one NIH institute or center, each was counted once. If a Series was funded by two grants from the same institute, that institute was counted twice.
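The counting rule amounts to one count per grant, which the sketch below illustrates. It assumes grant strings laid out as "grant number/institute code/agency/country" with an agency such as "NCI NIH HHS"; that layout, the function names, the Series labels, and all grant numbers are assumptions for illustration rather than the authors' parser.

```python
from collections import Counter


def institute_of(grant_field):
    """Return the NIH institute acronym for one grant string, or "other"."""
    parts = [part.strip() for part in grant_field.split("/")]
    if len(parts) >= 3:
        agency = parts[-2].split()
        # Assumed NIH pattern: "<institute> NIH HHS" before the country.
        if len(agency) >= 3 and agency[-2:] == ["NIH", "HHS"]:
            return agency[0]
    return "other"


def count_institutes(series_grants):
    """Add one count per grant, so two grants from one institute count twice."""
    counts = Counter()
    for grants in series_grants.values():
        for grant in grants:
            counts[institute_of(grant)] += 1
    return counts


# Hypothetical Series and grants.
series_grants = {
    "GSE_A": [
        "R01 CA000001/CA/NCI NIH HHS/United States",
        "R01 CA000002/CA/NCI NIH HHS/United States",
        "R01 AG000003/AG/NIA NIH HHS/United States",
    ],
    "GSE_B": ["000000/Hypothetical Foundation/United Kingdom"],
}
counts = count_institutes(series_grants)
```

Here the first Series contributes two counts to NCI and one to NIA, and the second contributes one count to "other".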

Statistical analysis

Statistical analyses, including Fisher’s exact test, were performed using MATLAB (Version R2017a, March 2017, MathWorks). The Joinpoint Regression Program (Version 4.5.0.1, June 2017, Statistical Research and Applications Branch, National Cancer Institute, Bethesda, MD, USA) was used to analyze the trends in the number of deposited Series per annum and in subgroups, from which annual percent change (APC) values were computed [15]. Model selection was based on permutation tests with log transformation; an overall P value < 0.05 was considered significant, and the number of randomly permuted data sets was 4499. Up to two joinpoints were allowed. All P values were two-sided.
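For readers unfamiliar with the APC, the sketch below shows how it is obtained within a single segment: a straight line is fitted to the natural log of the yearly counts, and the slope b is converted to a percentage as 100 × (exp(b) − 1). This is a simplified stand-in written for illustration; it does not search for joinpoints or run permutation tests as the Joinpoint Regression Program does, and the counts are synthetic.

```python
import math


def annual_percent_change(years, counts):
    """Fit ln(count) = a + b * year by least squares and return 100 * (exp(b) - 1)."""
    n = len(years)
    logs = [math.log(count) for count in counts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(years, logs)
    ) / sum((x - mean_x) ** 2 for x in years)
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic counts that grow by exactly 20% per year.
years = [2009, 2010, 2011, 2012, 2013]
counts = [100 * 1.2 ** i for i in range(len(years))]
apc = annual_percent_change(years, counts)
```

Under this definition, an APC of 43.6 means that the fitted number of Series grew by a factor of 1.436 each year within the segment.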

Results

Of the 75,376 Series deposited on GEO between 2001 and 2017, a total of 39,076 (51.8%) were from human samples. Raw data for the human Series are summarized in Table 1. Fig. 1 shows the descriptive statistics of the Series by geographic origin, disease classification, genomic application, and technology. A slight majority of Series (54%) originated from North America, followed by Europe (28%) and Asia (15%) (Fig. 1a). Around 48% of Series could be classified into one of six major disease categories: in descending order, cancer (30%), immunologic diseases (9%), neurologic diseases (4%), infectious diseases (3%), diabetes (2%), and cardiovascular diseases (1%) (Fig. 1b). The remaining “unclassified” Series consisted mostly of basic science studies and some less prevalent diseases. Genomic application was dominated by expression profiling (62%) (Fig. 1c). The majority of the Series were collected using array technologies (58%) or HTS (26%) (Fig. 1d).

Table 1 Trends in the characteristics of the functional genomic data deposited in the gene expression omnibus (GEO) from 2001–2017

We next sought to discover trends in these data over time. With regard to the number of Series deposited per year, we identified two segments of growth (one joinpoint), namely 2001–2009 (APC = 43.6, P < 0.001) and 2009–2017 (APC = 20.3, P < 0.001). Sharp fluctuations were found in the number of DataSets (GDS) curated from each year (Fig. 2). GDS curation grew rapidly from 2001 to 2006, when it peaked at 193, but very few Series were curated from 2008 to 2010. In 2011 there was a sudden jump to 200 GDS, but the number has since dropped to zero.

There were also trends in the geographic origin of Series (Fig. 3). When GEO was launched, the vast majority of submitted Series originated from North America. With each passing year, however, Europe and Asia represented an increasingly large proportion of submitted Series. This trend reversed in 2015, after which the proportion of North American Series sharply increased (Fig. 3a). Analysis of the raw number of Series per year shows that European deposits have plateaued since around 2012, while those from other regions are still steadily growing (Fig. 3b).

Given the rapidly evolving nature of genomics, it is perhaps unsurprising that the genomic technologies used to produce the deposited human genomic data changed over time. While array-based technologies initially predominated, HTS overtook them in 2016 (Fig. 4a). The number of HTS Series deposited per year has been increasing exponentially (APC = 79 for 2009–2017, P < 0.001), while arrays have nearly plateaued in recent years (APC = 3.4 for 2011–2017, P = 0.07) (Fig. 4b, Supplementary Table S2). There has also been a sustained increase in the number of Series using “other” technologies (APC = 59 for 2001–2017, P < 0.001), possibly reflecting the growing number of emerging functional genomic techniques. Interestingly, Series originating from North America had more than three times the odds of using HTS technology compared to those from other regions (OR = 3.39), a gap that widened dramatically after 2015 (OR₂₀₁₇ = 5.52) (Fig. 4c, Table 2).
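To make the odds ratios concrete, the sketch below computes an OR and an approximate 95% CI from a 2 × 2 table of region (North America vs. other) by technology (HTS vs. other). The text does not state how the confidence intervals were obtained, so the Wald interval on the log scale used here is an assumption, and the cell counts are hypothetical rather than taken from Table 2.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for the 2x2 table [[a, b], [c, d]].

    a: North America, HTS        b: North America, other technology
    c: other regions, HTS        d: other regions, other technology
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
odds_ratio, ci_lower, ci_upper = odds_ratio_wald(300, 700, 100, 900)
```

An OR above 1 with a CI that excludes 1 indicates higher odds of HTS use among North American Series. Note that an odds ratio is not the same as a ratio of probabilities, especially when the outcome is common.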

Table 2 Association between human GEO deposits’ technology used (high-throughput sequencing technology vs. other methods) and their corresponding geographic origin (North America vs. other regions) from 2008 to 2017

We next investigated trends in the Series’ disease of interest over time. The proportion of Series that could be classified into one of the six major disease categories increased steadily from 2003 to around 2008, after which it remained steady at around 60% (Fig. 5a, b). However, starting in 2015, the proportion of Series related to a major disease area dropped sharply, down to 36% in 2017. This reflects an increase in “unclassified” Series focusing on basic science and less prevalent diseases. Nevertheless, all six disease classifications still saw steady growth in the number of Series per year (Supplementary Table S3). The decreasing proportion of disease-classified Series was due almost entirely to those from North America, which dropped from 59% disease-classified in 2015 to just 25% in 2017, while there was no change for the rest of the world (Fig. 5c, Table 3). Importantly, Series utilizing HTS were significantly less likely to be disease-classified than those using other technologies (OR = 0.39), suggesting a preferential use or adoption of HTS in basic science settings (Fig. 5d, Table 4).

Table 3 Association between human GEO deposits’ geographic origin (North America vs. other regions) and their corresponding disease area (related to a major disease area or unclassified) from 2001 to 2017

Table 4 Association between the human GEO deposits’ technology used (high-throughput sequencing vs. others) and their corresponding disease area (related to a major disease area or unclassified) from 2008 to 2017

Finally, we assessed trends in the funding sources of Series with associated publication(s) indexed in MEDLINE. Funding information could only be extracted and analyzed for Series with associated publications, accounting for ~68% of all Series. Of the grants with associated publications indexed in MEDLINE, the large majority (86%) were funded by the U.S. NIH. The NIH institutes funding the greatest proportion of Series were, in descending order, the National Cancer Institute (NCI, 33%), National Institute on Aging (NIA, 11%), National Institute of General Medical Sciences (NIGMS, 7.7%), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK, 6.7%), and National Heart, Lung, and Blood Institute (NHLBI, 6.6%). There were no significant trends in funding sources over time (Fig. 5e). However, simply assessing the proportion of Series funded by a particular agency can be misleading, as larger agencies can naturally fund more studies. To address this, we normalized the number of Series funded by each NIH institute to the total number of grants funded by that institute, giving the proportion of grants that result in a GEO Series. The overall NIH proportion was 0.063, or about one Series per 16 grants. Five institutes were above this level: unsurprisingly, the National Human Genome Research Institute (NHGRI, 0.49), followed by the NIA (0.19), NCI (0.18), National Center for Research Resources (NCRR, 0.12), and National Institute of Environmental Health Sciences (NIEHS, 0.085) (Fig. 5f). The NIH was not more likely than non-US agencies to fund disease-classified studies (OR = 1.02, P = 0.677) (Supplementary Table S4).
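The per-grant proportions above can be read as "grants per Series" by taking reciprocals, as in the short sketch below. It uses only the proportions reported in this paragraph; the variable names are illustrative.

```python
# Proportion of grants resulting in a GEO Series, as reported in the text.
series_per_grant = {
    "NIH overall": 0.063,
    "NHGRI": 0.49,
    "NIA": 0.19,
    "NCI": 0.18,
    "NCRR": 0.12,
    "NIEHS": 0.085,
}

# Reciprocal: how many grants correspond to one Series, on average.
grants_per_series = {name: 1.0 / p for name, p in series_per_grant.items()}

# Institutes whose proportion exceeds the overall NIH level.
above_overall = [
    name
    for name, p in series_per_grant.items()
    if p > series_per_grant["NIH overall"]
]
```

For the overall NIH proportion, the reciprocal is about 15.9, which is the basis for the figure of roughly one Series per 16 grants.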

Discussion

Since its inception in 2001, the GEO has become a mainstay of molecular biology research [1]. Its exponential growth reflects an evolving research environment where HTS technologies are increasingly used in human genomic studies. GEO metadata thus present a valuable resource for analyzing trends in the research ecosystem. This study, to the best of our knowledge, represents the first in-depth study of human GEO Series, encompassing geography, disease of interest, funding sources, genomic application, and technology. The summary database curated here is powerful because it allows for analysis not only of descriptive statistics and trends, but also of correlations that offer clues as to the origin of specific trends.

Curated DataSets (GDS) are valuable tools for researchers. They are normalized to be biologically comparable, and are compatible with a suite of data display and analysis tools offered by GEO. Thus, the sharp decline in GDS records in recent years may be troublesome for high-quality secondary genomic analyses. However, owing to the increasing use and availability of free bioinformatics packages [16,17,18,19,20], normalization of functional genomic data is no longer a difficult task. The curation process was therefore likely deemed to be no longer of sufficient priority to the research community.

The predominant geographic origin of the GEO data has taken some interesting turns. Although the repository was becoming increasingly international, North American deposits once again began dominating after 2015. This was due to a sharp increase in North American deposits as well as a plateau in European ones. The reason behind these trends is not clear, but it is unlikely that Europeans are now preferentially depositing on ArrayExpress, which continues to see only linear growth in its number of deposits [2].

Of note, North America appears to be spearheading the sharp rise in HTS technologies in recent years, although HTS use is increasing in other regions as well. This finding is consistent with the fact that the U.S. has invested more in genomic research than any other country in the world [14]. HTS encompasses a variety of techniques, including ChIP-seq for genome binding profiling, and RNA-seq for transcriptome profiling. RNA-seq has some advantages over array-based technologies, being superior for detecting low-abundance transcripts, biologically distinct isoforms, and genetic variants [21, 22]. As sequencing becomes cheaper per base and analysis software more widespread, RNA-seq may continue to overtake array-based technologies.

Interestingly, HTS was less likely to be used to study one of the six major disease areas. This suggests that HTS, as a relatively new technology, is still largely used for basic science and is still in the process of being adopted for more disease-specific applications (likely clinical studies). Nevertheless, this shifting technology carries important implications for the use of genomic data in clinical decisions and precision medicine [9, 23, 24]. Indeed, array-based transcriptomics are already being used for cancer diagnosis [25, 26], staging, and prognosis [27,28,29,30]. Moreover, the unique ability of RNA-seq to detect gene fusions and disease-associated isoforms appears to be an advantage for clinical tool development [31], although comparatively few RNA-seq-based clinical tests currently exist [32, 33]. As HTS becomes increasingly prevalent in the research world, clinicians will need to adapt so that they can effectively collect, analyze, and interpret data in such formats.

Of the investigated disease areas, there was a dominance of unclassified (likely basic research), cancer, and immunological diseases in the GEO deposits. The low percentages of GEO deposits in other disease areas, such as cardiovascular disease, may be concerning because the lack of sufficient human genomic data and understanding in the field may limit the development of genomics-based diagnostics and treatments [10, 31, 34, 35]. Related to this finding, the higher number of GEO Series per grant in select NIH institutes likely reflects a greater preference for and awareness of genomic data among the NIH-sponsored researchers. Perhaps a more interesting question is whether the research areas with fewer per-grant GEO deposits would need more genomic studies. This question may have profound clinical applications and present unique research opportunities. We found that cancer and basic science dominated GEO deposits, consistent with the largest funding sources (such as the NCI). On the other hand, endocrinological (diabetes, for example), neurological, and cardiovascular diseases lagged far behind. Due to the faster accumulation of human genomic data and deeper understanding of cancer, cancer biologists, pathologists, and oncologists will more likely take advantage of genome-based diagnostics and targeted therapies than their colleagues in the fields with fewer genomic data deposits [4, 5, 36]. These advances in cancer will lead to more rapid and profound benefits for cancer patients.

In conclusion, we report increasing trends in GEO deposits (1) using HTS methods, (2) originating from North America, and (3) focusing on basic science applications. Cancer, immunological diseases, and neurological diseases were the three disease areas with the most deposits on the GEO. We also show that the NHGRI, NCI, NIEHS, and NCRR had a higher number of GEO Series per grant than other NIH institutes and centers. More studies are needed to explain our observations. Our findings nonetheless may help shape future functional genomics-based research and clinical priorities.

References

1. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2013;41:D991–995.
2. Kolesnikov N, Hastings E, Keays M, et al. ArrayExpress update—simplifying data submissions. Nucleic Acids Res. 2015;43:D1113–1116.
3. Brazma A, Hingamp P, Quackenbush J, et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat Genet. 2001;29:365–71.
4. Varmus H. The transformation of oncology. Science. 2016;352:123.
5. Grossman RL, Heath AP, Ferretti V, et al. Toward a shared vision for cancer genomic data. N Engl J Med. 2016;375:1109–12.
6. Kahn SD. On the future of genomic data. Science. 2011;331:728–9.
7. Chin L, Hahn WC, Getz G, et al. Making sense of cancer genomic data. Genes Dev. 2011;25:534–55.
8. Varmus H. Genomic empowerment: the importance of public databases. Nat Genet. 2003;35(Suppl 1):3.
9. Zhang L. Biomarker discovery and validation in HCC diagnosis, prognosis, and therapy. In: Liu C, editor. Precision Molecular Pathology of Liver Cancer. Cham: Springer International Publishing, 2018. p. 95–113.
10. Lu M, Zhang J, Zhang L. Emerging concepts and methodologies in cancer biomarker discovery. Crit Rev Oncog. 2017;22:371–88. https://doi.org/10.1615/CritRevOncog.2017020626.
11. Kumar D. From evidence-based medicine to genomic medicine. Genom Med. 2007;1:95–104.
12. El-Sayed AM, Koenen KC, Galea S. Rethinking our public health genetics research paradigm. Am J Public Health. 2013;103(Suppl 1):S14–18.
13. Strauss KA, Puffenberger EG, Morton DH. One community’s effort to control genetic disease. Am J Public Health. 2012;102:1300–6.
14. Pohlhaus JR, Cook-Deegan RM. Genomics research: world survey of public funding. BMC Genom. 2008;9:472.
15. Kim HJ, Fay MP, Feuer EJ, et al. Permutation tests for joinpoint regression with applications to cancer rates. Stat Med. 2000;19:335–51.
16. Zang C, Wang T, Deng K, et al. High-dimensional genomic data bias correction and data integration using MANCIE. Nat Commun. 2016;7:11305.
17. Li P, Piao Y, Shon HS, et al. Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-seq data. BMC Bioinform. 2015;16:347.
18. Jiang Y, Oldridge DA, Diskin SJ, et al. CODEX: a normalization and copy number variation detection method for whole exome sequencing. Nucleic Acids Res. 2015;43:e39.
19. Chawade A, Alexandersson E, Levander F. Normalyzer: a tool for rapid evaluation of normalization methods for omics data sets. J Proteome Res. 2014;13:3114–20.
20. Liang K, Keles S. Normalization of ChIP-seq data with control. BMC Bioinform. 2012;13:199.
21. Zhao S, Fung-Leung WP, Bittner A, et al. Comparison of RNA-seq and microarray in transcriptome profiling of activated T cells. PLoS One. 2014;9:e78644.
22. Kukurba KR, Montgomery SB. RNA sequencing and analysis. Cold Spring Harb Protoc. 2015;2015:951–69.
23. Basu A, Carlson JJ, Veenstra DL. A framework for prioritizing research investments in precision medicine. Med Decis Making. 2016;36:567–80.
24. Aronson SJ, Rehm HL. Building the foundation for genomics in precision medicine. Nature. 2015;526:336–42.
25. Alexander EK, Kennedy GC, Baloch ZW, et al. Preoperative diagnosis of benign thyroid nodules with indeterminate cytology. N Engl J Med. 2012;367:705–15.
26. Meiri E, Mueller WC, Rosenwald S, et al. A second-generation microRNA-based assay for diagnosing tumor tissue origin. Oncologist. 2012;17:801–12.
27. Mook S, Van’t Veer LJ, Rutgers EJ, et al. Individualization of therapy using Mammaprint: from development to the MINDACT Trial. Cancer Genom Proteom. 2007;4:147–55.
28. Salazar R, Roepman P, Capella G, et al. Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. J Clin Oncol. 2011;29:17–24.
29. Erho N, Crisan A, Vergara IA, et al. Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy. PLoS One. 2013;8:e66855.
30. Knudsen BS, Kim HL, Erho N, et al. Application of a clinical whole-transcriptome assay for staging and prognosis of prostate cancer diagnosed in needle core biopsy specimens. J Mol Diagn. 2016;18:395–406.
31. Byron SA, Van Keuren-Jensen KR, Engelthaler DM, et al. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet. 2016;17:257–71.
32. Sonu RJ, Jonas BA, Dwyre DM, et al. Optimal molecular methods in detecting p190 (BCR-ABL) fusion variants in hematologic malignancies: a case report and review of the literature. Case Rep Hematol. 2015;2015:458052.
33. Doebele RC, Davis LE, Vaishnavi A, et al. An oncogenic NTRK fusion in a patient with soft-tissue sarcoma with response to the tropomyosin-related kinase inhibitor LOXO-101. Cancer Discov. 2015;5:1049–57.
34. Bertagnolli MM, Sartor O, Chabner BA, et al. Advantages of a truly open-access data-sharing model. N Engl J Med. 2017;376:1178–81.
35. Van Voorhis WC, Adams JH, Adelfio R, et al. Open source drug discovery with the malaria box compound collection for neglected diseases and beyond. PLoS Pathog. 2016;12:e1005763.
36. Scott JG, Berglund A, Schell MJ, et al. A genome-based model for adjusting radiotherapy dose (GARD): a retrospective, cohort-based study. Lancet Oncol. 2017;18:202–11.

Availability

The database of human GSE, as well as code used for analysis, is available online at https://github.com/daniel-d-liu/GEO-Trends.

Funding

The work was supported by an Initiative for Multidisciplinary Research Teams (IMRT) award from Rutgers University, Newark, NJ (to L.Z.).

Author information

Authors and Affiliations

Department of Molecular Biology, Princeton University, Princeton, NJ, 08544, USA
Daniel D. Liu
Department of Pathology, University Medical Center of Princeton, Plainsboro, NJ, 08536, USA
Lanjing Zhang
Department of Biological Sciences, Rutgers University, Newark, NJ, 07102, USA
Lanjing Zhang
Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, 08903, USA
Lanjing Zhang
Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, 08854, USA
Lanjing Zhang

Corresponding author

Correspondence to Lanjing Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material

Supplementary table

About this article

Cite this article

Liu, D.D., Zhang, L. Trends in the characteristics of human functional genomic data on the gene expression omnibus, 2001–2017. Lab Invest 99, 118–127 (2019). https://doi.org/10.1038/s41374-018-0125-5

Received: 08 February 2018
Revised: 25 July 2018
Accepted: 15 August 2018
Published: 11 September 2018
Issue Date: January 2019
DOI: https://doi.org/10.1038/s41374-018-0125-5

This article is cited by

Sparassis latifolia and exercise training as complementary medicine mitigated the 5-fluorouracil potent side effects in mice with colorectal cancer: bioinformatics approaches, novel monitoring pathological metrics, screening signatures, and innovative management tactic
- Navid Abedpoor
- Farzaneh Taghian
- Kamran Safavi
Cancer Cell International (2024)
Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: random forest and multinomial logistic regression models
- Catherine H. Feng
- Mary L. Disis
- Lanjing Zhang
Laboratory Investigation (2022)
Performance and efficiency of machine learning algorithms for analyzing rectangular biomedical data
- Fei Deng
- Jibing Huang
- Lanjing Zhang
Laboratory Investigation (2021)
