Collection

Data Access in Genomics

In this collection, we highlight commentary and reviews from across the Nature Research journals that address some of the questions and considerations surrounding access to human genomic data. We have also selected research articles from across our journals that demonstrate the power of large-scale genomic data and data access.

We are now firmly in the genomics era. The cost of large-scale genotyping and sequencing has declined to the point that human genomic association studies with hundreds of thousands of individuals are now a reality. At the same time, the biological insight that can be derived from these data dramatically increases when data are shared, used in combination with other data sets, and analyzed with new computational methods. Data sharing also allows verification of results. For these reasons, publishers, including Nature Research and BMC, and funders increasingly mandate some level of data access. However, the unique considerations for sensitive human data mean that these data cannot be made openly available without restrictions.

Panel Discussion at ASHG 2019

We invite you to join us on Thursday, October 17, 2019 at the American Society of Human Genetics 2019 Annual Meeting for a discussion with leaders in the field about the challenges and opportunities for data access in the era of genomics and biobanks. For more information about this event, see here.  

Comment & Review

Although increasingly recognized as critical to genomic research, genomic data sharing is hindered by an absence of standards regarding timing, patient privacy, use agreements, and data characterization and quality. Only after months of identifying a dataset, obtaining permission for its use, committing to terms restricting use and sharing, downloading, and assessing quality is it possible to know whether a dataset can be used. In this paper, we evaluate the barriers to data sharing based on the Treehouse experience and offer recommendations for use agreement standards, data characterization and metadata standardization to enhance data sharing and outcomes for all pediatric cancer patients.

Comment | Open Access | Scientific Data

Who benefits from sharing data? The scientists of the future do, as data sharing today enables new science tomorrow. Far from being mere rehashes of old datasets, evidence shows that studies based on analyses of previously published data can achieve just as much impact as original projects.

Editorial | Open Access | Nature Communications

Indigenous peoples are still underrepresented in genetic research. Here, the authors propose an ethical framework consisting of six major principles that encourages researchers and Indigenous communities to build strong and equal partnerships to increase trust, engagement and diversity in genomic studies.

Perspective | Open Access | Nature Communications

Melinda Mills and Charles Rahal discuss genome-wide association studies published in the last 13 years, finding increases in sample sizes, rates of discovery, and traits studied over time. They discuss limitations, including sample diversity, and make recommendations for scientists and funding bodies.

Review Article | Open Access | Communications Biology

A considerable proportion of the usefulness and interest of research publications in our field comes from the data and associated metadata. We therefore insist that data be available for peer reviewers to see and readers to use. Authors should use public permanent repositories designed for appropriately consented data.

Editorial | Nature Genetics

All disciplines should follow the geosciences and demand best practice for publishing and sharing data, argue Shelley Stall and colleagues.

Comment | Nature

A paper that analysed genetic variants in 14,000 people to identify disease-associated regions set the standard for collaborative genome-wide association studies and provided methodological advances whose effects are still felt today.

News & Views | Nature


The increasing amounts of public omics data are important and valuable resources for the research community. Here, the authors develop a set of metrics to quantify the attention and impact of biomedical datasets and integrate them into the framework of the Omics Discovery Index (OmicsDI).

Article | Open Access | Nature Communications

Genome-wide association studies have uncovered several loci associated with diabetes risk. Here, the authors reanalyse public type 2 diabetes GWAS data to fine-map 50 known loci and identify seven new ones, including one near AGTR2 on the X chromosome that doubles the risk of diabetes in men.

Article | Open Access | Nature Communications

Persistently low levels of estimated glomerular filtration rate (eGFR) are a biomarker of chronic kidney disease. Here, the authors reinterpret the genetic architecture of kidney function across ancestries, to identify not only genes, but the tissue and anatomical contexts of renal homeostasis.

Article | Open Access | Nature Communications

Most databases of genotype-phenotype associations are manually curated. Here, Kuleshov et al. describe a machine curation system that extracts such relationships from the GWAS literature and synthesizes them into a structured knowledge base called GWASkb that can complement manually curated databases.

Article | Open Access | Nature Communications

Genetic analyses of ancestrally diverse populations show evidence of heterogeneity across ancestries and provide insights into clinical implications, highlighting the importance of including ancestrally diverse populations to maximize genetic discovery and reduce health disparities.

Letter | Nature

Similarities in cancers can be studied to interrogate their etiology. Here, the authors use genome-wide association study summary statistics from six cancer types based on 296,215 cases and 301,319 controls of European ancestry, showing that solid tumours arising from different tissues share a degree of common germline genetic basis.

Article | Open Access | Nature Communications

GWAS have identified more than 500 genetic loci associated with blood lipid levels. Here, the authors report a genome-wide analysis of interactions between genetic markers and physical activity, and find that physical activity modifies the effects of four genetic loci on HDL or LDL cholesterol.

Article | Open Access | Nature Communications

Weihua Meng, Mark Adams et al. report a genome-wide association study of knee pain in the UK Biobank, identifying two loci near GDF5 and COL27A1 as significantly associated. These findings are supported by association data in additional cohorts, using self-reported osteoarthritis or radiographic knee osteoarthritis as a proxy for knee pain.

Article | Open Access | Communications Biology

Eirini Marouli et al. use Mendelian randomisation analyses to investigate the causal relationship between adult height, coronary artery disease (CAD) and type 2 diabetes (T2D) in the UK Biobank. They find that height has a causal effect on CAD, which is mediated by lung function, while there is no direct effect on the risk of T2D.

Article | Open Access | Communications Biology

Anonymization has been the main means of addressing privacy concerns in sharing medical and socio-demographic data. Here, the authors estimate the likelihood that a specific person can be re-identified in heavily incomplete datasets, casting doubt on the adequacy of current anonymization practices.

Article | Open Access | Nature Communications

Data Descriptors