Data mining

Definition

Data mining is the process of extracting potentially useful information from data sets. It draws on a suite of methods, including machine learning, visualisation and statistical analysis, to organise, examine and combine large data sets. In computational biology and bioinformatics, data mining is used to detect trends or patterns without prior knowledge of what the data mean.
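One common way to find patterns without labels is clustering. The sketch below groups numeric measurements with k-means (Lloyd's algorithm). This is only an illustration, because the definition above does not prescribe any particular algorithm. It assumes equal-length numeric feature vectors and uses plain Python so it has no dependencies. It seeds the centroids from the first k points, which is a simplification: k-means++ seeding and multiple restarts are usual in practice. The `readings` values are invented for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Cluster points into k groups with Lloyd's algorithm.

    Centroids are seeded from the first k points (a simplification).
    Returns one cluster label per point and the final centroids.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        # Stop once neither assignments nor centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical two-feature measurements forming two obvious groups.
readings = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = kmeans(readings, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Nothing in the code knows what the two features represent. The grouping emerges from distances alone, which is the sense in which data mining can detect structure before its meaning is understood.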

Latest Research and Reviews

  • Research

    MSK-IMPACT is a clinical sequencing platform able to detect genomic mutations, copy number alterations and structural variants in a panel of cancer-related genes. This assay is implemented prospectively to inform patient enrollment in genomically matched clinical trials at Memorial Sloan Kettering Cancer Center (MSKCC). Sequencing results of tumor and matched normal tissue from a cohort of >10,000 patients with detailed clinical annotation provide an overview of the genomic landscape of advanced solid cancers and bring new insights into molecularly guided cancer therapy.

    Ahmet Zehir, Ryma Benayed, Ronak H Shah, Aijazuddin Syed, Sumit Middha, Hyunjae R Kim, Preethi Srinivasan, Jianjiong Gao, Debyani Chakravarty, Sean M Devlin, Matthew D Hellmann, David A Barron, Alison M Schram, Meera Hameed, Snjezana Dogan, Dara S Ross, Jaclyn F Hechtman, Deborah F DeLair, JinJuan Yao, Diana L Mandelker, Donavan T Cheng, Raghu Chandramohan, Abhinita S Mohanty, Ryan N Ptashkin, Gowtham Jayakumaran, Meera Prasad, Mustafa H Syed, Anoop Balakrishnan Rema, Zhen Y Liu, Khedoudja Nafa, Laetitia Borsu, Justyna Sadowska, Jacklyn Casanova, Ruben Bacares, Iwona J Kiecka, Anna Razumova, Julie B Son, Lisa Stewart, Tessara Baldi, Kerry A Mullaney, Hikmat Al-Ahmadie, Efsevia Vakiani, Adam A Abeshouse, Alexander V Penson, Philip Jonsson, Niedzica Camacho, Matthew T Chang, Helen H Won, Benjamin E Gross, Ritika Kundra, Zachary J Heins, Hsiao-Wei Chen, Sarah Phillips, Hongxin Zhang, Jiaojiao Wang, Angelica Ochoa, Jonathan Wills, Michael Eubank, Stacy B Thomas, Stuart M Gardos, Dalicia N Reales, Jesse Galle, Robert Durany, Roy Cambria, Wassim Abida, Andrea Cercek, Darren R Feldman, Mrinal M Gounder, A Ari Hakimi, James J Harding, Gopa Iyer, Yelena Y Janjigian, Emmet J Jordan, Ciara M Kelly, Maeve A Lowery, Luc G T Morris, Antonio M Omuro, Nitya Raj, Pedram Razavi, Alexander N Shoushtari, Neerav Shukla, Tara E Soumerai, Anna M Varghese, Rona Yaeger, Jonathan Coleman, Bernard Bochner, Gregory J Riely, Leonard B Saltz, Howard I Scher, Paul J Sabbatini, Mark E Robson, David S Klimstra, Barry S Taylor, Jose Baselga, Nikolaus Schultz, David M Hyman, Maria E Arcila, David B Solit, Marc Ladanyi & Michael F Berger
  • Research (open access)

    Cell lines are central to cancer research, but knowing which cell lines best represent actual tumours is a major challenge. Here the authors assess 65 renal cell lines and provide a resource to help researchers select suitable lines for studying specific renal carcinoma subtypes.

    Rileen Sinha, Andrew G. Winer, Michael Chevinsky, Christopher Jakubowski, Ying-Bei Chen, Yiyu Dong, Satish K. Tickoo, Victor E. Reuter, Paul Russo, Jonathan A. Coleman, Chris Sander, James J. Hsieh & A. Ari Hakimi
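Choosing the cell lines that best represent a tumour subtype, as in the renal cell line resource above, can be framed as a similarity-ranking problem. The sketch below makes that concrete under stated assumptions. It is not the method used in that study, which this page does not describe. It assumes expression profiles measured over the same genes in the same order, and it ranks each candidate cell line by its Pearson correlation with the subtype's mean tumour profile. The gene values and the names `line_A` and `line_B` are hypothetical. A real analysis would use thousands of normalised genes and would typically also compare mutations and copy number changes.

```python
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / (sxx * syy) ** 0.5


def centroid(profiles: List[Sequence[float]]) -> List[float]:
    """Gene-wise mean of several tumour profiles (genes in the same order)."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def rank_cell_lines(
    cell_lines: Dict[str, Sequence[float]], tumours: List[Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank cell lines by correlation with the subtype's mean tumour profile."""
    target = centroid(tumours)
    scores = [(name, pearson(profile, target)) for name, profile in cell_lines.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical expression values for four genes.
subtype_tumours = [[5.0, 1.0, 3.0, 0.5], [6.0, 1.5, 2.5, 0.0]]
candidate_lines = {
    "line_A": [5.2, 1.1, 2.9, 0.3],
    "line_B": [0.5, 6.0, 1.0, 5.0],
}
ranking = rank_cell_lines(candidate_lines, subtype_tumours)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

Correlation is used here because it compares the shape of the profiles rather than their absolute levels. This makes it somewhat tolerant of global expression shifts between cultured cells and tissue, although it can still be misled by batch effects.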

News and Comment

  • Comments and Opinion

    Yasset Perez-Riverol, Mingze Bai, Felipe da Veiga Leprevost, Silvano Squizzato, Young Mi Park, Kenneth Haug, Adam J Carroll, Dylan Spalding, Justin Paschall, Mingxun Wang, Noemi del-Toro, Tobias Ternent, Peng Zhang, Nicola Buso, Nuno Bandeira, Eric W Deutsch, David S Campbell, Ronald C Beavis, Reza M Salek, Ugis Sarkans, Robert Petryszak, Maria Keays, Eoin Fahy, Manish Sud, Shankar Subramaniam, Ariana Barbera, Rafael C Jiménez, Alexey I Nesvizhskii, Susanna-Assunta Sansone, Christoph Steinbeck, Rodrigo Lopez, Juan A Vizcaíno, Peipei Ping & Henning Hermjakob
    Nature Biotechnology 35, 406–409
  • Comments and Opinion

    John Vivian, Arjun Arkal Rao, Frank Austin Nothaft, Christopher Ketchum, Joel Armstrong, Adam Novak, Jacob Pfeil, Jake Narkizian, Alden D Deran, Audrey Musselman-Brown, Hannes Schmidt, Peter Amstutz, Brian Craft, Mary Goldman, Kate Rosenbloom, Melissa Cline, Brian O'Connor, Megan Hanna, Chet Birger, W James Kent, David A Patterson, Anthony D Joseph, Jingchun Zhu, Sasha Zaranek, Gad Getz, David Haussler & Benedict Paten
    Nature Biotechnology 35, 314–316
  • Editorial

    A recent recommendation that a large number of professional data stewards be trained and employed in all data-rich research projects raises the exciting prospect that they will also conduct research on data-intensive research itself. It also prompts questions about the role of all scientists in ensuring data quality and accessibility, and about how best to measure the value of good data stewardship to science and society.

  • Editorial

    The FAIR data principles are simple guidelines for ensuring that machines can find and use data, which in turn supports data reuse by individuals. More and better research can be generated when data and algorithms, together with the tools and workflows that produced the data, are designed to be findable, accessible, interoperable and reusable.
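The FAIR principles are high-level guidelines rather than a fixed schema. Still, a toy example shows what "machine-checkable" metadata can mean in practice. The sketch below assumes a hypothetical `DatasetRecord` with a field or two standing in for each facet: an identifier and keywords for findability, an HTTP(S) URL for accessibility, a declared format for interoperability and a licence for reusability. It reports which facets have their stand-in fields filled in. The field names and the `fair_check` function are illustrative inventions, not part of any standard. Passing this check would not by itself make a data set FAIR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a shared data set."""
    identifier: Optional[str] = None   # persistent identifier, e.g. a DOI
    keywords: List[str] = field(default_factory=list)  # descriptive metadata for search
    access_url: Optional[str] = None   # where the data can be retrieved
    data_format: Optional[str] = None  # declared (ideally open, documented) format
    licence: Optional[str] = None      # terms under which the data may be reused


def fair_check(record: DatasetRecord) -> Dict[str, bool]:
    """Crude per-facet completeness check; not an official FAIR assessment."""
    return {
        # Findable: an identifier plus some descriptive metadata.
        "findable": bool(record.identifier) and bool(record.keywords),
        # Accessible: retrievable over a standard protocol (HTTP or HTTPS here).
        "accessible": bool(record.access_url)
        and record.access_url.startswith(("https://", "http://")),
        # Interoperable: the data format is at least declared.
        "interoperable": bool(record.data_format),
        # Reusable: a usage licence is stated.
        "reusable": bool(record.licence),
    }


record = DatasetRecord(
    identifier="hypothetical-id-001",
    keywords=["proteomics"],
    access_url="https://example.org/data.csv",
    data_format="text/csv",
)
print(fair_check(record))
```

In this example the record fails only the reusability facet, because no licence is given. Checks like this are cheap to run automatically at deposition time, which is one way tools can help enforce the guidelines.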