Data mining has been defined as “the nontrivial extraction of implicit, previously unknown, and potentially useful information from data”.1 In areas other than the life sciences and healthcare, data mining is a huge industry, with more than a hundred companies providing a vast array of software products and services to clients that obtain, generate, and rely on large quantities of data. The industries that rely daily on data mining include marketing, manufacturing, database providers, government, the travel industry, banking and the financial industry, telecommunications, and engineering, among others. The common theme is that these industries all have truly massive amounts of information, about their operations and also about their clients, collected in a variety of ways. To maximize the usefulness of this information, they rely on software that helps glean specific patterns and trends from the data, makes predictions, and offers simulations of future events.
It should come as no surprise that the biopharmaceutical industry is increasingly employing a variety of data-mining methodologies to help it deal with the enormous amounts of biological information it collects. This information ranges from annotated databases of disease profiles and molecular pathways to sequences, structure–activity relationships (SAR), chemical structures of combinatorial libraries of compounds, and individual and population clinical trial results. The industry is inundated with information, and data mining is the centerpiece of advanced methodologies to help it deal with this overload.2