We read with great interest the article by Jensen et al. (Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics 13, 395–405)1. This well-written Review summarized a large, complex and topical subject. To augment the article, and Table 1 in particular, we would like to point out that one of the earliest and most successful research databases to integrate diverse data sources with electronic health records (EHRs) is the Utah Population Database (UPDB) at the University of Utah, USA. Its earliest success was the identification of families with a high incidence of breast cancer; this research led to the discovery of the breast cancer genes BRCA1 and BRCA2 (Refs 2–5). A crucial component of the UPDB, and one that allowed it to probe genetic inheritance long before gene sequencing was widely available, was the linking of diverse data sources with family pedigrees. These pedigrees were originally supplied by the Utah Genealogical Society and were later updated by probabilistic matching with vital records from the Utah Department of Health (such as birth, death and marriage certificates).

As Jensen et al.1 pointed out, there is much to be gained by mining data in EHRs, especially when these records are linked to other sources. Using the UPDB, researchers at the University of Utah have made discoveries across a wide variety of disciplines in addition to oncology, including gynaecology6, autoimmune disease7, spinal abnormalities8, ophthalmology9, gastroenterology10 and gerontology11. The utility of the UPDB derives from the integration of the EHRs from two large health-care networks in the state of Utah12,13, coupled with high-quality data from the Utah Department of Health, the Utah and Idaho cancer registries and a deep, expansive family pedigree database.

Finally, we would qualify the conclusion in Jensen et al.1, which stated that “True data interoperability requires the development and implementation of standards and clinical-content models for the unambiguous representation and exchange of clinical meaning”. Like those authors, we firmly advocate the wide adoption of standards, including for clinical-content models. However, the success of the UPDB, built as it was on well-crafted probabilistic matching, shows that high-quality research can be conducted even in the absence of uniform data standards.