NIH database integrates data from clinical genetic testing labs and literature.
When Heidi Rehm surveys a patient’s genes and finds a variant she’s never seen before, she improvises. First Rehm, who directs a clinical genetics testing laboratory at Partners HealthCare in Cambridge, Massachusetts, checks through as many as ten databases to learn whether that variant has ever been associated with disease. Then she may ask colleagues at other clinical sequencing laboratories whether they have seen it. But the launch this week of a database known as ClinVar will make her job much easier — and allow her to ask more sophisticated questions.
Developed by the US National Institutes of Health (NIH) National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, ClinVar integrates dozens of existing databases. It also provides, for the first time, a central place in which clinical testing laboratories can deposit their data; most currently keep those data in-house. By aggregating such information, ClinVar’s creators hope to accelerate clinicians’ understanding of the effects of variants and to reveal whether different laboratories are interpreting the same variant in different ways.
“There is a growing recognition that a clinical lab may see a mutation once or never, so it’s better if all those data could be pooled,” explains James Ostell, chief of the information engineering branch at the NCBI and a member of the ClinVar team. Such information could not only help laboratories to improve quality, it could also prompt research on new variants.
“For everybody in the field, I think there will be a sigh of relief that this is finally happening,” says Stephen Kingsmore, who is using whole-genome sequencing to pin down genetic causes of rare diseases in newborns at the Children’s Mercy Center for Pediatric Genome Medicine in Kansas City, Missouri. He predicts that his team will turn to ClinVar every time it finds a mysterious variant in a patient sample.
ClinVar was built with computational analyses in mind. It uses standard nomenclature to describe disease, is designed to allow researchers to incorporate the data into their own software and supports searches for long lists of variants. “It provides a forum that is computer readable for people to develop tools to find connections between genetics and disease,” Ostell says.
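To make the idea of a computer-readable, pooled resource concrete, here is a minimal Python sketch of the two operations described above: looking up a long list of variants in one pass, and flagging cases in which laboratories disagree. It is an illustration only and does not use ClinVar's actual data format or interface. The variant identifiers, lab names, interpretation labels and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical pooled submissions: (variant, submitting lab, interpretation).
# Every identifier and label here is invented for illustration.
SUBMISSIONS = [
    ("GENE_A:c.100A>G", "lab_1", "pathogenic"),
    ("GENE_A:c.100A>G", "lab_2", "benign"),
    ("GENE_B:c.250C>T", "lab_1", "pathogenic"),
    ("GENE_B:c.250C>T", "lab_3", "pathogenic"),
]


def pool_interpretations(submissions):
    """Group the distinct interpretations reported for each variant."""
    pooled = defaultdict(set)
    for variant, _lab, interpretation in submissions:
        pooled[variant].add(interpretation)
    return dict(pooled)


def lookup(variants, pooled):
    """Report a status for each queried variant in a single pass."""
    report = {}
    for variant in variants:
        seen = pooled.get(variant)
        if seen is None:
            # No lab has submitted this variant.
            report[variant] = "not found"
        elif len(seen) > 1:
            # Labs disagree on how to interpret this variant.
            report[variant] = "conflicting: " + ", ".join(sorted(seen))
        else:
            report[variant] = next(iter(seen))
    return report


if __name__ == "__main__":
    pooled = pool_interpretations(SUBMISSIONS)
    query = ["GENE_A:c.100A>G", "GENE_B:c.250C>T", "GENE_C:c.7G>A"]
    for variant, status in lookup(query, pooled).items():
        print(variant, "->", status)
```

In this toy example, the first variant would be flagged as conflicting, the second would come back with a single agreed interpretation, and the third would be reported as not found, which is the situation that pooling data from more laboratories is meant to make rarer.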
Already containing data on 30,000 variants, ClinVar is expected to grow quickly because of a shift in sequencing technologies and practices. Whereas researchers typically used to screen DNA samples only for the presence or absence of known mutations, it is now becoming more common to sequence a relevant gene in its entirety, revealing a plethora of never-before-seen mutations that may or may not be harmful. ClinVar has the capacity to hold detailed information about variants and disease links — although it will not hold the full-genome data that could potentially identify a patient.
ClinVar’s success will hinge on the quantity and quality of the data deposited there. If the submission processes are too onerous, then laboratories won’t participate, says David Dimmock, who is leading a whole-genome sequencing project at the Medical College of Wisconsin in Milwaukee. But even if the data in ClinVar swell, he and others worry that new users of the database will not be sufficiently sceptical of its contents. Existing databases often classify a variant as pathogenic when in fact it is not, and ClinVar might compound the problem by aggregating such mistakes, he says.
Another concern is that ClinVar could undermine well-regarded specialist labs that evaluate variants for particular diseases. “There is no revenue stream to pay an expert to review the data because you can get the data for free in ClinVar,” says Dimmock. “This could paradoxically be a way in which the interpretation of variants ceases.”
Rehm, who is co-leading an effort to help clinical labs to submit data to ClinVar, says that she once shared that concern. What changed her mind was the fact that so many of the variants that her lab and others identify are unique. She has collected data from more than 5,000 patients, she says, yet two-thirds of the potentially clinically relevant variants she sees have never turned up in her lab before, and she often has to tell patients that their variants cannot be interpreted. The only remedy, she says, is for labs to share genetic information from a much broader patient population.
Kingsmore agrees. “Patients are going to be getting the best thinking of the community as opposed to an individual lab.”