For the first five months of Harrison Harkins’ life, doctors had little idea about what was causing his spinal malformation and inability to gain weight. But in November 2011, Matthew Bainbridge, a computational biologist at Baylor College of Medicine in Houston, Texas, found a clue. After analysing genetic data from Harrison and his parents, Bainbridge discovered that the child had an abnormal version of a gene called ASXL3.

But Bainbridge had no easy access to records of other children with ASXL3 mutations, and could not be sure that this mutation was the culprit. So he did what many scientists do: he networked. A Dutch team put Bainbridge in touch with German researchers who were treating another boy with an ASXL3 mutation — and symptoms similar to Harrison’s. After finding two further cases in an internal Baylor database, Bainbridge felt that the connection was concrete. He describes the syndrome seen in all four children, and probably caused by ASXL3 mutations, in a paper published on 5 February (M. N. Bainbridge et al. Genome Med. 5, 11; 2013).

Researchers are using new tools to increase the pace of discoveries such as Bainbridge’s. Efforts to connect sequences with symptoms (or, in genetic parlance, genotype with phenotype) have taken on increased urgency as clinical sequencing gains traction and funders put more money towards rare diseases. Researchers are planning to address the barriers to data sharing at a workshop in April, after the first International Rare Diseases Research Consortium Conference in Dublin. “There is a very positive feeling in the community that things are changing for the better,” says Peter Robinson, a computational biologist at the Charité University Hospital in Berlin.

Thousands of people have had their genomes sequenced, but a reluctance to surrender ownership of the valuable data, along with the privacy concerns of researchers and families (see ‘Families find solace in sequencing’), often keeps scientists from comparing findings. Many data are also off-limits because they are held by private diagnostic companies. “It’s a big conundrum for labs that are doing sequencing for diagnostic services,” says Michael Bamshad, chief of paediatric genetic medicine at the University of Washington in Seattle. “If they find a variant in a gene, how do they know the variant is causal?”

Patients with rare, difficult-to-diagnose disorders stand to gain the most from increased data sharing. Scientists have found the genetic roots of fewer than half of the 7,000 known rare heritable diseases, but a diagnosis can give parents an idea of a child’s outlook, and give researchers a target for drug development.

Several groups are trying to build richer databases and get them to communicate. In November, for instance, the US National Center for Biotechnology Information in Bethesda, Maryland, set up a database called ClinVar, which pools information from dozens of other databases, and allows labs to deposit data on mutations seen in individual patients (see Nature 491, 171; 2012).

Still, says Anthony Brookes, a geneticist at the University of Leicester, UK, many diagnostic labs are unable to share information with databases such as ClinVar, either because they do not have the time or the expertise in depositing data, or because they are afraid that they might compromise patient security and their own livelihoods. “It’s not their role to put data out there for researchers to play with,” he says.

Brookes is trying to address this problem with a tool called Cafe Variome, which he describes as more of a “shop window” than a database. Labs submit information about what data they have to Cafe Variome. Users can then browse the website to see what data exist, and, if interested, can follow up with the relevant labs. That allows the labs to control who sees their data, and to be credited when they are used. They are “much more comfortable sharing if they know the data are only being accessed by other diagnostic labs”, says Brookes.

Another problem is that even if database owners are willing to share data, they lack a common language for describing phenotypes, says Robinson. He is working on ways to standardize phenotype definitions for large-scale analysis.

For researchers such as Bainbridge, the tools can’t come quickly enough. His team’s final diagnosis came just a month before Harrison died last March, at the age of 9 months. “If you spent 15 minutes with the parents of any of these children, you would know that everyone should be doing this,” says Bainbridge. “This is going to help a lot of people at really low cost.”