One of biology's foremost mathematical modelers argues that now, more than ever, biology needs a Steve Jobs.
Merck. Sage. Pacific Biosciences. Eric Schadt's unconventional career has seen him land in 2011 as chair of the Mount Sinai School of Medicine's Department of Genetics and Genomic Sciences and director of the new Institute for Genomics and Multiscale Biology. Nature Biotechnology met with Schadt in his New York office to discuss his vision to spend a cool $100 million launching Mount Sinai to the forefront of the genomics revolution, and why he's enlisting Wall Street quants, user interface gurus and Facebook's former data whiz to do it.
Why did you come to New York?
Eric Schadt: Mount Sinai has the oldest genetics medical residency program in the country and a strong history in rare-disease research. Burrill Crohn was a doctor here and the first to describe Crohn's disease. Fabrazyme [agalsidase beta], one of Genzyme's biggest-selling drugs, was developed here. Now the aim is to become a leader in how genomic information and other large data sets can be leveraged in real time to better diagnose and treat patients. And there's this willingness to commit big money to see this through.
All of the pieces required to translate genomic discoveries into clinical care are embedded in a medical center like Mount Sinai. The Department of Genetics and Genomic Sciences already carries out rare-disorder testing. Sixty percent of all the newborn screening in New York runs through this department. Here they know how to turn the crank on making genetic information a part of routine clinical care, and we're training physicians on how you interact with this information. Overall, I felt like if you really wanted to figure out how information-driven medicine was going to happen, this was a perfect place to be.
What is your vision?
ES: The vision, the big, big vision, is how do we, in 10 years' time, take the digital universe of information into account and aim it at patients as they walk through the door at Mount Sinai. We want to collect as much information on as many patients as we can, integrate it, build predictive models from it, and then derive from those models more refined diagnoses, risk assessments and plans for treatment than has been possible before. We're not just interested in molecular and clinical information that can be collected on patients and populations of patients. We're interested in taking environmental context into account, such as microbiome profiling in the hospital or local environments. Then, it's a question of how we organize that scale of information.
How can these big biological data sets be organized?
ES: The ideal would be to encapsulate knowledge into holistic predictive models. Currently, researchers and clinicians have to dig through endless papers, pathway databases and very large raw data sets. Our aim is to integrate all of those data into models, say, for a given subtype of disease. In terms of basic research, if the models are predictive and informative enough, people will use them to form hypotheses that can help drive decision making in the lab. As the community conducts experiments that refute or validate these models, we'll use that information to refine the models. The vision for this type of science can be found in other areas like physics or climatology. When the Large Hadron Collider generates big data, researchers don't reconstruct de novo all of the laws of physics. Rather, they say, “Where do our existing models work? Where don't they work? How do we refine them?” That's where I think biology needs to go.
How will doctors use these models?
ES: We want to give doctors tools to use this information to inform their decision making as they diagnose and treat patients. We want to do this in a similar way that quantitative traders, the mathematicians and physicists who flooded into Wall Street, applied mathematical algorithms to big data to answer the questions, “What companies do I want to place a bet on? If so, how much do I bet? When do I bet? When do I sell?” The birth of quantitative trading brought a whole new predictive modeling dimension to Wall Street that today drives decision making on many levels. We want to enable the same transformation in medicine, except instead of asking what company to bet on, we're asking which patients require treatment, and for those requiring treatment, what's the best treatment for them.
How do you empower people to exploit information on this scale?
ES: I think the number one need in biology today is a Steve Jobs. Where is the Steve Jobs of biology who can lead the design of amazing, intuitive interfaces into these complex data? Just as a physician in a little town today can make Google queries to help figure out a given condition (and physicians do now make routine use of Google), a Google-style interface into these complex data and predictive models would be transformational in driving decision-making on how best to diagnose and treat patients. I think these interfaces are crucial. That's why we're investing heavily here in biomedical informatics and engaging communities like quantitative finance who actively think about enabling those who may not understand the big data or sophisticated models to derive meaning from them.
What have you done so far?
ES: For starters, we've set up the first CLIA (Clinical Laboratory Improvement Amendments)-certified next-generation sequencing lab in New York City. And we're developing the expertise to manage exabyte scales of data. Recently, we announced that Jeff Hammerbacher is going to be spending a significant amount of time here. Jeff was a classmate of Mark Zuckerberg and an early employee at Facebook who created and led their Data team, which developed all of the computing infrastructure and algorithms to scale the Facebook operation. He then left Facebook to start a company, Cloudera, with which we are now partnering. We also hired Patricia Kovatch from the Oak Ridge National Lab, who led the team that built the Jaguar supercomputer that in 2009 was the number one supercomputer on the planet. About half of the 30 faculty I've hired are experts in network modeling, predictive modeling or machine learning. The other half are focused on sequence informatics, disease biology and building interfaces. The idea is, “Can we create the right ecosystem so that the diversity of talent across disciplines is all in the same space, learning and working with each other?”
What problems need to be solved?
ES: One set of challenges lies in applying, in a systematic enough way, existing algorithms to build predictive models. How do we rank models against one another? How do we establish best-of-breed models and the methods that give rise to them? How do we potentially merge multiple models to get better models? Other challenges involve improving the actual algorithms. For instance, how can you integrate 'top-down' hypothesis-driven and 'bottom-up' data-driven modeling approaches into one unified mathematical framework? Today, practitioners of those approaches are largely working independently. One of our main efforts here is to marry the two to leverage the strengths of each while simultaneously minimizing their weaknesses. To me, it's lots of fun because I think all or most of the methods that exist today are wholly inadequate for handling both the scale and diversity of data that are being generated.
What broad insights are coming out of network modeling approaches?
ES: I think more and more we are seeing common networks disrupted that in turn alter biological processes associated with many diseases; this perhaps is almost more of a rule than an exception. One of the biggest learning experiences for me here has been thinking about, “How do we tie systems-based approaches to the rare disorders business?” I had always thought, “Well, for the single-gene disorders, they know the gene and corresponding pathway so it's game over.”
What I've learned since being here is that for every one of these rare disorders there is a whole spectrum of phenotypic responses that are manifested across a broad range of patients. It's not like those afflicted with single-gene disorders respond with the most horrific version of the disease. Some barely manifest any phenotype, some have extreme ones and lots are in the middle. But then they're also at risk for many other comorbidities that aren't related to the primary diagnosis. People with Gaucher's disease, for instance, are at 30-times increased risk of multiple myeloma. Why is that?
So these single-gene rare diseases are a window into what's going on in a system?
ES: Exactly, because they're harder hitting, they're an awesome window into the system. While it may be a primary hit that drives you into a disease state, there are likely many, many modifiers trying to compensate, which explain the great diversity of responses. The rare disorder clinic at Mount Sinai gets those sorts of cases, and we're going to try to understand how these hard-hitting perturbations ripple through an individual's molecular, cellular and tissue networks over time and in response to different environmental stresses.
What disruptive technologies are you paying attention to?
ES: There are several companies focused on competition-driven problem solving, such as Kaggle [http://www.kaggle.com]. They allow customers to post data online, pose data-analysis challenges that anyone can try to solve and offer cash prizes for the best solutions. These companies are finding ways of phrasing problems so that you don't have to be an expert, such as a biologist or a chemist, to understand the problem. Many times, the people who are winning those competitions know nothing, whatsoever, of biology or chemistry.
The important lesson here is the value of engaging as many of the bright minds around the world as possible to look at problems differently than the “experts.” We must leverage the world as our laboratory. I think this is a disruptive technology for helping people think outside the box. There are lots of pretty smart people in China and India who would be happy to win $20,000 to solve a pressing problem, if it could be posed in a way they could understand and engage in. I see big business in being able to rephrase problems where, if you can improve their solution, it's going to have significant financial or health benefit. One of the biggest difficulties to overcome is how to not have the experts feel threatened that they're going to be put out of a job.
Should experts in research or clinicians in primary care feel threatened?
ES: No, because although the data sets of today are big, they are not yet extensive enough to throw into a big black box and expect knowledge to emerge that is clinically actionable. You still need master integrators, domain experts, to help guide the model building and to define how data may be better connected to enhance clinical utility. Don't underestimate the power of the human mind to recognize patterns and put stories together in the context of everything a researcher or clinician knows. That won't go away, but what it takes to organize the digital universe of data is changing. We can dramatically complement the pattern recognition capabilities of the mind with predictive models that help guide decision making whether in the lab or in the clinic. The question in my mind is: how do you create an ecosystem where you're leveraging the strengths of experts and crowds, while minimizing their weaknesses?
Are there any barriers to this type of open research?
ES: One thing I underestimated is just how hard it is to get people to share. People have this sense of ownership of their data. They secured funding to generate or collect it, they carried out the study, and so one natural human instinct is to hoard the data, so they can maintain a competitive edge. How can you attract, incent and enable groups to come together for a common cause so they can go far beyond what they could do on their own? These 'social networking' issues have turned out to be the hardest problems. This was one of the main motivations for founding Sage Bionetworks, an effort I had helped launch with Stephen Friend. The computational side, ironically I think, will turn out to be an easier problem. If you were to have asked me this question four years ago, I would have said exactly the opposite.
How can those barriers be broken down?
ES: Educating and enabling patients to be their own best advocate. Those participating in research studies can help push researchers to acknowledge that they don't really own the data. People who contribute their samples, their time or their clinical characterizations often aren't benefitting directly from the research. Why don't patients get more access to their clinical information so they can, for example, share it with other researchers who may help their cause? Patients becoming advocates for their own health, I think, is a big movement that I see as very powerful for changing our current culture of research.
What are your key goals for the coming years?
ES: If I were to sum up all of our efforts, I would say we are aiming to become masters of information and enabling others to benefit from that. In 10 years' time, if we achieve that, it's going to be a lot of fun, especially here in New York City, where all of the right pieces exist to revolutionize how we diagnose and treat disease. I think from Mayor Bloomberg on down, everyone now is seeing the huge potential of this city in the healthcare and biotech space, from the many great research institutions and hospitals, to the incredibly diverse patient populations, to industries like quantitative finance that can help those in medicine think on all levels about building predictive models to inform decision making on a real-time basis. I think anybody outside of New York should not be fearful of this potential, but should take note that New York is going to become very competitive in this arena, especially as we learn how to work together more effectively.
Interviewed by H. Craig Mak, Associate Editor, Nature Biotechnology