Software programming, algorithm development and other technological skills can give scientists an edge in their fields.
Karthik Ram had to reinvent himself in 2009, as have many other scientists in this data-driven age. When he started his postdoctoral work on how climate change affects elk in Yellowstone National Park in Wyoming, he thought of himself as an ecologist. But interpreting data from satellites and the tracking collars used to follow the animals pushed him to expand that mindset.
To make sense of the shifting ecosystem, he had to hone his programming and learn how to manage mountains of information — skills that have changed the way he views himself and his career. “I use the term 'ecologist' less and less often,” he says. “Now, I mainly call myself a data scientist.”
Data science was a young field in 2009, but it has quickly matured, and now intersects with many disciplines. Although its definition varies, data science generally involves using computing tools to manage and interpret large data sets.
Ram, now at the Berkeley Institute for Data Science at the University of California, Berkeley, works with former neuroscientists, social scientists and biologists who have also moved into the world of data. “Everyone at the institute is like me,” he says. “We have computational skills and statistical skills that we can bring to bear on our particular fields.”
The demand for data scientists has expanded beyond academia to industry, health care, government and any institution that generates complex information. IBM projects that there could be more than 2.7 million US jobs in data science and analytics by 2020, a 15% increase from 2015. Numbers are similar in Europe, according to the European Data Science Academy, a training and education group that identifies and collects job advertisements in Europe seeking data-science skills. The academy has identified more than 3 million such ads since 2015, including 290,000 posted during a 3-month period this year.
For those seeking a data-scientist role, the challenge is less about finding a job than about finding the best position for their aptitudes and interests (see 'Dig into the data world'). Identifying “the right fit can be tricky”, says Amelia Taylor, a former tenured mathematician at Colorado College in Colorado Springs and now a data scientist for Zymergen, a company based in Emeryville, California, that is developing new uses for genetically engineered microbes. “Data science can look very different at different places. There are so many companies out there, it's hard to know which ones to look at.”
Too many options — when many PhD holders in other fields are facing too few — is a good problem to have. Scientists who develop the right skills and understand their opportunities can expect a rewarding, data-driven future.
A multitude of roles
The rising tide of data science has lifted many boats. Along with a surge in 'data-scientist' searches, 'data engineer' and 'data analyst' are also popular terms on job-search boards. The differences in these roles are subtle but important. “The core skill of a data engineer is building robust systems that won't fail,” explains Marc Warner, chief executive at ASI Data Science, a London-based firm that offers consulting services and a data-science fellowship programme with industry placements.
One key difference between data scientists and analysts, he says, is that scientists tend to follow data where they lead them — a 'data-first' approach — whereas analysts generally use numbers to test an established hypothesis.
At the London-based Alan Turing Institute (ATI), Mihaela van der Schaar lets data lead the way. She develops computer algorithms to help personalize treatments, prognoses and risk predictions for patients. “I believe that such techniques can transform medicine, save lives and enable scientific breakthroughs,” she says.
Founded in 2015 by five UK universities and the nation's Engineering and Physical Sciences Research Council in an effort to spawn collaborations with industry and government, the ATI embodies the interdisciplinary spirit of data science, van der Schaar says. She adds that some of the biggest and most interesting problems in data science come from unexpected places. “One of the projects that I am most involved with currently at ATI aims to develop better methods to understand and treat cystic-fibrosis patients,” she says. “This comes from neither the industry nor the government, but through a partnership with the UK Cystic Fibrosis Trust.”
Interdisciplinary connections also form the foundation of Moore–Sloan Data Science Environments, an initiative that has created data-science centres at the University of California, Berkeley, the University of Washington in Seattle and New York University. Each centre gathers data scientists from a wide range of disciplines into a single workplace. “The physical spaces are really important,” says Edward Lazowska, a computer scientist at the University of Washington. “The idea was to accelerate discovery by building bridges between those who advance the methodology of data science — researchers in mathematics, statistics and computer science — and those who put it to work in the social, physical and life sciences.”
Not all PhD programmes prepare researchers for the real world of data science, so short-term training courses are becoming increasingly popular. Taylor got her foot in the door through a seven-week fellowship with Insight Data Science, an institution based in Palo Alto, California, that connects data scientists with US companies. Its fellows have gone on to careers at Amazon, Facebook, JP Morgan and both large and small technology companies.
Taylor says that the Insight fellowship was invaluable in teaching her the skills needed to land her current job. Among other proficiencies, that training taught her to think beyond data analysis to the practical applications of the finished product. “The business-oriented thinking at Insight was very helpful,” she says. She has observed that PhD-level scientists who land data-science jobs in industry tend to struggle with the transition unless they've already had first-hand industry experience. “I had a very fast start with my company because of my ability to think about products,” she says.
Data science has also arrived at hospitals and medical centres, giving many research scientists another outlet for their skills. As part of her neuroscience training at New York University and the nearby University of Rochester, Anasuya Das, also a former Insight fellow, had to learn the coding language C++ to build software to help individuals recovering from strokes to practise visual learning using their home computers. Das also took a couple of computational-neuroscience courses that helped to spark her interest in data science as a full-time career, which she now pursues at the Memorial Sloan Kettering Cancer Center in New York City.
Das is working on a system to match patients with clinical trials. “My days vary widely, and range from doing pure software engineering to meeting with physicians about the products we're building,” she says.
Lazowska predicts that the rise of data science will eventually transform the 'publish or perish' system of science. In time, he thinks, code and data sets will become prerequisites for career advancement, just as publications are today. For now, he says, he and his colleagues are encouraging researchers to list data-science accomplishments on their CVs. They also recommend that promotion and tenure committees consider these feats as valid metrics.
Ram has loaded his CV with a wide range of data-science projects. He is currently working on a long-term effort to measure the impact of human activity on Tahiti's ecology. The questions have become more sophisticated since his Yellowstone days, but so have the tools. Instead of labouring for months over a data set, he can now get results in hours — which is not to say that data science has become easy.
“A big challenge for a lot of folks is having the bandwidth to learn some of these new tools and how to correctly apply them,” Ram says. “As more time goes by, it will become more likely that almost every principal investigator will need someone on the team who has these special data skills.” The definition and expectations of data science may shift over time, but the field is here to stay.
Cite this article
Donati, G., Woolston, C. Information management: Data domination. Nature 548, 613–614 (2017). https://doi.org/10.1038/nj7669-613a