Information management: Data domination

Journal name: Nature
Volume: 548
Pages: 613–614
DOI: 10.1038/nj7669-613a

Software programming, algorithm development and other technological skills can give scientists an edge in their fields.

Karthik Ram had to reinvent himself in 2009, as have many other scientists in this data-driven age. When he started his postdoctoral work on how climate change affects elk in Yellowstone National Park in Wyoming, he thought of himself as an ecologist. But interpreting data from satellites and the tracking collars used to follow the animals pushed him to expand that mindset.

To make sense of the shifting ecosystem, he had to hone his programming and learn how to manage mountains of information — skills that have changed the way he views himself and his career. “I use the term 'ecologist' less and less often,” he says. “Now, I mainly call myself a data scientist.”

Toby Keane/Alan Turing Institute

The Alan Turing Institute in London is an interdisciplinary hub for the growing field of data science.

Data science was a young field in 2009, but it has quickly matured, and now intersects with many disciplines. Although its definition varies, data science generally involves using computing tools to manage and interpret large data sets.

Ram, now at the Berkeley Institute for Data Science at the University of California, Berkeley, works with former neuroscientists, social scientists and biologists who have also moved into the world of data. “Everyone at the institute is like me,” he says. “We have computational skills and statistical skills that we can bring to bear on our particular fields.”

The demand for data scientists has expanded beyond academia to industry, health care, government and any institution that generates complex information. IBM projects that there could be more than 2.7 million US jobs in data science and analytics by 2020, a 15% increase from 2015. Numbers are similar in Europe, according to the European Data Science Academy, a training and education group that identifies and collects job advertisements in Europe seeking data-science skills. The academy has identified more than 3 million such ads since 2015, including 290,000 posted during a 3-month period this year.

For those seeking a data-scientist role, the challenge isn't so much finding a job as finding the best position for their aptitudes and interests (see 'Dig into the data world'). Identifying “the right fit can be tricky”, says Amelia Taylor, a former tenured mathematician at Colorado College in Colorado Springs and now a data scientist for Zymergen, a company based in Emeryville, California, that is developing new uses for genetically engineered microbes. “Data science can look very different at different places. There are so many companies out there, it's hard to know which ones to look at.”

Box 1: Dig into the data world

The spread of data science has opened up opportunities in fields ranging from finance to health care. Sorting through options can be challenging, but there are ways to ease the search. After all, data scientists are experts at finding solutions amid the noise. Here are some ways to get informed.

  • Meet-ups. Plugging into the data-science community is a great way to learn about job possibilities. The website for the European Data Science Academy (http://edsa-project.eu) lists data-science meet-ups and talks that are happening throughout Europe. Amelia Taylor, a data scientist at Zymergen, a company based in Emeryville, California, stays in touch through PyLadies, a community of female programmers who use the coding language Python. The group hosts regular meet-ups in Seattle, Washington; San Francisco, California; and other sites worldwide. Taylor adds that companies often sponsor their own meet-ups where they try to recruit talent.
  • Fellowship programmes. Schemes such as the Insight Data Science fellowships and the ASI Data Science fellowship will allow you to work on real-world projects and establish a large network of contacts. Taylor notes that the Insight website lists companies that have hired fellows in the past — a handy guide to the places that are likely to be hiring in the future, too.
  • Job postings. Company advertisements for data-science jobs often include a near-comprehensive list of qualifications that few (if any) people can meet, Taylor says. “It's like they're looking for a unicorn,” she says. If a company seems to be a good fit, and you meet at least some of the key requirements, it's OK to apply — and do it with confidence.
  • Stay in touch. “There's a lot of word of mouth in the data-science community,” Taylor says. “Don't be shy, but be concise and detailed when asking questions and reaching out.”

Too many options — when many PhD holders in other fields are facing too few — is a good problem to have. Scientists who develop the right skills and understand their opportunities can expect a rewarding, data-driven future.

A multitude of roles

The rising tide of data science has lifted many boats. Along with a surge in 'data-scientist' searches, 'data engineer' and 'data analyst' are also popular terms on job-search boards. The differences in these roles are subtle but important. “The core skill of a data engineer is building robust systems that won't fail,” explains Marc Warner, chief executive at ASI Data Science, a London-based firm that offers consulting services and a data-science fellowship programme with industry placements.

One key difference between data scientists and analysts, he says, is that scientists tend to follow data where they lead them — a 'data-first' approach — whereas analysts generally use numbers to test an established hypothesis.

At the London-based Alan Turing Institute (ATI), Mihaela van der Schaar lets data lead the way. She develops computer algorithms to help personalize treatments, prognoses and risk predictions for patients. “I believe that such techniques can transform medicine, save lives and enable scientific breakthroughs,” she says.

Founded in 2015 by five UK universities and the nation's Engineering and Physical Sciences Research Council in an effort to spawn collaborations with industry and government, the ATI embodies the interdisciplinary spirit of data science, van der Schaar says. She adds that some of the biggest and most interesting problems in data science come from unexpected places. “One of the projects that I am most involved with currently at ATI aims to develop better methods to understand and treat cystic-fibrosis patients,” she says. “This comes from neither the industry nor the government, but through a partnership with the UK Cystic Fibrosis Trust.”

Jasmine Castagna

Insight programme fellows learn data science.

Interdisciplinary connections also form the foundation of Moore–Sloan Data Science Environments, an initiative that has created data-science centres at the University of California, Berkeley, the University of Washington in Seattle and New York University. Each centre gathers data scientists from a wide range of disciplines into a single workplace. “The physical spaces are really important,” says Edward Lazowska, a computer scientist at the University of Washington. “The idea was to accelerate discovery by building bridges between those who advance the methodology of data science — researchers in mathematics, statistics and computer science — and those who put it to work in the social, physical and life sciences.”

Not all PhD programmes prepare researchers for the real world of data science, so short-term training courses are becoming increasingly popular. Taylor got her foot in the door through a seven-week fellowship with Insight Data Science, an institution based in Palo Alto, California, that connects data scientists with US companies. Its fellows have gone on to careers at Amazon, Facebook and JP Morgan, as well as at technology companies large and small.

Taylor says that the Insight fellowship was invaluable in teaching her the skills needed to land her current job. Among other proficiencies, that training taught her to think beyond data analysis to the practical applications of the finished product. “The business-oriented thinking at Insight was very helpful,” she says. She has observed that PhD-level scientists who land data-science jobs in industry tend to struggle with the transition unless they've already had first-hand industry experience. “I had a very fast start with my company because of my ability to think about products,” she says.

Health-care help

Data science has also arrived at hospitals and medical centres, giving many research scientists another outlet for their skills. As part of her neuroscience training at New York University and the nearby University of Rochester, Anasuya Das, also a former Insight fellow, had to learn the coding language C++ to build software to help individuals recovering from strokes to practise visual learning using their home computers. Das also took a couple of computational-neuroscience courses that helped to spark her interest in data science as a full-time career, which she now pursues at the Memorial Sloan Kettering Cancer Center in New York City.

Das is working on a system to match patients with clinical trials. “My days vary widely, and range from doing pure software engineering to meeting with physicians about the products we're building,” she says.

Lazowska predicts that the rise of data science will eventually transform the 'publish or perish' system of science. In time, he thinks, code and data sets will become prerequisites for career advancement, as publications are now. For now, he says, he and his colleagues are encouraging researchers to list data-science accomplishments on their CVs. They also recommend that promotion and tenure committees consider these feats as valid metrics.

Ram has loaded his CV with a wide range of data-science projects. He is currently working on a long-term effort to measure the impact of human activity on Tahiti's ecology. The questions have become more sophisticated since his Yellowstone days, but so have the tools. Instead of labouring for months over a data set, he can now get results in hours — which is not to say that data science has become easy.

“A big challenge for a lot of folks is having the bandwidth to learn some of these new tools and how to correctly apply them,” Ram says. “As more time goes by, it will become more likely that almost every principal investigator will need someone on the team who has these special data skills.” The definition and expectations of data science may shift over time, but the field is here to stay.

Change history

Corrected online 22 September 2017
An earlier version of this story mistakenly described Amelia Taylor as a former tenure-track mathematician. In fact, she was a tenured associate professor.

Author information

Affiliations

  1. Gaia Donati is a freelance writer in Windsor, UK.

  2. Chris Woolston is a freelance writer in Billings, Montana.
