Data scientists in South Africa are readying themselves for a flood of information that is due to crash over them when the country’s biggest radio telescope doubles the scale of its operations in March.
A terabyte-an-hour data deluge, which would fill more than three DVDs a minute, will flow from a network of radio dishes called the MeerKAT array. Currently consisting of 32 operational dishes, the array will expand to 64 next month.
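As a rough check of the DVD comparison above, the arithmetic can be sketched in a few lines of Python. The figures used here are assumptions, not values stated in the article: a decimal terabyte (10^12 bytes) and a single-layer DVD holding 4.7 GB. The function name `dvds_per_minute` is purely illustrative.

```python
# Rough check: how many DVDs per minute does a terabyte-an-hour stream fill?
# Assumptions (not from the article): decimal units and a 4.7 GB single-layer DVD.

TERABYTE_BYTES = 10**12        # one decimal terabyte, in bytes (assumed)
DVD_CAPACITY_BYTES = 4.7e9     # single-layer DVD capacity, in bytes (assumed)


def dvds_per_minute(bytes_per_hour, dvd_capacity_bytes=DVD_CAPACITY_BYTES):
    """Convert an hourly data rate into the number of DVDs filled per minute."""
    bytes_per_minute = bytes_per_hour / 60
    return bytes_per_minute / dvd_capacity_bytes


rate = dvds_per_minute(1 * TERABYTE_BYTES)
print(f"{rate:.2f} DVDs per minute")  # prints "3.55 DVDs per minute"
```

Under these assumptions the result is about 3.5 discs a minute, consistent with "more than three DVDs a minute". A dual-layer disc (about 8.5 GB) would give a lower figure, so the comparison depends on which disc format is meant.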
The impending flood of data is just a trickle compared with what will arrive after 2020, when international astronomers begin to expand MeerKAT to form part of the Square Kilometre Array (SKA). That will be the world’s largest radio telescope, and astronomers are trying to develop the expertise to handle torrents of data ahead of its full opening in 2026. South African data scientists also seek to transfer their expertise to areas such as Earth observation and bioinformatics.
“We are building a system that empowers scientists, so that they can be part of processing the data — a system that allows the researchers to work with the data itself and work with the analytics, as if it was on their desktops,” says astronomer Russ Taylor, who divides his time between the University of Cape Town and the University of the Western Cape in the same city.
The MeerKAT array is designed to collect relatively weak radio signals from space and combine them to extract more information. To convert it into the first phase of the SKA, engineers will initially add another 136 dishes to the MeerKAT site in the Northern Cape province in South Africa, and connect them to 130,000 antennas scattered across Western Australia.
Data from the SKA will be shared with scientists from ten partner countries. But for now, South Africa is keen to retain control of its MeerKAT data rather than exporting it to other countries that already have data-processing infrastructure, says Taylor.
This is partly because distributing data is very expensive. “Fibre optics to connect two points in urban areas cost thousands of US dollars per mile,” says Ugo Varetto, acting executive director at Australia’s Pawsey Supercomputing Centre in Perth, which crunches and stores data from existing radio telescopes dotted over Australia. “In extreme environments or underwater, that’s hundreds of thousands of dollars.”
Astronomers also place a high value on the data produced by their telescopes and don’t want to send it elsewhere, says J. J. Kavelaars, group leader at the Canadian Astronomy Data Centre in Victoria, British Columbia. “All the effort of collecting the observations is expressed in those data files. Sending those data out of your jurisdiction is like shipping diamonds overseas for cutting,” he says.
Because the astronomy data sets will be very large and will sit in geographically separate databases, scientists need to develop software tools to access and bring together these data sets in an efficient manner, says Mattia Vaccari, a data scientist at the University of the Western Cape.
Others seek to take advantage of the data-crunching capabilities that South Africa is developing. These capabilities can be used for applications such as monitoring water resources or urbanization across the continent, says Val Munsami, head of the South African National Space Agency in Pretoria.
Health care officials would also like to take advantage of the growing expertise in number crunching in South Africa. Glaudina Loots, director of health innovation in the South African government’s Department of Science and Technology, says that her unit plans to “piggyback” on the astronomy investment and data infrastructure. “Part of that is earmarked for precision medicine. If you can’t handle the data, and have to export it out of the country, then you start running into problems,” she says.
“South Africa has one of the best hands in the game at this point,” says Tony Beasley, an astronomer and head of the US National Radio Astronomy Observatory in Charlottesville, Virginia. “In terms of deployed science infrastructure, South Africa is way ahead.”
Nature 554, 286-287 (2018)