Scientific data must not be ‘balkanized’ into multiple databases, each with its own rules and restrictions.
Almost 40 years ago, GenBank and the EMBL databank started independently. They soon joined forces and, with the DNA Data Bank of Japan, formed a repository now called the International Nucleotide Sequence Database Collaboration (INSDC). China is now set to join. The INSDC has been one of the world’s most successful initiatives to collect and share scientific data (see Nature 590, 183–184; 2021). As DNA sequence data accumulate at ever-greater rates, the need for the INSDC to continue and expand has never been more urgent.
The COVID-19 pandemic is an excellent example of data sharing leading to effective science (see Nature 590, 195–196; 2021). The first sequence of the SARS-CoV-2 virus was released by Yong-Zhen Zhang on 11 January 2020 and was made completely open in the INSDC databases that same day (accession number MN908947). This enabled the development of rapid PCR-based tests for the viral RNA and jump-started vaccine development.
As international advisers to the INSDC, we call on the scientific community to help ensure that this open sharing grows to include many more types of data, so that scientists can use the INSDC to catalyse ever more biological discoveries.
Nature 591, 202 (2021)
A full list of co-signatories to this letter appears in Supplementary Information.