wwPDB biocuration: on the front line of structural biology

Biocurators, the backbone of the wwPDB, manage structural biology data deposition, quality, and integrity, and provide integral support to the research community worldwide.

Through the open-access Protein Data Bank (PDB) archive, structural biologists provide and gain access to atomic-level views of over 175,000 biological macromolecules. These structures augment our understanding of the connections between structure and function in biology. The archive continues to grow every year in both the number and the complexity of its structures, even during the COVID-19 pandemic. The PDB is widely regarded as one of the best-curated biodata resources1. It supports the structural biology community and is resourced by a team of expert biocurators who are vital to the integrity of the PDB data ecosystem. The biocurators collaborate with PDB depositors using the OneDep software system2 to standardize, validate and biocurate incoming structure data, ensuring that they are findable, accessible, interoperable and reusable (FAIR). Now in its 50th year of operation, the PDB has exemplified the FAIR principles for responsible data management since long before they were widely known and adopted3.

wwPDB Biocurator Summit held virtually in 2020 during the pandemic. Top row, from the left: Yuhe Liang (RCSB PDB), Jasmine Young (RCSB PDB), Irina Persikova (RCSB PDB). Second row, from the left: Deborah Harrus (PDBe), David Armstrong (PDBe), Brian Hudson (RCSB PDB). Bottom row, from the left: Ezra Peisach (RCSB PDB), John Berrisford (PDBe), Minyu Chen (PDBj).

Key to the long-term success and sustainability of the PDB has been the Worldwide Protein Data Bank (wwPDB) partnership4,5, formalized in 2003. The wwPDB jointly manages the single global PDB archive. Today, more than 15 biocurators work at wwPDB data centers located in the United States, Europe and Asia. Using the OneDep system, we share responsibility for managing incoming PDB structures along geographic lines, bringing diverse expertise in biochemistry, biophysics, computational chemistry, enzymology and small-molecule crystallography to the global enterprise. With training in macromolecular crystallography, nuclear magnetic resonance spectroscopy and electron microscopy, we operate on the front lines of structural biology, working closely with thousands of PDB depositors every year.

Since its launch in 1971, more than 100 biocurators have served the PDB, united by passion for science, thirst for knowledge, and interest in file formats, data dictionaries and ontologies. This is as true now as it was 47 years ago when PDB biocuration pioneer Frances C. Bernstein and others at Brookhaven National Laboratory curated data that were represented in the original PDB file format. Today, strict definitions of data types and file formats in the fully extensible PDBx/mmCIF data dictionary allow identification of errors and inconsistencies, highlight molecular properties, and connect data depositors, biocurators and data consumers. While structural biologists push the envelope by determining ever larger and more complex structures, developing novel experimental methods, or designing new validation tools, we work closely with software developers, structural biology facilities and scientific innovators to refine PDB data management practices. We also use OneDep periodically to look back and across the archive with ‘remediation efforts’ to bring previously released structure data up to modern standards.
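
As a rough illustration of how dictionary-driven checks of this kind can work, the Python sketch below validates a handful of values against a tiny, hand-written stand-in for a data dictionary. The item names follow PDBx/mmCIF naming, but the `ItemDefinition` class, the `MINI_DICTIONARY` table, the `validate` function and the constraints shown are assumptions made for illustration; they are not the actual PDBx/mmCIF dictionary or the OneDep implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ItemDefinition:
    """Hypothetical, simplified stand-in for one dictionary item definition."""
    parse: Callable[[str], object]            # converts raw text to the declared type
    allowed: Optional[FrozenSet[str]] = None  # enumerated values, if the item has any
    minimum: Optional[float] = None           # exclusive lower bound, if the item has one


# Illustrative subset only: the constraints are assumptions, not dictionary content.
MINI_DICTIONARY: Dict[str, ItemDefinition] = {
    "_cell.length_a": ItemDefinition(parse=float, minimum=0.0),
    "_exptl.method": ItemDefinition(
        parse=str,
        allowed=frozenset(
            {"X-RAY DIFFRACTION", "SOLUTION NMR", "ELECTRON MICROSCOPY"}
        ),
    ),
}


def validate(record: Dict[str, str]) -> List[str]:
    """Return one human-readable message per problem found in the record."""
    problems: List[str] = []
    for name, raw in record.items():
        definition = MINI_DICTIONARY.get(name)
        if definition is None:
            problems.append(f"{name}: not defined in the dictionary")
            continue
        try:
            value = definition.parse(raw)
        except ValueError:
            problems.append(f"{name}: {raw!r} does not match the declared type")
            continue
        if definition.allowed is not None and value not in definition.allowed:
            problems.append(f"{name}: {raw!r} is not an allowed value")
        if definition.minimum is not None and value <= definition.minimum:
            problems.append(f"{name}: {raw!r} must be greater than {definition.minimum}")
    return problems


if __name__ == "__main__":
    deposited = {"_cell.length_a": "abc", "_exptl.method": "XRAY"}
    for message in validate(deposited):
        print(message)
```

Because every check is driven by the definitions table rather than by hand-written rules per item, extending the table extends the validation, which is the property that an extensible data dictionary provides.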

The move for biocurators to work from home, away from each regional data center during the COVID-19 pandemic, was enabled by the global collaborative practices that support the wwPDB partnership. As the structural biology community stepped up to provide much-needed insights into SARS-CoV-2, wwPDB biocuration of SARS-CoV-2 protein structures was prioritized to ensure rapid public release of data while maintaining our enduring commitment to quality. More than 1,000 COVID-19-related structures are now freely available from the PDB, a year after release of the first SARS-CoV-2 structure. Together with PDB structures from the two earlier epidemics, SARS-CoV-1 (over 200 structures) and MERS-CoV (70 structures), these data are facilitating structure-guided discovery and development of anti-coronavirus drugs, vaccines and neutralizing antibodies.

Open access to PDB data has helped to expose the inner workings of the coronavirus beyond the scientific community. Throughout the COVID-19 pandemic, wwPDB biocuration staff have continued to support research, training and education worldwide. We are thrilled that our efforts have contributed to general awareness of the importance of scientific advances. As science evolves and new technologies and methods are adopted by the structural biology community, we will face increasing challenges in handling the growth in number, size and complexity of depositions to the PDB. In response, we are working with the community on three fronts: increasing the accuracy of metadata in PDB entries through greater automatic data harvesting; raising the level of automation in the biocuration pipeline to improve efficiency; and collaborating with community task forces and the mmCIF Working Group (http://wwpdb.org/task/mmcif) to extend the data model and improve validation of models and experimental data, so that the archive better supports new technologies and methods.

References

  1. Howe, D. et al. Nature 455, 47–50 (2008).

  2. Young, J. Y. et al. Database 2018, bay002 (2018).

  3. Wilkinson, M. D. et al. Sci. Data 3, 160018 (2016).

  4. Berman, H., Henrick, K. & Nakamura, H. Nat. Struct. Biol. 10, 980 (2003).

  5. wwPDB Consortium. Nucleic Acids Res. 47, D520–D528 (2019).

Acknowledgements

RCSB PDB is jointly funded by the US National Science Foundation (DBI-1832184), the US Department of Energy (DE-SC0019749) and the National Cancer Institute, National Institute of Allergy and Infectious Diseases and National Institute of General Medical Sciences of the US National Institutes of Health (R01GM133198). The Protein Data Bank in Europe is supported by the European Molecular Biology Laboratory–European Bioinformatics Institute and Wellcome Trust (104948). Protein Data Bank Japan is supported by the Database Integration Coordination Program from the National Bioscience Database Center (NBDC)–JST (Japan Science and Technology Agency), the Platform Project for Supporting in Drug Discovery and Life Science Research from AMED, and the joint usage program of Institute for Protein Research, Osaka University.

Author information

Corresponding author

Correspondence to Jasmine Y. Young.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Young, J.Y., Berrisford, J. & Chen, M. wwPDB biocuration: on the front line of structural biology. Nat Methods 18, 431–432 (2021). https://doi.org/10.1038/s41592-021-01137-z
