Wang, M. et al. Cell Syst. 7, 412–421 (2018).

Mass-spectrometry-based proteomics research has yielded huge amounts of data, which are publicly available through several dedicated repositories. Despite this wealth, the proteomics community has largely not reused these data. To facilitate such reuse, Wang et al. report the MassIVE Knowledge Base (MassIVE-KB), a tool that continuously aggregates proteomics data as they become available and provides them in an open, reusable format. The authors distilled spectral libraries from 31 TB of human proteomics data available in the MassIVE repository, applying statistical controls and keeping provenance records to ensure that the libraries are of high quality. Such libraries enable researchers to leverage previous discoveries for protein identification and quantification in their own mass spectrometry experiments, using either data-dependent or data-independent acquisition modes. Wang et al. themselves found MassIVE-KB library evidence for 430 ‘missing’ human proteins that had scant previous experimental support.