Quality control of protein reagents for the improvement of research data reproducibility

Proteins and peptides are amongst the most widely used research reagents, but their quality is often inadequate and can result in poor data reproducibility. Here we propose a simple set of guidelines that, when correctly applied to protein reagents, should provide more reliable experimental data.

The use of poor quality peptides, proteins and antibodies as experimental reagents impacts both the quality and the cost of the research carried out with them. One estimate 3 puts the level of irreproducible preclinical experiments in the US (using 2012 data) at fifty percent, equating to a staggering economic cost of $28 billion per annum in the US alone, of which thirty-six percent ($10.4 billion worth of research) was directly attributed to poor quality 'biological reagents and reference materials'. At present we are aware of only very few journals that require authors to include QC data for the proteins used as 'reagents' in their studies. This situation stands in direct contrast to, e.g., the high standards of statistical analysis and the declarations of statistical compliance required in articles submitted to high-end journals when presenting genomic, proteomic and structural data 13. With the aim of addressing this obvious imbalance, and in response to the problem of data reproducibility when protein reagents are involved, a working group composed of members of both the ARBRE-MOBIEU and the P4EU networks produced a list of recommended tests (QC Guidelines, reported in Supplementary Note 1 and accessible at https://p4eu.org/protein-quality-standard-pqs or https://arbre-mobieu.eu/guidelines-on-protein-quality-control). These guidelines were developed with reference to the available literature 12,14 and the extensive professional experience of the working group members, to aid in the validation of protein samples used in biological research. They have been embraced by a wide community of specialists (a full list of these researchers can be found on the ARBRE-MOBIEU and P4EU websites) and comprise three parts: (1) minimal information, (2) minimal QC tests, and (3) extended QC tests.
We propose a list of minimal QC tests that are based on simple, widely available experimental methods (Supplementary Table 1 and Supplementary Note 1, Supplementary Figs. 1-7). We feel that the results of these tests, together with the minimal information, or similar disclosures should become compulsory documents in any submission to scientific journals that involves protein/peptide reagents. While generally considered complementary, extended QC tests may be considered essential when the proteins are used in specific downstream experimental applications. Our protein QC guidelines are summarized below and schematically illustrated (Fig. 1).

Minimal information
(1) For recombinant proteins, the complete sequence of the construct used in the reported experiments should be made available and we highly recommend confirming the sequence after cloning by sequencing to avoid wasteful production trials.
(2) Expression, purification and storage conditions should be fully described such that they may be accurately reproduced in any laboratory.

Minimal QC tests
(2) Homogeneity/dispersity refers here to the size distribution of the protein sample, which can generally be correlated with oligomeric state (monomer, dimer etc.) or the presence of aggregates. Whereas polydispersity is not per se an indication of instability, preparations showing the presence of 'incorrect' oligomeric states or higher order 'aggregates' suggest that the protein may not be in an optimal/functional state. This can have a dramatic effect on the results of experiments to determine, e.g., enzyme kinetics and protein-ligand interactions, essentially because the concentration of active protein is overestimated. Protein homogeneity/dispersity may be assessed by dynamic light scattering (DLS), size exclusion chromatography (SEC) or, preferably, by SEC coupled to multi-angle light scattering.
(3) The identity of a sample can be confirmed using either 'bottom-up' MS (mass fingerprinting of tryptic digests) or 'top-down' MS (measurement of the intact protein mass). The former will confirm that the correct protein is being used and not, e.g., a host protein of similar mass that has been purified in error. The latter will confirm the identity of the protein and will also indicate whether it has suffered any proteolysis during purification (intactness/micro-heterogeneity).
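As an illustration of the 'top-down' identity check, the comparison between a measured intact mass and the mass expected from the construct sequence can be sketched as follows. This is a minimal sketch and not part of the guidelines: the function names and the 1 Da default tolerance are assumptions chosen for illustration, the table holds standard average residue masses for the 20 common amino acids, and the calculation ignores disulfide bonds, post-translational modifications and N-terminal methionine removal.

```python
# Average residue masses in Da for the 20 common amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain (free N- and C-termini)


def expected_average_mass(sequence: str) -> float:
    """Return the average mass (Da) of an unmodified linear polypeptide."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("Sequence is empty")
    try:
        return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    except KeyError as exc:
        raise ValueError(f"Unsupported residue: {exc.args[0]!r}") from exc


def mass_matches(observed_mass: float, sequence: str, tol_da: float = 1.0) -> bool:
    """Check whether a measured intact mass agrees with the sequence.

    tol_da is a hypothetical tolerance; a realistic value depends on the
    instrument and on the size of the protein.
    """
    return abs(observed_mass - expected_average_mass(sequence)) <= tol_da


if __name__ == "__main__":
    construct = "GG"  # placeholder sequence, for demonstration only
    print(f"Expected mass: {expected_average_mass(construct):.3f} Da")
    print("Matches observation:", mass_matches(132.12, construct))
```

An observed mass well below the expected value would be consistent with the proteolysis scenario described above, whereas a mass unrelated to the expected value points to a wrong or modified protein.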

Extended QC tests
In addition to this short list of minimal/essential controls, other techniques are recommended to further characterize protein samples and their suitability as experimental reagents, for instance by assessing the folding state of proteins and the specific activity of enzymes. Proteins produced in Escherichia coli that are destined for use in experiments with cultured cells should be tested for the presence of lipopolysaccharides/endotoxins, and UV spectrophotometry is mandatory for DNA/RNA binding proteins. Examples in which protein quality assessment resulted in improvements of sample quality with critical impact on downstream experimental results are presented in the supplementary information (Supplementary Note 2, Supplementary Figs. 8-12). A large-scale survey among users who volunteered to apply the guidelines in their routine experiments has also been carried out 15.
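For the UV spectrophotometry check on DNA/RNA binding proteins, one common way to screen for nucleic acid carry-over is the ratio of absorbance at 260 nm to that at 280 nm. The sketch below is an illustration under stated assumptions rather than a prescription from the guidelines: a ratio of roughly 0.57 is often quoted for nucleic-acid-free protein, and the 0.7 cut-off used here is a hypothetical threshold that should be adapted to the protein in question.

```python
def a260_a280_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 ratio from two baseline-corrected absorbances."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def suggests_nucleic_acid(a260: float, a280: float, threshold: float = 0.7) -> bool:
    """Flag a sample whose A260/A280 ratio exceeds the chosen threshold.

    The default threshold is an assumption for illustration only.
    """
    return a260_a280_ratio(a260, a280) > threshold


if __name__ == "__main__":
    # Placeholder absorbance readings, for demonstration only.
    print(f"Ratio: {a260_a280_ratio(0.285, 0.5):.2f}")
    print("Possible nucleic acid contamination:", suggests_nucleic_acid(0.9, 0.5))
```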

Conclusions
In our experience, the application of the limited number of simple QC tests suggested above provides reliable indicators of the quality of the proteins employed as experimental reagents, and yields more reproducible results in downstream applications. We believe that the implementation of these tests and the public availability of the resulting QC data could therefore significantly increase both the level of confidence in published data obtained with protein reagents and the ability to reliably reproduce those data.
This condition, which should ideally be the norm, is in reality challenged by several factors, as reported in a recent survey 5. Selective reporting, insufficient availability of raw data and the paucity of information in many 'Materials and Methods' sections all contribute to opacity. The decline of the essential materials and methods sections of published papers dates back, understandably, to the times when many journals were available only in print and there was pressure to minimize the size of submitted papers. With the advent of online publishing, it is time to advocate the restoration of these essential sections to their former status, so that other researchers can reproduce the data therein without having to contact the authors. Although this problem has been partly mitigated by the current availability of Supplementary Data sections in many online journals, the presented data often fall short of a full description of the experimental conditions used and often lack any form of QC data relating to protein quality. The present interest of editors in the systematic storage of (raw) data [https://www.springernature.com/gp/open-research/open-data/practical-challenges-white-paper] should also extend to the inclusion of these methodological data.
We suggest that implementation of guidelines for protein quality evaluation should be considered an entry point towards the development of improved, and ideally compulsory, reporting practices for data obtained with protein reagents. It is our contention that 'Supplementary Data' sections should also contain details of the QC tests performed on any protein/peptide reagents used in a study, independent of the source of the protein reagent (commercial vendors or purified in an academic lab), in order to give referees and readers an indication of the quality of the materials being used to derive any given data set. To this effect, we suggest the development, in co-operation with journal editors, of a standardized form for QC reporting and annotation for authors to complete during the submission process. A model of such a checklist is illustrated in Supplementary Table 1; it could be made available to referees and editors, and also published in the supplementary material to allow reader scrutiny. Finally, all the stakeholders (scientists, editors and funding agencies) will profit from improving data reliability and reproducibility by means of systematic and accurate reagent QC. Such practices should minimize the wasting of time and resources and, in addition, favor future metadata analysis.
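To make the idea of a standardized QC reporting form more concrete, a machine-readable record could look like the sketch below. It is a hypothetical structure, not the checklist of Supplementary Table 1: the class name, field names and accepted method labels are assumptions for illustration, and the record covers only the minimal items spelled out in the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Method labels taken from the text; the exact spellings are an assumption.
HOMOGENEITY_METHODS = {"DLS", "SEC", "SEC-MALS"}
IDENTITY_METHODS = {"bottom-up MS", "top-down MS"}


@dataclass
class ProteinQCReport:
    """Hypothetical per-reagent QC record to accompany a submission."""
    reagent_name: str
    source: str  # e.g. "commercial" or "in-house"
    construct_sequence: Optional[str] = None
    production_conditions: Optional[str] = None  # expression/purification/storage
    homogeneity_method: Optional[str] = None
    identity_method: Optional[str] = None
    extended_tests: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """List the minimal items that have not been documented."""
        missing = []
        if not self.construct_sequence:
            missing.append("construct sequence")
        if not self.production_conditions:
            missing.append("expression/purification/storage conditions")
        if self.homogeneity_method not in HOMOGENEITY_METHODS:
            missing.append("homogeneity/dispersity test")
        if self.identity_method not in IDENTITY_METHODS:
            missing.append("identity test")
        return missing


if __name__ == "__main__":
    # Placeholder entry, for demonstration only.
    report = ProteinQCReport(reagent_name="example protein", source="in-house",
                             homogeneity_method="SEC-MALS")
    print("Still to document:", report.missing_items())
```

A form of this kind could be checked automatically at submission, so that referees see at a glance which minimal items are documented for each reagent.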