Neil M. Schweitzer, Robert M. Rioux and Rajamani Gounder are the organisers of a series of workshops on rigour and reproducibility (R&R) in heterogeneous thermal catalysis. Here, they share with Nature Catalysis their views on this cornerstone of science and discuss opportunities to cultivate best practice.
Neil, during one of your recent talks you introduced REACT. Can you describe what REACT is and your role in the facility?
NS: The Reactor Engineering and Catalyst Testing (REACT) core facility at Northwestern University is a user facility dedicated to research in catalysis. Most shared instrumentation labs (e.g., electron microscopy, NMR [nuclear magnetic resonance], XRD [X-ray diffraction]) at universities in the United States are core facilities. A key feature that distinguishes a core facility is its financial model; they are not directly funded from sponsored research projects, but by charging users hourly fees for instrumentation use. Core facilities are non-profit; the fees collected directly offset the expenses associated with running the lab.
Typically, the value of a core facility is providing access to specific and usually expensive instrumentation. REACT is unique as a core facility in this regard, because we don’t have any instrumentation that is particularly expensive. Our value lies in our people and expertise in experimental design and system construction. We provide users access to catalyst-testing methodologies, such as teaching a first-year chemistry student how to measure intrinsic kinetics in a packed bed reactor, or a fourth-year graduate student how to determine reactive surface intermediate coverages from transient spectroscopy measurements. In most cases we use instruments available in REACT, but we also help research groups design and construct systems in their own labs. Our mission statement is: we don’t merely train students to use equipment; we teach students how to collect data that answers their research questions.
How did you all become interested in R&R? And why is it so important to you?
NS: I have always enjoyed training students. In addition to teaching students how to operate instrumentation, it’s important to teach them how to collect robust data that can be used to support scientific conclusions. Catalysis is a complex phenomenon spanning several time and length scales, and it can be difficult to collect data on the process of interest. Newcomers to catalysis, regardless of career stage, need guidance in these areas. For example, a common discussion I have with students starts with the question: ‘I tested my catalyst in the flow reactor but only recorded 5% conversion, how can I improve my catalyst?’ This question assumes that conversion is a good metric for comparing catalysts, and that active catalysts should always exhibit high conversion. Of course, conversion depends on many factors that go beyond the intrinsic properties of the catalyst, including process variables like space velocity. For chemical engineering students who have taken reactor engineering and kinetics courses, this is a straightforward concept to grasp. But to a chemistry student who may have only been introduced to kinetics within a physical chemistry class, the need to operate the system at differential conversion may be counterintuitive. I have spent significant effort standardizing instrument protocols and training users of various backgrounds and skill levels to generate reliable data, which put me on the path to thinking about R&R issues in the research community as a whole.
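The point about conversion and space velocity can be illustrated with a short numerical sketch. It is not taken from the interview: it assumes an ideal, isothermal plug-flow bed and a first-order irreversible reaction, and the function names and the value of the rate constant are hypothetical. The same catalyst (same rate constant) gives very different conversions as the space time changes, and only at low (differential) conversion does the simple estimate of conversion divided by space time approach the intrinsic rate constant.

```python
import math


def conversion_first_order_pfr(k, space_time):
    """Conversion for a first-order reaction in an ideal isothermal plug-flow bed.

    k is the rate constant (1/s) and space_time is the residence-time-like
    process variable (s); both are illustrative assumptions.
    """
    return 1.0 - math.exp(-k * space_time)


def apparent_rate_constant(conversion, space_time):
    """Differential-reactor estimate that treats the rate as uniform along the bed."""
    return conversion / space_time


if __name__ == "__main__":
    k_true = 0.5  # 1/s, hypothetical intrinsic rate constant of one catalyst
    for tau in (0.05, 0.1, 1.0, 5.0):
        x = conversion_first_order_pfr(k_true, tau)
        k_app = apparent_rate_constant(x, tau)
        print(f"space time = {tau:5.2f} s   X = {x:6.1%}   k_app = {k_app:.3f} 1/s")
```

In this sketch a 5% conversion is not a sign of a poor catalyst; it is the regime in which the measured rate is closest to the intrinsic one.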
RG: Whenever we start studying a new catalyst or reaction in my research group, we perform a literature search to find prior articles that report data that we can use as an initial guide. Frequently, insufficient details are provided in earlier work to fully reproduce the experiment, and a variety of discrepancies in catalytic behaviour are found when we attempt an equivalent experiment and compare our data to published results. This scenario also commonly occurs within a given research group, when a new researcher encounters difficulties in reproducing data from former group members. A significant amount of effort is then spent to reconcile these differences. One example is the measurement of activation energies for protolytic alkane cracking and dehydrogenation reactions on Brønsted acid sites in zeolites. Over nearly four decades, research groups around the world have been able to consistently measure activation energies for protolytic alkane cracking, but not for dehydrogenation. Recently, our group determined that carbon deposits can form on zeolite surfaces during reaction, and play a major role in alkane dehydrogenation routes. The extent of carbon deposit formation depends, however, on various reactor-level hydrodynamic factors that are seldom reported in experimental studies but are responsible for the disparate data reported for alkane dehydrogenation reactions.
RR: As the principal investigator of a research group in academia, R&R is — or should be — an intrinsic aspect of your job. In practice, you are unable to conduct or supervise every experiment conducted in your lab. Therefore, we define internally how an experiment should be conducted through the development of standard operating procedures (SOPs), which are living documents that are continuously updated as we gain more understanding. SOPs are the primary mechanism to ensure rigour in our experimental design. Turnover of students and postdocs is high, so the knowledge learned must be transferred to future researchers by providing exhaustive and nuanced experimental details; well-kept laboratory notebooks are crucial in this regard. Losing details, no matter how trivial they are, inevitably impacts reproducibility. A recent example in my lab pertains to the impact of thermal history on the performance of oxidation catalysts. A previous student heated the catalyst to the reaction temperature very slowly, due to a poorly tuned furnace; a newer student, after properly tuning the furnace, heated the catalyst much faster. The impact of heating rate on the reproducibility of the catalytic results was stark; yet, it took months to identify that thermal history was the primary reason for reproducibility struggles. Seemingly innocent details can make all the difference!
Do you think that catalysis scientists are sufficiently confronted with R&R issues during their training?
NS: Yes, but researchers with less experience conducting common measurements in our field may not be able to evaluate or recognize issues when they arise. Take catalyst testing for example. I tell students that Mother Nature is never wrong; based on how their experiment is performed, they will always obtain the correct data. The hard part is designing the experiment so the data are representative of the fundamental phenomenon they are trying to understand. It can be difficult to discern if other processes (e.g., transport disguises, temperature gradients) are confounding the results, which will be reproducible but may lead one to conclude that someone else’s data are not reproducible. It requires significant experience to recognize these instances. In other words, researchers are confronted with questions about R&R in every experiment they run but may be inexperienced in identifying factors that determine if their measurements are inherently comparable to someone else’s measurements.
Further, although it is not true for all research groups, I think academia generally does a poor job of training students, particularly on the use of key instrumentation that may rely on operating principles outside of the students’ core classes or research area. This is particularly true for instrumentation that has been designed by manufacturers to be user-friendly. For example, a materials science student may rely heavily on surface area measurements of their materials as a critical component of their thesis. However, because adsorption theory has never been a part of their core curriculum and their own group does not have experience in the area, they may not know critical details about BET [Brunauer–Emmett–Teller] analysis or how the physisorption instrument itself works. Most commercial physisorption instruments are designed to run for long periods of time and automatically calculate BET surface areas without much human intervention. However, as with catalyst testing, these instruments can reproducibly calculate unreliable values. It takes focused training and experience to recognize when the instrument needs maintenance or when the BET analysis parameters need adjusting to derive reliable results. At Northwestern, I run a free seminar series and teach a catalysis lab course using REACT equipment. The focus is to teach students practical aspects of collecting data. This includes a deep dive into simple instrumentation hardware, demonstrating how key instrumentation works, and teaching how it can fail.
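To make the BET example concrete, the following is a minimal sketch of the calculation that a physisorption instrument automates. It is an illustration under stated assumptions, not the procedure used at REACT or by any particular vendor: it uses the standard linearized BET equation, a nitrogen cross-sectional area of 0.162 nm² per molecule, and a default fitting window of 0.05 to 0.30 in relative pressure. The function names, the window defaults, and the synthetic isotherm are hypothetical. The fitting window is exposed as a parameter because it is one of the analysis choices an untrained user may never inspect.

```python
N_A = 6.02214076e23      # Avogadro constant, 1/mol
V_MOLAR_STP = 22414.0    # molar gas volume at STP, cm^3/mol
SIGMA_N2 = 0.162e-18     # assumed N2 cross-sectional area, m^2 per molecule


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bet_surface_area(rel_pressures, uptakes, p_min=0.05, p_max=0.30):
    """Return (area in m^2/g, monolayer volume in cm^3 STP/g, BET constant C).

    rel_pressures are p/p0 values; uptakes are in cm^3 STP per gram.
    """
    points = [(x, v) for x, v in zip(rel_pressures, uptakes) if p_min <= x <= p_max]
    if len(points) < 3:
        raise ValueError("too few points in the selected pressure window")
    xs = [x for x, _ in points]
    # Linearized BET ordinate: x / (v * (1 - x)) = 1/(v_m*C) + (C - 1)/(v_m*C) * x
    ys = [x / (v * (1.0 - x)) for x, v in points]
    slope, intercept = linear_fit(xs, ys)
    if intercept <= 0:
        # A non-positive intercept signals an unphysical C; the window needs review.
        raise ValueError("non-positive intercept: adjust the fitting window")
    v_m = 1.0 / (slope + intercept)
    c = 1.0 + slope / intercept
    area = v_m / V_MOLAR_STP * N_A * SIGMA_N2
    return area, v_m, c


if __name__ == "__main__":
    # Synthetic isotherm generated from the BET equation (hypothetical v_m and C).
    v_m_true, c_true = 100.0, 120.0
    xs = [i / 50 for i in range(1, 21)]
    vs = [v_m_true * c_true * x / ((1 - x) * (1 + (c_true - 1) * x)) for x in xs]
    print(bet_surface_area(xs, vs))
```

With ideal synthetic data the result does not depend on the window; with real isotherms it can, which is why the window and the resulting C value are worth reporting alongside the surface area.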
RG: This is certainly an issue in catalysis science, but I think it also reflects a more general phenomenon that occurs in all scientific fields. When new researchers enter a new field, they are confronted with a very steep learning curve, while also being motivated to obtain experimental results quickly, for example to generate data for a qualifying or candidacy exam or even a publication. But time is key — usually, there is a somewhat lengthy learning process that new researchers need to undergo by performing initial experiments, carefully scrutinizing the data with their advisor and other researchers, and testing for various artefacts and alternative hypotheses with extensive subsequent experimentation.
RR: Exposure to R&R issues varies greatly among (academic) laboratories based on the experiences of lab personnel. One challenge associated with R&R in catalysis is the multi-disciplinary nature of the field. At a minimum, a researcher is involved in synthesis, characterization, and reactivity evaluation of their catalyst. Do they have mentors in their lab with expertise in R&R issues in each of these efforts? Probably not. This is why I believe developing a network of testing and training facilities, like REACT, is critical to the success of this latest effort in R&R in heterogeneous catalysis.
R&R is a topic that periodically receives attention in the literature and indeed a large number of articles have been produced over the years. How come we still face R&R issues?
NS: Although R&R is of paramount importance to science, enforcing this in our own work is not always the highest priority. One root cause of this problem is how academia at large evaluates scientists in awarding advanced degrees, tenure and promotion, and proposals and awards: a lot of emphasis is placed on grant and publication record, novel contributions to the scientific field, and publishing articles in high-impact journals. As human beings, we use metrics like journal impact factor, citation counts, and h-index as shortcuts for evaluating the impact and quality of a researcher’s work. Pressure to enhance these metrics may lead to hasty decisions to publish findings before being scooped, before a student leaves, or ahead of a program review. R&R takes time; when time is short, R&R takes a back seat.
I think the cyclic nature of this topic is due to new generations of scientists coming into the field and wanting to make a change, usually after a high-profile, well-publicized failure. These topics have been discussed in detail before and benchmarking efforts have been undertaken in our field, but these efforts have faded out because these activities are not incentivized or rewarded by our current research environment. I have had discussions with senior researchers in our field, and they have conveyed a sense of hopelessness, a feeling that things can never get better, because these previous efforts have faded out.
To make lasting change, we need to recognize that this is a systemic problem, and be willing to make a seismic change in how the research community functions. If R&R is not incentivized and rewarded, it will never be the top priority for researchers. One approach is to recognize the effort of researchers who are not tenure-track faculty, and create more opportunities for these types of roles. For example, as director of REACT, I am not evaluated on generating new ideas or new understanding in science — this is the role of tenure-track faculty. Instead, I am evaluated by the output of my users, which incentivizes me to help them enhance the quality of their science.
If the root causes are systemic, what can an individual do?
RG: As scientists, individuals can improve their publications by providing information about how many replicate experiments were performed, and documentation on efforts to reproduce data within the study. For example, in a study that reports the synthesis of a new catalytic material, how many times was the procedure attempted and how many attempts resulted in successful preparation of the intended material? This information is seldom reported, but is important for other researchers who want to reproduce the study. Additionally, reporting measured values (e.g., kinetic parameters) with more accurate statistical uncertainties can give other researchers a better understanding of how reproducible the findings in the study may be. As reviewers, individuals should make sure the conclusions are fully supported by the data and methods presented, and should ask for additional details in the methods section to help increase the likelihood that the findings in the study can be reproduced by other researchers.
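One way to report replicate measurements with a stated uncertainty is sketched below. This is an illustration only, not a method prescribed in the interview: it assumes independent replicates with roughly normal scatter and reports the mean with a two-sided 95% confidence interval from Student's t distribution. The function name, the returned fields, and the replicate values are hypothetical; the critical values are standard tabulated ones for small sample sizes.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def summarize_replicates(values):
    """Summarize replicate measurements as n, mean, sample stdev and 95% CI half-width."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate scatter")
    dof = n - 1
    if dof not in T_CRIT_95:
        raise ValueError("extend T_CRIT_95 for this number of replicates")
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = T_CRIT_95[dof] * stdev / math.sqrt(n)
    return {"n": n, "mean": mean, "stdev": stdev, "ci95_half_width": half_width}


if __name__ == "__main__":
    # Hypothetical activation energies (kJ/mol) from five independent experiments.
    replicates = [98.0, 102.0, 100.0, 104.0, 96.0]
    s = summarize_replicates(replicates)
    print(f"Ea = {s['mean']:.1f} ± {s['ci95_half_width']:.1f} kJ/mol (95% CI, n = {s['n']})")
```

Stating the number of replicates next to the interval lets a reader judge how much weight the uncertainty estimate itself can bear.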
RR: Individuals are the first line of defence to ensure the necessary experimental details are provided for inter-laboratory and intra-laboratory reproduction. Replication of experimental results within a laboratory by an associate not directly involved in the research is a simple approach that any individual can implement.
NS: We should also recognize our own biases that may play a role in the review process. Considering real-world constraints on our time and effort, we tend to take shortcuts when evaluating manuscripts. One such shortcut is assuming that methods, data, and conclusions that come out of well-established groups are more likely to be correct; thus, we do not scrutinize the findings or methodology as much as when reviewing a report from a less well-known group. We have to be willing to maintain uniform standards and effort reviewing every manuscript.
In your experience, how serious is the problem?
NS: It is a big problem, but maybe not as big as the general media would have us believe. I think the public generally lacks an understanding about how the scientific process works. The media will promote individual reports and discuss them as though they are fact, but this is not how science is supposed to work. Science is supposed to test hypotheses (failures are just as important as successes) and provide updated conclusions as new data become available. For example, this dynamic played out on a global scale during the coronavirus pandemic regarding the efficacy of masks to prevent the spread of the virus. A major problem arises, however, when a large number of researchers report high variance in nominally similar measurements, which makes it difficult to come to a common conclusion. Two recent examples in the catalysis and materials communities are the large variations in turnover frequencies (>100×) reported for vanadium-catalysed propane oxidative dehydrogenation, and in surface areas calculated from gas adsorption isotherms for metal-organic frameworks.
Can you describe your idea to address issues of R&R in catalysis, in particular the concept of benchmarking and the potential role of testing facilities?
NS: I believe the root causes of R&R issues are systemic within academia and require major changes in how science is conducted. Thus, any plan implemented to improve the current landscape will require maximal community engagement and backing. With this in mind, we have started an effort with the goals of engaging the community on R&R issues, creating a forum that can foster ideas for improvement, and providing a platform for individuals to make specific, strategic plans to implement ideas. Our website (www.catalysisrr.org) details some of our ideas and events.
One tool we are developing to help reviewers and new researchers in the field is a guidebook of best practices for reporting data in the literature. This guide will be the result of an NSF- [National Science Foundation] and DOE- [Department of Energy] sponsored workshop held on July 21–22, 2022. This workshop brought together a number of participants to detail best practices for reporting data from specific measurements (divided into catalyst synthesis, catalyst testing, and characterization groups) and recommendations for the use of benchmark materials (for different material classes). For specific types of measurements, the guide will include common uses, known pitfalls, and useful references describing best practices. For benchmarking materials, the guide will include known industrial catalysts for specific reactions, and guidance about what characteristics of benchmark materials should be reported in literature. The primary purpose of this guide is to collect current information about best practices in one location, to help reviewers and new researchers spend less time finding information they need.
We are also planning virtual workshops focused on specific ideas for establishing community tools we think can help support R&R efforts within individual research groups. Our initial ideas include developing training media and workshops, establishing a database of experimental data, exploring the future of publications, and establishing a national network of benchmarking facilities.
We are also exploring the possibility of establishing benchmarking standard materials in our field. What we need is ample material that can be tested by many different research labs, so its properties can be established. Similar past efforts were short-lived because of a finite supply of material or a lack of motivation for researchers to perform the necessary measurements. The key question is who is going to supply the material and do the work? I believe that this type of work is best suited for a facility like REACT — a laboratory that is not incentivized to create new knowledge, but is incentivized to help improve the quality of science in the research community. The limitation of REACT is the small portion of the community it can support. To serve the entire community, we need a network of labs that can work together and coordinate efforts.
In what ways can testing facilities contribute to the mission of academic research to produce innovation?
NS: Testing facilities can help academics be more efficient in two different ways. First, they can do the routine work of synthesizing and testing benchmark materials repeatedly. Outsourcing this work to an independent lab can free up academic labs to innovate. Second, testing facilities can help clear up conflicting data among different labs, as using common benchmarking materials can help ensure conflicting data are not due to instrumentation or methodology differences. This can help researchers identify the key differences between catalyst materials that cause performance differences, advancing the field faster.
How do such testing facilities differ from corporate R&D testing units?
NS: There are industrial labs that do this type of work. I know of one corporate benchmarking facility that has aggregated so much data on a specific structure-sensitive reaction that they can accurately predict the physical and chemical properties of a new material solely from testing data without needing any characterization. This incredible predictive power comes from the large amount of data the lab has generated, but this is only for one material and reaction type. Moreover, this information is treated as proprietary, and not available for the benefit of the scientific community. Our testing facilities would provide data for various materials and reactions, and would do so in the open scientific domain (e.g., open-access databases) to benefit the community. Another significant difference is that these facilities can serve as centralized training sites for the general scientific community, such as hosting students to learn best practices and holding workshops and short courses on specific types of measurements. I’ve said that for lasting change, we need to make a seismic change in the way we do research. We believe that these labs can provide the resources and tools we need to enhance the science of tomorrow.
Cite this article
Esposito, D. Induce to reproduce. Nat Catal 5, 658–661 (2022). https://doi.org/10.1038/s41929-022-00830-2