Introduction

Of all the opportunities AI offers, reducing the climate change-enhanced risks of natural hazards is among the most urgent and within reach. Wildfires burning ever hotter, and worsening compounding tropical cyclone and flooding events that disproportionately affect marginalized communities, exemplify the increasing dangers and costs of natural hazard events1,2,3. At the same time, applications of AI to weather and climate have advanced rapidly [e.g., refs. 4,5,6]. Global data-driven weather models are making headlines for their speed, efficiency, and performance7,8,9, and it is clear there is great potential for AI to transform predictions for climate-related disasters. However, this potential brings a number of challenges.

Of all the challenges AI poses, trustworthiness is one of the biggest10,11. Trustworthiness is particularly critical for managing hazards that are being exacerbated by climate change and associated global governance failures12. Realizing the full potential of AI to help manage and mitigate increasing and compounding risks without creating additional risks13,14 requires a deep understanding of how to develop trustworthy AI.

Developing trustworthy AI is complex and requires a multifaceted approach that includes research that is (a) both fundamental and applied (i.e., in Pasteur’s quadrant15), and (b) both disciplinary and convergent. Here, we illustrate how the NSF AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography (AI2ES)16, one of the 25 current NSF AI institutes, is conducting complementary research on these different fronts and how these efforts work together to advance knowledge and practice about trustworthy AI. AI2ES researchers from the AI, social, atmospheric, and ocean sciences collaborate to create trustworthy AI and to better understand what it means for AI to be trustworthy in the context of AI for weather and climate hazards (Fig. 1).

Fig. 1: Convergence research as instantiated in AI2ES.

Convergence research for trustworthy AI

Over the last decade, the value of convergence research has become clear in response to increasingly complex and societally important science problems. These problems require knowledge creation and innovation beyond the bounds of traditional science disciplines and thus demonstrate the need for research that deeply integrates intellectually diverse sciences17,18,19,20. The need for and value of convergence research has been documented in the context of natural hazards, disasters, and risks21,22,23. Furthermore, there have been broad calls for facilitating convergence of the natural, social, and computational sciences to advance Earth systems science24.

Developing AI for weather, climate, and coastal hazards that is trustworthy, as well as trusted and used, is a multifaceted science problem that requires convergence research to meaningfully and comprehensively address. To the best of our current understanding, trustworthiness stems from a complex intersection of factors, including users’ decision-making needs and contexts; data quality and representativeness; model development processes, techniques, and specifics; model availability, interpretability, explainability (tools to aid comprehension), and integration into users’ workflows; perceptions of the model developers’ expertise; and model skill across hazards and geography25,26,27,28. The centrality of human perceptions, communications, and decisions at this intersection makes the inclusion of social, behavioral, and cognitive scientists critical. Accordingly, AI2ES brings together geoscientists with domain expertise; computational scientists with diverse AI expertise; and social/behavioral scientists with expertise in judgment and decision making, risk and decision analysis, and communication of natural hazards. We are collectively committed to partnering with professionals like weather forecasters, whose work affects society broadly and is increasingly likely to depend on AI. Moreover, several AI2ES scientists are knowledge brokers and boundary spanners, who bridge across fields as well as between research and practice29,30. Together these attributes and commitments facilitate generative, supportive, and innovative collaborations amongst the team. Figure 1 illustrates how we work together across hazards and researchers to conduct convergence research.

In this paper, we illustrate our approaches to trustworthy AI through three research threads. These three examples each highlight different research components and problem-solving strategies; they are representative of AI2ES efforts, though many additional examples exist.

AI trustworthiness perceptions of professional decision-makers

Risk messages comprise any type of information about the potential threat, severity, consequences, and recommended actions pertaining to a risk, including risks posed by natural hazards31. In the context of weather, climate, and coastal hazards, the predictive information provided by AI models about whether, when, and where hazards will occur, along with their potential magnitude and impacts, therefore constitutes a type of risk information.

AI as a form of risk information serves a particular role for professional, public, and private sector decision-makers. These include weather forecasters and broadcasters, emergency managers, water resource managers, critical infrastructure officials, school officials, and others. These decision-makers hold the responsibility to protect the lives and livelihoods of their staff, their constituents, and society. Such professional users access, interpret, and use AI-derived information to assess natural hazard risks and to make job role-specific decisions, such as threat assessment, communication, mitigation and protective actions, and emergency response. For example, in the event of hazardous weather conditions, critical infrastructure officials may be responsible for shutting down air or marine traffic, closing roads, or handling power outages. Emergency managers can be responsible for communicating warnings, informing or implementing protective actions such as evacuations or sheltering in place, and correcting misinformation.

AI trustworthiness and trust are paramount for such professional users because of the high-stakes nature of their job decisions, which often are further characterized by high uncertainty and time pressures. Such decision contexts render professional users vulnerable to the AI information that guides their decision-making. This amplifies the criticality of their ability to assess whether they can rely on and put faith in that AI information. These factors manifest acutely in the context of severe convective weather (including tornadoes, high winds, and hail), which thus serves as an important, valuable use case for initial research on AI trustworthiness and trust.

In our first example, we leveraged two newly developed prototype AI models for severe convective weather prediction. The first applies a random forest-based technique to predict the probability of 1” and 2” hail32. The second is a 2-dimensional convolutional neural net trained to predict the probability of the convective storm mode being supercellular, quasi-linear, or disorganized33. These existing prototypes allowed our working group to design formative research and collect data with National Weather Service forecasters to explore foundational research questions about AI trustworthiness with concrete examples. For instance, a central research question we have investigated is how different descriptive and performance attributes of new AI guidance influence forecasters’ perceptions of a model’s trustworthiness. These attributes include the AI technique used, how the model was trained, its input variables, its performance, and the interactivity it affords. As a group of risk communication scientists collaborating closely with atmospheric and AI scientists, we co-developed a survey and a structured interview protocol with a decision task. Using these instruments, we examined forecasters’ assessments of how essential different features are when gaining familiarity with and when operationally using new forecast guidance; how forecasters perceived and evaluated the information provided to them about the prototype attributes; and how they applied such information to assess the trustworthiness of AI-based forecasting guidance.
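To make the second prototype concrete, the following is a minimal sketch of a 2-dimensional convolutional classifier for storm mode, written in PyTorch. It is not the architecture of ref. 33; the single radar-reflectivity input channel, the 64 × 64 patch size, and the layer sizes are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class StormModeCNN(nn.Module):
    """Minimal 2-D CNN mapping a single-channel radar patch to
    probabilities of three storm modes: supercellular, quasi-linear,
    or disorganized. Patch size (64x64) is an assumption."""
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, n_classes),             # logits for 3 modes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Class probabilities come from a softmax over the logits:
model = StormModeCNN()
patch = torch.randn(1, 1, 64, 64)                 # one (hypothetical) radar patch
probs = torch.softmax(model(patch), dim=1)        # e.g., P(supercellular)
```

A probabilistic output of this form, rather than a single categorical label, is what allows forecasters to weigh the guidance against their own judgment.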

Analysis of our data yielded multiple findings at different knowledge scales about the trustworthiness and trust of AI information for high-stakes decision-making (see ref. 27 for full details). For instance, at the prototype-specific scale, we found that developers hand-labeling the inputs for the AI storm mode guidance increased forecasters’ assessments of its trustworthiness. However, this was contingent on the labelers having the relevant domain expertise, because storm mode is a complicated, latent feature to characterize and identify. This finding suggests that the resource-intensive task of human hand-labeling for developing predictive AI guidance may be important for establishing user trust in difficult but important tasks. At a broader scale, across both prototype products and the attributes we evaluated, we found that information about the AI model technique (particularly about the model input variables), information about the model performance (particularly about failure modes), and being able to interact with the AI model output were all essential to increasing forecasters’ assessments of the trustworthiness of the AI guidance. Finally, at the most generalized scale, we found that forecasters’ trust in new AI guidance develops through a progressive process. This process includes key phases: initial exposure to the new guidance, continued familiarization with it through non-operational exploration, and experience in the operational forecasting environment, where forecasters can observe the AI model’s performance in real time and potentially use it for forecasting. We further found that centering the forecaster by including them in the research process is a critical part of trustworthiness-building, and we recommended mechanisms for furthering this work.

Our complementary research includes a conceptual framework that recognizes trust and trustworthiness as relational concepts that require centering the one who is doing the trusting34, and a research agenda that emphasizes user-centered perspectives on AI in the environmental sciences26.

These findings illustrate the value of our convergence science approach and the multitude of results, across different scales, that it produced.

New ways of thinking about science

The AI2ES working group studying severe convective weather prediction is one of several use-case-inspired working groups in AI2ES. Like many of our working groups, it integrates the work of foundational AI experts and geoscientists to develop user-oriented solutions to the challenge of trustworthy AI, in collaboration with, and as identified and defined by, risk communication scientists. Diverse teams such as these are well situated for new ways of thinking about their scientific problems: they have the expertise and diverse perspectives to apply close analogies and create new, shared understandings of unexpected outcomes and surprises35. Being willing to engage in such teamwork is an essential characteristic of convergence science. Being clear about research goals, and being willing to change goals and look for new hypotheses in response to inconsistent findings, can facilitate scientific breakthroughs35,36.

Convergence research efforts in AI2ES to date have fostered several new research directions and ideas. Working groups with representation from across the disciplines and organizations in AI2ES have focused on specific environmental hazard use cases (e.g., the prediction of coastal fog37) to co-develop research methods, instruments, and products that will meaningfully advance both the disciplinary sciences and convergence science. For example, identifying reference datasets against which to verify the performance of ocean loop current eddy AI/ML models was a challenge for the oceanography research group, which was grappling with the lack of in situ observational data in this domain. Here, as in climate modeling, it is common to compare the performance of predictive models with reanalyses rather than directly with environmental observations. After discussions about the role of verification in trustworthiness, the team ultimately obtained permission to use drifter data from another research group as a verification reference, in addition to comparisons against reanalyses.

Convergent problem-solving has also inspired renewed consideration of probabilistic as well as deterministic approaches for coastal fog predictions and has expanded perspectives on the value of physics-informed explainable (X)AI models38. Challenges that we have tackled include creating new architectures to extract spatio-temporal patterns involving multiple physical variables and developing XAI methods. XAI can be used both to better understand and quantify the roots of improved performance and to better explain the mechanics of a model to end-users.
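As one hedged illustration of a model-agnostic XAI technique consistent with this discussion (not a specific AI2ES implementation), permutation importance measures how much a model’s skill degrades when a single input variable is shuffled. The sklearn-style `model.predict` interface, the samples-by-features layout, and a higher-is-better skill metric are assumptions:

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=10, seed=0):
    """XAI sketch: how much does skill drop when one input variable
    is shuffled? Larger drops indicate the model leans more heavily
    on that variable. Assumes metric(y, y_pred) is higher-is-better."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])          # break feature-target link
            drops.append(baseline - metric(y, model.predict(X_perm)))
        importances[j] = np.mean(drops)        # mean skill loss per feature
    return importances
```

For end-users such as forecasters, importances expressed in terms of familiar physical variables can be more persuasive than raw model internals.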

Team problem-solving has also led to the co-development of more rigorous research methods by borrowing ideas across the different fields. For example, one working group co-developed data analysis procedures by borrowing rigorous social science content analysis methods to improve the reliability and validity of hand labeling for supervised ML modeling of road surface conditions39. As another example, based on feedback from a large (100+) user base, AI2ES is developing uncertainty quantification methods and visualizations to help predict and communicate the start and end of cold stunning events, during which endangered and threatened sea turtles and other marine organisms become lethargic and must be rescued to survive6. Foundational AI scientists contributed adaptations of probabilistic neural networks, geoscientists explored the uncertainty related to the training of the models and their predictors, and risk communication scientists formalized and are systematically studying users’ interactions with these models.
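One common way to realize such uncertainty quantification, offered here only as a hedged sketch (the cold stunning models of ref. 6 are not reproduced), is a neural network that predicts both a mean and a standard deviation and is trained with a Gaussian negative log-likelihood; the feature count and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class ProbabilisticRegressor(nn.Module):
    """Sketch of a probabilistic NN: predicts a mean and a standard
    deviation for a scalar target (e.g., water temperature), so each
    forecast carries its own uncertainty. Feature count is assumed."""
    def __init__(self, n_features: int = 8):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.mean = nn.Linear(32, 1)
        self.log_std = nn.Linear(32, 1)   # log std keeps sigma positive

    def forward(self, x):
        h = self.body(x)
        return self.mean(h), self.log_std(h).exp()

# Training minimizes the Gaussian negative log-likelihood, which
# penalizes both bias and over/under-confident uncertainty estimates.
def gaussian_nll(mu, sigma, y):
    return (torch.log(sigma) + 0.5 * ((y - mu) / sigma) ** 2).mean()
```

The predicted sigma is what the communication research then works with: it is the quantity that must be visualized and explained so that users can act on it.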

All of these examples illustrate the innovative ways of thinking and new ideas that have emerged at the intersection of the environmental, computational, data, and risk communication sciences in AI2ES around shared research problems and tasks.

Using AI for foundational science discovery

Creating trustworthy AI requires us to develop and evaluate a wide variety of AI techniques, including explainable and interpretable AI [e.g., ref. 40] and physics-based AI [e.g., refs. 41, 42]. Developing and testing these techniques across many applications yields a suite of tools that we can then apply to novel problems, including foundational science discovery. Human-AI teams can make use of AI techniques for science discovery in a variety of ways; we discuss several ways that AI2ES is using AI for foundational knowledge discovery.

Many of the most destructive weather and climate hazards are rare both in frequency and in the availability of rich datasets describing them. One such example is that of tropical cyclones, and in particular data on rapid intensification. By definition, rapid intensification is a rare event, since it requires exceeding the 90th percentile of tropical cyclone intensification within 24 h. Complicating the problem of rarity is the lack of observational data on the internal structure of tropical cyclones. Observing this internal structure requires either a fly-through by aircraft reconnaissance (rare, and providing only a limited view of the overall storm structure over a few hours)43 or observations from an orbiting microwave sensor that can peer inside the clouds and observe the full structure of the storm. However, microwave sensors fly on low-Earth-orbit satellites rather than geostationary ones, so their passes are quick and infrequent44,45.
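For concreteness, here is a minimal sketch of how rapid intensification events might be flagged from an intensity time series, assuming 6-hourly best-track winds for a single hypothetical storm record and an empirically computed 90th-percentile threshold (operationally, a fixed threshold of roughly 30 kt per 24 h is often used):

```python
import numpy as np

def flag_rapid_intensification(winds_kt: np.ndarray) -> np.ndarray:
    """winds_kt: 6-hourly maximum sustained winds for one storm record,
    so a 24-h change spans 4 time steps. Events whose 24-h
    intensification exceeds the 90th percentile are flagged as RI."""
    dv24 = winds_kt[4:] - winds_kt[:-4]          # 24-h intensity changes
    threshold = np.percentile(dv24, 90)          # commonly ~30 kt in practice
    return dv24 >= threshold

winds = np.array([45, 50, 55, 60, 70, 85, 100, 110], dtype=float)
print(flag_rapid_intensification(winds))         # flags the fastest ramp-up
```

In research practice the percentile would be computed over a full climatology of storms rather than a single record, which is precisely why rich, long-term datasets matter.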

We have created a deep learning model that transforms geostationary satellite imagery into simulated microwave imagery, specifically for the 89-GHz channel46. This approach (Fig. 2) allows us to take advantage of the temporal and spatial resolution of the current suite of geostationary satellites to study the evolution of the internal structure of tropical cyclones through simulated microwave imagery. In particular, the simulated microwave imagery allows for the study of processes involved in rapid changes of tropical cyclone structure and intensity, which have traditionally been missed by infrequent overpasses of low-Earth-orbit satellites with microwave sensors. While caution is needed in interpreting results, given uncertainties in the simulated microwave imagery, the generated archive of tropical cyclone overpasses is enabling studies of these rare and data-sparse events using both traditional statistical and AI/ML methods. These new data will also facilitate discovery of the coupled atmosphere-ocean processes that lead to rapid intensification.

Fig. 2: Use of simulated microwave dataset to study tropical cyclone structure.

AI/ML models are trained with microwave imagery from low earth orbiting satellites as truth to use geostationary satellite input to generate simulated microwave imagery. This simulated microwave imagery is then used as a dataset to study tropical cyclone structure and rapid intensification.
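As a hedged sketch of this image-to-image translation (not the architecture of ref. 46), a small encoder-decoder network can map multichannel geostationary imagery to a single simulated 89-GHz brightness-temperature field; the four input channels, the 128 × 128 patch size, and the layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Geo2Microwave(nn.Module):
    """Encoder-decoder sketch: geostationary channels in (e.g., infrared
    and water vapor), one simulated 89-GHz brightness-temperature field
    out. Trained against collocated low-Earth-orbit microwave overpasses."""
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 128x128 -> 64x64
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear"),  # back to 128x128
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),        # simulated 89-GHz field
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Geo2Microwave()
geo = torch.randn(1, 4, 128, 128)                  # one geostationary patch
sim_89ghz = model(geo)                             # shape (1, 1, 128, 128)
```

Because geostationary imagery arrives every few minutes, a model of this form turns sparse microwave snapshots into an effectively continuous view of storm structure.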

Another example that we have focused on in AI2ES is tornadoes. Tornadoes are rare, and they are also under-sensed, in that existing instruments such as radar physically cannot provide a full 3-dimensional picture of the atmosphere. While studying such observational data is a critical step in understanding tornadogenesis, or why one storm generates a tornado when a similar storm does not, understanding the full behavior of the atmosphere requires creating idealized simulations. These simulations provide a complete picture of the atmosphere, but they produce far too much data for a human to analyze. AI techniques enable us to look for patterns in such large-scale data sets, as illustrated in the sketch below.
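As a hedged, generic sketch of this kind of pattern discovery (not a specific AI2ES pipeline), one can compress simulation snapshots with principal component analysis and then cluster them into candidate regimes; the snapshot dimensions and the random placeholder data are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical input: each row is one flattened simulation snapshot
# (e.g., vertical vorticity on a model grid); values are random here.
snapshots = np.random.rand(500, 4096)

# Compress to a few leading modes, then group snapshots into regimes.
pca = PCA(n_components=10).fit(snapshots)
reduced = pca.transform(snapshots)
labels = KMeans(n_clusters=4, n_init=10).fit_predict(reduced)
# Clusters can then be inspected for patterns that precede tornadogenesis.
```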

Convergence practices in AI2ES

Convergence research is an ongoing, evolving process that functions differently during different phases of a research team’s tenure and research efforts. Sundstrom et al.19 characterize this process as cycling between transcendent and focused phases of research, in which the former involves expansive ideation about important research problems to address and the latter involves more refined approaches to pursue specific research threads. This cyclic approach, which tacks between larger- and finer-scale science while building on an evolving foundation of research efforts, is particularly valuable in the AI domain, where the landscape of AI awareness, possibilities, and progress is rapidly evolving. Indeed, convergence research approaches are considered particularly well-suited to nonstationary systems19, of which the complex, nonlinear landscape of AI for Earth systems science is an example.

Equally important as the pressing, innovative science problems AI2ES is addressing, however, are the practices and values by which we work to facilitate convergence. Convergence requires leadership that convenes and enables full intellectual and equally valued participation from diverse and often disparate disciplines. It requires significant investments of time: time for relationship-building among team members; time for exchanging, translating, blending, and connecting ideas; and time for synthesizing co-produced knowledge. Convergence research also requires patience and persistence, given the multitude of challenges that inevitably arise. These and other requirements have been documented by scholars such as Peek et al.21, Morss et al.23, and Sundstrom et al.19.

In AI2ES, we operationalize these essential practices and values in a number of ways. We hold biweekly meetings of our 15-person leadership team, composed of the Institute PI, co-PIs, focus area leads, private and government sector lead collaborators, and our external evaluator. In these leadership meetings, we foster a culture that values convergence research, each other, and all AI2ES personnel. We also hold biweekly site-wide meetings that reach all of AI2ES and external colleagues, to foster inclusion and broad reach and to share the diversity of science that is emerging from and relevant to AI2ES. At the “working level”23, we have multiple, cross-cutting working groups organized around key research and workflow topics of natural hazard use cases. These working groups functionally facilitate the transcendent and focused phases of convergence research and practices, in which we collaboratively and iteratively generate and refine research questions, develop and execute research steps, interpret results, and create outputs for dissemination.

Supplementing organized and planned working groups, AI2ES also fosters convergence through emergent working groups and networks among its early career researchers (undergraduates, graduate students, postdocs, and new research scientists and faculty). These small interdisciplinary and inter-institutional working groups develop initially as mutual support for specific research tasks but often transition into fruitful environments in which to ask questions, share experiences, give advice, and transfer knowledge across typically strong disciplinary and institutional divides. These nurturing environments are key to our convergence science, and they are essential for developing the skills and interdisciplinary networks of the next generation of the AI workforce.

We also facilitate convergence through the collaborative creation of a variety of boundary objects47. Examples of boundary objects for AI2ES include a co-developed figure illustrating the types of biases likely to affect AI development for Earth Sciences13,14; a co-developed codebook for rigorous quantitative content analysis of images for hand-labeling precipitation by an interdisciplinary group of scientists as input into an AI model39; and a co-developed structured interview protocol that distills and conveys the main AI model attributes to be evaluated27.

Importantly, AI2ES’s convergence research efforts also build on a foundation of prior experience, as many on the AI2ES team have participated in and/or led interdisciplinary and convergence research. This reveals another contribution of AI2ES and reflects a broader and increasing need pertaining to natural hazards, AI, and society: training both the current and future workforce in the need for, approaches to, and utility of working deeply in highly disciplinarily diverse teams.

Another critical piece of convergence research is education. If we are to take advantage of and deploy more convergent efforts to tackle our timely and wicked challenges48, the training of our current and future scientists must evolve beyond current practices49,50. Current practices at most universities focus on disciplinary knowledge, often to the exclusion of interdisciplinary and convergence science. With the goal of training future convergence researchers, AI2ES trains students at the undergraduate and graduate levels to work in interdisciplinary research from the start. To achieve this, we intentionally create an inclusive culture in which each member of the research group knows they can ask questions of the other disciplines and is willing to learn. Assembling and nurturing diverse convergence research teams that involve mentors and postdoctoral or graduate scholars presently requires that the team be relatively large, preferably involving at least some established collaborations. AI2ES is developing a better understanding of which components are essential to foster convergence research teams and to nurture future scientists at the intersection of AI, geosciences, and risk communication.

AI2ES also trains the existing workforce. This includes cross-training existing professionals in new skills such as AI, but it also includes training the professionals labeled earlier as knowledge brokers or boundary spanners. In AI2ES, we support scientists across multiple career stages in spanning boundaries while deepening their disciplinary expertise. Learning to span boundaries requires a leap of faith for many scientists, which is easier to take in a supportive environment where they can see the success of other such professionals.

Future work

The examples discussed above are illustrative, although not fully encompassing, of the convergence research that AI2ES is conducting and producing. Furthermore, AI2ES efforts are ongoing and, arguably, just beginning.

Looking to the future of trustworthy AI for weather and climate, one of the key recent developments has been the emergence of global data-driven weather models7,8,9. AI2ES is collaborating with the Cooperative Institute for Research in the Atmosphere at Colorado State University and the NOAA Global Systems Laboratory to develop research methods for evaluating the performance of these models. Our approach derives from AI2ES convergence research and other human-AI teaming research in the weather and climate domain. The goal is to understand how weather forecasters and other critical users perceive the trustworthiness of purely data-driven models relative to other AI/ML models, as well as to more conventional numerical weather prediction models. For example, taking the forecaster’s perspective entails evaluating model performance on metrics specific to the meteorological functions and decision-support purposes of such models, as opposed to generic metrics such as root mean square error.
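As a hedged illustration of this distinction (the evaluation methods under development are not reproduced here), a decision-relevant verification score such as the critical success index for exceeding a hazard threshold can be contrasted with generic root mean square error; the wind-gust values and the 25 m/s threshold are invented for illustration:

```python
import numpy as np

def rmse(pred, obs):
    """Generic accuracy: average error over the whole field."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def csi(pred, obs, threshold):
    """Critical success index for threshold exceedance:
    hits / (hits + misses + false alarms). Rewards getting the
    decision-relevant event right, not average-field accuracy."""
    p, o = pred >= threshold, obs >= threshold
    hits = np.sum(p & o)
    misses = np.sum(~p & o)
    false_alarms = np.sum(p & ~o)
    return float(hits / max(hits + misses + false_alarms, 1))

pred = np.array([18.0, 26.0, 31.0, 12.0])   # hypothetical forecast gusts (m/s)
obs = np.array([20.0, 24.0, 33.0, 10.0])    # hypothetical observed gusts (m/s)
print(rmse(pred, obs), csi(pred, obs, threshold=25.0))
```

A model can post a small RMSE while still missing or falsely warning of the threshold exceedances that actually drive protective decisions, which is why user-specific metrics matter.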

Another key part of trustworthy AI is ensuring that AI is developed and deployed ethically and responsibly. AI2ES is leading an effort on ethical and responsible AI for the Earth and environmental sciences. We first highlighted many of the issues that can arise from improper development and use of AI13 and recently created a classification system focused on bias in both data and AI models14. Our current efforts on ethical and responsible AI center on demonstrating how to mitigate the various types of bias that we highlighted.

Moving forward, partnerships will be key to ensuring that AI is trustworthy across the full cycle of development, from foundational research to deployment. Currently, private industry is leading the development of foundation models, while government and public entities lead the collection of data51. Academia and government may not have sufficient computational resources to develop large foundation models and keep pace with market forces and the private sector. Very rapid development and proprietary methods in the private sector can complicate partnerships and make it difficult to evaluate AI models, let alone regulate them. Academia has the ability to take larger risks in initial research, and government can help engage critical end-users in model co-production. Yet such partnerships can be challenging to develop and sustain, especially in the United States, where funding has typically not been set up to facilitate connections and merge efforts across academia, private industry, and governmental organizations. Both the challenges and the importance of these partnerships pose another important opportunity for future convergence research efforts.