Practice and perspectives in the validation of resource management models

Quantitative modelling is commonly used to inform the policy dimension of sustainability problems. Validation is an important step in making models credible and useful. To investigate existing validation viewpoints and approaches, we analyse a broad academic literature and conduct a survey among practitioners. We find that empirical data play an important role in validation practice in all main areas of sustainability science. Qualitative and participatory approaches that can enhance usefulness and public credibility are much less visible. Data-oriented validation is prevalent even when models are used for scenario exploration. Usefulness with respect to a given task matters more to model developers than to model users. As the experience of modellers and users increases, they tend to better acknowledge decision makers' demand for clear communication of assumptions and uncertainties. These findings provide a reflection on current validation practices and are expected to facilitate communication at the interface of modelling and decision making.


Supplementary Figure 4. Validation techniques used by the survey respondents in each modelling
area. The figure visualizes the validation techniques used in each modelling area as reported by the respondents. The darker a cell in a row, the more commonly the corresponding validation technique is used in that modelling area. Sensitivity analysis is the most commonly used technique in almost all areas, with the exception of demographics, social dynamics and public health. Comparison to historical data is another very common technique, yet it is outranked by sensitivity analysis, especially in areas where data are not expected to be rich due to recent development, such as transport, energy and urban development. Reality checks, in other words testing whether the model demonstrates the expected behaviour under certain conditions, are another commonly used technique in all areas. Informal techniques such as peer reviews and expert consultations are rarely used compared to the formal techniques. Source data of this figure are provided in the Source Data file.
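For illustration, a row-normalized heatmap of this kind can be produced with a short script. The following is a minimal sketch in Python; the area names, technique names and counts are hypothetical placeholders, not the survey source data.

```python
# Minimal sketch of a row-normalized heatmap like Supplementary Figure 4,
# assuming technique-use counts per modelling area. All names and counts
# below are hypothetical placeholders, not the survey source data.
import numpy as np
import matplotlib.pyplot as plt

areas = ["Energy", "Transport", "Water", "Public health"]
techniques = ["Sensitivity analysis", "Historical data comparison",
              "Reality checks", "Expert consultation"]
counts = np.array([[40, 30, 25, 5],
                   [35, 20, 22, 8],
                   [30, 33, 20, 6],
                   [15, 25, 18, 10]], dtype=float)

# Normalise each row so that shading reflects how commonly a technique is
# used *within* an area ("the darker a cell in a row, ...").
row_shares = counts / counts.sum(axis=1, keepdims=True)

fig, ax = plt.subplots()
im = ax.imshow(row_shares, cmap="Greys")  # darker cell = more commonly used
ax.set_xticks(range(len(techniques)), labels=techniques, rotation=45, ha="right")
ax.set_yticks(range(len(areas)), labels=areas)
fig.colorbar(im, ax=ax, label="Share of respondents in area")
fig.tight_layout()
plt.show()
```

The row-wise normalization is what makes shading comparable within an area even when areas have different numbers of respondents.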

Supplementary Figure 5. Responses to the survey questions on model representativeness and usefulness.
The first two Likert-scale questions in the survey asked whether the most important validity criterion is representativeness (Question 1) or usefulness (Question 2). In total, 67% of the respondents agree or strongly agree with Question 1, while 79% agree or strongly agree with Question 2. This figure illustrates how the responses given to these two questions coincide. The size of each circle represents the number of respondents who gave the corresponding combination of responses to Question 1 (x-axis) and Question 2 (y-axis). For instance, 30 respondents agree with Question 1 and strongly agree with Question 2. As seen in the figure, the majority of respondents who agree with one of these questions agree with the other, too. Therefore, this figure supports the conclusion that practitioners value both usefulness and representativeness, and that a dichotomy does not exist. Source data of this figure are provided in the Source Data file.
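As an illustration of the circle-size encoding described above, the following minimal Python sketch cross-tabulates two sets of paired Likert responses and draws circles whose areas are proportional to the counts. The responses and the size scaling are illustrative assumptions, not the survey source data.

```python
# Minimal sketch of the circle-size encoding in Supplementary Figure 5,
# assuming paired Likert responses to Questions 1 and 2. The responses and
# the size scaling below are illustrative, not the survey source data.
import pandas as pd
import matplotlib.pyplot as plt

levels = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
q1 = ["Agree", "Agree", "Neutral", "Strongly agree", "Agree", "Disagree"]  # representativeness
q2 = ["Strongly agree", "Agree", "Agree", "Strongly agree",
      "Strongly agree", "Neutral"]                                         # usefulness

# Cross-tabulate: each cell counts respondents who gave that combination.
table = (pd.crosstab(pd.Series(q1), pd.Series(q2))
           .reindex(index=levels, columns=levels, fill_value=0))

xs, ys, sizes = [], [], []
for i, _ in enumerate(levels):
    for j, _ in enumerate(levels):
        n = table.iloc[i, j]
        if n > 0:
            xs.append(i); ys.append(j); sizes.append(300 * n)  # circle area ~ count

fig, ax = plt.subplots()
ax.scatter(xs, ys, s=sizes, alpha=0.6)
ax.set_xticks(range(len(levels)), labels=levels, rotation=45, ha="right")
ax.set_yticks(range(len(levels)), labels=levels)
ax.set_xlabel("Question 1: representativeness")
ax.set_ylabel("Question 2: usefulness")
fig.tight_layout()
plt.show()
```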

Supplementary Figure 6. Responses to the survey questions on model representativeness and usefulness only by model developers.
The responses to Question 2 on usefulness depend on the modelling role. Model developers agree more often than model users that usefulness is the most important validity criterion. This figure visualizes the absolute number of responses given to this question on usefulness only by model developers, with respect to the question on representativeness. The majority of developers agree or strongly agree with the representativeness question. Very few developers remain neutral on or disagree with Question 2 on usefulness, regardless of their responses to the representativeness question. In other words, 88% of model developers agree or strongly agree that usefulness is the most important criterion. Therefore, model developers value usefulness more uniformly than they value representativeness. Source data of this figure are provided in the Source Data file.

Supplementary Figure 7. Responses to the survey questions on model representativeness and usefulness only by model users.
This figure visualizes the absolute number of responses given to the questions on representativeness and usefulness only by model users. Only 48% of users agree with the statement on usefulness (Question 2), while the remaining 52%, who favour representativeness, remain neutral or disagree. Therefore, a lower fraction of model users than of model developers value usefulness. The division between representativeness and usefulness is more apparent among the users. Source data of this figure are provided in the Source Data file.

Supplementary Figure 8. Responses to Question 2 on model validation with respect to the modelling role.
This figure illustrates the percentage of responses to Question 2, which is about usefulness as the most important validity criterion, for each modelling role. A higher fraction of model developers than users agree that the most important validity criterion is usefulness. Source data of this figure are provided in the Source Data file.

Supplementary Figure 9. Responses to Question 8 on model validation with respect to experience level.
This figure illustrates the percentage of responses to Question 8, which is about decision makers' view on transparency, for each experience level. Respondents with moderate experience tend to disagree more than those with low or very high experience that decision makers expect clear communication of uncertainties and assumptions. In other words, more of the respondents with very short or very long experience acknowledge the decision makers' demand for the communication of critical assumptions and uncertainties. Source data of this figure are provided in the Source Data file.

Supplementary Figure 10. Responses to Question 1 on model validation in the scenario generation context with respect to experience level.
This figure shows the percentage of responses to Question 1 in the scenario generation context, which is about following a different validation approach, with respect to experience level. A higher percentage of the respondents with medium experience (2–10 years) agree that model validation should not be performed differently when the model purpose is scenario generation. Source data of this figure are provided in the Source Data file.

Supplementary Figure 11. Responses to the survey questions on model structure and output.
The survey questions in the context of scenario exploration included two statements to compare the relative importance of model structure and output in validation. Question 3 stated that model output is more important than the structure, while Question 5 implied a higher importance of the structure. In total, 79% of respondents disagree or strongly disagree with the higher importance of model output (Question 3). This figure shows the number of responses given to these two questions relative to each other. Among the respondents who disagree or strongly disagree with the higher importance of the output, those who agree with the higher importance of the structure constitute the majority (31 people). Still, a large number of respondents who disagree with the higher importance of model output remain neutral about or disagree also with the higher importance of the structure. Therefore, it can be said that practitioners do not consider model output more important than the structure in the validation of models used for scenario exploration. However, they are equivocal about the structure being more important than the output. The respondents who disagree with the higher importance of the output yet remain neutral or disagree also with the higher importance of the structure may value these two characteristics almost equally, or may be undecided about their answers. Source data of this figure are provided in the Source Data file.

Supplementary Figure 12. Topic probabilities of the publications, as calculated by the LDA algorithm.
The LDA algorithm used in this study to identify the main topics in the validation literature allocates each publication to a topic with a calculated probability. This figure visualizes these topic probabilities, where each line represents a document. The darker a line in the corresponding topic's segment (column), the higher the probability. Heterogeneity across the columns in these panels indicates that the topics identified by the algorithm are distinct from each other. For instance, Topics 2 and 3 in (b) are the topics labelled as Hydrology and Climate Change and Ecosystems in the scenario-oriented validation publications. Most documents associated with Hydrology and Climate Change with high probabilities have a low association with the Ecosystems topic, and vice versa. This implies that most publications distinctively address these two topics, while the ones that integrate both are few.
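For readers who wish to compute a comparable document-topic matrix, the following is a minimal sketch assuming scikit-learn's LatentDirichletAllocation applied to a bag-of-words representation of publication abstracts. The corpus, the number of topics and the preprocessing are illustrative assumptions and may differ from those used in the study.

```python
# Minimal sketch of computing per-document topic probabilities, assuming
# scikit-learn's LatentDirichletAllocation on a bag-of-words representation
# of publication abstracts. The corpus and topic count are illustrative;
# the study's actual preprocessing and settings may differ.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # hypothetical abstracts
    "river basin hydrology climate change runoff model validation",
    "ecosystem services biodiversity land use scenario analysis",
    "energy system transition scenario exploration policy model",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)  # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(dtm)  # each row sums to 1: topic probabilities

# Each row corresponds to one line of the figure; darker segments would
# correspond to the larger probabilities printed here.
for d, probs in enumerate(doc_topic):
    print(f"document {d}: " +
          ", ".join(f"topic {t}: {p:.2f}" for t, p in enumerate(probs)))
```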

Supplementary Tables

Supplementary Table 4. p-values resulting from the tests of independence between the responses to the scenario generation questions and the background factors.
None of the resulting p-values is smaller than 0.05; therefore, a dependence conclusion cannot be derived at the 5% significance level. Only for Question 1 and the experience level is the p-value smaller than 0.1; therefore, we mention experience level as a potentially significant factor. Source data of this table are provided in the Source Data file.
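Assuming the tests of independence are chi-squared tests on contingency tables of responses versus background factors (the specific test is not restated in this caption), the following minimal Python sketch shows how such a p-value can be computed. The table of counts is an illustrative placeholder, not the survey data.

```python
# Minimal sketch of a test of independence between survey responses and a
# background factor, assuming a chi-squared test on a contingency table.
# The counts below are hypothetical placeholders, not the survey data.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: experience level (low, medium, high);
# columns: response to Question 1 (disagree, neutral, agree).
contingency = np.array([[ 5,  8, 12],
                        [14,  6, 10],
                        [ 7,  9,  4]])

chi2, p_value, dof, _expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}, dof = {dof}")
# Following the criterion above: dependence is noted as potentially
# significant only if p < 0.1, and treated as significant if p < 0.05.
```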