Introduction

The ongoing digital transformation is reshaping various aspects of society, including healthcare systems1. Many countries have transitioned from paper-based to digitalized healthcare systems in recent years, bringing numerous benefits such as improved efficiency, cost reduction, real-time monitoring of health outcomes, and enhanced communication among stakeholders (e.g., physicians and patients)1,2,3,4,5,6. However, digitalization also raises data security and privacy concerns, necessitating robust protection measures and regulations5. Countries must provide the required infrastructure and data security framework to enable digital technologies in healthcare while protecting their population’s most sensitive data7. Additionally, user adoption, sustainability, and ethical design are crucial considerations for successfully implementing digital interventions2,8,9.

The digitalization of healthcare systems thus presents both opportunities and challenges, highlighting the complexity of this transformation. Moreover, given limited resources, the degree of digitalization varies across national healthcare systems10. Assessing the maturity of a country’s digital healthcare system through objective measurements allows for benchmarking, policy learning, and informed decision-making. It enables policymakers to identify areas that require improvement, prioritize funding for interventions, education, and technical infrastructure, and track the progress of digitalization efforts over time2,11.

The World Health Organization (WHO) and the International Telecommunication Union (ITU) emphasize a comprehensive strategy for assessing electronic health (eHealth) systems, encompassing leadership, governance, implementation and funding strategies, information and communication technology (ICT) infrastructure, legal regulations, and the digital literacy skills of the workforce and the general population12. Consequently, assessing the maturity of a healthcare system needs to be conducted holistically and must include the dimensions listed above. This approach implies the use of various multidisciplinary indicators.

Over the years, numerous indices have been developed to assess different aspects of digital health systems11,13,14,15,16,17,18,19,20,21,22,23,24,25. However, none of these indices have taken a comprehensive approach to measuring the maturity of digital public health (DiPH) systems as a whole. Instead, they have focused on specific areas such as national ICT infrastructure12,14,26,27, legal regulations, political support1,11,12,27, social acceptance28, or implementing interventions within healthcare systems11,12,27. Notably, there is a lack of indices that specifically address DiPH.

To comprehensively evaluate the maturity of DiPH systems, a holistic approach is essential that goes beyond the assessment of eHealth systems alone as proposed by the WHO and ITU12. DiPH encompasses health promotion, disease prevention, and population health surveillance29, expanding beyond eHealth’s focus on digitalizing healthcare30. Consequently, DiPH tools are upscaled interventions targeting groups or entire populations to enhance user health. They encompass eHealth tools alongside additional services, tools, or devices for health promotion and primary prevention31 (see the definitions in Box 1).

To address the lack of a comprehensive and holistic approach, our study aimed to establish international consensus on quality indicators for assessing the maturity of national DiPH systems. Drawing on existing validated indices and WHO recommendations12, we identified four areas to collect and categorize maturity indicators:

  1. ICT: Examines the necessary ICT infrastructure requirements for integrating DiPH tools into routine care and health promotion programs nationally.

  2. Legal: Focuses on political support, legal regulations, and data protection measures for the nationwide implementation and use of DiPH tools.

  3. Social: Assesses the general public’s collective willingness and capacity to effectively utilize DiPH tools in routine care and health promotion efforts.

  4. Application: Explores the adoption and utilization of DiPH tools within the national healthcare system by government entities or public institutions such as compulsory health insurance.

Our study aimed to accomplish three primary objectives. First, we sought to establish consensus on which interventions, technologies, and tools (referred to as DiPH tools) align with the definition of (digital) public health and should be included in the assessment (see Textbox 1). Second, we aimed to reach a consensus on quality indicators that effectively measure the maturity of DiPH systems in a manner adaptable to diverse national contexts. Finally, we examined how the proposed indicators fit with existing assessment tools used to analyze the maturity of individual aspects of our research topic11,13,14,15,16,17,18,19,20,21,22,23,24,25.

To address our study goals, we employed a Delphi study, a well-established method for achieving expert consensus on multidisciplinary and complex topics32,33,34,35. This technique is frequently applied in social science research36, particularly for topics with limited knowledge or high uncertainty37,38. Compared with focus group discussions, the Delphi technique allows every participant to express their opinions equally, without the risk of individual experts dominating the debate, and the protection of anonymity lets participants express their thoughts freely, reducing the risk of social desirability bias36. Additionally, as participants receive feedback on the overall voting behavior after each round, they can easily revise their perspective39,40,41. We are, therefore, confident that the Delphi methodology fits our research goals due to the uncertainties in holistic DiPH-system maturity assessment, pragmatic reasons (experts can participate cost-efficiently from around the globe), and its ability to obtain consensus among participants on multidisciplinary and complex topics.

Results

During the pre-survey, 87 experts expressed their written interest in contributing to our Delphi study, of whom 82 met the inclusion criteria. Eventually, 54 specialists actively participated in at least one of the three official Delphi rounds by providing and ranking quality indicators. The recruitment flow and study cohort size per round are displayed in Fig. 1. Of the 54 experts, 40 participated in the first round (74%), 47 in the second round (87%), and 41 contributed to the third and final round (76%).

Fig. 1: Recruitment flow and final study cohort size.

In total, 87 experts registered for the Delphi study, of which 82 met the inclusion criteria. Of these, 54 participated in at least one of the three Delphi panels with 40 to 47 experts per survey round.

The characteristics and demographics of the 82 experts who met our inclusion criteria and of those who participated in the three official Delphi rounds are displayed in Table 1. Among the experts whose professional background is summarized as “Other” are experts (one each) from health economics, ICT, nursing science, pharmacy, policy analytics, politics, science laboratory technology, and urban planning, reflecting the variety of fields contributing to the study. From a geographical perspective, most experts came from Germany (27 registered, 16 participated in the first round, 17 in the second, and 14 in the third round). They were followed by six registered experts from Portugal, of whom four participated in each Delphi panel (accompanied by experts from Belgium, Croatia, Finland, France, Greece, Italy, Malta, the Netherlands, Norway, Sweden, Switzerland, and the United Kingdom). For Africa and South America, one expert each from Ethiopia, Nigeria, Uganda, Brazil, and Ecuador registered; however, only the experts from Ethiopia and Brazil contributed to the study during rounds two and three. For the Australasian region, three experts from Australia registered, accompanied by experts from Sri Lanka, Turkey, the United Arab Emirates, Kazakhstan, and the Philippines. For North America, three Canadians and one expert from the United States of America registered.

Table 1 Panel characteristics

Our international and multidisciplinary Delphi study concluded with 96 indicators (21 for ICT, 31 for Legal, 29 for Social, and 15 for Application) and 25 DiPH tools. After the third round, the indicators and tools were grouped into 22 clusters among the four sub-domains (see Fig. 2). An overview of all indicators and tools, including participation rate and overall rating per indicator, is listed in Supplementary Information 1.

Fig. 2: Clusters per sub-dimension after round 3, sorted by size.

The 96 finally agreed-upon indicators were grouped into 18 clusters. The 25 digital public health tools can be sorted into four clusters. An overview of all indicators proposed during the process, including their agreement rates, is available in the Supplementary Information.

Representation of proposed indicators among other indices

We assessed how many of the proposed indicators were already included in other assessment tools and indicator lists. Overall, 48% of the indicators were covered by at least one published and validated index or indicator list11,13,14,15,16,17,18,19,20,21,22,23,24,25. Our analysis showed that, in particular, indicators related to general ICT infrastructure, cybersecurity, and the regulation and use of health data were already sufficiently covered among existing indices, with coverage rates of 57% (ICT) and 55% (Legal). Conversely, we identified a lack of validated indicators specifically targeting DiPH-service implementation (40% coverage) and measuring population interest and capability nationally (only 38% coverage).

Indicators on information-communication-technology requirements

ICT-system maturity is crucial for the success of a national DiPH service rollout. Without a sufficient broadband network in the country (especially in rural areas), potential users might be unable to access the system. This digital divide could increase health inequalities42,43, and ICT maturity is, therefore, a core requirement when assessing the maturity of DiPH systems. Our participants proposed multiple indicators to determine general ICT-system maturity (such as prices for and availability of broadband internet connections) and financial support to improve the sector. Additionally, the experts emphasized the need for indicators that bring together the healthcare and ICT sectors, such as physician offices with Internet connections, training in DiPH, or the use of electronic documentation systems in hospitals. All ICT indicators with at least 70% importance agreement during R3 are displayed in Table 2.

Table 2 ICT indicators that reached the needed agreement in the third panel

Indicators on the legal regulation and political support

Several validated assessment tools for digital health regulation exist11,15,17; however, none currently exist for DiPH. Nevertheless, as digital health (focusing on personalized healthcare) is a sub-dimension of the more holistic DiPH44, we argue that validated digital health regulation indicators might be applicable to some aspects of the developed DiPH indicators. Unsurprisingly, most indicators proposed through the panel assess the access, exchange, and security of health data, which were also of interest for various validated indices (see Table 3). The global increase in the creation and use of health-related data holds promise for evidence-based and data-driven DiPH programs. However, weak data protection regulations expose individual users to the risk of data breaches or misuse. Therefore, countries must implement strong data governance structures and offer political support to protect these sensitive data from being misused45. This importance is mirrored in the expert agreement on the need for a legal framework for DiPH data exchange and regulations for accessing health data through EHRs (both received 100% agreement).

Table 3 Legal indicators that reached the needed agreement in the third panel

Indicators on the social willingness and capability to use DiPH tools

The sub-dimension Social willingness and capability to use DiPH tools in healthcare and health promotion had the lowest share of indicators covered by already existing validated assessment tools (only 38%). Adding to this observation, no proposed indicator received over 94% agreement, pointing towards lower expert consensus in this field than for ICT requirements or legal regulations, and potentially towards larger research gaps. The proposed indicators are displayed in Table 4. Most of the indicators focused on the users of DiPH tools. Still, the panel also deemed more general indicators on digital and health literacy and the use of mobile devices and the Internet crucial for assessing the capability of a population to use DiPH tools.

Table 4 Social indicators that reached the needed agreement in the third panel

Indicators on the application degree of digital public health tools

Although various validated indices cover aspects of a service’s implementation degree or the secondary use of health data11,15,17, our participants agreed that additional indicators are needed, especially for evaluating the implementation of and access to DiPH services. Table 5 displays the final distribution among the four clusters. All indicators focusing on the secondary use of health data were already included in existing assessment tools measuring digital health service maturity. Nevertheless, one needs to remember that digital health maturity tools might differ in their requirements compared with DiPH maturity assessment models.

Table 5 Application indicators that reached the needed agreement in the third panel

Digital public health tools

In total, 25 tools were named and agreed upon as DiPH tools. During R3, wearables and sensors received the lowest rating (65% each), whereas electronic registries (e.g., for vaccination) scored highest with 100% agreement on suitability. Figure 3 summarizes the change in agreement (somewhat and very suitable) across all three rounds. A blue line displays an increase in agreement. In contrast, a dashed red line shows a decline in agreement (with the tool still receiving at least 70% agreement), and a thick red line marks tools that received below 70% agreement and were, therefore, excluded. Tools in the black box in the middle column were introduced during the second panel (R2) and only ranked in the final round (R3). Although the position changes among the columns might appear drastic, the average share of agreement among tools with at least 70% agreement did not change severely between R2 (87%) and R3 (84%).

Fig. 3: Change in ranking for DiPH tools according to expert agreement.

The tools are highlighted according to their cluster. Yellow: digital alternatives to traditional public health tools; orange: mobile health tools; red: information or education tools; brown: infrastructure tools. In round 1, tools were only proposed; the first rating took place in round 2. All tools with less than 70% agreement were excluded (red lines leading to the red box; the same applies from round 2 to round 3). Newly added tools, first mentioned in round 2, are displayed in the black box below the original tools from round 1. As these tools were only rated once (from round 2 to 3), the connecting line from round 2 to round 3 is black for all such tools with at least 70% agreement. Tools that were already rated in round 2 but received a lower rating in the third round (still at least 70%) are connected through a red dashed line, whereas tools with increased agreement in round 3 compared to round 2 are connected through a blue line.

Discussion

This Delphi study aimed to comprehensively assess the maturity of national DiPH systems by collecting 96 indicators aligned with WHO and ITU recommendations12. Interestingly, only a minority (48%) of these indicators were already covered by existing validated indices. This discrepancy may be due to our focus on DiPH maturity, which encompasses population-oriented services for health promotion, surveillance, monitoring, and research44,46,47, in contrast to the currently more dominant and more frequently evaluated perspective of digital health, which primarily emphasizes individual patient healthcare, treatment, and personalized medicine48. Consequently, the assessment requirements might differ, impacting the implementation of specific tools or services, the required ICT infrastructure, legal regulations, and societal willingness and capacity to utilize such tools.

While the Delphi process sought consensus among experts from diverse regions to develop indicators applicable in multiple settings and healthcare systems, cultural and geographical differences likely influenced the consensus-building process. Our subgroup analysis (see Supplementary Information 2) revealed varying votes among participants from different regions, particularly concerning DiPH tools. For instance, during R3, experts from Germany advocated excluding business intelligence and intelligent care homes as DiPH tools, while experts from other regions supported their inclusion. Conversely, German experts voted to include risk models, solutions for transferring digital measures to regional care, data visualization tools, and wearables as DiPH tools, which were deemed unsuitable by other experts.

These discrepancies may arise from divergent interpretations of public health itself. In Germany, public health is deeply rooted in social medicine, prioritizing health promotion and primary prevention. The definition by Winslow in 1920 characterizes public health as “the science and art of preventing disease, prolonging life and promoting physical health and efficiency through organized community efforts […]”49. Consequently, German public health experts are more inclined to view interventions as DiPH tools if they target primary prevention, health promotion, or population health surveillance goals (e.g., wearables, risk models, or data visualization tools)50.

However, public health perspectives in other countries may differ. Alternatively, one could argue that health and public health interventions are a public good51,52. As long as digital interventions for health purposes are accessible to the user group without charge (e.g., covered by the state or compulsory health insurance), they could be considered DiPH tools. This broader approach could encompass system services like telemedicine but exclude wearables, as users typically bear the cost of smartphones or smartwatches (as supported by the multinational experts’ general agreement in the Delphi study). These varying ratings are closely associated with differing understandings of public health and DiPH and need to be reconsidered when developing assessment tools for multinational settings.

One of the key strengths of our study lies in its robust and inclusive design. We implemented a comprehensive, multi-pronged, and international recruitment strategy, contacting experts through various channels. This approach yielded 82 participating experts from 27 countries across six continents, representing a diverse range of scientific fields. This range of expertise, combined with the Delphi method for consensus-building, enabled us to develop a comprehensive set of indicators of internationally agreed importance that can be used to measure the maturity of DiPH systems in different settings and healthcare systems.

The decrease in agreement regarding the importance of vaguely worded indicators demonstrates our study’s methodological effectiveness. This decline indicates that participants understood and acted upon our feedback after each round. Including probes in the ranking of such indicators further facilitated the convergence and precision necessary for indicator formulation and inclusion in our study. Last, the achieved response rates reinforce the strength and suitability of our chosen methodology. We observed strong interest and commitment from the participating experts: 54 of the 82 registered experts contributed to at least one round, and 30 experts participated in all three surveys. This level of engagement highlights the significance of our research and demonstrates the experts’ dedication to shaping the future of DiPH systems. Nevertheless, although 66% of all registered experts participated in at least one round (54/82), attrition during the process can also be considered a limitation. We did not ask for indicators during the pre-survey, so vital contributions may have been lost. However, we can only speculate on the motives why experts did not take part in subsequent rounds since we did not receive any additional comments from them.

In terms of limitations, it should be noted that we cannot guarantee that all participants used the “I cannot rate this indicator due to lacking expertise” option accurately when assessing the indicators and DiPH tools. However, the overall rating for poorly worded indicators decreased over the study period. Further, it is essential to acknowledge that the 70% threshold criterion for consensus on indicator importance was chosen arbitrarily despite incorporating best practice guidelines and reviewing other Delphi approaches for guidance2,53,54. Another potential limitation is the presence of language bias since the study was conducted in English, requiring participants to submit indicators in English. This language requirement may have excluded experts without English proficiency and could have led to misunderstandings of developed indicators or our instructions. Nonetheless, considering the inclusion of experts from diverse continents, we remain confident that the identified indicators, with appropriate translation, have the potential to be accepted and applied in non-English-speaking settings.

Additionally, we did not ask registered experts how they became aware of our study. Therefore, it remains unclear how many participants registered after being invited by colleagues and friends (snowball recruitment), after seeing the call for participation on social media, or after reading about our study in their organization’s or association’s newsletter.

We recognize that our study panel has an overrepresentation of experts from Europe, more precisely Germany, and of those with a background in public health. However, through sub-group and sensitivity analyses (see Supplementary Information 3), we demonstrated that significant differences in R2 occurred only when comparing European experts with experts from other continents in the sub-domains ICT and DiPH Tools. As we did not observe any significant differences for the overrepresented groups of public health or German experts during R2, the conflicts for the European sub-group seem to stem from the experts from other European countries. The differentiation in DiPH Tools might hint at a varying understanding of public health, as discussed earlier. Overall, the analysis supports our assumption that the selection of indicators during R2 was not influenced by the academic background in public health as represented in the study and, therefore, that this did not affect the indicators included at the beginning of R3.

We put substantial effort into recruiting experts from all continents, e.g., by repeatedly contacting major organizations and directly contacting published authors in the field. Including more expertise from underrepresented countries is clearly desirable; nevertheless, our results provide broad insights into digital maturity assessment that carry meaning beyond the countries currently included.

Finally, we point out that there is a lack of standardized procedures regarding the inclusion of non-participating experts in later Delphi rounds. While some Delphi studies refrain from including experts in the continuous iteration process who have not contributed to previous rounds55,56, we invited all experts who registered during the pre-survey to each round, regardless of their prior contribution. We chose this approach to avoid reducing our cohort size further. While this method is in line with other Delphi studies57,58, it might have raised issues in the iteration process, as previously non-participating experts might have influenced the overall voting result of the cohort. However, our balanced panel analysis (see Supplementary Information 3) displayed no statistically significant differences in overall voting between the 30 experts who contributed to all rounds and the general participants in R2 and R3. Therefore, we are confident that our approach did not negatively impact the iteration process of our Delphi study.

Future research needs to investigate how data can be effectively collected for the proposed indicators and to assess their suitability for comprehensively evaluating the maturity of DiPH systems across diverse cultural and geographical contexts. Furthermore, conducting regression analyses using real-world data to explore potential correlations among the individual indicators is crucial. Exploring these relationships can yield valuable insights that contribute to evidence-based decision-making and assessments.

We are planning to use the proposed DiPH indicators together with validated digital health, ICT, regulation, and sociological indicators to form a measurement tool that will assess the national maturity of DiPH systems according to the WHO toolkit: The Digital Public Health Maturity Index (DIPHMI). Its potential to inform policy decisions and improve resource allocation will make it a valuable tool for policymakers and stakeholders.

Another area requiring further research is the ever-evolving nature of DiPH: While some established tools and services, such as telehealth and health apps, have been identified, the integration of emerging technologies like blockchain, big data analytics, and artificial intelligence remains unclear. Understanding their contributions and identifying the specific demands they may place on DiPH systems necessitates ongoing exploration.

Lastly, we want to point out that our results are purely based on expert opinion. Due to the complexity and interdisciplinarity of the topic, however, it is also crucial to understand the needs of practitioners and representatives of the general population (such as patient representatives). Participatory approaches and citizen science can lead to increased research capacity, better knowledge, and benefits for citizens59. Adding lay and traditional knowledge to scientific data can lead to a more effective response to complex problems or topics, such as measuring the maturity of DiPH systems. Especially for the social component regarding the willingness and capability of practitioners and laypeople to use DiPH tools in their routine healthcare and health promotion, it will be crucial to include representatives from these groups. We therefore encourage future research on this topic to take more inclusive approaches that also involve representatives from groups other than scientific experts.

In conclusion, the collaborative efforts of our multidisciplinary and multinational Delphi panel have culminated in a list of 96 indicators to be considered when assessing a national DiPH system’s maturity. These findings are relevant to a wide range of stakeholders, including public health authorities, governments, researchers, and industry professionals, and their relevance extends beyond academia to the international public health landscape. By fostering a global vision for a comprehensive evaluation of DiPH systems, our consensus study serves as a first step towards international policy learning, benchmarking, and an improved allocation of limited resources in DiPH systems worldwide. Building on the insights from our study, policymakers, researchers, and practitioners can strengthen their digital infrastructure, enhance collaboration, and ultimately improve the health and well-being of populations on a global scale.

Methods

Structure of the Delphi study

The general structure of our Delphi study is displayed in Fig. 4. The study consisted of one pre-survey (R0) to assess the eligibility of experts for participation as well as their socio-demographic and educational information. During R0, all participants were electronically provided with information on data processing and protection, and all experts gave electronic informed consent after receiving sufficient information to make an informed decision on whether they wanted to take part in our study. As there is no explicit agreement on how many assessment cycles (survey rounds) are needed in a Delphi study60,61, this study consisted of three online panel rounds following R0 (R1–R3). All rounds were conducted through online questionnaires on the commercial platform QuestionPro (QuestionPro GmbH, Berlin, Germany) and piloted by persons with expertise in DiPH who did not belong to the research team. During each survey period, participants who had registered in R0 but had not yet participated in the current survey round were reminded weekly via email to contribute.

Fig. 4: Structure of the Delphi study.

The Delphi study was structured in a pre-survey to register interested experts, check for inclusion criteria, and assess interest in contributing to the four overarching domains. This round was followed by three official Delphi panels in which indicators and tools were proposed (rounds 1 and 2), rephrased (round 2), and rated on a four-point Likert scale regarding their importance for digital public health (rounds 2 and 3). Indicators and tools with at least 70% agreement on their importance (measured as the share of ratings of 3 or 4 on the scale) were carried forward to the next round.

During the first round (R1), from 16 May to 6 June 2022, all panelists were provided with the definitions of public health and DiPH used for this study (see Textbox 1). Participants then selected at least one of the four areas of interest and provided indicators to measure the maturity of DiPH systems from this perspective. Following the grounded theory approach by Glaser & Strauss62, we did not present existing indicators during R1. For research on interdisciplinary topics, it is crucial not to channel responses too much in advance but rather to remain as open as possible and give representatives from different disciplines space to express their opinions63. Although presenting existing indicators already in R1 might have guided participants in a desired direction and allowed better comparisons of rating behavior, we considered the disadvantages of this approach to be more prominent: providing a list of pre-developed indicators with the option to name additional ones might create a bias in answer categories and a loss of spontaneous responses (as participants might not use the “other” option)64,65.

The clustered items from R1 were presented to the participants in R2, from 13 June to 27 June 2022. Experts were asked to rate the indicators and tools on a four-point Likert scale (unimportant to very important). Additionally, participants could select “I cannot rate this indicator due to a lack of expertise” if they felt an indicator was beyond their scope of expertise. Consensus was defined a priori as an item or tool being rated 3–4 (important to very important) by at least 70% of the participants. This approach is commonly used in Delphi studies and suggested in gold-standard guidelines54,66,67,68,69,70. Further, we encouraged panelists to propose additional indicators or tools and to rephrase them if necessary.
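To make the consensus rule concrete, the following minimal Python sketch (our own illustration, not the study’s analysis code) computes the agreement share for a single indicator from a set of Likert ratings. The text does not specify how “cannot rate” answers entered the denominator, so excluding them here is an assumption, and the example ratings are hypothetical.

```python
# Minimal sketch of the 70% consensus rule (illustrative, not the study's code).
# Assumption: "cannot rate" answers (None) are excluded from the denominator;
# the text does not specify how these responses were handled.

def agreement_share(votes):
    """Share of ratings of 3 ('important') or 4 ('very important') on the 1-4 scale."""
    rated = [v for v in votes if v is not None]  # drop "cannot rate" answers
    if not rated:
        return 0.0
    return sum(1 for v in rated if v >= 3) / len(rated)

def reaches_consensus(votes, threshold=0.70):
    """True if the indicator or tool would be carried forward to the next round."""
    return agreement_share(votes) >= threshold

# Hypothetical ratings for one indicator: 7 of the 8 rating experts chose 3 or 4,
# one chose 2, and two selected "cannot rate".
example_votes = [4, 4, 3, 3, 4, 2, 3, 4, None, None]
print(agreement_share(example_votes))    # 0.875
print(reaches_consensus(example_votes))  # True
```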

For R3 from 8 July to 15 September 2022, all indicators and tools that had received at least 70% consensus or were offered as alternatives in R2 were displayed for a final rating (same approach as in R2). This time, however, participants were unable to provide alternative formulations or comments.

All registered experts who met the inclusion criteria were invited to contribute to each Delphi round, even if they had not participated in the previous panel. Although such an approach risks attrition bias, we decided that this risk was outweighed by the benefit of having a sufficient number of at least 15 experts contributing to each of the four domains during every panel. We conducted a balanced panel analysis to test whether the ratings from experts who contributed to all three rounds differed significantly from the overall voting. As displayed in Supplementary Information 3, this was not the case.

We did not apply strict exclusion rules, as we expected that experts could meaningfully contribute to the further discussion even if they had missed one round (of all actively contributing experts, 30 participated in all three rounds). The rationale behind this approach was to not limit our cohort size further. Additionally, some Delphi studies start with an already pre-defined set of variables for the participants to rate. For this reason, we did not find it problematic that 10 participants contributed to rounds 2 and/or 3 only. Experts were therefore encouraged to contribute to the following panels even when they had not contributed to the previous one.

Panelists

In total, 346 experts were identified and contacted by the authors via email based on their position as an editor of an internationally published and peer-reviewed digital health or DiPH journal (n = 183), as contact persons for digital health or DiPH institutions, associations, or networks (n = 73), or based on relevant publications or the teaching of relevant digital health or DiPH classes at universities (n = 163). The email contained information regarding why and how the study was being conducted, why the experts were selected for participation, and a link to the questionnaire platform. During the study, we encouraged participants to contribute to every round and sent multiple reminders to increase the participation rate. These approaches aimed to minimize attrition bias and to balance recruitment across geographical regions and fields of expertise. The study was also advertised on social media (Twitter and LinkedIn). Further, we applied a snowball sampling method71 and asked contacted experts to share the invitation in their professional networks.

Due to the lack of standards regarding the ideal number of experts per round60, we followed the RAND/UCLA Appropriateness Method User’s Manual66, which suggests a panel of 7 to 15 experts. We assumed a priori a 50% dropout rate throughout our study. Therefore, in our study protocol, we decided to start the official Delphi study with R1 once at least 30 experts per domain who met the inclusion criteria had confirmed their interest in participation during R0. We followed a criterion sampling strategy as displayed in Table 6. Although heterogeneous panels tend to find consensus more slowly than homogeneous groups39,60,72,73, we deemed this approach necessary to reflect the interdisciplinary nature of the topic and, therefore, invited experts with diverse geographical and disciplinary backgrounds. Our method is supported by other Delphi studies in which a panel needed to achieve consensus on a broader and more diverse topic39,72, which is also the case for holistically assessing the maturity of DiPH systems.

Table 6 Inclusion criteria

Data collection and analysis

The pre-survey and the official Delphi study were conducted anonymously through online questionnaires on the commercial platform QuestionPro. Research shows that anonymous approaches in Delphi studies empower participants to present their ideas more freely while reducing the risk of individual panelists dominating the discussion74,75. The survey was piloted by four persons with expertise in DiPH who did not belong to the research team to reduce the risk of misinterpretation of statements and instructions.

We qualitatively assessed and clustered the given indicators and DiPH tools following the thematic analysis approach by Braun & Clarke76, displayed in Fig. 5. The clusters were developed from the empirical data given by the study participants (inductive approach) and merged during the Delphi process when clusters included less than three indicators or tools.

Fig. 5: Data consolidation process and change in indicator and tool numbers during the Delphi study.

Initially, 303 indicators and 106 digital public health tools were proposed during round 1. After data cleaning, this resulted in 136 indicators and 32 tools which were presented during round 2. As new indicators and tools were proposed during round 2, 135 indicators and 30 tools were kept for the final round. Here, 96 indicators and 25 tools were agreed upon by the participants to be important for digital public health. Of these indicators, only 48% were covered by already existing indices.

Sub-group and sensitivity analysis

We assumed that different understandings of usable and practical indicators could arise based on national regulations or the scientific background of the participating experts. Therefore, we conducted a sub-group analysis to see whether the share of “somewhat important” and “very important” replies differed between over-represented sub-groups (namely participants from Germany and experts in public health) and the rest of the participating experts. A conflict was defined as one sub-group showing at least 70% agreement on keeping an indicator (70% of the participants choosing “somewhat important” or “very important”) while the other group had a lower overall share of these ratings. Building on this evaluation, we conducted a sensitivity analysis to see whether the overall share of these ratings for each of the domains was significantly biased by the decisions of the two sub-groups. After testing for normal distribution with the Kolmogorov-Smirnov test, we conducted a two-sided Gaussian test with an alpha of 5%. The results of this analysis are displayed in Supplementary Information 2 and 3.
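As a rough illustration of this procedure, the sketch below (Python with NumPy/SciPy; our own illustration under stated assumptions, not the study’s analysis script, and the sample data are hypothetical) first checks a set of per-indicator agreement shares for normality with a Kolmogorov-Smirnov test and then applies a two-sided Gaussian (z) test for a difference in means between a sub-group and the remaining experts at an alpha of 5%.

```python
# Illustrative sketch of the sub-group sensitivity check (not the study's script).
# Assumptions: the unit of analysis is the per-indicator agreement share per group,
# and the Gaussian test is a two-sample z test for a difference in means.
import numpy as np
from scipy import stats

def ks_normality(x):
    """KS test against a normal distribution with the sample's mean and SD."""
    x = np.asarray(x, dtype=float)
    return stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

def two_sided_z_test(x, y):
    """Two-sided Gaussian (z) test for a difference in means of two samples."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    z = (x.mean() - y.mean()) / se
    p = 2 * stats.norm.sf(abs(z))  # two-sided p value
    return z, p

# Hypothetical agreement shares (0-1) per indicator for two sub-groups
german_experts = [0.82, 0.75, 0.91, 0.68, 0.80, 0.73]
other_experts = [0.79, 0.73, 0.88, 0.70, 0.77, 0.74]
print(ks_normality(german_experts))
z, p = two_sided_z_test(german_experts, other_experts)
print(f"z = {z:.2f}, p = {p:.3f}, significant at alpha = 5%: {p < 0.05}")
```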

Ethical approval

This Delphi study sought to identify essential indicators to map digital health system maturity. We did not intend to gather any personal information (beyond basic socioeconomic data for group characterization) from the international participants. All interested experts were provided with written electronic information regarding the study design, aim, data protection, and data processing. We actively collected electronic written consent as soon as invited panelists indicated their willingness to take part. As in other Delphi studies with experts in which the core aim is to work towards consensus (e.g., Greenhalgh et al.57), formal ethics approval was not deemed necessary, as explicit informed consent by participants to share their expertise was obtained by the research group.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.