Modernizing and designing evaluation frameworks for connected sensor technologies in medicine

Coravos, Andrea; Doerr, Megan; Goldsack, Jennifer; Manta, Christine; Shervey, Mark; Woods, Beau; Wood, William A.

doi:10.1038/s41746-020-0237-3

Perspective
Open access
Published: 13 March 2020

Andrea Coravos ORCID: orcid.org/0000-0001-5379-3540^1,2,3,4,
Megan Doerr⁵,
Jennifer Goldsack³,
Christine Manta^1,3,
Mark Shervey¹,
Beau Woods^4,6,7 &
…
William A. Wood^3,8

npj Digital Medicine volume 3, Article number: 37 (2020)

A Publisher Correction to this article was published on 02 April 2020

This article has been updated

Abstract

This manuscript focuses on the use of connected sensor technologies, including wearables and other biosensors, for a wide range of health services, such as collecting digital endpoints in clinical trials and remotely monitoring patients in clinical care. The adoption of these technologies poses five risks that currently exceed our abilities to evaluate and secure these products: (1) validation, (2) security practices, (3) data rights and governance, (4) utility and usability, and (5) economic feasibility. In this manuscript we conduct a landscape analysis of emerging evaluation frameworks developed to better manage these risks, broadly in digital health. We then propose a framework specifically for connected sensor technologies. We provide a pragmatic guide for how to put this evaluation framework into practice, taking lessons from concepts in drug and nutrition labels to craft a connected sensor technology label.

Introduction

Over the past decade, the adoption of digital technologies in medicine, from electronic health records to wearable sensors, has occurred faster than the healthcare community’s ability to evaluate and secure these products^1,2,3. The fundamental goal of any biomedical product evaluation is to assure that, in the intended context of use, the benefits of deploying the technology outweigh the potential risks to the participant/patient and the organization. As new technologies enter medicine and biomedical research, manufacturers, regulators, clinicians, and patients are relying upon existing regulatory and evaluation frameworks. For example, traditionally consumer-facing companies, such as Apple, are now approaching the US Food and Drug Administration (FDA) as they develop products for clinical settings⁴. However, the risks of deploying these new technologies are inadequately understood and are not adequately addressed by what are quickly becoming legacy evaluation frameworks.

Compared to legacy biomedical products, digital technologies have features that change the benefit-risk calculation; therefore, we must adapt evaluation frameworks. In this manuscript, we focus specifically on connected biometric monitoring technologies, which we will refer to as “connected sensor technologies”. Connected sensor technologies are digital medicine products that perform algorithmic processing of data captured by mobile sensors to generate measures of behavioral and/or physiological function. Examples of connected sensor technologies include smartwatches that measure activity, connected monitors that sit on top of mattresses to measure sleep, wireless arm cuffs that measure blood pressure, and microphones that capture vocal biomarkers signaling changes in brain health^5,6,7. Notably, we intentionally use the phrase “connected sensor technology” or “connected product,” and we avoid the phrase “connected device” because “device” is an FDA term of art that refers specifically to cleared medical devices⁸.

In this paper, we first briefly discuss the benefits of connected sensor technologies and dive deeper into the risks these technologies present. Next, we outline the frameworks that are emerging across the industry to evaluate digital health, highlighting their strengths and shortcomings with reference to the connected sensor technologies industry. Finally, building on these emerging frameworks as a guide, we outline a practical guide for evaluating fit-for-purpose connected products across biomedical research and clinical care.

Features of connected sensor technologies and their benefits and risks

Connected products are being rapidly adopted, with the number of wearables worldwide estimated to increase from 325 million in 2016 to 929 million by 2021⁹. The proliferation of wearables, ingestibles, and other connected sensors is making it easier than ever to collect high-quality behavioral and physiological data outside of the clinic⁶. Remotely collected data allow clinicians to discover insights that are more reflective of patients’ day-to-day experiences.

For drug developers, connected sensor technologies can improve efficacy⁶, increase inclusivity¹⁰, and lower the costs of conducting clinical trials¹¹. For clinicians, these products can capture insights that are more reflective of patients’ day-to-day experiences, potentially resulting in major improvements in care delivery.

To capture these potential benefits, risk-benefit analyses are essential to ensure accurate measurement and patient safety in study protocols and clinical care. In the next section we highlight five dimensions that carry risks posed by connected sensor technologies: (1) validation, (2) security practices, (3) data rights and governance, (4) utility and usability, and (5) economic feasibility.

Verification and validation

To determine the appropriateness of using a particular product, a short-cut question that often arises is whether the product is “validated” (e.g., “is this wearable clinically validated?”). Validation carries widely different meanings for different stakeholders. For instance, the pharmaceutical industry may use “validation” as a substitute for “GxP”, which is a generalized abbreviation for “good practice” quality guidelines and regulations. Examples of GxP are Good Clinical Practice (GCP), Good Manufacturing Practice (GMP), and Good Laboratory Practice (GLP). GxP compliance rests on a quality system of management controls that has been developed over the years with and for stakeholders (e.g., clinical trialists, manufacturers, laboratories) and codified into current regulatory regimes.

For others, validation may be more akin to “analytical or clinical validation,” referring to the quality of the measurement coming from the sensor and algorithms that compose the connected sensor technology. Others may also bundle “validation” with “verification and validation” (V&V), quality management procedures that ensure that the system or product meets specifications and that it fulfills its intended purpose (e.g., “software V&V”). Over time, evaluation frameworks that are developed for connected sensor technologies will likely be codified into revised GxP and related quality management systems; however, to develop good practices, the underlying principles must first be established.

To account for the unique hardware, software, and algorithmic properties of connected biometric monitoring technologies (BioMeTs), we recommend the three-stage process of verification, analytical validation and clinical validation (V3) proposed by Goldsack, Coravos, Bakker et al.¹². In this framework:

Verification evaluates and demonstrates the performance of a sensor technology within a BioMeT, and the sample-level data it generates, against a pre-specified set of criteria.
Analytical validation evaluates the performance of the algorithm, and the ability of this component of the BioMeT to measure, detect, or predict physiological or behavioral metrics.
Clinical validation evaluates whether a BioMeT acceptably identifies, measures, or predicts a meaningful clinical, biological, physical, functional state, or experience in the specified population and context of use.

A strong V3 process serves as the foundational evidence base around the accuracy, reliability, and appropriateness of the data and results from connected sensor technologies. Nonetheless, conducting a successful V3 process is challenging for a number of reasons. First, most sensor-based products are composed of a modular stack of hardware and software components, from sensors to signal processing to algorithms⁶. Each component may be built by a different company, each of which contributes to the product’s overall V3 results. Second, a change earlier in the data supply chain (e.g., at the signal-processing algorithm in the sensor) may alter the data inputs for an algorithm higher up the chain, which may result in an entirely new V3 evaluation¹². Put another way, it is challenging to evaluate a connected sensor technology’s data supply chain: the data flow and data provenance for information generated from hardware, sensors, software, and algorithms. Indeed, doing so requires modifications to the V&V processes used for wet-lab tests or clinical outcome assessments (COAs) such as electronic patient-reported outcomes (ePROs).
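One way to picture the supply-chain problem is to tie a body of V3 evidence to a fingerprint of the exact component stack it was generated against. The sketch below is illustrative only: `Component`, `fingerprint`, `evidence_still_applies`, the component labels, vendors, and version strings are all hypothetical and are not part of the V3 framework itself.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One layer of a hypothetical data supply chain."""
    name: str      # e.g. "accelerometer", "signal-processing"
    vendor: str    # each layer may come from a different company
    version: str


def fingerprint(chain: list[Component]) -> str:
    """Hash the ordered component stack so any upstream change is detectable."""
    canonical = "|".join(f"{c.name}:{c.vendor}:{c.version}" for c in chain)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evidence_still_applies(evaluated: list[Component],
                           deployed: list[Component]) -> bool:
    """Assumption: evidence carries over only if the stack is unchanged."""
    return fingerprint(evaluated) == fingerprint(deployed)


evaluated_stack = [
    Component("accelerometer", "VendorA", "1.0"),
    Component("signal-processing", "VendorB", "2.3"),
    Component("step-count algorithm", "VendorC", "0.9"),
]
# Hypothetical update to the signal-processing layer only.
deployed_stack = [
    evaluated_stack[0],
    Component("signal-processing", "VendorB", "2.4"),
    evaluated_stack[2],
]

print(evidence_still_applies(evaluated_stack, evaluated_stack))  # True
print(evidence_still_applies(evaluated_stack, deployed_stack))   # False
```

A mismatch here only flags that the stack has changed. Whether the change actually calls for a partial or entirely new V3 evaluation remains a judgment for the evaluator.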

Security

By definition, connected sensor technologies transfer data over the internet, which introduces immediate risks when deploying these products, because an actor could attack and access the product remotely and often in near-real time. This second dimension on Security risk evaluates unauthorized uses of data and results; the following section on Data Rights and Governance evaluates authorized uses of data and results. Cybersecurity involves protecting internet-connected systems, data, and networks from unauthorized access and attacks, including human error (e.g., the loss of a company’s unencrypted laptop). Notably, some data and system access may be authorized (or perhaps “not forbidden”), though unwelcome or undisclosed to the patient or other stakeholders. This type of access will also be covered in the next section. Although the security of a system cannot be guaranteed, quality design and execution can decrease the risk of harm from code flaws, configuration weaknesses, or other issues. A product’s security risk will need to be continuously re-assessed as new technologies and attack methods become available (e.g., advances in quantum technologies and corresponding quantum-resistant encryption standards).

Data rights and governance

When we consider data rights, we prefer to refer to governance rather than privacy, because we believe it’s more important to empower individuals to choose how to share their data (their rights and governance) rather than defaulting to privacy (e.g., a patient with a rare disease may want more freedom rather than barriers to share her data and results with relevant parties).

Over the past year, many popular tech companies have come under greater scrutiny for how they choose to share data with third parties. For instance, the Cambridge Analytica incident with Facebook was not an unauthorized use of or attack on the Facebook network (i.e., it was not a security incident). Aggregation of data in the ways utilized by Cambridge Analytica was part of Facebook’s feature set, though many have argued this feature was not thoroughly disclosed to all parties. Examples of widespread data sharing with inadequate disclosure are also seen in health tech products. In a cross-sectional study of 36 top-ranked apps for smoking cessation and depression in public app stores, Huckvale et al. found that “29 transmitted data to services provided by Facebook or Google, but only 12 accurately disclosed this in a privacy policy”¹³.

It is important to note that the regulatory environment is far from established when it comes to governing “digital specimens” (e.g., the data generated from connected sensor technologies). With respect to regulation, the FDA has oversight for digital specimen-collecting technologies, like wearables, when they are classified as medical devices. However, due to the narrow definition of device and the revisions with the 21st Century Cures Act, many connected sensor technologies fall outside of the FDA’s purview¹⁴. These narrow frames leave oversight of connected sensor technology functionality and health claims primarily to the Federal Trade Commission, which polices unfair and deceptive trade practices, including enforcing rules against false or misleading advertising¹⁵. In the United States, other agencies like the National Institute of Standards and Technology (NIST), the Federal Communications Commission (FCC), and the Office of the National Coordinator for Health Information Technology (ONC) may each have oversight of components of connected sensor technologies, but no regulator has full responsibility for digital specimens. Given this ambiguous regulatory landscape, end-user license agreements (EULAs) for sensors with downloadable software (e.g., an app), terms of service (ToS) for the sensors themselves, and privacy policies (PP) have become the de facto agreements used to retain rights in software and to create rights to monitor, aggregate, and share users’ digital biospecimens (see Box 1)¹⁵.

Box 1. Data rights disclosures: EULAs, ToS and PP intended use cases

Privacy policies (PP) disclose the terms for collection and use of the app/website user’s personal information.

Terms of service (ToS) disclose the rules and requirements of website and/or app use, for example, copyright, allowed uses, and the definition of abusive use.

End-user license agreements (EULAs) are a form of intellectual property licensing that tell people who have purchased software if/how many times they can copy the software and how they can or cannot use those copies.

Usability and utility

Commonly, concepts around verification and validation are confused with “clinical utility”. Clinical utility, defined as the process of evaluating whether the product improves health outcomes or provides useful information about diagnosis, treatment, management, or prevention of a disease, is also necessary to determine fit-for-purpose¹⁶. Clinical utility is typically evaluated by a process of user experience testing. It is common to define a product’s “usefulness” as usability plus utility¹⁷. Put simply, “utility” is whether a product has the features that users need, and “usability” is how easy and pleasant those features are to use¹⁸. If a product has high utility, people are often willing to accept lower usability thresholds. Connected sensor technologies require a web of participants to function successfully across the patient, the clinic/site, and the software integration. Therefore, usability and utility have to be considered across multiple roles, including but not limited to the individual patient, the clinician/researcher, and the software engineers and data scientists who are using the product. For instance, the product must be easily understandable for the clinician or researcher to explain why and how to use it, for the patient to put it on and activate the product consistently during the observation period, and for the engineers and data scientists to ingest and analyze the data (e.g., if the product has poorly documented communication protocols or makes it hard to download/upload data, then the engineering team will struggle to make sense of the data).

Economic feasibility

Compared to drugs, which often use a per-use pricing structure, or a traditional medical device, with a one-time purchase price, connected sensor technologies typically deploy a different business model, such as a subscription or long-term fees around data storage and analysis. These software-as-a-service fees may also cover additional software development, such as developing and shipping cybersecurity patches for software updates. Given that connected sensor technologies may shift their pricing and business models over their lifetime, it may be difficult to calculate a connected sensor technology’s economic feasibility, defined as the degree to which a product’s economic advantages are greater than its costs¹⁹.
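To illustrate why a shifting pricing model complicates this calculation, the sketch below compares total benefit against total cost over a fixed horizon under two fee schedules. Everything here is an assumption made for illustration: the function name `net_economic_benefit`, the per-patient figures, and the simplification that benefits are constant and undiscounted.

```python
def net_economic_benefit(annual_benefit: float,
                         upfront_cost: float,
                         annual_fees: list[float]) -> float:
    """Total benefit minus total cost over the years covered by annual_fees."""
    years = len(annual_fees)
    return annual_benefit * years - (upfront_cost + sum(annual_fees))


# All figures are made-up placeholders, per patient, in arbitrary currency units.
flat_fees = [120.0, 120.0, 120.0]    # subscription price stays constant
rising_fees = [120.0, 180.0, 240.0]  # hypothetical price increase each year

print(net_economic_benefit(200.0, 150.0, flat_fees))    # 90.0
print(net_economic_benefit(200.0, 150.0, rising_fees))  # -90.0
```

With identical hardware and identical benefits, the same product moves from a net gain to a net loss purely because the fee schedule changed, which is why the pricing terms need to be evaluated alongside the purchase price.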

Emerging digital health evaluation frameworks

Fortunately, many stakeholders have already started to revise and create improved evaluation frameworks to better understand digital health benefits and risks. In response to new digital technologies flooding the market, the FDA has issued a number of guidances to “encourage innovation and enable efficient and modern regulatory oversight”²⁰, and multiple organizations have proposed improved standards and tools to better understand a technology’s risk-benefit analysis (Fig. 1).

**Fig. 1: Current Evaluation Frameworks for Connected Sensor Technologies.**

Within these emerging frameworks, there are a few themes:

1. Product versus organizational-level evaluations. First, emerging frameworks contemplate whether the technology should be evaluated at the product level, the organization level, or both. Historically, evaluations focused on the product, not on the organization or manufacturer (e.g., the FDA judges the quality of a specific drug, and not Pfizer overall). With connected sensor technologies, the broader system needs to be taken into account, because a hardware or software change in one component (e.g., an update to the signal processing algorithm) can impact the system overall. Additionally, because software updates can occur frequently (in some instances multiple times per day), regulating these changes can require a new framework to manage risk. In software development, the culture and processes at the organizational level impact multiple products at once. Organizational-level views can also be better than product-level views when considering data rights and governance, as privacy policies and EULAs are often structured at the organizational level rather than the individual product level. Therefore, some evaluation frameworks have shifted to consider this organizational view, such as the FDA Precertification (Pre-Cert) Program, which evaluates the quality of the organization overall and then provides a “streamlined” review pathway for pre-certified organizations²¹.
2. Research versus clinical care settings. Second, the same connected sensor technology may be used in either a research setting (e.g., to collect digital endpoint data to support a drug application) or in a clinical setting (e.g., to remotely monitor a patient’s quality of life). An optimal evaluation framework should likely have the same base evaluation for the quality of the connected sensor, and afterwards, adapt the evaluation for different requirements in a clinical and research setting, respectively. This issue is exacerbated when regulatory requirements vary across connected technologies (i.e., some are regulated as medical “devices”, and others are not). The distinction between which digital health technologies are regulated and which are not is still evolving²², and a gray area can be dangerous for public health. Take for instance the controversies surrounding JUUL, a technology that was not well understood when first deployed into the market and now is facing greater regulatory scrutiny²³. In a fast-paced technology world, it’s not only the responsibility of regulators to develop new evaluation models. With additional forethought, we do not have to wait for public crises to enact thoughtful oversight. It is the responsibility of all the parties involved to work towards understanding safe and effective products.
3. Evaluation scope: software versus hardware review. Given the modularity of connected sensors (e.g., a hardware component, sensors, signal processing algorithms, and apps to display the data), some emerging frameworks look at the whole set of components and others conduct a software-only review. Additionally, this modularity split is also showing up in a regulatory context as the FDA has introduced the concept of a “software as a medical device” (SaMD), which is “software intended to be used for one or more medical purposes that perform these purposes without being part of a hardware medical device”²⁴.
4. Comprehensiveness. The final theme is that some of the emerging frameworks review all five of the risks posed by connected sensor technologies (validation, security, data rights and governance, utility and usability, and economic feasibility), and others only look at a subset. In the following section, we build on the lessons from the emerging frameworks and propose pragmatic evaluation criteria to consider when deploying connected sensor technologies in research or clinical care.

Building an evaluation framework for connected sensor technologies

Building on the existing frameworks, we propose a working evaluation framework for connected sensor technologies that reflects the five types of risks identified above (Fig. 2). We constructed this framework using the following principles:

1. Evaluation criteria should be objective, observable, and verifiable (see Box 2). Objective criteria are clearly and reliably measurable. Observable criteria can be checked independently, without special or privileged access. Verifiable criteria can be demonstrated or refuted with empirical data.
2. Evaluations need context. “What is the best drug?” or “What is the best food?” are meaningless questions without additional context (e.g., does the person need less sugar in her diet? Or more protein?). Similarly, “what is the best heart-rate monitor?” is an empty question without a clearly articulated context.
3. Evaluations should be multidimensional (e.g., avoid a single metric “score”). While scoring a food by a single metric such as total calories can be helpful, calorie count in itself is not a way to construct an overall healthy diet. Similarly, we argue that compressing an evaluation of a Fitbit versus an Apple Watch into a single overall score lacks meaningful nuance.
4. Evaluation components can have required minimum thresholds and optional features that enhance the desirability of the product. Required thresholds of each component may depend on the risk level and context of use.
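These principles can be sketched as a per-dimension check rather than a single aggregate score. The example below is a minimal illustration under stated assumptions: the `Level` scale, the dimension keys, the `unmet_dimensions` helper, and the example ratings and context requirements are all hypothetical and are not prescribed by the framework.

```python
from enum import IntEnum


class Level(IntEnum):
    """Ordinal rating per dimension; the labels are illustrative."""
    FAIL = 0
    MINIMUM = 1
    DESIRED = 2
    HIGH = 3


DIMENSIONS = (
    "v3",
    "security",
    "data_rights_governance",
    "utility_usability",
    "economic_feasibility",
)


def unmet_dimensions(ratings: dict[str, Level],
                     required: dict[str, Level]) -> list[str]:
    """Return dimensions rated below what this context of use requires."""
    return [d for d in DIMENSIONS if ratings[d] < required[d]]


# Hypothetical ratings for one product.
ratings = {
    "v3": Level.HIGH,
    "security": Level.MINIMUM,
    "data_rights_governance": Level.DESIRED,
    "utility_usability": Level.DESIRED,
    "economic_feasibility": Level.MINIMUM,
}

# Hypothetical requirements for two contexts of use.
low_risk_context = {d: Level.MINIMUM for d in DIMENSIONS}
high_risk_context = {**low_risk_context, "security": Level.DESIRED}

print(unmet_dimensions(ratings, low_risk_context))   # []
print(unmet_dimensions(ratings, high_risk_context))  # ['security']
```

The same product passes in one context and falls short in another, and a strong V3 rating does not offset the security gap. A single averaged score would lose both of those distinctions.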

**Fig. 2: Proposed Evaluation Framework for Fit-For-Purpose Connected Sensor Technologies.**

We propose a systematic and standardized approach to evaluate whether connected biometric monitoring technologies are “fit-for-purpose” across five dimensions:

1. Verification, analytical validation, and clinical validation (V3)¹²;
2. Security practices;
3. Data rights and governance;
4. Utility and usability; and
5. Economic feasibility.

The first three dimensions evaluate the data and subsequent results generated by connected biometric monitoring products. The fourth dimension, Utility and Usability, evaluates the ease of implementation and adoption of the product, and the last dimension, Economic Feasibility, evaluates the economic feasibility of adoption.

Notably, excellence in one dimension does not necessarily imply excellence in another. Indeed, significant deficiencies in any one dimension may lead to problems when using a connected sensor technology in research or practice. Thus, we propose a framework to simplify the evaluation process of connected sensor technologies for their intended uses.

Box 2. Objective, observable and verifiable evaluation criteria

Objective: That it can be agreed whether the capability is in place. Often this leads to binary proofs: it either is or is not, and there are no degrees.

Observable: That an independent person can know whether the capability exists, without special or privileged access. This characteristic gives public scrutability to the capability.

Verifiable: That a capability can be demonstrated or refuted with empirical data.

Source: Adapted from comments on NIST considerations⁵¹.

Key evaluation criteria and metrics

Verification, Analytical Validation, and Clinical Validation (V3)

The following documentation is necessary to determine net benefit using V3 principles:

Verification can look like performance specifications for integrated hardware, overviews of software system tests, or output data specifications¹². Often this information is found on the manufacturer’s website rather than in a peer-reviewed journal article.
Analytical validation may look like studies that follow Good Clinical Practice (GCP) requirements, and could show up as a regulatory submission (e.g., 510(k)), white paper, or peer-reviewed journal article.
Clinical validation may look like a clinical study report (CSR), regulatory submission, journal paper or published conference proceedings¹².

Of all forms of documentation, those that make available complete data sets for external review should be weighted most heavily, particularly when machine learning algorithms are used²⁵. Emerging standards exist for the assembly and documentation of datasets²⁶ and the visualization of underlying data quality²⁷. There is also a need for a living database of published studies with these types of data, as efficiently gathering data across the medical literature in each of these areas can be challenging for end-users. The Clinical Trials Transformation Initiative’s (CTTI’s) Interactive Database of Feasibility Studies for Mobile Clinical Trials is a useful and welcome start²⁸. The Digital Medicine Society (DiMe)’s crowdsourced library is another useful resource that lists connected sensor technologies that collected data used to derive digital endpoints for industry-sponsored studies of new medical products or new applications of existing medical products^29,30. More work needs to be done to ensure reproducibility of machine learning algorithms in healthcare³¹, and to determine whether publishing either the algorithms or the datasets for independent external review³² would be a constructive way to increase reproducibility and decrease bias¹.

When evaluating V3 for a connected sensor technology, the product should have documentation for each of the three stages. The documentation should align with the intended patient population and context of use. Measures that have been clinically validated in one group of patients cannot be assumed to be valid in another group in which patient or environment characteristics may affect measure performance (e.g., gait in a Parkinson’s population may be evaluated differently than in a population with multiple sclerosis). As a desired threshold, the measure should be evaluated and published in multiple populations and data sets. Regulatory decisions (e.g., from FDA or EMA) can be distracting or misleading when evaluating a product. For instance, a connected sensor technology’s 510(k) FDA clearance as a device for clinical practice has no impact on whether it would be a suitable product for a drug clinical trial. All the decision indicates is that an external body has reviewed the manufacturer’s marketing claims.

We recognize that many connected sensor technologies may not meet the minimum threshold for verification and validation. For such technologies, we recommend identifying where along the data supply chain the product is missing documentation, and then conducting and/or soliciting research to complete the V3 process.

Sample threshold criteria for the V3 process

Minimum threshold or pass/fail: a proposal or initial white paper for how the product plans to run its V3 process (e.g., what’s been completed and where are the gaps).
Desired threshold: white paper or equivalent data describing elements of the V3 process; depending on the intended use (unless an analytical or clinical validation study is planned), published V3 data within the context of use.
High quality: well-documented V3 specifications; analytic and clinical validation data published in multiple populations and data sets.
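The sample thresholds above could be turned into a simple checklist, sketched below. This is one possible reading, not a definitive rule set: the `V3Documentation` fields and the `v3_tier` function are hypothetical names, and the code assumes that a white paper also satisfies the minimum tier, that “multiple” means at least two populations or data sets, and that the high-quality tier presupposes the desired tier.

```python
from dataclasses import dataclass


@dataclass
class V3Documentation:
    """Hypothetical checklist of evidence a reviewer has located."""
    has_v3_plan: bool = False                  # proposal or initial white paper
    has_v3_white_paper: bool = False           # white paper or equivalent data
    published_in_context_of_use: bool = False  # published V3 data for intended use
    validation_study_planned: bool = False     # analytical/clinical study planned
    well_documented_specs: bool = False        # well-documented V3 specifications
    populations_published: int = 0             # populations/data sets with published data


def v3_tier(doc: V3Documentation) -> str:
    """Return the highest sample threshold the documentation appears to meet."""
    meets_minimum = doc.has_v3_plan or doc.has_v3_white_paper
    meets_desired = doc.has_v3_white_paper and (
        doc.published_in_context_of_use or doc.validation_study_planned
    )
    # Assumption: "multiple populations and data sets" is read as two or more.
    meets_high = (
        meets_desired
        and doc.well_documented_specs
        and doc.populations_published >= 2
    )
    if meets_high:
        return "high quality"
    if meets_desired:
        return "desired"
    if meets_minimum:
        return "minimum"
    return "below minimum"


print(v3_tier(V3Documentation()))                  # below minimum
print(v3_tier(V3Documentation(has_v3_plan=True)))  # minimum
print(v3_tier(V3Documentation(has_v3_white_paper=True,
                              validation_study_planned=True)))  # desired
```

A product that lands in “below minimum” or “minimum” is a candidate for the gap analysis described earlier: identify which stage of the data supply chain lacks documentation and then conduct or solicit the missing research.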