ADVERTISEMENT FEATURE: Advertiser retains sole responsibility for the content of this article

Synthetic data to enhance patient privacy

Synthetic clinical models derived from real patient data will protect privacy and accelerate clinical research. Credit: sdecoret/Shutterstock

From discovering the root cause of diseases to offering new treatments and personalized medicine, the field of bioinformatics has enormous potential to change the way healthcare is delivered to improve patient outcomes.

However, this revolution relies on personal medical data collected from patients and volunteers. As the collection, storage, and use of such data increase, so does the risk that sensitive data will be compromised, so methods for securing these datasets must advance in parallel.

Clever abstraction

Multiple research institutes under Singapore’s Agency for Science, Technology and Research (A*STAR) are focusing on various aspects of data security. Scientists from A*STAR’s Institute for Infocomm Research (I2R) and Institute of High Performance Computing (IHPC), for example, are researching fields such as data encryption, blockchain and artificial intelligence as key technologies for securing sensitive data.

A*STAR’s Bioinformatics Institute (BII), on the other hand, is taking a different approach to the particular challenge of sensitive healthcare data: converting real-world patient data into ‘synthetic’ clinical datasets.

“Synthetic data is safe because it's an abstraction that retains clinical information but is devoid of the exact data that identifies an individual,” says Sebastian Maurer-Stroh, the institute’s executive director. “We need synthetic data because privacy concerns mean we often have difficulty in making impactful analysis, especially when it comes to comparing cohorts or clinical data.”

Clinical bioinformatics research often requires very large datasets of health and DNA data from patients and volunteers, so that genetic variations can be matched to the onset of diseases and to patient responses to treatment.

Sensitive healthcare data must be secured with strong access controls to protect patient privacy. Credit: Khakimullin Aleksandr/Shutterstock

“To protect privacy, extensive security processes are put in place to control access to patient data,” says Maurer-Stroh. “An approach that overcomes this need, such as synthetic data, could therefore accelerate scientific discoveries and development.”

To be useful, synthetic data must replicate the key features of actual patient information. Wong Wing Cheong, senior principal scientist at A*STAR’s BII, says this can be done through statistical analyses that map important relationships within the data.

“Essentially we are replicating the data by representing it in a model,” Wong says.
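As a rough illustration of what "representing data in a model" can mean, the sketch below fits a very simple statistical model (means, spreads and one correlation) to a handful of records, then draws new records from that model. It is only a toy under stated assumptions: the article does not describe BII's actual method, the two-variable Gaussian model is a stand-in chosen for brevity, the function names are hypothetical, and the (age, systolic blood pressure) values are invented for the example. A model like this does not by itself guarantee privacy.

```python
import math
import random
import statistics


def fit_model(rows):
    """Summarise two numeric columns by their means, spreads and correlation."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mean": (mx, my), "sd": (sx, sy), "corr": cov / (sx * sy)}


def sample_synthetic(model, n, seed=0):
    """Draw n new records from the fitted model (assumed bivariate Gaussian)."""
    rng = random.Random(seed)
    (mx, my), (sx, sy), rho = model["mean"], model["sd"], model["corr"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mixing z1 into y reproduces the fitted correlation between columns.
        y = my + sy * (rho * z1 + residual * z2)
        records.append((x, y))
    return records


# Invented toy records: (age in years, systolic blood pressure in mmHg).
real = [(34, 118), (45, 125), (52, 130), (61, 142),
        (29, 115), (70, 150), (48, 128), (57, 135)]

model = fit_model(real)
synthetic = sample_synthetic(model, 5000, seed=42)
refit = fit_model(synthetic)

print("real corr:     ", round(model["corr"], 3))
print("synthetic corr:", round(refit["corr"], 3))
```

The synthetic records are drawn from the model rather than copied from the input, yet refitting the model on them recovers roughly the same relationship between the two columns, which is the property the article describes.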

Rapid access

Researchers and companies can then access the information quickly — without the usual high-security measures — for preliminary investigations that can be followed up later, if warranted, using real patient data.

“This can dramatically reduce the steps and effort required for clinical investigations,” Wong adds.

A*STAR’s BII, which undertakes a range of research for the biomedical sector, is also talking to regulators about how synthetic data could be incorporated into the clinical information used to assess and approve new drugs and treatments. The team is now working with government agencies to draft a guide for the generation and use of synthetic data, as well as setting out the considerations, risks, and policy positions.

To move quickly, the institute is inviting researchers and privacy experts from around the world to join this initiative to accelerate biomedical research through the development of synthetic health data.

“Our goal,” Maurer-Stroh says, “is to combine the role of a data custodian with getting value out of the data, to promote collaborative healthcare research while protecting patient privacy.”

For more information, visit the A*STAR Bioinformatics Institute.