A digital mask to safeguard patient privacy

Yang, Yahan; Lyu, Junfeng; Wang, Ruixin; Wen, Quan; Zhao, Lanqin; Chen, Wenben; Bi, Shaowei; Meng, Jie; Mao, Keli; Xiao, Yu; Liang, Yingying; Zeng, Danqi; Du, Zijing; Wu, Yuxuan; Cui, Tingxin; Liu, Lixue; Iao, Wai Cheng; Li, Xiaoyan; Cheung, Carol Y.; Zhou, Jianhua; Hu, Youjin; Wei, Lai; Lai, Iat Fan; Yu, Xinping; Chen, Jingchang; Wang, Zhonghao; Mao, Zhen; Ye, Huijing; Xiao, Wei; Yang, Huasheng; Huang, Danping; Lin, Xiaoming; Zheng, Wei-shi; Wang, Ruixuan; Yu-Wai-Man, Patrick; Xu, Feng; Dai, Qionghai; Lin, Haotian

doi:10.1038/s41591-022-01966-1

Article
Open access
Published: 15 September 2022

Yahan Yang¹^na1,
Junfeng Lyu²^na1,
Ruixin Wang¹^na1,
Quan Wen²,
Lanqin Zhao ORCID: orcid.org/0000-0002-5182-3678¹,
Wenben Chen¹,
Shaowei Bi¹,
Jie Meng¹,
Keli Mao¹,
Yu Xiao³,
Yingying Liang³,
Danqi Zeng¹,
Zijing Du³,
Yuxuan Wu¹,
Tingxin Cui¹,
Lixue Liu¹,
Wai Cheng Iao¹,
Xiaoyan Li¹,
Carol Y. Cheung⁴,
Jianhua Zhou⁵,
Youjin Hu ORCID: orcid.org/0000-0002-7758-8733¹,
Lai Wei¹,
Iat Fan Lai⁶,
Xinping Yu¹,
Jingchang Chen ORCID: orcid.org/0000-0002-1378-898X¹,
Zhonghao Wang¹,
Zhen Mao¹,
Huijing Ye¹,
Wei Xiao¹,
Huasheng Yang¹,
Danping Huang¹,
Xiaoming Lin¹,
Wei-shi Zheng⁷,
Ruixuan Wang⁷,
Patrick Yu-Wai-Man ORCID: orcid.org/0000-0001-7847-9320^8,9,10,11^na2,
Feng Xu ORCID: orcid.org/0000-0002-0953-1057^2,12^na2,
Qionghai Dai ORCID: orcid.org/0000-0001-7043-3061^12,13^na2 &
…
Haotian Lin ORCID: orcid.org/0000-0003-4672-9721^1,14,15^na2

Nature Medicine volume 28, pages 1883–1892 (2022)

Abstract

The storage of facial images in medical records poses privacy risks due to the sensitive nature of the personal biometric information that can be extracted from such images. To minimize these risks, we developed a new technology, called the digital mask (DM), which is based on three-dimensional reconstruction and deep-learning algorithms to irreversibly erase identifiable features, while retaining disease-relevant features needed for diagnosis. In a prospective clinical study to evaluate the technology for diagnosis of ocular conditions, we found very high diagnostic consistency between the use of original and reconstructed facial videos (κ ≥ 0.845 for strabismus, ptosis and nystagmus, and κ = 0.801 for thyroid-associated orbitopathy) and comparable diagnostic accuracy (P ≥ 0.131 for all ocular conditions tested) was observed. Identity removal validation using multiple-choice questions showed that compared to image cropping, the DM could much more effectively remove identity attributes from facial images. We further confirmed the ability of the DM to evade recognition systems using artificial intelligence-powered re-identification algorithms. Moreover, use of the DM increased the willingness of patients with ocular conditions to provide their facial images as health information during medical treatment. These results indicate the potential of the DM algorithm to protect the privacy of patients’ facial images in an era of rapid adoption of digital health technologies.

Main

Protecting the privacy of patients is central to healthcare delivery and has important ethical and medicolegal ramifications. Privacy protection has attained prominence over the past decade because of digitalization and increasingly widespread sharing of medical records and concerns about data breaches. Previous studies have explored the application of anonymization technologies for medical images. Researchers have proposed eliminating all digital imaging and communications in medicine (DICOM) metadata (such as patient name and sex)¹, with the application of defacing or skull-stripping algorithms to face or skull regions in DICOM images². From a privacy perspective, clinical data involving facial images are especially sensitive, given that facial information clearly contains biometric identifying information. It is therefore imperative to protect the facial information of healthcare users to maintain medical privacy and security; however, facial images aiming to record signs of disease, such as strabismus or nystagmus, inevitably record patients’ race, sex, age, mood and other biometric identifiers. Concerning facial images, common anonymizing methods, including blurring and cropping identifiable areas, may lose important disease-relevant information and they cannot fully evade face recognition systems³. An important challenge is, therefore, to separate biometric identity from medical information that can potentially be derived from facial images.

Additionally, the successful development and utility of digital health technology depend on broad participation in medical data collection, and the participation of large populations requires trust and protection of privacy⁴; however, digital data studies based on large training image sets have also raised the potential threat of facial recognition technology being misused for unintended and/or unauthorized purposes^5,6. Because of understandable privacy concerns, people often hesitate to share their medical data for public medical research or electronic health records, which largely hinders the development of digital medical care. Therefore, it is necessary to update the traditional procedure used to obtain informed consent at the front end of data collection, particularly by ensuring adequate privacy protection for personal health information and thereby improving the willingness of healthcare users to engage with these emerging digital technologies.

Within whole facial images, periocular biometrics are one of the most distinctive subsets of an individual’s biometric information and can be used to assist in building robust identity verification systems⁷. Additionally, periocular features are important signs of eye and general health. For example, periocular features, such as deep forehead wrinkles and periorbital wrinkles, are significantly associated with coronary heart disease⁸ and abnormal topological changes in eye dynamics indicate poor visual function and visual cognitive development problems⁹. This study aims to protect the biometric information of patients and focuses on four pathological ocular manifestations, namely, thyroid-associated orbitopathy (TAO), strabismus, ptosis and nystagmus, which involve more than ten abnormal behavioral phenotypes, such as eyelid retraction, overactive or underactive extraocular muscles, horizontal or vertical strabismus, changes in the double eyelid line, poor fixation and compensatory head position.

To extract these disease-relevant features but remove patient identity features from facial images of patients, we developed the DM, a new technology based on real-time three-dimensional (3D) reconstruction and deep-learning algorithms. The DM takes an original video as input and outputs a reconstructed video that contains disease information, while discarding as much of the patient’s identity as possible. The refined eye reconstruction is highlighted. Converting DM-reconstructed videos back to raw videos is impossible because most of the information necessary to recreate the original attributes has been discarded and is no longer present in the set of digital representations that constitute the mask.

To demonstrate the feasibility of the proposed DM approach, we designed a clinical trial (NCT05058599) and evaluated the consistency of the diagnoses of patients with ocular diseases from reconstructed videos and original videos. Identity removal validation was also used to show whether the DM could effectively remove personal biometric attributes. Additionally, we performed an empirical investigation of the receptiveness of patients to applying this new technology to their personal health information. Finally, we conducted an artificial intelligence (AI)-powered re-identification validation to evaluate the performance of the DM in evading recognition systems. The following results show that the DM offers a new approach to safeguarding patient privacy, provides an additional data format for privacy protection and enhances the willingness of patients to share their medical data, thereby benefiting the quickly evolving field of digital health.

Results

The workflow of the DM

In this work, the proposed DM patient privacy protection technology was based on the complementary use of deep learning and 3D reconstruction. Deep learning achieved feature extraction from different facial parts, and 3D reconstruction automatically digitalized the shapes and motions of 3D faces, eyelids and eyeballs based on the extracted facial features (Fig. 1). Different from other face reconstruction methods^{10,11,12,13,14,15,16}, the proposed technology focused on accurate ocular reconstruction, including both shapes and movements.

**Fig. 1: Development of the DM system.**

In 3D reconstruction, we used three predefined parametric models for faces, eyelids and eyeballs. The face model was mathematically a bilinear model¹⁷ (Methods) that represented a 3D face as a shape vector w^fs and a motion vector w^fm. Given a particular w^fs and a w^fm, the bilinear face model can reconstruct a particular 3D face mesh M^f. The face model can represent the overall geometry of the face, but eye regions lack details. Since the eye regions are important for diagnosis, we used a linear eyelid model¹⁸ to represent detailed eye regions. Similar to the face model, given this eyelid model, a detailed eyelid M^e (of one eye) was represented by an eye shape vector w^es and an eye motion vector w^em. To additionally reconstruct the eyeballs, we used the simplified geometry and appearance eyeball model (SGAEM), introduced in our previous study¹⁹. The model approximated eyeballs as spheres and used three parameters, the eyeball radius r_e, the iris radius r_i and the position p_e relative to the face, to represent the static properties of an eyeball and the eyeball rotation in polar coordinates to represent eyeball motion.

Deep-learning techniques were leveraged to extract facial features that were used to infer the aforementioned model parameters to obtain the facial reconstruction results. First, a pretrained neural network was used as a face landmark detector to extract two-dimensional (2D) face landmarks L^face from an input red, green and blue color space (RGB) image. With the landmarks L^face, we estimated the face pose T (rotation and translation), face shape vector w^fs and face motion vector w^fm by minimizing the Euclidean distance between the 2D landmarks L^face and the 2D projections of the corresponding points on the 3D face M^f. Second, an eyelid landmark detector was used to extract 2D eyelid landmarks L^eyelid and an eyelid semantic line detector was used to extract 2D eyelid semantic lines S^eyelid. These two detectors were also neural networks trained by deep-learning techniques. Then, we similarly estimated the eyelid shape vector w^es and the eyelid motion vector w^em by minimizing the Euclidean distance between the 2D landmarks L^eyelid and the projections of the corresponding points on the eyelid mesh M^e, as well as by making the projected points on the semantic lines close to the detected semantic lines S^eyelid on the image. Here, semantic lines provided rich and continuous information on the eyelid area, while landmarks were robust discrete features for tracking eyelid motions. Combining these two types of features made the reconstruction more accurate and stable. Finally, for the eyeballs, we trained another neural network as an iris landmark detector to extract 2D iris landmarks L^iris from the input RGB image. As the eyeball radius r_e, iris radius r_i and relative position p_e were invariant in a video, we predicted them in the first frame and then fixed them in the following video frames. 
Per-frame eyeball rotations were estimated by minimizing the Euclidean distance between the 2D landmarks L^iris and the projections of the corresponding points on the SGAEM. The DM included optional operations for adapting to different clinical applications, such as dealing with eye occlusion in videos recording the alternate cover test or reconstructing eyebrow movements in diagnosing ocular diseases.

Quantitative evaluation of the DM

The feasibility of the proposed model was evaluated on a video dataset of patients in the clinical trial. From May 2020 to September 2021, 405 participants aged 4 months to 61 years, of whom 187 (46.2%) were male, agreed to participate in the prospective study at the Digital Mask Program, either by themselves or via their legal guardians. The participants consisted of (1) 100 outpatients from strabismus departments; (2) 92 outpatients from pediatric ophthalmology departments; (3) 102 outpatients from TAO departments; and (4) 111 outpatients from oculoplastic departments (Extended Data Table 1). In total, 253 (62.47%) of the 405 patients were diagnosed with ocular diseases on the basis of face-to-face assessments of the patients’ eyes.

To evaluate the applicability of the model, different cameras, including a Nikon 3500, Huawei p30 and Sony 4k, were used for video collection according to the following standards. The whole appearances of participants were collected from a distance ranging from 33 cm to 1 m according to the specific ocular examination. These videos were taken under room illuminance ranging from 300 to 500 lx.

We used the proposed DM to process all the videos and quantitatively evaluated the reconstruction performance of the DM. In the quantitative evaluation, the performance of the DM was measured by the 2D normalized pixel error, with lower numbers indicating better reconstruction performance. We first acquired the Euclidean distance between the landmarks in DM-reconstructed videos and the corresponding landmarks in original videos (Fig. 2a). The pixel errors between landmarks were then normalized by the pixel distance between the centers of the two eyes.
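The metric described above can be illustrated with a minimal sketch. It assumes that landmarks are given as (x, y) pixel coordinates and that the error is averaged over the landmarks of a single frame; the function name `normalized_pixel_error` and the averaging choice are illustrative assumptions, not details taken from the study.

```python
import math


def normalized_pixel_error(recon_landmarks, orig_landmarks,
                           left_eye_center, right_eye_center):
    """Mean Euclidean landmark distance divided by the inter-ocular distance.

    All points are (x, y) tuples in pixels. Averaging over the landmarks
    of one frame is an assumption made for this sketch.
    """
    if not recon_landmarks or len(recon_landmarks) != len(orig_landmarks):
        raise ValueError("landmark lists must be non-empty and of equal length")
    inter_ocular = math.dist(left_eye_center, right_eye_center)
    if inter_ocular == 0:
        raise ValueError("eye centers must be distinct")
    mean_error = sum(
        math.dist(p, q) for p, q in zip(recon_landmarks, orig_landmarks)
    ) / len(recon_landmarks)
    return mean_error / inter_ocular
```

Multiplying the returned ratio by 100 gives a percentage comparable in form to the values reported for eyeball and eyelid reconstruction.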

**Fig. 2: Quantitative evaluation of the digital mask.**

For the eyes of 405 patients, the average normalized pixel errors in images of patients with TAO, strabismus, ptosis and nystagmus were 0.85%, 0.81%, 0.82% and 1.00%, respectively, in eyeball reconstruction and 1.52%, 1.24%, 1.52% and 1.61%, respectively, in eyelid reconstruction (Fig. 2b). The heat map of the normalized pixel errors in images of patients with the abovementioned four diseases is shown in Fig. 2c. The normalized pixel errors remained small and stable most of the time, with slight fluctuations when the eyes were looking down, thus indicating the precise reconstruction of the DM.

Clinical validation of DM

To evaluate the performance of the DM in clinical practice, we performed a relevant diagnostic comparison and an identity-removal validation. In the relevant diagnostic comparison, 12 ophthalmologists, 3 from each of the four departments, were invited to diagnose patients from their departments based on the DM-reconstructed videos and original videos. We evaluated the videos regarding pathological ocular manifestations that caused changes in the appearance of the eye and patients were diagnosed visually with diseases, including (1) TAO (exophthalmos, eyelid retraction and overactive or underactive extraocular muscles); (2) strabismus (horizontal or vertical strabismus and compensatory head position); (3) ptosis (drooping or lowering of the upper eyelid); and (4) nystagmus (Fig. 3 and Supplementary Video)⁹. For each eye, both the independent diagnosis from the original videos and the diagnosis from the DM-reconstructed videos were recorded and compared (Fig. 4a and Supplementary Data 1). If the two diagnoses were highly consistent, this would suggest that the reconstruction was precise enough for use in clinical practice. Cohen’s κ values showed very high consistency (κ = 0.845–0.934 for strabismus, ptosis and nystagmus on both eyes and κ = 0.801 for TAO on right eyes) of the diagnoses, made by three ophthalmologists under majority rule, from original and reconstructed videos for all comparisons (Fig. 4b and Extended Data Table 2). Additionally, the accuracies of the diagnoses from the original and reconstructed videos, compared to the ground truth, were comparable for all paired comparisons (P = 0.131–1; Extended Data Table 3). These results indicate that the DM retains the important clinical attributes correctly and has the potential to be adopted in clinical practice.
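As an illustration of the consistency analysis, the sketch below combines three ratings per eye by majority rule and computes Cohen’s κ between two lists of labels. It is a generic implementation of the standard statistic; the function names and the label encoding (for example, 1 for disease present and 0 for absent) are assumptions made for the example and do not describe the study’s analysis code.

```python
from collections import Counter


def majority_vote(ratings):
    """Return the most frequent label among an odd number of ratings."""
    return Counter(ratings).most_common(1)[0][0]


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long sequences of labels."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1 - expected)
```

In this setting, `labels_a` would hold the majority diagnoses from the original videos and `labels_b` those from the DM-reconstructed videos, one entry per eye.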

**Fig. 3: Clinical signs of the ocular diseases studied.**

**Fig. 4: Clinical validation of the DM.**

In the identity-removal validation, we compared the identity-removal ability of the DM with that of cropping by using multiple-choice questions. Specifically, we processed the original images of the faces of the patients by using DM and cropping to generate 400 DM-reconstructed images and 400 cropped images, respectively. The generated images and the original images were drawn from different time points in the video sequence. Correspondingly, we designed 800 multiple-choice questions. For the DM test, each question contained a DM-reconstructed image and five original images. For the cropping test, each question contained a cropped image and five original images. For each question, there were six options, including the five original images and an ‘other’ option. From these options, the respondents were asked to find the original image corresponding to the DM-reconstructed image or cropped image. The results showed that the accuracy rate for those taking the DM test was 27.3%, whereas the accuracy for those taking the cropping test was 91.3% (Fig. 4c). Both accuracies were likely overestimated because the test was conducted on the premise that the respondent knew only five people. In actual situations, the numbers of people are far higher; however, the results still demonstrate that the DM can effectively remove patient identity attributes and protect patient privacy, especially compared to cropping.

In addition, to evaluate the willingness of patients to share their eye and facial images during the application of DM, we performed an empirical investigation. We developed 3D reconstruction software to which users could provide their videos anonymously. The videos were then automatically processed by the DM and delivered to clinicians. Clinicians were only allowed to watch the DM-reconstructed videos for diagnosis (Supplementary Video) and the diagnoses were fed back to the users. A total of 317 outpatients, randomly selected from clinics, agreed to participate in the empirical investigation. During the investigation, the participants were asked to use the software to watch their uploaded videos and the corresponding reconstructed videos processed by the DM. At the end of the investigation, the patients completed a questionnaire on their willingness to use the DM (Fig. 5a). Among the respondents, 161 were males (50.7%). By age group, the highest proportion of respondents was in the 20–30-year group. Most of the respondents had university degrees (82.3%) and had used smartphones for more than 7 years (73.8%). The questionnaire comprised 16 questions addressing five hypotheses across five aspects: health support, privacy concerns, trust in physicians and medical platforms, willingness to share information and the influences of the DM (Fig. 5b). The Kaiser–Meyer–Olkin measure of sampling adequacy and Cronbach α values for each component were larger than 0.617 and 0.718, respectively, thus supporting the reliability and validity of each question in the research design. Approximately 80% of the participants agreed that they had privacy concerns. Among the participants who had a disease with facial signs, more than 81.4% had privacy concerns, compared to more than 74.4% of participants without facial signs. Furthermore, we assessed the significance of the influence of the major aspects.
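For readers unfamiliar with the reliability statistic mentioned above, the following sketch computes Cronbach’s α for one questionnaire component using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The data layout (one row of item scores per respondent) and the function name are assumptions for illustration only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for one component.

    scores[r][i] is the answer of respondent r to item i (hypothetical
    layout); sample variances are used throughout.
    """
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values closer to 1 indicate that the items of a component vary together, which is the sense in which the reported α values support internal consistency.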

**Fig. 5: Empirical investigation of the willingness of patients to share personal health information.**

As shown in Extended Data Table 4, perceived benefits, such as health support of digital health information, positively affected patients’ trust in physicians and medical platforms with respect to digital health (β = 0.465, P < 0.001). In contrast, perceived concerns, such as privacy concerns, negatively affected patients’ trust in physicians with respect to digital health (β = −0.158, P = 0.005). The hypothesis that the DM had a positive impact on such trust was supported (β = 0.348, P < 0.001), thereby further improving the patients’ willingness to share information (β = 0.503, P < 0.001). The questionnaire details of each patient are included in Supplementary Data 2.

AI-powered re-identification validation of the DM

To evaluate the performance of the DM in evading recognition systems, we performed an AI-powered reidentification validation (Fig. 6a). In the validation, we conducted face recognition attacks by using three well-known deep-learning systems, namely, FaceNet²⁰, CosFace²¹ and ArcFace²². All the systems were trained on the CASIA-WebFace Dataset²³, which contains 494,414 face images of 10,575 real identities collected from the web. Using 405 patient videos, we randomly selected two frames in each video; one of the frames was used as the query image and the other was used as the database image. We processed 405 original query images to further generate 405 cropped query images and 405 DM-reconstructed query images. For the test, given a query image (original images, cropped images or DM-reconstructed images), the face recognition system (FaceNet, CosFace or ArcFace) was asked to match the image with database images of 405 patients. We used the area under the receiver operating characteristic curve (AUC), TAR@FAR = 0.1 (TAR, true accept rate; FAR, false accept rate), TAR@FAR = 0.01 and Rank-1 to evaluate the face recognition performance. The lower values of TAR@FAR = 0.1, TAR@FAR = 0.01 and Rank-1 indicate the weaker performance of the face recognition system and the greater performance of the privacy protection technology. As shown in Fig. 6b and Extended Data Table 5, the results on all the measurements show that taking original images as the query images, it was easy for face recognition systems to match the correct identity. When taking cropped images as the query images, the metrics had limited degradation. When using the DM, the performance of face recognition was significantly degraded. Rank-1 was <0.02 for all three systems, indicating that the systems had a very low possibility of identifying the correct identity with the DM-reconstructed images. 
Meanwhile, the receiver operating characteristic (ROC) curves obtained with DM-reconstructed images were close to y = x for all three systems, indicating that a high TAR could not be maintained at a low FAR. These results show the superiority of our DM technique in terms of privacy protection.
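The retrieval metrics used in this validation can be sketched as follows. The example assumes a precomputed similarity matrix in which entry (i, j) compares query image i with database image j and the correct match for query i is database image i; the thresholding rule for TAR@FAR is one common convention and is an assumption here, as is every function name.

```python
def rank1_accuracy(sim):
    """Fraction of queries whose most similar database entry is the true match."""
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        hits += best == i
    return hits / len(sim)


def tar_at_far(sim, far):
    """True accept rate at a false accept rate of at most `far`.

    Genuine scores are the diagonal entries, impostor scores are all
    off-diagonal entries. The threshold is placed so that no more than
    far * (number of impostor pairs) impostor scores exceed it.
    """
    genuine = [row[i] for i, row in enumerate(sim)]
    impostor = sorted(
        (s for i, row in enumerate(sim) for j, s in enumerate(row) if j != i),
        reverse=True,
    )
    allowed = int(far * len(impostor))  # impostor pairs that may be accepted
    threshold = impostor[allowed] if allowed < len(impostor) else float("-inf")
    return sum(g > threshold for g in genuine) / len(genuine)
```

With a recognition system such as those tested, the similarity matrix would typically come from comparing face embeddings of the query and database images; effective de-identification should push both metrics toward chance level.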

**Fig. 6: Validation of the DM using AI-powered re-identification algorithms.**

Discussion

In this study, we developed and validated a new technology called DM, which is based on real-time 3D reconstruction and deep learning, to retain the clinical attributes contained in patient videos, while minimizing access to nonessential biometric information for added personal privacy in clinical practice. Experimental results support that with the DM, examination videos of patients with manifestations of ocular disease can be precisely reconstructed from 2D videos containing original faces. A clinical diagnosis comparison showed that ophthalmologists achieved high consistency in reaching the same diagnosis when using the original videos and the corresponding DM-reconstructed videos. This new technology could effectively remove identity attributes and was positively accepted by patients with ocular diseases, who expressed an increasing willingness to share their personal information and have it stored digitally with this added layer of biometric protection.

It is notable that the DM offers a pragmatic approach to safeguarding patient privacy and data utilization in both research and clinical settings; patient privacy and data utilization are frequently cited as concerns by patients worried about data breaches. Compared to rather crude but still widely used options, such as covering identifiable areas with very large bars or cropping these areas out altogether³, the DM is a much more sophisticated tool for anonymizing facial images. Even next-generation privacy-protection techniques, such as federated learning and homomorphic encryption, do not fully safeguard privacy; crucially, these techniques are vulnerable to model inversion or reconstruction attacks²⁴. The DM selects relevant features for reconstruction, but it is impossible to reconstruct original data particularly relevant to patient identification. Furthermore, compared with other face-swapping technologies, the DM can obtain quantitative parameters (such as the degree of eyeball rotation, eyelid shape parameters, blinking rate and rotation frequency), which might prove essential in the future for intelligent disease diagnosis or for studying the relationships between diseases and certain facial characteristics.

In addition to its potential utilization in research and routine clinical practice, the DM can be applied to telemedicine, including online automatic diagnosis and patient triage for more efficient healthcare delivery²⁵. The wider adoption of digital medicine, partly prompted by the ongoing COVID-19 pandemic, will require that the barriers to privacy protection be overcome and an important step is removing biometric data that are not essential for healthcare delivery. The DM can encrypt data before they are submitted to the cloud, thereby allowing clinicians or AI algorithms to review the reconstructed data and removing concerns of patients whose medical records contain sensitive biometric data²⁶.

However, ‘protecting privacy’ does not equate to ‘absolute removal of identity characteristics.’ According to the Health Insurance Portability and Accountability Act Privacy Rule, protecting patient privacy refers to reducing the identification risk of health information²⁷. One of the most important principles is balancing disclosure risk against data utility. Therefore, the purpose of this study is to provide an approach to health information disclosure that de-identifies protected health information as much as possible, without compromising the need for the clinician to reach a diagnosis.

The study has several limitations. First, the reconstruction of conjunctival hyperemia, eyelid edema and abnormal growth of tissues, such as ocular tumors, remains challenging because of insufficient model capacity. Model-based 3D reconstruction assumes that the target lies in the linear space spanned by a set of prepared models; however, it is difficult to cover all shapes in the aforementioned cases because shapes differ significantly from person to person. We intend to improve the DM by including a sufficiently large sample of abnormal cases for more detailed analysis or constructing an extra submodel on top of the existing model in the next research step. Second, this paper has demonstrated that the DM can protect against re-identification from images, but it may not work under certain circumstances if the video of the patient is exposed. We are currently extending our work to deal with video protection and circumvent this possible weakness. Third, the potential risk that the DM might be attacked still remains, as it might be abused to develop targeted attack algorithms; however, this risk can be mitigated by formulating relevant rules in the future.

In conclusion, we demonstrate the effectiveness of the DM in enhancing patient data privacy by making use of deep learning and real-time 3D reconstruction and notably, we demonstrate the DM’s acceptability to healthcare users. Future work is necessary to further evaluate the applicability of DM in a wider variety of clinical settings as the requirements for de-identification will vary according to the type of imaging dataset used.

Methods

Ethical approval

The research protocol and ethical review of this study were approved by the Institutional Review Board/Ethics Committee of the Zhongshan Ophthalmic Center. The clinical study protocol is shown in the Supplementary Note. Consent was obtained from all individuals whose images are shown in figures or the video for publication of these images. Informed consent was obtained from at least one legal guardian of each infant and the tenets of the Declaration of Helsinki were followed throughout this study. The trial in this study was registered with the Clinical Research Internal Management System of Zhongshan Ophthalmic Center and retrospectively registered at ClinicalTrials.gov (NCT05058599).

DM technique

Our reconstruction method consisted of three main stages: face reconstruction, eyelid reconstruction and eyeball reconstruction. At each stage, a unique detector was used to extract relevant features (the face landmarks L^face at the first stage, the eyelid landmarks L^eyelid and eyelid semantic lines S^eyelid at the second stage and the iris landmarks L^iris at the last stage). All detectors were neural networks based on deep-learning techniques. After acquiring the features, the corresponding model parameters were optimized to fit these features. The details of each stage are described below.

Face reconstruction

With the bilinear model¹⁷, we generated a 3D face $\boldsymbol{M}^{f} \in \mathbb{R}^{3N_F}$ from a shape vector $\boldsymbol{w}^{fs} \in \mathbb{R}^{N_{fs}}$ and a set of motion vectors $\boldsymbol{w}^{fm} \in \mathbb{R}^{N_{fm}}$:

$$\boldsymbol{M}^{f} = \boldsymbol{C}_r \times_2 \boldsymbol{w}^{fs} \times_3 \boldsymbol{w}^{fm}$$

(1)

where $\boldsymbol{C}_r \in \mathbb{R}^{3N_F \times N_{fs} \times N_{fm}}$ is a predefined core tensor that stores 3D vertex positions of faces covering the major variations in shape and motion; ×₂ and ×₃ are the tensor product operations on the second dimension and third dimension, respectively; N_F is the number of 3D face vertices; and N_fs and N_fm are the dimensions of the shape vector and motion vector, respectively.
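To make the tensor contraction in equation (1) concrete, the sketch below evaluates it with plain nested lists: each output coordinate is the sum over all shape and motion indices of the core tensor entry times the two weights. The nested-list layout `core[v][s][m]` and the function name are assumptions for illustration; a practical implementation would more likely use an array library.

```python
def bilinear_face(core, w_fs, w_fm):
    """Contract a core tensor with shape and motion weights (Eq. 1).

    core[v][s][m] holds the v-th mesh coordinate for shape basis s and
    motion basis m, so len(core) == 3 * N_F. Returns the flattened mesh.
    """
    for core_v in core:
        if len(core_v) != len(w_fs) or any(len(r) != len(w_fm) for r in core_v):
            raise ValueError("core tensor does not match the weight dimensions")
    return [
        sum(
            core_v[s][m] * w_fs[s] * w_fm[m]
            for s in range(len(w_fs))
            for m in range(len(w_fm))
        )
        for core_v in core
    ]
```

Because the mesh is linear in each weight vector separately, fixing the shape weights after the first frame leaves a purely linear dependence on the motion weights for the remaining frames.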

Given the face landmarks $\boldsymbol{L}^{face} \in \mathbb{R}^{2N_{LF}}$ on a video frame of a patient, we reconstructed a 3D face of the patient by solving an optimization problem: minimizing the landmark registration error E_face by searching for the optimal parameters w^fs, w^fm, R and t:

$$E_{face}\left(\boldsymbol{w}^{fs}, \boldsymbol{w}^{fm}, \boldsymbol{R}, \boldsymbol{t}; \boldsymbol{C}_r, \boldsymbol{L}^{face}\right) = \sum_{i}^{N_{LF}} \left\| \boldsymbol{L}_i^{face} - \Pi\left(\boldsymbol{M}^{f\prime}_{c_{face}(i)}\right) \right\|_2^2$$

(2)

$$\boldsymbol{M}^{f\prime} = \boldsymbol{R}\boldsymbol{M}^{f} + \boldsymbol{t}$$

(3)

where R ∈ SO(3) and $\boldsymbol{t} \in \mathbb{R}^3$ denote the rotation and translation of a 3D mesh, respectively; Π(·) is a projection function that projects 3D points to 2D points; c_face(i) represents the index of the face vertex corresponding to the ith face landmark, which is predefined manually; and N_LF is the number of face landmarks.

Note that w^fs was estimated based on only the first frame and then fixed for the following frames. Therefore, for the following frames, the objective function was slightly simplified to

$$E_{face}\left( {{{{\boldsymbol{w}}}}^{\,fm},{{{\boldsymbol{R}}}},{{{\boldsymbol{t}}}};{{{\boldsymbol{w}}}}^{\,fs},{{{\boldsymbol{C}}}}_r,{{{\boldsymbol{L}}}}^{face}} \right) = \mathop {\sum}\limits_i^{N_{LF}} {\left\| {{{{\boldsymbol{L}}}}_i^{face} - {{{\mathrm{{\Pi}}}}}\left( {{{{\boldsymbol{M}}}}^{\,f\prime }_{c_{face}(i)}} \right)} \right\|_2^2}$$

(4)

Eyelid reconstruction

Similar to the bilinear face model, our eyelid model¹⁸ contained a set of shape vectors ${{{\boldsymbol{w}}}}^{es} \in {\Bbb R}^{N_{es}}$ and a set of motion vectors ${{{\boldsymbol{w}}}}^{em} \in {\Bbb R}^{N_{em}}$. N_es and N_em are the dimensions of the shape vector and motion vector, respectively. Given our parametric eyelid model, a particular 3D eye region ${{{\boldsymbol{M}}}}^e \in {\Bbb R}^{3N_D}$ was reconstructed as follows:

$${{{\boldsymbol{M}}}}^e = {{{\boldsymbol{M}}}}_0^e + \mathop {\sum}\limits_i^{N_{es}} {w_i^{es}{{{\boldsymbol{M}}}}_i^{es}} + \mathop {\sum}\limits_j^{N_{em}} {w_j^{em}{{{\boldsymbol{M}}}}_j^{em}}$$

(5)

where ${{{\boldsymbol{M}}}}_0^e \in {\Bbb R}^{3N_D}$ is the template eyelid geometry model; ${{{\boldsymbol{M}}}}^{es} \in {\Bbb R}^{N_{es} \times 3N_D}$ and ${{{\boldsymbol{M}}}}^{em} \in {\Bbb R}^{N_{em} \times 3N_D}$ are also predefined and represent the basis geometry changes for shape and motion, respectively; and N_D is the number of 3D eyelid vertices.

Before reconstruction, we first fitted two polynomial curves for the upper eyelid and the lower eyelid according to the detected landmarks ${{{\boldsymbol{L}}}}^{eyelid} \in {\Bbb R}^{2N_{Ld}}$. Specifically, we fit cubic polynomial curves:

$$\begin{array}{*{20}{c}} {y = ax^3 + bx^2 + cx + d} \end{array}$$

(6)

by solving a least-squares problem:

$$\begin{array}{*{20}{c}} {\left( {\begin{array}{*{20}{c}} 1 & {x_1} & {x_1^2} & {x_1^3} \\ 1 & {x_2} & {x_2^2} & {x_2^3} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & {x_{N_{Ld}}} & {x_{N_{Ld}}^2} & {x_{N_{Ld}}^3} \end{array}} \right)\left( {\begin{array}{*{20}{c}} d \\ c \\ b \\ a \end{array}} \right) = \left( {\begin{array}{*{20}{c}} {y_1} \\ {y_2} \\ \vdots \\ {y_{N_{Ld}}} \end{array}} \right)} \end{array}$$

(7)

Here, x and y denote the 2D coordinates of a point on the 2D image. Then, we applied dense sampling to acquire dense landmarks ${{{\boldsymbol{L}}}}^{dense} \in {\Bbb R}^{2N_{LD}}$ by uniformly sampling $x^{dense} = \{ x_1^{dense},x_2^{dense}, \cdots ,x_{N_{LD}}^{dense}\}$:

$$\begin{array}{ll}{\boldsymbol{L}}_i^{dense} &= \left( {x_i^{dense},y_i^{dense}} \right)\\ &= \left( {x_i^{dense},a\left( {x_i^{dense}} \right)^3 + b\left( {x_i^{dense}} \right)^2 + c\left( {x_i^{dense}} \right) + d} \right)\end{array}$$

(8)

where N_Ld is the number of detected eyelid landmarks and N_LD is the number of dense landmarks.
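The curve fitting and dense sampling of equations (6) to (8) can be sketched with a standard least-squares solve. The function names and the sampling interval arguments are assumptions; the text does not state over which x range the dense samples were drawn.

```python
import numpy as np

def fit_cubic(x, y):
    """Least-squares solution of equation (7); returns (a, b, c, d)."""
    A = np.vander(x, 4, increasing=True)        # columns: 1, x, x^2, x^3
    d, c, b, a = np.linalg.lstsq(A, y, rcond=None)[0]
    return a, b, c, d

def dense_landmarks(coeffs, x_min, x_max, n_dense):
    """Uniformly sample x and evaluate the cubic, as in equation (8)."""
    a, b, c, d = coeffs
    x = np.linspace(x_min, x_max, n_dense)
    y = a * x**3 + b * x**2 + c * x + d
    return np.stack([x, y], axis=1)             # shape (N_LD, 2)
```

One curve would be fitted per eyelid (upper and lower), each from its own subset of detected landmarks.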

For continuous features, the four detected semantic lines (representing the double-fold, the upper eyelid, the lower eyelid and the lower boundary of the bulge) S^eyelid are irregular curves defined on the 2D image space, thus indicating the positions of different parts of the eyelid.

Integrating both discrete features and continuous features, we solved the following energy function to search for the optimal w^es and w^em:

$$E_{eyelid}\left( {{{{\boldsymbol{w}}}}^{es},{{{\boldsymbol{w}}}}^{em};{{{\boldsymbol{R}}}},{{{\boldsymbol{t}}}},{{{\boldsymbol{M}}}}_0^e,{{{\boldsymbol{M}}}}^{es},{{{\boldsymbol{M}}}}^{em},{{{\boldsymbol{L}}}}^{dense}} \right) = \mathop {\sum}\limits_i^{N_{LD}} {\left\| {{{{\boldsymbol{L}}}}_i^{dense} - {{{\mathrm{{\Pi}}}}}\left( {{{{\boldsymbol{M}}}}^{e\prime }_{c_{eyelid}\left( i \right)}} \right)} \right\|_2^2}$$

(9)

$$E_{sl}\left( {{{{\boldsymbol{w}}}}^{es},{{{\boldsymbol{w}}}}^{em};{{{\boldsymbol{R}}}},{{{\boldsymbol{t}}}},{{{\boldsymbol{M}}}}_0^e,{{{\boldsymbol{M}}}}^{es},{{{\boldsymbol{M}}}}^{em},S^{eyelid}} \right) = \mathop {\sum}\limits_k^{N_{sl}} {\mathop {\sum}\limits_{j \in v_{sl}\left( k \right)} {dis\left( {{{{\mathrm{{\Pi}}}}}\left( {{{{\boldsymbol{M}}}}^{e\prime} _j} \right),S_k^{eyelid}} \right)^2} }$$

(10)

$$\begin{array}{*{20}{c}} {{{{\boldsymbol{M}}}}^{e\prime } = \boldsymbol{R}{{{\boldsymbol{M}}}}^e + \boldsymbol{t}} \end{array}$$

(11)

where c_eyelid(i) represents the corresponding vertex index for the ith eyelid landmark, which is also manually predefined; v_sl(k) represents the set of vertex indices belonging to the kth semantic line; dis(·,·) is the distance between a point and the closest point on a line; and N_sl is the number of semantic lines, which is four in this paper. R and t are calculated at the face reconstruction stage.
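The semantic line term of equation (10) depends on a point-to-curve distance. A minimal sketch is given below, assuming that each detected semantic line is available as a 2D polyline (an ordered array of points); this representation, and the function names, are assumptions and not taken from the text.

```python
import numpy as np

def point_to_polyline(p, polyline):
    """Distance from 2D point p to the closest point on a polyline (K, 2)."""
    a, b = polyline[:-1], polyline[1:]
    ab = b - a
    denom = np.maximum(np.sum(ab * ab, axis=1), 1e-12)   # guard zero-length segments
    s = np.clip(np.sum((p - a) * ab, axis=1) / denom, 0.0, 1.0)
    closest = a + s[:, None] * ab                        # closest point on each segment
    return float(np.min(np.linalg.norm(p - closest, axis=1)))

def semantic_line_energy(projected, lines, line_vertices):
    """Equation (10): squared point-to-line distances summed over every
    semantic line k and every projected vertex j assigned to it."""
    return sum(point_to_polyline(projected[j], lines[k]) ** 2
               for k, idx in enumerate(line_vertices) for j in idx)
```

Here `projected` holds the 2D projections of the transformed eyelid vertices and `line_vertices[k]` plays the role of v_sl(k).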

Similar to the face reconstruction, w^es was determined in the first frame. In the following frames, the objective functions were changed to

$$E_{eyelid}\left( {{{\boldsymbol{w}}}}^{em};{{{{\boldsymbol{w}}}}^{es},{{{\boldsymbol{R}}}},{{{\boldsymbol{t}}}},{{{\boldsymbol{M}}}}_0^e,{{{\boldsymbol{M}}}}^{es},{{{\boldsymbol{M}}}}^{em},{{{\boldsymbol{L}}}}^{dense}} \right) = \mathop {\sum}\limits_i^{N_{LD}} {\left\| {{{{\boldsymbol{L}}}}_i^{dense} - {{{\mathrm{{\Pi}}}}}\left( {{{{\boldsymbol{M}}}}^{e\prime }_{c_{eyelid}\left( i \right)}} \right)} \right\|_2^2}$$

(12)

$$E_{sl}\left( {{{\boldsymbol{w}}}}^{em};{{{{\boldsymbol{w}}}}^{es},{{{\boldsymbol{R}}}},{{{\boldsymbol{t}}}},{{{\boldsymbol{M}}}}_0^e,{{{\boldsymbol{M}}}}^{es},{{{\boldsymbol{M}}}}^{em},S^{eyelid}} \right) = \mathop {\sum}\limits_k^{N_{sl}} {\mathop {\sum}\limits_{j \in v_{sl}\left( k \right)} {dis\left( {{{{\mathrm{{\Pi}}}}}\left( {{{{\boldsymbol{M}}}}^{e\prime} _j} \right),S_k^{eyelid}} \right)^2} }$$

(13)

Eyeball reconstruction

Our SGAEM model¹⁹ represented a 3D eyeball ${{{\boldsymbol{B}}}} \in {\Bbb R}^{3N_B}$ based on the eyeball radius r_e and the iris radius r_i. N_B is the number of 3D eyeball vertices.

$${{{\boldsymbol{B}}}}= SGAEM\left( {r_e,r_i} \right)$$

(14)

The position of the eyeball relative to the face, ${{{\boldsymbol{p}}}}_e \in {\Bbb R}^3$, also needs to be determined for reconstruction. Here, certain prior knowledge was used to estimate the three parameters (r_e, r_i and p_e) in the first frame, and these parameters were then fixed in the following frames. The eyeball rotation angles θ and ϕ were obtained by minimizing the following objective function:

$$E_{eyeball}\left( {\theta ,\phi ;{{{\boldsymbol{R}}}},{{{\boldsymbol{t}}}},{{{\boldsymbol{L}}}}^{iris},r_e,r_i,{{{\boldsymbol{p}}}}_e} \right) = \mathop {\sum}\limits_i^{N_{LB}} {\left\| {{{{\boldsymbol{L}}}}_i^{iris} - {{{\mathrm{{\Pi}}}}}\left( {{{{\boldsymbol{B}}}}^{\prime} _{c_{iris}\left( i \right)}} \right)} \right\|_2^2}$$

(15)

$${{{{\boldsymbol{B}}}}^{\prime} = \boldsymbol{R}\left( {Rot\left( {\theta ,\phi } \right){{{\boldsymbol{B}}}} + {{{\boldsymbol{p}}}}_e} \right) + \boldsymbol{t}}$$

(16)

where θ and ϕ are the Euler angles of eyeball rotation; Rot(·,·) is a function that converts θ and ϕ into a rotation matrix; c_iris(i) represents the corresponding vertex index for the ith iris landmark, which is also predefined; and N_LB is the number of iris landmarks. R and t are calculated at the face reconstruction stage.
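Equation (16) composes the eyeball rotation with the head pose. The sketch below illustrates that composition; the axis convention chosen for Rot(θ, ϕ) (θ about the x axis, then ϕ about the y axis) is an assumption, since the text does not specify it, and the function names are hypothetical.

```python
import numpy as np

def rot(theta, phi):
    """Rotation from two Euler angles. The axis convention (theta about x,
    then phi about y) is an assumption; the source does not specify it."""
    ct, st, cp, sp = np.cos(theta), np.sin(theta), np.cos(phi), np.sin(phi)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, ct, -st], [0.0, st, ct]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    return ry @ rx

def place_eyeball(B, theta, phi, p_e, R, t):
    """Equation (16) applied to eyeball vertices B of shape (N_B, 3)."""
    local = B @ rot(theta, phi).T + p_e    # rotate the eyeball, then offset by p_e
    return local @ R.T + t                 # then apply the head pose R, t
```

With r_e, r_i and p_e fixed after the first frame, only the two angles remain free per frame, which keeps the eyeball stage small.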

Sequence consistency

To maintain consistency between successive frames, the following smoothing terms were also considered for the above objective functions:

$$E_{smooth1} = \lambda _{fm}\left\| {{{{\boldsymbol{w}}}}^{{\,fm}} - {{{\boldsymbol{w}}}}_{prev}^{\,fm}} \right\|_2^2 + \lambda _R\left\| {{{{\boldsymbol{R}}}} - {{{\boldsymbol{R}}}}_{prev}} \right\|_2^2 + \lambda _t\left\| {{{{\boldsymbol{t}}}} - {{{\boldsymbol{t}}}}_{{\it{prev}}}} \right\|_2^2$$

(17)

$$\begin{array}{*{20}{c}} {E_{smooth2} = \lambda _{em}\left\| {{{{\boldsymbol{w}}}}^{em} - {{{\boldsymbol{w}}}}_{prev}^{em}} \right\|_2^2} \end{array}$$

(18)

$$\begin{array}{*{20}{c}} {E_{smooth3} = \lambda _\theta \left\| {\theta - \theta _{prev}} \right\|_2^2 + \lambda _\phi \left\| {\phi - \phi _{prev}} \right\|_2^2} \end{array}$$

(19)

where subscript prev represents the parameter at the previous frame.

Finally, the objective functions for the three stages become

$$\begin{array}{*{20}{c}} {E_1 = E_{face} + E_{smooth1}} \end{array}$$

(20)

$$\begin{array}{*{20}{c}} {E_2 = E_{eyelid} + E_{sl} + E_{smooth2}} \end{array}$$

(21)

$$\begin{array}{*{20}{c}} {E_3 = E_{eyeball} + E_{smooth3}} \end{array}$$

(22)

The Gauss–Newton method was adopted to solve the nonlinear least-squares problem to minimize each objective function.
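For readers unfamiliar with the solver, a generic Gauss–Newton iteration is sketched below on a toy problem. This is not the implementation used in the study: the forward-difference Jacobian, the absence of step control, the 2D toy transform and all names are assumptions chosen to keep the example short. The residual stacks a data term and a smoothing term toward the previous frame, mirroring the structure of E_1 to E_3.

```python
import numpy as np

def gauss_newton(residual, x0, iters=20, eps=1e-6):
    """Minimize ||residual(x)||^2 with Gauss-Newton and a forward-difference
    Jacobian. Step control and analytic Jacobians are omitted for brevity."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)
        J = np.empty((r.size, x.size))
        for k in range(x.size):
            dx = np.zeros_like(x)
            dx[k] = eps
            J[:, k] = (residual(x + dx) - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]   # least-squares solve of J step = -r
        x = x + step
        if np.linalg.norm(step) < 1e-10:
            break
    return x

# Toy problem (hypothetical): recover a 2D rotation angle and translation
# from point correspondences, starting from the previous frame's estimate.
src = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])

def transform(p):
    c, s = np.cos(p[0]), np.sin(p[0])
    return src @ np.array([[c, -s], [s, c]]).T + p[1:]

target = transform(np.array([0.4, 0.2, -0.1]))
prev = np.array([0.35, 0.2, -0.1])
lam = 0.0   # smoothing weight; set > 0 to pull the solution toward prev

def residual(p):
    data = (transform(p) - target).ravel()
    smooth = np.sqrt(lam) * (p - prev)      # squared norm gives lam * ||p - prev||^2
    return np.concatenate([data, smooth])

solution = gauss_newton(residual, prev)
```

Initializing from the previous frame, as done here, is a common choice for video sequences because consecutive frames differ only slightly.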

Network training

This section describes how we trained the networks of the three landmark detectors (face, eyelid and iris landmark detectors) and the eyelid semantic line detector (Supplementary Table 1). The network architecture for the three landmark detectors follows HRNet²⁸. For the face detector, we used both the 300W²⁹ and WFLW³⁰ datasets to train the network. For the eyelid and iris landmark detectors, we used UnityEyes³¹ to synthesize 20,000 images with ground-truth landmark positions for training. The network architecture for the eyelid semantic line detector follows HED³² and we used the data from our previous work¹⁸ to train it. After all these networks were trained, we further fine-tuned them on 775 patient portraits that we collected ourselves, so that the networks could better handle data from real patients. Specifically, we held out 75 patient portraits from the fine-tuning dataset for validation and used the remaining 700 portraits for fine-tuning. The characteristics of the training dataset are shown in Extended Data Table 6.

Deformation transfer for eyebrow movements

Although the linear eyelid model provided sufficient eyelid variations, it included no degrees of freedom for motion in the eyebrow region. Reconstructing the eyebrow motions of patients would help in diagnosing TAO. To perform such a reconstruction, a deformation transfer method was applied, as described below.

We defined two semantic regions on both the face and eyelid models. Assuming that each region on the face model influences the corresponding semantic region on the eyelid model, we estimated the influence of a face vertex on an eyelid vertex using the influence weights w_i,j:

$$\begin{array}{*{20}{c}}{w_{i,j} = exp\left( { - \frac{{\left\| {{{{\boldsymbol{v}}}}_i^{\,f} - {{{\boldsymbol{v}}}}_j^e} \right\|_2^2}}{{2r^2}}} \right)} \end{array}$$

(23)

where ${{{\boldsymbol{v}}}}_i^{\,f}$ and ${{{\boldsymbol{v}}}}_j^e$ represent the ith face vertex and jth eyelid vertex, respectively and r is the influence radius. With w_i,j, each eyelid vertex ${{{\boldsymbol{v}}}}_j^e$ can be deformed together with the motion of the face vertices as follows:

$$\begin{array}{*{20}{c}} {{{{\boldsymbol{v}}}}_{j,t}^e = {{{\boldsymbol{v}}}}_{j,t - 1}^e + \mathop {\sum}\limits_{i \in {{{\mathcal{N}}}}\left( j \right)} {\frac{{w_{i,j}}}{N}\left( {{{{\boldsymbol{v}}}}_{i,t}^{\,f} - {{{\boldsymbol{v}}}}_{i,t - 1}^{\,f}} \right)} } \end{array}$$

(24)

where t and t − 1 represent the time index of the current frame and the previous frame, respectively. ${{{\mathcal{N}}}}(j)$ is a set of face vertex indices related to ${{{\boldsymbol{v}}}}_j^e$ and N is the number of related vertices. Notice that all the vertices were in the local coordinate system, which removes the influence of global rotation R and global translation t.
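Equations (23) and (24) can be sketched as below. The code assumes that the related face vertices for each eyelid vertex are supplied as index lists and that the weights are computed once from reference positions; neither detail is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def influence_weights(face_v, eyelid_v, r):
    """Equation (23): Gaussian weights w[i, j] between face vertex i and
    eyelid vertex j, with influence radius r."""
    d2 = np.sum((face_v[:, None, :] - eyelid_v[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * r ** 2))

def transfer_step(eyelid_prev, face_prev, face_curr, weights, neighbors):
    """Equation (24): move each eyelid vertex by the weighted mean
    displacement of its related face vertices (all in local coordinates)."""
    out = eyelid_prev.copy()
    delta = face_curr - face_prev
    for j, idx in enumerate(neighbors):
        if len(idx):
            w = weights[idx, j][:, None]               # weights of the related face vertices
            out[j] += (w * delta[idx]).sum(axis=0) / len(idx)
    return out
```

Here `neighbors[j]` stands for the index set N(j), and its length is the N that divides the sum in equation (24).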

Definitions of pathological ocular manifestations for the clinical evaluation

Ptosis is defined as the upper eyelid falling to a position that is lower than normal (typically 1.0–2.0 mm below the superior corneoscleral limbus)³³. The palpebral fissure distance is often evaluated by guiding the patient’s eye fixation to a distant target³⁴. The frontalis muscle, levator palpebrae muscle and orbicular muscle are analyzed based on a series of movement guidelines to preliminarily explore the cause of ptosis; these movement guidelines include having the patient gaze upwards and downwards, maintain an upwards gaze for 1 min and close his or her eyes tightly³⁵. Additionally, the presence of Brown’s ocular movements and of jaw motion is provided to aid in diagnosing ptosis³⁶.

Strabismus is characterized as the eyes not properly aligning with each other when looking at an object. The cover test and alternate cover test are used in diagnosing strabismus³⁷. Because most people have exotropia but do not need treatment, we excluded exotropia when determining the diagnosis of strabismus. The test allows wearing glasses, especially in the case of patients with accommodative esotropia.

TAO is diagnosed by positive responses of eyelid retraction and at least two of the following four sets of findings: chemosis or eyelid edema, lid lag or restrictive lagophthalmos^38,39.

Nystagmus is characterized as the eyes moving rapidly and uncontrollably; this movement can be observed and diagnosed during eye movement recording⁴⁰. Additionally, compensatory head position and median zone are important features of nystagmus⁴¹.

Statistical analysis

For the sample size estimate of the clinical trial, the power was set at 0.9 and the significance level at 0.025, and a one-sided test was used. Assuming k1 = 0.85 and k0 = 0.6, with probabilities of abnormal findings of 0.3 to 0.7, the sample size for each disease was estimated to be at least 82 using the irr package in R 4.1.1 (R Project for Statistical Computing).

Our quantitative evaluation was based on the 2D pixel distance between the detected 2D landmarks and projected 2D positions of the 3D points on the reconstructed face. To exclude the influence of face size, we evaluated our method using the normalized pixel distance rather than the absolute pixel distance.

To acquire the normalized pixel distance, we calculated the absolute pixel distance first:

$$\begin{array}{*{20}{c}} {D_i^{abs} = \left\| {{{{\boldsymbol{L}}}}_i - {{{\mathrm{{\Pi}}}}}\left( {{{{\boldsymbol{V}}}}_i} \right)} \right\|_2} \end{array}$$

(25)

where L_i is the ith 2D landmark and V_i is the ith 3D point.

Then, we calculated the absolute pixel distance between the two eyes:

$$\begin{array}{*{20}{c}} {D^{eye} = \left\| {{{{\boldsymbol{C}}}}^{left} - {{{\boldsymbol{C}}}}^{right}} \right\|_2} \end{array}$$

(26)

where C^left and C^right are the center positions of the left and right eyes, respectively.

Finally, we normalized the pixel distance between landmarks according to the distance between the two eyes, that is,

$$\begin{array}{*{20}{c}} {D_i^{norm} = \frac{{D_i^{abs}}}{{D^{eye}}}} \end{array}$$

(27)

To further validate the reconstruction, the maximum normalized error (MRE) and average normalized error (ARE) were defined as

$$\begin{array}{*{20}{c}} {MRE = max\left\{ {D_1^{norm},D_2^{norm}, \ldots ,D_N^{norm}} \right\}} \end{array}$$

(28)

$$\begin{array}{*{20}{c}} {ARE = \frac{{\mathop {\sum}\nolimits_{i = 1}^N {D_i^{norm}} }}{N}} \end{array}$$

(29)

Generally, N = 38 for eyeball validation and N = 200 for eyelid validation, but we excluded some (landmark, point) pairs when they were occluded, especially in the strabismus dataset.
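The error metrics of equations (25) to (29) reduce to a few array operations. In the sketch below, the optional `visible` mask is an assumed way of dropping occluded (landmark, point) pairs; the function name and argument layout are hypothetical.

```python
import numpy as np

def normalized_errors(landmarks, projected, eye_left, eye_right, visible=None):
    """Return (MRE, ARE) for detected 2D landmarks (N, 2) and the projected
    2D positions of the matching 3D points (N, 2)."""
    d_abs = np.linalg.norm(landmarks - projected, axis=1)                   # equation (25)
    d_eye = np.linalg.norm(np.asarray(eye_left) - np.asarray(eye_right))    # equation (26)
    d_norm = d_abs / d_eye                                                  # equation (27)
    if visible is not None:
        d_norm = d_norm[np.asarray(visible, dtype=bool)]                    # drop occluded pairs
    return float(d_norm.max()), float(d_norm.mean())                        # equations (28), (29)
```

Dividing by the inter-eye distance makes the errors comparable across videos in which the face occupies different numbers of pixels.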

In the clinical validation, the characteristics of the participants were described as the frequency (proportion) for categorical variables and the median (IQR) for continuous variables because their distributions were nonnormal. Cohen’s κ statistics were used to evaluate the diagnostic consistency in the relevant diagnostic comparison. Kappa was interpreted as recommended by Landis and Koch, where κ ≤ 0.00 is considered as poor, 0.00–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial and ≥0.81 as almost perfect⁴². In addition, based on the ground truth, we measured the accuracies of diagnoses from the original videos and diagnoses from the reconstructed videos and compared them using the McNemar test. In the empirical investigation, principal-component analysis was used to generate five factors from the 16 questions in the questionnaire. The Kaiser–Meyer–Olkin measure of sampling adequacy and Cronbach’s α for each component were used to evaluate the reliability and validity of each question. Linear regression was used to measure the associations between components.
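As an illustration of the agreement statistic, Cohen’s κ for two sets of diagnoses and the Landis and Koch labels can be computed as follows. This is a sketch, not the analysis code of the study (which used SPSS and R); the function names are hypothetical, and the handling of values that fall between the published bands (for example 0.205) is an assumption.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long lists of categorical ratings.
    Undefined (division by zero) when chance agreement equals 1."""
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n     # observed agreement
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2    # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

def landis_koch(kappa):
    """Map kappa to the Landis and Koch label; values between bands are
    assigned to the lower band (an assumption)."""
    if kappa <= 0.0:
        return "poor"
    for upper, label in [(0.21, "slight"), (0.41, "fair"),
                         (0.61, "moderate"), (0.81, "substantial")]:
        if kappa < upper:
            return label
    return "almost perfect"
```

For example, `cohen_kappa` could be applied to the diagnoses made from the original videos and those made from the reconstructed videos for the same patients.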

In the AI-powered re-identification validation, we used AUC, TAR@FAR = 0.1, TAR@FAR = 0.01 and Rank-1 to evaluate the performance of face-recognition systems. The TAR is the proportion of authorized people who the system correctly accepts and is defined as

$$\begin{array}{*{20}{c}} {TAR = \frac{{TP}}{{TP + FN}}} \end{array}$$

(30)

The FAR is the proportion of nonauthorized people whom the system incorrectly accepts, defined as

$$\begin{array}{*{20}{c}} {FAR = \frac{{FP}}{{FP + TN}}} \end{array}$$

(31)

By setting different threshold values for similarity scores (given by the face recognition systems), we obtain different TARs and FARs, resulting in a ROC curve. The AUC measures the 2D area underneath the ROC curve. TAR@FAR = X represents the TAR value when FAR equals X. Rank-1 is the probability that the similarity score of the same identity ranks first among all the identities.
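The re-identification metrics can be sketched as follows. The code assumes that a pair is accepted when its similarity score is at least the threshold and that the ROC curve is integrated with the trapezoidal rule; these conventions and the function names are assumptions, not details taken from the study.

```python
import numpy as np

def tar_far(scores, same_identity, threshold):
    """TAR and FAR (equations (30) and (31)) when pairs with
    score >= threshold are accepted."""
    scores = np.asarray(scores, dtype=float)
    same = np.asarray(same_identity, dtype=bool)
    accept = scores >= threshold
    tar = np.sum(accept & same) / np.sum(same)       # TP / (TP + FN)
    far = np.sum(accept & ~same) / np.sum(~same)     # FP / (FP + TN)
    return float(tar), float(far)

def roc_auc(scores, same_identity):
    """Area under the ROC curve traced by sweeping the threshold over every
    observed score, integrated with the trapezoidal rule."""
    thresholds = np.sort(np.unique(scores))[::-1]
    pts = [(0.0, 0.0)] + [tar_far(scores, same_identity, th)[::-1] for th in thresholds]
    far, tar = np.array(pts).T
    return float(np.sum(np.diff(far) * (tar[1:] + tar[:-1]) / 2.0))

def rank1(similarity, true_index):
    """Fraction of probes whose highest-scoring gallery identity is the true
    one; similarity has shape (num_probes, num_gallery)."""
    return float(np.mean(np.argmax(similarity, axis=1) == np.asarray(true_index)))
```

A lower AUC, TAR and Rank-1 on masked images indicates that the recognition system is less able to match them to the original identities.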

Data were analyzed using SPSS (v.23.0, IBM Corp), R (v.4.1.1, R Project for Statistical Computing), C++ (v.11, Standard C++ Foundation) and Python (v.3.6, Python Software Foundation) with a designated significance level of 5%.

Algorithm efficiency

Although considerable engineering effort is still needed to build a practical application, our main algorithm can run in real time. In detail, our algorithm takes approximately 7 ms, 14 ms and 4 ms per frame for face, eyelid and eyeball reconstruction, respectively, on one Intel i7 CPU and one NVIDIA 1080 GPU.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data that support the findings of this study are divided into two groups: shared data and restricted data. Shared data are available from the manuscript, references, supplementary data and video. Restricted data relating to individuals in this study are subject to a license that allows for use of the data only for analysis. Therefore, such data cannot be shared.

Code availability

To promote academic exchange within the framework of data and privacy security, the code for the DM is available at https://github.com/StoryMY/Digital-Mask. For non-commercial use, researchers can sign the license provided at the above link and contact H.L. or F.X. to access the code.

References

Aryanto, K., Oudkerk, M. & van Ooijen, P. Free DICOM de-identification tools in clinical research: functioning and safety of patient privacy. Eur. Radiol. 25, 3685–3695 (2015).
Lotan, E. et al. Medical imaging and privacy in the era of artificial intelligence: myth, fallacy, and the future. J. Am. Coll. Radiol. 17, 1159–1162 (2020).
Clover, A., Fitzpatrick, E. & Healy, C. Analysis of methods of providing anonymity in facial photographs: a randomised controlled study. Ir. Med. J. 103, 243–245 (2010).
Denny, J. C. & Collins, F. S. Precision medicine in 2030: seven ways to transform healthcare. Cell 184, 1415–1419 (2021).
Anon. Time to discuss consent in digital-data studies. Nature 572, 5 (2019).
Koops, B.-J. The concept of function creep. Law Innov. Technol. 13, 29–56 (2021).
Mason, J. et al. An Investigation of biometric authentication in the healthcare environment. Array 8, 100042 (2020).
Lin, S. et al. Feasibility of using deep learning to detect coronary artery disease based on facial photo. Eur. Heart J. 41, 4400–4411 (2020).
Long, E. et al. Discrimination of the behavioural dynamics of visually impaired infants via deep learning. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-019-0461-9 (2019).
Zeng, X., Peng, X. & Qiao, Y. DF2Net: A dense-fine-finer network for detailed 3D face reconstruction. Proc. IEEE/CVF International Conference on Computer Vision, 2315–2324 (ICCV, 2019).
Ranjan, A., Bolkart, T., Sanyal, S. & Black, M. J. Generating 3D faces using convolutional mesh autoencoders. Proc. European Conference on Computer Vision, 704–720 (ECCV, 2018).
Liu, F., Zhu, R., Zeng, D., Zhao, Q. & Liu, X. Disentangling features in 3D face shapes for joint face reconstruction and recognition. Proc. IEEE Conference on Computer Vision and Pattern Recognition, 5216–5225 (IEEE/CVF, 2018).
Booth, J. et al. 3D face morphable models ‘in-the-wild’. Proc. IEEE Conference on Computer Vision and Pattern Recognition, 48–57 (IEEE/CVF, 2018).
Ploumpis, S. et al. Towards a complete 3D morphable model of the human head. IEEE Trans. Pattern Anal. Mach. Intell. 43, 4142–4160 (2020).
Huber, P. et al. A multiresolution 3D morphable face model and fitting framework. Proc. 11th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (2016).
Tewari, A., Seidel, H.-P., Elgharib, M. & Theobalt, C. Learning complete 3D morphable face models from images and videos. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 3361–3371 (CVPR, 2021).
Cao, C., Weng, Y., Zhou, S., Tong, Y. & Zhou, K. Facewarehouse: a 3D facial expression database for visual computing. IEEE Trans. Visual Comput. Graphics 20, 413–425 (2013).
Wen, Q., Xu, F., Lu, M. & Yong, J.-H. Real-time 3D eyelids tracking from semantic edges. ACM Trans. Graphics 36, 1–11 (2017).
Wen, Q., Xu, F. & Yong, J.-H. Real-time 3D eye performance reconstruction for RGBD cameras. IEEE Trans. Visual Comput. Graphics 23, 2586–2598 (2016).
Schroff, F., Kalenichenko, D. & Philbin, J. FaceNet: a unified embedding for face recognition and clustering. Proc. IEEE Conference on Computer Vision and Pattern Recognition, 815–823 (IEEE, 2015).
Wang, H. et al. Cosface: large margin cosine loss for deep face recognition. Proc. IEEE Conference on Computer Vision and Pattern Recognition, 5265–5274 (CVPR, 2018).
Deng, J., Guo, J., Xue, N. & Zafeiriou, S. Arcface: additive angular margin loss for deep face recognition. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4690–4699 (IEEE/CVF, 2019).
Yi, D., Lei, Z., Liao, S. & Li, S. Z. Learning face representation from scratch. Preprint at arXiv https://arxiv.org/abs/1411.7923 (2014).
Kaissis, G. A., Makowski, M. R., Rückert, D. & Braren, R. F. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. https://doi.org/10.1038/s42256-020-0186-1 (2020).
Hollander, J. E. & Carr, B. G. Virtually perfect? Telemedicine for COVID-19. N. Engl. J. Med. 382, 1679–1681 (2020).
Stanberry, B. Legal ethical and risk issues in telemedicine. Comput. Methods Programs Biomed. 64, 225–233 (2001).
Health Insurance Portability and Accountability Act of 1996. Pub. L. No. 104–191, 110 Stat. 1936 (1996).
Wang, J. et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43, 3349–3364 (2020).
Sagonas, C., Tzimiropoulos, G., Zafeiriou, S. & Pantic, M. 300 faces in-the-wild challenge: the first facial landmark localization challenge. Proc. IEEE International Conference on Computer Vision Workshops, 397–403 (ICCV, 2013).
Wu, W. et al. Look at boundary: a boundary-aware face alignment algorithm. Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2129–2138 (CVPR, 2018).
Wood, E., Baltrušaitis, T., Morency, L.-P., Robinson, P. & Bulling, A. Learning an appearance-based gaze estimator from one million synthesised images. Proc. Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, 131–138 (ETRA, 2016).
Xie, S. & Tu, Z. Holistically-nested edge detection. Proc. IEEE International Conference on Computer Vision 1395–1403 (IEEE, 2015).
Latting, M. W., Huggins, A. B., Marx, D. P. & Giacometti, J. N. Clinical evaluation of blepharoptosis: distinguishing age-related ptosis from masquerade conditions. Semin. Plast. Surg. 31, 5–16 (2017).
Thomas, I. T., Gaitantzis, Y. A. & Frias, J. L. Palpebral fissure length from 29 weeks gestation to 14 years. J Pediatr. 111, 267–268 (1987).
Díaz-Manera, J., Luna, S. & Roig, C. Ocular ptosis: differential diagnosis and treatment. Curr. Opin. Neurol. 31, 618–627 (2018).
Pearce, F. C., McNab, A. A. & Hardy, T. G. Marcus Gunn jaw-winking syndrome: a comprehensive review and report of four novel cases. Ophthal. Plast. Reconstr. Surg. 33, 325–328 (2017).
Wright, K. W., Spiegel, P. H. & Hengst, T. Pediatric Ophthalmology and Strabismus (Springer Science & Business Media, 2013).
Bartley, G. B. & Gorman, C. A. Diagnostic criteria for Graves’ ophthalmopathy. Am. J. Ophthalmol. 119, 792–795 (1995).
Gorman, C. A. et al. A prospective, randomized, double-blind, placebo-controlled study of orbital radiotherapy for Graves’ ophthalmopathy. Ophthalmology 127, S160–S171 (2020).
Abel, L. A. Infantile nystagmus: current concepts in diagnosis and management. Clin. Exp. Optom. 89, 57–65 (2006).
Tirosh, E., Shnitzer, M., Davidovitch, M. & Cohen, A. Behavioural problems among visually impaired between 6 months and 5 years. Int. J. Rehab. Res. 21, 63–69 (1998).
Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977).


Acknowledgements

This study was supported by the Science and Technology Planning Projects of Guangdong Province (2018B010109008 to H.L.), the National Key R&D Program of China (2018YFA0704000 to F.X.), the National Natural Science Foundation of China (82171035 and 81770967 to H.L. and 62088102 to Q.D.), Beijing Natural Science Foundation (JQ19015 to F.X.), Guangzhou Key Laboratory Project (202002010006 to H.L.), the Institute for Brain and Cognitive Science, Tsinghua University (to Q.D.), Beijing Laboratory of Brain and Cognitive Intelligence, Beijing Municipal Education Commission (to Q.D.) and Hainan Province Clinical Medical Center (H.L.). These sponsors and funding organizations had no role in the design or performance of this study. P.Y.W.M. is supported by an Advanced Fellowship Award (NIHR301696) from the UK National Institute of Health Research (NIHR). P.Y.W.M. also receives funding from Fight for Sight (UK), the Isaac Newton Trust (UK), Moorfields Eye Charity (GR001376), the Addenbrooke’s Charitable Trust, the National Eye Research Center (UK), the International Foundation for Optic Nerve Disease, the NIHR as part of the Rare Diseases Translational Research Collaboration, the NIHR Cambridge Biomedical Research Center (BRC-1215-20014) and the NIHR Biomedical Research Center based at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

Author information

These authors contributed equally: Yahan Yang, Junfeng Lyu, Ruixin Wang.
These authors jointly supervised this work: Patrick Yu-Wai-Man, Feng Xu, Qionghai Dai, Haotian Lin.

Authors and Affiliations

State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Vision Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
Yahan Yang, Ruixin Wang, Lanqin Zhao, Wenben Chen, Shaowei Bi, Jie Meng, Keli Mao, Danqi Zeng, Yuxuan Wu, Tingxin Cui, Lixue Liu, Wai Cheng Iao, Xiaoyan Li, Youjin Hu, Lai Wei, Xinping Yu, Jingchang Chen, Zhonghao Wang, Zhen Mao, Huijing Ye, Wei Xiao, Huasheng Yang, Danping Huang, Xiaoming Lin & Haotian Lin
School of Software and BNRist, Tsinghua University, Beijing, China
Junfeng Lyu, Quan Wen & Feng Xu
Department of Ophthalmology, Guangdong Provincial People’s Hospital; Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, China
Yu Xiao, Yingying Liang & Zijing Du
Department of Ophthalmology & Visual Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
Carol Y. Cheung
School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China
Jianhua Zhou
Ophthalmic Center, Kiang Wu Hospital, Macao SAR, Macao, China
Iat Fan Lai
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Wei-shi Zheng & Ruixuan Wang
Cambridge Center for Brain Repair and MRC Mitochondrial Biology Unit, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
Patrick Yu-Wai-Man
Cambridge Eye Unit, Addenbrooke’s Hospital, Cambridge University Hospitals, Cambridge, UK
Patrick Yu-Wai-Man
Moorfields Eye Hospital, London, UK
Patrick Yu-Wai-Man
UCL Institute of Ophthalmology, University College London, London, UK
Patrick Yu-Wai-Man
Beijing Laboratory of Brain and Cognitive Intelligence, Beijing Municipal Education Commission, Beijing, China
Feng Xu & Qionghai Dai
Department of Automation and BNRist, Tsinghua University, Beijing, China
Qionghai Dai
Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, China
Haotian Lin
Center for Precision Medicine and Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
Haotian Lin

Contributions

H.L., Q.D. and F.X. are the corresponding authors. H.L., Q.D. and F.X. contributed to the conceptualization; Y.Y., J.L., R.W. and Q.W. contributed to the methodology; Y.Y. and R.W. conceived and designed the clinical validation and the empirical investigation; X.X., W.C., S.B., J.M., K.M., X.L., H.Y., D.H. and X.L. collected data; J.L. and Q.W. developed the code and algorithm; Y.X., Y.L., D.Z., Z.D., X.W., T.C., L.L., W.C.I., X.Y., J.C., Z.W., Z.M., H.Y. and W.X. participated in the clinical validation; Y.Y., J.L. and L.Z. performed the analysis; Y.Y., J.L. and R.W. contributed to the original draft preparation; Y.Y., J.L., R.W., F.X., H.L., L.Z., C.C., J.Z., Y.H., L.W., I.F.L., W.Z., R.W. and P.Y.W.M. contributed to the review and editing. All authors discussed the results, commented on the manuscript and approved the final manuscript for publication.

Corresponding authors

Correspondence to Feng Xu, Qionghai Dai or Haotian Lin.

Ethics declarations

Competing interests

Zhongshan Ophthalmic Center and Tsinghua University have filed for patent protection for H.L., F.X., Y.Y., R.W. and J.L. for work related to a patient privacy protection method. All other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Charlotte Tschider and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Michael Basson, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Table 1 Characteristics of participants in the clinical trial

Extended Data Table 2 Primary outcomes of the diagnostic comparison in the clinical trial

Extended Data Table 3 Secondary outcomes of the diagnostic comparison in the clinical trial

Extended Data Table 4 Prospective evaluation of the hypotheses included in questionnaire in the empirical investigation

Extended Data Table 5 Performance of the face recognition systems in AI-powered reidentification validation

Extended Data Table 6 Characteristics of the training datasets

Supplementary information

Supplementary Information

Supplementary Note and Supplementary Table 1.

Reporting Summary

Supplementary Video

A brief introduction to the DM.

Supplementary Data

(1) The ophthalmologist’s diagnosis from the original videos, the ophthalmologist’s diagnosis from the DM-reconstructed videos and the normalized pixel error of the DM-reconstructed videos in relevant diagnosis comparisons of digital mask. (2) The questionnaire details of each patient in the empirical investigation of the DM.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yang, Y., Lyu, J., Wang, R. et al. A digital mask to safeguard patient privacy. Nat Med 28, 1883–1892 (2022). https://doi.org/10.1038/s41591-022-01966-1

Received: 17 January 2022
Accepted: 25 July 2022
Published: 15 September 2022
Issue Date: September 2022
DOI: https://doi.org/10.1038/s41591-022-01966-1

This article is cited by

Coordinate-wise monotonic transformations enable privacy-preserving age estimation with 3D face point cloud
- Xinyu Yang
- Runhan Li
- Jing-Dong J. Han
Science China Life Sciences (2024)
Reply to: Concerns about using a digital mask to safeguard patient privacy
- Yahan Yang
- Junfeng Lyu
- Haotian Lin
Nature Medicine (2023)
Concerns about using a digital mask to safeguard patient privacy
- Matthieu Meeus
- Shubham Jain
- Yves-Alexandre de Montjoye
Nature Medicine (2023)
Early detection of visual impairment in young children using a smartphone-based deep learning system
- Wenben Chen
- Ruiyang Li
- Haotian Lin
Nature Medicine (2023)
