## Introduction

Breast reconstruction after mastectomy is increasingly performed in breast cancer patients1. It provides physical and psychological relief to patients who have undergone breast cancer surgery2. As life expectancy has lengthened with early detection and advances in the treatment of breast cancer3,4, the cosmetic satisfaction achieved by breast reconstruction after mastectomy has become important for patients. In particular, most breast cancer patients require radiation therapy (RT) after surgery, and RT is known to cause capsular contracture or deformity of the reconstructed breast. Cosmetic evaluation is therefore especially necessary in breast reconstruction patients who have undergone RT. Since satisfactory cosmetic outcomes after breast cancer surgery can lead to a better quality of life (QOL), physicians are concerned with improving cosmetic results as well as clinical outcomes.

A few established criteria exist for evaluating cosmetic outcomes. The BREAST-Q questionnaire is a validated tool for measuring health-related QOL and satisfaction in patients with breast reconstruction5,6,7, although the measurement is based on subjective evaluation. Medical photographs taken according to standardized guidelines can be used for evaluation8, but photographs are not an objective indicator because of potential judgement bias by clinicians. To gain objectivity in cosmetic evaluation, several methods have been suggested, including breast retraction assessment (BRA) and the Breast Cancer Conservative Treatment cosmetic results (BCCT.core) software9,10. BRA measures the distances from the sternal notch to the nipples and from the nipples to the breast outline, and does not reflect skin alteration or scar problems11. The BCCT.core program automatically evaluates medical photographs of the patient and has been validated in several studies10,12,13. It grades cosmesis into four categories: excellent, good, fair, and poor12. However, photographing the naked upper body may be uncomfortable for patients. Further, the BCCT.core software is limited by its lack of 3-dimensional (3D) volume information14.

Recently, deep learning methods have been applied in medicine for anomaly detection based on training with normal images15. A generative adversarial network (GAN) is a type of neural network model in which two networks are trained simultaneously16. A GAN-based anomaly model trained on normal images can capture abnormal features in new images17, and several studies have validated its feasibility18,19. Compared with the BCCT.core software, a GAN-based approach that detects anomalies from computed tomography (CT) images does not cause additional discomfort for patients. In addition, the anomaly score (AS) is a continuous numerical measurement, defined as the sum of the image loss and the feature loss between normal images and reconstructed images, which could allow patients or clinicians to evaluate cosmetic outcomes.

Thus, the purpose of the current study was to develop a GAN-based model that generates AS for assessing cosmetic results in mastectomy patients who underwent immediate reconstruction, and to investigate its implications with regard to clinical factors.

## Results

### Patient characteristics

We retrospectively reviewed 61 breast cancer patients who underwent mastectomy followed by immediate reconstruction and adjuvant RT. Among the 61 patients, 39 (64.0%) underwent total mastectomy (TM)/radical mastectomy, 16 (26.2%) underwent nipple-sparing mastectomy (NSM), and 6 (9.8%) underwent skin-sparing mastectomy (SSM). As for the type of reconstruction, 47 patients had autologous tissue and 14 had a tissue expander. Overall, 53 patients (86.9%) received conventional fractionated RT and 8 (13.1%) received hypofractionated RT. The majority of patients received no neoadjuvant chemotherapy (N = 43, 70.5%), whereas a substantial proportion received adjuvant chemotherapy (N = 39, 63.9%). Radiotherapy was delivered using a 3-dimensional (3D) conformal technique (N = 48, 78.7%) or an intensity-modulated radiation therapy (IMRT) technique (N = 13, 21.3%). Patients with body mass index (BMI) ≤ 23 and those with BMI > 23 were well balanced. The time interval between Pre-RT CT and Post-1Y CT was 417 days [interquartile range (IQR), 343–512 days], and the interval between Pre-RT CT and Post-2Y CT was 803 days (IQR, 741–951 days). AS at Pre-RT, Post-1Y and Post-2Y was 1.99 (range, −0.65 to 19.40), 2.92 (range, −0.16 to 11.50) and 2.94 (range, −0.36 to 14.35), respectively. Detailed characteristics of the study population are summarized in Table 1.

### Distribution of anomaly score

The f-AnoGAN model was trained with 3D-reconstructed images from RT planning CT images of patients receiving breast-conserving therapy. Then, 3D-reconstructed images from CT scans of patients who received mastectomy and immediate reconstruction were collected in a time-series manner at the Pre-RT, Post-1Y and Post-2Y time points. The trained f-AnoGAN model generated AS. A graphical representation of this process is depicted in Fig. 1A. Afterward, we compared the distribution of AS among time points (Fig. 1B). AS at Pre-RT differed significantly from AS at Post-1Y (P = 0.015) and Post-2Y (P = 0.011). There was no significant difference in AS between Post-1Y and Post-2Y (P = 0.980).

At the patient level, we classified four patterns of change in AS. A decreasing trend of AS was shown in patients who received NSM with immediate transverse rectus abdominis muscle (TRAM) flap reconstruction (Fig. 2A), implying a better cosmetic outcome. An increasing trend of AS was found in patients who received TM and immediate reconstruction with tissue expander insertion (TEI) (Fig. 2B). We also found increasing-decreasing (Fig. 2C) and decreasing-increasing (Fig. 2D) trends of AS in patients with SSM with TRAM and NSM with TEI, respectively. Overall, NSM/SSM and TRAM seemed to be associated with lower AS than TM and TEI.

With a median follow-up of 28.3 months, we found 14 major complication events (16.4%) in the study population. Patients who experienced major complications showed no significant AS changes among the three time points (Fig. 3A); however, they showed significantly higher AS than those without major complications at Pre-RT (P = 0.016, Fig. 3B). We observed a differential time effect in patients with autologous reconstruction after mastectomy: compared with Pre-RT AS, Post-1Y and Post-2Y AS were significantly elevated in autologous-reconstructed patients (Fig. 3C). In contrast, there was no specific trend of AS in patients with TEI. TEI showed significantly higher AS than autologous reconstruction at Pre-RT (P = 0.008) and Post-2Y (P = 0.010) (Fig. 3D), but there was no difference in AS between TEI and autologous reconstruction at Post-1Y.

### Longitudinal analysis

We used a multivariable linear mixed-effect (LME) model to analyze the time effect on AS while accounting for potential confounding factors. The results are listed in Table 2. The multivariable LME analysis revealed that days after RT (β = 0.004, P = 0.007) and TEI (β = 2.223, P = 0.015) were significantly associated with AS. Meanwhile, the LME model showed no statistically significant interaction of reconstruction type (P = 0.563) or RT fractionation (P = 0.389) with time. Age had a marginally positive correlation with AS (β = 0.073, P = 0.078), but other variables, including mastectomy type (P = 0.627), RT to the supraclavicular node (SCN) (P = 0.154), RT to the internal mammary nodes (IMN) (P = 0.840), boost RT (P = 0.295), RT plan (P = 0.865), major complication (P = 0.930), and BMI (P = 0.364), had no significant impact on the change of AS.

Based on the established LME model, we predicted the change of AS according to RT fractionation and reconstruction type in a time-dependent manner. As shown in Fig. 4A, there was no significant difference in the change of AS between patients who received hypofractionated RT and those treated with conventional fractionated RT across all time points (P = 0.389). Meanwhile, we observed a significant difference in AS between TEI and autologous reconstruction at all time points (Fig. 4B). The gap widened over time, suggesting consistently better cosmesis with autologous reconstruction than with TEI after RT: contrast = 2.2 (P = 0.015), 2.3 (P = 0.008), 2.4 (P = 0.005), 2.5 (P = 0.004), and 2.6 (P = 0.004) at 0, 180, 360, 540, and 720 days after RT, respectively.
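As a hedged sketch of how such time-specific contrasts can arise, assume a random-intercept model with a linear effect of days after RT and a day-by-reconstruction interaction. The exact specification is not given in the text, and the symbols below are illustrative notation rather than the authors' own:

$$AS_{ij} = \beta_0 + \beta_{day}\, t_{ij} + \beta_{TEI}\, T_i + \beta_{int}\, t_{ij} T_i + \gamma^{\top} c_i + u_i + \varepsilon_{ij}$$

Here $$t_{ij}$$ is days after RT for patient i at visit j, $$T_i$$ equals 1 for TEI and 0 for autologous reconstruction, $$c_i$$ collects the remaining covariates (including the RT fractionation terms), $$u_i$$ is the patient-level random intercept, and $$\varepsilon_{ij}$$ is the residual. Under this assumed form, the TEI versus autologous contrast at day t is

$$\Delta(t) = \beta_{TEI} + \beta_{int}\, t$$

At t = 0 this reduces to $$\beta_{TEI}$$, which agrees with the reported β = 2.223 and the contrast of 2.2 at day 0. The reported rise from 2.2 to 2.6 over 720 days would correspond to a small positive $$\beta_{int}$$, which is compatible with the interaction term not reaching significance (P = 0.563).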

## Discussion

The aesthetic result of breast cancer treatment is important for patients’ quality of life. This has led to the advancement of techniques for breast-conserving surgery and oncoplastic breast surgery. Further, reconstruction after mastectomy in breast cancer patients is increasing. According to the registry of the Korean Breast Cancer Society, the number of patients who underwent breast reconstruction surgery tripled between 2002 and 20131. Cosmetic evaluation of the breast can be largely divided into subjective and objective assessments. As for subjective assessment, the BREAST-Q questionnaire is designed to measure the patient’s satisfaction with the breast and psychosocial/physical well-being6,7. A prospective multicenter cohort study by Jagsi et al. utilized the BREAST-Q questionnaire to evaluate patient-reported satisfaction in patients who received postmastectomy RT (PMRT) and reconstruction surgery. There is also the modified Garbay scale, which assesses aesthetic results by rating 5 subscales: volume, shape, placement of breast, inframammary fold, and scars. However, its inter-rater agreement is low, even when performed by experts20. As an objective indicator for physicians, the BCCT.core software, which uses 2D medical photographs, has been widely used because of its reproducibility and reliability in aesthetic assessment10,11,12,13. As mentioned earlier, taking medical photographs with the upper body unclothed may cause psychological discomfort to patients. Moreover, the software uses only a frontal photograph and therefore lacks volume information. Meanwhile, anomaly detection using deep learning has recently evolved in oncology, particularly for screening and detection of cancer. Several studies have used GAN-based anomaly detection to diagnose anomalous lesions in breast ultrasound images16 and digital breast tomosynthesis17. Myung et al.21 recently published machine learning approaches for predicting complications in breast cancer patients with reconstruction, though they did not provide cosmetic evaluation. To our knowledge, no study has evaluated cosmetic outcomes using a GAN-based approach or their association with major complications after breast reconstruction and PMRT. Our study provides novel information on GAN-based cosmetic evaluation for patients with breast reconstruction.

The present study analyzed 61 breast cancer patients who underwent immediate reconstruction with PMRT using the f-AnoGAN algorithm. We generated AS, which can detect abnormalities of the reconstructed breast and quantify them as a continuous numerical measurement. Moreover, we used CT images acquired during regular follow-up imaging, which means no additional discomfort for patients. Pre-RT AS was significantly higher in patients with major complications, and reconstruction with autologous tissue showed lower AS than TEI. We also observed that days after RT and TEI were significant factors associated with AS, whereas RT fractionation showed no interaction effect on AS. The LME model also revealed that autologous reconstruction had better cosmetic outcomes after RT at all time points, while hypofractionation and conventional fractionation showed no significant difference in the change of AS. These results are consistent with existing research. Ho et al. reported that TEI had the disadvantage of more long-term complications such as rupture and capsular contracture22. In review articles by See2 and Yun23, immediate TEI showed a relatively higher risk of complications than autologous reconstruction. In particular, capsular contracture was the most commonly observed complication in patients with PMRT after TEI. Regarding RT fractionation, Kim et al. revealed that hypofractionation appeared to have breast-related complications comparable to those of conventional fractionation in patients with reconstruction24,25. Since complications themselves can cause poor cosmetic outcomes5, the results of these studies are comparable with our findings. Thus, AS generated by the f-AnoGAN deep learning model could be feasible for evaluating cosmetic outcomes.

For the analysis of longitudinal data, we established an LME model to incorporate the time variable. Since LME models are well suited to longitudinal data, we tested whether days after RT affected cosmetic outcomes after accounting for other covariates. Days after RT were significantly associated with increased AS, whereas age was not. This suggests a possible chronic impact of RT on cosmetic outcome. Previous studies revealed that the timing of breast reconstruction, the reconstruction type, RT technique (3D conformal vs. intensity-modulated RT), and RT fractionation could affect aesthetic satisfaction after breast reconstruction2,5,22,26,27. Several studies have also reported that higher BMI might increase complications after breast reconstruction28,29. However, neither the interaction of reconstruction type or RT fractionation with days after RT nor BMI was related to AS in the current study. Instead, patients with a tissue expander consistently demonstrated higher AS than those with autologous reconstruction across time after RT, and the difference between them widened with the passage of time. This finding is consistent with the results of a systematic review and meta-analysis showing that autologous reconstruction yields better satisfaction with the breast and better overall outcomes30.

There are several limitations. The current study is based on a small dataset retrospectively collected from one institution, which entails an inherent selection bias. For example, among implant-based reconstructions, only patients reconstructed with TEI were evaluated, not those with a permanent implant. Although mastectomy reconstruction with a permanent implant has recently been performed in our institution, those cases were excluded from the study population because their follow-up period was shorter than 2 years. Long-term follow-up of more than 5 years may be needed to consolidate the feasibility of AS. AS in patients who completed TEI followed by a permanent implant was not evaluated in the current study. Nevertheless, we used a relatively large number of normal images to train the f-AnoGAN model, which could differentiate anomalous from normal images well. Since AS is generated as a continuous value by the f-AnoGAN model, this scoring system might complement the categorical results from the BCCT.core software. The BCCT.core program could not be used for benchmarking since it is not currently available on the official website. Also, because cosmetic evaluation using f-AnoGAN is a completely novel method, it was difficult to find other deep learning models for comparison. Therefore, we plan to verify the clinical usefulness of AS generated by f-AnoGAN through a large multicenter study.

Taken together, our findings might be helpful for physicians to evaluate cosmetic outcomes using regular follow-up CT images in patients who received mastectomy and immediate reconstruction. Importantly, AS should be validated in prospective study settings.

## Methods

This study was reviewed and approved by the Seoul National University Bundang Hospital institutional review board (Approval number: B-2102–667–111). All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. Because of the retrospective design of the study, the requirement for informed consent from participants was waived.

### Subjects and data preprocessing

To train the GAN model with normal breast image data, a total of 251 breast cancer patients who underwent breast-conserving surgery followed by post-operative RT were collected. As an evaluation dataset, we retrospectively reviewed 61 breast cancer patients who underwent mastectomy followed by immediate breast reconstruction with autologous tissue or a tissue expander. For each patient, CT images were collected at three time points: CT simulation before RT (Pre-RT), 1 year after RT completion (Post-1Y), and 2 years after RT completion (Post-2Y). To preprocess the data for the fast anomaly generative adversarial network (f-AnoGAN), we reconstructed each patient’s CT Digital Imaging and Communications in Medicine (DICOM) images into a 3-dimensional (3D) volume. An isosurface of the 3D volume was generated with the breast or chest wall facing forward. The color of the background and of the isosurface was set to white. Then, the image was resized to 500 × 500 pixels. DICOM import, 3D reconstruction, isosurface acquisition, and image preprocessing were performed using MATLAB 2021a (The MathWorks Inc, United States).

### Fast anomaly generative adversarial network (f-AnoGAN) and calculation of anomaly score

We used the published f-AnoGAN algorithm18 to develop the GAN-based model. f-AnoGAN was designed to make GAN-based anomaly detection fast. The network is composed of a generative model, an encoder that maps new data to the latent space, and a discriminator that detects anomalies. f-AnoGAN replaces the iterative mapping process with a learned mapping from image to latent space, which substantially improves processing speed compared with other AnoGAN algorithms.

We followed the methodology of the original study18, which is summarized here as follows:

We trained the generator network G and the discriminator network D to train the Wasserstein GAN, establishing a latent representation of normal breast images.

$$\min_G \max_D \; E_{X \sim P_r}\left[ \log D\left( X \right) \right] + E_{X^\prime \sim P_g}\left[ \log \left( 1 - D\left( X^\prime \right) \right) \right] \tag{1}$$

where $$X^\prime$$ is the generated instance G(z) and z is the learned latent feature. G(z) generates an image $$X^\prime$$ from z: $$z \to X^{\prime}$$. However, the latent representation of a given image is unknown. Thus, an encoder network (E) is required to map images to the latent space, E: $$X \to z$$. To find the best z corresponding to a given image X, we trained an encoder based on the $$izi_f$$ architecture suggested in the original study. The loss function of $$izi_f$$ encoder training is defined as follows:

$$Loss_{izi} = Loss_{images} + k \ast Loss_{features} \tag{2}$$

$$Loss_{images} = \frac{1}{n}\left\| X - X^\prime \right\|^2 \tag{3}$$

$$Loss_{features} = \frac{1}{m}\left\| f\left( X \right) - f\left( X^\prime \right) \right\|^2, \tag{4}$$

where k is the weighting factor and n is the number of pixels. $$Loss_{images}$$ is the mean squared error (MSE) loss between the real image X and the reconstructed image $$X^{\prime}$$, and $$Loss_{features}$$ is the discriminator feature-space loss based on the activation f of an intermediate layer of D, with m the dimensionality of the intermediate feature. Finally, AS was calculated as the weighted sum of the discriminator feature residual error and the image reconstruction error, as follows:

Given an image X,

$$\mathrm{Anomaly\;Score} = \left( 1 - \lambda \right) \ast R\left( X \right) + \lambda \ast D\left( X \right) \tag{5}$$

where λ is a weight coefficient, R(X) is the reconstruction loss between X and its corresponding image reconstructed from the latent space, and D(X) is the feature dissimilarity from the discriminator. In the current study, λ was set to 0.95.
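The score defined in Eqs. (2) to (5) can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the study: it assumes images and discriminator features have already been flattened into lists of floats, it takes R(X) to be the image MSE of Eq. (3) and D(X) to be the feature MSE of Eq. (4), and the function names and the default `k = 1.0` are hypothetical.

```python
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean of squared element-wise differences, i.e. (1/n) * ||a - b||^2."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def izi_f_loss(image, recon, feat_image, feat_recon, k=1.0):
    """Eq. (2): image residual plus k times the discriminator-feature residual.

    k = 1.0 is a placeholder default, not a value reported in the text.
    """
    return mean_squared_error(image, recon) + k * mean_squared_error(feat_image, feat_recon)


def anomaly_score(image, recon, feat_image, feat_recon, lam=0.95):
    """Eq. (5): (1 - lam) * R(X) + lam * D(X), with lam = 0.95 as in the text."""
    r = mean_squared_error(image, recon)            # R(X): image reconstruction error (assumed Eq. 3)
    d = mean_squared_error(feat_image, feat_recon)  # D(X): feature residual error (assumed Eq. 4)
    return (1.0 - lam) * r + lam * d
```

With λ = 0.95, the score is dominated by the discriminator feature residual, and the pixel-level reconstruction error contributes only 5% of the weight.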

We adopted a publicly available implementation from the repository (https://github.com/A03ki/f-AnoGAN). The hyperparameters for training were as follows: optimizer = Adam, learning rate = 0.001, batch size = 32, number of epochs = 7000, dimension of latent space = 128, input image size = 256 × 256, β1 = 0.5, β2 = 0.999. Training and testing were performed with a GeForce GTX 1080Ti graphics processing unit (NVIDIA, Santa Clara, CA, USA). A higher AS indicates a worse cosmetic outcome.

The process of developing the f-AnoGAN model and generating AS is visualized in Fig. 1. First, CT at the time of RT planning was acquired and reconstructed into a 3D volume, and its frontal surface was captured as a 2D image. These 2D images were then used to train the f-AnoGAN model to differentiate other images and score the degree of anomaly. After training, we applied the model to patients with reconstruction who received adjuvant RT. We acquired CT images from those patients in a time-series manner: at Pre-RT, Post-1Y, and Post-2Y. AS was generated as a standardized z-score based on the model trained on normal breast image data.
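The standardization step could look like the sketch below. The text does not state which reference distribution was used, so this example assumes that raw scores are centered and scaled by the mean and sample standard deviation of raw scores from the normal training images. The function name `standardize_scores` and both argument names are hypothetical.

```python
import statistics
from typing import List, Sequence


def standardize_scores(raw_scores: Sequence[float],
                       reference_scores: Sequence[float]) -> List[float]:
    """Convert raw anomaly scores to z-scores relative to a reference set.

    Assumption: the reference set is the raw scores of the normal
    (breast-conserving) training images; the source does not specify this.
    """
    mu = statistics.mean(reference_scores)
    sigma = statistics.stdev(reference_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference scores have zero variance")
    return [(s - mu) / sigma for s in raw_scores]
```

Under this reading, a negative AS means the image scored below the average of the reference images, which would be compatible with the negative range limits reported for AS.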

### Statistical analysis

To compare AS among groups (Pre-RT, Post-1Y, and Post-2Y), we performed paired t-tests. Complication events included hematoma, wound infection, wound dehiscence, reconstructive flap necrosis, flap contracture, fat necrosis, capsular contracture, implant leakage/rupture/deflation, breast pain, and breast lymphedema. Among these, a major complication was defined as any event requiring reoperation or rehospitalization. This definition is consistent with previous papers24,25. We compared AS among the three time points (Pre-RT, Post-1Y, Post-2Y), stratified by major complication and type of reconstruction. For longitudinal analysis, a linear mixed-effect (LME) model was employed to analyze the time effect on the change of AS. AS was the response variable, and patient identifiers were the random effect in the model. Model covariates included type of reconstruction, days after baseline, RT fractionation scheme, type of mastectomy, irradiation to the supraclavicular node (SCN), irradiation to the internal mammary nodes (IMN), boost RT, RT modality, the presence of major complication, age, and body mass index (BMI, kg/m²). These variables were selected as potential factors related to cosmetic results, based on previous studies22,24,25,31. Using the LME model with those variables, we investigated the longitudinal interactions of type of reconstruction and RT fractionation scheme with respect to AS. All statistical tests were performed, and residual plots were generated, using STATA/MP version 15.0 (StataCorp, College Station, TX). Bar graphs were generated with PRISM version 9.1.1.
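For readers who want to see the pairwise comparison concretely, the paired t statistic can be computed with the Python standard library as sketched below. The study itself used STATA, so this is only an illustration: the function name is hypothetical, it returns the statistic and degrees of freedom without a P value, and it assumes both lists hold scores for the same patients in the same order, with no missing time points.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(before: Sequence[float],
                       after: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a paired t-test on after - before."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]  # per-patient change
    n = len(diffs)
    sd = statistics.stdev(diffs)                    # sample SD of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

A positive t here means scores were higher at the later time point on average. Patients lacking one of the two scans would have to be dropped from that particular comparison before calling the function.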

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.