Association between the nucleosome footprint of plasma DNA and neoadjuvant chemotherapy response for breast cancer

Gene expression signatures have been used to predict the outcome of chemotherapy for breast cancer. The nucleosome footprint of cell-free DNA (cfDNA) carries gene expression information of the original tissues and thus may be used to predict the response to chemotherapy. Here we carried out the nucleosome positioning on cfDNA from 85 breast cancer patients and 85 healthy individuals and two cancer cell lines T-47D and MDA-MB-231 using low-coverage whole-genome sequencing (LCWGS) method. The patients showed distinct nucleosome footprints at Transcription Start Sites (TSSs) compared with normal donors. In order to identify the footprints of cfDNA corresponding with the responses to neoadjuvant chemotherapy in patients, we mapped on nucleosome positions on cfDNA of patients with different responses: responders (pretreatment, n = 28; post-1 cycle, post-3/4 cycles, and post-8 cycles of treatment, n = 12) and nonresponders (pretreatment, n = 10; post-1 cycle, post-3/4 cycles, and post-8 cycles of treatment, n = 10). The coverage depth near TSSs in plasma cfDNA differed significantly between responders and nonresponders at pretreatment, and also after neoadjuvant chemotherapy treatment cycles. We identified 232 TSSs with differential footprints at pretreatment and 321 after treatment and found enrichment in Gene Ontology terms such as cell growth inhibition, tumor suppressor, necrotic cell death, acute inflammatory response, T cell receptor signaling pathway, and positive regulation of vascular endothelial growth factor production. These results suggest that cfDNA nucleosome footprints may be used to predict the efficacy of neoadjuvant chemotherapy for breast cancer patients and thus may provide help in decision making for individual patients.


INTRODUCTION
Noninvasive tests offer a number of compelling advantages, and liquid biopsies have been developed as a valuable tool over the past decade, in particular for chromosomal aneuploidy screening and companion diagnostic testing. Blood is generally the easiest specimen type to work with. In peripheral blood, testing may target circulating tumor cells; circulating cell-free DNA (cfDNA), which in cancer patients contains circulating tumor DNA (ctDNA); circulating cell-free RNA (cfRNA); or circulating extracellular vesicles (EVs), such as exosomes, tumor-educated platelets, proteins, and metabolites 1,2 . The concentration of cfDNA is relatively high and stable in blood, and cfDNA has therefore become a widely used analyte in liquid biopsy. cfDNA is derived mainly from apoptotic and necrotic cells of primary tumors, circulating tumor cells, and normal cells 3,4 and is usually bound to mononucleosomes rather than present as free DNA 1,5 . ctDNA makes up only a small proportion of the total plasma cfDNA 6 , requiring a large volume of plasma and sensitive detection methods, and cannot be used to detect cancer when there is a low ctDNA:cfDNA concentration ratio or no mutation 7 .
In eukaryotes, nucleosomes are repeating units of chromatin that are thought to strongly affect gene expression 8,9 . A nucleosome-free region (NFR) or a nucleosome-depleted region (NDR) is usually present in the transcriptionally active core region of the gene promoter 9 . Nucleosome positioning relative to transcription start sites (TSSs) is directly correlated with RNA polymerase II (Pol II) binding, and genome-wide maps exhibit differential nucleosome positioning in active and silent genes 10 . Nucleosomes consist of 145-147 bp DNA segments wrapped around a histone octamer composed of two molecules each of the four core histone proteins (H2A, H2B, H3, and H4), cemented to the nucleosome surface by an additional~20 bp DNA (linker DNA) 11 . cfDNA fragments are around 166 bp in length 12 , which corresponds to the nucleosome DNA plus linker DNA.
In previous studies 13,14 , deep genome-wide sequencing of circulating cell-free DNA enabled identification of maps of nucleosome occupancy that provide a direct footprint of transcription factor occupancy. In addition, nucleosome footprint patterns in cell-free DNA are often specific to a type of cancer 13 . The presence or absence of nucleosomes in the TSS region of cfDNA results from expressed or silent genes in origin tissue and thus can be used to predict gene expression. Peter et al. determined that several TSSs matched with the expressed isoforms of genes from metastatic primary tumors 14 , and this result has been confirmed in individuals of different ages 15 .
Conventional gene expression profiling may be used to predict prognosis and guide treatment in the early stages of breast cancer. Five multigene expression testing techniques were included in the guidelines for breast cancer published by the National Comprehensive Cancer Network 16 and the American Joint Committee on Cancer 17 in 2018: the Oncotype DX 21-gene assay, the Mamma Print 70-gene assay, the Endo-Predict 12-gene assay, and the PAM 50 (Prosigna) and Breast Cancer Index tests. However, these tests are usually performed using tissue biopsies, which require invasive surgery and cannot capture the entire genomic landscape of breast tumors.
In the present study, to explore relationships among the cfDNA nucleosome profile, intracellular nucleosome positioning, and gene expression, we used breast cancer cell line supernatant to mimic plasma cfDNA and sequenced it using next-generation sequencing technology. Simultaneously the cell particles were subjected to MNase sequencing and mRNA sequencing. We analyzed correlations by comparing the nucleosome footprint profiles 1 kb upstream and 1 kb downstream of TSSs, in particular at the exact positions of TSSs. Furthermore, plasma cfDNA footprint profiles were characterized by low-coverage sequencing, and differences in profiles in TSS-adjacent regions were analyzed between healthy individuals and patients and between responders and nonresponders to neoadjuvant epirubicincyclophosphamide-docetaxel chemotherapy. Finally, we used plasma collected before and after treatment to assess correlations between cfDNA footprint profiles and response to breast cancer treatment and to examine changes in the footprint associated with treatment.

RESULTS
A study flowchart that includes analytical methods is shown in Fig. 1.
The cfDNA nucleosome footprint reveals intracellular nucleosome positioning and gene expression To determine whether the cfDNA profile reflected intracellular nucleosome positioning and predicted gene expression, we performed cfDNA whole-genome sequencing of cell supernatant as well as MNase sequencing and mRNA sequencing of the MDA-MB-231 and T-47D cell lines, respectively. We analyzed the cfDNA sequencing library using a 2100 Bioanalyzer, and the lengths of the inserted DNA fragments from cell supernatant and from the cell genome digested by MNase were~166 bp and~146 bp, respectively (ligation to~90 bp adapter DNA; Supplementary Fig.  S1), which is consistent with previous reports 14 . Next we analyzed chromosome 12p11.1, a 76 kb region containing more than 400 nucleosomes with strong positioning properties by cfDNA-seq and MNase-seq for cell line supernatant and plasma cfDNA from 50 breast cancer patients. The cfDNA read depth map showed a crest pattern whose position was highly correlated with that found in the MNase map, in particular in plasma DNA (Fig. 2).
We also screened highly expressed genes (TPM > 10) and unexpressed genes (TPM = 0) in these two cell lines using mRNAseq, then analyzed the genes' sequence coverage depth around TSSs using MNase-seq and cfDNA-seq. The results of MNase-seq showed that the sequence coverage depth around TSSs was significantly lower for highly expressed genes than for unexpressed genes (Fig. 3a, b). Analyses of the matching sequence coverage depth around TSSs of the cfDNA showed the same phenomenon, with a significant decrease in coverage depth at the TSS site (Fig. 3c, d).
Permutation tests to estimate the overlaps between cell-free HTSSs (high coverage depth around TSSs) or LTSSs (low coverage depth around TSSs) and intracellular HTSSs or LTSSs revealed significant enrichment for HTSSs pairs (p < 10 −22 ) and LTSSs pairs (p < 10 −22 ). This significant enrichment was not observed in overlaps between cell-free HTSSs and intracellular LTSSs or cell-free LTSSs and intracellular HTSSs (Fig. 3e, g). These findings suggest that cfDNA-seq of the cell supernatant can reveal intracellular nucleosome positioning. Similar permutation tests were performed to estimate the enrichment of overlaps between cell-free HTSSs or LTSSs and HEGs (highly expressed genes) or LEGs (lowly expressed genes). The results revealed a pattern opposite to that found between cell-free and intracellular TSSs. The LTSSs of cfDNA were highly consistent with the corresponding HEGs (p = 1.981 × 10 −05 ), and the HTSSs of cfDNA were highly consistent with the LEGs (p = 3.156 × 10 −10 ). However, this phenomenon was not observed in the HTSSs of cfDNA and the corresponding HEGs or in the LTSSs of cfDNA and the corresponding LEGs (Fig. 3f, h).
Breast cancer patients' cfDNA coverage is related to gene expression in breast cancer cells We sequenced the circulating cell-free DNA from plasma collected from 85 healthy individuals and 85 breast cancer patients and compared it to cfDNA collected from the supernatant of the breast cancer cell lines. Correlation analyses showed that the gene sequence coverage depth near cfDNA TSSs in cell lines was positively correlated with nucleosome positioning assessed by MNase-seq and negatively correlated with gene expression assessed by mRNA-seq (Fig. 4a). The cfDNA pattern from 85 breast cancer patients was the same as that of the two breast cancer cell lines (Fig. 3a-d). It is interesting that this pattern was more obvious in the cfDNA from breast cancer patients, with a lower high-expression gene sequence coverage depth near TSSs in the entire TSS ± 1 kb region, whereas coverage was lowest at the TSS site (NFR; Fig. 4b).
Then we performed permutation tests to analyze whether these breast cancer-specific TSSs identified from the plasma cfDNA were expected based on expressed genes from the TCGA breast cancer data. We observed enrichment for lower coverage depth near TSSs in breast cancer patients compared to healthy donors for highly expressed genes in primary tumor tissue compared to the adjacent breast tissue (TCGA; p = 0.0030) but not for HTSSs of highly expressed genes (p = 0.0039; Fig. 4c). However, this effect was not significant when we compared coverage depth for lowly expressed genes in primary tumor tissue (TCGA) near TSSs in breast cancer patients versus healthy donors. This may be because of the difference between cfDNA from healthy individuals and from tumor-adjacent breast tissue.
Different TSSs in pretreatment cfDNA between patients with breast cancer and healthy individuals We compared sequence coverage depth around cfDNA TSSs between patients with breast cancer and healthy donors. Technical reproducibility was evaluated using six samples, and the distance of each three technical replicates of the same sample was closer than those from different samples based on PCA analysis. (Supplementary Fig. 2). Among a total of 32444 tested  Table S1a). Hierarchical clustering analyses showed an obvious separation of patients with breast cancer from healthy donors (Fig. 5a).
To assess the ability of coverage at TSS regions to classify individuals into cancer and healthy, we constructed LASSO classifier and repeated fivefold cross validation for 100 times to prevent biases. And we recollected 60 patients' and 30 healthy donors' plasma in an independent validation test. High values of the area under curve (AUC, median: 0.863 in training cohort; and 0.834 in validation cohort) were observed using receiver operating characteristic (ROC, Fig. 5b, c). The significantly different genes were related mainly to regulation of cell adhesion, positive regulation of cell death, etc (Fig. 5d), and the related genes were listed in Supplementary Table S1b.
Simultaneously, we compared TSS profiles between different tumor stage, ER status and molecular subtypes. And the results showed that the early stage (T1 and T2, Supplementary Fig. 3a; I and II, Supplementary Fig. 3b) and late stage (T3 and T4, Supplementary Fig. 3a; III and IV, Supplementary Fig. S3b) groups, ER positive and negative groups ( Supplementary Fig. 4a), and different molecular subtypes ( Supplementary Fig. 4b, c) could also be clustered into different groups, in particular luminal A vs triple negative subtype ( Supplementary Fig. 4d). We also found some different related genes, such as cell adhesion related genes (ITGBL1, RAPGEF4, ADGRL3, CDH18, DCAF6, CUTA) in late-stage cancer group (Supplementary Table S2a

Different TSSs in pretreatment cfDNA between responders and nonresponders
We compared sequence coverage depth around TSSs of cfDNA between responders and nonresponders at pretreatment. A total of 232 TSS regions (p < 0.01 and |log[fold change]| ≥ log1.5) differed significantly in the 28 responders compared to the 10 nonresponders: 100 TSSs had high coverage and 132 TSSs had relatively low coverage in responders (Supplementary Table S6a). Hierarchical clustering analyses showed an obvious separation of responders from nonresponders ( Fig. 6a). Gene functional annotation analyses revealed the top 13 pathways (Fig. 7    We further compared patients with pCR and npCR, and with low RCB (residual cancer burden, RCB; RCB 0 and I) and high RCB (RCB II and III) at pretreatment, and hierarchical clustering analyses also showed an obvious separation between them (Fig. 6b, c

Differently altered TSSs after neoadjuvant chemotherapy between responders and nonresponders
We analyzed paired plasma specimens before (pretreatment), during (post-1 cycle, post-3/4 cycles), and after (post-8 cycles) neoadjuvant chemotherapy to compare sequence coverage depth changes around TSSs in 12 responders and 10 nonresponders. The TSS regions of 321 genes (p < 0.01 and |log[fold change]| ≥ log1.5) were significantly differentially covered in responders: 93 of these genes' TSS regions were downregulated and 112 were upregulated during early treatment (post-1 cycle), and 66 of these genes' TSS regions were downregulated and 50 were upregulated during midtreatment (post-3/4 cycles), with stable coverage after treatment (post-8 cycles). Conversely, these genes were not altered throughout the treatment period in nonresponders. Functional enrichment analyses revealed the top 20 pathways (Fig. 8a). Note that these significantly different genes were related mainly to positive regulation of the acute inflammatory response (GO:0002675), the TGF-beta signaling pathway (hsa04350), regulation of the T cell receptor signaling pathway (GO:0050856), the nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184), and positive regulation of vascular endothelial growth factor production (GO:0010575) and proteoglycans in cancer (Fig. 8b). BRCA1 (breast cancer 1 early onset), HIC1 (HIC ZBTB transcriptional repressor 1), and HMGB1 (high mobility group box 1) were involved in several of these pathways. Three pathways were related to cellular response to organonitrogen localization, response to estrogen, and lactation. Individual genes with significantly different coverage from these pathways are listed in Supplementary Table S8a. It is interesting that the 205 genes that were significantly altered from pretreatment to early treatment (post-1 cycle) in responders mainly comprised breast cancer 1 early onset [BRCA1], erb-b2 receptor tyrosine kinase 4 [ERBB4], GRB2-associated binding protein 1  . These genes were involved in positive regulation of the acute inflammatory response, positive regulation of vascular endothelial growth factor production, response to estrogen cellular response to growth factor stimulus, regulation of the T cell receptor signaling pathway, and positive regulation of response to DNA damage (Supplementary  Table S8b). However, 116 genes that were significantly altered from pretreatment to mid-treatment (post-3/4 cycles) in the responders were involved mainly in the PID TGFBR pathway, nonsensemediated decay, and the establishment of mitotic spindle orientation (Supplementary Table S8c).

DISCUSSION
Previous studies have focused mainly on the relationship between ctDNA and cancer occurrence and development, relapse, metastasis, and drug resistance. ctDNA may be used as a biomarker for cancer screening, early diagnosis, individualized treatment, and prognostic evaluation based on the detection of CNVs 18 , mutations 4 , or methylation patterns 19 . However, the clinical utility of the cfDNA nucleosome footprint has not yet been fully confirmed. We provide new insight into the nucleosome footprint of plasma circulating cfDNA. Our work directly maps the nucleosome footprint of cell-free DNA.
To confirm the relationship between the cfDNA nucleosome footprint and gene expression, we performed correlation analyses among the nucleosome footprint of DNA in the cell supernatant, intracellular nucleosome positioning, and gene expression. The results showed that the length of cell supernatant DNA was similar to the length of DNA bound to the mononucleosome. Correlation  analyses also confirmed the relationships among TSS coverage in cell supernatant DNA, intracellular nucleosome positioning, and gene expression (Fig. 3f, h). As expected, the TSS region coverage of cell supernatant DNA was positively correlated with intracellular nucleosome positioning and negatively correlated with gene expression. Nucleosome positioning relative to transcription start sites is directly correlated with RNA Pol II binding 10 , and transcriptionally active gene promoters are characterized by the presence of a NFR or NDR in their core region 9 . Therefore, we may infer that nucleosome footprint changes in vivo lead to gene expression or silencing. DNA protected by nucleosomes is released into the bloodstream as cfDNA, which can be sequenced directly. It is interesting that cfDNA from patients is more closely related to gene expression in breast cancer cell lines than cell supernatant DNA, possibly because of the high similarity between bovine serum DNA in culture medium and human serum DNA or because of extracellular release without complete digestion, which affects analyses of cell supernatant DNA. A correlation between cfDNA and gene expression has been reported in previous studies 14 , but the gene expression data sets used were from public databases. In the current study, the correlation between cfDNA and nucleosome positioning and the correlation between cfDNA and gene expression were demonstrated more clearly as a result of the use of cell lines.
We also confirmed the association between the cfDNA nucleosome positioning in breast cancer patients and the expressed breast cancer-specific genes using TCGA breast cancer data (Fig. 4c). Parallel analysis with RNAseq and MNase-seq of the matched primary tumor and blood samples may facilitate the discovery of correlations between cfDNA nucleosome positioning and relevant gene expression with the nucleosome occupancy of the genes. cfDNA contains DNA from both normal and tumor tissues in patients with breast cancer, and studies have found cfDNA derived from tissue-specific and tumor-specific open chromatin regions (NFR or NDR) 20,21 . Because the fractions of tumor-and non-tumor cfDNA vary among different patients 4 , a limitation of our study is that we failed to consider the two fractions of normal and tumor DNA. Normalization of the tumor fractions may increase the cancer prediction accuracy. Another limitation of our study is that we failed to profile the nucleosome positioning of immune cells, which are the major components of non-tumor-derived normal DNA in patients with cancer 22,23 . MNase-seq of different immune cell types, as well as single-cell RNA sequencing of peripheral blood mononuclear and tumor cells, will help to further elucidate the contributions of tissues and the origins of cfDNA to better understand the complexity and heterogeneous nature of cfDNAs in patients with breast cancer.
For breast cancer, neoadjuvant chemotherapy is equivalent to postoperative treatment for breast cancer and is used to reduce tumor size, decrease tumor stage, prolong patients' DFS and OS, and conserve breast tissue 24 . The decision to perform neoadjuvant chemotherapy for breast cancer is based on molecular typing; some patients do not benefit from this type of therapy. The response to neoadjuvant chemotherapy is usually assessed only after several treatment cycles, which leads to wasted resources and overtreatment of patients. Thus, another aim of the present study was to explore whether a characteristic plasma cfDNA profile can be used to predict the efficacy of neoadjuvant chemotherapy. We first analyzed pretreatment specimens and identified a number of biologic pathways related to treatment response, including regulation of hippo signaling, necrotic cell death, intrinsic apoptotic signaling pathway in response to DNA damage, and positive regulation of reactive oxygen species biosynthesis. Using Affymetrix GeneChip detection technology to analyze biopsy tissue, Larissa et al. 24 identified biologic pathways related to docetaxel and capecitabine treatment, including spindle regulation and microtubule depolymerization, DNA repair, and cellular proliferation. Using PCR, Gianni and colleagues 25 identified 86 genes that correlate with responsiveness to neoadjuvant doxorubicin and paclitaxel; these genes were from functional categories that influence sensitivity or resistance to chemotherapy (i.e., apoptosis, invasion, metastasis, drug resistance/metabolism, proliferation, ER). Ayers et al. 26 built a 74gene model classifier to predict pathologic response to neoadjuvant T/FAC therapy, achieving high positive predictive value and specificity.
Although the genes identified in the present study are different from those in previous reports, they belong to several of the same pathways (e.g., necrotic cell death and intrinsic apoptotic signaling in response to DNA damage). This may be explained by the fact that different genes play a role in different body parts. We also identified a pathway related to the regulation of erythrocyte differentiation, which may have been due to the presence of peripheral blood cell DNA; its relationship with tumorigenesis is unknown. These results support the feasibility of predicting the efficacy of neoadjuvant chemotherapy prior to treating breast cancer.
We examined changes in TSS region coverage in plasma collected from breast tumors at different time points during neoadjuvant chemotherapy as well as their association with response to treatment. Examining changes throughout treatment may provide more information regarding patient responsiveness than analyzing static time points. Analyzing changes throughout treatment may facilitate the development of improved predictors of response and drug resistance.
These altered genes were associated mainly with positive regulation of the acute inflammatory response, the TGF-beta signaling pathway, regulation of the T cell receptor signaling pathway, and positive regulation of vascular endothelial growth factor production (Fig. 6b). BRCA1, HIC1, and HMGB1 were involved in several of these pathways. BRCA1 is a tumor suppressor gene whose structural and functional abnormalities are closely related to the incidence of breast cancer. It plays an important role in the regulation of cell cycle progression, DNA damage, the repair of cell growth and apoptosis, transcriptional activation and inhibition, and other biological pathways 27 . HIC1, a tumor suppressor gene, is epigenetically silenced in a variety of tumors, and deleting HIC1 might contribute to premalignant transformation in the early stages of tumor formation 28 . The HMGB family is a group of chromosomal proteins involved in DNA replication, recombination, transcription, and repair that is related to the progression of a variety of cancers, including colorectal cancer 29 , hepatocellular carcinoma 30 , and gastric cancer 31,32 . In recent years, an increasing number of studies have focused on changes in gene expression in serial biopsy tissue specimens. Genetic changes associated with prediction and prognosis involve the immune response [33][34][35] , cell proliferation 24,33-36 , apoptosis 34,35 , DNA repair 24 , and the antiinflammatory response 26 . These findings are consistent with those of the current study.
We also noted significantly altered genes in early treatment (post-1 cycle) in responders. These genes were involved in positive regulation of the acute inflammatory response, positive regulation of vascular endothelial growth factor production, response to estrogen, cellular response to growth factor stimulus, regulation of the T cell receptor signaling pathway, and positive regulation of response to DNA damage. Genes that were significantly altered mid-treatment (post-3/4 cycles) were involved in the PID TGFBR pathway, nonsense-mediated decay, and the establishment of mitotic spindle orientation. These results indicate that different gene changes occurred during chemotherapy treatment. It is also possible that gene expression was delayed for a period after nucleosomes were depleted at the transcription initiation sites 37 .
cfDNA in plasma comes from apoptotic cells and includes ctDNA released by tumor cells as well as DNA from peripheral blood cells and other tissue. Therefore, gene changes based on cfDNA analyses reflect not only tumor tissue but also the reactions of the blood system and other tissue in the body, such as immune cells. Another limitation of this study is its relatively small sample size. Therefore, we consider our analyses to be exploratory: Larger studies are required to validate our findings and confirm specific associations between molecular data and clinical outcomes.
In summary, we confirmed a correlation between the cfDNA nucleosome footprint profile in the region around TSSs and gene expression. We also found significantly different nucleosome footprint profiles in the region near TSSs in plasma cfDNA from healthy individuals versus patients with breast cancer and in plasma cfDNA from responders versus nonresponders before, during, and after a series of neoadjuvant chemotherapy treatment cycles. These genes were related to pathways involved in the  inhibition of cell proliferation, response to DNA damage, and immune response. These findings are expected to increase the feasibility of plasma cfDNA nucleosome profiling as a new biomarker for predicting the efficacy of neoadjuvant chemotherapy for breast cancer.

Cell culture
The  Tables 1 and 2. cfDNA sequencing A total of 1 mL peripheral blood was collected in EDTA tubes from each patient and immediately centrifuged for 10 min at 16,000 rpm, 4°C, and 500 µL plasma and cell supernatant was stored at −80°C before use, which yielded at least 1 ng total cfDNA for sequencing. cfDNA extraction from plasma and cell supernatant was performed with the QIAamp DNA Blood Mini Kit (Qiagen). We prepared a starting amount of approximately 1-5 ng DNA (three biological replicates per input for six samples) for library construction using the Life Sciences Ion Xpress™ Plus Fragment Library Kit, and we omitted the fragmentation step because of the degradation of plasma DNA. The number of PCR cycles was set to 12. Libraries were analyzed on a Bioanalyzer instrument (Agilent Technologies, Singapore) to observe the DNA size distribution. Sequencing was performed with the Ion PI™ Hi-Q™ OT2 200 Kit and the Ion PI™ Hi-Q™ Sequencing 200 Kit. Ten libraries were pooled together and subjected to 520 flow on the Ion Proton platform (ThermoFisher Scientific, USA), and 6-10 million reads were generated for each cfDNA sample.

MNase sequencing
Approximately 10 7 cells were prepared for nucleosome digestion and DNA extraction with the Active Motif Inc Nucleosome Preparation Kit. A total of 50 µL obtained chromatin and 2.5 µL working stock enzyme was incubated for 15 min at 37°C. The digested nucleosome samples were immediately used for the next step of DNA extraction. Approximately 100 ng DNA was used for library construction and sequencing. Libraries were analyzed on a Bioanalyzer instrument (Agilent Technologies) to observe the DNA size distribution. The kits and parameters used were the same as for cfDNA sequencing. Approximately 100 million reads were generated per sample by MNase-seq.

Gene expression sequencing (mRNA sequencing)
We extracted RNA from approximately 10 7 cell particles using TRIzol Reagent (Invitrogen, USA). The amount and quality of the RNA were assessed with a NanoDrop™ 8000 UV Spectrophotometer (Thermo Scientific, USA). We used 1 µg total RNA for mRNA purification using a Dynabeads™ mRNA DIRECT™ Purification Kit (Invitrogen). We prepared the purified product for library construction using the Ion Total RNA-Seq Kit v2. We quantified the concentration of mRNA using the Qubit™ 3.0 Fluorometer (Invitrogen). Experimental operations followed the RNA enrichment and library generation protocols provided in the manual. Two libraries were pooled together and subjected to 520 flow on the Ion Proton platform (ThermoFisher Scientific), and 30-40 million reads per sample were generated.

Sequencing read alignment and processing
For cfDNA sequencing and MNase sequencing, we aligned sequencing reads with the human reference genome (hg19) using TMAP and removed PCR duplicates using the SAMtools (version 1.9) rmdup function 39 . For mRNA sequencing, we aligned sequencing reads with the GENCODE human transcriptome (Release 30) using Salmon (version 0.13.1) 40 and used transcripts per million (TPM) to quantify the expression of each gene.

TSS profiles of cDNA and MNase sequencing
Gene information was obtained from RefSeq. For cfDNA sequencing data, we calculated read counts of regions ranging from -1k bp to +1k bp around TSSs using bedtools (version 2.17.0), then normalized them using the reads per kilobase per million mapped reads (RPKM) method to present cfDNA-based nucleosome occupancy. In the MNase sequencing data, only the nucleosome-depleted region (NDR; from -150 bp to +50 bp of the TSS) showed depleted coverage, so we used coverage depth of the NDR to quantify the nucleosome occupancy of each TSS. The total depth of each NDR was calculated with SAMtools (version 1.9).
Based on the cfDNA RPKM value of each TSS, we performed Wilcoxon rank sum tests (two-sided) to identify TSSs with altered cfDNA coverage between groups. TSSs with p < 0.01 and |log(fold change)| ≥ log1.5 were considered significantly changed. Hierarchical clustering was performed with the average linkage clustering algorithm, and heatmaps were plotted with the pheatmap package (version 1.0.12).
Correlation analyses of cfDNA-seq, MNase-seq, and RNA-seq data To evaluate concordance across the three platforms, we performed Spearman's rank correlation analyses for gene expression, MNase-based TSS nucleosome occupancy, and cfDNA-based TSS nucleosome occupancy profiles. Moreover, for MNase-seq and RNA-seq data, sequence coverage depth around TSSs was compared between highly expressed genes (TPM > 10) and unexpressed genes (TPM = 0), and the depth of each genomic site was calculated with SAMtools (version 1.9).
Furthermore, we performed permutation tests to estimate the significance of overlaps between gene expression profiles and TSS profiles (cfDNA-seq and MNase-seq, respectively). We computed the frequency of highly expressed genes (TPM > 10) with high nucleosome occupancy (genes in the upper quartile), unexpressed genes (TPM = 0) with high nucleosome occupancy, highly expressed genes with low nucleosome occupancy (genes in the last quartile), and unexpressed genes with low nucleosome occupancy. A null distribution was generated from 1,000 permutations. The distributions were then standardized based on z scores and used to compute two-sided p values to determine the significance of overlaps.
Similarly, we estimated the concordance between differentially expressed genes and genes with altered cfDNA TSS coverage in breast cancer patients. We filtered the list of differentially expressed genes using GEPIA (http://gepia.cancer-pku.cn) 41 , and genes with different cfDNA TSS coverage (p < 0.01 and |log[fold change]| ≥ log1.5) between breast cancer patients and healthy donors were selected. Permutation tests were performed to determine the significance of overlaps.

Technical reproducibility assessment
Technical reproducibility of TSS coverage between replicates using Principal Component Analysis (PCA).

Procedure of classifiers construction
Genes with significant differential TSS coverages were used to develop promoter profiling-based classifiers, and fivefold cross validation was used to randomly divide samples into training and validation sets and evaluate the performance. In the training set, the normalized read count of each TSS was discretized according to the optimal cut-off point before classifier construction. The optimal cut-off point of each promoter was defined as the maximum value of (sensitivity + specificity)/2 in the training sets. R package glmnet (version 2.0-16) was used to perform the least absolute shrinkage and selection operator (LASSO). Receiver operating characteristic (ROC) analysis was used to calculate area under curve (AUC) of the validation set using pROC (version 1.16.2) R package (version 3.5.1). The whole process was repeated 100 times.

Functional annotation and enrichment
We performed functional annotation and enrichment analyses using metascape (http://metascape.org) 42 . Fig. 8 Differentially altered TSSs between responders and nonresponders after neoadjuvant chemotherapy. a Heatmaps summarizing changes in TSS region coverage over time in two major molecular functional groups in responders (PR) and nonresponders (SD). Samples are ordered from left to right by patient identification for each stage. Red and blue represent relatively high and low coverage, respectively. Stage 1 represents pretreatment, stage 2 represents post-1 cycle, stage 3 represents post-3/4 cycles, and stage 4 represents post-8 cycles. b The top 20 significantly altered pathways after neoadjuvant chemotherapy in responders. Red and blue represent relatively high and low coverage, respectively.