Latency-associated DNA methylation patterns among HIV-1 infected individuals with distinct disease progression courses or antiretroviral virologic response

DNA methylation is one of the epigenetic modifications that configures gene transcription programs. This study describes the DNA methylation profile of HIV-infected individuals with distinct characteristics related to natural and artificial viremia control. Sheared DNA from circulating mononuclear cells was subjected to target enrichment bisulfite sequencing designed to cover CpG-rich genomic regions. Gene expression was assessed through RNA-seq. Hypermethylation in virologic responders was highly distributed closer to Transcription Start Sites (p-value = 0.03). Hyper and hypomethylation levels within TSS adjacencies varied according to disease progression status (Kruskal–Wallis, p < 0.001), and specific differentially methylated regions associated genes were identified for each group. The lower the promoter methylation, the higher the gene expression in subjects undergoing virologic failure (R = − 0.82, p = 0.00068). Among the inversely correlated genes, those supporting glycolysis and its related pathways were hypomethylated and up-regulated in virologic failures. Disease progression heterogeneity was associated with distinct DNA methylation patterns in terms of rates and distribution. Methylation was associated with the expression of genes sustaining intracellular glucose metabolism in subjects undergoing antiretroviral virologic failure. Our findings highlight that DNA methylation is associated with latency, disease progression, and fundamental cellular processes.


Results
Methylation analysis was conducted for samples of 6 controls and 22 People Living with HIV (PLWH). Out of the total number of PLWH analyzed, seven were cART virologic responders, seven were cART virologic failures, six long-term nonprogressors (LTNP), and two elite controllers (EC). The Class I HLA alleles for the elite controllers and for four out of the six long term non-progressors are shown in Supplementary Table 1. Among the control group, three were males, and three were females (mean age of 31 years), whereas among HIV-1 infected individuals, 12 were males and 10 were females (mean age of 43 years).
The mean time since HIV diagnosis for all HIV infected individuals was 13 years (3-22 years). In the virologic responders' group, the mean duration of cART was six years, and the median CD4 + count was 555 cells/mm 3 (118-985 cells/mm 3 ). In the virological failure group, the mean duration of cART was eight years, the median CD4 + count was 240 cells/mm 3 (159-623 cells/mm 3 ), and the median viral load was 3.66 log 10 (1.8-4.57 log). In the group of LTNP, the median CD4 + T cell count was 693 cells/mm 3 (566-1,257 cells/mm 3 ), and the median viral load was 3.40 log 10 (2.3-4.04 log). EC exhibited a median T CD4 + count of 1401 cells/mm 3 (1,613 Table 1.

cells/ mm 3 ). A detailed characterization of the HIV infected individuals enrolled in the study is depicted in
There were no statistically significant differences in sex (p-value > 0.05) and time since HIV diagnosis (p-value > 0.05). The groups significantly differ in age (p-value = 0.015) and CD4 T cells counts (p-value = 0.007). However, according to the deconvolution analysis, modest effects on CD4 T cells fractions were observed (Suplementary Table 2, Supplementary Fig. 1). Each sample achieved an average of 65 million paired-end alignments with a unique best hit (see Supplementary Table 3), yielding billions of CpGs with sodium bisulfite conversion rate of 98.9% (see Supplementary Table 3) designed to cover CpG islands, shores (up to 2 kb flanking CpG islands), shelves (up to 2 kb flanking shores), promoters, RefGenes and cancer-specific sites. Read depth distribution after filtration is depicted in Supplementary Fig. 2. As expected for differentiated cells, CpG methylation levels followed a bimodal fashion (see Supplementary Fig. 3), which is set up during embryogenesis 20 . Disease progression heterogeneity was associated with different DNA methylation patterns. Methylation profiles between each HIV infected group and controls are represented by Principal Component Analysis (PCA) (Fig. 1a) and hierarchical clustering analysis using the correlation distance method (Fig. 1b). Samples grouped closer are similar in their methylation profiles. Pearson's correlation scores are depicted in Supplementary Fig. 4.
Each HIV infected group was compared to control group in order to identify significant differentially methylated regions (DMR). The highest number of DMR was detected in virologic failures, whereas the virologic responders exhibited the lowest number. Out of the 11,299 DMR, virologic failures and EC accounted for 4,733 and 3,600 regions, respectively; LTNP and virologic responders accounted for 1,965 and 1,001 regions, respectively (see Supplementary Table 4).
Concerning the methylation status and genomic location of DMR, we were able to identify a higher number of hypomethylated than hypermethylated sites in all HIV-1 groups (Fig. 2a), widely distributed across somatic and X human chromosomes (Fig. 2b), and nearly half colocalized with gene promoters computed from 1 kb upstream to 1 kb downstream of Transcription Start Sites (TSS) (Fig. 2c,d). Out of the DMR that colocalized with promoters in each group, 83.4% (n = 772) was hypomethylated in LTNP, followed by 82.6% (n = 1860) in subjects undergoing cART failure, EC with 71.7% (n = 1,250), and virologic responders with 58.8% (n = 294). Figure 3a represents hypo and hypermethylation distributions among HIV groups. Although intragroup comparisons of DMR distributions revealed slightly distinct patterns depending on the status of disease progression, only virologic responders exhibited higher distribution of hypermethylated regions closer to TSS (Wilcoxon rank-sum test, p-value = 0.03, Fig. 3b).
Hypo and hypermethylation medians within − 1 kb/ + 1 kb flanking TSS were significantly different among HIV groups. EC showed the highest median of hypermethylation (29.73%), followed by LTNP (29.28%), virologic responder (28.11%), and the lowest was exhibited by virologic failures (28.09%) (Kruskal-Wallis, p < 0.001) (Fig. 4a). Interestingly, hypermethylation levels between groups that spontaneously control HIV infection were not statistically different (p > 0.05); similarly, hypermethylation levels between HIV groups that do not have such ability were not significantly different as well (p > 0.05) regardless of the plasma viral load. Moreover, hypermethylation levels from LTNP were significantly higher than virologic failures (p = 0.012) irrespective of the similar median viral loads (3.66 log 10 for virologic failures and 3.40 log 10 for LTNP), providing further support for an It is worth mentioning that we were able to detect a methylation signature previously identified in PLWH compared to HIV uninfected subjects 21 . The promoter of the NLR family, CARD domain-containing gene 5 (NLRC5), involved in activating major histocompatibility complex class I gene expression 22 , was hypomethylated for all HIV groups except EC (see Supplementary Table 4). Out of the genes that were hypomethylated except for EC, changes in methylation for the individual groups were significant only for the region chr8:27397101-27397200, which corresponds to the Protein Tyrosine Kinase 2 Beta (PTK2B) ( Supplementary Fig. 5). We also identified other DNA methylation signatures previously associated with low and high plasma viral loads 23 . For instance, promoters of RUNX family transcription factor 3 (RUNX3), Deltex E3 Ubiquitin Ligase 3L (DTX3L), and Interferon Induced Protein 44 Like (IFI44L) were hypomethylated only in virologic failures; MX Dynamin Like GTPase 1 (MX1) promoter exhibited higher hypomethylation level for virologic failures than for virologic responders, and Lymphocyte Cytosolic Protein 2 (LCP2) promoter was hypomethylated in EC and hypermethylated in virologic responders (see Supplementary Table 4).
DNA methylation patterns were associated with distinct biologic pathways in HIV infected groups. DMR-associated biologic pathways in our data included immune response, cell signaling, and metabolic regulation (see Supplementary Table 5). Hypermethylation in virologic failures was associated with T cell receptor signaling pathway, including CD4 gene, whose promoter was hypermethylated; hypomethylation in such patients was associated with TNF-signaling, Phosphatidylinositol-3-kinase/Protein Kinase B (PI3K/Akt) signaling pathway, Hypoxia-inducible factor 1 (HIF-1), among other pathways. Conversely, HIF-1 signaling was enriched for hypermethylated regions in LTNP. This pathway coordinates the transcription activation of several genes involved in glucose metabolism, angiogenesis, and cell survival 24,25 . According to our data, provided that the HIF-1 pathway was related to hypermethylation in subjects that naturally sustain viral loads to low levels   www.nature.com/scientificreports/ and hypomethylation in individuals experiencing cART failure, methylation was a plausible aid to configure the expression of genes involved in regulating glycolysis and its interdependent cascades.

DNA methylation inversely correlated with gene expression in subjects experiencing virologic failure.
To better understand the interplay between methylation and gene expression in "Patients" experiencing virologic failure, RNA-seq analysis was conducted for six controls and four virologic failures. We were able to find 187 differentially expressed genes (DEG), considering log2 fold change > 3 and p-adjusted < 0.05 (see Supplementary Table 6). The saturation curve for each sample is shown in Supplementary Fig. S6. The observation that most genes were up-regulated in virologic failures (Fig. 5a,b) is consistent with the higher number of hypomethylated regions found in this group (Fig. 2a,b). Biological pathways for RNA-seq data are depicted in Supplementary Table 7 Some of the up-regulated genes were Interleukin-6 (IL-6) and Interleukin-1 beta (IL-1β), two well known pro-inflammatory mediators 26,27 , and also Interleukin-10. Previous reports demonstrated that the natural suppression of HIV was associated with the highly functional co-expression of cytokines such as TNF and IL-10 concomitant with significant maintenance of anti-inflammatory and anticoagulant properties 28 . Out of the 187 DEG genes, 13 showed differentially methylated promoters ( Table 2) and established different types of networks (Fig. 6). Promoter methylation and gene expression were inversely correlated in subjects experiencing cART failure (R = − 0.82, p = 0.00068) (Fig. 7). DNA methylation and gene expression negatively correlated in our data corroborate the long-term conception that promoters highly methylated associates with transcriptional silencing [29][30][31] .
Hypoxia-inducible factor 1-alpha (HIF-1α) was up-regulated and hypomethylated in virologic failures. This is consistent with the HIF-1 signaling pathway being one of the most significant pathways enriched for hypomethylated regions in "Patients" failing cART. HIF-1α is a subunit of the HIF-1 transcription factor primarily activated in response to microenvironment oxygen deprivation 24,32 , although oxygen-independent regulation www.nature.com/scientificreports/ may increase its expression in T cells 33 . Inside the nucleus, HIF-1 attaches to Hypoxia Response Element within regulatory sequences of target genes, rendering the cells adapted to survive under low oxygen supply 24,25,33 . Finally, although analyses were conducted in Peripheral Blood Mononuclear Cells (PBMC), which consists of a large pool of mononuclear cell subpopulations, certain alterations are likely associated with specific cell subsets. Nuclear Receptor Subfamily 4 Group A, Member 2, (NR4A2), which codes for a steroid/thyroid hormone nuclear receptor 34 , was the only up-regulated gene with a hypermethylated promoter, which is in agreement with a previous report 35 . NR4A2 plays a regulatory role in the development of dopaminergic neurons 36 and, interestingly, monocytes from HIV-1 patients with cognitive impairment showed the same epigenetic and expression profile for NR4A2 35 .

Discussion
The current study aimed at charting the methylation landscape of HIV-1 infected individuals heterogeneous in terms of HIV viremia control. We evaluated a panel of samples comprising individuals that naturally control HIV viremia (EC), partially naturally control the HIV harm to the immunity (LTNP), individuals that artificially control HIV viremia (virological responders), and individuals that were not able to artificially suppress HIV replication (virologic failures) Our major findings revealed distinct methylation patterns among PLWH distinct groups, concerning distribution and levels of methylation within human genome segments associated with transcription regulation. Additionally, we were able to associate promoter methylation with changes in gene expression for virologic failures, providing further insights into the role of epigenetic machinery on HIV persistence.
Cytosines followed by guanines (5'-CpG-3') in the human genome are methylated in a reaction catalyzed by DNA methyltransferases, which were shown to be increased upon HIV infection in vitro 17,19,37,38 . Furthermore, global DNA methylation percentage was higher in HIV infected CD4 + T cells in vitro 39 , and overall DNA hypermethylation was associated with HIV infection in a pair of serodifferent monozygotic twins 19 . However, a different dynamic was observed in our clinical samples, which is in agreement with a previous study addressing the DNA methylation status of a large cohort of HIV-infected and non-infected subjects 21 that showed a higher number of hypomethylated regions in PLWH.
Unexpectedly, nearly half of the disturbed regions colocalized with promoters, indicating a potential to affect gene expression 1,40 . Roughly 70% of promoters locate within CpG islands (CGI) 41 and the experiment is aimed at addressing CpG-dense segments. However, intergenic segments constitute 75% of the human genome, and introns account for nearly 24% 42 . Additionally, CpG sites spread throughout the genome are highly methylated, whereas CGI-associated promoters are markedly non-methylated 1,43 , then it is counter-intuitive that DMR was promoter-enriched and mostly hypomethylated. Given that HIV integrates into the human genome and harnesses the host transcription factors to complete its replication cycle, our data suggest that HIV-infected cells are www.nature.com/scientificreports/ primarily hypomethylated, which might favor the replenishment of infectious particles in the blood circulation through efficient provirus transcription. Nevertheless, considering that only a minority of cells in the bulk of PBMC is HIV infected 44,45 , we cannot provide clues on the exact mechanism by which the DNA is undergoing demethylation in such patients. Individuals on fully suppressive cART exhibited higher hypermethylation distribution closer to TSS. Conversely, hypo and hypermethylation distribution surrounding TSS were similar in subjects whose viral loads were detectable. This slightly different distribution may reflect the contingent of cells developing active replication and their latently infected counterparts. cART targets virus-producing cells, then a relatively greater amount of latently infected cells may be represented in individuals on a successful cART contributing to the higher hypermethylation closer to TSS. This idea is in line with longitudinal analysis showing that cART negatively selected single provirus integration in host genes expressed at higher rates 46 , suggesting that provirus integrated within poorly expressed regions are more prone to perpetuate and become part of the archived provirus. Indeed, pharmacological agents aiming at forcing expression of HIV-1, such as HDACi, efficiently stimulated viral transcription from latently infected cells and turned chromatin more permissive for HIV expression by increasing global acetylation 13,14,47 . However, to date, "kick and kill" strategies have not successfully reduced viral reservoir, even when HIV-specific T cell response was boosted with viral vector vaccine 48 , indicating that additional approaches might be necessary to "kill" infected cells reactivated from latency. On the other hand, if DNA methylation is regarded as a relatively stable epigenetic modification in mammalian genomes 49 , and hypermethylation surrounding TSS appears to be associated with latently infected cells, our data reinforce findings from an in vitro Significant regions (≥ 25% for methylation difference and q-value < 0.01) were annotated to find DMR-associated genes. DMR within − 1 kb/ + 1 kb flanking TSS was filtered for comparing hypo and hypermethylation differences. (a,b) Boxplots show comparisons of percentages for significant hypermethylated and hypomethylated promoters among HIV groups (Kruskal-Wallis rank-sum test < 0.001). P-values were calculated using the Wilcoxon rank-sum test with Benjamim-Hochberg correction for multiple comparisons and nonsignificant values were omitted. Venn-diagrams represent the number of DMR-associated genes detected for each group and their intersections. NS Non-significant p-values. www.nature.com/scientificreports/ proof-of-concept study highlighting that the combination of multiple LRA should be considered to activate the overall contingent of cells refractory to HIV expression 50 . The observation that disease progression and/or viral control heterogeneity was subject to distinct hypermethylation levels suggests that DNA methylation is one of the underlying mechanisms associated with in vivo HIV control. Recently, specific DNA methylation signatures were associated with innate and adaptive immune responses in untreated individuals with high and low viral 23 . DTX3L, IFI44L, and MX1 were part of a cluster containing hypomethylated genes involved in antiviral response in HIV infected subjects with high viral loads, whereas LCP2 was part of a hypermethylated cluster associated with adaptive response in such individuals 23 . To broaden this understanding, our data demonstrated that individuals who naturally control HIV infection www.nature.com/scientificreports/ exhibited higher hypermethylation rates within TSS adjacencies, perhaps promoting an autologous block and lock phenomena. Conversely, lower rates of hypermethylation were observed for subjects unable to naturally control HIV viremia, irrespective of viral loads. Furthermore, hypermethylation levels were higher for EC compared to virologic responders, suggesting the importance of this phenomenon in the natural control of viremia. Collectively, our data provide strong evidence that DNA hypermethylation distribution associates with the contingent of latently infected cells, and high hypermethylation rates associates with spontaneous control of HIV infection. It remains to be elucidated mechanisms that make the host naturally control the HIV viremia by epigenetic processes such as DNA methylation or the genetic correlates for HIV viral control. For instance, a deep metabolomic evaluation found evidence that EC harbors an inborn error of metabolism (late-onset multiple acyl-coenzyme A dehydrogenase deficiency [MADD]) 51 .
Regarding the DMR-associated biologic pathways, HIF-1 signaling was enriched for hypermethylated regions in LTNP, while it was associated with hypomethylation in the virologic failure group. The latter also showed  www.nature.com/scientificreports/ hypomethylation and up-regulation of HIF-1α, a subunit of the transcription factor HIF-1, whose activity is triggered by intracellular hypoxia 24,52 . However, hypoxic conditions inhibited HIV activation in vitro, with cells maintaining stable activation and viability 53 . Since patients experiencing virologic failure exhibit productive infection, it is conceivable that the increased expression of HIF-1α was driven by factors other than intracellular hypoxia. Indeed, HIF-1α may be triggered in normoxic T cells by Vpr-induced oxidative stress 54 and mitochondrial Reactive Oxygen Species (ROS) 55 . Mitochondrial ROS produced in response to cytosolic dsDNA induced HIF-1α expression in HIV infected cells irrespective of viral accessory proteins 55 . Upon activation of naïve T cells, metabolic reprogramming from oxidative phosphorylation to aerobic glycolysis takes place for potent T cell effector function and cytokine production 56,57 , which is supported by HIF-1α through upregulation of glucose transporter-1 (GLUT-1) 33,58 . T cells cultured in glycolysis deprivation fail to secrete Interferon-γ through a mechanism that prevents its translation 57 . By contrast, HIV infection also relies on cellular bioenergy status, evidenced by a reduction of HIV particle assembly when glucose metabolism was blocked in T CD4 cells 59 . Moreover, CD4 T cells and monocytes of HIV-infected patients were associated with increased phenotype for glucose metabolism irrespective of cART 60,61 . Glycolysis is not regulated only by HIF-1 signaling. Along with the increased expression of HIF-1α accompanied by HIF-1 pathway markedly associated with hypomethylation, we observed that hypomethylated regions were also enriched for PI3K-Akt signaling in individuals undergoing virologic failure. Upon activation of AKT by a PI3K-dependent signaling, downstream cascades control a variety of elementary processes, which include glucose metabolism by increasing the activity of GLUT-1 62,63 , bringing further support for the idea that methylation is associated with transcription of genes implicated on intracellular glucose metabolism in such individuals. However, further experiments are necessary to evaluate causality between methylation of individual regions and biologic pathways.
Some limitations in the present study must be emphasized. We recognize the absence of data related to the time of infection of each patient to correlate with differences in methylation. Although the results suggested that DNA methylation was associated with latently infected cells, our experiment was not designed to exclusively characterize latently HIV infected cells and does not allow the discrimination among latently infected cells, virus producing cells and uninfected bystander cells. While we found no statistic significance between hypermethylation levels within the gene promoters of virologic failures and responders groups, suggesting that similar hypermethylation rates between such groups might be associated with the viral rebound that responders would experience upon treatment interruption, we recognize that it is challenging to speculate the mechanistic role played by methylation alterations of individual genes in the viral rebound. Gene expression and methylation were inversely correlated, and distinct phenotypes for disease progression were associated with specific methylation patterns. However, it is not possible to establish causality, as methylation may reinforce transcriptional silence that was consolidated by other epigenetic modifications rather than cause the shutdown of gene expression 64 . www.nature.com/scientificreports/ Deconvolution analysis showed that the differences in CD4 + T cell fractions were not statistically significant. However, the hypermethylation levels follow the same trend of CD4 + T cell counts. The presence of multiple cellular subpopulations might influence the methylation variability observed in epigenetic studies 65 , once the PBMC are constituted by cellular lineages with differences towards their methylation patterns 66 . Although the groups significantly differ in age, a previous report that developed an epigenetic clock to estimate the DNA methylation age of tissues and cells subsets showed that age causes minor effects on beta values of individual CpG sites 67 . Furthermore, our analyses were restricted to circulating cells and were based on a small sample size, particularly for the EC group, which could prevent some DMRs from reaching statistical significance. Future studies addressing whether and how the methylome is disrupted in other compartments that support HIV replication may deepen the knowledge regarding the impact of epigenetic modifications on HIV infection. Nonetheless, we have shown that DNA methylation was associated with latently infected cells and natural control of HIV infection. Furthermore, we demonstrated the association between DNA methylation and expression of genes and cascades sustaining intracellular glucose metabolism in individuals undergoing virologic failure. Our findings highlight the dynamic nature of the methylation landscape in HIV infection, which may impact future studies aiming at HIV remission without the use of antiretrovirals.

Methods
Patients. Clinical and epidemiological data of HIV-1-infected subjects from the outpatients' Clinics of the Federal University of São Paulo were analyzed, and candidates were selected from 2013 and 2014. Patients were asked about the possibility of taking part in the study voluntarily, and all the individuals enrolled in the study agreed and signed an informed consent term. The research was approved by the ethics committee from the Federal University of São Paulo (#51854). The methods were carried out according to the approved guidelines.
Patients were divided into (i) virologic responders group, in which viral loads were below detection limits during the preceding six months using cART; (ii) virologic failure group, consisting of detectable viral loads despite cART during the last six months; (iii) LTNP, antiretroviral naïve individuals presenting low detectable viral loads and stable CD4 + T cell counts above 500 cels/mm 3 for at least three years; (iv) EC, presenting undetectable viral loads and T CD4 + counts above 500 cels/mm 3 for at least seven years without cART. The control group consisted of HIV-1/HIV2 seronegative individuals, which were also negative for Hepatitis B, Hepatitis C, Chagas' disease, Syphilis, Human T Lymphotropic virus-1 (HTLV-1), and Human T Lymphotropic virus-2 (HTLV-2).
CD4 + T cell counts, and viral loads were performed with the same samples used to perform Methyl-seq and RNA-seq. Viral load was assessed using the RT-PCR HIV-1 Abbott Real Time assay (Abbott Molecular Inc.), with a limit of detection of 40 copies/mL.

Methyl-seq library preparation. Methyl-seq was performed using the SureSelect Methyl-seq Target
Enrichment System for Illumina Multiplexed Sequencing kit (Agilent Technologies), according to the manufacturer's instruction. Briefly, 3 µg of genomic DNA was sheared by sonication in a Covaris S2 (Covaris, Inc), and the extremities of the sequences were repaired, ligated to adapters, and hybridized with biotinylated probes designed to capture CpG-rich regions of the human genome. The libraries were treated with sodium bisulfite followed by polymerase chain reaction (PCR) with primers complementary to the adapters. DNA treatment with bisulfite converts non-methylated cytosines into uracyls and, during the PCR, uracyls were amplified in the complementary strands as adenine and then as thymine. Finally, different indexes were added to each sample for them to be sequenced in a multiplex.
Libraries were validated before sequencing using the Kapa Library Quantification kit (Kapa Biosystems), and the size of the fragments was analyzed using a Bioanalyzer 2100 (Agilent Technologies). For sequencing, libraries were denatured with NaOH at 0.1 N and sequenced at 12 pmolar. Sample pools were distributed along the seven lanes of the flow cell, and one of the lanes was spared for sequencing of the control using PhiX (Illumina, Inc). Sequencing runs of 100 paired cycles (2 × 100) were performed with the Hiseq 1500 platform, Illumina.

RNA-Seq library preparation.
Library for the RNA-Seq was performed using the Truseq Stranded mRNA Sample Prep KIT (Illumina, Inc), according to the manufacturer's instruction. 1 µg of RNA was used for generating poly-A mRNA libraries from total RNA. Libraries were validated through quantification by real-time PCR using the Kapa Library Quantification kit (Kapa Biosystems), and the size of the fragments was analyzed using a Bioanalyzer 2100 (Agilent Technologies) and the High sensitivity DNA assay kit (Agilent Technologies). The concentration of all the samples was adjusted to 2 nM with Tris EDTA buffer. Sequencing of 100 paired cycles (2 × 100) was carried out in the Hiseq 2500 platform using the fast mode.
Bioinformatic analysis. FASTQ files for methylation analysis were processed using fastQC 68 and Cutadapt (v. 2.10) 69 . Sequence alignment was run with Bismark v.0.19.1 70 and Bowtie 2 (v. 2.2.5) against the reference genome GRCh38 with the specified options: -q -L 19 -score-min L,0,-0.2 -p 4 -reorder -ignore-quals -no-mixed -no-discordant -dovetail -maxins 500. The remaining parameters from Bismark were used as default. BAM files were subjected to the Bioconductor package Methylkit (v. 1.12.1) 71 , and bases in CpG context covered in all samples with depth above 10X and Phred quality score above Q20 were considered to carry out differential  71 . Bases having more than 99.9 th percentile of coverage were excluded. To estimate sodium bisulfite conversion efficiency, the number of thymines was divided by total coverage for each non-CpG context. The percentage of methylation was calculated by dividing the number of cytosines by the total number of cytosines and thymines for each CpG context. Differential methylation analysis was performed creating non-overlapping tiling windows consisting of 100 bp, and the methylation percentage for a given region was compared against a control group. Methylkit employs logistic regression to detect Differentially Methylated Regions (DMR) and adjusts p-values to q-values with the Sliding Linear Model (SLIM) method 72 to correct for multiple testing. Since a methylation difference greater than 25% was associated with a twofold repression in the gene expression 73 , we adopted a stringent moderate cutoff of ≥ 25% for methylation difference and a q-value < 0.01. GTF files containing genomic coordinates for DMR were annotated with Ensembl (v.92) to find DMR-associated genes. Biological pathway analysis for DMR was conducted through the WEB-based Gene SeT Analysis Toolkit 74 using a database from the Kyoto Encyclopedia of Genes and Genomes (KEGG).R language 75 (version 3.5.3) was utilized for statistical analysis. Hypo and hypermethylation difference among HIV groups were subjected to Kruskal-Wallis rank-sum test and Wilcoxon rank-sum test with Benjamini-Hochberg correction for multiple comparisons. Statistical significance for distances to TSS was calculated using the Wilcoxon rank-sum test. The cell fractions were estimated with the Bioconductor package EpiDISH version 2.10.0 76 . The robust partial correlation (RPC) inference was coupled with the reference dataset FlowSorted.Blood.450 k.compTable 77 to estimate the cell fractions and the relative proportions for each cell subset were compared using the Kruskal-Wallis test.
The quality of the RNASeq libraries was evaluated using the fastQC 68 . We next used BWA-mem (v 0.7.17-r1188) 78 for removal of contaminating ribosomal RNA (rRNA). The remaining reads were aligned to the reference genome GRCh37/hg19 using the STAR (v2.6.1a_08-27) 79 aligner with paraments -chimSegmentMin 20, -limitIObufferSize 62500000 e-runThreadN 8. The resulting BAM files of accepted reads and GTF file with gene annotations (Ensembl v87) were used as inputs for HTSeq-count (v. 0.11.2) 80 to obtain the gene-level counts. Using DEseq2 (v1.18.1) 81 , we assessed the differentially expressed genes among the samples. RNA-seq pathway analysis was performed according to the REACTOME database 82 . Network analysis was carried out with the Cytoscape platform 83 coupled with GeneMANIA plugin 84 .
We then mapped DMR to differential expression genes and the Pearson correlation was made by using R scripts. Accession numbers pending. Raw data available on request. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.