Whole mitochondrial genome sequencing highlights mitochondrial impact in gastric cancer

Mitochondria are organelles that perform major roles in cellular operation. Thus, alterations in mitochondrial genome (mtGenome) may lead to mitochondrial dysfunction and cellular deregulation, influencing carcinogenesis. Gastric cancer (GC) is one of the most incident and mortal types of cancer in Brazil, particularly in the Amazon region. Here, we sequenced and compared the whole mtGenome extracted from FFPE tissue samples of GC patients (tumor and internal control – IC) and cancer-free individuals (external control – EC) from this region. We found 3-fold more variants and up to 9-fold more heteroplasmic regions in tumor when compared to paired IC samples. Moreover, tumor presented more heteroplasmic variants when compared to EC, while IC and EC showed no significant difference when compared to each other. Tumor also presented substantially more variants in the following regions: MT-RNR1, MT-ND5, MT-ND4, MT-ND2, MT-DLOOP1 and MT-CO1. In addition, our haplogroup results indicate an association of Native American ancestry (particularly haplogroup C) to gastric cancer development. To the best of our knowledge, this is the first study to sequence the whole mtGenome from FFPE samples and to apply mtGenome analysis in association to GC in Brazil.


Results
After processing the mtGenome sequences with the adequate mapping scores (Q30), the average depths of coverage were 195x and 36x for tumor samples and internal control (IC) samples, respectively. For external controls (EC), the average depth of coverage was 351x. All samples with low quality were excluded. Table 1 presents the sample size of all groups for each analysis.
Out of the 20 sequenced sample pairs of GC patients, only eight had both samples with enough coverage for a paired comparison. In another eight patients, only one of their samples (five tumors and three internal controls) had enough coverage to be included in further analyses. The remaining four pairs did not pass our quality standards and were excluded from all analyses. As for external controls, 47 samples had enough coverage to be included in the general analyses.
It is noteworthy that (i) mean age was of 66 years old in case group and 41 years old in EC group (Mann-Whitney test, p < 0.001); and that (ii) case group was composed of 77% male and 23% female, while control group was composed of 28% male and 72% female (Chi-squared test, p = 0.002).

Paired sample analyses.
When we compared the mtGenome of each of the eight paired samples from gastric cancer patients, we found that, in most cases, tumor samples presented more variants than their respective internal control samples (up to 5.5-fold more variants). Table 2 shows the number of variants by pair and the genes in which most variants were found. The full list of variants and proportions is found in Supplementary Material (Table S1).
By grouping all paired samples from tumor and IC, we found 25 variants that were exclusive to tumor samples and eight variants that were exclusive to internal control samples, as well as 36 variants that were shared by both groups (Fig. 1). The distribution by region and position of these variants is shown in Supplementary Material (Table S2).
Hence, tumor samples presented more than 3-fold the number of exclusive variants in comparison to their paired internal control samples. This notable number of variants that were exclusive to tumor was distributed

Paired analyses of variant distribution
Paired analysis 8 8 -

Mitochondrial haplogroup analyses
Sample number - 11 47  www.nature.com/scientificreports www.nature.com/scientificreports/ in 13 genes, of which three (MT-ATP8, MT-ND4L and MT-TG) were only affected in this group. The variants exclusive to internal control samples were distributed in seven genes and the shared variants were distributed in 14 genes, of which MT-CO3 and MT-RNR1 were affected in both groups.
In addition, the genes with more variants were: (i) for tumor-exclusive variants, MT-DLOOP1 and MT-DLOOP2 (with 16% each); (ii) for variants exclusive to internal control, MT-ND5 (with 25%); and (iii) for variants shared by both groups, MT-DLOOP1 (with 22%) and MT-ND5 (with 19%).   When comparing heteroplasmy, six out of the eight pairs matched our standards and were included in this analysis. We found that, with one exception, tumor presented notably more heteroplasmic regions than their paired internal control (3 to 9-fold times more; Fig. 2).
Among the heteroplasmic regions, the most commonly affected were MT-DLOOP1, MT-ND4 and MT-ND5. It is noteworthy that these three, along with MT-ND6, were heteroplasmic in all tumor samples, but not in all internal controls. Four other genes were heteroplasmic in almost all tumor samples, but they were not as fre- General heteroplasmy analysis. When comparing heteroplasmy levels between all groups of samples (tumor, internal control and external control), we included all tumor (N = 13), internal control (N = 9) and external control (N = 40) samples that met our quality control standards (see Table 1). In addition to the quality scores, only variants with at least 5% heteroplasmy levels were considered for this analysis, in order to control possible artifacts and false-positives due to the limited coverage of the samples 24,25 .
Using this criterion, we assessed heteroplasmy levels of all three groups. Aiming a fair comparison of heteroplasmy levels, we evaluated the presence of the same distinct 630 variants in all samples considering a heteroplasmy level of 1% when absent or under 5%.
The log heteroplasmy levels were compared using a mixed linear regression assuming a fixed effect of the group and a random effect of the individuals. This model was better fitted to the data than a model without the group effect (Wald test χ 2 10.11; P 0.006). The pairwise comparison (with the p-value adjusted according to Tukey) showed that the tumor group presented a significantly higher level when compared to internal group (t 2.60; P 0.025) and external group (t 3.04; P 0.006) both with statistical power above 73%. These control groups did not present statistically significant difference between each other (t −0.42; P 0.906).
Considering the previous analysis, the amount of heteroplasmic variants may be more important to tumor development than heteroplasmy levels of variants per se. In fact, out of 1,664 heteroplasmic variants found after Further, individual's number of heteroplasmic variants average (i.e. total of heteroplasmic variants divided by number of samples) was of 50.23 for tumor, 16.89 for internal control and 24.54 for external control. Thus, tumor samples presented nearly 3-fold more heteroplasmic variants than internal control samples (Fig. 3). Then, a Poisson regression considering group effects was applied and it showed a significant difference between these groups (ANOVA F 5.46; P 0.007), with a statistical power above 90%.
In Fig. 3, it is possible to see that, even though the external control group was composed of more samples, most of them showed less heteroplasmic variants in comparison to tumor samples. Proportionally, internal control group also showed many samples with less heteroplasmic variants than tumor group. In addition, about half of tumor group presented samples with a high number of heteroplasmic variants, including some samples with more than 2-fold the mean tumor's heteroplasmy variants found (approximately 50 variants).
This difference became even more evident in the pairwise comparison of number of heteroplasmic variants between all groups using a post-hoc Tukey test. Tumor group presented a statistically significant difference when compared to internal (P 0.027) and external (P 0.008) controls with a statistical power above 70%. However, when both controls were compared no significant difference was found (P 0.903).
Regarding location of the observed heteroplasmic variants, heteroplasmy levels were observed in all protein-coding genes and in almost all non-protein-coding genes -appearing in all DLOOP and rRNA genes, but not in all tRNA genes. It is noteworthy that two heteroplasmic variants were observed in unidentified regions in external control and tumor samples and that they are probably in tRNA genes due to location. These unidentified regions were excluded from further analyses. Figure 4 shows a representation of all protein-coding and non-coding regions throughout 16,569 bp of the mtGenome. This graph supports our previous result that tumor group presents most heteroplasmic variants, followed by external control and internal control groups.
Moreover, out of the 37 genes in mtGenome, 32 were heteroplasmic in external control group, 28 in tumor group and 15 in internal control group. Hence, in order to compare heteroplasmic regions between all three groups, we analyzed 15 regions affected in all of them. Among such regions, 12 were protein-coding and three were non-protein-coding genes (one DLOOP and two rRNA regions) (Fig. 5). In this analysis, we observed a statistically significant difference between all groups, with tumor group presenting more heteroplasmic variants in MT-RNR1, MT-ND5, MT-ND4, MT-ND2, MT-DLOOP1 and MT-CO1.

Mitochondrial haplogroup analyses.
Furthermore, we assessed haplogroup information of all samples in order to investigate a possible association with GC. In some cases, tumor samples presented a different haplogroup than their paired samples, indicating that the tumors might not be an accurate ancestry informative tool, especially considering that MT-DLOOP presented a particularly high rate of variants in this group and these regions are the most informative of maternal ancestry 26 . Thus, all tumor samples were disregarded from haplogroup analyses and the case group was be represented only by the internal control.  (Table 3). Native American haplogroups were the most frequent in both case (36.4%) and control (44.6%). Following, control group presented about the same frequency rate for both European and African haplogroups (25.6% each), while case group presented a similar rate for European (27.3%), but not for African haplogroups (18.2%). Asian haplogroups were around 4-fold more frequent in cases (18.2%) than in controls (4.2%).
The most common haplogroup was C (Native American ancestry) in both groups, but it is interesting that case has more than 2-fold the frequency of the control (36.4% and 17.0%, respectively). In case group, the second most common haplogroup was H (European ancestry), with 18.2%, followed by the other five haplogroups with 9.1% of presence each. In control group, haplogroup L1 (African ancestry) is the second most common, with 15%, followed by haplogroup H, with 12.8%, and the other haplogroups.
In addition, our results showed a difference in the distribution of variants by mitochondrial ancestry (Fig. 6). Firstly, we compared it within cases using ANOVA and found that 53% of all variants were in individuals from Native American haplogroups, followed by European (18%), Asian (15%) and African (14%) haplogroups (F 4.48; P 0.047). As for control group, individuals from Native American haplogroups also presented the majority of variants (46%), followed by African (36%), European (16%) and Asian (2%) haplogroups (F 2.57; P 0.066). When we compared case and control there was no significant difference (F 0.93; P 0.338).

Discussion
In this study, we investigated the association of mitochondrial variants with gastric cancer in individuals from a population of the Brazilian Amazon. This was done through whole mtGenome sequencing of FFPE samples from patients and cancer-free individuals. To the best of our knowledge, this is the first study to successfully sequence mtGenome from FFPE samples.   www.nature.com/scientificreports www.nature.com/scientificreports/ This is particularly important because DNA sequencing from this type of material can be challenging. Although good for long-term storage of tissue samples, the FFPE process may affect the material integrity so that the extracted DNA is generally of low quality, often leading to the loss of samples. However, some studies performing NGS have shown that DNA extracted from FFPE tissue samples can still provide results in concordance to frozen tissues [27][28][29] . The challenge presented by FFPE samples is probably the reason for the loss of samples and for the limited coverage in our study. In spite of that, we were able to successfully sequence the mtGenome, obtaining interesting results.
When we compared the samples from the same eight gastric cancer patients, i.e. paired tumor and internal control samples, we observed significantly more somatic variants present in tumor samples than in the respective internal control samples (up to 5.5-fold more), a kind of pattern that has been reported in other types of cancer, such as colorectal cancer and penile cancer 11,30 . In addition, studies comparing mtDNA variants and genomic instability of different types of cancer reported that gastric cancer was among the types with more somatic variants and genomic instability with a tumor-specific pattern 10,25 .
Regarding the mitochondrial genes in which most of the tumor-exclusive variants were found, we observed that MT-DLOOP2, MT-DLOOP1 and MT-ND5 were among the regions with more variants in at least half of the tumor samples included in this analysis (Table 2). There was no similar pattern observed when we considered the regions with more variants exclusive to internal control. As for shared variants, they were similarly distributed among different genes. Additionally, there was a mean of 2.43-fold increase in the number of genes with exclusive variants when we compared tumor and internal control samples. It is noteworthy that most of these variants are present in more than one sample and are counted as such in this analysis.
Therefore, when we considered single variants only, there were 50 variants present only in tumor samples and most of them was distributed in four regions -MT-DLOOP1 (16%), MT-ND5 (16%), MT-ND4 (10%) and MT-DLOOP2 (10%) -, while there were eight variants present only in internal control samples, of which most was found in MT-ND5 (25%). This suggests that, in tumor, there is an increase not only in the number of variants, but also in the regions in which such variants occur.
This greater presence of variants in tumor samples was reinforced when we grouped and compared the eight pairs. In this analysis, we found 25 tumor-exclusive somatic variants and eight somatic variants that were exclusive to internal control samples, as well as 36 shared variants. While some variants that were considered somatic in the pair-by-pair comparison were then considered germline, tumor samples still showed 3-fold more variants than internal control samples.
Such tumor-exclusive variants were distributed in 13 genes, of which three (MT-ATP8, MT-ND4L and MT-TG) were only observed in this group. Only a few studies have reported variants in MT-ATP8 31,32 and MT-ND4L 31,[33][34][35] in different types of cancer and none was in gastric cancer. No studies were found associating variants in MT-TG to cancer. Furthermore, two genes (MT-CO3 and MT-RNR1) only presented shared variants, which are probably not involved in tumorigenesis. However, five genes (MT-CO1, MT-CYB, MT-DLOOP1, MT-ND1 and MT-ND5) not only presented germline variants, but also different somatic variants exclusive to tumor and internal control groups, indicating a possible association of such genes and variants with tumor development. It is noteworthy that, among these genes, MT-DLOOP1 and MT-ND5 stand out for presenting more germline variants than the others.
In addition to the shared variants, MT-ND5 also presents a high rate of exclusive variants not only in tumor samples but also in internal control samples, suggesting that this gene could also be altered in cells in other adjacent organs of cancer patients, contributing to possible tumor advances. Regardless, MT-ND5 encodes a core subunit of Complex I, essential for electron transport from NADH to ubiquinone in the mitochondrial respiratory www.nature.com/scientificreports www.nature.com/scientificreports/ chain, so that variants in this gene and others related to this process may lead to impairment in OXPHOS and an increase of reactive oxygen species (ROS) generation, which can contribute to cancer proliferation and metastasis 36,37 . In fact, a high rate of mutations in MT-ND5 has been reported in esophageal cancer 38 .
Moreover, the regions with more tumor-exclusive variants were the MT-DLOOP1 and MT-DLOOP2 (16% each). This corroborates previous studies that associated an increase of variants in such control regions to the development of different types of cancer, including hepatocellular carcinoma 39 , brain tumor 40 , oral squamous cell carcinoma 41 , colon cancer 42 and gastric cancer 43 . Further, a study have associated five variants in these genes with gastric cancer, including A73G and T16519C 44 , which we found frequently in our tumor samples but not in internal control samples.
We also obtained interesting results in the paired analysis of heteroplasmy levels. In most cases, tumor presented notably more heteroplasmic regions than the paired internal control (3 to 9-fold). About these heteroplasmic regions, we highlight that: (i) the four most commonly heteroplasmic genes in tumor samples were not heteroplasmic in all internal control samples; (ii) four genes were heteroplasmic in almost all tumor samples, but not as frequent in internal control genes; (iii) seven genes were heteroplasmic only in tumor samples; and (iv) there were no genes heteroplasmic only in the internal control group. These results are suggestive of an involvement of such heteroplasmic variants in GC development.
This was reinforced in our analyses comparing heteroplasmy distribution between all groups of samples (tumor, internal control and external control). When we compared heteroplasmy level means, the number of variants was different between groups, so we normalized the number of single heteroplasmic variants in all groups and found a statistically significant difference from tumor to internal control (P 0.025) and external control (P 0.006). These control groups did not present such difference between each other (P 0.906).
Therefore, the number of heteroplasmic variants might be as important or even more important to cancer development than heteroplasmy levels per se. In the individual heteroplasmy mean comparison, we observed that individual tumor samples presented almost 3-fold more heteroplasmic variants than internal control samples. A study conducted with patients of hepatocellular carcinoma also reported that tumor samples presented more heteroplasmic variants and a higher heteroplasmy ratio than their matched cancer-free samples 45 .
Moreover, tumor group presented 39% of all heteroplasmic variants, while internal control group presented 9% and external control group presented 52%. This is especially interesting given that tumor group had about a third of the number of external control samples, but about half of tumor group was composed of samples with a high number of heteroplasmic variants, as seen in Fig. 3. When compared to the control groups, tumor group presented a statistically significant difference regarding this distribution of heteroplasmic variants (P 0.027 for internal control and P 0.008 for external control). There was no significant difference between both controls (P 0.903).
As for location of heteroplasmic variants, out of the 37 mitochondrial genes, 32 presented variants in at least one group, out of which 15 regions carried such variants in all three groups, allowing the comparison shown in Fig. 5. Among these regions, 12 were protein-coding and three were non-protein-coding genes (DLOOP and rRNA only). Further, the higher rate of heteroplasmic variants in tumor groups was statistically significant in six of such regions: MT-RNR1, MT-ND5, MT-ND4, MT-ND2, MT-DLOOP1 and MT-CO1.
In the current specialized literature, not many studies were found associating heteroplasmic variants to the development of gastric cancer, regardless of mitochondrial genomic location, including the six genes above. A study performed showed an association of heteroplasmic variants in DLOOP regions to gastric cancer development in a Chinese population 46 . Another study have indicated homoplasmic and heteroplasmic variants in mitochondrial 12 S rRNA (encoded by MT-RNR1) to the development of intestinal-type gastric cancer 47 . No studies were found associating heteroplasmic variants in the four other genes to GC development.
In the mitochondrial haplogroup comparison between case and control, we observed 17 macro-haplogroups, distributed in four main ancestries: Native American (A, B, C and D), European (H, HV, J and V), African (L0, L1, L2, L3 and L3e) and Asian (M, N, R and Y).
Our results showed Native American ancestry with more variants than the other three ancestries in both cases and controls, which could be related to the great number of individuals from this ancestry in each group. Native American haplogroups were the most frequent in both case (36.5%) and control (44.6%). They were followed by European (27.3%), African (18.2%) and Asian (18.2%) in cases and by European and African (25.6% each) and Asian (4.2%) in controls. This ancestry profile based on mtDNA is consistent to the expected for the studied geographic region, considering the admixture process that formed the Brazilian population 26,48,49 . In addition, it is informative that there was a statistical significance in case group (P 0.047), probably due to the high frequency of Native American haplogroups in this group. This is especially interesting considering that GC is one of the most incident types of cancer in the studied Brazilian region, as previously mentioned.
Moreover, in both case and control groups, the most frequent haplogroup was C (Native American ancestry), but it is important to highlight that this haplogroup accounted for more than 2-fold in case group than in control group (36.4% and 17%, respectively). This suggests that haplogroup C might be related to gastric cancer development, which is reinforced by a study performed with Tibetan patients of gastric cancer, in which haplogroup C was the third most common (accounting for 17%) 50 .
Considering that the parent haplogroup of C is M 51 , it is also interesting that this haplogroup is present in our case group, but not in our control group. On the other hand, our study reported sibling haplogroup N in control group, but not in case group. Similarly, it has been suggested that haplogroup N is associated to good outcome of gastric cancer clinical evolution when compared to haplogroup M 18 . In addition, haplogroup H (European ancestry) was the second most common haplogroup in case group (18.2%). Although this haplogroup has not been previously associated to gastric cancer, it has been indicated that it might influence other types of cancer 52 .

conclusions
In this study, we were able to successfully sequence the whole mitochondrial genome of FFPE samples and associate it to gastric cancer development. To the best of our knowledge, this was the first study to achieve that. We found more mitochondrial variants in tumor group than in controls, suggesting that such variants and mitochondrial heteroplasmy might influence gastric cancer development. In addition, haplogroup C might also be important to the development of this type of cancer. Although limited by sample number, our findings contribute to a greater understanding of mitochondrial influence in gastric cancer. Further, functional studies are recommended to clarify the mechanism of this impact.

Methods
Sampling. In this study, we included formalin-fixed and paraffin-embedded (FFPE) tissue samples from patients going through gastric biopsy by endoscopy or surgical resection in the Unit of High Complexity in Oncology of the University Hospital João de Barros Barreto (UNACON/HUJBB-UFPA). Individuals with gastric cancer diagnosis had their samples collected before undergoing chemotherapy or radiotherapy. All samples were analyzed by a pathologist that confirmed the positive or negative diagnosis of cancer.
We investigated 70 individuals (90 samples), divided in two main groups (Fig. 7). The first group ("Case") consists of samples obtained from 20 individuals with gastric cancer diagnosis before treatment: gastric tumor tissue samples ("Tumor" subgroup) and non-tumor tissue of the duodenum ("Internal Control" subgroup) were collected from each patient, totalizing 40 samples in the Case group. Duodenal portion was extracted during gastrectomy and confirmed to be cancer-free by a pathologist. These samples were chosen as internal control respecting the cancerization field effect. The second group ("External Control") is composed by gastric tissue samples from 50 individuals with no history of cancer and no infection by Helicobacter pylori.  Table S2). These sets were composed of 33 pairs of designed primers with discontinuous genomic location in order to provide overlaps in order to cover the whole fragmented genome.
Mitochondrial genome sequencing. The mtGenome was sequenced in the MiSeq System (Illumina Inc., Chicago, IL, USA) with MiSeq Reagent Kit v3 (300-cycles) (Illumina). Preparation of libraries was done with Nextera XT DNA Library Preparation Kit (96 samples) (Illumina), according to manufacturer's instructions. DNA quality during library preparation was assessed with High Sensitivity D1000 ScreenTape at Agilent 2200 TapeStation System (Agilent Technologies, Santa Clara, CA, USA).
Data analysis. Paired-end sequencing reads (.fastq files) were aligned with the human reference mtDNA sequence -revised Cambridge reference sequence, rCRS -using Burrows-Wheeler Alignment tool (BWA) 53 . SAMtools 54 were used for mapping and sorting sequences, while Picard 55 was used to mark duplicated reads. After preprocessing sequences with the aforementioned steps, paired-end.bam files were submitted to mtDNA-Server analysis 56 for heteroplasmy detection and haplogroup inferences. Phred quality scores (Q scores) of 20, 30 and 20 were considered for mapping quality of reads, alignment quality and base quality, respectively. For heteroplasmy detection at mtDNA-Server, we considered at least 10x of coverage on forward and reverse strands. Using its own