A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour

A complete and accurate genome sequence provides a fundamental tool for functional genomics and DNA-informed breeding. Here, we assemble a high-quality genome (contig N50 of 6.99 Mb) of the apple anther-derived homozygous line HFTH1, including 22 telomere sequences, using a combination of PacBio single-molecule real-time (SMRT) sequencing, chromosome conformation capture (Hi-C) sequencing, and optical mapping. In comparison to the Golden Delicious reference genome, we identify 18,047 deletions, 12,101 insertions and 14 large inversions. We reveal that these extensive genomic variations are largely attributable to the activity of transposable elements. Interestingly, we find that a long terminal repeat (LTR) retrotransposon insertion upstream of MdMYB1, a core transcriptional activator of anthocyanin biosynthesis, is associated with the red-skinned phenotype. This finding provides insights into the molecular mechanisms underlying red fruit coloration, and highlights the utility of this high-quality genome assembly in deciphering agriculturally important traits in apple.

really nice to carry out a CRISPR-Cas9 knock out of redTE to see how that would affect fruit colour, this would be far too time consuming (maybe for future projects?).
At least the authors should demonstrate that redTE shows a similar transcriptional regulation as MdMYB1 upon high light exposure. This should be the case since redTE has two LTRs, one presumably driving MdMYB1 and the other redTE itself.
Response: We thank the reviewer for the positive comments and suggestions. Using a CRISPR-Cas9 knockout of redTE to verify its function is a very good idea. However, even under ideal conditions, it will take about 3-5 years to observe the fruit phenotype after a CRISPR-Cas9 knockout of redTE in apple. A CRISPR-Cas9 knockout project will be carried out in our follow-up work.
In view of this suggestion, we carried out a transcriptional expression analysis of redTE and MdMYB1 under high light exposure and dark conditions (bagged fruits) by RNA-seq. There is a slight relationship between the transcriptional expression of redTE and that of MdMYB1 across samples from the high-light and dark conditions. Mining of the transcriptome data showed that redTE is transcribed in the flanking LTR (482 bp, gray regions within the LTR sequence of redTE, see following), but the reads within the gray regions that match redTE (Chr. 9) are identical to the LTR of another TE (Chr. 6). Therefore, we could not distinguish transcriptional reads of redTE from those of the other TE. How the LTRs drive MdMYB1, whether by providing additional transcription factor binding sites, by influencing the chromatin state in a non-sequence-specific manner, or as non-coding RNA (eRNA), needs to be studied further. We therefore did not include a description of the transcriptional expression analysis of redTE and MdMYB1 in our report.
Here, to demonstrate that redTE acts as an enhancer of MdMYB1 expression, the construct with the redTE obviously increased expression of the luciferase reporter
Response: Thank you for providing these insights. Regarding the DNA methylation changes, we compared our data with previously published studies. Indeed, the statement "indicating that methylation may be associated with the red fruit phenotype" is easy to misunderstand. We have revised it to "indicating that redTE-induced epigenetic change may be associated with different coloration patterns in red-skinned apple". TEs can alter the epigenetic marks of nearby genes, thereby modifying host gene expression (Bucher, Etienne, Jon Reinders, and Marie Mirouze. "Epigenetic control of transposon transcription and mobility in Arabidopsis." Current opinion in plant biology 15.5 (2012): 503-510.). In fact, under the reviewer's guidance, we found that the degree of methylation in Hanfu is much higher than that in HanM (bud sports of Hanfu) in the region MR12 (Supplementary Fig. 16b), indicating that transposon-mediated epigenetic regulation may control the variable colour patterns; see lines 369-377. We have modified the statement as explained above.

"Embroy" instead of Embryo
Response: This has been corrected as suggested (Fig. 1a), and we have moved suppl.
Does Copia-7 correspond to the HODOR repeat that was identified in GDDH13? It shows a very similar chromosomal distribution. If it is the same it should be named as such.
Response: Actually, Copia-7 shows low similarity to HODOR. The HODOR repeat was identified as the most repetitive consensus sequence in the whole apple genome, whereas Copia-7 was identified as the most repetitive sequence in the identified heterochromatin regions (lines 153-163).
line 189: " 'Hanfu' evolved much faster, providing resistance to various environmental stresses." The authors cannot state this as the opposite (that Golden rapidly lost resistance) may as well be true.
Response: Thanks for the rigorous suggestion; this has been modified to "these genes in two cultivars may evolve under different selection pressures, providing resistance to various environmental stresses" (see lines 197-199).
line 212: "after 'Hanfu' diverged from 'Golden Delicious'" I guess the authors mean after Hanfu and GD diverged from a common ancestor?
line 273: "we found that approximately 2.9% of the LTR-RTs in the HFTH1 genome may have been deleted and replaced with completely unrelated sequences in the GDDH13 genome (Fig. 3b)." It is unclear to me how the authors get to this conclusion. Are there remnants of the TEs, the actual TSD of redTE?
Response: We have elaborated on this question (lines 281-287 and 668-671). A 'deletion' was defined in either of two cases:
(1) Two TSDs exist at the corresponding insertion site in the GDDH13 genome, and the intervening sequence between the two TSDs is shorter than 100 bp or has no BLAST hits (BLASTN e-values ≤ 10) to the corresponding LTR-RT sequence in the HFTH1 genome. In this case, part of the LTR-RT sequence has been deleted or replaced with other TE sequences.
(2) No TSDs exist at the corresponding insertion site in the GDDH13 genome. In this case, the LTR-RT has been deleted together with its two flanking sequences. Most of the "replaced sequences" are remnants of other TEs, because these sequences cannot be aligned to the corresponding LTR-RT sequence in the HFTH1 genome but can be aligned to other TEs at other positions in the GDDH13 genome. Two TSDs of redTE can be detected in the HFTH1 genome, but only one can be detected in the GDDH13 genome (Fig. 4a). To distinguish these events from the 'deletion and insertion' events defined in the previous SV analysis, we have changed the term 'deletion' in this section to 'excision'.
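The two-case rule above can be written as a small decision function. The following Python is only a minimal sketch of that rule as stated in this response, not the pipeline actually used in the study. The names Site, classify_site and MIN_INTERVENING_BP are hypothetical, and the inputs (TSD count, intervening length, presence of a BLASTN hit at the chosen e-value cutoff) are assumed to have been computed beforehand. Sites matching neither case are simply left unclassified.

```python
from dataclasses import dataclass

# Hypothetical threshold name; the 100 bp value comes from case (1) in the response.
MIN_INTERVENING_BP = 100


@dataclass
class Site:
    """Summary of the GDDH13 site corresponding to one HFTH1 LTR-RT (illustrative)."""
    tsd_count: int          # number of TSDs detected at the GDDH13 site
    intervening_len: int    # bp between the two TSDs (only meaningful if tsd_count == 2)
    has_blast_hit: bool     # intervening sequence hits the HFTH1 LTR-RT at the chosen cutoff


def classify_site(site: Site) -> str:
    """Apply the two-case 'excision' definition; anything else stays unclassified."""
    # Case (1): both TSDs present, but the sequence between them is too short
    # or unrelated to the HFTH1 LTR-RT (partly deleted or replaced by other TEs).
    if site.tsd_count == 2 and (
        site.intervening_len < MIN_INTERVENING_BP or not site.has_blast_hit
    ):
        return "case1_partial_deletion_or_replacement"
    # Case (2): no TSDs at all, i.e. the LTR-RT was lost together with its flanks.
    if site.tsd_count == 0:
        return "case2_deleted_with_flanks"
    return "unclassified"
```

Under this sketch, a GDDH13 site with a single detectable TSD, as described for redTE, falls into neither case and is returned as unclassified.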
There is also the epigenetic silencing of MdMYB1/10 that has been shown previously by multiple reports. Do these lines also carry the redTE? Are the DNA methylation changes associated with the redTE? Are the MR primers the same as the ones used in the other studies?
Response: Because we couldn't obtain the specific mutant lines used in the other reports, we are not sure whether those lines also carry redTE. But we randomly chose accessions of some bud sports that harbor redTE in our study (Supplementary Table 10
Response: This is indeed a result of computational prediction. Following your suggestion, we selected two transcription factors, MdHY5 and MdCBF2, for further analysis. We tried to supplement this experimentally with a yeast one-hybrid assay to test whether MdHY5 is able to bind the predicted elements of redTE, but self-activation occurred when the target fragments of redTE were fused to pAbAi for bait vector construction. We also tried EMSA, but the GST-MdHY5 fusion protein was not detected in our prokaryotic expression system under different induction conditions.
The NCBI database does not contain an entry for PRJNA482033
Response: We have asked NCBI to release all data of this project. FASTA files of the chromosomes and genes and GFF files of the gene models can also be downloaded from https://github.com/moold/Genome-data-of-Hanfu-apple.

Reviewer #2 (Remarks to the Author):
This study presents a high-quality apple genome with a better contiguity and completeness than previously sequenced apple genomes. The availability of genome sequence from another apple genotype is certainly helpful to the community as the genetic variation can be identified and exploited for apple breeding. However, the manuscript is largely descriptive and does not provide much novel information on apple genome evolution and biology. In addition, most of the conclusions the authors drew are not supported by the data they presented (details below).
Response: Thank you for your remarks and for pointing out the shortcomings of our work.
Here, we have substantially improved and polished our manuscript.
Assembling a novel apple genome with better contiguity and completeness was quite an achievement, and it will provide much novel information on apple genome evolution and biology for the community. In our study, we also found extensive genomic variations, far exceeding expectations, and the discovery of redTE is especially intriguing; it is an important step forward in understanding red colour in apple.

Genome assembly, annotation and assessment
The authors need to provide a genome size estimation of HFTH1 by kmer and flow cytometry analysis. Genome sizes estimated for apple in different studies are highly  Fig.2a, so that the results are more explicit (Fig. 2a).
Line 199-202: These two sentences are disconnected. What is the logic to put together "the deletion in 5' UTR of MdCBF2" and "MdCBF2 and other MdCBFs were responsive to low-temperature treatment"?
Response: We removed this sentence as suggested.
Line 220-222: The conclusion here is too speculative. The low diversity in one end of chromosome 5 could be due to introgression and fixation during domestication and breeding process, instead of long-term evolution after WGD.
Response: Thanks for the suggestion and we revised this as suggested.

Dynamic evolution of LTR retrotransposons
The evolution of TEs in apple has been emphasized previously in Daccord et al
Line 239-242: These statements need more evidence, otherwise it's too speculative.
Response: Thanks for the rigorous suggestion; we fully agree with your opinion.
We tried to enrich the picture of the dynamic evolution of TEs in apple using the comparative genomic data. However, it is difficult for us to provide more evidence in this project. To make our description more explicit, we have deleted this sentence.
Response: This has been corrected as suggested (line 255).
Response: Thank you; we have checked our findings against previously published papers.
We agree that "Indel-associated mutations have been widely reported in plants and animals", but the trend of the mutation rate as divergence time increases has not been reported in any published paper (Fig. 3f, g). We have revised this section and simplified the description of previously reported results.
Lines 276-277, "deletion events might have made a greater impact than insertions did during the evolution of TEs": deletions and insertions are relative depending on which genome is used as the reference. "deletions" in HFTH1 mean "insertions" in GDDH13. Therefore, this general conclusion is totally wrong.
See lines 290-291). GDDH13 (MdMYB1-2) showed a different base (G) in the second intron, but this different base was also found in the sequence of Cripps Pink (MdMYB1-3).

Sequence alignment of HFTH1 and Cripps'RedMYB1-1
Line 297-310: The authors identified several SNPs and two large indels in the
Line 314-315, "redTE was not found in the non red-skinned accessions": I do find in Table S10 two non-red-skinned accessions ("Roxbury Russet" and "Ningguan") containing redTE.

Response: Thank you for pointing out the mistakes caused by our negligence. We have corrected them according to our records (Supplementary Figure 11).
Line 322-324: I couldn't follow the logic here. Why talk about flesh color here? It's OK to conclude that red-flesh in these three accessions is independent of redTE, but the conclusion that it is caused by constitutive MdMYB10 expression is highly speculative.
Response
Line 331-335, Figure S15:
Line 360-373: Ref 34 describes the identification of a TE using differential display between Jonagold and its color mutants. Authors from this paper never concluded that "red apples have arisen from non red-skinned apples by transposon-induced mutation".
Discussion in the section is highly speculative and I couldn't find any of their data to support their speculations. For example, I could not find expression data to support redTE as "a tissue-specific enhancer".
Response: We agree that the citation here is incorrect, and we have rewritten this part in the revised manuscript. Although enhancers are usually defined in textbooks as cell- or tissue-specific regulatory elements, we deleted the term to make the paper more rigorous.

M. sieversii in Xinjiang
Line 406-411: Another speculation. A lot of new data need to be generated to support the TE-regulated expression of MdACS3 and the usefulness of the said marker.
Response: Due to the lack of precise phenotype data, we removed this speculation.
Special thanks to you for your good comments.

Reviewer #3 (Remarks to the Author):
In the manuscript entitled "A high-quality apple genome assembly reveals a retrotransposon controlling red fruit colour" by Zhang et al, the authors present a large data set further refining the apple genome, from its first published draft, to a more highly resolved resource. They then focus on one trait - skin color - to illustrate the versatility of having this new detail. The final assembly has very impressive statistics - similar or better than model plant genomes.
The paper is well written and the data is convincing. I have a few issues with the way the paper introduces the work. Firstly it wasn't until the first section of the results, that I realised the genome wasn't a double haploid or some other trick of homozygosity. Somehow the authors should state clearly earlier what was done to make HFTH1, and why it lacks much of the heterozygosity that apple usually has.
Response: Thanks for the nice suggestions; we have now clarified the origin of HFTH1 in  (2007)). The homozygous tri-haploid HFTH1 (3n) was obtained by anther culture of the heterozygous parent 'Hanfu'. Its homozygosity was confirmed by SSR markers and k-mer spectrum analysis, although some mutations may have arisen during in vitro culture; HFTH1 therefore lacks much of the heterozygosity that apple usually has.
Line 65 - the genetic basis of apple skin color is well understood - perhaps not the mutation that drives yellow versus green skin.
Response: Although studies on anthocyanin biosynthesis have been accumulating, the mechanism behind the developmental regulation of anthocyanin biosynthesis in fruits remains poorly understood. For example, MdMYB1 is the critical light-inducible TF for anthocyanin biosynthesis, but how MdMYB1 is transcriptionally regulated is still not clear. In apple, colour mutation is a complex and intriguing problem. Fruit skin colour is determined by the ground colour and the over colour (superimposed anthocyanin pigmentation). The ground colour (yellow versus green skin) and the over colour of the apple skin are controlled by separate major genes. Therefore, the mutation caused by redTE is probably not what drives yellow versus green skin.
Response: Because HFTH1 is the first generation after a spontaneous chromosome