Adaptation and conservation insights from the koala genome

The koala is the only extant species of the marsupial family Phascolarctidae and is now classified as ‘vulnerable’ due to habitat loss and widespread disease. We sequenced the koala genome, producing the most complete and contiguous marsupial reference genome to date. We show that the koala’s ability to detoxify eucalypt foliage, toxic to most other mammals, may be due to expansions within a Cytochrome P450 gene family, and its ability to smell, taste, and moderate ingestion of plant secondary metabolites, may be due to expansions in the vomeronasal and taste receptors. We characterised centromeres and novel lactation proteins that protect young in the pouch, as well as immune responses to chlamydial disease. Historical demography revealed a significant population crash coincident with the decline of Australian megafauna, while contemporary populations revealed biogeographic boundaries and increased inbreeding in populations impacted by historic translocations. Genetically diverse populations requiring habitat corridors and translocation programs were identified and provide the key to the koala’s survival in the wild.


Introduction
The koala is an iconic Australian marsupial, instantly recognisable by its round, humanoid face and distinctive body shape. Fossil evidence reveals as many as 15-20 species, following the divergence of koalas (Phascolarctidae) from terrestrial wombats (Vombatidae) 30-40 million years ago 1,2 (Supplementary Fig. 1). The modern koala, Phascolarctos cinereus, which first appeared in the fossil record ~350,000 years ago, is the only extant species of the Phascolarctidae. Like other marsupials, koalas give birth to underdeveloped young. Birth occurs after just 35 days of gestation, with young lacking immune tissues or organs. Their immune system develops while they are in the pouch, meaning that survival during early life depends on immunological protection provided by the mother's milk.
A specialist arboreal folivore feeding almost exclusively from Eucalyptus spp., the koala has a diet that would be toxic or fatal to most other mammals 5. Due to the low calorific content of this diet, the koala rests and sleeps up to 22 hours a day 6. A detailed understanding of the mechanisms by which koalas detoxify eucalyptus and protect their young in the pouch has previously eluded us, as there are no koala research colonies and access to milk and tissue samples is opportunistic. The genome enables unprecedented insights into the unique biology of the koala, without having to euthanize or disturb an animal of conservation concern.
The genome also enables a holistic, scientifically grounded approach to koala conservation. Australia has the highest mammal extinction record of any country during the Anthropocene 7, and koala numbers have plummeted in parts of the northern end of the species' range since European settlement of the continent 8, but increased in sections of the southern end of the range, notably in parts of Victoria and South Australia. The uneven response of koala populations throughout the species' range is one of the most difficult issues in its management 9. The species was heavily exploited by a pelt trade (1870s to late 1920s) which harvested millions of animals 8,10,11. Today, the threats are primarily due to loss and fragmentation of habitat, urbanisation, climate change and disease. Current estimates put the number of koalas in Australia at only 329,000 (range 144,000-605,000), and a continuing decline is predicted 8. Koalas present a complex conservation conundrum: in the north, declines are due to ongoing habitat fragmentation, urbanisation, and disease; the south has followed a different path 12, where widespread, often sequential, translocations (1920s-1990s) from a limited founder population have resulted in genetically bottlenecked populations that are overabundant, to the point of starvation, in some areas 13. There are marked differences in the degree to which threats affect each population, thereby cautioning against one prescription for population recovery.
Adding to the complexity of koala conservation is the impact of disease, specifically koala retrovirus (KoRV) and Chlamydia. KoRV arrived in Australia, it is postulated, via a putative murine vector before cross-species transmission 14,15. It is now widespread in northern koalas and appears to be spreading to southern populations 16. Some strains appear to be more virulent than others and are putatively associated with an increase in neoplastic disease 17. Similarly, Chlamydia, which in some individuals causes severe symptoms yet in others remains asymptomatic, may have crossed the species barrier from introduced hosts such as domestic sheep and cattle following European settlement 18. A complete koala genome offers insights into the species' genetic susceptibility to these diseases, provides the genomic basis for innovative vaccines, and can underpin novel conservation management solutions incorporating the species' population and genetic structure, such as facilitating gene flow via habitat connectivity or translocations.

Genome landscape
Koalas have 16 chromosomes, differing from the ancestral marsupial 2n = 14 karyotype by a simple fission of ancestral chromosome 2 giving rise to koala chromosomes 4 and 7 19. We sequenced the complete genome using 57.3-fold PacBio long-read coverage, generating a 3.42 Gb reference assembly. The primary contigs from the FALCON assembly (representing homozygous regions of the genome) yielded genome version phaCin_unsw_v4.1. This comprised 3.19 Gb, including 1906 contigs with an N50 of 11.6 Mb and the longest at 40.6 Mb. The heterozygous regions of the genome (representing the alternate contigs from the assembly) totalled 230 Mb, with an N50 of 48.8 kb (Table 1; Methods and Supplementary Tables 1-3). Approximately 30-fold coverage of Illumina short reads was used to polish the assembly. BioNano optical maps plus additional conserved synteny information for marsupials were used for scaffolding 24 to assemble long-read contigs into 'virtual' chromosome scaffolds (or 'super-contigs') (Supplementary Tables 4-5 and Supplementary Note 2.1). The largest super-contig spanned approximately half of koala chromosome 7 (Supplementary Fig. 2).
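For readers less familiar with the contiguity statistic quoted above, the following is a minimal sketch of how an N50 can be computed from a list of contig lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the koala assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths in bp (not the koala assembly).
toy_contigs = [40, 30, 20, 5, 5]
print(n50(toy_contigs))  # 30
```

In this toy case the two longest contigs (40 + 30) already cover more than half of the 100 bp total, so the N50 is 30. A larger N50 for the same total length indicates a more contiguous assembly.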
Our long-read-based sequence presented the opportunity to identify and study centromeres, which are multi-megabase "black holes" in eutherian genome assemblies due to intractable higher-order arrays of satellites (e.g. human and mouse) 27. Centromeres are also smaller in marsupials than in eutherians, and so are more amenable to analysis 28. ChIP-seq using centromeric antibodies (CENP-A and CREST) 29 enabled the identification of scaffolds containing putative centromeric regions (Supplementary Fig. 3) and characterisation of known and novel repeats, including composite elements, within koala centromeric domains (Supplementary Table 6; also Supplementary Tables 7-10). These domains lack the previously annotated retroelement, Kangaroo Endogenous Retrovirus (KERV), found in some tammar wallaby centromeres 30. Koala centromeres span a total of 2.6 Mb of the koala haploid genome, equivalent to an average of 300 kb of centromeric material per chromosome. Like other species with small centromeres 27,28,31,32, koala centromeres lack higher-order satellite arrays (Supplementary Tables 7-10). Among the novel repeats we identified, some are similar to composite elements recently described in gibbon centromeres 33, in which absence of higher-order satellite arrays accompanied the evolution of novel composite elements with putative centromere function. The composition of the koala centromere therefore supports mounting evidence that transposable elements represent a major, functional component of small centromeres when higher-order satellite arrays are absent 28,32,33.
Interspersed repeats account for approximately 47.5% of the koala genome, 44% of which are transposable elements (Supplementary Table 11). As in other mammalian genomes, SINEs and LINEs are the most numerous elements (35.2 and 28.9% of total number of elements, respectively), with LINEs making up 32.1% of the koala genome. The long-read sequence assembly also enabled full characterisation and annotation of repeat-rich long noncoding RNAs, including Rsx which mediates X chromosome inactivation in female marsupials 34. Koala Rsx represents the first marsupial Rsx to be fully annotated and to have its structure predicted (Supplementary Fig. 4 and Supplementary Note 2.4). As expected, it was expressed in all female tissues, but in no male tissues 37.
The assembled koala genome has very high coverage of coding regions: we recovered 95.1% of 4,104 mammalian BUSCOs 38, the highest for any published marsupial genome (Supplementary Table 5) and comparable with the human assembly (GRCh38, which scores 94.1% of orthologs). Analysis of gene family evolution using a maximum-likelihood framework identified 6,124 protein-coding genes in 2,118 gene families with at least two members in koala. Among these, 1,089 have more gene members in koala than in any of the other species (human, mouse, dog, tammar wallaby, Tasmanian devil, gray short-tailed opossum, platypus, chicken; Supplementary Fig. 5).
Having characterised the genome, we undertook detailed analyses of key genes and gene families in order to gain insights into the genomic basis of the koala's highly specialised biology. Gene families of particular interest were those that encode proteins involved in induced ovulation, in the complex lactation process, those responsible for immunity, and those enzymes that enable the koala to subsist on a toxic diet.

Unique biology of the koala

Ability to tolerate a highly toxic diet
The koala's diet of eucalyptus leaves contains high levels of plant secondary metabolites (PSMs) 39, phenolic compounds 40 and terpenes (e.g. 41) that would be lethal to most other mammals 42. Unsurprisingly, koalas experience little competition for food resources. The genome of Eucalyptus grandis showed substantial expansion in terpene synthase genes relative to other plant genomes 43. Eucalypt toxicity is therefore likely to have exerted intense selection pressure on the koala's ability to metabolise such xenobiotics, so we searched for genes encoding enzymes with a detoxification function and investigated sequence evolution at these loci.
Cytochrome P450 monooxygenase (CYP) genes represent a multi-gene superfamily of haem-thiolate enzymes that play a role in detoxification through phase 1 oxidative metabolism of a range of compounds including xenobiotics 44. These genes have been identified throughout the tree of life, including in plants, animals, fungi, bacteria and viruses 45. In the koala genome we found two lineage-specific monophyletic expansions of the Cytochrome P450 family 2 subfamily C (CYP2Cs, 31 members in koala) (Fig. 1a). The functional importance of these CYP2C genes was further demonstrated through analysis of expression in 15 koala transcriptomes from two koalas, revealing particularly high expression in the liver, consistent with a role in detoxification (Supplementary Fig. 6).
Comparing CYP2C gene context in mouse versus koala revealed conserved flanking markers, strongly suggestive of tandem duplication (Fig. 1b). Further sequence-level analysis of the CYP expansions indicated that most conserved regions are under strong purifying selection (Fig. 1c). However, there is evidence that individual CYP codons have experienced episodic diversifying selection ( Fig. 1c; Supplementary Note 3.3), while purifying selection shapes the rest of the gene ( Fig. 1c; Supplementary Tables 12 and 13). Adaptive expansion of CYP2C and maintenance of duplicates appear to have worked in concert, resulting in higher enzyme levels for detoxification, while the interplay between purifying and diversifying selection resulted in neofunctionalisation within the CYPs. Such adaptations enable koalas to detoxify their highly specialised and PSM-rich diet.
The characterisation of koala CYP2Cs has significant therapeutic potential. The high expression levels of CYP2C genes in the liver explain why meloxicam, a non-steroidal anti-inflammatory drug (NSAID) known to be metabolised by CYP2C in humans 46,47, and frequently used for pain relief in veterinary care, is so rapidly metabolised in the koala and a handful of other eucalypt-eating marsupials (common brush-tail possum and eastern ring-tail possum) compared with eutherian species 47,48. It is expected that other NSAIDs are also rapidly metabolised in koalas and have little efficacy at currently suggested doses 49. Anti-Chlamydia antibiotics such as chloramphenicol are also degraded rapidly by koalas; treatment with a single dose applicable for humans is insufficient in koalas, which require a daily dose for up to 30 to 45 days. This discovery of CYP2C gene expression levels will inform new research into the pharmacokinetics of medicines in koalas.

Taste, smell and food choice
Like many specialist folivores, koalas are notoriously selective feeders, making food choices both to target nutrients and to avoid PSMs 50. Koalas have been observed to sniff leaves before tasting them 51, and their acute discrimination has been correlated with the complexity and concentration of PSMs 52. This suggests an important role for olfaction and vomerolfaction, as well as taste. While most herbivores circumvent plant chemical defences by detoxifying one or a few compounds 53, the complexity of eucalyptus PSMs, in combination with the terpene expansion in eucalypts, led us to hypothesise that the koala requires superior capabilities both in specialist detection and in PSM detoxification. We therefore investigated the genomic basis of the koala's taste and smell senses, and found multiple gene family expansions that could enhance its ability to make food choices.
Here we report an expansion of one lineage of vomeronasal receptor type 1 (V1R) genes associated with the detection of non-volatile odorants (Supplementary Note 3.4). There are six in koala, compared with one in the Tasmanian devil and gray short-tailed opossum, and none found in tammar wallaby, human, mouse, dog, platypus or chicken. The expansion of one lineage of V1R genes is consistent with the koala's ability to discriminate between diverse PSMs. Surprisingly, given the degree of its dietary specialisation, the olfactory receptor (OR) genes characterised in koala (1,169 OR genes) revealed a gene repertoire slightly smaller than that of gray short-tailed opossum (1,431 genes), tammar wallaby (1,660 genes) and Tasmanian devil (1,279 genes) (Supplementary Note 3.5). This may be understood in the context of relaxed selection on olfactory receptors among dietary specialists 54.
We also report genomic evidence of expansions within the taste receptor families that would enable the koala to optimise ingestion of leaves with a higher moisture and nutrient content in concert with the concentration of toxic PSMs in their food plants. The koala's ability to 'taste water' is likely to be enhanced by an apparent functional duplication of the aquaporin 5 gene 55-57 (Supplementary Table 14; Supplementary Note 3.6).
The TAS2R family has a role in 'bitter' taste, enabling recognition of structural toxins such as terpenes, phenols and glycosides. These are found in various levels in eucalypts as PSMs 5,40,41,58. In marsupials the TAS2R family includes the orthologous repertoires from eutherians, and in addition three specific expansions in the last common ancestor shared by all marsupials 59,60 ( Fig. 2). Massive koala-specific duplications in four marsupial orthologous groups have produced a large koala TAS2R repertoire of 24 genes (Fig. 2). The koala has more TAS2Rs than any other Australian marsupial, and amongst the most of all mammal species 59,60, including paralogs of human and mouse receptors whose agonists are toxic glycosides (Supplementary Note 3.7; Supplementary Table 15). The TAS1R gene families, responsible for sweet taste and umami amino acid perception, have previously been reported as pseudogenized in eutherians with highly specialised diets, such as the giant panda 61. In the koala, however, we found that all TAS1R genes are putatively functional ( Supplementary Fig. 7).

Genomics of an induced ovulator
Koala reproduction is particularly interesting because the koala is an induced ovulator 62.

Genomic characterisation of koala milk
A koala young is about the size of a kidney bean and weighs <0.5 gram. It crawls into the mother's backward-opening pouch and attaches to a teat, where it remains for 6-7 months. It continues to suck after it has left the pouch until about a year old.
Analysis of the genome, in conjunction with a mammary transcriptome and a milk proteome, enabled us to characterise the main components of koala milk (Supplementary Fig. 8; Supplementary Table 16; Supplementary Note 3.9; and 63). The high-quality assembly of the genome enabled both identification of marsupial-specific genes and determination of their evolutionary origins based on their genomic locations. For instance, we found that four Late Lactation Protein (LLP) genes are tightly linked to both Trichosurin and Beta-lactoglobulin (Supplementary Fig. 8), potentially allowing marsupials to fine-tune milk protein composition across the stages of lactation to meet the changing needs of their young.
Meanwhile, the koala Marsupial Milk 1 (MM1) gene is located close to the gene encoding Very Early Lactation Protein (VELP), an ortholog of Glycam1 (or PP3) that encodes a eutherian antimicrobial protein 54 (Supplementary Fig. 8). This region in eutherians contains an array of short glycoproteins that have antimicrobial properties and are found in secretions such as milk, tears and sweat. We propose that MM1 has an antimicrobial role in marsupial milk, along with three other short novel genes located in the same region. We also detected expansions in another antimicrobial gene family, the cathelicidins. We showed that Phci-CATH5 has broad-spectrum antimicrobial activity against a range of bacteria and fungi (unpublished data E.P., Y.C., D.O., K.B.), is able to significantly reduce the infectivity of Chlamydia pecorum by rapidly inactivating elementary bodies prior to infection, and is thus a potential topical agent for the treatment of ocular chlamydiosis (unpublished data E.P., Y.C., D.O., K.B.).

Koala immunome and disease
At the time of European settlement koalas were widespread in eastern mainland Australia, from north Queensland to the south-eastern corner of South Australia. Today they are mainly confined to the east coast and are listed as 'vulnerable' under Australia's Environment Protection and Biodiversity Conservation Act 1999 64. There is strong evidence to suggest that some fragmented populations of koalas are already facing extinction, particularly in formerly densely populated koala territories in south-east Queensland and northern NSW. A major challenge for the conservation of these declining koala populations is the high prevalence of disease, especially caused by the obligate intracellular bacterial pathogen, Chlamydia pecorum, which is found across the range, with the exception of some offshore islands 65. A primary challenge for managing these populations has been the lack of knowledge about the koala immune response to disease. Recent modelling suggests the best way to stabilise heavily affected koala populations is to target disease 66. Of the more than 1000 koalas presenting annually to wildlife hospitals in Queensland and NSW, 40% have late-stage chlamydial disease and cannot be rehabilitated. Annotation of koala immune genes enabled us to study variation within candidate genes known to play a role in resistance and susceptibility to chlamydia infection in other species (Supplementary  Tables 18-20). Basic case/control association tests for five koalas involved in a chlamydia vaccination trial revealed the MHCII DMA and DMB genes, as well as the CD8-a gene, may be involved in differential immune responses to chlamydia vaccine (Supplementary Note 3.11, Supplementary Table 21). 
We also conducted differential expression analysis of RNASeq data from conjunctival tissue collected from koalas at necropsy, both with and without signs of ocular chlamydiosis, revealing that in diseased animals 1508 of the 26558 annotated genes (5.7%) were upregulated by greater than two-fold, while 685 (2.6%) were downregulated by greater than two-fold when compared with healthy animals (Supplementary Note 3.12 and Supplementary Fig. 9). In diseased animals, upregulated genes were associated with GO terms for a range of immunological processes, including signatures of leukocyte infiltration (Supplementary Fig. 9). Immune responses in the affected conjunctivas were skewed towards Th1 rather than Th2 responses. Proinflammatory mediators such as CCL20, IL1α, IL1β, IL6 and SSA1 were also upregulated. As in human trachoma, this cascade of proinflammatory products may help to clear the infection but may also lead to tissue damage in the host 71. Furthermore, resolution of human trachoma infection is thought to require an IFN-γ driven Th1 response 72, and in diseased koalas we found that IFN-γ was upregulated 4.7-fold in the conjunctival tissue. These annotated koala immune genes, the first data on the mucosal immune response to chlamydial disease, will now enable us to define features of protective versus pathogenic immunological responses to the disease and may be invaluable for effective vaccine design.
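The two-fold criterion used above can be illustrated with a minimal sketch. It is not the differential expression pipeline used in the study (see Supplementary Note 3.12); the function name, the pseudocount and the toy expression values are assumptions made purely for illustration. The final two lines simply re-derive the percentages quoted in the text from the reported gene counts.

```python
import math


def classify_fold_change(diseased, healthy, threshold=2.0, pseudocount=1.0):
    """Label a gene 'up', 'down' or 'unchanged' from two mean expression
    values. The pseudocount (an assumption) avoids division by zero."""
    log2_fc = math.log2((diseased + pseudocount) / (healthy + pseudocount))
    cutoff = math.log2(threshold)
    if log2_fc > cutoff:
        return "up"
    if log2_fc < -cutoff:
        return "down"
    return "unchanged"


# Hypothetical mean expression values: (diseased, healthy).
toy_expression = {"geneA": (99, 19), "geneB": (9, 39), "geneC": (50, 45)}
for gene, (diseased, healthy) in toy_expression.items():
    print(gene, classify_fold_change(diseased, healthy))

# Proportions reported in the text, from the stated gene counts.
print(round(100 * 1508 / 26558, 1))  # 5.7
print(round(100 * 685 / 26558, 1))   # 2.6
```

A real analysis would also test each gene for statistical significance across replicates; the fold-change cutoff alone only describes effect size.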
Koala genomes are undergoing genomic invasion by Koala Retrovirus (KoRV) 73, which is spreading from the north of the country to the south. Both endogenous (germline transmission) and exogenous (infectious "horizontal" transmission) forms are extant 74. Our results provide the first comprehensive view of KoRV insertions in a koala genome. We found a total of 73 insertions in the phaCin_unsw_v4.1 assembly (detailed in Supplementary Table 22). It is likely that most of these 73 loci are endogenous, consistent with our observation of integration breakpoint sequences that are shared with one or both of the other koala genomes reported (Supplementary Tables 23-24).
We investigated the sites of KoRV insertion to define their proximity to protein-coding genes and explore possible disruptions. This revealed insertions into 24 protein-coding genes (Supplementary Table 25). However, none is likely to disrupt protein-coding capacity, since 22 insertions are in introns and the other two are in 3' untranslated regions. Transcription proceeding from the proviral LTR could possibly affect the transcription of the host genes.
Understanding the genetics of host resistance to chlamydia and the aetiology of the retrovirus will help inform the development of vaccines against both diseases, as well as translocation strategies.

Genome-informed conservation
Broad-scale population management of koalas is critical to conservation efforts. This is challenging because distribution models are not easily generalised across bio-regions, and further complicated by unique regional issues described above. Since it is not possible to generalise management, it is imperative that decisions are informed by empirical data relevant to each bio-region.
The koala genome offered a unique opportunity to combine historical evolutionary data with high-resolution contemporary population genomic markers in order to address these management challenges. To infer the ancient demographic history of the species, we applied the pairwise sequentially Markovian coalescent (PSMC) method 75 to the long-read reference genome and short-read data from two other koalas (Fig. 3a; Supplementary Fig. 10; and Methods). The data show that the modern koala, which appeared in the fossil record 350 kya 2, underwent an initial increase in population, followed by a rapid and widespread decrease in population size ~30,000-40,000 years ago. This is consistent with fossil evidence of rapid declines in multiple Australian species, including the extinct megafauna, 40,000-50,000 years ago 84 and 30,000-40,000 years ago 85. The koala demonstrates that there was ongoing survival of at least some species present at the time 85.
Distinct PSMC profiles of the koalas from two geographic areas, and their failure to coalesce, suggest some regional differences in koala populations, including impediments to gene flow (Fig. 3a). Regional differentiation was also detected in analyses of mtDNA 80,86, although over a shorter time scale.
We analysed populations of recent koala samples using 1200 SNPs derived from targeted capture libraries mapped to the koala genome (Supplementary Note 5.2). We found significant levels of genetic diversity with limited fine-scale differentiation, consistent with long-term connectivity across regions. We also found clear evidence of low genetic diversity in southern koalas, consistent with a recent history of sequential translocations 10,80,87,88 (Fig. 3b). At a continental scale, we reveal biogeographic barriers to gene flow associated with the Brisbane Valley and Clarence River, as identified by mtDNA studies 80,81, and a previously undetected barrier associated with the Hunter Valley (Fig. 3b). Levels of inbreeding varied across regions (Fig. 3c), but the northern populations most under threat in NSW and Queensland currently show high levels of genetic diversity.
The information generated here provides a critical foundation for conservation management to maintain gene flow regionally whilst incorporating the genetic legacy of biogeographic barriers. Furthermore, the stark contrast in genome-wide levels of diversity between southern and northern populations highlights the detrimental consequences, for genome-wide genetic diversity, of the unmonitored use of small isolated populations as founders for re-establishing and/or rescuing populations. Low levels of genetic diversity in southern koalas have been associated with genetic abnormalities consistent with inbreeding depression, including testicular abnormalities 89. Now that we understand the consequences of past translocations, and the existing genetic structure, it is clear that maintaining and facilitating gene flow via habitat connectivity will be the most effective means of ensuring genetically 'healthy' koala populations long term. However, where more intensive measures such as translocation are required to rescue genetically depauperate southern populations, these tools and data provide the basis for decisions that maximize benefits whilst minimizing risks 90,91. Future utilization of these SNPs will also include tracking of individual pedigrees in captive koala populations and in those wild populations being intensively monitored.
The koala genome offers considerable insights into historic and contemporary population dynamics, providing essential evolutionary and genetic context for a species that is the focus of considerable management actions and resources. By providing a deeper understanding of disease dynamics and population genetic processes including the maintenance and monitoring of gene flow, it will enable the development of strategies necessary to preserve the species, from the preservation of habitat corridors through to the genetic rescue of isolated populations. Some of this work is already underway. As members of government advisory committees, some of the authors have initiated inclusion of genomic information into the NSW Koala Strategy. This will be used to inform koala management in the state with the goal of securing koalas in the wild for the future.

Discussion
The koala genome provides the highest quality marsupial genome to date. This assembly has enabled insights into the colonisation of the koala genome by an exogenous retrovirus and revealed the architecture of the immune system necessary to study and treat emerging diseases that threaten koala populations. A greater understanding of genetic diversity across the species will guide the selection of individuals from genetically-healthy northern populations to augment genetically restricted populations in the south, bearing in mind that Chlamydia has not been detected on some off-shore islands, so risk assessment should be carried out before embarking on translocations. Sequencing the genome has significantly advanced our understanding of the unique biology of the koala, including detoxification pathways and innovations in taste and smell to enable food choices in an obligate folivore. Long term survival of the species depends on understanding the impacts of disease and management of genetic diversity, as well as the koala's ability to source moisture and select suitable foraging trees. This is particularly important given the koala's narrow food range, which makes it especially vulnerable to a changing climate. The genome provides a springboard for conservation of this biologically unique and iconic Australian species.

Koala Genome Online Methods
A full description of the Methods can be found in the Supplementary Information. No statistical methods were used to predetermine sample size.

Genome sequencing and assembly of the koala reference genome
Sequencing. Samples were obtained as part of veterinary care at the Port Macquarie Koala Hospital and Australia Zoo Wildlife Hospital, and from the Australian Museum Tissue Collection. Sample collection was performed in accordance with methods approved by the Australian Museum Animal Ethics Committee (Permit Numbers: 11-03, 15-05). "Pacific Chocolate" (Australian Museum registration M.45022), a female from Port Macquarie in northeast New South Wales, was sampled immediately after euthanasia by veterinary staff at the Port Macquarie Koala Hospital (27/06/2012), following unsuccessful treatment of severe chlamydiosis. Two koalas from southeast Queensland, a female, "Bilbo" (Australian Museum registration M.47724) from Upper Brookfield, and a male, "Birke", from Birkdale, were sampled following euthanasia due to severe chlamydiosis (20/08/2015) and severe injuries (26/08/2012), respectively. High Molecular Weight (HMW) DNA was extracted from heart tissue for "Pacific Chocolate" and kidney tissue for "Birke" using the DNeasy Blood and Tissue kit (Qiagen), with RNaseA (Qiagen) added following digestion. HMW DNA from "Bilbo" was extracted from spleen tissue using Genomic-Tip 100/G columns (Qiagen) and DNA Buffer set (Qiagen). Fifteen SMRTbell libraries were prepared (RCG) as per the PacBio 20 kb template preparation protocol, with an additional damage repair step performed after size selection. A minimum size cutoff of 15 or 20 kb was utilized in the size selection stage using the Sage Science BluePippin™ system. The libraries were sequenced on the Pacific Biosciences RS II platform (Pacific Biosciences) employing P6 C4 chemistry with either 240 min or 360 min movie lengths. A total of 272 SMRT Cells were sequenced to give an estimated overall coverage of 57.3x based on a genome size of 3.5 Gbp. A TruSeq DNA PCR-free library was constructed with a mean library insert size of 450 bp. A total of 400,473,997 paired-end reads were generated, yielding a minimum coverage of 34x.
HMW gDNA was sequenced on an Illumina 150 bp PE HiSeq X Ten sequencing run (Illumina). Long-read coverage before assembly was estimated as the total bases from reads divided by the 3.5 Gbp genome size; the estimated total coverage is 57.3x. FALCON leverages error-corrected long seed reads to generate an overlap-layout-consensus representation of the genome. Approximately 23x of long reads are required by FALCON as seed reads, and the rest are used for error correction. The seed read length, taken at the 60th percentile of read lengths, was calculated as 10,889 bp. The FALCON assembly was run on the Amazon Web Services Tokyo region using r3.8xlarge spot instances as compute nodes, with the number of instances varying from 12 to 20 depending on availability.
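The coverage estimate described above (total read bases divided by genome size) can be sketched as follows. The function names are illustrative, and treating the 400,473,997 paired-end reads reported in the sequencing description as read pairs of 2 x 150 bp is an assumption made here; under that assumption the result agrees with the reported minimum short-read coverage of 34x.

```python
GENOME_SIZE_BP = 3.5e9  # genome size assumed in the text


def paired_end_bases(n_pairs, read_length_bp):
    """Total bases in a paired-end run, counting two reads per pair."""
    return n_pairs * 2 * read_length_bp


def estimated_coverage(total_read_bases, genome_size_bp=GENOME_SIZE_BP):
    """Fold coverage = total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bp


# Assumption: the reported read count refers to pairs of 150 bp reads.
short_read_bases = paired_end_bases(400_473_997, 150)
print(f"{estimated_coverage(short_read_bases):.1f}x")  # 34.3x
```

The same function applies to the long-read data: dividing the total PacBio read bases by 3.5 Gbp gives the 57.3x figure quoted in the text.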
After filtering low-quality and duplicate reads, approximately 57.3-fold long-read coverage was used for assembly. BUSCO analysis was run on the primary contigs of the draft assembly, and on all genomes under comparison, against the mammalian ortholog database with the --long parameter. This initial analysis showed that the assembly reached only about 60% genome completeness, suggesting a high number of indels in the draft genome. The genome polishing tool Pilon 92 was employed to improve the draft assembly from FALCON. About 30x of 150 bp paired-end Illumina X Ten short reads from "Bilbo" were used as input for this polishing process, which was run on a compute cluster provided by Intersect Australia Limited.
We implemented the method of Deakin et al. 24 for super-scaffolding. Briefly, tables of homologous genes were generated using the physical order of genes on the chromosomes of gray short-tailed opossum and tammar wallaby as references and koala phaCin_unsw_v4.1 (Bilbo) as target (Supplementary Table 4).

Analysis of centromeric regions and repeat structure
Repeat content was called using RepeatMasker with combined RepBase libraries (v 2015-08-07) and RepeatModeler calls generated from the genome assemblies. The resulting calls were then filtered using custom Python scripts to remove short fragments and to combine tandem or overlapping repeat calls. To characterize the centromeric regions of the genome, chromatin immunoprecipitation (ChIP) was performed using the Invitrogen MAGnify Chromatin Immunoprecipitation System (Revision 6). Repeat content of the centromeric regions was determined using RepBase annotated marsupial repeats and output from RepeatModeler analysis of koala. RepeatMasker was used to locate repeats. Candidate centromeric segments were identified using two sliding window analyses: one with a window size of 200 kb and a step size of 100 kb, and one with a window size of 20 kb and a step size of 10 kb. Small tandem repeats were discovered in koala RSX sequence using the Tandem Repeats Finder program 93, using +2, -3, and -7 as scores for match, mismatch and gap opening respectively. Alignments of consensus repeat units with the RSX sequence were processed to obtain nucleotide frequency at each position. ChIP-seq data are deposited under BioProject: PRJNA415832 and GEO submission: GSE111153 (see URLs).
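The two sliding window analyses can be illustrated with a short sketch. The statistic computed per window is not stated here, so the example simply counts hypothetical feature positions (for instance, ChIP read midpoints or repeat starts) in each window. The scaffold length, the positions, and the function names are assumptions made for illustration only.

```python
from typing import Iterator, List, Tuple


def sliding_windows(length: int, window: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) windows covering a sequence of the given length."""
    start = 0
    while start < length:
        yield start, min(start + window, length)
        if start + window >= length:
            break  # the last window already reaches the end of the sequence
        start += step


def count_per_window(positions: List[int], length: int, window: int, step: int):
    """Count positions falling in each window (naive scan, adequate for a sketch)."""
    return [
        (s, e, sum(s <= p < e for p in positions))
        for s, e in sliding_windows(length, window, step)
    ]


# Hypothetical scaffold length and feature positions (not from the paper).
scaffold_length = 500_000
positions = [5_000, 150_000, 160_000, 170_000, 420_000]

# Window and step sizes as given in the text.
coarse = count_per_window(positions, scaffold_length, 200_000, 100_000)
fine = count_per_window(positions, scaffold_length, 20_000, 10_000)

for start, end, n in coarse:
    print(f"{start}-{end}: {n} features")
```

Because each step is half the window size, every position away from the scaffold ends falls in two windows, which smooths the signal at window boundaries.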

Genome annotation and gene family analysis
Annotations were generated using the automated genome annotation pipeline MAKER 94,95. We masked repeats in the assembly by providing MAKER with a koala-specific repeat library generated with RepeatModeler 96, against which RepeatMasker (v 4.0.3) 97 queried genomic contigs. Gene annotations were made using a protein database combining the UniProt/Swiss-Prot protein database 98, all sequences for human (Homo sapiens), gray short-tailed opossum (Monodelphis domestica), Tasmanian devil (Sarcophilus harrisii) and tammar wallaby (Notamacropus eugenii) from the NCBI protein database 99, and a curated set of marsupial and monotreme immune genes 100. We downloaded all published koala mRNAseq reads from SRA (PRJNA230900, PRJNA327021) and reassembled male, female and mammary transcriptomes de novo using the default parameters of Trinity v 2.3.2 101. Each assembly was filtered such that contigs accounting for 90% of mapped reads were passed to MAKER as homologous transcript evidence. Ab initio gene predictions were made using the programs SNAP 102, GeneMark 103, and Augustus 104. Three iterative runs of MAKER were used to produce the final gene set.
Gene families were called using NCBI BLAST (2.3.0) and OrthoMCL (2.0.9, 105). The protein sequences of genes belonging to orthogroups identified by OrthoMCL were aligned using MAFFT (7.2.71, 106), and gene trees were inferred using TreeBeST (1.9.2, 107), with a species tree provided to guide the phylogenetic reconstruction. Custom scripts were applied to identify families with expansion within the koala, Diprotodontia, Australidelphia and marsupial lineages.
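The transcript filtering step (keeping contigs that account for 90% of mapped reads) could be implemented in several ways. The sketch below assumes one plausible reading: contigs are ranked by mapped read count and retained from the top until the cumulative count reaches 90% of the total. The ranking rule, the function name, and the example counts are assumptions, not the authors' documented procedure.

```python
from typing import Dict, List


def contigs_covering_fraction(read_counts: Dict[str, int], fraction: float = 0.90) -> List[str]:
    """Return the best-supported contigs that together reach `fraction` of mapped reads."""
    total = sum(read_counts.values())
    if total == 0:
        return []
    kept: List[str] = []
    cumulative = 0
    # Visit contigs from most to fewest mapped reads.
    for contig, count in sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True):
        if cumulative >= fraction * total:
            break
        kept.append(contig)
        cumulative += count
    return kept


# Hypothetical mapped read counts per assembled contig.
counts = {"c1": 500, "c2": 300, "c3": 150, "c4": 40, "c5": 10}
kept = contigs_covering_fraction(counts)
print(kept)
```

In this toy example the three best-supported contigs hold 950 of 1,000 mapped reads, so the two weakly supported contigs are dropped.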

RNASeq analysis of koala conjunctival tissue samples
Conjunctival tissue samples were collected from 26 koalas euthanised due to injury or disease by veterinarians at Australia Zoo Wildlife Hospital, Currumbin Wildlife Hospital and Moggill Koala Hospital. The collection protocol was approved by the University of the Sunshine Coast Animal Ethics Committee (AN/S/15/36). Health assessments of the eye were performed by an experienced veterinarian, and eyes were classified as either 'healthy' (n=13) or 'diseased' (n=13) based on evidence of gross pathology consistent with ocular chlamydiosis 65. Conjunctival tissue samples from each animal were placed directly in RNALater (Qiagen, Germany) buffer overnight at 4°C prior to storing at -80°C for later use. RNA was extracted using an RNeasy Mini Kit (Qiagen, Germany), according to the manufacturer's instructions, with an on-column DNase treatment to eliminate contaminating DNA from the sample. The concentration and quality of the isolated RNA were determined using a NanoDrop ND-1000 Spectrophotometer and Agilent BioAnalyser (Agilent, USA). Library construction and sequencing were performed by The Ramaciotti Centre (UNSW, Kensington, NSW) with TruSeq stranded mRNA chemistry on a NextSeq500 (Illumina, USA). Reads were mapped to the phaCin_unsw_v4.1 assembly using the default parameters of STAR 110 and counts summed over features using featureCounts 111. Differentially expressed genes were called using DESeq2 112 as implemented in the SARTools package 113. Reads have been deposited in the SRA under BioProject accession PRJEB19389.

Koala Retrovirus (KoRV)
We searched for KoRV sequences within the scaffolds of the phaCin_unsw_v4.1 assembly of the Bilbo genome sequence, and also within alternative contig sequences prior to their correction by Pilon (since we noticed that in a few cases KoRV sequences were removed in the course of the sequence polishing process). KoRV sequences were found by using the program blastn 114 to search with KoRV genome reference sequences (GenBank AF151794 and AB721500) as well as with a recKoRV sequence from Löber et al. 115. Search results were converted to BED format and the KoRV and recKoRV components of each read were merged with the program mergeBed. KoRV insertions within genes were identified using the program intersectBed 116. Pre-integration allelic sequences were found by using blastn 114 to search the phaCin_unsw_v4.1 genome sequence assembly with sequences flanking KoRV/recKoRV integrations as queries. In two cases the expected allelic sequence was not present in the Bilbo genome, but was found by searching the genome of another koala (Pacific Chocolate). To check the expected relationship between pairs of allelic sequences, we inspected dotplot alignments of representative sequences (not shown) created with the program dotter 117.
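The merge and intersect steps operate on BED-style intervals. The sketch below is a plain Python approximation of that logic, written to show the idea rather than to reproduce mergeBed or intersectBed behaviour exactly. It does not call bedtools, and the interval coordinates and function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (sequence name, start, end), half-open as in BED


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals on the same sequence."""
    merged: List[Interval] = []
    for name, start, end in sorted(intervals):
        if merged and merged[-1][0] == name and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (name, prev[1], max(prev[2], end))
        else:
            merged.append((name, start, end))
    return merged


def overlapping(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the intervals in `a` that overlap at least one interval in `b`."""
    return [
        x for x in a
        if any(x[0] == y[0] and x[1] < y[2] and y[1] < x[2] for y in b)
    ]


# Hypothetical blastn hits converted to BED-style intervals.
hits = [("scaf1", 100, 500), ("scaf1", 450, 900), ("scaf1", 2000, 2500), ("scaf2", 10, 50)]
# Hypothetical gene annotations.
genes = [("scaf1", 800, 1500), ("scaf2", 60, 100)]

insertions = merge_intervals(hits)
insertions_in_genes = overlapping(insertions, genes)
print(insertions)
print(insertions_in_genes)
```

Merging first ensures that fragmented hits from a single provirus are counted as one insertion before they are tested for overlap with genes.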

Koala population genomics
Historical population size-Demographic history was inferred from the diploid sequence of each of the three koalas, using a pairwise sequential Markovian coalescent (PSMC) method 75. We conducted a range of preliminary analyses and found that PSMC plots were not sensitive to the values chosen for the maximum number of iterations (N), the number of free atomic time intervals (p), the maximum time to the most recent common ancestor (t), and the initial value of rho (r). Based on these investigations, our final PSMC analyses of the three genome sequences used values of N=25, t=5, r=1, and p=4+25*2+4+6.
The number of atomic time intervals is similar to that recommended for analyses of modern human genomes 75, which are similar in size to the koala genomes. We determined the variance in estimates of Ne using 100 bootstrap replicates. Replicate analyses in which we varied the values of p, t, and r produced PSMC plots that were broadly similar to those using our chosen 'optimal' settings (Supplementary Fig. 10).
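The pattern string p=4+25*2+4+6 encodes how atomic time intervals are grouped into free parameters. As a hedged illustration, the sketch below parses such a pattern under the interpretation that each term `n*w` means n parameters spanning w atomic intervals each, and a bare term `w` means one parameter spanning w intervals. The function name is illustrative.

```python
from typing import Tuple


def parse_psmc_pattern(pattern: str) -> Tuple[int, int]:
    """Return (atomic time intervals, free interval parameters) for a PSMC pattern."""
    atomic = 0
    free = 0
    for term in pattern.split("+"):
        if "*" in term:
            repeats, width = (int(x) for x in term.split("*"))
        else:
            repeats, width = 1, int(term)
        atomic += repeats * width  # intervals spanned by this term
        free += repeats            # one free parameter per group
    return atomic, free


atomic_intervals, free_parameters = parse_psmc_pattern("4+25*2+4+6")
print(atomic_intervals, free_parameters)
```

Under this reading, the pattern used here corresponds to 64 atomic time intervals grouped into 28 free interval parameters.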
The plots of demographic history were scaled using a generation length of 7 years, corresponding to the midpoint of the range of 6 to 8 years estimated for the koala 118. In the absence of a mutation rate estimate for koala, we applied the midpoint of the estimates of the human mutation rate (1.45 x 10^-8 mutations per site per generation; summarised by 119) and the mouse mutation rate (5.4 x 10^-9 mutations per site per generation 120) (Supplementary Fig. 10). The koala mutation rate is likely to be closer to that of humans, based on greater similarity in genome size, life history, and effective population size, relative to mouse 119.
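To show how the generation length and mutation rate enter the scaling, the sketch below converts PSMC's scaled output to years and effective population size. It assumes the scaling convention described in the PSMC documentation (N0 = theta0 / (4 x mu x s), time in generations = 2 x N0 x t, Ne = N0 x lambda, with bin size s = 100). The values of theta0, t_k and lambda_k are hypothetical, and the two rates are applied separately purely for comparison.

```python
def scale_psmc(theta0, t_k, lambda_k, mu, generation_years, bin_size=100):
    """Convert scaled PSMC time and population size to years and effective size.

    Assumed convention: N0 = theta0 / (4 * mu * bin_size),
    generations = 2 * N0 * t_k, and Ne = N0 * lambda_k.
    """
    n0 = theta0 / (4.0 * mu * bin_size)
    years = 2.0 * n0 * t_k * generation_years
    ne = n0 * lambda_k
    return years, ne


GENERATION_YEARS = 7  # generation length used in the text
HUMAN_MU = 1.45e-8    # mutations per site per generation (from the text)
MOUSE_MU = 5.4e-9     # mutations per site per generation (from the text)

# Hypothetical PSMC output values for a single time interval.
theta0, t_k, lambda_k = 0.00058, 0.5, 2.0

years_human, ne_human = scale_psmc(theta0, t_k, lambda_k, HUMAN_MU, GENERATION_YEARS)
years_mouse, ne_mouse = scale_psmc(theta0, t_k, lambda_k, MOUSE_MU, GENERATION_YEARS)

print(f"human rate: {years_human:.0f} years ago, Ne = {ne_human:.0f}")
print(f"mouse rate: {years_mouse:.0f} years ago, Ne = {ne_mouse:.0f}")
```

Both axes scale inversely with the mutation rate, so a lower assumed rate pushes the same demographic event further back in time and implies a larger effective population size.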
Contemporary population analysis-Fifty-six koalas were sampled throughout the distribution using a hierarchical approach to allow examination of genetic relationships at a range of scales, from familial to range-wide. All individuals were sequenced using a target capture approach described in 121, with a kit targeting 2167 marsupial exon sequences. Illumina sequence reads were quality-filtered and trimmed (see 74 for details) and mapped to the koala genome (Bowtie2 v2.2.4 122). A panel of 4257 SNP sites was identified (using GATK version 3.3-0-g37228af 123) that showed expected levels of relatedness and differentiation among the sampled individuals.
A panel of 1200 SNPs (obtained by mapping to targets, filtering, and selecting one SNP per target) showed fine-scale regional differentiation consistent with evolutionary history and recent population management (Fig. 3).
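Selecting one SNP per target reduces linkage between markers drawn from the same exon. The rule used to choose that SNP is not stated here, so the sketch below assumes the highest-quality SNP in each target is kept. The `Snp` record, its fields, and the example values are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Snp(NamedTuple):
    target: str     # name of the capture target (exon)
    position: int   # position within the target
    quality: float  # variant quality score


def one_snp_per_target(snps: List[Snp]) -> List[Snp]:
    """Keep a single SNP per target, here the one with the highest quality."""
    best: Dict[str, Snp] = {}
    for snp in snps:
        current = best.get(snp.target)
        if current is None or snp.quality > current.quality:
            best[snp.target] = snp
    return sorted(best.values())


# Hypothetical filtered SNP calls.
snps = [
    Snp("exon_A", 120, 50.0),
    Snp("exon_A", 300, 90.0),
    Snp("exon_B", 45, 70.0),
]
panel = one_snp_per_target(snps)
print(panel)
```

Other selection rules, such as taking the first SNP by position or a random SNP per target, would fit the same structure by changing the comparison.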

Code Availability Statement
Custom scripts 1) to identify gene families with expansion within the koala, Diprotodontia, Australidelphia and marsupial lineages; 2) to identify refined repeat calls; and 3) to generate SNP genotypes from exon capture data are available at: https://github.com/DrRebeccaJ/KoalaGenome

Data Availability Statement
The Phascolarctos cinereus BioSamples are as follows: Bilbo 61053 (SAMN06198159), Pacific Chocolate (SAMEA91939168) and Birke (SAMEA103910665). Koala Genome Consortium Projects for the Koala Whole Genome Shotgun project and genome assembly are registered under the umbrella BioProject PRJEB19389 (union of PRJEB5196 and PRJNA359763).
Transcriptome data are submitted under PRJNA230900 (adrenal, brain, heart, lung, kidney, uterus, liver and spleen) and PRJNA327021 (milk and mammary gland). ChIP-seq data have been deposited under BioProject PRJNA415832. Illumina short-read data for Birke are submitted under PRJEB19982.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

TAS2R genes are responsible for bitter taste perception, a role that makes them very important in koalas' need to optimise nutrient content against the high concentration of plant secondary metabolites in the various plants on which they feed. (a) Maximum-likelihood tree of TAS2Rs (including pseudogenes) in the four marsupials, where the sequences contained 250 amino acids. 28 representative TAS2Rs of orthologous gene groups (OGGs) in eutherians (red circles) and 7 platypus TAS2Rs (grey circles) were also used.

(a) Phylogenetic tree of the CYP2 gene family in the koala (31 members of CYP2), as compared with marsupials: tammar wallaby, Tasmanian devil, gray short-tailed opossum; and eutherian mammals: human, rat, mouse, dog, platypus; and outgroup chicken. Two independent monophyletic expansions are seen in koala, in the CYP2C subfamily (highlighted by red sectors).

Table 1. Comparison of assembly quality between koala genome assembly phaCin_unsw_v4.1 and published marsupial and monotreme genomes.