Introduction

Mammals have a complex immune system that predates the divergence of eutherians and marsupials, with these two groups having broad commonalities in the gene content, structure and function of their immune systems1,2,3,4. However, a crucial difference exists in their mode of reproduction. While eutherians have a long gestation period with extended uterine development of young, marsupials have a short gestation followed by a prolonged period of development, usually in the pouch5. Marsupial young are born undeveloped and immunologically naïve. At birth they do not have an adaptive immune system6,7 yet are able to survive the pouch environment, known to contain an array of potentially pathogenic bacterial and fungal species8,9. After an extended early lactation period during which the young is fixed to the mother’s teat, marsupial young emerge from the pouch and face new immune challenges during mid and late lactation as they are exposed to novel pathogens outside the pouch.

Due to the long period of extra-uterine development, marsupial and eutherian lactation differ in several ways. While the composition of eutherian milk is relatively consistent after an initial colostral stage5, the volume of marsupial milk and its macronutrient, micronutrient and protein composition change dramatically throughout lactation10,11,12,13. The concentration of immune compounds including immunoglobulins (Igs) varies throughout lactation to meet the changing needs of marsupial young13,14,15,16. This distinct form of lactation, where milk provides crucial immune protection of the developing young across the period of lactation14,15,16,17, is reflected in the immunogenetic repertoire of marsupials. For instance, previous research has discovered an expanded repertoire of cathelicidins in marsupials that are expressed in the mammary gland and have broad spectrum antibacterial activity, including against multi-drug resistant strains1,18,19.

Koalas (Phascolarctos cinereus) are a unique and iconic Australian marsupial, and the last surviving species of the family Phascolarctidae. Koalas are born after just 34–35 days of gestation20. Young first emerge from the pouch at around six to seven months, but they continue to suckle for up to one year21. The major cellular components of koala milk are immune cells including neutrophils and macrophages22. The temporal pattern of milk composition in koalas differs from that of other marsupials, indicating that koalas may have adopted a different lactation strategy from other marsupial species11. Koalas are of interest from an immunological perspective due to the infectious diseases they carry. Most significant among these is the transmissible and recently endogenised koala gamma-retrovirus (KoRV)23. This virus has been implicated in immunosuppression and immunomodulation of the host, and is associated with fatal lymphoma and leukaemia24. In addition, koalas are particularly susceptible to Chlamydia, which has a significant impact on female koala fertility25.

The aim of this study was to investigate the protein components of koala milk. To do this we used multiple data sources including a mammary transcriptome, milk proteomes from two koalas at two different stages of lactation, and the koala genome. We focus on the immune components of the milk and analyse the proteomic differences between the two stages of lactation.

Results

Most abundant transcripts and peptides

In this study we constructed a transcriptome from the mammary gland of a koala during early lactation, and two koala milk proteomes, one from early lactation and one from late lactation. The mammary transcriptome assembly contained a high number of contigs (nearly 225,000). We used CEGMA26 and BUSCO27 to search for a defined set of single-copy, conserved eukaryote and vertebrate orthologs in the assembly. CEGMA reported 245 complete alignments against the core set of 248 eukaryote orthologs (98.8% complete, up to 99.6% when partial alignments are included). BUSCO reported 85.9% complete (89.8% complete including partial alignments) of 3,023 conserved vertebrate orthologs. To remove assembly artefacts and transcriptional noise, we filtered out transcripts with Fragments Per Kilobase of transcript per Million mapped reads (FPKM) < 1, retaining 26,852 transcripts. Of these, 10,700 transcripts have unique BLAST28 hits to the SwissProt non-redundant database, providing an estimate of the number of proteins encoded in the transcriptome. The early lactation proteome contained 230 peptides, while the late lactation proteome contained 235 peptides. As the milk samples were snap frozen prior to our analyses, we cannot exclude the possibility that some proteins detected in the koala milk proteomes originate from lysis of cells in the milk. The two proteomes shared 106 peptides (46.1%). The 50 most highly expressed transcripts in the mammary gland transcriptome and the 50 most abundant peptides in the milk proteomes are shown in Table 1, while pie charts showing the percent abundance of the top 20 transcripts/peptides are shown in Fig. 1. See Supplementary Tables S1–3 for the top 200 expressed transcripts in the transcriptome and all transcripts matched in the proteomes.
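The expression filter and the summary percentages reported above can be illustrated with a minimal sketch. It assumes the standard FPKM definition (fragments × 10^9 divided by transcript length in bp × total mapped fragments); the function names and the toy counts in the usage note are hypothetical and are not taken from the pipeline used in this study.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def filter_by_fpkm(counts, lengths, min_fpkm=1.0):
    """Keep transcripts whose FPKM is at least min_fpkm.

    counts  : dict mapping transcript id -> mapped fragment count
    lengths : dict mapping transcript id -> transcript length in bp
    Returns a dict of transcript id -> FPKM for the retained transcripts.
    Assumption: the library size is the sum of all counts supplied here.
    """
    total = sum(counts.values())
    kept = {}
    for transcript_id, n_fragments in counts.items():
        value = fpkm(n_fragments, lengths[transcript_id], total)
        if value >= min_fpkm:
            kept[transcript_id] = value
    return kept


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole
```

For example, `percent(245, 248)` rounds to the 98.8% CEGMA completeness quoted above, and `percent(106, 230)` rounds to 46.1%, which suggests that the shared-peptide percentage is expressed relative to the early lactation proteome.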

Table 1 Top 50 most abundant transcripts/peptides in the mammary transcriptome and two milk proteomes.
Figure 1

Pie charts of peptides/transcripts based on relative abundance as a percentage.

The top 20 peptides/transcripts in each koala data set are labelled, with the remaining peptides/transcripts grouped under ‘All other’. (A) Early lactation mammary gland transcriptome. (B) Early lactation milk proteome. (C) Late lactation milk proteome.
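As an illustration of how such a chart can be prepared, the sketch below converts raw abundance values into percentages, keeps the top entries, and pools the remainder under a single label. The function name and the toy values in the test are hypothetical; the abundance measure actually used for Fig. 1 is not specified here.

```python
def top_n_percentages(abundance, n=20, other_label="All other"):
    """Return the n most abundant entries as percentages of the total,
    with all remaining entries summed under other_label.

    abundance : dict mapping protein/transcript name -> abundance value
    """
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    # Rank entries from most to least abundant.
    ranked = sorted(abundance.items(), key=lambda item: item[1], reverse=True)
    result = {name: 100.0 * value / total for name, value in ranked[:n]}
    remainder = sum(value for _, value in ranked[n:])
    if remainder > 0:
        result[other_label] = 100.0 * remainder / total
    return result
```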

Known major milk proteins were identified among the most abundant transcripts and peptides. β-lactoglobulin, a major nutrient protein29, was the most abundant transcript and peptide in the early lactation sample and was the second most abundant peptide in the late lactation sample. All three caseins (α, β and κ), key nutritional compounds in the milk, were highly abundant in all the samples investigated, and β-casein was the most abundant casein in both early and late lactation samples (10th most abundant peptide in both). Trichosurin, a protein unique to marsupials, was highly abundant in both early and late lactation (7th and 11th most abundant peptide respectively). In the late lactation milk sample, lactotransferrin was the most abundant peptide and was far more abundant (24.47% of peptides) than in the early lactation milk sample (0.08% of peptides). Late lactation protein (LLP), which is highly expressed during late lactation in wallaby and possum13,30, was not seen in the late lactation milk proteome. LLP was expressed in the early lactation transcriptome, although at a very low abundance. It may be that our single, opportunistic, late-lactation sample missed the window of high expression of this protein in the koala. Early lactation protein (ELP) and whey acidic protein (WAP) were both highly abundant in the early lactation proteome (5th and 6th respectively). The second most abundant peptide in early lactation was very early lactation protein (VELP).

Very early lactation protein

VELP is a protein previously identified in the milk of brushtail possum (Trichosurus vulpecula) and wallaby (Macropus eugenii)31,32. Homologs of this protein outside marsupials have not been previously identified. In the present study, we identified a full length koala VELP transcript and this was used to search the koala genome, wallaby mammary transcriptome12 and the Tasmanian devil (Sarcophilus harrisii) genome and milk transcriptome33, and full length orthologs were identified in these species. No ortholog could be identified in opossum (Monodelphis domestica) through BLAST and HMMer searches. The koala and devil sequences were identified on short scaffolds with no annotated genes, and in the devil no prediction was made at this locus by NCBI or Ensembl pipelines.

Using the VELP sequence we performed BLAST searches on the other available lactation transcriptomes: the mammary transcriptomes from the wallaby12 and the milk transcriptome from the devil33. We found that VELP was expressed during pregnancy, early lactation and mid-lactation in the tammar mammary gland, but not in late lactation. In the devil mid-lactation milk transcriptome, VELP was present but had very low expression. Koala VELP was highly abundant in the early lactation milk proteome, being the second most abundant peptide (13.3% of peptides). However, it was also abundant in the late lactation sample, being the 13th most abundant peptide.

Using BLAST and HMMer searches with full-length marsupial VELP sequences, we identified homology with the eutherian protein Glycam1, also known as PP3 or lactophorin. Glycam1 is a functional gene in only some eutherian mammals; in some species, including humans, pseudogenes have been identified34. To evaluate the relationship between VELP and Glycam1 further, we constructed an alignment using all available VELP and Glycam1 protein sequences (Fig. 2) and a maximum-likelihood phylogenetic tree using all marsupial VELP sequences and a phylogenetically representative subset of eutherian Glycam1 sequences (Fig. 3). Eutherian Glycam1 sequences range from 141 to 164 residues. The marsupial sequences are similar in length, with 159 or 160 residues. The tree groups the marsupial VELP and eutherian Glycam1 together, albeit with moderate bootstrap support (75%). We compared the genomic structure of cow and mouse Glycam1 and devil and koala VELP (Fig. 4) and found that they are highly similar. The genes each have four exons and the lengths of the exons and UTRs are very similar between the four species. The divergence of these sequences is high, both within marsupials and between marsupials and eutherians. Peptide sequence identity of the koala sequence with the wallaby and devil sequences was 75% and 67.5% respectively. Identity with eutherian sequences ranged from 19.3% to 23.4%. Comparison of the genomic context of marsupial VELP and eutherian Glycam1 was not possible, because no other flanking genes are present on the relatively short devil and koala genomic scaffolds to which VELP maps.
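Percent identity values such as those above depend on how gapped alignment columns are treated. The following minimal sketch computes identity between two rows of an existing alignment, assuming that columns containing a gap in either sequence are excluded from the denominator; this convention and the function name are assumptions for illustration and may differ from the calculation used in this study.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # ignore gapped columns
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Counting gapped columns in the denominator instead would lower the reported identity, which is one reason values for highly divergent sequences such as VELP and Glycam1 are best compared only when computed under the same convention.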

Figure 2

Alignment of marsupial VELP and eutherian Glycam1 (PP3/lactophorin) sequences.

Dots indicate identity to the koala VELP sequence. Accession numbers: Tammar wallaby (Macropus eugenii; EX207743), Brushtail possum (Trichosurus vulpecula; P85093), Tasmanian devil (Sarcophilus harrisii; GEDN01008364), Mouse (Mus musculus; Q02596), Squirrel (Ictidomys tridecemlineatus; I3N1S9), Macaque (Macaca mulatta; F7H8B2), Cow (Bos taurus; P80195), Camel (Camelus bactrianus; P15522), Rhinoceros (Ceratotherium simum; XP_004429511.1), Panda (Ailuropoda melanoleuca; G1L2P5), Armadillo (Dasypus novemcinctus; ENSDNOT00000038407).

Figure 3

Phylogenetic tree of marsupial VELP and eutherian Glycam1 (PP3/lactophorin) sequences.

Human CD34 (P28906) and Glycophorin-A (P02724) are used as outgroups. The tree was constructed using the maximum likelihood approach and the JTT model with bootstrap support values from 500 bootstrap tests. Bootstrap values less than 50% are not displayed. Accession numbers as for Fig. 2.

Figure 4

Schematic comparison of the gene structure of mouse (ENSMUSG0000002249) and cow (ENSBTAG00000013417) Glycam1 (PP3) and Tasmanian devil (GEDN01008364.1) and koala VELP.

Roman numerals indicate exon number. Filled in boxes indicate coding sequence, empty boxes indicate UTR sequences.

Novel milk proteins and transcripts

A novel peptide previously identified only as a transcript in the devil milk transcriptome and wallaby mammary transcriptomes (previously named Novel Gene 133) was identified in koala milk. In the koala, this peptide was the 26th and 42nd most abundant peptide in the early and late lactation milk proteomes respectively (Table 1; “Marsupial Milk 1”). This peptide was not identified in any other marsupial transcriptome through BLAST searches. Homology with other proteins was not identified through BLAST or HMMer searches of the opossum genome and the SwissProt database, nor through protein domain searches. This short protein is only 98–99 residues long, and is highly divergent, with 69.6% and 66.6% sequence identity of the koala sequence with the wallaby and devil sequences respectively. In the koala genome this gene is located approximately 8 kb upstream of the TESPA1 (Thymocyte Expressed, Positive Selection Associated 1) gene and flanks a gene that shows homology with eutherian lacritin (LACRT) and dermcidin (DCD). The orthologous region in humans encodes the genes DCD and LACRT, as well as the GLYCAM1 pseudogene and Mucin-like 1 (MUCL1), each of which encodes a short glycoprotein peptide with antimicrobial functions.

Two putative non-coding RNA (ncRNA) transcripts were among the top 200 most highly-expressed transcripts in the mammary transcriptome (see Supplementary Table S1). One was the 39th most highly-expressed transcript, which showed homology to a devil sequence previously characterised as a ncRNA in the automated annotation of the devil genome (Ensembl release 83). A second putative ncRNA (78th most highly-expressed) shares homology with a transcript expressed in the wallaby mammary transcriptome, suggesting a common function in marsupial lactation. This sequence was located approximately 1kb downstream of the WBSCR27 gene, in both devil and opossum genomes. WBSCR27 encodes WBS27, a methyltransferase protein. We consider this to be putatively non-coding due to the lack of homology with any known protein, absence of an open-reading frame, absence of known protein domains in Pfam, lack of introns, and absence in the koala milk proteomes.

KoRV

In the early lactation mammary transcriptome, KoRV sequences collectively represented the fourth most highly expressed transcripts and 3% of all transcripts in the transcriptome. In the milk from the same animal the three major KoRV proteins (gag, env and pol) were collectively the 14th most abundant peptides in the early milk proteome, representing 1.07% of peptides in the milk. The proportion of KoRV retroviral peptides was similar in the late lactation milk proteome, where they represented 0.73% of all peptides.

Immune Proteins

Immune proteins were a focus in this study due to the important role milk plays in immune defence of marsupial young. By performing a BLAST search of the early lactation mammary transcriptome with the Immune Database for Monotremes and Marsupials (IDMM35), 851 genes with primary immune function present in the mammary transcriptome were identified (see Supplementary Table S4). This represents approximately 9% of all genes expressed in the koala mammary gland. Among these are lysozyme, cathelicidins, immunoglobulins, complement factors, cytokines, and MHC I and II. The 50 most highly expressed immune transcripts are shown in Table 2. The top three proteins have roles in both nutrient transport and immune defence. These are ferritin, which aids in iron transfer but also sequesters free iron to prevent bacterial growth36, zinc-alpha 2 glycoprotein which is involved in lipid mobilisation and immunoregulation37, and butyrophilin which is involved in the synthesis of milk fat globules and also has a role in immune regulation38.

Table 2 Top 50 most highly expressed immune transcripts in the early lactation mammary transcriptome.

Immunoglobulins and Ig receptors

Igs in the milk have a crucial role in protecting marsupial young. IgG and IgA are the main isotypes of Igs present in eutherian and marsupial milk14,39,40,41. IgG is absorbed across the gut epithelium and enters the circulatory system of the pouch young protecting against systemic infection; IgA is not absorbed but has a critical role in protecting the neonatal gut42,43. Like IgA, IgM is not absorbed, but has a role in protecting the gut. All marsupial Ig heavy and light chains were present in the koala mammary transcriptome with the exception of IgE (Table 3). IgA and both of the Ig light chains were more abundant in the late milk sample. IgA was highly abundant in late lactation being the third most abundant peptide. IgA had a large difference in abundance between the two samples, with the IgA heavy chain comprising over 2% of peptides in the late lactation proteome, but only 0.08% of peptides in the early lactation proteome. IgG had a similar abundance across the two samples (0.68% and 0.46% of peptides in the early and late lactation proteomes respectively). IgM was more abundant in early lactation where it comprised 0.25% of all peptides, but it was not detected in the late milk sample.

Table 3 Abundance of Ig heavy and light chains in the early and late lactation proteomes.

Polymeric Immunoglobulin Receptor (PIgR), involved in transfer and protection of IgA and IgM44, was highly abundant in both early and late lactation being the 12th most abundant protein in both proteomes. FcRN, a component of the Fc receptor which allows for the transport of IgG across the intestinal wall of the neonate45, was only identified in early lactation (228th protein).

Alpha 1B glycoprotein-like

A peptide identified in the late and early milk proteomes showed homology to eutherian alpha 1B glycoprotein (A1BG), a plasma protein with unknown function46, as well as venom inhibitors characterised in the Southern opossum Didelphis marsupialis (DM43 and DM4647,48,49), all members of the immunoglobulin superfamily. To characterise the relationship between the peptide sequence identified in koala, A1BG, DM43 and DM46, a phylogenetic tree was constructed (Fig. 5) including all marsupial and monotreme homologs (identified by BLAST) and three phylogenetically representative eutherian sequences, with human IGSF1 and TARM1, related members of the immunoglobulin superfamily, used as outgroups. This phylogeny indicates that A1BG-like proteins in marsupials and the Didelphis antitoxic proteins are homologs of eutherian A1BG, with excellent bootstrap support (98%). The marsupial A1BG-like sequences and the Didelphis antitoxic proteins formed a single clade with strong bootstrap support (97%).

Figure 5

Phylogenetic tree of eutherian A1BG, opossum DM43 and DM46, and A1BG-like sequences in marsupials.

Human TARM1 and IGSF1, related members of the immunoglobulin superfamily, are used as outgroups. The tree was constructed using the maximum likelihood approach and the JTT model with bootstrap support values from 500 bootstrap tests. Bootstrap values less than 50% are not displayed. Accession numbers: Tasmanian devil (Sarcophilus harrisii; XP_012402143), Wallaby (Macropus eugenii; FY619507), Possum (Trichosurus vulpecula; DY596639), Virginia opossum (Didelphis virginiana; AAA30970, AAN06914), Southern opossum (Didelphis marsupialis; AAL82794, P82957, AAN64698), Human (Homo sapiens; P04217, B6A8C7, Q8N6C5), Platypus (Ornithorhynchus anatinus; ENSOANP00000000762), Cow (Bos taurus; Q2KJF1), Alpaca (Vicugna pacos; XP_015107031).

The wallaby and devil A1BG-like sequences were not present in the devil milk transcriptome33 or the tammar mammary transcriptome12 (determined by BLAST). However, this protein was highly abundant in the koala milk, being the 9th and 17th most abundant peptide in the early and late lactation milk proteomes respectively, and was also present in the mammary transcriptome, though at a lower abundance (0.3 × 10⁻³ of transcripts). In addition, we searched all available marsupial transcriptomes (including transcriptomes from devil, wallaby and koala) using BLAST and found that this gene was expressed in a variety of koala tissues (spleen, lymph node, liver, bone marrow, heart and brain), but was not found in any other marsupial transcriptome. This protein has not been previously identified in mammalian milk and its function is unknown. The expression of this protein in many tissues across multiple individuals without evidence of acute exposure to venom does not support a role in response to envenomation in koalas. Its presence in a variety of immune tissues and its homology with immunoglobulins suggest it may have an immune role. Alternatively, as it was expressed in koala liver, but not in the immune tissues of devil and wallaby, a role in detoxification of dietary plant compounds is also possible.

Antimicrobial peptides (AMPs)

We examined the expression of AMPs in koala as these proteins in the milk of placental mammals have been demonstrated to prevent bacterial or viral infection. Cathelicidins have the ability to directly lyse pathogenic cells50, and are thought to be crucial in protecting immunologically naïve marsupial young18. Cathelicidins identified in the milk of the wallaby are known to have broad spectrum antibacterial activity19. Previously, five cathelicidins have been identified in the tammar mammary transcriptome12 and four in the devil milk transcriptome33. We found evidence of four cathelicidins in koala mammary tissue and milk. All four were expressed in the koala early lactation mammary transcriptome. One of these was detected in the early lactation milk proteome while two were detected in the late lactation milk proteome. Cathelicidins were relatively abundant in the late lactation milk sample where they together comprised 1.1% of peptides. Several other peptides with direct antimicrobial activity were also identified. WAP four-disulfide core domain protein 2 (WFDC2) was abundant in the early lactation proteome (31st most abundant protein) and was also present in the late lactation proteome (113th most abundant protein). This protein has antibacterial activity against many pathogenic species of bacteria, while not showing activity against common gut commensal species51. Thus it is thought to protect against pathogenic species without disturbing the balance of commensal gut flora51. Lysozyme plays an important role in innate immunity as it is capable of lysing bacterial cells52, providing crucial protection to mammalian young. This protein was identified in the late lactation proteome (51st most abundant protein), but it was not detected in the early milk proteome, and had only a very low expression in the mammary transcriptome (0.01% of transcripts). A homolog of C10orf99 (also known as AP-57 and CSBF) was identified in the late lactation proteome (70th protein). C10orf99 is an AMP that was only recently described in humans53,54. This protein has broad spectrum antimicrobial activity against gram-positive bacteria, fungal species, mycoplasma and lentivirus54. Although not in the top 100 most abundant peptides, peptidoglycan recognition protein 1 (PGLYRP1) and mucin-1 were also identified in the late lactation proteome. PGLYRP1 is an antibacterial protein that has a different structure and mechanism of action to other known mammalian AMPs. In humans, this protein is produced by epithelial cells, body secretions and leukocytes55. It is bactericidal against various pathogenic gram-positive bacteria, but not against commensal bacterial flora55. Additionally, it has a bacteriostatic effect against both gram-positive and gram-negative bacterial species55. This dual activity is thought to remove pathogenic species while limiting overgrowth of commensal species55. Mucin-1 can inhibit binding and invasion of bacterial pathogens in the gut56,57,58 and has anti-viral activity59.

Complement factors

Numerous complement factors were observed to be highly abundant in both milk proteomes. Classical complement genes have been identified in the colostrum of cows60, and have demonstrated bactericidal activity61,62,63. These components also provide protection to immunoglobulins in solution60. In the koala mammary transcriptome, eleven complement transcripts were identified, while six were present in the early milk proteome and four in the late milk proteome. These included three that were highly abundant in both proteomes (C2, C3 and C4A), all components of the classical complement pathway. These three components were more abundant in the early milk proteome, together comprising 2.41% of peptides in the proteome.

Discussion

In order to comprehensively analyse the proteins and transcripts present in koala milk, with a specific focus on immune proteins, we have constructed a transcriptome from the mammary gland in early lactation, and two milk proteomes from early and late lactation. The most abundant koala milk proteins were similar to those of wallaby, possum and Tasmanian devil, including β-lactoglobulin, caseins, Trichosurin and α-lactalbumin12,32,34,64. Trichosurin, a protein unique to marsupials, was highly abundant in both early and late lactation. Its function is unknown, but a role in priming the neonate liver to produce detoxifying enzymes has been suggested65. Such a role might explain its particularly high abundance in koala milk, as their diet of eucalypt leaves is rich in phytotoxins. An ortholog of eutherian A1BG was highly abundant in both early and late milk samples, but has not previously been reported in the milk of any species. This protein may have an immune or detoxification function in koala milk.

An intriguing finding was that KoRV transcripts and peptides were highly abundant in all samples (3% of transcripts, ~1% of peptides). This indicates that retroviral particles are being transmitted in the mother’s milk. In primates, transmission of GALV, the closely related gibbon ape leukaemia virus, occurs in utero, postnatally, via contact and via faeces66. In koalas, vertical transmission of the exogenous form of KoRV does not occur via the germ line67. Previous studies have suggested that KoRV may be transmitted in utero or in the milk67. Should the milk-borne viral particles prove infectious, we provide the first evidence for a vertical transmission route via the mother’s milk.

Identification of a full-length VELP transcript allowed us to demonstrate that this gene is orthologous to Glycam1 (PP3). The expression of both VELP and Glycam1 in milk68,69,70, their sequence similarity and common gene structure provide strong evidence that marsupial VELP is orthologous to eutherian Glycam1. This protein was very highly abundant in the early lactation milk, comprising 13.3% of peptides. In contrast to wallaby and possum31,32, this protein was also abundant in koala late lactation. The divergence of these sequences is high, comparable to some of the most divergent immune proteins4. In cows, antibacterial activity of this protein against Gram-positive and Gram-negative bacteria71,72, and antiviral activity73 have been demonstrated. It is therefore highly likely that VELP has antibacterial activity and may be key for protection of marsupial young.

A novel protein was the 26th most abundant protein in early lactation. Previously only the transcript encoding this protein had been identified in devil milk and wallaby mammary transcriptomes33, where it was referred to as Novel Gene 1. Homology to any protein could not be found through BLAST and HMMer searches. As this peptide appears to be unique to marsupials and has lactation-specific expression, we propose calling this gene and protein Marsupial Milk 1 (MM1). Interestingly, koala MM1 was located in a region that shares synteny with a genomic region in eutherian mammals that encodes several proteins with antimicrobial functions, including Glycam1, lacritin, dermcidin and mucin-like 174,75. Each of these proteins has a glandular expression pattern, including mammary, lacrimal and sweat glands74,75. Although no clear homology was identified between MM1 and these genes, we propose that its genomic context and similar glandular expression suggest a relationship to this group of genes, and that therefore it may have a similar antimicrobial function.

Several proteins were much more abundant in the early lactation sample than in the late lactation sample, most notably ELP, WAP, WFDC2, and VELP. Each of these proteins has a potential immune role; while ELP and WAP are both thought to prevent degradation of Igs in the gut76,77, WFDC2 and VELP likely play antimicrobial roles. Lactotransferrin, a protein key for iron transport and which aids in immune defence78, was much more abundant in the late lactation sample. Additional immune proteins, including Igs, complement and AMPs, also showed a difference in abundance between the two milk proteomes. These immune peptides may have a stage-dependent function during lactation, perhaps reflecting different microbial exposure at different stages.

In general Igs, in particular IgA, were more abundant in the late lactation proteome. This matches observations in possum, wallaby and common wollaroo (Macropus robustus), where IgA is highly upregulated from mid-lactation14,39,40. The elevated abundance of IgA in late lactation may aid in protecting the young from novel pathogens as it emerges from the pouch and shifts to a solid diet. Abundance of IgG was approximately the same in the early and late lactation proteomes; however FcRN, crucial for IgG transport across the gut epithelium, was only detected in early lactation. IgM was only detected in early lactation; this Ig may be important for protecting the gut of koala young while IgA expression in milk is low.

Three classical complement components (C2, C3 and C4A) were highly abundant, particularly in the early proteome (2.4% of peptides). To our knowledge, this is the first time classical complement components have been reported in marsupial milk. As classical complement factors have demonstrated bactericidal activity61,62,63 and provide protection to Igs in solution60, they may have a key role in protection of koala young, particularly in early lactation.

A large range of AMPs was identified in the koala lactation samples. Studies in placental mammals have demonstrated that AMPs in the milk can prevent bacterial or viral infection57,59; thus AMPs acting in concert are likely to have a crucial role in protecting the gut of the young from pathogens. WFDC2, a protein that protects against pathogenic species without disturbing the balance of commensal gut flora52, was highly abundant in koala early lactation. Mucin-1, which inhibits binding and invasion of pathogens in the gut57,58, was also detected in early lactation.

AMPs abundant in late lactation included cathelicidins, lysozyme, PGLYRP1, and C10orf99. Four cathelicidins, proteins thought to be crucial in protecting marsupial young18, were identified. Lysozyme, which is expressed in the milk of a range of eutherian and marsupial species32,33,64,79, was highly abundant in the late lactation proteome. C10orf99, a newly identified broad-spectrum antimicrobial, has previously only been identified in human, where it is expressed in the mucosa of the gastrointestinal tract54, but has not been reported in milk. PGLYRP1 is an antibacterial protein that has a dual activity, removing pathogenic species while limiting overgrowth of commensal species55. This protein could be crucial during late lactation, protecting against pathogens and regulating gut flora as the koala young develops its own gut flora to aid in digestion of solid food.

Currently there is a major research focus on identifying and testing antimicrobial peptides for clinical purposes as an alternative to conventional antibiotics. Most previous research has been on eutherian mammal AMPs. Cathelicidins expressed in wallaby milk have been previously found to be highly potent against pathogenic bacteria, including multi-drug resistant species18. Koala cathelicidins identified in this study can now be tested to see if they have similar potency. In addition to cathelicidins, many other AMPs were identified, some of which have not been previously reported in the milk of any mammalian species. These would make excellent targets for further investigation into their antimicrobial function, with a long-term aim of clinical application. Furthermore, a number of novel proteins were identified in this study, and future research should focus on identifying their function. Synthesising and assaying these proteins will enable us to determine what role they play in the protection of pouch young. The findings of this study will also have applications for the process of hand-raising koala young, which is an important step in the conservation of koalas, particularly in areas of decline. Through increasing our understanding of the composition of koala milk at different stages of lactation, milk substitutes provided to hand-raised koala young may be improved in the future.

Conclusions

Koalas, like other marsupials, are born underdeveloped and grow through an extended period of lactation with distinct developmental stages. They are highly reliant on the mother’s milk to protect them from pathogens in the pouch, and from novel pathogens encountered as they emerge from the pouch. In this study we have provided an insight into the protein components of koala milk. Due to the difficulty of obtaining such samples, this study is limited to two time points and two individuals, and future studies may aim to extend this to a greater number of individuals at different stages as samples become available. The proteins found in koala milk showed many commonalities with marsupial milk investigated in previous studies, although some notable differences occurred. These include the high abundance of trichosurin and A1BG, potentially linked to the specialised diet of the koala, and the very high abundance of VELP, which likely has an antimicrobial role. We have identified a range of immune proteins critical for protection of the underdeveloped koala young, both prior to and after emergence from the pouch. This contributes to our understanding of how marsupial young are able to survive and provides us with a host of distinct immune proteins that can be investigated for their function and clinical potential.

Methods

Ethics Statement

As samples were collected from animals at necropsy following euthanasia as part of their routine veterinary care, their collection was reviewed by the University of the Sunshine Coast Animal Ethics Committee and considered exempt from requiring further approval (AN/E/15/06).

Milk and mammary sample collection

Samples were collected at necropsy from two female koalas admitted to Australia Zoo Wildlife Hospital for veterinary care. The koala “Leah” was euthanized following identification of an osteochondroma while the koala “Little Jo” was euthanized due to severe dog attack injuries (Table 4). Little Jo was in the early stage of lactation with an attached pouch young of 18 g, aged between 2 and 10 weeks old. Both milk and mammary gland tissue were collected from Little Jo. Milk was collected from Leah who was in the late stage of lactation with a young aged 8 months. The mammary gland sample was stored in RNA-later and the milk samples were snap frozen.

Table 4 Koala sample details.

Koala milk sample preparation for proteomics

The frozen milk samples were thawed and 100 μL was diluted in 400 μL of ultrapure water and spun at 14,000 × g for one hour to separate the top lipid layer from the water soluble fraction (containing whey and casein). From the water soluble fraction, 100 μL was taken and spun in a pre-conditioned centrifugal filter unit (10,000 Da MWCO, Millipore) at 13,000 × g for 10 minutes at 10 °C. The retained fraction was collected and 20 μL was reduced with 5 mM dithiothreitol for one hour at 56 °C, alkylated with 10 mM iodoacetamide at RT for one hour and digested overnight with trypsin at 37 °C. The digested samples were dried in a vacuum centrifuge.

Strong Cation Exchange (SCX) High Performance Liquid Chromatography (HPLC)

The dried, digested peptide samples from both early and late lactating koala milks were independently fractionated by SCX HPLC (1260 Quaternary HPLC system with Polysulfoethyl A, 200 mm × 2.1 mm, 5 μm, 200 Å column; Agilent). The samples were resuspended in loading buffer (5 mM potassium phosphate, 25% acetonitrile, pH 2.7). After sample loading and washing, buffer B (5 mM potassium phosphate, 350 mM KCl, 25% acetonitrile, pH 2.7) was increased from 10% to 45% over 70 minutes and then increased to 100% for 10 minutes at a flow rate of 300 μL/min. The SCX HPLC eluent was collected every 2 minutes from the start of the gradient for 30 minutes and then at 4 minute intervals for the next 40 minutes. The eluates were pooled into 13 fractions, dried and used for nanoLC ESI analysis.
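The gradient and collection schedule described above can be laid out explicitly. The following Python sketch enumerates the collection windows and the buffer B percentage over the first 70 minutes. It assumes the increase from 10% to 45% is linear, which the text does not state, and it does not attempt to reproduce how the collections were pooled into 13 fractions, because the pooling scheme is not given. All function names are illustrative.

```python
def percent_b(t_min):
    """Buffer B (%) at t_min minutes into the gradient, assuming a linear ramp
    from 10% to 45% over 70 minutes (linearity is an assumption)."""
    if not 0 <= t_min <= 70:
        raise ValueError("this sketch only covers the 0-70 minute ramp")
    return 10.0 + (45.0 - 10.0) * t_min / 70.0


def collection_windows():
    """Return (start_min, end_min) collection windows from the start of the gradient."""
    windows = []
    t = 0
    # First 30 minutes: one collection every 2 minutes.
    while t < 30:
        windows.append((t, t + 2))
        t += 2
    # Next 40 minutes: one collection every 4 minutes.
    while t < 70:
        windows.append((t, t + 4))
        t += 4
    return windows


windows = collection_windows()
print(len(windows))           # number of individual collections before pooling
print(windows[0], windows[-1])
print(percent_b(35))          # buffer B (%) halfway through the ramp
```

Under these assumptions the schedule yields 15 two-minute and 10 four-minute collections, which were then pooled down to the 13 fractions reported.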

High pH Reversed-Phase Peptide Fractionation

An aliquot of dried, digested early lactation milk was fractionated using a Pierce™ High pH Reversed-Phase Peptide Fractionation Kit (Thermo Scientific). The spin column was conditioned with acetonitrile (centrifugation at 5,000 × g for 2 min; repeated twice) followed by 0.1% trifluoroacetic acid (centrifugation at 5,000 × g for 2 min; repeated twice). The sample was resuspended in 300 μL of 0.1% trifluoroacetic acid and 150 μL was loaded onto the column. The flow-through fraction was collected by centrifugation at 3,000 × g for 2 min, and the column was then washed by adding 300 μL of Milli-Q water and spinning at 3,000 × g for 2 min. A total of eight fractions were eluted by centrifugation at 3,000 × g for 2 min using the eight high-pH step elution solutions following the manufacturer’s specifications (5% acetonitrile (ACN), 0.095% triethylamine (TEA); 7.5% ACN, 0.093% TEA; 10% ACN, 0.09% TEA; 12.5% ACN, 0.088% TEA; 15% ACN, 0.085% TEA; 17.5% ACN, 0.083% TEA; 20% ACN, 0.08% TEA; 50% ACN, 0.05% TEA). The eight fractions were dried and used for nanoLC-MS/MS analysis.

NanoLC ESI MS/MS data acquisition

A 5600 TripleTOF mass spectrometer (AB Sciex) coupled to an Eksigent Ultra-nanoLC-1D system (Eksigent, Dublin, CA) was employed for LC-MS/MS analysis. Each of the dried peptide fractions was resuspended in 80 μL of loading/desalting solution (0.1% (v/v) formic acid, 2% (v/v) acetonitrile). Forty μL of sample was injected onto a reverse phase peptide C18 Captrap (Bruker) for pre-concentration and desalted for 5 minutes with the loading buffer at a flow rate of 10 μL per minute. After desalting, the peptide trap was switched in-line with an analytical column (75 μm × 10 cm) packed in-house directly in a fused silica PicoTip emitter (New Objective, Woburn, MA, USA) with solid core Halo C18, 160 Å, 2.7 μm (Bruker). Peptides were eluted from the column using a buffer B (99.9% (v/v) acetonitrile, 0.1% (v/v) formic acid) gradient starting from 10% and increasing to 40% over 45 minutes at a flow rate of 300 nL per minute. After peptide elution, the column was flushed with 95% buffer B for 10 minutes and re-equilibrated with 95% buffer A (0.1% (v/v) formic acid) for 12 minutes before the next sample injection. The peptides were analysed in positive ion nanoflow electrospray mode using information dependent acquisition (IDA).

A TOF-MS survey scan was acquired at m/z 350–1500 with 0.25 second accumulation time, with the ten most intense precursor ions (charge states 2+ to 5+; counts >150) in the survey scan consecutively isolated for subsequent automated measurement of their corresponding product ions. Dynamic exclusion was used with a 20 sec exclusion time and 50 ppm precursor mass window. Product ion spectra were generated using rolling collision energy and accumulated for 200 msec over the 100–1500 m/z range.
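The precursor selection rules above (top ten by intensity, charge 2+ to 5+, counts above 150, 20 second dynamic exclusion within 50 ppm) can be illustrated with a short Python sketch. This is not the instrument vendor's acquisition logic; it is a simplified model that treats the 50 ppm window as symmetric around a previously selected m/z, which is an assumption. The `Precursor` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float
    charge: int
    counts: float


def within_ppm(mz_a, mz_b, ppm=50.0):
    """True if mz_a lies within +/- ppm of mz_b (symmetric window is an assumption)."""
    return abs(mz_a - mz_b) / mz_b * 1e6 <= ppm


def select_precursors(survey, excluded, now, top_n=10, min_counts=150.0,
                      exclusion_s=20.0, ppm=50.0):
    """Pick up to top_n precursors from one survey scan.

    excluded is a list of (mz, time_selected_s) pairs from earlier scans.
    Returns the chosen precursors and the updated exclusion list.
    """
    # Keep only exclusion entries that are still inside the exclusion time.
    active = [(mz, t) for mz, t in excluded if now - t < exclusion_s]
    candidates = [
        p for p in survey
        if 2 <= p.charge <= 5
        and p.counts > min_counts
        and not any(within_ppm(p.mz, mz, ppm) for mz, _ in active)
    ]
    # Most intense first, then truncate to the top N.
    candidates.sort(key=lambda p: p.counts, reverse=True)
    chosen = candidates[:top_n]
    active.extend((p.mz, now) for p in chosen)
    return chosen, active


survey = [
    Precursor(500.0, 2, 1000.0),
    Precursor(600.0, 1, 5000.0),  # rejected: charge 1+
    Precursor(700.0, 3, 100.0),   # rejected: counts not above 150
    Precursor(800.0, 4, 300.0),
]
chosen, exclusion = select_precursors(survey, [], now=0.0)
print([p.mz for p in chosen])
```

Calling `select_precursors` again with the returned exclusion list 10 seconds later would select nothing from the same scan, while a call 25 seconds later would select the same two ions again.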

Data processing

The data were exported using ABSciexCommandDriver.exe (AB Sciex) in a format suitable for submission to the database search software, Mascot v2.4 Daemon (Matrix Science Ltd, London, UK). The search parameters were as follows: Variable modifications: Carbamidomethyl (C), Oxidation (M); Peptide tol. ±: 20 ppm; MS/MS tol. ±: 0.1 Da; Peptide charge: 2+ to 4+; Enzyme: Trypsin. A database was constructed based on transcripts obtained from the koala mammary gland (sequencing and assembly described below). A decoy database of reverse sequences was used to report 1% peptide false discovery rate (FDR). Protein abundances were estimated using the exponentially modified protein abundance index (emPAI) method80. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD003726.
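Three quantities in this paragraph are simple formulas, sketched below in Python. The emPAI is defined as 10^(N_observed / N_observable) − 1, where N_observed is the number of peptides observed for a protein and N_observable is the number of peptides that could in principle be observed. The decoy-based FDR shown here is one common estimator (decoy matches divided by target matches); the exact calculation performed by the search software is not described in the text, so treat it as an assumption. Function names are illustrative.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index:
    emPAI = 10 ** (N_observed / N_observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def decoy_fdr(n_target, n_decoy):
    """Simple target-decoy FDR estimate: decoy matches / target matches.
    Assumed estimator; the search engine's own calculation may differ."""
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    return n_decoy / n_target


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical numbers, for illustration only.
print(empai(5, 10))                 # protein with 5 of 10 observable peptides seen
print(decoy_fdr(1000, 10))          # 10 decoy matches among 1000 target matches
print(ppm_error(1000.02, 1000.0))   # close to the 20 ppm peptide tolerance
```

A protein with no observed peptides has an emPAI of zero, and the index grows exponentially with the fraction of observable peptides detected, which is why it is only an approximate guide to relative abundance.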

RNA isolation and sequencing

RNA was isolated from 100 mg mammary gland preserved in RNA-later (QIAGEN) using TRI-Reagent (Sigma-Aldrich) following the manufacturer’s instructions, including the additional step to remove fat. This was followed by DNA removal using DNase I (Sigma-Aldrich). The RNA quantity and quality were checked on a Nano-Drop 2000 (Thermo Scientific) and on a Bioanalyser (Agilent Technologies). The final yield of total RNA was approximately 3 μg and the RNA integrity number was 7.1. Library construction and sequencing were performed by The Ramaciotti Centre (UNSW, Kensington, NSW) with TruSeq chemistry on a HiSeq2000 (Illumina). The 100 bp paired end reads were submitted to the NCBI Sequence Read Archive (BioProject [PRJNA327021], and BioSample [SAMN05300458]).

Transcriptome assembly and annotation

RNAseq reads were assembled with the Trinity pipeline (version 2.0.6)81 using the default parameters and the options --trimmomatic and --normalize_reads. This assembly resulted in 224,496 contigs, with a mean length of 1,213 bp, and a transcript sum of 272 Mb. The longest scaffold was 24,321 bp and the shortest 224 bp. We checked the assembly for completeness against single-copy, conserved eukaryotic genes using the core set in CEGMA and the vertebrate set in BUSCO, using the transcriptome option. Functional annotation of the koala milk transcriptome was performed using the Trinotate pipeline (version 2.0.2) (https://trinotate.github.io/). In brief, BLASTp was performed using koala mammary predicted ORFs as the query and the SwissProt non-redundant database (provided with Trinotate) as the target, and the de novo transcripts were aligned against the same database using tBLASTx. HMMER and Pfam databases82,83 were used to predict protein domains, SignalP (version 4.184) to predict the presence of signal peptides, RNAmmer (version 2.3.285) to predict ribosomal DNA, and TMHMM (version 286) to predict transmembrane helices within the predicted ORFs from the milk transcriptome. These transcriptome annotations were loaded into an SQLite database, and abundance estimation was performed using the RSEM method87.
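The assembly summary figures quoted above (contig count, mean length, total length, longest and shortest sequence) can be computed directly from an assembly FASTA file. The Python sketch below shows one way to do this. It is not the script used in the study, the function names are illustrative, and the toy sequences are invented for the example.

```python
def read_fasta_lengths(lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; store the previous one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_summary(lengths):
    """Summarise contig lengths in the same terms as the reported statistics."""
    if not lengths:
        raise ValueError("no sequences found")
    total = sum(lengths)
    return {
        "n_contigs": len(lengths),
        "total_bp": total,
        "mean_bp": total / len(lengths),
        "longest_bp": max(lengths),
        "shortest_bp": min(lengths),
    }


# Toy input; a real run would pass an open file handle instead.
toy_fasta = [">contig_1", "ACGT", "AC", ">contig_2", "GGG"]
summary = assembly_summary(read_fasta_lengths(toy_fasta))
print(summary)
```

As a consistency check on the reported values, 224,496 contigs at a mean length of 1,213 bp gives roughly 272.3 Mb, in agreement with the stated transcript sum of 272 Mb.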

The number of protein coding genes was estimated by determining the number of unique SwissProt entries identified through the BLASTp search. A list of immune transcripts in the mammary transcriptome was generated by searching the milk transcriptome with proteins from the Immunome Database for Marsupials and Monotremes (IDMM)35 using tBLASTn. IDMM is a manually curated database of immune genes obtained from a number of marsupials including the Tasmanian devil, tammar wallaby, brushtail possum, northern brown bandicoot (Isoodon macrourus) and opossum and the monotremes platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus).
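Counting unique SwissProt entries among the BLASTp hits amounts to collecting the distinct subject identifiers. The Python sketch below assumes the hits are available in the standard 12-column tabular BLAST output, where the second column is the subject identifier; the output format actually used in the study is not stated. The identifiers in the example are placeholders, not real accessions.

```python
def unique_subjects(blast_lines):
    """Collect distinct subject IDs (column 2) from tabular BLAST output."""
    subjects = set()
    for line in blast_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        subjects.add(fields[1])
    return subjects


# Placeholder hits: two ORFs match the same entry, one matches a different entry.
example_hits = [
    "orf_1\tsp|X00001|EXAMPLE1\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-100\t400",
    "orf_2\tsp|X00001|EXAMPLE1\t80.0\t150\t30\t0\t1\t150\t5\t154\t1e-60\t250",
    "orf_3\tsp|X00002|EXAMPLE2\t70.0\t120\t36\t0\t1\t120\t1\t120\t1e-40\t180",
]
print(len(unique_subjects(example_hits)))
```

Because several transcripts or isoforms can match the same entry, this count is an estimate of gene number rather than a count of assembled contigs.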

The top 200 transcripts in the mammary transcriptome, all transcripts matched in the proteome and all transcripts in the immune list were manually checked to confirm identity using the following methods. Transcripts with no BLAST hits or poor quality BLAST hits (E value > 10^−12) were checked through additional BLAST searches to the NCBI database, marsupial transcriptomes or to marsupial genomes on Ensembl. For those transcripts whose identity could still not be confidently assigned through these methods, further searches and analyses were performed. Searches to the Pfam and Rfam databases were used to identify conserved protein or RNA domains. For genes where marsupial homologs were identified, including VELP and MM1, HMM profiles were constructed and used to search, using HMMsearch88, additional marsupial genomes and the SwissProt database. For genes where potential homologs were identified, including VELP, MM1, and A1BG-like, alignments and phylogenetic trees were constructed to examine the relationships between the genes. Alignments were produced in BioEdit89 using ClustalW alignment90. For phylogenetic tree construction, protein sequences were aligned using the MUSCLE algorithm91 in MEGA692, with default parameters. Phylogenetic trees were constructed using the maximum-likelihood method and the Jones-Taylor-Thornton (JTT) model93, and evaluated through 500 bootstrap replicates in MEGA6. The exon structure of devil and koala VELP was examined by identifying the scaffolds encoding this gene through tBLASTn to the respective genome, then using the VELP protein sequences and the identified contigs as inputs to FGENESH+94. The identity of several very short transcripts could not be conclusively determined due to their short length and these have been assigned as Unknown.
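The first triage step above, separating transcripts with no hit or a poor hit from those with an acceptable hit, can be expressed as a small Python function. The sketch uses the E value threshold from the text (hits with E value greater than 10^−12 are flagged for follow-up); the category labels and function name are hypothetical, and the follow-up searches themselves are not modelled.

```python
E_VALUE_CUTOFF = 1e-12  # hits with a larger E value are treated as poor quality


def triage(transcript_ids, best_evalue):
    """Classify transcripts by their best BLAST hit.

    best_evalue maps a transcript ID to the smallest E value among its hits;
    transcripts missing from the mapping have no hit at all.
    """
    result = {}
    for tid in transcript_ids:
        e_value = best_evalue.get(tid)
        if e_value is None:
            result[tid] = "no_hit"        # needs additional searches
        elif e_value > E_VALUE_CUTOFF:
            result[tid] = "poor_hit"      # needs additional searches
        else:
            result[tid] = "accepted"
    return result


# Invented example values.
status = triage(["t1", "t2", "t3"], {"t1": 1e-50, "t2": 1e-5})
print(status)
```

Transcripts labelled `no_hit` or `poor_hit` correspond to those that were taken forward to the additional NCBI, transcriptome and Ensembl searches.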

Data Availability

The data sets supporting the results of the article are available in the NCBI Sequence Read Archive repository (BioProject [PRJNA327021], BioSample [SAMN05300458]) and the PRIDE repository [PXD003726].

Additional Information

How to cite this article: Morris, K. M. et al. Characterisation of the immune compounds in koala milk using a combined transcriptomic and proteomic approach. Sci. Rep. 6, 35011; doi: 10.1038/srep35011 (2016).