Genome of the tropical plant Marchantia inflexa: implications for sex chromosome evolution and dehydration tolerance

We present a draft genome assembly for the tropical liverwort, Marchantia inflexa, which adds to a growing body of genomic resources for bryophytes and provides an important perspective on the evolution and diversification of land plants. We specifically address questions related to sex chromosome evolution, sexual dimorphisms, and the genomic underpinnings of dehydration tolerance. This assembly leveraged the recently published genome of related liverwort, M. polymorpha, to improve scaffolding and annotation, aid in the identification of sex-linked sequences, and quantify patterns of sequence differentiation within Marchantia. We find that genes on sex chromosomes are under greater diversifying selection than autosomal and organellar genes. Interestingly, this is driven primarily by divergence of male-specific genes, while divergence of other sex-linked genes is similar to autosomal genes. Through analysis of sex-specific read coverage, we identify and validate genetic sex markers for M. inflexa, which will enable diagnosis of sex for non-reproductive individuals. To investigate dehydration tolerance, we capitalized on a difference between genetic lines, which allowed us to identify multiple dehydration associated genes two of which were sex-linked, suggesting that dehydration tolerance may be impacted by sex-specific genes.

individuals), has eight autosomes and one female (U) or male (V) sex chromosome, and reproduces sexually by spores, or asexually by fragmentation or the formation of gemmae (specialized asexual propagules). Marchantia inflexa typically grows on rock and soil surfaces along stream banks in tropical forests, but can also colonize more exposed and disturbed sites along roads. Vegetative growth produces a dichotomously branching thallus mat with dorsiventral organization, and the haploid gametophyte is the dominant life phase. Marchantia inflexa is a useful model to investigate sexual dimorphisms, population sex ratios, and stress tolerance because prior work has established that M. inflexa exhibits a considerable degree of sexual dimorphism 22,23 , variable population sex ratios 22,24,25 , and fluctuating stress tolerance 26,27 .
Bryophytes harbor a high proportion of dioecious species. Nearly half of all extant mosses, and approximately two-thirds of liverworts are dioecious 28 . In many bryophytes (including M. inflexa) the reported sex ratio is often female biased 24,25,29 , and in M. inflexa, this may be related to females' superior ability to recover from drying events 26,27 , faster growth rate 23 , or the increased establishment of female gemmae 27 . However, true population sex ratios are largely unknown, except for the few cases where genetic sex markers have been developed and utilized [30][31][32] . Typical methods for assessing sex ratios depend on counting the number males and females with visible sex organs and using this to infer the underlying population sex ratio. However, this approach fails to account for plants not currently displaying sex organs and assumes that the sex ratio of vegetative plants is equivalent to that of plants with sex organs 33 . This assumption may not hold true in natural settings. In fact, for both M. polymorpha and M. inflexa (where sex organ development can be artificially induced) the timing of reproductive development is sex-specific 34 and some individuals never produce sex organs (unpublished data).
The reproductive biology of bryophytes (with the haploid gametophyte being the dominant life stage) provides a unique perspective on the evolution of sex-linked genes, as the female (U) and male (V) sex chromosomes are present at the same copy number (1N) as autosomal chromosomes for the majority of the organism's life cycle and are subject to haploid selection. Sex chromosome evolution in diploid dominant systems has received considerable research attention. However, less is known about the forces shaping sex chromosomes in haploid dominant systems 35 , and the ramifications of haploid selection on sex chromosomes may have unique consequences. For example, exposure to haploid selection should reduce the prevalence of deleterious mutations 36,37 and could allow beneficial mutations fix more rapidly 36 . However, lack of recombination on UV sex chromosomes could lead to degeneration on UV chromosomes 35,36 , as has been observed in XY and ZW chromosomes. Further, the smaller effective population size of sex chromosomes relative to autosomes may increase the impact of genetic drift, further influencing adaptive evolution of sex-specific genes 38 . The extent to which these forces shape sex chromosome evolution in haploid dominant systems is not well understood, but the numerous dioecious bryophyte taxa provide novel opportunities to test related questions.
Stresses caused by environmental fluctuations are accentuated in plants due to their sessile nature. Consequently, numerous tolerance mechanisms have evolved to combat environmental pressures, many of which have potential translational utility. Some of these stress tolerance traits, such as embryo retention (allowing for the development and dispersal of desiccation tolerant spores), UV radiation, desiccation, heat, and freezing tolerance 2,39 may have facilitated the transition from aquatic to terrestrial environments by early plants. Many extant bryophytes retain these early stress tolerance mechanisms, allowing them to occupy marginal niches (characterized by nutrient poor substrates 40 , toxic concentrations of metals 41 , variable light 42 and moisture levels 43 ). Consequently, bryophytes are particularly informative with respect to understanding the evolutionary history and physiological strategies of stress tolerance.
Desiccation tolerance (DT) 43 in particular, has important translational utility. A number of studies have described the genomes [44][45][46][47] and transcriptomes [48][49][50][51][52][53][54][55] of DT plants, and the amassing data provide a strong foundation on which to construct our understanding of DT. These studies have demonstrated that DT is a complex multigenic trait 44,48,51,56,57 , and that there are multiple means of achieving DT 58 . The genetic basis of DT, although not entirely described, may derive from regulatory differences in gene expression pathways 45 , increased copy number of anahydrobiosis related genes 59 , or differences in the structural organization of these genes 44 . However, more studies are needed to resolve the specifics of DT mechanisms, and should include work on species spanning a wide phylogenetic range and degree of tolerance levels (such as the intermediate trait of dehydration tolerance (DhT also dehydration tolerant) 26,60 ). Marchantia inflexa is DhT, which provides an important opportunity to enhance our understanding of the evolution of this intermediate trait.
Growing genomic resources for bryophytes provide novel opportunities to conduct comparative studies within these lineages, which are particularly well suited to addressing questions related to sex chromosome evolution, sex differences, and stress tolerance adaptations. Here, we aimed to characterize patterns of sequence divergence between the thalloid liverworts M. inflexa and M. polymorpha, define genetic sex markers, and investigate the genomic basis of intraspecific variation in DhT. Our analyses indicate that the greatest sequence conservation between M. inflexa and M. polymorpha is among chloroplast genes, likely due to the conservation of plastid function across lineages. Conversely, we show that mitochondrial genes are highly divergent, which may be related to reduced recombination of mitochondrial genomes (as observed in M. polymorpha) 19 , or variable mutation rates 61 . Sex-linked genes exhibit signatures of strong diversifying selection, relative to autosomal genes. Interestingly, this is driven primarily by male-specific (V) genes, which we speculate is related to strong selection on male traits to maintain species recognition. Because sperm is broadcast indiscriminately and water dispersed in Marchantia, pressure to maintain species recognition is expected to be particularly strong on genes related to sperm characteristics 62 . Although females could be subject similar selective pressures, our analyses indicate that selection is acting primarily on male-specific genes in M. inflexa. Putatively sex-specific sequences were identified and leveraged to develop diagnostic markers for genetic sex in M. inflexa. Regarding DhT, we detect higher copy number of dehydration associated genes in the tolerant female genotype compared to the less tolerant male genotype. Interestingly, some of these DhT genes are putatively sex-linked, offering a possible explanation for elevated

Sequence similarity between M. inflexa and M. polymorpha. To investigate genome evolution within
Marchantia we measured sequence divergence between M. inflexa and M. polymorpha. Initially, we compared nucleotide differentiation among coding sequences (CDS), introns, and intergenic regions to estimate general patterns of divergence between lineages (Fig. 2). Comparison of orthologous CDS, introns, and intergenic sequences, revealed that (not surprisingly) intergenic sequences were the least conserved (64.5% ± 0.009%), introns were intermediate (81.8% ± 0.008%), and CDS were the most conserved (82.4% ± 0.001%) (Fig. 2). There was a significant effect of sequence type on %ID (F 2,40000 = 39756, p < 0.0001). Patterns of sequence divergence between M. inflexa and M. polymorpha fit general expectations that CDS should exhibit higher sequence similarity compared to introns and intergenic sequences. That being said, we observed surprisingly high sequence conservation among some introns, which we speculate is related to the relatively short length of M. inflexa introns, in which functional elements (such as splice sites) may be preferentially retained.
These analyses reveal several genes and pathways that may be under diversifying selection (dN/dS > 1) in M. inflexa and M. polymorpha. Sex-linked genes with dN/dS > 1 included the female allele of CCR4-NOT transcription related complex protein (Mapoly0018s0021.1), the male bHLH-MYC2 transcription factor in the involved the jasmonate signaling pathway 67 (MapolyY_B0018.1), a male-specific phosphatidylinositol-4,5-bisphosphate 3-kinase (MapolyY_A0049.1), two male-specific genes of unknown function (MapolyY_B0032.1 and MapolyY_ B0003.1). No chloroplast genes in our analyses had dN/dS > 1, but three mitochondrial open reading frames (orf 84, orf 69, rpl 10) had dN/dS > 1. Of the 243 autosomal genes with dN/dS > 1, 51 had identifiable homologs in the Uniprot database. GO analyses of these genes revealed that many were associated with the cellular components www.nature.com/scientificreports www.nature.com/scientificreports/ intracellular, cytoplasm, and membrane, the molecular function catalytic activity (followed closely by hydrolase activity and transferase activity), and the biological processes of metabolic process and cellular process. A complete list of genes with dN/dS > 1 and associated protein names can be found as Supplementary Table S1.
Sex marker identification. We identified 4,468 regions (covering 2,234,000 bp) in the M. inflexa genome assembly with substantial differences in copy number among genetic lines through coverage analysis with DifCover (https://github.com/timnat/DifCover) 65 (see Supplementary Fig. S2). Of these, 89 were found on scaffolds also containing a predicted protein, 31 of which could be assigned to an identifiable homolog across M. polymorpha, P. patens, A. thaliana, and refseq databases. From this set, we identified five putatively male-and three female-specific sequences that were also orthologous to sequences on the U and V chromosomes in M. polymorpha. These candidate sex markers were analyzed by PCR in nine males and nine females to verify their fidelity, leading to the validation of one positive marker for each sex (Fig. 4). Other candidate sex markers exhibited non-specific amplification and were therefore discarded. Plants used for validation were originally collected from five distinct populations, suggesting that the markers are robust to genotypic variation. Primer sequences of the validated male and female sex markers are listed in Table 1.

Dehydration tolerance.
To address specific hypotheses on DhT we probed M. inflexa and M. polymorpha annotated proteins for orthologs to a list of 195 DT genes (compiled from publicly available mRNA sequences of genes expressed under water stress in model DT plants). See Supplementary Table S2 for the accession numbers, species, and studies from which genes were compiled. Of this set of DT genes, 112 had identifiable homologs in M. inflexa and 141 had identifiable homologs in M. polymorpha. Our analyses of dN/dS captured 38 of these DT orthologs, one of which (a putative aldehyde dehydrogenase (Mapoly0030s0099.1)) had a dN/dS value > 1. The There were significant differences among gene types in mean dN/dS. Secondary comparisons reveal significant differences in mean dN/dS among male-specific and autosomal genes, chloroplast and autosomal genes, and mitochondrial and autosomal genes. Statistical tests were performed on log transformed data to satisfy the assumptions of normality, but untransformed numbers are presented here. All dN/dS values > 1 exhibit the signature of diversifying selection. www.nature.com/scientificreports www.nature.com/scientificreports/ function of diversification in this gene is unclear, given the lack of evidence for any difference in DhT between these two Marchantia species.
Prior studies showed that the male and female M. inflexa genotypes used for genome assembly have reproducible differences in DhT 26 . Consequently, we aimed to identify DT genes with substantial coverage differences among these two genotypes, presuming that they may impact relative differences in DhT. Of the 112 DT genes detected in M. inflexa, most had standardized coverage ratios of ~1. However, six genes had considerably higher coverage (log 2 fold change > |4|) in the highly tolerant female and one had higher coverage in the less tolerant male ( Table 2). Specific genes with higher coverage in the tolerant female genotype include a calcium dependent protein kinase (CDPK), glucose related protein 94 (GRP94), the aldehyde dehydrogenase (ALDH) (also identified in dN/dS analyses above), heat shock proteins 70 and 101 (HSP101, HSP70), and superoxide dismutase (SOD). The sole DT gene with higher coverage in the less tolerant male genotype is a heat shock factor 1 (HSF 1). Of the DT genes with coverage difference among genotypes, one (CDPK) was assigned to the putative U chromosome, and one (HSF 1) was assigned to the putative V chromosome.
To verify that candidate DT associated genes were expressed during dehydration and to test for sex-specific patterns of expression, we conducted qPCR validation of seven candidate dehydration associated genes. We quantified expression for each gene in three males and three females under both hydrated and dehydrated conditions by qPCR. These analyses revealed that all candidate genes were expressed in M. inflexa under both hydrated and dehydrated conditions (Fig. 5). Analysis of changes in relative expression of DhT associated genes during dehydration identified an overall increase in expression during dehydration (F 1,1 = 5.70, p = 0.019), differences among genes (F 6,6 = 9.03, p < 0.001) and a significant interaction between gene and hydration state (F 6,6 = 2.41, p = 0.0351). The interaction effect was driven primarily by an increase in HSP70 expression during dehydration (other genes did not show significant changes in expression during dehydration). Interestingly, the two candidate genes that were putatively sex-linked (female-specific CDPK and male-specific HSF 1) exhibited sex-specific expression, but autosomal candidate DhT genes (GRP94, ALDH, HSP101, HSP70, and SOD) were expressed at similar levels in males and females suggesting copies are present in both sexes. These autosomal genes were present at different copy number in the male and female used for genome assembly, but in the larger panel assayed by qPCR this did not translate to differences in relative expression among the sexes.

Discussion
Our assembly of the M. inflexa genome represents a new resource for comparative studies among land plants. We capitalized on the recently published genome of related liverwort M. polymorpha 2 to improve scaffolding of our assembly, estimate divergence rates among specific sequence types, and to identify sex-linked sequences that were leveraged to generate male and female genetic sex markers for M. inflexa. Our analyses identify several genes on the autosomes, organelles, and sex chromosomes that show strong signatures of recent diversifying selection in Marchantia. Additionally, we identified multiple genes possibly underlying an observed genotype difference in DhT in M. inflexa, which point towards a complex mechanism of heightened DhT. We detected differences in copy number of DhT genes across multiple loci, two of which were putatively sex-linked. Evidence of sex-linked genes underlying differences in DhT is intriguing, as prior studies indicate complex patterns of sexual dimorphism in DhT in M. inflexa 26,63 .
Analyses of dN/dS ratios for genes on autosomes, sex chromosomes, and organelles in M. inflexa and M. polymorpha showed evidence of increased diversification of sex-linked genes relative to autosomal genes, and conservation of organellar genes (particularly the chloroplast) relative to autosomes. UV sex determination systems are expected to differ from diploid dominant (XY and ZW) sex determination systems in multiple ways, due primarily to haploid selection 35,37 . However, both empirical and theoretical studies on sex chromosome evolution  Table 2. DhT proteins with coverage differences among the sexes in Marchantia inflexa. We designated a log 2 fold change > |4| as the cutoff to define a significant coverage difference. Negative log 2 coverage ratios indicate higher female coverage relative to males, whereas positive log 2 coverage ratios indicate higher male coverage.
www.nature.com/scientificreports www.nature.com/scientificreports/ in haploid dominant systems are limited. Some UV sex chromosomes exhibit suppressed recombination similar to XY and ZW sex chromosomes, which can lead to degeneration of UV chromosomes 36,37 . However, because UV chromosomes are fully exposed to selection during the majority of the organism's life cycle, positive adaptations may sweep through populations more rapidly than in XY or and ZW systems. Our analyses indicate that rapid diversification of sex-linked genes is occurring in Marchantia, suggesting that exposure to haploid selection can be a diversifying force on sex chromosomes.
Interestingly, the finding that M. inflexa and M. polymorpha sex-linked genes are more divergent than autosomal genes is driven primarily by diversification of male-specific (V) genes. Diversification of female-specific (U) genes, and genes with alleles on both the U and V chromosomes is similar to diversification of autosomal genes (Fig. 3). It has been suggested that UV chromosomes can become highly differentiated between the sexes due to sex-specific selection 36 . Males are generally thought to be under stronger sexual selection than females, and variability in male reproductive success may facilitate rapid adaptation of V-linked genes 35 . Further because male gametes are broadcast indiscriminately, we expect strong selection on sperm characteristics to maintain species recognition in this system 62 . Evidence for diversification of multiple male-specific genes provides a possible mechanism for maintaining species boundaries.
Sex-linked genes with high dN/dS appear to be involved in multiple cellular processes, some of which contribute to differences in sex function in other systems. Notably, the female-allele of CCR4-NOT transcription related complex (dN/dS = 4.27) is homologous to the rcd1 + gene of Schizosaccharomyces pombe, which is critical for nitrogen starvation induced sexual reproduction 68 . Divergence of these genes within Marchantia points towards specific reproductive differences among these species, which may be related to selection for species-specific recognition, habitat-specific optimization, or the evolutionary dynamics of haploid selection on sex chromosomes.
Our analyses define male and female-specific genetic sex markers, which will be of great utility in future studies of M. inflexa populations where plants are latent to express sex. The ability to sex plants via a simple PCR assay will expedite general efforts to characterize individuals and populations of M. inflexa 63 , and will aid in efforts to develop M. inflexa as a model system to investigate population dynamics and ecological genetics.
In our efforts to identify genes underlying DhT we capitalized on a previously identified genotype difference in DhT in M. inflexa 26 , targeting genes that exhibited substantial coverage differences among genotypes. We identified several such genes, which we speculated contribute to differences in DhT among genotypes. Interestingly two of these genes were putatively sex-linked, suggesting that there may be sex-specific components to DhT in M. inflexa. Importantly, qPCR validation of seven candidate dehydration associated genes in additional male and female plants revealed persistent expression of these genes under hydrated and dehydrated conditions, supporting a possible role for these genes in DhT. The unique biology of bryophytes makes them prone to rapid drying and thus, many genes involved in DhT are expressed prior to dehydration in bryophytes 43 . Various patterns of expression have been observed in DhT and DT bryophytes, including constitutive expression, expression induced during drying, and the sequestration of mRNAs in ribonucleoprotein particles 69,70 . Thus, there is no absolute predication for how a given gene will be expressed during dehydration. Our qPCR analyses suggest that even in this small set of candidate DhT genes, multiple patterns of expression are evident in M. inflexa. The majority of tested genes were expressed constitutively, but one (HSP70) showed a significant increase in expression during dehydration. Importantly, the two genes that were putatively sex-linked (female-specific CDPK and male-specific HSF1) were expressed in a strictly sex-specific pattern. Interestingly CDPKs have been recognized as important hubs in plant stress signaling pathways 71 with highly conserved structure 72 , which provides a possible explanation for elevated DhT in females. Other candidate DhT genes were expressed at similar levels across the sexes, suggesting that www.nature.com/scientificreports www.nature.com/scientificreports/ despite possible involvement in DhT these genes are unlikely to drive sex-related differences in DhT within M. inflexa. Quantification of candidate DhT gene expression by qPCR further reinforces gene annotations in demonstrating that predicted M. inflexa genes are expressed in the expected tissues and individuals.
In summary, the draft genome for M. inflexa adds to a growing body of genomic resources for land plants, which will enable investigation of early plant evolution and physiology. We leveraged this assembly to identify genes under diversifying selection in Marchantia, to develop genetic sex markers, and to target genes contributing to DhT. Our analyses comprise one of the few empirical studies on haploid sex chromosome evolution and suggest that several sex-linked genes (particularly male-specific (V) genes) have undergone rapid diversification in Marchantia. We identified multiple sex-specific sequences, which were used to develop genetic sex markers and identify genes underlying differences in DhT of M. inflexa. We find evidence that DhT in M. inflexa is likely impacted by the constitutive expression of select DhT genes, and that sex differences in DhT may be impacted by sex-linked DhT genes. . Vegetative tissue was transported to Lexington, Kentucky, USA and thirty-six clones (generated though vegetative propagation) of one male and one female genotype were planted on steam-sterilized soil and maintained in a randomized layout in a climate-controlled greenhouse. Plants were watered daily with distilled water and kept under shade cloth to mimic field conditions. Vegetative tissue (growing aerially with no soil contact) was collected from male and female plants after ~5 years in greenhouse conditions. Prior to DNA extraction, thalli were washed in distilled water three times to remove surface contamination. DNA was extracted following a CTAB extraction protocol modified from Doyle 73 . Sequencing libraries were constructed with 300 base pair (bp) inserts and whole genome sequencing was conducted on an Illumina HiSeq2000 for 100 bp paired end (PE) reads at the Beaty Biodiversity Research Centre, University of British Columbia.

Methods
Genome assembly and annotation. Sequence read quality was assessed with fastQC version 3 74 , and filtered with Trimmomatic version 0.33 75 . Male and female reads were combined to increase coverage and k-mer plots were generated with DSK version 1.1 (see Supplementary Fig. S1) 76 . Assembly was carried out using SOAP de novo version 2.04-r240 77 with a k-mer length of 31. Reads shorter than 100 bp were not included, alignments of less than 32 bp were not considered reliable, and k-mers observed nine or fewer times were excluded from the assembly.
Following initial assembly, we plotted the length and GC content of each scaffold in JMP ® , Version 12 (SAS Institute Inc.). The plot revealed two distinct clusters of well-assembled (long) scaffolds: one with a mean GC content of ~65% and one with a mean GC content of ~45% (see Supplementary Fig. S3). Consequently, we probed each distinct GC cluster to identify the taxonomic source of the contributing sequence reads by aligning the 100 longest scaffolds of each GC cluster to NCBI's refseq database 78 using TBLASTX 79 . Taxonomic classification of the resulting alignments with Megan version 4 80 revealed that scaffolds with high GC content were derived from a diverse microbial community, whereas scaffolds with low GC content were derived exclusively from plant material (see Supplementary Fig. S4). Notably, other members of Marchantia have a GC content of ~45% 81 , providing additional support for the assumption that low GC content reads were derived from M. inflexa. Consequently, we filtered the raw sequence data to remove all reads with a GC content >55%. The remaining reads (although likely not entirely contamination free) represent a data set enriched for M. inflexa genomic information. Using only reads with a GC content <55%, we reassembled the sequence data with the same parameters as above.
The resulting scaffolds were aligned to the M. polymorpha reference genome 2 with BLASTN, allowing us to organize M. inflexa scaffolds with Chromosomer version 1.3 82 , which leverages pairwise sequence alignments and local synteny to assign orthologous regions to super-scaffolds. All M. inflexa scaffolds not mapped to M. polymorpha were appended to the super-scaffold assembly, and all contigs under 1000 bp were removed. Assembly statistics were computed using assemblathon_stats_2.pl script (https://github.com/lexnederbragt/ denovo-assembly-tutorial/blob/master/scripts/assemblathon_stats_2.pl). To estimate assembly completeness, we quantified the percentage of Universal Single-copy Orthologs from the plant set of OrthoDB v9 of BUSCO v3 64 present in the M. inflexa genome assembly. We conducted a parallel assessment of the M. polymorpha genome assembly.
To assemble the mitochondrial and chloroplast genomes of M. inflexa raw reads were trimmed with Trimmomatic version 0.33 75 and error corrected using the ErrorCorrectReads module of Allpaths-LG version 50156 83 . These were aligned to the M. polymorpha reference plastid and mitochondrial sequences (GI 11466673 84 and GI 11467097 11 , respectively) with BWA mem version 0.7.12-r1039 85 . Reads with alignments, plus their mates, were extracted and partitioned. Each partition was assembled separately with Ray de novo version 2.3.1 86 at k = 31. Ray contigs were scaffolded against their homologous M. polymorpha reference sequences using Abacas version 1.3.1 87 . Adjacent contig overlaps were identified using BLASTN 88 and merged at the shared substring. Pilon version 1.13 89 was then run for thirty iterations. Assemblies were annotated with Daisie Huang's script PLANN (https://github.com/daisieh/plann) and visualized in OrganellarGenomeDRAW (http://ogdraw. mpimp-golm.mpg.de/cgi-bin/ogdraw.pl).
We used Crossmap version 2.7 90 to transfer all M. polymorpha gene annotations to orthologous M. inflexa sequences. In addition to these lift-over annotations, de novo gene prediction was carried out via Maker version 2.31.8 91  www.nature.com/scientificreports www.nature.com/scientificreports/ genome 2 , Physcometrella patens genome 7 , the Arabidopsis thaliana genome 92 , and NCBI's refseq database 78 using BLASTP 79 . All alignments were merged and the single best hit for each M. inflexa predicted protein was selected based on bitscore. The combined set of de novo and lift-over annotations was used for downstream analyses.
Sequence similarity between M. inflexa and M. polymorpha. To 95 . To explicitly test for differences in nucleotide differentiation among these sequence types, CDS, introns, and intergenic sequences were realigned to one another using LASTZ version 1.04. The resulting mean % identity (%ID) for each sequence type was computed and differences among sequence types were tested for significance with a mixed effects linear model in JMP ® , Version 12 (SAS Institute Inc.). The fixed effect of sequence type on %ID was tested (sequence length was included in the model as a random effect).
To investigate patterns of gene divergence we computed dN/dS for all orthologous CDS in M. inflexa and M. polymorpha. Initially, we extracted the complete CDS and translated amino acid sequence for all orthologous genes using gffread (https://github.com/gpertea/gffread). Orthologous translated CDS were aligned with Clustal Omega version 1.2.4 96 , and codon aware DNA alignments were defined using PAL2NAL version 14 97 , during which all gaps and internal stop codons were removed. Next, dN/dS ratios for each ortholog were calculated with the yn00 function of PAML version 4.9 98 , which computes dN/dS using pairwise comparisons accounting for both transition/transversion bias and base/codon frequency bias 99 . Following filtering conventions 100 , cases in which dS = 0, dN >2, and dN/dS >10 were removed from the output, and dN/dS values were log transformed to satisfy assumptions of normality for statistical testing. Differences among groups in mean dN/dS were tested for significance using a mixed effects linear model in JMP ® , Version 12 (SAS Institute Inc.). Initially, the fixed effect of gene type (autosomal, sex-linked, or organellar) on dN/dS was tested (scaffold ID was included in the model as a random effect). Post hoc comparisions among gene types were made using orthogonal contrasts to explicitly compare autosomal genes to sex and organallar genes. Subesquntly, we made more detailed comparisions among specific gene types with a mixed effects linear model testing the fixed effect of specific gene type (autosomal, male-specific, female-specific, male-allele, female-allele, mitochondria, and chloroplast) on dN/dS (scaffold ID was included in the model as a random effect). Again, post hoc comparisons were made using orthogonal contrasts to specifically compare each gene type to the autosomal genes. Finally, all individual genes with dN/dS values >1 were identified, and gene ontology (GO) terms were defined with the GORetreiver tool and summarized with the GOSlimViewer tool available at AgBase (http://agbase.msstate.edu/cgi-bin/tools.pl) 101 .

Sex marker identification.
Read coverage was computed using DifCover (https://github.com/timnat/ DifCover) 65 to identify regions of the genome unique to these male and female genotypes. Briefly, we determined the genotype-specific coverage by mapping male and female sequence reads back to the draft assembly with Bowtie2 102 . Coverage was calculated for 500 bp windows with BEDtools version 2.19.1 94 , and DifCover was used to calculate the log 2 ratios of male:female coverage for each 500 bp window. Marchantia inflexa sequences that were both homologous to the M. polymorpha sex chromosomes and showed genotype-specific coverage were flagged as potential sex markers. PCR primers were designed with primer3 103 for five candidate male markers and three candidate female markers.
Candidate sex markers were tested for fidelity by PCR analysis using plants from five distinct populations in Trinidad in 2016 (see Supplementary Table S3 for sample collection info). Plants were cultivated in greenhouse conditions on steam-sterilized soil, under shade cloth, and watered daily for ~one year. When plants began to produce sex organs naturally, vegetative tissue (visibly connected to a reproductive structure to ensure accurate sex identification) was collected from nine individuals of each sex. DNA was extracted from plant tissues following a modified CTAB extraction protocol (same as above), and PCR reactions were conducted with a DNA template concentration of 0.8 ng/ul and combined forward and reverse primer concentration of 0.4 uM. Reaction conditions consisted of initial denaturation at 94 °C for five minutes, followed by 34 Supplementary Table S2)). DT orthologs were identified using BLASTX 79 , and the single best hits for each M. inflexa and M. polymorpha sequence were determined based on bitscore. Presuming that some DT genes would be multi-copy in M. inflexa, we calculated the genotype-specific coverage of each DhT ortholog using DifCover 65 with the aim of detecting genes contributing to the observed genotype difference in DhT in M. inflexa 26 . We targeted sequences corresponding to DhT genes showing log 2 coverage ratios > |4| as potential contributors to the observed difference in DhT. Finally, we computed dN/dS ratios for all DhT genes and identified those exhibiting signs of diversifying selection (dN/dS > 1).
To verify that candidate dehydration associated genes were expressed during dehydration and to test for sex-specific patterns of expression, we conducted qPCR validation of seven candidate genes. To do so, we selected three male and three female plants from East Turure stream (see Supplementary Table S3). Plants were cultivated in a common garden for ~eight years prior to being subjected to dehydration treatment as described in 26 . Briefly, thallus tissues (~5 mm × 7 mm) were harvested from greenhouse cultivated plants, fully hydrated for 24 hours, and placed into desiccation chambers with an internal relative humidity (RH) of 75%. Thallus tips were randomized in Petri dishes within in the desiccation chamber. Each desiccation chamber contained 18 thallus tips (three tips from each of the six genotypes). Air circulation was maintained by inserting a small fan in the chamber, and RH was verified with a RH sensor integrated into HOBO ™ humidity sensor attached to a data (2019) 9:8722 | https://doi.org/10.1038/s41598-019-45039-9 www.nature.com/scientificreports www.nature.com/scientificreports/ logger (Onset Computer Corporation, Bourne, MA, USA). The chamber was maintained at 14 °C for 22 hours. Dehydration assays were conducted at designated times of day to reduce off target variation due to fluctuations in light, temperature, and circadian rhythms.
Plants were sampled at two time points during the dehydration assay: initial hydrated conditions and dehydrated conditions (22 hours). At each time point, tissues were removed from the desiccation chamber and immediately flash frozen in liquid nitrogen to prevent further transcriptional changes, RNA was extracted from samples using the Triazol ® Reagent according to the manufacturer's instructions, mRNA was isolated through poly(A) enrichment with the NEBNext ® Poly(A) mRNA Magnetic Isolation Module, and cDNA synthesis was conducted with SMARTScribe ™ Reverse Transcriptase using random primers. Biological replicates (genotypes) were processed independently and randomized for all downstream processing and analyses to minimize possible batch effects.
We measured gene expression for seven dehydration associated genes (GRP94, ALDH, HSP101, HSP70, SOD, female-specific CDPK, and male-specific HSF 1) by qPCR. The housekeeping gene actin was included as an internal control. Primer pairs for each gene were designed using Primer3 103 and are listed Supplementary Table S4. qPCR reactions were carried out in 96 well plates on a Roche LightCycler ® 96. The reaction mix consisted of 1ul cDNA, 0.5ul each of the forward and reverse primers (10uM), 10ul Luna ® Universal qPCR Master Mix, and 8ul PCR-grade H 2 O, for a final reaction volume of 20ul. A template free control was included for each primer pair. The amplification program consisted of a 60 second preincubation at 95 °C, followed by 45 cycles of 95 °C for 5 seconds, 60 °C for 15 seconds, and 72 °C for 10 seconds. Following amplification, high resolution melting analyses was conducted, via one cycle of 95 °C for 60 seconds, 40 °C for 60 seconds, 65 °C for 1 second, and 97 °C for 1 second.
Data were analyzed using the Roche LightCycler ® 96 Software Version 1.1. Initially, melting curve analysis was conducted to confirm product quality, and any samples with abnormal profiles were removed from the analyses. Raw Cq (quantitation cycle) values were used to estimate cDNA concentration for the seven candidate dehydration associated genes and housekeeping gene actin (used as an internal control) (see Supplementary Fig. S6). Cq indicates the first cycle at which fluorescence could be detected (smaller Cq indicates a higher starting concentration of the transcript). Subsequently, we computed the abundance of each candidate gene relative to the internal control gene actin. These analyses were performed according to the instructions for relative quantification analyses in the Roche LightCycler ® 96 manual to determine the ratio of target cDNA to actin (to account for potential differences in the starting concentration of cDNA among samples) based on raw Cq values and estimated cDNA concentrations. To test for changes in expression during dehydration and investigate sex differences in DhT gene expression, the relative ratio of target cDNA to actin (internal control gene) was analyzed in JMP ® , Version 12 (SAS Institute Inc.). We used a mixed effects linear model to test the fixed effects of gene, hydration state, sex, and all second and third order interaction effects on the relative ratio of target:control expression. Contrasts were used to test for changes in expression during dehydration of each individual gene.

Data Availability
The unprocessed sequence data associated with this study are available in GenBank (accession numbers: SRR7348360 and SRR7348361). This Whole Genome Shotgun project (M_inflexa_v1.1) has been deposited at DDBJ/ENA/GenBank under the accession QLSQ00000000. The version described in this paper is version QLSQ01000000. The complete mitochondrial and chloroplast sequences are available at FigShare (https://doi. org/10.6084/m9.figshare.6639209.v1).