Anthoceros genomes illuminate the origin of land plants and the unique biology of hornworts

Hornworts comprise a bryophyte lineage that diverged from other extant land plants >400 million years ago and bears unique biological features, including a distinct sporophyte architecture, cyanobacterial symbiosis and a pyrenoid-based carbon-concentrating mechanism (CCM). Here, we provide three high-quality genomes of Anthoceros hornworts. Phylogenomic analyses place hornworts as a sister clade to liverworts plus mosses with high support. The Anthoceros genomes lack repeat-dense centromeres as well as whole-genome duplication, and contain a limited transcription factor repertoire. Several genes involved in angiosperm meristem and stomatal function are conserved in Anthoceros and upregulated during sporophyte development, suggesting possible homologies at the genetic level. We identified candidate genes involved in cyanobacterial symbiosis and found that LCIB, a Chlamydomonas CCM gene, is present in hornworts but absent in other plant lineages, implying a possible conserved role in CCM function. We anticipate that these hornwort genomes will serve as essential references for future hornwort research and comparative studies across land plants.

Software and code
Policy information about availability of computer code

Data collection
No software was used. We constructed paired-end DNA sequencing libraries with approximately 400 bp insert sizes for standard WGS sequencing on Illumina HiSeq and NovaSeq machines. We also prepared RNA-seq libraries and sequenced them on HiSeq and NovaSeq machines. We generated long reads from high-molecular-weight DNA on the Oxford Nanopore MinION using R9 flow cells.  Tables 2-3) and will become public upon publication.
The genome assemblies and annotations ("Submitted.zip"), as well as the alignment matrices and tree files ("phylogeny_dataset.zip"), can be found on Figshare (private link: https://figshare.com/s/e3ebfc9104663c5d08de). A genome browser will be made publicly available upon publication.

Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.

Study description
Here we provide three high-quality genome assemblies and their annotations for the genus Anthoceros. We use these data to refine our inferences about the nature of the most recent common ancestor (MRCA) of land plants and to gain new insights into hornwort biology.

Research sample
Cultures of Anthoceros agrestis (Oxford and Bonn strains) and A. punctatus were each derived from a single spore (haploid gametophyte tissue) and were axenically propagated and maintained on either BCD or Hatcher's medium. Supplementary Table 13 shows the origin and specimen voucher for each of the three strains. We have been developing the three Anthoceros isolates as model systems for several years. Our selection was guided by the potential of these strains to become model species for hornworts.

Sampling strategy
No statistical test was used to determine sample size. In gene expression studies, two or three biological replicates were used to estimate differential gene expression (significance and fold change). The number of biological replicates was constrained by the difficulty of obtaining tissue samples and extracting high-quality RNA from Anthoceros tissues.

Data collection
DNA was derived from axenic isolates of the three Anthoceros accessions. For gene expression studies, RNA was extracted from tissues of the same isolates after vegetative propagation. Data were recorded and analyzed as described in the Author Contributions section of the main text.

Timing and spatial scale
Samples for DNA sequencing were collected when available. Samples for RNA-seq experiments followed well-defined developmental stages described in the manuscript.
For the CO2 response experiment, we subjected the plant cultures to one of three CO2 environments, 150 ppm (low), 400 ppm (ambient) or 800 ppm (high), in a CO2-controlled growth chamber for 10 days (12 h/12 h day/night cycle). These CO2 concentrations match those used in previous experiments investigating hornwort pyrenoid function, so our results are directly comparable with earlier observations. Sampling intervals also followed previous experiments to ensure comparability.
For the cyanobacterial symbiosis experiment, plants were transferred from solid BCD plates to flasks containing 100 ml of BCD medium and placed on an orbital shaker at 130 rpm for two weeks. For the cyano-/N+ and cyano-/N- conditions, plants were transferred to fresh BCD solution with and without KNO3, respectively, and grown for 10 days before harvest. These conditions and time intervals correspond to those previously applied in studies of hornwort-cyanobacteria symbiosis.

Data exclusions
Raw sequence data were quality-filtered and trimmed using either fastp or Trimmomatic (default parameters). Contaminant scaffolds in the genome assemblies were identified with blobtools and excluded. Our data exclusion strategy was not pre-established; we used well-accepted thresholds to filter out low-quality sequence data.
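
To illustrate the kind of read-level exclusion described above, the following is a minimal Python sketch of quality trimming and filtering. It is not the algorithm implemented by fastp or Trimmomatic, and it does not reproduce their default parameters: the `Read` type, the function names, and the thresholds (`min_q`, `min_len`) are hypothetical and chosen only for illustration. Quality strings are assumed to be Phred+33 encoded.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """One sequencing read (hypothetical container for this sketch)."""
    name: str
    seq: str
    qual: str  # quality string, assumed Phred+33 encoded


def mean_phred(qual: str, offset: int = 33) -> float:
    """Return the mean Phred score of a quality string (0.0 if empty)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def trim_tail(read: Read, min_q: int = 20, offset: int = 33) -> Read:
    """Remove trailing bases whose Phred score is below min_q."""
    end = len(read.qual)
    while end > 0 and ord(read.qual[end - 1]) - offset < min_q:
        end -= 1
    return Read(read.name, read.seq[:end], read.qual[:end])


def filter_reads(
    reads: Iterable[Read], min_q: int = 20, min_len: int = 15
) -> Iterator[Read]:
    """Trim each read, then keep it only if it is long enough and its
    mean quality reaches min_q. Thresholds are illustrative assumptions."""
    for read in reads:
        trimmed = trim_tail(read, min_q)
        if len(trimmed.seq) >= min_len and mean_phred(trimmed.qual) >= min_q:
            yield trimmed


if __name__ == "__main__":
    example = [
        Read("r1", "ACGTACGT", "IIIIII##"),  # good read with a poor 3' tail
        Read("r2", "ACGT", "####"),          # uniformly poor read
    ]
    for kept in filter_reads(example, min_q=20, min_len=5):
        print(kept.name, kept.seq)
```

In this toy input, the first read loses its two low-quality trailing bases and is retained, while the second read is trimmed to nothing and discarded. The actual tools apply additional steps (for example adapter removal) that are not modelled here.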