Uncoupled evolution of the Polycomb system and deep origin of non-canonical PRC1

Polycomb group proteins, as part of the Polycomb repressive complexes, are essential for gene repression through chromatin compaction by canonical PRC1 (cPRC1), mono-ubiquitylation of histone H2A by non-canonical PRC1 (ncPRC1) and tri-methylation of histone H3K27 by PRC2. Despite prevalent models emphasizing tight functional coupling between PRC1 and PRC2, it remains unclear whether this paradigm reflects the evolution and functioning of these complexes. Here, we conduct a comprehensive analysis of the presence or absence of cPRC1, ncPRC1 and PRC2 across the entire eukaryotic tree of life, and find that both PRC1 and PRC2 were present in the Last Eukaryotic Common Ancestor (LECA). Strikingly, ~42% of organisms contain only PRC1 or only PRC2, showing that their evolution since LECA has been largely uncoupled. The identification of ncPRC1-defining subunits in unicellular relatives of animals and fungi suggests that ncPRC1 originated before cPRC1, and we propose a scenario for the evolution of cPRC1 from ncPRC1. Together, our results suggest that crosstalk between these complexes is a secondary development in evolution.
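The ~42% figure counts organisms that carry exactly one of the two complexes. Below is a minimal sketch of how such a tally could be made from a presence/absence table. It rests on two assumptions. First, PRC1 counts as present when either cPRC1 or ncPRC1 is detected. Second, the denominator is all organisms in the table. The names `Repertoire`, `category` and `uncoupled_fraction` are hypothetical, and the table is toy data, not data from the study.

```python
from collections import Counter
from collections.abc import Mapping
from typing import NamedTuple


class Repertoire(NamedTuple):
    """Presence (True) or absence (False) of each complex in one organism."""
    cprc1: bool
    ncprc1: bool
    prc2: bool


def category(r: Repertoire) -> str:
    # Assumption: PRC1 is counted as present if either cPRC1 or ncPRC1 is detected.
    has_prc1 = r.cprc1 or r.ncprc1
    if has_prc1 and r.prc2:
        return "PRC1 and PRC2"
    if has_prc1:
        return "PRC1 only"
    if r.prc2:
        return "PRC2 only"
    return "neither"


def uncoupled_fraction(table: Mapping[str, Repertoire]) -> float:
    """Share of all organisms in the table that carry exactly one of PRC1 and PRC2."""
    counts = Counter(category(r) for r in table.values())
    return (counts["PRC1 only"] + counts["PRC2 only"]) / len(table)


# Hypothetical toy table, not data from the study.
toy = {
    "species_a": Repertoire(cprc1=True, ncprc1=True, prc2=True),
    "species_b": Repertoire(cprc1=False, ncprc1=True, prc2=False),
    "species_c": Repertoire(cprc1=False, ncprc1=False, prc2=True),
    "species_d": Repertoire(cprc1=False, ncprc1=False, prc2=False),
}
print(uncoupled_fraction(toy))  # 0.5
```

Organisms lacking both complexes stay in the denominator here. Restricting it to organisms with at least one complex would give a different percentage, so the choice matters when comparing with the reported value.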


Reporting Summary
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code
Data collection
No code was used for data collection; please see below for the phylogenetic analysis.

Data analysis
Assembly of the influenza A viral genomes was performed using a custom in-house pipeline as described previously, but adapted for nanopore sequence reads. All influenza sequences generated and used in this study are available through the GISAID EpiFlu Database (https://www.gisaid.org). The remaining dataset was separated by segment, aligned using MAFFT v7.520 (ref. 79) and manually trimmed to the open reading frame using AliView version 1.26 (ref. 80). The trimmed alignments were then used to infer a maximum-likelihood phylogenetic tree using IQ-TREE version 2.2.3 (ref. 81) with ModelFinder (ref. 8) and 1,000 ultrafast bootstrap replicates.

For the time-resolved phylogenetic analysis, all HA sequences available from South America, together with representative sequences from North America, were combined with the sequences from South Georgia and the Falkland Islands and used to infer a phylogeny using BEAST version 1.10.4 (ref. 83) with the BEAGLE library. The Shapiro-Rambaut-Drummond-2006 (SRD06) nucleotide substitution model was implemented with a four-category gamma distribution model of site-specific rate variation and separate partitions for codon positions 1 plus 2 versus position 3, with a Hasegawa-Kishino-Yano (HKY) substitution model on each partition. A strict clock and a coalescent GMRF Bayesian skyride tree prior were used. Three independent Markov chain Monte Carlo (MCMC) runs were performed and combined using the LogCombiner tool in the BEAST package. Each chain consisted of 200,000,000 steps and was sampled every 20,000 steps, and the first 10% of samples were discarded as burn-in.

Discrete geographical transition events were reconstructed using a symmetric continuous-time Markov chain model with Bayesian stochastic search variable selection (BSSVS) to determine which transition rates sufficiently summarize connectivity (ref. 86). SpreaD3 was used to determine the rates of transmission using a Bayes factor (BF) test. The BF is the ratio of the marginal likelihoods of two competing statistical models and, in this case, was used to assess the support for transmission between geographical locations. The support of the BF for transmission was interpreted as described previously. BF and representative transitions related to South Georgia and the Falkland Islands were visualised on maps in R (v4.3.2) using the
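The per-segment alignment and maximum-likelihood steps can be sketched as follows. This sketch only builds the command lines, so it runs without the tools installed. Several details are assumptions, not taken from the text:

- The FASTA header convention (segment name as the last `|`-separated field).
- The MAFFT `--auto` strategy.
- The file names and the helper names `split_by_segment`, `mafft_command` and `iqtree_command`, all of which are hypothetical.

The IQ-TREE options are standard IQ-TREE 2 flags: `-m MFP` runs ModelFinder before tree search, and `-B` sets the number of ultrafast bootstrap replicates. The manual ORF trimming in AliView happens between the two commands and is not automated here.

```python
from collections import defaultdict
from pathlib import Path

# The eight genome segments of influenza A virus.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def split_by_segment(records):
    """Group (header, sequence) pairs by genome segment.

    Hypothetical convention: the segment name is the last '|'-separated
    field of the FASTA header, e.g. 'sample_01|HA'.
    """
    groups = defaultdict(list)
    for header, seq in records:
        segment = header.rsplit("|", 1)[-1]
        if segment not in SEGMENTS:
            raise ValueError(f"unrecognised segment in header: {header}")
        groups[segment].append((header, seq))
    return dict(groups)


def mafft_command(segment: str, workdir: Path) -> list[str]:
    # MAFFT prints the alignment to stdout; the caller redirects it to a file.
    # '--auto' (strategy chosen by data size) is an assumed setting.
    return ["mafft", "--auto", str(workdir / f"{segment}.fasta")]


def iqtree_command(segment: str, workdir: Path, bootstraps: int = 1000) -> list[str]:
    # Input is the alignment after manual ORF trimming in AliView
    # (hypothetical file name '<segment>.trimmed.fasta').
    # '-m MFP' runs ModelFinder, then infers the ML tree with the selected model;
    # '-B' sets the number of ultrafast bootstrap replicates (IQ-TREE 2 syntax).
    trimmed = workdir / f"{segment}.trimmed.fasta"
    return ["iqtree2", "-s", str(trimmed), "-m", "MFP",
            "-B", str(bootstraps), "--prefix", str(workdir / segment)]
```

To execute a command, pass it to `subprocess.run(cmd, check=True)`. For MAFFT, also pass an open file as `stdout=` to capture the alignment.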
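For reference, the HKY rate matrix used within each SRD06 partition can be written as below. This is the standard textbook form with rows and columns ordered A, C, G, T. Here $\pi$ are the equilibrium base frequencies, $\kappa$ is the transition/transversion ratio (transitions are A↔G and C↔T), $\mu$ is an overall scaling factor, and $\cdot$ marks the diagonal. The four-category gamma model averages the site likelihood over category rates $r_k$ of equal probability. How BEAUti links or unlinks these parameters between the two codon-position partitions is not stated in the text and is left open here.

```latex
Q = \mu
\begin{pmatrix}
\cdot       & \pi_C        & \kappa\pi_G & \pi_T        \\
\pi_A       & \cdot        & \pi_G       & \kappa\pi_T  \\
\kappa\pi_A & \pi_C        & \cdot       & \pi_T        \\
\pi_A       & \kappa\pi_C  & \pi_G       & \cdot
\end{pmatrix},
\qquad
q_{ii} = -\sum_{j \neq i} q_{ij},
\qquad
L_{\text{site}} = \frac{1}{4} \sum_{k=1}^{4} L_{\text{site}}(r_k)
```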
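The MCMC settings above fix how many posterior samples enter the combined analysis. Here is a short worked sketch with a hypothetical helper `retained_samples`. It uses integer arithmetic and ignores the extra sample BEAST writes at state 0, which would be discarded with the burn-in anyway.

```python
def retained_samples(chain_length: int, sample_every: int,
                     burnin_percent: int, n_chains: int) -> tuple[int, int]:
    """Return (samples kept per chain, samples kept across all chains).

    Simplification: the sample BEAST logs at state 0 is not counted.
    """
    per_chain = chain_length // sample_every             # 10,000 logged samples
    burnin = per_chain * burnin_percent // 100            # integer maths avoids float rounding
    kept = per_chain - burnin
    return kept, kept * n_chains


kept_per_chain, kept_total = retained_samples(200_000_000, 20_000, 10, 3)
print(kept_per_chain, kept_total)  # 9000 27000
```

In terms of chain states, the 10% burn-in corresponds to the first 20,000,000 states of each chain.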
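Under BSSVS, the BF for an individual transition rate is commonly obtained from that rate's 0/1 indicator. By Bayes' theorem, posterior odds equal the BF times the prior odds. So the BF comparing "rate on" against "rate off" is the posterior odds divided by the prior odds.

The sketch below rests on three assumptions:

- The prior uses the widely used BSSVS setting: the number of non-zero rates follows a Poisson distribution with mean ln 2, offset by n - 1 locations.
- The support labels follow a Jeffreys-style scale, which is not necessarily the scheme the authors cite.
- The function names are hypothetical.

```python
import math


def bssvs_prior_probability(n_locations: int,
                            poisson_mean: float = math.log(2)) -> float:
    """Prior probability that one rate indicator is 1 (symmetric model).

    Assumes the number of non-zero rates ~ Poisson(poisson_mean) + (n_locations - 1).
    """
    k_rates = n_locations * (n_locations - 1) / 2         # rates in a symmetric model
    q = (poisson_mean + n_locations - 1) / k_rates
    if not 0 < q < 1:
        raise ValueError("prior probability outside (0, 1); too few locations")
    return q


def indicator_posterior(indicator_samples: list[int]) -> float:
    """Fraction of post-burn-in MCMC samples in which the rate is switched on."""
    return sum(indicator_samples) / len(indicator_samples)


def bayes_factor(posterior_prob: float, prior_prob: float) -> float:
    """BF = posterior odds / prior odds for one rate indicator."""
    if posterior_prob >= 1.0:
        return math.inf  # indicator on in every sample: BF is unbounded
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds


def support_label(bf: float) -> str:
    # Assumed Jeffreys-style cut-offs, not necessarily the authors' scheme.
    if bf >= 100:
        return "decisive"
    if bf >= 30:
        return "very strong"
    if bf >= 10:
        return "strong"
    if bf >= 3:
        return "supported"
    return "not supported"


q = bssvs_prior_probability(4)
bf = bayes_factor(0.99, q)
print(round(bf, 1), support_label(bf))  # 61.8 very strong
```

Because a rate switched on in every posterior sample gives an infinite BF, reported values in such cases are effectively lower bounds set by the number of samples.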

Location
Islands within the Antarctic region; all locations are stated in the manuscript.
Access & import/export
All samples were collected with permission from the Falkland Islands government and the government of South Georgia, through collaboration with the British Antarctic Survey.

Disturbance
Disturbance was kept to a minimum, as described above, which was critical to avoid further pathogen spread. Samples were collected by skilled ornithologists and field staff.
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.