A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range

Although originally developed primarily as a system for functional biology, Arabidopsis thaliana has, owing to its broad geographical distribution and adaptation to diverse environments, become a powerful model in population genomics. Here we present chromosome-level genome assemblies of 69 accessions spanning the global species range. We found that genomic colinearity is highly conserved, even among geographically and genetically distant accessions. Along chromosome arms, megabase-scale rearrangements are rare and typically present only in a single accession. This indicates that the karyotype is quasi-fixed and that rearrangements in chromosome arms are counter-selected. Centromeric regions display higher structural dynamics, and divergences in core centromeres account for most of the genome size variation. Pan-genome analyses uncovered 32,986 distinct gene families, 60% being present in all accessions and 40% appearing to be dispensable, including 18% private to a single accession, indicating unexplored genic diversity. These 69 new Arabidopsis thaliana genome assemblies will empower future genetic research.
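The gene-family categories named in the abstract (present in all accessions, dispensable, private to a single accession) can be illustrated with a minimal sketch. It assumes a presence/absence mapping from gene family to the set of accessions carrying it; the function names, the family identifiers and the three accession names are hypothetical, and the toy input is not data from the study. It is not necessarily how the authors performed the classification.

```python
from collections import Counter


def classify_family(n_present: int, n_accessions: int) -> str:
    """Label one gene family by how many accessions carry it."""
    if not 1 <= n_present <= n_accessions:
        raise ValueError("n_present must be between 1 and n_accessions")
    if n_present == n_accessions:
        return "core"
    if n_present == 1:
        return "private"  # counted below as a subset of the dispensable set
    return "dispensable"


def summarize(presence: dict[str, set[str]], n_accessions: int) -> dict[str, float]:
    """Return the fraction of families that are core, dispensable and private."""
    labels = Counter(
        classify_family(len(accessions), n_accessions)
        for accessions in presence.values()
    )
    total = len(presence)
    return {
        "core": labels["core"] / total,
        # dispensable = every family that is not core, private ones included
        "dispensable": (labels["dispensable"] + labels["private"]) / total,
        "private": labels["private"] / total,
    }


# Toy input with three hypothetical accessions (illustrative only).
toy = {
    "fam1": {"acc_A", "acc_B", "acc_C"},
    "fam2": {"acc_A", "acc_B", "acc_C"},
    "fam3": {"acc_A", "acc_B"},
    "fam4": {"acc_C"},
}
fractions = summarize(toy, n_accessions=3)
print(fractions)
```

In this sketch the private fraction is reported separately but is also included in the dispensable fraction, mirroring the abstract's wording that the dispensable families include those private to a single accession.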


Sampling strategy
Describe the sampling procedure (e.g. random, snowball, stratified, convenience). Describe the statistical methods that were used to predetermine sample size OR if no sample-size calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient. For qualitative data, please indicate whether data saturation was considered, and what criteria were used to decide that no further sampling was needed.

Data collection
Provide details about the data collection procedure, including the instruments or devices used to record the data (e.g. pen and paper, computer, eye tracker, video or audio equipment), whether anyone was present besides the participant(s) and the researcher, and whether the researcher was blind to experimental condition and/or the study hypothesis during data collection.

Timing
Indicate the start and stop dates of data collection. If there is a gap between collection periods, state the dates for each sample cohort.

Data exclusions
If no data were excluded from the analyses, state so OR if data were excluded, provide the exact number of exclusions and the rationale behind them, indicating whether exclusion criteria were pre-established.

Non-participation
State how many participants dropped out/declined participation and the reason(s) given OR provide response rate OR state that no participants dropped out/declined participation.

Randomization
If participants were not allocated into experimental groups, state so OR describe how participants were allocated to groups, and if allocation was not random, describe how covariates were controlled.

Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.

Study description
Briefly describe the study. For quantitative data include treatment factors and interactions, design structure (e.g. factorial, nested, hierarchical), nature and number of experimental units and replicates.

Research sample
Describe the research sample (e.g. a group of tagged Passer domesticus, all Stenocereus thurberi within Organ Pipe Cactus National Monument), and provide a rationale for the sample choice. When relevant, describe the organism taxa, source, sex, age range and any manipulations. State what population the sample is meant to represent when applicable. For studies involving existing datasets, describe the data and its source.

Sampling strategy
Note the sampling procedure. Describe the statistical methods that were used to predetermine sample size OR if no sample-size calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient.

Data collection
Describe the data collection procedure, including who recorded the data and how.

Timing and spatial scale
Indicate the start and stop dates of data collection, noting the frequency and periodicity of sampling and providing a rationale for these choices. If there is a gap between collection periods, state the dates for each sample cohort. Specify the spatial scale from which the data are taken.

Data exclusions
If no data were excluded from the analyses, state so OR if data were excluded, describe the exclusions and the rationale behind them, indicating whether exclusion criteria were pre-established.

Reproducibility
Describe the measures taken to verify the reproducibility of experimental findings. For each experiment, note whether any attempts to repeat the experiment failed OR state that all attempts to repeat the experiment were successful.

Randomization
Describe how samples/organisms/participants were allocated into groups. If allocation was not random, describe how covariates were controlled. If this is not relevant to your study, explain why.

Blinding
Describe the extent of blinding used during data acquisition and analysis. If blinding was not possible, describe why OR explain why blinding was not relevant to your study.

Did the study involve field work?
Yes No

Field conditions
Describe the study conditions for field work, providing relevant parameters (e.g. temperature, rainfall).

Location
State the location of the sampling or experiment, providing relevant parameters (e.g. latitude and longitude, elevation, water depth).

Access & import/export
Describe the efforts you have made to access habitats and to collect and import/export your samples in a responsible manner and in compliance with local, national and international laws, noting any permits that were obtained (give the name of the issuing authority, the date of issue, and any identifying information).

nature portfolio | reporting summary
April 2023

Disturbance
Describe any disturbance caused by the study and how it was minimized.

Reporting for specific materials, systems and methods

Validation
Describe the validation of each primary antibody for the species and application, noting any validation statements on the manufacturer's website, relevant citations, antibody profiles in online databases, or data provided in the manuscript.

Eukaryotic cell lines
Policy information about cell lines and Sex and Gender in Research

Cell line source(s)
State the source of each cell line used and the sex of all primary cell lines and cells derived from human participants or vertebrate models.

Authentication
Describe the authentication procedures for each cell line used OR declare that none of the cell lines used were authenticated.

Mycoplasma contamination
Confirm that all cell lines tested negative for mycoplasma contamination OR describe the results of the testing for mycoplasma contamination OR declare that the cell lines were not tested for mycoplasma contamination.

Commonly misidentified lines (See ICLAC register)
Name any commonly misidentified cell lines used in the study and provide a rationale for their use.

Specimen provenance
Provide provenance information for specimens and describe permits that were obtained for the work (including the name of the issuing authority, the date of issue, and any identifying information). Permits should encompass collection and, where applicable, export.

Specimen deposition
Indicate where the specimens have been deposited to permit free access by other researchers.

Dating methods
If new dates are provided, describe how they were obtained (e.g. collection, storage, sample pretreatment and measurement), where they were obtained (i.e. lab name), the calibration program and the protocol for quality assurance OR state that no new dates are provided.
Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information.

Ethics oversight
Identify the organization(s) that approved or provided guidance on the study protocol, OR state that no ethical approval or guidance was required and explain why not.
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Plants
Seed stocks
The stock center accession numbers are provided as a supplementary table.

Novel plant genotypes
Describe the methods by which all novel plant genotypes were produced. This includes those generated by transgenic approaches, gene editing, chemical/radiation-based mutagenesis and hybridization. For transgenic lines, describe the transformation method, the number of independent lines analyzed and the generation upon which experiments were performed. For gene-edited lines, describe the editor used, the endogenous sequence targeted for editing, the targeting guide RNA sequence (if applicable) and how the editor was applied.

Authentication
Describe any authentication procedures for each seed stock used or novel genotype generated. Describe any experiments used to assess the effect of a mutation and, where applicable, how potential secondary effects (e.g. second-site T-DNA insertions, mosaicism, off-target gene editing) were examined.

Data deposition
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links
May remain private before publication.
For "Initial submission" or "Revised version" documents, provide reviewer access links. For your "Final submission" document, provide a link to the deposited data.

Files in database submission
Provide a list of all files available in the database submission.

Genome browser session (e.g. UCSC)
Provide a link to an anonymized genome browser session for "Initial submission" and "Revised version" documents only, to enable peer review. Write "no longer applicable" for "Final submission" documents.

Methodology

Replicates
Describe the experimental replicates, specifying number, type and replicate agreement.

Sequencing depth
Describe the sequencing depth for each experiment, providing the total number of reads, uniquely mapped reads, length of reads and whether they were paired- or single-end.

Antibodies
Describe the antibodies used for the ChIP-seq experiments; as applicable, provide supplier name, catalog number, clone name, and lot number.

Peak calling parameters
Specify the command line program and parameters used for read mapping and peak calling, including the ChIP, control and index files used.

Data quality
Describe the methods used to ensure data quality in full detail, including how many peaks are at FDR 5% and above 5-fold enrichment.

Software
Describe the software used to collect and analyze the ChIP-seq data. For custom code that has been deposited into a community repository, provide accession details.

Graph analysis
Report the dependent variable and connectivity measure, specifying weighted graph or binarized graph, subject- or group-level, and the global and/or node summaries used (e.g. clustering coefficient, efficiency, etc.).

Multivariate modeling and predictive analysis
Specify independent variables, feature extraction and dimension reduction, model, training and evaluation metrics.

Does the work involve any of these experiments of concern:
No Yes
Demonstrate how to render a vaccine ineffective
Confer resistance to therapeutically useful antibiotics or antiviral agents
Enhance the virulence of a pathogen or render a nonpathogen virulent
Increase transmissibility of a pathogen
Alter the host range of a pathogen
Enable evasion of diagnostic/detection modalities
Enable the weaponization of a biological agent or toxin
Any other potentially harmful combination of experiments and agents

We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Describe all antibodies used in the study; as applicable, provide supplier name, catalog number, clone name, and lot number.

Define your software and/or method and criteria for volume censoring, and state the extent of such censoring.

Specify type (mass univariate, multivariate, RSA, predictive, etc.) and describe essential details of the model at the first and second levels (e.g. fixed, random or mixed effects; drift or auto-correlation).

Define precise effect in terms of the task or stimulus conditions instead of psychological concepts and indicate whether ANOVA or factorial designs were used.

Specify voxel-wise or cluster-wise and report all relevant parameters for cluster-wise methods.

Correction
Describe the type of correction and how it is obtained for multiple comparisons (e.g. FWE, FDR, permutation or Monte Carlo).

Report the measures of dependence used and the model details (e.g. Pearson correlation, partial correlation, mutual information).