T cell receptor recognition of hybrid insulin peptides bound to HLA-DQ8

HLA-DQ8, a genetic risk factor in type 1 diabetes (T1D), presents hybrid insulin peptides (HIPs) to autoreactive CD4+ T cells. The abundance of spliced peptides binding to HLA-DQ8 and how they are subsequently recognised by the autoreactive T cell repertoire are unknown. Here we report that the HIP GQVELGGGNAVEVLK, derived from splicing of insulin and islet amyloid polypeptides, generates a preferred peptide-binding motif for HLA-DQ8. HLA-DQ8-HIP tetramer+ T cells from the peripheral blood of a T1D patient are characterised by repeated TRBV5 usage, which matches the TCR bias of CD4+ T cells reactive to the HIP peptide isolated from the pancreatic islets of a patient with T1D. The crystal structures of three TRBV5+ TCR-HLA-DQ8-HIP complexes show that the TRBV5-encoded TCR β-chain forms a common landing pad on the HLA-DQ8 molecule. The N- and C-termini of the HIP are recognised predominantly by the TCR α-chain and TCR β-chain, respectively, in all three TCR ternary complexes. Accordingly, TRBV5+ TCR recognition of HIP peptides might occur via a ‘polarised’ mechanism, whereby each chain within the αβTCR heterodimer recognises distinct origins of the spliced peptide presented by HLA-DQ8.
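To make the notion of a hybrid peptide concrete, the following minimal sketch enumerates the ways a peptide can be split into an N-terminal fragment found in one source sequence and a C-terminal fragment found in another. It is an illustration only: the two source strings are hypothetical placeholders (padded with X) rather than real insulin or islet amyloid polypeptide sequences, so the junction it reports is a consequence of those placeholders, and the function name `candidate_junctions` is an assumption, not the authors' method.

```python
from typing import List, Tuple

# The HIP sequence reported in the abstract.
HIP = "GQVELGGGNAVEVLK"

# HYPOTHETICAL placeholder sources (X = arbitrary flanking residue).
# These are NOT real protein sequences; substitute real ones to use this.
LEFT_SOURCE = "XXGQVELGGGXX"
RIGHT_SOURCE = "XXNAVEVLKXX"


def candidate_junctions(
    peptide: str, left_source: str, right_source: str
) -> List[Tuple[str, str]]:
    """Return every (N-terminal, C-terminal) split of `peptide` in which the
    N-terminal part occurs in `left_source` and the C-terminal part occurs
    in `right_source`."""
    hits = []
    # Try every internal split point; both fragments must be non-empty.
    for i in range(1, len(peptide)):
        n_term, c_term = peptide[:i], peptide[i:]
        if n_term in left_source and c_term in right_source:
            hits.append((n_term, c_term))
    return hits


if __name__ == "__main__":
    for n_term, c_term in candidate_junctions(HIP, LEFT_SOURCE, RIGHT_SOURCE):
        print(f"{n_term} | {c_term}")
```

A plain substring scan like this is quadratic in peptide length, which is acceptable for peptides of this size; it makes no attempt to rank alternative junctions or to rule out a linear (unspliced) explanation.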

Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection
All Mass Spectrometry data were acquired using Orbitrap Tribrid MS Series Instrument Control Software Version 3.3 (ThermoScientific, San Jose, CA, USA). FACS data: BD FACSDiva-8.0.1. Crystallography data: collection code implemented at the Australian Synchrotron MX1 and MX2 beamlines, Analyst® TF 1.8 (SCIEX).

Data analysis
For Mass Spectrometry: Data were analysed using PEAKS Studio Xplus (Bioinformatics Solutions Inc, Waterloo, Canada) with the following settings: parent mass error tolerance of 10 ppm; fragment mass error tolerance of 0.02 Da; no enzyme cleavage; variable modifications of oxidation (M). Data were searched against the human proteome (Uniprot, November 2018). PEAKS PTM was subsequently performed, in which unassigned spectra were searched against the human proteome (Uniprot, November 2018) database by including 55 common modifications, with an FDR cut-off of 1% applied. The top 20 high-confidence de novo sequenced candidates without any linear peptide match were further interrogated with the "Hybrid finder" algorithm (Faridi et al. (2018) A prominent subset of HLA-I peptides are not genomically templated: evidence for cis- and trans-spliced peptide ligands, Science Immunology, 3(28). pii: eaar3947), and the identified cis- and trans-spliced candidate sequences were added back to the original database. A "Multi-Run Search with Denovo Only Spectra" was performed using the combined database. Linear and spliced peptides in this search were extracted at 1% FDR to create the final list of identified peptides.
For crystallographic procedures: phenix-1.14-3260, ccp4-7.0, coot-8.5, xds-20190315. All crystallographic figures were generated using PyMOL v2.3.2.
For SPR: Prism-9 (GraphPad).
For FACS analysis: FlowJo-10.6.0 (Tree Star).
For single cell TCR TRAV/TRBV usage analysis: the IMGT/V-QUEST database search engine (http://www.imgt.org/IMGT_vquest/vquest?livret=0&Option=humanTcR).
For circular dichroism (CD) spectroscopy: CD Analysis & Plotting Tool (https://capito.uni-jena.de).
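The two mass tolerances quoted for the mass spectrometry search are expressed in different units: the parent (precursor) tolerance is relative (ppm), whereas the fragment tolerance is absolute (Da). As a minimal sketch of how such thresholds are typically applied when comparing an observed mass with a theoretical one (this is not PEAKS code; the function names and the example masses are hypothetical):

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million, relative to the theoretical mass."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_parent_tolerance(
    observed_mass: float, theoretical_mass: float, tol_ppm: float = 10.0
) -> bool:
    """Relative check: default 10 ppm, as quoted for the parent mass."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def within_fragment_tolerance(
    observed_mass: float, theoretical_mass: float, tol_da: float = 0.02
) -> bool:
    """Absolute check: default 0.02 Da, as quoted for fragment masses."""
    return abs(observed_mass - theoretical_mass) <= tol_da


if __name__ == "__main__":
    # Hypothetical masses chosen only to exercise both sides of each threshold.
    print(within_parent_tolerance(1000.005, 1000.0))   # about 5 ppm off
    print(within_parent_tolerance(1000.020, 1000.0))   # about 20 ppm off
    print(within_fragment_tolerance(500.01, 500.0))    # about 0.01 Da off
    print(within_fragment_tolerance(500.05, 500.0))    # about 0.05 Da off
```

Because the ppm window scales with mass, a 10 ppm tolerance corresponds to 0.01 Da at 1000 Da and widens proportionally for heavier precursors, while the 0.02 Da fragment window stays fixed.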
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

October 2018
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

Data availability
The mass spectrometry proteomics data generated in this study have been deposited to the ProteomeXchange Consortium via the PRIDE (https://www.ebi.ac.uk/pride/) partner repository with the dataset identifier PXD019466 (http://www.ebi.ac.uk/pride/archive/projects/PXD019466). The structures and structure factors for the complexes of HLA-DQ8-L11C with TCRs A2.13, A1.9 and A3.10 generated in this study have been deposited at the Worldwide Protein Data Bank (

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences
Behavioural & social sciences
Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
No sample size calculations were undertaken. Sample sizes for SPR and T cell stimulation experiments were chosen based on observed reproducibility of experimental outcomes in preliminary experiments: For SPR n ≥ 2 independent experiments with n ≥ 2 technical replicates performed. For T cell stimulation assays n > 3 independent experiments with n = 2 technical replicates performed for each data point.

Replication
Each experiment was performed at least twice independently as indicated in the figure legends.

Randomization
Randomisation was not relevant to this study because it was an in vitro biochemical analysis rather than an experimental study that required allocation into groups.

Blinding
Not relevant. This was not a clinical study so no cohort comparison (i.e. different treatment groups) or similar experiment was performed.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Human CD69 (clone FN50, cat no. 555533, BD Biosciences); Alexa Fluor 647 mouse anti-human CD4 (clone OKT4 (IgG2b); Walter and Eliza Hall Institute mAb Facility); anti-HLA-DQ (clone SPV-L3).