Upper tract urothelial carcinoma has a luminal-papillary T-cell depleted contexture and activated FGFR3 signaling

Upper tract urothelial carcinoma (UTUC) is characterized by a distinctly aggressive clinical phenotype. To define the biological features driving this phenotype, we performed an integrated analysis of whole-exome and RNA sequencing of UTUC. Here we report several key insights from our molecular dissection of this disease: 1) Most UTUCs are luminal-papillary; 2) UTUC has a T-cell depleted immune contexture; 3) High FGFR3 expression is enriched in UTUC and correlates with its T-cell depleted immune microenvironment; 4) Sporadic UTUC is characterized by a lower total mutational burden than urothelial carcinoma of the bladder. Our findings lay the foundation for a deeper understanding of UTUC biology and provide a rationale for the development of UTUC-specific treatment strategies.

Software and code
Data collection
Whole-exome sequencing of WCM UTUC samples was performed on an Illumina HiSeq 2500 (2×100 bp). A total of 21,522 genes were analyzed at an average coverage of 85× using Agilent HaloPlex Exome (Agilent Technologies, Santa Clara, CA). Bioinformatic analysis of the BCM-MDA sample data was performed as previously described (ref. 30). RNA sequencing of WCM UTUC samples was performed on a GAII, HiSeq 2000, or HiSeq 2500. All reads were independently aligned with STAR_2.4.0f1 (ref. 37) against the human genome sequence build hg19, downloaded via the UCSC genome browser (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/), and SAMTOOLS v0.1.19 (ref. 38) was used to sort and index the reads. RNA was purified from BCM-MDA UTUC tumors, and mRNA expression was computed for all genes from the RNA sequencing data.
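The alignment and sorting steps described above can be sketched as a short command builder. This is a minimal illustration, not the authors' pipeline: the function name, file names, index directory and thread count are hypothetical, and only basic STAR and SAMtools options are used. The `samtools sort <in.bam> <out.prefix>` form follows the legacy 0.1.x syntax; newer SAMtools releases use `-o` instead.

```python
import shlex


def build_alignment_commands(fastq_1, fastq_2, genome_dir, out_prefix, threads=4):
    """Return argument lists for aligning one paired-end sample and indexing it.

    All paths are placeholders (assumptions); nothing is executed here.
    """
    sam = f"{out_prefix}Aligned.out.sam"   # STAR's default SAM output name
    bam = f"{out_prefix}unsorted.bam"
    sorted_prefix = f"{out_prefix}sorted"  # legacy samtools appends ".bam"
    return [
        # Align reads against a pre-built hg19 STAR index.
        ["STAR", "--runThreadN", str(threads),
         "--genomeDir", genome_dir,
         "--readFilesIn", fastq_1, fastq_2,
         "--outFileNamePrefix", out_prefix],
        # Convert SAM to BAM.
        ["samtools", "view", "-bS", "-o", bam, sam],
        # Coordinate-sort (samtools 0.1.x positional output prefix).
        ["samtools", "sort", bam, sorted_prefix],
        # Index the sorted BAM.
        ["samtools", "index", f"{sorted_prefix}.bam"],
    ]


# Print the commands for a hypothetical sample instead of running them.
for cmd in build_alignment_commands("s1_R1.fastq", "s1_R2.fastq", "hg19_star_index", "out/s1_"):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools and the genome index are available locally.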

Data analysis
All WCM sample data were processed through the computational analysis pipeline of the Institute for Precision Medicine at Weill Cornell, New York Presbyterian Hospital (IPM-Exome-pipeline). Raw read quality was assessed with FastQC. The pipeline output includes segmented DNA copy-number data, somatic copy-number aberrations (CNAs) and putative somatic single-nucleotide variants (SNVs). Bioinformatic analysis of the BCM-MDA sample data was performed as previously described (ref. 30). For RNA sequencing analysis of WCM UTUC tumors, Cufflinks (2.0.2) was used to estimate expression values (FPKMs), with the GENCODE v23 (ref. 40) GTF file for annotation. RStudio (1.0.136) with R (v3.3.2) and ggplot2 (2.2.1) was used for statistical analysis and figure generation. mRNA expression from BCM-MDA UTUC tumors was computed for all genes from the RNA sequencing data.
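For readers unfamiliar with the expression unit, the basic definition of FPKM (fragments per kilobase of transcript per million mapped fragments) can be written as a small function. This is only the textbook formula under simplifying assumptions; Cufflinks estimates abundances with a statistical model, so its reported values will not match a naive count-based calculation exactly. The function and argument names are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2 kb transcript in a library of 10 million fragments.
print(fpkm(500, 2000, 10_000_000))
```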

October 2018
Data
The genomic data that support the findings of this study are available in the cBioPortal for Cancer Genomics with the identifier "https://www.cbioportal.org/study?id=utuc_cornell_baylor_mdacc_2019". The source data underlying Figs 1a-d, 2a-e, 3a,b and 4a-f and Supplementary Figs 1, 2, 3, and 4 are provided as a Source Data file.
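As a convenience, the study page address can be assembled from the study identifier as sketched below. The helper names are hypothetical. The second function assumes that the public cBioPortal REST API serves study metadata under `/api/studies/{studyId}`; check the current cBioPortal API documentation before relying on it.

```python
from urllib.parse import urlencode

CBIOPORTAL = "https://www.cbioportal.org"
STUDY_ID = "utuc_cornell_baylor_mdacc_2019"  # identifier from the data availability statement


def study_page_url(study_id=STUDY_ID):
    """Browser URL of the study summary page."""
    return f"{CBIOPORTAL}/study?{urlencode({'id': study_id})}"


def study_api_url(study_id=STUDY_ID):
    """Assumed REST endpoint for study metadata (verify against the API docs)."""
    return f"{CBIOPORTAL}/api/studies/{study_id}"


print(study_page_url())
print(study_api_url())
```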

Life sciences study design
Data exclusions
Patients with low-grade tumors, non-urothelial histology, or variant histology in >50% of tumor tissue were excluded from the study.
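The exclusion rule can be expressed as a simple predicate over per-tumor annotations. This is an illustrative sketch only: the record fields and their encodings (for example, grade as "low" or "high" and variant histology as a fraction of tumor tissue) are assumptions, not the authors' data model.

```python
from dataclasses import dataclass


@dataclass
class Tumor:
    sample_id: str
    grade: str                         # assumed encoding: "low" or "high"
    urothelial: bool                   # False for non-urothelial histology
    variant_histology_fraction: float  # share of tumor tissue, 0.0 to 1.0


def is_excluded(tumor):
    """Apply the three stated exclusion criteria."""
    return (
        tumor.grade == "low"
        or not tumor.urothelial
        or tumor.variant_histology_fraction > 0.5  # strictly more than 50%
    )


# Hypothetical cohort for illustration.
cohort = [
    Tumor("T1", "high", True, 0.10),
    Tumor("T2", "low", True, 0.00),
    Tumor("T3", "high", False, 0.00),
    Tumor("T4", "high", True, 0.60),
    Tumor("T5", "high", True, 0.50),
]
included = [t.sample_id for t in cohort if not is_excluded(t)]
print(included)
```

A tumor with exactly 50% variant histology is retained here because the stated threshold is ">50%".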

Replication
Several methods were used for each analysis. For example, UTUC subtypes were detected with three different classifiers and non-negative matrix factorization (NMF), and the results were further studied functionally in vitro.

Randomization
Not applicable to our study.

Blinding
Not applicable to our study.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.