Background & Summary

T cells are an essential cell type for next-generation vaccines and immunotherapies1. T cells recognize antigens in the form of short peptides presented by MHC molecules (human leucocyte antigen [HLA] in humans); these peptides are collectively referred to as the MHC ligandome/peptidome or immunopeptidome2–4. Robust and comprehensive immunopeptidomic profiling of primary cells and tissues is therefore of great importance for the development of effective T-cell-based immunotherapies5.

The cellular immunopeptidome is composed of thousands of MHC-associated peptides, each ranging in abundance between approximately 1 and 10,000 copies per cell6. The immunopeptidome can be divided into two main categories: the MHC class I and the MHC class II immunopeptidome. The latter is composed of peptides 10–25 amino acids in length that are mainly presented on a subset of professional antigen-presenting cells. In contrast, the class I immunopeptidome is composed of peptides presented on the surface of virtually any nucleated cell. Class I peptides are generally 8–12 amino acids in length7,8. In mammals, the composition of the immunopeptidome is complicated by the high diversity of allelic forms9. Each allelic form can present a different set of peptides characterized by allele-specific anchor residues, known as the MHC binding motif10. In humans, more than 17,600 alleles have been documented (IPD-IMGT/HLA Database; December 2017; http://hla.alleles.org/alleles/index.html) and up to six class I and eight class II alleles can be expressed per cell in each individual. In mouse, >200 alleles are expressed among the most commonly used mouse strains (http://www.imgt.org/IMGTrepertoireMHC/Polymorphism/haplotypes/mouse/MHC/Mu_haplotypes.html), and up to two class I and two class II alleles can be expressed per cell in each mouse strain. Although the composition of the immunopeptidome is highly complex, robust technology platforms have made it possible to decipher the immunopeptidome at increasing depth and robustness5,11. Mass spectrometry (MS) is the most widely used of these platforms because it can identify and quantify MHC-associated peptides in an accurate, systematic and unbiased manner12. In fact, many immunopeptidomics studies have demonstrated the ability of MS workflows to identify thousands of MHC-associated peptides from various biological sources in human, mouse and other species13–23.
Those studies led to a better and systematic understanding of antigen presentation and provided direct physical evidence for the existence of tumor-specific peptides. Nevertheless, only a handful of studies have reported detailed information about the composition of the immunopeptidome in healthy cells and tissues. More specifically, immunopeptidomic analyses of normal thymic cells24,25, peripheral blood mononuclear cells26, and spleen and lymph nodes3,27 have been documented. Thus, basic information about the identity, abundance and distribution of MHC-associated peptides across normal tissues and organs in healthy humans, mice or other species is still largely missing in the literature.

Open and comprehensive reference maps in life sciences, including tissue-based maps, are increasingly beneficial for the scientific community28–32. Similarly, the creation of comprehensive maps of the immunopeptidome in human, mouse, and other species would be of great value for both understanding health and diagnosing, monitoring and treating immune diseases5. Given the advances in MS technology over the last decade, the availability of protocols for the isolation of MHC-associated peptides from multiple species and tissue types, and the relatively less complex composition of the immunopeptidome in mouse models (in comparison with humans), we reasoned that the time was ripe to initiate a systematic effort to draft the first MS-based atlas of the murine MHC class I immunopeptidome in health using a commonly used mouse strain. To this end, we used data-dependent acquisition (DDA) MS to generate immunopeptidomic data from 19 tissues of healthy C57BL/6 mice, which express both H2Db and H2Kb class I molecules (Fig. 1a). We also mapped the immunopeptidome of four C57BL/6-derived cancer cell lines and used an open and evolving computational pipeline to process the data. We applied several stringent filters to generate a list of high-confidence H2Db/Kb class I peptides for individual tissues and cell lines. All raw/unfiltered MS data as well as H2Db/Kb peptide spectral libraries (which consist of consensus spectra calculated from repeat measurements of the same peptide sequence) are made publicly available for re-use and re-processing by the community for in-depth interrogation of the dataset (Fig. 1b). In summary, the present study provides a unique resource for basic and translational immunologists to navigate the baseline immunopeptidome in mouse.
An open reference map of the murine immunopeptidome in health is valuable for i) basic and translational immunologists to rapidly identify disease-specific MHC peptide antigens — through comparison of peptides found in the reference map versus those identified in disease cells — and ii) computational scientists to access a rich source of data to support technical benchmarking of future studies to develop or test new algorithms for immunopeptidomic analyses. In addition, this reference map, together with its connection with SWATH Atlas, lays down the foundation to perform robust quantitative analysis of the murine immunopeptidome using next-generation SWATH/Data-independent acquisition (DIA)-MS technologies26,33.

Figure 1: Schematic overview of the experimental and computational workflow used to generate and analyze the data.

(a) Nineteen different tissues from C57BL/6 mice were extracted (Table 1) (Annotation Table, Data Citation 1). H2Db- and H2Kb-associated peptides were isolated independently by immunoaffinity purification using the monoclonal antibodies B22-249.R1 and Y-3, respectively. Eluted peptides were identified by different LC-MS/MS systems in DDA mode. (b) MS output files were converted, searched, and statistically validated using the indicated software tools. The identified peptides were then clustered (GibbsCluster v.1) and annotated by length and predicted MHC binding affinity (NetMHC v.4). The final list of high-confidence MHC-associated peptides was used to build high-quality H2Db- and H2Kb-specific peptide spectral and assay libraries, which were deposited and shared via SysteMHC Atlas and SWATH Atlas, respectively.

Methods

Mouse tissues and cell lines

Adrenal gland, bladder, bone marrow, brain, colon, heart, kidney, liver, lung, ovary, pancreas, small intestine, skin, spinal cord, spleen, stomach, testis, thymus, and uterus were extracted from C57BL/6 male or female mice (Annotation Table, Data Citation 1). The EL4, LLC1 (LL/2) and B16F10 cell lines were obtained from ATCC. The GL261 cell line was obtained from DSMZ. All cell lines were cultured in DMEM with GlutaMAX-1 supplemented with 100 U/mL penicillin, 100 μg/mL streptomycin and 10% fetal bovine serum. B16F10 was treated with IFNγ (200 U/mL) for 24 h to increase the cell surface expression of H2Db and H2Kb molecules.

Isolation of MHC class I-associated peptides

H2Db- and H2Kb-associated peptides were isolated by a conventional immunoaffinity purification method using the monoclonal antibodies B22-249.R1 and Y-3, respectively22. For generating the tissue-based map of the murine MHC class I immunopeptidome, the tissue/organs from five to six mice were pooled together before isolating MHC-peptide complexes for any given tissue (Annotation Table, Data Citation 1). For each cell line used in this study, ~10^9 cells were grown before isolating MHC-peptide complexes. The cell surface abundance of MHC proteins was also quantified for each cell line using the QIFIKIT quantification flow cytometric assay, as previously described34.

DDA mass spectrometry

Fragment ion spectra of the respective MHC class I peptide preparations were acquired on an Orbitrap Fusion Lumos and/or a Triple TOF 5600+ (see below) operated in DDA mode. For retention time (RT) normalization and spectral library generation, peptides from the iRT Kit (Biognosys AG, Schlieren, Switzerland) were added to the samples prior to MS injection according to vendor instructions35 (Data Citation 2).
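Retention-time normalization with spiked-in reference peptides is commonly implemented as a linear fit from observed retention times to reference iRT values. The text does not describe how the fit was performed in this study, so the following is only a minimal sketch assuming an ordinary least-squares line; the function names and all numeric values are hypothetical.

```python
def fit_irt_calibration(observed_rt, reference_irt):
    """Fit a least-squares line mapping observed retention times to iRT units.

    observed_rt and reference_irt are paired lists for the spiked-in
    reference peptides (assumption: a simple linear model is adequate).
    """
    n = len(observed_rt)
    if n < 2 or n != len(reference_irt):
        raise ValueError("need at least two paired reference peptides")
    mean_x = sum(observed_rt) / n
    mean_y = sum(reference_irt) / n
    sxx = sum((x - mean_x) ** 2 for x in observed_rt)
    if sxx == 0:
        raise ValueError("observed retention times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(observed_rt, reference_irt))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_irt(rt, slope, intercept):
    """Convert one observed retention time to the normalized iRT scale."""
    return slope * rt + intercept


# Made-up reference points (minutes -> iRT units), for illustration only.
slope, intercept = fit_irt_calibration([10.0, 20.0, 30.0], [0.0, 50.0, 100.0])
normalized = to_irt(25.0, slope, intercept)
```

A per-run fit of this kind lets peptides measured on different gradients or instruments be placed on a common retention-time scale before consensus spectra are built.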

For Lumos data (Annotation Table, Data Citation 1), peptides were separated on an Acclaim PepMap RSLC C18 column (250 mm × 75 μm i.d., 2 μm particle size; ThermoFisher Scientific) using a flow rate of 300 nl/min and a linear gradient of 4–29.6% aqueous ACN (with 0.1% formic acid) in 120 min. Full mass spectra were acquired with the Orbitrap analyzer operated at a resolving power of 120,000 (at m/z 200). MS/MS spectra were acquired in both HCD and CID mode with a normalized collision energy of 27%. Precursors were selected in the "top speed" mode with a cycle time of 3 s. Fragment ions (charge state 2-6+) were accumulated up to an AGC target value of 50,000 with a maximum injection time of 54 ms and the option "Inject ions for all available parallelizable time" enabled, and were detected in the Orbitrap analyzer at a resolution of 30,000 (at m/z 200). Dynamic exclusion was enabled for 30 s after a selection event with a tolerance of ±10 p.p.m.

For Triple TOF 5600+ data (Annotation Table, Data Citation 1), samples were analyzed on an Eksigent nanoLC system coupled with an AB SCIEX Triple TOF 5600+ System. The samples were separated in a 75 μm-diameter PicoTip emitter (New Objective, Woburn, MA) packed with 20 cm of Magic 3 μm, 200 Å C18 AQ material (Bischoff Chromatography, Leonberg, Germany). The loaded peptides were eluted from the column at a flow rate of 300 nl/min and a linear gradient of 2–35% aqueous ACN (0.1% formic acid) over 120 min. The mass spectrometer was operated in DDA top20 mode, with 500 and 150 ms acquisition time for the MS1 and MS2 scans, respectively, and 20 s dynamic exclusion. MS/MS spectra were acquired in CID mode. Rolling collision energy with a collision energy spread of 15 eV was used for fragmentation.

Database search engines, statistical validation, high-confidence filters and spectral library generation

Raw mass spectrometry files were converted into the mzXML format by msConvert36. The mzXML files were then individually searched using Comet37, MSGF38 and X!Tandem39 against the full non-redundant, canonical mouse genome as annotated by UniProtKB/Swiss-Prot (2014_02), with 20,270 ORFs and appended iRT peptides and reversed decoy sequences. Oxidation at methionine residues was the only variable modification allowed. We used default search settings for all the engines with the following key parameters: precursor tolerance was set to ±20 p.p.m., high-accuracy fragment ion tolerance was set to ±0.02 Da for Comet and 20 p.p.m. for X!Tandem, and digestion specificity was set to unconstrained. The search identifications from the different search engines were then combined and statistically scored using PeptideProphet and iProphet within the TPP (4.8.0), as previously described40. The probabilities estimated by iProphet were thresholded at 1% FDR. Then, all 8- and 9-mers (for H2Kb) and all 9- to 11-mers (for H2Db) were clustered using GibbsCluster (v1.0)41 to visualize MHC binding motifs enriched in the dataset. To select the final list of high-confidence H2Db- and H2Kb-associated peptides, strict cut-off criteria were applied: FDR 1% (peptide-spectrum match level); 8–9 and 9–11 amino acids in length for H2Kb and H2Db peptides, respectively; and IC50<500 nM (NetMHC v4.0). Spectral libraries were generated by SpectraST using the list of high-confidence H2Db/Kb peptides, with default consensus library building parameters, as previously described26. H2Db- and H2Kb-specific peptide spectral libraries were then combined and generated at the peptide atlas level, so that each library contains consensus spectra of peptides from different samples. For a given allele-specific spectral library, the same peptide ions generated under different fragmentation methods (CID Orbitrap, CID TOF and HCD) were labeled and kept as separate library entries.
The generated spectral libraries were further converted into TraML format and archived in SWATH Atlas for SWATH/DIA-MS analysis.
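The cut-off criteria described above (1% FDR, allele-specific length window, predicted IC50<500 nM) can be expressed as a simple predicate. The following is a minimal sketch, not the pipeline's actual code: the record layout, the names and the example IC50 values are hypothetical, and only the thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else is illustrative.
LENGTH_WINDOW = {"H2Kb": (8, 9), "H2Db": (9, 11)}
IC50_CUTOFF_NM = 500.0


@dataclass(frozen=True)
class PeptideHit:
    sequence: str      # peptide amino-acid sequence
    allele: str        # "H2Db" or "H2Kb"
    passes_fdr: bool   # True if the identification passed the 1% FDR threshold
    ic50_nm: float     # predicted binding affinity in nM


def is_high_confidence(hit: PeptideHit) -> bool:
    """Apply the three filters: FDR, length window, predicted affinity."""
    shortest, longest = LENGTH_WINDOW[hit.allele]
    return (
        hit.passes_fdr
        and shortest <= len(hit.sequence) <= longest
        and hit.ic50_nm < IC50_CUTOFF_NM
    )


def filter_high_confidence(hits):
    """Keep only the hits that satisfy every criterion."""
    return [hit for hit in hits if is_high_confidence(hit)]


# Sequences are mentioned in the text; the IC50 values are made up.
example_hits = [
    PeptideHit("INFDFPKL", "H2Kb", True, 12.0),     # kept
    PeptideHit("AAITNGLAM", "H2Db", True, 850.0),   # dropped: IC50 too high
    PeptideHit("INFDFPKL", "H2Db", True, 12.0),     # dropped: 8-mer outside 9-11
]
kept = filter_high_confidence(example_hits)
```

Keeping the thresholds in module-level constants makes it straightforward to re-run the filter with looser criteria, for example to revisit longer peptides that were excluded here.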

Data Records

The DDA-MS data used to generate the spectral libraries (raw and centroided mzXML files, and identified peptides in pepXML reports) have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository42 with the dataset identifier PXD008733 (Data Citation 2). Raw and mzXML files are also accessible via the SysteMHC Atlas repository (https://systemhcatlas.org/)40 with the dataset identifier SYSMHC00018. The H2Db- and H2Kb-associated peptide spectral libraries (SpectraST format) and assay libraries (CSV, TraML) are available for different SWATH/DIA-MS data analysis tools at SWATH Atlas (http://www.swathatlas.org/). Spectral libraries are also accessible at SysteMHC Atlas (https://systemhcatlas.org/speclibs). The lists of all peptides (unfiltered and filtered) are available in figshare (https://figshare.com/s/5436fdcdb908000a49d5) (Data Citation 1).

Technical Validation

MS-based identification of high-confidence H2Db- and H2Kb-associated peptides in 19 tissues of healthy mice

The draft map of the murine MHC class I immunopeptidome was generated from 19 different C57BL/6 tissues extracted under steady-state conditions (Fig. 1, Table 1 and Methods). H2Db- and H2Kb-peptide complexes were isolated by immunoaffinity purification using the B22-249.R1 and the Y-3 antibody, respectively. Peptides were acid-eluted and acquired in DDA mode using different MS instruments (Table 1). Following acquisition of data from 280 MS runs, 4.8 million MS/MS spectra were searched using a uniform and well-tested computational pipeline26,40 (Fig. 1b) and yielded assignments of 681,357 and 850,396 peptide ions with iProphet probability P≥0.9 and P>0.0, respectively.

Table 1 Overview of normal tissues and tumor cell lines used to generate the draft map and spectral libraries.

Next, we considered all 7–14-mers identified at FDR 1%, resulting in a total of 81,058 peptides (Supplementary Figure 1) (List of unfiltered H2Db peptides (7–14 mers), Data Citation 1) (List of unfiltered H2Kb peptides (7–14 mers), Data Citation 1). We then applied very strict confidence filters (see Methods) to remove potential non-MHC-binding contaminant peptides. As an example, we observed that after filtering, 72% of all 9-mer H2Db peptides and 81% of all 8-mer H2Kb peptides (FDR 1%) identified from spleen tissue were predicted to have a strong MHC binding affinity with IC50<500 nM (Supplementary Figure 2a). Similarly, 80% of all 9-mer H2Db peptides and 80% of all 8-mer H2Kb peptides (FDR 1%) identified from heart tissue were predicted to have a strong MHC binding affinity with IC50<500 nM (Supplementary Figure 2m). These data and similar data from other tissue types indicate that the antibodies used in this study are relatively specific, and that the proportion of high-confidence H2Db/Kb-associated peptides identified from different tissue types was generally high and varied only slightly. Data to calculate the proportion of high-confidence H2Db/Kb-associated peptides per tissue and cell type are available (List of unfiltered H2Db peptides (7–14 mers), Data Citation 1) (List of unfiltered H2Kb peptides (7–14 mers), Data Citation 1). Longer peptides (i.e. >11 amino acids for H2Db and >9 amino acids for H2Kb) and peptides predicted to bind H2Db/Kb with a lower affinity (IC50>500 nM) were considered in this study as low-confidence H2Db/Kb peptides, although they might still be genuine H2Db/Kb-associated peptides, and were therefore not included in downstream analysis and spectral library generation.

After filtering the whole dataset, the number of high-confidence H2Db/Kb-associated peptides identified per tissue showed high variability, ranging from 146 (spinal cord tissue) to 3,263 (spleen tissue) with an average of 1,497 peptides (Fig. 2c). The different amounts of tissue (in grams) used for immunoprecipitation as well as sample handling may have contributed to this large difference in the number of identified peptides. Nevertheless, we observed that the number of high-confidence peptides identified per tissue generally correlated with the abundance of MHC class I proteins previously reported from the same mouse tissues (Supplementary Figure 3)43. Overall, 15,645 (2,693 unique) high-confidence H2Db-associated peptides (FDR 1%, 9–11 amino acids, IC50<500 nM) and 12,803 (2,594 unique) H2Kb-associated peptides (FDR 1%, 8–9 amino acids, IC50<500 nM) were identified (Fig. 2). The identified peptides mapped to 4,050 of the mouse UniProtKB/Swiss-Prot proteins. Of note, 36.4% and 27.4% of all the high-confidence H2Db- and H2Kb-associated peptides, respectively, were not shared across tissues but were exclusively detected in one particular tissue (Table 2). In contrast, a relatively small proportion of the measured H2Db immunopeptidome (0.2%) and H2Kb immunopeptidome (1.9%) was shared across all 19 tissues (Table 2). For instance, the H2Kb-associated peptides INFDFPKL and VNFEFPEF were found in all 19 tissues, whereas the H2Db-associated peptides AAITNGLAM and HSVINQAVM were found exclusively in the brain. It is important to emphasize, however, that the proportions of tissue-shared and tissue-specific peptides mentioned above were not calculated from quantitative and normalized values. In fact, it is very likely that the low overlap described above would have increased significantly if larger amounts of tissue had been used for immunoprecipitation of those tissues expressing lower levels of MHC molecules.
In future studies, it will be important to consider the absolute abundance of MHC molecules per tissue type and adjust/normalize the amounts of starting material accordingly to investigate in a more rigorous manner the tissue-specificity of the MHC class I immunopeptidome. Additional factors such as sample handling, yield of the immunoaffinity purification procedure per tissue type, and limits of detection (LOD) and quantification (LOQ) of mass spectrometers used for identifying MHC-associated peptides would also need to be considered. Taken together, these results delineate the first draft map of the murine H2Db/Kb class I immunopeptidome in health and provide initial qualitative data to further explore the tissue-specificity of the immunopeptidome.
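The tissue-specific versus tissue-shared breakdown reported in Table 2 amounts to counting, for each peptide, the number of tissues in which it was detected. The sketch below illustrates that counting under stated assumptions: the function names and the three-tissue input are hypothetical, and the toy input only mirrors the four example peptides named in the text rather than the real dataset.

```python
from collections import Counter


def tissue_counts(peptides_by_tissue):
    """Return a Counter mapping each peptide to the number of tissues it occurs in."""
    counts = Counter()
    for peptides in peptides_by_tissue.values():
        counts.update(set(peptides))  # set() so duplicates within a tissue count once
    return counts


def sharing_summary(peptides_by_tissue):
    """Summarize how many peptides are tissue-specific or present in every tissue."""
    counts = tissue_counts(peptides_by_tissue)
    n_tissues = len(peptides_by_tissue)
    return {
        "unique_peptides": len(counts),
        "tissue_specific": sum(1 for c in counts.values() if c == 1),
        "in_all_tissues": sum(1 for c in counts.values() if c == n_tissues),
    }


# Toy input restricted to three tissues; purely qualitative presence/absence.
toy = {
    "brain": ["INFDFPKL", "VNFEFPEF", "AAITNGLAM", "HSVINQAVM"],
    "spleen": ["INFDFPKL", "VNFEFPEF"],
    "heart": ["INFDFPKL", "VNFEFPEF"],
}
summary = sharing_summary(toy)
```

Because this counting is based on presence or absence only, it inherits the caveat stated above: a peptide can appear tissue-specific simply because less starting material was available for the other tissues.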

Figure 2: Identifications of high-confidence H2Db/Kb-associated peptides from 19 normal mouse tissues.

(a) Pie chart indicating the total number of high-confidence H2Db- and H2Kb-associated peptides that were identified across all tissues. (b) Graphs showing the high proportion of high-confidence H2Kb- (upper panel) and H2Db- (lower panel) peptides with a predicted MHC binding affinity (IC50) below 125 nM. The binding motifs for H2Kb- and H2Db-associated peptides are also shown. (c) Histogram showing the distribution of high-confidence H2Db/Kb-associated peptides identified per mouse tissue.

Table 2 Number of tissue-specific (1 tissue) and tissue-shared (2-19 tissues) H2Kb- and H2Db-associated peptides identified in this study.

A reference map of the murine MHC class I immunopeptidome in health guides identification of potential tumor-associated antigens (TAAs)

The rapid and robust identification of TAAs or tumor-specific antigens is relevant for the development of cancer vaccines, and the generation of a reference map of the MHC class I immunopeptidome in health supports identification of such peptide antigens5,34,44. In this regard, we compared the list of peptides found in the 19 healthy mouse tissues to those found in several in vitro tumor models. More specifically, we profiled the H2Db/Kb immunopeptidome of four different cancer types from four widely used C57BL/6 tumor-derived cell lines: 1) EL4 cells (lymphoma), 2) LLC1 cells (Lewis lung carcinoma), 3) GL261 cells (malignant glioma) and 4) B16F10 cells (melanoma). In summary, 3,282 unique high-confidence H2Db/Kb-associated peptides were identified in the four tumor cell lines, 2,552 peptides were shared between the healthy tissues and the tumor cell lines, and 730 (22%) peptides were exclusively observed in the tumor cell lines (Fig. 3a) (List of high-confidence H2Db peptides, Data Citation 1) (List of high-confidence H2Kb peptides, Data Citation 1). The presence of tumor cell line-specific peptides was also noted. For instance, 28 peptides and 49 peptides were exclusively identified in GL261 and B16F10 cells, respectively (Supplementary Figure 4). Those peptides might be classified as glioma- and melanoma-associated antigens, respectively, if further tested and validated. Thus, a reference map of the murine immunopeptidome in health guides identification of potential TAAs in model cell lines. We envision that a comprehensive reference map of the murine immunopeptidome in health will find application in tumor immunology and beyond, e.g. in immunopathology to identify a wide variety of disease-specific peptide antigens.
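The comparison between the healthy-tissue reference map and the tumor cell lines is, at its core, a set operation. The following minimal sketch shows that operation and checks that the counts reported above are internally consistent; the function name and the placeholder sequences in the usage example are hypothetical and do not come from the dataset.

```python
def compare_to_reference(reference_peptides, tumor_peptides):
    """Split tumor peptides into those shared with the reference and those exclusive to tumor.

    Returns (shared, tumor_only, tumor_only_fraction).
    """
    reference = set(reference_peptides)
    tumor = set(tumor_peptides)
    shared = tumor & reference
    tumor_only = tumor - reference
    fraction = len(tumor_only) / len(tumor) if tumor else 0.0
    return shared, tumor_only, fraction


# Consistency check on the counts reported in the text.
total_tumor, shared_count, tumor_only_count = 3282, 2552, 730
assert shared_count + tumor_only_count == total_tumor
reported_percent = round(100 * tumor_only_count / total_tumor)  # 22

# Placeholder sequences, for illustration only.
shared, tumor_only, fraction = compare_to_reference(
    {"AAAAAAAAA", "CCCCCCCCC"},
    {"CCCCCCCCC", "DDDDDDDDD", "EEEEEEEEE", "FFFFFFFFF"},
)
```

Peptides returned in `tumor_only` are only candidates: as noted above, they would need further testing and validation before being classified as tumor-associated antigens.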

Figure 3: Analysis of high-quality H2Db/Kb-specific peptide spectral libraries generated from healthy mouse tissues and tumor cell lines.

(a) Venn diagram showing the overlap between high-confidence H2Db/Kb-associated peptides identified from 19 healthy tissues and 4 tumor cell lines used in this study. (b) Cumulative number of MS/MS spectra acquired versus cumulative number of distinct high-confidence H2Db/Kb peptides identified. Each data point represents an added injection/MS experiment, and the experiments are presented in a chronological order of data acquisition (see Order of injection for H2Kb peptides; Data Citation 1 and Order of injection for H2Db peptides; Data Citation 1). (c) Histogram indicating the number of distinct peptide ions that were generated from normal tissues using different fragmentation methods (i.e. Orbitrap CID, HCD and CID-QTOF). High-quality MHC allele- and fragmentation-specific peptide spectral libraries were generated using SpectraST. (d) Venn diagram showing the overlap between high-confidence H2Db/Kb-associated peptides generated from normal tissues using different fragmentation methods. (e) Screenshot of SysteMHC Atlas (https://systemhcatlas.org/). The raw MS output files, the peptide sequences and the spectral libraries are accessible at SysteMHC Atlas with the dataset identifier SYSMHC00018.

H2Db/Kb peptide spectral libraries saturation analysis

Comprehensive and robust quantitative analysis of the immunopeptidome is important to 1) identify new immunotherapeutic targets, 2) better understand the relationship between T cells and MHC-presenting cells, and 3) potentially identify immunopeptidomic biomarker signatures in normal and disease cells from sample cohorts. Building high-quality peptide spectral libraries was demonstrated to be an efficient procedure to support robust quantitative analysis of immunopeptidomes using advanced MS techniques, i.e. SWATH/DIA26,33,40. To estimate the status of our initial mapping effort and to support robust quantitative analysis of the murine MHC class I immunopeptidome, we created H2Db/Kb-specific peptide spectral libraries and we plotted the cumulative number of distinct H2Db/Kb peptides as a function of the number of MS2 spectra acquired on the mass spectrometer (Fig. 3b). Each data point on the curve represents an added injection/experiment, and the experiments are presented in chronological order of data acquisition (Order of injection for H2Kb peptides, Data Citation 1) (Order of injection for H2Db peptides, Data Citation 1). The graphs indicate that new H2Db/Kb peptides were continuously identified as additional MS/MS spectra were collected, suggesting that new peptides will probably be discovered in future experiments, as saturation has not been reached using the presently available technology. Therefore, collecting more data from new experiments (e.g. additional cell lines, additional primary tissues, new experimental conditions, new protocols and MS technologies) will be needed to enable comprehensive and robust quantitative analysis of the murine MHC class I immunopeptidome in the future. In addition, we anticipate that absolute quantitative analysis of immunopeptidomes (i.e. absolute quantification of MHC molecules as well as absolute and systematic quantification of individual MHC-associated peptides per cell and tissue type) will become essential to rigorously assess the completeness of this initial mapping effort.
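The saturation curve described above can be computed by walking through the injections in chronological order while accumulating both the number of MS/MS spectra and the set of distinct peptides seen so far. This is a minimal sketch under the assumption that each run is available as a pair of spectrum count and identified peptides; the function name and the toy values are hypothetical.

```python
def saturation_curve(runs):
    """Build (cumulative spectra, cumulative distinct peptides) points.

    runs: iterable of (n_spectra, peptides) tuples, already sorted in
    chronological order of data acquisition.
    """
    seen = set()
    total_spectra = 0
    curve = []
    for n_spectra, peptides in runs:
        total_spectra += n_spectra
        seen.update(peptides)
        curve.append((total_spectra, len(seen)))
    return curve


# Toy runs with made-up counts and placeholder peptide labels.
toy_runs = [
    (100, {"P1", "P2"}),
    (150, {"P2", "P3"}),
    (120, {"P3"}),
]
curve = saturation_curve(toy_runs)
```

A curve whose second coordinate keeps rising with each added run, as reported for Fig. 3b, indicates that saturation has not been reached; a plateau would show up as consecutive points with the same peptide count.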

Sharing H2Db/Kb peptidomic data via SysteMHC Atlas

We anticipate that the dataset generated in this study will be widely used by basic and translational immunologists as well as computational mass spectrometrists. Therefore, an important goal here is to share our immunopeptidomics MS-related data at many different levels of processing. Specifically, we provide raw and converted mzXML files, lists of high-confidence peptides (iProphet results) and H2Db/Kb peptide spectral libraries, all available for download from the SysteMHC Atlas (Fig. 3) (H2Db/Kb peptides used for spectral library generation, Data Citation 1).

The SysteMHC Atlas is a new public data repository that serves as a community resource toward the generation of high-quality comprehensive maps of immunopeptidomes and the support of consistent measurements of immunopeptidomic sample cohorts40. To date, the SysteMHC Atlas contains 540 sample/context-specific and 39 MHC allele-specific peptide spectral libraries (37 HLA and 2 H2b), all available for download from the web interface. Moreover, the H2Db- and H2Kb-specific peptide spectral libraries generated in this study were both converted into TraML files for robust quantitative analysis of immunopeptidomes using SWATH/DIA-MS, as described previously26,45. TraML files are available at SWATH Atlas (www.swathatlas.org). Notably, three separate fragmentation-specific libraries were created: 1) CID and 2) HCD using the Orbitrap Fusion Lumos, and 3) CID-QTOF using the Triple TOF 5600+ (Fig. 3c). Different fragmentation methods are complementary and can be used to enhance the identification success rate of MHC-associated peptides and thus increase immunopeptidome coverage (Fig. 3d). More importantly, the CID-QTOF-, CID- and HCD-specific spectral libraries support the high-throughput targeted analysis of SWATH/DIA immunopeptidomic data generated by these different fragmentation methods.

In the future, we foresee that continuous development of SysteMHC Atlas for effective sharing and re-analysis of immunopeptidomic datasets will be key to comprehensively define the composition and complexity of the murine immunopeptidome. For instance, we envisage that re-analysis of raw MS data using advanced peptide sequencing algorithms might unveil the presence of non-canonical MHC-associated peptides, e.g. proteasome-spliced peptides46,47 (https://www.biorxiv.org/content/biorxiv/early/2018/03/26/288209.full.pdf), which would be of particular relevance for the development of peptide-based vaccines and immunotherapies in precision medicine48.

Usage notes

The lists of peptides provided in figshare (Data Citation 1) may differ from the ones available at SysteMHC Atlas (https://systemhcatlas.org)40. The SysteMHC computational pipeline used to generate the peptide lists is subject to periodic upgrades, and the resulting data may therefore differ from those in the original publication. To ensure reproducibility of the results, we have introduced a database versioning system; the current build version is 180409 (year/month/day). Current and past builds can also be downloaded at: https://systemhcatlas.org/Builds_for_download/. This information is also available in the ‘ABOUT’ section of the SysteMHC Atlas website.
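When recording which build a re-analysis was based on, it can help to parse the build identifier into a calendar date. The sketch below assumes the six-digit identifier is a two-digit year followed by month and day, as the versioning note above indicates; the function name is hypothetical and is not part of any SysteMHC Atlas tooling.

```python
from datetime import date, datetime


def parse_build_version(build: str) -> date:
    """Interpret a six-digit build identifier as YYMMDD and return the date."""
    if len(build) != 6 or not build.isdigit():
        raise ValueError(f"expected six digits, got {build!r}")
    return datetime.strptime(build, "%y%m%d").date()


build_date = parse_build_version("180409")
```

Storing the parsed date alongside downloaded peptide lists makes it easy to tell whether two analyses used the same build.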

Additional information

How to cite this article: Schuster, H. et al. A tissue-based draft map of the murine MHC class I immunopeptidome. Sci. Data 5:180157 doi: 10.1038/sdata.2018.157 (2018).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.