Figure 3: Overview of the Data Processing Pipeline.

From: Sustainable data and metadata management at the BD2K-LINCS Data Coordination and Integration Center

The Data Processing Pipeline consists of four main steps. Data Submission: data and metadata generated by the DSCGs are transferred to the DCIC via one of several technological solutions. Validation: the format and terminology of the data and metadata are validated against internal and qualified external references. Standardization and Aggregation: submitted Reagent Metadata are standardized, then further validated and annotated using qualified external references. Small Molecules and Cell Lines are registered in dedicated registration systems (SMDB, the Small Molecule DataBase, and CLDB, the Cell Line DataBase) and are assigned global IDs (PURLs). The Experimental, Dataset and Assay Metadata are deposited directly into the LDR. Processing pipelines and data files are deposited into the DCIC File Storage. The LDR and the DCIC File Storage are then used to create the LINCS Dataset Packages. After quality control, released Dataset Packages are assigned global PURLs and made accessible via the LDP. Dataset Packages can be accessed via the LDP UI, through APIs and through the LDP R package. Arrows indicate the flow of information between the four main processing steps.
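
The flow described in the caption can be illustrated with a minimal Python sketch. It is not the DCIC implementation: the `Submission` class, the function names, the required-field schema and the `purl:` identifier format are all hypothetical, chosen only to show how a submission might pass through validation, standardization with registry assignment, and release after quality control.

```python
from dataclasses import dataclass, field

# Assumed minimal schema and registry mapping; both are illustrative only.
REQUIRED_REAGENT_KEYS = {"type", "name"}
REGISTRIES = {"small_molecule": "SMDB", "cell_line": "CLDB"}


@dataclass
class Submission:
    """Hypothetical stand-in for data and metadata submitted by a DSCG."""
    dataset_id: str
    reagents: list[dict]
    files: list[str]
    purls: dict[str, str] = field(default_factory=dict)
    released: bool = False


def validate(sub: Submission) -> Submission:
    """Validation: reject reagents that lack the assumed required fields."""
    for reagent in sub.reagents:
        missing = REQUIRED_REAGENT_KEYS - set(reagent)
        if missing:
            raise ValueError(f"reagent missing fields: {sorted(missing)}")
    return sub


def standardize(sub: Submission) -> Submission:
    """Standardization: normalize names and register small molecules and cell lines."""
    for reagent in sub.reagents:
        reagent["name"] = reagent["name"].strip()
        registry = REGISTRIES.get(reagent["type"])
        if registry is not None:
            # Placeholder identifier; real PURLs have a different form.
            sub.purls[reagent["name"]] = f"purl:{registry}/{reagent['name']}"
    return sub


def release(sub: Submission, passed_qc: bool) -> Submission:
    """Release: only packages that pass quality control get a dataset-level ID."""
    if passed_qc:
        sub.purls[sub.dataset_id] = f"purl:dataset/{sub.dataset_id}"
        sub.released = True
    return sub


if __name__ == "__main__":
    submission = Submission(
        dataset_id="DS1",
        reagents=[
            {"type": "small_molecule", "name": " aspirin "},
            {"type": "cell_line", "name": "MCF7"},
        ],
        files=["data.csv"],
    )
    package = release(standardize(validate(submission)), passed_qc=True)
    print(package.released, package.purls)
```

Each function returns the submission so the steps chain in the order the arrows in the figure suggest; a reagent type without a dedicated registry is standardized but receives no identifier in this sketch.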