💾 Manage your data

Automatically organize and centralize datasets from various sources using Ontologic's dataset automations. This tutorial collates example results from a biology team and a chemistry team into one view.

1. Upload CSV data

Use the 3 example datasets below to follow along with this tutorial. These datasets simulate the kinds of results that a bio team and a chem team may create when testing a few candidate molecules.

In this example, the bio team has its own internal way of naming experiments and refers to molecules provided by the chem team by "sample_id". The chem team names experiments differently and refers to the molecules it makes by "batch_id".

From the Files page, click the "Upload Files" button and upload all 3 CSV files.

Ontologic automatically ingests any CSV and Excel files uploaded into Files, including the outputs of tool runs. When the platform finishes processing a file, it links a schema at the bottom of the file preview.

Make sure that your dataset has only column headers in the first row and data in every row after it. For example, if your first row is a single cell that reads "Bio team week 1" and the next row has the column headers, delete that title row so that the headers are in row 1 and the data starts in row 2. This allows the platform to parse your content correctly.
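If you prepare files with a script before uploading, the cleanup described above can be sketched in Python. This is a minimal sketch that runs outside the platform, not part of Ontologic. The function name strip_title_row, the sample rows, and the rule for spotting a title row (a single filled cell followed by a row with several filled cells) are all assumptions made for illustration.

```python
import csv
import io


def strip_title_row(csv_text: str) -> str:
    """Drop a leading title row so that the column headers land in row 1.

    Heuristic (an assumption, not platform behavior): the first row is a
    title if it has exactly one filled cell and the second row has several.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) >= 2:
        first_filled = [cell for cell in rows[0] if cell.strip()]
        second_filled = [cell for cell in rows[1] if cell.strip()]
        if len(first_filled) == 1 and len(second_filled) > 1:
            rows = rows[1:]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical file contents: a title row, then headers, then data.
raw = (
    "Bio team week 1,,\n"
    "sample_id,assay,percent_inhibition\n"
    "M-001,kinase,82.5\n"
)
cleaned = strip_title_row(raw)
print(cleaned)
```

A file that already starts with its headers passes through unchanged, so the function is safe to run on every file before upload.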

2. Confirm dataset organization with schemas

Once the data is ingested, navigate to the Data Explorer page and click on the "Schema" tab.

The platform will define a schema for every unique set of column headers and automatically group datasets that follow the same column header format. This is why the two chemistry datasets are assigned to one schema, and the biology dataset is assigned to another.
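To make the grouping rule concrete, here is a rough Python sketch of the idea: files that share the same column headers end up in the same group. It only illustrates the concept and is not Ontologic's implementation. The file names and all columns other than "sample_id", "batch_id", "canonical_smiles", and "percent_inhibition" are hypothetical, and treating header order as significant is an assumption.

```python
import csv
import io
from collections import defaultdict


def group_by_headers(files: dict[str, str]) -> dict[tuple[str, ...], list[str]]:
    """Group file names by their header row (order-sensitive, by assumption)."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for name, text in files.items():
        header = next(csv.reader(io.StringIO(text)))
        groups[tuple(header)].append(name)
    return dict(groups)


# Hypothetical stand-ins for the three tutorial files.
files = {
    "bio_data_week_1.csv": "sample_id,percent_inhibition\nM-001,82.5\n",
    "chem_data_week_1.csv": "batch_id,canonical_smiles\nM-001,CCO\n",
    "chem_data_week_2.csv": "batch_id,canonical_smiles\nM-002,c1ccccc1\n",
}

schemas = group_by_headers(files)
for header, names in schemas.items():
    print(header, "->", names)
```

The two chemistry files share a header row and fall into one group, while the biology file gets its own. This sketch compares headers only; it does not check data types.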

When you add new schemas to the platform, check the organization of those datasets by clicking "Edit". From here, you can change the name of the schema to something more descriptive. Change the name of "files_like_bio_data_week_1_csv" to "bio_data" and click Save.

Ontologic also supports scientific data visualizations, like chemical structures. For the chemistry schema, set the type for "canonical_smiles" from STRING to SMILES to let the platform display chemical structures.

You only need to do this step the first time you upload a new type of data to the platform. Ontologic will automatically detect and centralize additional files of the same format to existing schemas, as long as the column headers and data types are the same.

3. Enrich datasets with the query builder

Once you have reviewed the schemas, navigate to the "Queries" tab of the Data Explorer page. Click on the "Create Query" button.

In the Query builder view, you can add datasets that have been organized into schemas. Click on "Add Collection" and select "bio_data".

Then add another collection and select "chem_data". The Correlation Column for each collection is the identifier that the two datasets share.

For the Correlation Column for bio_data, select "sample_id".

For the Correlation Column for chem_data, select "batch_id". This will automatically generate the SQL code for joining the datasets. Click the "Run Query" button to kick off the query.

Why choose bio_data first?

In this example, we want to enrich the bio dataset with information about the molecules' synthesis history from the chem team, so the table is filled with bio information first.

If you instead want to enrich the chem dataset with information about how the molecules behaved in the bio assays, select chem_data first.
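The effect of collection order can be sketched with a small in-memory SQLite database in Python. The exact SQL that the query builder generates is not shown here, so the LEFT JOIN below is an assumption chosen to match the description above: every row of the first collection is kept and matching columns from the second are attached. The helper name enrich and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bio_data (sample_id TEXT, percent_inhibition REAL);
    CREATE TABLE chem_data (batch_id TEXT, canonical_smiles TEXT);
    -- M-003 was assayed by the bio team but has no chem record.
    INSERT INTO bio_data VALUES ('M-001', 82.5), ('M-002', 12.0), ('M-003', 55.1);
    INSERT INTO chem_data VALUES ('M-001', 'CCO'), ('M-002', 'c1ccccc1');
    """
)


def enrich(first: str, first_key: str, second: str, second_key: str) -> list[tuple]:
    """Keep every row of `first` and attach matching columns from `second`."""
    sql = (
        f"SELECT * FROM {first} "
        f"LEFT JOIN {second} ON {first}.{first_key} = {second}.{second_key} "
        f"ORDER BY {first}.{first_key}"
    )
    return conn.execute(sql).fetchall()


# bio_data first: all three bio rows appear, with empty chem columns for M-003.
bio_first = enrich("bio_data", "sample_id", "chem_data", "batch_id")
# chem_data first: only the two molecules that have chem records appear.
chem_first = enrich("chem_data", "batch_id", "bio_data", "sample_id")

for row in bio_first:
    print(row)
```

Under this assumption, the first collection decides which rows appear in the result, and the second only adds columns.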

The generated CSV now shows all bio data enriched with chem data, including SMILES structures.

Sort and filter columns using the arrows and funnel symbol at the top of each column. Set a filter on "percent_inhibition" and click OK to update the table. Clear filters with the "clear filter" option on funnels with a dot in the upper right corner.
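If you want to reproduce a filter and sort on the downloaded data, the same operations can be sketched in plain Python. The rows, the threshold of 50, and the helper name filter_and_sort are hypothetical, and the "at least" comparison is just one possible filter condition.

```python
def filter_and_sort(rows: list[dict], column: str, minimum: float) -> list[dict]:
    """Keep rows where `column` is at least `minimum`, highest value first."""
    kept = [row for row in rows if row[column] >= minimum]
    return sorted(kept, key=lambda row: row[column], reverse=True)


# Hypothetical rows standing in for the enriched query result.
rows = [
    {"sample_id": "M-001", "percent_inhibition": 82.5},
    {"sample_id": "M-002", "percent_inhibition": 12.0},
    {"sample_id": "M-003", "percent_inhibition": 55.1},
]

hits = filter_and_sort(rows, "percent_inhibition", 50)
for row in hits:
    print(row["sample_id"], row["percent_inhibition"])
```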

Once your query is set to your liking, download this view of the data with the "Download" button below. Later, after you upload additional datasets that match these schemas, click "Run Query" again to include the new data in the results automatically.
