Data Ingestion in the Tool

Overview

This section describes how to configure datasets to onboard and manage data. By the end of this guide, you will be able to configure data sources, define a schema, and transform and onboard data into a dataset.

Prerequisites

  • Basic understanding of your data requirements
  • Willingness to experiment with creating datasets and onboarding data

What is a Dataset?

A dataset is a collection of organized information, structured like a table where:

  • Rows represent individual items (customers, products, events)
  • Columns describe item attributes (age, price, date)

Example dataset structure:

Customer Name | Age | Item Bought | Price | Date of Purchase
Alice         | 25  | T-shirt     | $15   | Jan 1, 2025
Bob           | 30  | Jeans       | $40   | Jan 3, 2025

Dataset Configuration

Basic Setup

  1. Click '+New Dataset' on the Datasets Listing Screen
  2. Provide a unique dataset name
  3. Select appropriate dataset type

Step 1: Connector Setup

Configure data source connections to fetch data.

Figure: Connector configuration interface

Connection Details

  1. Select source type
  2. Configure source definition
  3. Test connection

Figure: Successful connection verification

Stream Configuration

  • Choose specific stream
  • Select sync mode:
    • Full Refresh: Overwrites existing records
    • Incremental: Updates and adds records
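
The difference between the two sync modes can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function names, the `id` key, and the in-memory dictionary standing in for the dataset are all assumptions made for illustration.

```python
# Hypothetical sketch of the two sync modes; the dataset is modelled as a
# dict keyed by a unique identifier (an assumption, not the tool's storage).

def full_refresh(existing, fetched, key):
    """Discard what was stored and keep only the freshly fetched records."""
    return {rec[key]: rec for rec in fetched}


def incremental(existing, fetched, key):
    """Update records whose key already exists and add the ones that do not."""
    merged = dict(existing)          # copy so the input is left unchanged
    for rec in fetched:
        merged[rec[key]] = rec       # overwrite on match, insert otherwise
    return merged


existing = {1: {"id": 1, "name": "Alice"}, 2: {"id": 2, "name": "Bob"}}
fetched = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Carol"}]

print(sorted(full_refresh(existing, fetched, "id")))  # keys after a full refresh
print(sorted(incremental(existing, fetched, "id")))   # keys after an incremental sync
```

In this sketch a full refresh drops record 1 because it is absent from the fetched batch, while an incremental sync keeps it and updates record 2 in place.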

Sync Settings

Configure automatic data fetch intervals:

  • Minutes: Specify minute interval
  • Hours: Set hour interval and time
  • Daily: Set day and time
  • Weekly: Choose days and time
  • Monthly: Set date and time
  • Yearly: Select months, date, and time

Figure: Sync frequency configuration options
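
To make the interval options more concrete, the sketch below computes the next run time for two of them (a fixed minute/hour interval and a daily time). The function names and the scheduling logic are assumptions for illustration only; the tool's scheduler may behave differently, for example around time zones.

```python
from datetime import datetime, time, timedelta

# Hypothetical helpers; they only illustrate what the interval settings mean.

def next_interval_run(last_run, minutes=0, hours=0):
    """Next run for a 'Minutes' or 'Hours' interval counted from the last run."""
    return last_run + timedelta(minutes=minutes, hours=hours)


def next_daily_run(now, at):
    """Next run for a 'Daily' schedule that fires at the wall-clock time `at`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:             # today's slot has already passed
        candidate += timedelta(days=1)
    return candidate


now = datetime(2025, 1, 1, 10, 0)
print(next_interval_run(now, minutes=15))
print(next_daily_run(now, time(9, 0)))
```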

Step 2: Field Mapping

Configure how source fields map to dataset attributes.

Figure: Field mapping interface

Primary Key Selection

  • Choose unique identifier field
  • Ensure field contains only unique values
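
Before choosing a primary key, it can help to check a sample of the source data for duplicate values. The following is a minimal sketch of such a check; the `customer_id` field and the sample records are hypothetical.

```python
from collections import Counter


def duplicate_keys(records, key_field):
    """Return the values of `key_field` that occur more than once."""
    counts = Counter(rec[key_field] for rec in records)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample of source records.
records = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 2, "name": "Bobby"},
]

print(duplicate_keys(records, "customer_id"))  # [2]
```

An empty result suggests the field is a usable unique identifier, at least for the sample that was checked.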

Attribute Mapping

  • Map source fields to attributes
  • Configure field types
  • Set data validations
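
The three mapping tasks above (renaming fields, assigning types, validating values) can be sketched in Python as follows. The source field names, the target attributes, and the validation rules are all assumptions chosen for illustration; they do not reflect the tool's actual configuration format.

```python
# Hypothetical mapping: source field -> (dataset attribute, type to cast to).
FIELD_MAP = {
    "cust_name": ("customer_name", str),
    "cust_age": ("age", int),
    "amount": ("price", float),
}


def map_record(source):
    """Rename source fields to dataset attributes and cast them to their types."""
    return {
        attribute: cast(source[src_field])
        for src_field, (attribute, cast) in FIELD_MAP.items()
    }


def validate(record):
    """Return a list of validation errors (empty when the record is valid)."""
    errors = []
    if not record["customer_name"]:
        errors.append("customer_name must not be empty")
    if record["age"] < 0:
        errors.append("age must not be negative")
    if record["price"] < 0:
        errors.append("price must not be negative")
    return errors


mapped = map_record({"cust_name": "Alice", "cust_age": "25", "amount": "15"})
print(mapped)
print(validate(mapped))
```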

Step 3: Join Sources

Configure relationships between multiple data sources.

Figure: Source joining interface

Join Types

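  • Inner Join: Only records with matching values in both sources
  • Left Join: All records from the left source, plus matching records from the right
  • Right Join: All records from the right source, plus matching records from the left
  • Full Outer Join: All records from both sources, with nulls where there is no match
  • Cross Join: Cartesian product (every left record paired with every right record)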
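
The join types listed above follow standard relational semantics, which the sketch below reproduces on two small in-memory sources. The `join` and `cross_join` helpers and the sample data are hypothetical and written only to show which records each join type keeps.

```python
from itertools import product


def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is inner, left, right, or full."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unsupported join type: {how}")
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}
    rows, matched_right = [], set()
    for lrow in left:
        matches = [i for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        for i in matches:
            matched_right.add(i)
            rows.append({**lrow, **right[i]})
        if not matches and how in ("left", "full"):
            # Keep the unmatched left row; right-hand columns become None.
            rows.append({**dict.fromkeys(right_cols), **lrow})
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                # Keep the unmatched right row; left-hand columns become None.
                rows.append({**dict.fromkeys(left_cols), **rrow})
    return rows


def cross_join(left, right):
    """Cartesian product: every left row paired with every right row."""
    return list(product(left, right))


customers = [{"customer_id": 1, "name": "Alice"}, {"customer_id": 2, "name": "Bob"}]
orders = [{"customer_id": 2, "item": "Jeans"}, {"customer_id": 3, "item": "Hat"}]

for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, "customer_id", how))
print("cross", len(cross_join(customers, orders)), "pairs")
```

With this sample data only customer 2 appears in both sources, so the inner join yields one record, the left and right joins two each, the full outer join three, and the cross join four pairs.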

Step 4: Schema Definition

Define the final structure of your dataset.

Figure: Schema configuration interface

Field Configuration

  • Set primary key
  • Configure indexes
  • Enable faceting
  • Set search options
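
One way to think about these per-field options is as a small record per field. The sketch below is purely illustrative: the `FieldConfig` class, its attribute names, and the rule that a schema has exactly one primary key are assumptions, not the tool's schema format.

```python
from dataclasses import dataclass


@dataclass
class FieldConfig:
    """Hypothetical per-field settings mirroring the options listed above."""
    name: str
    primary_key: bool = False
    indexed: bool = False
    facetable: bool = False
    searchable: bool = False


def check_schema(fields):
    """Return a list of problems found in a schema (empty when it looks sane)."""
    problems = []
    primary_keys = [f.name for f in fields if f.primary_key]
    if len(primary_keys) != 1:  # assumption: exactly one primary key per dataset
        problems.append(f"expected exactly one primary key, found {len(primary_keys)}")
    names = [f.name for f in fields]
    if len(names) != len(set(names)):
        problems.append("field names must be unique")
    return problems


schema = [
    FieldConfig("customer_id", primary_key=True, indexed=True),
    FieldConfig("customer_name", searchable=True),
    FieldConfig("item_bought", facetable=True, searchable=True),
]
print(check_schema(schema))
```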

Transformations

Figure: Transformations interface

  • Add field transformations
  • Configure transformation rules
  • Set application conditions
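
A transformation with an application condition can be pictured as a rule of the form "for this field, when this condition holds, apply this function". The sketch below illustrates that idea; the rule format, the `apply_rules` helper, and the sample rules are assumptions and not the tool's rule syntax.

```python
def apply_rules(record, rules):
    """Apply each (field, condition, transform) rule whose condition holds."""
    result = dict(record)            # leave the input record unchanged
    for field, condition, transform in rules:
        if condition(result):
            result[field] = transform(result[field])
    return result


# Hypothetical rules: always trim names; upper-case the item for larger purchases.
rules = [
    ("name", lambda rec: True, str.strip),
    ("item", lambda rec: rec["price"] >= 20, str.upper),
]

print(apply_rules({"name": " Bob ", "item": "Jeans", "price": 40}, rules))
print(apply_rules({"name": "Alice", "item": "T-shirt", "price": 15}, rules))
```

Rules are applied in order, so a later rule sees the values produced by earlier ones.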