Ingestion Pipelines

Move data from sources to your data warehouse with full control over sync modes, schema selection, and execution.

Overview

Ingestion pipelines extract data from external sources (databases, SaaS platforms, APIs) and load it into your data warehouse. Configure how data is synced, loaded, and how schema changes are handled.

Prerequisites: At least one active source and one active destination.

To access: Navigate to Pipelines in the sidebar, then click + Create Pipeline.


Creating an Ingestion Pipeline

To start:

  1. Navigate to Pipelines in the sidebar
  2. Click + Create Pipeline
  3. If you have multiple projects, select which project to create the pipeline in (this determines the destination warehouse)
  4. You'll be redirected to the Workspace IDE, which shows all pipelines in the selected project in an Explorer panel on the left

The pipeline wizard then opens with 4 steps:

Step 1: Choose Source

Select the data source for your pipeline. Only active sources are available.

To proceed: Click Continue.


Step 2: Configure Pipeline

The destination is pre-selected based on the project's warehouse (all pipelines belong to a project). Configure your pipeline settings below.

Pipeline Settings

Availability Note

Not every pipeline setting or option is available for every connector. Supaflow reads source and destination capabilities and only enables supported options. For example, some destinations only allow APPEND load mode or LATENCY optimization.

UI Note

To keep the wizard focused, Supaflow shows only 4 core settings by default in this step. All other supported settings are under Advanced Settings.

Destination Section (Schema/Prefix)

At the top of Step 2, Supaflow shows the selected destination with key connection details (destination name, database, schema/prefix, role).

The schema/prefix column in this section is where destination naming is surfaced:

  • When Namespace Rule = Destination Defined: the card shows the destination Schema from destination config (read-only in this step)
  • Otherwise: the card shows the pipeline namespace value used for naming. The UI label is:
    • Prefix for sources with schema/catalog hierarchy (PostgreSQL, MySQL, etc.)
    • Schema for sources without hierarchy (Salesforce, HubSpot, etc.)

Supaflow generates a unique prefix for each pipeline by default (for example, salesforce, salesforce_2) so pipelines do not collide.

How the prefix is used:

  • Sources without schemas (Salesforce, HubSpot, etc.): Prefix becomes the schema name where tables are created
    • Example: Prefix salesforce → Schema salesforce → Tables salesforce.Account, salesforce.Contact
  • Sources with schemas (PostgreSQL, MySQL, etc.): Prefix is prepended to the source schema name
    • Example: Prefix postgres + Source schema orders → Destination schema postgres_orders
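The naming rules above can be sketched as a small helper. This is an illustrative function only (the name `destination_schema` is hypothetical); Supaflow applies this mapping internally.

```python
from typing import Optional


def destination_schema(prefix: str, source_schema: Optional[str]) -> str:
    """Illustrative sketch of how a pipeline prefix maps to a destination schema.

    Hypothetical helper, not a Supaflow API: it mirrors the two rules above.
    """
    if source_schema is None:
        # Sources without schemas (e.g. Salesforce): the prefix IS the schema
        return prefix
    # Sources with schemas (e.g. PostgreSQL): the prefix is prepended
    return f"{prefix}_{source_schema}"


print(destination_schema("salesforce", None))    # salesforce
print(destination_schema("postgres", "orders"))  # postgres_orders
```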

When editable, click the schema/prefix chip (or pencil icon) in the destination section to update it.

If the selected schema/prefix is already used by another pipeline, Supaflow shows a warning. Collision outcomes are then governed by Destination Table Collision Handling.

Sync Settings

Ingestion Mode: How data should be synced from the source.
  • Historical + Incremental (default) - Full sync first, then ongoing changes
  • Historical Only - Full data sync every run
  • Incremental Only - Only new and changed data

Error Handling Mode: How to handle errors during sync.
  • Continue with Warnings (default) - Log errors but keep processing
  • Abort on Any Error - Stop when errors occur

Full Sync Object Refresh Frequency: How often objects that do not support incremental ingestion are resynced. Disabled when Ingestion Mode is Historical Only.
  • Every Run - Resync on every pipeline execution
  • Daily - Resync every day
  • Weekly (default) - Resync every 7 days
  • Monthly - Resync every 30 days
  • Never - Only run once (initial sync), then skip

Full Resync Frequency: How often all objects are forced into a complete resync, regardless of incremental support. Disabled when Ingestion Mode is Historical Only.
  • Never (default) - No scheduled full resync; only incremental syncs after initial load
  • Daily - Force full resync every day
  • Weekly - Force full resync every 7 days
  • Monthly - Force full resync every 30 days

Important: For objects that do not support incremental ingestion, Supaflow runs historical sync only when the configured Full Sync Object Refresh Frequency is due (or when a forced full resync applies). On runs where historical sync is not due, those objects can be skipped.
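The skip logic described in the note above can be sketched as follows. The function name, frequency keys, and interval values are assumptions for illustration (intervals match the option descriptions: daily = 1 day, weekly = 7 days, monthly = 30 days); Supaflow's actual scheduler is not exposed.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed intervals per refresh-frequency option; None means "Never"
REFRESH_INTERVALS = {
    "every_run": timedelta(0),
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
    "never": None,
}


def historical_sync_due(frequency: str,
                        last_full_sync: Optional[datetime],
                        now: datetime) -> bool:
    """Sketch: is a non-incremental object due for a historical sync this run?"""
    if last_full_sync is None:
        return True  # initial sync always runs
    interval = REFRESH_INTERVALS[frequency]
    if interval is None:
        return False  # Never: only the initial sync, then skip
    return now - last_full_sync >= interval
```

On runs where this returns False, the object is skipped rather than fully resynced.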

Load Settings

Load Mode: How data should be loaded into the destination.
  • Merge (default) - Insert new and update existing records
  • Append - Add new records only
  • Truncate and Load - Delete all data, then load
  • Overwrite - Drop and recreate tables

Destination Namespace Rules: How destination schemas and tables should be mapped from the source.
  • Mirror Source (default)
  • Destination Defined - Use the destination's default schema
  • Fivetran Naming - Use Fivetran-like naming

Destination Table Collision Handling: How to handle table name collisions during the initial sync.
  • Merge with Existing (default)
  • Fail if Table Exists (recommended)
  • Drop and Recreate

Perform Hard Deletes: Permanently delete records from the destination when they are deleted from the source.
  • No (default) - Keep all records
  • Yes - Delete records

Load Optimization Mode: Strategy for optimizing pipeline execution.
  • Optimize for Cost (default) - Batch operations
  • Optimize for Latency - Process immediately
  • Adaptive - System decides

Data Validation Level: Level of data integrity validation during sync.
  • None (default) - No validation
  • Row Count - Validate row counts
  • Schema and Count
  • Column Statistics
  • Full Integrity
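To make the load-mode semantics concrete, here is a toy in-memory sketch over rows keyed by an assumed `id` column. Real loads run inside the warehouse (typically as MERGE/INSERT/TRUNCATE statements); this only illustrates how each mode treats existing rows.

```python
def apply_load(existing, incoming, mode, key="id"):
    """Toy illustration of the four load modes on lists of row dicts."""
    if mode in ("truncate_and_load", "overwrite"):
        return list(incoming)             # discard everything, load fresh
    if mode == "append":
        return existing + list(incoming)  # add rows, never update existing
    if mode == "merge":
        by_key = {row[key]: row for row in existing}
        for row in incoming:
            by_key[row[key]] = row        # insert new, update existing
        return list(by_key.values())
    raise ValueError(f"unknown load mode: {mode}")
```

For example, with `merge`, an incoming row sharing an `id` with an existing row replaces it; with `append`, both copies are kept.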

Schema Evolution Settings

Schema Evolution Mode: How to propagate source schema changes to the destination.
  • Allow All Changes (default) - Automatically propagate all changes
  • Block All Changes - Prevent any modifications
  • Column Level Changes Only - Allow column changes only

Auto Re-Sync on New Table: Automatically perform a full re-sync for new tables.
  • Yes (default)
  • No

Auto Re-Sync on New Column: Automatically re-sync the entire table when a new column is detected.
  • No (default)
  • Yes

Auto Re-Sync on Schema Change: Automatically re-sync when any schema change is detected.
  • No (default)
  • Yes
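The Schema Evolution Mode options amount to a policy check on each detected change. The sketch below is hypothetical (the mode keys and change-type names are assumptions, not Supaflow identifiers) and only mirrors the three modes described above.

```python
# Assumed column-level change types for the "Column Level Changes Only" mode
COLUMN_CHANGES = {"add_column", "drop_column", "alter_column"}


def allow_change(mode: str, change_type: str) -> bool:
    """Sketch: does a detected source schema change propagate to the destination?"""
    if mode == "allow_all":
        return True
    if mode == "block_all":
        return False
    if mode == "column_level_only":
        return change_type in COLUMN_CHANGES
    raise ValueError(f"unknown schema evolution mode: {mode}")
```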

To proceed: Click Continue.


Step 3: Choose Objects to Sync

Select which tables and fields to replicate from your source.

To select:

  • Use the top-level checkbox to select all objects
  • Or expand catalogs/schemas and select individual tables and fields
  • Use the search bar to filter objects by name

Selections are saved automatically as you work.

To proceed: Click Continue.


Step 4: Review & Save

Review all pipeline settings and selected objects.

Pipeline Name: Auto-generated from source/destination names (e.g., "Salesforce to Snowflake"). You can edit this before creating.

To create: Click Create Pipeline.

You'll be redirected to the pipeline detail page where you can view settings, schema, and run the pipeline.


Pipeline Detail Page

After creating a pipeline, you'll see the pipeline detail page with two tabs:

Settings Tab

  • View all pipeline configuration (read-only)
  • See source, destination, and all sync/load/schema evolution settings
  • Click Sync Now button to run the pipeline

Schema Tab

  • View all selected objects and their fields
  • Filter by: All, Enabled, or Disabled objects
  • Search for specific tables
  • See total object count (e.g., "594/594 objects selected")

To access: Navigate to Pipelines in the sidebar, then click on any pipeline name.

Notification Preferences

The Settings tab also includes Notification Preferences, which controls email notifications for this specific pipeline. You can enable or disable notifications for failures, warnings, and successes independently.

These settings override your workspace-wide notification defaults. To revert to workspace defaults, click Use workspace defaults.

When you create a pipeline, Supaflow automatically subscribes you to failure and warning notifications.


Running a Pipeline

To run manually:

  1. Navigate to Pipelines in the sidebar
  2. Find your pipeline
  3. Click the ... menu → Sync Now

Or from the pipeline detail page, click the Sync Now button.

What happens next:

  • A success message appears: "Sync request was successfully submitted"
  • Click the View activity link to monitor the pipeline sync
  • The activity begins processing immediately

Monitoring Pipeline Runs

After clicking Sync Now, navigate to Activities in the sidebar to monitor pipeline execution.

Each activity shows status (Queued → Running → Completed or Failed), object counts, and row metrics. Click a Pipeline Run to view detailed per-object breakdowns with ingestion mode (Historical/Incremental) and stage-level metrics.

For comprehensive monitoring details: See the Activities documentation.


Managing Pipelines

Access pipeline management options from the three-dot menu (•••) on the pipeline detail page.

Resync Data

Force a full data resync from the source, ignoring incremental state.

When to use:

  • Source data changed outside normal sync process
  • Need to refresh all historical data
  • Recovering from data quality issues

To resync: Click ••• → Resync Data

Disable Pipeline

Pause the pipeline to prevent it from running.

When to use:

  • Temporarily stop syncing without deleting the pipeline
  • Source credentials expired and need updating
  • Maintenance or troubleshooting

To disable: Click ••• → Disable Pipeline

Note: Disabled pipelines won't run on schedule. Re-enable by clicking ••• → Enable Pipeline.

Edit Pipeline

Modify pipeline settings or object selection.

What you can edit:

  • Pipeline name and description
  • Sync settings (ingestion mode, error handling)
  • Load settings (load mode, namespace rules, etc.)
  • Schema evolution settings
  • Object and field selection

To edit: Click ••• → Edit Pipeline

You'll be redirected to the pipeline wizard to make changes.

Delete Pipeline

Permanently remove the pipeline and its configuration.

To delete: Click ••• → Delete Pipeline

⚠️ Warning:

  • Deletion is permanent and cannot be undone
  • Activity history is preserved for audit purposes
  • Data already loaded in the destination is not deleted

Troubleshooting

Pipeline Won't Create

Problem: Error about missing sources or destinations.

Solution: Ensure at least one source and one destination have status Active.

Pipeline Run Fails Immediately

Problem: Activity fails within seconds.

Solution: Click the activity to view error details. Common causes include expired credentials, insufficient permissions, or schema mismatches.

No Data in Destination

Problem: Activity shows Completed but destination has no data.

Solution: Verify tables are selected in object selection, check the destination schema name matches your destination prefix, and review row counts in the activity detail page.



Support

Need help? Contact us at support@supa-flow.io