Ingestion Pipelines
Move data from sources to your data warehouse with full control over sync modes, schema selection, and execution.
Overview
Ingestion pipelines extract data from external sources (databases, SaaS platforms, APIs) and load it into your data warehouse. You configure how data is synced and loaded, and how schema changes are handled.
Prerequisites: At least one active source and one active destination.
To access: Navigate to Pipelines in the sidebar, then click + Create Pipeline.
Creating an Ingestion Pipeline
To start:
- Navigate to Pipelines in the sidebar
- Click + Create Pipeline
- If you have multiple projects, select which project to create the pipeline in (this determines the destination warehouse)
- You'll be redirected to the Workspace IDE, where an Explorer panel on the left lists all pipelines in the selected project
The pipeline wizard then opens with 4 steps:
Step 1: Choose Source
Select the data source for your pipeline. Only active sources are available.
To proceed: Click Continue.
Step 2: Configure Pipeline
The destination is pre-selected based on the project's warehouse (all pipelines belong to a project). Configure your pipeline settings below.
Pipeline Settings
Availability Note
Not every pipeline setting or option is available for every connector. Supaflow reads source and destination capabilities and only enables supported options. For example, some destinations only allow APPEND load mode or LATENCY optimization.
UI Note
To keep the wizard focused, Supaflow shows only 4 core settings by default in this step. All other supported settings are under Advanced Settings.
Destination Section (Schema/Prefix)
At the top of Step 2, Supaflow shows the selected destination with key connection details (destination name, database, schema/prefix, role).
The schema/prefix column in this section shows how destination naming is determined:
- When Namespace Rule = Destination Defined: the card shows the destination Schema from destination config (read-only in this step)
- Otherwise: the card shows the pipeline namespace value used for naming. The UI label is:
- Prefix for sources with schema/catalog hierarchy (PostgreSQL, MySQL, etc.)
- Schema for sources without hierarchy (Salesforce, HubSpot, etc.)
By default, Supaflow generates a unique prefix for each pipeline (for example, salesforce, salesforce_2) so pipelines do not collide.
How the prefix is used:
- Sources without schemas (Salesforce, HubSpot, etc.): the prefix becomes the schema name where tables are created
  - Example: Prefix `salesforce` → Schema `salesforce` → Tables `salesforce.Account`, `salesforce.Contact`
- Sources with schemas (PostgreSQL, MySQL, etc.): the prefix is prepended to the source schema name
  - Example: Prefix `postgres` + Source schema `orders` → Destination schema `postgres_orders`
When editable, click the schema/prefix chip (or pencil icon) in the destination section to update it.
If the selected schema/prefix is already used by another pipeline, Supaflow shows a warning. Collision outcomes are then governed by Destination Table Collision Handling.
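The naming and uniqueness rules above can be sketched as follows. This is a hypothetical illustration only; `destination_schema` and `unique_prefix` are not Supaflow APIs, just a restatement of the documented behavior.

```python
from typing import Optional

def destination_schema(prefix: str, source_schema: Optional[str]) -> str:
    """Compute the destination schema name from a pipeline prefix.

    Sources without a schema hierarchy (Salesforce, HubSpot, ...) pass
    source_schema=None: the prefix itself becomes the schema name.
    Sources with schemas (PostgreSQL, MySQL, ...) get the prefix
    prepended to the source schema name.
    """
    if source_schema is None:
        return prefix
    return f"{prefix}_{source_schema}"

def unique_prefix(base: str, taken: set) -> str:
    """Default prefix generation: append _2, _3, ... until unused."""
    if base not in taken:
        return base
    n = 2
    while f"{base}_{n}" in taken:
        n += 1
    return f"{base}_{n}"

print(destination_schema("salesforce", None))       # salesforce
print(destination_schema("postgres", "orders"))     # postgres_orders
print(unique_prefix("salesforce", {"salesforce"}))  # salesforce_2
```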
Sync Settings
| Setting | Description | Options |
|---|---|---|
| Ingestion Mode | How data should be synced from source | • Historical + Incremental (default) - Full sync first, then ongoing changes • Historical Only - Full data sync every run • Incremental Only - Only new and changed data |
| Error Handling Mode | How to handle errors during sync | • Continue with Warnings (default) - Log errors but keep processing • Abort on Any Error - Stop when errors occur |
| Full Sync Object Refresh Frequency | How often objects that do not support incremental ingestion are synced. Disabled when Ingestion Mode is Historical Only. | • Every Run - Resync on every pipeline execution • Daily - Resync every day • Weekly (default) - Resync every 7 days • Monthly - Resync every 30 days • Never - Only run once (initial sync), then skip |
| Full Resync Frequency | How often all objects are forced into a complete resync, regardless of incremental support. Disabled when Ingestion Mode is Historical Only. | • Never (default) - No scheduled full resync; only incremental syncs after initial load • Daily - Force full resync every day • Weekly - Force full resync every 7 days • Monthly - Force full resync every 30 days |
Important: For objects that do not support incremental ingestion, Supaflow runs historical sync only when the configured Full Sync Object Refresh Frequency is due (or when a forced full resync applies). On runs where historical sync is not due, those objects can be skipped.
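The skip behavior described in the note can be illustrated with a small scheduling sketch. The function name, frequency keys, and mapping below are hypothetical; they mirror the Full Sync Object Refresh Frequency options rather than Supaflow's internals.

```python
from datetime import datetime, timedelta
from typing import Optional

# Days between historical resyncs, mirroring the documented options
# (hypothetical mapping; "Every Run" and "Never" are special-cased).
FREQUENCY_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}

def refresh_due(frequency: str, last_full_sync: Optional[datetime],
                now: datetime) -> bool:
    """Decide whether a non-incremental object gets a historical sync."""
    if last_full_sync is None:   # initial sync always runs
        return True
    if frequency == "every_run":
        return True
    if frequency == "never":     # only the initial sync, then skip
        return False
    return now - last_full_sync >= timedelta(days=FREQUENCY_DAYS[frequency])

now = datetime(2025, 1, 10)
print(refresh_due("weekly", datetime(2025, 1, 1), now))  # True (9 days elapsed)
print(refresh_due("weekly", datetime(2025, 1, 8), now))  # False (2 days elapsed)
```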
Load Settings
| Setting | Description | Options |
|---|---|---|
| Load Mode | How data should be loaded into the destination | • Merge (default) - Insert new and update existing records • Append - Add new records only • Truncate and Load - Delete all data then load • Overwrite - Drop and recreate tables |
| Destination Namespace Rules | How destination schemas and tables should be mapped from the source | • Mirror Source (default) • Destination Defined - Use destination's default schema • Fivetran Naming - Use Fivetran-like naming |
| Destination Table Collision Handling | How to handle table name collisions during initial sync | • Merge with Existing (default) • Fail if Table Exists (recommended) • Drop and Recreate |
| Perform Hard Deletes | Permanently delete records from destination when deleted from source | • No (default) - Keep all records • Yes - Delete records |
| Load Optimization Mode | Strategy for optimizing pipeline execution | • Optimize for Cost (default) - Batch operations • Optimize for Latency - Process immediately • Adaptive - System decides |
| Data Validation Level | Level of data integrity validation during sync | • None (default) - No validation • Row Count - Validate row counts • Schema and Count • Column Statistics • Full Integrity |
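To make the difference between the Merge and Append load modes concrete, here is a simplified sketch. Real loads operate on warehouse tables keyed by primary key, not Python dicts; the function names are illustrative only.

```python
def merge_load(existing: dict, incoming: dict) -> dict:
    """Merge: insert new records and update existing ones by key."""
    return {**existing, **incoming}

def append_load(existing: list, incoming: list) -> list:
    """Append: add new records only; existing rows are untouched."""
    return existing + incoming

table = {1: "Ada", 2: "Bo"}
batch = {2: "Beau", 3: "Cy"}
print(merge_load(table, batch))              # {1: 'Ada', 2: 'Beau', 3: 'Cy'}
print(append_load(["Ada"], ["Beau", "Cy"]))  # ['Ada', 'Beau', 'Cy']
```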
Schema Evolution Settings
| Setting | Description | Options |
|---|---|---|
| Schema Evolution Mode | How to propagate source schema changes to the destination | • Allow All Changes (default) - Automatically propagate all changes • Block All Changes - Prevent any modifications • Column Level Changes Only - Allow column changes only |
| Auto Re-Sync on New Table | Automatically perform a full re-sync for newly detected tables | • Yes (default) • No |
| Auto Re-Sync on New Column | Automatically re-sync the entire table when a new column is detected | • No (default) • Yes |
| Auto Re-Sync on Schema Change | Automatically re-sync when any schema change is detected | • No (default) • Yes |
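The three auto re-sync toggles combine into one simple decision per detected change, sketched below. The setting keys are hypothetical; the defaults match the table above.

```python
def should_resync(change: str, settings: dict) -> bool:
    """Map a detected schema change to the auto re-sync toggles."""
    if change == "new_table":
        return settings.get("resync_on_new_table", True)    # default Yes
    if change == "new_column":
        return settings.get("resync_on_new_column", False)  # default No
    # any other schema change (type change, drop, rename, ...)
    return settings.get("resync_on_schema_change", False)   # default No

print(should_resync("new_table", {}))   # True
print(should_resync("new_column", {}))  # False
```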
To proceed: Click Continue.
Step 3: Choose Objects to Sync
Select which tables and fields to replicate from your source.
To select:
- Use the top-level checkbox to select all objects
- Or expand catalogs/schemas and select individual tables and fields
- Use the search bar to filter objects by name
Selections are saved automatically as you work.
To proceed: Click Continue.
Step 4: Review & Save
Review all pipeline settings and selected objects.
Pipeline Name: Auto-generated from source/destination names (e.g., "Salesforce to Snowflake"). You can edit this before creating.
To create: Click Create Pipeline.
You'll be redirected to the pipeline detail page where you can view settings, schema, and run the pipeline.
Pipeline Detail Page
After creating a pipeline, you'll see the pipeline detail page with two tabs:
Settings Tab
- View all pipeline configuration (read-only)
- See source, destination, and all sync/load/schema evolution settings
- Click Sync Now button to run the pipeline
Schema Tab
- View all selected objects and their fields
- Filter by: All, Enabled, or Disabled objects
- Search for specific tables
- See total object count (e.g., "594/594 objects selected")
To access: Navigate to Pipelines in the sidebar, then click on any pipeline name.
Notification Preferences
The Settings tab also includes Notification Preferences, which controls email notifications for this specific pipeline. You can enable or disable notifications for failures, warnings, and successes independently.
These settings override your workspace-wide notification defaults. To revert to workspace defaults, click Use workspace defaults.
When you create a pipeline, Supaflow automatically subscribes you to failure and warning notifications.
Running a Pipeline
To run manually:
- Navigate to Pipelines in the sidebar
- Find your pipeline
- Click the ... menu → Sync Now
Or from the pipeline detail page, click the Sync Now button.
What happens next:
- A success message appears: "Sync request was successfully submitted"
- Click the View activity link to monitor the pipeline sync
- The activity begins processing immediately
Monitoring Pipeline Runs
After clicking Sync Now, navigate to Activities in the sidebar to monitor pipeline execution.
Each activity shows status (Queued → Running → Completed or Failed), object counts, and row metrics. Click a Pipeline Run to view detailed per-object breakdowns with ingestion mode (Historical/Incremental) and stage-level metrics.
For comprehensive monitoring details: See the Activities documentation.
Managing Pipelines
Access pipeline management options from the three-dot menu (•••) on the pipeline detail page.
Resync Data
Force a full data resync from the source, ignoring incremental state.
When to use:
- Source data changed outside normal sync process
- Need to refresh all historical data
- Recovering from data quality issues
To resync: Click ••• → Resync Data
Disable Pipeline
Pause the pipeline to prevent it from running.
When to use:
- Temporarily stop syncing without deleting the pipeline
- Source credentials expired and need updating
- Maintenance or troubleshooting
To disable: Click ••• → Disable Pipeline
Note: Disabled pipelines won't run on schedule. Re-enable by clicking ••• → Enable Pipeline.
Edit Pipeline
Modify pipeline settings or object selection.
What you can edit:
- Pipeline name and description
- Sync settings (ingestion mode, error handling)
- Load settings (load mode, namespace rules, etc.)
- Schema evolution settings
- Object and field selection
To edit: Click ••• → Edit Pipeline
You'll be redirected to the pipeline wizard to make changes.
Delete Pipeline
Permanently remove the pipeline and its configuration.
To delete: Click ••• → Delete Pipeline
⚠️ Warning:
- Deletion is permanent and cannot be undone
- Activity history is preserved for audit purposes
- Data already loaded in the destination is not deleted
Troubleshooting
Pipeline Won't Create
Problem: Error about missing sources or destinations.
Solution: Ensure at least one source and one destination have status Active.
Pipeline Run Fails Immediately
Problem: Activity fails within seconds.
Solution: Click the activity to view error details. Common causes include expired credentials, insufficient permissions, or schema mismatches.
No Data in Destination
Problem: Activity shows Completed but destination has no data.
Solution: Verify tables are selected in object selection, check the destination schema name matches your destination prefix, and review row counts in the activity detail page.
Related Pages
- Activities - Monitor pipeline execution
- Schedules - Automate pipeline runs
- Deployments - Promote pipelines across workspaces
Support
Need help? Contact us at support@supa-flow.io