SFTP → SNOWFLAKE
Load CSV, TSV, JSON, JSONL/NDJSON, and XLSX files from any SFTP server into Snowflake — with SSH key or password auth, recursive folder sync, and incremental sync based on file modification time.
SFTP is still how many ERPs, EDI partners, vendors, and compliance feeds deliver files. Supaflow handles the production realities: large CSV fields, gzipped exports, ZIP archives for CSV/TSV, extensionless feed names, and per-file progress so one bad file does not block the rest of the folder. Every connector is included on every Supaflow plan — you pay only for the compute your pipelines consume.
Used by integration teams loading EDI 850 / 856 drops from trading partners, retail and CPG teams pulling nightly vendor price books and item masters, and compliance teams archiving regulator-mandated daily extract feeds into Snowflake — with no per-row fees.
Every row below is an actual capability in the SFTP connector, not a forward-looking promise.
| Feature | How it works | Limit / caveat |
|---|---|---|
| Supported file formats | CSV, TSV, JSON, JSONL / NDJSON (one record per line), and XLSX workbooks. Gzip-compressed variants are supported, and ZIP archives are supported for CSV and TSV feeds. | ZIP support is limited to CSV/TSV files. Other formats should arrive as plain files or gzip-compressed files. |
| Authentication | Use an SSH private key with optional passphrase, or username and password. Credentials are kept in memory for the sync and are not written to disk. | If strict host-key enforcement is required, route SFTP through a controlled network path such as a jumphost or VPN until host-key pinning is configured for your deployment. |
| File discovery | Supaflow scans the configured folder recursively. Empty files are skipped, files without modification timestamps are logged and skipped, and optional filename matching can include extensionless files such as `data_export` or `feed_20240115`. | — |
| Incremental sync | Incremental runs use each file’s modification time. Files are processed oldest to newest, and later syncs re-read only files newer than the last successful run. | Files that cannot be parsed or are rejected for safety remain visible for remediation and can be retried on the next run. |
| Schema inference | Column types are inferred from a sampled subset of files per folder, rather than loading every column as a string. When columns differ across files, the unioned schema preserves first-seen casing and insertion order. | — |
| Large CSV fields | Multi-MB string fields, base64 payloads, and stringified JSON columns can be parsed without per-file tuning. | — |
| Gzip detection | Compressed files are recognized from names such as `report.csv.gz` or `feed.json.gz`. After decompression, Supaflow parses the file as the underlying CSV, TSV, JSON, JSONL/NDJSON, or XLSX format. | — |
| Connection stability | Transient connection errors and download timeouts are retried automatically. Permission errors fail fast so teams can fix access rather than waiting through repeated retries. | — |
| Per-file safety | Files are handled with bounded temporary storage and cleaned up after parsing. Parse errors on one file skip that file and continue the sync; oversized archives are rejected before extraction. | — |
| Destination | Lands directly in Snowflake with generated tables, Snowflake-ready types, incremental merges, and source file metadata on every row. | For CSV, TSV, JSON, and JSONL/NDJSON, choose one Snowflake table per folder or one table per file. For XLSX, each worksheet lands as its own table for the matching workbook. |
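The column-union behavior in the Schema inference row can be illustrated with a minimal sketch. This is not Supaflow's implementation: the function name `union_columns` is hypothetical, and matching column names case-insensitively is an assumption drawn from the phrase "first-seen casing".

```python
from typing import Dict, Iterable, List


def union_columns(headers: Iterable[List[str]]) -> List[str]:
    """Union column names across files.

    Assumption: names are compared case-insensitively. The first-seen
    spelling of each column wins, and columns keep the order in which
    they were first encountered.
    """
    seen: Dict[str, str] = {}  # lowercase key -> first-seen spelling
    for header in headers:
        for name in header:
            # setdefault keeps the existing entry (and its position) if present
            seen.setdefault(name.lower(), name)
    return list(seen.values())


file_headers = [
    ["SKU", "Price"],              # first file in the folder
    ["sku", "price", "Currency"],  # later file adds a column
]
print(union_columns(file_headers))  # ['SKU', 'Price', 'Currency']
```

The sketch relies on Python dictionaries preserving insertion order, so no separate ordering list is needed.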
Every Supaflow connector is included on every plan at no extra cost. You pay only for compute consumed, measured in Supaflow Credits (1 credit = 1 compute-hour on a Small node). No per-row fees.
SSH private keys and passwords stay in memory for the duration of the sync and are not written to disk. That matters when SFTP credentials control access to partner feeds, financial exports, or regulated file drops.
EDI feeds with multi-MB CSV fields. Vendor exports compressed without helpful metadata. Folders with extensionless files named after dates. Legacy systems that occasionally drop oversized archives. Supaflow handles these patterns as part of the connector instead of leaving them for custom scripts.
Every one of these is something our connector handles specifically. A generic pipeline built in Airflow or a basic ELT tool will hit at least three of them in production.
Failure mode: Files with stringified JSON columns, base64-encoded blobs, or large free-text fields can fail in lightweight scripts and leave the folder half-loaded.
Evidence: Common in EDI files, product catalogs, and partner exports where one field may contain a large payload.
Fix: Supaflow accepts multi-MB CSV and TSV fields without per-file tuning.
Failure mode: Some vendors compress files without preserving the original filename metadata. Pipelines that depend on that metadata can treat the feed as unknown and skip it.
Evidence: Common in vendor exports where reproducible compression settings (for example, `gzip -n`) omit the optional original-filename field from the gzip header.
Fix: Supaflow uses the visible filename, such as `.csv.gz` or `.json.gz`, to choose the parser after decompression.
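As a rough illustration of filename-based detection, the sketch below chooses a parser from the visible name alone and never looks at gzip header metadata. The `FORMATS` mapping and the `detect_format` function are hypothetical names for this example, not part of the Supaflow API.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Hypothetical mapping from visible extension to parser name.
FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".json": "json",
    ".jsonl": "jsonl",
    ".ndjson": "jsonl",
    ".xlsx": "xlsx",
}


def detect_format(filename: str) -> Tuple[Optional[str], bool]:
    """Return (parser_name, is_gzipped) using only the visible filename."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # look at the extension underneath .gz
    parser = FORMATS.get(suffixes[-1]) if suffixes else None
    return parser, gzipped


print(detect_format("report.csv.gz"))  # ('csv', True)
print(detect_format("feed.json.gz"))   # ('json', True)
print(detect_format("data_export"))    # (None, False)
```

A name with no recognizable extension returns `None` here, which is the case where an explicitly configured file format would have to take over.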
Failure mode: Many SFTP feeds drop files named `data_export`, `feed_20240115`, or `daily-batch` with no extension. Extension-based filters miss them entirely.
Evidence: Standard pattern for vendor exports and EDI batches where the naming convention predates extension-based file-type detection.
Fix: Configure an optional filename pattern and file format so matching extensionless files are parsed correctly.
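A minimal sketch of the idea follows, assuming glob-style patterns. The pattern syntax Supaflow actually accepts is not stated here, and `select_files` is a hypothetical helper. The point is that the configured format, not the extension, decides how a matching file is parsed.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple


def select_files(names: Iterable[str], pattern: str, file_format: str) -> List[Tuple[str, str]]:
    """Pair every filename matching the glob pattern with the configured format."""
    return [(name, file_format) for name in names if fnmatchcase(name, pattern)]


listing = ["feed_20240115", "feed_20240116", "README", "daily-batch"]
print(select_files(listing, "feed_*", "csv"))
# [('feed_20240115', 'csv'), ('feed_20240116', 'csv')]
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.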
Failure mode: Some SFTP servers omit modification-time metadata. Incremental sync cannot safely include those files because there is no timestamp to compare against later runs.
Evidence: Encountered in production against legacy file-transfer gateways.
Fix: Supaflow skips those files with a warning and continues the rest of the sync, so operators can remediate the source feed without losing progress on other files.
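The skip-and-continue behavior can be sketched as a planning step over a directory listing. This is an illustration under stated assumptions, not the connector's code: `RemoteFile` and `plan_incremental` are hypothetical names, and the cursor is assumed to be the newest modification time handled by the last successful run.

```python
import logging
from dataclasses import dataclass
from typing import Iterable, List, Optional

log = logging.getLogger("sftp_sync_sketch")


@dataclass(frozen=True)
class RemoteFile:
    path: str
    size: int
    mtime: Optional[int]  # epoch seconds; None when the server omits it


def plan_incremental(files: Iterable[RemoteFile], cursor: Optional[int]) -> List[RemoteFile]:
    """Return the files to read this run, ordered oldest to newest."""
    eligible = []
    for f in files:
        if f.size == 0:
            continue  # empty files are skipped
        if f.mtime is None:
            log.warning("skipping %s: no modification time", f.path)
            continue  # cannot be compared against later runs
        if cursor is not None and f.mtime <= cursor:
            continue  # already covered by an earlier successful run
        eligible.append(f)
    # Tie-break on path so the order is deterministic.
    return sorted(eligible, key=lambda f: (f.mtime, f.path))


listing = [
    RemoteFile("in/b.csv", 120, 1705300000),
    RemoteFile("in/a.csv", 80, 1705200000),
    RemoteFile("in/empty.csv", 0, 1705400000),
    RemoteFile("in/legacy.csv", 50, None),
]
print([f.path for f in plan_incremental(listing, cursor=1705200000)])  # ['in/b.csv']
```

Passing `cursor=None` models a first run, in which every non-empty file with a timestamp is selected.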
Failure mode: A single ZIP file can expand far beyond its compressed size. Naive file pipelines can exhaust local disk or memory before they notice the problem.
Evidence: A common file-processing failure mode, and easy to trigger accidentally with backup exports or malformed partner drops.
Fix: Supaflow rejects oversized archives before extraction and continues processing the rest of the folder.
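One way to reject an archive before extraction is to sum the uncompressed sizes declared in the ZIP central directory. The sketch below assumes a hypothetical 1 GiB limit and a hypothetical function name; Supaflow's actual threshold and mechanism are not documented here.

```python
import io
import zipfile

MAX_UNCOMPRESSED_BYTES = 1024 ** 3  # hypothetical limit (1 GiB)


def declared_uncompressed_size(source, limit: int = MAX_UNCOMPRESSED_BYTES) -> int:
    """Sum the declared uncompressed sizes; raise ValueError if over the limit.

    Only the central directory is read, so nothing is extracted.
    """
    with zipfile.ZipFile(source) as zf:
        total = sum(info.file_size for info in zf.infolist())
    if total > limit:
        raise ValueError(f"archive expands to {total} bytes, limit is {limit}")
    return total


# Build a small in-memory archive to exercise the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("prices.csv", "sku,price\n" * 1000)

print(declared_uncompressed_size(buf))  # 10000
```

Declared sizes can be forged in a malicious archive, so a production check would typically also cap the number of bytes written while streaming the extraction.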
Provide the hostname, port, username, and either an SSH private key (RSA in PEM format, with optional passphrase) or a password. The private key is loaded into memory; nothing is written to disk.
SFTP source docs →
Choose the directory to sync, optionally add a filename pattern for extensionless feeds, and pick the file format: CSV, TSV, JSON, JSONL/NDJSON, or XLSX. Gzip variants are supported for each format; ZIP is supported for CSV/TSV.
Connect your Snowflake warehouse. Choose one table per folder or one table per file for CSV, TSV, JSON, and JSONL/NDJSON feeds. XLSX files land one table per worksheet for the matching workbook, with source file metadata retained on each row.
Snowflake destination docs →
Run on a cron or interval schedule. Incremental syncs use file modification time, so later runs re-read only files newer than the last successful sync.
Schedules docs →
Supaflow connects to your SFTP server, scans the configured folder recursively, parses files in the selected format, and lands them in Snowflake. Supported formats include CSV, TSV, JSON, JSONL/NDJSON, and XLSX, with gzip support for each and ZIP support for CSV/TSV. Incremental runs use each file’s modification time so only newer files are re-read.
No. SSH private keys and passwords are kept in memory for the duration of the sync and discarded afterward. They are not written to disk.
Yes. Multi-MB stringified JSON columns, base64 payloads, and large free-text fields can be parsed without per-file tuning.
By the visible filename, such as `.csv.gz` or `.json.gz`. After decompression, Supaflow parses the file as the underlying format.
Yes. Set an optional filename pattern and choose the source file format. Matching extensionless files are parsed as CSV, TSV, JSON, JSONL/NDJSON, or XLSX depending on your configuration.
Fivetran and Hevo often price by rows or volume, which makes the cost of SFTP file sync unpredictable: a single large catalog file from a vendor can inflate the bill. Every Supaflow connector is included on every plan at no extra cost. You pay only for compute consumed, measured in Supaflow Credits (1 credit = 1 compute-hour on a Small node). No per-row fees. See the pricing page for the credit packages and free tier.
Yes. The sync agent runs in your own VPC if you want SFTP credentials and file content to stay in your network. The control plane is managed; the data plane can be managed or self-hosted. This is the typical deployment pattern when the SFTP server is also inside your network.
The SFTP source and Snowflake destination, plus other Supaflow connectors you can pair into a Snowflake pipeline.
Source connector overview and capabilities.
Destination connector overview and capabilities.
Other file-based source: CSV, Excel, TSV, and Google Sheets from Drive.
Pair with SFTP when EDI feeds drop into SFTP and master data lives in SQL Server.
Pair with SFTP when transactional data lives in Postgres.
Browse the full catalog of sources and destinations.
Auth setup (SSH key vs password), folder configuration, file pattern matching, and troubleshooting.
Connect your Snowflake warehouse, role requirements, type mapping, and sync semantics.
Every connector is included on every plan. Pay only for compute consumed (Supaflow Credits).
Browse every Supaflow source and destination.
Every connector is included on every Supaflow plan — you pay only for the compute your pipelines consume. CSV, TSV, JSON, JSONL/NDJSON, XLSX; gzip support; ZIP for CSV/TSV; SSH key or password auth; credentials kept in memory; incremental sync based on file modification time.