SFTP Source
Connect an SFTP server as a source to sync structured data from CSV, TSV, JSON, JSONL, and XLSX files. Gzip-compressed files, ZIP archives (for CSV and TSV), and Excel workbooks are supported, with incremental sync based on file modification time.
Prerequisites
Before you begin, ensure you have:
- An SFTP server with SSH access
- A user account with read access to the folder containing your data files
- One of the following authentication credentials:
- Password for the SSH user
- RSA private key in PEM format for key-based authentication
How Data is Organized
The SFTP connector maps files on your SFTP server to tables in your destination. How files are grouped into tables depends on the Table Mapping Mode setting and the file type.
Table Per Folder (default for CSV, TSV, JSON, JSONL)
Folders at a configurable depth below the root folder define table boundaries. The Table Folder Depth setting (default 1) controls which folder level determines the table name. Files above the configured depth are ignored. Files below the depth are included recursively.
Folder Path = /data, Table Folder Depth = 1
SFTP Server
└── /data/
├── summary.csv --> (ignored -- above depth 1)
├── orders/ --> Table: "orders"
│ ├── orders_jan.csv (rows in "orders" table)
│ ├── orders_feb.csv (rows in "orders" table)
│ ├── archive.csv.gz (decompressed, rows in "orders" table)
│ └── 2024/
│ └── orders_dec.csv (rows in "orders" table -- recursive)
└── customers/ --> Table: "customers"
├── customers_2024.csv (rows in "customers" table)
└── bundle.zip (CSV entries extracted, rows in "customers" table)
With Table Folder Depth = 2, two folder levels define the table name:
Folder Path = /data, Table Folder Depth = 2
SFTP Server
└── /data/
└── us/
├── orders/ --> Table: "us_orders"
│ └── data.csv
└── refunds/ --> Table: "us_refunds"
└── data.csv
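The folder-depth rule above can be sketched in a few lines of Python. This is a minimal illustration of the mapping described in the trees, not the connector's actual implementation; `table_for_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def table_for_path(root: str, file_path: str, depth: int = 1):
    """Return the TABLE_PER_FOLDER table name for a file, or None when
    the file sits above the configured depth (and is therefore ignored).
    Illustrative sketch only."""
    rel = PurePosixPath(file_path).relative_to(root)
    folders = rel.parts[:-1]          # folder components, excluding the filename
    if len(folders) < depth:
        return None                   # e.g. /data/summary.csv at depth 1
    return "_".join(folders[:depth]).lower()

# Examples from the trees above:
table_for_path("/data", "/data/summary.csv", 1)                 # None (ignored)
table_for_path("/data", "/data/orders/2024/orders_dec.csv", 1)  # "orders"
table_for_path("/data", "/data/us/orders/data.csv", 2)          # "us_orders"
```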
Table Per File (for CSV, TSV, JSON, JSONL)
Each file becomes its own table. The table name is derived from the file's relative path (without extension), normalized to lowercase with underscores.
Folder Path = /data, Table Mapping Mode = TABLE_PER_FILE
SFTP Server
└── /data/
├── summary.csv --> Table: "summary"
└── orders/
├── orders_jan.csv --> Table: "orders_orders_jan"
└── orders_feb.csv --> Table: "orders_orders_feb"
When Include Subfolders is disabled, only files directly in the root folder are included.
Table names are derived from file paths by normalizing to lowercase with underscores. In rare cases, two different paths can produce the same table name (e.g., /us_orders.csv and /us/orders.csv both normalize to us_orders). When this happens, the second file gets a numeric suffix (us_orders_2). The assignment is deterministic (sorted by path) so table names are stable across runs.
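The deterministic collision handling can be sketched as follows. The normalization details (lowercase, non-alphanumerics to underscores) match the description above, but `dedupe_table_names` itself is an illustrative helper, not the connector's code.

```python
import re

def dedupe_table_names(paths):
    """Assign table names deterministically, suffixing collisions.
    Sketch of the behavior described above."""
    counts, result = {}, {}
    for path in sorted(paths):                      # sorted by path => stable across runs
        stem = path.lstrip("/").rsplit(".", 1)[0]   # drop the extension
        name = re.sub(r"[^a-z0-9]+", "_", stem.lower()).strip("_")
        counts[name] = counts.get(name, 0) + 1
        result[path] = name if counts[name] == 1 else f"{name}_{counts[name]}"
    return result
```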
XLSX File Organization
XLSX files use a different table identity model based on the workbook filename + worksheet name. Each unique combination of workbook basename and worksheet name becomes a table. Files with the same basename and worksheet name in different folders are merged into the same table.
Folder Path = /data, File Type = XLSX
SFTP Server
└── /data/
├── 2026-01/
│ └── report.xlsx
│ ├── Sheet "Orders" --> Table: "report_orders"
│ └── Sheet "Refunds" --> Table: "report_refunds"
└── 2026-02/
└── report.xlsx
├── Sheet "Orders" --> Table: "report_orders" (merged with Jan)
└── Sheet "Refunds" --> Table: "report_refunds" (merged with Jan)
Key points:
- Column names are unioned across files; missing columns become null
- Empty files (0 bytes) and files without modification timestamps are automatically skipped
- Folders with no matching files are not discovered as tables
- XLSX files larger than 50 MB are skipped with a warning
Compression Support
| Format | How it works |
|---|---|
| Gzip (.csv.gz, .json.gz, etc.) | Decompressed transparently. Format detected from the extension before .gz. |
| ZIP (.zip) | CSV/TSV entries within the archive are extracted and parsed. Non-matching entries (e.g., .txt) are skipped. Only supported for CSV and TSV file types. |
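Extension-based decompression can be sketched with Python's standard library. This is a simplified illustration of the behavior in the table above; `open_entries` is a hypothetical helper.

```python
import gzip
import io
import zipfile

def open_entries(raw: bytes, filename: str, file_type: str = "CSV"):
    """Yield (entry_name, byte_stream) pairs for a downloaded file,
    handling gzip and ZIP as described above. Illustrative sketch."""
    ext = "." + file_type.lower()
    if filename.endswith(".gz"):
        # Format is detected from the extension before .gz
        yield None, io.BytesIO(gzip.decompress(raw))
    elif filename.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            for entry in zf.namelist():
                if entry.lower().endswith(ext):   # skip e.g. .txt entries
                    yield entry, io.BytesIO(zf.read(entry))
    else:
        yield None, io.BytesIO(raw)
```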
System Fields
Every table includes system fields added automatically. The exact fields depend on the file type:
| Field | Type | File Types | Description |
|---|---|---|---|
| _supa_file_name | string | All | Original filename from the SFTP server |
| _supa_file_path | string | All | Full path on the SFTP server |
| _supa_zip_entry | string | CSV, TSV | Entry name within a ZIP archive (null for non-ZIP files) |
| _supa_worksheet_name | string | XLSX | Worksheet name within the Excel workbook |
These fields allow you to trace every row back to its source file in your destination.
Configuration
In Supaflow, create a new SFTP source with these settings:
Connection
Host*
SFTP server hostname or IP address.
Example: sftp.example.com
Port
SFTP server port.
Default: 22
Username*
SSH username for authentication.
Authentication Method*
How to authenticate with the SFTP server.
Options:
- PASSWORD - Authenticate with username and password
- PRIVATE_KEY - Authenticate with an RSA private key
Default: PASSWORD
Password Authentication
Password*
SSH password for the user account.
Stored encrypted
Private Key Authentication
RSA Private Key*
Upload your RSA private key file in PEM format. The key is loaded in memory and never written to disk.
Stored encrypted
Passphrase
Passphrase to decrypt the private key, if it is encrypted. Leave blank if the private key is not passphrase-protected.
Stored encrypted
The private key must be in PEM format (begins with -----BEGIN RSA PRIVATE KEY-----). ED25519 and ECDSA keys are not currently supported.
Configuration
Folder Path*
Root directory on the SFTP server to scan for files. How files are organized into tables depends on the Table Mapping Mode setting (for CSV/TSV/JSON/JSONL) or the workbook + worksheet model (for XLSX).
Example: /data/incoming
File Type*
Type of files to read from the SFTP server.
Options:
- CSV - Comma-separated values (also matches .csv.gz and .zip files)
- TSV - Tab-separated values (also matches .tsv.gz and .zip files)
- JSON - JSON files containing an array of objects or a single object (also matches .json.gz)
- JSONL - Newline-delimited JSON, one object per line (also matches .jsonl.gz and .ndjson)
- XLSX - Excel workbooks (.xlsx files). Each worksheet becomes a separate table.
Default: CSV
File Pattern
Optional regex pattern to filter files by name. Applied to the full file path (not just the filename). When set, replaces the default extension filter so that files without standard extensions (e.g., data_export, feed_20240115) can be matched. The File Type setting still controls how matched files are parsed.
Example: orders_.*\.csv matches all CSV files starting with "orders_"
Example: daily_feed_\d{8} matches extensionless files like daily_feed_20240115
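As a quick sanity check, the second pattern can be verified with Python's re module (the paths below are made-up examples):

```python
import re

# File Pattern is matched against the full file path, not just the basename.
pattern = re.compile(r"daily_feed_\d{8}")

assert pattern.search("/data/incoming/daily_feed_20240115")   # matched
assert not pattern.search("/data/incoming/summary.csv")       # not matched
```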
Table Mapping Mode
How files are grouped into destination tables. Only applies to CSV, TSV, JSON, and JSONL file types (XLSX uses its own workbook + worksheet grouping).
Options:
- TABLE_PER_FOLDER - Folders at a configurable depth define table boundaries. All files under the same folder are merged into one table.
- TABLE_PER_FILE - Each file becomes its own table.
Default: TABLE_PER_FOLDER
Table Folder Depth
Number of folder levels below the root folder that define table boundaries. Only visible when Table Mapping Mode is TABLE_PER_FOLDER.
- 1 (default): Immediate child folders become tables (e.g., orders/, customers/)
- 2: Two levels of folders form the table name (e.g., us/orders/ becomes table us_orders)
Default: 1
Include Subfolders
Whether to scan subfolders recursively. When disabled, only files directly in the root folder are included. Applies to TABLE_PER_FILE mode and the XLSX file type.
Default: true
CSV Settings
These settings appear when File Type is CSV or TSV.
Delimiter
Column delimiter character. For TSV files, the tab character is used automatically even if this is set to comma.
Default: ,
Whether the first row contains column headers. If disabled, columns are named col_1, col_2, etc.
Default: true
Character used to quote fields that contain the delimiter or newlines.
Default: "
Character used to escape the quote character inside quoted fields. When set to the same character as the quote character (the default), uses quote-doubling mode (e.g., "" represents a literal "). When set to a different character (e.g., \), uses backslash-style escaping.
Default: " (quote-doubling mode)
Number of lines to skip before the header row. Use this when files have metadata or comments at the top (e.g., # Generated on 2026-01-15).
Default: 0
Comma-separated list of strings to treat as null values. When a field value exactly matches one of these strings, it is stored as null in the destination.
Default: \N
Example: \N,#N/A,nil treats all three strings as null
Whether to skip blank lines in the file. When enabled, lines that are empty or contain only whitespace are ignored.
Default: true
Whether to trim leading and trailing whitespace from field values.
Default: false
What to do when a data row has more or fewer columns than the header row.
Options:
- SKIP - Log a warning and skip the row
- FAIL - Stop processing and fail the sync
Default: SKIP
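Taken together, these settings behave roughly like the following sketch built on Python's csv module. `parse_csv` is illustrative, not the connector's implementation.

```python
import csv
import io

def parse_csv(text, delimiter=",", quotechar='"', skip_lines=0,
              null_values=("\\N",), skip_blank=True, trim=False,
              on_mismatch="SKIP"):
    """Apply the CSV settings above to yield rows as dicts. Sketch."""
    lines = text.splitlines()[skip_lines:]          # Skip Lines setting
    reader = csv.reader(io.StringIO("\n".join(lines)),
                        delimiter=delimiter, quotechar=quotechar,
                        doublequote=True)           # quote-doubling mode (default)
    header = next(reader)
    for row in reader:
        if skip_blank and not any(cell.strip() for cell in row):
            continue                                # Skip Blank Lines
        if len(row) != len(header):                 # column count mismatch
            if on_mismatch == "FAIL":
                raise ValueError("column count mismatch")
            continue                                # SKIP: drop the row
        vals = [cell.strip() for cell in row] if trim else row
        yield {h: (None if v in null_values else v)
               for h, v in zip(header, vals)}
```

For example, with one metadata line skipped, a `\N` field becomes null, blank lines are ignored, and a three-column row under a two-column header is dropped under the default SKIP policy.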
Advanced Settings
Request Timeout (seconds)
Timeout in seconds for SFTP operations (directory listing, file download). Set to 0 for no timeout. Operations that time out are retried automatically with exponential backoff.
Default: 300
File Encoding
Character encoding of the source files. UTF-8 files with a BOM (byte order mark) are handled automatically.
Options: utf-8, latin-1, iso-8859-1, windows-1252, ascii
Default: utf-8
Infer Column Types
When enabled, the connector detects column types from sampled data instead of treating all columns as text. For CSV/TSV/JSON/JSONL, types are inferred from string values using pattern matching (integers, decimals, booleans, dates, timestamps). For XLSX, native Excel cell types are used directly (integers, floats, booleans, dates).
Default: true
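Pattern-based inference for the text formats might look like the sketch below. The exact patterns and type names here are assumptions for illustration, not the connector's rules.

```python
import re

def infer_type(values):
    """Guess a column type from sampled string values, in the spirit of
    the pattern matching described above. Illustrative patterns only."""
    patterns = [
        ("LONG",      r"-?\d+"),                 # integers
        ("DOUBLE",    r"-?\d+\.\d+"),            # decimals
        ("BOOLEAN",   r"(?i)true|false"),        # booleans
        ("LOCALDATE", r"\d{4}-\d{2}-\d{2}"),     # ISO dates
    ]
    for name, pat in patterns:
        if all(re.fullmatch(pat, v) for v in values if v):
            return name
    return "STRING"                              # fallback: plain text
```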
Interval in minutes for schema metadata refresh.
- 0 - Refresh schema before every pipeline execution
- -1 - Disable automatic schema refresh (use for static schemas)
- Positive value - Refresh interval in minutes (e.g., 60 = hourly, 1440 = daily)
Default: 60 (hourly)
Test & Save
After configuring all required properties, click Test & Save to verify your connection and save the source.
The connection test verifies that:
- SSH credentials are valid (password or private key)
- The SFTP server is reachable on the configured host and port
- The configured folder exists and is readable
Incremental Sync
The SFTP connector supports incremental sync using file modification timestamps (st_mtime). On each sync:
- Initial sync: All matching files in the folder are read
- Subsequent syncs: Only files modified since the last sync are read
This is managed automatically by the connector using a time-window model:
- Files are included when last_sync_cursor <= file_mtime < current_cutoff_time
- The cursor advances to the cutoff time (not the max file mtime), ensuring no gaps
- Files are processed in ascending modification time order for deterministic behavior
- Per-file checkpointing tracks progress; if a sync is interrupted, only unprocessed files are re-read on the next run
Files where the SFTP server does not report a modification time (st_mtime is null) are skipped with a warning. These files cannot participate in incremental sync. If your SFTP server does not provide timestamps, use the Historical ingestion mode (full refresh) instead.
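The time-window selection can be summarized in a few lines. This is a sketch of the model described above; `files` is assumed to be a list of dicts carrying an mtime value.

```python
def files_to_sync(files, last_cursor, cutoff):
    """Pick files for one incremental run using the half-open window
    [last_cursor, cutoff), skipping files with no reported mtime. Sketch."""
    picked = [f for f in files
              if f.get("mtime") is not None and last_cursor <= f["mtime"] < cutoff]
    picked.sort(key=lambda f: f["mtime"])   # ascending mtime => deterministic order
    return picked, cutoff                   # new cursor is the cutoff, not max mtime
```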
JSON and JSONL File Handling
JSON Files
JSON files are parsed as follows:
- Array of objects: Each object in the array becomes a row
- Single object: The object becomes a single row
- Nested values: Nested objects and arrays are serialized as JSON strings in the destination
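These rules can be sketched as follows (`json_rows` is an illustrative helper, not the connector's code):

```python
import json

def json_rows(parsed):
    """Turn a parsed JSON document into rows per the rules above: an array
    of objects becomes many rows, a single object one row, and nested
    values are serialized back to JSON strings. Sketch."""
    objs = parsed if isinstance(parsed, list) else [parsed]
    return [{k: json.dumps(v) if isinstance(v, (dict, list)) else v
             for k, v in obj.items()}
            for obj in objs]
```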
JSONL / NDJSON Files
JSONL (newline-delimited JSON) files are parsed line by line:
- Each line must be a valid JSON object
- Empty lines are skipped
- Non-object lines (arrays, strings, numbers) are skipped with a warning
- Files with .jsonl or .ndjson extensions are both supported
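Line-by-line parsing can be sketched like this. One assumption beyond the rules above: malformed JSON lines are also skipped here, which the documentation does not specify.

```python
import json

def parse_jsonl(text):
    """Parse newline-delimited JSON per the rules above. Sketch."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # empty lines are skipped
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            continue                      # assumed: malformed lines skipped
        if not isinstance(value, dict):
            continue                      # non-object lines skipped (with a warning)
        rows.append(value)
    return rows
```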
XLSX File Handling
Excel workbooks (.xlsx files) are supported as a file type. Each worksheet within a workbook becomes a separate destination table.
How XLSX Tables are Organized
Tables are identified by the combination of workbook filename + worksheet name. Files with the same name and worksheet in different folders are merged:
- /2026-01/report.xlsx with sheet "Orders" and /2026-02/report.xlsx with sheet "Orders" both feed into table report_orders
- /data/summary.xlsx with sheets "Q1" and "Q2" produces tables summary_q1 and summary_q2
Worksheet Parsing
- The first non-empty row in each worksheet is treated as the column header
- Leading blank rows before the header are skipped automatically
- Empty worksheets are excluded from schema discovery
- Trailing empty columns are trimmed (columns with no header and no data)
- Formula cells return their cached computed value
- When Infer Column Types is enabled, native Excel types are mapped directly: integers to LONG, floats to DOUBLE, booleans to BOOLEAN, dates to LOCALDATE, and datetimes to INSTANT
- When disabled, all values are treated as strings
- Empty rows within the data range are skipped (gaps do not stop reading)
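The header-detection rules can be sketched on plain row data (as a library such as openpyxl would yield it). `worksheet_rows` is illustrative, not the connector's implementation.

```python
def worksheet_rows(cells):
    """Apply the worksheet rules above to rows of cell values. Sketch."""
    rows = iter(cells)
    header = None
    for row in rows:
        if any(v is not None and str(v).strip() for v in row):
            header = ["" if v is None else str(v) for v in row]
            break                           # first non-empty row is the header
    if header is None:
        return []                           # empty worksheet: excluded
    while header and not header[-1]:
        header.pop()                        # trim trailing empty header columns
    out = []
    for row in rows:
        if any(v is not None and str(v).strip() for v in row):
            out.append(dict(zip(header, row)))  # blank rows (gaps) are skipped
    return out
```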
Limitations
- File size limit: XLSX files larger than 50 MB are skipped with a warning
- No gzip support: XLSX files are already internally compressed; .xlsx.gz is not supported
- No password protection: Encrypted or password-protected workbooks are not supported
- The Table Mapping Mode setting does not apply to XLSX files. XLSX always uses the workbook + worksheet grouping model.
- Name collisions: If two different workbook/worksheet combinations normalize to the same table name (e.g., workbook My-Report sheet Order Items and workbook My_Report sheet Order_Items), schema discovery fails with an error. Rename one of the workbooks or worksheets to resolve the conflict.
Troubleshooting
Connection test fails
Problem:
- "Failed to connect to SFTP server" error
Solutions:
- Verify host and port: Ensure the hostname is correct and the SFTP service is running on the configured port (default: 22)
- Check credentials:
- Password auth: Verify the username and password are correct
- Key auth: Ensure the private key is in PEM format and corresponds to an authorized key on the server
- Check network access: Ensure your network allows outbound connections to the SFTP server on the configured port
- Check folder path: The configured folder must exist on the server. The connection test verifies this by listing the directory.
No objects found during schema discovery
Problem:
- Schema discovery returns zero tables
Solutions:
- Verify files exist: Ensure the configured folder contains matching files at the expected depth. For TABLE_PER_FOLDER mode, files must be inside subfolders at the configured Table Folder Depth (files directly in the root are ignored when depth >= 1). For TABLE_PER_FILE mode, any matching file becomes a table.
- Check table folder depth: If using TABLE_PER_FOLDER, verify the Table Folder Depth matches your actual folder structure. A depth of 3 when your files are only 1 level deep will result in zero tables.
- Check file type setting: Make sure the File Type matches your actual files (e.g., don't select CSV for JSON files). Without a File Pattern, only files with standard extensions (.csv, .json, .xlsx, etc.) are matched.
- Check file pattern: If you set a File Pattern regex, verify it matches your filenames. Note that File Pattern replaces extension filtering, so the regex must match the full file path. Try removing it to fall back to extension-based filtering.
- Files without extensions: If your files have no extension (e.g., data_export), you must set a File Pattern to match them. Without a pattern, only files with standard extensions for the selected File Type are included.
- Check file sizes: Empty files (0 bytes) are automatically skipped. XLSX files larger than 50 MB are also skipped.
- Check folder path: Verify the root path is correct and the user has read permission to the root and any child folders you want discovered
- XLSX empty worksheets: For XLSX files, worksheets with no non-blank cells are excluded. Verify your workbooks contain data.
Missing columns in schema
Problem:
- Some columns are missing from the discovered schema
Solutions:
- Check file consistency: Schema inference samples up to 5 of the most recently modified files. If older files have additional columns, they may not be sampled.
- Verify encoding: Ensure the File Encoding setting matches your files. Wrong encoding can produce garbled or missing column names.
- Check for BOM: UTF-8 files with a byte order mark are handled automatically. For other encodings, the BOM may need manual removal.
Incremental sync re-reads all files
Problem:
- Every sync reads all files instead of just modified ones
Solutions:
- Check initial sync completed: The first sync always reads all files. Incremental behavior starts from the second sync.
- Verify files are not being re-saved: Some automated processes touch files without changing content, updating the modification time.
- Check ingestion mode: Ensure the pipeline is configured for Historical + Incremental or Incremental mode (not Historical, which always performs a full refresh).
Permission denied errors in logs
Problem:
- Warning messages about permission denied for specific files
- Some files missing from sync results
Solutions:
- Check file permissions: The SSH user must have read access to all files in the configured folder. Files with insufficient permissions are skipped with a warning.
- Check folder permissions: The user needs read and list permissions on the folder and all subfolders.
Support
Need help? Contact us at support@supa-flow.io