
SFTP Source

Connect an SFTP server as a source to sync structured data from CSV, TSV, JSON, JSONL, and XLSX files. The connector supports gzip-compressed files (for text formats), ZIP archives (for CSV and TSV), and Excel workbooks, with incremental sync based on file modification time.

Prerequisites

Before you begin, ensure you have:

  • An SFTP server with SSH access
  • A user account with read access to the folder containing your data files
  • One of the following authentication credentials:
    • Password for the SSH user
    • RSA private key in PEM format for key-based authentication

How Data is Organized

The SFTP connector maps files on your SFTP server to tables in your destination. How files are grouped into tables depends on the Table Mapping Mode setting and the file type.

Table Per Folder (default for CSV, TSV, JSON, JSONL)

Folders at a configurable depth below the root folder define table boundaries. The Table Folder Depth setting (default 1) controls which folder level determines the table name. Files above the configured depth are ignored. Files below the depth are included recursively.

Folder Path = /data, Table Folder Depth = 1

SFTP Server
└── /data/
    ├── summary.csv              (ignored -- above depth 1)
    ├── orders/                  --> Table: "orders"
    │   ├── orders_jan.csv       (rows in "orders" table)
    │   ├── orders_feb.csv       (rows in "orders" table)
    │   ├── archive.csv.gz       (decompressed, rows in "orders" table)
    │   └── 2024/
    │       └── orders_dec.csv   (rows in "orders" table -- recursive)
    └── customers/               --> Table: "customers"
        ├── customers_2024.csv   (rows in "customers" table)
        └── bundle.zip           (CSV entries extracted, rows in "customers" table)

With Table Folder Depth = 2, two folder levels define the table name:

Folder Path = /data, Table Folder Depth = 2

SFTP Server
└── /data/
    └── us/
        ├── orders/              --> Table: "us_orders"
        │   └── data.csv
        └── refunds/             --> Table: "us_refunds"
            └── data.csv
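The depth-based mapping above can be sketched in a few lines. This is an illustrative model, not the connector's actual implementation; `table_for_file` is a hypothetical helper name:

```python
from pathlib import PurePosixPath

def table_for_file(root, file_path, depth=1):
    """Sketch: derive a table name from a file's path under the root folder.

    Files fewer than `depth` folder levels below the root are ignored
    (returns None); deeper files map to the table named by joining the
    first `depth` folder levels with underscores.
    """
    rel = PurePosixPath(file_path).relative_to(root)
    folders = rel.parts[:-1]          # folder levels between root and file
    if len(folders) < depth:
        return None                   # e.g. /data/summary.csv at depth 1
    return "_".join(p.lower() for p in folders[:depth])

# Mirrors the examples above:
table_for_file("/data", "/data/summary.csv", depth=1)            # None (ignored)
table_for_file("/data", "/data/orders/2024/orders_dec.csv", 1)   # "orders"
table_for_file("/data", "/data/us/orders/data.csv", depth=2)     # "us_orders"
```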

Table Per File (for CSV, TSV, JSON, JSONL)

Each file becomes its own table. The table name is derived from the file's relative path (without extension), normalized to lowercase with underscores.

Folder Path = /data, Table Mapping Mode = TABLE_PER_FILE

SFTP Server
└── /data/
    ├── summary.csv              --> Table: "summary"
    └── orders/
        ├── orders_jan.csv       --> Table: "orders_orders_jan"
        └── orders_feb.csv       --> Table: "orders_orders_feb"

When Include Subfolders is disabled, only files directly in the root folder are included.

Table Name Collisions

Table names are derived from file paths by normalizing to lowercase with underscores. In rare cases, two different paths can produce the same table name (e.g., /us_orders.csv and /us/orders.csv both normalize to us_orders). When this happens, the second file gets a numeric suffix (us_orders_2). The assignment is deterministic (sorted by path) so table names are stable across runs.
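The normalization and deterministic suffixing can be sketched as follows. The exact normalization rules are the connector's own; this is a plausible model with hypothetical helper names:

```python
import re

def normalize(path):
    """Lowercase and collapse non-alphanumeric runs to underscores (sketch)."""
    stem = path.rsplit(".", 1)[0].lstrip("/")
    return re.sub(r"[^a-z0-9]+", "_", stem.lower()).strip("_")

def assign_table_names(paths):
    """Deterministic collision handling: sort by path, suffix duplicates."""
    names, seen = {}, {}
    for path in sorted(paths):
        base = normalize(path)
        n = seen.get(base, 0) + 1
        seen[base] = n
        names[path] = base if n == 1 else f"{base}_{n}"
    return names

assign_table_names(["/us/orders.csv", "/us_orders.csv"])
# Both normalize to "us_orders"; the later path in sorted order gets "_2".
```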

XLSX File Organization

XLSX files use a different table identity model based on the workbook filename + worksheet name. Each unique combination of workbook basename and worksheet name becomes a table. Files with the same basename and worksheet name in different folders are merged into the same table.

Folder Path = /data, File Type = XLSX

SFTP Server
└── /data/
    ├── 2026-01/
    │   └── report.xlsx
    │       ├── Sheet "Orders"   --> Table: "report_orders"
    │       └── Sheet "Refunds"  --> Table: "report_refunds"
    └── 2026-02/
        └── report.xlsx
            ├── Sheet "Orders"   --> Table: "report_orders" (merged with Jan)
            └── Sheet "Refunds"  --> Table: "report_refunds" (merged with Jan)

Key points:

  • Column names are unioned across files; missing columns become null
  • Empty files (0 bytes) and files without modification timestamps are automatically skipped
  • Folders with no matching files are not discovered as tables
  • XLSX files larger than 50 MB are skipped with a warning

Compression Support

  • Gzip (.csv.gz, .json.gz, etc.) - Decompressed transparently. The format is detected from the extension before .gz.
  • ZIP (.zip) - CSV/TSV entries within the archive are extracted and parsed. Non-matching entries (e.g., .txt) are skipped. Only supported for the CSV and TSV file types.
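The two behaviors can be sketched with the standard library. `open_text_streams` is a hypothetical helper illustrating the idea, not the connector's API:

```python
import gzip
import io
import zipfile

def open_text_streams(name, raw, encoding="utf-8"):
    """Sketch: expand compressed inputs into (logical_name, text_stream) pairs.

    Gzip members are decompressed in place, ZIP archives yield one stream
    per matching .csv entry, and plain files pass through unchanged.
    """
    if name.endswith(".gz"):
        yield name[:-3], io.TextIOWrapper(gzip.GzipFile(fileobj=io.BytesIO(raw)), encoding)
    elif name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            for entry in zf.namelist():
                if entry.endswith(".csv"):        # non-matching entries skipped
                    yield entry, io.TextIOWrapper(zf.open(entry), encoding)
    else:
        yield name, io.TextIOWrapper(io.BytesIO(raw), encoding)
```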

System Fields

Every table includes system fields added automatically. The exact fields depend on the file type:

  • _supa_file_name (string; all file types) - Original filename from the SFTP server
  • _supa_file_path (string; all file types) - Full path on the SFTP server
  • _supa_zip_entry (string; CSV, TSV) - Entry name within a ZIP archive (null for non-ZIP files)
  • _supa_worksheet_name (string; XLSX) - Worksheet name within the Excel workbook

These fields allow you to trace every row back to its source file in your destination.


Configuration

In Supaflow, create a new SFTP source with these settings:

Connection

Host*

SFTP server hostname or IP address.
Example: sftp.example.com

Port*

SFTP server port.
Default: 22

Username*

SSH username for authentication.

Authentication Method*

How to authenticate with the SFTP server.
Options:

  • PASSWORD - Authenticate with username and password
  • PRIVATE_KEY - Authenticate with an RSA private key

Default: PASSWORD

Password Authentication

Password*

SSH password for the user account.
Stored encrypted

Private Key Authentication

RSA Private Key*

Upload your RSA private key file in PEM format. The key is loaded in memory and never written to disk.
Stored encrypted

Private Key Passphrase

Passphrase to decrypt the private key, if it is encrypted. Leave blank if the private key is not passphrase-protected.
Stored encrypted

Key Format

The private key must be in PEM format (begins with -----BEGIN RSA PRIVATE KEY-----). ED25519 and ECDSA keys are not currently supported.


Configuration

Folder Path*

Root directory on the SFTP server to scan for files. How files are organized into tables depends on the Table Mapping Mode setting (for CSV/TSV/JSON/JSONL) or the workbook + worksheet model (for XLSX).
Example: /data/incoming

File Type*

Type of files to read from the SFTP server.
Options:

  • CSV - Comma-separated values (also matches .csv.gz and .zip files)
  • TSV - Tab-separated values (also matches .tsv.gz and .zip files)
  • JSON - JSON files containing an array of objects or a single object (also matches .json.gz)
  • JSONL - Newline-delimited JSON, one object per line (also matches .jsonl.gz and .ndjson)
  • XLSX - Excel workbooks (.xlsx files). Each worksheet becomes a separate table.

Default: CSV

File Pattern

Optional regex pattern to filter files by name. Applied to the full file path (not just the filename). When set, replaces the default extension filter so that files without standard extensions (e.g., data_export, feed_20240115) can be matched. The File Type setting still controls how matched files are parsed.
Example: orders_.*\.csv matches all CSV files starting with "orders_"
Example: daily_feed_\d{8} matches extensionless files like daily_feed_20240115
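The pattern-versus-extension behavior can be sketched like this. Whether the connector anchors the regex or searches for it is not specified here; this sketch assumes unanchored search, and `matches` and the default-extension list are illustrative:

```python
import re

def matches(pattern, path, default_exts=(".csv", ".csv.gz", ".zip")):
    """Sketch: a File Pattern regex replaces the default extension filter.

    The regex is applied to the full file path; without a pattern, files
    fall back to the standard extensions for the selected File Type.
    """
    if pattern:
        return re.search(pattern, path) is not None
    return path.endswith(default_exts)

matches(r"orders_.*\.csv", "incoming/orders_jan.csv")    # True
matches(r"daily_feed_\d{8}", "daily_feed_20240115")      # True (no extension)
matches(None, "incoming/orders_jan.csv")                 # True via extension filter
```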

Table Mapping Mode

How files are grouped into destination tables. Only applies to CSV, TSV, JSON, and JSONL file types (XLSX uses its own workbook + worksheet grouping).
Options:

  • TABLE_PER_FOLDER - Folders at a configurable depth define table boundaries. All files under the same folder are merged into one table.
  • TABLE_PER_FILE - Each file becomes its own table.

Default: TABLE_PER_FOLDER

Table Folder Depth

Number of folder levels below the root folder that define table boundaries. Only visible when Table Mapping Mode is TABLE_PER_FOLDER.

  • 1 (default): Immediate child folders become tables (e.g., orders/, customers/)
  • 2: Two levels of folders form the table name (e.g., us/orders/ becomes table us_orders)

Default: 1

Include Subfolders

Whether to scan subfolders recursively. When disabled, only files directly in the root folder are included. Applies to TABLE_PER_FILE mode and XLSX file type.
Default: true


CSV Settings

These settings appear when File Type is CSV or TSV.

Delimiter

Column delimiter character. For TSV files, the tab character is used automatically even if this is set to comma.
Default: ,

Has Header Row

Whether the first row contains column headers. If disabled, columns are named col_1, col_2, etc.
Default: true

Quote Character

Character used to quote fields that contain the delimiter or newlines.
Default: "

Escape Character

Character used to escape the quote character inside quoted fields. When set to the same character as the quote character (the default), uses quote-doubling mode (e.g., "" represents a literal "). When set to a different character (e.g., \), uses backslash-style escaping.
Default: " (quote-doubling mode)
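The two escaping modes map directly onto Python's csv dialect options, which makes for a convenient illustration (the connector's parser may differ internally):

```python
import csv
import io

def parse_row(line, quotechar='"', escapechar=None):
    """Sketch of the two escaping modes described above.

    escapechar=None  -> quote-doubling ("" means a literal "), the default
    escapechar="\\"  -> backslash-style escaping
    """
    return next(csv.reader(io.StringIO(line), quotechar=quotechar,
                           doublequote=escapechar is None, escapechar=escapechar))

parse_row('"say ""hi""",2')                     # ['say "hi"', '2']
parse_row('"say \\"hi\\"",2', escapechar="\\")  # ['say "hi"', '2']
```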

Skip Header Lines

Number of lines to skip before the header row. Use this when files have metadata or comments at the top (e.g., # Generated on 2026-01-15).
Default: 0

Null Values

Comma-separated list of strings to treat as null values. When a field value exactly matches one of these strings, it is stored as null in the destination.
Default: \N
Example: \N,#N/A,nil treats all three strings as null
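The exact-match rule means substrings and case variants are not affected, which this small sketch illustrates (`apply_nulls` is a hypothetical helper):

```python
def apply_nulls(value, null_values=(r"\N", "#N/A", "nil")):
    """Sketch: exact-match null substitution, as described above."""
    return None if value in null_values else value

apply_nulls(r"\N")      # None
apply_nulls("N/A")      # "N/A" -- not an exact match, kept as-is
```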

Skip Blank Lines

Whether to skip blank lines in the file. When enabled, lines that are empty or contain only whitespace are ignored.
Default: true

Trim Whitespace

Whether to trim leading and trailing whitespace from field values.
Default: false

On Column Count Mismatch

What to do when a data row has more or fewer columns than the header row.
Options:

  • SKIP - Log a warning and skip the row
  • FAIL - Stop processing and fail the sync

Default: SKIP


Advanced Settings

Request Timeout (seconds)

Timeout in seconds for SFTP operations (directory listing, file download). Set to 0 for no timeout. Operations that time out are retried automatically with exponential backoff.
Default: 300

File Encoding

Character encoding of the source files. UTF-8 files with a BOM (byte order mark) are handled automatically.
Options: utf-8, latin-1, iso-8859-1, windows-1252, ascii
Default: utf-8

Infer Column Types

When enabled, the connector detects column types from sampled data instead of treating all columns as text. For CSV/TSV/JSON/JSONL, types are inferred from string values using pattern matching (integers, decimals, booleans, dates, timestamps). For XLSX, native Excel cell types are used directly (integers, floats, booleans, dates).
Default: true
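String-based inference for the text formats can be sketched as below. The real connector's patterns, precedence, and supported date formats may differ; this is only a plausible model:

```python
import re
from datetime import datetime

def infer(value):
    """Sketch of pattern-based type inference for a string value."""
    if re.fullmatch(r"[+-]?\d+", value):
        return int(value)
    if re.fullmatch(r"[+-]?\d*\.\d+", value):
        return float(value)
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    try:
        return datetime.fromisoformat(value)   # covers dates and timestamps
    except ValueError:
        return value                           # fall back to text

[type(infer(v)).__name__ for v in ["42", "3.14", "true", "2026-01-15", "hello"]]
```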

Schema Refresh Interval

Interval in minutes for schema metadata refresh.

  • 0 - Refresh schema before every pipeline execution
  • -1 - Disable automatic schema refresh (use for static schemas)
  • Positive value - Refresh interval in minutes (e.g., 60 = hourly, 1440 = daily)

Default: 60 (hourly)


Test & Save

After configuring all required properties, click Test & Save to verify your connection and save the source.

The connection test verifies that:

  • SSH credentials are valid (password or private key)
  • The SFTP server is reachable on the configured host and port
  • The configured folder exists and is readable

Incremental Sync

The SFTP connector supports incremental sync using file modification timestamps (st_mtime). On each sync:

  1. Initial sync: All matching files in the folder are read
  2. Subsequent syncs: Only files modified since the last sync are read

This is managed automatically by the connector using a time-window model:

  • Files are included when last_sync_cursor <= file_mtime < current_cutoff_time
  • The cursor advances to the cutoff time (not the max file mtime), ensuring no gaps
  • Files are processed in ascending modification time order for deterministic behavior
  • Per-file checkpointing tracks progress; if a sync is interrupted, only unprocessed files are re-read on the next run
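The window selection above can be sketched as follows; variable names are illustrative, not the connector's actual identifiers:

```python
def select_files(files, last_cursor, cutoff):
    """Sketch of the incremental time-window model.

    files: list of (path, mtime) tuples; mtime may be None.
    Returns the eligible files in ascending mtime order and the new cursor.
    """
    eligible = [
        (path, mtime) for path, mtime in files
        if mtime is not None                    # no timestamp -> skipped
        and last_cursor <= mtime < cutoff       # half-open window, no gaps
    ]
    eligible.sort(key=lambda f: f[1])           # ascending mtime, deterministic
    new_cursor = cutoff                         # advance to cutoff, not max mtime
    return eligible, new_cursor

files = [("a.csv", 100), ("b.csv", 250), ("c.csv", None), ("d.csv", 150)]
select_files(files, last_cursor=100, cutoff=200)
# -> ([("a.csv", 100), ("d.csv", 150)], 200)
```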

Files Without Timestamps

Files where the SFTP server does not report a modification time (st_mtime is null) are skipped with a warning. These files cannot participate in incremental sync. If your SFTP server does not provide timestamps, use the Historical ingestion mode (full refresh) instead.


JSON and JSONL File Handling

JSON Files

JSON files are parsed as follows:

  • Array of objects: Each object in the array becomes a row
  • Single object: The object becomes a single row
  • Nested values: Nested objects and arrays are serialized as JSON strings in the destination
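These three rules can be sketched with the standard library (`json_rows` is a hypothetical helper, not the connector's API):

```python
import json

def json_rows(text):
    """Sketch: array -> one row per object, single object -> one row,
    nested objects/arrays re-serialized as JSON strings."""
    parsed = json.loads(text)
    records = parsed if isinstance(parsed, list) else [parsed]
    return [
        {k: json.dumps(v) if isinstance(v, (dict, list)) else v
         for k, v in rec.items()}
        for rec in records
    ]

json_rows('[{"id": 1, "tags": ["a", "b"]}, {"id": 2}]')
# The nested "tags" array becomes the JSON string '["a", "b"]'.
```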

JSONL / NDJSON Files

JSONL (newline-delimited JSON) files are parsed line by line:

  • Each line must be a valid JSON object
  • Empty lines are skipped
  • Non-object lines (arrays, strings, numbers) are skipped with a warning
  • Files with .jsonl or .ndjson extensions are both supported
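The line-by-line rules can be sketched similarly (again a hypothetical helper, and without the connector's warning/logging behavior):

```python
import json

def jsonl_rows(text):
    """Sketch: one JSON object per line; blank and non-object lines skipped."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue                       # empty lines skipped
        value = json.loads(line)
        if isinstance(value, dict):
            rows.append(value)
        # else: non-object line (array/string/number) -> warn and skip
    return rows

jsonl_rows('{"id": 1}\n\n[1, 2]\n{"id": 2}')
# -> [{"id": 1}, {"id": 2}]
```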

XLSX File Handling

Excel workbooks (.xlsx files) are supported as a file type. Each worksheet within a workbook becomes a separate destination table.

How XLSX Tables are Organized

Tables are identified by the combination of workbook filename + worksheet name. Files with the same name and worksheet in different folders are merged:

  • /2026-01/report.xlsx with sheet "Orders" and /2026-02/report.xlsx with sheet "Orders" both feed into table report_orders
  • /data/summary.xlsx with sheets "Q1" and "Q2" produces tables summary_q1 and summary_q2

Worksheet Parsing

  • The first non-empty row in each worksheet is treated as the column header
  • Leading blank rows before the header are skipped automatically
  • Empty worksheets are excluded from schema discovery
  • Trailing empty columns are trimmed (columns with no header and no data)
  • Formula cells return their cached computed value
  • When Infer Column Types is enabled, native Excel types are mapped directly: integers to LONG, floats to DOUBLE, booleans to BOOLEAN, dates to LOCALDATE, and datetimes to INSTANT
  • When disabled, all values are treated as strings
  • Empty rows within the data range are skipped (gaps do not stop reading)
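The header-detection and row-skipping rules above can be illustrated on plain row data (the real connector reads these rows from the workbook; blank cells are shown as None, and the trailing-column trim here is a simplification):

```python
def parse_rows(rows):
    """Sketch of the worksheet parsing rules: skip leading blank rows,
    first non-empty row is the header, gaps in the data are skipped."""
    it = iter(rows)
    header = next((r for r in it if any(c is not None for c in r)), None)
    if header is None:
        return []                          # empty worksheet -> excluded
    while header and header[-1] is None:
        header = header[:-1]               # trim trailing empty columns
    out = []
    for row in it:
        if not any(c is not None for c in row):
            continue                       # empty rows within data are skipped
        out.append(dict(zip(header, row)))
    return out

parse_rows([
    [None, None, None],                    # leading blank row
    ["id", "name", None],                  # header (trailing empty column)
    [1, "a", None],
    [None, None, None],                    # gap row, skipped
    [2, "b", None],
])
# -> [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
```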

Limitations

  • File size limit: XLSX files larger than 50 MB are skipped with a warning
  • No gzip support: XLSX files are already internally compressed; .xlsx.gz is not supported
  • No password protection: Encrypted or password-protected workbooks are not supported
  • The Table Mapping Mode setting does not apply to XLSX files. XLSX always uses the workbook + worksheet grouping model.
  • Name collisions: If two different workbook/worksheet combinations normalize to the same table name (e.g., workbook My-Report sheet Order Items and workbook My_Report sheet Order_Items), schema discovery fails with an error. Rename one of the workbooks or worksheets to resolve.

Troubleshooting

Connection test fails

Problem:

  • "Failed to connect to SFTP server" error

Solutions:

  1. Verify host and port: Ensure the hostname is correct and the SFTP service is running on the configured port (default: 22)
  2. Check credentials:
    • Password auth: Verify the username and password are correct
    • Key auth: Ensure the private key is in PEM format and corresponds to an authorized key on the server
  3. Check network access: Ensure your network allows outbound connections to the SFTP server on the configured port
  4. Check folder path: The configured folder must exist on the server. The connection test verifies this by listing the directory.

No objects found during schema discovery

Problem:

  • Schema discovery returns zero tables

Solutions:

  1. Verify files exist: Ensure the configured folder contains matching files at the expected depth. For TABLE_PER_FOLDER mode, files must be inside subfolders at the configured Table Folder Depth (files directly in the root are ignored when depth >= 1). For TABLE_PER_FILE mode, any matching file becomes a table.
  2. Check table folder depth: If using TABLE_PER_FOLDER, verify the Table Folder Depth matches your actual folder structure. A depth of 3 when your files are only 1 level deep will result in zero tables.
  3. Check file type setting: Make sure the File Type matches your actual files (e.g., don't select CSV for JSON files). Without a File Pattern, only files with standard extensions (.csv, .json, .xlsx, etc.) are matched.
  4. Check file pattern: If you set a File Pattern regex, verify it matches your filenames. Note that File Pattern replaces extension filtering, so the regex must match the full file path. Try removing it to fall back to extension-based filtering.
  5. Files without extensions: If your files have no extension (e.g., data_export), you must set a File Pattern to match them. Without a pattern, only files with standard extensions for the selected File Type are included.
  6. Check file sizes: Empty files (0 bytes) are automatically skipped. XLSX files larger than 50 MB are also skipped.
  7. Check folder path: Verify the root path is correct and the user has read permission to the root and any child folders you want discovered
  8. XLSX empty worksheets: For XLSX files, worksheets with no non-blank cells are excluded. Verify your workbooks contain data.

Missing columns in schema

Problem:

  • Some columns are missing from the discovered schema

Solutions:

  1. Check file consistency: Schema inference samples up to 5 of the most recently modified files. If older files have additional columns, they may not be sampled.
  2. Verify encoding: Ensure the File Encoding setting matches your files. Wrong encoding can produce garbled or missing column names.
  3. Check for BOM: UTF-8 files with a byte order mark are handled automatically. For other encodings, the BOM may need manual removal.

Incremental sync re-reads all files

Problem:

  • Every sync reads all files instead of just modified ones

Solutions:

  1. Check initial sync completed: The first sync always reads all files. Incremental behavior starts from the second sync.
  2. Verify files are not being re-saved: Some automated processes touch files without changing content, updating the modification time.
  3. Check ingestion mode: Ensure the pipeline is configured for Historical + Incremental or Incremental mode (not Historical, which always performs a full refresh).

Permission denied errors in logs

Problem:

  • Warning messages about permission denied for specific files
  • Some files missing from sync results

Solutions:

  1. Check file permissions: The SSH user must have read access to all files in the configured folder. Files with insufficient permissions are skipped with a warning.
  2. Check folder permissions: The user needs read and list permissions on the folder and all subfolders.

Support

Need help? Contact us at support@supa-flow.io