
SFTP Source

Connect an SFTP server as a source to sync structured data from CSV, TSV, JSON, JSONL, and XLSX files. The connector supports gzip-compressed files (for text formats), ZIP archives (for CSV and TSV), and Excel workbooks, with incremental sync based on file modification time.

Prerequisites

Before you begin, ensure you have:

  • An SFTP server with SSH access
  • A user account with read access to the folder containing your data files
  • One of the following authentication credentials:
    • Password for the SSH user
    • RSA private key in PEM format for key-based authentication

How Data is Organized

The SFTP connector maps files on your SFTP server to tables in your destination. How files are grouped into tables depends on the Table Mapping Mode setting and the file type.

Table Per Folder (default for CSV, TSV, JSON, JSONL)

Folders at a configurable depth below the root folder define table boundaries. The Table Folder Depth setting (default 1) controls which folder level determines the table name. Files above the configured depth are ignored. Files below the depth are included recursively.

Folder Path = /data, Table Folder Depth = 1

SFTP Server
└── /data/
    ├── summary.csv              (ignored -- above depth 1)
    ├── orders/                  --> Table: "orders"
    │   ├── orders_jan.csv       (rows in "orders" table)
    │   ├── orders_feb.csv       (rows in "orders" table)
    │   ├── archive.csv.gz       (decompressed, rows in "orders" table)
    │   └── 2024/
    │       └── orders_dec.csv   (rows in "orders" table -- recursive)
    └── customers/               --> Table: "customers"
        ├── customers_2024.csv   (rows in "customers" table)
        └── bundle.zip           (CSV entries extracted, rows in "customers" table)

With Table Folder Depth = 2, two folder levels define the table name:

Folder Path = /data, Table Folder Depth = 2

SFTP Server
└── /data/
    └── us/
        ├── orders/              --> Table: "us_orders"
        │   └── data.csv
        └── refunds/             --> Table: "us_refunds"
            └── data.csv
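The depth-based mapping above can be sketched in a few lines. This is an illustrative model, not the connector's actual implementation; `table_for_file` is a hypothetical helper name:

```python
from pathlib import PurePosixPath

def table_for_file(root, file_path, depth=1):
    """Sketch: derive a table name from a file's path under the root folder.

    Files fewer than `depth` folder levels below the root are ignored
    (returns None); deeper files map to the table named by joining the
    first `depth` folder levels with underscores.
    """
    rel = PurePosixPath(file_path).relative_to(root)
    folders = rel.parts[:-1]          # folder levels between root and file
    if len(folders) < depth:
        return None                   # e.g. /data/summary.csv at depth 1
    return "_".join(p.lower() for p in folders[:depth])

# Mirrors the examples above:
table_for_file("/data", "/data/summary.csv", depth=1)            # None (ignored)
table_for_file("/data", "/data/orders/2024/orders_dec.csv", 1)   # "orders"
table_for_file("/data", "/data/us/orders/data.csv", depth=2)     # "us_orders"
```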

Table Per File (for CSV, TSV, JSON, JSONL)

Each file becomes its own table. The table name is derived from the file's relative path (without extension), normalized to lowercase with underscores.

Folder Path = /data, Table Mapping Mode = TABLE_PER_FILE

SFTP Server
└── /data/
    ├── summary.csv              --> Table: "summary"
    └── orders/
        ├── orders_jan.csv       --> Table: "orders_orders_jan"
        └── orders_feb.csv       --> Table: "orders_orders_feb"

When Include Subfolders is disabled, only files directly in the root folder are included.

Table Name Collisions

Table names are derived from file paths by normalizing to lowercase with underscores. In rare cases, two different paths can produce the same table name (e.g., /us_orders.csv and /us/orders.csv both normalize to us_orders). When this happens, the second file gets a numeric suffix (us_orders_2). The assignment is deterministic (sorted by path) so table names are stable across runs.
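The normalization and deterministic suffixing can be sketched as follows. The exact normalization rules are the connector's own; this is a plausible model with hypothetical helper names:

```python
import re

def normalize(path):
    """Lowercase and collapse non-alphanumeric runs to underscores (sketch)."""
    stem = path.rsplit(".", 1)[0].lstrip("/")
    return re.sub(r"[^a-z0-9]+", "_", stem.lower()).strip("_")

def assign_table_names(paths):
    """Deterministic collision handling: sort by path, suffix duplicates."""
    names, seen = {}, {}
    for path in sorted(paths):
        base = normalize(path)
        n = seen.get(base, 0) + 1
        seen[base] = n
        names[path] = base if n == 1 else f"{base}_{n}"
    return names

assign_table_names(["/us/orders.csv", "/us_orders.csv"])
# Both normalize to "us_orders"; the later path in sorted order gets "_2".
```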

XLSX File Organization

XLSX files use a different table identity model based on the workbook filename + worksheet name. Each unique combination of workbook basename and worksheet name becomes a table. Files with the same basename and worksheet name in different folders are merged into the same table.

Folder Path = /data, File Type = XLSX

SFTP Server
└── /data/
    ├── 2026-01/
    │   └── report.xlsx
    │       ├── Sheet "Orders"   --> Table: "report_orders"
    │       └── Sheet "Refunds"  --> Table: "report_refunds"
    └── 2026-02/
        └── report.xlsx
            ├── Sheet "Orders"   --> Table: "report_orders" (merged with Jan)
            └── Sheet "Refunds"  --> Table: "report_refunds" (merged with Jan)

Key points:

  • Column names are unioned across files; missing columns become null
  • Empty files (0 bytes) and files without modification timestamps are automatically skipped
  • Folders with no matching files are not discovered as tables
  • XLSX files larger than 50 MB are skipped with a warning

Compression Support

  • Gzip (.csv.gz, .json.gz, etc.) - Decompressed transparently. The format is detected from the extension before .gz.
  • ZIP (.zip) - CSV/TSV entries within the archive are extracted and parsed. Non-matching entries (e.g., .txt) are skipped. Only supported for the CSV and TSV file types.
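The two behaviors can be sketched with the standard library. `open_text_streams` is a hypothetical helper illustrating the idea, not the connector's API:

```python
import gzip
import io
import zipfile

def open_text_streams(name, raw, encoding="utf-8"):
    """Sketch: expand compressed inputs into (logical_name, text_stream) pairs.

    Gzip members are decompressed in place, ZIP archives yield one stream
    per matching .csv entry, and plain files pass through unchanged.
    """
    if name.endswith(".gz"):
        yield name[:-3], io.TextIOWrapper(gzip.GzipFile(fileobj=io.BytesIO(raw)), encoding)
    elif name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            for entry in zf.namelist():
                if entry.endswith(".csv"):        # non-matching entries skipped
                    yield entry, io.TextIOWrapper(zf.open(entry), encoding)
    else:
        yield name, io.TextIOWrapper(io.BytesIO(raw), encoding)
```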

System Fields

Every table includes system fields added automatically. The exact fields depend on the file type:

  • _supa_file_name (string; all file types) - Original filename from the SFTP server
  • _supa_file_path (string; all file types) - Full path on the SFTP server
  • _supa_zip_entry (string; CSV, TSV) - Entry name within a ZIP archive (null for non-ZIP files)
  • _supa_worksheet_name (string; XLSX) - Worksheet name within the Excel workbook

These fields allow you to trace every row back to its source file in your destination.


Configuration

In Supaflow, create a new SFTP source with these settings:

Connection

Host*

SFTP server hostname or IP address.
Example: sftp.example.com

Port*

SFTP server port.
Default: 22

Username*

SSH username for authentication.

Authentication Method*

How to authenticate with the SFTP server.
Options:

  • PASSWORD - Authenticate with username and password
  • PRIVATE_KEY - Authenticate with an RSA private key

Default: PASSWORD

Password Authentication

Password*

SSH password for the user account.
Stored encrypted

Private Key Authentication

RSA Private Key*

Upload your RSA private key file in PEM format. The key is loaded in memory and never written to disk.
Stored encrypted

Private Key Passphrase

Passphrase to decrypt the private key, if it is encrypted. Leave blank if the private key is not passphrase-protected.
Stored encrypted

Key Format

The private key must be in PEM format (begins with -----BEGIN RSA PRIVATE KEY-----). ED25519 and ECDSA keys are not currently supported.


Configuration

Folder Path*

Root directory on the SFTP server to scan for files. How files are organized into tables depends on the Table Mapping Mode setting (for CSV/TSV/JSON/JSONL) or the workbook + worksheet model (for XLSX).
Example: /data/incoming

File Type*

Type of files to read from the SFTP server.
Options:

  • CSV - Comma-separated values (also matches .csv.gz and .zip files)
  • TSV - Tab-separated values (also matches .tsv.gz and .zip files)
  • JSON - JSON files containing an array of objects or a single object (also matches .json.gz)
  • JSONL - Newline-delimited JSON, one object per line (also matches .jsonl.gz and .ndjson)
  • XLSX - Excel workbooks (.xlsx files). Each worksheet becomes a separate table.

Default: CSV

File Pattern

Optional regex pattern to filter files by name. Applied to the full file path (not just the filename). When set, replaces the default extension filter so that files without standard extensions (e.g., data_export, feed_20240115) can be matched. The File Type setting still controls how matched files are parsed.
Example: orders_.*\.csv matches all CSV files starting with "orders_"
Example: daily_feed_\d{8} matches extensionless files like daily_feed_20240115
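The pattern-versus-extension behavior can be sketched like this. Whether the connector anchors the regex or searches for it is not specified here; this sketch assumes unanchored search, and `matches` and the default-extension list are illustrative:

```python
import re

def matches(pattern, path, default_exts=(".csv", ".csv.gz", ".zip")):
    """Sketch: a File Pattern regex replaces the default extension filter.

    The regex is applied to the full file path; without a pattern, files
    fall back to the standard extensions for the selected File Type.
    """
    if pattern:
        return re.search(pattern, path) is not None
    return path.endswith(default_exts)

matches(r"orders_.*\.csv", "incoming/orders_jan.csv")    # True
matches(r"daily_feed_\d{8}", "daily_feed_20240115")      # True (no extension)
matches(None, "incoming/orders_jan.csv")                 # True via extension filter
```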

Table Mapping Mode

How files are grouped into destination tables. Only applies to CSV, TSV, JSON, and JSONL file types (XLSX uses its own workbook + worksheet grouping).
Options:

  • TABLE_PER_FOLDER - Folders at a configurable depth define table boundaries. All files under the same folder are merged into one table.
  • TABLE_PER_FILE - Each file becomes its own table.

Default: TABLE_PER_FOLDER

Table Folder Depth

Number of folder levels below the root folder that define table boundaries. Only visible when Table Mapping Mode is TABLE_PER_FOLDER.

  • 1 (default): Immediate child folders become tables (e.g., orders/, customers/)
  • 2: Two levels of folders form the table name (e.g., us/orders/ becomes table us_orders)

Default: 1

Include Subfolders

Whether to scan subfolders recursively. When disabled, only files directly in the root folder are included. Applies to TABLE_PER_FILE mode and XLSX file type.
Default: true


CSV Settings

These settings appear when File Type is CSV or TSV.

Delimiter

Column delimiter character. For TSV files, the tab character is used automatically even if this is set to comma.
Default: ,

Has Header Row

Whether the first row contains column headers. If disabled, columns are named col_1, col_2, etc.
Default: true

Quote Character

Character used to quote fields that contain the delimiter or newlines.
Default: "

Escape Character

Character used to escape the quote character inside quoted fields. When set to the same character as the quote character (the default), uses quote-doubling mode (e.g., "" represents a literal "). When set to a different character (e.g., \), uses backslash-style escaping.
Default: " (quote-doubling mode)
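The two escaping modes map directly onto Python's csv dialect options, which makes for a convenient illustration (the connector's parser may differ internally):

```python
import csv
import io

def parse_row(line, quotechar='"', escapechar=None):
    """Sketch of the two escaping modes described above.

    escapechar=None  -> quote-doubling ("" means a literal "), the default
    escapechar="\\"  -> backslash-style escaping
    """
    return next(csv.reader(io.StringIO(line), quotechar=quotechar,
                           doublequote=escapechar is None, escapechar=escapechar))

parse_row('"say ""hi""",2')                     # ['say "hi"', '2']
parse_row('"say \\"hi\\"",2', escapechar="\\")  # ['say "hi"', '2']
```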

Skip Header Lines

Number of lines to skip before the header row. Use this when files have metadata or comments at the top (e.g., # Generated on 2026-01-15).
Default: 0

Null Values

Comma-separated list of strings to treat as null values. When a field value exactly matches one of these strings, it is stored as null in the destination.
Default: \N
Example: \N,#N/A,nil treats all three strings as null
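The exact-match rule means substrings and case variants are not affected, which this small sketch illustrates (`apply_nulls` is a hypothetical helper):

```python
def apply_nulls(value, null_values=(r"\N", "#N/A", "nil")):
    """Sketch: exact-match null substitution, as described above."""
    return None if value in null_values else value

apply_nulls(r"\N")      # None
apply_nulls("N/A")      # "N/A" -- not an exact match, kept as-is
```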

Skip Blank Lines

Whether to skip blank lines in the file. When enabled, lines that are empty or contain only whitespace are ignored.
Default: true

Trim Whitespace

Whether to trim leading and trailing whitespace from field values.
Default: false

On Column Count Mismatch

What to do when a data row has more or fewer columns than the header row.
Options:

  • SKIP - Log a warning and skip the row
  • FAIL - Stop processing and fail the sync

Default: SKIP


Advanced Settings

Request Timeout (seconds)

Timeout in seconds for SFTP operations (directory listing, file download). Set to 0 for no timeout. Operations that time out are retried automatically with exponential backoff.
Default: 300

File Encoding

Character encoding of the source files. UTF-8 files with a BOM (byte order mark) are handled automatically.
Options: utf-8, latin-1, iso-8859-1, windows-1252, ascii
Default: utf-8

Infer Column Types

When enabled, the connector detects column types from sampled data instead of treating all columns as text. For CSV/TSV/JSON/JSONL, types are inferred from string values using pattern matching (integers, decimals, booleans, dates, timestamps). For XLSX, native Excel cell types are used directly (integers, floats, booleans, dates).
Default: true
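String-based inference for the text formats can be sketched as below. The real connector's patterns, precedence, and supported date formats may differ; this is only a plausible model:

```python
import re
from datetime import datetime

def infer(value):
    """Sketch of pattern-based type inference for a string value."""
    if re.fullmatch(r"[+-]?\d+", value):
        return int(value)
    if re.fullmatch(r"[+-]?\d*\.\d+", value):
        return float(value)
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    try:
        return datetime.fromisoformat(value)   # covers dates and timestamps
    except ValueError:
        return value                           # fall back to text

[type(infer(v)).__name__ for v in ["42", "3.14", "true", "2026-01-15", "hello"]]
```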

Schema Refresh Interval

Interval in minutes for schema metadata refresh.

  • 0 - Refresh schema before every pipeline execution
  • -1 - Disable automatic schema refresh (use for static schemas)
  • Positive value - Refresh interval in minutes (e.g., 60 = hourly, 1440 = daily)

Default: 60 (hourly)


Test & Save

After configuring all required properties, click Test & Save to verify your connection and save the source.

The connection test verifies that:

  • SSH credentials are valid (password or private key)
  • The SFTP server is reachable on the configured host and port
  • The configured folder exists and is readable

Incremental Sync

The SFTP connector supports incremental sync using file modification timestamps (st_mtime). On each sync:

  1. Initial sync: All matching files in the folder are read
  2. Subsequent syncs: Only files modified since the last sync are read

This is managed automatically by the connector using a time-window model:

  • Files are included when last_sync_cursor <= file_mtime < current_cutoff_time
  • The cursor advances to the cutoff time (not the max file mtime), ensuring no gaps
  • Files are processed in ascending modification time order for deterministic behavior
  • Per-file checkpointing tracks progress; if a sync is interrupted, only unprocessed files are re-read on the next run
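The window selection above can be sketched as follows; variable names are illustrative, not the connector's actual identifiers:

```python
def select_files(files, last_cursor, cutoff):
    """Sketch of the incremental time-window model.

    files: list of (path, mtime) tuples; mtime may be None.
    Returns the eligible files in ascending mtime order and the new cursor.
    """
    eligible = [
        (path, mtime) for path, mtime in files
        if mtime is not None                    # no timestamp -> skipped
        and last_cursor <= mtime < cutoff       # half-open window, no gaps
    ]
    eligible.sort(key=lambda f: f[1])           # ascending mtime, deterministic
    new_cursor = cutoff                         # advance to cutoff, not max mtime
    return eligible, new_cursor

files = [("a.csv", 100), ("b.csv", 250), ("c.csv", None), ("d.csv", 150)]
select_files(files, last_cursor=100, cutoff=200)
# -> ([("a.csv", 100), ("d.csv", 150)], 200)
```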

Files Without Timestamps

Files where the SFTP server does not report a modification time (st_mtime is null) are skipped with a warning. These files cannot participate in incremental sync. If your SFTP server does not provide timestamps, use the Historical ingestion mode (full refresh) instead.


JSON and JSONL File Handling

JSON Files

JSON files are parsed as follows:

  • Array of objects: Each object in the array becomes a row
  • Single object: The object becomes a single row
  • Nested values: Nested objects and arrays are serialized as JSON strings in the destination
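These three rules can be sketched with the standard library (`json_rows` is a hypothetical helper, not the connector's API):

```python
import json

def json_rows(text):
    """Sketch: array -> one row per object, single object -> one row,
    nested objects/arrays re-serialized as JSON strings."""
    parsed = json.loads(text)
    records = parsed if isinstance(parsed, list) else [parsed]
    return [
        {k: json.dumps(v) if isinstance(v, (dict, list)) else v
         for k, v in rec.items()}
        for rec in records
    ]

json_rows('[{"id": 1, "tags": ["a", "b"]}, {"id": 2}]')
# The nested "tags" array becomes the JSON string '["a", "b"]'.
```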

JSONL / NDJSON Files

JSONL (newline-delimited JSON) files are parsed line by line:

  • Each line must be a valid JSON object
  • Empty lines are skipped
  • Non-object lines (arrays, strings, numbers) are skipped with a warning
  • Files with .jsonl or .ndjson extensions are both supported
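The line-by-line rules can be sketched similarly (again a hypothetical helper, and without the connector's warning/logging behavior):

```python
import json

def jsonl_rows(text):
    """Sketch: one JSON object per line; blank and non-object lines skipped."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue                       # empty lines skipped
        value = json.loads(line)
        if isinstance(value, dict):
            rows.append(value)
        # else: non-object line (array/string/number) -> warn and skip
    return rows

jsonl_rows('{"id": 1}\n\n[1, 2]\n{"id": 2}')
# -> [{"id": 1}, {"id": 2}]
```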

XLSX File Handling

Excel workbooks (.xlsx files) are supported as a file type. Each worksheet within a workbook becomes a separate destination table.

How XLSX Tables are Organized

Tables are identified by the combination of workbook filename + worksheet name. Files with the same name and worksheet in different folders are merged:

  • /2026-01/report.xlsx with sheet "Orders" and /2026-02/report.xlsx with sheet "Orders" both feed into table report_orders
  • /data/summary.xlsx with sheets "Q1" and "Q2" produces tables summary_q1 and summary_q2

Worksheet Parsing

  • The first non-empty row in each worksheet is treated as the column header
  • Leading blank rows before the header are skipped automatically
  • Empty worksheets are excluded from schema discovery
  • Trailing empty columns are trimmed (columns with no header and no data)
  • Formula cells return their cached computed value
  • When Infer Column Types is enabled, native Excel types are mapped directly: integers to LONG, floats to DOUBLE, booleans to BOOLEAN, dates to LOCALDATE, and datetimes to INSTANT
  • When disabled, all values are treated as strings
  • Empty rows within the data range are skipped (gaps do not stop reading)
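The header-detection and row-skipping rules above can be illustrated on plain row data (the real connector reads these rows from the workbook; blank cells are shown as None, and the trailing-column trim here is a simplification):

```python
def parse_rows(rows):
    """Sketch of the worksheet parsing rules: skip leading blank rows,
    first non-empty row is the header, gaps in the data are skipped."""
    it = iter(rows)
    header = next((r for r in it if any(c is not None for c in r)), None)
    if header is None:
        return []                          # empty worksheet -> excluded
    while header and header[-1] is None:
        header = header[:-1]               # trim trailing empty columns
    out = []
    for row in it:
        if not any(c is not None for c in row):
            continue                       # empty rows within data are skipped
        out.append(dict(zip(header, row)))
    return out

parse_rows([
    [None, None, None],                    # leading blank row
    ["id", "name", None],                  # header (trailing empty column)
    [1, "a", None],
    [None, None, None],                    # gap row, skipped
    [2, "b", None],
])
# -> [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
```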

Limitations

  • File size limit: XLSX files larger than 50 MB are skipped with a warning
  • No gzip support: XLSX files are already internally compressed; .xlsx.gz is not supported
  • No password protection: Encrypted or password-protected workbooks are not supported
  • The Table Mapping Mode setting does not apply to XLSX files. XLSX always uses the workbook + worksheet grouping model.
  • Name collisions: If two different workbook/worksheet combinations normalize to the same table name (e.g., workbook My-Report sheet Order Items and workbook My_Report sheet Order_Items), schema discovery fails with an error. Rename one of the workbooks or worksheets to resolve.

Troubleshooting

Connection test fails

Problem:

  • "Failed to connect to SFTP server" error

Solutions:

  1. Verify host and port: Ensure the hostname is correct and the SFTP service is running on the configured port (default: 22)
  2. Check credentials:
    • Password auth: Verify the username and password are correct
    • Key auth: Ensure the private key is in PEM format and corresponds to an authorized key on the server
  3. Check network access: Ensure your network allows outbound connections to the SFTP server on the configured port
  4. Check folder path: The configured folder must exist on the server. The connection test verifies this by listing the directory.

No objects found during schema discovery

Problem:

  • Schema discovery returns zero tables

Solutions:

  1. Verify files exist: Ensure the configured folder contains matching files at the expected depth. For TABLE_PER_FOLDER mode, files must be inside subfolders at the configured Table Folder Depth (files directly in the root are ignored when depth >= 1). For TABLE_PER_FILE mode, any matching file becomes a table.
  2. Check table folder depth: If using TABLE_PER_FOLDER, verify the Table Folder Depth matches your actual folder structure. A depth of 3 when your files are only 1 level deep will result in zero tables.
  3. Check file type setting: Make sure the File Type matches your actual files (e.g., don't select CSV for JSON files). Without a File Pattern, only files with standard extensions (.csv, .json, .xlsx, etc.) are matched.
  4. Check file pattern: If you set a File Pattern regex, verify it matches your filenames. Note that File Pattern replaces extension filtering, so the regex must match the full file path. Try removing it to fall back to extension-based filtering.
  5. Files without extensions: If your files have no extension (e.g., data_export), you must set a File Pattern to match them. Without a pattern, only files with standard extensions for the selected File Type are included.
  6. Check file sizes: Empty files (0 bytes) are automatically skipped. XLSX files larger than 50 MB are also skipped.
  7. Check folder path: Verify the root path is correct and the user has read permission to the root and any child folders you want discovered
  8. XLSX empty worksheets: For XLSX files, worksheets with no non-blank cells are excluded. Verify your workbooks contain data.

Missing columns in schema

Problem:

  • Some columns are missing from the discovered schema

Solutions:

  1. Check file consistency: Schema inference samples up to 5 of the most recently modified files. If older files have additional columns, they may not be sampled.
  2. Verify encoding: Ensure the File Encoding setting matches your files. Wrong encoding can produce garbled or missing column names.
  3. Check for BOM: UTF-8 files with a byte order mark are handled automatically. For other encodings, the BOM may need manual removal.

Incremental sync re-reads all files

Problem:

  • Every sync reads all files instead of just modified ones

Solutions:

  1. Check initial sync completed: The first sync always reads all files. Incremental behavior starts from the second sync.
  2. Verify files are not being re-saved: Some automated processes touch files without changing content, updating the modification time.
  3. Check ingestion mode: Ensure the pipeline is configured for Historical + Incremental or Incremental mode (not Historical, which always performs a full refresh).

Permission denied errors in logs

Problem:

  • Warning messages about permission denied for specific files
  • Some files missing from sync results

Solutions:

  1. Check file permissions: The SSH user must have read access to all files in the configured folder. Files with insufficient permissions are skipped with a warning.
  2. Check folder permissions: The user needs read and list permissions on the folder and all subfolders.

Support

Need help? Contact us at support@supa-flow.io