Amazon S3 Data Lake Destination

Load data into Amazon S3 as Parquet files or Apache Iceberg tables. Integrates with AWS Glue Data Catalog and Snowflake Open Catalog for querying with Athena, Spark, Snowflake, or other analytics tools.

For an overview of capabilities and use cases, see the S3 Data Lake connector page. For Snowflake as a destination, see Snowflake destination or run pipelines natively inside Snowflake.

Table Format

The S3 Data Lake connector supports two table formats:

Format	Best For	Catalog Options
Parquet (default)	Simple data lake files with optional Glue catalog	AWS Glue (optional)
Iceberg	ACID transactions, time travel, multi-engine access	AWS Glue or Snowflake Open Catalog (required)

Parquet writes raw Parquet files to S3 with optional AWS Glue Data Catalog registration. This is a simple, widely compatible format suitable for most use cases.

Iceberg writes Apache Iceberg tables through a catalog. Each write is an atomic transaction, and the catalog tracks table metadata, schema, and snapshot history. This enables time travel queries, concurrent reads during writes, and compatibility with engines like Snowflake, Spark, Trino, and Athena.

Prerequisites

Before you begin, ensure you have:

✅ An AWS account with IAM permissions to create roles and policies
✅ An S3 bucket where Supaflow will write data
✅ AWS Glue access (optional) for Data Catalog table metadata

AWS Setup

Supaflow uses cross-account IAM role assumption to securely access your S3 bucket. You will:

Create IAM policies for S3 (required) and Glue (optional)
Create an IAM role with a trust policy allowing Supaflow to assume it
Attach the policies to the role
Provide the role ARN and external ID in Supaflow

Step 1: Create S3 Permissions Policy

Log in to the AWS Console → IAM → Policies → Create policy
Click the JSON tab. Copy the following policy and paste it into the JSON editor:

S3 Permissions Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBucketListForPrefix",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET_NAME>"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "<OPTIONAL_PREFIX>/*"
          ]
        }
      }
    },
    {
      "Sid": "AllowObjectReadWriteInPrefix",
      "Effect": "Allow",
      "Action": [
        "s3:DeleteObjectTagging",
        "s3:ReplicateObject",
        "s3:PutObject",
        "s3:GetObjectAcl",
        "s3:GetObject",
        "s3:DeleteObjectVersion",
        "s3:PutObjectTagging",
        "s3:DeleteObject",
        "s3:PutObjectAcl",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET_NAME>/<OPTIONAL_PREFIX>/*"
      ]
    }
  ]
}

IMPORTANT: Modify these values:
- Replace <YOUR_BUCKET_NAME> with your S3 bucket name
- Replace <OPTIONAL_PREFIX> with a path prefix, or remove the Condition block entirely for full bucket access
Click Next → Name the policy (e.g., SupaflowS3Policy) → Create policy

Prefix Restriction

The S3 policy includes an optional prefix condition. This restricts Supaflow to only write within a specific path (e.g., data/supaflow/*). Remove the entire Condition block from the first statement if you want to allow access to the entire bucket.

Step 2: Create Glue Permissions Policy

Create this policy if you plan to use Glue catalog -- either with Parquet format (optional but recommended) or with Iceberg format using AWS Glue as the catalog (required):

In IAM → Policies → Create policy
Click the JSON tab. Copy the following policy and paste it into the JSON editor:

Glue Permissions Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCatalogRead",
      "Effect": "Allow",
      "Action": [
        "glue:GetCatalog",
        "glue:GetCatalogs",
        "glue:GetCatalogImportStatus"
      ],
      "Resource": [
        "arn:aws:glue:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:catalog"
      ]
    },
    {
      "Sid": "AllowDbAndTableReadWriteForSupaflowPrefix",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:CreateDatabase",
        "glue:UpdateDatabase",

        "glue:GetTable",
        "glue:GetTables",
        "glue:CreateTable",
        "glue:UpdateTable",

        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:CreatePartition",
        "glue:BatchCreatePartition"
      ],
      "Resource": [
        "arn:aws:glue:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:catalog",
        "arn:aws:glue:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:database/supaflow*",
        "arn:aws:glue:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:table/supaflow*/*"
      ]
    },
    {
      "Sid": "AllowResetOperationsForSupaflowPrefix",
      "Effect": "Allow",
      "Action": [
        "glue:DeleteTable",
        "glue:BatchDeleteTable",
        "glue:DeleteDatabase"
      ],
      "Resource": [
        "arn:aws:glue:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:catalog",
        "arn:aws:glue:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:database/supaflow*",
        "arn:aws:glue:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:table/supaflow*/*"
      ]
    }
  ]
}

IMPORTANT: Modify these values:
- Replace <YOUR_REGION> with your AWS region (e.g., us-east-1)
- Replace <YOUR_ACCOUNT_ID> with your 12-digit AWS account ID
- (Optional) Replace supaflow* with your preferred database prefix
Click Next → Name the policy (e.g., SupaflowGluePolicy) → Create policy

Glue Database Prefix

The policy uses supaflow* as the default database prefix. You can change this to any prefix you prefer (e.g., mycompany_*). In Supaflow configuration, use the literal prefix (for example, mycompany) without the *.

Step 3: Create IAM Role with Trust Policy

Navigate to IAM → Roles → Create role
Select Custom trust policy. Copy the following trust policy and paste it into the JSON editor:

Trust Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::805595753828:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<YOUR_EXTERNAL_ID>"
        }
        }
      }
  ]
}

IMPORTANT: Modify this value:
- Replace <YOUR_EXTERNAL_ID> with a unique secret string (e.g., my-company-supaflow-2024)
- This acts as a shared secret between your AWS account and Supaflow
- Keep this value secure — you'll enter it in Supaflow configuration
Click Next
Attach the policies you created:
- Search for and select SupaflowS3Policy
- Search for and select SupaflowGluePolicy (if created)
Click Next
Role name: Enter a descriptive name (e.g., SupaflowS3Access)
Click Create role

Step 4: Copy Role ARN

Find the role you just created and click on it
Copy the Role ARN from the summary page
- Example: arn:aws:iam::123456789012:role/SupaflowS3Access
You'll need this ARN and your external ID for the Supaflow configuration

Configuration

In Supaflow, create a new S3 Data Lake destination with these settings:

Connection

S3 Bucket Name*

Name of the S3 bucket
Example: my-data-lake

Bucket Region*

AWS region of the bucket
Example: us-east-1

S3 Prefix Path

Optional path prefix within the bucket. Leave empty to write to bucket root.
Example: data-lake/production

IAM Role ARN*

ARN of the IAM role to assume
Example: arn:aws:iam::123456789012:role/SupaflowS3Role

External ID*

External ID for role assumption (security measure to prevent confused deputy problem). Must match the value in your trust policy.
This value is encrypted and stored securely

S3 Endpoint

Custom S3-compatible endpoint URL for MinIO, DigitalOcean Spaces, etc. Leave empty for standard AWS S3.
Example: https://s3.example.com

Storage Options

Table Format*

Output format for data written to S3
Options: PARQUET, ICEBERG
Default: PARQUET

PARQUET -- Writes raw Parquet files to S3. Simple and widely compatible.
ICEBERG -- Writes Apache Iceberg tables via a catalog. Provides ACID transactions, time travel, and multi-engine compatibility.

Iceberg Settings

These settings appear when Table Format is set to ICEBERG.

Iceberg Catalog*

Catalog backend for managing Iceberg table metadata
Options: GLUE, SNOWFLAKE_OPEN_CATALOG
Default: GLUE

AWS Glue -- Uses your existing AWS Glue Data Catalog. No additional credentials needed beyond the IAM role configured above. Requires the Glue permissions policy (see Step 2).

Snowflake Open Catalog -- Uses Snowflake Open Catalog (Polaris) as the Iceberg REST catalog. Requires OAuth2 credentials and a catalog endpoint.

Region Constraint

When using Snowflake Open Catalog, the catalog and your S3 bucket must be in the same AWS region. Cross-region configurations are not supported.

Catalog Endpoint URL*

Snowflake Open Catalog REST endpoint URL. Only required when Iceberg Catalog is SNOWFLAKE_OPEN_CATALOG.
Example: https://org-account.snowflakecomputing.com/polaris/api/catalog

Catalog Name*

Name of the catalog in Snowflake Open Catalog. Only required when Iceberg Catalog is SNOWFLAKE_OPEN_CATALOG.

Client ID*

OAuth2 client ID for Snowflake Open Catalog authentication. Only required when Iceberg Catalog is SNOWFLAKE_OPEN_CATALOG.
This value is stored securely

Client Secret*

OAuth2 client secret for Snowflake Open Catalog authentication. Only required when Iceberg Catalog is SNOWFLAKE_OPEN_CATALOG.
This value is encrypted and stored securely

Principal Role

Snowflake Open Catalog principal role for authorization scope. Only visible when Iceberg Catalog is SNOWFLAKE_OPEN_CATALOG.
Default: ALL

Parquet Settings

These settings appear when Table Format is set to PARQUET.

Update AWS Glue Catalog

Sync table metadata to AWS Glue Data Catalog for Athena/Redshift Spectrum queries. Requires the Glue permissions policy (see Step 2).
Default: Disabled

Partition Strategy

Time-based partitioning strategy for organizing Parquet files in S3
Options: NONE, DAY, HOUR, SYNC_ID
Default: HOUR

File Name Pattern

Pattern for generated Parquet file names. Supports variables: {seq}, {uuid}, {timestamp}, {supa_job_id}, {date}
Default: part-{supa_job_id}-{seq}

Common Settings

Glue Database Prefix

Prefix for Glue database names. Databases will be created as prefix_schema. Must match the prefix in your Glue IAM policy. Applies to both Parquet (when Glue is enabled) and Iceberg (when using Glue catalog).
Example: supaflow
Default: supaflow

Test & Save

After configuring all required properties, click Test & Save to verify your connection and save the destination.

Data Layout and Write Semantics

Parquet Format

S3 Path Layout

Supaflow writes Parquet files to S3 using the following path layout:

s3://{bucket}/{prefix}/{schema}/{table}/{partition}/{file}.gzip.parquet

Component	Description	Example
bucket	Your S3 bucket name	`my-data-lake`
prefix	Optional S3 path prefix (configurable)	`data/production`
schema	Database/schema name (see Glue Catalog Naming)	`supaflow_postgres_public`
table	Normalized table name	`accounts`
partition	Time-based partition (based on strategy)	`2024/01/15/10`
file	Parquet file with GZIP compression (fixed, not configurable)	`part-abc123-00000.gzip.parquet`

Example paths:

s3://my-bucket/data/supaflow_postgres_public/accounts/2024/01/15/10/part-abc123-00000.gzip.parquet
s3://my-bucket/data/supaflow_salesforce/contact/2024/01/15/part-def456-00000.gzip.parquet

Example folder layout (multiple connectors and tables under one bucket/prefix):

s3://my-bucket/
└── data/
    ├── supaflow_postgres_public/
    │   ├── tenants/
    │   │   └── 2025/
    │   │       └── 12/
    │   │           └── 31/
    │   │               └── 00/
    │   │                   └── part-<supa_job_id>-00000.gzip.parquet
    │   └── accounts/
    │       └── 2025/
    │           └── 12/
    │               └── 31/
    │                   └── 00/
    │                       └── part-<supa_job_id>-00001.gzip.parquet
    ├── supaflow_salesforce/
    │   └── contact/
    │       └── 2025/
    │           └── 12/
    │               └── 31/
    │                   └── 00/
    │                       └── part-<supa_job_id>-00000.gzip.parquet
    └── supaflow_s3/
        └── ext_contacts/
            └── 2025/
                └── 12/
                    └── 31/
                        └── 00/
                            └── part-<supa_job_id>-00000.gzip.parquet

File name pattern

Supports variables: {seq}, {uuid}, {timestamp}, {supa_job_id}, {date}. Default: part-{supa_job_id}-{seq} Compression is fixed to GZIP and cannot be changed.

Write Semantics (Parquet)

Supaflow writes Parquet files using strict append-only semantics:

Supaflow never overwrites or mutates existing S3 objects.
Each pipeline run produces new Parquet files with unique file names.
By default, uniqueness is guaranteed by {supa_job_id} (a UUID). If you customize the file name pattern, keep a unique token (for example {supa_job_id} or {uuid}).

Partitions are time-based and deterministic:

Runs that fall in the same partition window (for example, the same hour when using HOUR partitioning) write multiple files into the same partition directory.
This is expected and safe because uniqueness is enforced at the file level, not the partition level.

First-run safety check (optional):

When Destination Table Handling is set to FAIL, Supaflow verifies the destination table path (bucket + prefix + schema + table) is empty on the initial run.
If any existing files are detected under that table path, the run fails to prevent mixing Supaflow-managed data with pre-existing datasets.

After the initial run succeeds, Supaflow appends new immutable files on subsequent runs, including additional files in existing partitions.

Partition Strategies

Choose a partition strategy based on your query patterns:

Strategy	Path Format	Best For
HOUR (default)	`yyyy/MM/dd/HH`	High-frequency syncs, hourly analysis
DAY	`yyyy/MM/dd`	Daily syncs, daily reporting
SYNC_ID	`{supa_job_id}`	Full refresh pipelines, point-in-time queries
NONE	(no partition)	Small datasets, infrequent updates

Partitions are reused across runs that fall within the same partition window; uniqueness is enforced at the file level, not the partition level.

Iceberg Format

Data Organization

With Iceberg, the catalog manages data organization. You do not configure partition strategies or file name patterns -- the catalog controls where data files and metadata are stored within your S3 bucket.

For AWS Glue catalog, each table is created at a versioned location under your bucket:

s3://{bucket}/{prefix}/{schema}/{table}/v_{timestamp}_{job_id}/

The versioned path (e.g., v_20250315T103045123Z_jd-abc-123/) is generated when the table is first created or when a pipeline reset recreates the table. Normal append runs do not change the location -- they write into the existing table's path as tracked by the catalog.

For Snowflake Open Catalog, the catalog manages the storage location automatically.

Write Semantics (Iceberg)

Iceberg writes use ACID transactions:

Each pipeline run inserts data as an atomic commit. The write either fully succeeds or has no effect.
Concurrent reads are not blocked during writes -- readers see a consistent snapshot.
Each commit creates a new Iceberg snapshot, enabling time travel -- you can query data as it existed at any previous point in time.
Data is append-only. Each run adds new rows; existing rows are not modified or deleted.

First-Run Validation

On the first sync for each table, Supaflow checks whether the table already exists in the catalog. If it does and the pipeline's destination table handling is set to FAIL (the default), the sync fails with a validation error. This prevents accidentally appending into a pre-existing table that was not created by this pipeline.

If the table does not exist, Supaflow creates it at a new versioned location and proceeds with the sync.

Table Reset and Versioned Relocation

When you trigger a pipeline reset (full resync with reset target), Supaflow does not delete the old table data from S3. Instead, it:

Removes the table entry from the catalog (AWS Glue or Snowflake Open Catalog)
Creates a new table at a fresh versioned location within the same S3 prefix
Loads data into the new table

The old data files remain on S3 at the previous versioned path but are no longer referenced by the catalog. This approach is fast (no recursive S3 deletes) and safe (no risk of partial cleanup failures).

Cleaning up old table versions

After a reset, orphaned data at previous versioned paths is no longer queryable through the catalog. To reclaim storage, configure an S3 lifecycle rule to expire objects under old version prefixes, or periodically delete them manually. Supaflow does not automatically delete old data.

Iceberg vs Parquet Comparison

Feature	Parquet	Iceberg
Write atomicity	Per-file	Per-commit (ACID)
Time travel	Not supported	Supported via snapshots
Concurrent reads during writes	May see partial data	Always consistent
Catalog registration	Optional (Glue)	Required (Glue or Snowflake OC)
Partition strategy	Configurable (HOUR, DAY, etc.)	Managed by catalog
File name pattern	Configurable	Managed by catalog
Query engines	Athena, Spark	Athena, Spark, Snowflake, Trino

Glue Data Catalog Database and Table Naming

The schema name (which becomes the Glue database name when Glue is enabled) is constructed from multiple components:

[glue_prefix_] + destination_prefix + source_schema

Component	Source	Default	Example
Glue Prefix	Connector config	`supaflow`	`supaflow`
Destination Prefix	Pipeline config (see Step 2: Configure Pipeline)	Empty unless set (UI often sets to connector type)	`postgres`, `salesforce`
Source Schema	Source database	Varies by source	`public`, `dbo`

Examples:

Source	Destination Prefix	Source Schema	Glue Prefix	Resulting Schema
PostgreSQL	`postgres`	`public`	`supaflow`	`supaflow_postgres_public`
Salesforce	`salesforce`	(none)	`supaflow`	`supaflow_salesforce`
PostgreSQL	(blank)	`public`	`supaflow`	`supaflow_public`
PostgreSQL	(blank)	(blank)	`supaflow`	`supaflow`
PostgreSQL	`postgres`	`public`	(blank)	`postgres_public`

Schema Segment Behavior

If the schema resolves to blank and Glue is disabled, the schema segment is omitted from the S3 path
If Glue is enabled with a prefix, the schema becomes at minimum the Glue prefix

Tracking Columns

Supaflow automatically adds tracking columns to every table for data lineage and identity:

Column	Type	Description
`_supa_synced`	timestamp	When the record was synced (UTC)
`_supa_deleted`	boolean	Soft delete marker from source
`_supa_index`	bigint	Row index within the batch
`_supa_id`	string	Record identity key (see below)
`_supa_job_id`	string	Unique job/sync identifier

Append mode behavior

The S3 Data Lake connector writes in append mode. In this mode:

_supa_index is always 0. Row ordering is not meaningful in append-only S3 tables since each sync appends a new set of files.
_supa_id is computed from the narrowest available key, not from all columns. See below for details.

`_supa_id` computation

The _supa_id column provides a stable record identity for each row. How it is computed depends on the source metadata:

Scenario	Behavior
Source has `_supa_id` as its primary key (data originally loaded by Supaflow)	Passed through unchanged -- preserves identity across pipeline hops
Source has a primary key (e.g., `id`)	SHA-256 hash of the primary key column(s)
Source has a unique key but no primary key	SHA-256 hash of the unique key column(s)
Source has no key at all	SHA-256 hash of all data columns

When computing a hash, Supaflow uses a double-SHA-256 structure: each field value is individually hashed, the per-field hashes are joined with |, and the result is hashed again. This matches the Snowflake destination's _supa_id computation for cross-destination consistency.

Table and Column Normalization

Table and column names are normalized for Parquet/Glue compatibility:

ASCII only: Non-ASCII characters (日本語, Δ, etc.) → underscore
Accents dropped: crème → creme
Valid characters: Letters (a-z, A-Z), digits (0-9), underscores only
Leading digit: Prepend underscore (123abc → _123abc)
Spaces/special chars: Converted to underscore
Multiple underscores: Collapsed to single underscore
Max length: 255 characters (Glue limit)

Collision handling: If normalization creates duplicates, numeric suffixes are added (name, name_1, name_2).

Querying Your Data

Querying Parquet Tables with Athena

If Glue Catalog is enabled for Parquet format, you can query your data directly with Amazon Athena:

By default, Athena queries return all data written for the table (initial + incremental) across all partitions and files. To query a specific sync, filter on _supa_job_id. To get a "current" view, deduplicate by _supa_id ordered by _supa_synced.

-- Query latest sync
SELECT *
FROM supaflow_postgres_public.accounts
WHERE _supa_job_id = 'latest-job-id'
LIMIT 100;

-- Find all synced records for a table
SELECT _supa_job_id, COUNT(*) as record_count, MIN(_supa_synced) as sync_time
FROM supaflow_postgres_public.accounts
GROUP BY _supa_job_id
ORDER BY sync_time DESC;

-- Deduplicate using _supa_id (latest version of each record)
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY _supa_id ORDER BY _supa_synced DESC) as rn
  FROM supaflow_postgres_public.accounts
) WHERE rn = 1 AND _supa_deleted = false;

Querying Iceberg Tables with Athena (AWS Glue Catalog)

Iceberg tables registered in AWS Glue can be queried with Athena, Spark, or any engine that supports the Apache Iceberg table format:

-- Query an Iceberg table (same syntax as regular tables)
SELECT *
FROM supaflow_postgres_public.accounts
LIMIT 100;

-- Time travel -- query data as of a specific snapshot
SELECT *
FROM supaflow_postgres_public.accounts
FOR TIMESTAMP AS OF TIMESTAMP '2025-03-15 10:00:00';

Querying Iceberg Tables from Snowflake (Open Catalog)

Apache Iceberg tables registered in Snowflake Open Catalog can be queried from Snowflake as externally managed tables. This requires a one-time setup in Snowflake to create an external volume and catalog integration.

Prerequisites

An Apache Iceberg table written by Supaflow and registered in Snowflake Open Catalog
A Snowflake account with permissions to create external volumes, catalog integrations, and Iceberg tables
The S3 bucket name, region, and IAM role ARN used by your Supaflow destination

Step 1: Create an external volume

Create an external volume that gives Snowflake access to the S3 bucket where Supaflow writes data:

CREATE OR REPLACE EXTERNAL VOLUME supaflow_iceberg_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'supaflow-s3'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://<your-bucket>/'
        STORAGE_AWS_ROLE_ARN = '<arn:aws:iam::123456789012:role/SupaflowS3Access>'
        STORAGE_AWS_EXTERNAL_ID = '<your-external-id>'
      )
    )
    ALLOW_WRITES = FALSE;

Replace <your-bucket>, the role ARN, and external ID with the same values from your Supaflow destination configuration. Set ALLOW_WRITES = FALSE since Snowflake only needs read access for querying.

After creating the volume, retrieve the Snowflake IAM user ARN and external ID that Snowflake assigned:

DESC EXTERNAL VOLUME supaflow_iceberg_volume;

In the output, find STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID. Add this Snowflake principal to your IAM role's trust policy so that Snowflake can assume the role and read from your S3 bucket:

{
    "Effect": "Allow",
    "Principal": {
        "AWS": "<STORAGE_AWS_IAM_USER_ARN from above>"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
        "StringEquals": {
            "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID from above>"
        }
    }
}

IAM trust policy

This is a separate trust entry from the one used by Snowflake Open Catalog itself. The Open Catalog service uses its own IAM principal to write Iceberg metadata to S3, while the Snowflake compute account uses a different principal to read data via the external volume. Both must be in the trust policy.

Verify the volume has access to your storage:

SELECT SYSTEM$VERIFY_EXTERNAL_VOLUME('supaflow_iceberg_volume');

Step 2: Create a catalog integration

Create a catalog integration that connects Snowflake to your Open Catalog instance using OAuth:

CREATE OR REPLACE CATALOG INTEGRATION supaflow_open_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<your-namespace>'
  REST_CONFIG = (
    CATALOG_URI = 'https://<orgname>-<catalog-account>.snowflakecomputing.com/polaris/api/catalog'
    CATALOG_NAME = '<your-catalog-name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<your-client-id>'
    OAUTH_CLIENT_SECRET = '<your-client-secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;

Use the same catalog endpoint, catalog name, and OAuth credentials from your Supaflow destination configuration. The CATALOG_NAMESPACE should match the namespace where Supaflow writes tables (typically the schema name, e.g., supaflow_postgres_public).

Finding your account names

Organization name (<orgname>): See Finding the organization and account name
Catalog account name (<catalog-account>): See Find the account name for a Snowflake Open Catalog account

Step 3: Create an externally managed table

Create an Apache Iceberg table in Snowflake that points to the table in Open Catalog:

CREATE ICEBERG TABLE supaflow_accounts
  CATALOG = 'supaflow_open_catalog_int'
  EXTERNAL_VOLUME = 'supaflow_iceberg_volume'
  CATALOG_TABLE_NAME = '<table-name-in-catalog>';

For CATALOG_TABLE_NAME, use the table name as it appears in Open Catalog (e.g., accounts).

If you did not specify CATALOG_NAMESPACE in the catalog integration, you must specify it here:

CREATE ICEBERG TABLE supaflow_accounts
  CATALOG = 'supaflow_open_catalog_int'
  EXTERNAL_VOLUME = 'supaflow_iceberg_volume'
  CATALOG_TABLE_NAME = '<table-name-in-catalog>'
  CATALOG_NAMESPACE = '<your-namespace>';

Listing available tables

You can discover tables in your catalog using:

SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('supaflow_open_catalog_int');

This returns a JSON array with each table's namespace and name. SYSTEM$LIST_NAMESPACES_FROM_CATALOG may return an empty result for Polaris-based catalogs even when namespaces exist -- use the table listing instead.

Step 4: Query the table

You can now query the Apache Iceberg table from Snowflake and join it with other Snowflake tables:

-- Query synced data
SELECT *
FROM supaflow_accounts
LIMIT 100;

-- Filter by Supaflow tracking columns
SELECT id, name, _supa_synced, _supa_job_id
FROM supaflow_accounts
WHERE _supa_deleted = false
ORDER BY _supa_synced DESC;

-- Join with other Snowflake tables
SELECT a.id, a.name, o.order_total
FROM supaflow_accounts a
JOIN my_database.my_schema.orders o ON a.id = o.account_id;

To keep the table metadata in sync with new Supaflow writes, you can enable automatic refresh:

ALTER ICEBERG TABLE supaflow_accounts SET AUTO_REFRESH = TRUE;

Troubleshooting

Access Denied when assuming role

Problem:

"AccessDenied" error when testing connection
"User is not authorized to perform sts:AssumeRole"

Solutions:

Verify the trust policy:
- Ensure Principal is exactly arn:aws:iam::805595753828:root
- Check External ID matches what you entered in Supaflow
Check role ARN:
- Ensure you copied the complete ARN
- Verify the role exists in your account
Wait a few minutes:
- IAM changes can take up to a minute to propagate

Access Denied on S3 operations

Problem:

"AccessDenied" when writing to S3
"Bucket does not exist" error

Solutions:

Verify bucket name:
- Check for typos in the bucket name
- Ensure bucket exists in the specified region
Check S3 policy:
- Verify <YOUR_BUCKET_NAME> was replaced correctly
- If using prefix restriction, ensure the prefix matches your configuration
Verify bucket permissions:
- Check if bucket has a bucket policy that might deny access
- Ensure bucket is not configured for Requester Pays

Access Denied on Glue operations

Problem:

"AccessDeniedException" when creating Glue tables
Tables not appearing in Glue catalog

Solutions:

Verify Glue policy is attached:
- Check the role has the Glue permissions policy
Check region and account ID:
- Ensure <YOUR_REGION> matches your bucket region
- Verify <YOUR_ACCOUNT_ID> is your 12-digit AWS account ID
Verify database prefix:
- Ensure your Glue policy prefix matches your Supaflow destination configuration
- Default is supaflow* but you can use any prefix

Invalid External ID

Problem:

"The security token included in the request is invalid"
External ID mismatch error

Solutions:

Check for typos:
- External ID is case-sensitive
- Check for leading/trailing spaces
Verify trust policy:
- Open the role in AWS Console
- Click "Trust relationships" tab
- Confirm the External ID matches exactly

Connection timeout

Problem:

Connection test times out
Slow response from AWS

Solutions:

Check AWS service status:
- Visit AWS Service Health Dashboard
Verify region:
- Ensure the region in Supaflow matches your bucket's region
Try again:
- Temporary network issues may cause timeouts

Iceberg: Table already exists on first run

Problem:

"Iceberg table already exists on first run and destinationTableHandling is FAIL"
First sync fails even though you expected it to create the table

Solutions:

Previous failed run left a table behind:
- A prior pipeline run may have created the catalog table but failed during data loading. The table exists in the catalog but has no data (or partial data).
- Run a full resync with reset target to drop the old table and recreate it at a new location.
Table was created by a different pipeline:
- Another pipeline with the same schema prefix may have already created this table in the catalog.
- Check the Glue or Snowflake Open Catalog console for existing tables with the same name.
Need to start fresh:
- The recommended recovery path is always full resync with reset target. This cleanly drops the old catalog entry and recreates the table at a new location.

Iceberg: Glue catalog 403 Forbidden

Problem:

"403 Forbidden" or "AccessDeniedException" when using Iceberg with AWS Glue catalog
Connection test fails with Iceberg + Glue

Solutions:

Verify Glue policy includes Iceberg-specific permissions:
- The Iceberg catalog requires glue:GetCatalog and glue:GetCatalogs in addition to the standard Glue permissions
- If you previously used the Glue policy from the S3 connector, update it to include these additional actions (see Step 2)
Check IAM policy is attached to the role:
- Verify the updated Glue policy is attached to your IAM role

Iceberg: Snowflake Open Catalog authentication failed

Problem:

"401 Unauthorized" or OAuth error when connecting to Snowflake Open Catalog
"Invalid client credentials" error

Solutions:

Verify OAuth2 credentials:
- Check that the Client ID and Client Secret are correct
- Ensure the credentials have not been rotated or revoked in Snowflake Open Catalog
Check catalog endpoint URL:
- Ensure the URL format is correct (e.g., https://org-account.snowflakecomputing.com/polaris/api/catalog)
- Verify the catalog name matches an existing catalog in Snowflake Open Catalog
Verify principal role:
- If you specified a Principal Role other than ALL, ensure the role exists and has access to the catalog

Iceberg: Region mismatch with Snowflake Open Catalog

Problem:

Writes fail with access errors despite correct credentials
"The bucket is in a different region" error

Solutions:

Ensure same region:
- Snowflake Open Catalog and your S3 bucket must be in the same AWS region
- Check your catalog's region in the Snowflake Open Catalog console
- Check your bucket's region in the AWS S3 console
Create bucket in matching region:
- If they differ, either create a new S3 bucket in the catalog's region or use AWS Glue catalog instead

Support

Need help? Contact us at support@supa-flow.io

Table Format​

Prerequisites​

AWS Setup​

Step 1: Create S3 Permissions Policy​

Step 2: Create Glue Permissions Policy​

Step 3: Create IAM Role with Trust Policy​

Step 4: Copy Role ARN​

Configuration​

Connection​

Storage Options​

Iceberg Settings​

Parquet Settings​

Common Settings​

Test & Save​

Data Layout and Write Semantics​

Parquet Format​

S3 Path Layout​

Write Semantics (Parquet)​

Partition Strategies​

Iceberg Format​

Data Organization​

Write Semantics (Iceberg)​

First-Run Validation​

Table Reset and Versioned Relocation​

Iceberg vs Parquet Comparison​

Glue Data Catalog Database and Table Naming​

Tracking Columns​

Append mode behavior​

_supa_id computation​

Table and Column Normalization​

Querying Your Data​

Querying Parquet Tables with Athena​

Querying Iceberg Tables with Athena (AWS Glue Catalog)​

Querying Iceberg Tables from Snowflake (Open Catalog)​

Prerequisites​

Step 1: Create an external volume​

Step 2: Create a catalog integration​

Step 3: Create an externally managed table​

Step 4: Query the table​

Troubleshooting​

Access Denied when assuming role​

Access Denied on S3 operations​

Access Denied on Glue operations​

Invalid External ID​

Connection timeout​

Iceberg: Table already exists on first run​

Iceberg: Glue catalog 403 Forbidden​

Iceberg: Snowflake Open Catalog authentication failed​

Iceberg: Region mismatch with Snowflake Open Catalog​

Support​

Table Format

Prerequisites

AWS Setup

Step 1: Create S3 Permissions Policy

Step 2: Create Glue Permissions Policy

Step 3: Create IAM Role with Trust Policy

Step 4: Copy Role ARN

Configuration

Connection

Storage Options

Iceberg Settings

Parquet Settings

Common Settings

Test & Save

Data Layout and Write Semantics

Parquet Format

S3 Path Layout

Write Semantics (Parquet)

Partition Strategies

Iceberg Format

Data Organization

Write Semantics (Iceberg)

First-Run Validation

Table Reset and Versioned Relocation

Iceberg vs Parquet Comparison

Glue Data Catalog Database and Table Naming

Tracking Columns

Append mode behavior

`_supa_id` computation

Table and Column Normalization

Querying Your Data

Querying Parquet Tables with Athena

Querying Iceberg Tables with Athena (AWS Glue Catalog)

Querying Iceberg Tables from Snowflake (Open Catalog)

Prerequisites

Step 1: Create an external volume

Step 2: Create a catalog integration

Step 3: Create an externally managed table

Step 4: Query the table

Troubleshooting

Access Denied when assuming role

Access Denied on S3 operations

Access Denied on Glue operations

Invalid External ID

Connection timeout

Iceberg: Table already exists on first run

Iceberg: Glue catalog 403 Forbidden

Iceberg: Snowflake Open Catalog authentication failed

Iceberg: Region mismatch with Snowflake Open Catalog

Support