
S3 Data Lake Connector

Load data into Amazon S3 as Parquet files or Apache Iceberg™ tables. Keep data in your own bucket and query it with any engine -- Athena, Snowflake, Spark, DuckDB, Databricks, and more.

Destination · Bronze

Why Supaflow

All connectors included

No per-connector fees. Every connector is available on every plan.

Pay for compute, not rows

Credit-based pricing. No per-row charges, no monthly active rows (MAR) surprises.

One platform

Ingestion, dbt Core transformation, reverse ETL, and orchestration in a single workspace.

Capabilities

Parquet and Apache Iceberg™ Table Formats

Write raw Parquet files for simple, widely compatible data lake storage. Or write Apache Iceberg™ tables for ACID transactions, time travel queries, and concurrent reads during writes.

AWS Glue and Snowflake Open Catalog

Register tables in AWS Glue Data Catalog for Athena and Redshift Spectrum queries. Or use Snowflake Open Catalog (Polaris) as an Iceberg REST catalog for cross-engine access.

Cross-Account IAM Role Assumption

Supaflow assumes an IAM role in your AWS account to write data. Your credentials stay in your account. The external ID mechanism prevents confused deputy attacks.
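The external ID acts like a shared secret embedded in the role's trust policy: AWS only permits the AssumeRole call when the caller supplies a matching sts:ExternalId. A simplified stdlib sketch of that check, to show why a confused-deputy request fails (real IAM policy evaluation is considerably richer than this model):

```python
def assume_role_allowed(trust_policy, caller_account, supplied_external_id):
    """Simplified model of IAM trust-policy evaluation with an
    sts:ExternalId condition. Illustrative only, not real IAM logic."""
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {}).get("AWS", "")
        if caller_account not in principal:
            continue  # caller is not the trusted account
        required = (stmt.get("Condition", {})
                        .get("StringEquals", {})
                        .get("sts:ExternalId"))
        # Allowed only if no condition is set, or the supplied ID matches
        if required is None or required == supplied_external_id:
            return True
    return False

# Minimal trust policy trusting one account, gated on an external ID
policy = {
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::805595753828:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": "my-external-id"}},
    }]
}
```

A third party that tricks the trusted account into assuming your role still fails the check, because it does not know the external ID you chose.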

Time-Based Partitioning

Organize Parquet files by day, hour, or sync ID for efficient query pruning. Iceberg tables handle partitioning through the catalog automatically.
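To illustrate how time-based partitioning enables pruning, here is a sketch of a day/hour-partitioned object-key layout. The key scheme is illustrative, not necessarily the exact layout Supaflow writes:

```python
from datetime import datetime

def parquet_object_key(prefix, table, ts, granularity="day"):
    """Build a time-partitioned S3 key so query engines can skip
    irrelevant date ranges. Layout is illustrative only."""
    if granularity == "day":
        partition = ts.strftime("date=%Y-%m-%d")
    elif granularity == "hour":
        partition = ts.strftime("date=%Y-%m-%d/hour=%H")
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    stamp = ts.strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix}/{table}/{partition}/{stamp}.parquet"

key = parquet_object_key("supaflow", "orders", datetime(2024, 5, 1, 13, 45, 9), "hour")
print(key)  # supaflow/orders/date=2024-05-01/hour=13/20240501T134509Z.parquet
```

A query filtered to one day only has to list and read objects under that day's prefix, which is what "query pruning" refers to above.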

Supported Objects

Table Formats

Parquet

GZIP-compressed Parquet files. Append-only writes, optional Glue catalog registration. Best for simple data lake use cases.

Apache Iceberg™

Iceberg tables with atomic writes, schema evolution, and snapshot history. Requires a catalog (AWS Glue or Snowflake Open Catalog).

Catalog Options

AWS Glue Data Catalog

Register tables as Glue databases and tables. Query with Athena, Redshift Spectrum, Spark, or EMR. Works with both Parquet and Iceberg.

Snowflake Open Catalog

Iceberg REST catalog backed by Snowflake. Query with Snowflake, Spark, Trino, or any Iceberg-compatible engine. Requires OAuth2 credentials.

How It Works

1

Create IAM policies in your AWS account

Create an S3 permissions policy for bucket read/write access. Optionally create a Glue permissions policy if using Glue catalog. Policies can be scoped to a specific S3 prefix and Glue database prefix.
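The two policies from this step can be sketched as JSON documents scoped to a bucket prefix and a Glue database prefix. The action lists here are illustrative assumptions; the exact set Supaflow requires is in its setup guide:

```python
import json

def s3_policy(bucket, prefix):
    """Illustrative S3 read/write policy scoped to one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing is granted on the bucket, filtered to the prefix
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {   # Object read/write/delete only under the prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }

def glue_policy(region, account_id, db_prefix):
    """Illustrative Glue catalog policy scoped to a database prefix."""
    base = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["glue:GetDatabase", "glue:CreateDatabase",
                       "glue:GetTable", "glue:CreateTable", "glue:UpdateTable"],
            "Resource": [f"{base}:catalog",
                         f"{base}:database/{db_prefix}*",
                         f"{base}:table/{db_prefix}*/*"],
        }],
    }

print(json.dumps(s3_policy("my-lake-bucket", "supaflow"), indent=2))
```

Scoping the Resource ARNs to a prefix keeps the role from touching anything else in the bucket or catalog.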

2

Create an IAM role with trust policy

Create a role that trusts the Supaflow AWS account (805595753828) with an external ID you choose. Attach the S3 and Glue policies to the role.
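The trust policy for that role can be sketched as follows. The account ID comes from the step above; the external ID placeholder is whatever value you chose:

```python
import json

SUPAFLOW_ACCOUNT = "805595753828"  # Supaflow AWS account from this guide

def trust_policy(external_id):
    """Trust policy letting the Supaflow account assume this role,
    gated on your external ID to block confused-deputy requests."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{SUPAFLOW_ACCOUNT}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

# Emit the JSON you would paste into the IAM role's trust relationship
print(json.dumps(trust_policy("replace-with-your-external-id"), indent=2))
```

After creating the role with this trust relationship, attach the S3 policy (and the Glue policy, if used) from step 1.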

3

Configure the destination in Supaflow

Enter your S3 bucket name, region, IAM role ARN, and external ID. Choose Parquet or Iceberg table format. For Iceberg, select AWS Glue or Snowflake Open Catalog and provide catalog credentials.

4

Test and save

Click Test & Save to verify IAM role assumption, S3 write access, and catalog connectivity. Supaflow validates all permissions before saving.

Use Cases

Build an open data lake on your own S3

Replicate data from Salesforce, HubSpot, PostgreSQL, and other sources into S3 as Parquet or Iceberg. Your data stays in your bucket under your control -- no vendor lock-in on storage.

Query with any engine

Once data lands in S3 with Glue or Open Catalog registration, query it from Athena, Snowflake, Spark, DuckDB, Databricks, Dremio, Trino, Redshift Spectrum, or BigQuery. One write, many readers.

Iceberg time travel and schema evolution

Use Iceberg table format for ACID transactions, snapshot-based time travel queries, and automatic schema evolution as your source systems change.

Frequently Asked Questions

What query engines can I use to read data from my S3 data lake?
Amazon Athena, Apache Spark, Snowflake, DuckDB, Databricks, Dremio, Trino, Starburst Galaxy, Redshift Spectrum, Azure Synapse Analytics, BigQuery, and Bauplan. Any engine that reads Parquet files or supports the Iceberg table format will work.
When should I use Iceberg vs Parquet?
Use Parquet for simple data lake files that you query with Athena or Spark. Use Iceberg when you need ACID transactions, time travel, concurrent reads during writes, or multi-engine access through a catalog. Iceberg requires a catalog (Glue or Snowflake Open Catalog).
Does my data leave my AWS account?
No. Supaflow assumes an IAM role in your AWS account to write data directly to your S3 bucket. The data never passes through Supaflow storage. You control access, encryption, and lifecycle policies.
What is the difference between S3 Data Lake and the legacy S3 connector?
S3 Data Lake adds Apache Iceberg™ table format support, Snowflake Open Catalog integration, and improved Glue catalog management. New pipelines should use S3 Data Lake. Existing legacy S3 pipelines continue to work.

Need a connector we don't support yet?

Build one with AI-powered Connector Dev Skills.

Learn More About the Connector SDK