S3 Data Lake Connector
Load data into Amazon S3 as Parquet files or Apache Iceberg™ tables. Keep data in your own bucket and query it with any engine, including Athena, Snowflake, Spark, DuckDB, and Databricks.
Why Supaflow
All connectors included
No per-connector fees. Every connector is available on every plan.
Pay for compute, not rows
Credit-based pricing. No per-row charges and no monthly-active-row (MAR) surprises.
One platform
Ingestion, dbt Core transformation, reverse ETL, and orchestration in a single workspace.
Capabilities
Parquet and Apache Iceberg™ Table Formats
Write raw Parquet files for simple, widely compatible data lake storage. Or write Apache Iceberg™ tables for ACID transactions, time travel queries, and concurrent reads during writes.
AWS Glue and Snowflake Open Catalog
Register tables in AWS Glue Data Catalog for Athena and Redshift Spectrum queries. Or use Snowflake Open Catalog (Polaris) as an Iceberg REST catalog for cross-engine access.
Cross-Account IAM Role Assumption
Supaflow assumes an IAM role in your AWS account to write data. Your credentials stay in your account. The external ID mechanism prevents confused deputy attacks.
Time-Based Partitioning
Organize Parquet files by day, hour, or sync ID for efficient query pruning. Iceberg tables manage partitioning through their table metadata, so no manual file layout is needed.
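The time-based layout described above can be sketched as a simple key builder. Note this is an illustrative Hive-style prefix convention, not Supaflow's documented key format; the table name and file naming are assumptions.

```python
from datetime import datetime, timezone

def parquet_key(table: str, ts: datetime, granularity: str = "day") -> str:
    """Build an S3 object key with a time-based partition prefix.

    The layout (table/date=.../part-0000.parquet) is a common
    Hive-style convention; Supaflow's actual key format may differ.
    """
    if granularity == "day":
        part = ts.strftime("date=%Y-%m-%d")
    elif granularity == "hour":
        part = ts.strftime("date=%Y-%m-%d/hour=%H")
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return f"{table}/{part}/part-0000.parquet"

ts = datetime(2024, 6, 1, 13, 30, tzinfo=timezone.utc)
print(parquet_key("orders", ts))          # orders/date=2024-06-01/part-0000.parquet
print(parquet_key("orders", ts, "hour"))  # orders/date=2024-06-01/hour=13/part-0000.parquet
```

Prefixes like `date=2024-06-01/` let engines such as Athena or DuckDB skip whole directories when a query filters on the partition column.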
Supported Objects
Table Formats
Parquet
GZIP-compressed Parquet files. Append-only writes, optional Glue catalog registration. Best for simple data lake use cases.
Apache Iceberg™
Iceberg tables with atomic writes, schema evolution, and snapshot history. Requires a catalog (AWS Glue or Snowflake Open Catalog).
Catalog Options
AWS Glue Data Catalog
Register tables as Glue databases and tables. Query with Athena, Redshift Spectrum, Spark, or EMR. Works with both Parquet and Iceberg.
Snowflake Open Catalog
Iceberg REST catalog backed by Snowflake. Query with Snowflake, Spark, Trino, or any Iceberg-compatible engine. Requires OAuth2 credentials.
How It Works
Create IAM policies in your AWS account
Create an S3 permissions policy for bucket read/write access. Optionally create a Glue permissions policy if using Glue catalog. Policies can be scoped to a specific S3 prefix and Glue database prefix.
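The prefix scoping described in this step can be sketched as a small policy generator. The exact action list Supaflow requires is an assumption here; treat this as a minimal read/write set and consult the setup guide for the authoritative policy.

```python
import json

def s3_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy allowing read/write under a single S3 prefix.

    The action list is a typical minimal set for a data lake writer;
    the permissions Supaflow actually requires may differ.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access, limited to the chosen prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing, constrained to the same prefix via a condition key.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }

print(json.dumps(s3_policy("my-lake-bucket", "supaflow"), indent=2))
```

A Glue policy would follow the same pattern, with `glue:*Table*` and `glue:*Database*` actions scoped to the database prefix.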
Create an IAM role with trust policy
Create a role that trusts the Supaflow AWS account (805595753828) with an external ID you choose. Attach the S3 and Glue policies to the role.
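The trust relationship in this step can be sketched as follows. The Supaflow account ID comes from the step above; the external ID value is a placeholder you choose yourself.

```python
import json

SUPAFLOW_ACCOUNT = "805595753828"  # Supaflow's AWS account, per the setup steps

def trust_policy(external_id: str) -> dict:
    """Trust policy letting Supaflow's account assume this role only when
    it presents the agreed external ID (the confused-deputy guard)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{SUPAFLOW_ACCOUNT}:root"},
                "Action": "sts:AssumeRole",
                # AssumeRole succeeds only if the caller supplies this exact ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }

print(json.dumps(trust_policy("my-chosen-external-id"), indent=2))
```

Because Supaflow must pass the external ID on every `sts:AssumeRole` call, a third party who merely knows your role ARN cannot trick Supaflow into writing to your bucket on their behalf.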
Configure the destination in Supaflow
Enter your S3 bucket name, region, IAM role ARN, and external ID. Choose Parquet or Iceberg table format. For Iceberg, select AWS Glue or Snowflake Open Catalog and provide catalog credentials.
Test and save
Click Test & Save to verify IAM role assumption, S3 write access, and catalog connectivity. Supaflow validates all permissions before saving.
Use Cases
Build an open data lake on your own S3
Replicate data from Salesforce, HubSpot, PostgreSQL, and other sources into S3 as Parquet or Iceberg. Your data stays in your bucket under your control, with no vendor lock-in on storage.
Query with any engine
Once data lands in S3 with Glue or Open Catalog registration, query it from Athena, Snowflake, Spark, DuckDB, Databricks, Dremio, Trino, Redshift Spectrum, or BigQuery. One write, many readers.
Iceberg time travel and schema evolution
Use Iceberg table format for ACID transactions, snapshot-based time travel queries, and automatic schema evolution as your source systems change.
Frequently Asked Questions
What query engines can I use to read data from my S3 data lake?
When should I use Iceberg vs Parquet?
Does my data leave my AWS account?
What is the difference between S3 Data Lake and the legacy S3 connector?
Need a connector we don't support yet?
Build one with AI-powered Connector Dev Skills.
Learn More About the Connector SDK