MongoDB Source
Sync collections from MongoDB Atlas or self-managed MongoDB clusters into your warehouse. A single source can discover and sync multiple databases at once.
For an overview of capabilities and use cases, see the MongoDB connector page.
At a Glance
- Document Packing Mode. By default the connector keeps embedded documents and arrays as JSON values on the parent row (`PACKED`). Switch to `UNPACKED` to expand embedded documents into `parent__leaf` columns and promote arrays of objects into related child outputs.
- Per-object incremental cursor. During schema discovery, datetime root fields on each collection are flagged as cursor candidates; you choose the cursor in the wizard.
- Sample-based schema. MongoDB collections are schemaless, so the connector samples documents per collection to infer field types.
Prerequisites
Before you begin, ensure you have:
- A reachable MongoDB cluster -- MongoDB Atlas or self-managed MongoDB
- A MongoDB connection string for the cluster
- A read-only MongoDB user with `read` access to the databases you want to sync
Supported Objects
Supaflow discovers collections dynamically at sync time -- you do not list them up front.
- Top-level collections become root objects. They land in the destination as `database.collection`, so collections of the same name in different databases stay distinct.
- Nested document structures (embedded documents and arrays) follow your Document Packing Mode setting (see Configuration). In `PACKED` mode they remain as JSON values on the parent row -- one row per Mongo document, no child tables. In `UNPACKED` mode embedded documents flatten into `parent__leaf` columns and arrays of objects become related child outputs (e.g., `sales.orders__line_items`).
- System databases -- `admin`, `local`, and `config` -- are skipped by default. System collections (`system.*`) are skipped during implicit discovery.
The connector default is `PACKED`. If you want flattened columns (`parent__leaf`) and arrays of objects surfaced as related child outputs, set Document Packing Mode to `UNPACKED` in the Advanced Settings of your MongoDB source.
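To make the two modes concrete, here is a minimal Python sketch of the packing behaviour described above. The `pack`/`unpack` helpers and the sample document are illustrative only; the connector's actual implementation and type handling may differ.

```python
import json

def pack(doc):
    """PACKED: embedded documents and arrays stay as JSON values on the parent row."""
    return {key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in doc.items()}

def unpack(doc, prefix=""):
    """UNPACKED: flatten embedded documents into parent__leaf columns and
    split arrays of objects out as related child outputs."""
    row, children = {}, {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            sub_row, sub_children = unpack(value, prefix=f"{name}__")
            row.update(sub_row)
            children.update(sub_children)
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            children[name] = value  # becomes a child output, e.g. orders__line_items
        else:
            row[name] = value
    return row, children

doc = {"_id": 1, "customer": {"name": "Ada"}, "line_items": [{"sku": "A"}, {"sku": "B"}]}
print(pack(doc))
print(unpack(doc))
```

Either way the parent keeps one row per Mongo document; the modes differ only in how nested structures are represented.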
Incremental Sync
MongoDB collections are schemaless, so cursors are not declared up front. After schema discovery, every datetime-typed root field on a collection is offered as a cursor candidate. In the pipeline wizard you select one cursor per collection, and Supaflow only fetches documents whose cursor field has advanced since the last run.
Collections without any datetime root field stay in full-refresh mode. Once documents in those collections gain a datetime root field, Supaflow surfaces it as a cursor candidate on the next schema refresh.
Cursor state is persisted per object, so different collections can use different cursor fields and operate on independent windows.
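Conceptually, each incremental run turns the persisted per-object cursor state into a range query on the chosen field. A minimal sketch, assuming an illustrative state shape (the real persistence format is internal to Supaflow):

```python
from datetime import datetime, timezone

# Illustrative per-object cursor state; each collection keeps its own field and window.
cursor_state = {
    "sales.orders": {"field": "updated_at", "last": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    "crm.contacts": {"field": "modified", "last": datetime(2024, 4, 20, tzinfo=timezone.utc)},
}

def incremental_filter(obj):
    """Build the find() filter that fetches only documents whose cursor field advanced."""
    state = cursor_state.get(obj)
    if state is None:
        return {}  # no cursor selected: full refresh
    return {state["field"]: {"$gt": state["last"]}}

print(incremental_filter("sales.orders"))
print(incremental_filter("sales.events"))  # no cursor state: empty filter, full refresh
```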
Authentication
Supaflow connects with a MongoDB connection string. Two patterns are supported:
- Credentials in the URI -- e.g., `mongodb+srv://user:pass@cluster0.example.mongodb.net/`. Useful for the simplest setups and copy-paste from MongoDB Atlas.
- Split credentials -- the URI carries the host and TLS options, and the Username, Password, and Authentication Database are entered in dedicated fields. Recommended; passwords are stored encrypted and are not embedded in the saved connection URL or in the connector's displayed configuration.
If credentials appear in both places the connector fails fast rather than silently choosing one.
Permissions
The MongoDB user needs:
- `read` on every database the connector should discover
- Either cluster-wide `listDatabases` (when you want implicit discovery of all readable databases) or per-database `listCollections` if you supply explicit Database Filters
For long-term stability, use a dedicated read-only service user rather than an individual's credentials.
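Such a user can be created in `mongosh`; a sketch with placeholder names, using MongoDB's built-in roles (run it against your authentication database, usually `admin`):

```javascript
// Run in mongosh against the authentication database (usually "admin").
// The user name is a placeholder; passwordPrompt() keeps the password out of shell history.
db.createUser({
  user: "supaflow_reader",
  pwd: passwordPrompt(),
  roles: [
    // read access to every non-system database (implicit discovery)
    { role: "readAnyDatabase", db: "admin" }
    // ...or scope it down per database when you use explicit Database Filters:
    // { role: "read", db: "sales" }, { role: "read", db: "crm" }
  ]
})
```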
Configuration
In Supaflow, create a new MongoDB source with these settings.
Authentication
MongoDB Connection String (required)
The MongoDB URI for your cluster or deployment. Supports replica sets, TLS, read preference, direct-connection, and other standard MongoDB URI options.
Examples:
- `mongodb+srv://cluster0.example.mongodb.net/`
- `mongodb://host1:27017,host2:27017/?replicaSet=rs0`
Stored encrypted
Username
Optional MongoDB username. Use this with Password to keep credentials out of the URI.
Password
Optional MongoDB password for the supplied Username.
Stored encrypted
Authentication Database
Database that stores the supplied user. Most Atlas deployments authenticate against `admin`.
Default: admin
If your URI already specifies an `authSource` query option, that value wins.
Sync Settings
Database Filters
Comma- or newline-separated list of database names to discover. Leave blank to discover all readable non-system databases.
Example: `sales, crm`
Collection Filters
Comma- or newline-separated list of fully qualified `database.collection` names to sync. Leave blank to sync every collection in the selected database scope.
Example: `sales.orders, sales.users, crm.contacts`
Bare collection names (without a database) are not accepted. Two databases can contain the same collection name, so qualifying with the database is required to avoid ambiguity.
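The accepted filter syntax can be mimicked with a few lines of Python; `parse_collection_filters` is a hypothetical helper showing the split-and-qualify rule, not the connector's code:

```python
import re

def parse_collection_filters(raw):
    """Split a comma- or newline-separated filter list and enforce
    fully qualified database.collection entries."""
    entries = [entry.strip() for entry in re.split(r"[,\n]", raw) if entry.strip()]
    for entry in entries:
        if "." not in entry:
            raise ValueError(f"unqualified collection filter: {entry!r}")
    return entries

print(parse_collection_filters("sales.orders, sales.users\ncrm.contacts"))
```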
Advanced Settings
Document Packing Mode
Controls how nested document structures land in your destination.
Options:
- `PACKED` (default) -- Embedded documents and arrays stay as JSON values on the parent row. No child tables, no flattened `parent__leaf` columns. Use this when your dbt or warehouse model already expects raw documents.
- `UNPACKED` -- Embedded documents flatten into `parent__leaf` scalar columns; arrays of objects promote into related child outputs (e.g., `sales.orders__line_items`).
Default: `PACKED`. Changing this on an existing source restructures the destination on the next sync; coordinate with downstream consumers before switching.
Schema Sample Size
Maximum number of documents sampled per collection during schema discovery. Higher values improve field-type coverage on collections with sparse fields, at the cost of slower discovery.
Default: 1000. Min: 10. Max: 100000.
Seconds to roll the cursor back on each incremental run. Useful when writes can arrive slightly out of order on the cursor field.
Default: 0 (no lookback)
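The lookback simply shifts the effective cursor back before the incremental range query is built; a sketch (the function name is illustrative):

```python
from datetime import datetime, timedelta, timezone

def effective_cursor(last_seen, lookback_seconds):
    """Roll the persisted cursor back so late-arriving writes are re-read."""
    return last_seen - timedelta(seconds=lookback_seconds)

last = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(effective_cursor(last, 300))  # re-reads the last 5 minutes of the previous window
```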
Schema Refresh Interval
How often Supaflow re-samples schemas before running the pipeline.
Options:
- 0 -- Refresh before every pipeline execution. Recommended because MongoDB schemas are inferred from sampled documents.
- -1 -- Disable refresh. Use only when you know your document shapes are stable.
- Positive value -- Refresh interval in minutes (e.g., 60 = hourly, 1440 = daily).
Default: 0
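The three option values reduce to a small decision rule; `should_refresh` below is an illustrative reading of the setting, not Supaflow internals:

```python
def should_refresh(interval_minutes, minutes_since_last_refresh):
    """Interpret the Schema Refresh Interval setting for one pipeline execution."""
    if interval_minutes == 0:
        return True   # refresh before every execution
    if interval_minutes == -1:
        return False  # refresh disabled
    return minutes_since_last_refresh >= interval_minutes

print(should_refresh(60, 90))  # hourly interval, 90 minutes elapsed -> refresh
```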
Test & Save
After configuring the connection, click Test & Save. Supaflow validates the URI and credentials with a lightweight ping, then runs schema discovery on the next step.
Schema Evolution
Schemas are re-discovered on the cadence set by Schema Refresh Interval.
- New top-level fields appear in subsequent syncs once schema refresh runs.
- New repeated nested arrays can appear as new child objects on the next refresh when Document Packing Mode is `UNPACKED`. Under the default `PACKED` mode, the new array stays on the parent row as a JSON column.
- Removed fields stop populating new rows; the column remains in the destination.
- New collections added to a discovered database appear on the next refresh.
- New databases added to the cluster are picked up on the next refresh when implicit discovery is in use, or once you list them in Database Filters.
Because MongoDB types are inferred from sampled documents, increasing Schema Sample Size can stabilize types on collections with rare-but-typed fields.
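Sample-based inference means a field's type coverage depends entirely on which documents the sample catches. A toy sketch of the idea (not the connector's actual inference logic):

```python
from collections import defaultdict

def infer_schema(sampled_docs):
    """Map each root field seen in the sample to the set of observed types.
    Fields absent from every sampled document are simply never discovered."""
    seen = defaultdict(set)
    for doc in sampled_docs:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return dict(seen)

sample = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Lin", "vip": True}]
print(infer_schema(sample))
```

A rare field like `vip` above is only typed because the sample happened to include a document carrying it, which is why a larger Schema Sample Size stabilizes types.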
Performance and Source Load
The connector reads with a single MongoDB cursor per collection and filters on the selected cursor field for incremental runs.
- Use Database Filters and Collection Filters to narrow scope on large clusters.
- Off-peak scheduling is the simplest way to reduce contention on shared MongoDB deployments.
- For very large historical loads, select a cursor field as soon as possible so subsequent runs only fetch deltas.
MongoDB applies its own server-side limits on connection counts and operation rate. The MongoDB connector itself does not implement connector-level rate-limit retry; very large or throttled syncs may need a narrower scope (Database / Collection Filters) or off-peak scheduling. For scheduled production syncs we recommend a dedicated read-only user so connection caps and slow queries do not affect your application traffic. See MongoDB's operational guidance for cluster-side tuning.
Troubleshooting
Authentication failed
Problem: MongoDB authentication failed on Test & Save.
Solutions:
- Confirm the user exists in the Authentication Database (most Atlas deployments use `admin`).
- If the URI carries credentials, do not also fill in Username and Password -- the connector rejects credentials supplied in both places.
- URL-encode special characters (`:`, `/`, `@`, `?`) in passwords when embedding them in the URI, or move the password to the Password field.
- For Atlas, confirm the IP address that runs Supaflow is allowed by the Atlas network access list.
Connection refused or DNS errors
Problem: MongoDB connection failed mentioning DNS, timeouts, or refused connections.
Solutions:
- Verify the URI by connecting from `mongosh` against the same string.
- For SRV URIs (`mongodb+srv://...`), DNS resolution requires a working DNS path; corporate VPNs sometimes block it.
- For self-managed clusters behind a firewall, allow the IP address that runs Supaflow on port 27017 (or your custom port).
A database is missing from discovery
Problem: A database the user can read does not appear in the source.
Solutions:
- Confirm the user has `listDatabases` cluster-wide, or list the database explicitly in Database Filters (the connector then bypasses cluster-wide listing for that database).
- System databases (`admin`, `local`, `config`) are skipped by default.
A collection is missing from discovery
Problem: A collection in a discovered database does not appear.
Solutions:
- Confirm the user has `listCollections` on that database.
- Check that the collection has at least one document if you expect type-rich field discovery; empty collections still appear, but only with `_id` until they have content.
- If you used Collection Filters, confirm the entry is fully qualified (`database.collection`) and the spelling matches.
Object names look different from before
Problem: Previously the destination had tables named after bare collections (e.g., `orders`), and now they are named after `database.collection`.
Solutions:
- This is intentional. Database-qualified names prevent collisions when the same collection name exists in multiple databases.
- If you depend on the old destination layout, point downstream models at the new fully qualified table names. Supaflow's destination layer derives the destination identifier from the qualified name.
A nested structure I expected as a child is missing
Problem: A nested array in your documents did not produce a related child output.
Solutions:
- Check Document Packing Mode. The default is `PACKED`, which keeps nested arrays as JSON columns on the parent row instead of producing child outputs. Switch to `UNPACKED` if you want arrays of objects to materialise as child tables.
- In `UNPACKED` mode, confirm the field is consistently an array of objects in the sampled documents -- single embedded objects are typically flattened, not promoted to a child.
- Increase Schema Sample Size so sparse arrays are more likely to appear in the sample.
- Re-run schema discovery after adding the structure to documents.
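As a quick local check before re-running discovery, you can apply the same "consistently an array of objects" test to a handful of your own documents; `promotes_to_child` is a hypothetical predicate mirroring the rule described above:

```python
def promotes_to_child(samples, field):
    """True when the field is, wherever present, an array of objects
    (arrays of scalars and single embedded objects flatten instead)."""
    values = [doc[field] for doc in samples if field in doc]
    return bool(values) and all(
        isinstance(v, list) and all(isinstance(item, dict) for item in v)
        for v in values
    )

docs = [{"tags": ["a", "b"]}, {"items": [{"sku": "A"}]}, {"items": [{"sku": "B"}]}]
print(promotes_to_child(docs, "items"), promotes_to_child(docs, "tags"))
```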
Support
Need help? Contact us at support@supa-flow.io