PostgreSQL Source

Connect PostgreSQL as a source to replicate tables and views into your data warehouse. Supaflow supports full refresh and incremental sync modes with automatic schema discovery and schema evolution.

For an overview of capabilities and use cases, see the PostgreSQL connector page. To run pipelines natively inside Snowflake, see Snowflake Native ETL.

Prerequisites

Before you begin, ensure you have:

  • An active PostgreSQL database instance (version 9.6 or higher)
  • Network connectivity between Supaflow and your PostgreSQL server
  • A database user with CONNECT and SELECT privileges on target schemas
  • Firewall rules that allow connections from the Supaflow IP

Whitelist Supaflow IP (If Required)

If your database restricts inbound connections by IP, allow this address in your firewall and add a matching entry to pg_hba.conf:

18.214.240.61
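A pg_hba.conf entry permitting this address might look like the following (this assumes md5 password authentication and the dedicated supaflow_reader user described below; use scram-sha-256 instead if that is your server's auth method):

```
host    all    supaflow_reader    18.214.240.61/32    md5
```

Reload the configuration afterwards, e.g. with sudo systemctl reload postgresql.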
Create Database User

Create a dedicated read-only user for Supaflow:

CREATE USER supaflow_reader PASSWORD 'your_secure_password';

Grant permissions (repeat for each schema):

GRANT USAGE ON SCHEMA <schema_name> TO supaflow_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO supaflow_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA <schema_name> GRANT SELECT ON TABLES TO supaflow_reader;

Replace <schema_name> with your actual schema name (e.g., public, sales, analytics).

What Gets Synced

Supaflow discovers all tables and views that the configured database user has SELECT access to. During schema discovery, Supaflow reads table metadata from information_schema to determine column names, types, and primary keys.

Tables and views from any schema the user can access are available for selection. You choose which schemas, tables, and columns to include in your pipeline.
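To preview roughly what discovery will see, you can run a query like this while connected as the Supaflow user (a sketch against information_schema; Supaflow's actual metadata queries may differ):

```sql
-- List every column of every table and view visible to the current user,
-- excluding PostgreSQL's system schemas.
SELECT c.table_schema, c.table_name, c.column_name, c.data_type
FROM information_schema.columns c
JOIN information_schema.tables t
  ON t.table_schema = c.table_schema AND t.table_name = c.table_name
WHERE t.table_type IN ('BASE TABLE', 'VIEW')
  AND c.table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY c.table_schema, c.table_name, c.ordinal_position;
```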

Sync Modes

Full refresh: On each sync, all rows are read from the source table and loaded into the destination. Best for small tables, lookup tables, or tables without a reliable timestamp column.

Incremental sync: Only rows modified since the last sync are fetched. Requires a cursor field, typically an updated_at or modified_date timestamp column. Supaflow tracks the cursor position automatically between syncs.

To use incremental sync, your tables need a column that reliably indicates when a row was last modified. If no cursor field is available, Supaflow defaults to full refresh.
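Conceptually, each incremental run fetches only rows past the stored high-water mark, along the lines of the query below (the table name is illustrative, and :last_cursor stands for the cursor value Supaflow persisted from the previous sync):

```sql
SELECT *
FROM sales.orders
WHERE updated_at > :last_cursor  -- rows modified since the last sync
ORDER BY updated_at;            -- so the new high-water mark is the last row read
```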

Schema Evolution

When your PostgreSQL schema changes between syncs, Supaflow handles it automatically:

  • New columns are added to the destination table
  • New tables appear in schema discovery and can be enabled
  • Type widening (e.g., integer to bigint) is applied to the destination
  • Removed columns are preserved in the destination (data is not dropped)
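As an illustration, integer-to-bigint widening corresponds to a destination-side change like the following (the table and column names are hypothetical; Supaflow issues the equivalent change for you, so you do not run this yourself):

```sql
ALTER TABLE analytics.orders
  ALTER COLUMN order_id TYPE BIGINT;
```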

Configuration

Step 1: Connection

Database Host*

Database server IP or hostname
Example: db.example.com or 192.168.1.100

Database Port*

Port on which PostgreSQL is listening
Default: 5432

Database Name*

Name of the database to connect to
Example: production_db

Database Username*

Username for authentication
Example: supaflow_reader

Database Password*

Password for the database user
Stored encrypted


Step 2: Advanced Settings (Optional)

Noop Query

Query to test connectivity without returning data
Default: SELECT 1
Used for connection validation

SSL Mode

SSL connection security level
Options: disable, allow, prefer (default), require, verify-ca, verify-full
Recommended: require or higher for production

JDBC Connection Properties

Additional JDBC connection parameters
Format: key1=value1;key2=value2
Example: connectTimeout=30;socketTimeout=30

Schema Refresh Interval

Interval in minutes for schema metadata refresh
0 = refresh before every pipeline execution
-1 = disable schema refresh
Default: 0 (Range: -1 to 10080)


Step 3: Test & Save

After configuring all required properties, click Test & Save to verify your connection and save the source.

Troubleshooting

Common issues and their solutions:

Connection refused

Problem:

  • Cannot connect to PostgreSQL server
  • "Connection refused" error
  • Network timeout

Solutions:

  1. Verify firewall rules:
    • Add Supaflow IP (18.214.240.61) to allowed IPs
    • Check pg_hba.conf for IP restrictions
    • Example entry: host all all 18.214.240.61/32 md5
  2. Check PostgreSQL is running:
    sudo systemctl status postgresql
  3. Verify port is open:
    sudo ss -plnt | grep 5432
    (or netstat -plnt on older systems)
  4. Check postgresql.conf:
    • Ensure listen_addresses is set correctly
    • Example: listen_addresses = '*' (allow all) or specific IPs
    • Restart PostgreSQL after changes

Authentication failed

Problem:

  • "Password authentication failed"
  • "User does not exist"
  • "FATAL: no pg_hba.conf entry"

Solutions:

  1. Verify username and password:
    • Test login manually:
    psql -h your_host -p 5432 -U supaflow_reader -d your_database
  2. Check pg_hba.conf:
    • Add entry for Supaflow IP:
    host all supaflow_reader 18.214.240.61/32 md5
    • Reload configuration:
    sudo systemctl reload postgresql
  3. Verify user exists:
    SELECT usename FROM pg_user WHERE usename = 'supaflow_reader';
  4. Reset password if needed:
    ALTER USER supaflow_reader WITH PASSWORD 'new_secure_password';

Permission denied on tables

Problem:

  • "Permission denied for table"
  • Can connect but cannot read data
  • Some tables visible, others not

Solutions:

  1. Grant SELECT on all tables:
    GRANT USAGE ON SCHEMA public TO supaflow_reader;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO supaflow_reader;
  2. Grant privileges on future tables:
    ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO supaflow_reader;
  3. Check schema permissions:
    -- View current grants
    SELECT grantee, privilege_type
    FROM information_schema.role_table_grants
    WHERE table_schema = 'public' AND grantee = 'supaflow_reader';
  4. Repeat for each schema:
    • Replace 'public' with your schema names
    • Common schemas: public, analytics, sales, etc.

SSL connection errors

Problem:

  • "SSL connection failed"
  • "SSL is not enabled on the server"
  • Certificate verification errors

Solutions:

  1. Check SSL configuration:
    • In postgresql.conf: ssl = on
    • Restart PostgreSQL after enabling
  2. Adjust SSL mode in Supaflow:
    • Try prefer instead of require
    • Use disable for testing (not recommended for production)
  3. For self-signed certificates:
    • Use SSL mode require (skips certificate verification)
    • Avoid verify-ca or verify-full for self-signed certs
  4. Check certificate paths:
    • Ensure server.crt and server.key are configured
    • Verify file permissions: 600 for key, 644 for cert

Tables not appearing in schema

Problem:

  • Connected successfully but tables missing
  • Some tables visible, others not
  • Empty schema list

Solutions:

  1. Verify USAGE grant on schema:
    GRANT USAGE ON SCHEMA your_schema TO supaflow_reader;
  2. Check table ownership:
    -- View tables user can access
    SELECT tablename FROM pg_tables
    WHERE schemaname = 'public'
    AND has_table_privilege('supaflow_reader', schemaname||'.'||tablename, 'SELECT');
  3. Refresh schema in Supaflow:
    • Set Schema Refresh Interval to 0
    • Save and test connection again
  4. Verify tables exist:
    SELECT tablename FROM pg_tables WHERE schemaname = 'public';

Connection timeout

Problem:

  • Connection times out during setup
  • Slow query response
  • Test connection takes too long

Solutions:

  1. Increase timeout in JDBC properties:
    • connectTimeout=60 (60 seconds)
    • socketTimeout=60
    • Format: connectTimeout=60;socketTimeout=60
  2. Check network latency:
    ping your_database_host
  3. Verify PostgreSQL performance:
    -- Check for long-running queries
    SELECT pid, now() - query_start as duration, query
    FROM pg_stat_activity
    WHERE state = 'active'
    ORDER BY duration DESC;
  4. Reduce connection pool size:
    • Large pools can cause timeout on slow servers
    • Start with smaller pool size

Too many connections

Problem:

  • "FATAL: too many connections for role"
  • "FATAL: sorry, too many clients already"
  • Connection limit reached

Solutions:

  1. Check connection limit:
    SHOW max_connections;
    SELECT count(*) FROM pg_stat_activity;
  2. Increase max_connections:
    • In postgresql.conf: max_connections = 200 (adjust as needed)
    • Restart PostgreSQL
  3. Cap the Supaflow user's connections so it cannot exhaust the server-wide limit:
    ALTER USER supaflow_reader CONNECTION LIMIT 10;
  4. Close idle connections:
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE usename = 'supaflow_reader' AND state = 'idle';

Support

Need help? Contact us at support@supa-flow.io