Google Drive Source

Connect Google Drive as a source to sync structured data from CSV, TSV, Excel (.xlsx), and Google Sheets files stored in Drive folders.

Prerequisites

Before you begin, ensure you have:

A Google Drive folder containing the files you want to sync

For OAuth authentication (recommended):

A Google account with access to the Drive folder

For Service Account authentication:

A Google Cloud Platform (GCP) project with the Google Drive API enabled
The Google Sheets API enabled if you plan to sync Google Sheets files
A GCP Service Account with a JSON key file
The Drive folder shared with the service account email as Viewer (or higher)

How Data is Organized

The Google Drive connector maps files and folders in Drive to tables and rows in your destination. The mapping depends on the file type you select.

CSV and TSV Files: Folders as Tables

For CSV and TSV files, each folder becomes a table. All files within a folder are treated as containing rows for the same table. This means:

CSV/TSV files in a folder can have slightly different columns; Supaflow unions discovered columns and fills missing values with null
Rows from all files in the folder are combined into one table
The table name is derived from the folder name (normalized to lowercase with underscores)

Google Drive Folder (configured)
├── customers/                     --> Table: "customers"
│   ├── customers_jan.csv              (rows from this file)
│   ├── customers_feb.csv              (rows from this file)
│   └── customers_mar.csv              (rows from this file)
├── orders/                        --> Table: "orders"
│   ├── orders_2024.csv                (rows from this file)
│   └── orders_2025.csv                (rows from this file)
└── products.csv                   --> Table: "<root_folder_name>"
                                        (files in root become a table
                                         named after the root folder)

Key points:

Supaflow scans the configured folder and its immediate child folders (one level deep)
Files directly in the root folder are grouped into a single table named after the root folder
Each child folder with matching files becomes a separate table
During schema discovery, Supaflow samples up to Schema Sample Size files to infer column names and data types. Columns are unioned across sampled files, so files with slightly different columns are supported (missing columns become null).
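The column union described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the connector's implementation: the function name `union_rows` and the in-memory input shape (file name mapped to a list of row dictionaries) are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def union_rows(files: Dict[str, List[Row]]) -> Tuple[List[str], List[Row]]:
    """Combine rows from several files in one folder into a single table.

    `files` maps a file name to its parsed rows (hypothetical input shape).
    Columns are unioned in first-seen order; values a file does not
    provide are filled with None.
    """
    columns: List[str] = []
    for rows in files.values():
        for row in rows:
            for column in row:
                if column not in columns:
                    columns.append(column)

    combined: List[Row] = []
    for file_name, rows in files.items():
        for row in rows:
            record: Row = {column: row.get(column) for column in columns}
            # Keep a pointer back to the source file, in the spirit of
            # the _supa_file_name system field.
            record["_supa_file_name"] = file_name
            combined.append(record)
    return columns, combined


folder = {
    "customers_jan.csv": [{"id": "1", "name": "Ada"}],
    "customers_feb.csv": [{"id": "2", "name": "Lin", "region": "EU"}],
}
columns, table = union_rows(folder)
```

Here the January row ends up with `region` set to `None`, because only the February file carries that column.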

Excel and Google Sheets: Worksheets as Tables

For Excel (.xlsx) and Google Sheets files, each worksheet (tab) becomes a table. The table name is derived from both the file name and the worksheet name:

Table name = <file_name>_<worksheet_name>

Both components are normalized using the rules in Table Name Normalization below (lowercase, with spaces and special characters replaced by underscores).

Google Drive Folder (configured)
├── Q1 Report.xlsx
│   ├── Revenue        --> Table: "q1_report_revenue"
│   └── Expenses       --> Table: "q1_report_expenses"
├── subfolder/
│   └── Inventory.xlsx
│       └── Sheet1     --> Table: "inventory_sheet1"
└── Sales Pipeline     (Google Sheets file)
    ├── Deals          --> Table: "sales_pipeline_deals"
    └── Leads          --> Table: "sales_pipeline_leads"

Key points:

Supaflow searches the configured folder and all subfolders recursively for Excel and Google Sheets files
Each non-empty worksheet with at least a header row and one data row becomes a table
The first row of each worksheet is treated as the header row
Empty worksheets are automatically skipped
Excel files larger than 50 MB are skipped with a warning
Google Sheets files are exported to xlsx format for processing; files exceeding Google's export limit (~10 MB) are skipped with a warning

Table Name Normalization

All table names are normalized for compatibility with destination systems:

Converted to lowercase
Spaces and special characters replaced with underscores
Multiple consecutive underscores collapsed to one
Leading digits prefixed with underscore

Examples:

Source Name	Normalized Table Name
`Customer Data` (folder)	`customer_data`
`Q1 Report.xlsx` / `Revenue` (worksheet)	`q1_report_revenue`
`2024 Sales` (folder)	`_2024_sales`
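The rules above can be approximated with the following sketch, which reproduces the three examples in the table. The function names are hypothetical, and stripping the file extension before joining file and worksheet names is an assumption inferred from the `Q1 Report.xlsx` example; the connector's exact handling of edge cases (for example trailing underscores or non-ASCII characters) may differ.

```python
import re
from pathlib import PurePosixPath


def normalize_table_name(name: str) -> str:
    """Apply the four documented normalization rules in order."""
    lowered = name.lower()
    # Spaces and special characters become underscores.
    replaced = re.sub(r"[^a-z0-9_]", "_", lowered)
    # Runs of underscores collapse to one.
    collapsed = re.sub(r"_+", "_", replaced)
    # A leading digit gets an underscore prefix.
    if collapsed and collapsed[0].isdigit():
        collapsed = "_" + collapsed
    return collapsed


def worksheet_table_name(file_name: str, worksheet_name: str) -> str:
    """Build <file_name>_<worksheet_name>, dropping the extension (assumed)."""
    stem = PurePosixPath(file_name).stem
    return normalize_table_name(f"{stem}_{worksheet_name}")


print(normalize_table_name("Customer Data"))
print(worksheet_table_name("Q1 Report.xlsx", "Revenue"))
print(normalize_table_name("2024 Sales"))
```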

System Fields

Every table includes two system fields added automatically by the connector:

Field	Type	Description
`_supa_file_name`	string	The original file name from Google Drive
`_supa_file_id`	string	The unique Google Drive file ID

These fields allow you to trace every row back to its source file in your destination.

Google Cloud Setup

OAuth is simpler

If you use OAuth authentication, you can skip this entire section. Just click Authorize in the source configuration and sign in with your Google account. The steps below are only needed for Service Account authentication.

Step 1: Create a Service Account

Go to the Google Cloud Console
Select or create a project
Navigate to IAM & Admin > Service Accounts
Click Create Service Account
Give it a name (e.g., "supaflow-drive-reader") and click Create
Skip the optional role assignment steps and click Done

Step 2: Enable Required APIs

Navigate to APIs & Services > Library
Search for and enable:
- Google Drive API
- Google Sheets API (required only if syncing Google Sheets files)

Step 3: Create a JSON Key

Go to IAM & Admin > Service Accounts
Click on the service account you created
Go to the Keys tab
Click Add Key > Create new key > JSON
Save the downloaded JSON file securely

Key Security

The JSON key file contains credentials for your service account. Store it securely and never commit it to version control. In Supaflow, the key is encrypted at rest.

Step 4: Share Your Drive Folder

Open Google Drive and navigate to the folder containing your data files
Right-click the folder and select Share
Enter the service account email address (e.g., supaflow-drive-reader@your-project.iam.gserviceaccount.com)
Set permission to Viewer
Click Send

The service account now has read access to the folder and all files within it.
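If you are unsure which email address to share the folder with, it can be read from the downloaded key. The sketch below assumes the standard service account key layout, which includes `type` and `client_email` fields; the function name and the sample key contents are made up for illustration.

```python
import json


def service_account_email(key_json: str) -> str:
    """Return the email address to share the Drive folder with.

    `key_json` is the text content of the downloaded JSON key file.
    """
    key = json.loads(key_json)
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    return key["client_email"]


# Hypothetical, heavily trimmed key content used only for this example.
sample_key = json.dumps(
    {
        "type": "service_account",
        "project_id": "your-project",
        "client_email": "supaflow-drive-reader@your-project.iam.gserviceaccount.com",
    }
)
print(service_account_email(sample_key))
```

For a real key, pass the file's text instead, for example `service_account_email(pathlib.Path("key.json").read_text())`, where `key.json` stands in for wherever you saved the file.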

Configuration

In Supaflow, create a new Google Drive source with these settings:

Authentication

Authentication Method*

Select your authentication method
Options:

oauth - Sign in with your Google account and authorize Supaflow to read your Drive files. No GCP project setup required.
service_account - Authenticate using a GCP Service Account JSON key (see Google Cloud Setup above)

Default: oauth

OAuth

Authorize

Click Authorize to open the Google sign-in flow. Sign in with the Google account that has access to the Drive folder you want to sync and grant Supaflow read-only access. The authorization token is stored encrypted.

Service Account

Service Account Key*

Upload your Google Service Account JSON key file. The service account must have Google Drive API access (and Google Sheets API access if syncing Google Sheets).
Stored encrypted

Configuration

Folder URL*

Google Drive folder URL or folder ID. You can paste the full URL from your browser or just the folder ID.
Example: https://drive.google.com/drive/folders/1jPbXXTs_ZYmb...
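Since the field accepts either form, it may help to see how a folder ID relates to the URL. The following is a minimal sketch, assuming the ID is the path segment after `/folders/`; the function name and the example ID are hypothetical, and Supaflow's own parsing may accept more variants.

```python
import re
from urllib.parse import urlparse


def extract_folder_id(value: str) -> str:
    """Return the folder ID from a Drive folder URL, or the value itself
    when it already looks like a bare ID."""
    value = value.strip()
    if "://" not in value:
        return value  # treated as a bare folder ID
    path = urlparse(value).path  # drops query strings such as ?usp=sharing
    match = re.search(r"/folders/([^/]+)", path)
    if match is None:
        raise ValueError(f"no folder ID found in URL: {value}")
    return match.group(1)


# EXAMPLE_folder-ID123 is a placeholder, not a real folder ID.
print(extract_folder_id("https://drive.google.com/drive/folders/EXAMPLE_folder-ID123?usp=sharing"))
```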

File Type*

Type of files to read from the folder
Options:

CSV - Comma-separated values files
TSV - Tab-separated values files
EXCEL - Excel .xlsx files (each worksheet becomes a table)
GOOGLE_SHEETS - Google Sheets files (each worksheet becomes a table)

Default: CSV

File Pattern

Optional glob pattern to filter files by name (case-insensitive). Only files matching the pattern are included in schema discovery and sync.
Example: *.csv, sales_*.xlsx, report_??.csv
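To preview which file names a pattern would select, Python's `fnmatch` module is a reasonable stand-in. This sketch assumes the connector's glob syntax behaves like `fnmatch` (`*` for any run of characters, `?` for exactly one); the function name is hypothetical and the connector's matching may differ in edge cases.

```python
from fnmatch import fnmatchcase


def matches_pattern(file_name: str, pattern: str) -> bool:
    """Case-insensitive glob match; an empty pattern matches every file."""
    if not pattern:
        return True
    # Lowercase both sides so the comparison ignores case on every platform.
    return fnmatchcase(file_name.lower(), pattern.lower())


names = ["Sales_Q1.XLSX", "report_01.csv", "report_001.csv"]
print([n for n in names if matches_pattern(n, "report_??.csv")])
```

In this example only `report_01.csv` is selected, because `??` requires exactly two characters.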

CSV Settings

These settings apply when File Type is CSV or TSV.
Delimiter applies only to CSV; TSV files always use the tab character.
Quote Character, Encoding, Has Header Row, Skip Header Lines, Skip Footer Lines, and Null Sequence apply to both CSV and TSV.

Delimiter

Column delimiter character. For TSV files, the tab character is used automatically.
Default: ,

Quote Character

Character used to quote fields that contain the delimiter or newlines
Default: " (double-quote)

Encoding

File character encoding
Default: UTF-8

Has Header Row

Whether the first data row contains column headers. If disabled, columns are named col_1, col_2, etc.
Default: true

Skip Header Lines

Number of raw text lines to skip before the header/data rows. Useful for files with comment lines or metadata at the top.
Default: 0
Range: 0 to 100

Skip Footer Lines

Number of lines to skip at the end of the file. Useful for files with summary rows or footers.
Default: 0
Range: 0 to 100

Null Sequence

String value to treat as null (e.g., NULL, \N, N/A). Leave empty to disable null detection.
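The sketch below shows how these settings could interact when parsing one file, using Python's standard `csv` module. It is an illustration under stated assumptions, not the connector's parser: the function and parameter names are hypothetical, and splitting on line boundaries up front is a simplification that does not handle quoted fields containing newlines.

```python
import csv
from typing import Dict, List, Optional


def parse_delimited(
    text: str,
    *,
    delimiter: str = ",",
    quote_char: str = '"',
    has_header: bool = True,
    skip_header_lines: int = 0,
    skip_footer_lines: int = 0,
    null_sequence: str = "",
) -> List[Dict[str, Optional[str]]]:
    lines = text.splitlines()
    # Drop raw lines at the top and bottom before any CSV parsing happens.
    end = max(len(lines) - skip_footer_lines, 0)
    body = lines[skip_header_lines:end]
    rows = list(csv.reader(body, delimiter=delimiter, quotechar=quote_char))
    if not rows:
        return []
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # Without a header row, columns are named col_1, col_2, ...
        width = max(len(row) for row in rows)
        header = [f"col_{i}" for i in range(1, width + 1)]
        data = rows

    def to_value(cell: str) -> Optional[str]:
        # An empty null_sequence disables null detection.
        return None if null_sequence and cell == null_sequence else cell

    return [
        {name: to_value(cell) for name, cell in zip(header, row)}
        for row in data
    ]


sample = "# exported 2024\nid,name\n1,Ada\n2,NULL\nTOTAL,2\n"
records = parse_delimited(
    sample, skip_header_lines=1, skip_footer_lines=1, null_sequence="NULL"
)
```

With one header line and one footer line skipped, the comment line and the `TOTAL` row are dropped, and the `NULL` cell is returned as `None`.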

Advanced Settings

Schema Sample Size

Number of files to sample during schema inference. Higher values produce more accurate schemas but take longer.
Default: 5
Range: 1 to 50

Lookback Time Seconds

Seconds to subtract from the incremental sync lower bound to catch late-modified files. Use this if files in your Drive folder may be modified slightly after the sync window closes.
Default: 0
Range: 0 to 86400

Error Handling

How to handle file-level errors during sync
Options:

FAIL - Stop on the first file error (default)
SKIP - Skip failed files and continue with remaining files

Default: FAIL
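The difference between the two modes can be pictured as follows. This is a simplified sketch: `sync_files` and `read_file` are hypothetical names, and the real connector may surface skipped files differently (for example in logs).

```python
from typing import Callable, Iterable, List, Tuple


def sync_files(
    file_names: Iterable[str],
    read_file: Callable[[str], List[dict]],
    error_handling: str = "FAIL",
) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Read every file; on error either abort (FAIL) or record and continue (SKIP)."""
    rows: List[dict] = []
    skipped: List[Tuple[str, str]] = []
    for name in file_names:
        try:
            rows.extend(read_file(name))
        except Exception as exc:
            if error_handling == "FAIL":
                raise  # stop on the first file error
            skipped.append((name, str(exc)))
    return rows, skipped


def demo_reader(name: str) -> List[dict]:
    # Stand-in reader used only for this example.
    if name == "bad.csv":
        raise ValueError("malformed file")
    return [{"source": name}]


rows, skipped = sync_files(["a.csv", "bad.csv", "b.csv"], demo_reader, "SKIP")
```

With SKIP, rows from `a.csv` and `b.csv` are returned and `bad.csv` is reported in `skipped`; with FAIL, the same call raises at `bad.csv`.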

Test & Save

After configuring all required properties, click Test & Save to verify your connection and save the source.

The connection test verifies that:

Your credentials are valid (OAuth token or service account key)
The Google Drive API is accessible
The configured folder exists and is readable

Incremental Sync

The Google Drive connector supports incremental sync using file modification timestamps. On each sync:

Initial sync: All matching files in scope for the selected file type are read
Subsequent syncs: Only files modified since the last sync are read

This is managed automatically by the connector. There is no cursor column in the schema -- the connector tracks sync state internally using Google Drive's modifiedTime metadata on each file.

How it works for each file type:

File Type	Incremental Unit	Behavior
CSV / TSV	Individual file	Only re-reads files modified since last sync. Unmodified files are skipped entirely.
Excel	Entire file	If any worksheet is modified (the file's `modifiedTime` changes), all worksheets in that file are re-read.
Google Sheets	Entire file	Same as Excel -- if the file is modified, all worksheets are re-read.

Full Table Re-Read for Spreadsheets

Because Google Drive tracks modification time at the file level (not worksheet level), any edit to an Excel or Google Sheets file causes all worksheets from that file to be re-synced. For large spreadsheets, keep in mind that a small edit triggers a full re-read of the file.
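The file selection described in this section, together with the Lookback Time Seconds setting, can be sketched as a simple timestamp filter. The function names and the strict "greater than" comparison are assumptions for illustration; only the `modifiedTime` field name and its RFC 3339 format come from the Google Drive API.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def parse_modified_time(value: str) -> datetime:
    """Parse a Drive timestamp such as 2024-05-01T12:00:00.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def files_to_read(
    files: List[Dict[str, str]],
    last_sync: Optional[datetime],
    lookback_seconds: int = 0,
) -> List[Dict[str, str]]:
    if last_sync is None:
        return list(files)  # initial sync: read everything in scope
    # Move the lower bound back to catch late-modified files.
    lower_bound = last_sync - timedelta(seconds=lookback_seconds)
    return [
        f for f in files if parse_modified_time(f["modifiedTime"]) > lower_bound
    ]


drive_files = [
    {"name": "a.csv", "modifiedTime": "2024-05-01T12:00:00.000Z"},
    {"name": "b.csv", "modifiedTime": "2024-05-01T11:59:30.000Z"},
]
last_sync = datetime(2024, 5, 1, 11, 59, 45, tzinfo=timezone.utc)
changed = files_to_read(drive_files, last_sync)
with_lookback = files_to_read(drive_files, last_sync, lookback_seconds=60)
```

Without lookback only `a.csv` is selected; a 60-second lookback also picks up `b.csv`, which was modified 15 seconds before the recorded sync time. For Excel and Google Sheets the same filter applies per file, so a selected file means all of its worksheets are re-read.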

Troubleshooting

Connection test fails with authentication error

Problem:

"Failed to initialize Google Drive connector" error
Authentication-related error messages

If using OAuth:

Re-authorize: Click Authorize again and complete the Google sign-in flow. OAuth tokens can expire or be revoked.
Check folder access: Ensure the Google account you signed in with has at least Viewer access to the configured folder.
Google Sheets API: If syncing Google Sheets files, the Google Sheets API must be enabled in the GCP project backing the OAuth client.

If using Service Account:

Verify the JSON key file:
- Ensure you uploaded the correct service account JSON key
- Check that the key has not been revoked in GCP Console
- Verify the key file is valid JSON (not corrupted)
Check API access:
- Ensure the Google Drive API is enabled in your GCP project
- If syncing Google Sheets, ensure the Google Sheets API is also enabled
Verify service account permissions:
- The service account needs no special IAM roles; access comes from sharing the Drive folder with the service account email as Viewer (or higher)

No objects found during schema discovery

Problem:

Schema discovery returns zero tables
"No objects discovered" message

Solutions:

Verify folder access:
- OAuth: Ensure the Google account you authorized has access to the folder
- Service Account: Open the Drive folder and check that the service account email appears in the sharing settings with at least Viewer access
Check file type matches:
- Ensure the File Type setting matches the actual files in the folder
- CSV files must have MIME type text/csv; TSV files must have text/tab-separated-values or text/plain
- Excel files must be .xlsx format (.xls is not supported)
Check folder structure:
- For CSV/TSV: files must be directly in the root folder or in immediate child folders (not deeper)
- For Excel/Google Sheets: files are found recursively in all subfolders
Check file pattern:
- If you set a File Pattern, verify it matches your file names
- Try removing the pattern to see all files

Missing columns or incorrect data types

Problem:

Some columns are missing from discovered schema
Data types are wrong (e.g., numbers detected as strings)

Solutions:

Increase schema sample size:
- Set Schema Sample Size to a higher value (e.g., 20 or 50)
- More samples produce more accurate column unions and type inference
Check header consistency:
- For CSV/TSV: files can have different headers, but more consistent headers produce cleaner schemas
- Column names are normalized (lowercased, special characters to underscores)
- Duplicate column names get numeric suffixes (name, name_1, name_2)
Verify encoding:
- Ensure the Encoding setting matches your files (default: UTF-8)
- Non-UTF-8 files may produce garbled column names

Incremental sync re-reads all files

Problem:

Every sync reads all files instead of just modified ones

Solutions:

Check initial sync completed:
- The first sync always reads all files
- Incremental behavior starts from the second sync onward
Verify files are not being re-saved:
- Google Drive updates modifiedTime whenever a file is opened and saved, even without changes
- Automated processes that touch files may trigger re-reads
Check lookback time:
- A high Lookback Time Seconds value expands the sync window, potentially including more files

Google Sheets export fails

Problem:

"Cannot export Google Sheets file (possibly exceeds 10MB limit)" warning
Google Sheets files not appearing in schema

Solutions:

Check file size:
- Google limits Sheets exports to approximately 10 MB
- Large spreadsheets with many rows or complex formulas may exceed this limit
- Consider splitting large sheets into smaller files
Check permissions:
- The Google Sheets API must be enabled in the GCP project (applies to both OAuth and service account authentication)
- Service Account: Verify the service account has access to the file via folder sharing

Excel file skipped

Problem:

Excel files not appearing in schema
"Exceeds 50MB limit" warning in logs

Solutions:

Check file size:
- Excel files larger than 50 MB are skipped
- Consider splitting large workbooks into smaller files
Check file format:
- Only .xlsx (Office Open XML) format is supported
- Legacy .xls files are not supported -- save as .xlsx

Support

Need help? Contact us at support@supa-flow.io
