Google Drive Source
Connect Google Drive as a source to sync structured data from CSV, TSV, Excel (.xlsx), and Google Sheets files stored in Drive folders.
Prerequisites
Before you begin, ensure you have:
- A Google Drive folder containing the files you want to sync
For OAuth authentication (recommended):
- A Google account with access to the Drive folder
For Service Account authentication:
- A Google Cloud Platform (GCP) project with the Google Drive API enabled
- The Google Sheets API enabled if you plan to sync Google Sheets files
- A GCP Service Account with a JSON key file
- The Drive folder shared with the service account email as Viewer (or higher)
How Data is Organized
The Google Drive connector maps files and folders in Drive to tables and rows in your destination. The mapping depends on the file type you select.
CSV and TSV Files: Folders as Tables
For CSV and TSV files, each folder becomes a table. All files within a folder are treated as containing rows for the same table. This means:
- CSV/TSV files in a folder can have slightly different columns; Supaflow unions discovered columns and fills missing values with `null`
- Rows from all files in the folder are combined into one table
- The table name is derived from the folder name (normalized to lowercase with underscores)
Google Drive Folder (configured)
├── customers/ --> Table: "customers"
│ ├── customers_jan.csv (rows from this file)
│ ├── customers_feb.csv (rows from this file)
│ └── customers_mar.csv (rows from this file)
├── orders/ --> Table: "orders"
│ ├── orders_2024.csv (rows from this file)
│ └── orders_2025.csv (rows from this file)
└── products.csv --> Table: "<root_folder_name>"
(files in root become a table
named after the root folder)
Key points:
- Supaflow scans the configured folder and its immediate child folders (one level deep)
- Files directly in the root folder are grouped into a single table named after the root folder
- Each child folder with matching files becomes a separate table
- During schema discovery, Supaflow samples up to Schema Sample Size files to infer column names and data types. Columns are unioned across sampled files, so files with slightly different columns are supported (missing columns become `null`).
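The column-union behavior during schema discovery can be sketched as follows. This is an illustrative model of the documented behavior, not the connector's actual code; `file_contents` stands in for the raw text of the sampled Drive files.

```python
import csv
import io

def discover_schema(file_contents, sample_size=5):
    """Union the headers of up to `sample_size` CSV files into one column list."""
    columns = []
    for raw in file_contents[:sample_size]:
        header = next(csv.reader(io.StringIO(raw)))
        for col in header:
            if col not in columns:
                columns.append(col)  # preserve first-seen column order
    return columns

def read_rows(file_contents, columns):
    """Yield dict rows; columns absent from a file are filled with None."""
    for raw in file_contents:
        for row in csv.DictReader(io.StringIO(raw)):
            yield {col: row.get(col) for col in columns}
```

For example, a folder containing one file with columns `a,b` and another with `a,c` yields a single table with columns `a`, `b`, `c`, where each row carries `null` for the column its file lacked.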
Excel and Google Sheets: Worksheets as Tables
For Excel (.xlsx) and Google Sheets files, each worksheet (tab) becomes a table. The table name is derived from both the file name and the worksheet name:
Table name = <file_name>_<worksheet_name>
Both components are normalized (lowercase, spaces replaced with underscores, special characters removed).
Google Drive Folder (configured)
├── Q1 Report.xlsx
│ ├── Revenue --> Table: "q1_report_revenue"
│ └── Expenses --> Table: "q1_report_expenses"
├── subfolder/
│ └── Inventory.xlsx
│ └── Sheet1 --> Table: "inventory_sheet1"
└── Sales Pipeline (Google Sheets file)
├── Deals --> Table: "sales_pipeline_deals"
└── Leads --> Table: "sales_pipeline_leads"
Key points:
- Supaflow searches the configured folder and all subfolders recursively for Excel and Google Sheets files
- Each non-empty worksheet with at least a header row and one data row becomes a table
- The first row of each worksheet is treated as the header row
- Empty worksheets are automatically skipped
- Excel files larger than 50 MB are skipped with a warning
- Google Sheets files are exported to xlsx format for processing; files exceeding Google's export limit (~10 MB) are skipped with a warning
Table Name Normalization
All table names are normalized for compatibility with destination systems:
- Converted to lowercase
- Spaces and special characters replaced with underscores
- Multiple consecutive underscores collapsed to one
- Leading digits prefixed with underscore
Examples:
| Source Name | Normalized Table Name |
|---|---|
| Customer Data (folder) | customer_data |
| Q1 Report.xlsx / Revenue (worksheet) | q1_report_revenue |
| 2024 Sales (folder) | _2024_sales |
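The normalization rules above can be expressed as a small function. This is a sketch of the documented rules, not the connector's actual implementation (in particular, trimming edge underscores is an assumption):

```python
import re

def normalize_table_name(name):
    """Normalize a source name: lowercase, non-alphanumerics to underscores,
    collapse repeated underscores, prefix a leading digit with an underscore."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9]+", "_", name)      # spaces/special chars -> _
    name = re.sub(r"_+", "_", name).strip("_")   # collapse repeats, trim edges
    if name and name[0].isdigit():
        name = "_" + name                        # leading digit gets a prefix
    return name
```

Applied to the worksheet case, the table name is the normalized file name joined to the normalized worksheet name with an underscore.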
System Fields
Every table includes two system fields added automatically by the connector:
| Field | Type | Description |
|---|---|---|
| `_supa_file_name` | string | The original file name from Google Drive |
| `_supa_file_id` | string | The unique Google Drive file ID |
These fields allow you to trace every row back to its source file in your destination.
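Conceptually, every synced row carries these fields alongside its data columns (illustrative sketch only; the field names come from the table above):

```python
def with_system_fields(row, file_name, file_id):
    """Attach the connector's two system fields to a data row."""
    return {**row, "_supa_file_name": file_name, "_supa_file_id": file_id}
```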
Google Cloud Setup
If you use OAuth authentication, you can skip this entire section. Just click Authorize in the source configuration and sign in with your Google account. The steps below are only needed for Service Account authentication.
Step 1: Create a Service Account
- Go to the Google Cloud Console
- Select or create a project
- Navigate to IAM & Admin > Service Accounts
- Click Create Service Account
- Give it a name (e.g., "supaflow-drive-reader") and click Create
- Skip the optional role assignment steps and click Done
Step 2: Enable Required APIs
- Navigate to APIs & Services > Library
- Search for and enable:
- Google Drive API
- Google Sheets API (required only if syncing Google Sheets files)
Step 3: Create a JSON Key
- Go to IAM & Admin > Service Accounts
- Click on the service account you created
- Go to the Keys tab
- Click Add Key > Create new key > JSON
- Save the downloaded JSON file securely
The JSON key file contains credentials for your service account. Store it securely and never commit it to version control. In Supaflow, the key is encrypted at rest.
Step 4: Share Your Drive Folder
- Open Google Drive and navigate to the folder containing your data files
- Right-click the folder and select Share
- Enter the service account email address (e.g., supaflow-drive-reader@your-project.iam.gserviceaccount.com)
- Set permission to Viewer
- Click Send
The service account now has read access to the folder and all files within it.
Configuration
In Supaflow, create a new Google Drive source with these settings:
Authentication
Authentication Method*
Select your authentication method
Options:
- oauth - Sign in with your Google account and authorize Supaflow to read your Drive files. No GCP project setup required.
- service_account - Authenticate using a GCP Service Account JSON key (see Google Cloud Setup below)
Default: oauth
OAuth
Authorize
Click Authorize to open the Google sign-in flow. Sign in with the Google account that has access to the Drive folder you want to sync and grant Supaflow read-only access. The authorization token is stored encrypted.
Service Account
Service Account Key*
Upload your Google Service Account JSON key file. The service account must have Google Drive API access (and Google Sheets API access if syncing Google Sheets).
Stored encrypted
Configuration
Folder URL*
Google Drive folder URL or folder ID. You can paste the full URL from your browser or just the folder ID.
Example: https://drive.google.com/drive/folders/1jPbXXTs_ZYmb...
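Accepting either the full URL or the bare ID amounts to pulling the segment after `/folders/` out of a pasted URL. A sketch of that parsing (the connector's actual logic may differ; the folder ID used below is hypothetical):

```python
import re

def extract_folder_id(value):
    """Return the folder ID from a Drive folder URL, or the value itself
    if it already looks like a bare ID."""
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", value)
    return match.group(1) if match else value.strip()
```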
File Type
Type of files to read from the folder
Options:
- CSV - Comma-separated values files
- TSV - Tab-separated values files
- EXCEL - Excel .xlsx files (each worksheet becomes a table)
- GOOGLE_SHEETS - Google Sheets files (each worksheet becomes a table)
Default: CSV
File Pattern
Optional glob pattern to filter files by name (case-insensitive). Only files matching the pattern are included in schema discovery and sync.
Example: *.csv, sales_*.xlsx, report_??.csv
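Case-insensitive glob matching of this kind can be modeled with Python's `fnmatch` by lowercasing both sides (a sketch of the documented behavior, not the connector's code):

```python
import fnmatch

def matches_pattern(file_name, pattern):
    """Case-insensitive glob match, as described for File Pattern."""
    return fnmatch.fnmatch(file_name.lower(), pattern.lower())
```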
CSV Settings
These settings apply when File Type is CSV or TSV.
Delimiter applies only to CSV (TSV always uses the tab character).
Quote Character, Encoding, Has Header Row, Skip Header Lines, Skip Footer Lines, and Null Sequence apply to both CSV and TSV.
Delimiter
Column delimiter character. For TSV files, the tab character is used automatically.
Default: ,
Quote Character
Character used to quote fields that contain the delimiter or newlines
Default: " (double-quote)
Encoding
File character encoding
Default: UTF-8
Has Header Row
Whether the first data row contains column headers. If disabled, columns are named col_1, col_2, etc.
Default: true
Skip Header Lines
Number of raw text lines to skip before the header/data rows. Useful for files with comment lines or metadata at the top.
Default: 0
Range: 0 to 100
Skip Footer Lines
Number of lines to skip at the end of the file. Useful for files with summary rows or footers.
Default: 0
Range: 0 to 100
Null Sequence
String value to treat as null (e.g., NULL, \N, N/A). Leave empty to disable null detection.
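How these settings interact can be sketched as follows: footer lines are dropped first, raw header lines are skipped next, the first remaining row becomes the header, and the null sequence is applied per value. This is an illustrative model under those ordering assumptions, not the connector's actual parser:

```python
import csv
import io

def parse_csv(text, delimiter=",", skip_header_lines=0,
              skip_footer_lines=0, null_sequence=None):
    """Yield dict rows from raw CSV text, applying the settings above."""
    lines = text.splitlines()
    if skip_footer_lines:
        lines = lines[:-skip_footer_lines]      # drop footer/summary rows
    lines = lines[skip_header_lines:]           # drop comment/metadata lines
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    for row in reader:
        yield {k: (None if null_sequence and v == null_sequence else v)
               for k, v in row.items()}
```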
Advanced Settings
Schema Sample Size
Number of files to sample during schema inference. Higher values produce more accurate schemas but take longer.
Default: 5
Range: 1 to 50
Lookback Time Seconds
Seconds to subtract from the incremental sync lower bound to catch late-modified files. Use this if files in your Drive folder may be modified slightly after the sync window closes.
Default: 0
Range: 0 to 86400
How to handle file-level errors during sync
Options:
- FAIL - Stop on the first file error (default)
- SKIP - Skip failed files and continue with remaining files
Default: FAIL
Test & Save
After configuring all required properties, click Test & Save to verify your connection and save the source.
The connection test verifies that:
- Your credentials are valid (OAuth token or service account key)
- The Google Drive API is accessible
- The configured folder exists and is readable
Incremental Sync
The Google Drive connector supports incremental sync using file modification timestamps. On each sync:
- Initial sync: All matching files in scope for the selected file type are read
- Subsequent syncs: Only files modified since the last sync are read
This is managed automatically by the connector. There is no cursor column in the schema -- the connector tracks sync state internally using Google Drive's modifiedTime metadata on each file.
How it works for each file type:
| File Type | Incremental Unit | Behavior |
|---|---|---|
| CSV / TSV | Individual file | Only re-reads files modified since last sync. Unmodified files are skipped entirely. |
| Excel | Entire file | If any worksheet is modified (the file's modifiedTime changes), all worksheets in that file are re-read. |
| Google Sheets | Entire file | Same as Excel -- if the file is modified, all worksheets are re-read. |
Because Google Drive tracks modification time at the file level (not worksheet level), any edit to an Excel or Google Sheets file causes all worksheets from that file to be re-synced. For large spreadsheets, keep in mind that a small edit triggers a full re-read of the file.
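The selection logic described above, including the Lookback Time Seconds widening, can be sketched like this. It is a conceptual model only; `files` stands in for the Drive file listing, with `modifiedTime` in ISO 8601 form as Drive returns it:

```python
from datetime import datetime, timedelta, timezone

def files_to_sync(files, last_sync, lookback_seconds=0):
    """Pick files whose modifiedTime falls after the last sync time,
    widened by the optional lookback window."""
    if last_sync is None:                        # initial sync: read everything
        return list(files)
    lower_bound = last_sync - timedelta(seconds=lookback_seconds)
    return [f for f in files
            if datetime.fromisoformat(
                f["modifiedTime"].replace("Z", "+00:00")) > lower_bound]
```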
Troubleshooting
Connection test fails with authentication error
Problem:
- "Failed to initialize Google Drive connector" error
- Authentication-related error messages
If using OAuth:
- Re-authorize: Click Authorize again and complete the Google sign-in flow. OAuth tokens can expire or be revoked.
- Check folder access: Ensure the Google account you signed in with has at least Viewer access to the configured folder.
- Google Sheets API: If syncing Google Sheets files, the Google Sheets API must be enabled in the GCP project backing the OAuth client.
If using Service Account:
- Verify the JSON key file:
- Ensure you uploaded the correct service account JSON key
- Check that the key has not been revoked in GCP Console
- Verify the key file is valid JSON (not corrupted)
- Check API access:
- Ensure the Google Drive API is enabled in your GCP project
- If syncing Google Sheets, ensure the Google Sheets API is also enabled
- Verify service account permissions:
- The service account needs no special IAM roles -- Drive folder sharing provides access
No objects found during schema discovery
Problem:
- Schema discovery returns zero tables
- "No objects discovered" message
Solutions:
- Verify folder access:
- OAuth: Ensure the Google account you authorized has access to the folder
- Service Account: Open the Drive folder and check that the service account email appears in the sharing settings with at least Viewer access
- Check file type matches:
- Ensure the File Type setting matches the actual files in the folder
- CSV files must have MIME type `text/csv`; TSV files must have `text/tab-separated-values` or `text/plain`
- Excel files must be `.xlsx` format (`.xls` is not supported)
- Check folder structure:
- For CSV/TSV: files must be directly in the root folder or in immediate child folders (not deeper)
- For Excel/Google Sheets: files are found recursively in all subfolders
- Check file pattern:
- If you set a File Pattern, verify it matches your file names
- Try removing the pattern to see all files
Missing columns or incorrect data types
Problem:
- Some columns are missing from discovered schema
- Data types are wrong (e.g., numbers detected as strings)
Solutions:
- Increase schema sample size:
- Set Schema Sample Size to a higher value (e.g., 20 or 50)
- More samples produce more accurate column unions and type inference
- Check header consistency:
- For CSV/TSV: files can have different headers, but more consistent headers produce cleaner schemas
- Column names are normalized (lowercased, special characters to underscores)
- Duplicate column names get numeric suffixes (`name`, `name_1`, `name_2`)
- Verify encoding:
- Ensure the Encoding setting matches your files (default: UTF-8)
- Non-UTF-8 files may produce garbled column names
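The header normalization and duplicate-suffixing behavior described above can be sketched as one pass over the raw headers (illustrative only, not the connector's code):

```python
import re

def normalize_columns(headers):
    """Normalize raw header names and suffix duplicates with _1, _2, ..."""
    seen = {}
    result = []
    for raw in headers:
        name = re.sub(r"[^a-z0-9]+", "_", raw.lower()).strip("_")
        if name in seen:
            seen[name] += 1
            result.append(f"{name}_{seen[name]}")
        else:
            seen[name] = 0
            result.append(name)
    return result
```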
Incremental sync re-reads all files
Problem:
- Every sync reads all files instead of just modified ones
Solutions:
- Check initial sync completed:
- The first sync always reads all files
- Incremental behavior starts from the second sync onward
- Verify files are not being re-saved:
- Google Drive updates `modifiedTime` whenever a file is opened and saved, even without changes
- Automated processes that touch files may trigger re-reads
- Check lookback time:
- A high Lookback Time Seconds value expands the sync window, potentially including more files
Google Sheets export fails
Problem:
- "Cannot export Google Sheets file (possibly exceeds 10MB limit)" warning
- Google Sheets files not appearing in schema
Solutions:
- Check file size:
- Google limits Sheets exports to approximately 10 MB
- Large spreadsheets with many rows or complex formulas may exceed this limit
- Consider splitting large sheets into smaller files
- Check permissions:
- The Google Sheets API must be enabled in the GCP project (applies to both OAuth and service account authentication)
- Service Account: Verify the service account has access to the file via folder sharing
Excel file skipped
Problem:
- Excel files not appearing in schema
- "Exceeds 50MB limit" warning in logs
Solutions:
- Check file size:
- Excel files larger than 50 MB are skipped
- Consider splitting large workbooks into smaller files
- Check file format:
- Only `.xlsx` (Office Open XML) format is supported
- Legacy `.xls` files are not supported -- save as `.xlsx`
Support
Need help? Contact us at support@supa-flow.io