Google Drive Source

Connect Google Drive as a source to sync structured data from CSV, TSV, Excel (.xlsx), and Google Sheets files stored in Drive folders.

Prerequisites

Before you begin, ensure you have:

  • A Google Drive folder containing the files you want to sync

For OAuth authentication (recommended):

  • A Google account with access to the Drive folder

For Service Account authentication:

  • A Google Cloud Platform (GCP) project with the Google Drive API enabled
  • The Google Sheets API enabled if you plan to sync Google Sheets files
  • A GCP Service Account with a JSON key file
  • The Drive folder shared with the service account email as Viewer (or higher)

How Data is Organized

The Google Drive connector maps files and folders in Drive to tables and rows in your destination. The mapping depends on the file type you select.

CSV and TSV Files: Folders as Tables

For CSV and TSV files, each folder becomes a table. All files within a folder are treated as containing rows for the same table. This means:

  • CSV/TSV files in a folder can have slightly different columns; Supaflow unions discovered columns and fills missing values with null
  • Rows from all files in the folder are combined into one table
  • The table name is derived from the folder name (normalized to lowercase with underscores)

Google Drive Folder (configured)
├── customers/                 --> Table: "customers"
│   ├── customers_jan.csv      (rows from this file)
│   ├── customers_feb.csv      (rows from this file)
│   └── customers_mar.csv      (rows from this file)
├── orders/                    --> Table: "orders"
│   ├── orders_2024.csv        (rows from this file)
│   └── orders_2025.csv        (rows from this file)
└── products.csv               --> Table: "<root_folder_name>"
                                   (files in root become a table
                                   named after the root folder)

Key points:

  • Supaflow scans the configured folder and its immediate child folders (one level deep)
  • Files directly in the root folder are grouped into a single table named after the root folder
  • Each child folder with matching files becomes a separate table
  • During schema discovery, Supaflow samples up to Schema Sample Size files to infer column names and data types. Columns are unioned across sampled files, so files with slightly different columns are supported (missing columns become null).
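
The folder-to-table grouping described above can be sketched in a few lines of Python. This is only an illustration of the rules on this page, not Supaflow's implementation; the function name `group_files_into_tables` and the use of relative paths as input are assumptions made for the example, and name normalization is reduced to lowercasing.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_files_into_tables(root_name, relative_paths):
    """Map file paths (relative to the configured folder) to table names.

    Hypothetical helper: files in the root go to a table named after the
    root folder, files in an immediate child folder go to a table named
    after that folder, and anything deeper is ignored.
    """
    tables = defaultdict(list)
    for path in relative_paths:
        parts = PurePosixPath(path).parts
        if len(parts) == 1:
            # File directly in the configured (root) folder.
            tables[root_name.lower()].append(path)
        elif len(parts) == 2:
            # File in an immediate child folder.
            tables[parts[0].lower()].append(path)
        # More than two parts: deeper than one level, out of scope for CSV/TSV.
    return dict(tables)
```

With a root folder named "exports", `products.csv` lands in the `exports` table, while a file such as `archive/2023/old.csv` is not picked up because it sits two levels down.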

Excel and Google Sheets: Worksheets as Tables

For Excel (.xlsx) and Google Sheets files, each worksheet (tab) becomes a table. The table name is derived from both the file name and the worksheet name:

Table name = <file_name>_<worksheet_name>

Both components are normalized using the rules listed under Table Name Normalization below (lowercased, with spaces and special characters replaced by underscores).

Google Drive Folder (configured)
├── Q1 Report.xlsx
│   ├── Revenue                --> Table: "q1_report_revenue"
│   └── Expenses               --> Table: "q1_report_expenses"
├── subfolder/
│   └── Inventory.xlsx
│       └── Sheet1             --> Table: "inventory_sheet1"
└── Sales Pipeline (Google Sheets file)
    ├── Deals                  --> Table: "sales_pipeline_deals"
    └── Leads                  --> Table: "sales_pipeline_leads"

Key points:

  • Supaflow searches the configured folder and all subfolders recursively for Excel and Google Sheets files
  • Each non-empty worksheet with at least a header row and one data row becomes a table
  • The first row of each worksheet is treated as the header row
  • Empty worksheets are automatically skipped
  • Excel files larger than 50 MB are skipped with a warning
  • Google Sheets files are exported to xlsx format for processing; files exceeding Google's export limit (~10 MB) are skipped with a warning

Table Name Normalization

All table names are normalized for compatibility with destination systems:

  • Converted to lowercase
  • Spaces and special characters replaced with underscores
  • Multiple consecutive underscores collapsed to one
  • Leading digits prefixed with underscore

Examples:

Source Name                             Normalized Table Name
Customer Data (folder)                  customer_data
Q1 Report.xlsx / Revenue (worksheet)    q1_report_revenue
2024 Sales (folder)                     _2024_sales
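
To predict a table name before running a sync, the four rules above can be approximated as follows. This is a sketch under stated assumptions: the helper names are hypothetical, non-ASCII characters are treated as special characters, and the `.xlsx` extension is assumed to be dropped before the file and worksheet names are joined (the examples on this page suggest it, but the exact behavior is not documented here).

```python
import re


def normalize_name(name):
    """Apply the documented rules: lowercase, replace, collapse, prefix."""
    lowered = name.lower()
    # Spaces and special characters become underscores.
    replaced = re.sub(r"[^a-z0-9_]", "_", lowered)
    # Multiple consecutive underscores collapse to one.
    collapsed = re.sub(r"_+", "_", replaced)
    # A leading digit gets an underscore prefix.
    if collapsed[:1].isdigit():
        collapsed = "_" + collapsed
    return collapsed


def worksheet_table_name(file_name, worksheet_name):
    """Build <file_name>_<worksheet_name> for Excel and Google Sheets files."""
    # Assumption: a trailing .xlsx extension is not part of the table name.
    if file_name.lower().endswith(".xlsx"):
        file_name = file_name[: -len(".xlsx")]
    return normalize_name(f"{file_name}_{worksheet_name}")
```

Under these assumptions the sketch reproduces the three examples in the table: `customer_data`, `q1_report_revenue`, and `_2024_sales`.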

System Fields

Every table includes two system fields added automatically by the connector:

Field            Type    Description
_supa_file_name  string  The original file name from Google Drive
_supa_file_id    string  The unique Google Drive file ID

These fields allow you to trace every row back to its source file in your destination.


Google Cloud Setup

OAuth is simpler

If you use OAuth authentication, you can skip this entire section. Just click Authorize in the source configuration and sign in with your Google account. The steps below are only needed for Service Account authentication.

Step 1: Create a Service Account

  1. Go to the Google Cloud Console
  2. Select or create a project
  3. Navigate to IAM & Admin > Service Accounts
  4. Click Create Service Account
  5. Give it a name (e.g., "supaflow-drive-reader") and click Create
  6. Skip the optional role assignment steps and click Done

Step 2: Enable Required APIs

  1. Navigate to APIs & Services > Library
  2. Search for and enable:
    • Google Drive API
    • Google Sheets API (required only if syncing Google Sheets files)

Step 3: Create a JSON Key

  1. Go to IAM & Admin > Service Accounts
  2. Click on the service account you created
  3. Go to the Keys tab
  4. Click Add Key > Create new key > JSON
  5. Save the downloaded JSON file securely

Key Security

The JSON key file contains credentials for your service account. Store it securely and never commit it to version control. In Supaflow, the key is encrypted at rest.
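
Before uploading the key, a quick local check can catch a corrupted download or the wrong kind of credential file. The sketch below only inspects three fields that Google service account key files contain (`type`, `client_email`, `private_key`); it does not contact Google, and the function name `check_service_account_key` is made up for this example.

```python
import json

REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_service_account_key(raw_json):
    """Return a list of problems found in the key text; empty if none."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A service account key declares its type explicitly.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"type is {key['type']!r}, expected 'service_account'")
    return problems
```

The `client_email` value in the file is the address to share the Drive folder with in Step 4.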

Step 4: Share Your Drive Folder

  1. Open Google Drive and navigate to the folder containing your data files
  2. Right-click the folder and select Share
  3. Enter the service account email address (e.g., supaflow-drive-reader@your-project.iam.gserviceaccount.com)
  4. Set permission to Viewer
  5. Click Send

The service account now has read access to the folder and all files within it.


Configuration

In Supaflow, create a new Google Drive source with these settings:

Authentication

Authentication Method*

Select your authentication method
Options:

  • oauth - Sign in with your Google account and authorize Supaflow to read your Drive files. No GCP project setup required.
  • service_account - Authenticate using a GCP Service Account JSON key (see Google Cloud Setup above)

Default: oauth

OAuth

Authorize

Click Authorize to open the Google sign-in flow. Sign in with the Google account that has access to the Drive folder you want to sync and grant Supaflow read-only access. The authorization token is stored encrypted.

Service Account

Service Account Key*

Upload your Google Service Account JSON key file. The service account must have Google Drive API access (and Google Sheets API access if syncing Google Sheets).
Stored encrypted


Configuration

Folder URL*

Google Drive folder URL or folder ID. You can paste the full URL from your browser or just the folder ID.
Example: https://drive.google.com/drive/folders/1jPbXXTs_ZYmb...
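
If you generate source configurations programmatically, it can help to reduce either input form to a bare folder ID first. The following is a minimal sketch, assuming folder URLs contain a `/folders/<id>` segment and that IDs consist of letters, digits, hyphens, and underscores; `extract_folder_id` is a hypothetical helper, not part of Supaflow.

```python
import re

_FOLDER_SEGMENT = re.compile(r"/folders/([A-Za-z0-9_-]+)")
_BARE_ID = re.compile(r"[A-Za-z0-9_-]+")


def extract_folder_id(value):
    """Accept a full Drive folder URL or a bare folder ID; return the ID."""
    value = value.strip()
    match = _FOLDER_SEGMENT.search(value)
    if match:
        return match.group(1)
    if _BARE_ID.fullmatch(value):
        return value
    raise ValueError(f"cannot find a folder ID in {value!r}")
```

Query parameters such as `?usp=sharing` that the browser appends to the URL are ignored by the pattern.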

File Type*

Type of files to read from the folder
Options:

  • CSV - Comma-separated values files
  • TSV - Tab-separated values files
  • EXCEL - Excel .xlsx files (each worksheet becomes a table)
  • GOOGLE_SHEETS - Google Sheets files (each worksheet becomes a table)

Default: CSV

File Pattern

Optional glob pattern to filter files by name (case-insensitive). Only files matching the pattern are included in schema discovery and sync.
Example: *.csv, sales_*.xlsx, report_??.csv
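
To check a pattern against your file names before saving the source, Python's `fnmatch` module gives a reasonable approximation. This assumes the connector uses shell-style globbing where `*` matches any run of characters and `?` matches exactly one; the exact dialect is not specified on this page, so treat the result as a preview rather than a guarantee.

```python
from fnmatch import fnmatchcase


def matches_pattern(file_name, pattern):
    """Case-insensitive glob match, emulated by lowercasing both sides."""
    return fnmatchcase(file_name.lower(), pattern.lower())
```

For example, `report_??.csv` matches `report_01.csv` but not `report_001.csv`, because each `?` stands for a single character.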


CSV Settings

These settings apply when File Type is CSV or TSV.
Delimiter applies only to CSV; for TSV, the tab character is always used.
Quote Character, Encoding, Has Header Row, Skip Header Lines, Skip Footer Lines, and Null Sequence apply to both CSV and TSV.

Delimiter

Column delimiter character. For TSV files, the tab character is used automatically.
Default: ,

Quote Character

Character used to quote fields that contain the delimiter or newlines
Default: " (double-quote)

Encoding

File character encoding
Default: UTF-8

Has Header Row

Whether the first row (after any skipped header lines) contains column headers. If disabled, columns are named col_1, col_2, etc.
Default: true

Skip Header Lines

Number of raw text lines to skip before the header/data rows. Useful for files with comment lines or metadata at the top.
Default: 0
Range: 0 to 100

Skip Footer Lines

Number of lines to skip at the end of the file. Useful for files with summary rows or footers.
Default: 0
Range: 0 to 100

Null Sequence

String value to treat as null (e.g., NULL, \N, N/A). Leave empty to disable null detection.
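
To see how these settings interact on a sample file, the sketch below applies them with Python's standard `csv` module. It is an approximation for local experimentation, not the connector's parser: the function name and parameter names are invented to mirror the settings above, and skipping is done on raw text lines, so a quoted field that spans lines in the skipped region would not be handled.

```python
import csv
import io


def read_delimited(text, delimiter=",", quotechar='"', has_header=True,
                   skip_header_lines=0, skip_footer_lines=0, null_sequence=""):
    """Parse delimited text into a list of dicts using the settings above."""
    lines = text.splitlines(keepends=True)
    # Drop raw lines at the top and bottom before any CSV parsing happens.
    end = max(0, len(lines) - skip_footer_lines)
    body = "".join(lines[skip_header_lines:end])

    rows = list(csv.reader(io.StringIO(body), delimiter=delimiter, quotechar=quotechar))
    if not rows:
        return []

    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # Without a header row, columns are named col_1, col_2, ...
        header = [f"col_{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows

    def to_null(value):
        # An empty null_sequence disables null detection.
        return None if null_sequence and value == null_sequence else value

    return [{name: to_null(value) for name, value in zip(header, row)} for row in data]
```

For a TSV file, pass `delimiter="\t"`; the remaining arguments behave the same way.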


Advanced Settings

Schema Sample Size

Number of files to sample during schema inference. Higher values produce more accurate schemas but take longer.
Default: 5
Range: 1 to 50
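
The effect of the sample size on the discovered schema can be illustrated with a small sketch of column unioning. It assumes columns are collected in first-seen order from the first N sampled files, which is one plausible reading of the behavior described on this page; the helper names are hypothetical and type inference is left out.

```python
def union_columns(file_headers, sample_size=5):
    """Union column names across the first `sample_size` file headers."""
    columns = []
    for header in file_headers[:sample_size]:
        for name in header:
            if name not in columns:
                columns.append(name)
    return columns


def align_row(row, columns):
    """Fill columns that a file does not provide with None (null)."""
    return {name: row.get(name) for name in columns}
```

In this model, a column that first appears in a file outside the sample never makes it into the schema, which is why raising Schema Sample Size can recover missing columns.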

Lookback Time Seconds

Seconds to subtract from the incremental sync lower bound to catch late-modified files. Use this if files in your Drive folder may be modified slightly after the sync window closes.
Default: 0
Range: 0 to 86400
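
The arithmetic behind this setting is simple: the lookback is subtracted from the point where the previous sync left off. A minimal sketch, assuming the sync state is a timezone-aware timestamp (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone


def incremental_lower_bound(last_sync, lookback_seconds=0):
    """Return the earliest modification time the next sync will consider."""
    if not 0 <= lookback_seconds <= 86400:
        raise ValueError("lookback_seconds must be between 0 and 86400")
    return last_sync - timedelta(seconds=lookback_seconds)


# Example: a 5-minute lookback moves the lower bound from 12:00 to 11:55 UTC.
last_sync = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
print(incremental_lower_bound(last_sync, lookback_seconds=300))
```

A larger lookback widens the window, so files modified shortly before the previous sync finished are read again on the next run.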

Error Handling

How to handle file-level errors during sync
Options:

  • FAIL - Stop on the first file error (default)
  • SKIP - Skip failed files and continue with remaining files

Default: FAIL
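
The difference between the two modes can be sketched as a loop over files. This is an illustration only: `sync_files` and the `read_file` callback are hypothetical names, and the real connector may record skipped files differently.

```python
def sync_files(files, read_file, error_handling="FAIL"):
    """Read each file; on error either stop (FAIL) or record and continue (SKIP)."""
    if error_handling not in ("FAIL", "SKIP"):
        raise ValueError("error_handling must be 'FAIL' or 'SKIP'")

    rows, skipped = [], []
    for name in files:
        try:
            rows.extend(read_file(name))
        except Exception as exc:
            if error_handling == "FAIL":
                # Stop on the first file error and surface it to the caller.
                raise
            # SKIP: remember the failure and move on to the next file.
            skipped.append((name, str(exc)))
    return rows, skipped
```

With SKIP, a single malformed file no longer blocks the rest of the folder, at the cost of its rows being absent until the file is fixed.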


Test & Save

After configuring all required properties, click Test & Save to verify your connection and save the source.

The connection test verifies that:

  • Your credentials are valid (OAuth token or service account key)
  • The Google Drive API is accessible
  • The configured folder exists and is readable

Incremental Sync

The Google Drive connector supports incremental sync using file modification timestamps:

  1. Initial sync: All matching files in scope for the selected file type are read
  2. Subsequent syncs: Only files modified since the last sync are read

This is managed automatically by the connector. There is no cursor column in the schema -- the connector tracks sync state internally using Google Drive's modifiedTime metadata on each file.

How it works for each file type:

File Type      Incremental Unit  Behavior
CSV / TSV      Individual file   Only re-reads files modified since last sync. Unmodified files are skipped entirely.
Excel          Entire file       If any worksheet is modified (the file's modifiedTime changes), all worksheets in that file are re-read.
Google Sheets  Entire file       Same as Excel -- if the file is modified, all worksheets are re-read.

Full Table Re-Read for Spreadsheets

Because Google Drive tracks modification time at the file level (not worksheet level), any edit to an Excel or Google Sheets file causes all worksheets from that file to be re-synced. For large spreadsheets, keep in mind that a small edit triggers a full re-read of the file.
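
File selection during an incremental sync can be pictured as a filter on each file's `modifiedTime`, which the Drive API returns as an RFC 3339 string such as `2025-01-10T12:30:00.000Z`. The sketch below assumes a strict "later than" comparison against the lower bound and uses plain dictionaries in place of API responses; both are assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_modified_time(value):
    """Parse a Drive modifiedTime string into a UTC datetime."""
    # Older Python versions do not accept a trailing "Z", so spell out the offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def files_to_read(files, lower_bound):
    """Return the files an incremental sync would re-read.

    lower_bound=None models the initial sync, where every file is read.
    """
    if lower_bound is None:
        return list(files)
    return [f for f in files if parse_modified_time(f["modifiedTime"]) > lower_bound]
```

Because the check is per file, a selected Excel or Google Sheets file brings all of its worksheets along, even if only one tab changed.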


Troubleshooting

Connection test fails with authentication error

Problem:

  • "Failed to initialize Google Drive connector" error
  • Authentication-related error messages

If using OAuth:

  1. Re-authorize: Click Authorize again and complete the Google sign-in flow. OAuth tokens can expire or be revoked.
  2. Check folder access: Ensure the Google account you signed in with has at least Viewer access to the configured folder.
  3. Google Sheets API: If syncing Google Sheets files, the Google Sheets API must be enabled in the GCP project backing the OAuth client.

If using Service Account:

  1. Verify the JSON key file:
    • Ensure you uploaded the correct service account JSON key
    • Check that the key has not been revoked in GCP Console
    • Verify the key file is valid JSON (not corrupted)
  2. Check API access:
    • Ensure the Google Drive API is enabled in your GCP project
    • If syncing Google Sheets, ensure the Google Sheets API is also enabled
  3. Verify service account permissions:
    • The service account needs no special IAM roles -- Drive folder sharing provides access

No objects found during schema discovery

Problem:

  • Schema discovery returns zero tables
  • "No objects discovered" message

Solutions:

  1. Verify folder access:
    • OAuth: Ensure the Google account you authorized has access to the folder
    • Service Account: Open the Drive folder and check that the service account email appears in the sharing settings with at least Viewer access
  2. Check file type matches:
    • Ensure the File Type setting matches the actual files in the folder
    • CSV files must have MIME type text/csv; TSV files must have text/tab-separated-values or text/plain
    • Excel files must be .xlsx format (.xls is not supported)
  3. Check folder structure:
    • For CSV/TSV: files must be directly in the root folder or in immediate child folders (not deeper)
    • For Excel/Google Sheets: files are found recursively in all subfolders
  4. Check file pattern:
    • If you set a File Pattern, verify it matches your file names
    • Try removing the pattern to see all files

Missing columns or incorrect data types

Problem:

  • Some columns are missing from discovered schema
  • Data types are wrong (e.g., numbers detected as strings)

Solutions:

  1. Increase schema sample size:
    • Set Schema Sample Size to a higher value (e.g., 20 or 50)
    • More samples produce more accurate column unions and type inference
  2. Check header consistency:
    • For CSV/TSV: files can have different headers, but more consistent headers produce cleaner schemas
    • Column names are normalized (lowercased, special characters to underscores)
    • Duplicate column names get numeric suffixes (name, name_1, name_2)
  3. Verify encoding:
    • Ensure the Encoding setting matches your files (default: UTF-8)
    • Non-UTF-8 files may produce garbled column names
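
When a column seems to be missing, it may have been renamed by normalization or by duplicate handling. The sketch below mimics the two rules mentioned above (lowercasing with special characters turned into underscores, then numeric suffixes for duplicates). The helper names are hypothetical, and edge cases such as a source column already called `name_1` are not covered.

```python
import re


def normalize_column(name):
    """Lowercase the name and turn special characters into underscores."""
    return re.sub(r"[^a-z0-9_]", "_", name.lower())


def dedupe_columns(names):
    """Append _1, _2, ... to repeated names, leaving the first occurrence as-is."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return result
```

Headers that differ only by case or punctuation, such as `Name` and `name`, collide after normalization and end up as `name` and `name_1`.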

Incremental sync re-reads all files

Problem:

  • Every sync reads all files instead of just modified ones

Solutions:

  1. Check initial sync completed:
    • The first sync always reads all files
    • Incremental behavior starts from the second sync onward
  2. Verify files are not being re-saved:
    • Google Drive updates modifiedTime whenever a file is opened and saved, even without changes
    • Automated processes that touch files may trigger re-reads
  3. Check lookback time:
    • A high Lookback Time Seconds value expands the sync window, potentially including more files

Google Sheets export fails

Problem:

  • "Cannot export Google Sheets file (possibly exceeds 10MB limit)" warning
  • Google Sheets files not appearing in schema

Solutions:

  1. Check file size:
    • Google limits Sheets exports to approximately 10 MB
    • Large spreadsheets with many rows or complex formulas may exceed this limit
    • Consider splitting large sheets into smaller files
  2. Check permissions:
    • The Google Sheets API must be enabled in the GCP project (applies to both OAuth and service account authentication)
    • Service Account: Verify the service account has access to the file via folder sharing

Excel file skipped

Problem:

  • Excel files not appearing in schema
  • "Exceeds 50MB limit" warning in logs

Solutions:

  1. Check file size:
    • Excel files larger than 50 MB are skipped
    • Consider splitting large workbooks into smaller files
  2. Check file format:
    • Only .xlsx (Office Open XML) format is supported
    • Legacy .xls files are not supported -- save as .xlsx

Support

Need help? Contact us at support@supa-flow.io