
How to Sync Google Drive Files to Snowflake with Supaflow

8 min read
Puneet Gupta
Founder, Supaflow

Got files sitting in Google Drive that you need in Snowflake? Whether they are CSVs, Excel workbooks, TSVs, or Google Sheets, Supaflow's Google Drive connector can sync them into Snowflake tables automatically -- with schema discovery, incremental sync, and schema evolution built in.

This guide walks through the setup end to end using CSV files, but the same workflow applies to all supported file types.

What the Google Drive Connector Does

The Google Drive connector reads structured data from a Drive folder and loads it into your destination. It supports four file types:

File Type | How Tables Are Created | Scan Depth
CSV | Each folder becomes a table; all CSVs in a folder are combined as rows | Root + one level of subfolders
TSV | Same as CSV (tab-delimited) | Root + one level of subfolders
Excel (.xlsx) | Each worksheet becomes a table (named <file>_<sheet>) | Recursive through all subfolders
Google Sheets | Each worksheet becomes a table (named <file>_<sheet>) | Recursive through all subfolders

For CSV and TSV, files within the same folder can even have slightly different columns -- Supaflow unions them and fills missing values with null. For Excel and Google Sheets, each non-empty worksheet with a header row becomes its own table.

Every row includes two system fields (_supa_file_name and _supa_file_id) so you can always trace data back to its source file.
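To make the union behavior concrete, here is a minimal local sketch of how rows from several CSVs in one folder could be combined into a single table: columns are unioned in first-seen order, a column missing from a file becomes None (loaded as null), and every row is tagged with the two system fields. This is an illustration under assumptions, not Supaflow's implementation; the `union_csv_folder` helper and the file IDs passed to it are hypothetical stand-ins for real Drive file IDs.

```python
import csv
import io


def union_csv_folder(files):
    """Combine CSV files from one folder into one table (illustrative sketch).

    `files` is a list of (file_name, file_id, csv_text) tuples; the file IDs
    are hypothetical placeholders for Google Drive file IDs.
    Returns (columns, rows), where each row is a dict.
    """
    columns = []  # union of all headers, in first-seen order
    parsed = []
    for file_name, file_id, text in files:
        reader = csv.DictReader(io.StringIO(text))
        for field in reader.fieldnames or []:
            if field not in columns:
                columns.append(field)
        parsed.append((file_name, file_id, list(reader)))

    rows = []
    for file_name, file_id, records in parsed:
        for record in records:
            # A column this file does not have comes back as None (null).
            row = {col: record.get(col) for col in columns}
            # System fields that trace each row back to its source file.
            row["_supa_file_name"] = file_name
            row["_supa_file_id"] = file_id
            rows.append(row)
    return columns + ["_supa_file_name", "_supa_file_id"], rows
```

For example, combining an `account1.csv` with columns `Name,NumEmp,Industry` and an `account2.csv` with `Name,Industry,Region` would yield one table with all four data columns, where `Region` is None for rows from the first file and `NumEmp` is None for rows from the second.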

Incremental sync is supported out of the box. After the initial full sync, subsequent runs only pick up files that have been modified since the last run -- no re-processing the entire folder each time.
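Conceptually, file-level incremental sync comes down to keeping a cursor and comparing it with each file's modification time. The sketch below assumes file metadata shaped like the Google Drive API v3 file resource, which reports `modifiedTime` as an RFC 3339 string; how Supaflow actually stores its cursor is not described here, so the `select_changed_files` helper is hypothetical.

```python
from datetime import datetime


def parse_rfc3339(ts):
    # Drive timestamps look like "2024-05-01T12:00:00.000Z"; older Python
    # versions of fromisoformat() reject the "Z" suffix, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def select_changed_files(files, cursor):
    """Return (changed_files, new_cursor) for one sync run (illustrative).

    files: list of dicts with "name" and "modifiedTime" (RFC 3339 strings).
    cursor: timezone-aware datetime from the previous run, or None.
    """
    if cursor is None:
        changed = list(files)  # first run: historical full sync
    else:
        changed = [f for f in files if parse_rfc3339(f["modifiedTime"]) > cursor]

    # Advance the cursor to the newest modification time actually seen,
    # rather than the wall clock, so local clock skew cannot skip files.
    new_cursor = cursor
    for f in changed:
        ts = parse_rfc3339(f["modifiedTime"])
        if new_cursor is None or ts > new_cursor:
            new_cursor = ts
    return changed, new_cursor
```

On a second run, only files whose `modifiedTime` is later than the stored cursor are returned, which is why an unchanged folder costs almost nothing to re-check.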

Prerequisites

  • A Supaflow account (sign up at app.supa-flow.io)
  • A Google Drive folder with data files (CSV, TSV, Excel, or Google Sheets) -- see the Google Drive source docs for full setup details
  • A Snowflake account with a warehouse, database, and schema ready to receive data -- see the Snowflake destination docs for connection options

Step 1: Prepare Your Data in Google Drive

Create a folder structure in Google Drive for Supaflow to discover. For CSV and TSV files, each folder becomes a table. For Excel and Google Sheets, each worksheet becomes a table.

For this walkthrough, we will use CSV files. Create a folder called accounts and add a file called account1.csv with content like this:

Name,NumEmp,Industry
Company A,25,Heavy Industry
Company B,36,Mining
Company C,47,Aerospace
Company D,58,Fiber Optics

You can add as many folders and files as you need. Supaflow discovers them all automatically. If you have Excel or Google Sheets files instead, the same workflow applies -- just select the matching file type when creating the source.
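Before uploading, it can help to sanity-check files locally: a missing header or a row with the wrong number of fields is easier to fix in Drive than after a sync. This is a minimal sketch using Python's standard `csv` module; the `check_delimited_file` helper is hypothetical and not part of Supaflow, and the comma delimiter and double-quote character are the conventional CSV values, assumed here rather than taken from Supaflow's defaults. Pass `delimiter="\t"` for TSV files.

```python
import csv
import io


def check_delimited_file(text, delimiter=",", quotechar='"'):
    """Return a list of problems found in a delimited file (empty if none)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar))
    if not rows:
        return ["file is empty"]

    header = rows[0]  # assumes the first row is the header row
    problems = []
    if any(not name.strip() for name in header):
        problems.append("header row has a blank column name")
    if len(set(header)) != len(header):
        problems.append("header row has duplicate column names")

    # Every non-blank data row should have as many fields as the header.
    for row_no, row in enumerate(rows[1:], start=2):
        if row and len(row) != len(header):
            problems.append(f"row {row_no}: expected {len(header)} fields, got {len(row)}")
    return problems
```

Running it on the `account1.csv` sample above should report no problems.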

Step 2: Create a Snowflake Destination

Before creating a pipeline, you need to configure where the data will land. Go to the Destinations page in Supaflow.

[Screenshot: Destinations page]

Click Create Destination and select Snowflake. Fill in your Snowflake connection details:

  1. Authentication Type -- select basic from the dropdown first
  2. Username and Password -- your Snowflake credentials
  3. Account Identifier -- your Snowflake account URL (e.g., XXXXXXX-YYYYYYY.snowflakecomputing.com)
  4. Warehouse, Database, and Schema -- where the data will be loaded
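The account example above is in host form. If you copy the address from your browser instead, it usually carries an `https://` scheme and a path. The hypothetical helper below is a small sketch that reduces either a pasted URL or a bare identifier to the host form shown in the example; whether Supaflow also accepts the bare identifier is not covered here, so check the Snowflake destination docs.

```python
from urllib.parse import urlparse

SUFFIX = ".snowflakecomputing.com"


def normalize_account_host(value):
    """Reduce a pasted Snowflake URL or identifier to host form (sketch).

    "https://abc-xyz.snowflakecomputing.com/console" -> "abc-xyz.snowflakecomputing.com"
    "abc-xyz"                                        -> "abc-xyz.snowflakecomputing.com"
    """
    value = value.strip()
    if "://" in value:
        # urlparse drops the scheme, path, and fragment; hostname is lowercased.
        value = urlparse(value).hostname or ""
    else:
        value = value.split("/", 1)[0]
    value = value.lower()  # account hosts are case-insensitive
    if not value.endswith(SUFFIX):
        value += SUFFIX
    return value
```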

Click Test & Save to verify the connection.

[Screenshot: Create destination form]

Step 3: Create a Google Drive Source

Next, set up the Google Drive source so Supaflow can read your files. Go to the Sources page and click Create Source. Select Google Drive from the list of available source types.

[Screenshot: Select source type]

Fill in the source configuration:

  1. Source Name -- give it a descriptive name (e.g., "Google Drive")
  2. Authentication Method -- select oauth and click Authorize to sign in with your Google account and grant Supaflow read access to your Drive
  3. Folder URL -- paste the full URL of the Google Drive folder you want to sync (e.g., https://drive.google.com/drive/folders/...)
  4. File Type -- select the format of your files: CSV, TSV, EXCEL, or GOOGLE_SHEETS
  5. CSV/TSV Settings (shown when CSV or TSV is selected) -- configure delimiter, quote character, encoding, and whether the file has a header row. The defaults work for most standard files.
  6. File Pattern (optional) -- a glob pattern to filter files by name (e.g., sales_*.csv). Leave empty to include all files of the selected type.
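To preview which files a pattern like sales_*.csv would select, you can try it locally with Python's `fnmatch` module. This is only an approximation: the sketch assumes shell-style glob semantics and case-sensitive matching, and Supaflow's exact matching rules may differ. The `filter_by_pattern` helper is hypothetical.

```python
from fnmatch import fnmatchcase


def filter_by_pattern(names, pattern=None):
    """Keep file names matching a glob pattern; no pattern keeps everything."""
    if not pattern:
        return list(names)
    # fnmatchcase is case-sensitive, so "Sales_old.csv" will not match "sales_*.csv".
    return [name for name in names if fnmatchcase(name, pattern)]
```

With the names `sales_2024.csv`, `sales_q1.csv`, `accounts.csv`, and `Sales_old.csv`, the pattern `sales_*.csv` keeps only the first two.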

Click Test & Save to verify the connection. Supaflow will confirm it can access the folder and read files.

[Screenshot: Create source form]

For the full list of supported file types and configuration options, see the Google Drive source docs.

Step 4: Open Your Project

Navigate to the project where you configured the Google Drive source and click Open to enter it.

[Screenshot: Project card]

Step 5: Create a Pipeline

Inside the project, click Create Pipeline to launch the pipeline wizard.

[Screenshot: No pipelines yet]

Choose Source

Select Google Drive from the list of available sources and click Continue.

[Screenshot: Choose source]

Configure Pipeline Settings

Set the sync behavior for your pipeline:

  • Ingestion Mode: Historical + Incremental -- does a full sync on the first run, then only picks up changes on subsequent runs
  • Load Mode: Merge -- inserts new records and updates existing ones based on the primary key
  • Schema Evolution Mode: Allow All Changes -- automatically propagates column additions and type changes to Snowflake
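Merge behaves like an upsert: a row whose key already exists replaces the stored values, and a row with a new key is inserted, so re-syncing a modified file does not duplicate its rows the way a plain append would. The sketch below models the destination table as a Python dict. The primary key Supaflow derives for Drive data is not described in this guide, so using `Name` as the key is purely an assumption for illustration, and `merge_rows` is a hypothetical helper.

```python
def merge_rows(table, incoming, key):
    """Upsert `incoming` rows into `table`, a dict keyed by primary-key tuple.

    Returns (inserted_count, updated_count).
    """
    inserted = updated = 0
    for row in incoming:
        pk = tuple(row[k] for k in key)
        if pk in table:
            # Existing key: values in the incoming row overwrite stored ones.
            table[pk] = {**table[pk], **row}
            updated += 1
        else:
            table[pk] = dict(row)
            inserted += 1
    return inserted, updated
```

Merging `Company A` with a new `NumEmp` value plus a brand-new `Company E` into a table that already holds Companies A and B reports one update and one insert, leaving three rows.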

[Screenshot: Configure pipeline]

Choose Objects to Sync

Select which tables (folders for CSV/TSV, worksheets for Excel and Google Sheets) and fields (columns) to sync. Supaflow auto-discovers your Google Drive folder structure and shows all available objects.

Tip: If you recently added a folder to Google Drive and it does not appear in the list, click Refresh Schema in the top-right corner. Supaflow will re-scan your Google Drive and pick up changes. Note that for CSV/TSV, Supaflow scans the root folder and its immediate child folders (one level deep). For Excel and Google Sheets, it scans all subfolders recursively.
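The two scan depths can be modeled with a small folder tree: CSV/TSV discovery looks only at the root and its immediate children, while Excel and Google Sheets discovery walks every level and names each table <file>_<sheet>. This is a sketch under several assumptions not stated in the docs: the tree is a plain dict, root-level CSVs form a table named after the root folder, the file extension is dropped from the table name, and each `sheets` list already excludes empty worksheets. Both helpers are hypothetical.

```python
import os


def csv_tables(root, ext=".csv"):
    """Map table name -> file names, scanning root plus one level of subfolders."""
    tables = {}
    for folder in [root] + root.get("folders", []):  # depth 0 and depth 1 only
        files = [f["name"] for f in folder.get("files", []) if f["name"].endswith(ext)]
        if files:
            tables[folder["name"]] = files
    return tables


def sheet_tables(root):
    """Return sorted "<file>_<sheet>" table names, recursing through all subfolders."""
    names = []
    stack = [root]
    while stack:
        folder = stack.pop()
        for f in folder.get("files", []):
            base = os.path.splitext(f["name"])[0]  # "plan.xlsx" -> "plan"
            for sheet in f.get("sheets", []):
                names.append(f"{base}_{sheet}")
        stack.extend(folder.get("folders", []))  # keep descending
    return sorted(names)
```

In this model, a CSV placed two levels down (for example accounts/archive/old.csv) never becomes a table, while a workbook at the same depth is still discovered, which matches the scan-depth difference described in the tip.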

[Screenshot: Choose objects to sync]

Review and Save

Review the pipeline summary. Supaflow auto-generates a pipeline name and destination schema. Adjust any settings if needed, then click Create Pipeline.

[Screenshot: Review and save]

Step 6: Run the Pipeline

The pipeline is now active. Click Sync Now to trigger the first run.

[Screenshot: Pipeline settings]

Step 7: Monitor Progress

Switch to the Jobs tab to watch the pipeline run in real time.

[Screenshot: Job in progress]

Click into the job to see per-object progress -- ingested rows, loaded rows, duration, and status for each table.

[Screenshot: Job detail, running]

Once all objects complete, the job shows a Completed status with a full summary.

[Screenshot: Job completed]

Step 8: Verify in Snowflake

Log in to Snowflake, navigate to Catalog > Database Explorer, and browse to the database and schema you configured.

[Screenshot: Snowflake catalog]

Select any table to preview the data. You should see the rows from your CSV files loaded and ready to query.
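For a quick completeness check, count the data rows in your local copies of the CSVs and compare them with per-file counts in Snowflake, grouping the table by its _supa_file_name column. The sketch below covers only the local side; it assumes UTF-8 files with a single header row, and `local_row_counts` is a hypothetical helper.

```python
import csv
from pathlib import Path


def local_row_counts(folder):
    """Count data rows (excluding the header) in each CSV directly under `folder`."""
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            next(reader, None)  # skip the header row
            counts[path.name] = sum(1 for row in reader if row)  # ignore blank lines
    return counts
```

For the sample account1.csv above this returns 4 for that file, which should match the number of Snowflake rows whose _supa_file_name is account1.csv.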

[Screenshot: Snowflake data preview]

What Happens Next

After the initial sync, Supaflow handles ongoing changes automatically:

  • New or modified files are picked up on the next incremental run (only changed files are re-read, not the entire folder)
  • New folders appear as new tables after a schema refresh
  • Schema changes (new columns, type changes) are propagated to Snowflake based on your schema evolution settings
  • CSV/TSV files with slightly different columns across the same folder are unioned automatically -- missing values become null
  • Excel and Google Sheets edits trigger a full re-read of all worksheets in that file, since Drive tracks modification at the file level
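To reason about what schema evolution will do on the next run, it helps to diff the column sets before and after a change. This minimal sketch compares two column-to-type mappings; the Snowflake type names are illustrative inferred types, not values Supaflow is documented to produce, and how removed columns are handled is not covered in this guide, so the helper simply reports them. `diff_schema` is hypothetical.

```python
def diff_schema(old, new):
    """Compare two {column: type} mappings and describe the changes."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    type_changed = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {"added": added, "removed": removed, "type_changed": type_changed}
```

For example, if a new file in the accounts folder adds a Region column and puts text in NumEmp, the diff reports Region as added and NumEmp as a type change, the two kinds of change the Allow All Changes mode propagates.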

Every synced row includes _supa_file_name and _supa_file_id fields, so you can always trace data back to its source file in Google Drive.

You can schedule runs on a cadence or trigger them manually with Sync Now whenever you need fresh data. See the ingestion pipelines docs for more on pipeline configuration, scheduling, and sync modes.

Get Started

Sign up at app.supa-flow.io and connect your Google Drive in minutes. For full details, check out the Google Drive source docs, the Snowflake destination docs, and the pipeline configuration guide. If you have questions, reach out at support@supa-flow.io.