
GitHub Source

Connect GitHub as a source to sync activity from one or many repositories into your warehouse. One source can cover a specific repository, every repository under an owner, or any repository whose name matches a prefix.

For an overview of capabilities and use cases, see the GitHub connector page. To run pipelines natively inside Snowflake, see Snowflake Native ETL.

Prerequisites

Before you begin, ensure you have:

  • A GitHub personal access token from github.com/settings/tokens
  • Read access to every repository you want to sync: public repositories work with a token that has no extra scopes; private repositories need the repo scope on a classic token, or repository read permissions on a fine-grained token
  • For rules that list an organization's repositories (for example my-org/*), the token also needs read:org

Supported Objects

Object | Sync Mode | Description
Repository Events | Incremental | Push, Issues, Pull Request, Create, Watch, and other event types from each repository's event feed.
Issues | Full Refresh | Every issue in the repository, open and closed, with up to 100 reactions and 100 comments per issue.
Pull Requests | Full Refresh | Every pull request, with up to 100 reactions and 100 comments per pull request.
Stargazers | Full Refresh | Users who starred the repository, with the timestamp of each star.

Not Yet Supported

The connector does not sync commits as a separate entity, releases, tags, workflows and Actions runs, check runs, discussions, milestones, teams, organizations as an entity, users as an entity, branches, gists, packages, security advisories, or deployment statuses.

GitHub Enterprise Server (self-hosted) is not supported -- the connector targets github.com only.

Contact us with the objects you need -- expansion is prioritized by customer demand.

Incremental Sync

Repository Events sync incrementally: only new events since the last successful run are fetched. Issues, Pull Requests, and Stargazers run full refresh (see Supported Objects).

Each repository is tracked independently. If you add a repository to your rules later, it starts from scratch rather than inheriting progress from other repositories, so older events are not silently skipped.
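Per-repository tracking can be pictured as a map from repository name to a cursor. A repository that has no entry yet starts from scratch, and advancing one repository's cursor never moves another's.

This is a minimal sketch built on a few assumptions. CursorStore and new_events are hypothetical names, not Supaflow's implementation. The cursor is assumed to be the created_at value of the newest synced event. Timestamps are assumed to share GitHub's UTC "YYYY-MM-DDTHH:MM:SSZ" format, which lets them be compared as strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CursorStore:
    """Hypothetical per-repository cursor store (illustration only)."""

    # Maps lowercased "owner/repo" to the created_at of the newest synced event
    cursors: Dict[str, str] = field(default_factory=dict)

    def cursor_for(self, repo: str) -> Optional[str]:
        # A repository added to the rules later has no entry, so it starts from scratch
        return self.cursors.get(repo.lower())

    def new_events(self, repo: str, events: List[dict]) -> List[dict]:
        """Return events newer than this repository's cursor; advance only its cursor."""
        cursor = self.cursor_for(repo)
        fresh = [e for e in events if cursor is None or e["created_at"] > cursor]
        if fresh:
            self.cursors[repo.lower()] = max(e["created_at"] for e in fresh)
        return fresh
```

Comparing with a strict ">" can skip an event that shares its exact timestamp with the cursor. The sketch accepts that simplification to keep the idea visible.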

Two GitHub-side limits apply. GitHub returns only the 300 most recent events per repository, and the events feed covers only the past 30 days. Very active repositories must sync often enough that the 300-event window does not roll past the cursor between runs. Repositories with no activity in the past 30 days return no events.
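The 300-event limit translates into a maximum sync interval. If a repository produces r events per hour, the window covers about 300 / r hours. The 30-day age limit is an upper bound in any case.

Here is a back-of-the-envelope helper. It is hypothetical, and it assumes a steady event rate; bursty repositories need extra headroom.

```python
FEED_EVENT_LIMIT = 300          # most recent events GitHub returns per repository
FEED_MAX_AGE_HOURS = 30 * 24    # the events feed covers only the past 30 days


def max_sync_interval_hours(events_per_hour: float) -> float:
    """Longest gap between syncs before the 300-event window can roll past the cursor."""
    if events_per_hour <= 0:
        # No activity: only the 30-day age limit applies
        return float(FEED_MAX_AGE_HOURS)
    return min(FEED_EVENT_LIMIT / events_per_hour, float(FEED_MAX_AGE_HOURS))
```

For example, a repository averaging 50 events per hour should sync at least every 6 hours (300 / 50).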

Authentication

The connector authenticates with a GitHub personal access token. Both classic and fine-grained tokens work, and no OAuth setup is required.

Create a Personal Access Token

  1. Sign in to GitHub with the account you want the connector to use
  2. Go to github.com/settings/tokens
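  3. Click Generate new token and choose classic or fine-grained
  4. Set an expiration and grant the scopes or repository permissions listed under Permissions
  5. Generate the token and copy it immediately; GitHub shows it only once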
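Before pasting a new token into Supaflow, you can check it yourself against GitHub's REST endpoint GET https://api.github.com/user. That endpoint returns the account the token belongs to, and a rejected token produces HTTP 401.

This is a standalone sketch using only Python's standard library; it is not part of Supaflow. build_token_check and check_token are hypothetical names. The headers follow GitHub's REST API conventions.

```python
import json
import urllib.request

API_URL = "https://api.github.com/user"  # returns the account the token belongs to


def build_token_check(token: str) -> urllib.request.Request:
    """Build a GET /user request authenticated with a personal access token."""
    return urllib.request.Request(
        API_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


def check_token(token: str) -> str:
    """Send the request and return the token owner's login.

    urlopen raises urllib.error.HTTPError (status 401) if GitHub rejects the token.
    """
    with urllib.request.urlopen(build_token_check(token), timeout=10) as resp:
        return json.load(resp)["login"]
```

Calling check_token("YOUR_TOKEN") should return the login of the account you want the connector to use.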

Permissions

The connector is read-only. Grant only what the token needs:

  • Public repositories: a classic token with no scopes selected is enough. A fine-grained token with Public repositories (read-only) access also works.
  • Private repositories: the repo scope on a classic token, or equivalent repository-level read permissions on a fine-grained token scoped to the repositories you want to sync.
  • Rules that list an organization's repositories (my-org/*, my-org/prefix*): also grant read:org so the connector can enumerate the organization's repositories.
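For classic tokens, GitHub reports the granted scopes in the X-OAuth-Scopes response header of an authenticated request, for example "repo, read:org". The rules above can then be checked mechanically. Fine-grained tokens use per-repository permissions instead of scopes, so this check applies to classic tokens only.

This is a minimal sketch; required_scopes and missing_scopes are hypothetical helpers. It assumes every wildcard or prefix rule targets an organization. It also does not model parent scopes: admin:org, for example, also covers read:org.

```python
from typing import Iterable, Set


def required_scopes(rules: Iterable[str], has_private_repos: bool) -> Set[str]:
    """Classic-token scopes implied by the Permissions list (hypothetical helper)."""
    scopes: Set[str] = set()
    if has_private_repos:
        scopes.add("repo")
    for rule in rules:
        rule = rule.strip()
        # owner/* and owner/prefix* rules enumerate an organization's repositories
        if rule and not rule.startswith("#") and rule.endswith("*"):
            scopes.add("read:org")
    return scopes


def missing_scopes(x_oauth_scopes: str, needed: Set[str]) -> Set[str]:
    """Return needed scopes absent from an X-OAuth-Scopes header value."""
    granted = {s.strip() for s in x_oauth_scopes.split(",") if s.strip()}
    return needed - granted
```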

Fine-grained tokens can see only the repositories on their access list, so a wildcard like my-org/* expands only to the repositories the token has been granted.

For long-term stability, use a machine account rather than an individual's token. If that person leaves and their account is deactivated, the token stops working and the pipeline fails.

Configuration

In Supaflow, create a new GitHub source with these settings:

Repositories*

Comma-separated list of rules that define the repositories to sync. Three forms are supported:

  • owner/repo -- a specific repository
  • owner/* -- every repository owned by that user or organization
  • owner/prefix* -- repositories whose name begins with prefix

You can mix forms in a single value. Entries that start with # are treated as comments and skipped.
Example: octocat/hello-world, my-org/*, my-org/backend-*
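The three rule forms, comment handling, and case-insensitive prefix matching can be expressed as a small parser and matcher. Prefix matching is anchored at the start of the repository name, as described under Troubleshooting.

This sketch illustrates the documented behavior and is not Supaflow's code. RepoRule, parse_rules, and matches are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RepoRule:
    owner: str   # lowercased owner (user or organization)
    name: str    # lowercased repository name, or name prefix for wildcard rules
    exact: bool  # True for owner/repo; False for owner/* and owner/prefix*


def parse_rules(value: str) -> List[RepoRule]:
    """Parse a comma-separated Repositories value into rules (hypothetical parser)."""
    rules: List[RepoRule] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank entries and comments
        owner, _, name = entry.partition("/")
        if not owner or not name:
            raise ValueError(f"expected owner/repo, owner/* or owner/prefix*: {entry!r}")
        if name.endswith("*"):
            # owner/* yields an empty prefix, which matches every repository
            rules.append(RepoRule(owner.lower(), name[:-1].lower(), exact=False))
        else:
            rules.append(RepoRule(owner.lower(), name.lower(), exact=True))
    return rules


def matches(rule: RepoRule, full_name: str) -> bool:
    """Case-insensitive match; prefixes are anchored at the start of the name."""
    owner, _, name = full_name.lower().partition("/")
    if owner != rule.owner:
        return False
    return name == rule.name if rule.exact else name.startswith(rule.name)
```

With the example value above, my-org/backend-* matches my-org/backend-api but not my-org/api-backend.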

Personal Access Token*

The token you created at github.com/settings/tokens.
Stored encrypted

Schema Refresh Interval

How often Supaflow re-discovers the GitHub schema before a pipeline runs.
Options:

  • 0 -- refresh before every pipeline execution
  • -1 -- disable automatic schema refresh
  • Positive value -- refresh interval in minutes (e.g., 60 = hourly, 1440 = daily)

Default: 60 (hourly)
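The three option values can be read as a simple decision rule. The function name is hypothetical.

This sketch also makes two assumptions that the options above do not cover. A source that has never been refreshed is treated as due, unless refresh is disabled. Values below -1 are treated as invalid.

```python
from datetime import datetime, timedelta
from typing import Optional


def schema_refresh_due(interval_minutes: int, last_refresh: Optional[datetime], now: datetime) -> bool:
    """Decide whether to re-discover the schema before a run (illustrative only)."""
    if interval_minutes < -1:
        raise ValueError("interval must be -1, 0, or a positive number of minutes")
    if interval_minutes == -1:
        return False            # automatic refresh disabled
    if interval_minutes == 0 or last_refresh is None:
        return True             # refresh before every run, or never refreshed yet
    return now - last_refresh >= timedelta(minutes=interval_minutes)
```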

Test & Save

After entering your repository rules and access token, click Test & Save to verify the connection and save the source.

Rate Limiting

GitHub rate-limits both its REST and GraphQL APIs. Supaflow handles transient rate limiting automatically with retry and backoff, but very large syncs can take longer and may need to be scheduled off-peak. See GitHub's rate limiting guide for current limits.
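Supaflow's retry logic is internal. For context, GitHub's guidance for REST clients works as follows:
  • Honor the retry-after header when it is present.
  • Wait until the time in x-ratelimit-reset (UTC epoch seconds) when x-ratelimit-remaining is 0.
  • Otherwise, wait at least a minute and back off exponentially on repeated failures.

This sketch implements that policy. retry_delay_seconds is a hypothetical name, and the one-hour cap is an arbitrary choice for the example.

```python
from typing import Mapping


def retry_delay_seconds(headers: Mapping[str, str], now_epoch: int, attempt: int) -> int:
    """How long to wait before retrying a rate-limited GitHub request.

    headers: response headers with lowercase names; attempt: 0 for the first retry.
    """
    if "retry-after" in headers:
        return int(headers["retry-after"])
    if headers.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in headers:
        # x-ratelimit-reset is the UTC epoch second when the budget refills
        return max(int(headers["x-ratelimit-reset"]) - now_epoch, 0)
    # No explicit hint: start at one minute and double per repeated failure, capped at an hour
    return min(60 * 2 ** attempt, 3600)
```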

Schema Evolution

The connector's schema ships with each Supaflow connector release.

  • New fields added by GitHub appear after the connector is updated to a newer release
  • New event types on the Repository Events feed pass through automatically -- the event's type is preserved and the event body is stored alongside it
  • New top-level resources (releases, workflows, and so on) require a connector update and are not discovered automatically
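New event types can pass through without a schema change when each feed item maps to a row that keeps type as a plain string and the type-specific payload as JSON. The event fields used here (id, type, actor, payload, created_at) come from GitHub's events API.

The row's column names and the event_to_row function are hypothetical. They are not Supaflow's destination schema.

```python
import json
from typing import Any, Dict


def event_to_row(repo: str, event: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one events-feed item into a row (hypothetical column names)."""
    return {
        "repository": repo,
        "id": event["id"],
        "type": event["type"],              # e.g. "PushEvent", or any type GitHub adds later
        "created_at": event["created_at"],
        "actor_login": (event.get("actor") or {}).get("login"),
        # The type-specific body is stored as JSON, so unknown types need no new columns
        "payload": json.dumps(event.get("payload", {}), sort_keys=True),
    }
```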

Troubleshooting

Authentication failed

Problem:

  • "GitHub rejected the access token" on Test & Save

Solutions:

  1. At github.com/settings/tokens, confirm the token has not expired or been revoked.
  2. Regenerate the token and paste the new value -- GitHub does not show tokens again after creation.
  3. For private repositories, confirm the classic token has the repo scope, or the fine-grained token has read access on every repository your rules resolve to.
  4. For wildcard or prefix rules, confirm the token has read:org so the organization's repositories can be listed.

Repository not found or inaccessible

Problem:

  • Test & Save reports that a rule resolved to a repository that does not exist or is not visible

Solutions:

  1. Check the spelling of the owner/repo value. Lookup is case-insensitive, so capitalization does not matter, but the name must otherwise match exactly.
  2. For private repositories, confirm the token belongs to an account with read access.
  3. If the owner is a GitHub organization that requires SSO, authorize the token for that organization from github.com/settings/tokens via Configure SSO.

Wildcard rule resolves to zero repositories

Problem:

  • A rule like my-org/* or my-org/backend-* matches no repositories even though you expected results

Solutions:

  1. For fine-grained tokens, confirm the token's repository access list includes the repositories you want. Fine-grained tokens cannot see anything outside their configured list.
  2. Classic tokens need read:org to list an organization's repositories. Without it, only personal repositories owned by the token's user are visible.
  3. Check that the owner exists on GitHub.
  4. Prefix matching is case-insensitive but matches from the start of the name. my-org/backend-* matches my-org/backend-api but not my-org/api-backend.

Repository Events return fewer rows than expected

Problem:

  • A busy repository's event feed has gaps, or older events are missing from the destination

Solutions:

  1. GitHub exposes only the 300 most recent events per repository. If a repository produces more than 300 events between syncs, the older portion rolls off before the connector can read it. Increase the sync frequency.
  2. GitHub's events feed covers only the past 30 days. Events older than that are not returned by the API regardless of what appears in the GitHub web UI.
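One way to spot a likely gap from the 300-event window is to look at a fetch that returned a full window. If even its oldest event is newer than the previous cursor, nothing overlaps the last run, so events in between may have rolled off.

This heuristic is a hypothetical sketch; possible_gap is not a Supaflow feature. It assumes the cursor is the created_at of the newest event synced last time, in GitHub's UTC timestamp format. It does not try to detect gaps caused by the 30-day limit.

```python
from typing import List, Optional

FEED_EVENT_LIMIT = 300  # most recent events GitHub returns per repository


def possible_gap(events: List[dict], cursor: Optional[str]) -> bool:
    """Heuristic: may older events have rolled off the window since the last sync?"""
    if cursor is None or len(events) < FEED_EVENT_LIMIT:
        return False  # first sync, or the feed was not full
    # A full window whose oldest event is still newer than the cursor does not overlap
    # the previous run, so events between the two may be missing
    oldest = min(e["created_at"] for e in events)
    return oldest > cursor
```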

Rate limit exhausted

Problem:

  • The sync fails with a rate-limit message, or a long historical sync stalls

Solutions:

  1. Wait for GitHub's hourly budget to refill and re-run the sync.
  2. Narrow your rules: replace owner/* with specific repositories or a prefix pattern to reduce how many repositories are synced.
  3. Split large pipelines into multiple smaller ones scheduled at different times.

GitHub Enterprise Server site

Problem:

  • Your GitHub instance is self-hosted (for example at github.yourcompany.com) rather than on github.com

Solutions:

  1. The connector targets github.com only. GitHub Enterprise Server and GitHub Enterprise Cloud with a custom hostname are not supported.
  2. Contact support if Enterprise Server support is important for your evaluation.

Support

Need help? Contact us at support@supa-flow.io