GitHub Source
Connect GitHub as a source to sync activity from one or many repositories into your warehouse. One source can cover a specific repository, every repository under an owner, or any repository whose name matches a prefix.
For an overview of capabilities and use cases, see the GitHub connector page. To run pipelines natively inside Snowflake, see Snowflake Native ETL.
Prerequisites
Before you begin, ensure you have:
- A GitHub personal access token from github.com/settings/tokens
- Read access to every repository you want to sync -- public repositories work with a basic token; private repositories need the `repo` scope
- For rules that list an organization's repositories (for example `my-org/*`), the token also needs `read:org`
Supported Objects
| Object | Sync Mode | Description |
|---|---|---|
| Repository Events | Incremental | Push, Issues, Pull Request, Create, Watch, and other event types from each repository's event feed. |
| Issues | Full Refresh | Every issue on the repository with up to 100 reactions and 100 comments per issue. Both open and closed issues are returned. |
| Pull Requests | Full Refresh | Every pull request with up to 100 reactions and 100 comments per PR. |
| Stargazers | Full Refresh | Users who starred the repository, with the timestamp they added the star. |
Not Yet Supported
The connector does not sync commits as a separate entity, releases, tags, workflows and Actions runs, check runs, discussions, milestones, teams, organizations as an entity, users as an entity, branches, gists, packages, security advisories, or deployment statuses.
GitHub Enterprise Server (self-hosted) is not supported -- the connector targets github.com only.
Contact us with the objects you need -- expansion is prioritized by customer demand.
Incremental Sync
Repository Events sync incrementally: only new events since the last successful run are fetched. Issues, Pull Requests, and Stargazers run full refresh (see Supported Objects).
Each repository is tracked independently. If you add a repository to your rules later, it starts from scratch rather than inheriting progress from other repositories, so older events are not silently skipped.
Two GitHub-side limits apply. GitHub returns only the 300 most recent events per repository, and the events feed covers only the past 30 days. A very active repository must sync often enough that new events do not push the last-synced event out of its 300-event window between runs. A repository with no activity in the past 30 days returns an empty event stream.
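To make the 300-event window concrete, here is a back-of-the-envelope check. It assumes a roughly steady event rate, which real repositories rarely have. The helper name and the `safety_factor` headroom parameter are hypothetical illustrations and are not Supaflow settings.

```python
# Back-of-the-envelope check for the GitHub events feed limits described above.
# Assumes events arrive at a roughly steady rate; real traffic is burstier.

EVENT_WINDOW = 300   # GitHub returns at most the 300 most recent events per repository
FEED_MAX_DAYS = 30   # the events feed only covers the past 30 days


def max_sync_interval_hours(events_per_hour: float, safety_factor: float = 0.5) -> float:
    """Longest gap between syncs (hours) before older events roll out of the window.

    safety_factor < 1 leaves headroom for bursts (hypothetical tuning knob).
    """
    if events_per_hour <= 0:
        # A quiet repository is bounded only by the 30-day feed horizon.
        return FEED_MAX_DAYS * 24.0
    hours_to_fill_window = EVENT_WINDOW / events_per_hour
    return min(hours_to_fill_window, FEED_MAX_DAYS * 24.0) * safety_factor


# A repository producing 50 events per hour fills the window in 6 hours,
# so with 50% headroom it should sync at least every 3 hours.
print(max_sync_interval_hours(50))  # 3.0
```

Bursty repositories such as release days or bot-driven activity can exceed their average rate by a wide margin. For those, a smaller safety factor or a more frequent schedule is the conservative choice.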
Authentication
GitHub uses a personal access token to authenticate. Classic tokens and fine-grained tokens both work, and no OAuth setup is involved.
Create a Personal Access Token
- Sign in to GitHub with the account you want the connector to use
- Go to github.com/settings/tokens
- Click Generate new token and choose classic or fine-grained
- Copy the token once -- GitHub will not show it again
Permissions
The connector is read-only. Grant only what the token needs:
- Public repositories: no extra scopes beyond a basic token. A fine-grained token with Public repositories (read-only) also works.
- Private repositories: the `repo` scope on a classic token, or equivalent repository-level read permissions on a fine-grained token scoped to the repositories you want to sync.
- Rules that list an organization's repositories (`my-org/*`, `my-org/prefix*`): also grant `read:org` so the connector can enumerate the organization's repositories.
Fine-grained tokens can only see the repositories on their access list. When a wildcard like `my-org/*` is resolved with a fine-grained token, it expands only to the repositories that token has access to.
For long-term stability, use a machine account rather than an individual user's token. If the individual leaves and their account is deactivated, the pipeline breaks.
Configuration
In Supaflow, create a new GitHub source with these settings:
Repositories*
Comma-separated list of rules that define the repositories to sync. Three forms are supported:
- `owner/repo` -- a specific repository
- `owner/*` -- every repository owned by that user or organization
- `owner/prefix*` -- repositories whose name begins with `prefix`
You can mix forms in a single value. Entries that start with `#` are treated as comments and skipped.
Example: `octocat/hello-world, my-org/*, my-org/backend-*`
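The sketch below shows one way to read the three rule forms. It is a minimal illustration and not Supaflow's implementation. The function names are hypothetical. Skipping blank entries and rejecting malformed ones are assumptions. Case-insensitive, start-of-name matching follows the Troubleshooting notes.

```python
# Minimal sketch of how the three rule forms could be parsed and matched.
# Hypothetical helper names; this is not Supaflow's implementation.
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    owner: str
    pattern: str  # "repo", "*", or "prefix*"


def parse_rules(value: str) -> list[Rule]:
    """Split a comma-separated Repositories value into rules, skipping # comments."""
    rules = []
    for raw in value.split(","):
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # blank entries (assumption) and comments are ignored
        owner, sep, repo = entry.partition("/")
        if not sep or not owner or not repo:
            raise ValueError(f"expected owner/repo, owner/* or owner/prefix*: {entry!r}")
        rules.append(Rule(owner.lower(), repo.lower()))
    return rules


def matches(rule: Rule, full_name: str) -> bool:
    """Case-insensitive match of 'owner/name' against a single rule."""
    owner, _, name = full_name.lower().partition("/")
    if owner != rule.owner:
        return False
    if rule.pattern.endswith("*"):
        # "owner/*" has an empty prefix and so matches every repository of the owner.
        return name.startswith(rule.pattern[:-1])
    return name == rule.pattern


rules = parse_rules("octocat/hello-world, my-org/*, my-org/backend-*, # legacy/repo")
print(len(rules))                               # 3
print(matches(rules[2], "my-org/backend-api"))  # True
print(matches(rules[2], "my-org/api-backend"))  # False
```

Wildcard forms can only match repositories the token is able to list. With a fine-grained token, the access list narrows the candidate set before any matching happens.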
The token you created at github.com/settings/tokens.
Stored encrypted
How often to re-discover the GitHub schema before running.
Options:
- 0 -- refresh before every pipeline execution
- -1 -- disable automatic schema refresh
- Positive value -- refresh interval in minutes (e.g., 60 = hourly, 1440 = daily)
Default: 60 (hourly)
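The three refresh options map onto a simple decision. The sketch below is hypothetical, and the function name is illustrative only. It also assumes a schema that has never been discovered is refreshed immediately unless refresh is disabled.

```python
# Hypothetical sketch of the refresh-interval semantics listed above:
# 0 = refresh before every run, -1 = never refresh, N > 0 = refresh every N minutes.
from datetime import datetime, timedelta
from typing import Optional


def should_refresh_schema(interval_minutes: int,
                          last_refresh: Optional[datetime],
                          now: datetime) -> bool:
    if interval_minutes == 0:
        return True   # refresh before every pipeline execution
    if interval_minutes == -1:
        return False  # automatic refresh disabled
    if interval_minutes < 0:
        raise ValueError("interval must be -1, 0, or a positive number of minutes")
    if last_refresh is None:
        return True   # assumption: a never-discovered schema refreshes now
    return now - last_refresh >= timedelta(minutes=interval_minutes)


now = datetime(2024, 1, 1, 12, 0)
print(should_refresh_schema(60, now - timedelta(minutes=45), now))  # False
print(should_refresh_schema(60, now - timedelta(minutes=90), now))  # True
```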
Test & Save
After configuring your access token, click Test & Save to verify your connection and save the source.
Rate Limiting
GitHub applies API rate limits to its REST and GraphQL APIs. Supaflow handles transient rate limiting automatically with retry and backoff, but very large syncs may take longer or need off-peak scheduling. See GitHub's rate limiting guide for current limits.
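For readers who call the GitHub API directly, the sketch below shows the retry timing GitHub documents for rate-limited responses:

1. Honor `retry-after` when it is present.
2. Otherwise, if `x-ratelimit-remaining` is `0`, wait until `x-ratelimit-reset`.
3. Otherwise wait at least a minute and back off exponentially.

This illustrates the general technique only. It is not Supaflow's internal retry logic. Lowercase header keys and the `max_wait` cap are assumptions.

```python
# Sketch of the retry timing GitHub documents for rate-limited responses.
# Illustrates the general technique only; Supaflow's own retry logic is internal.
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], attempt: int,
                    now: Optional[float] = None, max_wait: float = 3600.0) -> float:
    """Return how long to pause before retrying a rate-limited GitHub request.

    `headers` is assumed to have lowercase keys; `attempt` starts at 0.
    """
    now = time.time() if now is None else now
    if "retry-after" in headers:
        # GitHub asks clients to wait this many seconds before retrying.
        return min(float(headers["retry-after"]), max_wait)
    if headers.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in headers:
        # Primary limit exhausted: the reset header is a UTC epoch timestamp in seconds.
        return min(max(float(headers["x-ratelimit-reset"]) - now, 0.0), max_wait)
    # Otherwise wait at least a minute, doubling on each further attempt.
    return min(60.0 * (2 ** attempt), max_wait)


print(seconds_to_wait({"retry-after": "30"}, attempt=0))                        # 30.0
print(seconds_to_wait({"x-ratelimit-remaining": "0",
                       "x-ratelimit-reset": "1700000600"}, 0, now=1700000000))  # 600.0
print(seconds_to_wait({}, attempt=2))                                           # 240.0
```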
Schema Evolution
The connector's schema ships with each Supaflow connector release.
- New fields added by GitHub appear after the connector is updated to a newer release
- New event types on the Repository Events feed pass through automatically -- the event's type is preserved and the event body is stored alongside it
- New top-level resources (releases, workflows, and so on) require a connector update and are not discovered automatically
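The pass-through behavior for new event types can be pictured as keeping a few fixed fields and storing the rest of the event as JSON text. The input keys (`id`, `type`, `repo.name`, `created_at`, `payload`) follow GitHub's event objects. The output column names below are hypothetical and do not describe Supaflow's destination schema.

```python
# Sketch: keep an event's type and raw payload so unknown event types still load.
# The output column names are hypothetical, not Supaflow's destination schema.
import json
from typing import Any


def event_to_row(event: dict[str, Any]) -> dict[str, Any]:
    return {
        "id": event["id"],
        "type": event["type"],  # preserved even for event types added later
        "repo_name": event.get("repo", {}).get("name"),
        "created_at": event.get("created_at"),
        # The body is kept as JSON text, so no schema change is needed per type.
        "payload": json.dumps(event.get("payload", {}), sort_keys=True),
    }


sample = {
    "id": "1",
    "type": "SomeFutureEvent",  # hypothetical new event type
    "repo": {"name": "octocat/hello-world"},
    "created_at": "2024-01-01T00:00:00Z",
    "payload": {"action": "created"},
}
print(event_to_row(sample)["type"])  # SomeFutureEvent
```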
Troubleshooting
Authentication failed
Problem:
- "GitHub rejected the access token" on Test & Save
Solutions:
- Confirm the token has not been revoked or expired at github.com/settings/tokens.
- Regenerate the token and paste the new value -- GitHub does not show tokens again after creation.
- For private repositories, confirm the classic token has the `repo` scope, or the fine-grained token has read access on every repository your rules resolve to.
- For wildcard or prefix rules, confirm the token has `read:org` so the organization's repositories can be listed.
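For classic tokens, GitHub reports the granted scopes in the `X-OAuth-Scopes` response header of authenticated API requests, for example `GET https://api.github.com/user`. The sketch below checks that header value for the `repo` and `read:org` scopes discussed above. It treats `write:org` and `admin:org` as implying `read:org`, per GitHub's scope hierarchy. The function name is hypothetical. Fine-grained tokens use repository permissions instead of scopes, so this check applies only to classic tokens.

```python
# Sketch: check a classic token's scopes as reported in the X-OAuth-Scopes header.
# Classic tokens only; fine-grained tokens use permissions, not scopes.

# Scopes that imply read:org in GitHub's scope hierarchy.
ORG_READ_SCOPES = {"read:org", "write:org", "admin:org"}


def missing_scopes(x_oauth_scopes: str, private_repos: bool, org_rules: bool) -> list[str]:
    """Return the scopes this setup still needs, given the header's comma-separated value."""
    granted = {s.strip() for s in x_oauth_scopes.split(",") if s.strip()}
    missing = []
    if private_repos and "repo" not in granted:
        missing.append("repo")
    if org_rules and not (granted & ORG_READ_SCOPES):
        missing.append("read:org")
    return missing


print(missing_scopes("repo, read:org", private_repos=True, org_rules=True))  # []
print(missing_scopes("public_repo", private_repos=True, org_rules=True))     # ['repo', 'read:org']
```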
Repository not found or inaccessible
Problem:
- Test & Save reports that a rule resolved to a repository that does not exist or is not visible
Solutions:
- Check the exact spelling of the `owner/repo` value. Lookup is case-insensitive, but a misspelled name will not match.
- For private repositories, confirm the token belongs to an account with read access.
- If the owner is a GitHub organization that requires SSO, authorize the token for that organization from github.com/settings/tokens via Configure SSO.
Wildcard rule resolves to zero repositories
Problem:
- A rule like `my-org/*` or `my-org/backend-*` matches no repositories even though you expected results
Solutions:
- For fine-grained tokens, confirm the token's repository access list includes the repositories you want. Fine-grained tokens cannot see anything outside their configured list.
- Classic tokens need `read:org` to list an organization's repositories. Without it, only personal repositories owned by the token's user are visible.
- Check that the owner exists on GitHub.
- Prefix matching is case-insensitive but matches from the start of the name. `my-org/backend-*` matches `my-org/backend-api` but not `my-org/api-backend`.
Repository Events return fewer rows than expected
Problem:
- A busy repository's event feed has gaps, or older events are missing from the destination
Solutions:
- GitHub exposes only the 300 most recent events per repository. If a repository produces more than 300 events between syncs, the older portion rolls off before the connector can read it. Increase the sync frequency.
- GitHub's events feed covers only the past 30 days. Events older than that are not returned by the API regardless of what appears in the GitHub web UI.
Rate limit exhausted
Problem:
- The sync fails with a rate-limit message, or a long historical sync stalls
Solutions:
- Wait for GitHub's hourly budget to refill and re-run the sync.
- Narrow your rules: replace `owner/*` with specific repositories or a prefix pattern to reduce how many repositories are synced.
- Split large pipelines into multiple smaller ones scheduled at different times.
GitHub Enterprise Server site
Problem:
- Your GitHub instance is self-hosted (for example at github.yourcompany.com) rather than on github.com
Solutions:
- The connector targets github.com only. GitHub Enterprise Server and GitHub Enterprise Cloud with a custom hostname are not supported.
- Contact support if Enterprise Server support is important for your evaluation.
Support
Need help? Contact us at support@supa-flow.io