<@U04SE163WC8> How to Pull content of files from G...
# ask-ai
a
@kapa.ai How do I pull the content of files from a GitHub repo?
u
To pull content of files from a GitHub repository using Airbyte, you can use the GitHub source connector. Here's how you can set it up:

1. First, you'll need to configure the GitHub source in Airbyte. This involves providing your GitHub credentials and specifying the repository you want to pull data from. Here's an example of how to set up the GitHub source using PyAirbyte:
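```python
import airbyte as ab

# Configure the GitHub source with a personal access token and the target repositories
source = ab.get_source(
    "source-github",
    config={
        "credentials": {
            "personal_access_token": "your_personal_access_token"
        },
        # Each entry has the form "owner/repository"
        "repositories": ["your_github_username/your_repository_ID"]
    }
)
```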
[Source Setup : Github with PyAirbyte](https://airbyte.com/tutorials/end-to-end-rag-using-github-pyairbyte-and-chroma-vector-db)

2. After configuring the source, you can select the specific streams you want to sync. The GitHub connector supports various streams, including commits, issues, pull requests, and more. For example, to select the commits stream:
```python
source.select_streams(["commits"])
```
3. Then, you can read the data from the selected streams:
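```python
# Read the selected streams into the default local cache
cache = ab.get_default_cache()
result = source.read(cache=cache)

# Convert the cached commit records into document objects
commits_details = list(result["commits"].to_documents())
```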
[Source Setup : Github with PyAirbyte](https://airbyte.com/tutorials/end-to-end-rag-using-github-pyairbyte-and-chroma-vector-db)

Alternatively, you can use the `AirbyteLoader` from the `langchain_airbyte` package:
```python
from langchain_airbyte import AirbyteLoader
from langchain.schema import Document

loader = AirbyteLoader(
    source="source-github",
    stream="commits",
    config={
        "credentials": {
            "personal_access_token": "your_personal_access_token"
        },
        "repositories": ["your_username/repository_name"]
    }
)

# Load every record of the "commits" stream
docs = loader.load()
# Rebuild each record as a plain Document, keeping its content and metadata
docs = [Document(page_content=record.page_content, metadata=record.metadata) for record in docs]
```
[Load Data using AirbyteLoader](https://airbyte.com/tutorials/illustrating-the-usage-of-langchain-airbyte-package)

Remember to replace `"your_personal_access_token"`, `"your_github_username/your_repository_ID"`, and `"your_username/repository_name"` with your actual GitHub credentials and repository information.

Note that the exact content you can pull depends on the streams you select and on the permissions associated with your GitHub token.
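
Rather than pasting the token into the config by hand, one option is to read it from an environment variable and check the repository names before handing the config to the connector. This is a minimal sketch, not part of Airbyte: the helper `build_github_config` and the variable name `GITHUB_PERSONAL_ACCESS_TOKEN` are hypothetical, and the returned dict just mirrors the config shape used in the examples in this thread.

```python
import os
import re

# Hypothetical environment variable name; use whatever your setup provides.
TOKEN_ENV_VAR = "GITHUB_PERSONAL_ACCESS_TOKEN"

# "owner/repository": exactly one slash, no whitespace, neither side empty.
_REPO_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def build_github_config(repositories, token=None):
    """Return a config dict shaped like the ones shown in this thread.

    Hypothetical helper: it only assembles and validates a plain dict.
    """
    # Prefer an explicit token, otherwise fall back to the environment.
    token = token or os.environ.get(TOKEN_ENV_VAR)
    if not token:
        raise ValueError(f"Set {TOKEN_ENV_VAR} or pass token explicitly")

    # Reject entries that are not in "owner/repository" form.
    bad = [repo for repo in repositories if not _REPO_PATTERN.match(repo)]
    if bad:
        raise ValueError(f"Expected 'owner/repository', got: {bad}")

    return {
        "credentials": {"personal_access_token": token},
        "repositories": list(repositories),
    }
```

The resulting dict can then be passed as `config=build_github_config(["your_username/repository_name"])` to either `ab.get_source("source-github", ...)` or `AirbyteLoader(...)`, which keeps the token out of the source file.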