Aman Kumar
07/22/2025, 10:19 AM

kapa.ai
07/22/2025, 10:19 AM
### 1. Install PyAirbyte

```bash
pip install airbyte
```
[Source: PyAirbyte Getting Started](https://docs.airbyte.com/platform/using-airbyte/pyairbyte/getting-started)

---

### 2. Discover and Use Available Connectors

You can list available connectors and select the one you need:

```python
import airbyte as ab

# List all available connectors
ab.get_available_connectors()
```
[Source: PyAirbyte Demo](https://airbyte.com/tutorials/leverage-pyairbyte-with-this-demo)

---

### 3. Configure and Connect to a Data Source

For example, to connect to GitHub:

```python
source = ab.get_source(
    "source-github",
    install_if_missing=True,
    config={
        "repositories": ["airbytehq/quickstarts"],
        "credentials": {
            "personal_access_token": ab.get_secret("GITHUB_PERSONAL_ACCESS_TOKEN"),
        },
    },
)
source.check()  # Verify the connection
```
[Source: Python ETL in Docker](https://airbyte.com/tutorials/python-etl-in-docker)

You can similarly configure other sources by specifying their connector name and required config.

---

### 4. Select and Read Data Streams

```python
# List available streams
source.get_available_streams()

# Select the streams you want to sync
source.select_streams(["pull_requests", "issues", "reviews", "stargazers"])

# Read data into a cache
cache = ab.get_default_cache()
result = source.read(cache=cache)

# Convert to a pandas DataFrame
issues_df = cache["issues"].to_pandas()
```
[Source: Python ETL in Docker](https://airbyte.com/tutorials/python-etl-in-docker)

---

### 5. Connect to Multiple Data Sources

Repeat the above steps for each data source you want to connect to, using the appropriate connector name and configuration for each (see the sketch below).

---

Summary: With PyAirbyte, you can connect to a wide variety of data sources by installing the relevant connector, configuring it, and reading data, all from Python. This approach works for any supported connector, and you can process or combine data from multiple sources as needed in your Python workflow. For more details and examples, see the [official PyAirbyte documentation](https://docs.airbyte.com/platform/using-airbyte/pyairbyte/getting-started).
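As a hedged sketch of that multi-source pattern (the connector choices, stream names, and shared-cache approach here are illustrative, not taken verbatim from the tutorials above):

```python
import airbyte as ab

# One local cache shared by both sources; each source goes through the
# same install / configure / check / read cycle shown in steps 1-4.
cache = ab.get_default_cache()

github = ab.get_source(
    "source-github",
    install_if_missing=True,
    config={
        "repositories": ["airbytehq/quickstarts"],
        "credentials": {
            "personal_access_token": ab.get_secret("GITHUB_PERSONAL_ACCESS_TOKEN"),
        },
    },
)
github.check()
github.select_streams(["issues"])
github.read(cache=cache)

# A second, independent source (sample-data connector; no credentials needed).
faker = ab.get_source("source-faker", install_if_missing=True, config={"count": 100})
faker.check()
faker.select_all_streams()
faker.read(cache=cache)

# Streams from both sources are now queryable side by side.
issues_df = cache["issues"].to_pandas()
users_df = cache["users"].to_pandas()
```

Note that streams land in the cache by name, so pick sources whose stream names don't collide, or give each source its own cache.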
Aman Kumar
07/22/2025, 10:34 AM

kapa.ai
07/22/2025, 10:34 AM
The error occurs because there is no `list_connectors()` method in the `airbyte` (PyAirbyte) library. According to the official documentation and tutorials, the correct way to list available connectors is to use the `get_available_connectors()` method:

```python
import airbyte as ab

# List all available connectors
ab.get_available_connectors()
```

This will return a list of all connectors you can use with PyAirbyte. Please update your code to use `get_available_connectors()` instead of `list_connectors()`. For more details, see the [PyAirbyte demo tutorial](https://airbyte.com/tutorials/leverage-pyairbyte-with-this-demo).
Aman Kumar
07/22/2025, 10:35 AM

kapa.ai
07/22/2025, 10:35 AM
### 1. Using PyAirbyte (Postgres Cache)

You can configure a Postgres cache in PyAirbyte:

```python
from airbyte.caches import PostgresCacheConfig, PostgresCache

pg_cache = PostgresCache(
    PostgresCacheConfig(
        host="localhost",
        port=5432,
        username="postgres",
        password="postgres",
        database="pyairbyte_demo"
    )
)
```
You can then use this cache to read data from a source and store it in PostgreSQL:

```python
source.select_all_streams()
read_result = source.read(cache=pg_cache)
```
[See full example here](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte)

---

### 2. Using Airbyte UI

If you are using the Airbyte UI, you can add PostgreSQL as a destination by providing the following configuration:

- Host: The hostname of your PostgreSQL server
- Port: The port (default is 5432)
- Username: Your PostgreSQL username
- Password: Your PostgreSQL password
- Database: The database name
- Schema: The schema to use (e.g., `public`)
You can run a PostgreSQL instance locally with Docker:

```bash
docker run --rm --name airbyte-zoom-db -e POSTGRES_PASSWORD=password \
  -v airbyte_zoom_data:/var/lib/postgresql/data -p 2000:5432 -d postgres
```
Then, use these credentials in the Airbyte UI to set up the connection. After setup, Airbyte will test the connection and let you know if it's successful. [See step-by-step guide here](https://airbyte.com/tutorials/visualize-the-time-spent-by-your-team-in-zoom-calls#step-1)

---

### 3. Permissions

Make sure the PostgreSQL user you use has the necessary permissions:

```sql
CREATE USER airbyte_user WITH PASSWORD '<password>';
GRANT CREATE, TEMPORARY ON DATABASE <database> TO airbyte_user;
```
[More details on required permissions](https://docs.airbyte.com/integrations/destinations/postgres#postgres)

---

If you need to connect to PostgreSQL as a source (to extract data), you'll need to provide similar connection details and ensure the user has read access to the relevant schemas and tables. For advanced use cases like Change Data Capture (CDC), additional setup is required (see the [CDC tutorial](https://airbyte.com/tutorials/postgresql-change-data-capture-elt#step-2)). If you need a code example for a specific use case (e.g., using PyAirbyte as a source or destination), let me know!
Aman Kumar
07/22/2025, 10:40 AM

kapa.ai
07/22/2025, 10:41 AM
The error `ImportError: cannot import name 'PostgresCacheConfig' from 'airbyte.caches'` suggests that either the class name or its location in the PyAirbyte package has changed, or it is not available in your installed version. Based on the official [PyAirbyte demo](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte), the correct import statement is:

```python
from airbyte.caches import PostgresCacheConfig, PostgresCache
```

If you are seeing this import error, possible reasons include:

- PyAirbyte version mismatch: The `PostgresCacheConfig` and `PostgresCache` classes are available in the pre-release demo and may require the latest version of the `airbyte` Python package. Make sure you have installed or upgraded to the latest version:

```bash
pip install --upgrade airbyte
```

- Colab/environment-specific features: The demo specifically mentions it is designed for Google Colab and may require certain system-level packages or a specific environment setup.

If you have the latest version and still encounter the error, it is possible that the feature is not yet available in the stable release of PyAirbyte, or the API has changed since the demo was published. In that case, please refer to the [official PyAirbyte documentation](https://docs.airbyte.com/platform/using-airbyte/pyairbyte/getting-started) for the most up-to-date usage and available features.

Summary: The knowledge sources indicate that `PostgresCacheConfig` should be importable from `airbyte.caches`, but if you encounter an import error, it is likely due to a version mismatch or an environment issue. Double-check your package version and environment as per the [official demo instructions](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte).
Aman Kumar
07/22/2025, 10:53 AM

kapa.ai
07/22/2025, 10:54 AM
There are a few issues to address:

1. How to configure the cache: The correct way to initialize a `PostgresCache` is by passing a `PostgresCacheConfig` object, not by passing the connection parameters directly. Example:

```python
from airbyte.caches import PostgresCache, PostgresCacheConfig

pg_cache = PostgresCache(
    PostgresCacheConfig(
        host="localhost",
        port=5432,
        username="postgres",
        password="postgres",
        database="chartsai"
    )
)
```

[See the official demo for reference](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte).

2. How to configure the source: The `config` parameter for `ab.get_source("source-postgres", config=...)` should be a dictionary with the required Postgres source configuration, not the cache object. The cache is used as a destination for the read operation, not as the source config. Example config for a Postgres source:

```python
source = ab.get_source(
    "source-postgres",
    config={
        "host": "localhost",
        "port": 5432,
        "database": "chartsai",
        "username": "postgres",
        "password": "postgres",
        "ssl_mode": "disable"
    }
)
```

3. How to use the cache: The cache (e.g., `pg_cache`) is passed to the `read()` method, not to the source config.

---

### Corrected Example

```python
import airbyte as ab
from airbyte.caches import PostgresCache, PostgresCacheConfig


def list_connectors():
    connectors = ab.get_available_connectors()
    print(connectors)


def connect_to_postgres_cache(host, port, user, password, database):
    return PostgresCache(
        PostgresCacheConfig(
            host=host,
            port=port,
            username=user,
            password=password,
            database=database
        )
    )


def main():
    # list_connectors()
    pg_cache = connect_to_postgres_cache(
        "localhost", 5432, "postgres", "postgres", "chartsai"
    )
    source = ab.get_source(
        "source-postgres",
        config={
            "host": "localhost",
            "port": 5432,
            "database": "chartsai",
            "username": "postgres",
            "password": "postgres",
            "ssl_mode": "disable"
        }
    )
    source.select_all_streams()
    read_result = source.read(cache=pg_cache)
    print(read_result)


if __name__ == "__main__":
    main()
```

This approach matches the usage shown in the [PyAirbyte Postgres cache demo](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte).

---

Summary:
- Use `PostgresCacheConfig` to configure your cache.
- Pass a config dictionary to `ab.get_source` for the source.
- Pass the cache to the `read()` method, not to the source config.

If you follow this structure, your code should work as intended with PyAirbyte and PostgreSQL.
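As a hedged follow-on to the corrected example (the stream name `users` is a placeholder; use a table that actually exists in your `chartsai` database):

```python
# After source.read(cache=pg_cache) completes, each synced stream can be
# pulled out of the cache by name, e.g. as a pandas DataFrame.
users_df = read_result.cache["users"].to_pandas()
print(users_df.head())
```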
Aman Kumar
07/22/2025, 11:01 AM

```
source-postgres logs to file: /tmp/airbyte/logs/source-postgres/source-postgres-log-K0RWP74G6.log
ERROR: Something went wrong in the connector. See the logs for more details.
Traceback (most recent call last):
  File "/Users/aman/personal/connectors/main.py", line 35, in <module>
    main()
  File "/Users/aman/personal/connectors/main.py", line 30, in main
    source.select_all_streams()
  File "/Users/aman/personal/connectors/.venv/lib/python3.12/site-packages/airbyte/sources/base.py", line 224, in select_all_streams
    self._selected_stream_names = self.get_available_streams()
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aman/personal/connectors/.venv/lib/python3.12/site-packages/airbyte/sources/base.py", line 306, in get_available_streams
    return [s.name for s in self.discovered_catalog.streams]
                            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aman/personal/connectors/.venv/lib/python3.12/site-packages/airbyte/sources/base.py", line 412, in discovered_catalog
    self._discovered_catalog = self._discover()
                               ^^^^^^^^^^^^^^^^
  File "/Users/aman/personal/connectors/.venv/lib/python3.12/site-packages/airbyte/sources/base.py", line 296, in _discover
    for msg in self._execute(["discover", "--config", config_file]):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aman/personal/connectors/.venv/lib/python3.12/site-packages/airbyte/_connector_base.py", line 467, in _execute
    raise exc.AirbyteConnectorFailedError(
airbyte.exceptions.AirbyteConnectorFailedError: Connector failed. (AirbyteConnectorFailedError)
------------------------------------------------------------
AirbyteConnectorFailedError: Connector failed.
Please review the log file for more information.
Connector Name: 'source-postgres'
Exit Code: 1
Log file: /tmp/airbyte/logs/source-postgres/source-postgres-log-K0RWP74G6.log
```
kapa.ai
07/22/2025, 11:01 AM
07/22/2025, 11:01 AM"internal_message":"java.lang.Exception: Verification error(s) occurred for DISCOVER. Errors: [$.ssl_mode: string found, object expected, $.ssl_mode: must be valid to one and only one schema, but 2 are valid with indexes '0, 1']"
indicates that the ssl_mode
parameter in your Postgres source configuration is being passed as a string, but the connector expects it as an object. What does this mean? - In recent versions of Airbyte and PyAirbyte, the ssl_mode
field for Postgres connectors should be an object, not a plain string. - For example, instead of: python "ssl_mode": "disable"
you should use: python "ssl_mode": {"mode": "disable"}
This is confirmed in the official documentation and tutorials, which show the correct format for the ssl_mode
field as an object with a mode
key, e.g.: python config={ "host": "localhost", "port": 5432, "database": "chartsai", "username": "postgres", "password": "postgres", "ssl_mode": {"mode": "disable"} }
[See example in the PyAirbyte Postgres tutorial](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte) How to fix: Update your source configuration to use the object format for ssl_mode
. For example: python source = ab.get_source( "source-postgres", config={ "host": "localhost", "port": 5432, "database": "chartsai", "username": "postgres", "password": "postgres", "ssl_mode": {"mode": "disable"} } )
Summary: - The error is due to passing ssl_mode
as a string instead of an object. - Change "ssl_mode": "disable"
to "ssl_mode": {"mode": "disable"}
in your configuration. - This should resolve the connector validation error and allow schema discovery to proceed. If you continue to have issues after making this change, please check the connector logs for more details or ensure you are using a compatible version of the Airbyte Python package and connector.Aman Kumar
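If your configs come from environment variables or older scripts that still store `ssl_mode` as a plain string, a small hypothetical helper can normalize both forms before you call `ab.get_source`:

```python
def normalize_ssl_mode(value):
    """Return ssl_mode in the object form the Postgres connector expects."""
    return value if isinstance(value, dict) else {"mode": value}

assert normalize_ssl_mode("disable") == {"mode": "disable"}
assert normalize_ssl_mode({"mode": "require"}) == {"mode": "require"}
```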
Aman Kumar
07/22/2025, 11:03 AM

kapa.ai
07/22/2025, 11:03 AM
The error `org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.` means that Airbyte (via the Postgres connector) is unable to connect to your PostgreSQL instance on `localhost:5432`. This is a common issue and can be caused by several factors, as seen in multiple knowledge sources.

Possible causes and troubleshooting steps:

1. PostgreSQL is not running or not accessible
   - Ensure that your PostgreSQL server is running and listening on the correct port (5432).
   - You can check this by running:

```bash
netstat -an | grep 5432
```

   - Or by trying to connect manually using `psql` or another database client.

2. Incorrect host or port
   - Double-check that you are using the correct hostname and port in your Airbyte configuration.
   - If you are running Airbyte in Docker or another containerized environment, `localhost` may refer to the container itself, not your host machine. In such cases, you may need to use the host's IP address or set up proper networking between containers. This issue is discussed in [this forum post](https://discuss.airbyte.io/t/4251) and [this GitHub issue](https://github.com/airbytehq/airbyte/issues/24751).

3. PostgreSQL is not accepting TCP/IP connections
   - Make sure your PostgreSQL server is configured to accept TCP/IP connections.
   - Check the `postgresql.conf` file for the `listen_addresses` setting (it should be `'*'` or your host IP) and the `pg_hba.conf` file for appropriate host-based authentication rules.

4. Permissions or firewall issues
   - Ensure there are no firewall rules blocking access to port 5432.

5. Database initialization or permissions
   - If you are using Airbyte's built-in Postgres (e.g., via Helm or Docker), make sure the database is initialized and the user has the correct permissions.
   - There are cases where the database is not ready yet or has permission issues, as seen in [this issue](https://github.com/airbytehq/airbyte/issues/54199).

Summary of what to check:
- Is PostgreSQL running and accessible on the expected host/port?
- Are you using the correct host/port in your Airbyte config, especially if using Docker?
- Is PostgreSQL configured to accept TCP/IP connections?
- Are there any permission or firewall issues?
- Is the database initialized and does the user have the right permissions?

If you have checked all of the above and still encounter issues, please review the full connector logs for more details, as suggested in the error message and knowledge sources. If you are running Airbyte in a container, try using your machine's IP address instead of `localhost` for the Postgres host, as this has resolved similar issues for other users, [as described here](https://discuss.airbyte.io/t/4251) and [here](https://github.com/airbytehq/airbyte/issues/24751).
Aman Kumar
07/22/2025, 11:10 AM

kapa.ai
07/22/2025, 11:10 AM
Based on the knowledge sources, the Airbyte Postgres connector does not support a single `connection_string` parameter in the configuration. Instead, you must specify individual fields such as `host`, `port`, `database`, `username`, `password`, and `ssl_mode` in the config dictionary. The documentation and tutorials consistently show the configuration in this format:

```python
config = {
    "host": "localhost",
    "port": 5432,
    "database": "chartsai",
    "username": "postgres",
    "password": "root",
    "ssl_mode": {"mode": "disable"}
}
```

[See example in the PyAirbyte Postgres tutorial](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte)
[See also the Airbyte Postgres source config reference](https://docs.airbyte.com/integrations/sources/postgres/cloud-sql-postgres#reference)

For advanced JDBC options, you can use the `jdbc_url_params` field to append additional parameters, but not the entire connection string. The documentation explicitly states:

> To customize the JDBC connection beyond common options, specify additional supported JDBC URL parameters as key-value pairs separated by the symbol & in the JDBC URL Parameters (Advanced) field.
> Note: Do not use the following keys in the JDBC URL Params field as they will be overwritten by Airbyte: `currentSchema`, `user`, `password`, `ssl`, and `sslmode`.

[Source](https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#advanced-custom-jdbc-connection-strings)

Summary: You cannot use a `connection_string` parameter in the config. You must provide the connection details as individual fields. If you need to add extra JDBC parameters, use the `jdbc_url_params` field, but not for the entire connection string.
Aman Kumar
07/22/2025, 11:10 AM

kapa.ai
07/22/2025, 11:11 AM
Here are the most likely causes and how to address them:

1. PostgreSQL Not Running or Not Reachable
Verify that your PostgreSQL server is running, then try connecting to it with another client (e.g., `psql` or DBeaver) using the same credentials to confirm it is accessible.

2. Incorrect Host/Port in Configuration
If you are running Airbyte in Docker, `localhost` inside the container refers to the container itself, not your host machine. In this case, use `host.docker.internal` (on Mac/Windows) or your host's IP address as the database host. This is a common issue and is discussed in several sources, including [this forum post](https://discuss.airbyte.io/t/3242) and [this GitHub issue](https://github.com/airbytehq/airbyte/issues/24751).

3. PostgreSQL Not Accepting TCP/IP Connections
Check your PostgreSQL configuration (`postgresql.conf` and `pg_hba.conf`) to ensure it is set to accept TCP/IP connections and that the user has the necessary permissions. The error message "Connection refused" often means the server is not listening on the expected interface or port, or is blocked by a firewall.

4. SSL Mode Configuration
Make sure you are passing the correct format for `ssl_mode` in your config. It should be an object, not a string:

```python
"ssl_mode": {"mode": "disable"}
```

Passing `"ssl_mode": "disable"` will cause a validation error, as seen in the logs and the [PyAirbyte tutorial](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte).

5. Database Initialization and Permissions
If you are using Airbyte's built-in Postgres or a new database, ensure it is initialized and the user has the correct permissions. Permission or initialization errors can also prevent connections, as seen in [this issue](https://github.com/airbytehq/airbyte/issues/54199).

6. No Support for Connection Strings
Airbyte does not support a single `connection_string` parameter in the config. You must provide individual fields: `host`, `port`, `database`, `username`, `password`, and `ssl_mode` ([see details](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte)).

Summary:
- Double-check your host, port, and credentials.
- Use the correct host value if running Airbyte in Docker.
- Ensure your database is running and accepting connections.
- Use the correct config format for `ssl_mode`.
- Do not use a `connection_string` parameter.

If you continue to have issues, please check the full connector logs for more details, as they often provide specific error messages about what is wrong with the connection attempt.
Aman Kumar
07/22/2025, 11:16 AM

kapa.ai
07/22/2025, 11:16 AM
The `tunnel_method` field in the Airbyte connector configuration specifies whether and how Airbyte should establish an SSH tunnel to connect to your database. This is useful when your database is not directly accessible (for example, it does not have a public IP address), and you need to connect through an intermediate server (a bastion or jump server).

Supported values for `tunnel_method` include:

- No Tunnel: Direct connection to the database (no SSH tunneling).
- SSH Key Authentication: Use an RSA private key to authenticate to the bastion server.
- Password Authentication: Use a password to authenticate to the bastion server.

The configuration for `tunnel_method` is an object that may include additional fields such as the jump server host, port, username, and either a private key or password, depending on the authentication method you choose.

Example:

```python
"tunnel_method": {
    "tunnel_method": "SSH_KEY_AUTH",
    "tunnel_host": "bastion.example.com",
    "tunnel_port": 22,
    "tunnel_user": "myuser",
    "ssh_key": "-----BEGIN RSA PRIVATE KEY-----\n..."
}
```

or for no tunnel:

```python
"tunnel_method": {
    "tunnel_method": "NO_TUNNEL"
}
```

This field is required if you need to connect through a bastion host, and optional if you can connect directly. For more details, see the [Airbyte documentation on SSH tunneling](https://docs.airbyte.com/integrations/destinations/postgres#naming-conventions) and the [source configuration reference](https://docs.airbyte.com/integrations/sources/postgres/cloud-sql-postgres#reference).
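For context, here is a hedged sketch of how `tunnel_method` sits inside a complete Postgres source config. The hosts and credentials are placeholders, and the `SSH_PASSWORD_AUTH`/`tunnel_user_password` field names are assumptions based on the connector spec; verify them against the reference linked above:

```python
config = {
    "host": "10.0.0.5",  # database host as seen from the bastion
    "port": 5432,
    "database": "chartsai",
    "username": "postgres",
    "password": "postgres",
    "ssl_mode": {"mode": "disable"},
    "tunnel_method": {
        "tunnel_method": "SSH_PASSWORD_AUTH",
        "tunnel_host": "bastion.example.com",
        "tunnel_port": 22,
        "tunnel_user": "myuser",
        "tunnel_user_password": "<bastion-password>",
    },
}
```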
Aman Kumar
07/22/2025, 11:17 AM

kapa.ai
07/22/2025, 11:18 AM
To list the tables in your Postgres database, you can run the following command in a Postgres shell:

```sql
\dt;
```

This command will display all tables in the current schema. For example, after a sync, you might see output like:

```
               List of relations
 Schema |          Name          | Type  |  Owner
--------+------------------------+-------+----------
 public | _airbyte_raw_table_one | table | postgres
 public | table_one              | table | postgres
```

You can open a Postgres shell to your destination database and run this command to see the tables created by Airbyte. This approach is shown in several Airbyte tutorials and documentation sources, such as the [incremental data synchronization tutorial](https://airbyte.com/tutorials/incremental-data-synchronization#step-7) and the [full refresh data synchronization tutorial](https://airbyte.com/tutorials/full-data-synchronization#step-7).

If you are using PyAirbyte and have cached data in a Postgres database, you can also use SQL queries (e.g., via JupySQL or a database client) to list tables in your cache database, as shown in the [PyAirbyte Postgres cache demo](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte).
Aman Kumar
07/22/2025, 11:18 AM

kapa.ai
07/22/2025, 11:18 AM
### 1. List Available Streams from a Source

In PyAirbyte, you can list the streams (tables) that a configured source exposes:

```python
streams = source.get_available_streams()
print(streams)
```

This will output a list of available data streams (tables) for your configured source. You can use this method for any supported source, including PostgreSQL. [See example in the PyAirbyte GitHub chatbot tutorial](https://airbyte.com/tutorials/github-documentation-chatbot-using-airbyte-pgvector)

---

### 2. List Tables in a Postgres Cache

If you are using a Postgres cache, you can use SQLAlchemy to inspect the tables:

```python
engine = read_result.cache.get_sql_engine()

# Use SQLAlchemy's inspect to list tables
from sqlalchemy import inspect

inspector = inspect(engine)
print(inspector.get_table_names())
```

Or, if you are using JupySQL in a notebook:

```python
%sql engine

%%sql
SELECT table_name FROM information_schema.tables WHERE table_schema = 'public';
```

[See example in the PyAirbyte Postgres cache demo](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte)

---

Summary:
- Use `source.get_available_streams()` to list all streams (tables) available from the source.
- If using a Postgres cache, use SQLAlchemy or SQL queries to list tables in the cache database.

If you need to see the actual table names in the destination Postgres database after a sync, you can also connect with any SQL client and run `\dt;` or a similar SQL query.
Aman Kumar
07/22/2025, 11:22 AM

kapa.ai
07/22/2025, 11:22 AM
No, you do not need Docker to use the `airbyte` Python package (PyAirbyte). PyAirbyte is designed to let you use Airbyte connectors directly from Python, making it a good choice for Jupyter Notebooks, prototyping, or when you don't want to run a server or manage Docker containers. You can install it with `pip install airbyte` and use it natively in your Python environment without Docker.

However, if you want to run the full Airbyte platform (the UI, server, and orchestration features), Docker is required for the self-managed version. But for PyAirbyte specifically, Docker is not required; just Python and the package itself are enough ([see official docs](https://docs.airbyte.com/platform/using-airbyte/pyairbyte/getting-started)).
Aman Kumar
07/22/2025, 11:23 AM

kapa.ai
07/22/2025, 11:23 AM
07/22/2025, 11:23 AM"host": "host.docker.internal"
if you are running Airbyte (or its connectors) in a Docker container and your PostgreSQL database is running on your host machine. In this setup, localhost
inside the Docker container refers to the container itself, not your host machine. The special hostname host.docker.internal
allows containers to access services running on the host. This is a common solution for local development on Mac and Windows, as described in several Airbyte tutorials and forum posts. For example: > Alternatively, use a local Postgres database on your computer: use host.docker.internal
(if you are on a [Mac](https://docs.docker.com/desktop/mac/networking/)) as the host instead of localhost
when setting up the source and destination. [Source: Delta Lake on Databricks Lakehouse tutorial](https://airbyte.com/tutorials/load-data-into-delta-lake-on-databricks-lakehouse#step-1) If you are not running Airbyte or its connectors in Docker, and everything is running natively on your machine, you should be able to use "localhost"
as the host. But if you are using Docker, "host.docker.internal"
is needed for the container to reach your host's services.Aman Kumar
Aman Kumar
07/22/2025, 11:25 AM

kapa.ai
07/22/2025, 11:26 AM
To run SQL queries on a PyAirbyte cache in a notebook, you can use JupySQL with the cache's SQLAlchemy engine:

```python
# Install JupySQL if needed
%pip install --quiet jupysql

# Load the JupySQL extension
%load_ext sql

# Get the SQLAlchemy engine from the cache
engine = read_result.cache.get_sql_engine()

# Pass the engine to JupySQL
%sql engine

# Now you can run SQL queries using cell magics
%%sql
SELECT * FROM products LIMIT 10;
```

[See example in the PyAirbyte demo](https://airbyte.com/tutorials/leverage-pyairbyte-with-this-demo)

---

For a Postgres cache, you can also use SQLAlchemy's engine directly or JupySQL to run queries:

```python
engine = read_result.cache.get_sql_engine()

# Using SQLAlchemy directly
from sqlalchemy import text

with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM users LIMIT 10"))
    for row in result:
        print(row)
```

Or, in a Jupyter notebook:

```python
%sql engine

%%sql
SELECT * FROM users LIMIT 10;
```

[See example in the PyAirbyte Postgres cache demo](https://airbyte.com/tutorials/extract-data-from-postgres-using-pyairbyte)

---

Summary:
- Use the cache's SQL engine to run queries.
- In notebooks, JupySQL makes it easy to run SQL directly.
- You can also use SQLAlchemy in standard Python scripts.

If you need to run queries on the destination database (e.g., a Postgres destination after a sync), connect to it using your preferred SQL client or library and run your queries as usual.
Aman Kumar
07/22/2025, 11:26 AM

kapa.ai
07/22/2025, 11:26 AM
To connect Airbyte to MongoDB, follow these steps:

### 1. Create a Read-Only MongoDB User

- For MongoDB Atlas:
  - Create a dedicated read-only user, e.g., with the `readAnyDatabase` privilege for cluster-wide access.
  - [Detailed steps here](https://docs.airbyte.com/integrations/sources/mongodb-v2#quick-start).
- For Self-hosted MongoDB:
  - Use the MongoDB shell to create a user with the `read` role on the target database.

---

### 2. Get the MongoDB Connection String

- For MongoDB Atlas:
  - Log in to the Atlas dashboard, click "Connect" on your cluster, select "Shell", and copy the connection string (without quotes) ([see instructions](https://docs.airbyte.com/integrations/sources/mongodb-v2#step-2-discover-the-mongodb-cluster-connection-string)).
- For Self-hosted:
  - Refer to the [MongoDB connection string documentation](https://www.mongodb.com/docs/manual/reference/connection-string/#find-your-self-hosted-deployment-s-connection-string).

---

### 3. Configure the Airbyte MongoDB Source

You need to provide the following configuration parameters:

- Cluster Type: Atlas or self-hosted replica set
- Connection String: The MongoDB URI you obtained above
- Database Names: List of databases to sync
- Username: The MongoDB user
- Password: The user's password
- Authentication Source (Atlas only): Usually `admin` (default)
- Other Advanced Options: Such as schema enforcement, queue size, etc.

Example configuration in Python (for PyAirbyte or the Airbyte UI):

```python
config = {
    "cluster_type": "atlas",  # or "replica"
    "connection_string": "<your-mongodb-connection-string>",
    "database_names": ["your_database"],
    "username": "your_username",
    "password": "your_password",
    "auth_source": "admin",  # Only for Atlas, optional for self-hosted
    # ...other optional parameters
}
```

When using Airbyte, you would typically set this configuration in the UI or pass it to the connector in your workflow. The connector will test the connection upon creation to ensure it can access your MongoDB instance ([see details](https://docs.airbyte.com/integrations/sources/mongodb-v2#step-3-configure-the-airbyte-mongodb-source)).

---

Note:
- If your MongoDB instance is not directly accessible, you can use an SSH tunnel by configuring the SSH tunnel parameters in the connector setup ([see SSH tunnel details](https://docs.airbyte.com/integrations/destinations/mongodb#getting-started-airbyte-open-source)).
- The knowledge sources do not provide a direct PyAirbyte code example for MongoDB, but the configuration principles are the same as described above.

---

For more details on each configuration parameter, refer to the [MongoDB source connector documentation](https://docs.airbyte.com/integrations/sources/mongodb-v2#configuration-parameters).
Aman Kumar
07/22/2025, 11:41 AM

kapa.ai
07/22/2025, 11:41 AM
The error `Verification error(s) occurred for DISCOVER. Errors: [$: required property 'database_config' not found]` indicates that your MongoDB source connector configuration is missing the required property `database_config`.

According to the knowledge sources, the MongoDB V2 connector expects certain fields in the configuration, and recent versions may require a `database_config` object instead of the previous flat structure.

What to do:
- Make sure your configuration includes a `database_config` property with the necessary connection details.
- The knowledge sources do not provide the exact structure for `database_config`, but based on the error and recent changes in Airbyte connectors, you should check the [official MongoDB V2 documentation](https://docs.airbyte.com/integrations/sources/mongodb-v2#configuration-parameters) for the latest required config format.

Summary: Your error is due to a missing `database_config` property in your connector configuration. Please update your config to include this property as required by the current version of the MongoDB V2 connector. If you are unsure of the exact structure, refer to the connector's documentation or the spec file for the connector version you are using. The knowledge sources do not provide the full required structure for `database_config`, so you may need to consult the latest Airbyte documentation for MongoDB V2 for details.
Aman Kumar
07/22/2025, 11:42 AM

kapa.ai
07/22/2025, 11:42 AM
I'm sorry, but the knowledge sources do not provide a complete, working example of connecting to MongoDB with PyAirbyte (the `airbyte` Python package). The sources do describe the required configuration fields and mention that the MongoDB V2 connector expects a `database_config` property, but they do not include a full Python code sample for this use case.

Here's what is clear from the knowledge sources:
- The MongoDB V2 connector requires a `database_config` property in the configuration.
- The exact structure of `database_config` is not detailed in the provided sources.
- The error you received (`required property 'database_config' not found`) confirms this requirement.
- For the most accurate and up-to-date configuration, you should refer to the [official MongoDB V2 documentation](https://docs.airbyte.com/integrations/sources/mongodb-v2#configuration-parameters).

Summary: There is not enough information in the knowledge sources to provide a full, working Python code example for connecting to MongoDB using PyAirbyte. Please consult the official Airbyte documentation for the exact configuration structure required.
Aman Kumar
07/22/2025, 5:51 PM

kapa.ai
07/22/2025, 5:51 PM
To connect to an S3 bucket with Airbyte, you can use it as a source or as a destination.

### 1. S3 as a Source

```python
import airbyte as ab

source = ab.get_source(
    "source-s3",
    config={
        "streams": [
            {
                "name": "your_stream_name",
                "format": {"filetype": "csv"},
                "globs": ["**"]
            }
        ],
        "bucket": "your-bucket-name",
        "aws_access_key_id": "YOUR_ACCESS_KEY",
        "aws_secret_access_key": "YOUR_SECRET_KEY",
        "region_name": "us-east-1"
    }
)
source.check()
```

[See: End-to-end RAG using S3, PyAirbyte, Pinecone, and Langchain](https://airbyte.com/tutorials/end-to-end-rag-using-s3-pyairbyte-pinecone-and-langchain)
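As a hedged follow-on to the source example above (the stream name must match the `name` entry in the `streams` config; it is a placeholder here):

```python
# After source.check() passes, select the configured stream and read it
# into the default local cache.
source.select_streams(["your_stream_name"])
result = source.read()

df = result.cache["your_stream_name"].to_pandas()
print(df.head())
```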
---

### 2. S3 as a Destination

Required fields:
- `s3_bucket_name`: Name of your S3 bucket.
- `s3_bucket_path`: Subdirectory in the bucket.
- `s3_bucket_region`: AWS region.
- Authentication: Either `access_key_id`/`secret_access_key` or `role_arn`.

Example config fields:

```json
{
  "s3_bucket_name": "your-bucket",
  "s3_bucket_path": "your/path",
  "s3_bucket_region": "us-east-1",
  "access_key_id": "YOUR_ACCESS_KEY",
  "secret_access_key": "YOUR_SECRET_KEY"
}
```
[See: S3 Destination Config Reference](https://docs.airbyte.com/integrations/destinations/s3#reference)

---

Note:
- For private buckets, ensure your IAM user or role has the necessary permissions (e.g., `s3:GetObject` and `s3:ListBucket` for sources; `s3:PutObject` and `s3:DeleteObject` for destinations), as shown in the [source setup guide](https://docs.airbyte.com/integrations/sources/s3#step-1-set-up-s3) and the [destination setup guide](https://docs.airbyte.com/integrations/destinations/s3#step-2-set-up-the-s3-destination-connector-in-airbyte).
- If using Airbyte in Docker, ensure network access to S3.

---

If you need more specific code or UI steps, let me know your use case (source or destination, Airbyte UI or Python)!