# pyairbyte
  • m

    Mauricio Pérez

    05/16/2025, 4:22 PM
Hi team, I'm currently trying to connect to the `source-airtable` connector using `pyairbyte`, but I'm running into an issue. Here are the details:
• Airbyte version: `0.24.2`
• Python version: `3.10.17`
• Error message:
```
ERROR: Error starting the sync. This could be due to an invalid configuration or catalog.
Please contact Support for assistance.
Error: Validation against json schema defined in declarative_component_schema.yaml schema failed

AirbyteConnectorMissingSpecError: Connector did not return a spec.
  Please review the log file for more information.
  Connector Name: 'source-airtable'
```
This is the snippet that's throwing the issue:
```python
import airbyte as ab

airtable = ab.get_source(
    "source-airtable",
)

credentials = {
    "credentials": {
        "auth_method": "api_key",
        "api_key": "pat"
    }
}

airtable.set_config(config=credentials)
```
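An illustrative aside, not an authoritative diagnosis: `AirbyteConnectorMissingSpecError` can surface when the connector's virtual-env install is broken, so reinstalling or pinning a `version=`/`pip_url=` in `get_source()` is worth trying; a malformed nested config is the other usual suspect. Below is a minimal, hypothetical sanity-check helper for the nested shape — the key names mirror the snippet above, not an official schema:

```python
# Hypothetical helper (not part of PyAirbyte): sanity-check the nested config
# shape before handing it to set_config(), since a config that fails spec
# validation can surface as the generic sync error above.

def check_airtable_config(config: dict) -> list:
    """Return a list of problems found in the nested credentials block."""
    problems = []
    creds = config.get("credentials")
    if not isinstance(creds, dict):
        return ["top-level 'credentials' object is missing"]
    if creds.get("auth_method") != "api_key":
        problems.append("'auth_method' should be 'api_key' for PAT-style auth")
    if not creds.get("api_key"):
        problems.append("'api_key' is empty")
    return problems

config = {"credentials": {"auth_method": "api_key", "api_key": "pat..."}}
print(check_airtable_config(config))  # → []
```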
  • a

    AJ Steers (Airbyte)

    05/16/2025, 7:41 PM
🎉 Now Available: PyAirbyte 0.25.0
This release adds the capability to set cursor and primary key overrides, which is helpful for DB-type sources. Specifically, `Source` objects now support the following methods:
• `set_cursor_key()` - Overrides the cursor key for one stream.
• `set_cursor_keys()` - Overrides the cursor keys for any number of streams.
• `set_primary_key()` - Overrides the primary key for one stream.
• `set_primary_keys()` - Overrides the primary keys for any number of streams.
See the updated API docs for more information. Thanks to @Krishna for his help testing the new feature, and thanks also to @Mateusz Czarkowski for "upvoting" the issue here in the channel for our prioritization. 🙏
  • r

    Rad Extrem

    05/19/2025, 2:40 AM
Hello Team, while experimenting with venv-based installation for Python connectors, I was wondering: is there a way to pre-bake a connector to avoid runtime installation, possibly via a Dockerfile, aside from using `get_source` or `get_destination`? Also, would using a `local_executable` be a better approach for this use case? If so, are there any established steps or best practices for building such executables for connectors?
  • b

    Ben Wilen

    05/19/2025, 5:14 PM
👋 Hey all, I'm using the PyAirbyte `SnowflakeCache` and currently am running into this auth error with a 5-hour sync:
```
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 390114 (08001): None: Authentication token has expired.  The user must authenticate again.
(Background on this error at: https://sqlalche.me/e/20/f405)
```
Assuming it's because Snowflake has a default timeout of 4 hours, does anyone have a fix for this? I don't see a way via PyAirbyte to specify `"client_session_keep_alive": True`.
  • b

    Ben Wilen

    05/27/2025, 7:57 PM
👋 Hey team, question about a possible discrepancy with SqlStateWriter: I see it uses `table_name = table_prefix + stream_name` as the table name here, but SqlProcessorBase uses a normalizer in `get_sql_table_name()`. As a result, if the stream name is not normalized in the state message (which I believe it isn't), the `table_name` we are actually writing into the state table is not always the same as the actual table name. Is that intended (or am I misunderstanding the code)?
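To illustrate the suspected discrepancy, here is a simplified stand-in for the lower-case name normalizer (an illustration, not the actual PyAirbyte code) showing how the two code paths can produce different names:

```python
# Simplified stand-in for PyAirbyte's lower-case name normalizer (illustration
# only, not the real implementation), showing how the state-writer path and
# the SQL-processor path can disagree on the table name.
import re

def normalize(name: str) -> str:
    """Lowercase and replace non-alphanumeric runs with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

stream_name = "Retargeting-Geo-Report"
table_prefix = "raw_"

state_table_name = table_prefix + stream_name               # SqlStateWriter path
actual_table_name = table_prefix + normalize(stream_name)   # SqlProcessorBase path

print(state_table_name)   # → raw_Retargeting-Geo-Report
print(actual_table_name)  # → raw_retargeting_geo_report
print(state_table_name == actual_table_name)  # → False
```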
  • j

    Jay Stevens

    06/05/2025, 6:39 PM
I am trying to use PyAirbyte to sync data from multiple Stripe accounts (via their Stripe Connect feature) to a `MotherDuckCache`, but I don't think the way incremental sync is set up will work unless I use a different cache for each account. Does that sound right?
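One hedged workaround sketch, assuming incremental state is keyed per cache: derive a separate cache database (or table prefix) per Stripe account so cursors never collide. The helper below is hypothetical and only derives identifiers; wiring them into `MotherDuckCache`/`DuckDBCache` is left to your setup:

```python
# Hypothetical helper (not a PyAirbyte API): derive per-account cache
# identifiers so each Stripe Connect account gets its own state namespace,
# either as a separate database file or as a table prefix in a shared DB.
import re

def per_account_cache_ids(account_id: str) -> dict:
    safe = re.sub(r"[^a-z0-9]+", "_", account_id.lower()).strip("_")
    return {
        "db_path": f".cache/stripe_{safe}.duckdb",  # one DB per account
        "table_prefix": f"{safe}_",                 # or a prefix in a shared DB
    }

print(per_account_cache_ids("acct_1ABC"))
```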
  • n

    Nick Clarke

    06/12/2025, 11:52 PM
Hi. I am uncertain how to do development against a connector using PyAirbyte. I'm trying to make some changes to an existing connector, `source-appsflyer`. I am trying to test my changes as part of the service we have that uses PyAirbyte in a Docker container to read from this source and write back to BigQuery as a destination. I've checked out https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-appsflyer and made my relevant changes there. I am now trying to test the changes in my service that uses PyAirbyte with the following test code:
```python
import airbyte as ab
import json

CONFIG_PATH = "configs/appsflyer_android.json"
with open(CONFIG_PATH, "r") as f:
    source_config = json.load(f)

# /app/source_appsflyer/ is my local clone of airbytehq/airbyte, mounting only
# the relevant appsflyer connector folder in my docker container.

source = ab.get_source(
    "source-appsflyer",
    config=source_config,
    local_executable="/app/source_appsflyer/source_appsflyer/run.py",
)
source.select_streams(["retargeting_geo_report"])
all_streams = source.get_selected_streams()
read_result = source.read()
```
This fails, because if I `pip install -r requirements.txt` from `/app/source_appsflyer` I get package collisions for `import airbyte` between airbyte and pyairbyte. I then tried `poetry install --with dev`, which places an executable in `/root/.cache/pypoetry/virtualenvs/source-appsflyer-OcVLBknA-py3.10/bin/source-appsflyer`, which I then point to with:
```python
source = ab.get_source(
    "source-appsflyer",
    config=source_config,
    local_executable="/root/.cache/pypoetry/virtualenvs/source-appsflyer-OcVLBknA-py3.10/bin/source-appsflyer",
)
```
But this appears to install the version from PyPI. When I make local changes to the package and do a fresh install, those changes do not appear in the `lib` code under `/root/.cache/pypoetry/virtualenvs/source-appsflyer-OcVLBknA-py3.10`. I also tried `poetry build` from `/app/source_appsflyer/` and then pointing:
```python
source = ab.get_source(
    "source-appsflyer",
    config=source_config,
    local_executable="/app/source_appsflyer/dist/name_of_the_whl_here",
)
```
This fails. What is the proper way to do this? How can I point to a local executable or path that PyAirbyte understands?
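For what it's worth, `local_executable` expects a runnable entry-point script (like the console script a venv install creates), not a `.py` file or a wheel; and for local edits to be picked up without reinstalling, an editable install (`pip install -e /app/source_appsflyer` into a dedicated venv) is the usual pattern. A hypothetical pre-flight check, not a PyAirbyte API:

```python
# Hypothetical pre-flight check (not part of PyAirbyte): verify that the path
# passed as local_executable= is actually an installed, executable entry point.
# A bare .py file or a .whl will not work here.
import os

def validate_local_executable(path: str) -> None:
    if path.endswith((".py", ".whl")):
        raise ValueError(f"{path!r} is a source file/wheel, not an entry point")
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise ValueError(f"{path!r} is missing or not executable")

try:
    validate_local_executable("dist/source_appsflyer-0.1.0.whl")
except ValueError as err:
    print(err)
```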
  • a

    Andrew Lytle

    06/13/2025, 2:48 PM
Hey all, a quick question about the S3 source connector. I'm trying to install it via pip in a Python 3.12 environment, and it is failing with a number of dependency-related issues. Everything seems to work with 3.11, and I notice the Docker image appears to be running 3.11 as well. Does this suggest the S3 connector does not currently support 3.12? Thanks for your help!
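A small, hedged sketch of a runtime gate for this situation; the `3.11` ceiling is an assumption based on the observation above, and newer PyAirbyte versions also expose a `use_python=` argument on `get_source()` to pin the connector's interpreter separately from your app's:

```python
# Hedged sketch: gate the runtime before attempting to install a connector
# whose dependency pins appear to require an older Python (here, source-s3
# on <= 3.11, which is an assumption drawn from the report above).
import sys

def runtime_ok(max_minor: int = 11) -> bool:
    """True if the current interpreter is CPython 3.<=max_minor."""
    return sys.version_info[:2] <= (3, max_minor)

print(runtime_ok())
```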
  • s

    Slackbot

    06/20/2025, 7:14 PM
    An admin, @AJ Steers, removed LangChain from this channel.
  • s

    Slackbot

    06/20/2025, 7:15 PM
    An admin, @AJ Steers, removed Airbyte Team from this channel.
  • s

    Slackbot

    06/20/2025, 7:17 PM
    aj from Airbyte Team was added to this channel by aj238. You can review their permissions in Channel Details. Happy collaborating!
  • a

    AJ Steers

    06/20/2025, 7:20 PM
    has renamed the channel from "pyairbyte-public-beta" to "pyairbyte"
  • n

    Nick Clarke

    06/23/2025, 9:56 PM
I'm running PyAirbyte within a container, and I want to use a source that is Java-based. Are there any best practices for Docker-within-Docker? I have a fuzzy memory of Docker now allowing a privileged container to break out and stand up a container alongside it (instead of running another container within the running container), but it's been a long time.
  • i

    Idan Moradov

    06/25/2025, 9:49 AM
PyAirbyte suddenly stopped working for us when the new version 0.25.2 was released; the read function shows "no cursor for stream attempting to do incremental". Has anyone faced the same issue?
  • a

    Alioune Amoussou

    06/25/2025, 3:38 PM
Hi there 👋🏿, I was about to make a PR to add key-pair authentication to Snowflake when I came across this one PR. It seems to handle authentication when the key is in a file, but it does not allow passing the key directly as a string to `SnowflakeConfig` in an attribute. I was wondering what you think of this feature? And whether, to implement my functionality, I should start from the branch of the existing PR or from master. There are several approaches I can take:
- Add a private_key attribute in `SnowflakeConfig` here
- Add a private_key attribute and a validation function in `SnowflakeConfig` (ex: password can't be filled if private_key is...) here
- Abstract this logic into a `Credential` class, which would contain all authentication attributes, handle validation, and generate part of the configuration passed here.
  • y

    Yohann Jardin

    06/27/2025, 1:01 PM
octavia wave Hi! We're facing a race condition related to PyAirbyte about twice a week, where we query `_airbyte_state` and the state of a stream is missing from the table. I shared the details on GitHub. It has minimal impact for us, and we will soon not face it anymore; I'm sharing it here in case other people hit it in the future. We're not planning to try to tackle it. The fix looks easy, but testing against the different caches and their support for transactions or upserts doesn't seem trivial 😕
  • b

    Ben Wilen

    06/27/2025, 5:18 PM
👋 Hey all, I'm working on adding telemetry to PyAirbyte, and I believe Airbyte is already integrated with OpenTelemetry; does PyAirbyte have any support for that yet?
  • a

    AJ Steers (Airbyte)

    06/28/2025, 7:36 PM
    Creating new 🧵 for this question from @aditya kumar.
  • n

    Nick Clarke

    06/30/2025, 10:11 PM
I'm running into a `PyAirbyteNameNormalizationError` when I attempt to run a very simple example with the Mixpanel connector, which seems like it may be an issue with the source? Please see https://gist.github.com/nickolasclarke/dd858ea3b4464e472f5a02ffbd4ce586 for more details.
  • n

    Nick Clarke

    07/03/2025, 10:12 PM
I'm a tad confused about how the BigQuery cache differs from the BigQuery destination. I'm attempting to sync to a BQ dataset using the BigQueryCache, but it appears to only be writing out cache files to .cache/, and I don't see them getting flushed. I am doing a full sync, so will it not flush and write to BQ until it has reached the end of all pages?
  • y

    Yohann Jardin

    07/11/2025, 8:09 PM
octavia wave Hi all and @AJ Steers (Airbyte)! Currently logging is only done to files; nothing is logged to stdout. We previously discussed that on Slack and created an issue with two proposals. Without much input on whether we prefer proposal one or two, I went with the first one, which has no breaking changes. PR: https://github.com/airbytehq/PyAirbyte/pull/716 For reviewing, I suggest going commit by commit; they should be easier to navigate that way.
  • y

    Yohann Jardin

    07/11/2025, 8:28 PM
    Btw, there are CI failures that I was able to confirm are unrelated to my change. (cf a PR containing a single empty commit.) Here is a separate PR focused on solving this issue.
  • m

    Mauricio Pérez

    07/18/2025, 6:45 PM
Hi everyone, I'm using `source-pipedrive` with Python 3.11 and running into this error:
```
Failure Reason: Encountered an error while discovering streams. Error: mutable default <class 'airbyte_cdk.sources.declarative.decoders.json_decoder.JsonDecoder'> for field decoder is not allowed: use default_factory
```
Here's the snippet triggering the error:
```python
import airbyte as ab

pipedrive_config = {
    "api_token": "api_token",
    "replication_start_date": "2017-01-25 00:00:00Z"
}

pipedrive = ab.get_source("source-pipedrive", pip_url="airbyte-source-pipedrive==2.3.7")
pipedrive.set_config(pipedrive_config)
pipedrive.check()
```
I suspect this might be related to a CDK version incompatibility with Python 3.11. Has anyone found a workaround or a compatible version that resolves this?
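This error is Python 3.11's stricter dataclass check: since 3.11, any default value whose type is unhashable is rejected as "mutable" at class-creation time, so an older CDK pin that imports cleanly on 3.10 can fail on 3.11. A minimal stand-alone reproduction (illustration only, not CDK code):

```python
# Minimal stand-alone reproduction of the 3.11 behavior change (illustration
# only, not CDK code). Defining __eq__ without __hash__ makes Decoder
# unhashable, and Python 3.11+ dataclasses reject unhashable defaults.
from dataclasses import dataclass, field

class Decoder:
    def __eq__(self, other):          # __hash__ becomes None -> "mutable"
        return isinstance(other, Decoder)

try:
    @dataclass
    class Stream:
        decoder: Decoder = Decoder()  # raises ValueError on Python 3.11+
except ValueError as err:
    print(err)  # mutable default <class '...Decoder'> ... use default_factory

@dataclass
class FixedStream:
    decoder: Decoder = field(default_factory=Decoder)  # the suggested fix

print(FixedStream().decoder == Decoder())  # → True
```

The practical takeaway matches the suspicion above: either run on a Python version the pinned CDK supports, or move to a CDK/connector version whose dataclasses already use `default_factory`.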
  • a

    AJ Steers (Airbyte)

    08/05/2025, 12:33 AM
📣 Tomorrow (Tuesday) I'll be presenting at the Airbyte MCP webinar, demoing some exciting new MCP capabilities in PyAirbyte. We'll also be talking about MCP tools for AI agents, the differences between remote and local MCP servers, and showing off some other Airbyte MCP Server goodness across the Airbyte ecosystem. Please join us if you are interested in learning how to integrate your AI and data workloads. octavia muscle
  • a

    AJ Steers (Airbyte)

    08/05/2025, 12:53 AM
I'm thrilled to announce the latest PyAirbyte release: PyAirbyte v0.29. This is a huge update, so please read the below and let us know if you have feedback, or if you run into any issues when using the latest version. From the Release Notes:

> PyAirbyte v0.29 introduces powerful new features, including MCP tools targeted to LLM use cases, the ability to "preview" data from multiple streams simultaneously, faster connector installs by leveraging the powerful `uv` tool, and the ability to mix-and-match connectors' Python versions within the same environment.
>
> ✨ New Features (PyAirbyte Core)
> • feat: add stream previews for sources via `Source.print_stream_previews()` and `Source.get_stream_previews()` (#725)
> • feat: replace `pip` with `uv` for connector installations, resulting in dramatically faster connector installation (can be disabled with the `AIRBYTE_NO_UV` environment variable) (#730)
> • feat: ability to override which Python versions will be used for installing new connectors with the `use_python` arg for `get_source()` and `get_destination()` (#730)
> • feat: support uv-managed Python versions for installing new connectors with the `use_python` arg, even for Python versions not yet installed on the system (#730)
> • feat: add new `Cache.run_sql_query()` method to run SQL queries directly against cache objects (#734)
>
> 🤖 New Built-in PyAirbyte MCP Server (🧪 Experimental)
> • feat: add new MCP tools which allow LLMs to call PyAirbyte directly (#734, #738, #736):
> ◦ Connector Management
> ▪︎ `list_connectors` - List available Airbyte connectors with optional filtering by type (source/destination), install types (python, yaml, java, docker), or keywords
> ▪︎ `get_connector_info` - Get documentation URL and information for a specific connector
> ▪︎ `list_connector_config_secrets` - List available config secret names for a given connector
> ▪︎ `validate_connector_config` - Validate a connector configuration
> ◦ Source Operations
> ▪︎ `list_source_streams` - List all streams available in a source connector
> ▪︎ `get_source_stream_json_schema` - Get the JSON schema for a specific stream in a source connector
> ▪︎ `get_stream_previews` - Get sample records (previews) from streams in a source connector
> ▪︎ `read_source_stream_records` - Read records from a specific source stream
> ◦ Cache Operations
> ▪︎ `describe_default_cache` - Describe the currently configured default cache (typically DuckDB)
> ▪︎ `list_cached_streams` - List all streams available in the default cache
> ▪︎ `sync_source_to_cache` - Run a sync from a source connector to the default DuckDB cache
> ▪︎ `run_sql_query` - Run SQL queries against the default cache

Please let us know what you think, here or in the GitHub Discussion, and join the MCP Webinar for more information on the latest with Airbyte AI and MCP.
  • a

    AJ Steers (Airbyte)

    08/05/2025, 1:10 AM
📰 Some Big News for PyAirbyte 🎉 Please pardon this very rare <!channel> notification as I share some very important PyAirbyte announcements:
1. You are invited to our MCP Webinar tomorrow (Tuesday), which will be live on YouTube.
◦ A rewatch link will also be shared here afterwards.
2. PyAirbyte now has built-in MCP capabilities.
◦ Plug it into your favorite IDE or LLM chat interface and let us know what you think!
3. PyAirbyte can now mix-and-match Python versions for connectors with the power of uv.
◦ Take advantage of faster uv-managed installs, or opt out with the AIRBYTE_NO_UV environment variable if you want to continue using `pip`.
⬆️ Scroll up for more detail on each of these big updates.
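A tiny sketch of the opt-out mentioned in point 3; the exact accepted values aren't specified in this thread, so using `"1"` as the "on" value is an assumption:

```python
# Hedged sketch: set AIRBYTE_NO_UV before importing/using PyAirbyte so that
# connector installs fall back to pip instead of uv. The value "1" is an
# assumption; the announcement above only names the variable.
import os

os.environ["AIRBYTE_NO_UV"] = "1"  # opt out of uv-managed installs

print(os.environ["AIRBYTE_NO_UV"])  # → 1
```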
  • a

    AJ Steers (Airbyte)

    08/05/2025, 4:28 AM
    Lastly, we are currently looking for design partners for future AI and MCP features. Drop a "🙋" or ping @AJ Steers (Airbyte) if you are interested in new features at the intersection of data and AI. 🤖+📈=octavia muscle
  • a

    aditya kumar

    08/06/2025, 3:01 PM
Hi AJ, I missed your webinar; can you share the recording if there is one?
  • n

    Nick Clarke

    08/08/2025, 1:01 AM
@AJ Steers is the ability to simultaneously "drain" a cache queue while it's still loading data on the roadmap at all?
  • c

    Colin

    08/24/2025, 7:40 PM
I'm currently using PyAirbyte and looking to use the Mailchimp module. This used to work (within reason), but now I'm getting an error, and I can't work out how to fix it:
```
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'source_declarative_manifest'
```