# pyairbyte
  • t

    Tendrex Reporting

    04/06/2025, 10:55 PM
    Hi team, I have two questions about loading Google Sheets into PostgreSQL tables using PyAirbyte. 1. I’m encountering some issues with a basic POC. You can find the details in this thread, along with the code. Essentially, I’m trying to load data from a spreadsheet into a PostgreSQL table, but I’m running into the following error:
    AirbyteConnectorFailedError: Connector failed.
    Please review the log file for more information.
    Connector Name: 'source-google-sheets'
    Exit Code: 1
    Log file: /tmp/airbyte/logs/source-google-sheets/source-google-sheets-log-JR6MC6ZE6.log
    2. In the future, I'd like to implement a similar process, but only load data starting from row 7, since my customer uses the first few rows for non-tabular content like logos and details. Is that currently possible? From what I can tell in the documentation, it seems we can only provide the spreadsheet link without any additional parameters.
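    For reference, a minimal sketch of the flow being attempted here - source-google-sheets into a Postgres cache. The spreadsheet URL, secret names, and connection details below are placeholders, and the service-account credentials shape is only one of the connector's auth options:
    import airbyte as ab
    from airbyte.caches import PostgresCache

    # Placeholder spreadsheet URL and service-account credentials.
    source = ab.get_source(
        "source-google-sheets",
        config={
            "spreadsheet_id": "https://docs.google.com/spreadsheets/d/<sheet-id>/",
            "credentials": {
                "auth_type": "Service",
                "service_account_info": ab.get_secret("GCP_SERVICE_ACCOUNT_JSON"),
            },
        },
    )
    source.check()  # re-raises the connector error; the log file it names has the details
    source.select_all_streams()

    # Placeholder Postgres connection details.
    cache = PostgresCache(
        host="localhost",
        port=5432,
        username="postgres",
        password=ab.get_secret("POSTGRES_PASSWORD"),
        database="analytics",
    )
    source.read(cache=cache)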
  • t

    Travis Niemczyk

    04/10/2025, 5:45 PM
    Does the SnowflakeCache() support using RSA key-pair authentication instead of a password? I know Snowflake is transitioning away from allowing user/password authentication.
  • k

    Krishna

    04/16/2025, 5:32 PM
    Airbyte gods - trying to test the Snowflake Cortex destination connector. The connector is not getting downloaded into the virtual environment, and I don't see the package available on PyPI, even though it is indeed Python. Is there an alternate method to install it?
    >> dest = ab.get_destination('destination-snowflake-cortex')
    Writing PyAirbyte logs to file: /tmp/airbyte/logs/2025-04-16/airbyte-log-JRZSR7KR1.log
    Writing destination-snowflake-cortex logs to file: /tmp/airbyte/logs/destination-snowflake-cortex/destination-snowflake-cortex-log-JRZSR7KRM.log
    >> dest.check()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ssm-user/airbyte/lib/python3.10/site-packages/airbyte/_connector_base.py", line 311, in check
        with as_temp_files([self._config]) as [config_file]:
      File "/home/ssm-user/airbyte/lib/python3.10/site-packages/airbyte/_connector_base.py", line 136, in _config
        raise exc.AirbyteConnectorConfigurationMissingError(
    airbyte.exceptions.AirbyteConnectorConfigurationMissingError: Connector is missing configuration. (AirbyteConnectorConfigurationMissingError)
    ------------------------------------------------------------
    AirbyteConnectorConfigurationMissingError: Connector is missing configuration.
        Provide via get_destination() or set_config()
        Connector Name: 'destination-snowflake-cortex'
    >>
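    For what it's worth, the failure shown in the traceback is only about missing configuration: check() needs a config to have been provided first. A minimal sketch of the mechanism (the actual keys depend on the connector's spec and are left as a placeholder here):
    import airbyte as ab

    # Placeholder: fill in keys from the destination-snowflake-cortex spec.
    cortex_config = {
        # ...
    }

    # Config can be passed at creation time...
    dest = ab.get_destination("destination-snowflake-cortex", config=cortex_config)
    # ...or set/updated afterwards:
    dest.set_config(cortex_config)
    dest.check()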
  • t

    Travis Niemczyk

    04/23/2025, 8:52 PM
    Has anyone successfully been able to set up the NetSuite connector? I keep getting this 401 Client Error, but the credentials I'm using are the same as a working pipeline that's currently running on a different platform. In the docs it calls for crazy full permissions on everything, and I'm hoping that's not mandatory, since there are pages and pages of permissions that would need to be granted just to test this. Any thoughts? I'm also using "object_types": ["transactions"] to limit to just one object.
    Failure Reason: "HTTPError('401 Client Error: Unauthorized for url: https://<realm>.suitetalk.api.netsuite.com/services/rest/record/v1/transactions?limit=1')"
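    For anyone comparing setups, a rough sketch of the config shape in question; the field names are assumptions based on the source-netsuite token-based-auth spec, so double-check them against the connector docs:
    import airbyte as ab

    # Sketch only - field names assumed from the source-netsuite spec.
    source = ab.get_source(
        "source-netsuite",
        config={
            "realm": "<realm>",
            "consumer_key": ab.get_secret("NETSUITE_CONSUMER_KEY"),
            "consumer_secret": ab.get_secret("NETSUITE_CONSUMER_SECRET"),
            "token_key": ab.get_secret("NETSUITE_TOKEN_KEY"),
            "token_secret": ab.get_secret("NETSUITE_TOKEN_SECRET"),
            "start_datetime": "2024-01-01T00:00:00Z",
            "object_types": ["transactions"],  # limit discovery to a single record type
        },
    )
    source.check()  # reproduces the 401 without running a full sync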
  • p

    pat parillo

    04/25/2025, 5:54 PM
    Hello, curious if anyone's had success using non-YAML connectors like NetSuite, Zoho, etc. in a Databricks notebook environment? I've seen bits and pieces of info out there but nothing concrete.
  • b

    Ben Wilen

    04/25/2025, 6:06 PM
    👋 Hey team, I'm running into an issue migrating from Airbyte to PyAirbyte because it seems like stream sync modes are hardcoded to incremental in this code block, and our custom connector relies on sync_mode in read_records(). Is this expected behavior? cc @AJ Steers
  • t

    Travis Niemczyk

    05/05/2025, 7:37 PM
    Has anyone seen this error before when trying to setup a new s3 source?
    TypeError: Can't instantiate abstract class SourceS3StreamReader with abstract method upload
    An error occurred during check: Connector check failed. (AirbyteConnectorCheckFailedError)
    ------------------------------------------------------------
    AirbyteConnectorCheckFailedError: Connector check failed.
        Please review the log file for more information.
        Connector Name: 'source-s3'
    ------------------------------------------------------------
    Caused by: Connector failed. (AirbyteConnectorFailedError)
    ------------------------------------------------------------
    AirbyteConnectorFailedError: Connector failed.
        Please review the log file for more information.
        Connector Name: 'source-s3'
        Exit Code: 1
    The log file it mentions is always blank
    airbyte==0.24.2
    airbyte-api==0.52.2
    airbyte-cdk==6.48.6
    airbyte_protocol_models_dataclasses==0.15.0
    airbyte_protocol_models_pdv2==0.13.1
    ab.get_source('source-s3') is installing v4.13.5
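    One way to isolate a connector-version regression is to pin the connector install explicitly instead of taking the latest release; a sketch, assuming the usual airbyte-source-s3 package name and a placeholder version:
    import airbyte as ab

    # Pin an earlier connector build (placeholder version) to compare against v4.13.5.
    source = ab.get_source(
        "source-s3",
        pip_url="airbyte-source-s3==4.12.0",
    )
    source.check()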
  • m

    Mauricio Pérez

    05/16/2025, 4:22 PM
    Hi team, I'm currently trying to connect to the source-airtable connector using pyairbyte, but I'm running into an issue. Here are the details:
    • Airbyte version: 0.24.2
    • Python version: 3.10.17
    • Error Message:
    ERROR: Error starting the sync. This could be due to an invalid configuration or catalog.
    Please contact Support for assistance.
    Error: Validation against json schema defined in declarative_component_schema.yaml schema failed
    
    AirbyteConnectorMissingSpecError: Connector did not return a spec.
      Please review the log file for more information.
      Connector Name: 'source-airtable'
    This is the snippet that's throwing the issue:
    import airbyte as ab

    airtable = ab.get_source(
        "source-airtable",
    )

    credentials = {
        "credentials": {
            "auth_method": "api_key",
            "api_key": "pat"
        }
    }

    airtable.set_config(config=credentials)
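    One hedged thing to try is pinning the connector build via pip_url; the package name below follows the usual airbyte-source-<name> convention and the version is a placeholder, so adjust both as needed:
    import airbyte as ab

    # Pin a specific connector release to rule out a spec/CDK mismatch.
    airtable = ab.get_source(
        "source-airtable",
        pip_url="airbyte-source-airtable==4.3.0",  # placeholder version
    )
    airtable.set_config({
        "credentials": {
            "auth_method": "api_key",
            "api_key": ab.get_secret("AIRTABLE_PAT"),
        }
    })
    airtable.check()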
  • a

    AJ Steers (Airbyte)

    05/16/2025, 7:41 PM
    🎉 Now Available: PyAirbyte 0.25.0
    This release adds the capability to set cursor and primary key overrides, which is helpful for DB-type sources. Specifically, Source objects now support the following methods:
    • set_cursor_key() - Overrides the cursor key for one stream.
    • set_cursor_keys() - Overrides the cursor keys for any number of streams.
    • set_primary_key() - Overrides the primary key for one stream.
    • set_primary_keys() - Overrides the primary keys for any number of streams.
    See the updated API docs for more information. Thanks to @Krishna for his help testing the new feature, and thanks also to @Mateusz Czarkowski for "upvoting" the issue here in the channel for our prioritization. 🙏
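    A short usage sketch of the new overrides (placeholder Postgres config; exact signatures are in the linked API docs):
    import airbyte as ab

    # Placeholder config for a DB-type source.
    source = ab.get_source(
        "source-postgres",
        config={
            "host": "localhost",
            "port": 5432,
            "database": "app",
            "username": "reader",
            "password": ab.get_secret("PG_PASSWORD"),
        },
    )

    # Override the cursor for one stream, or several at once:
    source.set_cursor_key("orders", "updated_at")
    source.set_cursor_keys(orders="updated_at", customers="modified_at")

    # Same pattern for primary keys:
    source.set_primary_key("orders", "order_id")
    source.set_primary_keys(orders="order_id", customers="customer_id")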
  • r

    Rad Extrem

    05/19/2025, 2:40 AM
    Hello Team, while experimenting with VENV-based installation for Python connectors, I was wondering: is there a way to pre-bake a connector to avoid runtime installation, possibly via a Dockerfile, aside from using get_source or get_destination? Also, would using a local_executable be a better approach for this use case? If so, are there any established steps or best practices for building such executables for connectors?
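    One pre-baking approach, sketched under the assumption that connector virtual environments created at build time are found again at runtime: trigger the install from a small script in a Dockerfile RUN step, so get_source() at runtime has nothing left to install.
    # prebake_connectors.py - invoked from a Dockerfile RUN step at image build time.
    import airbyte as ab

    # Installing here creates the connector virtualenvs inside the image,
    # so no installation happens when the container runs.
    ab.get_source("source-github", install_if_missing=True)
    ab.get_destination("destination-duckdb", install_if_missing=True)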
  • b

    Ben Wilen

    05/19/2025, 5:14 PM
    👋 Hey all, I'm using the PyAirbyte SnowflakeCache and currently am running into this auth error with a 5 hour sync:
    sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 390114 (08001): None: Authentication token has expired.  The user must authenticate again.
    (Background on this error at: <https://sqlalche.me/e/20/f405>)
    Assuming it's because Snowflake has a default timeout of 4 hours, does anyone have a fix for this? I don't see a way via PyAirbyte to specify "client_session_keep_alive": True.
  • b

    Ben Wilen

    05/27/2025, 7:57 PM
    👋 Hey team, question about a possible discrepancy with SqlStateWriter - I see it uses table_name=table_prefix + stream_name as the table name here. But SqlProcessorBase uses a normalizer in get_sql_table_name(). As a result, if the stream name is not normalized in the state message (which I believe it isn't), the table_name we are actually inputting into the state table is not always the same as the actual table name. Is that intended (or am I misunderstanding the code)?
  • j

    Jay Stevens

    06/05/2025, 6:39 PM
    I am trying to use PyAirbyte to sync data from multiple Stripe accounts (via their Stripe Connect feature) to a MotherDuckCache, but I don't think the way incremental sync is set up will work unless I use a different cache for each account. Does that sound right?
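    For illustration only, one way that separation could look - a distinct schema (and therefore distinct sync state) per account. The MotherDuckCache parameters and Stripe config fields below are assumptions to double-check:
    import airbyte as ab
    from airbyte.caches import MotherDuckCache

    # One cache per connected Stripe account, keyed by account ID.
    accounts = {"acct_a": "STRIPE_KEY_A", "acct_b": "STRIPE_KEY_B"}

    for account_id, secret_name in accounts.items():
        source = ab.get_source(
            "source-stripe",
            config={
                "account_id": account_id,
                "client_secret": ab.get_secret(secret_name),
                "start_date": "2024-01-01T00:00:00Z",
            },
        )
        source.select_all_streams()
        cache = MotherDuckCache(
            database="stripe_raw",
            schema_name=f"stripe_{account_id}",  # separate schema keeps state per account
            api_key=ab.get_secret("MOTHERDUCK_API_KEY"),
        )
        source.read(cache=cache)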
  • n

    Nick Clarke

    06/12/2025, 11:52 PM
    Hi. I am uncertain how to do development against a connector using pyairbyte. I'm trying to make some changes to an existing connector, source-appsflyer. I am trying to test my changes as part of the service we have that uses pyairbyte in a Docker container to read from this source and write back to BigQuery as a destination. I've checked out https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-appsflyer and made my relevant changes there. I am now trying to test the changes in my service that uses pyairbyte with the following test code:
    import airbyte as ab
    import json
    
    CONFIG_PATH = "configs/appsflyer_android.json"
    with open(CONFIG_PATH, "r") as f:
        source_config = json.load(f)
    
    ## /app/source_appsflyer/ is my local clone of airbytehq/airbyte/ mounting only the relevant appsflyer connector folder in my docker container.
    
    source = ab.get_source("source-appsflyer", config=source_config, local_executable="/app/source_appsflyer/source_appsflyer/run.py")
    source.select_streams(["retargeting_geo_report"])
    all_streams = source.get_selected_streams()
    read_result = source.read()
    This fails, because if I pip install -r requirements.txt from /app/source_appsflyer I get package collisions for import airbyte between airbyte and pyairbyte. I then tried poetry install --with dev, which places an executable in /root/.cache/pypoetry/virtualenvs/source-appsflyer-OcVLBknA-py3.10/bin/source-appsflyer, which I then point to with
    source = ab.get_source("source-appsflyer", config=source_config, local_executable="/root/.cache/pypoetry/virtualenvs/source-appsflyer-OcVLBknA-py3.10/bin/source-appsflyer")
    But this appears to install the version from PyPI. When I make local changes to the package and do a fresh install, those changes do not appear in the lib code under /root/.cache/pypoetry/virtualenvs/source-appsflyer-OcVLBknA-py3.10. I also tried poetry build from /app/source_appsflyer/ and then pointing
    source = ab.get_source("source-appsflyer", config=source_config, local_executable="/app/source_appsflyer/dist/name_of_the_whl_here")
    This fails. What is the proper way to do this? How can I point to a local executable or path that pyairbyte understands?
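    A hedged alternative to local_executable: let PyAirbyte build its own venv but point pip at the local checkout, since pip_url is handed to pip install (the editable-install flag below is an assumption worth verifying against your PyAirbyte version):
    import airbyte as ab
    import json

    CONFIG_PATH = "configs/appsflyer_android.json"
    with open(CONFIG_PATH, "r") as f:
        source_config = json.load(f)

    # Install the connector from the mounted local checkout instead of PyPI.
    source = ab.get_source(
        "source-appsflyer",
        config=source_config,
        pip_url="-e /app/source_appsflyer",
    )
    source.select_streams(["retargeting_geo_report"])
    read_result = source.read()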
  • a

    Andrew Lytle

    06/13/2025, 2:48 PM
    Hey all, a quick question about the S3 source connector. I'm trying to install it via pip in a Python 3.12 environment, and it is failing with a number of dependency-related issues. Everything seems to work with 3.11, and I notice the Docker image appears to be running 3.11 as well. Does this suggest the S3 connector does not currently support 3.12? Thanks for your help!
  • s

    Slackbot

    06/20/2025, 7:14 PM
    An admin, @AJ Steers, removed LangChain from this channel.
  • s

    Slackbot

    06/20/2025, 7:15 PM
    An admin, @AJ Steers, removed Airbyte Team from this channel.
  • s

    Slackbot

    06/20/2025, 7:17 PM
    aj from Airbyte Team was added to this channel by aj238. You can review their permissions in Channel Details. Happy collaborating!
  • a

    AJ Steers

    06/20/2025, 7:20 PM
    has renamed the channel from "pyairbyte-public-beta" to "pyairbyte"
  • n

    Nick Clarke

    06/23/2025, 9:56 PM
    I'm running pyairbyte within a container, and I want to use a source that is Java-based. Are there any best practices for docker-within-docker? I seem to have some fuzzy memory of Docker now allowing a privileged container to break out and stand up a container alongside it, instead of running another container within the running container, but it's been a long time.
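    If the host's Docker socket can be mounted into the PyAirbyte container (so connector containers run alongside it rather than inside it), Docker-based execution may be usable directly; a sketch assuming get_source's docker_image option and placeholder connection details:
    import airbyte as ab

    # Assumes the container was started with the host socket mounted, e.g.
    #   docker run -v /var/run/docker.sock:/var/run/docker.sock ...
    source = ab.get_source(
        "source-mysql",        # example Java-based source
        docker_image=True,     # run the published connector image instead of pip-installing
        config={
            "host": "db",
            "port": 3306,
            "database": "app",
            "username": "reader",
            "password": ab.get_secret("MYSQL_PASSWORD"),
        },
    )
    source.check()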
  • i

    Idan Moradov

    06/25/2025, 9:49 AM
    Suddenly pyairbyte stopped working for us when the new version 0.25.2 was released; the read function shows no cursor for streams attempting to do an incremental sync. Has anyone faced the same issue?
  • a

    Alioune Amoussou

    06/25/2025, 3:38 PM
    Hi there 👋🏿, I was about to make a PR to add key-pair authentication to Snowflake when I came across this one PR. It seems to handle authentication when the key is in a file, but it does not allow passing the key directly as a string to SnowflakeConfig in an attribute. I was wondering what you think of this feature, and whether, to implement my functionality, I should start from the branch of the existing PR or from master. There are several approaches I can take:
    - Add a private_key attribute in SnowflakeConfig here
    - Add a private_key attribute and a validation function in SnowflakeConfig (e.g. password can't be filled if private_key is...) here
    - Abstract this logic into a Credential class, which would contain all authentication attributes, handle validation, and generate part of the configuration passed here.
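    A rough, illustrative sketch of the validation idea from the second option - a stand-in pydantic model, not the actual SnowflakeConfig code:
    from pydantic import BaseModel, model_validator

    # Stand-in model: password and private_key (PEM contents as a string) are mutually exclusive.
    class SnowflakeAuthSketch(BaseModel):
        password: str | None = None
        private_key: str | None = None

        @model_validator(mode="after")
        def _exactly_one_auth_method(self):
            if bool(self.password) == bool(self.private_key):
                raise ValueError("Provide exactly one of password or private_key.")
            return self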
  • y

    Yohann Jardin

    06/27/2025, 1:01 PM
    octavia wave Hi! We're facing a race condition related to PyAirbyte about twice a week, where we query _airbyte_state and the state of a stream is missing from the table. I shared the details on GitHub. It has minimal impact for us, and we will soon not face it anymore; I'm sharing it here in case other people run into this in the future. We're not planning to try to tackle it. The fix looks easy, but testing against the different caches and their support for transactions or upserts doesn't seem trivial 😕
  • b

    Ben Wilen

    06/27/2025, 5:18 PM
    👋 Hey all, I'm working on adding telemetry to PyAirbyte, and I believe Airbyte is already integrated with OpenTelemetry - does PyAirbyte have any support for that yet?
  • a

    AJ Steers (Airbyte)

    06/28/2025, 7:36 PM
    Creating new 🧵 for this question from @aditya kumar.
  • n

    Nick Clarke

    06/30/2025, 10:11 PM
    I'm running into a PyAirbyteNameNormalizationError when I attempt to run a very simple example with the mixpanel connector, which seems like it may be an issue with the source? Please see https://gist.github.com/nickolasclarke/dd858ea3b4464e472f5a02ffbd4ce586 for more details.
  • n

    Nick Clarke

    07/03/2025, 10:12 PM
    I'm a tad confused about how the BigQuery cache differs from the BigQuery destination. I'm attempting to sync to a BQ dataset using the BigQueryCache, but it appears to only be writing out cache files to .cache/ and I do not see them getting flushed out. I am doing a full sync, so will it not flush and write to BQ until it has reached the end of all pages?
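    For reference, the cache-based path looks roughly like this (the BigQueryCache parameter names are my assumptions to verify against the docs), as opposed to using get_destination("destination-bigquery"):
    import airbyte as ab
    from airbyte.caches import BigQueryCache

    # Placeholder project/dataset/credentials. PyAirbyte stages records locally
    # and loads them into this dataset as part of read().
    cache = BigQueryCache(
        project_name="my-gcp-project",
        dataset_name="raw_airbyte",
        credentials_path="/path/to/service_account.json",
    )

    source = ab.get_source("source-faker", config={"count": 1000})
    source.select_all_streams()
    source.read(cache=cache)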
  • y

    Yohann Jardin

    07/11/2025, 8:09 PM
    octavia wave Hi all and @AJ Steers (Airbyte). Currently logging is only done to files; it is not logged to stdout. We previously discussed that on Slack and created an issue with two proposals. Without much input on whether we prefer proposal one or two, I went with the first one, which has no breaking change or anything. PR: https://github.com/airbytehq/PyAirbyte/pull/716 For reviewing, I suggest going commit by commit; they are supposed to be easier to navigate.
  • y

    Yohann Jardin

    07/11/2025, 8:28 PM
    Btw, there are CI failures that I was able to confirm are unrelated to my change (cf. a PR containing a single empty commit). Here is a separate PR focused on solving this issue.
  • m

    Mauricio Pérez

    07/18/2025, 6:45 PM
    Hi everyone, I'm using source-pipedrive with Python 3.11 and running into this error:
    Failure Reason: Encountered an error while discovering streams. Error: mutable default <class 'airbyte_cdk.sources.declarative.decoders.json_decoder.JsonDecoder'> for field decoder is not allowed: use default_factory
    Here’s the snippet triggering the error:
    import airbyte as ab
    
    pipedrive_config = {
        "api_token": "api_token",
        "replication_start_date": "2017-01-25 00:00:00Z"
    }
    
    pipedrive = ab.get_source("source-pipedrive", pip_url="airbyte-source-pipedrive==2.3.7")
    pipedrive.set_config(pipedrive_config)
    pipedrive.check()
    I suspect this might be related to a CDK version incompatibility with Python 3.11. Has anyone found a workaround or a compatible version that resolves this?