# ask-ai

    Bruno Gottschefsky

    06/30/2025, 12:02 AM
    Airbyte Cloud didn't trigger the dbt Cloud job after a successful sync. Why did this happen, and how can I audit it?

    Kofi Manful

    06/30/2025, 5:46 AM
    @kapa.ai is there a channel here for connecting with freelancers who have airbyte expertise?

    Travis Liao

    06/30/2025, 6:20 AM
    What is the latest tool currently provided to build connectors @kapa.ai

    Ishan Anilbhai Koradiya

    06/30/2025, 6:34 AM
    Hi @kapa.ai, does the Airbyte Helm setup pull images every time a sync runs?
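
Whether images get re-pulled generally comes down to the pods' imagePullPolicy and whether the node already has the image cached, rather than anything Airbyte-specific. A quick way to inspect what the pods actually request (the namespace name below assumes an abctl install; adjust it for your own release):

    kubectl -n airbyte-abctl get pods \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].imagePullPolicy}{"\n"}{end}'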

    ali chadordouzan

    06/30/2025, 6:45 AM
    @kapa.ai I have an Airbyte instance that syncs a Magento 2 MariaDB to BigQuery using the MySQL connector, every 12 hours. The current MariaDB configuration for CDC is as follows:
    • Binlogs expire after 3 days.
    • The maximum binlog size is 100 MB.
    • It creates 3 binlog files daily.
    My MariaDB pod hasn't restarted in the last 120 days, yet my syncs fail randomly. For example:
    1. I had a successful sync on June 26, 2025 at 14:33 with the following details: 1.96 MB | 829 records loaded | 3m 26s.
    2. The next sync, on June 27, 2025 at 02:53, failed after 5 attempts with the following error logs:
    2025-06-27 02:53:06 source ERROR Jun 26, 2025 11:23:06 PM com.github.shyiko.mysql.binlog.BinaryLogClient resolveDatabaseVersion
    2025-06-27 02:53:06 source ERROR : Database version: 5.5.5-10.4.28-MariaDB-1:10.4.28+maria~ubu2004-log (major=5, minor=5, mariadb=true)
    2025-06-27 02:53:06 source ERROR Jun 26, 2025 11:23:06 PM com.github.shyiko.mysql.binlog.BinaryLogClient connect
    2025-06-27 02:53:06 source ERROR : Connected to 222.175.242.35.bc.googleusercontent.com:3306 at mysql-bin.002665/24950335 (sid:5549, cid:70186522)
    2025-06-27 02:53:06 source ERROR blc-222.175.242.35.bc.googleusercontent.com:3306 i.d.c.b.BinlogStreamingChangeEventSource(logStreamingSourceState):1145 Error during binlog processing. Last offset stored = {file=mysql-bin.002665, pos=0, server_id=1, event=1}, binlog reader near position = mysql-bin.002665/24950335
    2025-06-27 02:53:06 source ERROR blc-222.175.242.35.bc.googleusercontent.com:3306 i.d.p.ErrorHandler(setProducerThrowable):52 Producer failure io.debezium.DebeziumException: Failed to deserialize data of EventHeaderV4{timestamp=1750935665000, eventType=WRITE_ROWS, serverId=1, headerLength=19, dataLength=23, nextPosition=24950377, flags=0}
    	at io.debezium.connector.binlog.BinlogStreamingChangeEventSource.wrap(BinlogStreamingChangeEventSource.java:1190) ~[debezium-connector-binlog-3.0.0.Final.jar:3.0.0.Final]
    	at io.debezium.connector.binlog.BinlogStreamingChangeEventSource$ReaderThreadLifecycleListener.onEventDeserializationFailure(BinlogStreamingChangeEventSource.java:1243) ~[debezium-connector-binlog-3.0.0.Final.jar:3.0.0.Final]
    	at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1090) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:655) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:954) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
    Caused by: com.github.shyiko.mysql.binlog.event.deserialization.EventDataDeserializationException: Failed to deserialize data of EventHeaderV4{timestamp=1750935665000, eventType=WRITE_ROWS, serverId=1, headerLength=19, dataLength=23, nextPosition=24950377, flags=0}
    	at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.deserializeEventData(EventDeserializer.java:343) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.nextEvent(EventDeserializer.java:246) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at io.debezium.connector.binlog.BinlogStreamingChangeEventSource$1.nextEvent(BinlogStreamingChangeEventSource.java:415) ~[debezium-connector-binlog-3.0.0.Final.jar:3.0.0.Final]
    	at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1077) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	... 3 more
    Caused by: com.github.shyiko.mysql.binlog.event.deserialization.MissingTableMapEventException: No TableMapEventData has been found for table id:155. Usually that means that you have started reading binary log 'within the logical event group' (e.g. from WRITE_ROWS and not proceeding TABLE_MAP
    	at com.github.shyiko.mysql.binlog.event.deserialization.AbstractRowsEventDataDeserializer.deserializeRow(AbstractRowsEventDataDeserializer.java:109) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at com.github.shyiko.mysql.binlog.event.deserialization.WriteRowsEventDataDeserializer.deserializeRows(WriteRowsEventDataDeserializer.java:64) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at com.github.shyiko.mysql.binlog.event.deserialization.WriteRowsEventDataDeserializer.deserialize(WriteRowsEventDataDeserializer.java:56) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at com.github.shyiko.mysql.binlog.event.deserialization.WriteRowsEventDataDeserializer.deserialize(WriteRowsEventDataDeserializer.java:32) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.deserializeEventData(EventDeserializer.java:337) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at com.github.shyiko.mysql.binlog.event.deserialization.EventDeserializer.nextEvent(EventDeserializer.java:246) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	at io.debezium.connector.binlog.BinlogStreamingChangeEventSource$1.nextEvent(BinlogStreamingChangeEventSource.java:415) ~[debezium-connector-binlog-3.0.0.Final.jar:3.0.0.Final]
    	at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1077) ~[mysql-binlog-connector-java-0.31.0.jar:0.31.0]
    	... 3 more
    This pattern continues until the binlog files expire and are purged from MariaDB, after which Airbyte runs a reset. The cycle then repeats: a couple of successful syncs, then failures for 3 days, then a couple of successful syncs, then failures for 3 days again. Could this problem be due to the max binlog size being too small? Could it come from resource exhaustion on the Airbyte server (installed with abctl local)? Or could it be caused by network disruptions?
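
One way to audit whether the binlogs Airbyte needs are being purged before Debezium can resume is to compare the server's retention settings and remaining binlog files against the position in the error (mysql-bin.002665). A minimal sketch, with host and credentials as placeholders:

    # Retention and size settings (MariaDB 10.4 typically uses expire_logs_days)
    mysql -h <mariadb-host> -u <user> -p -e "SHOW VARIABLES LIKE '%expire_logs%'; SHOW VARIABLES LIKE 'max_binlog_size';"
    # Binlog files still on the server, with sizes; if mysql-bin.002665 is already gone,
    # Airbyte can no longer resume from its stored offset and a reset is required
    mysql -h <mariadb-host> -u <user> -p -e "SHOW BINARY LOGS;"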

    Elena Mascarenas Garcia

    06/30/2025, 7:59 AM
    How much, roughly, would it cost to transfer 1 million rows monthly from the table hubspot.email_events?

    Travis Liao

    06/30/2025, 8:00 AM
    How can I use Poetry to build a Python connector? @kapa.ai
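
For reference, a typical Poetry workflow inside a Python connector directory looks roughly like the sketch below; the connector name source-example is a placeholder, and the exact commands and test layout should be checked against the connector's own README.

    cd airbyte-integrations/connectors/source-example
    poetry install --with dev            # install the connector plus its dev dependencies
    poetry run source-example spec       # exercise the connector entrypoint locally
    poetry run pytest unit_tests         # run the unit tests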

    Elena Mascarenas Garcia

    06/30/2025, 8:14 AM
    If I ran an initial sync for a given stream and then disabled the stream, what will happen if I enable the stream again? Will I be charged for an initial sync again?

    Luke Alexander

    06/30/2025, 8:35 AM
    @kapa.ai I have been running a CDC sync between MySQL 8.4 and Google BigQuery every 12 hours, and it had been working fine. We added 3 new tables and the sync broke. I thought upgrading Airbyte from 1.5.1 to 1.7.1 might fix the issue, but it didn't, and updating the source/destination connectors didn't help either. I see this:
    Input was fully read, but some streams did not receive a terminal stream status message. If the destination did not encounter other errors, this likely indicates an error in the source or platform. Streams without a status message
    The list of streams without a status message includes every MySQL table.

    kapa.ai

    06/30/2025, 9:34 AM
    Airbyte provides a source connector for Taboola, allowing you to ingest data from the Taboola API. To use this connector, you need to have a Taboola account and obtain the following credentials from Taboola support: Client ID, Client Secret, and Account ID. Once you have these, you can configure the connector in Airbyte by entering these values. The Taboola source connector supports several streams, including `account`, `campaigns`, `campaign_items`, `audience_rules`, `conversion_rules`, `motion_ads`, and `audiences`. Note that these streams do not support incremental syncs or pagination; only full refresh syncs are available. For more details on the API, you can refer to the [Taboola API documentation](https://developers.taboola.com/backstage-api/reference). The required configuration fields are:
    - `account_id`: Your Taboola Account ID
    - `client_id`: Your Taboola Client ID
    - `client_secret`: Your Taboola Client Secret
    You can find more information and setup instructions in the official Airbyte documentation for the [Taboola source connector](https://docs.airbyte.com/integrations/sources/taboola#taboola).
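
Putting those fields into a config file would look something like this sketch; the values are placeholders, and the docker image name is an assumption to verify against the connector documentation.

    cat > taboola_config.json <<'EOF'
    {
      "account_id": "your-taboola-account-id",
      "client_id": "your-client-id",
      "client_secret": "your-client-secret"
    }
    EOF
    # Optional local credential check, assuming the image is published as airbyte/source-taboola
    docker run --rm -v "$(pwd)":/config airbyte/source-taboola:latest check --config /config/taboola_config.json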

    Vivien Morlet

    06/30/2025, 9:43 AM
    @kapa.ai My PR has been approved, but some conflicts around the new version of my connector arose before merge. What is the process? Should I fix the conflicts by updating the connector version once again in the documentation, or does the Airbyte team handle it?

    Ilan Gresserman

    06/30/2025, 11:29 AM
    @kapa.ai How do I need to configure Fluent Bit to send Airbyte data to a log manager?

    Durim Gashi

    06/30/2025, 11:43 AM
    @kapa.ai I am getting this error when trying to sync Postgres data to Redshift: Some streams either received an INCOMPLETE stream status,

    Guy

    06/30/2025, 1:47 PM
    In the Airbyte UI builder, how do I correctly work with a child stream whose parent stream is incremental? At the moment the parent stream works correctly and only fetches new records, but the child stream fetches the same number of records that were fetched in the previous run of the parent stream.

    Max

    06/30/2025, 2:10 PM
    @kapa.ai With regards to scaling Airbyte, which settings should I primarily be concerned with? I'm running on an AWS m7g.xlarge EC2 instance (4 vCPU, 16 GiB of memory). The issue I have is that some long sync jobs are blocking other connection syncs, and I would like to resolve this. My kind Docker deployment also shows 1/1 airbyte-abctl-worker, which I'm assuming means only a single worker? This sounds wrong to me.
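
As a starting point, worker parallelism is usually tuned through Helm values passed to abctl rather than through the instance size alone. A sketch of what that could look like; the key names below (worker.replicaCount, worker.extraEnv, MAX_SYNC_WORKERS) are assumptions to verify against the airbyte/airbyte chart version abctl installs.

    cat > values.yaml <<'EOF'
    worker:
      replicaCount: 2                  # run more than one worker pod
      extraEnv:
        - name: MAX_SYNC_WORKERS       # concurrent sync jobs each worker may run (assumed variable name)
          value: "10"
    EOF
    abctl local install --values ./values.yaml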

    Max

    06/30/2025, 2:26 PM
    @kapa.ai if I'm locally hosting airbyte, can I update my CDK version by getting into a shell on the airbyte-abctl-connector-builder-server and pip upgrading airbyte-cdk?
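
Mechanically, getting a shell into that pod under abctl would look roughly like the sketch below (the kubeconfig path is where abctl stores it by default). Two caveats: anything installed this way only lives until the pod restarts, and whether the connector-builder-server image even ships pip and the Python CDK depends on your Airbyte version, so treat this as exploratory rather than a supported upgrade path.

    kubectl --kubeconfig ~/.airbyte/abctl/abctl.kubeconfig -n airbyte-abctl \
      exec -it deploy/airbyte-abctl-connector-builder-server -- /bin/bash
    # inside the pod, if pip is present:
    pip install --upgrade airbyte-cdk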

    Annika Maybin

    06/30/2025, 2:31 PM
    @kapa.ai I am on Airbyte 1.7 and on the newest MySQL and Redshift connectors. At some point in the past I noticed some SCD tables not being removed after syncing, so I'm wondering if I can just delete them manually.

    Xavier LAI

    06/30/2025, 2:39 PM
    Hi @kapa.ai, since upgrading to Airbyte 1.7, when I deploy the same custom connector there is a malformed URL reconstruction when paginating using cursor pagination injected into the path. The second page is not injected properly. The base URL is shortened from
    https://api.businesscentral.dynamics.com/v2.0/d87c1c6c-3fcd-4523-841b-82d0e122c000/VALTEX-PROD/api/cabaia/datawarehouse/v1.0/companies(c52533a0-3cbb-eb11-9b52-000d3ae74924)/salesLines
    to
    https://api.businesscentral.dynamics.com/salesLines
    but it has the correct query parameters.

    Lance Nehring

    06/30/2025, 2:43 PM
    @kapa.ai So a replication pod is running out of memory. What helm chart options would I need to set to increase pod memory for replication pods?
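
For reference, replication (job) pod resources are typically raised through the chart's job-resource values rather than the core pods' own resources. A sketch, with the exact keys to be verified against the chart version you run:

    cat > values.yaml <<'EOF'
    global:
      jobs:
        resources:
          requests:
            memory: 2Gi
          limits:
            memory: 4Gi            # memory ceiling applied to replication (job) pods
    EOF
    helm upgrade --install airbyte airbyte/airbyte -n <your-namespace> --values ./values.yaml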

    Antoine Anicotte

    06/30/2025, 4:07 PM
    @kapa.ai is there a way in the Builder using OAuth2 to pass access token params in the body and not as headers/query params? I want to query the Quora Ads API and the params must be in the body:
    curl -X POST \
        -d "client_id=28f50ac6edeb48b5a3025cc5c902d893" \
        -d "client_secret=ooouA7R0Ccwn9zzzTo8t--cHzcChbC9hH8tXCv5nmXU=" \
        -d "code=RHCuU8DmhuYCIIMLXHGpPIwn0DIfcK" \
        -d "grant_type=authorization_code" \
    --data-urlencode "redirect_uri=https://example.app/oauth/callback" \
    https://www.quora.com/_/oauth/token

    Júlia Lemes

    06/30/2025, 5:45 PM
    @kapa.ai I have a PostgreSQL source that connects to my Redshift. There were two times when there was a discrepancy between the source data and the data in Redshift, but there were no errors in the logs. On one of those occasions, a lot of records in the destination table were null.

    Aliaksei Ivushkin

    06/30/2025, 6:54 PM
    #C01AHCD885S How do I clean up the logs on Airbyte running with abctl?

    kapa.ai

    06/30/2025, 6:54 PM
    To clean up logs on Airbyte running with abctl, you have a few options:
    1. Automatic log cleanup (retention policy):
       - Airbyte schedules periodic jobs to delete logs from syncs. By default, logs are deleted every month or when their size exceeds 5GB.
       - You can adjust the retention policy using environment variables:
         - `MINIMUM_WORKSPACE_RETENTION_DAYS`: Minimum age (in days) before files are eligible for sweeping (default: 1 day).
         - `MAXIMUM_WORKSPACE_RETENTION_DAYS`: Maximum age (in days) before files are definitely swept (default: 60 days).
         - `MAXIMUM_WORKSPACE_SIZE_MB`: Maximum workspace size before sweeping occurs (default: 5GB).
       - Set these variables in your environment or configuration to control log retention. Note that some older variables may no longer be effective in recent Airbyte versions, and the current solution may involve using `TEMPORAL_HISTORY_RETENTION_IN_DAYS` for certain cleanup tasks. Please check your Airbyte version for compatibility with these settings [Airbyte Rotation Logs](https://discuss.airbyte.io/t/332), [Airbyte log sweeping](https://discuss.airbyte.io/t/2714).
    2. Manual cleanup (Kubernetes/MinIO):
       - If you are running Airbyte with abctl (which uses Kubernetes and MinIO by default), you can manually delete log files by accessing the MinIO pod:
         kubectl exec -it -n airbyte-abctl airbyte-minio-0 -- /bin/bash
       - Then, delete files inside the log directory, typically:
         /storage/airbyte-dev-logs/job-logging/workspace/
       - This will free up space used by old logs [How to clean up airbyte-minio](https://discuss.airbyte.io/t/3101).
    Note: Always ensure you are not deleting logs that are still needed for troubleshooting or compliance. If you want to completely uninstall Airbyte and remove all data (including logs), you can use:
      abctl local uninstall --persisted
      rm -rf ~/.airbyte/abctl
    This will stop all containers and delete all persisted data, including logs [OSS Quickstart Uninstall](https://docs.airbyte.com/platform/1.6/using-airbyte/getting-started/oss-quickstart#uninstall-airbyte). If you need more granular or automated log cleanup, consider scripting the deletion process or adjusting the retention environment variables as described above.
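
A sketch of wiring the retention variables mentioned above into an abctl install via Helm values; which component actually reads them (worker vs. server) and whether they are still honored varies by Airbyte version, so verify before relying on this.

    cat > values.yaml <<'EOF'
    worker:
      extraEnv:
        - name: MINIMUM_WORKSPACE_RETENTION_DAYS
          value: "1"
        - name: MAXIMUM_WORKSPACE_RETENTION_DAYS
          value: "30"
        - name: MAXIMUM_WORKSPACE_SIZE_MB
          value: "2048"
    EOF
    abctl local install --values ./values.yaml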

    Aliaksei Ivushkin

    06/30/2025, 6:57 PM
    #C01AHCD885S I see a lot of logs from the replication job.

    Disha

    06/30/2025, 7:25 PM
    For the Airbyte GitHub PAT, what scopes should the PAT have?

    Annika Maybin

    06/30/2025, 7:58 PM
    @kapa.ai Are there plans for the Microsoft SharePoint connector to be able to read in Excel data?

    Aliaksei Ivushkin

    06/30/2025, 8:10 PM
    @kapa.ai The application button disappeared after a restart.

    Diego Ramirez

    06/30/2025, 8:29 PM
    @kapa.ai Using HubSpot source version 5.0.0 returns a 504 error with the log "cannot create /pipes/stderr: Permission denied"

    Max

    06/30/2025, 9:30 PM
    @kapa.ai I'm trying to install Airbyte on an EC2 instance using Docker and abctl. I have an existing pgdata directory in the .airbyte directory that I want to leverage for the database, but I get the following error:
    WARNING  Encountered an issue deploying Airbyte:
                Pod: airbyte-db-0.184deee0dc9d8b38
                Reason: BackOff
                Message: Back-off restarting failed container airbyte-db-container in pod airbyte-db-0_airbyte-abctl(ae9d9299-ac76-4a01-8c32-8a0065d43337)
                Count: 15
                Logs: chown: /var/lib/postgresql/data/pgdata: Operation not permitted
              chmod: /var/lib/postgresql/data/pgdata: Operation not permitted
              The files belonging to this database system will be owned by user "postgres".
              This user must also own the server process.
    
              The database cluster will be initialized with locale "en_US.utf8".
              The default database encoding has accordingly been set to "UTF8".
              The default text search configuration will be set to "english".
    
              Data page checksums are disabled.
    
              initdb: error: directory "/var/lib/postgresql/data/pgdata" exists but is not empty
              If you want to create a new database system, either remove or empty
              the directory "/var/lib/postgresql/data/pgdata" or run initdb
              with an argument other than "/var/lib/postgresql/data/pgdata".
    ▀  Installing 'airbyte/airbyte' (version: 1.4.0) Helm Chart (this may take several minutes) (9m8s)
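
The "chown: Operation not permitted" lines suggest the pre-existing pgdata files are owned by a different UID than the one the postgres container runs as. A sketch of how to check and fix that on the host; the directory name under ~/.airbyte/abctl/data and the UID are assumptions to confirm on your machine, and the data should be backed up first.

    ls -ln ~/.airbyte/abctl/data            # note the numeric owner of the postgres volume directory
    # Re-own the database volume to the UID reported inside the airbyte-db container
    # (often 999 for Debian-based postgres images, 70 for Alpine-based ones)
    sudo chown -R 999:999 ~/.airbyte/abctl/data/airbyte-volume-db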

    Jerry Bao

    06/30/2025, 11:43 PM
    @kapa.ai if a source connector is running OOM when it tries to sync a specific stream, is it possible that the parent stream data is being entirely loaded in memory before the child stream starts iterating through the parent data to retrieve data?