# troubleshooting
  • Mahesh

    03/01/2022, 10:22 AM
    Help appreciated
  • Mahesh

    03/01/2022, 11:37 AM
    Is this your first time deploying Airbyte: Yes
    OS Version / Instance: macOS
    Memory / Disk: 8 GB
    Deployment: Docker
    Airbyte Version: 0.35.39-alpha
    Source name/version:
    Destination name/version:
    Step: Setting up a new connection
    Description: I can’t make my first connection. I’ve got a Mac M1 and I’m aware of the troubles involving it, but I’ve followed the instructions on GitHub and I still get the same error.
  • BERKIN

    03/02/2022, 11:48 AM
    Airbyte sync stops abruptly after a few days of successful runs. The connection is enabled with a 24-hour frequency, but the sync is not happening! How do I check the logs? The sync is not even being triggered.
  • Kemp Po

    03/08/2022, 4:15 PM
    Is this your first time deploying Airbyte: No / Yes
    OS Version / Instance: GKE n1-standard-4
    Memory / Disk: 4 GB
    Deployment: Kubernetes
    Airbyte Version: 0.35.46-alpha
    Source name/version: Zendesk Support 0.2.0
    Destination name/version: Google Cloud Storage (GCS) 0.1.24
    Step: On initial sync
    Description: I’m having issues with the GCS destination’s parquet files. I can write to CSVs fine but not to parquet, and I have a hunch that it might be because the URL it’s using is s3a:// instead of gs://? All default settings except compression codec = SNAPPY.
  • konrad schlatte

    03/10/2022, 12:08 PM
    Is this your first time deploying Airbyte: No
    OS Version / Instance: EC2 docker-compose
    Memory / Disk: 16 GB RAM / 4 vCPU t3.xlarge
    Deployment: Docker
    Airbyte Version: 0.30.15-alpha
    I am running a custom source connector (Salesforce Marketing Cloud) with destination Snowflake and getting the following timeout error:
    Copy code
    2022-03-10 08:35:13 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Done retrieving results from 'sent' endpoint
    2022-03-10 08:35:13 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Updating state.
    2022-03-10 08:35:13 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Fetching sent from 2022-03-09T12:00:00Z to 2022-03-09T12:30:00Z
    2022-03-10 08:35:13 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Making RETRIEVE call to 'sent' endpoint with filters '{'Property': 'EventDate', 'SimpleOperator': 'between', 'Value': ['2022-03-09T12:00:00Z', '2022-03-09T12:30:00Z']}'.
    2022-03-10 08:35:13 ERROR () DefaultAirbyteStreamFactory(internalLog):88 - Request failed with 'Error: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.'
    2022-03-10 08:35:13 ERROR () DefaultAirbyteStreamFactory(internalLog):88 - Traceback (most recent call last):
    
    2022-03-10 08:35:13 ERROR () DefaultAirbyteStreamFactory(internalLog):88 -   File "/usr/local/lib/python3.7/site-packages/tap_exacttarget/__init__.py", line 135, in do_sync
    I can resolve this by reducing the “pagination window”, for example from 30 minutes to 5 minutes; i.e. it appears that at the 30-minute interval there is too much data to process within a single request, hence the timeout. I am wondering whether there is another way to handle this error. There is an outstanding PR for this connector as well: https://github.com/airbytehq/airbyte/pull/10026.
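    A minimal sketch of the slicing idea described above: instead of a fixed 30-minute retrieval window, cut the requested range into smaller slices so each RETRIEVE call stays under the server-side timeout. Plain Python; the slice_window helper and the 5-minute step are illustrative, not part of tap_exacttarget:
    Copy code
    from datetime import datetime, timedelta
    from typing import Iterator, Tuple

    def slice_window(start: datetime, end: datetime,
                     step: timedelta = timedelta(minutes=30)) -> Iterator[Tuple[datetime, datetime]]:
        """Yield consecutive (from, to) windows covering [start, end)."""
        cursor = start
        while cursor < end:
            upper = min(cursor + step, end)
            yield cursor, upper
            cursor = upper

    # Example: fetch the 'sent' events in 5-minute slices instead of one 30-minute call,
    # so each RETRIEVE request processes less data per call.
    start = datetime(2022, 3, 9, 12, 0)
    end = datetime(2022, 3, 9, 12, 30)
    for lower, upper in slice_window(start, end, step=timedelta(minutes=5)):
        print(f"RETRIEVE 'sent' between {lower.isoformat()}Z and {upper.isoformat()}Z")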
  • Oluwapelumi Adeosun

    03/11/2022, 6:52 AM
    Some tables are missing when I do a refresh source schema. Is this a bug, or how can I ensure all the tables from the source are loaded into the destination I specified? The source is a PostgreSQL DB running on Amazon RDS.
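    One common cause of missing tables after a schema refresh is that the Postgres user Airbyte connects with lacks SELECT on them. A quick diagnostic sketch with psycopg2, run as the same user Airbyte uses (connection details are placeholders):
    Copy code
    import psycopg2

    # Connect as the same user Airbyte uses (placeholder credentials).
    conn = psycopg2.connect(host="my-rds-host.amazonaws.com", dbname="mydb",
                            user="airbyte_user", password="***")
    with conn, conn.cursor() as cur:
        # List every user table and whether this user can SELECT from it;
        # tables without SELECT are a likely cause of gaps in the discovered schema.
        cur.execute("""
            SELECT schemaname, tablename,
                   has_table_privilege(current_user,
                                       quote_ident(schemaname) || '.' || quote_ident(tablename),
                                       'SELECT') AS can_select
            FROM pg_catalog.pg_tables
            WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
            ORDER BY 1, 2
        """)
        for schema, table, can_select in cur.fetchall():
            print(f"{schema}.{table}: {'ok' if can_select else 'NO SELECT privilege'}")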
  • Gary K

    03/11/2022, 7:29 AM
    Hi everyone 👋 Apologies if I'm making a few assumptions here (it's Friday afternoon and I've only done minimal searching), but I'm wondering if/how I can change the number -> double precision conversion that appears to be happening with the Postgres connector (0.3.15 in Airbyte 0.35.42-alpha). I've got a MySQL-sourced bigint column stored with full precision in the _airbyte_data JSON, but the normalization is converting it to a double and I'm losing precision 😱 (Note, I'd rather not have to do a custom normalisation (from raw) of all the connection streams manually; i.e. no heavy lifting on my part if possible 🏋️)
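    For context on the precision loss above: an IEEE-754 double has a 53-bit mantissa, so bigint values beyond 2^53 cannot survive a number -> double precision cast. A quick illustration in plain Python (not Airbyte code):
    Copy code
    # A bigint that exceeds the 53-bit mantissa of an IEEE-754 double.
    big = 2**53 + 1          # 9007199254740993

    as_double = float(big)   # what a number -> double precision cast does
    print(big)               # 9007199254740993
    print(int(as_double))    # 9007199254740992 (last digit lost)

    assert int(as_double) != big  # precision loss, as described above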
  • Connor Francis

    03/11/2022, 10:51 PM
    I'm not 100% sure this is the right channel. Currently my source is a Postgres database with many schemas. Using WAL replication I'm listening to all these schemas and dumping them to the same destination. I'd like to include the source schema name as an additional column in the destination table. For example I might have schemas moe, larry and curly. All three of these source schemas have the same table called stooges. My destination would only have a single schema called public, and I would like all three sources to dump into the same stooges table in this destination schema; however, I would like to add an additional text column in the destination table called source_schema which would take on the value of moe, larry or curly.
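    Airbyte does not add such a column by itself, so one possible workaround, sketched below under the assumption that each source schema currently lands in its own destination schema, is a post-load view that unions the per-schema stooges tables and tags each row with its origin (plain Python generating the SQL; the table layout is hypothetical):
    Copy code
    # Build a UNION ALL view that merges the three per-schema copies of "stooges"
    # and records which source schema each row came from (hypothetical layout:
    # the sync writes moe.stooges, larry.stooges and curly.stooges into the destination).
    schemas = ["moe", "larry", "curly"]

    selects = [
        f"SELECT '{s}' AS source_schema, t.* FROM {s}.stooges AS t"
        for s in schemas
    ]
    view_sql = (
        "CREATE OR REPLACE VIEW public.stooges AS\n"
        + "\nUNION ALL\n".join(selects) + ";"
    )
    print(view_sql)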
  • Adam Schmidt

    03/14/2022, 6:53 AM
    Hey team, I'm really close to having the Gitlab connector running. I'm currently able to sync repos and other items to my warehouse for the top-level group, no problems. Problem: I'm wondering if the connector supports sub-groups, as this is how my teams keep themselves organised. Does the connector recurse through groups, sub-groups, sub-groups of sub-groups, and so on? Has anyone done this before? Edit: It seems as though the connector needs either or both of a group ID and a project ID. If the group ID is left empty, the Gitlab API will return all of the groups that the API key has access to. This would be preferable to having to set a long list of space-delimited my-org%2fsome-subgroup values as the group ID (which works!).
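    To check what a token can reach before configuring the connector, the GitLab REST API exposes GET /groups/:id/subgroups, which can be walked recursively. A rough sketch with requests; the token, group id and gitlab.com base URL are placeholders, and pagination is simplified:
    Copy code
    import requests

    GITLAB = "https://gitlab.com/api/v4"
    HEADERS = {"PRIVATE-TOKEN": "<your-token>"}  # placeholder

    def subgroups(group_id):
        """Recursively yield a group id and all of its sub-group ids."""
        yield group_id
        page = 1
        while True:
            resp = requests.get(
                f"{GITLAB}/groups/{group_id}/subgroups",
                headers=HEADERS,
                params={"per_page": 100, "page": page},
            )
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                break
            for child in batch:
                yield from subgroups(child["id"])
            page += 1

    # Example: enumerate everything under a top-level group (the id is a placeholder).
    print(list(subgroups(12345)))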
  • Nahid Oulmi

    03/14/2022, 10:32 AM
    Is this your first time deploying Airbyte: No
    OS Version / Instance: Debian GNU/Linux 10 (buster), amd64, GCP e2-standard-4
    Memory / Disk: 4 vCPU, 8 GB RAM
    Deployment: Docker
    Airbyte Version: 0.35.7-alpha
    Step: Setting up resources
    Description: My Airbyte jobs take too much of my VM’s RAM; at some point the VM goes down, with no access to Airbyte’s UI nor via SSH, and the only fix available is to shut down and restart the instance. To avoid that, I would like to set a global RAM consumption threshold for all of Airbyte, but I am not sure which solution is better:
    • The Docker --memory parameter seems a good option, but I am not sure whether it works well within Airbyte deployments: https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory
    • The Airbyte-specific parameter JOB_MAIN_CONTAINER_MEMORY_LIMIT works at the job level, if I am not mistaken. As I don’t know how many Airbyte jobs can be triggered at the same time, even 10 jobs consuming only 1 GB of RAM each would cause the same issue, which is why I would prefer to set a global RAM threshold. What do you think would be the best option?
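    Not an answer to which option is better, but while deciding it can help to measure how much RAM the Airbyte containers actually use at peak. A small monitoring sketch using the Docker SDK for Python; the 6 GiB budget is an arbitrary example:
    Copy code
    import docker

    client = docker.from_env()
    THRESHOLD_BYTES = 6 * 1024**3  # arbitrary example budget for all Airbyte containers

    total = 0
    for container in client.containers.list():
        stats = container.stats(stream=False)          # one-shot stats snapshot
        usage = stats["memory_stats"].get("usage", 0)  # current memory usage in bytes
        total += usage
        print(f"{container.name:40s} {usage / 1024**2:8.1f} MiB")

    print(f"total: {total / 1024**2:.1f} MiB")
    if total > THRESHOLD_BYTES:
        print("warning: Airbyte containers are over the memory budget")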
  • Arash Layeghi

    03/14/2022, 2:00 PM
    Would you please help with this?
  • Nitin Jain

    03/14/2022, 3:17 PM
    @here We are syncing events from Kafka to Redshift. We have deployed Airbyte on Kubernetes. Sometimes our pipeline gets stuck: the source and destination pods stay in the Running state for a long time. I have seen instances where our Redshift destination pod was running for 3-4 hours.
  • Robert Andrews

    03/14/2022, 4:32 PM
    Hi guys, just looking to confirm that an error and my understanding align: am I right that the attached log shows an issue solely with the GCP sandbox rather than Airbyte? Just experimenting with the deduped sync mode. @Talia Moyal
  • Filipe Araújo

    03/14/2022, 5:43 PM
    Hi everyone!
    Airbyte Version: 0.35.53-alpha
    Source name/version: Hubspot 0.1.45
    Destination name/version: Postgres 0.3.15
    Step: Running Full Refresh Hubspot
    Description: From the Hubspot dashboard I can see that I have around 362k engagements (activities). I ran my connection on Saturday afternoon and it is still importing, with 32 million (and counting) rows for the engagements. Can you help me understand what is happening? Since version 0.1.43 I can’t seem to get this running properly; with that version I got the 362k engagements. Thanks!
  • Madhup Sukoon

    03/14/2022, 6:04 PM
    Hi! I'm getting the following error when trying to deploy Airbyte through Helm:
    Copy code
    error validating data: unknown object type "nil" in Secret.data.postgresql-password
    I'm trying to get it to run with an external AWS RDS PostgreSQL DB. I have defined the following params:
    Copy code
    postgresql.enabled
    externalDatabase.host
    externalDatabase.user
    externalDatabase.existingSecret
    externalDatabase.existingSecretPasswordKey
    externalDatabase.database
    I have not defined externalDatabase.password (because I want it to take the password from the secret) or the port number (the default should be correct). Any ideas where I might be going wrong?
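    For reference, the secret the chart points at can be created before the install, for example with the Kubernetes Python client. A sketch with placeholder names; the secret name and key must match externalDatabase.existingSecret and externalDatabase.existingSecretPasswordKey:
    Copy code
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config()
    v1 = client.CoreV1Api()

    secret = client.V1Secret(
        metadata=client.V1ObjectMeta(name="airbyte-db-secret"),  # -> externalDatabase.existingSecret
        string_data={"password": "<rds-password>"},              # -> externalDatabase.existingSecretPasswordKey
    )
    v1.create_namespaced_secret(namespace="airbyte", body=secret)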
  • William Graham

    03/14/2022, 6:44 PM
    After updating to the latest release, our incremental syncs with the Marketo source no longer work. The error we get in the logs is “KeyError: actionResult”, which is nothing overly productive to help us diagnose the issue. Any ideas?
  • Owen Kephart

    03/14/2022, 8:24 PM
    Hi! Working on interacting with the Airbyte API programmatically and noticed a slight weirdness with the streamName field in the jobs/get response. Intuitively, I expected this name to be the same as the name field for the matching source in the syncCatalog of connections/get, but it seems that streamName actually includes the prefix, while name does not. So for example, if I had a connector with a prefix of foo, streamName would be foo_actions, while name would be just actions.
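    A small client-side illustration of the mismatch described above and one way to reconcile it by stripping the connection prefix; the field names follow the message (streamName, syncCatalog, name) and the foo_ prefix is just an example:
    Copy code
    def catalog_names(connection: dict) -> set:
        """Stream names as they appear in connections/get -> syncCatalog (no prefix)."""
        return {
            entry["stream"]["name"]
            for entry in connection["syncCatalog"]["streams"]
        }

    def normalize_job_stream(stream_name: str, prefix: str) -> str:
        """Strip the connection prefix from a jobs/get streamName so it matches syncCatalog."""
        if prefix and stream_name.startswith(prefix):
            return stream_name[len(prefix):]
        return stream_name

    # e.g. a "foo_" prefix turns "foo_actions" (jobs/get) back into "actions" (syncCatalog)
    assert normalize_job_stream("foo_actions", "foo_") == "actions"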
  • Aditya Rane

    03/15/2022, 1:17 AM
    Is this your first time deploying Airbyte: Yes
    OS Version / Instance: Amazon Linux
    Memory / Disk: 8 GB
    Deployment: Docker
    Airbyte Version: 0.35.46-alpha
    Source name/version: MSSQL 0.3.17
    Destination name/version: Snowflake 0.4.20
    Description: 2022-03-15 01:13:06 ERROR i.a.w.DefaultReplicationWorker(run):168 - Sync worker failed. java.util.concurrent.ExecutionException: io.airbyte.workers.DefaultReplicationWorker$DestinationException: Destination process exited with non-zero exit code 1
    • Can someone please help me with the internal staging connection with Snowflake? Is there any script which I am supposed to run and am missing? I have also given ownership and usage for all future stages to the role AIRBYTE_ROLE but it still fails.
  • Kevin Soenandar

    03/15/2022, 9:52 AM
    Hi team, I'm encountering odd behaviour with the basic normalization. I'm using a modified version of the latest Hubspot connector, and for the associations field I'm building it such that the companies table's ticket associations would have the following value: [ {"company_id": <some_value>, "ticket_id": <some_value>}, {"company_id": <some_value>, "ticket_id": <some_value>} ]. My expectation is that it should create a separate table once ingested into my Snowflake warehouse, with company_id and ticket_id as the fields, per this documentation. However, this is not the case. Any idea what I'm missing here?
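    One thing worth checking: normalization derives child tables from the stream's JSON schema, so the modified connector needs to advertise the associations field as an array of objects with typed properties. A hedged sketch of that kind of schema fragment (field names taken from the message above, not from the real Hubspot schema):
    Copy code
    import json

    # Hypothetical JSON-schema fragment for the "companies" stream's ticket associations:
    # an array of objects with typed properties, which is the shape normalization can
    # expand into a separate child table (company_id, ticket_id).
    ticket_associations_schema = {
        "type": ["null", "array"],
        "items": {
            "type": ["null", "object"],
            "properties": {
                "company_id": {"type": ["null", "string"]},
                "ticket_id": {"type": ["null", "string"]},
            },
        },
    }

    print(json.dumps(ticket_associations_schema, indent=2))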
  • Keshav Agarwal

    03/15/2022, 10:30 AM
    3 Sheets -> Postgres, 1 Hubspot -> Postgres, 11 Postgres -> Postgres. All incremental, except 2 or 3 tables; none are big ones. We did not have a problem earlier, when we used to run 20 more connectors.
  • Brian Soares

    03/15/2022, 12:18 PM
    Hi @channel, this is my first time using Airbyte. I've come across a particular use case where I have to use Airbyte for a batch load from Snowflake to Google Cloud Storage. I'm able to set up the source as Snowflake and the destination as BigQuery, but not the destination as GCS. The file size on average is over 1 GB. Are there any limits on the file size Airbyte can support while loading to GCS from Snowflake?
  • Nitin Jain

    03/15/2022, 12:22 PM
    @here We are syncing JSON events from Kafka to Redshift with basic normalisation. We tried using the INSERT replication strategy; data is being synced but the pipeline is very slow. Looking at the docs we changed the strategy to COPY by giving the S3 credentials in the Redshift destination. With the COPY strategy, CSV files are being written to S3, but only some partial data is being inserted into our Redshift DB. In the example below, you can see the pipeline read 39,100 records; I verified 4 different CSVs were written to S3, one having 16,252 records, another one having somewhere around 22k records, another one with 2k records. But the number of records written to the Redshift DB is around 16,301. I have seen that if multiple files are written to S3, only one of the files (randomly chosen) is being synced with the DB. I'm using full refresh | append mode for the pipeline. Attaching an image for better understanding.
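    Two checks that can help narrow this down: list the staging files Airbyte wrote under the S3 prefix, and ask Redshift which files the COPY actually ingested via the stl_load_commits system table. A sketch with boto3 and psycopg2; the bucket, prefix and connection details are placeholders:
    Copy code
    import boto3
    import psycopg2

    BUCKET, PREFIX = "my-airbyte-staging", "airbyte/"   # placeholders

    # 1) What did Airbyte stage on S3?
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    for obj in resp.get("Contents", []):
        print(f"staged: {obj['Key']}  ({obj['Size']} bytes)")

    # 2) Which files did Redshift's COPY actually ingest, and how many lines from each?
    conn = psycopg2.connect(host="redshift-host", port=5439,
                            dbname="dev", user="user", password="***")
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT TRIM(filename), lines_scanned, curtime
            FROM stl_load_commits
            ORDER BY curtime DESC
            LIMIT 20
        """)
        for filename, lines, when in cur.fetchall():
            print(f"loaded: {filename}  lines={lines}  at={when}")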
  • Jayesh Patil

    03/15/2022, 1:07 PM
    Just wanted to +1 on the issue. I am seeing the same issue while pulling LinkedIn data into BigQuery.
  • Maxime Sabran

    03/15/2022, 1:31 PM
    Hi All, I am trying to set up the Facebook Marketing connection (connector up to date) but I get the error "FacebookAPIException('Error: 2635, (#2635) You are calling a deprecated version of the Ads API. Please update to the latest version: v13.0.')". Would you know if the connector is compatible with this version, or am I doing something wrong?
  • Michael Horvath

    03/15/2022, 1:39 PM
    The CONNECT_TIME and IDLE_TIME settings are unlimited for this account. The EXPIRE_TIME setting is 10 minutes. For parallel connections, there is no limit for SESSIONS_PER_USER. Overall utilization limits are:
    RESOURCE_NAME   CURRENT_UTILIZATION   MAX_UTILIZATION   LIMIT_VALUE
    processes       81                    127               600
    sessions        101                   155               928
  • Drew Fustin

    03/15/2022, 1:57 PM
    Hi, all. I'm in the process of setting up our data infrastructure, hoping to use Airbyte for replication of our backend Postgres database into our Redshift data lake/warehouse. I spun up an EC2-hosted instance of Airbyte, and I seem not to be getting all the records from the source into my destination. I created a bug issue here: https://github.com/airbytehq/airbyte/issues/11158
  • Saman Arefi

    03/15/2022, 1:58 PM
    Hi everyone, hope you're all having a sensational day. Could I get some pointers regarding Airbyte's scalability? The docs recommend a t2.large instance and describe, in detail, how Airbyte is mainly memory and disk bound. I've been testing things out on a t3.xlarge and noticed the following: loading one large-ish Oracle table (~9 GB, 7M rows) takes me about 30 minutes, which I think is pretty good. Now, loading two at the same time via the same connector (9 GB / 7M rows and 13 GB / 7M rows) takes an hour in total, with each taking up roughly an hour. What gives? Looking at htop, I seem to be running into a CPU limit as well, so I'm not sure what's causing this. These are my two largest tables, but in production I'd use Airbyte for another 30 or so tables, each between 10k and 1M rows, so this doesn't seem to scale well. Or am I doing something wrong?