# ask-community-for-troubleshooting

    Felix Gondwe

    02/07/2023, 3:51 PM
Hi, using Airbyte to stream data from places like Shopify into Google Cloud BigQuery, with the recommended Google Cloud Storage staging and the default normalized data option for transformations. The streams are slow; does anyone have recommendations for optimizing them for faster streaming?

    Svatopluk Chalupa

    02/07/2023, 3:52 PM
Hi, we've got a problem with basic normalization after an incremental load on a Postgres-to-Postgres sync (incremental append). We're loading quite large tables, around 150 GB; the first sync lasts almost 24 hours and normalization takes a few hours of it. That's expected. But now every incremental load contains just hundreds of rows, EL is done within a minute, and normalization then freezes, not finishing within a day! Can anyone tell me what the problem is? Does it operate on the whole normalized 150 GB table or what? Should I change some settings, or what can I do? Thanks.

    Katja Wiesmüller

    02/07/2023, 4:10 PM
Hi everyone, I have tried to connect the Amazon Ads API to BigQuery. There is a large amount of data, and the first attempts to sync failed after 24 h. The first approach was to limit the sync based on profiles; the sync of 100 profiles also stopped after several hours. Transferring 10 profiles took about 5 h (1,200,000 records; 2.2 GB). We currently have Airbyte deployed on one VM, so the next step was to scale the VM up to different configurations. Unfortunately, this did not improve the performance. The goal is to sync about 3000 profiles daily. Has anyone had a similar problem before, or a suggestion on how to sync such a large amount of data from the Amazon Ads API most efficiently?

    Chris

    02/07/2023, 4:26 PM
Hi, I am trying to sync data from Bing Ads to BigQuery. I created a VM on GCE and the sync failed. I increased the disk size and it seemed to work, then it failed again the next day. I realized that disk usage is constantly rising: it seems that each Airbyte sync uses up some disk space in the process, and that space is not freed after the sync finishes(?). The question is: does the disk space used by the previous sync get erased before a new one runs the next time? Or do you have to delete it manually, or set it to be deleted in a config file somewhere?

    Siddhant Singh

    02/07/2023, 4:58 PM
Hi Airbyte team, I need help with this PR: https://github.com/airbytehq/airbyte/pull/20749

    Walker Philips

    02/07/2023, 7:04 PM
Do source/destination Docker containers start/build from scratch each sync? For example, if I were to write to a text file within the source/destination file directory, will that text file be retained, with the new edits, between syncs? I have a list of files whose processing status I would like to remember. I figured adding them to a "State" object may be a bit of a hack, and it could grow beyond the intended size limits if the state cannot handle what could become a large string object as each filename gets added to it.
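Connector containers are generally recreated for each sync, so a file written inside one is not a reliable place to keep progress; the state object is the protocol's mechanism for remembering it. A minimal sketch of the idea — the message shape is simplified from the Airbyte protocol, and the field names here are illustrative, not from a real connector:

```python
import json

def emit_state(processed_files):
    """Emit an Airbyte-style STATE message recording which files are done.

    The connector's filesystem is not persisted between syncs, so state
    is the supported place to remember progress.
    """
    message = {
        "type": "STATE",
        "state": {"data": {"processed_files": sorted(processed_files)}},
    }
    print(json.dumps(message))
    return message

# On the next sync, Airbyte passes the last emitted state back in:
def files_to_process(all_files, state):
    done = set(state.get("data", {}).get("processed_files", []))
    return [f for f in all_files if f not in done]
```

If the file list could grow without bound, storing a high-water mark instead (e.g. the newest processed file's modified timestamp) keeps the state object small.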

    EJ Oruche

    02/07/2023, 7:37 PM
Hello all, I am trying to pull and sync our users' data from tools like Slack so that we can provide them with relevant results from those tools in our app. I was trying to sync data from Slack to both Weaviate and Typesense and it didn't work for me. I think I may have been rate limited, but it was not clear. In contrast, I was able to move data from Slack to Google Sheets easily. I am not sure if that's because Google Sheets was done via Airbyte Cloud and the others were in my local instance.

    Herry

    02/07/2023, 7:58 PM
Hi all. I'm using Airbyte open source 0.40.32 and the Stripe connector 1.0.1. I tried to create a connection from Stripe to an S3 bucket but got "Unknown error occurred" in the UI. This is really strange to me because all connection tests on Stripe and S3 pass without problems, and other connections that use the same S3 bucket are fine. Are there any known bugs in the Stripe source connector? I attached the worker logs in the thread.

    Lior Chen

    02/07/2023, 8:53 PM
Hi, can we use Google Sheets as a destination in a self-hosted Airbyte? I see it as an option in my deployment, but I'm not sure because the docs say it's only supported in Airbyte Cloud (Airbyte version 0.40.32).

    Jake Johnson

    02/07/2023, 9:02 PM
What's the recommended solution for adding user authentication to our self-hosted Airbyte instance? How can we allow only certain email addresses to log in to our Airbyte?
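One common pattern for open-source deployments is to put an authenticating reverse proxy in front of the webapp; for example, oauth2-proxy can restrict access to specific emails. A sketch, assuming the upstream service name airbyte-webapp (adjust to your Helm release); the flags are from oauth2-proxy's standard options, and you'd also supply --client-id, --client-secret, and --cookie-secret:

```
oauth2-proxy \
  --provider=google \
  --email-domain=example.com \
  --authenticated-emails-file=/etc/oauth2/allowed_emails.txt \
  --upstream=http://airbyte-webapp:80 \
  --http-address=0.0.0.0:4180
```

Only the addresses listed in the emails file (or matching the domain) get through to the Airbyte UI behind the proxy.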

    Jake Johnson

    02/07/2023, 9:07 PM
We deployed Airbyte to our Kubernetes cluster using Helm and would prefer not to use Plural or Restack.

    Zaza Javakhishvili

    02/07/2023, 10:07 PM
Guys, do this fragment from the connection-creation JSON and the .env file indicate the same thing? Logically the .env holds the default values. P.S. Can you help with the measurement units?
Connection config JSON:
```
resourceRequirements: {
  # optional resource requirements to run workers (blank for unbounded allocations)
  cpu_request: string
  cpu_limit: string
  memory_request: string
  memory_limit: string
}
```
.env file:
```
### JOBS ###
# Relevant to scaling.
SYNC_JOB_MAX_ATTEMPTS=3
SYNC_JOB_MAX_TIMEOUT_DAYS=3
JOB_MAIN_CONTAINER_CPU_REQUEST=
JOB_MAIN_CONTAINER_CPU_LIMIT=
JOB_MAIN_CONTAINER_MEMORY_REQUEST=
JOB_MAIN_CONTAINER_MEMORY_LIMIT=
```
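On the units: on Kubernetes deployments these values map to Kubernetes-style resource quantities, so CPU is in cores or millicores (e.g. 0.5 or 500m) and memory takes byte suffixes (e.g. 512Mi, 2Gi). A sketch of the .env fields filled in under that assumption:

```
JOB_MAIN_CONTAINER_CPU_REQUEST=500m
JOB_MAIN_CONTAINER_CPU_LIMIT=1
JOB_MAIN_CONTAINER_MEMORY_REQUEST=1Gi
JOB_MAIN_CONTAINER_MEMORY_LIMIT=2Gi
```

As I understand it, the connection-level resourceRequirements JSON takes the same strings and, when set, overrides these instance-wide defaults for that connection's jobs.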

    Jordan Fox

    02/07/2023, 10:34 PM
An issue just popped up: after loading a bunch of blob files into other directories not managed by Airbyte but in the same container, syncs have slowed down significantly using the Azure Blob Storage destination. The first image shows each container taking 30+ seconds to register; the second shows it was taking ~1 second previously.

    Paulo José Ianes Bernardo Filho

    02/07/2023, 11:04 PM
Hello guys! How are you? I am trying to test Airbyte and I got this error: Stack trace: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. But my other friends don't get this error. I looked in my console and I have a 401 error; maybe I don't have permission to call the create API, and because of that my env variables are not exported. Another thing: on my first run Airbyte worked well, and after that I started to get this error. I tried cleaning all images and volumes, but that didn't work either. Thank you very much! Regards

    clatko

    02/07/2023, 11:46 PM
Has there really been no progress on GCS as a source, referenced here? https://github.com/airbytehq/airbyte/issues/11135 Does anybody have info on this?

    Mathieu Stark

    02/08/2023, 12:06 AM
    Hi everyone! Good afternoon! I'm getting this page when I try to see the status of a connector. Thoughts?

    Ganpat Agarwal

    02/08/2023, 4:36 AM
I am sure Airbyte has not stopped taking contributions from outside users, but the PR review process has become terribly slow. An example PR: https://github.com/airbytehq/airbyte/pull/18677, which has been open for 3 months now. I know the team is busy handling high-priority work, but any guidance along these lines would help.

    Han Hu

    02/08/2023, 6:12 AM
Hi team, I tried Airbyte Cloud. It looks like importing custom connectors is not available at the moment? If not, when will this feature be released, since it would be helpful?

    Ayushman Singh

    02/08/2023, 6:52 AM
Hello folks, I am using Airbyte open source.
What I'm trying to do:
1. Ingest data from some sources (full refresh; I want to check for updates as well)
2. Do some minor transformations on the data
3. UPSERT data into the destination
What I think would work:
• Use the dbt module to write queries that UPSERT the data at the destination instead of overwriting it.
What I need help with:
1. I am confused about when and where the SQL generated by dbt executes:
a. Does it run on the data from the source?
b. Does it run on the raw data copied to the destination?
c. Does it run after the data is copied to the destination (after normalization)?
Please help, thank you.
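On question 1: both normalization and custom dbt transformations run inside the destination warehouse, after the raw data has landed there, i.e. case (c): the SQL operates on tables already copied to the destination. A sketch of an incremental dbt model that upserts instead of overwriting; the source/table names and the id key are hypothetical, not from a real Airbyte setup:

```sql
-- models/users_upsert.sql
{{ config(materialized='incremental', unique_key='id') }}

select
    id,
    lower(email) as email,          -- example minor transformation
    updated_at
from {{ source('airbyte_raw', 'users') }}
{% if is_incremental() %}
  -- only rows newer than what's already in the target table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

With unique_key set, dbt merges matching rows instead of appending duplicates (on warehouses that support merge), which gives the UPSERT behavior.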

    Kacper Adler

    02/08/2023, 8:07 AM
Hey guys, I'd love to get some help with this: https://airbytehq.slack.com/archives/C021JANJ6TY/p1675768239117059

    Sabbiu Shah

    02/08/2023, 8:37 AM
Hi all. Objective: I have data from multiple source accounts and need to sync them into a single destination, with an extra column for account_id, using Airbyte. (Source: Square, multiple accounts; destination: Postgres.) Approaches:
1. Edit the source-connector to add account_id to the response body and schema. If I go with this approach I'll have to create the source myself and cannot utilize the existing one.
2. Load data into a different Postgres schema per account, and combine the data from all schemas into a global one. How should I approach combining the data? Will I be able to run normalization on the combined data (using Airbyte's normalization module)?
Question: What is the recommended approach to achieve this? Are there easy-to-use tools for this already? Is there a better way to handle this scenario? What I have found:
• https://docs.airbyte.com/understanding-airbyte/namespaces#--custom-format
• https://airbytehq.slack.com/archives/C019WEENQRM/p1633512549383200?thread_ts=1633498797.374500&cid=C019WEENQRM
"using prefixes (or a custom namespace) to dispatch your source data per account in the destination, and then using a custom SQL operation at the end of the sync to union all dispatched tables back into a single table, with an added column denoting which account it is coming from"
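The quoted suggestion (a variant of approach 2) can be sketched as a custom SQL operation run after the sync; the schema and table names here are hypothetical:

```sql
create or replace view analytics.orders_all as
select 'account_1' as account_id, o.* from square_account_1.orders o
union all
select 'account_2' as account_id, o.* from square_account_2.orders o;
```

Normalization runs per connection, so each account's schema is normalized independently; the union then happens over the already-normalized tables, which sidesteps the question of normalizing combined data.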

    Soshi Nakachi仲地早司

    02/08/2023, 9:48 AM
Hello, team. How do I get dbt logs in Kubernetes deployments? https://discuss.airbyte.io/t/how-to-get-dbt-logs-in-kubernetes-deployments/3844

    Yannick Ouali

    02/08/2023, 10:20 AM
Hello everyone! I am using Airbyte to extract data from a MySQL database to S3 (MinIO) using CDC, with .parquet as the output. It works great and the API is so easy to use, thank you for that! I was wondering if anyone knows a tool for my need: when using CDC, I get a new file at each synchronization when changes are detected. Is there a way to version data based on the CDC extractions in S3? For example, if I have these files containing CDC extractions: data-0.parquet, data-1.parquet, data-2.parquet, I want to rebuild a file based on version 0, 1, or 2. I know this question is not strictly about Airbyte functionality, but if anyone has heard of a tool for managing CDC extractions, that would be great! Thank you in advance, and sorry if this question is out of scope.
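One way to think about the rebuild: replay the extracts in order, applying each change to a keyed table. A sketch of that logic, independent of the file format (with real files you would first load each data-N.parquet, e.g. with pandas); _ab_cdc_deleted_at is the metadata column Airbyte's CDC mode adds to mark deletes, and the id key is illustrative:

```python
def rebuild(snapshot_files, key="id"):
    """Replay CDC extracts in order and rebuild the latest row state.

    Each element of snapshot_files stands for one extract file, here
    simplified to a list of dicts. A row with _ab_cdc_deleted_at set
    is treated as a delete; otherwise the row inserts or updates.
    """
    table = {}
    for records in snapshot_files:          # oldest extract first
        for row in records:
            if row.get("_ab_cdc_deleted_at") is not None:
                table.pop(row[key], None)   # row was deleted upstream
            else:
                table[row[key]] = row       # insert or update
    return list(table.values())
```

Passing only the first k extracts rebuilds the table as of "version k-1", which is the versioning behavior described above.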

    Sushant

    02/08/2023, 10:49 AM
Hi, how do I incorporate schema changes for a big volume of tables using 'incremental append' mode? A full refresh might be a costly operation because of the large volume of data.

    jonty

    02/08/2023, 12:31 PM
Hey all, I'm trying to use Airbyte to pull data from HubSpot. At the moment, I'm only trying to sync the contacts table, using both the Incremental | Deduped and Full Sync strategies. However, in both cases the logs show the following error (again and again):
```
2023-02-08 12:28:24 source > Could not cast `undefined` to `<class 'int'>`
Traceback (most recent call last):
  File "/airbyte/integration_code/source_hubspot/streams.py", line 480, in _cast_value
    casted_value = target_type(field_value)
ValueError: invalid literal for int() with base 10: 'undefined'
```
Airbyte v0.40.32, HubSpot connector v0.3.1
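For context, the failure is exactly int('undefined') raising ValueError: HubSpot is returning the literal string 'undefined' for a numeric property. This is not a connector fix, just a sketch of a tolerant cast showing why it fails and one way such a value could be absorbed:

```python
def safe_cast(value, target_type):
    """Cast a field value, tolerating junk like 'undefined'.

    int('undefined') raises ValueError (the exact error in the log),
    so fall back to None rather than aborting on the record.
    """
    try:
        return target_type(value)
    except (TypeError, ValueError):
        return None
```

Values that genuinely can't be cast come back as None instead of crashing the stream, which is roughly what one would want the source to do with such records.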

    Gleber Baptistella

    02/08/2023, 12:49 PM
Hi guys! How are you? I've tried to find a similar thread on this subject but couldn't on this channel. We have a lot of huge tables in PostgreSQL, and I'd like to know if there is any way to speed up the ingestion, or to make a dump outside of Airbyte, load the data into the data lake (S3), and have Airbyte load only the incremental data afterwards. I mean: the first load would be done outside of Airbyte, but the incremental loads after that would be done by Airbyte.

    강민기

    02/08/2023, 1:45 PM
Hey guys! I've tried to load data into Snowflake from MySQL and got the same error 3 times 😞 Can I load over 100 GB via CDC mode (incremental | deduped + history)? The data gets loaded, but I can't use CDC on the next sync. Connection: MySQL -> Snowflake. I'm not good at English, so please excuse me.

    Dave Tomkinson

    02/08/2023, 1:49 PM
Hi, I'm using the Airbyte UI and trying to get it to load a TSV (tab-separated values) file from S3 rather than CSV, but I can't figure out how to get a TAB into the delimiter box. \t gives ValueError('delimiter should contain 1 character only'), and pressing Tab moves to the next field, as expected.
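If the UI field won't accept it, note that the underlying connector config is JSON, and the two-character escape \t inside a JSON string decodes to one literal tab, which satisfies the one-character check; submitting the source config through the API with that payload is one possible workaround. A quick illustration of the decoding:

```python
import json

# What an API payload would carry: the two characters backslash + t ...
config_fragment = '{"delimiter": "\\t"}'

# ...which JSON decoding turns into a single literal tab character.
delimiter = json.loads(config_fragment)["delimiter"]
assert delimiter == "\t"
assert len(delimiter) == 1
```

So the 1-character validation is about the decoded value, not about what you can type into the text box.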

    Sharath Chandra

    02/08/2023, 2:26 PM
Mixpanel source: on Mixpanel I can see data, but when I try to sync, Airbyte says it read 0 records. Attaching all the screenshots. Could someone please help?

    Lior Chen

    02/08/2023, 2:49 PM
Hi, can we scale an Airbyte cluster for extra resiliency? We're running on k8s, and soon we'll deploy hundreds, potentially thousands, of connectors. Has anyone reached that scale? Does adding more airbyte-worker and airbyte-server replicas to the ReplicaSet do the job?