# ask-community-for-troubleshooting
  • r

    Robert Put

    10/12/2022, 10:29 PM
    If I run Airbyte on EC2 and configure a remote database: 1. Can I treat the instance as ephemeral, or is there data that needs to be stored there? It's only for remote sources/connections. 2. If I deleted it and re-created it a few days later, would it start from where it last left off and continue without any issues once the next sync was run for a connector? Mainly thinking of incremental syncs.
    ✍️ 1
    s
    • 2
    • 5
  • w

    wp

    10/12/2022, 10:37 PM
    Hi, is anyone having issues with the Google Ads connector 0.2.1 -> BQ denormalized 1.2.4 with GCS staging?
    Copy code
    BigQueryError{reason=invalidQuery, location=query, message=Field ad_group_criterion_final_urls has incompatible types. Configured schema: string. Avro file: record}
    Is this a bug on the source connector?
    ✍️ 1
    m
    • 2
    • 16
  • m

    Mohit Reddy

    10/13/2022, 3:48 AM
    We are deploying Airbyte as part of our Kubernetes cluster in a namespace which, by default, injects a sidecar into every pod created in it. This causes the following logs:
    Copy code
    2022-10-13 03:37:17 INFO i.a.w.p.KubePodProcess(<init>):572 - Pod IP: 10.105.236.201
    2022-10-13 03:37:17 INFO i.a.w.p.KubePodProcess(<init>):579 - Using null stdin output stream...
    2022-10-13 03:37:17 ERROR i.a.w.g.DefaultCheckConnectionWorker(run):98 - Unexpected error while checking connection: 
    java.lang.NullPointerException: null
    	at java.io.Reader.<init>(Reader.java:168) ~[?:?]
    	at java.io.InputStreamReader.<init>(InputStreamReader.java:112) ~[?:?]
    	at io.airbyte.commons.io.IOs.newBufferedReader(IOs.java:120) ~[io.airbyte-airbyte-commons-0.40.14.jar:?]
    	at io.airbyte.commons.io.LineGobbler.<init>(LineGobbler.java:99) ~[io.airbyte-airbyte-commons-0.40.14.jar:?]
    	at io.airbyte.commons.io.LineGobbler.gobble(LineGobbler.java:67) ~[io.airbyte-airbyte-commons-0.40.14.jar:?]
    	at io.airbyte.commons.io.LineGobbler.gobble(LineGobbler.java:28) ~[io.airbyte-airbyte-commons-0.40.14.jar:?]
    	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:65) ~[io.airbyte-airbyte-workers-0.40.14.jar:?]
    	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:36) ~[io.airbyte-airbyte-workers-0.40.14.jar:?]
    	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:161) ~[io.airbyte-airbyte-workers-0.40.14.jar
    To get around this, we disable injecting the sidecar by adding an annotation to the jobs - https://docs.airbyte.com/operator-guides/configuring-airbyte/#jobs-specific (specifically CHECK_JOB_KUBE_ANNOTATIONS). We recently upgraded Airbyte from 0.39.1 to 0.40.14 and this has started to fail again, i.e. the annotation is not being applied. Any help here?
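    For illustration, a minimal sketch of the env vars as I read the linked doc (the Istio annotation is only a placeholder for whatever injector the namespace uses; multiple annotations go in as comma-separated key=value pairs):
    Copy code
    # worker environment / .env (placeholder annotation value)
    CHECK_JOB_KUBE_ANNOTATIONS=sidecar.istio.io/inject=false
    JOB_KUBE_ANNOTATIONS=sidecar.istio.io/inject=false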
    ✍️ 1
    n
    • 2
    • 7
  • s

    Shashank Tiwari

    10/13/2022, 10:22 AM
    Copy code
    0/2 nodes are available: 1 Too many pods, 1 node(s) didn't find available persistent volumes to bind.
    This is the warning coming from the airbyte-db pod when deployed on EKS. Can anyone help me with this?
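    A couple of kubectl checks that usually narrow this down (namespace and claim names are assumptions; adjust them to your release):
    Copy code
    # list the claims and see which one is stuck Pending
    kubectl -n airbyte get pvc
    # the Events section at the bottom says why it won't bind
    kubectl -n airbyte describe pvc <airbyte-db-claim-name>
    # dynamic provisioning needs a (default) StorageClass on the cluster
    kubectl get storageclass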
    ✅ 1
    k
    m
    +2
    • 5
    • 48
  • s

    Sefath Chowdhury

    10/13/2022, 12:42 PM
    Hi everyone! Has anyone faced the issue of real-time replication (using CDC) breaking due to schema changes on the source? How have you gotten around this issue?
    ✍️ 1
    m
    • 2
    • 2
  • s

    sonti srihari

    10/13/2022, 12:48 PM
    Hi team, is the Airbyte interface an online solution?
    ✍️ 1
    h
    • 2
    • 2
  • n

    Nir Chamo

    10/13/2022, 1:32 PM
    Hey, very new to Airbyte. I'm trying to use the File source type with a CSV placed locally on my machine. The problem is that Airbyte can't seem to find the file ("No such file or directory"). I tried to look it up, but it didn't seem like a common issue; maybe there is a specific folder I should place my file in? It's a CSV format file.
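    If this is the Docker deployment, a sketch of the local-file convention described in the File source docs (exact UI labels may differ by version): the connector can only see files under /tmp/airbyte_local on the host, which is mounted as /local inside the container.
    Copy code
    # on the machine running Airbyte
    mkdir -p /tmp/airbyte_local
    cp ~/my_data.csv /tmp/airbyte_local/
    # then in the File source config:
    #   Storage Provider: Local Filesystem (limited)
    #   URL: /local/my_data.csv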
    ✍️ 1
    a
    s
    • 3
    • 5
  • s

    Stuart Horgan

    10/13/2022, 3:46 PM
    Hi guys, just a quick question: I'm trying to set up an incremental connection with the HTTP API. I've nearly got there, but I'm puzzled over the initial setting of the state/cursor value when you run it for the first time. According to the tutorial, we should have something like this in read_records:
    Copy code
    def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
            for record in super().read_records(*args, **kwargs):
                if self._cursor_value:
                    latest_record_date = record[self.cursor_field]
                    self._cursor_value = max(self._cursor_value, latest_record_date)
                yield record
    and this is how we get the new state value for future runs to use. But the first time you run it, the self._cursor_value is set to None, so we never enter the if statement and update to the latest value. So what is supposed to happen here? How do we get the correct state returned at the end of the first run instead of the start date being returned unchanged?
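    A minimal sketch of one way to handle it (an assumption about the intent, not the tutorial's official fix): seed the cursor from the first record when _cursor_value is still None, so the max-tracking also works on the very first run.
    Copy code
    from typing import Any, Iterable, Mapping  # same typing imports the tutorial stream uses

    def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
        for record in super().read_records(*args, **kwargs):
            latest_record_date = record[self.cursor_field]
            if self._cursor_value:
                self._cursor_value = max(self._cursor_value, latest_record_date)
            else:
                # first run: the cursor starts as None, so just take the record's value
                self._cursor_value = latest_record_date
            yield record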
    ✍️ 1
    • 1
    • 4
  • a

    Alberto Colon

    10/13/2022, 2:31 PM
    Hi, I'm trying to set up a postgres source and postgres destination. We are using airbyte (0.40.14) on arm64 in a k3s cluster. When I try to create the postgres source, a pod is created like this:
    rce-postgres-check-9688c843-03b2-40af-9b99-98b9d45e18f9-0-ixmtc   0/4     Init:Error   0               6m58s   10.42.7.8     b020114-47b9054   <none>           <none>
    The logs of this pod:
    Timeout while attempting to copy to init container, exiting with code 1...
    and in the Airbyte UI we get this error (attachment). Do I need to set up something else in the env file for Kubernetes? I really don't have any clue about this bug/error. Thanks in advance...
    👀 1
    ✍️ 1
    h
    m
    • 3
    • 28
  • j

    Jhon Edison Bambague Calderon

    10/13/2022, 3:33 PM
    Hello everyone! I have a question: are the following Jira tables supported by the Airbyte Jira connector?
    ✍️ 1
    s
    • 2
    • 2
  • j

    Jordan Young

    10/13/2022, 4:39 PM
    Hi all, I wanted to get the community's thoughts on this: my team and I have gotten to the point where we've developed several internally useful custom connectors and would like to check in some commits. We don't want to open PRs on the main airbyte repo because our connectors won't be broadly useful (really specific use cases at this point) and we're primarily a GitLab shop these days. Our current thinking is to run Airbyte from the main repo and keep our custom connectors in a separate repo, then programmatically add them with octavia / the API. Thoughts on using this pattern?
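    For what it's worth, a rough sketch of that pattern (registry, image names, and menu labels are placeholders or from memory, so double-check them): build and push the connector images from the GitLab repo, then register them against the running Airbyte instance.
    Copy code
    # in the custom-connector repo's CI
    docker build -t registry.example.com/data/source-internal-foo:0.1.0 .
    docker push registry.example.com/data/source-internal-foo:0.1.0
    # then add the image in Airbyte (Settings -> Sources -> New connector),
    # or declaratively with octavia apply from a separate configuration repo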
    ✍️ 1
    s
    • 2
    • 3
  • s

    sar

    10/13/2022, 5:58 PM
    Having an issue with the
    destination-aws-datalake
    connector. We have a table with more than 15 million rows, and whenever we try to run a sync using that connector, the Docker container that gets spun up to run the sync ends up chewing all the host memory and eventually crashing the EC2 instance. I tried setting some global Docker limits (as I can't do it at the container level, since it gets spun up when the sync job starts) with no success, as
    docker stats
    kept showing the full host memory as available. I tried resizing the instance, and even with 64GB of RAM on the host we still ran into the same issue. https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-aws-datalake Has anyone run into something similar using that connector, and how did you end up resolving it? Here's a bit more info on the issue itself: https://github.com/mitodl/ol-data-platform/issues/371
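    One hedged thing to check (not a confirmed fix): Airbyte has its own per-job container memory env vars that are separate from global Docker limits; whether they are honoured by the Docker deployment depends on the version, so treat this as a sketch to verify against the configuring-airbyte docs.
    Copy code
    # .env (illustrative values)
    JOB_MAIN_CONTAINER_MEMORY_REQUEST=2g
    JOB_MAIN_CONTAINER_MEMORY_LIMIT=8g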
    ✍️ 1
    s
    • 2
    • 2
  • d

    Dusty Shapiro

    10/13/2022, 6:34 PM
    K8s/Helm question: I’m attempting to use an external DB instead of the default container DB, but it doesn’t look like my changes have had any effect. I changed the following in the Helm values:
    Copy code
    postgresql:
      enabled: false
    externalDatabase:
      host: ${db_host}
      user: ${db_user}
      password: ${db_password}
      database: ${db_name}
      port: ${db_port}
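    A quick way to check whether those values actually reach the deployment (resource names and labels are guesses for a release called airbyte):
    Copy code
    # render the chart locally and look at the DATABASE_* entries in the env ConfigMap
    helm template airbyte airbyte/airbyte -f values.yaml | grep -A1 DATABASE_HOST
    # or inspect what the live pods are actually reading
    kubectl get configmap -l app.kubernetes.io/instance=airbyte -o yaml | grep DATABASE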
    ✍️ 1
  • p

    Pulkit Srivastava

    10/13/2022, 7:15 PM
    Hello team, we have been using Airbyte for a long time and have been running around 6000 syncs daily. We just moved from 0.33.5-alpha to 0.39.4-alpha, and what we observed is that the Airbyte API
    sync
    endpoint is taking more time than when we were using 0.33.5-alpha. Can you please let me know whether this is expected behaviour? Previously it took less than 1 second; in the new version it takes 6 to 7 seconds. If any more info is needed, I am happy to provide it. Thanks
    ✍️ 1
    h
    • 2
    • 6
  • d

    David hatch

    10/13/2022, 8:04 PM
    Hi, I am working on a custom connector for airbyte. I’m running into this error during the sync normalization process:
    is of type timestamp without time zone but expression is of type text
    . In my schema i have configured this column as follows:
    Copy code
    "block_day": {
                "type": "string",
                "format": "date-time",
                "airbyte_type": "timestamp_without_timezone"
            },
    Any ideas on how to resolve this would be appreciated. I’ve tried a couple of variations of the configuration and have searched around in the Airbyte GitHub issues, but haven’t found a solution.
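    In case it helps narrow it down, a hedged check (assuming a Postgres destination; stream and table names are placeholders) of the raw value normalization is trying to cast versus the column type it created:
    Copy code
    -- the staged value, stored as JSON in the Airbyte raw table
    SELECT _airbyte_data ->> 'block_day' FROM _airbyte_raw_my_stream LIMIT 5;
    -- the column type on the final normalized table
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'my_stream' AND column_name = 'block_day';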
    ✍️ 1
    u
    m
    • 3
    • 11
  • m

    Matt Webster

    10/13/2022, 8:07 PM
    Does anyone have any troubleshooting tips or workarounds for the Redshift destination connector problem where any incoming VARCHAR field over 65535 blows up the data sync? I’ve tried fumbling around and truncating long data from VARCHAR columns but I haven’t been able to solve it that way. I see messages like these in the worker logs but they don’t mean much to me (I can’t tell where it’s getting stuck):
    Copy code
    13 of 33 ERROR creating incremental model loading.log_scd................................................... [ERROR in 16.07s]
    Database Error in model log_scd (models/generated/airbyte_incremental/scd/loading/log_scd.sql)
      Invalid input
      DETAIL:  
        -----------------------------------------------
        error:  Invalid input
        code:      8001
        context:   CONCAT() result too long for type varchar(65535)
        query:     417552
        location:  string_ops.cpp:108
        process:   query0_113_417552 [pid=690]
        -----------------------------------------------
      compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_incremental/scd/loading/log_scd.sql
    Any ideas would be greatly appreciated! Here is the root issue in GitHub: https://github.com/airbytehq/airbyte/issues/14441
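    For locating what is overflowing before the SCD model concatenates it, a hedged sketch against the staged raw table (schema and stream names are placeholders inferred from the model name in the log):
    Copy code
    -- rows whose raw JSON payload is already close to the 65535-byte varchar ceiling
    SELECT _airbyte_ab_id, LEN(_airbyte_data) AS payload_len
    FROM loading._airbyte_raw_log
    ORDER BY payload_len DESC
    LIMIT 10;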
    ✍️ 1
    s
    e
    • 3
    • 8
  • e

    Eduardo Aviles

    10/13/2022, 6:52 PM
    Hi, I'm having trouble with the Monday connector:
    Copy code
    2022-10-12 09:22:30 [44msource[0m > Encountered an exception while reading stream items
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 971, in json
        return complexjson.loads(self.text, **kwargs)
      File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads
        return _default_decoder.decode(s)
      File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 115, in read
        yield from self._read_stream(
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 165, in _read_stream
        for record in record_iterator:
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 254, in _read_full_refresh
        for record in records:
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 417, in read_records
        response = self._send_request(request, request_kwargs)
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 336, in _send_request
        return backoff_handler(user_backoff_handler)(request, request_kwargs)
      File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
        ret = target(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
        ret = target(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 291, in _send
        if self.should_retry(response):
      File "/airbyte/integration_code/source_monday/source.py", line 55, in should_retry
        is_complex_query = response.json().get("errors")
      File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 975, in json
        raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
    requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    2022-10-12 09:22:30 [44msource[0m > Finished syncing items
    1650afd1_5854_4435_87ce_4c1b3063d3db_logs_37_txt.txt
    ✍️ 1
    s
    • 2
    • 4
  • s

    Simon Thelin

    10/13/2022, 8:43 PM
    Hello. I am currently using incremental updates with the postgres source to s3 destination. I have noticed that even though I am using an
    update
    column on a
    pg
    table, it seems to read through all data of the table, but only writes the
    delta
    changes. On some table syncs however, I can see it only reads very small chunks. Is there a particular reason for this behaviour?
    ✍️ 1
    s
    y
    +2
    • 5
    • 9
  • l

    Luca Moity

    10/13/2022, 1:42 PM
    When I try to connect my Zendesk instance as a source via API token, I'm getting the following error: "Organization access is not enabled. Please check admin permission of the current account". My credentials are fine...
    ✍️ 1
    • 1
    • 4
  • t

    Tammy Shipps

    10/13/2022, 10:04 PM
    Hello all! I was SO CLOSE to not needing to ask for help…so very close. I actually did get the synced data over but I have a major issue with it not ending up in the same schema - I’m going into Postgres, so it’s using the schema setup shown here -> https://docs.airbyte.com/integrations/destinations/postgres#schema-map, which of course, doesn’t work on the other end if you need the original schema. I’m really hoping there’s an easy fix :)
    ✍️ 1
    m
    • 2
    • 10
  • e

    Edgar Valdez

    10/14/2022, 2:41 AM
    Hi there! Is there a way that when extracting data from PG to S3 the target path doesn’t append a
    public
    folder? i.e., target bucket:
    abc
    and after running the job, tables are stored at:
    abc/public/table_1
    I’d like:
    abc/table_1
    Cheers! PS: Target file is a parquet file
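    If it helps, the S3 destination has an "S3 Path Format" setting whose macros can drop the namespace segment; a sketch (macro names as documented for the S3 destination, so double-check them against your connector version):
    Copy code
    # default-style pattern: the namespace (the "public" schema) comes first
    ${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_
    # pattern without the namespace segment
    ${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_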
    ✍️ 1
    m
    • 2
    • 3
  • w

    Wayne

    10/14/2022, 3:24 AM
    Hi all, I am using the Docker version of Airbyte, and I have synced a few databases that are huge. Now I want to move Airbyte to a different machine, so I would like to clone everything there without the need to re-sync each database. How can I do it? I have copied the airbyte folder and also all the Docker images over, but I still see a brand-new Airbyte when I launch it. What am I missing?
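    In the Docker deployment the configured sources, connections, and sync state live in Docker named volumes (the Postgres volume in particular), not in the cloned repo folder, which is why copying the folder gives a fresh instance. A rough sketch of moving them (volume names are the usual docker-compose defaults; confirm with docker volume ls):
    Copy code
    # old machine, with Airbyte stopped
    docker run --rm -v airbyte_db:/vol -v "$PWD":/backup alpine tar czf /backup/airbyte_db.tgz -C /vol .
    docker run --rm -v airbyte_workspace:/vol -v "$PWD":/backup alpine tar czf /backup/airbyte_workspace.tgz -C /vol .
    # new machine: bring Airbyte up once to create the volumes, stop it, then restore
    docker run --rm -v airbyte_db:/vol -v "$PWD":/backup alpine tar xzf /backup/airbyte_db.tgz -C /vol
    docker run --rm -v airbyte_workspace:/vol -v "$PWD":/backup alpine tar xzf /backup/airbyte_workspace.tgz -C /vol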
    ✍️ 1
    m
    s
    • 3
    • 7
  • m

    Manuveeran Sandhu

    10/14/2022, 6:12 AM
    Hey, I am a Data Engineer working at a startup, UNA Brands. We have been using Airbyte for the last 8 months on EC2, and now we are doing load testing on AWS EKS. This is my first post on the Airbyte Slack. One of the issues I am facing while testing out loads on Kubernetes (AWS EKS) is that when running 25-30+ connections in parallel, the loads do not get a successful exit status after finishing (though I do see the Kubernetes pods completing for both source and destination), while the webapp keeps saying the loads are still running. You can see in the screenshot and logs below that the connection ran for about 4 hours, but the UI can't fetch the exit code sooner and says it took 9 hours to complete. start_time: 2022-10-13 19:29:34 UTC, end_time: 2022-10-13 23:41:07 UTC. Please let me know if you want to know more about the configuration settings we are using, happy to share.
    logs-7.txt
    ✍️ 1
    • 1
    • 2
  • b

    Bogdan

    10/13/2022, 1:45 PM
    Hi everyone! I am trying to set up a new Google Ads source with my newly created credentials, but I get this error. What could be the problem?
    ✍️ 1
    m
    • 2
    • 9
  • e

    Emilja Dankevičiūtė

    10/14/2022, 9:43 AM
    Hello, I'm testing out Airbyte with an orchestrator tool (Airflow, Kestra, etc.) and I can see that there is a username and a password for the Airbyte webapp (e.g. https://airflow.apache.org/docs/apache-airflow-providers-airbyte/1.0.0/connections.html). In our Airbyte web UI I can't find anything regarding user management or auth, and it looks like auth is provided by a proxy. We're running open-source Airbyte on Kubernetes. What are those values in Airflow's Airbyte connection spec? Are they for Airbyte Cloud?
    ✍️ 1
    h
    • 2
    • 7
  • l

    laila ribke

    10/14/2022, 11:52 AM
    Hi, I need to change the destination for all my connections from PostgreSQL to Redshift. Is it possible to do this without losing the data I already have and without setting up new connections with the new destination?
    ✍️ 1
    s
    • 2
    • 4
  • t

    Travis James

    10/14/2022, 12:43 PM
    How can I get Airbyte to also create primary keys, foreign keys, and indexes when moving a SQL Server database to a Postgres database (or any other cross-database conversion)? I have a few projects where I was attempting to use Airbyte, and the destination database is unusable without having all the relationships and keys copied along with the data. Also, how can I handle differences in types, like between SQL Server relative timestamps and Postgres timestamps, GUID and UUID, etc.?
    ✅ 1
    ✍️ 1
    h
    • 2
    • 2
  • k

    Kevin Millan

    10/10/2022, 9:31 PM
    Hello everyone, I'm trying to create an
    Incremental | Deduped + history
    table by migrating from Redshift to Postgres, but I encountered a couple of errors along the way. I'm using a
    datetime
    column as the
    cursor field
    : 1. I was first getting an error from Airbyte saying that the table
    *_stg
    needed a
    REPLICA IDENTITY
    . I then set it manually with
    ALTER TABLE analytics.revenue REPLICA IDENTITY FULL;
    2. Now I'm getting the same error (log file attached), but for the
    *_scd
    table. But the issue seems to be that it is having problems creating the
    _scd
    table in the first place.
    logs-15681.txt
    🥲 1
    ✍️ 1
    h
    • 2
    • 11
  • r

    Robert Put

    10/14/2022, 6:39 PM
    When using the CLI, how would you refresh the schema on a connection? Is the option to update it in the UI and then import?
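    If going through the API is acceptable, a hedged sketch of forcing a fresh schema discovery for a source (endpoint and body per the OSS config API; the host and IDs are placeholders). The refreshed catalog then still has to be applied to the connection, via the UI or a connection update call:
    Copy code
    curl -X POST http://localhost:8000/api/v1/sources/discover_schema \
      -H "Content-Type: application/json" \
      -d '{"sourceId": "<source-id>", "disable_cache": true}'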
    ✍️ 1
    m
    • 2
    • 6
  • l

    Lucas Gonthier

    10/14/2022, 6:50 PM
    Hello, my team would like to use Airbyte in production to fetch our client data from Shopify. However, from what I can see in the Airbyte documentation, the connector is in alpha. I would like to know if it is safe to use it in prod? We would like our customers to download our Shopify app and then run an ELT process with Airbyte after getting an access token.
    m
    m
    • 3
    • 3