# ask-community-for-troubleshooting
  • r

    Robert Put

    10/12/2022, 10:29 PM
    If I run Airbyte on EC2 and configure a remote database: 1. Can I treat the instance as ephemeral, or is there data that needs to be stored there? It's only for remote sources/connections. 2. If I deleted it and re-created it a few days later, would it start from where it last left off and continue without any issues once the next sync was run for a connector? Mainly thinking of incremental syncs.
    ✍️ 1
    s
    • 2
    • 5
  • w

    wp

    10/12/2022, 10:37 PM
    Hi, is anyone having issues with the Google Ads connector 0.2.1 -> BQ denormalized 1.2.4 with GCS staging?
    Copy code
    BigQueryError{reason=invalidQuery, location=query, message=Field ad_group_criterion_final_urls has incompatible types. Configured schema: string. Avro file: record}
    Is this a bug on the source connector?
    ✍️ 1
    m
    • 2
    • 16
  • m

    Mohit Reddy

    10/13/2022, 3:48 AM
    We are deploying Airbyte as part of our Kubernetes cluster in a namespace which, by default, injects a sidecar into every pod created in it. This causes the following logs:
    Copy code
    2022-10-13 03:37:17 INFO i.a.w.p.KubePodProcess(<init>):572 - Pod IP: 10.105.236.201
    2022-10-13 03:37:17 INFO i.a.w.p.KubePodProcess(<init>):579 - Using null stdin output stream...
    2022-10-13 03:37:17 ERROR i.a.w.g.DefaultCheckConnectionWorker(run):98 - Unexpected error while checking connection: 
    java.lang.NullPointerException: null
    	at java.io.Reader.<init>(Reader.java:168) ~[?:?]
    	at java.io.InputStreamReader.<init>(InputStreamReader.java:112) ~[?:?]
    	at io.airbyte.commons.io.IOs.newBufferedReader(IOs.java:120) ~[io.airbyte-airbyte-commons-0.40.14.jar:?]
    	at io.airbyte.commons.io.LineGobbler.<init>(LineGobbler.java:99) ~[io.airbyte-airbyte-commons-0.40.14.jar:?]
    	at io.airbyte.commons.io.LineGobbler.gobble(LineGobbler.java:67) ~[io.airbyte-airbyte-commons-0.40.14.jar:?]
    	at io.airbyte.commons.io.LineGobbler.gobble(LineGobbler.java:28) ~[io.airbyte-airbyte-commons-0.40.14.jar:?]
    	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:65) ~[io.airbyte-airbyte-workers-0.40.14.jar:?]
    	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:36) ~[io.airbyte-airbyte-workers-0.40.14.jar:?]
    	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:161) ~[io.airbyte-airbyte-workers-0.40.14.jar
    To get around this, we disable injecting the sidecar by adding an annotation to the jobs - https://docs.airbyte.com/operator-guides/configuring-airbyte/#jobs-specific (specifically CHECK_JOB_KUBE_ANNOTATIONS). We recently upgraded Airbyte from 0.39.1 to 0.40.14 and this has started to fail again, i.e. the annotation is not being applied. Any help here?
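    For illustration, a minimal sketch of the env vars as I read the linked doc (the Istio annotation is only a placeholder for whatever injector the namespace uses; multiple annotations go in as comma-separated key=value pairs):
    Copy code
    # worker environment / .env (placeholder annotation value)
    CHECK_JOB_KUBE_ANNOTATIONS=sidecar.istio.io/inject=false
    JOB_KUBE_ANNOTATIONS=sidecar.istio.io/inject=false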
    ✍️ 1
    n
    • 2
    • 7
  • s

    Shashank Tiwari

    10/13/2022, 10:22 AM
    Copy code
    0/2 nodes are available: 1 Too many pods, 1 node(s) didn't find available persistent volumes to bind.
    This is the warning coming from the airbyte-db pod when deployed on EKS. Can anyone help me with this?
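    A couple of kubectl checks that usually narrow this down (namespace and claim names are assumptions; adjust them to your release):
    Copy code
    # list the claims and see which one is stuck Pending
    kubectl -n airbyte get pvc
    # the Events section at the bottom says why it won't bind
    kubectl -n airbyte describe pvc <airbyte-db-claim-name>
    # dynamic provisioning needs a (default) StorageClass on the cluster
    kubectl get storageclass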
    ✅ 1
    k
    m
    +2
    • 5
    • 48
  • s

    Sefath Chowdhury

    10/13/2022, 12:42 PM
    Hi everyone! Has anyone faced the issue of real-time replication (using CDC) breaking due to schema changes on the source? How have you gotten around this issue?
    ✍️ 1
    m
    • 2
    • 2
  • s

    sonti srihari

    10/13/2022, 12:48 PM
    Hi team, is the Airbyte interface an online solution?
    ✍️ 1
    h
    • 2
    • 2
  • n

    Nir Chamo

    10/13/2022, 1:32 PM
    Hey, very new to Airbyte. I'm trying to use the File source type with a CSV placed locally on my machine. The problem is that Airbyte can't seem to find the file ("No such file or directory"). I tried to look it up, but it didn't seem like a common issue; maybe there is a specific folder I should place my file in? It's a CSV format file.
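    If this is the Docker deployment, a sketch of the local-file convention described in the File source docs (exact UI labels may differ by version): the connector can only see files under /tmp/airbyte_local on the host, which is mounted as /local inside the container.
    Copy code
    # on the machine running Airbyte
    mkdir -p /tmp/airbyte_local
    cp ~/my_data.csv /tmp/airbyte_local/
    # then in the File source config:
    #   Storage Provider: Local Filesystem (limited)
    #   URL: /local/my_data.csv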
    ✍️ 1
    a
    s
    • 3
    • 5
  • s

    Stuart Horgan

    10/13/2022, 3:46 PM
    Hi guys, just a quick question: I'm trying to set up an incremental connection with the HTTP API. I've nearly got there, but I'm puzzled over the initial setting of the state/cursor value when you run it for the first time. According to the tutorial, we should have something like this in read_records:
    Copy code
    def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
            for record in super().read_records(*args, **kwargs):
                if self._cursor_value:
                    latest_record_date = record[self.cursor_field]
                    self._cursor_value = max(self._cursor_value, latest_record_date)
                yield record
    and this is how we get the new state value for future runs to use. But the first time you run it, the self._cursor_value is set to None, so we never enter the if statement and update to the latest value. So what is supposed to happen here? How do we get the correct state returned at the end of the first run instead of the start date being returned unchanged?
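    A minimal sketch of one way to handle it (an assumption about the intent, not the tutorial's official fix): seed the cursor from the first record when _cursor_value is still None, so the max-tracking also works on the very first run.
    Copy code
    from typing import Any, Iterable, Mapping  # same typing imports the tutorial stream uses

    def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
        for record in super().read_records(*args, **kwargs):
            latest_record_date = record[self.cursor_field]
            if self._cursor_value:
                self._cursor_value = max(self._cursor_value, latest_record_date)
            else:
                # first run: the cursor starts as None, so just take the record's value
                self._cursor_value = latest_record_date
            yield record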
    ✍️ 1
    • 1
    • 4
  • a

    Alberto Colon

    10/13/2022, 2:31 PM
    Hi, I'm trying to set up a postgres source and postgres destination. We are using airbyte (0.40.14) on arm64 in a k3s cluster. When I try to create the postgres source, a pod is created like this:
    rce-postgres-check-9688c843-03b2-40af-9b99-98b9d45e18f9-0-ixmtc   0/4     Init:Error   0               6m58s   10.42.7.8     b020114-47b9054   <none>           <none>
    The logs of this pod:
    Timeout while attempting to copy to init container, exiting with code 1...
    and in the Airbyte UI we get this error (attachment). Do I need to set up something else in the env file for Kubernetes? I really don't have any clue about this bug/error. Thanks in advance...
    👀 1
    ✍️ 1
    h
    m
    • 3
    • 28
  • j

    Jhon Edison Bambague Calderon

    10/13/2022, 3:33 PM
    Hello everyone! I have a question: are the following Jira tables supported by the Airbyte Jira connector?
    ✍️ 1
    s
    • 2
    • 2
  • j

    Jordan Young

    10/13/2022, 4:39 PM
    Hi all, I wanted to get the community's thoughts on this: my team and I have gotten to the point where we've developed several internally useful custom connectors and would like to check in some commits. We don't want to open PRs on the main airbyte repo because our connectors won't be broadly useful (really specific use cases at this point) and we're primarily a GitLab shop these days. Our current thinking is to run Airbyte from the main repo and keep our custom connectors in a separate repo, then programmatically add them with octavia / the API. Thoughts on using this pattern?
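    For what it's worth, a rough sketch of that pattern (registry, image names, and menu labels are placeholders or from memory, so double-check them): build and push the connector images from the GitLab repo, then register them against the running Airbyte instance.
    Copy code
    # in the custom-connector repo's CI
    docker build -t registry.example.com/data/source-internal-foo:0.1.0 .
    docker push registry.example.com/data/source-internal-foo:0.1.0
    # then add the image in Airbyte (Settings -> Sources -> New connector),
    # or declaratively with octavia apply from a separate configuration repo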
    ✍️ 1
    s
    • 2
    • 3
  • s

    sar

    10/13/2022, 5:58 PM
    Having an issue with the
    destination-aws-datalake
    connector. We have a table with more than 15 million rows, and whenever we try to run a sync using that connector, the Docker container that gets spun up to run the sync ends up chewing all the host memory and eventually crashing the EC2 instance. I tried setting some global Docker limits (as I can't do it at the container level, since it gets spun up when the sync job starts) with no success, as
    docker stats
    kept showing the full host memory as available. I tried resizing the instance, and even with 64GB of RAM on the host we still ran into the same issue. https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-aws-datalake Has anyone run into something similar using that connector, and how did you end up resolving it? Here's a bit more info on the issue itself: https://github.com/mitodl/ol-data-platform/issues/371
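    One hedged thing to check (not a confirmed fix): Airbyte has its own per-job container memory env vars that are separate from global Docker limits; whether they are honoured by the Docker deployment depends on the version, so treat this as a sketch to verify against the configuring-airbyte docs.
    Copy code
    # .env (illustrative values)
    JOB_MAIN_CONTAINER_MEMORY_REQUEST=2g
    JOB_MAIN_CONTAINER_MEMORY_LIMIT=8g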
    ✍️ 1
    s
    • 2
    • 2
  • d

    Dusty Shapiro

    10/13/2022, 6:34 PM
    K8s/Helm question: I’m attempting to use an external DB instead of the default container DB, but it doesn’t look like my changes have had any effect. I changed the following in the Helm values:
    Copy code
    postgresql:
      enabled: false
    externalDatabase:
      host: ${db_host}
      user: ${db_user}
      password: ${db_password}
      database: ${db_name}
      port: ${db_port}
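    A quick way to check whether those values actually reach the deployment (resource names and labels are guesses for a release called airbyte):
    Copy code
    # render the chart locally and look at the DATABASE_* entries in the env ConfigMap
    helm template airbyte airbyte/airbyte -f values.yaml | grep -A1 DATABASE_HOST
    # or inspect what the live pods are actually reading
    kubectl get configmap -l app.kubernetes.io/instance=airbyte -o yaml | grep DATABASE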
    ✍️ 1
  • p

    Pulkit Srivastava

    10/13/2022, 7:15 PM
    Hello team, we have been using Airbyte for a long time and have been running around 6000 syncs daily. We just moved from 0.33.5-alpha to 0.39.4-alpha, and what we observed is that the Airbyte API
    sync
    endpoint is taking more time than when we were using 0.33.5-alpha. Can you please let me know whether this is expected behaviour? Previously it took less than 1 second; in the new version it takes 6 to 7 seconds. If any more info is needed, I am happy to provide it. Thanks
    ✍️ 1
    h
    • 2
    • 6
  • d

    David hatch

    10/13/2022, 8:04 PM
    Hi, I am working on a custom connector for airbyte. I’m running into this error during the sync normalization process:
    is of type timestamp without time zone but expression is of type text
    . In my schema i have configured this column as follows:
    Copy code
    "block_day": {
                "type": "string",
                "format": "date-time",
                "airbyte_type": "timestamp_without_timezone"
            },
    Any ideas on how to resolve this would be appreciated. I’ve tried a couple of variations of the configuration and have searched around in the Airbyte GitHub issues, but haven’t found a solution.
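    In case it helps narrow it down, a hedged check (assuming a Postgres destination; stream and table names are placeholders) of the raw value normalization is trying to cast versus the column type it created:
    Copy code
    -- the staged value, stored as JSON in the Airbyte raw table
    SELECT _airbyte_data ->> 'block_day' FROM _airbyte_raw_my_stream LIMIT 5;
    -- the column type on the final normalized table
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'my_stream' AND column_name = 'block_day';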
    ✍️ 1
    u
    m
    • 3
    • 11
  • m

    Matt Webster

    10/13/2022, 8:07 PM
    Does anyone have any troubleshooting tips or workarounds for the Redshift destination connector problem where any incoming VARCHAR field over 65535 blows up the data sync? I’ve tried fumbling around and truncating long data from VARCHAR columns but I haven’t been able to solve it that way. I see messages like these in the worker logs but they don’t mean much to me (I can’t tell where it’s getting stuck):
    Copy code
    13 of 33 ERROR creating incremental model loading.log_scd................................................... [ERROR in 16.07s]
    Database Error in model log_scd (models/generated/airbyte_incremental/scd/loading/log_scd.sql)
      Invalid input
      DETAIL:  
        -----------------------------------------------
        error:  Invalid input
        code:      8001
        context:   CONCAT() result too long for type varchar(65535)
        query:     417552
        location:  string_ops.cpp:108
        process:   query0_113_417552 [pid=690]
        -----------------------------------------------
      compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_incremental/scd/loading/log_scd.sql
    Any ideas would be greatly appreciated! Here is the root issue in GitHub: https://github.com/airbytehq/airbyte/issues/14441
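    For locating what is overflowing before the SCD model concatenates it, a hedged sketch against the staged raw table (schema and stream names are placeholders inferred from the model name in the log):
    Copy code
    -- rows whose raw JSON payload is already close to the 65535-byte varchar ceiling
    SELECT _airbyte_ab_id, LEN(_airbyte_data) AS payload_len
    FROM loading._airbyte_raw_log
    ORDER BY payload_len DESC
    LIMIT 10;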
    ✍️ 1
    s
    e
    • 3
    • 8
  • e

    Eduardo Aviles

    10/13/2022, 6:52 PM
    Hi, I'm having trouble with the Monday connector:
    Copy code
    2022-10-12 09:22:30 [44msource[0m > Encountered an exception while reading stream items
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 971, in json
        return complexjson.loads(self.text, **kwargs)
      File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads
        return _default_decoder.decode(s)
      File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 115, in read
        yield from self._read_stream(
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 165, in _read_stream
        for record in record_iterator:
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 254, in _read_full_refresh
        for record in records:
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 417, in read_records
        response = self._send_request(request, request_kwargs)
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 336, in _send_request
        return backoff_handler(user_backoff_handler)(request, request_kwargs)
      File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
        ret = target(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/backoff/_sync.py", line 105, in retry
        ret = target(*args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py", line 291, in _send
        if self.should_retry(response):
      File "/airbyte/integration_code/source_monday/source.py", line 55, in should_retry
        is_complex_query = response.json().get("errors")
      File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 975, in json
        raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
    requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    2022-10-12 09:22:30 [44msource[0m > Finished syncing items
    1650afd1_5854_4435_87ce_4c1b3063d3db_logs_37_txt.txt
    ✍️ 1
    s
    • 2
    • 4
  • s

    Simon Thelin

    10/13/2022, 8:43 PM
    Hello. I am currently using incremental updates with the postgres source to s3 destination. I have noticed that even though I am using an
    update
    column on a
    pg
    table, it seems to read through all data of the table, but only writes the
    delta
    changes. On some table syncs however, I can see it only reads very small chunks. Is there a particular reason for this behaviour?
    ✍️ 1
    s
    y
    +2
    • 5
    • 9
  • l

    Luca Moity

    10/13/2022, 1:42 PM
    When I try to connect my Zendesk instance as a source via API token, I'm getting the following error: "Organization access is not enabled. Please check admin permission of the current account". My credentials are fine...
    ✍️ 1
    • 1
    • 4
  • t

    Tammy Shipps

    10/13/2022, 10:04 PM
    Hello all! I was SO CLOSE to not needing to ask for help…so very close. I actually did get the synced data over but I have a major issue with it not ending up in the same schema - I’m going into Postgres, so it’s using the schema setup shown here -> https://docs.airbyte.com/integrations/destinations/postgres#schema-map, which of course, doesn’t work on the other end if you need the original schema. I’m really hoping there’s an easy fix :)
    ✍️ 1
    m
    • 2
    • 10
  • e

    Edgar Valdez

    10/14/2022, 2:41 AM
    Hi there! Is there a way that when extracting data from PG to S3 the target path doesn’t append a
    public
    folder? i.e., target bucket:
    abc
    and after running the job, tables are stored at:
    abc/public/table_1
    I’d like:
    abc/table_1
    Cheers! PS: Target file is a parquet file
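    If it helps, the S3 destination has an "S3 Path Format" setting whose macros can drop the namespace segment; a sketch (macro names as documented for the S3 destination, so double-check them against your connector version):
    Copy code
    # default-style pattern: the namespace (the "public" schema) comes first
    ${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_
    # pattern without the namespace segment
    ${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_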
    ✍️ 1
    m
    • 2
    • 3
  • w

    Wayne

    10/14/2022, 3:24 AM
    Hi all, I am using the Docker version of Airbyte, and I have synced a few databases that are huge. Now I want to move Airbyte to a different machine, so I would like to clone everything there without the need to re-sync each database. How can I do it? I have copied the airbyte folder and also all the Docker images over, but I still see a brand-new Airbyte when I launch it. What am I missing?
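    In the Docker deployment the configured sources, connections, and sync state live in Docker named volumes (the Postgres volume in particular), not in the cloned repo folder, which is why copying the folder gives a fresh instance. A rough sketch of moving them (volume names are the usual docker-compose defaults; confirm with docker volume ls):
    Copy code
    # old machine, with Airbyte stopped
    docker run --rm -v airbyte_db:/vol -v "$PWD":/backup alpine tar czf /backup/airbyte_db.tgz -C /vol .
    docker run --rm -v airbyte_workspace:/vol -v "$PWD":/backup alpine tar czf /backup/airbyte_workspace.tgz -C /vol .
    # new machine: bring Airbyte up once to create the volumes, stop it, then restore
    docker run --rm -v airbyte_db:/vol -v "$PWD":/backup alpine tar xzf /backup/airbyte_db.tgz -C /vol
    docker run --rm -v airbyte_workspace:/vol -v "$PWD":/backup alpine tar xzf /backup/airbyte_workspace.tgz -C /vol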
    ✍️ 1
    m
    s
    • 3
    • 7
  • m

    Manuveeran Sandhu

    10/14/2022, 6:12 AM
    Hey, I am a Data Engineer working at a startup, UNA Brands. We have been using Airbyte for the last 8 months on EC2, and now we are doing load testing on AWS EKS. This is my first post on the Airbyte Slack. One of the issues I am facing while testing out loads on Kubernetes (AWS EKS) is that when running 25-30+ connections in parallel, the loads do not get a successful exit status after finishing (though I do see the Kubernetes pods completing for both source and destination), while the webapp keeps saying the loads are still running. You can see in the screenshot and logs below that the connection ran for about 4 hours, but the UI can't fetch the exit code sooner and says it took 9 hours to complete. start_time: 2022-10-13 19:29:34 UTC, end_time: 2022-10-13 23:41:07 UTC. Please let me know if you want to know more about the configuration settings we are using, happy to share.
    logs-7.txt
    ✍️ 1
    • 1
    • 2
  • b

    Bogdan

    10/13/2022, 1:45 PM
    Hi everyone! I am trying to set up a new Google Ads source with my newly created credentials, but I get this error. What could be the problem?
    ✍️ 1
    m
    • 2
    • 9
  • e

    Emilja Dankevičiūtė

    10/14/2022, 9:43 AM
    Hello, I'm testing out Airbyte with an orchestrator tool (Airflow, Kestra, etc.) and I can see that there is a username and a password for the Airbyte webapp (e.g. https://airflow.apache.org/docs/apache-airflow-providers-airbyte/1.0.0/connections.html). In our Airbyte web UI I can't find anything regarding user management or auth, and it looks like auth is provided by a proxy. We're running open-source Airbyte on Kubernetes. What are those values in Airflow's Airbyte connection spec? Are they for Airbyte Cloud?
    ✍️ 1
    h
    • 2
    • 7
  • l

    laila ribke

    10/14/2022, 11:52 AM
    Hi, I need to change the destination for all my connections from PostgreSQL to Redshift. Is it possible to do this without losing the data I already have and without setting up new connections with the new destination?
    ✍️ 1
    s
    • 2
    • 4
  • t

    Travis James

    10/14/2022, 12:43 PM
    How can I get Airbyte to also create primary keys, foreign keys, and indexes when moving a SQL Server database to a Postgres database (or any other cross-database conversion)? I have a few projects where I was attempting to use Airbyte, and the destination database is unusable without having all the relationships and keys copied along with the data. Also, how can I handle differences in types, like between SQL Server relative timestamps and Postgres timestamps, GUID and UUID, etc.?
    ✅ 1
    ✍️ 1
    h
    • 2
    • 2
  • k

    Kevin Millan

    10/10/2022, 9:31 PM
    Hello everyone, I'm trying to create an
    Incremental | Deduped + history
    table by migrating from Redshift to Postgres, but I encountered a couple of errors along the way. I'm using a
    datetime
    column as the
    cursor field
    : 1. I was first getting an error from Airbyte saying that the table
    *_stg
    needed a
    REPLICA IDENTITY
    . I then set it manually with
    ALTER TABLE analytics.revenue REPLICA IDENTITY FULL;
    2. Now I'm getting the same error (log file attached), but for the
    *_scd
    table. But the issue seems to be that it is having problems creating the
    _scd
    table in the first place.
    logs-15681.txt
    🥲 1
    ✍️ 1
    h
    • 2
    • 11
  • r

    Robert Put

    10/14/2022, 6:39 PM
    When using the CLI, how would you refresh the schema on a connection? Is the option to update it in the UI and then import?
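    If going through the API is acceptable, a hedged sketch of forcing a fresh schema discovery for a source (endpoint and body per the OSS config API; the host and IDs are placeholders). The refreshed catalog then still has to be applied to the connection, via the UI or a connection update call:
    Copy code
    curl -X POST http://localhost:8000/api/v1/sources/discover_schema \
      -H "Content-Type: application/json" \
      -d '{"sourceId": "<source-id>", "disable_cache": true}'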
    ✍️ 1
    m
    • 2
    • 6
  • l

    Lucas Gonthier

    10/14/2022, 6:50 PM
    Hello, my team would like to use Airbyte in production to fetch our client data from Shopify. However, from what I can see in the Airbyte documentation, the connector is in alpha. I would like to know if it is safe to use it in prod? We would like our customers to download our Shopify app and then run an ELT process with Airbyte after getting an access token.
    m
    m
    • 3
    • 3