# ask-community-for-troubleshooting
  • h

    Human

    12/23/2022, 12:45 AM
    Issue: Check_connection (and Sync) fails for File source
    Copy code
    requests.exceptions.SSLError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /covid19-open-data/v2/latest/epidemiology.csv (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)')))
    Cause: self-signed certificate in the chain for the File HTTPS source. Ask: how do I add the CA cert on the worker that runs the source?
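    One possible approach, assuming the File source ultimately fetches the URL with Python requests: point the worker/connector container at your CA bundle (for example via the REQUESTS_CA_BUNDLE environment variable) and confirm the bundle actually fixes the handshake. A minimal sketch with a hypothetical bundle path:
    Copy code
    # Minimal sketch: confirm the custom CA bundle fixes the TLS failure.
    # The bundle path is hypothetical; mount your own CA chain there.
    import requests

    CA_BUNDLE = "/usr/local/share/ca-certificates/corp-ca.pem"  # hypothetical

    resp = requests.get(
        "https://storage.googleapis.com/covid19-open-data/v2/latest/epidemiology.csv",
        verify=CA_BUNDLE,  # same effect as exporting REQUESTS_CA_BUNDLE=<path>
        timeout=30,
    )
    print(resp.status_code)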
  • h

    Hai Huynh

    12/23/2022, 12:59 AM
    Hi everyone, I am a Postgres newbie. Can anyone help me, please? I have a question: I have one connection between a Postgres source and a Postgres destination, and my sync mode is incremental dedup. Can I wipe the historical data in the tables with the _stg and _scd prefixes? The records in those two tables grow with every job, but I don't need the historical data in them.
  • m

    Michael

    12/23/2022, 5:39 AM
    Hi Team, I've submitted a PR to resolve a bug in the Source Okta stream [https://github.com/airbytehq/airbyte/pull/20833]. Can someone help me with what to do next? (First time submitting a PR.)
  • ö

    Özgür Sallancı

    12/23/2022, 6:13 AM
    Hi guys. Thanks for the great app. I followed the tutorial to create a custom connection, but I can't get it working. I joined the office hours but still couldn't make it work.
  • v

    Vu Le Hoang

    12/23/2022, 10:05 AM
    Hi all, my Airbyte node's disk is getting full, so I want to clean the airbyte_workspace volume. Is it safe to clear all its content? If not, what kind of files should I delete? Thank you!
  • n

    Nahid Oulmi

    12/23/2022, 10:28 AM
    • Is this your first time deploying Airbyte?: No
    • OS Version / Instance: Debian
    • Memory / Disk: 32GB memory / 50GB disk
    • Deployment: docker-compose
    • Airbyte Version: 0.40.26
    • Source name/version: elastic-search custom connector
    • Destination name/version: bigquery
    • Step: schema discovery in webapp
    • Description: I developed a custom source connector for Elasticsearch using the Python CDK for version 0.39.42 because the standard one did not fit my use case (I needed incremental updates). I created a static schema file located at catalog/configured_catalog.json that looks like this:
    Copy code
    {
      "streams": [
        {
          "stream": {
            "name": "my_stream",
            "json_schema": {},
            "supported_sync_modes": [
              "full_refresh",
              "incremental"
            ],
            "source_defined_cursor": "True",
            "default_cursor_field": [
              "date"
            ]
          },
          "sync_mode": "incremental",
          "destination_sync_mode": "overwrite"
        }
      ]
    }
    I deployed it on Airbyte version 0.39.42 and it was working fine. Now we have updated our Airbyte version to 0.40.26 and the connector no longer works. The problem is at the schema discovery step. When I go to “Replication” to see my schema:

    (screenshot of the Replication view)

    I get this error message:

    (screenshot of the error message)

    There is no error log on the server side. The only error log I get is on the browser side, which says (as in the screenshot):
    Copy code
    TypeError: Cannot convert undefined or null to object
        at Function.keys (<anonymous>)
        at or (CatalogSection.tsx:101:41)
        at sa (react-dom.production.min.js:157:137)
        at qa (react-dom.production.min.js:180:154)
        at Ba (react-dom.production.min.js:178:169)
        at ja (react-dom.production.min.js:177:178)
        at Gs (react-dom.production.min.js:274:126)
        at Au (react-dom.production.min.js:250:347)
        at Ou (react-dom.production.min.js:250:278)
        at Cu (react-dom.production.min.js:250:138)
    ls @ react-dom.production.min.js:216
    n.payload @ react-dom.production.min.js:217
    ho @ react-dom.production.min.js:130
    Wa @ react-dom.production.min.js:184
    Gs @ react-dom.production.min.js:269
    Au @ react-dom.production.min.js:250
    Ou @ react-dom.production.min.js:250
    Cu @ react-dom.production.min.js:250
    _u @ react-dom.production.min.js:243
    (anonymous) @ react-dom.production.min.js:123
    t.unstable_runWithPriority @ scheduler.production.min.js:18
    Vi @ react-dom.production.min.js:122
    Ki @ react-dom.production.min.js:123
    M @ scheduler.production.min.js:16
    b.port1.onmessage @ scheduler.production.min.js:12
    The local tests (check, spec, discover, read) work fine. Is there anything I need to modify/update in the connector Python code to get it to work with version 0.40.26? I am sure it is a version issue since it works fine on 0.39.42. Thanks,
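    A hedged guess at the cause: the webapp error ("Cannot convert undefined or null to object" in CatalogSection) looks like it trips over the empty "json_schema": {} in the configured catalog, which the 0.39.x UI tolerated. One sketch of returning a concrete schema from a Python CDK stream instead (stream and field names are hypothetical, not taken from the actual connector):
    Copy code
    # Hedged sketch: give the stream a non-empty JSON schema so discover
    # returns concrete properties. Names below are placeholders.
    from airbyte_cdk.sources.streams import Stream


    class MyStream(Stream):
        primary_key = None
        cursor_field = "date"

        def get_json_schema(self) -> dict:
            return {
                "$schema": "http://json-schema.org/draft-07/schema#",
                "type": "object",
                "properties": {
                    "date": {"type": ["string", "null"], "format": "date"},
                    "value": {"type": ["number", "null"]},
                },
            }

        def read_records(self, sync_mode, cursor_field=None, stream_slice=None, stream_state=None):
            yield from []  # placeholder; the real stream reads from Elasticsearch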
  • b

    Bruno Agresta González

    12/23/2022, 1:34 PM
    Hello everyone, I have a connection from Postgres (AWS) to BigQuery. In this connection I am experiencing problems with some tables that have “Incremental Deduped + History” as the sync configuration, with cursor field “Updated_at” and primary key “id”. The problem is that in the destination table the rows that are updates appear duplicated. This is not the behavior I expect for the deduped configuration. I’m using Airbyte 0.40.24, BigQuery connector 1.2.9, and Postgres connector 0.3.26. Is anyone experiencing the same?
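    A quick way to narrow this down (a sketch, with hypothetical project/dataset/table names): check whether the duplicates are in the final deduped table or only in the _scd table, since the _scd history table is expected to keep one row per version of a record.
    Copy code
    # Sketch: list primary keys that appear more than once in the final table.
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT id, COUNT(*) AS copies
        FROM `my_project.my_dataset.my_table`
        GROUP BY id
        HAVING COUNT(*) > 1
        ORDER BY copies DESC
        LIMIT 20
    """
    for row in client.query(query).result():
        print(row.id, row.copies)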
  • n

    Nivedita Baliga

    12/23/2022, 4:36 PM
    Hello everyone. I am an Airbyte newbie (started using the open-source version just 3 days ago!). As a test, I was trying to get data from a 145M-row table in BigQuery into Snowflake and I got a "responseTooLarge" error. From my searching on the internet, this is an open issue that can only be resolved by creating a view on the source DB to chunk the data into the destination. Is that true?
  • n

    Nivedita Baliga

    12/23/2022, 4:44 PM
    Another question - can't I pick and choose which columns from the source I want in the destination?
  • d

    Dany Chepenko

    12/23/2022, 5:43 PM
    Any hints on passing the verification process for Facebook Ads? I feel quite confused describing the app functionality, as it's hardly an app from my point of view. This is the feedback for business_management:
    Copy code
    We were unable to approve your request for this permission because the explanation of your app's use case was unclear.
    To resolve this issue, please provide a valid use case with a revised screencast or notes that explain the following items:
    1. Which app function requires the requested permission.
    2. How the requested permission will enhance your app's functionality and integration.
    3. How the requested permission will enhance the end user's experience.
    You should also make sure that the screencast submitted is the correct video for the app before you re-submit for review.
    For more information, you can also view our App Review introduction video and App Review Rejection Guide.
    and this is the feedback for ads_read:
    Copy code
    We found that your app's test credentials did not allow us to fully review the content of the app or there were no test credentials provided for us to use during our review.
    To resolve this issue:
    - If your test credentials do allow access, check that the account is set up properly to provide us with full access and to allow us to reproduce your use case steps.
    - Otherwise, please consider including any applicable test credentials and passwords for our team to use. If a non-facebook user account is required to log into your app, please include those credentials when you re-submit.
    For more information, please visit our App Review Rejection Guides.
    Notes from your reviewer:
    Unfortunately we have not been able to verify the permissions due to unclear use cases and being unable to link ad account.
    
    For the ads_read permission, please for the next submission, include that the app is internal in the use case, and show ads metrics in the screencast.
    
    The use cases for business_management and leads_retrieval are unclear. Please, for the next submission, include the word 'leads' along with a relevant use case. Please rectify these matters for the next submission.
  • t

    Tamas Foldi

    12/23/2022, 7:06 PM
    When I try to use octavia apply, I get the following error message:
    Copy code
    airbyte_api_client.exceptions.ApiTypeError: Invalid type for variable 'non_breaking_changes_preference'. Required value type is NonBreakingChangesPreference and passed type was str at ['non_breaking_changes_preference']
    The CLI and server versions are the same; I'm trying to apply an export back to another Airbyte server. Any clue what could be wrong?
  • r

    Rishabh Jain

    12/23/2022, 11:16 PM
    I am trying to set up a replication slot using the wal2json plugin. I have created the slot and the publication in Postgres and provided the values to Airbyte, but when I try to test the connection I get an error saying “Expected exactly one replication slot but found 0”. I do have a replication slot created in Postgres, so I'm not sure why Airbyte is unable to find it. The pgoutput plugin works perfectly fine with Airbyte. Screenshot below is for wal2json.
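    For reference, a sketch of creating the slot and publication and then listing what Postgres actually has (connection details and names are hypothetical). One thing worth checking: logical replication slots are tied to the database they were created in, so the slot has to exist in the same database the Airbyte source is configured to use.
    Copy code
    # Sketch: create a wal2json slot plus publication, then list existing slots.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=airbyte host=localhost password=secret")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("SELECT pg_create_logical_replication_slot('airbyte_slot', 'wal2json');")
        cur.execute("CREATE PUBLICATION airbyte_publication FOR ALL TABLES;")
        cur.execute("SELECT slot_name, plugin, database FROM pg_replication_slots;")
        for slot_name, plugin, database in cur.fetchall():
            print(slot_name, plugin, database)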
  • s

    Shay Rubach

    12/25/2022, 7:45 AM
    Hello all and happy Xmas. A question: can I run a query on a returned catalog from a source (say MySQL)? Or directly query the source and get a "queried" (filtered) catalog? [edit] I've run into this post and this post and realized it is not supported. What could be my alternatives? Is there a way to write a generic Transformation that would take a query and return a queried catalog? Thanks.
  • m

    Mickaël Andrieu

    12/26/2022, 4:32 AM
    Hi, I have the "so famous" java.sql.SQLException: YEAR error (using the MySQL connector). I know you won't or can't fix it, but I'm wondering how I can reproduce it with more information: I want to know which line(s) and which column(s) are responsible for this error. Any idea? (My skills in Java are ... close to none.)
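    A possible way to narrow it down without touching Java, assuming the error comes from zero or out-of-range values in YEAR columns: list the YEAR columns in the schema and count suspicious values per table. Connection details and the schema name are hypothetical.
    Copy code
    # Sketch: find YEAR columns and count zero/NULL values in each.
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="root", password="secret", database="mydb")
    cur = conn.cursor()
    cur.execute(
        "SELECT TABLE_NAME, COLUMN_NAME FROM information_schema.COLUMNS "
        "WHERE TABLE_SCHEMA = %s AND DATA_TYPE = 'year'",
        ("mydb",),
    )
    for table, column in cur.fetchall():
        cur.execute(f"SELECT COUNT(*) FROM `{table}` WHERE `{column}` = 0 OR `{column}` IS NULL")
        (count,) = cur.fetchone()
        print(f"{table}.{column}: {count} zero/NULL values")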
  • k

    Kevin Noguera

    12/26/2022, 8:30 AM
    Has anyone here set up the Zendesk Support connector and faced issues with the incremental streams actually doing a full refresh? My current hypothesis is that some streams (the ones working) do increments as expected because their cursor_field is the correct data type (timestamp), while the failing ones do not (integer, string).
  • p

    Pablo Morales

    12/26/2022, 4:07 PM
    Hi everyone! We started a thread about Shopify on GitHub about a month ago: https://github.com/airbytehq/airbyte/issues/19348 The problem is that in the Orders stream, each struct in the line_items array has an attribute called discount_allocations, which is always empty. We are connecting with BigQuery (denormalized). Is anyone having the same problem, or can anyone offer us a solution? Thanks!
  • t

    Temidayo Azeez

    12/26/2022, 4:30 PM
    That's the log file. Thank you!
  • t

    Timam

    12/26/2022, 6:18 PM
    Hi Everyone, I am just getting started with Airbyte. I installed Airbyte on my EKS cluster following https://docs.airbyte.com/deploying-airbyte/on-kubernetes-via-helm/. Where can I find the default values.yaml for the Helm chart?
  • i

    Ignacio Alasia

    12/26/2022, 6:52 PM
    Hi team! We deployed Airbyte (v0.40.17) on Kubernetes using Helm and ran some tests with S3 --> Snowflake and PG --> Snowflake, and they worked. When we try to transfer a big table from PG to SF using CDC (PG connector v1.0.34), the first batch of 324.43 GB and 54,933,278 rows flowed well. But when the connector runs again, the workers fail:
    Copy code
    ERROR i.a.w.g.DefaultReplicationWorker(run):196 - Sync worker failed.
    java.util.concurrent.ExecutionException:io.airbyte.workers.general.DefaultReplicationWorker$DestinationException: Destination process message delivery failed.
    And this other log:
    Copy code
    2022-12-24 01:05:55 ERROR i.a.w.g.DefaultReplicationWorker(run):196 - Sync worker failed.
    java.util.concurrent.ExecutionException: io.airbyte.workers.general.DefaultReplicationWorker$SourceException: Source cannot be stopped
    We are running this on an m6i.xlarge. So, first, does anyone have an idea of what's wrong? Second, I would like to know how Airbyte works behind the scenes when doing CDC: how it uses the workers and how it compares the data already in the destination with the new data. Best, Ignacio.
  • i

    Igor Safonov

    12/26/2022, 7:22 PM
    Hi, I am new to Airbyte and created a toy example of copying data between google ads and databricks lakehouse (deployed on minikube with Helm, v0.40.25). Unfortunately, it causes an error while executing an SQL statement because the column types are absent (I edited it a bit for readability):
    Copy code
    CREATE TABLE <table> (_airbyte_ab_id string, _airbyte_emitted_at string, `campaign.id` , `metrics.clicks` , `segments.date`) USING csv LOCATION '<location>' options ("header" = "true", "multiLine" = "true")
    Is there something wrong with my configuration? Could you please advise what I could look into? UPD: I've done some research. Here is the JSON schema from my logs:
    Copy code
    Json schema for stream usr_igsaf.google_ads_test: {"type":"object","$schema":"http://json-schema.org/draft-07/schema#","properties":{"campaign.id":{"type":["integer","null"]},"campaign.name":{"type":["string","null"]},"segments.date":{"type":["string","null"],"format":"date"},"metrics.clicks":{"type":["integer","null"]},"metrics.conversions":{"type":["number","null"]},"metrics.cost_micros":{"type":["integer","null"]},"metrics.impressions":{"type":["integer","null"]},"user_location_view.country_criterion_id":{"type":["integer","null"]}},"additionalProperties":true}
    Looks like the code in the Databricks integration is not ready to see arrays in the type field:
    Copy code
    final String type = node.get("type").asText();
          schemaString.append(", `").append(header).append("` ").append(type.equals("number") ? "double" : type);
    If I contribute a fix, would it take long until it gets released?
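    For illustration only (Python, not the connector's Java): the handling that seems to be missing is picking the non-null member when the JSON-schema type is an array such as ["integer", "null"] before mapping it to a SQL type.
    Copy code
    # Sketch of the needed logic: accept either a string or a list for "type".
    def sql_type(json_schema_type) -> str:
        if isinstance(json_schema_type, list):
            # drop "null" and take the first concrete type
            json_schema_type = next(t for t in json_schema_type if t != "null")
        return "double" if json_schema_type == "number" else json_schema_type

    print(sql_type(["integer", "null"]))  # -> integer
    print(sql_type("number"))             # -> double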
  • t

    Timam

    12/26/2022, 7:34 PM
    Hi everyone, hope you are doing great. I am new to Airbyte and just installed Airbyte on EKS with Helm. How do we manage users and authentication on Airbyte?
  • s

    Sujith Kumar.S

    12/27/2022, 6:00 AM
    Any plan in the pipeline for the Kafka connector to reach GA? If so, is there an expected time frame?
  • g

    Georges Stephan

    12/27/2022, 7:20 AM
    Hey everyone, I am trying to connect to a GitLab repo as a data source. Unfortunately, the repo I am connecting to uses HTTP, not HTTPS. I edited the streams.py file and changed the return string of the function def url_base(self) -> str: to return a URL that starts with http instead of https. However, by examining the logs, I see that Airbyte still uses HTTPS to connect, although the URL starts with http. Is there anything else I need to change? I am using Airbyte version 0.40.26 running under Docker Compose. Thank you!
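    Two things worth checking, hedged as guesses: the CDK builds request URLs from the stream's url_base property, and any edit to streams.py only takes effect once the connector's Docker image is rebuilt and the source is pointed at the new image tag; otherwise Airbyte keeps running the previously built image. A minimal sketch of an HTTP-only stream (host and paths are hypothetical, not the real GitLab connector code):
    Copy code
    # Sketch: CDK HttpStream whose url_base uses plain HTTP.
    from airbyte_cdk.sources.streams.http import HttpStream


    class ProjectsStream(HttpStream):
        primary_key = "id"

        @property
        def url_base(self) -> str:
            return "http://gitlab.example.com/api/v4/"  # plain HTTP, hypothetical host

        def path(self, **kwargs) -> str:
            return "projects"

        def parse_response(self, response, **kwargs):
            yield from response.json()

        def next_page_token(self, response):
            return None  # no pagination in this sketch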
  • a

    Akilesh V

    12/27/2022, 7:41 AM
    Hi All, I am having an issue upgrading Airbyte from v0.40.3 to v0.40.23. After upgrading, sync is not working and the workspace doesn't show all the connectors belonging to the workspace.
  • n

    Nils de Bruin

    12/27/2022, 8:16 AM
    Hey everyone! I have a Postgres source with incremental syncing (no CDC), which failed after updating the source connector to a version greater than 1.0.30. I am seeing this message in the log:
    Copy code
    Stack Trace: org.postgresql.util.PSQLException: ERROR: syntax error at or near "FROM"
      Position: 30
    and
    Copy code
    "failureOrigin" : "source",
      "failureType" : "system_error",
      "internalMessage" : "org.postgresql.util.PSQLException: ERROR: syntax error at or near \"FROM\"\n  Position: 30",
      "externalMessage" : "Something went wrong in the connector. See the logs for more details.",
      "metadata" : {
        "attemptNumber" : 2,
        "jobId" : 392,
        "from_trace_message" : true,
        "connector_command" : "read"
      },
    I can revert to version 1.0.30 and then the error disappears. Does anyone have the same issue or know what this could be? Thanks!
  • l

    laila ribke

    12/27/2022, 8:27 AM
    Hi all, I'm still working with the Nordigen API. This is an example of the data I will receive as a response from the transactions endpoint:
    Copy code
    "transactions": {
        "booked": [
          {
            "transactionId": "string",
            "debtorName": "string",
            "debtorAccount": {
              "iban": "string"
            },
            "transactionAmount": {
              "currency": "string",
              "amount": "328.18"
            },
            "bankTransactionCode": "string",
            "bookingDate": "date",
            "valueDate": "date",
            "remittanceInformationUnstructured": "string"
          },
          {
            "transactionId": "string",
            "transactionAmount": {
              "currency": "string",
              "amount": "947.26"
            },
            "bankTransactionCode": "string",
            "bookingDate": "date",
            "valueDate": "date",
            "remittanceInformationUnstructured": "string"
          }
        ],
        "pending": [
          {
            "transactionAmount": {
              "currency": "string",
              "amount": "float"
            },
            "valueDate": "date",
            "remittanceInformationUnstructured": "string"
          }
        ]
      }
    }
    There are two objects, "booked" and "pending", each of which contains an array of objects where each object is a transaction. I think I'll start only with the booked ones, but I'm curious what the schema should look like.
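    A hedged sketch of what the schema for the booked transactions could look like, written as the dict a Python CDK stream's get_json_schema() would return (or, equivalently, the contents of a schemas/<stream_name>.json file). Field names follow the sample payload; everything else is an assumption:
    Copy code
    # Sketch: JSON schema for a "booked_transactions" stream (hypothetical name).
    booked_transactions_schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "transactionId": {"type": ["string", "null"]},
            "debtorName": {"type": ["string", "null"]},
            "debtorAccount": {
                "type": ["object", "null"],
                "properties": {"iban": {"type": ["string", "null"]}},
            },
            "transactionAmount": {
                "type": ["object", "null"],
                "properties": {
                    "currency": {"type": ["string", "null"]},
                    "amount": {"type": ["string", "null"]},
                },
            },
            "bankTransactionCode": {"type": ["string", "null"]},
            "bookingDate": {"type": ["string", "null"], "format": "date"},
            "valueDate": {"type": ["string", "null"], "format": "date"},
            "remittanceInformationUnstructured": {"type": ["string", "null"]},
        },
    }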
  • d

    Dimitriy Ni

    12/27/2022, 12:15 PM
    Hi Everyone, hope you are having a great Christmas time 🙂 I have a question regarding the Facebook Marketing connector in Airbyte Cloud. The consumption seems a bit too high. I set up an incremental load, yet it extracts 14k rows with 37 MB every time. It seems like it's looking back much further than just the recent days. Does anyone have experience with this and know how I could change it? Thanks in advance!
  • n

    Nandhakumar M

    12/27/2022, 1:40 PM
    Hi team,
  • n

    Nandhakumar M

    12/27/2022, 1:42 PM
    Hi Team, I am looking to create a pipeline from MySQL to S3/Blob (with Hive catalog). Does Airbyte support dedup with incremental sync, or is CDC the way to go for such use cases? Thanks in advance!
  • n

    Noah Selman

    12/27/2022, 2:18 PM
    Reposting because I think this may have gotten buried over the holiday… Hello! A couple of weeks ago we had a very strange occurrence with our sync from MySQL 5.6 to BigQuery. After a particular “full refresh - overwrite” connection failed to sync 3 consecutive times, subsequent syncs of the same connection began taking double the time to complete. Looking at the logs, the extra time was due to an unexplained gap during normalization: after all the SQL scripts were written, there would be an hour of nothing before any of them started executing. However, dbt would eventually complete successfully. Even more strangely, a later sync of the same connection failed 3 consecutive times - afterward, subsequent syncs once again took the original amount of time. The sync in question is relatively large (5.73 GB) and copies >200 tables. I’ve included a copy of the logs for one run during the period when there was an hour gap in normalization. We’re on Airbyte version 0.40.17 running on GKE. We’re happy that this problem fixed itself, but we would like help identifying what happened here and how we can prevent it from happening again. Thanks!! https://files.slack.com/files-pri/T01AB4DDR2N-F04FX3KT9E3/download/0df738fa_6de8_432d_9aa9_a77e48a47a5f_logs_111_txt.txt?origin_team=T01AB4DDR2N
    0df738fa_6de8_432d_9aa9_a77e48a47a5f_logs_111_txt.txt