# ask-community-for-troubleshooting
  • d

    Dario Forti

    10/14/2022, 7:10 PM
    Hello, we are trying to develop a connector for an API whose endpoints all work with POST requests. We found that it is compatible with the Airbyte CDK, but some help or guidance would be great. Can anyone with experience give us a hint? Thanks
    m
    • 2
    • 5
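    A minimal sketch of one way to handle a POST-only API with the Python CDK: subclass HttpStream and override http_method and request_body_json. The base URL, path, body fields, and response shape below are placeholders, not the actual API.
    from typing import Any, Iterable, Mapping, Optional

    import requests
    from airbyte_cdk.sources.streams.http import HttpStream


    class SearchResults(HttpStream):
        # Placeholder base URL and primary key for the hypothetical API
        url_base = "https://api.example.com/"
        primary_key = "id"

        @property
        def http_method(self) -> str:
            # The CDK defaults to GET; returning POST makes every request a POST
            return "POST"

        def path(self, **kwargs) -> str:
            return "search"

        def request_body_json(self, **kwargs) -> Optional[Mapping[str, Any]]:
            # JSON body sent with each POST; filters or pagination params would go here
            return {"query": "*", "limit": 100}

        def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
            return None  # single page in this sketch

        def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
            yield from response.json().get("results", [])
    From there the stream can be wired into an AbstractSource like any other CDK stream.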
  • d

    Dusty Shapiro

    10/14/2022, 8:00 PM
    I’m curious if anyone else who deploys Airbyte via Helm chart to K8s is able to use the Octavia CLI alongside it? Edit: Perhaps by adding it to the
    extraContainers
    array in the chart values?
    ✍️ 1
    m
    d
    +3
    • 6
    • 22
  • d

    Dipesh Kumar

    10/14/2022, 5:51 PM
    Hello, I am currently testing out Airbyte, specifically CDC replication on MSSQL. Could someone point me in the right direction on best practices for handling schema changes (e.g. ADD columns, DROP columns, ALTER column types)?
    ✍️ 1
    • 1
    • 4
  • l

    Lee Danilek

    10/14/2022, 10:46 PM
    Hi! I'm trying to build a new source connector, and I have access to the transaction log, so it seems like a CDC connector is possible. Would that be recommended? There don't seem to be a ton of docs on building them. It would be nice to fully support deletes, but if new CDC connectors are difficult then I can fall back to soft deletes.
    ✍️ 1
    e
    s
    • 3
    • 8
  • s

    Sourav Gupta

    10/15/2022, 4:12 AM
    Can we get user-level data (not just aggregated data) from Google Analytics 4?
    ✍️ 1
    • 1
    • 5
  • z

    Zaza Javakhishvili

    10/15/2022, 4:58 AM
    Hi, after a stream error, Airbyte is not removing the tmp table. Is this a destination connector issue or the Airbyte engine itself?
    ✅ 1
    m
    • 2
    • 1
  • z

    Zaza Javakhishvili

    10/15/2022, 5:17 AM
    I have a question. Why does Airbyte raw data have no unique identifier per stream slice?
    m
    e
    • 3
    • 7
  • e

    Emile Fyon

    10/15/2022, 9:02 PM
    Hello guys. I am new to Airbyte and the modern data warehouse. I am setting up an airbyte-gcp-dbt architecture. I managed to set up all three services and they are running properly. I am wondering what the best practices are for structuring the data that is loaded into my _raw tables and making the models as efficient as possible. Basically, what I would like:
    • Generate staging models automatically (or semi-automatically) in dbt from Airbyte connectors (if possible)
    • Have tables with column names rather than raw JSON objects in my GCP schemas
    Here are some screenshots of what I am getting at the moment. Thanks a lot, and looking forward to having my first models in production 😍
    ✍️ 1
    ✅ 1
    m
    s
    • 3
    • 6
  • r

    Robert Put

    10/15/2022, 10:16 PM
    When reading records from a Postgres source, does it read table by table, or can it parallelize the number of tables being read?
    ✍️ 1
    m
    s
    • 3
    • 12
  • w

    Wendell Nascimento

    10/16/2022, 1:24 AM
    Hi, what's the best way to add rds-combined-ca-bundle.pem in Docker? Does anyone have an example of how to do this? I want to connect my data source to AWS DocumentDB, but I don't know if I should build a new Docker image, and if so, which one? Thanks
    ✍️ 1
    m
    • 2
    • 5
  • g

    Geoffrey Garcia

    10/16/2022, 1:37 PM
    Hi guys. I started using Airbyte a few days ago and managed to get my first connections running. But (even though I've looked at the documentation/forum) I don't understand how I could extract/back up the sources, destinations and connections I've defined as flat files. In the end I'm also interested in a way to provision/load such files into an Airbyte instance (already running or during the deployment stage). I actually have to handle several environments (production and non-production) and I'm looking for a convenient way to deploy similar sources/destinations/connections to every single one of them using the CLI: where should I start? Sorry if it's a newbie question, and thank you for your answers.
    ✍️ 1
    e
    m
    • 3
    • 6
  • r

    Rahul Borse

    10/16/2022, 4:52 PM
    Hi Team, is there any way to not create the _airbyte_emitted_at column in destination data?
    ✍️ 1
    h
    • 2
    • 2
  • m

    Mohamed Alsharif

    10/16/2022, 6:57 PM
    Hi Team, I'm working with Ownco on helping companies share ownership with their users depending on the users' performance and value to the network; performance will be measured by analysing different data sources. I have a couple of questions regarding using the hosted Airbyte instance and what it means for the data that will pass through Airbyte. Who can I direct my inquiries to?
    ✍️ 1
    s
    • 2
    • 4
  • g

    Gene Chu

    10/17/2022, 2:43 AM
    Hi Team, I am new to Airbyte. I would like to know if there is any Grafana dashboard I could apply after I enable airbyte-metrics in my Kubernetes cluster. I need to check the latest overall job status and send an alert when a job fails.
    ✍️ 1
    s
    m
    • 3
    • 5
  • u

    胡得潮

    10/17/2022, 2:59 AM
    https://github.com/airbytehq/airbyte/pull/17884 How can I add the Docker container setup to the automated tests? I know Docker, but I'm not very familiar with this process or how to automate it.
    ✍️ 1
    h
    • 2
    • 2
  • g

    Gergely Imreh

    10/17/2022, 7:28 AM
    Hey! I have a question about fields that are synced as Incremental | Deduped + History and are normalized on some of the fields:
    • the top-level table, say table, is indeed deduped, and that's nice
    • there is a table_scd for the history, and that works fine as well; every entry has a unique _airbyte_ab_id as expected
    • the normalized table (say, normalizing a properties field) ends up as table_properties as expected for this example, but the content differs from what I expected in some respects.
    The differences and questions are:
    • the contents of table_properties are normalized from table_scd (as confirmed by looking at the generated dbt code) rather than from table as I expected, so I end up with many more entries than expected. I would have expected table_properties to be deduped after normalization just as table was before. Am I expecting something wrong?
    • in table_properties there are actually even more entries than in table_scd, because a single entry can (but does not always) end up normalized multiple times, with only the _airbyte_normalized_at field being different, while all the content fields are the same (as are the hashid fields, to confirm). Looking at the duplicates, I see some entries normalized multiple times, up to 8x even (most of the duplicates are 2-3x) on our 1/h schedule. What can cause multiple normalization like this, if the underlying table has only a single entry?
    ✍️ 1
    a
    • 2
    • 4
  • r

    Ramon Vermeulen

    10/17/2022, 8:18 AM
    Anyone else having problems with the MySQL and MSSQL connectors after upgrading? I followed the migration guide (schema changes) and upgraded both to the latest version. But since the upgrade the connectors have slowed down a lot. Currently the connector has already been running for more than 2 days; before the upgrade they took roughly 8 hours to ingest all the data.
    ✍️ 1
    h
    n
    • 3
    • 24
  • s

    Steven Herweijer

    10/17/2022, 8:45 AM
    Hi all. I’m running a postgres-to-postgres connection. I have a lot of tables, but one in particular is rather large (12mil+ records). For this reason I have set it up for incremental sync using a cursor field. What I’ve noticed however, is that the second sync was much slower than the first. This time is spent in normalization. It seems that although reading from source is only the new records, the normalization is done for the whole 12mil anyways, which causes the target instance that runs in AWS to run out of IOPS credits and slow to a crawl. Is this normal behavior? There is also no JSON to unpack or anything in this table.
    2022-10-15 00:08:27 normalization > 128 of 132 OK created incremental model <table name removed> ....................................... [INSERT 0 12446501 in 17788.85s]
    ✍️ 1
    a
    • 2
    • 14
  • u

    胡得潮

    10/17/2022, 8:57 AM
    For a new destination, can the columns written to the table only be the fixed _airbyte_ab_id, _airbyte_emitted_at, and _airbyte_data, and why?
    ✍️ 1
    a
    • 2
    • 3
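    For context, a minimal sketch of how a Python destination's write loop typically maps each RECORD message into exactly those three raw columns; the helper names here are hypothetical, and a real destination would flush the rows to its own table.
    import json
    import uuid
    from typing import Any, Iterable, List, Mapping

    from airbyte_cdk.models import AirbyteMessage, Type


    def to_raw_row(message: AirbyteMessage) -> Mapping[str, Any]:
        # Standard three-column raw layout used by Airbyte destinations
        record = message.record
        return {
            "_airbyte_ab_id": str(uuid.uuid4()),        # synthetic id generated per record
            "_airbyte_emitted_at": record.emitted_at,   # epoch millis set by the source
            "_airbyte_data": json.dumps(record.data),   # the whole record payload as JSON
        }


    def collect_raw_rows(input_messages: Iterable[AirbyteMessage]) -> List[Mapping[str, Any]]:
        # In a real destination these rows would be buffered and written to the raw table
        return [to_raw_row(m) for m in input_messages if m.type == Type.RECORD]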
  • a

    Andrew Exlet

    10/17/2022, 11:14 AM
    Hi. I just started playing with Airbyte and there are a couple of things I can't seem to find documentation on; hoping someone can help: 1. When creating a connection to a SQL source, can I limit it in any way (e.g. the SQL source has history going back 10 years but I only need the last 2 years; how do I stop it bringing in all 10 years)? 2. After creating a connector I can't see anywhere to edit it. If I'm using something like a MySQL connector with an SSH tunnel and I need to change the certificate details, how would I do this?
    ✍️ 1
    s
    • 2
    • 5
  • r

    Robert Put

    10/17/2022, 1:44 PM
    If I have a connection to a Postgres DB with a max 15-minute timeout that writes to Snowflake, and it can't finish the full sync in that amount of time, is there a way for the sync to resume from where it stopped? At least start from any tables that did not finish, rather than restarting all of them each time? The goal would be to run it a few times with failures until all the data is synced; then the incremental sync would be much shorter going forward once the initial load is done.
    ✍️ 1
    m
    s
    • 3
    • 103
  • s

    Siddhant Singh

    10/17/2022, 11:18 AM
    Hi. How can I add the docs for the new custom connector I've added? I have added the md file in the docs; other than that, what do I have to do?
    ✍️ 1
    • 1
    • 3
  • e

    Eric Lamphere

    10/17/2022, 2:51 PM
    Hi All 👋, I'm trying to sync data from Shopify (just the orders table for now) to S3 but I'm getting the following 404 error:
    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: <https://templestclair.myshopify.com/admin/api/2021-07/shopify_payments/balance/transactions.json?limit=250&order=id+asc&since_id=0>
    It seems like there's been some discussion around this on GH. Does anyone know if this issue has been fixed? Note: I'm connecting with the OAuth method
    ✍️ 1
    h
    o
    • 3
    • 15
  • o

    Oyindamola Olatunji

    10/17/2022, 4:13 PM
    Hello everyone. Following the DigitalOcean deployment guide, I can't find a command that exposes the Airbyte container on a port that can be accessed in the browser. For example, in the AWS guide we have this:
    # In your workstation terminal
    ssh -i $SSH_KEY -L 8000:localhost:8000 -N -f ec2-user@$INSTANCE_IP
    Just visit <http://localhost:8000> in your browser and start moving some data!
    ✍️ 1
    • 1
    • 5
  • l

    Lucas Gonthier

    10/17/2022, 4:20 PM
    Hi all! I'd like to know when octavia-cli will no longer be in alpha. My team wants to use Airbyte but the UI is not an option for us.
    ✍️ 1
    s
    • 2
    • 2
  • s

    Sid Lais

    10/17/2022, 6:32 PM
    Hi all, I am new to Airbyte and came across it when looking for Hacktoberfest issues. The good first issues are all taken by others, but there are some issues still available, so I was thinking of taking one up, but I don't know if I would even be able to start it correctly. What are the requirements for making a connection? Are the docs and video tutorial enough for a first-timer?
    👋 1
    octavia loves 1
    ✍️ 1
    s
    • 2
    • 2
  • j

    Jing Xu

    10/17/2022, 6:40 PM
    Hi All, running into an issue on Airbyte OSS. I have been syncing 2 tables from MS SQL to Snowflake on full | overwrite. Assume these are customer & event tables in MS SQL to be synced. After syncing: Event => event & _airbyte_raw_event tables are generated on Snowflake; Customer => customer & _airbyte_raw_customer tables are generated, however additional account_ab2 & account_ab3 views are also generated in Snowflake. I have tested other sources such as Postgres; no views are generated on the destination. Does anyone know why views are generated on the destination? Based on the logs, it seems the normalization step generated views instead of CTEs. The logs look like below:
    Generating airbyte_ctes/AIRBYTE/CUSTOMER_AB1.sql from CUSTOMER
    Generating airbyte_views/AIRBYTE/CUSTOMER_AB2.sql from CUSTOMER
    Generating airbyte_views/AIRBYTE/CUSTOMER_AB3.sql from CUSTOMER
    Generating airbyte_tables/AIRBYTE/ACCOUNT.sql from CUSTOMER
    Generating airbyte_ctes/AIRBYTE/EVENT_AB1.sql from Event
    Generating airbyte_ctes/AIRBYTE/EVENT_AB2.sql from Event
    Generating airbyte_ctes/AIRBYTE/EVENT_AB3.sql from Event
    Generating airbyte_tables/AIRBYTE/EVENT.sql from Event
    ✍️ 1
    • 1
    • 2
  • k

    Kevin Chan

    10/17/2022, 8:01 PM
    I'm syncing Harvest to a MySQL database and can't get it to work... Attached is the error log. Is there anyone using these 2 connectors? Does this error make sense to anyone?
    Failure Origin: persistence, Message: Something went wrong during state persistence
    Activity failure. ActivityId=4f668877-8765-38e1-a288-02dac0332e72, activityType=Persist, attempt=1
    java.lang.IllegalStateException: Job ran during migration from Legacy State to Per Stream State. One of the streams that did not have state is: io.airbyte.protocol.models.StreamDescriptor@6ffb8e41[name=clients,namespace=<null>,additionalProperties={}]. Job must be retried in order to properly store state.
    7779489c_41ed_4628_aec3_899e7a5fef06_logs_19_txt.txt
    ✍️ 1
    s
    m
    u
    • 4
    • 5
  • l

    Liyin Qiu

    10/17/2022, 11:15 PM
    Hi team, we have a MySQL to S3 sync and found that a datetime in MySQL is written as an array in S3. This issue is open in . The airbyte_type is timestamp_without_timezone. Could anyone share whether it will be updated to long or string, or remain an array, in the fix? We want to create a Hive schema based on this Parquet file, but we have to create a struct in the Hive schema to make it work. Thanks
    ✍️ 1
    h
    • 2
    • 4
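    One quick way to confirm how that datetime column actually landed in the Parquet output is to inspect the file's schema with pyarrow; the local file name and column name below are placeholders.
    import pyarrow.parquet as pq

    # Assumes the Parquet file written to S3 has been downloaded locally first;
    # "orders.parquet" and "updated_at" are placeholder names.
    schema = pq.read_schema("orders.parquet")
    print(schema)                      # full Arrow/Parquet schema
    print(schema.field("updated_at"))  # logical type of the datetime column in question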
  • r

    Robert Put

    10/18/2022, 4:15 AM
    can you limit a range for an incremental sync? Like say the cursor values can only be within the last year?
    ✍️ 1
    • 1
    • 6