# ask-community-for-troubleshooting
l
Hi All, in my MySQL -> Redshift (S3 staging) connection I see that the source emitted 10 million rows of the clickout table, but the destination only has 400K. I'm using incremental refresh. This source was used in the past with other destinations. Could that be the reason I only receive 400K rows? For me it's a new connection, so it should do a full refresh first and only increment on the next runs, but it seems to think it only has to increment.
u
Hi laila, could you check that all records are present in the raw table? If they are, this could be a normalization issue, and you'd need to see what these records have in common - a pattern in the schema.
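A quick way to run that check is to compare row counts directly in Redshift. This is only a minimal sketch, assuming psycopg2 for the connection and the default `_airbyte_raw_<stream>` naming; the host, schema, and table names below are placeholders, not taken from the thread:

```python
# Sketch: compare Airbyte's raw table against the normalized table in Redshift.
# Connection details, schema, and table names are illustrative placeholders.
import psycopg2

conn = psycopg2.connect(
    host="your-cluster.redshift.amazonaws.com",  # placeholder host
    port=5439,
    dbname="analytics",
    user="airbyte_user",
    password="...",
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM airbyte_schema._airbyte_raw_clickout;")
    raw_count = cur.fetchone()[0]

    cur.execute("SELECT COUNT(*) FROM airbyte_schema.clickout;")
    normalized_count = cur.fetchone()[0]

print(f"raw: {raw_count:,}  normalized: {normalized_count:,}")
# If raw_count is already short of what the source emitted, the loss happens
# before normalization (e.g. in the S3 staging step); if only normalized_count
# is short, it points at normalization.
```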
l
Hi, I even created a new connection. The source emits 10M rows, but I only receive 3M rows in the destination. It's a MySQL -> S3 -> Redshift connection. I even did a full refresh (Overwrite). Attaching the logs. What can it be?
n
If you look in the logs you'll see that all records emitted were committed, which means the issue is not in the sync run itself. Could you try setting up a new MySQL source and syncing a small subset of the data to see if the error persists?
l
In the same sync, I synced 5 tables. For 4 of them I received all records; only this one is missing records. Might it be because of its size (10 million records)?
Hi, I don't understand. The number of records emitted and committed is correct, but I don't see them in my destination. The source emitted the 10M rows. May it be something with the S3 strategy?
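One way to inspect the S3 staging step directly is to list what actually landed under the staging prefix for that stream. A minimal sketch with boto3, where the bucket name and prefix are made-up placeholders:

```python
# Sketch: list the staged files for a stream and see how many objects (batches)
# actually made it to S3. Bucket and prefix are placeholders.
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

n_objects = 0
total_bytes = 0
for page in paginator.paginate(Bucket="my-airbyte-staging", Prefix="clickout/"):
    for obj in page.get("Contents", []):
        n_objects += 1
        total_bytes += obj["Size"]
        print(obj["Key"], obj["Size"])

print(f"{n_objects} object(s), {total_bytes:,} bytes staged")
# A 10M-row table would normally be split across several staged files;
# a single small object here is a hint that batches are replacing each other.
```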
n
Hello! Were you able to do a test sync with a subset of the data?
l
I'm not able to do that, but I checked that the source has no filters. I'm attaching the S3 file and the sync logs. The row counts of the raw table and the normalized one are equal.
Hi, just to let you know we solved the problem. In the destination, we had set the S3 Filename Pattern to {sync_id}. Since it's a large table, Airbyte sends the data to S3 in batches, and the batches were overwriting each other. We changed the filename pattern to {sync_id}_{timestamp:millis}, and S3 now stores several files for that table.
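The mechanism behind this: S3 objects can't be appended to, so if every batch is uploaded under the same key, each upload replaces the previous one and only the last batch survives to be loaded. A minimal sketch illustrating the difference; the bucket, keys, and batch contents are invented for the example and not Airbyte's actual implementation:

```python
# Sketch: why a fixed filename pattern keeps only the last batch.
# Bucket/key names and batch data are illustrative only.
import time
import boto3

s3 = boto3.client("s3")
bucket = "my-airbyte-staging"
batches = [b"rows 1..10000\n", b"rows 10001..20000\n", b"rows 20001..30000\n"]

# Pattern like {sync_id}: every batch goes to the same key, so each
# put_object overwrites the previous one -> only the last batch remains.
for batch in batches:
    s3.put_object(Bucket=bucket, Key="clickout/sync_42.csv", Body=batch)

# Pattern like {sync_id}_{timestamp:millis}: every batch gets its own key,
# so all batches are kept and all rows get loaded into Redshift.
for batch in batches:
    key = f"clickout/sync_42_{int(time.time() * 1000)}.csv"
    s3.put_object(Bucket=bucket, Key=key, Body=batch)
    time.sleep(0.002)  # ensure distinct millisecond timestamps in this toy example
```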
n
Thank you so much for following up and I'm so glad to hear you've resolved it!