Shubham Pinjwani

03/11/2022, 9:22 AM
Hello, I want to perform join queries on the destination side. I am using full refresh append, so there will be many copies of the same data synced at different intervals. I want to join them based on the interval or sync number at which they were synced. Is there a way to add a sync number, a sync ID, or something similar that would help in this case?
Augustin Lafanechere (Airbyte)

03/11/2022, 1:24 PM
Hi @Shubham Pinjwani, maybe you could set up a second connection in full refresh overwrite and join the table outputted by this connection?
Are you using normalization on this connection?
If you use normalization I think you can identify the latest replicated rows with
. @Chris Duong [Airbyte] maybe you have a more clever solution to suggest 😄

Chris Duong [Airbyte]

03/11/2022, 1:39 PM
We don’t have a sync_id/number for the moment, but that’s something useful to have, yes! For now, you could rely on the _airbyte_emitted_at / _airbyte_normalized_at columns, as described by Augustin. In the meantime, do you have a primary key / id in your data otherwise?
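
A minimal sketch of the timestamp-based workaround, not an Airbyte feature: since there is no sync ID, rows can be grouped into syncs by looking for large jumps in _airbyte_emitted_at. The function name assign_sync_numbers, the derived_sync_number field, and the 10-minute gap are all hypothetical, and the approach assumes that consecutive syncs are separated by more time than any pause within a single sync.

```python
from datetime import datetime, timedelta


def assign_sync_numbers(rows, gap=timedelta(minutes=10)):
    """Label each row with a derived sync number based on _airbyte_emitted_at.

    A row emitted within `gap` of the previous row (in timestamp order) is
    treated as part of the same sync; a larger jump starts a new sync.
    """
    ordered = sorted(rows, key=lambda r: r["_airbyte_emitted_at"])
    labelled = []
    sync_number = 0
    previous = None
    for row in ordered:
        emitted = row["_airbyte_emitted_at"]
        if previous is None or emitted - previous > gap:
            sync_number += 1  # gap exceeded: assume a new sync started here
        labelled.append({**row, "derived_sync_number": sync_number})
        previous = emitted
    return labelled


# Illustrative data: two syncs one hour apart, two records each.
rows = [
    {"name": "a", "_airbyte_emitted_at": datetime(2022, 3, 11, 9, 0, 0)},
    {"name": "b", "_airbyte_emitted_at": datetime(2022, 3, 11, 9, 0, 5)},
    {"name": "a", "_airbyte_emitted_at": datetime(2022, 3, 11, 10, 0, 0)},
    {"name": "b", "_airbyte_emitted_at": datetime(2022, 3, 11, 10, 0, 4)},
]

labelled = assign_sync_numbers(rows)

# Keep only the rows from the most recent derived sync.
latest = max(r["derived_sync_number"] for r in labelled)
latest_rows = [r for r in labelled if r["derived_sync_number"] == latest]
```

The same grouping could be applied to each table before joining on derived_sync_number, although the numbers only line up across tables if every table is replicated in every sync.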

Shubham Pinjwani

03/11/2022, 2:46 PM
Not all the tables have primary keys. I also thought of using _airbyte_emitted_at with the BigQuery denormalized destination, but there was an issue that I mentioned in my next message. I also opened an issue in the repo for that.