Hello. I used to use Fivetran to ingest data from Shopify and am now trying Airbyte with a client. I noticed duplicates in my raw data even though I am using the replication method “Incremental | Dedup + history”.
• Is it a known issue?
• What would you recommend to fix it? Delete duplicates with manual queries or redo a full resync?
• Applying a select distinct clause to the raw data can be tricky for two reasons: 1) distinct doesn’t apply to arrays, and 2) the columns _airbyte_emitted_at and _airbyte_normalized_at differ even between duplicates, so to use distinct I would need to manually list all the columns in my staging tables and exclude _airbyte_emitted_at and _airbyte_normalized_at, which is not ideal.
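For context, the kind of manual query I have in mind looks roughly like the sketch below. It runs against an in-memory SQLite table only so that it is self-contained: the table name raw_orders, the id primary key, and the tags column are placeholders and not my actual schema, and the SQL dialect of the real warehouse may differ. Instead of select distinct, it keeps the most recently emitted row per key, so array columns and the differing Airbyte metadata values no longer get in the way.

```python
import sqlite3

# Assumption: `id` stands in for the stream's primary key, and `raw_orders`
# and `tags` are hypothetical names used only for this illustration.
DEDUP_QUERY = """
SELECT id, tags, _airbyte_emitted_at
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY id
            ORDER BY _airbyte_emitted_at DESC
        ) AS row_num
    FROM raw_orders
)
WHERE row_num = 1
ORDER BY id
"""


def dedup_demo():
    """Load a few sample rows and return one row per id (the latest emitted)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_orders (id INTEGER, tags TEXT, _airbyte_emitted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [
            # Same record emitted twice: only _airbyte_emitted_at differs.
            (1, '["a","b"]', "2023-01-01T00:00:00"),
            (1, '["a","b"]', "2023-01-02T00:00:00"),
            (2, '["c"]', "2023-01-01T00:00:00"),
        ],
    )
    rows = conn.execute(DEDUP_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in dedup_demo():
        print(row)
```

This still requires knowing the primary key of each stream, but it avoids listing every column just to exclude the metadata ones.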
Thanks