can anyone point me to a `Incremental Sync - Dedup...
# ask-community-for-troubleshooting
j
can anyone point me to an `Incremental Sync - Deduped History` example?
j
there be dragons there
šŸ‰ 1
it's an incremental sync, but then actually a full refresh to dedupe
IMO it's a little bloated for what my use cases need, so I handle the dedupe with my own dbt normalization
ā˜ļø 1
j
i was under the impression that you do either incremental or full, but not both at the same time 🤔 ?
a
Hey @Jarrod Parkes, have you checked out the doc for it? If so, let me know what was insufficient about it, would love to help clear it up 🙂
j
i have, i got it open right now
i have the pieces, im pretty sure — a primary key per record and a cursor field per record. im just trying to figure out how to connect the dots
i have my current custom source working with a full refresh (tested append/overwrite), but im trying to transition it to this incremental sync deduped mode as ive learned more about what airbyte can do and what my use case requires
a
Ah, do you need an example on writing the Incremental + dedupe implementation?
j
that is what im looking for help with yes. the doc you linked doesn’t really show/discuss how to define a catalog/stream for this use case — that is where im lost
ive tried to find one by searching the repo, and stumbled across the BambooHR source, but the way its stream is defined seems pretty custom
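As a rough illustration of what such a catalog entry might look like, here is a configured catalog written as a Python dict. The field names follow the Airbyte protocol as best understood here and should be checked against the protocol docs for your version. The `customers` stream and its `id` / `updated_at` fields are hypothetical:

```python
import json

# Sketch of a configured catalog for one stream synced as incremental + deduped.
configured_catalog = {
    "streams": [
        {
            "stream": {
                "name": "customers",  # hypothetical stream name
                "json_schema": {
                    "type": "object",
                    "properties": {
                        "id": {"type": "integer"},
                        "updated_at": {"type": "string"},
                    },
                },
                "supported_sync_modes": ["full_refresh", "incremental"],
                "source_defined_cursor": True,
                "default_cursor_field": ["updated_at"],
                "source_defined_primary_key": [["id"]],
            },
            "sync_mode": "incremental",
            "destination_sync_mode": "append_dedup",
            "cursor_field": ["updated_at"],  # the per-record cursor field
            "primary_key": [["id"]],         # the per-record primary key
        }
    ]
}

print(json.dumps(configured_catalog, indent=2))
```

The two pieces mentioned earlier in the thread map onto `cursor_field` and `primary_key`; the dedupe itself happens on the destination side.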
a
https://github.com/airbytehq/airbyte/blob/master/docs/connector-development/tutorials/adding-incremental-sync.md I'm not sure that this covers deduplication, but this covers the basics of getting incremental sync running
j
will check out
j
@abhi it looks like shopify example is capturing state on a per-stream basis. i think what i wanted to do was to capture this on a per-record basis?
because each individual record that is returned by this API has its own “cursor field”
or is it the case that you store one updated_at value representing the last time a sync was run, and then you filter across individual records using that value?
j
@Jarrod Parkes I believe the cursor is set per stream, is that what you are asking?
i.e., it's not a blanket value for all streams, but based on whatever the last value was for each particular stream
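To make "per stream" concrete: the state can be pictured as one cursor value per stream, and a sync of one stream only moves that stream's cursor. A minimal sketch in plain Python, assuming a simple `{stream_name: {cursor_field: value}}` layout (the exact state shape is up to the connector, and the stream names here are hypothetical):

```python
def advance_stream_state(state, stream_name, cursor_field, records):
    """Return a new state dict where only `stream_name`'s cursor has moved forward."""
    previous = state.get(stream_name, {}).get(cursor_field)
    candidates = [r[cursor_field] for r in records]
    if previous is not None:
        candidates.append(previous)
    new_state = dict(state)
    if candidates:
        # The cursor is the max value seen so far, across all records of this stream.
        new_state[stream_name] = {cursor_field: max(candidates)}
    return new_state


state = {
    "customers": {"updated_at": "2021-01-02T00:00:00Z"},
    "orders": {"updated_at": "2021-01-05T00:00:00Z"},
}
synced = [{"id": 3, "updated_at": "2021-01-04T00:00:00Z"}]
new_state = advance_stream_state(state, "customers", "updated_at", synced)
```

Each record still carries its own cursor field, but only the largest value seen is stored, once per stream.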
j
yeh, per stream is what im seeing too
i was more trying to understand how i use that to affect what gets sent for the next incremental sync
j
it will pull that state and get everything >= cursor on the next exec
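A small sketch of that last point, in plain Python: on the next run, the saved cursor is used to keep only records at or after it. Here `records_since` is a hypothetical helper that filters in memory. In a real source you would more likely pass the cursor to the API as a query parameter, if the API supports one:

```python
def records_since(records, stream_state, cursor_field="updated_at"):
    """Return the records whose cursor value is >= the saved cursor (all on first sync)."""
    cursor = stream_state.get(cursor_field)
    if cursor is None:
        return list(records)  # no saved state yet: behaves like a full read
    return [r for r in records if r[cursor_field] >= cursor]


# Hypothetical API response.
api_records = [
    {"id": 1, "updated_at": "2021-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2021-01-02T00:00:00Z"},
    {"id": 3, "updated_at": "2021-01-03T00:00:00Z"},
]

first_sync = records_since(api_records, {})
next_sync = records_since(api_records, {"updated_at": "2021-01-02T00:00:00Z"})
```

Because the comparison is `>=`, the record sitting exactly on the cursor (id 2 here) is sent again on the next run, which is one reason the deduped mode needs a primary key.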