So, we have multiple tables where each new event defines the entire state of its entity. Unfortunately, because each incoming event is parsed into a 1:many relationship, we have no reliable way of deduplicating the data: if we specify a primary key down to the granularity of that relationship, we end up retaining rows that were not present in the newest event.
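A minimal in-memory sketch of the problem, using a plain dict as a stand-in for an upsert table. It assumes last-write-wins semantics keyed on a hypothetical composite key `(entity_id, child_id)`. The event shape and field names are made up for illustration. Because a row-level upsert only ever overwrites rows and never deletes them, a child that disappears from the newer event is left behind.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical event shape: one parent entity plus a list of child records.
Event = Dict[str, Any]
Row = Dict[str, Any]


def explode(event: Event) -> List[Row]:
    """Flatten one event into one row per child (the 1:many parse)."""
    return [
        {"entity_id": event["entity_id"], "child_id": c["child_id"], "value": c["value"]}
        for c in event["children"]
    ]


def upsert_composite(table: Dict[Tuple[str, str], Row], event: Event) -> None:
    """Row-level upsert keyed on (entity_id, child_id): overwrite only, never delete."""
    for row in explode(event):
        table[(row["entity_id"], row["child_id"])] = row


def children_of(table: Dict[Tuple[str, str], Row], entity_id: str) -> List[str]:
    """List the child ids currently stored for one entity."""
    return sorted(cid for (eid, cid) in table if eid == entity_id)


table: Dict[Tuple[str, str], Row] = {}
upsert_composite(table, {"entity_id": "e1", "children": [
    {"child_id": "a", "value": 1}, {"child_id": "b", "value": 2}]})
# The second event is the full new state of e1: only child "a" should remain.
upsert_composite(table, {"entity_id": "e1", "children": [{"child_id": "a", "value": 3}]})
print(children_of(table, "e1"))  # ['a', 'b'] -- stale child "b" survives
```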
I'm not entirely sure how to deal with this. One potential solution is to use a JSON column type and store the "many" side of the relationship in it. Unfortunately, limitations of the JSON type make that infeasible for us.
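For comparison, a sketch of why the JSON-column idea would sidestep the problem. It makes the same in-memory, last-write-wins assumption, but keys on `entity_id` alone and keeps the children in one JSON-encoded column (`children_json` is a hypothetical name). Each new event replaces the whole row, so stale children cannot survive. This says nothing about the JSON-type limitations mentioned above, which are specific to the store being used.

```python
import json
from typing import Any, Dict, List


def upsert_entity(table: Dict[str, Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Upsert keyed on entity_id only: the latest event replaces the whole row."""
    table[event["entity_id"]] = {
        "entity_id": event["entity_id"],
        # The 1:many children live in a single JSON-encoded column.
        "children_json": json.dumps(event["children"], sort_keys=True),
    }


def children_of(table: Dict[str, Dict[str, Any]], entity_id: str) -> List[str]:
    """Decode the JSON column and list the child ids for one entity."""
    return sorted(c["child_id"] for c in json.loads(table[entity_id]["children_json"]))


table: Dict[str, Dict[str, Any]] = {}
upsert_entity(table, {"entity_id": "e1", "children": [
    {"child_id": "a", "value": 1}, {"child_id": "b", "value": 2}]})
upsert_entity(table, {"entity_id": "e1", "children": [{"child_id": "a", "value": 3}]})
print(children_of(table, "e1"))  # ['a']
```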
My question is whether this is achievable at all with an upsert table, or whether I would need a custom batch job instead.
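If it does come down to a custom batch job, its core could be a compaction step like the one sketched below. The sketch assumes that every exploded row carries the timestamp of the event that produced it (a hypothetical `event_ts` field). It keeps only the rows that came from the most recent event for each entity.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def compact(rows: List[Row]) -> List[Row]:
    """Keep only rows that came from the most recent event for their entity."""
    latest: Dict[str, int] = {}
    for r in rows:
        eid = r["entity_id"]
        latest[eid] = max(latest.get(eid, r["event_ts"]), r["event_ts"])
    return [r for r in rows if r["event_ts"] == latest[r["entity_id"]]]


rows: List[Row] = [
    {"entity_id": "e1", "child_id": "a", "value": 1, "event_ts": 100},
    {"entity_id": "e1", "child_id": "b", "value": 2, "event_ts": 100},
    {"entity_id": "e1", "child_id": "a", "value": 3, "event_ts": 200},
    {"entity_id": "e2", "child_id": "x", "value": 9, "event_ts": 150},
]
result = compact(rows)
print(sorted((r["entity_id"], r["child_id"], r["value"]) for r in result))
# [('e1', 'a', 3), ('e2', 'x', 9)]
```

The same filter would also work on the rows left behind by a composite-key upsert, provided each row stores the timestamp of the event that last wrote it. The stale child keeps its older timestamp and is dropped.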
I had a thought that it might be possible to store the event in Kafka with the composite key, but extract the JSON payload during real-time ingestion. I'm assuming this doesn't work; any thoughts?