# ingestion
c
Hey, we are using this ingestion script https://github.com/linkedin/datahub/tree/v0.6.0/metadata-ingestion/mce-cli (yeah, we will upgrade to the latest version very soon 😅🙏) to ingest data into our DataHub deployment. While it works great for inserting data, I wanted to know whether it can also be used to upsert a dataset. I see a `proposedDelta` field in the `bootstrap_mce.dat` file, but I'm not sure how to provide input for it, since there is no example of an upsert.
a
In my experience, upsert for Python ingestion already works at the entity level: you just post the same entity URN with only the new/changed aspects. However, there is no upsert at the aspect level; aspects are overwritten.
P.S. You can check the PARTIAL_UPDATE method on the Rest.li docs portal; it is exactly what is used.
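A minimal sketch of the semantics described above, assuming a plain Python dict as a hypothetical stand-in for the metadata store (this is not the DataHub client API, and the aspect payloads are heavily simplified): posting only some aspects leaves the other aspects alone, but each posted aspect replaces the stored one wholesale.

```python
import copy
from typing import Any, Dict

# Hypothetical stand-in for the metadata store: urn -> {aspect name -> aspect value}.
# Aspect payloads are heavily simplified; this is not the real DataHub schema or client.
Store = Dict[str, Dict[str, Dict[str, Any]]]


def post_snapshot(store: Store, urn: str, aspects: Dict[str, Dict[str, Any]]) -> None:
    """Entity-level upsert, aspect-level overwrite."""
    entity = store.setdefault(urn, {})
    for name, value in aspects.items():
        # A posted aspect replaces the stored one wholesale; unposted aspects are kept.
        entity[name] = copy.deepcopy(value)


urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)"
store: Store = {}
post_snapshot(store, urn, {
    "Ownership": {"owners": ["urn:li:corpuser:alice"]},
    "DatasetProperties": {"description": "Orders table", "customProperties": {}},
})
# Second post carries only the changed DatasetProperties aspect.
post_snapshot(store, urn, {"DatasetProperties": {"customProperties": {"team": "sales"}}})

print(store[urn]["Ownership"])          # {'owners': ['urn:li:corpuser:alice']}
print(store[urn]["DatasetProperties"])  # {'customProperties': {'team': 'sales'}}
```

The second post keeps `Ownership` but drops the description, because `DatasetProperties` was sent without it.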
c
Yeah, exactly: aspects do not get upserted. Checking the `PARTIAL_UPDATE` method, thanks. But I'm just curious whether `proposedDelta` can be used to achieve the same thing.
a
When I tried to use it (0.8.3, maybe?), it was not implemented and there was no documentation other than a few code comments and one article on the portal.
Maybe we should ask @big-carpet-38439 (tagging intuitively, sorry).
m
@chilly-barista-6524 @ambitious-airline-8020: generic patch support for aspects is WIP. For now you will have to do a read-modify-write. The previous `proposedDelta` approach requires writing server-side code to handle deltas per "delta-group", and we're not supporting it going forward. If you can tell us the specific use case for the upsert, maybe we can provide more targeted recommendations.
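A minimal read-modify-write sketch, assuming a hypothetical in-memory dict in place of the real metadata service; `read_aspect`, `write_aspect` and `read_modify_write` are illustrative names, not DataHub APIs, and the aspect shape is simplified.

```python
import copy
from typing import Any, Callable, Dict

Aspect = Dict[str, Any]
# Hypothetical stand-in for the metadata service: urn -> {aspect name -> aspect}.
Store = Dict[str, Dict[str, Aspect]]


def read_aspect(store: Store, urn: str, name: str) -> Aspect:
    """Read step: fetch a copy of the current aspect (empty if it does not exist yet)."""
    return copy.deepcopy(store.get(urn, {}).get(name, {}))


def write_aspect(store: Store, urn: str, name: str, aspect: Aspect) -> None:
    """Write step: the whole aspect is replaced, so the complete value must be sent."""
    store.setdefault(urn, {})[name] = copy.deepcopy(aspect)


def read_modify_write(store: Store, urn: str, name: str,
                      modify: Callable[[Aspect], None]) -> Aspect:
    aspect = read_aspect(store, urn, name)   # 1. read the current full aspect
    modify(aspect)                           # 2. change only the fields this flow owns
    write_aspect(store, urn, name, aspect)   # 3. write the full aspect back
    return aspect


urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)"
store: Store = {urn: {"DatasetProperties": {"description": "Orders table",
                                             "customProperties": {}}}}


def add_team(aspect: Aspect) -> None:
    aspect.setdefault("customProperties", {})["team"] = "sales"


read_modify_write(store, urn, "DatasetProperties", add_team)
print(store[urn]["DatasetProperties"])
# {'description': 'Orders table', 'customProperties': {'team': 'sales'}}
```

Note that this is not atomic: if two flows read the same aspect and then write it back concurrently, the last write wins and the other flow's change is lost.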
a
@mammoth-bear-12532 I don't have one; I just shared my experience with the author. In my case I was fine with this scheme: retrieve the Dataset -> extract the SchemaMetadata aspect -> modify it -> make a partial update on the Dataset.
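The four-step scheme above can be sketched as follows, again assuming a hypothetical in-memory dict instead of the real service and a heavily simplified `SchemaMetadata` aspect (only `fields` with `fieldPath` and `description`); the helper names are illustrative, not DataHub APIs.

```python
import copy
from typing import Any, Dict

Aspect = Dict[str, Any]
# Hypothetical stand-in for the metadata service: urn -> {aspect name -> aspect}.
Store = Dict[str, Dict[str, Aspect]]


def retrieve_dataset(store: Store, urn: str) -> Dict[str, Aspect]:
    """Retrieve a copy of every aspect currently stored for the dataset."""
    return copy.deepcopy(store[urn])


def set_field_description(schema: Aspect, field_path: str, description: str) -> None:
    """Modify one column description inside the (simplified) SchemaMetadata aspect."""
    for field in schema["fields"]:
        if field["fieldPath"] == field_path:
            field["description"] = description
            return
    raise KeyError(f"no field {field_path!r} in schema")


def partial_update(store: Store, urn: str, aspects: Dict[str, Aspect]) -> None:
    """Post only the given aspects; each replaces its stored version wholesale."""
    entity = store.setdefault(urn, {})
    for name, value in aspects.items():
        entity[name] = copy.deepcopy(value)


urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)"
store: Store = {urn: {
    "Ownership": {"owners": ["urn:li:corpuser:alice"]},
    "SchemaMetadata": {"fields": [
        {"fieldPath": "order_id", "description": ""},
        {"fieldPath": "amount", "description": "Total in USD"},
    ]},
}}

dataset = retrieve_dataset(store, urn)                    # 1. retrieve the Dataset
schema = dataset["SchemaMetadata"]                        # 2. extract SchemaMetadata
set_field_description(schema, "order_id", "Primary key")  # 3. modify it
partial_update(store, urn, {"SchemaMetadata": schema})    # 4. post only that aspect

print(store[urn]["SchemaMetadata"]["fields"])
```

Because the whole `SchemaMetadata` aspect is sent back, the untouched `amount` description survives, and `Ownership` is left alone since it was not posted.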
c
@mammoth-bear-12532 our datasets get updated from different places: column descriptions, table descriptions, owners and business glossary terms are updated from one flow, and properties are updated from a separate DAG. So, while updating properties, I tried to update only those by specifying the dataset URN, but everything else gets removed as well. I was going to go for read-modify-write but saw `proposedDelta`, so I got curious.
@mammoth-bear-12532 by the way, do you have any other recommendation? We can go with read-modify-write for now, but since we are also upgrading DataHub, we would like to know whether anything new is available.