few-grass-66826
08/27/2022, 12:42 PM
lemon-engine-23512
08/27/2022, 4:12 PM
jolly-yacht-10587
08/28/2022, 9:51 AM
{
  "aspect": {
    "__type": "SchemaMetadata",
    "schemaName": "mongodb",
    "platform": "urn:li:dataPlatform:mongodb",
    "platformSchema": {
      "__type": "MySqlDDL",
      "tableSchema": "schema"
    },
    "version": "3",
    "hash": "",
    "fields": [
      {
        "fieldPath": "hello",
        "jsonPath": "null",
        "nullable": true,
        "description": "test hello 18",
        "type": {
          "type": {
            "__type": "RecordType"
          }
        },
        "nativeDataType": "Record()",
        "recursive": false
      }
    ]
  },
  "entityType": "dataset",
  "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:mongodb,hello.hello3,PROD)"
}
2. Is it possible to delete an aspect using a POST request instead of a DELETE request, by passing a body similar to the one above?
3. If I want to delete an aspect but still have it shown in the UI, just marked as “deleted” or something similar so users can view the version history of this dataset, is that possible?
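On point 3: DataHub's built-in soft delete works at the entity level rather than per aspect; it writes a status aspect with removed set to true, which hides the entity from search and browse while keeping its aspect history in the backend. A minimal sketch with the Python REST emitter, assuming GMS is reachable at http://localhost:8080 and reusing the dataset urn from the payload above:

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, StatusClass

# Assumption: GMS is reachable at this address.
emitter = DatahubRestEmitter("http://localhost:8080")
dataset_urn = make_dataset_urn(platform="mongodb", name="hello.hello3", env="PROD")

# Soft delete: upsert a status aspect with removed=True. The entity disappears
# from search/browse, but its existing aspects (and their versions) are retained.
emitter.emit_mcp(
    MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="status",
        aspect=StatusClass(removed=True),
    )
)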
few-grass-66826
08/28/2022, 11:47 AM
better-actor-97450
08/29/2022, 2:56 AM
straight-agent-79732
08/29/2022, 5:31 AM
aloof-oil-31167
08/29/2022, 7:57 AM
FROM linkedin/datahub-ingestion:85a55ff
the following one is not pulling anything -
FROM linkedin/datahub-ingestion:0.8.43.3
few-grass-66826
08/29/2022, 8:48 AM
flat-painter-78331
08/29/2022, 9:52 AM
square-hair-99480
08/29/2022, 10:15 AM
platform_instance, so it appeared to me with the name datahub in the UI.
After a few days of ingesting data I had to change it and add platform_instance to this ingestion, since I would be ingesting data from two distinct Snowflake accounts. Later I was asked to change the platform_instance value another time.
So now when I go in the UI to Datasets -> Prod -> Snowflake, I see 3 names (datahub, name_01, name_02) for the same ingestion job.
How can I delete the older data so I only see and access the data related to the last value of the ingestion platform_instance?
I have tried things like
datahub delete --urn "urn:li:dataPlatformInstance:(urn:li:dataPlatform:snowflake,DATAHUB.datahub,PROD)" --soft
but it did not work.
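A note on that attempt: the urn above addresses the dataPlatformInstance entity, while the stale entries in the UI are the Snowflake dataset entities themselves, each of which embeds the old platform_instance value in its own urn. A rough sketch of soft-deleting one such dataset by urn (the instance prefix and table name here are hypothetical):

# Hypothetical dataset urn: the old platform_instance (e.g. name_01) is part of the dataset name,
# so each stale dataset is addressed by its own dataset urn rather than the dataPlatformInstance urn.
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:snowflake,name_01.mydb.myschema.mytable,PROD)" --soft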
alert-fall-82501
08/29/2022, 11:44 AM
aloof-oil-31167
08/29/2022, 1:22 PM
Caused by: java.lang.ClassNotFoundException: datahub.spark.DatahubSparkListener
I added the following configs to the Spark session -
"spark.jars.packages" = "io.acryl:datahub-spark-lineage:0.8.23",
"spark.extraListeners" = "datahub.spark.DatahubSparkListener",
"spark.datahub.rest.server" = ${?DATAHUB_URL},
"spark.datahub.rest.token" = ${?DATAHUB_TOKEN}
"spark.datahub.metadata.dataset.env" = "STG"
does anyone have an idea?
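For context, that ClassNotFoundException usually means the listener jar was never resolved onto the driver classpath: spark.jars.packages is only honoured when the session (and its JVM) starts, not when added to an already-running context. A minimal PySpark sketch using the same config keys, assuming the io.acryl coordinate can be fetched from Maven Central at startup and that DATAHUB_URL/DATAHUB_TOKEN are set in the environment:

import os
from pyspark.sql import SparkSession

# spark.jars.packages is only picked up at session start-up; setting it on an
# already-running SparkContext leaves the listener class unavailable, which
# surfaces as the ClassNotFoundException above.
spark = (
    SparkSession.builder
    .appName("datahub-lineage-test")
    .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.23")
    .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
    .config("spark.datahub.rest.server", os.environ.get("DATAHUB_URL", "http://localhost:8080"))
    .config("spark.datahub.rest.token", os.environ.get("DATAHUB_TOKEN", ""))
    .config("spark.datahub.metadata.dataset.env", "STG")
    .getOrCreate()
)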
stocky-minister-77341
08/29/2022, 1:42 PM
brave-businessperson-3969
08/29/2022, 2:09 PM
alert-coat-46957
08/29/2022, 3:11 PM
steep-finland-24780
08/29/2022, 6:35 PM
miniature-plastic-43224
08/29/2022, 8:40 PM
careful-insurance-60247
08/29/2022, 9:49 PM
cool-translator-98249
08/29/2022, 10:57 PM
[2022-08-29 22:53:33,805] ERROR {datahub.entrypoints:195} - Command failed:
Tree is empty.
alert-fall-82501
08/30/2022, 5:33 AM
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> sasl
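That legacy-install-failure for sasl is typically the package's C extension failing to build because the SASL headers and a compiler are missing on the host. A possible workaround, assuming a Debian/Ubuntu-based environment:

# Install the native build dependencies first, then retry the Python package.
sudo apt-get update && sudo apt-get install -y build-essential libsasl2-dev
pip install sasl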
alert-fall-82501
08/30/2022, 5:35 AM
few-carpenter-93837
08/30/2022, 6:18 AM
microscopic-mechanic-13766
08/30/2022, 7:34 AM
val spark = SparkSession.builder()
  .appName("test-application")
  .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.43")
  .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
  .config("spark.datahub.rest.server", "http://datahub-gms:8080")
  .enableHiveSupport()
  .getOrCreate()
After that, the initial datasets (which are not ingested in DataHub, as they are .csv files) are modified.
My "problem" is that after executing the whole notebook, nothing appears in DataHub.
Do I need to install anything in Jupyter itself, or does it look for the jars in some repository like Maven?
I would really appreciate some guidance on how this connection works!
Thanks in advance 🙂
brave-tomato-16287
08/30/2022, 7:36 AM
{'message': 'Showing partial results. The request exceeded the 100000 node limit. Use pagination, additional filtering, or both in the query to adjust results.', 'extensions':
Can anybody suggest something?
alert-fall-82501
08/30/2022, 7:45 AM
sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:databricks.pyhive
alert-fall-82501
08/30/2022, 7:46 AM
bumpy-journalist-41369
08/30/2022, 9:11 AM
bumpy-journalist-41369
08/30/2022, 9:11 AM
colossal-hairdresser-6799
08/30/2022, 9:27 AM
UPSERT
Python Emitter
Add or update aspect (tags, terms, owners)
Hi,
When looking at the documentation for adding tags, terms and owners to a dataset, all the examples include:
1. Get the current tags
current_tags: Optional[GlobalTagsClass] = graph.get_aspect_v2(
    entity_urn=dataset_urn,
    aspect="globalTags",
    aspect_type=GlobalTagsClass,
)
2. Check if the tag does not already exist
if current_tags:
    if tag_to_add not in [x.tag for x in current_tags.tags]:
3. If it doesn't, append it to the list of tags
        # tags exist, but this tag is not present in the current tags
        current_tags.tags.append(TagAssociationClass(tag_to_add))  # <- new tag
4. Then emit current_tags with an UPSERT:
event: MetadataChangeProposalWrapper = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=dataset_urn,
    aspectName="globalTags",
    aspect=current_tags,
)
My understanding of an UPSERT is “if the aspect exists, update it; if not, add it”.
So what I don't understand is why we would need to go through steps 1-3 if we're using an UPSERT in the end anyway?
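For context on why the documented flow reads before it writes: the UPSERT here replaces the entire globalTags aspect rather than merging into it, so steps 1-3 exist to preserve whatever tags are already on the dataset. A consolidated sketch of that flow, assuming GMS at http://localhost:8080 and a hypothetical dataset and tag:

from typing import Optional

from datahub.emitter.mce_builder import make_dataset_urn, make_tag_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import (
    ChangeTypeClass,
    GlobalTagsClass,
    TagAssociationClass,
)

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # assumption: GMS address
dataset_urn = make_dataset_urn(platform="hive", name="fct_users_created", env="PROD")  # hypothetical
tag_to_add = make_tag_urn("purchase")  # hypothetical tag

# 1. Read the aspect as it exists today (None if the dataset has no tags yet).
current_tags: Optional[GlobalTagsClass] = graph.get_aspect_v2(
    entity_urn=dataset_urn,
    aspect="globalTags",
    aspect_type=GlobalTagsClass,
)

# 2./3. Merge: keep whatever is already there and append the new tag if missing.
if current_tags is None:
    current_tags = GlobalTagsClass(tags=[TagAssociationClass(tag=tag_to_add)])
elif tag_to_add not in [x.tag for x in current_tags.tags]:
    current_tags.tags.append(TagAssociationClass(tag=tag_to_add))

# 4. UPSERT writes the merged aspect back; emitting only the new tag here would
#    overwrite (and drop) any tags that were already present.
graph.emit(
    MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="globalTags",
        aspect=current_tags,
    )
)

If the read in step 1 were skipped, the emit would still succeed, but previously attached tags would be lost from the aspect.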
colossal-hairdresser-6799
08/30/2022, 9:54 AM
graph.emit
Information regarding a successful update versus a skipped write because the aspect already exists
Hi,
When using graph.emit to update an aspect, is there any way to see whether it was updated or just skipped because it already existed?
Right now I can only see a log saying
INFO - metadata ingestion - Owner urn:li:corpGroup:test already exists, omitting write
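As far as I know, graph.emit does not report back whether the write was applied or skipped, so one client-side way to tell is to read the aspect before and after the emit and compare. A rough sketch, with the server address, dataset urn, and helper name being assumptions:

from typing import List, Optional

from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import GlobalTagsClass

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # assumption: GMS address
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"  # hypothetical

def current_tag_urns() -> List[str]:
    # Helper (hypothetical): read the globalTags aspect and return the plain tag urns.
    tags: Optional[GlobalTagsClass] = graph.get_aspect_v2(
        entity_urn=dataset_urn, aspect="globalTags", aspect_type=GlobalTagsClass
    )
    return [t.tag for t in tags.tags] if tags else []

before = current_tag_urns()
# ... graph.emit(event) for the tag change goes here, as in the example above ...
after = current_tag_urns()
print("updated" if before != after else "no change (the tag was already present)")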