Hello everyone, I wanted to ask a quick question - ...
# ingestion
b
Hello everyone, I wanted to ask a quick question - is there some way to update the "stats" tab for a given urn using a curl command? Profiling doesn't entirely work for me, since DataHub has hardcoded "s3a" as the path and my S3 is on a completely different host (the URL differs); the error is shown in the pasted image below. Regardless of that - would it be possible to profile the data myself and update the stats manually via curl? Perhaps there is some field inside com.linkedin.metadata.snapshot.DatasetSnapshot that could be altered to get the same result as with profiling set to "True". Thank you deeply for all the help, you guys do tremendous work.
c
1. You can raise a feature request to support custom S3 URLs.
2. To solve your problem, you can populate the "datasetProfile" aspect of the dataset and send it to DataHub using REST or Kafka. Clients for both are available in Java and Python; use whichever you are more comfortable with. Examples are here
b
Great, thank you for your answer @careful-pilot-86309, do you have an example with an already filled datasetProfile?
These contain MCEs, but you can safely refer to them for the aspect contents needed for an MCP
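A minimal sketch of what the suggestion above could look like, using only the Python standard library instead of the DataHub client. It builds a "datasetProfile" aspect and wraps it in an MCP-style envelope. The helper name `build_dataset_profile_proposal`, the envelope layout, and all numbers are assumptions for illustration; the profile field names (`timestampMillis`, `rowCount`, `columnCount`, `fieldProfiles`) should be checked against the DatasetProfile model of your DataHub version.

```python
import json
import time


def build_dataset_profile_proposal(dataset_urn, row_count, column_count,
                                   field_profiles=None, timestamp_millis=None):
    """Wrap a datasetProfile aspect in an MCP-style envelope (hypothetical helper)."""
    if timestamp_millis is None:
        # Profiles are point-in-time records, so each one carries a timestamp.
        timestamp_millis = int(time.time() * 1000)

    profile = {
        "timestampMillis": timestamp_millis,
        "rowCount": row_count,
        "columnCount": column_count,
    }
    if field_profiles:
        profile["fieldProfiles"] = field_profiles

    # Assumed envelope: the aspect body travels as a JSON-encoded string.
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": dataset_urn,
            "changeType": "UPSERT",
            "aspectName": "datasetProfile",
            "aspect": {
                "contentType": "application/json",
                "value": json.dumps(profile),
            },
        }
    }


# Illustrative values only; "<my urn>" is the placeholder used in this thread.
payload = build_dataset_profile_proposal(
    dataset_urn="<my urn>",
    row_count=1000,
    column_count=2,
    field_profiles=[{"fieldPath": "id", "uniqueCount": 1000, "nullCount": 0}],
    timestamp_millis=1640995200000,
)
print(json.dumps(payload, indent=2))
```

The printed JSON is meant to be sent as the body of its own POST request, not nested inside a DatasetSnapshot. In DataHub versions that support MCPs, the endpoint is likely `/aspects?action=ingestProposal` on the same GMS host, but treat that as an assumption and verify it against your deployment.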
b
Thanks @careful-pilot-86309 ! That's what I needed : )
@careful-pilot-86309 one more small thing, to be sure that we are on the same page and that I understood you correctly. How should this datasetProfile be inserted into the following curl?
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.DatasetSnapshot": {
        "urn": "<my urn>",
        "aspects": [
          {
            "com.linkedin.common.InstitutionalMemory": {
              "elements": [
                {
                  "url": "http://www.google.com",
                  "description": "Some description",
                  "createStamp": {
                    "time": 0,
                    "actor": "urn:li:corpuser:Datahub"
                  }
                }
              ]
            }
          },
          {
            "com.linkedin.dataset.DatasetProperties": {
              "description": "This is just a test!",
              "tags": []
            }
          },
          {
            "com.linkedin.common.GlobalTags": {
              "tags": [
                {
                  "tag": "urn:li:tag:tag1"
                },
                {
                  "tag": "urn:li:tag:tag2"
                }
              ]
            }
          }
        ]
      }
    }
  }
}'
Should it go within the DatasetProperties or after the DatasetSnapshot? If possible, could you put the datasetProfile in the correct place in my curl command with some example value? Thanks in advance for the help! : )