# ingestion
l
Hi all, I am trying to ingest data coming from a Hive Metastore (hosted at Databricks) and it works almost perfectly, but we are not seeing the descriptions of the columns (and maybe of the tables). I found that this was fixed in acryl-datahub 0.3.0, as mentioned here. In fact, when I use the console sink I can see the descriptions, but when I use the HTTP sink I do not see them in the UI. Has anyone seen this? Is there any way to fix it? Thanks!
m
Hi @limited-cricket-18852 can you use the file sink and see what it says? Also which version of the datahub cli are you on? 0.3.0 is really old 🙂
latest is 0.8.42
l
Hi @mammoth-bear-12532! I am manually running my recipes using the acryl-datahub python library at version 0.8.40.2 in a scheduled python script. I ran using the File sink, and it’s working fine: I get all the descriptions. Also, we are running Datahub version 0.8.41 in Kubernetes.
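For reference, a setup like the one described (one recipe, run from a Python script against different sinks) might look roughly like the sketch below. The host, output filename, and allow pattern are placeholders, and the Databricks-specific connection options are left out. Only the `hive` source, `table_pattern.allow`, and the `file` / `datahub-rest` sinks come from this thread.

```python
from copy import deepcopy

# Placeholder source section. The host is hypothetical and the
# Databricks-specific connection options are intentionally omitted.
SOURCE = {
    "type": "hive",
    "config": {
        "host_port": "my-workspace.example.com:443",  # hypothetical
        # Regex filter so only one table is ingested while testing.
        "table_pattern": {"allow": ["bronze\\.my_table"]},
    },
}

# The two sinks being compared in this thread.
SINKS = {
    "file": {"type": "file", "config": {"filename": "./hive_mces.json"}},
    "datahub-rest": {
        "type": "datahub-rest",
        "config": {"server": "http://datahub-gms.mycompany.com"},
    },
}


def build_recipe(sink_name: str) -> dict:
    """Return a recipe dict that pairs the same source with the chosen sink."""
    return {"source": deepcopy(SOURCE), "sink": deepcopy(SINKS[sink_name])}


def run_recipe(sink_name: str) -> None:
    """Run the recipe; requires the acryl-datahub package to be installed."""
    # Imported lazily so the recipe can be built and inspected without it.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(build_recipe(sink_name))
    pipeline.run()
    pipeline.raise_from_status()
```

Running `run_recipe("file")` and then `run_recipe("datahub-rest")` keeps the source identical, so any difference in what shows up should come from the sink or the server side.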
m
can you paste a snippet of the file that shows the descriptions?
l
sure, here is a sample:
also, here is something interesting. I proxied and “spied” on the HTTP calls leaving my computer, and I can see that the call to http://datahub-gms.mycompany.com/entities?action=ingest does have the descriptions in the
.entity.value.com.linkedin.metadata.snapshot.DatasetSnapshot.aspects[*].com.linkedin.schema.SchemaMetadata
(JSON path) field
tested on self-hosted Datahub v0.8.41 and python lib acryl-datahub at v0.8.42
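To check the same thing without reading the JSON by eye, a small walker like the one below could list which fields carry a description. It is only a sketch: it assumes the payload (a captured REST body or the file sink output) nests a `fields` list with `fieldPath` and `description` under a key ending in `.SchemaMetadata`, which matches the JSON path above. The function names are made up for illustration.

```python
from typing import Any, Iterator, List, Tuple


def iter_schema_fields(node: Any) -> Iterator[Tuple[str, str]]:
    """Yield (fieldPath, description) for each field under a *.SchemaMetadata key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.endswith(".SchemaMetadata") and isinstance(value, dict):
                for field in value.get("fields", []):
                    # A missing or null description is reported as "".
                    yield field.get("fieldPath", "?"), field.get("description") or ""
            else:
                yield from iter_schema_fields(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_schema_fields(item)


def missing_descriptions(payload: Any) -> List[str]:
    """Return the field paths that have no description in the payload."""
    return [path for path, desc in iter_schema_fields(payload) if not desc]
```

For example, `missing_descriptions(json.load(open("hive_mces.json")))` on the file sink output (after `import json`; the filename is a placeholder) should return an empty list if every column description made it into the emitted metadata.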
m
so if we check on this one dataset
"urn:li:dataset:(urn:li:dataPlatform:hive,bronze.my_table,PROD)"
btw the sample json had a typo
can you check if you have edited the descriptions in the UI ... that could be one reason why these descriptions are not showing up
datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:hive,bronze.my_table,PROD)" --aspect editableSchemaMetadata
l
Hi! sorry about the json, I removed “sensitive” columns (as well as updating the name of the dataset). Here is the result of the command:
{}
I have not edited anything in the UI
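If this check has to be repeated for many datasets, the CLI output could be parsed with something like the sketch below. It assumes that the aspect, when present, holds an `editableSchemaFieldInfo` list with `fieldPath` and `description` entries, and that the output may or may not be wrapped under an `editableSchemaMetadata` key. The helper name is hypothetical.

```python
import json
from typing import Dict


def ui_edited_descriptions(cli_output: str) -> Dict[str, str]:
    """Map fieldPath -> description for fields whose description was edited in the UI."""
    data = json.loads(cli_output or "{}")
    # Accept both a wrapped ({"editableSchemaMetadata": {...}}) and a bare aspect.
    aspect = data.get("editableSchemaMetadata", data)
    infos = aspect.get("editableSchemaFieldInfo", [])
    return {
        info["fieldPath"]: info["description"]
        for info in infos
        if info.get("fieldPath") and info.get("description")
    }
```

An empty result, as with the `{}` output here, means no UI-edited descriptions were found for that dataset.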
but here is something very strange: I now have the comments on the dataset I was testing! Then I tried another dataset (I keep using the same recipe, but filter on a specific table with
source.config.table_pattern.allow
) and there are still no comments in the UI (but they are there with the file sink)
I do not know whether updating the python dep from 0.8.40.2 to 0.8.42, or switching the sink back to datahub-rest, or something else, made the first dataset now have descriptions on the columns. But with this other table, still nothing. I am carrying on with the tests