# ingestion
calm-river-44367
Hey everyone, I noticed an HDFS dataset in the demo, even though Hadoop is not one of DataHub's supported sources. Does anyone know how it was ingested, or whether this can be done for other unsupported sources like MinIO? I would be grateful for your help.
witty-state-99511
@calm-river-44367 there are a couple of ways to do it. You can either 1. write a Hadoop source for DataHub, or 2. emit the metadata to DataHub yourself (see the example in the link below)
Let me know if you need any help or resources on it
Here’s an example of emitting an HDFS MCE (MetadataChangeEvent) into DataHub via the REST endpoint: https://gist.github.com/chinmay-bhat/0199a01a63c5cf58afa7f677b6148b6e
if you run this file while pointing it at your DataHub REST endpoint, you should see a sample HDFS dataset in your UI
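For anyone who can't open the gist, here is a minimal sketch of what such a script can look like, using the acryl-datahub Python SDK with the newer MCP-style REST emitter rather than the gist's MCE. The GMS address, platform name, and dataset path below are placeholder assumptions, not values from the gist:

```python
# Minimal sketch: emit a dataset into DataHub over REST.
# Assumes `pip install acryl-datahub` and a GMS reachable at localhost:8080.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# "hdfs" as the platform and this path as the dataset name are placeholders;
# any consistent strings work, since the platform is just part of the URN.
urn = make_dataset_urn(platform="hdfs", name="/data/tracking/events", env="PROD")

# Attach a simple properties aspect so the dataset shows up with a description.
mcp = MetadataChangeProposalWrapper(
    entityUrn=urn,
    aspect=DatasetPropertiesClass(description="Sample HDFS dataset emitted by hand"),
)
emitter.emit(mcp)
```

Run it with plain `python`; the dataset should then appear in the UI under the hdfs platform. The gist itself uses the older MCE shape, but both end up in the same metadata store.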
calm-river-44367
thank you so much. I will definitely check it out and let you know the result
@square-activity-64562 when I run `datahub delete --urn "<my urn>"`, it tells me only the data related to this dataset is deleted
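For context, the default soft delete just marks the entity as removed so it disappears from the UI, without dropping the stored aspects. Under the hood that corresponds to writing a Status aspect, roughly like this sketch (same placeholder endpoint as above, and the URN is a placeholder too):

```python
# Sketch: soft-delete a dataset programmatically by marking it removed.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import StatusClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.emit(
    MetadataChangeProposalWrapper(
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:hdfs,/data/tracking/events,PROD)",
        aspect=StatusClass(removed=True),  # hides the entity; emit removed=False to restore
    )
)
```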
f
@witty-state-99511 Hi Chinmay, I am pretty new to DataHub. My requirement is to ingest metadata from HDFS into DataHub. I saw your gist's Python code. Could you share details on how to run it? TIA