# ingestion
b
Hi everyone, what would be your suggestion for getting a CSV with metadata for a source into DataHub? Does using the File source make the most sense here?
s
The File source is for when you already have metadata in the appropriate format, not CSV format. What kind of metadata are you looking to ingest here?
b
My source is only accessible through an XML API. We've put the metadata for this source into CSV format.
Would I use File Sink to get my metadata into the appropriate format?
FYI, the data can easily be transformed into JSON/YAML
s
@bulky-jackal-3422 We released this https://datahubproject.io/docs/generated/ingestion/sources/csv today in the latest CLI https://github.com/acryldata/datahub/releases/tag/v0.8.38.3. Would be great if you can see if this helps you out. @echoing-airport-49548 worked on getting this out. Please let us know if this helps.
e
@bulky-jackal-3422 please let us know if you have any feedback here!
b
Oh wow! Thanks for the quick response. I was able to accomplish what I wanted by using the Python Emitter and parsing the CSV myself. I do think this is a useful implementation, though. Would it be possible to add descriptions to fields this way as well?
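For anyone following the same route, here is a minimal sketch of what "parsing the CSV myself and using the Python Emitter" could look like. It is not the code used in this thread. The column names `name` and `description`, the platform value, and the helper names are all hypothetical. The emitter calls are the ones I believe the `acryl-datahub` SDK exposes; older releases may require extra arguments on `MetadataChangeProposalWrapper`, so check against your installed version.

```python
import csv
import io
from typing import List, Tuple


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN by hand."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def parse_metadata_csv(csv_text: str, platform: str) -> List[Tuple[str, str]]:
    """Return (urn, description) pairs, skipping rows without a dataset name.

    Assumes hypothetical columns `name` and `description`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = []
    for row in reader:
        name = (row.get("name") or "").strip()
        if not name:
            continue
        description = (row.get("description") or "").strip()
        pairs.append((dataset_urn(platform, name), description))
    return pairs


def emit_descriptions(pairs: List[Tuple[str, str]], gms_server: str) -> None:
    """Send one datasetProperties aspect per dataset to DataHub over REST."""
    # Imported lazily so the parsing above works without the SDK installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server=gms_server)
    for urn, description in pairs:
        emitter.emit(
            MetadataChangeProposalWrapper(
                entityUrn=urn,
                aspect=DatasetPropertiesClass(description=description),
            )
        )
```

Usage would be something like `emit_descriptions(parse_metadata_csv(open("metadata.csv").read(), "hive"), "http://localhost:8080")`. Note that emitting an aspect this way replaces the existing `datasetProperties` for that dataset rather than merging with it.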
e
Yes, we plan to add support for descriptions soon!
s
Hi there, I'm trying the module csv-enricher and I have some questions about it. 1. Is the recipe to ingest the csv something like this?
source:
  type: csv-enricher
  config: /tmp/csv_test.csv
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms:8080'
2. If I understood correctly I don't need to install any plugin for it, right?
e
I would check whether the plugin is installed by default using
    datahub check plugins
but if not, you can install it using
    pip install acryl-datahub[csv-enricher]
I need to modify this file so it doesn’t reference my local environment 😅 but here’s what the recipe should look like https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/recipes/csv_enricher_to_datahub_rest.dhub.yml
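In case the linked file moves, here is a rough sketch of the same recipe shape expressed through the Python `Pipeline` API. It is not a copy of the linked example. One difference from the recipe in the question above: as far as I know, the csv-enricher source takes the CSV path under a `filename` key inside `config` rather than directly as the `config` value. The helper names `build_recipe` and `run_recipe` are hypothetical, and the path and server URL are placeholders.

```python
def build_recipe(csv_path: str, gms_server: str) -> dict:
    """Assemble a csv-enricher -> datahub-rest recipe as a plain dict."""
    return {
        "source": {
            "type": "csv-enricher",
            # Assumption: the source reads the CSV path from `filename`.
            "config": {"filename": csv_path},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": gms_server},
        },
    }


def run_recipe(recipe: dict) -> None:
    """Run the recipe in-process; needs `acryl-datahub[csv-enricher]` installed."""
    # Imported lazily so build_recipe can be used without the SDK.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

The same structure written as YAML can be run from a terminal with `datahub ingest -c <recipe file>`. Either way, the CSV has to be readable from the machine where the ingestion runs.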
s
Thank you very much Aditya! I've checked and the csv-enricher plugin is installed. Now I'm adapting your recipe and testing the ingestion. I'll let you know if it's working. Thanks
Worked 😉 Thanks
e
Amazing!
m
Hi @echoing-airport-49548, I'm trying your helpful recipe file, but I'm running into issues. For some reason, I'm unable to connect to the sink.config.server (server: "http://localhost:8080"). Do you know what might be causing this? I also tried https://datahub-gms:8080, but with that I get a file not found error
e
hmmm how are you running ingestion? is it through the UI or terminal?
m
Thanks for getting right back to me! I was trying to use the UI
e
Got it! Unfortunately any sources that rely on local files won’t work on UI ingestion, which is why it can’t find the file
I’d recommend you run CSV ingestion on the command-line
m
ah got it. I'll give that a try -- thank you!
e
No problem, let me know how it goes!
m
Worked great! Thanks again!
e
I’m so glad to hear it!