# advice-metadata-modeling
b
Hey everyone, I am trying to build a custom connector for Alteryx. Alteryx is an orchestration/ETL tool where users can connect to almost any data source, apply transformations, and then output the data as a Tableau extract, a CSV, a Snowflake table, or similar. As Alteryx is currently not one of the sources I can add in DataHub, I was wondering if you have a hint on where I can get started building a custom connector? 🙂 Thanks!
I have found this on GitHub already, but the very brief description of adding custom sources does not really help me.
a
Hi @bitter-park-52601, here are some docs that could be helpful:
h
p
fwiw I did something similar (custom ingestion sources) for Dataiku, a competing product to Alteryx
b
Super interesting @proud-table-38689! Is there a chance I could take a look at how you did that? 🙂
a
We did it with a Python script that uses the Alteryx Gallery API to download the execution logs. Then we parse the execution logs to identify the sources and sinks. It's dirty and needs constant maintenance to parse new sources. Alteryx workflows are XML, so they are easy to parse, but some components are dynamic and can change the data source/destination at execution time.
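For readers who want to try the XML route mentioned above, here is a minimal sketch of pulling static sources and sinks out of a workflow file. It is not the log-based script described in this thread. The sample document, the `Plugin` names ending in `DbFileInput`/`DbFileOutput`, and the `Properties/Configuration/File` path are assumptions about how a workflow might be laid out, and `extract_sources_and_sinks` is a hypothetical helper name. Dynamic input tools would not be covered by this approach.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, simplified workflow document used only for illustration.
SAMPLE_WORKFLOW = """<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput" />
      <Properties><Configuration><File>sales.csv</File></Configuration></Properties>
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileOutput.DbFileOutput" />
      <Properties><Configuration><File>sales_clean.hyper</File></Configuration></Properties>
    </Node>
    <Node ToolID="3">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter" />
      <Properties><Configuration /></Properties>
    </Node>
  </Nodes>
</AlteryxDocument>"""


def extract_sources_and_sinks(xml_text: str) -> Tuple[List[str], List[str]]:
    """Return (sources, sinks) found in statically configured input/output tools."""
    root = ET.fromstring(xml_text)
    sources: List[str] = []
    sinks: List[str] = []
    for node in root.iter("Node"):
        gui = node.find("GuiSettings")
        file_el = node.find("Properties/Configuration/File")
        # Skip tools that have no plugin name or no configured file/connection.
        if gui is None or file_el is None or not file_el.text:
            continue
        plugin = gui.get("Plugin", "")
        path = file_el.text.strip()
        if plugin.endswith("DbFileInput"):
            sources.append(path)
        elif plugin.endswith("DbFileOutput"):
            sinks.append(path)
    return sources, sinks


if __name__ == "__main__":
    print(extract_sources_and_sinks(SAMPLE_WORKFLOW))
```

Matching on the plugin suffix keeps the sketch short; a real parser would likely need one rule per tool type, which is the maintenance cost mentioned above.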
b
I see, sounds somewhat messy. For ingestion, did you use DataHub’s OpenAPI or did you build a custom ingestion source for Alteryx? @acoustic-rose-68681
a
Yes, it is messy, but we get >95% of our workflows into DataHub, with lineage. Before, we had nothing. Alteryx's own solution for data governance does not support the dynamic input tools. We are using the Python API for custom ingestion.
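As a rough illustration of pushing parsed sources and sinks to DataHub with the Python API, here is a minimal sketch. It is not the script used in this thread. It assumes the `acryl-datahub` package is installed, that a DataHub GMS endpoint is reachable at the URL you pass in, and that `make_lineage_mce` and `DatahubRestEmitter` behave as in the lineage emitter example shipped with DataHub. The helper names `dataset_urn`, `build_upstream_map`, and `emit_lineage`, and the platform and table names, are hypothetical.

```python
from typing import Dict, List, Tuple


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN string by hand."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def build_upstream_map(
    sources: List[Tuple[str, str]], sinks: List[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map every sink URN to the URNs of all sources of the same workflow.

    Each entry is a (platform, name) pair, e.g. ("snowflake", "db.schema.table").
    """
    upstreams = [dataset_urn(platform, name) for platform, name in sources]
    return {dataset_urn(platform, name): list(upstreams) for platform, name in sinks}


def emit_lineage(upstreams_by_sink: Dict[str, List[str]], gms_server: str) -> None:
    """Send one lineage event per sink to DataHub (requires acryl-datahub)."""
    # Imported lazily so the pure helpers above work without the package.
    from datahub.emitter.mce_builder import make_lineage_mce
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter(gms_server=gms_server)
    for sink_urn, upstream_urns in upstreams_by_sink.items():
        emitter.emit_mce(make_lineage_mce(upstream_urns, sink_urn))


if __name__ == "__main__":
    lineage = build_upstream_map(
        sources=[("snowflake", "db.raw.sales")],
        sinks=[("snowflake", "db.mart.sales_clean")],
    )
    print(lineage)
    # Uncomment once a DataHub instance is available:
    # emit_lineage(lineage, "http://localhost:8080")
```

This treats every source of a workflow as upstream of every sink, which is coarse but matches what can be recovered without tracing the individual tools.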
p
yes I’ll try and get some sample code for you, please give me a few moments
b
Awesome, thanks!
@acoustic-rose-68681 that also sounds very interesting! Is there any way you could share some sample code showing how you push the data to DataHub after parsing the XMLs? 🙂
a
@bitter-park-52601 I'm not parsing the workflow XML at this time. I'm parsing the execution logs with a bunch of regexes. It is very hacky, high-maintenance, and only good for a POC.
g
@proud-table-38689 Could you tell us more about the ingestion you do from Dataiku? We are Dataiku users as well, and integration with DH is on our wishlist. So we are interested in your approach and in what you harvest from Dataiku. Anything you care to share, really...
p
oh yes, sorry, I owe a code snippet here, but I think “datasets” and “projects” in Dataiku are good things to store in DataHub
a
Hey y’all, I went ahead and created #integration-alteryx-datahub in case you want a dedicated channel to discuss this! cc: @mammoth-bear-12532