# ingestion
f
Hello. I am getting an error trying to ingest metadata from Databricks via the Hive source. I am including my recipe and the captured output (after cleaning up some identifiers and keys). My DataHub deployment is in Azure Kubernetes Service, and I am running the default deployment created with the Helm commands documented in the getting started guide:
helm repo add datahub https://helm.datahubproject.io/
helm install prerequisites datahub/datahub-prerequisites
helm install datahub datahub/datahub
I also created the metadata ingestion source using the DataHub UI. Any ideas on how to fix this?
l
@careful-pilot-86309 ^
h
Hi. Please remove the https:// prefix from the host_port field in your recipe, i.e. simply replace the line
host_port: 'https://adb-<workspace url id>.azuredatabricks.net:443'
with
host_port: 'adb-<workspace url id>.azuredatabricks.net:443'
👍 1
s
Hi @hundreds-photographer-13496, is this true for all Hive sources or just Databricks? Maybe we can add a validator to the Hive source config, similar to https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/superset.py#L63, that removes the http part so no one else in the community faces the same issue?
h
Hi @square-activity-64562, I think this is generally true for all SQL (or rather SQLAlchemy) based sources. We do not include the scheme (http://, hive://) in the host_port field of the config; there is a separate scheme field in the config that takes care of this part. We can update the Databricks section in the DataHub Hive docs to mention host_port: <Databricks Server Hostname>:<Port> instead of host_port: <databricks workspace URL>:443. The details of the Databricks Server Hostname and Port are available at the link referenced in the doc.
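For illustration, here is a minimal sketch of the kind of check being discussed: stripping a leading scheme from a host_port value before it is used. This is not DataHub's actual code; the function name strip_scheme and the workspace id adb-1234 are made up for the example, and a real validator would live in the source's config class.

```python
import re

# Hypothetical helper for illustration only; not part of DataHub.
# Matches a leading URI scheme such as "http://", "https://" or "hive://".
_SCHEME_PREFIX = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.\-]*://")


def strip_scheme(host_port: str) -> str:
    """Return host_port without a leading 'scheme://' and without trailing slashes."""
    cleaned = _SCHEME_PREFIX.sub("", host_port.strip())
    return cleaned.rstrip("/")


if __name__ == "__main__":
    # "adb-1234" is a placeholder workspace id, not a real one.
    print(strip_scheme("https://adb-1234.azuredatabricks.net:443"))
```

A value that already has no scheme passes through unchanged, so applying a check like this would be harmless for recipes that are already correct.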
s
Thanks for clarifying, @hundreds-photographer-13496. I raised a small PR, https://github.com/linkedin/datahub/pull/4330, so people in the community don't face this again. I don't have Hive running, so I cannot test it out. Can you please review whether this looks fine to you?
👍 1
f
Thanks for the feedback. I will remove the http from host_port and give it a try. Updating the docs is a good idea as well.
I removed the http from host_port and this error no longer occurs.
👍 1