How does `datahub ingest` command mentioned in <ht...
# getting-started
s
How does the `datahub ingest` command mentioned in https://datahubproject.io/docs/metadata-ingestion find DataHub's Kafka or REST endpoint? My use case is that I am thinking of running it via Jenkins for now. Jenkins will create a pod in the jenkins namespace of our K8s cluster and run the command there. DataHub is in the apps namespace of the same cluster. So I am not sure how to configure `datahub ingest` so that it knows the location of DataHub GMS and the frontend.
e
You will most likely use the REST endpoint for this. Since you are running in the same cluster, you don't need to set up an ingress on GMS (you do need one if you plan to have external tools talk to GMS). You can point to a port of a k8s service in the same cluster as follows:
http://<<service-name>>.<<namespace>>.svc.cluster.local:<<port>>
For instance, if you used the default values and started DataHub in the default namespace, you can use:
http://datahub-datahub-gms.default.svc.cluster.local:8080
s
The question is where do I set that. I was looking at https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/entrypoints.py but could not find anything for the URL in the config. The only option seems to be `-c` for config, which is for the recipe itself. Is there some environment variable somewhere for this?
b
correct
e
Yes!
g
Yup, it's in the recipe. One thing to add: you can use environment variables in the recipe and populate those fields via k8s, so you avoid hardcoding the URL in the recipe.
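To make the suggestion above concrete, here is a minimal sketch of a recipe that reads the GMS address from the environment. The source section is only a placeholder, and the variable names (`DATAHUB_GMS_URL`, `MYSQL_USER`, `MYSQL_PASSWORD`) and the MySQL host are hypothetical, not something from this thread.

```yaml
# recipe.yml: a sketch under the assumptions stated above
source:
  type: mysql                                   # placeholder source, swap in your own
  config:
    host_port: "mysql.example.internal:3306"    # hypothetical host
    username: "${MYSQL_USER}"                   # hypothetical variable name
    password: "${MYSQL_PASSWORD}"               # hypothetical variable name
sink:
  type: datahub-rest                            # REST sink, as suggested in the thread
  config:
    server: "${DATAHUB_GMS_URL}"                # hypothetical variable name, set by the pod
```

The Jenkins pod would then set `DATAHUB_GMS_URL` in its environment, for example to http://datahub-datahub-gms.apps.svc.cluster.local:8080 if the GMS service kept the default name and runs in the apps namespace (an assumption, check your actual service name), and run `datahub ingest -c recipe.yml`.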