# getting-started
q
hi everyone, I have a question regarding ingestion scheduling. What is the best way to orchestrate periodic ingestion runs into DataHub? I would like to use Airflow for that. Does anybody have experience with this? Thanks a lot for your help
m
Hi @quick-raincoat-6968, yes, most folks in the community use Airflow for scheduling it.
we have a few sample DAGs in the repo
q
I need to be more specific. I already have Airflow running, but on a different machine from the one where I'm running DataHub. So how would I trigger something like the command "datahub ingest -c ..." from my Airflow?
or will I just need to specify the path to my datahub-gms in the pipeline from your example?
thanks a lot for your insights
m
if you are running a local airflow already, are you following this guide? https://datahubproject.io/docs/docker/airflow/local_airflow/
if so, you just have to add a DAG with the recipe to your Airflow installation, roughly like the sketch below
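A minimal sketch of such a DAG (not one of the repo's sample DAGs; it assumes the datahub CLI and a recipe file are available on the Airflow worker, and the recipe path and schedule are placeholders):
```python
# Minimal sketch: run the datahub CLI on a schedule with a BashOperator.
# Assumes the acryl-datahub CLI and a recipe file exist on the worker.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="datahub_ingest_recipe",
    schedule_interval="@daily",  # adjust to your cadence
    start_date=datetime(2023, 1, 1),
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    ingest = BashOperator(
        task_id="run_datahub_ingest",
        # Hypothetical recipe path; point it at your own recipe file.
        bash_command="datahub ingest -c /opt/airflow/recipes/my_recipe.yml",
    )
```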
q
no, I'm running Airflow on a Google Compute Engine instance, where I use it to orchestrate other things, and now I want to orchestrate the DataHub ingestion as well. I'm looking for the best way to launch the ingestion jobs from there. Does this make sense?
hi, sorry, another question. I would like to follow your suggestion and use datahub.ingestion.run.pipeline. We need the requests to the datahub-rest sink to be authorized, so that others are denied the connection. I couldn't find proper documentation on how to configure the sink in a more sophisticated way; can you point me to some resources? Thanks a lot
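(For context, a minimal sketch of what such a programmatic pipeline with an authenticated datahub-rest sink could look like, assuming the Python Pipeline API and that Metadata Service authentication is enabled; the source type, server URL, and token are placeholders, not the asker's actual setup:)
```python
# Minimal sketch: run an ingestion pipeline in code and send metadata to a
# GMS endpoint that requires a personal access token on the datahub-rest sink.
import os

from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            # Hypothetical example source; replace with your own.
            "type": "postgres",
            "config": {
                "host_port": "my-db-host:5432",
                "database": "analytics",
                "username": "datahub_reader",
                "password": os.environ["DB_PASSWORD"],
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {
                # Point at your datahub-gms endpoint.
                "server": "http://datahub-gms:8080",
                # Token used when Metadata Service authentication is enabled.
                "token": os.environ["DATAHUB_TOKEN"],
            },
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```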