# ingestion
i
Hi folks, when do you trigger an ingestion from a data source? For example, if I want to get the metadata about a Kafka broker (https://datahubproject.io/docs/metadata-ingestion/#kafka-metadata-kafka), I can run the 'datahub' CLI command manually and get that data. Is it expected that this would be set up to run on a schedule? Or perhaps it could be triggered as part of a DAG?
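For context, a minimal sketch of what that manual run looks like when driven from Python instead of the CLI. The broker address, DataHub REST endpoint, and config values are placeholder assumptions; `Pipeline.create` takes the same dict a recipe YAML file would contain:

```python
# Hypothetical sketch: programmatic equivalent of `datahub ingest -c kafka_recipe.yml`.
# The Kafka broker and DataHub server addresses below are assumptions.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "kafka",
            "config": {"connection": {"bootstrap": "localhost:9092"}},
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()  # fail loudly if the ingestion reported errors
```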
g
Exactly correct @icy-holiday-55016 - the recommended practice would be to run ingestion on a scheduler. We have an example of an Airflow DAG that can run ingestion here: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub_provider/example_dags/generic_recipe_sample_dag.py
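As a rough illustration of that approach (a sketch, not the linked example verbatim), here is a small Airflow 2.x DAG that shells out to the CLI on a daily schedule. The DAG name, schedule, and recipe path are assumptions:

```python
# Hypothetical sketch of a scheduled ingestion DAG; the linked
# generic_recipe_sample_dag.py is the canonical reference.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

default_args = {
    "owner": "datahub",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="datahub_kafka_ingestion",   # assumed DAG name
    default_args=default_args,
    schedule_interval="@daily",         # run once a day; adjust to taste
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:
    ingest_kafka = BashOperator(
        task_id="ingest_kafka_metadata",
        # assumed recipe location; the same file you would pass when running the CLI by hand
        bash_command="datahub ingest -c /opt/datahub/recipes/kafka_recipe.yml",
    )
```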
i
Cool, thanks @green-football-43791
s
You can use the CLI to run the commands, so you just need a scheduler - a Jenkins job, cron, or an Airflow DAG. Take your pick.
b
@bland-orange-95847 We don't have that today. However, if it's something we feel the community needs, we will of course consider building it! Is scheduling on your end going to be a challenge?
b
Okay, thanks. No, it's not a challenge. It will either get a K8s CronJob or Airflow DAG(s), depending on the number of items, so it's fine. Just wanted to make sure I don't miss something when planning the architecture and am prepared for upcoming releases.