Hi folks, when do you trigger an ingestion from a data source? For example, if I want to get the metadata about a Kafka broker (https://datahubproject.io/docs/metadata-ingestion/#kafka-metadata-kafka), I can run the 'datahub' CLI command manually and get that data.
Is it expected that this would be set up to run on a schedule? Or perhaps it could be triggered as part of a DAG?
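For context, a minimal Kafka ingestion recipe for that CLI run might look like the sketch below. The broker and DataHub GMS addresses are placeholders, not a prescribed setup:
```yaml
# kafka_recipe.yml -- a minimal sketch; the localhost addresses are assumed placeholders
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "localhost:9092"    # your Kafka broker
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"  # your DataHub GMS endpoint
```
You would then run it with `datahub ingest -c kafka_recipe.yml`.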
you can use the CLI to run the commands, so you just need a scheduler - a Jenkins job, cron, or an Airflow DAG. Take your pick
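For the Airflow option, a minimal sketch of such a DAG, assuming Airflow 2.x with the `datahub` CLI installed on the worker; the recipe path and schedule are hypothetical:
```python
# A minimal sketch: an Airflow DAG that shells out to the DataHub CLI on a schedule.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="kafka_metadata_ingestion",
    start_date=datetime(2021, 8, 1),
    schedule_interval="@hourly",  # how often to re-ingest broker metadata
    catchup=False,
) as dag:
    ingest_kafka = BashOperator(
        task_id="ingest_kafka",
        # Assumes the 'datahub' CLI is on the worker's PATH and the recipe
        # file is readable at this (hypothetical) path.
        bash_command="datahub ingest -c /opt/recipes/kafka_recipe.yml",
    )
```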
big-carpet-38439
08/06/2021, 4:12 PM
@bland-orange-95847 We don't have built-in scheduling (today). However, if it's something that we feel the community needs, we will of course consider building it! Is scheduling on your end going to be a challenge?
bland-orange-95847
08/09/2021, 5:13 AM
okay thanks. No, it’s not a challenge. It will either get a K8s CronJob or Airflow DAG(s) depending on the number of items, so it's fine. Just wanted to make sure I don’t miss something when planning the architecture and to be prepared for upcoming releases
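For the K8s route, a CronJob along these lines would work as a sketch. The image tag, schedule, and recipe ConfigMap are assumptions, not a prescribed setup:
```yaml
# A minimal sketch of a CronJob running the DataHub ingestion CLI.
# batch/v1 CronJob requires Kubernetes 1.21+; use batch/v1beta1 on older clusters.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-metadata-ingestion
spec:
  schedule: "0 * * * *"  # hourly; adjust as needed
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: datahub-ingest
              # Assumed image; pin a specific release tag in practice.
              image: acryldata/datahub-ingestion:head
              # Assumes the image's entrypoint is the 'datahub' CLI.
              args: ["ingest", "-c", "/recipes/kafka_recipe.yml"]
              volumeMounts:
                - name: recipe
                  mountPath: /recipes
          volumes:
            - name: recipe
              configMap:
                name: kafka-recipe  # hypothetical ConfigMap holding the recipe YAML
```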