Hi! I was playing around with `datahub ingest list...
# ingestion
h
Hi! I was playing around with
datahub ingest list-runs
and got presented with something unexpected. Most IDs are random GUIDs, and one run that is suspiciously large by row count is called
no-run-id-provided
. We primarily use the Kafka sink. Is there a way of giving the runs a human-readable name for easier rollback?
m
The
no-run-id-provided
rows were ingested before the ingestion framework was able to attach run ids. I have been thinking about this readability issue as well. Today, run ids can be specified in the ingestion YAML as part of the config, e.g.
run_id: looker
will attach a static run_id to each run.
datahub ingest show --run-id RUN_ID
will give you a summary of that run, with a sample of the rows ingested
h
Oh nice! I expected there to be an option like that but failed to find it! 😀 Going to add it now to all our recipes
m
I was going to experiment with dynamic run ids using env var expansion. Let me know if you come up with something nifty.
h
First thought was to do
export RUN_ID_SUFFIX=$(date +%s)
before the run and have
run_id: looker_${RUN_ID_SUFFIX}
in the config. Works for us, as we run some preparation scripts before the actual ingestion anyway
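Roughly, the preparation script could look like the sketch below. The recipe file name `looker_recipe.yml` and the `RECIPE` variable are made up for illustration, and it assumes the recipe already contains the `run_id: looker_${RUN_ID_SUFFIX}` line so that the env var gets expanded when the recipe is loaded.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Unique suffix per run: seconds since the Unix epoch.
export RUN_ID_SUFFIX="$(date +%s)"

# Hypothetical recipe path (an assumption, not from the thread).
RECIPE="${RECIPE:-looker_recipe.yml}"

# The recipe is assumed to contain a line like:
#   run_id: looker_${RUN_ID_SUFFIX}
# so the run id for this invocation would come out as:
RUN_ID="looker_${RUN_ID_SUFFIX}"
echo "run id: ${RUN_ID}"

# Run the ingestion only when the CLI and the recipe are actually present.
if command -v datahub >/dev/null 2>&1 && [ -f "${RECIPE}" ]; then
  datahub ingest -c "${RECIPE}"
fi
```

Printing the run id up front means it ends up in the job log, so it can be passed to `datahub ingest show --run-id` later, or to a rollback if one is needed. A second-resolution timestamp would collide if two runs of the same recipe start within the same second, which is not a concern for us.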
m
hmm there might be a way to do it in code also ... hang on
l
h
That was quick! 😅
m
That's @loud-island-88694 executing 🏃