Hello, I hope you are all well, I have a question ...
# ingestion
f
Hello, I hope you are all well. I have a question about generating runIds, I hope you can help me please =) I currently ingest through a recipe with the glue plugin, and I would like to persist the runId to a DB so I can use it later for a rollback if necessary. The ingestion response does not return the associated runId, and when I get the runId through the command (datahub ingest list-runs) it only returns the last 10 ingestions. It also seems that the runId is not generated immediately, so it is difficult to associate my job's pipeline execution with a specific runId. Is there any way to obtain the runId associated with an ingestion? And how can I get all the past runIds that are not listed in the first 10? Thank you very much!
g
Hey @freezing-farmer-89710 - the run id is already persisted into the metadata_aspect_v2 table, in the systemmetadata column!
f
@green-football-43791 thank you very much, I will check it 🙌
@green-football-43791 I already checked the table in my local environment with the following query: select distinct json_extract(systemmetadata,'$.runId') from metadata_aspect_v2; However, in the production environment I am not the owner of the infrastructure, I only consume it. Is there a REST way to get this information? And how can I link these runIds to a specific ingestion? The recipe is executed from GitLab CI and I would like to link the pipeline number to the ingestion. Also, when I list the runIds through the command, I don't see the runId immediately after the ingestion; it takes a while, right?
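For anyone reading along, here is a minimal Python sketch of what that query does, for the case where the systemmetadata values have already been fetched as JSON strings. The function name `distinct_run_ids` and the sample rows are made up for illustration; the only thing taken from the thread is that each systemmetadata value is JSON with a runId key.

```python
import json
from typing import Iterable, List


def distinct_run_ids(systemmetadata_values: Iterable[str]) -> List[str]:
    """Return the distinct runId values, in first-seen order.

    Mirrors: select distinct json_extract(systemmetadata,'$.runId')
    Rows that are empty or have no runId key are skipped.
    """
    seen: List[str] = []
    for raw in systemmetadata_values:
        if not raw:
            continue
        run_id = json.loads(raw).get("runId")
        if run_id is not None and run_id not in seen:
            seen.append(run_id)
    return seen


# Hypothetical sample rows, not real data from any deployment.
sample_rows = [
    '{"lastObserved": 1646128800000, "runId": "glue-2022_03_01-10_00_00"}',
    '{"lastObserved": 1646128800000, "runId": "glue-2022_03_01-10_00_00"}',
    '{"lastObserved": 1646132400000, "runId": "redshift-2022_03_01-11_00_00"}',
    '{"lastObserved": 1646132400000}',
]
print(distinct_run_ids(sample_rows))
```

This only helps when you can read the table (or an export of it); it does not address the question of REST access.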
g
Ah hmm, it should show up immediately
Do you see it after some delay?
What do you see in the system_metadata of that aspect you had just ingested?
f
I did not find a way to link the ingestion to the runId directly: the only thing I have is the response at the end of the recipe ingestion, and since another area owns the infrastructure, I only consume it. So I run a script that lists the runIds and picks the latest one for the ingested source type. I tried it, and the runId does not appear immediately. I think that because different sources such as glue, redshift and looker are ingested in parallel, the information is queued and the runId only shows up once the ingestion finishes, so it's not immediate. I understand that this is how it works. Hopefully in the future you could add the runId to the ingestion response; that would help to track changes and perform rollbacks with more precision. 😃
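The workaround described above (list the runIds, then take the latest one for a given source type) could be sketched roughly like this in Python. This is not the actual script from the thread. It assumes the runIds look like `<source_type>-<YYYY_MM_DD>-<HH_MM_SS>`, which is an assumption about the default naming and should be checked against the runIds your instance really produces; `latest_run_id` and the sample list are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional

# ASSUMPTION: run ids are "<source_type>-<YYYY_MM_DD>-<HH_MM_SS>".
TIMESTAMP_FORMAT = "%Y_%m_%d-%H_%M_%S"


def latest_run_id(run_ids: Iterable[str], source_type: str) -> Optional[str]:
    """Pick the most recent run id for one source type, or None if there is none."""
    prefix = source_type + "-"
    candidates = []
    for run_id in run_ids:
        if not run_id.startswith(prefix):
            continue
        try:
            started = datetime.strptime(run_id[len(prefix):], TIMESTAMP_FORMAT)
        except ValueError:
            # Does not match the assumed layout, so it cannot be ordered; skip it.
            continue
        candidates.append((started, run_id))
    return max(candidates)[1] if candidates else None


# Hypothetical run ids, e.g. collected from the list-runs output.
listed = [
    "glue-2022_03_01-10_00_00",
    "redshift-2022_03_01-11_00_00",
    "glue-2022_03_01-12_30_00",
    "no-run-id-provided",
]
print(latest_run_id(listed, "glue"))
```

Note that "latest for this source type" is only a guess at which run belongs to a given CI pipeline: if two jobs ingest the same source type around the same time, or the runId shows up late as described above, the script can pick the wrong one.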
g
ah, I think that makes a lot of sense @freezing-farmer-89710 - great suggestion
f
@green-football-43791 thank you very much =P, have a nice day