# troubleshoot
l
Hey guys, I was wondering how folks are dealing with scenarios where the datastore for datahub_gms becomes unavailable. I noticed that during an ingestion run it just publishes an event to Kafka indicating that the record failed and moves on, but is there anything we can set to enable automatic retries? AFAIK those failed topics just log the events there.
h
@early-lamp-41924, could you help answer this?
e
As of now, we don’t have retries set up on the MCP/MCE processor side (which consumes events from the Kafka topic and sends the request to GMS). https://github.com/linkedin/datahub/blob/master/metadata-jobs/mce-consumer/src/mai[…]m/linkedin/metadata/kafka/MetadataChangeProposalsProcessor.java We haven’t been able to invest in this area much over the last few months, but would love contributions on this!!
l
Sure! Did you have anything in mind for how retries would be conducted? A very simple one I can think of is to perform a few exponential-backoff retries before emitting a failed MCP when the datastore isn't available.
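A minimal sketch of that backoff idea (all names here are illustrative, not the actual DataHub API; the real consumer logic lives in MetadataChangeProposalsProcessor):

```java
import java.util.function.Supplier;

public class RetryingEmitter {
    // Retry a request with exponential backoff before giving up.
    // On exhaustion, rethrow so the caller can emit the failed MCP
    // to the failure topic as it does today.
    public static <T> T withBackoff(Supplier<T> request, int maxAttempts, long baseDelayMs)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                last = e;
                // 1x, 2x, 4x, ... the base delay between attempts
                Thread.sleep(baseDelayMs * (1L << attempt));
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fails twice, then succeeds -- simulates a datastore coming back.
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("datastore unavailable");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```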
e
Yeah. Also take a look at the error codes and the exceptions thrown.
We shouldn’t retry for all types of exceptions thrown by rest.li, as some indicate real failures where a retry probably won't help.
👍 1
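That classification could be as simple as a predicate over the HTTP status rest.li reports. A hedged sketch (the exact status codes GMS returns per failure mode are an assumption here, not confirmed by this thread):

```java
public class RetryPolicy {
    // Decide retryability from the HTTP status of the rest.li response.
    // Illustrative only: 5xx and 429 suggest a transient problem
    // (datastore down, overload) worth retrying; other 4xx codes mean
    // the request itself is bad and retrying will not help.
    public static boolean isRetryable(int httpStatus) {
        if (httpStatus == 429) {
            return true; // throttled; back off and retry
        }
        return httpStatus >= 500 && httpStatus < 600; // server-side failure
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(503)); // datastore unavailable -> true
        System.out.println(isRetryable(422)); // invalid payload -> false
    }
}
```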
l
Are those status codes expected to change as things get migrated to OpenAPI?
e
@orange-night-91387 any thoughts on this q?
o
Currently I'm trying to align as closely as possible with the existing endpoints, but am very much open to feedback for alternate/additional codes.
👍 1