# troubleshoot
l
Hey guys, I was wondering how folks are dealing with scenarios where the datastore for datahub_gms becomes unavailable. I noticed that during an ingestion run it just publishes an event to Kafka indicating that the record failed and moves on, but is there anything we can set to enable automatic retries? AFAIK those failed topics just log the events there.
h
@early-lamp-41924, could you help answer this?
e
As of now, we don’t have retries set up on the MCP/MCE processor side (which consumes events from the Kafka topic and sends the request to GMS). https://github.com/linkedin/datahub/blob/master/metadata-jobs/mce-consumer/src/mai[…]m/linkedin/metadata/kafka/MetadataChangeProposalsProcessor.java We haven’t been able to invest in this area much over the last few months, but would love contributions on this!!
l
Sure! Did you have anything in mind for how retries would be conducted? A very simple one I can think of is to perform a few exponential-backoff retries before emitting a failed MCP when the datastore isn't available.
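A minimal sketch of that backoff idea (all names here are illustrative, not the actual DataHub API; the real consumer logic lives in MetadataChangeProposalsProcessor):

```java
import java.util.function.Supplier;

public class RetryingEmitter {
    // Retry a request with exponential backoff before giving up.
    // On exhaustion, rethrow so the caller can emit the failed MCP
    // to the failure topic as it does today.
    public static <T> T withBackoff(Supplier<T> request, int maxAttempts, long baseDelayMs)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                last = e;
                // 1x, 2x, 4x, ... the base delay between attempts
                Thread.sleep(baseDelayMs * (1L << attempt));
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fails twice, then succeeds -- simulates a datastore coming back.
        String result = withBackoff(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("datastore unavailable");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```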
e
Yeah. Also take a look at the error codes and the exceptions thrown.
We shouldn’t retry for all types of exceptions thrown by rest.li, as some indicate real failures where a retry probably won't help.
👍 1
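That classification could be as simple as a predicate over the HTTP status rest.li reports. A hedged sketch (the exact status codes GMS returns per failure mode are an assumption here, not confirmed by this thread):

```java
public class RetryPolicy {
    // Decide retryability from the HTTP status of the rest.li response.
    // Illustrative only: 5xx and 429 suggest a transient problem
    // (datastore down, overload) worth retrying; other 4xx codes mean
    // the request itself is bad and retrying will not help.
    public static boolean isRetryable(int httpStatus) {
        if (httpStatus == 429) {
            return true; // throttled; back off and retry
        }
        return httpStatus >= 500 && httpStatus < 600; // server-side failure
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(503)); // datastore unavailable -> true
        System.out.println(isRetryable(422)); // invalid payload -> false
    }
}
```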
l
Are those status codes expected to change as things get migrated to OpenAPI?
e
@orange-night-91387 any thoughts on this q?
o
Currently I'm trying to align as closely as possible with the existing endpoints, but am very much open to feedback for alternate/additional codes.
👍 1