# getting-started
h
We just tried to update from 0.6.1 to 0.7.0 and suddenly the MCE isn’t consuming messages anymore. No errors in the logs, and no configs have been changed. Any ideas as to what could be the issue?
i
Elasticsearch upgrade from v5 to v7?
h
yes
seems to have gone through fine, but I nuked the old volume and now I’m trying to push some new data, and it’s not happening. No reaction from MCE even though the messages exist in the Kafka topic
i
Perhaps the MCE has changed schema in the version in a non-backwards-compatible way? Try comparing your MCE messages with https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/mce_files/bootstrap_mce.json
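For reference, a quick way to spot schema drift could be to diff the field names of one of your produced MCEs against an entry from that bootstrap_mce.json file; a rough sketch (the file paths are placeholders, and it assumes the bootstrap file is a JSON array of MCEs):
```python
import json

# Paths are placeholders: one of your own produced MCE payloads vs. the
# bootstrap example file linked above.
with open("my_mce.json") as f:
    mine = json.load(f)
with open("bootstrap_mce.json") as f:
    reference = json.load(f)[0]  # assuming the file is a JSON array of MCEs

# A missing or extra top-level field is a quick hint that the producer is
# still on an out-of-date schema.
print("missing fields:", set(reference) - set(mine))
print("extra fields:  ", set(mine) - set(reference))
```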
h
Good idea! I checked my producer, and it was indeed using an out-of-date schema. I updated it, but still no messages are being processed.
I can see from kafka that the messages are being consumed (offset is updated, lag is 0).
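For reference, a rough sketch of that offset/lag check with confluent_kafka (the broker address, topic name, and consumer group id are assumptions; adjust them to your deployment):
```python
from confluent_kafka import Consumer, TopicPartition

# Broker, topic, and group id below are assumptions about this deployment.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "mce-consumer-job-client",
    "enable.auto.commit": False,
})

tp = TopicPartition("MetadataChangeEvent_v4", 0)
low, high = consumer.get_watermark_offsets(tp, timeout=10)
committed = consumer.committed([tp], timeout=10)[0].offset

# lag == 0 means the consumer group has read everything in the partition,
# even if nothing ever shows up in the MCE application logs.
print(f"high watermark={high} committed={committed} lag={high - committed}")
consumer.close()
```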
i
No messages are processed but offset is updated? Is that it?
h
yeah, so the consumer clearly has a connection to Kafka (I can see the offset being updated on both ends), but the logs in the MCE show no activity and nothing gets forwarded to the GMS. I’m now turning on debug logging to see if it reveals something.
l
@early-lamp-41924 ^ this might be related to the silent error you were seeing
h
So I got the debug logs on, and now I see that events are indeed received, but I see strange stuff like this in the log:
so the key seems to be serialized / deserialized wrong?
g
Is it possible that the emitter has an issue? How are you producing these events?
If you emit events directly to REST, is that successful?
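For reference, a rough sketch of what a direct REST ingest against GMS could look like (the host/port, rest.li endpoint, and payload shape are assumptions based on the docs of that era; check the docs for your version):
```python
import json
import requests

GMS = "http://localhost:8080"  # assumption: GMS reachable locally on 8080

# Placeholder snapshot body; copying a proposedSnapshot entry from
# bootstrap_mce.json is an easy starting point.
snapshot = {}

resp = requests.post(
    f"{GMS}/entities?action=ingest",
    headers={"X-RestLi-Protocol-Version": "2.0.0"},
    data=json.dumps({"entity": {"value": snapshot}}),
)
print(resp.status_code, resp.text)
```
If the REST path works while the Kafka path stays silent, that points at the emitter or the serialization rather than at GMS.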
h
Yes, after checking the emitter it does seem like we’ve been Avro-serializing the key as well. No clue why this worked before the update.
But the MCE consumer could be a bit more verbose in these situations imo
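For reference, a minimal sketch of the corrected produce call (confluent_kafka, the broker address, and the topic name are assumptions; the point is that the key goes out as a plain UTF-8 string rather than through the Avro serializer):
```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # broker is an assumption

urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,foobar,PROD)"
mce_value = b"..."  # placeholder for the Avro-encoded MetadataChangeEvent payload

# The key is the URN as a plain UTF-8 string; running it through the Avro
# serializer as well prepends extra header bytes, which is what showed up
# as the mangled key in the consumer's debug log.
producer.produce("MetadataChangeEvent_v4", key=urn.encode("utf-8"), value=mce_value)
producer.flush()
```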
g
yes, good point. While you have all the details in your head, would you mind filing an issue?
to add verbosity to MCE in this case
l
@high-hospital-85984 were you able to fix the issue or is the pipeline still stalled?
will be good to publish a runbook for people who face similar issues after upgrading
h
@green-football-43791 Looking at the code, it does seem like the Consumer is quite verbose already (error logging and sendFailedMCE). The debug log I’m seeing looks like this:
13:44:20.181 [mce-consumer-job-client-0-C-1] DEBUG o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer - Received: 1 records
13:44:20.181 [mce-consumer-job-client-0-C-1] DEBUG o.s.k.l.a.RecordMessagingMessageListenerAdapter - Processing [GenericMessage [payload={"auditHeader": null, "proposedSnapshot": <REMOVED_STUFF_HERE>}]}, "proposedDelta": null}, headers={kafka_offset=11314, kafka_consumer=org.apache.kafka.clients.consumer.KafkaConsumer@621b2105, kafka_timestampType=CREATE_TIME, kafka_receivedMessageKey=�urn:li:dataset:(urn:li:dataPlatform:snowflake,foobar,PROD), kafka_receivedPartitionId=0, kafka_receivedTopic=MetadataChangeEvent_v4, kafka_receivedTimestamp=1616420660132}]]
which might indicate that the problem is actually upstream (RecordMessagingMessageListenerAdapter seems to be part of the Kafka adapter in the Spring framework). So if the event is bad enough, it never reaches the MCE and nothing is logged. I’ll create an issue about this now.
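For reference, a small sketch of why the key renders with a stray leading character: the Confluent Avro serializer prepends a magic byte and a 4-byte schema id (plus a varint length for a string), so a consumer that expects a plain string key sees those extra bytes at the front (the schema id below is just a placeholder):
```python
import struct

urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,foobar,PROD)"

def confluent_avro_string(value: str, schema_id: int = 1) -> bytes:
    """Approximate Confluent wire format for an Avro string:
    0x00 magic byte + 4-byte schema id + zig-zag varint length + UTF-8 bytes."""
    body = value.encode("utf-8")
    n = len(body) << 1  # zig-zag encoding of a non-negative long
    varint = b""
    while n > 0x7F:
        varint += bytes([(n & 0x7F) | 0x80])
        n >>= 7
    varint += bytes([n])
    return b"\x00" + struct.pack(">I", schema_id) + varint + body

print(urn.encode("utf-8")[:10])         # starts with b'urn:li:...'
print(confluent_avro_string(urn)[:10])  # starts with the binary header, hence the stray character in the log
```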
@loud-island-88694 I tried out a quick fix locally and it seems to work (though I did not see the error locally in the first place either, which was strange). I’ll try it in our staging environment tomorrow (EET).
l
ok thanks!
h
So we now fixed the emitter and the kafka_receivedMessageKey now looks ok in the debug logs in the MCE consumer, but the messages do not get to the MetadataChangeEventsProcessor.consume function.
Small update: the problem was actually a non-issue. Messages were in fact getting through to GMS, but they were not logged because they didn’t update anything (we just kept rerunning the same testing script). 🤦 Better logging (at least at debug level) would have helped here.
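For reference, one way to avoid that trap could be to vary a mutable field in the test payload on every run, so each re-run is a real update that GMS will record and log; the field name here is just a placeholder:
```python
import copy
import time

def tag_for_rerun(aspect: dict) -> dict:
    """Return a copy of a test aspect with a per-run marker appended to a
    mutable field ('description' is a placeholder), so re-running the same
    test script produces an actual change instead of a silent no-op."""
    tagged = copy.deepcopy(aspect)
    marker = f"(test run {int(time.time())})"
    tagged["description"] = f"{tagged.get('description', '')} {marker}".strip()
    return tagged
```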
l
that's fair. We are starting to add instrumentation and health checks across the stack. @early-lamp-41924 @gray-shoe-75895 ^