# ingestion
good morning from Chicago 👋🏻 We've been hitting an issue recently where a failed MCE message can't be written to the `FailedMetadataChangeEvent_v4` topic. We have compaction enabled on that topic, but it seems DataHub is trying to write messages there without a key. (more details in thread)
so I see an error like the following in the MCE consumer's logs
17:32:04.819 [mce-consumer-job-client-0-C-1] INFO c.l.m.k.MetadataChangeEventsProcessor - Error while processing MCE: FailedMetadataChangeEvent - {error=java.lang.ClassCastException, metadataChangeEvent={proposedSnapshot={com.linkedin.metadata.snapshot.DataFlowSnapshot={urn=urn:li:dataFlow:(airflow,redacted_airflow_job_title,prod), aspects=
And then soon after
17:32:04.821 [kafka-producer-network-thread | producer-1] ERROR o.s.k.s.LoggingProducerListener - Exception thrown when sending a message with key='null' and payload='{"auditHeader": null, "metadataChangeEvent": {"auditHeader": null, "proposedSnapshot": {"urn": "urn:...' to topic FailedMetadataChangeEvent_v4:
org.apache.kafka.common.InvalidRecordException: This record has failed the validation on broker and hence will be rejected.
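For context, Kafka brokers reject a record with a null key when the topic's `cleanup.policy` includes `compact`, which matches the `key='null'` in the error above. Here is a minimal Python sketch of a pre-send guard that mirrors that rule. The function names are hypothetical and are not part of DataHub or any Kafka client; the policy string would have to come from wherever the topic config is looked up.

```python
from typing import Optional


def is_compacted(cleanup_policy: str) -> bool:
    """True if a cleanup.policy value such as 'compact' or 'compact,delete' enables compaction."""
    policies = {part.strip() for part in cleanup_policy.split(",")}
    return "compact" in policies


def check_record_key(key: Optional[bytes], cleanup_policy: str) -> None:
    """Hypothetical client-side guard: fail early instead of waiting for the broker to reject."""
    if key is None and is_compacted(cleanup_policy):
        raise ValueError("record has no key, but the target topic is compacted")
```

A guard like this would only move the failure from the broker to the producer; the actual fix is still either a keyed record or a non-compacted topic.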
We have compaction set up on that topic to reduce the resource usage of the topic, which we thought was safe to do based on the conversation in https://datahubspace.slack.com/archives/CUMUWQU66/p1622671383110500?thread_ts=1622670900.109800&cid=CUMUWQU66.
The setup in https://github.com/linkedin/datahub/tree/9b81fa428cfa10260b047c6656180f1d90a33978/docker/kafka-setup doesn't have an opinion about whether or not topics are compacted, as far as I can tell. Seems that it's taking whatever the default settings are for the cluster.
I'm not blocked by this because for now I'm just going to turn off compaction on that topic to prevent such errors. But would the team here consider one or both of the following?
• update https://github.com/linkedin/datahub/tree/9b81fa428cfa10260b047c6656180f1d90a33978/docker/kafka-setup with explicit guidance on whether or not to enable compaction on each of the necessary topics
• for the topics where you recommend compaction, ensure that DataHub always produces keyed messages to those topics
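To illustrate the second suggestion, here is a rough Python sketch of how a key could be derived for a failed event. It is an assumption for discussion, not how DataHub produces these messages: the function names and the `unkeyed:` fallback prefix are hypothetical, and the dict shapes are inferred from the two log lines above (a snapshot with a direct `urn` field, or one wrapped in a type name).

```python
import hashlib
import json
from collections.abc import Mapping
from typing import Any, Optional


def extract_urn(failed_event: Mapping[str, Any]) -> Optional[str]:
    """Find the entity URN inside a FailedMetadataChangeEvent-shaped dict, if present."""
    mce = failed_event.get("metadataChangeEvent")
    if not isinstance(mce, Mapping):
        return None
    snapshot = mce.get("proposedSnapshot")
    if not isinstance(snapshot, Mapping):
        return None
    # Shape 1: {"proposedSnapshot": {"urn": "..."}}
    if isinstance(snapshot.get("urn"), str):
        return snapshot["urn"]
    # Shape 2: {"proposedSnapshot": {"<snapshot type name>": {"urn": "..."}}}
    for inner in snapshot.values():
        if isinstance(inner, Mapping) and isinstance(inner.get("urn"), str):
            return inner["urn"]
    return None


def record_key(failed_event: Mapping[str, Any]) -> bytes:
    """Always return a non-null key: the URN when available, else a hash of the payload."""
    urn = extract_urn(failed_event)
    if urn is not None:
        return urn.encode("utf-8")
    payload = json.dumps(failed_event, sort_keys=True, default=str)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"unkeyed:{digest}".encode("utf-8")
```

One trade-off to keep in mind: keying by URN on a compacted topic means only the most recent failure per entity is eventually retained, which may or may not be what you want from a failure topic.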
other details:
• running DataHub 0.8.8
• Kafka topics are not created with the DataHub `kafka-setup` job, but through a different internal process for provisioning topics
šŸ™ 1