# ingestion
w
I’m running into a problem using the ingestion recipe framework: the schema registry in our Kafka cluster is rejecting all the messages being produced to our MetadataChangeEvent_v4 topic with the following error:
```
{'error': KafkaError{code=INVALID_RECORD,val=87,str="Broker: Broker failed to validate record"}, 'msg': <cimpl.Message object at 0x7f185b5d6a70>}
```
This is using the `acryl-datahub==0.3.4` package and the redshift source. I turned on debug, but it doesn’t seem to give much more information. Did something change in how these messages are produced? Comparing the older ingestion framework (0.6.1) with the new one, I think keys were basically dropped from the produced records. They used to include the URN as the key: https://github.com/linkedin/datahub/blob/v0.6.0/metadata-ingestion/sql-etl/common.py#L100. Now it’s not included: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/emitter/kafka_emitter.py#L62-L66. This causes our topics to fail because topic compaction no longer works. Is compaction something we shouldn’t be doing on these topics?
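For illustration, here’s a minimal sketch of the difference, using a plain confluent-kafka Producer rather than the Avro serializing producer the emitter actually wraps; the broker address, URN, and payload are placeholders:

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

# Placeholder entity URN and pre-serialized payload.
urn = "urn:li:dataset:(urn:li:dataPlatform:redshift,db.schema.table,PROD)"
payload = b"<avro-serialized MetadataChangeEvent>"

# Old framework (v0.6.0 common.py): the URN was passed as the record key,
# so log compaction could keep the latest record per entity.
producer.produce(topic="MetadataChangeEvent_v4", key=urn, value=payload)

# New emitter (before the fix): no key is set, so records have a null key,
# which broker-side validation on a compacted topic rejects (INVALID_RECORD).
producer.produce(topic="MetadataChangeEvent_v4", value=payload)

producer.flush()
```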
FYI @early-lamp-41924
e
@gray-shoe-75895 any ideas?
g
Huh, interesting. It seems like the key is normally not required, but it is required if you want to use compaction (per https://stackoverflow.com/questions/29511521/is-key-required-as-part-of-sending-messages-to-kafka). I think it makes sense for us to include a key in the messages we emit via the ingestion framework so that compaction works.
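For context: a compacted topic retains only the latest record per key, so a record with no key gives the broker nothing to compact on. Here’s a minimal sketch of creating such a topic with confluent-kafka’s admin client (the broker address and topic settings are placeholders):

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# cleanup.policy=compact tells Kafka to keep the most recent record per key
# and discard older ones; keyless records defeat that model entirely.
topic = NewTopic(
    "MetadataChangeEvent_v4",
    num_partitions=1,
    replication_factor=1,
    config={"cleanup.policy": "compact"},
)
futures = admin.create_topics([topic])
futures["MetadataChangeEvent_v4"].result()  # raises if creation failed
```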
@white-beach-27328 do you think this will fix it? https://github.com/linkedin/datahub/pull/2634
w
I think so? I can’t say concretely without testing the source code myself, but it’s my best guess.
I can try to update the ingestion with that branch and run it
m
@white-beach-27328 yes this seems to be the right fix.
w
Is there a test for the produced Kafka messages? I’m all for solving the problem quickly, but we may want a test so this doesn’t regress if someone later removes the key from the produced records.
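As a rough sketch of the shape such a test could take (the `RecordingProducer` stub and `emit_mce` helper are hypothetical stand-ins, not the real emitter API):

```python
class RecordingProducer:
    """Stands in for confluent_kafka.Producer and records produce() calls."""

    def __init__(self):
        self.records = []

    def produce(self, topic, key=None, value=None, **kwargs):
        self.records.append({"topic": topic, "key": key, "value": value})

    def flush(self, timeout=None):
        return 0


def emit_mce(producer, urn, payload):
    # Hypothetical emit helper; the real emitter Avro-serializes the MCE.
    # The behavior under test is that the URN goes in as the record key.
    producer.produce(topic="MetadataChangeEvent_v4", key=urn, value=payload)


def test_emitted_records_are_keyed():
    producer = RecordingProducer()
    urn = "urn:li:dataset:(urn:li:dataPlatform:redshift,db.schema.table,PROD)"
    emit_mce(producer, urn, b"<serialized mce>")
    # A null key breaks compaction, and compacted topics reject such records.
    assert all(r["key"] == urn for r in producer.records)
```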
g
yep there is a test (hence CI failed on the PR), but I wanted to unblock you
I’ll update the tests soon, but you can try out the change in the interim using:
```
pip install 'git+https://github.com/hsheth2/datahub.git@kafka-key#egg=acryl_datahub[datahub-kafka]&subdirectory=metadata-ingestion'
```
w
yeah let me give it a shot
I believe that worked
m
nice!
w
Thanks for the PR @gray-shoe-75895! Happy to see it merged!
How quickly do these changes make it out into the `acryl_datahub` package on PyPI?
g
It’s still pretty ad hoc right now. I’ll be cutting a new release today (will ping this thread when I do). Eventually we’ll move towards a more automated publishing model using GitHub Actions.
w
cool cool, yeah let me know. For now I can just lock to the git hash
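For reference, pinning to a specific commit looks just like the branch install above, with the commit SHA in place of the branch name (`<commit-sha>` is a placeholder):
```
pip install 'git+https://github.com/hsheth2/datahub.git@<commit-sha>#egg=acryl_datahub[datahub-kafka]&subdirectory=metadata-ingestion'
```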
g
@white-beach-27328 I just released acryl-datahub 0.4.0, which should have this change
w
heeeeey. Nice! Thanks for the quick turnaround!