Hi community, I was having a look at the kafka connector DataHub #contribute-code

breezy-guitar-97226

02/08/2022, 8:17 AM

Hi community, I have been looking at the kafka connector code these days in order to add stateful ingestion to it, and I noticed that there are two cases that produce warnings:
1. Schema not found (https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/kafka.py#L165-L166)
2. Schema is not AVRO (https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/kafka.py#L175-L177)

While I can see specific scenarios where this behaviour could be useful, both of these warnings can be misleading when a commit policy of ON_NO_ERRORS_AND_NO_WARNINGS is selected, because they are the outcome of perfectly normal situations (e.g. a topic with no schema or one using a different schema strategy, or a topic with a non-AVRO schema). My proposal would be to either not report the warnings at all, or to add a couple of configuration parameters to the source:
• ignore_warnings_on_missing_schema
• ignore_warnings_on_schema_type

to selectively disable them. wdyt?
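To make the proposal concrete, here is a minimal standalone sketch of how the two flags could gate the warnings. It does not use the real DataHub classes: KafkaSourceConfigSketch, ReportSketch, check_topic_schema and can_commit are hypothetical names invented for illustration, and only the two flag names and the ON_NO_ERRORS_AND_NO_WARNINGS policy name come from the message above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class KafkaSourceConfigSketch:
    # Hypothetical stand-in for the source config; the flag names follow the proposal.
    # Both default to False, so warnings are still reported unless a user opts out.
    ignore_warnings_on_missing_schema: bool = False
    ignore_warnings_on_schema_type: bool = False


@dataclass
class ReportSketch:
    # Hypothetical stand-in for a source report: topic name -> list of warning reasons.
    warnings: Dict[str, List[str]] = field(default_factory=dict)

    def report_warning(self, key: str, reason: str) -> None:
        self.warnings.setdefault(key, []).append(reason)


def check_topic_schema(
    topic: str,
    schema_type: Optional[str],
    config: KafkaSourceConfigSketch,
    report: ReportSketch,
) -> None:
    """Record a warning for a missing or non-AVRO schema unless the matching flag is set."""
    if schema_type is None:
        if not config.ignore_warnings_on_missing_schema:
            report.report_warning(topic, "schema not found")
    elif schema_type != "AVRO":
        if not config.ignore_warnings_on_schema_type:
            report.report_warning(topic, f"unsupported schema type: {schema_type}")


def can_commit(report: ReportSketch) -> bool:
    # Simplified model of an ON_NO_ERRORS_AND_NO_WARNINGS style policy:
    # errors are not modelled here, so any recorded warning blocks the commit.
    return not report.warnings
```

With the default config, a schemaless topic records a warning and can_commit returns False; with both flags set to True, the same topics leave the report empty and the commit goes through.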

incalculable-ocean-74010

02/08/2022, 5:27 PM

Hello Claudio! You bring up a very good point! I think the best way forward would be to add both of those configuration parameters, disabled by default so as not to break UX initially. As they get used, we can think about switching the defaults. Thoughts?

breezy-guitar-97226

02/09/2022, 8:40 AM

Hi Pedro, yes, I agree with your analysis. It shouldn’t be hard to achieve 😉

teamwork 1

breezy-guitar-97226

02/17/2022, 12:26 PM

FYI: I opened a PR for this: https://github.com/linkedin/datahub/pull/4169

😍 1
