DataHub

Hi team,

I’m trying to ingest Kafka topics whose Avro schemas were generated by Apache Flink. However, Flink sets the name of its Avro records to “record”, which breaks their ingestion into DataHub because the Python Avro library treats “record” as a reserved name.

See, for example, <https://github.com/datahub-project/datahub/issues/2565>.
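To make the failure concrete, here is a minimal, stdlib-only sketch. The schema is a hand-written example shaped like Flink's output (a top-level record named “record”). `check_record_name` and its `RESERVED_NAMES` set are hypothetical and only approximate the reserved-name check in the Python Avro library; they are not the library's code. The `validate_names` flag stands in for the kind of switch AVRO-3680 adds.

```python
import json

# Assumption: an approximation of the type names the Python avro library
# refuses as user-defined record names (primitive and complex type names).
RESERVED_NAMES = frozenset({
    "null", "boolean", "int", "long", "float", "double", "bytes", "string",
    "record", "enum", "array", "map", "union", "fixed",
})

# Hand-written example shaped like a Flink-generated schema:
# the top-level record is literally named "record" and has no namespace.
FLINK_STYLE_SCHEMA = json.dumps({
    "type": "record",
    "name": "record",
    "fields": [{"name": "id", "type": "long"}],
})


def check_record_name(schema_str: str, validate_names: bool = True) -> str:
    """Return the record's full name; raise if it is reserved and validation is on."""
    schema = json.loads(schema_str)
    name = schema["name"]
    namespace = schema.get("namespace")
    if "." in name:
        fullname = name  # a dotted name is already a full name
    elif namespace:
        fullname = f"{namespace}.{name}"
    else:
        fullname = name
    if validate_names and fullname in RESERVED_NAMES:
        raise ValueError(f"{fullname} is a reserved type name.")
    return fullname
```

With validation on, the Flink-style schema is rejected; with it off, or when the record has a namespace, the name is accepted.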

This was indeed a limitation of the Python Avro library, but recent versions of the library add an option to disable this check.
Here is the Apache JIRA ticket: <https://issues.apache.org/jira/browse/AVRO-3680>.

Could we add support for this in DataHub?
We could either disable the check entirely or add a configuration option that is passed to this function call: <https://github.com/datahub-project/datahub/blob/94e7e51175660afbfb7b5cf198a3263f30d56f62/metadata-ingestion/src/datahub/ingestion/extractor/schema_util.py#L509>.
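One way the configuration-option approach could look is sketched below, assuming the `validate_names` keyword that AVRO-3680 adds to `avro.schema.parse`. The helper `parse_with_optional_name_validation` is hypothetical, not an existing DataHub function. The parse function is passed in so the sketch runs without avro installed, and the signature check keeps older avro releases (which lack the keyword) working with their default behavior.

```python
import inspect
from typing import Any, Callable


def parse_with_optional_name_validation(
    parse: Callable[..., Any], schema_str: str, validate_names: bool = True
) -> Any:
    """Parse an Avro schema, forwarding `validate_names` only when supported.

    `parse` stands in for avro.schema.parse; it is injected so this sketch
    does not require the avro package.
    """
    if "validate_names" in inspect.signature(parse).parameters:
        return parse(schema_str, validate_names=validate_names)
    if not validate_names:
        # An older parser cannot skip the check, so fail loudly instead of
        # silently validating anyway.
        raise RuntimeError("this Avro parser does not support validate_names")
    # Default behaviour: the parser validates names as it always has.
    return parse(schema_str)
```

In practice the call would presumably be `parse_with_optional_name_validation(avro.schema.parse, schema_str, validate_names=flag)`, where `flag` comes from a hypothetical new source-config field that defaults to `True` so existing behavior is unchanged.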

Thanks for raising this. cc <@U01GZEETMEZ>