Can we delete /modify this pydantic check? <https:...
# ingestion
c
Can we delete /modify this pydantic check? https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/configuration/kafka.py#L21 Our bootstrap name has dots in the name so it does not pass, so we patched it. For the rest everything works fine, so nice work!
b
Thanks for calling this out Adriaan! cc @gray-shoe-75895
c
In addition to this I think there is an additionak bug. For me, it seems that Pydantic is throwing an error if you follow the examples which is specified here: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/recipes/kafka_to_datahub.yml More specifically something like
Copy code
sink:
  type: "datahub-kafka"
  config:
    connection.bootstrap: "example:port"
Does not work, as pydantic throws
Copy code
connection.bootstrap
  extra fields not permitted (type=value_error.extra)
So I had to specify it as
Copy code
sink:
 type: "datahub-kafka"
  config:
    connection:
      bootstrap: "example:port"
Which is not a big deal, but I think it’s worthwile to look into, as you might want to or update the pydantic rules, or change the examples to make it easier to follow.
And just another random idea is to add strong typing for the producer/consumer config. So you get a bit better error/pydantic notifications. (Not that it’s that bad, kafka gives quite good messages and it’s easy searchable.)
g
can you elaborate a bit on the strong typing piece?
g
ah those are actually passed directly into the producer/consumer - they serve as an escape hatch if you want to do anything fancy - see https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md and https://docs.confluent.io/platform/current/clients/confluent-kafka-python/index.html#serde-producer
for the first thing - normally the yaml parser unpacks those for us, but I will update the documentation nonetheless
agreed on the hostname validation - will loosen the validation a bit
c
Cool! Wasn’t aware that it was that configurable! 🙂 So makes sense yes. And yeah, I’m just trying it out so I wasn’t sure if I did anything wrong or not 😄 But I thought it couldn’t hurt to at least notify you. But thanks or the updates and great work!
g
Thanks! To close the loop here, I opened two PRs based on your feedback - https://github.com/linkedin/datahub/pull/2171 and https://github.com/linkedin/datahub/pull/2172 🙂 Let me know if you run into any other issues - happy to help
👍 1