# ingestion
white-beach-27328:
FYI @early-lamp-41924, I’m trying to integrate our airflow jobs with datahub’s lineage, but I’m running into issues getting the kafka ssl configuration passed through. I’m getting errors like
ssl.ca.location
  extra fields not permitted (type=value_error.extra)
for the extra json configuration I’m trying to put together. I tried using a similar pattern with keys from the ingestion recipes I’m using to get Datasets, but I’m getting errors like
schema_registry_url
  extra fields not permitted (type=value_error.extra)
Kind of at a loss, since most of the relevant links aren’t working, e.g. https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/configuration/kafka.py#L56. Any tips on how to do this configuration?
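For reference, the Extra JSON I was putting together on the Airflow connection was shaped roughly like this (values are placeholders, not our real config), with everything flattened at the top level:

```json
{
  "bootstrap": "broker.example.com:9092",
  "schema_registry_url": "https://registry.example.com:8081",
  "ssl.ca.location": "/path/to/ca.pem"
}
```

which is presumably why the config validation rejects `ssl.ca.location` and friends as extra fields.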
m:
@white-beach-27328: thanks for flagging, looks like this is the right doc link at least (https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#serde-producer)
early-lamp-41924:
@gray-shoe-75895 do we have a way of setting ssl config in kafka sink?
gray-shoe-75895:
Yep, here’s an example: https://github.com/linkedin/datahub/blob/3d9f4ec1b40a9d75538bd1a5b40007c09ea02419/metadata-ingestion/examples/recipes/secured_kafka_to_console.yml. Seems like the links to kafka docs have rotted, so I’ll update them soon
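For the sink specifically, the shape is roughly this (broker, registry URL, and file path below are placeholders; the `ssl.*` keys under `producer_config` get handed straight to the underlying confluent-kafka producer):

```yml
sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "broker.example.com:9092"  # placeholder
      producer_config:
        # passed through to the confluent-kafka producer
        security.protocol: "SSL"
        ssl.ca.location: "/path/to/ca.pem"  # placeholder
      schema_registry_url: "https://registry.example.com:8081"
```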
white-beach-27328:
yeah, it looks like I need to include the `connection` top-level key in the JSON
I had tried to do it just below that, and then everything just upleveled to the top
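i.e. with the Extra JSON shaped like this (placeholder values):

```json
{
  "connection": {
    "producer_config": {
      "security.protocol": "SSL",
      "ssl.ca.location": "/path/to/ca.pem"
    },
    "schema_registry_url": "https://registry.example.com:8081"
  }
}
```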
Ok, I have it properly writing out to the MCE topic. Thanks for the help! Two notes:
• Might want to make a note about how to do this configuration. I was digging into the code base to figure out that the extra args get passed into the kafka sink; other people might not.
• As the pattern stands, if you’re using basic auth with a schema registry, you currently have to place the secret for the schema registry user in the `Extra` JSON in plaintext, which isn’t great. Maybe most people aren’t using kafka as the ingestion mechanism and have instead moved to the REST sink, so this is less relevant. Just some thoughts.
gray-shoe-75895:
Thanks for the feedback!
• Absolutely - let me know if you think this is sufficiently clear: https://github.com/linkedin/datahub/pull/2837
• If you’re not comfortable with having the secrets in plaintext in the recipe, we also support bash-like environment variable substitution, so you can use something along these lines:
basic.auth.user.info: ${KAFKA_BASIC_USER_AUTH}
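In a recipe that would sit roughly like this (placeholder hostnames; the variable is resolved from the environment when the ingestion runs, so the secret never has to live in the file):

```yml
source:
  type: "kafka"
  config:
    connection:
      bootstrap: "broker.example.com:9092"  # placeholder
      schema_registry_url: "https://registry.example.com:8081"
      schema_registry_config:
        # resolved from the environment at run time
        basic.auth.user.info: "${KAFKA_BASIC_USER_AUTH}"
```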