# troubleshoot
n
Hi Team, I am trying to ingest the Kafka data but am getting the following error:

```
ImportError: datahub.ingestion.source.confluent_schema_registry.ConfluentSchemaRegistry
```

Below is the config that I am using:

```yml
source:
  type: "kafka"
  config:
    platform_instance: "amqstreams-cluster"
    connection:
      bootstrap: "*****:9095"
      schema_registry_url: "*****:8080"
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms.telco-dataprocessing-mvp:8080'
    # Add a secret in secrets Tab
    token: null
```
h
@numerous-account-62719 are you using UI ingestion or CLI ingestion?
n
I am using CLI
h
which CLI version are you using?
n
How do I check that? My DataHub version is 0.8.41
`'cli_version': '0.8.38.4'`
Got this CLI version from the stack trace
h
You can just type

```
datahub version
```

at the terminal where you run CLI ingestion to check the version.
It is advisable to update the DataHub CLI version to match the DataHub server version. Is it possible for you to update the CLI version and check if the error still exists?
Another quick way to confirm the presence of the module is to check whether this import works in a Python shell in the same terminal:

```python
from datahub.ingestion.source.confluent_schema_registry import ConfluentSchemaRegistry
```
n
@hundreds-photographer-13496 I tried running it from the shell and am getting the following error:

```python
>>> from datahub.ingestion.source.confluent_schema_registry import ConfluentSchemaRegistry
```

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/extractor/protobuf_util.py", line 24, in <module>
    import grpc
ModuleNotFoundError: No module named 'grpc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/confluent_schema_registry.py", line 13, in <module>
    from datahub.ingestion.extractor import protobuf_util, schema_util
  File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/extractor/protobuf_util.py", line 28, in <module>
    raise ModuleNotFoundError(
ModuleNotFoundError: The protobuf_util module requires Python 3.7 or newer because of the networkx.algorithms.dag.topological_generations dependency.
```
h
Oops, it looks like required dependencies are missing. Can you try installing the kafka plugin first and then executing the recipe?

```
pip install 'acryl-datahub[kafka]'==<your cli version>
```
n
Yes, that is resolved now.
But I'm getting this error now:

```
%4|1663837888.835|FAIL|rdkafka#consumer-1| [thrd:amqstreams-cluster-kafka-bootstrap.amqstreams-kafka:9095/bootst]: amqstreams-cluster-kafka-bootstrap.amqstreams-kafka:9095/bootstrap: Disconnected: verify that security.protocol is correctly configured, broker might require SASL authentication (after 341ms in state UP, 4 identical error(s) suppressed)
```

This is the config that I am using:

```yml
source:
  type: "kafka"
  config:
    platform_instance: "amqstreams-cluster"
    connection:
      bootstrap: "***:9095"
      schema_registry_url: "https://****:8080"
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms.telco-dataprocessing-mvp:8080'
    # Add a secret in secrets Tab
    token: null
```
@hundreds-photographer-13496?
h
Hey @numerous-account-62719, the message says that the broker might require SASL authentication. You can follow this example if that's the case for you: https://datahubproject.io/docs/generated/ingestion/sources/kafka#connecting-to-confluent-cloud
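For reference, a rough sketch of what your recipe's connection block might look like with SASL enabled, based on the docs linked above. The `security.protocol` and `sasl.mechanism` values, the placeholder hosts, and the credential variables are assumptions here; your broker's actual settings may differ:

```yml
source:
  type: "kafka"
  config:
    platform_instance: "amqstreams-cluster"
    connection:
      bootstrap: "<broker-host>:9095"
      schema_registry_url: "https://<schema-registry-host>:8080"
      # Extra settings passed through to the underlying Kafka consumer.
      consumer_config:
        security.protocol: "SASL_SSL"   # assumption: broker uses SASL over TLS
        sasl.mechanism: "PLAIN"         # assumption: PLAIN mechanism
        sasl.username: "${CLUSTER_API_KEY_ID}"
        sasl.password: "${CLUSTER_API_KEY_SECRET}"
```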
n
Yes, I tried this, but what do I put in the below 2 fields?

```yml
sasl.username: "${CLUSTER_API_KEY_ID}"
sasl.password: "${CLUSTER_API_KEY_SECRET}"
```
h
Hey, my knowledge is limited in this regard. Can you check with your Kafka administrator regarding these values? If it helps, the DataHub kafka source uses the Python Kafka client library `confluent_kafka` under the hood.