# troubleshoot
flat-match-62670
Hi all. I have version 9.0 of DataHub deployed on Amazon EKS, but I am having some connection issues. I am attempting to ingest metadata from Snowflake, but when I put in my Snowflake info and hit "Test Connection" in the UI I get an endless loop. I also attempted to manually execute an ingestion, and the run showed N/A instead of the job kicking off. The UI guide says this is often due to the datahub-actions pod being down. I checked the error logs for the datahub-actions pod and am getting the following Kafka error about an "Unknown magic byte":
```
2022/10/24 19:37:03 Waiting for: http://datahub-dev-datahub-gms:8080/health
2022/10/24 19:37:03 Received 200 from http://datahub-dev-datahub-gms:8080/health
No user action configurations found. Not starting user actions.
[2022-10-24 19:37:04,202] INFO     {datahub_actions.cli.actions:68} - DataHub Actions version: unavailable (installed editable via git)
[2022-10-24 19:37:04,333] INFO     {datahub_actions.cli.actions:98} - Action Pipeline with name 'ingestion_executor' is now running.
Exception in thread Thread-1 (run_pipeline):
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 137, in poll
    value = self._value_deserializer(value, ctx)
  File "/usr/local/lib/python3.10/site-packages/confluent_kafka/schema_registry/avro.py", line 317, in __call__
    raise SerializationError("Unknown magic byte. This message was"
confluent_kafka.serialization.SerializationError: Unknown magic byte. This message was not produced with a Confluent Schema Registry serializer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 42, in run_pipeline
    pipeline.run()
  File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline.py", line 161, in run
    for enveloped_event in enveloped_events:
  File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 152, in events
    msg = self.consumer.poll(timeout=2.0)
  File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 139, in poll
    raise ValueDeserializationError(exception=se, kafka_message=msg)
confluent_kafka.error.ValueDeserializationError: KafkaError{code=_VALUE_DESERIALIZATION,val=-159,str="Unknown magic byte. This message was not produced with a Confluent Schema Registry serializer"}
%4|1666640260.315|MAXPOLL|rdkafka#consumer-1| [thrd:main]: Application maximum poll interval (10000ms) exceeded by 336ms (adjust max.poll.interval.ms for long-running message processing): leaving group
```
Has anyone seen this before, or have any advice on what I can troubleshoot to get DataHub ingesting from Snowflake properly? Any help is much appreciated!
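For context, the "Unknown magic byte" check comes from the Confluent wire format: every message serialized through a Confluent Schema Registry client starts with a 0x00 magic byte followed by a 4-byte big-endian schema ID, and only then the Avro payload. The Avro deserializer used by the datahub-actions Kafka consumer rejects anything that does not start with that byte, which is exactly what the traceback shows. Here is a minimal illustrative sketch of that framing check (not DataHub's actual code):

```python
# Minimal sketch of the Confluent wire-format check behind "Unknown magic byte".
# Illustrative only -- not the code used by confluent_kafka or datahub-actions.
import struct

_MAGIC_BYTE = 0  # Confluent-serialized messages must start with byte 0x00


def parse_confluent_header(message_bytes: bytes) -> int:
    """Return the schema-registry schema ID, or raise if the framing is wrong."""
    if len(message_bytes) < 5:
        raise ValueError("Message too short to carry a Confluent header")
    magic, schema_id = struct.unpack(">bI", message_bytes[:5])
    if magic != _MAGIC_BYTE:
        # This is the condition that surfaces as the SerializationError above:
        # the payload was not produced with a Confluent Schema Registry serializer.
        raise ValueError(f"Unknown magic byte: {magic!r}")
    return schema_id
```

In other words, the error means the messages on the topic were produced by a serializer that does not use Confluent framing, not that the payload itself is corrupt.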
mammoth-bear-12532
Which schema registry are you using?
flat-match-62670
Hi @mammoth-bear-12532, I believe we are using the AWS Glue Schema Registry.
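AWS Glue Schema Registry clients frame messages with their own header rather than the Confluent 0x00 magic byte, while the datahub-actions consumer deserializes with confluent_kafka's Confluent-style Avro deserializer, so a Glue-serialized topic would produce exactly this error. One way to confirm is to peek at the raw bytes on the metadata change log topic with a plain, non-deserializing consumer. This is only a hedged diagnostic sketch; the bootstrap server, topic name, and group id are placeholders to adapt to your deployment:

```python
# Hedged diagnostic sketch: inspect raw record bytes to see which wire format
# the producers are using. Broker address, topic, and group id below are
# placeholders -- adjust them for your EKS deployment.
from confluent_kafka import Consumer

consumer = Consumer(
    {
        "bootstrap.servers": "prerequisites-kafka:9092",  # placeholder broker
        "group.id": "wire-format-probe",                   # throwaway group id
        "auto.offset.reset": "earliest",
        "enable.auto.commit": False,
    }
)
consumer.subscribe(["MetadataChangeLog_Versioned_v1"])     # placeholder topic name

try:
    for _ in range(10):
        msg = consumer.poll(timeout=5.0)
        if msg is None or msg.error():
            continue
        first_byte = msg.value()[0]
        # 0x00 => Confluent Schema Registry framing (what datahub-actions expects);
        # anything else suggests a different serializer, e.g. AWS Glue's format.
        print(f"offset={msg.offset()} first_byte={first_byte:#04x}")
finally:
    consumer.close()
```

If the first byte of the records is not 0x00, the producers are not writing Confluent-framed messages, and aligning all DataHub components on the same Confluent-compatible schema registry is the direction to investigate.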
mammoth-bear-12532
@flat-match-62670 did you ever manage to fix that loop? I'm running into the same issue.