# ingestion
c
Hi everyone, I am trying to test metadata ingestion for the Hive metastore. We have a standalone Hive metastore service running version 3.1.2. Below is the recipe file I used.
source:
    type: hive
    config:
        scheme: hive+http
        host_port: 'hive-metastore.hive.svc.cluster.local:9083'
        database: null
        username: null
        password: null
sink:
    type: datahub-rest
    config:
        server: 'http://datahub-datahub-gms.datahub.svc.cluster.local:8080'
When I ran it from the DataHub frontend, I got the error below (partial logs pasted).
......
    version, status, reason = self._read_status()\n'
           'File "/usr/local/lib/python3.9/http/client.py", line 289, in _read_status\n'
           '    raise RemoteDisconnected("Remote end closed connection without"\n'
           '\n'
           'RemoteDisconnected: Remote end closed connection without response\n',
           "2022-02-25 06:21:18.926125 [exec_id=a071f153-5777-419f-9511-37214e1429b6] INFO: Failed to execute 'datahub ingest'",
           '2022-02-25 06:21:18.926532 [exec_id=a071f153-5777-419f-9511-37214e1429b6] INFO: Caught exception EXECUTING '
           'task_id=a071f153-5777-419f-9511-37214e1429b6, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
           '    self.event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
           '    return f.result()\n'
           '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
.......
In the metastore logs I found this. Am I missing anything? What could be the reason?
2022-02-25T06:19:46,599 ERROR [pool-6-thread-200] server.TThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:228) ~[libthrift-0.9.3.jar:0.9.3]
        at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:76) ~[hive-standalone-metastore-3.1.2.jar:3.1.2]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [libthrift-0.9.3.jar:0.9.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
l
Hi @curved-carpenter-44858! Gentle reminder to please post large blocks of code/log output in threads; it's a big help for us to keep track of which questions are still unanswered across all of our support channels.
c
@little-megabyte-1074 Sure.
I found the reason for the error. After going through past Slack messages and some articles, I realized that I need a HiveServer, not just the Hive metastore. I will try it first with a Spark Thrift Server (Hive on Spark).
👍 1
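For anyone hitting the same `Missing version in readMessageBegin, old client?` line in the metastore log, here is a rough sketch of where that message likely comes from. It assumes the `hive+http` scheme makes the client send a plain HTTP request to port 9083, while the metastore reads the first four bytes as a strict Thrift binary header. The function name `classify_first_word` is made up for illustration and the check only mimics the idea; it is not the metastore's actual code.

```python
import struct

# Thrift's strict binary protocol expects the first 4 bytes of a message to be
# a big-endian signed 32-bit integer with the version bits set (0x8001....).
VERSION_MASK = 0xFFFF0000
VERSION_1 = 0x80010000


def classify_first_word(first_bytes: bytes) -> str:
    """Hypothetical helper: mimic a strict binary-protocol header check."""
    (size,) = struct.unpack(">i", first_bytes[:4])
    if size < 0:
        # High bit set: the upper 16 bits carry the protocol version.
        version = size & VERSION_MASK
        return "ok" if version == VERSION_1 else "bad version"
    # High bit clear: no version present, so a strict server rejects it.
    return "missing version"


# Assumption: an HTTP client starts its request with the ASCII bytes "POST".
print(classify_first_word(b"POST"))
# A well-formed strict binary header (version 1, message type 1).
print(classify_first_word(struct.pack(">I", 0x80010001)))
```

Even with a matching transport, the metastore exposes a different Thrift service than HiveServer, so pointing the recipe at a HiveServer-compatible endpoint (such as the Spark Thrift Server mentioned above) is still the actual fix.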