# ingestion
g
Hello, I tried to ingest metadata from Hive, but got the below error for one table. It complains that a SerDe was not found:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.ClassNotFoundException Class com.mongodb.hadoop.hive.BSONSerDe not found
log:
[2021-08-13 02:50:02,220] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit source.parquet_source
[2021-08-13 02:50:02,437] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit source.parquet_sourceinfo
[2021-08-13 02:50:02,805] ERROR    {datahub.entrypoints:111} - File "/home/ec2-user/.local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    1186  def _execute_context(
    1187      self, dialect, constructor, statement, parameters, *args
    1188  ):
 (...)
    1273                          evt_handled = True
    1274                          break
    1275              if not evt_handled:
    1276                  self.dialect.do_execute(
--> 1277                      cursor, statement, parameters, context
    1278                  )
    ..................................................
     self = <sqlalchemy.engine.base.Connection object at 0x7ff130715690>
     dialect = <pyhive.sqlalchemy_hive.HiveDialect object at 0x7ff1375cd810>
     constructor = <method 'DefaultExecutionContext._init_statement' of <class 'pyhive.sqlalchemy_hive.HiveExecutionContext'> default.py:999>
     statement = 'DESCRIBE `stage.3rd_video_feature_dump`'
     parameters = {}
     args = ('DESCRIBE `stage.3rd_video_feature_dump`', [], )
     evt_handled = False
     self.dialect.do_execute = <method 'DefaultDialect.do_execute' of <pyhive.sqlalchemy_hive.HiveDialect object at 0x7ff1375cd810> default.py:607>
     cursor = <pyhive.hive.Cursor object at 0x7ff138d40810>
     context = <pyhive.sqlalchemy_hive.HiveExecutionContext object at 0x7ff14b7ecb50>
    ..................................................

OperationalError: (pyhive.exc.OperationalError) TExecuteStatementResp(status=TStatus(statusCode=3, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.ClassNotFoundException Class com.mongodb.hadoop.hive.BSONSerDe not found:28:27'
I am wondering if I need to add these two jars somewhere: mongo-hadoop-core-2.0.4.jar and mongo-java-driver-3.12.5.jar
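(If the jars do need to be registered, one common approach is to add them on the Hive server side, either per session with ADD JAR or permanently via hive.aux.jars.path in hive-site.xml. This is a hedged sketch, not something confirmed in this thread, and the jar paths are hypothetical:)

```sql
-- Per-session: register the SerDe jars before querying the affected tables
-- (paths are hypothetical; point them at wherever the jars live on the Hive server)
ADD JAR /path/to/mongo-hadoop-core-2.0.4.jar;
ADD JAR /path/to/mongo-java-driver-3.12.5.jar;

-- The DESCRIBE that previously failed should then be able to load BSONSerDe
DESCRIBE `stage.3rd_video_feature_dump`;
```

Note that ADD JAR only lasts for the session, which won't help a connection opened by the ingestion framework; for that, the jars would need to be on the HiveServer2 classpath or listed in hive.aux.jars.path.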
m
@gray-autumn-29372: do any tables get ingested?
Hi @gray-autumn-29372: just checking in on this.
g
Yes, most of the tables got ingested. A few of them failed with the above error. It looks like those tables have a different data source with a custom connector, which then loads the data into another db.
m
got it... from the error, it seems like an issue on the Hive server side
can you check if you are able to actually run Hive queries against those tables @gray-autumn-29372?
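For example (assuming the hive CLI or beeline is available on the server), a quick sanity check directly in Hive, bypassing the DataHub/pyhive path, would be something like:

```sql
-- If this fails with the same ClassNotFoundException, the problem is
-- the Hive server's classpath, not the ingestion connector
SELECT * FROM `stage.3rd_video_feature_dump` LIMIT 1;
```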
m
@gray-autumn-29372: are you turning on profiling?
g
No. I managed to load the other tables by ignoring the ones with the custom connector, though.
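(For reference, the Hive source supports pattern-based filtering in the ingestion recipe, so the problematic tables can be skipped with a deny list. A minimal sketch, with hypothetical host/server values and borrowing the table name from the log above:)

```yaml
source:
  type: hive
  config:
    host_port: "my-hive-server:10000"   # hypothetical Hive host
    table_pattern:
      deny:
        # skip tables backed by the custom Mongo connector
        - "stage.3rd_video_feature_dump"
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"     # hypothetical DataHub endpoint
```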
m
did the connector croak on these tables and not make progress?
g
Seems like a pyhive issue. I can connect to Hive with pyhive, but I get the same "mongo class not found" error when running the DESC command. Digging more into the logs ….