# ingestion

r

```
File "datahub_v_0_8_6/metadata-ingestion/dhubv086/lib64/python3.6/site-packages/pyhive/hive.py", line 479, in execute
    _check_status(response)
File "datahub_v_0_8_6/metadata-ingestion/dhubv086/lib64/python3.6/site-packages/pyhive/hive.py", line 609, in _check_status
    raise OperationalError(response)
OperationalError: (pyhive.exc.OperationalError) TExecuteStatementResp(status=TStatus(statusCode=3, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.ClassNotFoundException Class com.LeapSerde not found:17:16', 'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:400', 'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:238', 'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:274', 'org.apache.hive.service.cli.operation.Operation:run:Operation.java:337', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:439', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:405', 'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:257', 'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:503', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor:process:HadoopThriftAuthBridge.java:747', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*org.apache.hadoop.hive.metastore.api.MetaException:java.lang.ClassNotFoundException Class com.LeapSerde not found:28:12', 'org.apache.hadoop.hive.metastore.MetaStoreUtils:getDeserializer:MetaStoreUtils.java:406', 'org.apache.hadoop.hive.ql.metadata.Table:getDeserializerFromMetaStore:Table.java:274', 'org.apache.hadoop.hive.ql.metadata.Table:getDeserializer:Table.java:267', 'org.apache.hadoop.hive.ql.exec.DDLTask:describeTable:DDLTask.java:3184', 'org.apache.hadoop.hive.ql.exec.DDLTask:execute:DDLTask.java:380', 'org.apache.hadoop.hive.ql.exec.Task:executeTask:Task.java:214', 'org.apache.hadoop.hive.ql.exec.TaskRunner:runSequential:TaskRunner.java:99', 'org.apache.hadoop.hive.ql.Driver:launchTask:Driver.java:2054', 'org.apache.hadoop.hive.ql.Driver:execute:Driver.java:1750', 'org.apache.hadoop.hive.ql.Driver:runInternal:Driver.java:1503', 'org.apache.hadoop.hive.ql.Driver:run:Driver.java:1287', 'org.apache.hadoop.hive.ql.Driver:run:Driver.java:1282', 'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:236'], sqlState='08S01', errorCode=1, errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.ClassNotFoundException Class com.LeapSerde not found'), operationHandle=None)
[SQL: DESCRIBE `default.leap_flume_prod_new`]
```
Hi guys, can you please help me with this error while ingesting Hive metadata? DataHub version: v0.8.6
g
Could you try opening up a Hive shell and executing ``DESCRIBE `default.leap_flume_prod_new` ``? This appears to be a failure on the Hive side, or possibly in the driver.
r
Yes, I got the same error in the Hive CLI as well.
Is there a way to exclude this table while ingesting Hive metadata?
g
Yep! Use the `table_pattern` option:
```yml
table_pattern:
  deny:
    - "default.leap_flume_prod_new"
```
(See the docs for details: https://datahubproject.io/docs/metadata-ingestion — the deny pattern is actually a regex.)
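Since the deny entries are regexes, an unescaped `.` matches any character, and escaping it pins the pattern to the literal table name. A rough sketch of the matching semantics using Python's `re` module (DataHub's actual pattern class may differ in details, e.g. anchoring):

```python
import re

# Deny patterns are regular expressions; the escaped dot matches only a
# literal "." between the schema and table name.
deny_patterns = [r"default\.leap_flume_prod_new"]

def is_denied(table_name, patterns):
    """Return True if the table name matches any deny pattern
    (re.match anchors at the start of the string only)."""
    return any(re.match(p, table_name) for p in patterns)

print(is_denied("default.leap_flume_prod_new", deny_patterns))  # True
print(is_denied("default.other_table", deny_patterns))          # False
```

Note that `re.match` only anchors at the start, so a pattern like the one above would also match a table named `default.leap_flume_prod_new_v2`; use a trailing `$` if that matters.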
r
@gray-shoe-75895 The deny pattern option works fine. Now I am stuck with a weird problem where the ingestion job suddenly fails with the error `user XXX does not have privilege to describe formatted schema.table`. Using the Hive shell, the same user is able to execute the command that DataHub fails to execute.
Also, is there a way to speed up the metadata ingestion process for Hive? Every time the Hive ingestion job fails, it starts afresh, and the job has to scan more than 14K tables.
g
Can you try running with `datahub --debug ingest ...`?
Unfortunately there’s no way to speed it up — with Hive, we’re forced to call `describe formatted` for each table, which is where much of the time goes.
I’d recommend limiting the amount of data you ingest while debugging/testing, using the `schema_pattern` and `table_pattern` options.
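To make that concrete, a hypothetical recipe sketch that restricts ingestion to a single schema and a table-name prefix while testing — the `source`/`config` layout follows the DataHub Hive ingestion docs, but the host, schema, and table names here are placeholders:

```yml
source:
  type: hive
  config:
    host_port: localhost:10000   # placeholder HiveServer2 address
    schema_pattern:
      allow:
        - "sales_db"             # only ingest this schema while testing
    table_pattern:
      allow:
        - "sales_db\\.orders_.*" # only tables matching this regex
      deny:
        - "default.leap_flume_prod_new"
```

With a narrow allow list like this, each test run only issues `describe formatted` against a handful of tables instead of all 14K.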