powerful-telephone-71997
06/28/2021, 5:27 AM
boundless-student-48844
06/29/2021, 10:21 AM
Traceback (most recent call last):
File "/home/hadoop/.pyenv/versions/3.7.2/bin/datahub", line 8, in <module>
sys.exit(main())
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/datahub/entrypoints.py", line 93, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/datahub/entrypoints.py", line 81, in ingest
pipeline.run()
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
for wu in self.source.get_workunits():
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/datahub/ingestion/source/sql_common.py", line 239, in get_workunits
yield from self.loop_views(inspector, schema, sql_config)
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/datahub/ingestion/source/sql_common.py", line 319, in loop_views
view_definition = inspector.get_view_definition(view)
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 338, in get_view_definition
self.bind, view_name, schema, info_cache=self.info_cache
File "/home/hadoop/.pyenv/versions/3.7.2/lib/python3.7/site-packages/sqlalchemy/engine/interfaces.py", line 363, in get_view_definition
raise NotImplementedError()
NotImplementedError
brief-lizard-77958
06/29/2021, 12:15 PM
boundless-student-48844
06/29/2021, 12:52 PM
The hive plugin fails to ingest tables whose names start with an underscore (_), such as crm._test. Upon drilling down, it is because pyhive’s _get_table_columns()
doesn’t escape such table names with backticks (`), as can be seen here: https://github.com/dropbox/PyHive/blob/master/pyhive/sqlalchemy_hive.py#L283
The DESCRIBE query for the above case in Hive should be
describe `crm._test`;
instead of
describe crm._test;
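For reference, a minimal sketch of the backtick quoting described above (the describe_query helper is hypothetical and not the actual PyHive code):

def describe_query(schema: str, table: str) -> str:
    # Quote the fully qualified name with backticks so Hive accepts identifiers
    # that start with an underscore; double any embedded backticks to keep the
    # quoted identifier valid.
    name = f"{schema}.{table}".replace("`", "``")
    return f"DESCRIBE `{name}`"

print(describe_query("crm", "_test"))  # DESCRIBE `crm._test`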
future-waitress-970
06/29/2021, 1:55 PM
{'failures': [{'e': JSONDecodeError('Expecting value: line 1 column 1 (char 0)',)},
is how the error is showing below it.
astonishing-yak-92682
07/01/2021, 4:30 AM
crooked-librarian-97951
07/01/2021, 1:51 PM
future-waitress-970
07/01/2021, 6:58 PM
Failed to establish a new connection: [Errno 111] Connection refused',))"})
And
datahub-gms exited with code 255
faint-wolf-61232
07/02/2021, 9:00 AM
cool-iron-6335
07/02/2021, 9:22 AM
[2021-07-02 16:17:51,963] INFO {datahub.entrypoints:75} - Using config: {'source': {'type': 'hive', 'config': {'host_port': 'localhost:10000', 'database': 'test'}}, 'sink': {'type': 'datahub-rest', 'config': {'server': 'http://localhost:8080'}}}
[2021-07-02 16:17:52,210] ERROR {datahub.ingestion.run.pipeline:52} - failed to write record with workunit test.test.test1 with Expecting value: line 1 column 1 (char 0) and info {}
Source (hive) report:
{'failures': {},
'filtered': [],
'tables_scanned': 1,
'views_scanned': 0,
'warnings': {},
'workunit_ids': ['test.test.test1'],
'workunits_produced': 1}
Sink (datahub-rest) report:
{'failures': [{'e': JSONDecodeError('Expecting value: line 1 column 1 (char 0)')}], 'records_written': 0, 'warnings': []}
colossal-furniture-76714
07/02/2021, 2:02 PM
square-activity-64562
07/06/2021, 9:45 AM
square-activity-64562
07/06/2021, 10:09 AM
postgres DB itself. Multiple hosts with the same database name, and nobody knows them as postgres but instead by something business-specific. I would like to have them be business-specific. Looking at transformations, it might be possible: https://datahubproject.io/docs/metadata-ingestion/#transformations. Am I missing some option here?
white-beach-27328
07/06/2021, 6:43 PM
ssl.ca.location
extra fields not permitted (type=value_error.extra)
for the extra json configuration I’m trying to put together. I tried using a similar pattern with keys from the ingestion recipes I’m using to get Datasets, but I’m getting errors like
schema_registry_url
extra fields not permitted (type=value_error.extra)
Kind of at a loss since most of the links like the following aren’t working: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/configuration/kafka.py#L56
Any tips on how to do this configuration?
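For what it’s worth, a sketch of the nesting the kafka source appears to expect based on the kafka.py config linked above, shown as the same kind of config dict the CLI logs; the exact key names (connection, consumer_config, schema_registry_url) are assumptions to verify against the current docs:

kafka_recipe = {
    "source": {
        "type": "kafka",
        "config": {
            "connection": {
                "bootstrap": "broker:9092",
                # the schema registry URL lives under connection, not at the top level
                "schema_registry_url": "http://schemaregistry:8081",
                # librdkafka consumer properties (e.g. TLS settings) go here
                "consumer_config": {
                    "security.protocol": "SSL",
                    "ssl.ca.location": "/path/to/ca.pem",
                },
            }
        },
    },
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}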
better-orange-49102
07/07/2021, 9:19 AM
calm-sunset-28996
07/07/2021, 12:58 PM
crooked-leather-44416
07/07/2021, 3:06 PM
curl --location --request POST 'http://localhost:8080/datasets?action=ingest' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
"snapshot": {
"aspects": [
{
"com.linkedin.dataset.DatasetProperties": {
"customProperties": {
"ValidThroughDate": "2021-03-15T11:40:49Z"
}
}
}
],
"urn": "urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"
}
}'
If I send this request, it will remove all existing custom properties unless I include them in the same request.
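A small sketch of that workaround, i.e. resending the properties you want to keep in the same ingest call (existing_properties here stands for whatever you already track for the dataset; nothing is fetched from DataHub in this sketch):

import requests

existing_properties = {"someExistingKey": "someExistingValue"}  # properties to keep
new_properties = {"ValidThroughDate": "2021-03-15T11:40:49Z"}

payload = {
    "snapshot": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)",
        "aspects": [
            {
                "com.linkedin.dataset.DatasetProperties": {
                    # the aspect replaces what is stored, so send the full map
                    "customProperties": {**existing_properties, **new_properties}
                }
            }
        ],
    }
}

requests.post(
    "http://localhost:8080/datasets?action=ingest",
    headers={
        "X-RestLi-Protocol-Version": "2.0.0",
        "Content-Type": "application/json",
    },
    json=payload,
)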
tall-monitor-59941
07/08/2021, 9:05 AM
better-orange-49102
07/08/2021, 9:21 AM
square-activity-64562
07/08/2021, 9:42 AM
Tags added via simple_add_dataset_tags do not show up in the search autocomplete in the UI. They are present in the system and show up in search results and in the UI, but not in autocomplete. Tags added via the UI do show up in search autocomplete.
witty-butcher-82399
07/08/2021, 10:42 AM
If you’d like to add more complex logic for assigning ownership, you can use the more generic `add_dataset_ownership` transformer, which calls a user-provided function to determine the ownership of each dataset.
Is there any example of this? I’m not sure how to set such a function in the yaml. Also, does the function need to be registered somewhere? Thanks!
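A minimal sketch of what such a user-provided function might look like; the signature and the idea of pointing the recipe at it by dotted module path are assumptions based on the doc snippet above, not verified against the transformer source:

# my_transforms.py -- hypothetical module referenced from the recipe
from typing import List

from datahub.metadata.schema_classes import (
    DatasetSnapshotClass,
    OwnerClass,
    OwnershipTypeClass,
)

def custom_owners(dataset_snapshot: DatasetSnapshotClass) -> List[OwnerClass]:
    # Assumption: the transformer hands each dataset snapshot to this function
    # and attaches whatever owners it returns.
    if "finance" in dataset_snapshot.urn:
        owner = "urn:li:corpuser:finance-team"
    else:
        owner = "urn:li:corpuser:data-platform"
    return [OwnerClass(owner=owner, type=OwnershipTypeClass.DATAOWNER)]

The recipe would then reference the function by its dotted path under the add_dataset_ownership transformer config; the exact config key is worth confirming in the transformations doc linked earlier in the thread.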
better-orange-49102
07/09/2021, 10:04 AM
cool-iron-6335
07/12/2021, 9:46 AM
rich-policeman-92383
07/12/2021, 12:58 PM
rich-policeman-92383
07/13/2021, 3:26 PM
File "datahub_v_0_8_6/metadata-ingestion/dhubv086/lib64/python3.6/site-packages/pyhive/hive.py", line 479, in execute
_check_status(response)
File "datahub_v_0_8_6/metadata-ingestion/dhubv086/lib64/python3.6/site-packages/pyhive/hive.py", line 609, in _check_status
raise OperationalError(response)
OperationalError: (pyhive.exc.OperationalError) TExecuteStatementResp(status=TStatus(statusCode=3, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.ClassNotFoundException Class com.LeapSerde not found:17:16', 'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:400', 'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:238', 'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:274', 'org.apache.hive.service.cli.operation.Operation:run:Operation.java:337', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:439', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:405', 'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:257', 'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:503', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor:process:HadoopThriftAuthBridge.java:747', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*org.apache.hadoop.hive.metastore.api.MetaException:java.lang.ClassNotFoundException Class com.LeapSerde not found:28:12', 'org.apache.hadoop.hive.metastore.MetaStoreUtils:getDeserializer:MetaStoreUtils.java:406', 'org.apache.hadoop.hive.ql.metadata.Table:getDeserializerFromMetaStore:Table.java:274', 'org.apache.hadoop.hive.ql.metadata.Table:getDeserializer:Table.java:267', 'org.apache.hadoop.hive.ql.exec.DDLTask:describeTable:DDLTask.java:3184', 'org.apache.hadoop.hive.ql.exec.DDLTask:execute:DDLTask.java:380', 'org.apache.hadoop.hive.ql.exec.Task:executeTask:Task.java:214', 'org.apache.hadoop.hive.ql.exec.TaskRunner:runSequential:TaskRunner.java:99', 'org.apache.hadoop.hive.ql.Driver:launchTask:Driver.java:2054', 'org.apache.hadoop.hive.ql.Driver:execute:Driver.java:1750', 'org.apache.hadoop.hive.ql.Driver:runInternal:Driver.java:1503', 'org.apache.hadoop.hive.ql.Driver:run:Driver.java:1287', 'org.apache.hadoop.hive.ql.Driver:run:Driver.java:1282', 'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:236'], sqlState='08S01', errorCode=1, errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.ClassNotFoundException Class com.LeapSerde not found'), operationHandle=None)
[SQL: DESCRIBE `default.leap_flume_prod_new`]
Hi Guys
Can you please help me with this error while ingesting hive metadata.
datahub version: v0.8.6
faint-hair-91313
07/14/2021, 8:22 AM
salmon-cricket-21860
07/15/2021, 1:00 AM
File "/home/jovyan/conda-envs/catalog/lib/python3.8/site-packages/datahub/ingestion/source/sql_common.py", line 62, in make_sqlalchemy_uri
40 def make_sqlalchemy_uri(
41 scheme: str,
42 username: Optional[str],
43 password: Optional[str],
44 at: Optional[str],
45 db: Optional[str],
46 uri_opts: Optional[Dict[str, Any]] = None,
47 ) -> str:
(...)
58 if uri_opts is not None:
59 if db is None:
60 url += "/"
61 params = "&".join(
--> 62 f"{key}={quote_plus(value)}" for (key, value) in uri_opts.items() if value
63 )
AttributeError: 'DruidConfig' object has no attribute 'items'
salmon-cricket-21860
07/15/2021, 2:50 PM
square-activity-64562
07/15/2021, 9:59 PM
Sink (datahub-rest) report:
{'failures': [], 'records_written': 1, 'warnings': []}
But when I go to the UI there is no dataset. Is there supposed to be some delay? Should I check the errors of some service? If yes, which one?
salmon-cricket-21860
07/16/2021, 1:00 AM