faint-hair-91313
04/09/2021, 11:55 AM
mammoth-bear-12532
faint-hair-91313
04/09/2021, 2:06 PM
faint-hair-91313
04/09/2021, 2:07 PM
mammoth-bear-12532
faint-hair-91313
04/09/2021, 2:41 PM
mammoth-bear-12532
faint-hair-91313
04/09/2021, 2:42 PM
faint-hair-91313
04/09/2021, 2:43 PM
datahub ingest -c example_to_datahub_console_hive.yml
[2021-04-09 14:43:22,969] INFO {datahub.entrypoints:66} - Using config: {'source': {'type': 'hive', 'config': {'username': 'admin', 'password': 'pass', 'host_port': 'hdinsightSbxHive.azurehdinsight.net:443', 'database': 'default', 'options': {'connect_args': {'auth': 'CUSTOM'}}}}, 'sink': {'type': 'console'}}
Traceback (most recent call last):
File "/home/linadmin/.local/lib/python3.6/site-packages/thrift/transport/TSocket.py", line 126, in read
buff = self.handle.recv(sz)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/datahub", line 8, in <module>
sys.exit(datahub())
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/linadmin/.local/lib/python3.6/site-packages/datahub/entrypoints.py", line 72, in ingest
pipeline.run()
File "/home/linadmin/.local/lib/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 80, in run
for wu in self.source.get_workunits():
File "/home/linadmin/.local/lib/python3.6/site-packages/datahub/ingestion/source/sql_common.py", line 198, in get_workunits
inspector = reflection.Inspector.from_engine(engine)
File "<string>", line 2, in from_engine
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/util/deprecations.py", line 390, in warned
return fn(*args, **kwargs)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/reflection.py", line 171, in from_engine
return cls._construct(cls._init_legacy, bind)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/reflection.py", line 117, in _construct
init(self, bind)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/reflection.py", line 124, in _init_legacy
self._init_engine(bind)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/reflection.py", line 128, in _init_engine
engine.connect().close()
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3095, in connect
return self._connection_cls(self, close_with_result=close_with_result)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 91, in __init__
else engine.raw_connection()
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3174, in raw_connection
return self._wrap_pool_connect(self.pool.connect, _connection)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3141, in _wrap_pool_connect
return fn()
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 301, in connect
return _ConnectionFairy._checkout(self)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 755, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 419, in checkout
rec = pool._do_get()
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 145, in _do_get
self._dec_overflow()
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
with_traceback=exc_tb,
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 142, in _do_get
return self._create_connection()
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 247, in _create_connection
return _ConnectionRecord(self)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 362, in __init__
self.__connect(first_connect_check=True)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 605, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
with_traceback=exc_tb,
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 599, in __connect
connection = pool._invoke_creator(self)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 578, in connect
return dialect.connect(*cargs, **cparams)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 558, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/home/linadmin/.local/lib/python3.6/site-packages/pyhive/hive.py", line 94, in connect
return Connection(*args, **kwargs)
File "/home/linadmin/.local/lib/python3.6/site-packages/pyhive/hive.py", line 192, in __init__
self._transport.open()
File "/home/linadmin/.local/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 93, in open
status, payload = self._recv_sasl_message()
File "/home/linadmin/.local/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 112, in _recv_sasl_message
header = self._trans_read_all(5)
File "/home/linadmin/.local/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 198, in _trans_read_all
return read_all(sz)
File "/home/linadmin/.local/lib/python3.6/site-packages/thrift/transport/TTransport.py", line 62, in readAll
chunk = self.read(sz - have)
File "/home/linadmin/.local/lib/python3.6/site-packages/thrift/transport/TSocket.py", line 140, in read
raise TTransportException(message="unexpected exception", inner=e)
thrift.transport.TTransport.TTransportException: unexpected exception
faint-hair-91313
04/09/2021, 2:46 PM
mammoth-bear-12532
mammoth-bear-12532
faint-hair-91313
04/09/2021, 2:52 PM
faint-hair-91313
04/09/2021, 2:52 PM
faint-hair-91313
04/09/2021, 2:58 PM
faint-hair-91313
04/09/2021, 2:59 PM
faint-hair-91313
04/09/2021, 2:59 PM
faint-hair-91313
04/09/2021, 2:59 PM
mammoth-bear-12532
faint-hair-91313
04/09/2021, 3:01 PM
mammoth-bear-12532
big-carpet-38439
04/09/2021, 5:45 PM
incalculable-ocean-74010
04/09/2021, 6:37 PM
incalculable-ocean-74010
04/09/2021, 6:39 PM
gray-shoe-75895
05/04/2021, 5:48 AM
gray-shoe-75895
05/04/2021, 5:50 AM
pip uninstall pyhive
before running pip install --upgrade 'acryl-datahub[hive]'
faint-hair-91313
05/06/2021, 2:17 PM
faint-hair-91313
05/06/2021, 2:18 PM
faint-hair-91313
05/06/2021, 3:01 PM
source:
  type: hive
  config:
    scheme: 'hive+https'
    username: token
    password: dapi8dfbd3073717dcc751e903883d319c47
    host_port: adb-3571544599855006.6.azuredatabricks.net:443
    database: default
    options:
      connect_args: LDAP
sink:
  type: console
and got this error
$ datahub ingest -c example_to_datahub_console_hive.yml
[2021-05-06 15:00:32,708] INFO {datahub.entrypoints:68} - Using config: {'source': {'type': 'hive', 'config': {'scheme': 'hive+https', 'username': 'token', 'password': 'dapi8dfbd3073717dcc751e903883d319c47', 'host_port': 'adb-3571544599855006.6.azuredatabricks.net:443', 'database': 'default', 'options': {'connect_args': 'LDAP'}}}, 'sink': {'type': 'console'}}
Traceback (most recent call last):
File "/usr/local/bin/datahub", line 8, in <module>
sys.exit(datahub())
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/linadmin/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/linadmin/.local/lib/python3.6/site-packages/datahub/entrypoints.py", line 74, in ingest
pipeline.run()
File "/home/linadmin/.local/lib/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 108, in run
for wu in self.source.get_workunits():
File "/home/linadmin/.local/lib/python3.6/site-packages/datahub/ingestion/source/sql_common.py", line 206, in get_workunits
engine = create_engine(url, **sql_config.options)
File "<string>", line 2, in create_engine
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/util/deprecations.py", line 298, in warned
return fn(*args, **kwargs)
File "/home/linadmin/.local/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 565, in create_engine
cparams.update(pop_kwarg("connect_args", {}))
ValueError: dictionary update sequence element #0 has length 1; 2 is required
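The traceback points at SQLAlchemy's create_engine(): per the frame above, it runs cparams.update(pop_kwarg("connect_args", {})), so connect_args must be a mapping rather than the bare string 'LDAP'. A minimal sketch of that behaviour in plain Python (not taken from the thread, just illustrating the error message):

# connect_args is merged into the DBAPI connect parameters with dict.update(),
# so a bare string is iterated character by character and fails.
cparams = {}
try:
    cparams.update("LDAP")
except ValueError as err:
    print(err)  # dictionary update sequence element #0 has length 1; 2 is required

# A mapping works; this is the shape the recipe's options.connect_args should have.
cparams.update({"auth": "LDAP"})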
mammoth-bear-12532
gray-shoe-75895
05/06/2021, 4:17 PM
auth: LDAP as a key nested underneath connect_args
mammoth-bear-12532
faint-hair-91313
05/10/2021, 10:03 AM
source:
  type: hive
  config:
    scheme: 'hive+https'
    username: admin
    password:
    host_port: hdinsight-dataeng-muac.azurehdinsight.net:443
    database: default
    options:
      connect_args:
        auth: BASIC
sink:
  type: console
I got this:
$ datahub ingest -c example_to_datahub_console_hive_hd.yml
[2021-05-10 10:03:25,936] INFO {datahub.entrypoints:68} - Using config: {'source': {'type': 'hive', 'config': {'scheme': 'hive+https', 'username': 'admin', 'password': '', 'host_port': 'hdinsight-dataeng-muac.azurehdinsight.net:443', 'database': 'default', 'options': {'connect_args': {'auth': 'BASIC'}}}}, 'sink': {'type': 'console'}}
Aborted!
faint-hair-91313
05/10/2021, 10:04 AM
pip freeze
acryl-datahub==0.3.0
acryl-PyHive==0.6.6
avro-gen3==0.5.0
avro-python3==1.10.2
certifi==2020.12.5
chardet==4.0.0
click==7.1.2
dataclasses==0.8
docker==5.0.0
entrypoints==0.3
expandvars==0.7.0
future==0.18.2
greenlet==1.0.0
idna==2.10
importlib-metadata==3.10.0
mypy-extensions==0.4.3
py4j==0.10.9
pydantic==1.8.1
pyhocon==0.3.57
pyparsing==2.4.7
pyspark==3.1.1
python-dateutil==2.8.1
python-tds==1.10.0
pytz==2021.1
PyYAML==5.4.1
requests==2.25.1
sasl==0.2.1
six==1.15.0
SQLAlchemy==1.4.6
sqlalchemy-pytds==0.3.1
thrift==0.13.0
thrift-sasl==0.4.2
toml==0.10.2
typing-extensions==3.7.4.3
typing-inspect==0.6.0
tzlocal==2.1
urllib3==1.26.4
websocket-client==0.58.0
zipp==3.4.1
faint-hair-91313
05/10/2021, 10:04 AM
mammoth-bear-12532
gray-shoe-75895
05/10/2021, 8:19 PM
gray-shoe-75895
05/11/2021, 10:45 PM
datahub --debug ingest …?
faint-hair-91313
05/12/2021, 10:18 AM
$ pip freeze | grep acryl
acryl-datahub==0.3.1
acryl-PyHive==0.6.6
and the debug output
datahub ingest -c example_to_datahub_console_hive_hd.yml
[2021-05-12 10:17:16,578] INFO {datahub.entrypoints:68} - Using config: {'source': {'type': 'hive', 'config': {'scheme': 'hive+https', 'username': 'admin', 'password': '', 'host_port': 'hdinsight-dataeng-muac.azurehdinsight.net:443', 'database': 'default', 'options': {'connect_args': {'auth': 'BASIC'}}}}, 'sink': {'type': 'console'}}
Aborted!
[linadmin@vmsbxdocker ~]$ datahub --debug ingest -c example_to_datahub_console_hive_hd.yml
[2021-05-12 10:17:32,918] INFO {datahub.entrypoints:68} - Using config: {'source': {'type': 'hive', 'config': {'scheme': 'hive+https', 'username': 'admin', 'password': '', 'host_port': 'hdinsight-dataeng-muac.azurehdinsight.net:443', 'database': 'default', 'options': {'connect_args': {'auth': 'BASIC'}}}}, 'sink': {'type': 'console'}}
[2021-05-12 10:17:32,918] DEBUG {datahub.ingestion.run.pipeline:74} - Source type:hive,<class 'datahub.ingestion.source.hive.HiveSource'> configured
[2021-05-12 10:17:32,919] DEBUG {datahub.ingestion.run.pipeline:80} - Sink type:console,<class 'datahub.ingestion.sink.console.ConsoleSink'> configured
[2021-05-12 10:17:32,919] DEBUG {datahub.ingestion.source.sql_common:205} - sql_alchemy_url=hive+https://admin:@hdinsight-dataeng-muac.azurehdinsight.net:443/default
Aborted!
I've removed the password ...
faint-hair-91313
05/12/2021, 10:59 AM
faint-hair-91313
05/12/2021, 11:03 AM
big-carpet-38439
05/12/2021, 2:56 PM
faint-hair-91313
05/12/2021, 2:57 PM
big-carpet-38439
05/12/2021, 3:00 PM
gray-shoe-75895
05/12/2021, 5:52 PM
gray-shoe-75895
05/12/2021, 5:54 PM
http_path option (https://datahubproject.io/docs/metadata-ingestion/#hive-hive). Can you try adding that?
faint-hair-91313
05/12/2021, 9:03 PM
[2021-05-12 21:02:11,333] INFO {datahub.ingestion.run.pipeline:44} - sink wrote workunit default.partitioned_full_efds_rtepts
Source report:
{'failures': {},
'filtered': [],
'tables_scanned': 6,
'warnings': {'default.partitioned_full_efds_asplist': ['unable to map type HiveTimestamp() to metadata schema',
'unable to map type HiveTimestamp() to metadata schema'],
'default.partitioned_full_efds_main': ['unable to map type HiveDate() to metadata schema',
'unable to map type HiveDate() to metadata schema',
'unable to map type HiveTimestamp() to metadata schema',
'unable to map type HiveDate() to metadata schema',
'unable to map type HiveDate() to metadata schema',
'unable to map type HiveDate() to metadata schema',
'unable to map type HiveDate() to metadata schema',
'unable to map type HiveDate() to metadata schema'],
'default.partitioned_full_efds_rtepts': ['unable to map type HiveTimestamp() to metadata schema']},
'workunit_ids': ['default.hivesampletable',
'default.partitioned_full_efds_afregullist',
'default.partitioned_full_efds_asplist',
'default.partitioned_full_efds_geo',
'default.partitioned_full_efds_main',
'default.partitioned_full_efds_rtepts'],
'workunits_produced': 6}
Sink report:
{'failures': [], 'records_written': 6, 'warnings': []}
Didn't load anything eventually ...
faint-hair-91313
05/12/2021, 9:22 PM
little-smartphone-52405
08/19/2021, 3:21 PM
little-smartphone-52405
08/19/2021, 3:23 PM
mammoth-bear-12532
mammoth-bear-12532
little-smartphone-52405
08/19/2021, 4:46 PM
scheme: 'databricks+pyhive'
little-smartphone-52405
08/19/2021, 4:47 PM
little-smartphone-52405
08/19/2021, 4:47 PM
mammoth-bear-12532
faint-hair-91313
08/23/2021, 12:24 PM
little-smartphone-52405
08/23/2021, 4:45 PM
source:
  type: hive
  config:
    host_port: <databricks workspace URL>:443
    username: token
    password: <api token>
    scheme: 'databricks+pyhive'
    options:
      connect_args:
        http_path: 'sql/protocolv1/o/xxxyyyzzzaaasa/1234-567890-hello123'
sink:
  type: "datahub-rest"
  config:
    server: "http://<datahubip>:8080"