# ingestion
c
Hello, I want to ingest data from Hive into DataHub, but I keep getting this error:
http.client.BadStatusLine: Invalid status 80
I'm not sure where to look, since I don't know what causes the error. Do you have any hints? I've configured a yml for acryl-datahub...
I can connect through PyHive, so the problem is most likely my config on the DataHub side.
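`http.client.BadStatusLine` is what Python's HTTP client raises when the first line of a reply is not a valid HTTP status line. One common way to hit it is an HTTP client talking to a port that speaks a different protocol, for example an HTTP transport pointed at a binary Thrift endpoint. That this is what happened here is an assumption, although switching the scheme away from `hive+http` later in the thread did make the error go away. The sketch below reproduces the exception locally using only the standard library: a throwaway server on localhost answers an HTTP request with a non-HTTP line. The helper names (`serve_non_http_once`, `probe`, `demo`) are illustrative.

```python
import http.client
import socket
import threading


def serve_non_http_once(server_sock: socket.socket) -> None:
    """Accept one connection and answer it with a line that is not HTTP."""
    conn, _ = server_sock.accept()
    with conn:
        # Consume the whole request header first so the client is not
        # hit by a TCP reset while it is still sending.
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        # Placeholder reply from a service that does not speak HTTP.
        conn.sendall(b"not an http status line\r\n")


def probe(host: str, port: int) -> str:
    """Send one HTTP GET and report which exception (if any) it raised."""
    client = http.client.HTTPConnection(host, port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
        return "no error"
    except http.client.BadStatusLine as exc:
        # RemoteDisconnected is a subclass, so report the concrete type.
        return type(exc).__name__
    finally:
        client.close()


def demo() -> str:
    """Start a one-shot non-HTTP server on a free local port and probe it."""
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        worker = threading.Thread(
            target=serve_non_http_once, args=(server_sock,), daemon=True
        )
        worker.start()
        try:
            return probe("127.0.0.1", port)
        finally:
            worker.join(timeout=5)


if __name__ == "__main__":
    print(demo())  # BadStatusLine
```

The exact message differs from `Invalid status 80`; the point is only that this exception class signals a protocol mismatch on the wire rather than, say, bad credentials.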
g
Can you provide some more details about the error that you’re seeing: full logs (run with `datahub --debug`) + the recipe that you’re using + the pyhive connection code that works for you?
c
@gray-shoe-75895 thanks for your reply. I've now got a working connection: I had to change the scheme from "hive+http" to just "hive". But now I'm running into a different error. The code doesn't pick up the table names correctly; instead I see the schema (database) name twice.
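The `scheme` value becomes the prefix of the SQLAlchemy URL that DataHub hands to PyHive, visible in the debug log as `sql_alchemy_url=<hive://username@hostname1/db1>`. In SQLAlchemy URLs of the form `dialect+driver://`, the part after `+` selects a driver or variant, so `hive+http` and `hive` are different connection methods. The hypothetical `build_sqlalchemy_url` below only illustrates how the recipe fields (`scheme`, `username`, `host_port`, `database`) line up with that logged URL; it is a sketch, not DataHub's own code.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_sqlalchemy_url(
    scheme: str,
    host_port: str,
    database: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> str:
    """Hypothetical helper: scheme://[user[:password]@]host_port[/database]."""
    auth = ""
    if username:
        # Quote credentials so characters such as '@' or ':' do not break the URL.
        auth = quote_plus(username)
        if password:
            auth += ":" + quote_plus(password)
        auth += "@"
    url = f"{scheme}://{auth}{host_port}"
    if database:
        url += f"/{database}"
    return url


# Only the scheme differs between the failing and the working recipe.
print(build_sqlalchemy_url("hive+http", "hostname1", "db1", username="username"))
print(build_sqlalchemy_url("hive", "hostname1", "db1", username="username"))
```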
```
sudo datahub --debug ingest -c hive_to_datahub.yml
[2021-05-26 12:47:15,684] INFO     {datahub.entrypoints:68} - Using config: {'source': {'type': 'hive', 'config': {'scheme': 'hive', 'username': 'username', 'host_port': 'hostname', 'database': 'db1', 'table_pattern': {'deny': ['table1', 'table2'], 'allow': ['table3']}}}, 'sink': {'type': 'console'}}
[2021-05-26 12:47:15,685] DEBUG    {datahub.ingestion.run.pipeline:74} - Source type:hive,<class 'datahub.ingestion.source.hive.HiveSource'> configured
[2021-05-26 12:47:15,685] DEBUG    {datahub.ingestion.run.pipeline:80} - Sink type:console,<class 'datahub.ingestion.sink.console.ConsoleSink'> configured
[2021-05-26 12:47:15,685] DEBUG    {datahub.ingestion.source.sql_common:206} - sql_alchemy_url=<hive://username@hostname1/db1>
[2021-05-26 12:47:15,731] INFO     {pyhive.hive:473} - USE `db1`
[2021-05-26 12:47:15,744] INFO     {pyhive.hive:473} - SHOW SCHEMAS
[2021-05-26 12:47:15,851] INFO     {pyhive.hive:473} - SHOW TABLES IN db2
db2 db2
```
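The recipe's `table_pattern` combines `allow` and `deny` lists of regular expressions. A minimal sketch of such a filter follows, assuming that a deny match always wins and that patterns are matched with `re.match` (anchored at the start of the name, not at the end). `is_allowed` is a hypothetical helper, not DataHub's implementation, and whether DataHub checks bare table names or schema-qualified ones is not shown in this thread.

```python
import re
from typing import List, Optional


def is_allowed(
    name: str,
    allow: Optional[List[str]] = None,
    deny: Optional[List[str]] = None,
) -> bool:
    """Hypothetical allow/deny filter: deny wins, then any allow regex must match."""
    allow = [".*"] if allow is None else allow
    deny = [] if deny is None else deny
    # re.match anchors at the start of the name but not at the end.
    if any(re.match(pattern, name) for pattern in deny):
        return False
    return any(re.match(pattern, name) for pattern in allow)


pattern = {"allow": ["table3"], "deny": ["table1", "table2"]}
for candidate in ["table3", "table1", "table4", "db1.table3"]:
    print(candidate, is_allowed(candidate, **pattern))
```

Under these assumed semantics the deny list is redundant once `allow` names only `table3`, and a bare `table3` pattern would not match a schema-qualified name such as `db1.table3`.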
I think this is related to the following bug
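One plausible explanation for the `db2 db2` output, which is an assumption since the bug link is missing and the thread does not say which server sits behind the connection: Hive's `SHOW TABLES` returns a single table-name column, while Spark SQL's `SHOW TABLES` (for example through the Spark Thrift Server) returns the database name first, then the table name, then an `isTemporary` flag. Code that always reads the first column would then report the database name once per table. The two helpers below are illustrative only, and the rows are made up.

```python
from typing import List, Sequence


def first_column_names(rows: Sequence[Sequence[str]]) -> List[str]:
    """Naive reading: always take column 0 of each SHOW TABLES row."""
    return [row[0] for row in rows]


def table_column_names(rows: Sequence[Sequence[str]]) -> List[str]:
    """Take the table-name column: the only one for Hive, the second one otherwise."""
    return [row[1] if len(row) > 1 else row[0] for row in rows]


# Hypothetical rows; the table names are invented for illustration.
hive_rows = [("orders",), ("customers",)]
spark_style_rows = [("db2", "orders", "false"), ("db2", "customers", "false")]

print(first_column_names(spark_style_rows))  # ['db2', 'db2']
print(table_column_names(spark_style_rows))  # ['orders', 'customers']
print(table_column_names(hive_rows))         # ['orders', 'customers']
```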
I've succeeded.