I am trying to ingest presto-on-hive, but got foll...
# ingestion
c
I am trying to ingest with the presto-on-hive source but got the following error. Could anyone help me check it? Thanks.
datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure source (presto-on-hive)
[2022-09-26 05:58:34,312] ERROR    {datahub.entrypoints:195} - Command failed: 
	Failed to configure source (presto-on-hive) due to 
		'TSocket read 0 bytes'.
	Run with --debug to get full stacktrace.
My DataHub version is v0.8.44. The recipe.yml looks like this:
source:
  type: presto-on-hive
  config:
    host_port: xxx.presto.com:9106
    database: db
    scheme: 'hive'
    schema_pattern:
      allow:
        - "default"
    username: hive
    options:
      connect_args:
        http_path: "/hive2"
        auth: NOSASL


sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
d
With the presto-on-hive source you have to connect to the Hive Metastore database. I think you tried to connect to something else here.
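For illustration, a minimal sketch of a presto-on-hive recipe that points at the metastore database rather than at a Presto or HiveServer2 endpoint. The host, port, database name, schema name, and credentials are hypothetical placeholders, and the scheme assumes a MySQL-backed metastore; adjust them to whatever your metastore actually runs on.

```yaml
source:
  type: presto-on-hive
  config:
    # Connection to the Hive Metastore's backing database (hypothetical host/port)
    host_port: metastore-db.example.com:3306
    # Name of the metastore database (assumed; check your deployment)
    database: metastore
    # SQLAlchemy scheme for the metastore DB; assumes MySQL here,
    # use 'postgresql+psycopg2' for a Postgres-backed metastore
    scheme: 'mysql+pymysql'
    # Placeholder credentials for the metastore DB
    username: metastore_user
    password: metastore_password

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```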
c
What are you trying to ingest, Presto or Hive? The source type presto-on-hive doesn't exist. It should be hive if your target is Hive.
c
Interesting. presto-on-hive is listed in the official docs. What I actually want is to ingest Hive metadata while using the Presto query engine to do data profiling.
c
my mistake. 😂
c
So, how can I achieve this?
What I actually want is to ingest Hive metadata while using the Presto query engine to do data profiling.
c
Use the hive source type instead of presto-on-hive, I think.
d
Unfortunately, you can currently only do profiling with the hive source. The presto-on-hive source connects directly to the metastore database, which is why it ingests metadata much faster, but it can't access the table data and therefore can't do profiling.
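As a rough sketch of the hive-source route, something like the recipe below should ingest the metadata and profile the tables. The host and port are hypothetical placeholders for a HiveServer2 endpoint, and the connect_args from the original recipe are left out; add them back if your server needs them.

```yaml
source:
  type: hive
  config:
    # HiveServer2 endpoint (hypothetical host; 10000 is the usual default port)
    host_port: hiveserver2.example.com:10000
    database: db
    username: hive
    schema_pattern:
      allow:
        - "default"
    # Profiling queries the table data, which the hive source can reach
    profiling:
      enabled: true

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```

Profiling runs queries against the tables themselves, so expect this to be noticeably slower than a metadata-only run.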