# integrate-iceberg-datahub
f
Hi everyone! I am trying to establish a connection to Iceberg using the SQL type of catalog. I have tried lots of combinations of attributes in the YAML file for Iceberg, but it's not working. Error: "fail to get catalog …: 'SQL'". From what I've found searching Slack, the latest DataHub version uses the pyiceberg v0.4.0 library. Is that true? Would that mean the "sql" type of Iceberg catalog is not supported? @lively-appointment-50242 Hi! I can see lots of messages from you related to the DataHub-to-Iceberg connection. Have you had any success with it?
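For reference, this is roughly the shape of recipe I have been attempting. The `uri` and `warehouse` values are placeholders from my setup, and this assumes the DataHub executor is running a pyiceberg version that actually ships a `sql` catalog implementation (which may not be the case if it is pinned to v0.4.0):

```yaml
source:
    type: iceberg
    config:
        env: PROD
        catalog:
            name: my_iceberg_catalog
            # 'sql' catalog type: table metadata tracked in a relational DB.
            # Requires a pyiceberg build that includes SqlCatalog support.
            type: sql
            config:
                # Placeholder connection string for the catalog database.
                uri: 'postgresql+psycopg2://user:password@host:5432/iceberg_db'
                warehouse: 's3://warehouse/'
                s3.endpoint: 'https://...'
                s3.access-key-id: ...
                s3.secret-access-key: ...
        platform_instance: my_iceberg_catalog
```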
@victorious-car-1170 Hi! I saw that you also tried to connect to Iceberg with the SQL catalog. Have you had any success with it?
l
Yes, I have had success, but with the Hive metastore catalog. The ingestion status shows as
failed
; however, the data was ingested. We use MinIO as an S3 source. Here is an example of my settings:
```yaml
source:
    type: iceberg
    config:
        env: PROD
        catalog:
            name: my_iceberg_catalog
            type: hive
            config:
                uri: 'thrift://...'
                s3.access-key-id: ...
                s3.secret-access-key: ...
                s3.endpoint: 'https://...'
                TrustServerCertificate: Yes
                ssl: 'True'
        platform_instance: my_iceberg_catalog
        profiling:
            enabled: true
```
f
Sounds promising! We are also using MinIO as object storage. Since we are still in the PoC stage, we will redeploy Iceberg with the Hive catalog and test once more. Thank you for your reply!
@lively-appointment-50242 Hello! I reinstalled Iceberg with the Hive metastore catalog. Now I'm facing an error when connecting from DataHub: "Apache Hive support not installed: pip install pyiceberg[hive]". I installed it on the VM itself and also inside the datahub-actions container, then restarted that container, but I'm still facing the same error. Have you run into this? Could you please share any suggestions?
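For context, what I ran inside the container was roughly the following. The container name is from my Docker quickstart setup and may differ in yours (check `docker ps`):

```shell
# Install the Hive extra of pyiceberg inside the running
# datahub-actions container (container name is an assumption).
docker exec datahub-actions pip install 'pyiceberg[hive]'

# Restart the container so the ingestion executor picks up the package.
docker restart datahub-actions
```

Note that packages installed this way are lost if the container is recreated, so this would need to be baked into the image for a permanent fix.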
l
I faced it and was not able to solve it locally (with the Docker quickstart option). I postponed work on the Hive connection through DataHub.
f
But how, then, were you able to successfully ingest metadata from Iceberg into DataHub?
l
The connection to Iceberg is via the Hive catalog,
and it worked for me.
We deployed it on the dev environment, and DevOps somehow fixed the issue during ingestion. Unfortunately, I don't have much info about the steps that solved it there (maybe it was just installing packages like you did locally), but the deployment was done with Kubernetes rather than the Docker quickstart, so the solution worked in that setup.