# integrate-iceberg-datahub
m
Hi all, just a quick message to let you know that I'm working on a PR that will modify the current Iceberg source to use the new 0.4.0 pyiceberg library. With this, the Iceberg source will now be able to ingest tables from any Iceberg catalog currently supported by pyiceberg (REST, Hive, Glue and DynamoDB... I also have a JDBCCatalog draft implementation that I am working on with Fokko Driesprong and this should be available in pyiceberg end of summer). It will also introduce support for S3 🎉 and make https://github.com/acryldata/py-iceberg obsolete. Here is the PR: https://github.com/datahub-project/datahub/pull/8357
d
Awesome, thanks! Let me know if you need any help from us!
m
Thanks @dazzling-judge-80093. I have opened the PR for final review. The gh_pages check keeps failing, but I don't understand why. If you can explain why, I can fix it.
d
sure, I will check that, don’t worry about it
thanks! 🙇
@modern-monitor-81461 if you update your branch from master, the gh_pages issues should be fixed
m
@dazzling-judge-80093 pyiceberg does not support Python < 3.8, so all the 3.7 builds are failing. I tried to disable the dependencies and the tests, but maybe I did something wrong or missed something, because the GitHub checks are still failing. See this run for example: https://github.com/datahub-project/datahub/actions/runs/5519992480/jobs/10066022396. What am I missing to avoid testing Iceberg in 3.7?
d
Let me check
@modern-monitor-81461 I think one issue for sure is that the pyiceberg library is only a dependency in 3.10, but you import its packages in iceberg_test.py. I think you should import pyiceberg conditionally so the import itself can't fail. After that, since the tests will be skipped, hopefully nothing else will fail.
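For anyone reading along, here is a minimal sketch of the conditional import suggested above. It is not the code from the PR: `iceberg_available`, `MIN_PYTHON` and `IcebergSourceTest` are hypothetical names, and the 3.8 threshold is taken from the message earlier in this thread. It only uses the standard library, so the test class is skipped rather than failing at import time when pyiceberg cannot be used.

```python
import importlib.util
import sys
import unittest

# Assumption: pyiceberg needs Python 3.8+, as stated earlier in this thread.
MIN_PYTHON = (3, 8)


def iceberg_available() -> bool:
    """Return True only if the interpreter is new enough and pyiceberg is installed."""
    if sys.version_info < MIN_PYTHON:
        return False
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("pyiceberg") is not None


# Import only behind the guard, so collecting this module never raises ImportError.
if iceberg_available():
    import pyiceberg  # noqa: F401


@unittest.skipUnless(
    iceberg_available(), "pyiceberg requires Python 3.8+ and must be installed"
)
class IcebergSourceTest(unittest.TestCase):
    """Hypothetical placeholder for the real Iceberg source tests."""

    def test_pyiceberg_importable(self):
        # Runs only when the guard above allowed the import.
        self.assertIn("pyiceberg", sys.modules)
```

If the test suite runs under pytest, `pytest.importorskip("pyiceberg")` at the top of the test module is a shorter way to get a similar effect.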
m
@dazzling-judge-80093 Friday is my last day before I go on vacation, so I could use some help to make that build pass. It's only a matter of configuring the code to avoid testing and linting the Iceberg source when using Python 3.7. But somehow I'm struggling to make those changes...
d
I’m checking
@modern-monitor-81461 can you try this? -> https://github.com/datahub-project/datahub/commit/5bebd22bf3231b9189c974a802d2db05bd1d9efe I think this should fix the build
d
@modern-monitor-81461 @dazzling-judge-80093 thanks for the info. Are we expecting this to be added to the master branch anytime soon?
m
@dazzling-london-20492 Just got back from vacation, I will get to this sometime this week.
@dazzling-judge-80093 sorry for not actioning this earlier, I was away. I applied the changes you proposed and it did the trick. My build is almost passing now, except for a test unrelated to Iceberg.
d
awesome