DataHub

We are implementing an S3 lakehouse on Athena/Spark with Iceberg tables, and I would like to be able to ingest the Iceberg tables into DataHub.

Hi <@U04RY3GTVMJ>, the current codebase only supports Azure Data Lake. This limitation was originally imposed by the legacy Python Iceberg implementation. I have been holding on to a new code update that leverages the new pyiceberg library, which will support AWS and ADLS as well as different catalog implementations. The current codebase only supports HadoopCatalog.

I am waiting for the 0.4.0 release of pyiceberg before submitting my new PR to DataHub. I have not tested my ingestor on any AWS-based infrastructure since I do not have access to it, but pyiceberg was developed for AWS first, so it should work.

We are in the design phase of the project, so we have just created some test Iceberg tables. I am excited to get the S3 Iceberg ingestion capability.