# give-feedback
r
The Iceberg destination has been broken for quite some time. I believe some priority is needed. I see many people asking about this issue, and some have even submitted fixes, but it appears the pull requests are still pending. We are seriously considering using this as our default ETL tool, but if the primary connector isn't functioning, I'm not sure it's worth investing more time.
🙏 1
u
Hello @Rahul I’m trying to catch up with the open contributions on GitHub. I saw there are a few to improve the Iceberg connector. I plan to deep dive into them in the next 2 weeks.
r
@[DEPRECATED] Marcos Marx thank you for your reply.
p
FWIW, I just went through the process of compiling the destination-iceberg container from this PR https://github.com/airbytehq/airbyte/pull/38283, which just updates the Iceberg dependencies, and testing it locally, and it worked great for me. I did a Faker -> Iceberg sync on MinIO using the latest Nessie in REST catalog mode.
u
Thanks for letting me know, Pablo! I think the only problem today is that the tests are blowing up and timing out.
r
A big thank you to @[DEPRECATED] Marcos Marx @Pablo Sole and @Eduard Tudenhoefner for providing this fix. I checked with this pull request, and it works for me as well.
QQ: Is this pull request complete? Can we update Airbyte with the `main` or `master` branch?
u
@Rahul yes, you can update the connector to version 0.1.7
p
excellent, thank you all
r
OK, thank you @[DEPRECATED] Marcos Marx. We updated and it works fine.
@[DEPRECATED] Marcos Marx Quick question: while the sync is working fine, Airbyte is putting all the data under a single column called `_airbyte_data`. Is this expected behavior?
[screenshot: image.png]
Are there any settings to insert the data as-is, with the source columns preserved?
u
Not today, Rahul. For that to happen, the connector must implement the new typing-and-deduping loading strategy.
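Conceptually, typing-and-deduping turns the raw JSON rows into typed columns and keeps only the latest record per primary key. A rough SQL sketch of the idea (not the connector's actual code; the `users` stream, the `id` key, and the raw-table column names here are illustrative and may differ by connector version):
```sql
-- Rough sketch of what typing-and-deduping does, NOT the connector's code.
-- Assumes a hypothetical "users" stream keyed on "id"; the raw-table columns
-- (_airbyte_data, _airbyte_emitted_at) are assumptions, not confirmed names.
WITH typed AS (
    SELECT
        CAST(json_extract_string(_airbyte_data, 'id') AS BIGINT) AS id,
        json_extract_string(_airbyte_data, 'name')                AS name,
        _airbyte_emitted_at
    FROM airbyte_raw_users
),
deduped AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY _airbyte_emitted_at DESC
           ) AS rn
    FROM typed
)
SELECT id, name, _airbyte_emitted_at
FROM deduped
WHERE rn = 1;  -- latest record per primary key wins
```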
p
@Rahul I found this nice repo/blogpost showing an example dbt project to parse the raw tables from Airbyte into a typed staging layer: https://github.com/Teradata/airbyte-dbt-jaffle
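The core pattern in that kind of project is one staging model per stream that casts fields out of the raw JSON column. A minimal sketch (the model, source, and field names below are made up for illustration, not taken from that repo):
```sql
-- models/staging/stg_orders.sql: hypothetical dbt staging model over an
-- Airbyte raw table. Source and field names are illustrative only.
SELECT
    CAST(json_extract_string(_airbyte_data, 'order_id') AS BIGINT) AS order_id,
    CAST(json_extract_string(_airbyte_data, 'ordered_at') AS DATE) AS ordered_at,
    json_extract_string(_airbyte_data, 'status')                   AS status
FROM {{ source('airbyte_raw', '_airbyte_raw_orders') }}
```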
👍 2
d
@Rahul I had a similar issue/question regarding the `_airbyte_data` column, and discovered that a CTE works pretty well to parse the JSON in DuckDB as well:
```sql
-- Extract fields from the raw Airbyte JSON column while scanning the Iceberg table.
WITH extracted AS (
    SELECT json_extract(_airbyte_data, ['delivery_date', 'batch_id']) AS extracted_list
    FROM iceberg_scan('s3://airbyte/warehouse/airbyte_raw_local_campaign_dashboard_daily_t/metadata/00000-d8921586-306d-47f5-bde7-88833d97f55a.metadata.json')
)
SELECT
    extracted_list[1] AS delivery_date,  -- DuckDB lists are 1-indexed
    extracted_list[2] AS batch_id
FROM extracted
LIMIT 10;
```
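One variant worth knowing (the cast targets below are my assumption about the field types, not from the example above): `json_extract_string` unquotes the JSON values into plain VARCHARs, which makes casting straightforward:
```sql
-- Same query, but json_extract_string returns unquoted VARCHARs,
-- so the values can be cast directly. Target types are assumed.
WITH extracted AS (
    SELECT json_extract_string(_airbyte_data, ['delivery_date', 'batch_id']) AS vals
    FROM iceberg_scan('s3://airbyte/warehouse/airbyte_raw_local_campaign_dashboard_daily_t/metadata/00000-d8921586-306d-47f5-bde7-88833d97f55a.metadata.json')
)
SELECT
    CAST(vals[1] AS DATE)   AS delivery_date,
    CAST(vals[2] AS BIGINT) AS batch_id
FROM extracted
LIMIT 10;
```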
@Pablo Sole do you have any good resources on the config steps for a Nessie catalog (REST) with Airbyte? (A screenshot of your destination settings?) I am fairly new to Airbyte and configuring connectors, and did not have any luck trying to configure the REST version of Nessie residing on Dremio. Maybe there is a better way? I would like to use Nessie as my main Iceberg catalog for the tables I have synced with Airbyte, and not have a mix of JDBC and REST catalogs.
r
Thanks @Dave Trotter for sharing this. I learned something valuable from your post. However, our solution has some deep complexities that require us to keep the data as-is in the store, so that others can read it directly without having to extract it from JSON.