# give-feedback
r
The Iceberg destination has been broken for quite some time. I believe some priority is needed. I see many people asking about this issue, and some have even submitted fixes, but it appears the pull requests are still pending. We are seriously considering using this as our default ETL tool, but if the primary connector isn't functioning, I'm not sure it's worth investing more time.
🙏 1
u
Hello @Rahul I’m trying to catch up with the open contributions on GitHub. I saw there are a few to improve the Iceberg connector. I plan to deep dive into them in the next 2 weeks.
r
@[DEPRECATED] Marcos Marx thank you for your reply.
p
FWIW, I just went through the process of compiling the destination-iceberg container from this PR https://github.com/airbytehq/airbyte/pull/38283, which just updates the Iceberg dependencies, and testing it locally, and it worked great for me. I did a Faker -> Iceberg sync on MinIO using the latest Nessie in REST catalog mode.
u
Thanks for letting me know, Pablo! I think the only problem today is that the tests are blowing up and timing out.
r
A big thank you to @[DEPRECATED] Marcos Marx @Pablo Sole and @Eduard Tudenhoefner for providing this fix. I checked with this pull request, and it works for me as well.
QQ: Is this pull request complete? Can we update Airbyte with the `main` or `master` branch?
u
@Rahul yes, you can update the connector to version 0.1.7
p
excellent, thank you all
r
OK, thank you @[DEPRECATED] Marcos Marx. We updated and it works fine.
@[DEPRECATED] Marcos Marx Quick question: while the sync is working fine, Airbyte is putting all the data under a single column called `_airbyte_data`. Is this expected behavior?
[screenshot: image.png]
Are there any settings to insert the data as-is, with the source columns preserved?
u
Not today, Rahul. For that to happen, the connector must implement the new typing-and-deduping loading strategy.
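Conceptually, typing-and-deduping turns the raw JSON rows into typed columns and keeps only the latest record per primary key. A rough SQL sketch of the idea (not the connector's actual code; the `users` stream, the `id` key, and the raw-table column names here are illustrative and may differ by connector version):
```sql
-- Rough sketch of what typing-and-deduping does, NOT the connector's code.
-- Assumes a hypothetical "users" stream keyed on "id"; the raw-table columns
-- (_airbyte_data, _airbyte_emitted_at) are assumptions, not confirmed names.
WITH typed AS (
    SELECT
        CAST(json_extract_string(_airbyte_data, 'id') AS BIGINT) AS id,
        json_extract_string(_airbyte_data, 'name')                AS name,
        _airbyte_emitted_at
    FROM airbyte_raw_users
),
deduped AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY _airbyte_emitted_at DESC
           ) AS rn
    FROM typed
)
SELECT id, name, _airbyte_emitted_at
FROM deduped
WHERE rn = 1;  -- latest record per primary key wins
```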
p
@Rahul I found this nice repo/blogpost showing an example dbt project to parse the raw tables from Airbyte into a typed staging layer: https://github.com/Teradata/airbyte-dbt-jaffle
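The core pattern in that kind of project is one staging model per stream that casts fields out of the raw JSON column. A minimal sketch (the model, source, and field names below are made up for illustration, not taken from that repo):
```sql
-- models/staging/stg_orders.sql: hypothetical dbt staging model over an
-- Airbyte raw table. Source and field names are illustrative only.
SELECT
    CAST(json_extract_string(_airbyte_data, 'order_id') AS BIGINT) AS order_id,
    CAST(json_extract_string(_airbyte_data, 'ordered_at') AS DATE) AS ordered_at,
    json_extract_string(_airbyte_data, 'status')                   AS status
FROM {{ source('airbyte_raw', '_airbyte_raw_orders') }}
```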
👍 2
d
@Rahul I had a similar issue/question regarding the `_airbyte_data` column, and discovered that a CTE works pretty well to parse the JSON in DuckDB as well:
```sql
-- Extract fields from the raw Airbyte JSON column while scanning the Iceberg table.
WITH extracted AS (
    SELECT json_extract(_airbyte_data, ['delivery_date', 'batch_id']) AS extracted_list
    FROM iceberg_scan('s3://airbyte/warehouse/airbyte_raw_local_campaign_dashboard_daily_t/metadata/00000-d8921586-306d-47f5-bde7-88833d97f55a.metadata.json')
)
SELECT
    extracted_list[1] AS delivery_date,  -- DuckDB lists are 1-indexed
    extracted_list[2] AS batch_id
FROM extracted
LIMIT 10;
```
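One variant worth knowing (the cast targets below are my assumption about the field types, not from the example above): `json_extract_string` unquotes the JSON values into plain VARCHARs, which makes casting straightforward:
```sql
-- Same query, but json_extract_string returns unquoted VARCHARs,
-- so the values can be cast directly. Target types are assumed.
WITH extracted AS (
    SELECT json_extract_string(_airbyte_data, ['delivery_date', 'batch_id']) AS vals
    FROM iceberg_scan('s3://airbyte/warehouse/airbyte_raw_local_campaign_dashboard_daily_t/metadata/00000-d8921586-306d-47f5-bde7-88833d97f55a.metadata.json')
)
SELECT
    CAST(vals[1] AS DATE)   AS delivery_date,
    CAST(vals[2] AS BIGINT) AS batch_id
FROM extracted
LIMIT 10;
```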
@Pablo Sole do you have any good resources on the config steps for a Nessie catalog (REST) with Airbyte? (A screenshot of your destination settings?) I am fairly new to Airbyte and configuring connectors, and did not have any luck trying to configure the REST version of Nessie residing on Dremio. Maybe there is a better way? I would like to use Nessie as my main Iceberg catalog for the tables I have synced with Airbyte, and not have a mix of JDBC and REST catalogs.
r
Thanks @Dave Trotter for sharing this. I learned something valuable from your post. However, our solution has some deep complexities that require us to keep the data as-is in the store, so that others can read it directly without having to extract it from JSON.