Hello Airbyte Team, I have been using Airbyte for approximate… (Airbyte #give-feedback)

Meni Shakarov

04/29/2024, 8:08 AM

Hello Airbyte Team, I have been using Airbyte for approximately six months and am genuinely impressed with its capabilities! However, I am having trouble efficiently inserting data into my AWS S3-Glue data lake. I have tried several destinations, but each has its own issue:

1. Destination-Data-Lake: Performance is slow and maintenance seems to be lacking. I have had a PR open for about four months without resolution.
2. Destination-S3: There is a bug in data type handling when writing to Parquet (an issue with dictionary data types).
3. Destination-Glue: Only supports the JSON format, which is not optimal for our needs.
4. Destination-Iceberg: Does not support the Glue Data Catalog.

Given these challenges, are there any plans on the roadmap to improve support for data lake operations? I believe my use case is fairly common, and robust support could benefit many users.

[DEPRECATED] Marcos Marx

04/29/2024, 1:42 PM

Hello @Meni Shakarov, and thanks a lot for your feedback and for your contribution. Your contribution is merged now; sorry for the long delay in getting it done. The team was focused on improving our CI pipeline to make tests easier to run. We should probably communicate this better and set clearer expectations. I hope to bring the current backlog down very quickly and to improve a lot over the next few weeks.

For S3, I’ll ask the connector team whether there is any timeline to fix the issue.

For Glue and Iceberg, there are active contributions:
• https://github.com/airbytehq/airbyte/pull/32996 (maybe you can contribute by describing which type you need)
• https://github.com/airbytehq/airbyte/pull/32720 (implements the Nessie catalog, not the Glue one, but it may be a similar contribution)