Exploring data ingestion from multiple data sources with Airbyte #ask-community-for-troubleshooting


10/25/2021, 3:35 PM

Hi, I am exploring data ingestion from multiple data sources using Airbyte. I would like to know whether runtime processing is possible. I would like to deploy Airbyte on AWS and use it to read and update data across multiple data sources. For example, let's say the system reads data from Snowflake, keeps it in memory, applies some transformations in Lambda, and then writes it back to Snowflake. Can we do this with some kind of temporary processing through Airbyte, without storing the data? If so, what is the process? Any help and input is appreciated!

Kamil Breguła

10/25/2021, 3:54 PM

Have you considered using Apache Beam/Dataflow or CDAP/Datafusion?

10/25/2021, 4:22 PM

@Kamil Breguła I haven't used these tools, but I am open to them. Do you think this use case is achievable?

Kamil Breguła

10/25/2021, 4:25 PM

If you want to transform your data, I think you will find a better tool than Airbyte. Airbyte focuses on copying data from SaaS services.

Kamil Breguła

10/25/2021, 4:28 PM

Most often, the transformations are performed in the target database after the data has already been saved: the EL*T* approach.
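To make the EL*T* idea concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for the target warehouse (Snowflake in the question); the table names, columns, and sample rows are hypothetical. Raw records are loaded unchanged, and the transformation then runs as SQL inside the target database.

```python
import sqlite3


def elt_demo(raw_rows):
    # Hypothetical stand-in: an in-memory SQLite database plays the role of
    # the target warehouse. Table and column names are illustrative only.
    conn = sqlite3.connect(":memory:")

    # E + L: load the raw records as-is, with no transformation on the way in.
    conn.execute(
        "CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

    # T: transform inside the target database with SQL, after loading.
    conn.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT id, amount_cents / 100.0 AS amount, UPPER(status) AS status
        FROM raw_orders
        WHERE status IS NOT NULL
        """
    )
    rows = conn.execute(
        "SELECT id, amount, status FROM orders_clean ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


print(elt_demo([(1, 1250, "paid"), (2, 999, None), (3, 500, "refunded")]))
```

Note that in this pattern the raw data *is* persisted in the target (here `raw_orders`), which is exactly what the question wants to avoid.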

10/25/2021, 4:28 PM

Transforming data is just an example. I would like to do some data processing. For this I want a tool that reads from and writes to multiple data sources, but I wouldn't want the data to be stored anywhere. I am exploring the Local JSON option with Airbyte, but that also stores data. I'm wondering if it's possible to keep it at runtime only.

Kamil Breguła

10/25/2021, 4:29 PM

If you want to do in-memory transformations, you need an ETL tool such as Apache Spark, Beam, CDAP, Datafusion, or Presto DB.
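For contrast with ELT, below is a minimal sketch of the in-memory ETL pattern in plain Python rather than in Spark or Beam. Records stream through generator stages and are written back in small batches, so no intermediate file or staging table is created. The `source` list and the `sink` callable are hypothetical stand-ins for a Snowflake query cursor and a write-back call; a real pipeline would use one of the tools named above or a database driver.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def extract(source: Iterable[Row]) -> Iterator[Row]:
    # Hypothetical source: in practice, a cursor over a Snowflake query.
    for row in source:
        yield row


def transform(rows: Iterator[Row]) -> Iterator[Row]:
    # In-memory transformation: drop rows without a status, upper-case the rest.
    for row in rows:
        if row.get("status") is None:
            continue
        yield {**row, "status": str(row["status"]).upper()}


def load(rows: Iterator[Row], sink: Callable[[List[Row]], None],
         batch_size: int = 2) -> int:
    # Hypothetical sink: in practice, a batched write back to the target table.
    # Only one batch is held in memory at a time.
    batch: List[Row] = []
    count = 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            sink(batch)
            count += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        sink(batch)
        count += len(batch)
    return count


source = [
    {"id": 1, "status": "paid"},
    {"id": 2, "status": None},
    {"id": 3, "status": "refunded"},
    {"id": 4, "status": "open"},
]
written: List[List[Row]] = []
n = load(transform(extract(source)), written.append)
print(n, written)
```

Because every stage is a generator, rows are processed lazily as they are read; the trade-off is that nothing is persisted, so a failure mid-run means re-reading from the source.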
