Hi Airbyte Community :wave: , I'm a bit frustrate...
# ask-ai
v
Hi Airbyte Community 👋, I'm a bit frustrated by the number of solutions available and would like some feedback / help. I saw a pretty good diagram of Modern Data Architecture with many... MANY solutions:

https://i1.wp.com/a16z.com/wp-content/uploads/2022/11/Unified-Data-Infrastructure-2.0-1.png

I'm on AWS and currently have a few ETL processes that I built myself with Lambda functions, and I would like to scale the whole thing with a real ETL solution. I would like to know if I can mix Airbyte with some AWS services, and maybe some other tools, to get a solid base architecture that can scale by adding more tools. I would like to start with Airbyte as the base ETL product, with dbt for more advanced transformations. Is a workflow manager like Airflow necessary to start? Can I use it with Lake Formation, for example, as an alternative to the Lakehouse solutions proposed in this diagram? Thanks! P.S.: is there a design help channel? This question is more about design than deployment.
k
A message from kapa.ai
Sorry @Valentin B., I am experiencing especially high traffic right now, please try again. 🧑‍💻
b
Following 😛 but curious to hear, @Valentin B.: is there a reason you want to stitch everything together rather than using Databricks as a standalone? EDIT: Databricks offers load, transform, and storage + orchestration of all of that afaik
v
Hi @Ba Thien Tran, thanks for the suggestion. I would like to have this on my own infrastructure (multiple AWS accounts with a shared network). Can I integrate Databricks into my own infrastructure?
What AWS services does Databricks cover? AWS Glue? EMR?
m
Maybe @Simon Späti (Airbyte) can give an opinion
v
In addition, please note that I really want to use Airbyte for data ingestion and base transformations, but also for data sync. I saw that Airbyte supports Postgres as a source and Weaviate as a destination, and I plan to add Weaviate. So I really would like to use Airbyte, but when I look at https://airbyte.com/etl-tools-comparison, AWS Glue (or, more recently, AppFlow) and Databricks are not mentioned. So are they competitors, or are they complementary?
s
That's a common question, as the data space keeps exploding every year. If you want an end-to-end base solution that covers integration, transformation, and BI, with everything orchestrated (this setup is the most flexible, since you can swap in any tool you like), I have a hands-on example here: Configure Airbyte Connections with Python (Dagster). If you are interested in why each tool was chosen, I wrote the related article The Open (aka Modern) Data Stack Distilled into Four Core Tools (including a linked GitHub repo to get you started).

Our alpha destination is a good start and works alongside a Databricks Spark Server. I wrote a tutorial explaining why you'd want to use Databricks and how to implement it (some things might have changed since, and some people have reported problems with some features of this connection).

As for AWS Glue, you can see it more as a lean version of orchestration than as something to compare with Airbyte. I wouldn't use Glue for orchestration except for a POC or small applications, though. Fairly quickly you will want to reuse existing components, be able to restart runs, and so on: all the features an orchestrator gives you. I hope some of this helps, but I'm sure new questions are popping up now :)
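To make the "restart ability" point a bit more concrete, here is a toy sketch in plain Python. It is only an illustration of the idea, not how Airflow, Dagster, or Glue work internally, and nothing in it comes from the Airbyte or dbt APIs: `run_pipeline`, `extract_load`, and `transform` are hypothetical names, and the two step functions are stand-ins for real work such as an Airbyte sync followed by a dbt run.

```python
from typing import Callable, List, Set, Tuple

# A step is a (name, action) pair; both are hypothetical, for illustration only.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], done: Set[str]) -> Set[str]:
    """Run steps in order, skipping any whose name is already in `done`.

    If a step raises, the exception propagates, but `done` still records
    the steps that finished, so a later call resumes at the failed step.
    """
    for name, action in steps:
        if name in done:
            continue  # already succeeded in an earlier run
        action()  # may raise; earlier steps stay recorded in `done`
        done.add(name)
    return done


# Stand-ins for real work (for example an Airbyte sync and a dbt run).
calls: List[str] = []
transform_should_fail = {"value": True}


def extract_load() -> None:
    calls.append("extract_load")


def transform() -> None:
    calls.append("transform")
    if transform_should_fail["value"]:
        raise RuntimeError("transform failed")


steps: List[Step] = [("extract_load", extract_load), ("transform", transform)]
state: Set[str] = set()

try:
    run_pipeline(steps, state)  # first run fails at the transform step
except RuntimeError:
    pass

transform_should_fail["value"] = False
run_pipeline(steps, state)  # second run skips extract_load and retries transform
```

On the second call only the failed step runs again, which is the behaviour you would otherwise have to hand-roll around Lambda functions. A real orchestrator also persists this state, schedules runs, and lets you reuse steps across pipelines.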
v
Thanks a lot @Simon Späti (Airbyte) for your detailed answer 🙂 I'm reading your glossary, it's awesome! With these resources I'll probably be able to choose between all the possible solutions.