# advice-data-warehouses
Sefath Chowdhury:
Hello All! I want to build a data warehouse for my company.

**Current situation**

Our current stack takes all of our separate databases (RDS Postgres) and uses AWS DMS with CDC to replicate each one into its own schema in a huge Postgres instance:
```
(i.e. Microservice_1_Database | public schema -> Huge_Postgres_Instance | microservice_1 schema)
```
There are two problems with this:

1. AWS DMS is not resilient to DDL changes on the source DB.
2. A huge Postgres instance is still a Postgres instance, designed for OLTP rather than OLAP. (We wanted to use Redshift, but many existing analytics queries break. That breakage is something we are okay with when moving to cloud-agnostic Snowflake.)

**Desired situation**

AWS RDS Postgres (CDC enabled) -> Airbyte -> Snowflake

1. Does anyone have a stack like this that uses Airbyte to replicate to Snowflake in real time?
2. Which logical decoding output plug-in are you using (hopefully you are on AWS RDS instances and that plugin is compliant there), and why? (A setup sketch follows below.)
3. Did you deploy Airbyte in a pod on a k8s cluster? If so, how did you determine the specs it needed? I would assume ongoing replication is a heavy lift, and I'm unsure how to calculate the specs for this deployment.
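On question 2: Airbyte's Postgres CDC source (Debezium under the hood) works with pgoutput, which ships with Postgres 10+ and is available on RDS without installing any extension. A minimal setup sketch on the source database, with placeholder slot and publication names:

```sql
-- On RDS, enable logical decoding by setting rds.logical_replication = 1
-- in the DB parameter group and rebooting; this should then return 'logical'.
SHOW wal_level;

-- Dedicated replication slot for the connector, using the pgoutput plugin.
-- The slot name 'airbyte_slot' is a placeholder.
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');

-- Publication defining which tables pgoutput emits changes for.
-- FOR ALL TABLES needs elevated privileges; FOR TABLE t1, t2 also works.
CREATE PUBLICATION airbyte_publication FOR ALL TABLES;
```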
k:
Hey @Sefath Chowdhury, we haven't migrated to Airbyte yet, but we do have a somewhat similar stack that uses Kafka Connect for real-time replication into Snowflake:

Postgres RDS -> Kafka Connect (source connector: Debezium using the pgoutput plugin with a JSON converter) -> Kafka topic -> Kafka Connect (sink connector: writes to a Snowflake STAGING table as VARIANT) -> Stream + Task on that STAGING table in Snowflake to MERGE into a replica of the Postgres table (sketched below).

Our plan is to replace the Kafka connector with Airbyte, so that we don't have an extra hop. Hope this helps. I can update the thread with our findings once we start using Airbyte on k8s.
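For anyone reading along, a minimal sketch of that Stream + Task + MERGE step. All table, column, and warehouse names are hypothetical; the Snowflake Kafka sink lands each message in a RECORD_CONTENT VARIANT column, and the exact JSON paths depend on your Debezium converter settings:

```sql
-- Capture new rows arriving in the staging table.
CREATE OR REPLACE STREAM staging_orders_stream ON TABLE staging_orders;

-- Task that periodically merges staged CDC events into the replica table.
CREATE OR REPLACE TASK merge_orders_task
  WAREHOUSE = etl_wh          -- hypothetical warehouse
  SCHEDULE  = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('STAGING_ORDERS_STREAM')
AS
MERGE INTO replica_orders AS t
USING (
    -- Unpack the Debezium payload; deletes carry the key in 'before'.
    SELECT COALESCE(record_content:payload:after:id,
                    record_content:payload:before:id)::NUMBER AS id,
           record_content:payload:after:status::STRING        AS status,
           record_content:payload:op::STRING                  AS op
    FROM staging_orders_stream
) AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'd' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.status = s.status
WHEN NOT MATCHED AND s.op <> 'd' THEN INSERT (id, status) VALUES (s.id, s.status);

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK merge_orders_task RESUME;
```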
Sefath Chowdhury:
Hey, yeah, that would be awesome! I'll start my POC this sprint, so if I land on any working solutions I'll pass them along too.