# ask-community-for-troubleshooting
a
Couple of architectural questions:
1. There is no persistence of messages passed between a Source and a Destination? A Worker reads from stdout and writes to stdin, and this is all ephemeral?
2. A Worker is also a container that lives independently of a source/destination container?
   a. If so, is a Worker tightly coupled with a Source/Dest pair? I.e. create a connection, and it gets its own worker?
   b. How many workers are there? Is there a set number in the cluster, one per source/dest...?
   c. If a worker fails, what is the impact on the related Source/Dest?
3. If a Destination fails, what is the impact on its Source?
   a. What happens to in-flight data? Do we assume the data is persistent and we can re-consume?
   b. If a destination commits part of its messages to the downstream application but then fails, what is its ability to resume? I assume there isn't any?
4. Can a source/destination scale independently? I.e. where the down/upstream application supports parallelism (e.g. Kafka), could a Source scale itself to many containers working as a consumer group?
   a. If so, is the Worker able to scale with it, to consume from many Sources in parallel?
👀 1
✅ 1
I assume the answer to many of these questions will be Temporal. I'm not so familiar with it, but I assume this is the mechanism by which messages are passed between a Source/Dest, and that it provides resilience against failure of an individual connector instance.
Reading through the Temporal docs: presumably a Source is pushing messages to a Task Queue that the Destination is polling, meaning there is some level of persistence outside of the Destination container itself. Though I assume the Task Queue's lifecycle is tied to the schedule of the source/dest "job", i.e. created when the sync is run and removed when the sync is stopped.
This leads me to another question... I see there is a concept of Signals in Temporal. Could this be something that would enable a Destination to communicate something back to its Source?
j
1. We don’t save messages generally. For some destinations we “stage” messages in cloud storage, but that’s on a case-by-case basis and not part of the core framework.
2. Workers act as a bus passing messages between the source and the destination. We use this mechanism, not Temporal Task Queues, which wouldn’t be able to scale for this purpose. Right now we share a worker across multiple source/dest pairs, but we will likely change this to be per src/dest pair in the next couple of months.
   a. So right now there is a set number in the cluster.
   b. Worker failures right now would kill all related src/dest syncs that are running.
   c. Again, we’re going to restructure this to be more independent.
3. We have checkpointing for incremental syncs, which allows resuming. We only support at-least-once delivery for incremental.
4. We don’t currently support sharding for sources/dests. You can scale up resources for a single source container, but not split it across containers.
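To make the checkpointing/at-least-once behavior above concrete, here is a minimal sketch (illustrative names only; this is not Airbyte's actual API) of a worker loop that forwards records from a source to a destination and only commits a state checkpoint when the source emits one. If the sync crashes mid-page, the resumed run replays from the last committed checkpoint, so some records are delivered twice: at-least-once, not exactly-once.

```python
# Hypothetical sketch of worker-mediated checkpointing; all names are
# illustrative and not part of any real framework's API.

def source(cursor=0):
    """Pretend source: emits 2 records per 'page', then a STATE checkpoint."""
    data = list(range(10))
    for i in range(cursor, len(data), 2):
        for value in data[i:i + 2]:
            yield {"type": "RECORD", "record": value}
        yield {"type": "STATE", "cursor": i + 2}

def run_sync(saved_state, destination, fail_after=None):
    """Worker loop: pipe records source -> destination, commit checkpoints."""
    cursor = saved_state.get("cursor", 0)
    delivered = 0
    for msg in source(cursor):
        if msg["type"] == "RECORD":
            destination.append(msg["record"])  # stand-in for dest stdin
            delivered += 1
            if fail_after is not None and delivered >= fail_after:
                return saved_state  # simulate a mid-sync crash
        elif msg["type"] == "STATE":
            saved_state["cursor"] = msg["cursor"]  # commit checkpoint
    return saved_state

state, dest = {}, []
state = run_sync(state, dest, fail_after=3)  # crash after 3 records
state = run_sync(state, dest)                # resume from last checkpoint
# record 2 arrives twice: re-delivered after the crash (at-least-once)
```

The key design point mirrored here is that the checkpoint is committed only at STATE boundaries, never mid-page, which is what makes resuming safe at the cost of possible duplicates.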
a
Thanks! Is there more detail on the thinking about the future of this Worker, then? It sounds like a bottleneck in both performance and resiliency, which has been solved in similar use cases in different ways, e.g. the Pulsar architecture addresses a similar problem. In its current form, it sounds like a problem that can't be solved by the end user: it's structural, and no deployment scenario can improve it. I.e. you can't throw more nodes, more disks, or more compute at the problem, as there's no way for this to be consumed, which is very limiting.
I know that right now, with the batch focus, it's not so much of a concern, but as you go into the streaming side this will be critical, especially if the tool is to be competitive at scale (and where resiliency requirements are high).
j
Yeah we’re currently working on the design to make syncs completely independent.
You can expect syncs to run completely independently of each other by the end of this quarter. Sharding will likely take longer to add support for but something we’re planning on doing at some point.
👍 1
Streaming processing at scale will likely be even further out. However, sharded batch-based consumption of a streaming source will be possible in the next couple of quarters, which should solve most use cases unless there's a need for extremely low-latency loads into the destination.
a
Interesting, thanks for the info. I'd love to see Airbyte publish a design spec that outlines the solution so that the community can provide feedback and get an idea of the direction you're headed