# general
m
Hey all, question for you: I've been seeing event sourcing all around for years, but have never seen an actual production implementation. Seems like a neat idea, but I can't figure out how a business application actually functions this way (like, how does this replace an RDBMS when querying?). And if you replay the events into an RDBMS, then what is the point of the double storage? Just save a backup of the DB... What are the actual mechanics, and how would an application/system built on event sourcing work? (Emitting events for audit or communication I get, but when you have state - lots of data, and your user is searching, editing, etc. - what storage is involved, and how are queries performed?) Sorry, this is a little random, but I can't wrap my head around "real" use cases.
l
Event sourcing is typically paired with CQRS - you have two data stores: one that can be optimized for writes, and one that can be optimized for reads.

As an example, at my current company we offer a "proxy-workflow engine" as a service: we proxy a significant portion of our customers' traffic and need to operate on that traffic quickly without impacting latency. We have two separate data stores as a result - the write-optimized store, which is very difficult to query but generally has 10-20ms inserts, and the read-optimized store, which is treated as eventually consistent, is populated by a data pipeline, is much more usable for analytical queries, and is what the UI of our application points to.

With event sourcing, your event store essentially becomes the write-optimized store in this system. Then you build read-optimized projections of that data in separate data stores - which typically means writing a stream processor to populate those new stores. The obvious trade-off is that you do double the work of maintaining two databases. I don't recommend CQRS or event sourcing unless you absolutely need it. Personally, I think the main benefit of event sourcing (auditability) can still be achieved with your traditional RDBMS model.
👍 1
(We don't do event sourcing at my company, only CQRS - the concepts are closely related but separate.) The hard part of event sourcing is that you're putting all of your business logic into a stream of events, which is cognitively harder to reason about and change - I've seen event sourcing fail more often than succeed, so I'm a bit biased.
👍 1
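A minimal sketch of the split described above, in plain Java with in-memory stores standing in for the two databases (all names here are illustrative, not taken from any real system):

```java
// Toy CQRS sketch: an append-only "write side" plus a read-optimized
// projection rebuilt from it. In-memory only, for illustration.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class CqrsSketch {
    record OrderEvent(String orderId, String type, long amountCents) {}

    // Write-optimized store: appends are cheap, ad-hoc queries are not.
    static final List<OrderEvent> eventLog = new CopyOnWriteArrayList<>();

    // Read-optimized projection: total spend per order, eventually consistent.
    static final Map<String, Long> totalsByOrder = new ConcurrentHashMap<>();

    static void handleCommand(String orderId, long amountCents) {
        eventLog.add(new OrderEvent(orderId, "ItemAdded", amountCents)); // fast write path
    }

    // In production this would be a continuously running stream processor
    // (e.g. a Kafka consumer); here we just refold the whole log for brevity.
    static void rebuildProjection() {
        totalsByOrder.clear();
        for (OrderEvent e : eventLog) {
            totalsByOrder.merge(e.orderId(), e.amountCents(), Long::sum);
        }
    }

    public static void main(String[] args) {
        handleCommand("order-1", 1_000);
        handleCommand("order-1", 2_500);
        rebuildProjection();
        System.out.println(totalsByOrder.get("order-1")); // 3500, served from the read side
    }
}
```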
m
I understand CQRS - that's a classic separation of concerns - but as you say, it is often lumped in with event sourcing, even though they are definitely not the same. In the architecture you outline, as you write, there is no event sourcing. I completely understand why you might want two data stores (or more) to optimize your data for different use cases. I'm looking specifically for a working example of event sourcing as the root of the architecture (exactly as you explain in the second part). It's not only consuming streams and processing them to populate a data store; it's the entire application running on top of a stream - constantly scanning it from start to ... (endless?). I get Kafka - the application itself is a bit like this - but it's a super specific case, and there is more state stored not as events (in the nodes, etc.)...
l
You could model pretty much any system as a stream of events; you just have to decide when and if it's appropriate. A specific case where I've seen it done (and it ultimately failed) was when I worked at a life-insurance tech startup. We modeled the entire flow of initiating and managing a policy as a stream of events, so every single creation/modification/deletion of the policy became something like `PolicyHolderAddressChanged` or `PolicyBeneficiaryAdded` instead of a call to a generic `UpdatePolicy` endpoint.

To query for the current policy, services would either have to get all events from the event store, order them, and compute the state in-memory, or query a "projection" showing the current state. We tended to opt for the projection route for performance. With a few exceptions, mostly everything communicated via publishing an event to a queue. We chose event sourcing because we knew we needed an audited history for the policies. We built and modeled that entire system using protobufs for the event definitions.

Why I ultimately consider it a failure (this startup shut down at the beginning of the year) is that it simply took way, way, way too much time to engineer the system like this. And from a compliance perspective, we would have been okay simply creating triggers on our database tables that inserted versions of rows into some "history" table. We had real problems when we needed to add new events to the domain or update the schema of existing events - inevitable changes in any system. This meant that every single projection needed to be replayed and rebuilt, which was especially expensive for the projections we built for our analytics team, as they needed the most generic view of the data. As our data grew, this only became more difficult. I do remember we ran several event-storming sessions to model the business before we started eng work, so if you're thinking about it, it might be useful to do this exercise.
❤️ 1
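To make the two query paths concrete, here is a toy version in plain Java - the event names echo the ones above, but the fields and logic are invented for illustration, not the startup's actual protobuf schema:

```java
import java.util.ArrayList;
import java.util.List;

public class PolicyReplay {
    sealed interface PolicyEvent
            permits PolicyCreated, PolicyHolderAddressChanged, PolicyBeneficiaryAdded {}
    record PolicyCreated(String policyId, String address) implements PolicyEvent {}
    record PolicyHolderAddressChanged(String newAddress) implements PolicyEvent {}
    record PolicyBeneficiaryAdded(String beneficiary) implements PolicyEvent {}

    // Current state is a pure fold over the event history.
    record PolicyState(String policyId, String address, List<String> beneficiaries) {
        PolicyState apply(PolicyEvent e) {
            return switch (e) {
                case PolicyCreated c -> new PolicyState(c.policyId(), c.address(), List.of());
                case PolicyHolderAddressChanged a -> new PolicyState(policyId, a.newAddress(), beneficiaries);
                case PolicyBeneficiaryAdded b -> {
                    var next = new ArrayList<>(beneficiaries);
                    next.add(b.beneficiary());
                    yield new PolicyState(policyId, address, List.copyOf(next));
                }
            };
        }
    }

    // The "get all events, order them, compute in-memory" path; a projection
    // would instead persist the result of this fold and serve it directly.
    static PolicyState replay(List<PolicyEvent> history) {
        var state = new PolicyState(null, null, List.of());
        for (PolicyEvent e : history) state = state.apply(e);
        return state;
    }

    public static void main(String[] args) {
        var history = List.<PolicyEvent>of(
                new PolicyCreated("pol-1", "1 Old St"),
                new PolicyHolderAddressChanged("2 New Ave"),
                new PolicyBeneficiaryAdded("Alice"));
        System.out.println(replay(history)); // the current policy, recomputed from history
    }
}
```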
m
Thanks for this @Lucas Stephens, you just confirmed that my understanding was basically correct. It takes a very special use case to justify the complexity and the performance cost (which can be huge). I mean, any successful product would live for a long time, so to preserve a semblance of performance you'd have to save snapshots (or rollups) in order to avoid recomputing years of events... A fellow architect would have to work really hard to convince me this is a fitting way to go...
🙌 1
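For what it's worth, the snapshot idea above can be sketched in a few lines of plain Java (names and numbers invented): persist the folded state every so often, then on load replay only the events newer than the latest snapshot.

```java
import java.util.List;

public class SnapshotSketch {
    record Event(long seq, long delta) {}
    record Snapshot(long lastSeq, long balance) {}

    // Fold only the tail of the log, starting from the snapshot instead of event 0.
    static long loadBalance(Snapshot snap, List<Event> log) {
        long balance = (snap != null) ? snap.balance() : 0;
        long from = (snap != null) ? snap.lastSeq() : -1;
        for (Event e : log) {
            if (e.seq() > from) balance += e.delta();
        }
        return balance;
    }

    public static void main(String[] args) {
        // Years of history collapse into one snapshot plus a short tail:
        Snapshot snap = new Snapshot(999_999, 42_000);        // rollup of events 0..999999
        List<Event> tail = List.of(new Event(1_000_000, 7));  // only the tail is replayed
        System.out.println(loadBalance(snap, tail));          // 42007
    }
}
```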
l
Yeah, going into the project I wasn't that against the idea, but now that I've seen how the sausage is made I'll always be a bit scarred by it. What is really ironic to me is that it seems most appropriate for systems with infrequent writes (fewer events, less need for projections), but the added complexity and extra work is especially not worth it in those situations. It can be argued that an unstructured event store (NoSQL) is easier to scale than your traditional RDBMS - but there's so much exciting stuff happening in more "horizontally scalable" SQL databases like Vitess, Cockroach, and ClickHouse that I don't feel compelled to ever choose event sourcing 😆 Not to mention, event sourcing is done so rarely that there's not a lot of open-source tooling or experience among devs to help you out with things like replaying events, building projections, etc., whereas there is a ton of tooling for the traditional RDBMS model.
👍 2
m
You convinced me...
d
I've seen this done in payments processing - specifically servicing the 'instant payments' standard, where latency must be <2s. It makes sense for this to be an event-driven system because it's high volume, transactions have a limited number of pathways through the process and generally live for a very short amount of time, and tables of the various outputs can be materialised over the streams as they happen. Ultimately, though, this is just a core feature of a larger banking system, not the entire system itself, so I don't know if it really answers your question. I guess you could ask what % of a system needs to be event sourced for it to be the main architecture.
Also I would agree if you were thinking you need event sourcing because of some kind of scale problem, technology like ClickHouse (and others as mentioned) is solving a lot of this kind of thing without making your architecture structurally complex. We operate CH in a use case over billions of daily rows with a latency that pegs under 500ms at a couple hundred qps on average that can spike from 5-20x that on a busy day. It requires operational finesse to maintain, but the architecture isn't complex and development over it is quick and easy.
m
Hey @Daniel Chaffelson, thanks for this! It sounds like two separate cases. The first you described sounds like event streaming, but on a small time scale (so there is no long-term persistence of events - or rather, events have a short TTL), and the operation and state are always processed as a stream: queries aggregate the events, there is no other data in the system, and views are temporary, either aggregated in place or recreated. This is actually a very valid use case, because, thinking of the trade-offs, it throws away the biggest problem with event-based systems (data accumulation and how to deal with it) and enjoys all the benefits (one example is any stream processor that might use time windows and such).

The second part sounds to me like an event-driven architecture - i.e. communication is done via events, but they just drive the state, which is managed internally by each component/service (in a DB?) according to its preference. This is fine, but it isn't event streaming. I like this architecture because it again takes the advantages and "throws away" the problem: events are easy to pass and provide a rich and contextful interface, but we don't have to aggregate many small pieces every time we want an answer - we use SQL/whatever on a strong data store...

BTW, you can still store the events aside and replay them if needed, when you want to correct something - a nice backup strategy that can help deal with data corruption due to a bug in handling events. I actually built such a thing at a previous company: sometimes our processing logic changed (either a bug, or a client request to use a different aggregation rule), and we could delete the stored data, replay the original events, and recalculate - which was heavy, but our support loved it!
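A hedged sketch of that replay-to-recalculate strategy in plain Java (names invented for illustration): the derived data is disposable, because it can always be re-derived from the archived events with whatever aggregation rule is current.

```java
import java.util.List;
import java.util.function.ToLongFunction;

public class ReplayAfterFix {
    record UsageEvent(String customer, long units) {}

    // The derived store can be wiped and re-derived from the archived events
    // whenever the aggregation rule changes (a bug fix or a client request).
    static long recomputeTotal(List<UsageEvent> archive, ToLongFunction<UsageEvent> rule) {
        return archive.stream().mapToLong(rule).sum();
    }

    public static void main(String[] args) {
        var archive = List.of(new UsageEvent("acme", 3), new UsageEvent("acme", 5));

        long oldTotal = recomputeTotal(archive, e -> e.units());       // original rule
        long newTotal = recomputeTotal(archive, e -> e.units() * 2);   // corrected rule
        System.out.println(oldTotal + " -> " + newTotal);              // 8 -> 16
    }
}
```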
d
Yeah, I'm with you - and I agree it goes back to your original point that there are a lot of streaming use cases around, some on truly massive scale, but very few actual event-sourcing architectures. Also, perhaps it's useful to observe that the first case was building a payments system from the ground up to be a streaming ledger, whereas typically they are migrating from a mainframe or other database-as-ledger approach. And the second is taking a problem that the existing RDBMS couldn't scale to, and reworking it into a Kafka + OLAP solution. Replay and archiving to cold storage, amongst other things, are part of what makes the managed service so valuable, as you note. We find customers get a lot of value from our data engineers optimising their queries specifically for CH performance, which would otherwise require a larger and more expensive internal team if every customer had to do it for themselves, and would probably be an order of magnitude more expensive if you had to hire the unholy trinity of Kafka + Flink + Cloud developer to service it. Ultimately, I think if event sourcing were such a great solution to particular problems, 'streaming is the answer, what was the question' companies like Confluent and StreamNative would have more use cases for it on their homepages.
💥 1
m
I actually like that customizing idea, really cool. I suspect that giving the client access requires a bit more than just creating a custom view, but still - maintaining it and supporting schema changes must be a full-time and thankless job. Thanks for the added information. I should say this has all come about because some architect in the org with a lot of pull has decided event sourcing is the way to go, and having some experience I immediately thought "this is a horrible idea, it will be a nightmare to implement & maintain" - and for what? Where is the gain? I'm new at the company, and the confidence he projected made me feel that this was a done deal... A conversation with my manager revealed that this wasn't the first rodeo with futuristic ideas from the same person, and that they usually remain just that - a short, super-high-level and sparse memo, and a lot of talk and hand waving 👋 So I'm a lot calmer now 🥴
g
IMHO, if you want to use event sourcing, you should use the proper tools, and not build them yourself on top of event streaming. You can read what I mean in this blog. Axon Framework has a lot of downloads, and there are quite a few companies paying for Axon Server, so it's not like nobody is doing event sourcing - although within the event streaming communities it does seem very rare.
To me, the main benefit of event sourcing is reducing complexity, for multiple reasons. The model is more in line with how the world works. It's also easier to add functionality later on while being sure the current functionality isn't affected. And - slightly more of a CQRS thing - it's very easy to rebuild your projections to accommodate new queries. As an AxonIQ employee I might be biased, but I spent a lot of time before joining building event sourcing on top of event streaming, with projects like bkes.
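For readers who haven't seen it, a minimal sketch of what an event-sourced aggregate looks like with Axon Framework 4.x - the annotations and `AggregateLifecycle.apply(..)` are Axon's actual API, but the command, event, and field names are invented for illustration:

```java
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.TargetAggregateIdentifier;
import static org.axonframework.modelling.command.AggregateLifecycle.apply;

public class PolicyAggregate {

    static class CreatePolicy {
        @TargetAggregateIdentifier final String policyId;
        final String holder;
        CreatePolicy(String policyId, String holder) {
            this.policyId = policyId;
            this.holder = holder;
        }
    }

    record PolicyCreated(String policyId, String holder) {}

    @AggregateIdentifier
    private String policyId;

    protected PolicyAggregate() {} // required by Axon to rehydrate the aggregate from events

    @CommandHandler
    public PolicyAggregate(CreatePolicy cmd) {
        // Validate the command, then apply an event; the event, not the state, is persisted.
        apply(new PolicyCreated(cmd.policyId, cmd.holder));
    }

    @EventSourcingHandler
    public void on(PolicyCreated evt) {
        // The only place state changes - runs both on first apply and on every replay.
        this.policyId = evt.policyId();
    }
}
```

The framework takes care of storing events, replaying them through the `@EventSourcingHandler`s, and snapshotting - exactly the plumbing the rest of this thread describes hand-building.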