I can't figure out when you'd do EventBridge ->...
# random
t
I can't figure out when you'd do EventBridge -> SQS -> Lambda. What's the issue with having EB invoke a Lambda directly with a DLQ?
d
I’ve been wondering too. SQS makes sense between different Lambdas/services. But EB to Lambda? And if we want to fan out, we can still do EB -> SNS or EB -> multiple targets. So idk.
Then again, with SQS we get FIFO, and Lambdas can pull messages off in batches. Unsure if Lambdas can have that behavior with EB directly.
EB and Lambda are great for cron jobs, given the Cron construct we've got.
t
Ah messages in batches makes sense
f
Also, if EB has spiky volume, you can add a buffer in between
o
The main reason I put SQS between EB and Lambda is that I get automatic retries from SQS. Saved my bacon a few times when an unexpected event came in and caused a failure. I was able to publish a hotfix quickly (thanks Seed!) to handle those events, and SQS automatically retried them. With a DLQ I think I’d have to figure out how to replay those events
Also the batching is super useful for my indexer - I have a couple of EB rules on a couple of different data sources feeding into one SQS queue, and then I batch process in the indexer Lambda, because I can make one batch request to add everything to the search engine
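A rough sketch of what that kind of batch handler could look like, not the actual indexer. It assumes the SQS message body is the EventBridge event as JSON (which is what an SQS target receives by default) and that `ReportBatchItemFailures` is enabled on the event source mapping, so only the bad messages get retried. `bulk_index` is a hypothetical placeholder for the single bulk request to the search engine.

```python
import json


def bulk_index(documents):
    """Hypothetical placeholder for one bulk request to the search engine."""
    return len(documents)


def handler(event, context=None):
    """Process one SQS batch of EventBridge events with a single bulk request."""
    documents = []
    failures = []

    for record in event["Records"]:
        try:
            # The SQS body holds the EventBridge envelope as a JSON string.
            eb_event = json.loads(record["body"])
            documents.append(
                {"source": eb_event["source"], "detail": eb_event["detail"]}
            )
        except (json.JSONDecodeError, KeyError, TypeError):
            # Report only this message as failed so SQS retries it alone
            # (requires ReportBatchItemFailures on the event source mapping).
            failures.append({"itemIdentifier": record["messageId"]})

    if documents:
        # If this raises, the whole batch goes back to the queue and is retried.
        bulk_index(documents)

    return {"batchItemFailures": failures}
```

Messages that keep failing would end up in the queue's DLQ once they exceed the `maxReceiveCount` set in the redrive policy.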
t
Doesn't EventBridge have retries too?
o
I don’t think EB has retries, but Lambda (async invocation) DLQs I think do
d
Dumb q. How do you handle messages in a DLQ? Can you also have a DLQ send messages to other targets, say a Lambda that sends Slack messages?
f
yeah DLQ can have subscribers afaik
o
Correction above: EB doesn’t support retries but Lambda async invocation does - roughly the same model as SQS retries, but all the values seem hardcoded:
Lambda manages the function’s asynchronous event queue and attempts to retry on errors. If the function returns an error, Lambda attempts to run it two more times, with a one-minute wait between the first two attempts, and two minutes between the second and third attempts. Function errors include errors returned by the function’s code and errors returned by the function’s runtime, such as timeouts.
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
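For what it's worth, the wait intervals between attempts are fixed as far as I know, but the number of retries (0 to 2) and the maximum event age (60 to 21600 seconds) can be set per function through the async invoke config. A minimal sketch with boto3; the helper names and the function name in the usage note are made up for illustration, and boto3 is imported lazily so the builder can be tried without AWS credentials.

```python
def build_async_invoke_config(function_name, max_retries=2, max_event_age_seconds=21600):
    """Build kwargs for Lambda's put_function_event_invoke_config call."""
    if not 0 <= max_retries <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 21600")
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
    }


def apply_async_invoke_config(config):
    """Send the config to AWS (needs boto3 and credentials)."""
    import boto3

    return boto3.client("lambda").put_function_event_invoke_config(**config)
```

Something like `apply_async_invoke_config(build_async_invoke_config("indexer", max_retries=0))` would turn async retries off for a hypothetical `indexer` function.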
For DLQs I haven’t had to do anything with them yet - none of my DLQs have any subscribers - but I’d probably write a new Lambda handler if an issue came up and do … something.. with the events (save into S3, replay directly). A few weeks ago I was bemoaning that a good solution for DLQs doesn’t exist
Some service that you can attach to all your DLQs as a subscriber, that persists the messages, alerts you, lets you inspect them, then requeues them (selectively) to a bunch of targets would be amazing
Reroute the life support DLQ to the shields service!
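Nowhere near that dream service, but the "requeue selectively" part can be sketched in a few lines. This assumes an SQS DLQ and a boto3-style SQS client passed in by the caller; `redrive` and `should_replay` are hypothetical names, and persistence and alerting are left out.

```python
def redrive(sqs, dlq_url, target_url, should_replay=lambda message: True):
    """Move one batch of messages from a DLQ to a target queue.

    `sqs` is expected to behave like boto3.client("sqs").
    Returns the number of messages that were replayed.
    """
    response = sqs.receive_message(
        QueueUrl=dlq_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=1,
    )

    replayed = 0
    for message in response.get("Messages", []):
        if not should_replay(message):
            # Left alone: it becomes visible in the DLQ again after the
            # visibility timeout expires.
            continue
        sqs.send_message(QueueUrl=target_url, MessageBody=message["Body"])
        # Delete only after the send succeeded, so a failure cannot lose it.
        sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"])
        replayed += 1
    return replayed
```

Calling it in a loop until it returns 0 would drain the DLQ; the `should_replay` filter is where you'd skip events you already know are poison.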
c
I’m admittedly still coming up to speed with much of this, but the architecture patterns John Gilbert has laid out in some articles seem to make sense. Here he’s using EB and Kinesis together as a central event hub to combine EB's routing with the batching / optional keys from Kinesis - https://medium.com/@jgilbert001/combining-the-best-of-aws-eventbridge-and-aws-kinesis-9b363b043ade
Also, is it correct that EB only holds onto messages / retries for 24 hours and thus the need for other tools like SQS for a throttled lambda queue and for a DLQ?
t
Yeah that's right, it can retry for up to 24 hours before it DLQs it
@Omi Chowdhury I've been a bit confused about retries. EB documentation says it retries unsuccessful deliveries for 24 hours: https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-rule-dlq.html Is a Lambda executing and failing considered an unsuccessful delivery? Or does EB consider it successful as soon as the Lambda service has it?
o
@thdxr I haven’t tested failure handling between EB and Lambda, but my understanding is that because EB invokes the Lambda asynchronously, once Lambda has it, EB is out of the picture. Lambda is now in charge of making sure that event gets executed. I think the EB retries would only come into play if EB couldn’t deliver the event to Lambda
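If that understanding holds, the EB-side knobs only cover delivery failures. They live on the rule target as a retry policy plus a dead-letter config. A hedged sketch with boto3; the helper names and the ARNs in the usage note are placeholders, and the defaults mirror the documented 24 hours (86400 seconds) and 185 attempts.

```python
def build_lambda_target(target_id, function_arn, dlq_arn,
                        max_retries=185, max_event_age_seconds=86400):
    """Build one EventBridge target entry with a retry policy and a DLQ."""
    return {
        "Id": target_id,
        "Arn": function_arn,
        # Applies to delivery to the Lambda service, not to errors raised
        # inside the function once the async invoke was accepted.
        "RetryPolicy": {
            "MaximumRetryAttempts": max_retries,
            "MaximumEventAgeInSeconds": max_event_age_seconds,
        },
        "DeadLetterConfig": {"Arn": dlq_arn},
    }


def attach_target(rule_name, target, event_bus_name="default"):
    """Attach the target to a rule (needs boto3 and credentials)."""
    import boto3

    return boto3.client("events").put_targets(
        Rule=rule_name, EventBusName=event_bus_name, Targets=[target]
    )
```

For example, `build_lambda_target("indexer", "arn:aws:lambda:...:function:indexer", "arn:aws:sqs:...:indexer-dlq", max_retries=3)` would cap delivery retries at three before the event lands in the DLQ.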
t
Yeah that makes sense, been too lazy to test it myself. Now I feel like I need a queue between every single handler and EventBridge 😬
o
It’s confusing because Lambda async has its own functionality around retries/DLQs/destinations, but when SQS triggers Lambda, it does so synchronously, so none of that kicks in and SQS’s own retry functionality takes over instead
Lol yup that’s what I do…and I can’t wait to switch to SST and make a reusable construct for it. Before this thread though I didn’t know that EB had some form of retry built in
I guess it’s necessary for every element in the stack to handle failure, but damn it’s a lot to have to know. Still, AWS is on the hook for designing a system where learning about a concept in one service doesn’t really translate over to others.