# help
t
I have a use case where I want a queue of jobs and then invoke a lambda one by one with a very low (1-2) concurrency. If a job fails it should be retried ~5 times before going to a DLQ. After some research it seems a simple SQS queue is not a good fit for this (messages will be rejected while the lambda can't be invoked, so they'll reach their max retries quickly). Can anyone advise a good pattern to solve this?
a
you’ll need to use SQS with step-functions to achieve this.
t
Clarification: the jobs don't depend on each other. The concurrency is low because I have to call an external API for every job, with the same credentials, and the API has a rate limit.
a
I get that, but being able to manually move failures to a DLQ after x number of attempts requires either a state machine or an event-based approach.
if you need ordering, you'll need an SQS FIFO queue; that will take care of the ordering, but then you need to ensure you don't pick up more than a couple of jobs at a time. For that you'll need to play with the lambda processing and delay adding jobs to the queue for some time. It's going to be complicated to achieve all of this at once, so go ahead and solve it one step at a time.
d
Just use a CRON + SQS. You can poll for messages and process them at whatever pace you want; otherwise they just sit there.
For retries, just put a prop on the message that you increment on failures.
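Rough sketch of what I mean (the `attempts` attribute name and the QUEUE_URL/DLQ_URL env vars are placeholders I made up, and the receive side is shown further down):
```ts
// Sketch: on failure, requeue the message with an incremented "attempts"
// attribute, or park it in the DLQ once it has failed ~5 times.
import {
  SQSClient,
  SendMessageCommand,
  DeleteMessageCommand,
  type Message,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const MAX_ATTEMPTS = 5;

// Called when processing of a received message has failed.
export async function requeueOrDeadLetter(msg: Message): Promise<void> {
  const attempts =
    Number(msg.MessageAttributes?.attempts?.StringValue ?? "0") + 1;

  // After ~5 failed attempts, send the job to the DLQ instead of the work queue.
  const target =
    attempts >= MAX_ATTEMPTS ? process.env.DLQ_URL! : process.env.QUEUE_URL!;

  await sqs.send(
    new SendMessageCommand({
      QueueUrl: target,
      MessageBody: msg.Body ?? "",
      MessageAttributes: {
        attempts: { DataType: "Number", StringValue: String(attempts) },
      },
    })
  );

  // Delete the original delivery so SQS doesn't redeliver it on top of the copy.
  await sqs.send(
    new DeleteMessageCommand({
      QueueUrl: process.env.QUEUE_URL!,
      ReceiptHandle: msg.ReceiptHandle!,
    })
  );
}
```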
a
@Derek Kershner how do you deal with limiting the number of items processed per cron trigger? Like, suppose there are 40 items in the queue and only 2 should be processed per trigger?
d
ReceiveMessageCommandInput | SQS Client - AWS SDK for JavaScript v3 (amazon.com)
Just set MaxNumberOfMessages on that to however many you want to process per trigger.
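Something like this for the cron-triggered poller (QUEUE_URL and the schedule that invokes the handler are assumed, not shown):
```ts
// Sketch: a scheduled handler that pulls at most 2 messages per trigger.
import { SQSClient, ReceiveMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

export const handler = async (): Promise<void> => {
  const { Messages = [] } = await sqs.send(
    new ReceiveMessageCommand({
      QueueUrl: process.env.QUEUE_URL!,
      MaxNumberOfMessages: 2, // only take as many as one trigger should process
      MessageAttributeNames: ["attempts"], // carry the retry counter along
      WaitTimeSeconds: 5, // short long-poll so empty runs return quickly
    })
  );

  for (const msg of Messages) {
    // process msg.Body here; on failure, hand it to requeueOrDeadLetter()
    // from the earlier snippet, otherwise delete the message
  }
};
```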
a
Nice! This will definitely do the trick. My only concern is that if the number of events blows out of proportion, this will create serious back-pressure. That's why I try my best not to cap processing; I'd instead try to cap the delivery notifications.
Retrying by incrementing a retry count on the message also seems kinda confusing, because it's not clear whether the failed attempts get retried first or the ones already in the queue. This would really be simplified a lot using event sourcing or step-functions with SQS FIFO. Or I might just be overcomplicating it, as I tend to. 😅
d
No doubt on the backpressure, but that's sort of the idea as proposed.
This one is architecture simple, code complex. Step functions would be architecture complex and code simple. Tradeoffs.
This is also quite a bit cheaper, FWIW, but neither is expensive here since you could use express step functions.
a
yep, the express ones are dirt cheap lol!
I'd actually implement this using Dynamo and cron. Primarily, I'd keep persisting all events sorted by date and time. Then I'd use cron to check a GSI for unprocessed items and pull as many as I want to process, based on some .env config or SSM param. I'd then process them either with express step-functions or in the same lambda triggered by the cron, which would handle the retries too. I think this could work great, with no worries about dropping events because the queue is full. What do you think?
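Roughly what I have in mind (the table name, index name, key attributes and the BATCH_SIZE env var are all made up for illustration):
```ts
// Sketch: cron-triggered handler that queries a GSI for the oldest
// unprocessed jobs, capped at a configurable batch size.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const handler = async (): Promise<void> => {
  const limit = Number(process.env.BATCH_SIZE ?? "2"); // could also come from an SSM param

  // Oldest unprocessed jobs first, capped at the configured batch size.
  const { Items = [] } = await ddb.send(
    new QueryCommand({
      TableName: "jobs",
      IndexName: "status-createdAt-index", // GSI: partition key "status", sort key "createdAt"
      KeyConditionExpression: "#s = :unprocessed",
      ExpressionAttributeNames: { "#s": "status" },
      ExpressionAttributeValues: { ":unprocessed": "UNPROCESSED" },
      ScanIndexForward: true, // ascending by createdAt
      Limit: limit,
    })
  );

  for (const job of Items) {
    // process the job here (or kick off an express Step Function execution),
    // then flip its status so the next cron run skips it
  }
};
```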
d
oh, I see, you are concerned about retries being at the back of the pack forever. Dynamo would give you way more ordering abilities, and no real downside other than cost, which would still be quite low. I got the sense this was merely about spreading out bursts, though, and that throughput in general was not a concern, only that it exceeded boundaries on rare occasion. I'd probably stick to SQS for simplicity myself, just stick an alarm on queue length and call it a day, but nothing stopping ya!
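The alarm part is just something like this (queue name, threshold and the SNS topic ARN are placeholders; in real life I'd define it in CDK/CloudFormation rather than a one-off script):
```ts
// Sketch: alarm when the queue backlog stays high for a while.
import { CloudWatchClient, PutMetricAlarmCommand } from "@aws-sdk/client-cloudwatch";

const cw = new CloudWatchClient({});

async function main(): Promise<void> {
  await cw.send(
    new PutMetricAlarmCommand({
      AlarmName: "jobs-queue-backlog",
      Namespace: "AWS/SQS",
      MetricName: "ApproximateNumberOfMessagesVisible",
      Dimensions: [{ Name: "QueueName", Value: "jobs-queue" }],
      Statistic: "Maximum",
      Period: 300, // evaluate 5-minute windows...
      EvaluationPeriods: 3, // ...and require 3 of them in breach
      Threshold: 100, // "backlog is getting out of hand" level, pick your own
      ComparisonOperator: "GreaterThanThreshold",
      AlarmActions: ["arn:aws:sns:us-east-1:123456789012:ops-alerts"], // placeholder topic
    })
  );
}

main().catch(console.error);
```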
a
yep, thanks for the discussion, very insightful.