# help
c
Hey all. I have a greenfield project that we’ll be able to architect as a serverless app from the ground up. It’s a relatively simple ETL & CRUD app that will integrate with 3rd party data sources. I’m still coming up to speed on current best practices for serverless, but am leaning toward using an AppSync GraphQL API with EventBridge as the spine for events. I’m having a hard time finding good examples of using EventBridge period, let alone with a GraphQL API. I also don’t see much reference to it in SST, other than for cron jobs. A couple questions –
* For an app like this would you recommend against using EventBridge or AppSync for any reason?
* Do you know of any architecture examples that use EventBridge for something similar to this?
* Any recommendations of best practices for using SQS or SNS in combination with EventBridge?
Thanks for any insights or help.
r
Is the plan to use EventBridge to kick off ETL processes as a result of changes made to the data via the GraphQL API?
o
For an app like this would you recommend against using EventBridge or AppSync for any reason?
Depending on how complicated your API is, you could run into AppSync’s limitations. For my current API I couldn’t run it directly against my DB, and running a Lambda per resolver would be too slow. But simpler APIs would probably work just fine, and you can later switch over to running Apollo or something
Any recommendations of best practices for using SQS or SNS in combination with EventBridge?
Yeah any time I need an event processed by a lambda, it goes via SQS. That helps with retries, batching and theoretically handling back pressure (never needed to use it for that myself)
Also piling onto Ross’ question - is there a DB in the mix?
f
Hey @Clayton, yeah, if you could share a bit more about your use case w.r.t. Ross and Omi’s questions.
c
Thanks @Ross Coundon, @Omi Chowdhury, @Frank - typing up additional details right now…
Admittedly, I’m still piecing together my larger understanding of how these individual elements (functions, queues, APIs, DBs) should best be composed together. And apologies if I’m using the term ETL too loosely, or incorrectly in this context. Current thinking is that ETL processes would happen as a result of 1) changes to data via the GraphQL API (e.g. user updates via the frontend) and 2) data updates received from 3rd party systems at set intervals. Re: databases, based on my current understanding I’m planning for 1+ DynamoDB tables connected to AppSync. And in case I know just enough to be completely backwards on this, the larger flow this is all meant to support is –
* user can connect and import data from a CRM (e.g. Salesforce)
* system processes this data to produce user-specific profiles (e.g. ideal target companies)
* system imports data from 3rd-party providers to create/build a cross-user DB (e.g. public company profiles)
* system identifies matches between user-specific profiles and the common DB (e.g. prospect companies)
* system reports this information back to the user by email, within the CRM (e.g. passed to the Salesforce UI), and within a custom front-end app
* user can refine and manage system-generated profiles within the custom front-end app
* users can proactively search the DB, using profiles or ad hoc, within the custom front-end app
* standard auth methods for user accounts, endpoints, etc.
I’m probably overcomplicating all of this a bit, but am hoping to find an approach that provides a solid foundation to grow with, without overbuilding unnecessarily at the outset. Thanks again
r
Only had a quick scan but if you're using DynamoDB a common pattern (we use this) is to set up a lambda as a consumer of a DynamoDB stream which reflects all the changes made to a table.
That lambda can do the work required, or create some events for other lambdas to do some work - fan out.
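To make the stream-consumer pattern concrete, here’s a minimal TypeScript sketch of the transform step such a lambda might perform: turning DynamoDB Streams records into EventBridge `PutEvents` entries. The bus name, source string, and detail-type naming are all invented placeholders, and the actual SDK `PutEvents` call is omitted:

```typescript
// Shape of one DynamoDB Streams record (trimmed to the fields used here).
interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: {
    Keys: Record<string, { S?: string; N?: string }>;
    NewImage?: Record<string, unknown>;
  };
}

// One entry in an EventBridge PutEvents request.
interface EventBridgeEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
}

// Turn a batch of stream records into PutEvents entries.
// "app-bus" and "app.table-changes" are made-up placeholder names.
function toEventBridgeEntries(records: StreamRecord[]): EventBridgeEntry[] {
  return records.map((r) => ({
    EventBusName: "app-bus",
    Source: "app.table-changes",
    DetailType: `Item${r.eventName}`, // e.g. "ItemINSERT"
    Detail: JSON.stringify({
      keys: r.dynamodb.Keys,
      newImage: r.dynamodb.NewImage ?? null,
    }),
  }));
}
```

A real handler would then pass these entries to the AWS SDK’s `PutEvents` (in batches of up to 10 entries per call); that network call is left out so the sketch stays self-contained.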
c
That’s interesting Ross. Do you still decouple these via something like SQS to help handle throughput/errors/retries?
o
We have a similar architecture: a lambda on the DynamoDB stream that publishes events onto EventBridge. Then for each thing that needs to happen on an event, there’s an EB rule that matches and forwards the event to SQS, which is connected to a lambda. Sometimes, if more orchestration is required, Step Functions get in the mix
c
Thanks Omi - not sure if I’m completely following your flow yet. Do you mean this - DynamoDB Stream → EventBridge / Rule → SQS → Lambda
o
Pretty much - DDB Stream → Lambda → EventBridge → Rule → SQS → Lambda
and you have a [Rule → SQS → Lambda] unit for each process
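Each [Rule → SQS → Lambda] unit keys off an EventBridge event pattern. As a toy illustration of the core matching semantics (a pattern field lists the allowed values), here’s a sketch — the rule names and detail-types are invented, and real EventBridge patterns support much more (prefix matching, numeric ranges, nesting, etc.):

```typescript
type EventPattern = Record<string, string[]>;
type FlatEvent = Record<string, string>;

// An event matches when, for every field named in the pattern,
// the event's value is one of the listed candidates.
function matchesPattern(event: FlatEvent, pattern: EventPattern): boolean {
  return Object.entries(pattern).every(([field, allowed]) =>
    allowed.includes(event[field])
  );
}

// One rule per downstream process, each forwarding to its own queue
// (names are placeholders for illustration).
const rules = [
  { pattern: { "detail-type": ["ItemINSERT", "ItemMODIFY"] }, queue: "profile-sync" },
  { pattern: { "detail-type": ["ItemREMOVE"] }, queue: "cleanup" },
];

// Which queues would receive this event?
function routesFor(event: FlatEvent): string[] {
  return rules.filter((r) => matchesPattern(event, r.pattern)).map((r) => r.queue);
}
```

The point of the unit is that adding a new process is additive: you add one more rule/queue/lambda triple without touching the publisher.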
Btw it’s considered DynamoDB best practice to use a single table for all your data. Sounds weird, but it’s an example of how DynamoDB is probably different from other DBs most devs have encountered (you can’t treat it like a relational DB, or even Mongo)
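To make “single table” concrete for the use case upthread, here’s one hypothetical key scheme: user profiles and company records live in the same table, distinguished by key prefixes. The entity names and prefixes are invented for illustration:

```typescript
// In single-table design, every item shares one (pk, sk) key pair,
// and string prefixes encode the entity type and relationships.

// A user's generated profile, stored under that user's partition.
function userProfileKey(userId: string, profileId: string) {
  return { pk: `USER#${userId}`, sk: `PROFILE#${profileId}` };
}

// A shared company record in the cross-user dataset.
function companyKey(companyId: string) {
  return { pk: `COMPANY#${companyId}`, sk: `COMPANY#${companyId}` };
}

// A Query on pk = "USER#42" with sk begins_with "PROFILE#"
// then fetches all of that user's profiles in a single request.
```

The design work is in choosing prefixes so that each access pattern maps to one Query; that’s the complexity Omi mentions below.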
c
Gotcha, thanks. And for your CRUD to DB does it look something like this? Frontend → GraphQL endpoint → Lambda → SQS → Lambda → DDB Definitely still a bit confused on how to best connect + decouple the FE to GraphQL API to DB(s) properly…
o
No it just writes to the DB like a normal CRUD app - Frontend → GraphQL endpoint → Lambda → DDB
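As a sketch of that direct write path, here’s the part of a hypothetical Lambda resolver that builds the DynamoDB `PutItem` parameters for a mutation. The table name, input shape, and key prefixes are placeholders, and the actual `DocumentClient.put` call is omitted:

```typescript
// Input from a hypothetical GraphQL createProfile mutation.
interface CreateProfileInput {
  userId: string;
  name: string;
}

// Build the PutItem parameters; the resolver would pass these to
// the AWS SDK DocumentClient's put() and return the created item.
function buildPutParams(input: CreateProfileInput, profileId: string) {
  return {
    TableName: "app-table", // placeholder table name
    Item: {
      pk: `USER#${input.userId}`,
      sk: `PROFILE#${profileId}`,
      name: input.name,
      createdAt: new Date().toISOString(),
    },
  };
}
```

Errors from the `put` call would bubble straight back through the resolver to the client, which is the error-handling stance Omi describes a few messages down.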
c
Re: single table, thanks. I saw that, but it sounds like single-table performance may not be a factor if you’re using a GraphQL API, since it has to resolve each field separately anyhow, right?
o
Putting an SQS queue in there would be a pattern I’d use if I needed to batch or delay writes to my DB - but I’ve never run into that scenario. You’d be talking about a lot of writes into a single partition that can’t be broken out into multiple partitions.
c
Gotcha. Without an SQS queue in the frontend → DB path how do you handle errors / retries? And sorry for the onslaught of questions 😂 Definitely appreciate all of your advice
o
it sounds like single table performance may not be a factor if you’re using a GraphQL API since it has to resolve each field separately anyhow, right?
Not sure I follow on that - you want to concentrate writes in a single table because Dynamo scales depending on load across the table. The more Dynamo scales out, the faster things get, as more nodes are allocated to that table (assuming you don’t have a hot-partition problem). Were you getting at batch-getting items from the DB?
Without an SQS queue in the frontend → DB path how do you handle errors / retries?
If it’s a frontend (or in our case usually a customer’s backend) - if there’s an error it should be propagated up; let the client decide what to do with it. Are you running async jobs?
c
I was basing my understanding of GraphQL ≠ single table on the details in this article – https://www.alexdebrie.com/posts/dynamodb-single-table/#graphql--single-table-design tl;dr was that since GraphQL has to make multiple requests behind the scenes anyhow, it negates the benefits of optimizing for a single table. Again, I’m running entirely on newly earned knowledge in this space right now, so I may not be appreciating other important considerations
Are you running async jobs?
Re: from the FE/user - I’m not 100% certain yet, but I could imagine I might for processing that takes more than a mere lookup
But, re: saving updates to DB from FE, you’re right, probably not - other than to decouple things for resiliency or retries, etc.
o
Yeah, that’s a good article. I’d say if you’re doing a bunch of processing and you’re using a schema optimized for that process, a single-table design will work well - but it does come with complexity. The question is whether you have a use case where that complexity is worth it - and how complex it is depends on previous experience and good tooling. It’s a good skill to learn in terms of really understanding the strengths and weaknesses of Dynamo. If you can build a really good single-table design, then a multi-table one is easy, and at some point building one isn’t any harder than the other. You can implement GraphQL on either, especially if you have dataloader batching requests so that each phase of GraphQL resolution only makes one request
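The dataloader batching Omi mentions works by collecting all the keys requested during one resolution phase and issuing a single batched fetch for them. Here’s a stripped-down toy version to show the mechanism (the real `dataloader` npm package also does per-request caching and smarter scheduling):

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

// A tiny loader: load() calls made in the same tick are collected
// and satisfied by one call to the batch function.
class TinyLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      // Flush on the next microtask, so same-tick loads share one batch.
      if (this.queue.length === 1) queueMicrotask(() => this.flush());
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue.splice(0);
    // batchFn must return values in the same order as the keys.
    const values = await this.batchFn(batch.map((b) => b.key));
    batch.forEach((b, i) => b.resolve(values[i]));
  }
}
```

With DynamoDB, the batch function would typically wrap a `BatchGetItem` call, which is how each GraphQL resolution phase ends up making one request instead of one per field.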
“You must do the hard thing before you are able to do the easy thing” - mr_miyagi if he was on github 😛
c
Thanks again @Omi Chowdhury - this has been a massive help. So here’s my current takeaway for a system that leverages EB and AppSync/GraphQL – Frontend → AppSync endpoint → SQS (optionally) → Lambda → Dynamo → Stream → Lambda → EventBridge → Rule → SQS → Lambda Rinse-and-repeat as needed for each process…
Looking at this, the last thing I’m wondering - if anyone has an opinion or the patience remaining - is when you’d choose to push things to EventBridge as opposed to an individual SQS or SNS? I’m guessing you’d opt for SQS when there is only one recipient of the queued event - is that right? When there are multiple recipients, is the modern best practice to use EventBridge, or does SNS still have its place alongside EB too?
o
I always push everything to EventBridge - unless it’s a processing fan-out scenario, which matches your one-recipient criterion. Like, if I have an event that X got deleted, there’s a lambda that queries the DB for child objects of X and puts them onto a queue, from which another lambda starts deleting them
Never used SNS (although I’m looking at it now as a target for cloudwatch alarms)
p
We went down the pattern of using SNS for pretty much everything so that we could have message filters for load tests. Made it really easy to override so we could isolate services. We also have a poor... microservice architecture.... so... take that with a grain of salt 😆 . With CDK (and SST plug!) you can create a load test env that is just like prod and tear it down when you are done... but thats if you don't have a bunch of crazy domain services that are all doing whatever they feel like. Programming! But the messaging filtering is pretty slick depending on your scenario.
m
@Clayton: I am not sure you really need AppSync here; a simple API Gateway would work. I would choose AppSync if your end users are mostly mobile and you want a "real-time" aspect to your app. Otherwise, you can rely on a simple API Gateway
c
Thanks @Patrick Young Thanks @Muhammad Ali
I am not sure if you really need appsync here. a simple apigateway would work.
Muhammad, you’re referring to implementing a standard REST API with discrete endpoints instead of a GraphQL one, right?
m
Yeah. Based on my understanding you just need a REST API to expose your endpoints. I don’t see a reason to use GraphQL/AppSync.
o
Counterpoint: I found my API much easier to manage once I switched to GraphQL. I adopted it but was on the fence until I had a ticket to completely revamp a dashboard UI, and I didn’t have to make any changes to the backend to do it. I could use the same GraphQL API in a completely different way and maintain a good response time
In a previous project that had a REST API, we ended up building a bunch of GraphQL-like features (selections, expansions, schemas) with a lot more effort and not nearly enough thought