# random
t
hey everyone, quick thread on something I've been going back and forth on: for the new ideal stack we're trying to figure out whether to default to RDS or Dynamo. It'll work with either and we'll document both, but we're wondering what's best to show newcomers. The thinking is that Dynamo more or less requires single table design, and that might be really confusing when you're also learning serverless/graphql/aws. That said, once you learn it I do feel like using dynamo is a 10x experience over using RDS (especially with @Tyler Walch's work on ElectroDB) since it's truly serverless. What do you all think?
d
I have a strong bias towards DynamoDB. I'm listening to you and Paul on awsfm right now, and I agree with everything you're saying about there being 10x better companies. The whole reason I'm fumbling about learning cdk, sst, dynamo etc... is that I think it's a better experience and will lead to better products. I accepted that I need to learn new paradigms when I decided to adopt serverless. Adopting DynamoDB is part of that.
c
well IMHO DynamoDB is better due to cost and easy setup, even though single table design is indeed confusing… but I’d adopt it as well since DynamoDB is kinda part of the AWS Serverless world, as @Devin pointed out
t
I come from a strong relational database background, and learning single table design was definitely difficult at first. I feel like this depends heavily on the particular newcomer though. I do a lot of mentoring work with people just starting their careers and in my experience people who haven't been thinking relationally for years tend to pick up single table just as quickly as relational (if it's presented in context). I've also personally experienced just how quickly those free tier micro RDS instances hit their limits, and then scaling RDS is another thing to learn altogether.
u
what I like about dynamodb are the other features, like streams that can trigger your lambda, global tables, cost, scalability etc
p
We migrated away from DynamoDB (to RDS postgres) as soon as we got more “consumers” within the company for analytics / internal tooling purposes. One limit we definitely hit with DynamoDB was explorability for ad hoc analytics and reporting. I would be really interested in hearing how you see that with single table design. Or would you suggest setting up a “data lake / data export” as part of every dynamodb setup?
t
I use rockset with dynamodb. Very easy to config (actually do most of it in my cdk code) and then you get complete ad hoc query ability. I do wish this capability was native to AWS + serverless
u
I think if you want analytics it's better to stream your data into s3 and process it with other services like aws glue, athena, etc
p
If you would include how to build a simple report/reporting in your ideal stack, it would be very nice I think
With stream to s3 / glue /… I feel like you only do that once you know what you really want to look into. For simple ad-hoc queries, that setup feels quite extensive
t
it's funny you bring that up because we've talked about exactly that! We want to follow the journey of a startup and make sure we have answers for everything they typically run into and internal ad-hoc querying is something that shows up soon after launching the product
u
my wish list: a kind of pattern/best practice for doing ddd/cqrs easily with sst, like nestjs has
t
the ideal stack follows what I consider practical DDD
p
rockset looks awesome btw. have to try it
a
do we have to use dynamoDB with single table design? I have been following Yan Cui’s appsync masterclass and he creates as many tables as needed and the design is very clear. I second @Uncharted’s point that one good thing about dynamo is streams, which are kind of sacrificed with single table design
t
at this point I'm a strong believer that all your tables should be single-table designed, even if you have multiple
the problem with a lot of educational content out there is they optimize for making something simple to learn and less so what happens after a year on a project
a
I would argue that the masterclass is simple since it is cloning twitter from the ground up and supports major features (retweet, like, comments, DM, notification, search). But I am sure for certain use cases, single table design would work better. Just to be clear, for single table design, do you mean we should put two different entities in one table even if they don’t have any relationship with each other?
t
Yeah I think that's the right place to start
you want to have as few tables as possible, the more tables you have the more associated overhead there is
btw this was a big point of friction between Rick Houlihan and the amplify/appsync team, there was a bit of a blow-up about it recently
u
Alex DeBrie says in one of his blog posts that if you use appsync graphql you lose the advantage of single table
a
i don’t want to derail this thread into a single table design debate. 🤣 back to the ideal stack topic, yeah I feel if we start with dynamodb + single table design, it sounds a little too much for a beginner to digest
but meanwhile, I am also waiting for the dynamoDB version to see if it applies to my project
d
Just to clarify, Alex said something more nuanced like:
You absolutely can use single-table design with GraphQL, and I know some very smart people who are doing so. However, I don’t think it’s wrong to choose not to, depending on your specific needs + preferences.
u
I'm currently using a single table with multiple entities. It was a bit confusing, but I appreciate the fact that I can do a kind of sql join in one fetch
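For anyone new to the idea, here's a rough sketch of what that "join in one fetch" can look like. It's an in-memory stand-in, not real DynamoDB calls, and the `pk`/`sk` names, the `CUSTOMER#`/`ORDER#` key scheme, and both helper functions are made up for illustration. The point is only that related entities share a partition key, so one query on that key returns the parent and its children together.

```typescript
// Hypothetical item shape: every entity lives in one table and shares pk/sk.
type Item = {
  pk: string;
  sk: string;
  type: "Customer" | "Order";
  [attr: string]: unknown;
};

// Sample data (invented): a customer and its orders share a partition key.
const table: Item[] = [
  { pk: "CUSTOMER#1", sk: "PROFILE", type: "Customer", name: "Ada" },
  { pk: "CUSTOMER#1", sk: "ORDER#2022-01-03", type: "Order", total: 40 },
  { pk: "CUSTOMER#1", sk: "ORDER#2022-02-11", type: "Order", total: 15 },
  { pk: "CUSTOMER#2", sk: "PROFILE", type: "Customer", name: "Grace" },
];

// In-memory stand-in for a Query on one partition key, sorted by sort key.
function queryPartition(items: Item[], pk: string): Item[] {
  return items
    .filter((item) => item.pk === pk)
    .sort((a, b) => (a.sk < b.sk ? -1 : a.sk > b.sk ? 1 : 0));
}

// One "fetch" returns the customer and all of its orders; split them by type.
function customerWithOrders(items: Item[], customerId: string) {
  const rows = queryPartition(items, `CUSTOMER#${customerId}`);
  return {
    customer: rows.find((row) => row.type === "Customer"),
    orders: rows.filter((row) => row.type === "Order"),
  };
}

const result = customerWithOrders(table, "1");
console.log(result.customer, result.orders.length);
```

With a table per entity, the same read would take two requests (one for the customer, one for the orders) stitched together in application code.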
t
oh does the appsync class only use VTL?
a
It is like 50:50 (vtl:lambda)
It is trying to use VTL as much as possible
d
I think to Da's point, however, the question is what the goal is. If the goal is to lean into AWS as much as possible, I think single table design is the way. If it's to get someone started, choose what the SST team feels most confident they can support and communicate. That's my vote at least
t
the other conflict I feel is I'd never actually use RDS since my db is important enough to pick the best option which at this point is Planetscale (or that new Neon thing I haven't tried yet)
d
I think that's important. You're gonna have noobs like me asking a lot of questions. That's an overhead burden
a
as far as the ideal stack’s concerned, what kind of sample project are you looking to build? Maybe it would be ok to have it in single table design, but users still have the freedom to define multiple tables and the framework would still work?
u
Vtl is kinda meh but fortunately they want to replace it with js, if I remember correctly
t
@arda yeah we definitely would allow people to do whatever, I'm mainly trying to avoid the situation where someone new to dynamodb accidentally does a table per entity because that's what they're used to in RDS and ends up in a bad place for a real app
u
The other good thing about dynamodb is that I wanted to use it to manage multi-tenancy. All my data is in the same table and I generate and use roles to restrict access by tenant
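A minimal sketch of one way this kind of per-tenant restriction can be done, not necessarily the setup described above: IAM has a `dynamodb:LeadingKeys` condition key that limits a role to items whose partition key matches a pattern. The `tenantPolicy` helper, the `TENANT#<id>#` key prefix, the action list, and the table ARN below are all assumptions for illustration.

```typescript
// Hypothetical helper: builds an IAM policy document that only allows access
// to items whose partition key starts with the tenant's prefix.
function tenantPolicy(tableArn: string, tenantId: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "dynamodb:GetItem",
          "dynamodb:Query",
          "dynamodb:PutItem",
          "dynamodb:UpdateItem",
          "dynamodb:DeleteItem",
        ],
        Resource: [tableArn],
        Condition: {
          // dynamodb:LeadingKeys is matched against the partition key value.
          // The "TENANT#<id>#" prefix is an assumed key scheme.
          "ForAllValues:StringLike": {
            "dynamodb:LeadingKeys": [`TENANT#${tenantId}#*`],
          },
        },
      },
    ],
  };
}

// Example with a made-up table ARN and tenant id.
const policy = tenantPolicy(
  "arn:aws:dynamodb:us-east-1:123456789012:table/app",
  "acme"
);
console.log(JSON.stringify(policy, null, 2));
```

A policy like this would be attached to a role assumed per request or per tenant, so a query for another tenant's partition key is rejected by IAM rather than by application code.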
a
"ends up in a bad place for a real app"
I am surprised you guys have such a definite preference for single table design. I have to admit I haven’t dived deep enough to make a meaningful comment. But my impression from Alex in this podcast is that this design principle should be considered case by case: https://podcasts.apple.com/us/podcast/real-world-serverless-with-theburningmonk/id1499753495?i=1000520217935
t
I think what I'm saying is there's a difference between knowing single table and deciding not to use it and not knowing it and not using it
most applications need single table and tbh I haven't come across one that doesn't but I'm open to the idea. But even when you don't you end up needing to manage a lot more in CDK (columns, indexes, etc per table) and that creates slowdowns in your day to day work
j
Completely agree that it's much easier to use DynamoDB to start with, especially if your target is greenfield projects or startups. Having worked on a few of those, we generally need to iterate pretty quickly on data models, and Dynamo (or any non-relational) works better than relational. In saying that, learning DDB with the whole indexing etc is a bit of a learning curve if you haven't done it yet. A use-case that I have, and the main reason I am slowly migrating a side project to RDS, is GeoQueries. AFAIK it's pretty complex to do with DDB and you need to know your resolution requirements way ahead of time. I did think about using just ElasticSearch on top of DDB, but then it's not completely serverless and means I have to have an instance running all the time.
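To illustrate the resolution point, here's a small sketch assuming geohashing is the technique in play (a common way to do geo lookups on a key-value store, though not the only one). The `geohash` function below is a plain implementation of the standard encoding and isn't taken from any DynamoDB library. The number of characters fixes the cell size, and since the hash (or a prefix of it) typically ends up in a key or index, the precision has to be picked when the items are written.

```typescript
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Standard geohash encoding: interleave longitude and latitude bits
// (longitude first) and emit one base32 character per 5 bits.
function geohash(lat: number, lon: number, precision: number): string {
  const latRange: [number, number] = [-90, 90];
  const lonRange: [number, number] = [-180, 180];
  let hash = "";
  let bits = 0;
  let bitCount = 0;
  let useLon = true;

  while (hash.length < precision) {
    const range = useLon ? lonRange : latRange;
    const value = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      bits = (bits << 1) | 1;
      range[0] = mid; // keep the upper half
    } else {
      bits = bits << 1;
      range[1] = mid; // keep the lower half
    }
    useLon = !useLon;
    bitCount += 1;
    if (bitCount === 5) {
      hash += BASE32.charAt(bits);
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

// A coarse hash is always a prefix of a finer one for the same point.
const coarse = geohash(57.64911, 10.40744, 5);
const fine = geohash(57.64911, 10.40744, 9);
console.log(coarse, fine, fine.startsWith(coarse));
```

If the partition key stores the 5-character cell, a later need for finer or coarser queries means adding an index or rewriting keys, which is the kind of upfront decision mentioned above.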
t
curious if you're using serverless rds
l
@justindra for DynamoDB do you mean working with Single table design from the get-go for a greenfield project? Or just iterating over entities in separate tables?
@thdxr I guess having both approaches would be perfect, but for a totally greenfield project, where you're just scratching the surface of access patterns that will emerge within the next 6m-1y, I'm highly biased towards RDS. It's just too fluid at the very start of the game to keep reimagining single table from the ground up every week 😅
It might be that I need more experience with the matter, and I'm looking forward to our current project reaching enough stability to migrate
j
@thdxr yep, using serverless RDS from SST. It's a bit of a pain that cold starts of the DB take so long (up to a couple of mins). But after that it's great. I could set it to always have one instance going to avoid that cold start, but then it's back to the non-serverless days. I saw you posted a link the other day to another provider. I might look into that and see if it's better to use that, the downside is you can't use SST with it.
@Lukasz K I'd say it depends on your team's capabilities and the use-case. At the end of the day, when doing greenfield/startups, where the goal is to move fast, just do whatever is easiest for your team. If your team is more confident with a relational DB then just use that. It's going to be faster than trying to learn something and move fast at the same time. As an example, on some projects we're adopting Single-Table design where we've scoped out the feature extensively and there's a high probability it won't change. At the same time, on other projects where it's more of an exploration, we might create multiple tables to test things out and then refactor later on. In a way it's similar to migrations for relational, but for some of these we can forgo validation at the start and let the frontend define the data model until we get to a point where we're happy with it. Access patterns-wise, you can create multiple GSIs at the start, your volume is normally so small that the extra cost doesn't really matter. And then optimize once you get to a stable point. It's not very serverless but you can also cheat and put ElasticSearch on top of any DB and just index everything 😛
k
I can't tie this back to production apps, but my motives for SST / Dynamo are similar to what Devin expressed... it's ultimately going to produce a better product. Though completely understood about the friction single table design creates. I learned DDB through Alex DeBrie's resources, and while it absolutely took time to wrap my head around, it's all much better understood now. So perhaps with enough examples and high quality explainers that could be sufficient for helping to highlight the benefits (cost being a big one)?
On a more meta level, seems like choosing RDS over Dynamo would clash w/ the whole idea of using SST in the sense that it's supposed to provide a better experience.. takes a learning curve for sure but the goal is a better developer UX, which IMO is more aligned w/ Dynamo (and the right patterns!)