sooo, how do you guys deal with m to n relations i...
# random
a
sooo, how do you guys deal with m to n relations in dynamodb? Make a separate mapping table? Or is there some trick I'm not thinking of?
for example books and authors. An author can write many books, and a book can be written by multiple authors
g
It always depends on the exact use case and what kind of queries you want to be able to perform.
a
Yet data modeling with DynamoDB is tricky for those used to the relational databases that have dominated for the past few decades.
Ain't that the truth right there
@Garret Harp Well, in the use case of books and authors, you would obviously want to be able to list all books of an author and also all authors of a book
so, let's say you have a list of authors, then you click on an author, and it should list all the books he's written, and when you click on a book, it should then show book details including all authors of that particular book
j
I am not an expert on this, Alex Debrie and Rick Houlihan are. I'm wondering if you could have a GSI that is the primary key flipped.
a
thanks
The main criticism of multi table approach seems to be network latency, but in many cases you could probably cache the mappings and then it would also be only a single lookup
but yeah, I didn't even think of using compound sort keys with startsWith
j
yeah, it's a whole new way of thinking. You do not want to normalize your data. You want a single query to be able to get all of your relevant data.
and on the sort key you can do different logic like starts with
a
i don't really see how many to many relations are done...the examples all seem to be one to many only, or maybe I just don't get it yet...
Maybe you just duplicate the data?
j
I'm wondering if you can create global secondary index with the partition key and sort key flipped
or you could duplicate the data, but it would be better to not
a
hmmm, yeah, I gotta try this
I mean, I should have been able to come up with that, too, since I'm already doing it on the "mapping table" of the multi table approach
my guts tell me it won't work, but my guts have been wrong many times before in this new (for me) serverless world
hmmm, I can't come up with a way to update the table, though...in the case of many to many
their example is super confusing for me, though...bills and invoices? I don't work in finance, god damn it
what even is a bill that belongs to multiple invoices? I have no idea
it doesn't even seem to be many to many, because apparently, the bills have different data when they belong to different invoices...
I guess their example is when you have a bill in a restaurant for a table, and then multiple invoices for people who want to pay separately, so the bill gets spread over multiple users (invoices)
but that is not technically many to many, at least not how I understand it
because obviously, you can get away without data duplication in this example, but that's not true for all many to many examples
eh, I'm probably just gonna do multi table and worry about it later when I have performance problems
future me will hate me, but whatever, he's not here yet 🙂
d
I purchased and worked through the DynamoDB Book. It’s really vey good. I learned a TON.
that said I still feel totally lost doing single table design. What I feel like I really need is the Single Table Design workbook. That is just a bunch of exercises where you model, change update… over and over.
In the mean time, I’m trying to do single table wherever I can and just do multi for the rest. I def have “fewer” tables
s
I think in the adjacency list example that's shared, the invoice is equivalent to your book, so your book id would be pk, and then authors would be sk. Then you have gsi_pk_1 pointing at the sk column (author) so you can query that. It works because gsi_pk_1 can contain duplicates because GSI can have duplicate primary key. Since pk would by default be replicated to the GSI you can always query gsi_pk_1 with author name and get a list of book ids from the pk column, along with any other data you want to add to the GSI. I think the main drawback on Single Table Design is its in-flexible, it would be tough to update if your use case changed
a
i don't see how that supports full many to many queries from both sides without data duplication, though
or BatchGetItem
because unlike in their example, author data does not change per book
most articles I've read now either introduce data duplication or then use BatchGetItem after an initial query to load the rest of the data
i think turning it around and storing author book bookdata items would be better in this case, because it would lead to only a few of the books having duplicated data, and not many times, so you could still get away with relatively cheap updates
since the amount of books authors write is greater than the amount of authors books have (usually only 1)
i will probably do it this way, because my real case is similarly lopsided
s
Ooh nice logic, makes sense