Hi folks, let me ask for your opinion on modeling ...
# troubleshooting
d
Hi folks, let me ask for your opinion on modeling tables in Pinot. Suppose (just a fake case for simple illustration) that you had a data source where you have users having different objects at home, where the types and names of these objects are dynamic, and you wanted to have a way to store them in such a way that you could be able to query them by objects amounts, like finding users that have 2 cars and 2 TVs. Considering that you don't know what objects would be coming in beforehand, how would you model this? JSON field for the objects, to keep them in a single row that represents the individual user? Spreading the objects as different rows and then aggregating and filtering at the application side? How would you guys model this?
k
Model as json type columns as long the objects a single user holds does not run into hundreds of thousands
This will be the fastest and most efficient
d
Hmm... In some cases I might have about 100 items or so, but I hope this doesn't turn out to be a problem
m
Should be fine
k
One thing missing in Pinot right now is ability to have configure indexes for each field within a json
We only do inverted index rt now by default
Will be great if you can file an issue for this
d
I can, yes. Will do ASAP. Thanks again! 🙂
m
Also is there structure to it, or you just want to do text match on bunch of strings?
d
Something like, imagine a user has: • TVs: 2 • Cars: 1 And another user has: • TVs: 1 • Dogs: 4 So each user has a certain "thing" and then a certain amount of that thing. Just one level, no complex structure really. But the problem is that users would have different things, hence me not being able to define them as columns.
k
yeah json is right
d
Initially I structured this as each "thing" being a separate row, but it turned out to have an obvious problem: it would be impossible to filter users that have "2 TVs and 1 car", for example.
a
Maybe you can transform one json to multiple records with TVs/Dogs/.. stored in a field named as type and 2/1/.. stored in a field named as amount.
k
yes, thats also another idea and commonly used. the only drawback with that is if you want to get count of users who have TVs and Dogs. That will require distinctCount vs Count with json.
d
Thanks for the hint, Alice! But I was doing that already, and it didn't solve my problem because I ended up not being able to correlate different rows in the same query (e.g. "users that have 2 TVs, and either 2 dogs or 2 cars")
k
how big is the dataset
d
I don't know yet how big the dataset will be, in total it will probably be in the order of a few terabytes.