#troubleshooting

Diogo Baeder

03/25/2022, 11:35 PM
Hi folks! Now that we're using Pinot with realtime tables in production, I'm also doing some experiments with offline tables for something else I'm developing. One thing I'd like to be able to do is partition the data according to the values in some of the dimension columns. I'll follow up in a thread:
For example, suppose I have a table with a number of columns, where two of them are "country" and "state", and suppose I have the following possible combinations:
• Country: US, State: NJ
• Country: BR, State: SP
• Country: BR, State: BA
In the case above, I'd like to have 3 partitions, one for each combination of country + state, but without knowing beforehand that I'll end up having 3 partitions - because I want to have more partitions in the future if more combinations are necessary.
Does Pinot support that? If yes, then is there a doc I can follow to learn how to do that? Thanks! 🙂
Just to explain why I need this: in this system I'm developing/experimenting with, when we query for a certain subset of data, it will necessarily fall under a single combination of those columns. Creating these partitions would therefore be a good move for us, because none of our queries would cross partition boundaries, which would reduce the number of documents to scan. We'll also use inverted indexes because they fit our use case well, but even then, partitioning should improve performance further by a great deal.

Richard Startin

03/26/2022, 1:57 PM
there's a new feature to do this https://github.com/apache/pinot/pull/8224

Diogo Baeder

03/26/2022, 2:28 PM
Ah, nice! 🙂
@User it seems like it has to be configured with predefined column values, though. Is that a requirement for using it, or will partitions be created automatically for new values that come in?
I was thinking, maybe something I could do is add a "partition_data" column to my table that joins the values of the fields I want, e.g. "US|NJ", "BR|SP", "BR|BA", and then use an existing partitioning strategy, but with a very high number of partitions, like 1000. I think this might be enough for us: not the ideal solution, but good enough.
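A rough sketch of that workaround, in case it helps anyone reading later. It is not taken from the PR or the Pinot docs: `make_partition_key` and `partition_id` are hypothetical helper names, and CRC32 is only a stand-in for whichever partition function the table actually uses (Pinot's built-in ones, such as Murmur or Modulo, will assign different IDs). The config fragment at the end uses field names as I remember them from the Pinot table config, so please check them against the docs before relying on them.

```python
import zlib

NUM_PARTITIONS = 1000  # the "very high number" suggested in the message above
DELIMITER = "|"


def make_partition_key(country: str, state: str) -> str:
    """Join the two dimension values into a single partition_data value."""
    for value in (country, state):
        if DELIMITER in value:
            # Otherwise ("A|B", "C") and ("A", "B|C") would produce the same key.
            raise ValueError(f"value {value!r} contains the delimiter {DELIMITER!r}")
    return f"{country}{DELIMITER}{state}"


def partition_id(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash.

    CRC32 is a stand-in for illustration only; it is NOT what Pinot computes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


rows = [
    {"country": "US", "state": "NJ"},
    {"country": "BR", "state": "SP"},
    {"country": "BR", "state": "BA"},
]

for row in rows:
    row["partition_data"] = make_partition_key(row["country"], row["state"])
    print(row["partition_data"], "->", partition_id(row["partition_data"]))

# Assumed shape of the matching table config fragment (verify against the docs).
table_index_config = {
    "segmentPartitionConfig": {
        "columnPartitionMap": {
            "partition_data": {
                "functionName": "Murmur",
                "numPartitions": NUM_PARTITIONS,
            }
        }
    }
}
```

Two caveats with this approach: distinct combinations can hash to the same partition, so a partition may hold more than one country/state pair, and queries need an equality filter on "partition_data" itself for partition-based pruning to apply.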