# getting-started
p
Hi all, I've read the documentation around the pinot star-tree index (https://docs.pinot.apache.org/basics/indexing/star-tree-index) and was thinking about how to define the set of dimensions in the Dimensions Split Order. We currently have an ES solution that is used for real-time dashboarding and a future need for real-time analytics. I can of course go over the current queries executed on ES and work from there. We can do user needs gathering on the future requirements for the real-time analytics. This will lead to an ordered list of dimensions. Does one now add all these dimensions to the "dimensions split order" and leave it up to pinot to stop the tree splits when a leaf contains no more than T documents, or does one typically limit the number of dimensions by only adding a certain number of them to the 'dimensions split order'? If the latter is the case, how do you know how many to add, and is there a reason not to add all dimensions to 'dimensions split order'? Also, what happens when you have more than T documents in a leaf node: if one has put all dimensions in 'dimensions split order', I assume there are no further actions that can be taken? Otherwise, of course, one could add a dimension to 'dimensions split order'. Is the latter an easy fix or does it require reindexing all documents?
k
If you know the dimensions that are typically queried, include them in descending order of cardinality.. put the time column at the end.. You can also start with default/auto and let Pinot decide everything for you.. you can then measure the performance and go from there..
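A rough sketch of that ordering rule, in case it helps: sort the queried dimensions by descending cardinality, append the time column last, and emit a star-tree config entry. The column names, cardinality estimates, and the helper `build_star_tree_config` are made up for illustration; the config keys (`dimensionsSplitOrder`, `skipStarNodeCreationForDimensions`, `functionColumnPairs`, `maxLeafRecords`) are the ones described on the star-tree docs page linked above, and `maxLeafRecords` is the T threshold from the original question.

```python
import json


def build_star_tree_config(cardinalities, time_column, function_column_pairs,
                           max_leaf_records=10000):
    """Build one star-tree config entry (hypothetical helper, not a Pinot API).

    Dimensions are sorted by descending cardinality and the time column
    is appended at the end of the split order.
    """
    dims = sorted(
        (d for d in cardinalities if d != time_column),
        key=lambda d: cardinalities[d],
        reverse=True,
    )
    if time_column is not None:
        dims.append(time_column)
    return {
        "dimensionsSplitOrder": dims,
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": list(function_column_pairs),
        # Splitting stops once a node holds at most this many records.
        "maxLeafRecords": max_leaf_records,
    }


# Assumed column names and rough cardinality estimates, for illustration only.
cardinalities = {"country": 200, "browser": 20, "campaignId": 5000}
config = build_star_tree_config(
    cardinalities,
    time_column="daysSinceEpoch",
    function_column_pairs=["SUM__impressions", "COUNT__*"],
)

print(json.dumps({"tableIndexConfig": {"starTreeIndexConfigs": [config]}}, indent=2))
```

The cardinalities could come from the existing ES data (a distinct count per field), which ties in with going over the current ES queries first.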
p
Time at the end... makes sense. thanks! Also, I understand that I need to enable the startree index and that it cannot be applied to pre-existing segments. Presumably the startree configuration (incl. the dimensions to split on) then also cannot be updated for segments created under the previous startree configuration?
k
it can be applied to pre-existing segments as well.. there is a flag, enableDynamicStarTree or something like that
1
update the tableconfig and reload the segments
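A minimal sketch of those two steps, under some assumptions: the flag is, as far as I can tell, `enableDynamicStarTreeCreation` under `tableIndexConfig`, and the controller exposes a segment reload endpoint at `POST /segments/{tableName}/reload`. The controller address, table name, and helper names below are hypothetical, and the code only builds the request without sending it. Please check the flag name and endpoint against your Pinot version.

```python
import json
import urllib.request

CONTROLLER = "http://localhost:9000"  # assumed controller address


def enable_dynamic_star_tree(table_config, star_tree_config):
    """Return a copy of the table config with the star-tree settings added.

    Hypothetical helper; the input dict is left unmodified.
    """
    updated = json.loads(json.dumps(table_config))  # simple deep copy
    index_cfg = updated.setdefault("tableIndexConfig", {})
    # Assumed flag name that lets existing segments build the tree on reload.
    index_cfg["enableDynamicStarTreeCreation"] = True
    index_cfg["starTreeIndexConfigs"] = [star_tree_config]
    return updated


def reload_request(table_name):
    """Build (but do not send) the segment reload request for a table."""
    url = f"{CONTROLLER}/segments/{table_name}/reload"
    return urllib.request.Request(url, method="POST")


# Illustrative values only.
old_config = {"tableName": "myTable", "tableIndexConfig": {}}
star_tree = {
    "dimensionsSplitOrder": ["country", "browser", "daysSinceEpoch"],
    "functionColumnPairs": ["COUNT__*"],
    "maxLeafRecords": 10000,
}
new_config = enable_dynamic_star_tree(old_config, star_tree)
req = reload_request("myTable")
print(req.get_method(), req.full_url)
```

After pushing the updated table config to the controller, sending the request with `urllib.request.urlopen(req)` would trigger the reload.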
p
Can I also ask if there is a possibility to have immutable segments with a startree while having mutable segments on another index to allow updates/upserts?
k
No
😁 1
p
i have a follow-up question on the comment about adding the time column at the end. what if the queries have a group by on the time column and the predicate has a time range? think of it as a time series plot with some aggregates over a certain time interval. or would you suggest a range index for something like that?
👍 1
🤔 1
as a side note, i think it would be helpful to have either a meetup talk or a blog series or both going over such well-known use cases to make the adoption of pinot easier. i understand these probably won't be the best performing solutions but it'll give a kind of head start on figuring out the right indexing strategy @Kishore G
k
I think we had a meetup around this and I explained all the indexing techniques but that was a bit rushed.. maybe a great idea for @Tim Berglund for his next video
🙏 2