Few question on sorted index: 1. I was trying to c...
# general
a
Few question on sorted index: 1. I was trying to create a sorted index on a STRING column. But it was not working. Then i tried it on a INT column and it worked. Is sorted index only supported on INT (or LONG) types? 2. I see isSorted = true in metadata.properties file for the event time as well as the metric column. Though i did not enable sorted index for those. What does this imply? Given i remember it was mentioned that only one column can be used as sorted index 3. Related to above, if most queries will have time in where clause, then should we add sorted index on time field? Or is it more beneficial to add sorted index on a field (used often to filter) other than time field?
m
Copy code
1. Sorted index is independent of data type. Is this for realtime or offline? In offline, the data is expected to be sorted before pushing to Pinot.
2. For a column with single value, the data is sorted by definition. Check the cardinality of these columns.
3. More beneficial to add sorted index that is used often in filter and that has low selectivity
a
Thanks @Mayank. For (1), i am using offline. I missed the sorting requirement. Now i sorted it before the segment creation and i can see isSorted true for the string data type column as well
m
๐Ÿ‘
a
not sure i understood point (2). All other dimensional columns are single valued. But i see they are set to isSorted = false. Only the column i added to the table config and the time and metric fields, i see isSorted as true
m
If the metadata says it is sorted, then the column is sorted. However, note that if all rows had a value of metric = 1, then by default that is sorted, right?
a
ah, i see. Makes sense for the metric field. But what the time column?
the input to segment creation had time sorted, so i believe thatโ€™s what it is referring to then
m
yes
what is your time unit?
a
seconds
m
I think if you sort by some other dimension, then time won't be sorted
a
got it.
Last question on (3). By low selectivity, do you mean the cardinality. Number of distinct values for that column?
m
selectivity - how many rows does that predicate select. low selectivity - more rows eliminated - better performance.
a
got it. Thanks again ๐Ÿ™
m
๐Ÿ‘