Apache Pinot

(I’ve asked this question in <https://github.com/apache/pinot/issues/8851|Github Issues> here, but I am forwarding the question here since it would be more appropriate.)
Hi Pinot team, I have a question regarding star-tree indexing a JSON key.
For example, in the following `tableIndexConfig`,
```"tableIndexConfig": {
  "starTreeIndexConfigs": [{
    "dimensionsSplitOrder": [
      "Country",
      "Browser",
      "Locale"
    ],
    "skipStarNodeCreationForDimensions": [
    ],
    "functionColumnPairs": [
      "SUM__Impressions"
    ],
    "maxLeafRecords": 1
  }],
  ...
}```
Is there a way to specify a JSON key column within the `dimensionSplitOrder`?
For example, if we have a column named `person` having a JSON value such as `{ "number" : 112, "street" : "main st", "country" : "us" }`, can we add `"country"` as one of the `dimensionsSplitOrder`?

Not as of now.. you will have to create a derived column for the "country" first and then add it to startree index. This is an easy enhancement to add to startree index.. <@UDT7GFEG6> ^^ WDYT

Directly supporting it is not straight forward because in order to filter on json field, we'll need to do either `JSON_MATCH` of `JSON_EXTRACT`, and star-tree needs to understand the semantic of the query in order to answer it. Since you already know the dimensions, you may use the <https://docs.pinot.apache.org/basics/data-import/complex-type|complex type handling> feature to flatten the json into columns, then apply star-tree on these columns

Probably something we can discuss in the issue.. if we add support for the udf in dimension split order then it’s simple mapping right? 

Another option is to model it as virtual column and rewrite the query on the fly? 

Probably not easy but doable?

Yeah, we should be able to support it. Internally we can generate a derived column, and create startree on top of it. On the query side we can auto rewrite the query to use the derived column so that startree can be used. In order to create startree, the column needs to be materialized, and dictionary encoded

Created a GH issue to track that, and linked the original issue (already closed) <https://github.com/apache/pinot/issues/8863>