(I’ve asked this question in <Github Issues> here,...
# general
y
(I’ve asked this question in Github Issues here, but I am forwarding the question here since it would be more appropriate.) Hi Pinot team, I have a question regarding star-tree indexing a JSON key. For example, in the following
tableIndexConfig
,
Copy code
"tableIndexConfig": {
  "starTreeIndexConfigs": [{
    "dimensionsSplitOrder": [
      "Country",
      "Browser",
      "Locale"
    ],
    "skipStarNodeCreationForDimensions": [
    ],
    "functionColumnPairs": [
      "SUM__Impressions"
    ],
    "maxLeafRecords": 1
  }],
  ...
}
Is there a way to specify a JSON key column within the
dimensionSplitOrder
? For example, if we have a column named
person
having a JSON value such as
{ "number" : 112, "street" : "main st", "country" : "us" }
, can we add
"country"
as one of the
dimensionsSplitOrder
?
k
Not as of now.. you will have to create a derived column for the "country" first and then add it to startree index. This is an easy enhancement to add to startree index.. @Jackie ^^ WDYT
j
Directly supporting it is not straight forward because in order to filter on json field, we'll need to do either
JSON_MATCH
of
JSON_EXTRACT
, and star-tree needs to understand the semantic of the query in order to answer it. Since you already know the dimensions, you may use the complex type handling feature to flatten the json into columns, then apply star-tree on these columns
k
Probably something we can discuss in the issue.. if we add support for the udf in dimension split order then it’s simple mapping right? Another option is to model it as virtual column and rewrite the query on the fly? Probably not easy but doable?
j
Yeah, we should be able to support it. Internally we can generate a derived column, and create startree on top of it. On the query side we can auto rewrite the query to use the derived column so that startree can be used. In order to create startree, the column needs to be materialized, and dictionary encoded
k
Let’s add this to the issue..
j
Created a GH issue to track that, and linked the original issue (already closed) https://github.com/apache/pinot/issues/8863