# general
e
Querying question: When we query our table (with 6 million rows) and aggregate (sum/avg) metrics over dimensions we get very good performance, usually ~250ms. If we use DATETIMECONVERT and group by month and year we suddenly see a jump to about 1 second or even more. Our solution is to add year and month as dimensions with an ingestion transform; is there a better way?
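(For illustration, a sketch of the slow query shape being described; the table name, column names, and format arguments are hypothetical, not taken from the thread:)
-- Hypothetical shape of the slow query: aggregating metrics grouped by a
-- month bucket computed with DATETIMECONVERT.
SELECT DATETIMECONVERT(ts, '1:MILLISECONDS:EPOCH',
                       '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM', '1:DAYS') AS yearMonth,
       SUM(metric) AS total,
       AVG(metric) AS average
FROM mytable
GROUP BY DATETIMECONVERT(ts, '1:MILLISECONDS:EPOCH',
                         '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM', '1:DAYS')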
r
group by relies on there being a dictionary (a mapping from each distinct value to an integer dictionary code) to be fast, because it makes computing the group coordinates fast; but when you use a transform function like datetimeconvert there is no longer a dictionary to rely on, so one needs to be materialised on the fly, which slows the evaluation down
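(For contrast, a sketch of the fast case just described, again with hypothetical table/column names:)
-- The dictionary maps each distinct country value to an integer code, so
-- group keys are computed directly from the codes with no per-row
-- materialisation.
SELECT country, SUM(metric) AS total
FROM mytable
GROUP BY country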
we recently developed a feature specifically for datetimeconvert which pre-materialises the function at various granularities (hour, day, week, etc.) precisely so that group by queries can be accelerated
the timestamp index is available on master and in the latest docker builds if you want to try it
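(A sketch of a query the index is meant to accelerate, assuming the pre-materialised per-granularity columns such as $ts$DAY described later in the thread; the table and column names are hypothetical:)
-- With a timestamp index on ts, a day-granularity group by like this can
-- be answered from the pre-generated $ts$DAY column instead of evaluating
-- the truncation per row.
SELECT datetrunc('DAY', ts) AS day,
       COUNT(*) AS events
FROM mytable
GROUP BY datetrunc('DAY', ts)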
e
any documentation/links about the timestamp index?
r
e
I'm running the latest image now, but the index is not listed in the gui. Does that mean it isn't available, or is the gui just lagging?
r
just isn't available in the gui
you can configure it manually
e
I don't understand the configuration instructions:
Pinot will pre-generate one column per time granularity with forward index and range index. The naming convention is $${ts_column_name}$${ts_granularity}, e.g. Timestamp column ts with granularities DAY, MONTH will have two extra columns generated: $ts$DAY and $ts$MONTH.
Do I simply put a sorted-column: ["$ts$MONTH"] in my indexConfig?
r
@User can you help with configuration please?
this is quite a raw feature so the documentation isn't quite there yet
e
nice! we got it working by setting
"fieldConfigList": [
      {
        "name": "ts",
        "encodingType": "DICTIONARY",
        "indexType": "TIMESTAMP",
        "indexTypes": [
          "TIMESTAMP"
        ],
        "timestampConfig": {
          "granularities": [
            "DAY",
            "WEEK",
            "MONTH",
            "YEAR"
          ]
        }
      }
    ],
in the table config. Getting really good performance on time queries now!
Thanks for the quick help.
r
nice
time filters should be faster too
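(A sketch of the kind of time filter that should benefit, assuming filters on the truncated time can use the range index on the pre-generated column; the names and epoch-millis literal are hypothetical:)
-- A range filter on the truncated timestamp; with the timestamp index this
-- can be served by the range index on $ts$MONTH rather than computing
-- datetrunc for every row.
SELECT COUNT(*) AS events
FROM mytable
WHERE datetrunc('MONTH', ts) >= 1651363200000  -- 2022-05-01T00:00:00Z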
j
Getting really good performance on time queries now!
back to the original 250ms or even better?
x
Thanks @User! @User the perf should be on par with extracting to a separate column
e
Yes @User, we are seeing the same performance as with group by on dimension columns, like @User said: back to ~250ms
Hi again @Xiang Fu! Using the :latest docker image, our timestamp indexes seem to have stopped working. When we started using them we noticed that datetrunc would return a string instead of an epoch-millis timestamp, but only on tables which had a timestamp index; now we are getting epoch millis on those tables, and the performance is slower than expected (compared to group-by queries on dimension fields).
Is this datetrunc behaviour part of the timestamp index feature, or was it just a coincidence? Is there any way to see whether or not a timestamp index is enabled? They are not yet supported by the ui.
x
sure, let me check
When did you pull the latest image?
e
I can't find the exact time but the digest is "3abe6ae8dbafb30ebf8f9056bdcb6ae1d038cadfa8fee260c0a0eab23f6130ca". Must have pulled it late last week.
x
I see, did you try the latest one?
It could be that the broker override doesn't happen
This is actually a bug: the ts index got dropped during segment reloading. I will fix it and publish a new docker image that you can try later today or tomorrow.
e
Cool, thanks for the quick fix! Are indexes stored in the segment files on disk? Are indexes re-created when we use "reload all segments" in the gui?
x
yes
the index should be rewritten
you can check the local disk under the data directory: in metadata.properties you should find columns like $ts$DAY etc., which indicate the timestamp index
e
My fieldConfigList is not recognized when I create tables using the docker image tagged 0.11.0-SNAPSHOT-438c53b-20220523. Not seeing anything interesting in the logs relating to this, any tips on how to debug it? Field config in the table (based on https://github.com/apache/pinot/blob/master/pinot-tools/src/main/resources/examples/batch/airlineStats/airlineStats_offline_table_config.json):
"fieldConfigList": [
    {
      "name": "ts",
      "encodingType": "DICTIONARY",
      "indexTypes": [
        "TIMESTAMP"
      ],
      "timestampConfig": {
        "granularities": [
          "DAY",
          "WEEK",
          "MONTH",
          "YEAR"
        ]
      }
    }
  ],
The result from /tables:
"unrecognizedProperties": {
    "/instanceAssignmentConfigMap": null,
    "/tierConfigs": null,
    "/tableIndexConfig/bloomFilterConfigs": null,
    "/tableIndexConfig/starTreeIndexConfigs": null,
    "/tableIndexConfig/segmentPartitionConfig": null,
    "/tableIndexConfig/columnMinMaxValueGeneratorMode": null,
    "/fieldConfigList": null,
    "/ingestionConfig/filterConfig": null,
    "/segmentsConfig/retentionTimeUnit": null,
    "/segmentsConfig/retentionTimeValue": null,
    "/segmentsConfig/crypterClassName": null,
    "/segmentsConfig/peerSegmentDownloadScheme": null,
    "/query/timeoutMs": null,
    "/upsertConfig": null,
    "/routing/segmentPrunerTypes": null,
    "/routing/instanceSelectorType": null,
    "/task": null,
    "/quota/storage": null,
    "/quota/maxQueriesPerSecond": null
  },
x
Can you try to create the table and then edit in the fieldConfigList? We have a pinot quickstart that tests table creation for airlineStats; if that one passes then the validation shouldn't fail.
e
If I edit it in the web gui it is accepted, but I still cannot get it to work through the HTTP API.
and looking at the generated segments' metadata.properties, it looks like the timestamp index has been successfully created
x
got it
maybe something related to request validation, @Rong R in case you have time, also cc: @Jackie
r
Looks like the unrecognized config keys have a / at the beginning. Did you escape the JSON payload correctly?