https://pinot.apache.org/ logo
#general
Title
# general
e

erik bergsten

04/22/2022, 2:28 PM
Querying question: When we query our table (with 6million rows) and aggregate (sum/avg) metrics over dimensions we get very good performance, usually ~250ms. If we use DATETIMECONVERT and group by month and year we suddenly see a jump to about 1sec or even more. Our solution is to add year and month as dimensions with an ingestion transform, is there a better way?
r

Richard Startin

04/22/2022, 2:35 PM
group by relies on there being a dictionary (a mapping from each distinct value to an integer dictionary code) to be fast, because it makes computing the coordinates fast, but when you use a transform function like
datetimeconvert
there is no longer a dictionary to rely on, so one needs to be materialised on the fly, which slows the evaluation down
we recently developed a feature specifically for
datetimeconvert
which pre-materialises the function at various granularites (hour, day, week etc.) precisely so group by queries can be accelerated
the timestamp index is available on master and in the latest docker builds if you want to try it
e

erik bergsten

04/22/2022, 2:41 PM
any documentation/links about the timestamp index?
r

Richard Startin

04/22/2022, 2:48 PM
e

erik bergsten

04/22/2022, 2:50 PM
I'm running the latest image now, the index is not listed in the gui, does that mean it isnt available or is the gui just lagging?
r

Richard Startin

04/22/2022, 2:51 PM
just isn't available in the gui
you can configure it manually
e

erik bergsten

04/22/2022, 2:57 PM
I don't understand the configuration instructions:
Copy code
Pinot will pre-generate one column per time granularity with forward index and range index. The naming convention is $${ts_column_name}$${ts_granularity}, e.g. Timestamp column ts with granularities DAY, MONTH will have two extra columns generated: $ts$DAY and $ts$MONTH.
I simply put a sorted-column: ["$ts$MONTH"] in my indexconfig ?
r

Richard Startin

04/22/2022, 2:58 PM
@User can you help with configuration please?
this is quite a raw feature so the documentation isn't quite there yet
e

erik bergsten

04/22/2022, 3:12 PM
nice! we got it working by setting
Copy code
"fieldConfigList": [
      {
        "name": "ts",
        "encodingType": "DICTIONARY",
        "indexType": "TIMESTAMP",
        "indexTypes": [
          "TIMESTAMP"
        ],
        "timestampConfig": {
          "granularities": [
            "DAY",
            "WEEK",
            "MONTH",
            "YEAR"
          ]
        }
      }
    ],
In the table config. Getting really good performance on time queries now!
Thx for the quick help.
r

Richard Startin

04/22/2022, 3:21 PM
nice
time filters should be faster too
j

Johan Adami

04/22/2022, 6:18 PM
Getting really good performance on time queries now!
back to the original 250ms or even better?
x

Xiang Fu

04/22/2022, 7:29 PM
Thanks @User ! @User the perf should be on par with extract to a separated column
🙏 1
e

erik bergsten

04/23/2022, 7:06 AM
Yes, @User we are seeing the same performance as when using group by on dimension columns like @User said, back to ~250ms
👍 1
Hi again @Xiang Fu! using the :latest docker image our timestamp indexes seem to have stopped working. When we started using them we noticed that datetrunc would return a string instead of an epochmillis timestamp only on tables which had a timestamp index, now we are getting epochmillis on those tables and the performance is slower than expected (compared to group-by queries on dimension fields).
Is this datetrunc behaviour a part of the timestamp index function or was it just a coincidence? Is there any way to see whether or not a timestamp index is enabled? They are not yet supported by the ui.
x

Xiang Fu

05/10/2022, 4:49 PM
sure, let me check
When did you pull the latest image?
e

erik bergsten

05/10/2022, 6:15 PM
I can't find the exact time but the digest is "3abe6ae8dbafb30ebf8f9056bdcb6ae1d038cadfa8fee260c0a0eab23f6130ca". Must have pulled it late last week.
x

Xiang Fu

05/10/2022, 7:11 PM
I see, did you try the latest one?
It could be that broker override doesn't happen
This is actually a bug that ts index got dropped during segment reloading, I will fix it and publish new docker image that you can try later today or tomorrow.
e

erik bergsten

05/13/2022, 2:11 PM
Cool, thanks for the quick fix! Are indexes stored in the segment files on disk? Are indexes re-created when we use "reload all segments" in the gui?
x

Xiang Fu

05/13/2022, 7:37 PM
yes
index should be rewritten
you can check local disk on data directory
under metadata.properies you should find columns like
$ts$DAY
etc
which indicate the timestamp index
e

erik bergsten

05/23/2022, 2:54 PM
My fiendConfigList is not recognized when I create tables using the docker image tagged 0.11.0-SNAPSHOT-438c53b-20220523 Not seeing anything interesting the logs relating to this, any tips on how to debug it? field config in table: (based on https://github.com/apache/pinot/blob/master/pinot-tools/src/main/resources/examples/batch/airlineStats/airlineStats_offline_table_config.json)
Copy code
"fieldConfigList": [
    {
      "name": "ts",
      "encodingType": "DICTIONARY",
      "indexTypes": [
        "TIMESTAMP"
      ],
      "timestampConfig": {
        "granularities": [
          "DAY",
          "WEEK",
          "MONTH",
          "YEAR"
        ]
      }
    }
  ],
The result from /tables
Copy code
"unrecognizedProperties": {
    "/instanceAssignmentConfigMap": null,
    "/tierConfigs": null,
    "/tableIndexConfig/bloomFilterConfigs": null,
    "/tableIndexConfig/starTreeIndexConfigs": null,
    "/tableIndexConfig/segmentPartitionConfig": null,
    "/tableIndexConfig/columnMinMaxValueGeneratorMode": null,
    "/fieldConfigList": null,
    "/ingestionConfig/filterConfig": null,
    "/segmentsConfig/retentionTimeUnit": null,
    "/segmentsConfig/retentionTimeValue": null,
    "/segmentsConfig/crypterClassName": null,
    "/segmentsConfig/peerSegmentDownloadScheme": null,
    "/query/timeoutMs": null,
    "/upsertConfig": null,
    "/routing/segmentPrunerTypes": null,
    "/routing/instanceSelectorType": null,
    "/task": null,
    "/quota/storage": null,
    "/quota/maxQueriesPerSecond": null
  },
x

Xiang Fu

05/23/2022, 8:07 PM
Can you try to create the table then edit the fieldConfigList ? We have a pinot quickstart to test the table creation for airlineStats, if that one passed then the validation shouldn’t fail.
e

erik bergsten

05/24/2022, 2:17 PM
If I edit in in the web gui it is accepted but I still cannot get it to work through the http api.
and looking at the generated segmens metadata.properties it looks like the timestamp index has been successfully created
x

Xiang Fu

05/24/2022, 10:32 PM
got it
maybe something related to request validation, @Rong R in case you have time, also cc: @Jackie
r

Rong R

05/25/2022, 12:12 AM
Looks like the unrecognizable config has a
/
at the beginning. Did you escape the JSON payload correctly?