What kind of index should I use if I have datetime...
# general
t
What kind of index should I use if I have datetime string column to enable faster ranged query? Will a ranged index help?
x
range index only works for numbers.
for date string, you can try to have a separated column to convert the datestring to seconds or days value
also if you have the data partitioned by date for segments, then range query should mostly fall into to the individual segment
t
I see. But the problem for our use case is, superset time column can only be datetime, that’s why we converted epoch_ms to datetime during ingestion.
x
oh?
I thought superset can have epoch_ms and epoch_s?
something like:
t
For temporal column, it has to datetime or datetime like.
I guess a long type can’t work as temporal?
x
you can check mark
is temporal
this is one example
message has been deleted
this is the query superset generated:
t
It works, the description on superset is a little misleading, I thought it must be datetime type.
👍 1
Why using epoch_ms time column is much slower than a datetime string column?
The same superset cost 3675 ms using epoch_ms column, but only cost 575 ms using a datetime string column.
x
hmm, do you have the query generated for the graph?
it’s possible that the group by transformation make the computation complicated
t
The only difference for the query is the time filter.
Copy code
SELECT count(DISTINCT device_sn) AS count_1
FROM pad_interaction
WHERE "time" >= 1620705960000
  AND "time" < 1620706560000
  AND company_name = 'xxx'
  AND action_value = 'mqtt_ping'
  AND action_message = 'success'
  AND ((regexp_like(device_sn, '^WAL.*')))
ORDER BY count(DISTINCT device_sn) DESC;
x
oh? can you paste the queries?
t
Copy code
SELECT count(DISTINCT device_sn) AS count_1
FROM pad_interaction
WHERE ds >= '2021-05-11 12:03:00'
  AND ds < '2021-05-11 12:13:00'
  AND company_name = 'xxx'
  AND action_value = 'mqtt_ping'
  AND action_message = 'success'
  AND ((regexp_like(device_sn, '^WAL.*')))
ORDER BY count(DISTINCT device_sn) DESC
x
hmm
this query seems only hit the most recent data
it’s weird that there are some big difference
t
numEntriesScannedInFilter difference is huge, datetime string has only 9370339 entries scanned, epoch_ms has 262577613 entries scanned.
x
oh, because it’s ms not seconds
then add a second epoch value column should help
also,
do you have any sort index?
t
No, no index is configured.
x
ic, then adding inverted index may help as well for columns in filtering
t
Thanks, I will add some inverted index.
k
Why not range?
t
Can range index used for string columns? I didn’t see any description in the documentation.
x
I feel sortIndex on time and inverted index on company_name, action_value, action_message may help
t
inverted index can’t apply on multivalue columns?
x
inv index works for multi-value column