Hi Pinot team, what do you recommend to optimize a...
# general
q
Hi Pinot team, what do you recommend to optimize a range search on a “date” column? Specifically, we have a column
created_at
which records the epoch time in seconds, we have a use case needs to search for entries created in the last 30 days, 60 days, 90 days, etc. I wonder if there is a best practice I could follow to optimize this, for example: 1. What type is best for this
created_at
column? Timestamp, Long, etc.? 2. What kind of indexing would help? range index? 3. anything else could help? Thanks in advance!
k
our typical recommendation is to start with no indexes.. try out the query first and looking at the response metadata, we can guide you on the indexing.. You can add/remove indexes dynamically after data is ingested
q
@User Thanks. What data type do you suggest for
created_at
for this kind of use case considering we need to query something like “in the past 30 days”? we could achieve it by using epoch time in seond like
where created_at > timestamp_30_days_before and created_at < current_timestamp
. However I wonder if there is a better way?
k
For a similar situation we use an int field that represents days. And if the segment data is sorted by this field, you’ll automatically get a sorted index, which should make range queries fast.
q
the number of days since 1970-1-1?
@User ^
k
yes
If you need timezone-aware query by days then resolution would have to be in hours, not days (for most timezones, other than whacky timezones like Nepal (15 minutes off) 🙂
q
very interesting. Thanks very much!
k
please use long or int.. my recommendation is to have it milliseconds (long) but round it to nearest day/hour etc
y
Does segment pruner take this into account by default or it needs some config?
k
pruner uses it automatically
y
👍
q
Thanks very much!