# dhill-date-seg
  • k

    Kishore G

    05/03/2020, 7:21 PM
    Let me summarize:
    • Presto has the date_trunc UDF.
    • Pinot has the dateTimeConvert UDF.
    • When you query via Presto, you can use either date_trunc or dateTimeConvert. When you use date_trunc, the presto-pinot connector converts it into dateTimeConvert automatically.
    • When you query via Pinot, you can only use dateTimeConvert.
  • d

    Dan Hill

    05/03/2020, 7:23 PM
    Cool
  • d

    Dan Hill

    05/03/2020, 7:31 PM
    Are there any performance tips related to doing date/timestamp filters with Pinot? I want to make sure I don't do weird things in Presto that hurt latency.
  • d

    Dan Hill

    05/03/2020, 7:31 PM
    I'm looking at TimeZone support.
  • k

    Kishore G

    05/03/2020, 7:45 PM
    yes, it's better to use epoch
  • k

    Kishore G

    05/03/2020, 7:45 PM
    while storing data in Pinot
  • k

    Kishore G

    05/03/2020, 7:45 PM
    and if possible do the tz conversion on the client side
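
A minimal sketch of the advice above, assuming the time column is stored as epoch milliseconds and the client formats results after the query returns. The helper name `epoch_ms_to_local` and the fixed `-08:00` offset are illustrative assumptions, not part of Pinot or Presto.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_ms_to_local(epoch_ms: int, tz: tzinfo) -> datetime:
    """Convert epoch milliseconds to an aware datetime in the given zone."""
    # Integer timedelta arithmetic avoids float rounding on large values.
    return (EPOCH + timedelta(milliseconds=epoch_ms)).astimezone(tz)

# Hypothetical value returned by a query: 2020-02-01T00:00:00Z.
ts_ms = 1580515200000

# Fixed offset used only to keep the sketch dependency-free; real code
# would more likely pass zoneinfo.ZoneInfo("America/Los_Angeles") for DST.
pst = timezone(timedelta(hours=-8))

print(epoch_ms_to_local(ts_ms, timezone.utc).isoformat())  # 2020-02-01T00:00:00+00:00
print(epoch_ms_to_local(ts_ms, pst).isoformat())           # 2020-01-31T16:00:00-08:00
```

The stored value never changes; only the rendering depends on the viewer's zone, so filters on the raw epoch column need no function call.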
  • d

    Dan Hill

    05/03/2020, 8:05 PM
    How do I modify the time zone used by date_trunc? I run the following query to do a report in UTC, but I think date_trunc is rendering in Los Angeles time. The Pinot docs seem out of sync with the code.
    presto> select platform_id,
                   date_trunc('day', "timestamp") as "date",
                   sum(clicks) as clicks,
                   sum(impressions) as impressions,
                   sum(cost_usd_micros) as cost_usd_micros
            from pinot.default.events_testing
            where platform_id = 1
              and "timestamp" >= from_iso8601_timestamp('2020-02-01T00:00:00.000Z')
              and "timestamp" < from_iso8601_timestamp('2020-02-03T00:00:00.000Z')
            group by platform_id, date_trunc('day', "timestamp")
            order by date asc, platform_id asc
            limit 101;
     platform_id |          date           | clicks | impressions | cost_usd_micros 
    -------------+-------------------------+--------+-------------+-----------------
               1 | 2020-01-31 00:00:00.000 |      0 |           1 |               0 
               1 | 2020-02-01 00:00:00.000 |      1 |           1 |           85791 
               1 | 2020-02-02 00:00:00.000 |      0 |           5 |               0
    https://docs.pinot.apache.org/users/user-guide-query/pinot-query-language
    https://github.com/prestodb/presto/blob/78f24709b80f17bc7b7f6a0317cd9ee0515c638e/presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotAggregationProjectConverter.java#L100
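
The `2020-01-31` row in the output above would be consistent with day truncation happening in a Los Angeles session zone, as Dan suspects. The sketch below only illustrates that effect in plain Python; it does not reproduce Presto or the connector. The event time and the fixed `-08:00` offset are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone, tzinfo

UTC = timezone.utc
PST = timezone(timedelta(hours=-8))  # fixed offset, valid for Los Angeles in February

def trunc_day(instant: datetime, tz: tzinfo) -> datetime:
    """Truncate an aware datetime to midnight of its calendar day in tz."""
    local = instant.astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)

# Hypothetical event inside the filtered range (>= 2020-02-01T00:00Z).
event = datetime(2020, 2, 1, 3, 0, tzinfo=UTC)

print(trunc_day(event, UTC).date())  # 2020-02-01
print(trunc_day(event, PST).date())  # 2020-01-31
```

An event in the first eight hours of 2020-02-01 UTC lands in the 2020-01-31 bucket once the day boundary is taken in Pacific time, which matches the shape of the extra row.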
  • d

    Dan Hill

    05/03/2020, 8:07 PM
    I'll probably support a couple levels of timezones. I think when we chatted previously, it made sense to introduce other date columns in the appropriate timezone.
  • d

    Dan Hill

    05/03/2020, 11:58 PM
    How does the time field interact with the star-tree-index? Should I include the time field in my index like any other field? If I only need date-level data externally, should my extra timestamp columns actually just have the local dates in the timezones I need?
  • k

    Kishore G

    05/03/2020, 11:59 PM
    if there is a udf on the time column, star-tree will be de-activated at query time
  • k

    Kishore G

    05/04/2020, 12:00 AM
    yes, include the time field as is
  • k

    Kishore G

    05/04/2020, 12:00 AM
    better to create multiple time columns to get benefit of star tree
  • k

    Kishore G

    05/04/2020, 12:00 AM
    time_ms, time_h, time_d, time_w, time_m
  • d

    Dan Hill

    05/04/2020, 12:02 AM
    Cool, sounds good. How about an int for day? E.g. 20200503?
  • k

    Kishore G

    05/04/2020, 12:05 AM
    yeah, that can work
  • k

    Kishore G

    05/04/2020, 12:06 AM
    my suggestion is to use ms (long) for everything
  • k

    Kishore G

    05/04/2020, 12:06 AM
    but round it to nearest day
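
A sketch of how the suggested columns could be derived at ingestion time, with every column kept as epoch milliseconds (long) and floored to its bucket. The column names come from the message above; the helper names, flooring rather than nearest-value rounding, UTC bucket boundaries, and Monday as the start of the week are all assumptions.

```python
from datetime import datetime, timedelta, timezone

MS_PER_HOUR = 3_600_000
MS_PER_DAY = 24 * MS_PER_HOUR
MS_PER_WEEK = 7 * MS_PER_DAY
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def floor_to(epoch_ms: int, bucket_ms: int) -> int:
    """Floor an epoch-ms value to a fixed-width bucket (UTC boundaries)."""
    return epoch_ms - (epoch_ms % bucket_ms)

def floor_to_week(epoch_ms: int) -> int:
    """Floor to Monday 00:00 UTC. 1970-01-01 was a Thursday, hence the 3-day shift."""
    shift = 3 * MS_PER_DAY
    return floor_to(epoch_ms + shift, MS_PER_WEEK) - shift

def floor_to_month(epoch_ms: int) -> int:
    """Floor to the first day of the calendar month, 00:00 UTC."""
    dt = EPOCH + timedelta(milliseconds=epoch_ms)
    start = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return (start - EPOCH) // timedelta(milliseconds=1)

def time_columns(epoch_ms: int) -> dict:
    """Build the derived time columns for one record."""
    return {
        "time_ms": epoch_ms,
        "time_h": floor_to(epoch_ms, MS_PER_HOUR),
        "time_d": floor_to(epoch_ms, MS_PER_DAY),
        "time_w": floor_to_week(epoch_ms),
        "time_m": floor_to_month(epoch_ms),
    }

# Hypothetical event at 2020-02-05T03:25:45.678Z.
print(time_columns(1580873145678))
```

Because each column holds a plain value, a query can filter or group on `time_d` directly without wrapping the time column in a UDF, which is the condition mentioned above for keeping the star-tree index active.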
  • d

    Dan Hill

    05/04/2020, 12:39 AM
    Interesting. Why?
  • d

    Dan Hill

    05/04/2020, 12:39 AM
    Local date can then be represented using an int32.
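
For comparison, a sketch of the integer local-date encoding Dan describes (for example `20200503`). The function names are hypothetical. The encoding fits comfortably in a signed 32-bit integer and sorts correctly, but differences between encoded values are not day counts.

```python
from datetime import date

INT32_MAX = 2**31 - 1

def date_to_yyyymmdd(d: date) -> int:
    """Encode a calendar date as an integer such as 20200503."""
    return d.year * 10000 + d.month * 100 + d.day

def yyyymmdd_to_date(value: int) -> date:
    """Decode an integer such as 20200503 back into a calendar date."""
    return date(value // 10000, (value // 100) % 100, value % 100)

encoded = date_to_yyyymmdd(date(2020, 5, 3))
print(encoded)                    # 20200503
print(encoded <= INT32_MAX)       # True
print(yyyymmdd_to_date(encoded))  # 2020-05-03

# Ordering is preserved, but subtraction is not calendar arithmetic:
# consecutive days across a month boundary differ by more than 1.
print(date_to_yyyymmdd(date(2020, 3, 1)) - date_to_yyyymmdd(date(2020, 2, 29)))  # 72
```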
  • k

    Kishore G

    05/04/2020, 12:42 AM
    Eventually, we plan to add intelligence in query layer to use the right column
  • k

    Kishore G

    05/04/2020, 12:43 AM
    Based on the query and granularity requested, we can pick the right column
  • d

    Dan Hill

    05/04/2020, 12:43 AM
    Ah, I see.
  • k

    Kishore G

    05/04/2020, 12:44 AM
    Having them in same format (ms) will help achieve this
  • d

    Dan Hill

    05/04/2020, 12:47 AM
    Makes sense. It'll probably be a month before I improve TimeZone support (supporting multiple levels). I'll sync up then.
  • k

    Kishore G

    05/04/2020, 12:50 AM
    Okay