# general
  • a

    Alex

    02/17/2020, 7:51 PM
    @User what do you mean?
  • k

    Kishore G

    02/17/2020, 7:54 PM
    I meant we don't allow generating the star-tree index on the fly. The only reason for this is that star-tree index generation is a bit resource intensive and might impact query performance while the index is being generated
  • a

    Alex

    02/17/2020, 7:56 PM
    yep, that is what I thought. I'm trying to figure out the automation that will be required to do those updates. Currently I'm thinking that if we need to update the star tree, we will run a segment generation job (Spark or Flink based) and upload the new segments into the Pinot cluster. Does that sound like an OK idea?
  • x

    Xiang Fu

    02/17/2020, 8:00 PM
    yes^^
  • a

    Alex

    02/17/2020, 8:02 PM
    great! So, which is the better idea: use some external dataset (Hadoop) or use the existing Pinot segment files?
  • k

    Kishore G

    02/17/2020, 8:02 PM
    There is something called Minion
  • k

    Kishore G

    02/17/2020, 8:03 PM
    It can be used for these tasks. It's used for running GDPR-related tasks, optimizing segments, etc.
  • k

    Kishore G

    02/17/2020, 8:03 PM
    Take a look at that. You can add a star-tree index generation task to it
  • a

    Alex

    02/17/2020, 8:14 PM
    I thought minions are deprecated, is it a good idea to use them?
  • k

    Kishore G

    02/17/2020, 9:19 PM
    Yes. We will add more and more tasks to minion
  • m

    Mayank

    02/17/2020, 9:21 PM
    Curious where you got the idea that minions are deprecated. If it's a documentation issue, we should fix that.
  • a

    Alex

    02/17/2020, 9:32 PM
    nope, I think somebody mentioned it at some point
  • a

    Alex

    02/17/2020, 9:48 PM
    ok, if we use minions, what is the right way to run them on Kubernetes? Just have a cronjob that launches a specific task? Any good examples on GitHub we can check?
  • k

    Kishore G

    02/17/2020, 9:50 PM
    Minion uses the Helix Task Framework. @User can you share the design doc and some examples for Minion?
  • a

    Alex

    02/17/2020, 9:51 PM
    so, minions always run?
  • m

    Mayank

    02/17/2020, 9:51 PM
    Yes
  • a

    Alex

    02/17/2020, 9:54 PM
    what is the trigger to execute the work?
  • m

    Mayank

    02/17/2020, 9:54 PM
    IIRC, the tasks are defined using Helix task framework
  • k

    Kishore G

    02/17/2020, 9:55 PM
    the trigger can be manual or scheduled
  • k

    Kishore G

    02/17/2020, 9:55 PM
    or it can also be based on another resource
  • k

    Kishore G

    02/17/2020, 9:55 PM
    e.g. you can configure it to run some task whenever a new segment is uploaded
  • a

    Alex

    02/17/2020, 9:56 PM
    oh, that is good to have
  • k

    Kishore G

    02/17/2020, 9:56 PM
    or you can run something every day at 12:00 midnight
  • k

    Kishore G

    02/17/2020, 9:56 PM
    or you can even write your own task generator
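The trigger kinds listed above (manual, scheduled, or reacting to a segment upload) can be pictured with a toy model. This is only an illustrative sketch: `TaskScheduler`, its methods, and the `StarTreeGen` task name are hypothetical and are not Pinot's or Helix's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import List


@dataclass
class TaskScheduler:
    """Toy model of three trigger kinds; hypothetical, not Pinot or Helix code."""

    submitted: List[str] = field(default_factory=list)

    def trigger_manually(self, task_type: str) -> None:
        # Manual trigger: an operator asks for the task to run now.
        self.submitted.append(f"{task_type}:manual")

    def on_segment_uploaded(self, task_type: str, segment_name: str) -> None:
        # Event trigger: run the task for each newly uploaded segment.
        self.submitted.append(f"{task_type}:{segment_name}")

    def run_if_due(self, task_type: str, now: datetime, at: time = time(0, 0)) -> bool:
        # Scheduled trigger: submit only when the clock matches the schedule
        # (midnight by default).
        if (now.hour, now.minute) == (at.hour, at.minute):
            self.submitted.append(f"{task_type}:scheduled")
            return True
        return False
```

A custom task generator would amount to another method that decides, from whatever state it inspects, which tasks to append.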
  • k

    Kishore G

    02/17/2020, 9:57 PM
    the good thing about the minion framework is that it abstracts out all the common things needed to perform some action on a segment
  • k

    Kishore G

    02/17/2020, 9:58 PM
    it can download segments, upload segments, do the bookkeeping, and also provide Pinot segment readers, etc.
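The division of labour described above (the framework handles download, upload, and bookkeeping; the task author writes only the per-segment action) can be sketched as a template-method pattern. Everything here is an assumption for illustration: `InMemorySegmentStore`, `SegmentTask`, and `PurgeUserTask` are hypothetical names, and a segment is modelled as a plain list of row dicts rather than a real Pinot segment.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Segment = List[dict]  # hypothetical stand-in for a Pinot segment


class InMemorySegmentStore:
    """Stand-in for wherever segments live; not a Pinot class."""

    def __init__(self) -> None:
        self._segments: Dict[str, Segment] = {}

    def download(self, name: str) -> Segment:
        return list(self._segments[name])

    def upload(self, name: str, segment: Segment) -> None:
        self._segments[name] = list(segment)


class SegmentTask(ABC):
    """Common plumbing: download, convert, upload, record completion."""

    def __init__(self, store: InMemorySegmentStore) -> None:
        self.store = store
        self.completed: List[str] = []

    @abstractmethod
    def convert(self, segment: Segment) -> Segment:
        """The only part a task author has to write."""

    def run(self, name: str) -> None:
        segment = self.store.download(name)
        converted = self.convert(segment)
        self.store.upload(name, converted)
        self.completed.append(name)  # bookkeeping


class PurgeUserTask(SegmentTask):
    """Example action: drop all rows for one user (a GDPR-style purge)."""

    def __init__(self, store: InMemorySegmentStore, user_id: str) -> None:
        super().__init__(store)
        self.user_id = user_id

    def convert(self, segment: Segment) -> Segment:
        return [row for row in segment if row.get("userID") != self.user_id]
```

A star-tree generation task would, under the same assumptions, be another `SegmentTask` subclass whose `convert` returns the segment rebuilt with the index.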
  • a

    Alex

    02/17/2020, 10:00 PM
    nice. will need to dig into this. On kube it feels like a waste to run infra which just sits idle when there are no tasks. Would be great to provision pods on demand. Is it possible today?
  • k

    Kishore G

    02/17/2020, 10:08 PM
    no, but that will be a great enhancement and Pinot has the primitives to achieve that
    👍 1
  • k

    Kishore G

    02/17/2020, 10:29 PM
    feel free to start an issue around this, I will add my thoughts and provide some pointers
  • t

    Ting Chen

    02/18/2020, 8:12 PM
    Does Pinot stop early once enough results have been collected? We have queries of the form "SELECT * FROM table WHERE userID='H' AND sourceEventTimestamp>=t1 AND sourceEventTimestamp<=t2 ORDER BY sourceEventTimestamp DESC LIMIT 500". The table is sorted by sourceEventTimestamp and userID has an inverted index. I notice that the selectivity of the query is low (meaning many rows pass the condition), so the first 500 results should be collected relatively quickly. But the execution times are long, i.e. > 10s.
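To make the expectation in this question concrete, here is a minimal sketch of what early termination could look like when rows are already sorted by the ORDER BY column. It is not Pinot's execution path (whether Pinot does this is exactly what is being asked); `top_n_latest` and the row layout are hypothetical.

```python
from typing import List, Tuple


def top_n_latest(
    rows_sorted_asc: List[dict], user_id: str, t1: int, t2: int, limit: int
) -> Tuple[List[dict], int]:
    """Return (matching rows newest first, number of rows visited).

    Assumes rows_sorted_asc is sorted ascending by sourceEventTimestamp.
    """
    results: List[dict] = []
    scanned = 0
    # Walk from the newest row backwards so results come out in DESC order.
    for row in reversed(rows_sorted_asc):
        scanned += 1
        ts = row["sourceEventTimestamp"]
        if ts > t2:
            continue  # newer than the range, keep walking back
        if ts < t1:
            break  # older than the range, nothing further can match
        if row["userID"] == user_id:
            results.append(row)
            if len(results) == limit:
                break  # early stop: enough results collected
    return results, scanned
```

With many matching rows, the loop visits only a small prefix of the data, which is why a multi-second runtime for such a query is surprising.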