# minion-star-tree
  • k

    Kishore G

    03/27/2020, 8:42 PM
    Objective: Generate star tree on demand. High-level design:
    • Create a task generator (StarTreeIndexTaskGenerator) in the controller
    • On every table config change and/or periodically, it gets the segment metadata from the server and compares it with the tableConfig
    • If there is a difference, it downloads the segment, invokes SegmentGenerationJob, and pushes the segment back
    • Trigger segment reload
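A minimal sketch of the comparison step in the design above, assuming each star-tree config can be reduced to a canonical string and that segment metadata reports which of those a segment already contains. The class `StarTreeTaskPlanner`, the method `segmentsNeedingRebuild`, and the spec-string format are all hypothetical and are not part of Pinot; the download, SegmentGenerationJob, and push steps are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not a Pinot class: decides which segments a task
// generator would schedule for star-tree regeneration.
class StarTreeTaskPlanner {

    // desiredSpecs: star-tree specs derived from the table config (assumed format).
    // specsBySegment: specs each segment currently reports in its metadata.
    static List<String> segmentsNeedingRebuild(Set<String> desiredSpecs,
                                               Map<String, Set<String>> specsBySegment) {
        List<String> stale = new ArrayList<>();
        // Copy into a TreeMap so the result order is deterministic (sorted by name).
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(specsBySegment).entrySet()) {
            // Any difference between desired and actual specs means a rebuild.
            if (!desiredSpecs.equals(e.getValue())) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Set<String> desired = new HashSet<>(Arrays.asList("country,browser|SUM__clicks"));

        Map<String, Set<String>> actual = new HashMap<>();
        actual.put("seg_0", new HashSet<>(desired));                             // up to date
        actual.put("seg_1", new HashSet<String>());                              // no star tree yet
        actual.put("seg_2", new HashSet<>(Arrays.asList("country|SUM__clicks"))); // outdated spec

        System.out.println(segmentsNeedingRebuild(desired, actual)); // prints [seg_1, seg_2]
    }
}
```

Only the segments returned here would go through the download, regenerate, and push cycle, so segments that already match the table config are left alone.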
  • n

    Neha Pawar

    03/27/2020, 8:45 PM
    why is trigger segment reload needed at the end?
  • x

    Xiang Fu

    03/27/2020, 8:46 PM
    Question, so we don’t allow server to create startree index on the fly, it’s all coming from controller?
  • n

    Neha Pawar

    03/27/2020, 8:46 PM
    also, why doesn't star tree work like inverted indexes, wherein just updating table config and reloading works?
  • k

    Kishore G

    03/27/2020, 8:47 PM
    why is trigger segment reload needed at the end?
  • k

    Kishore G

    03/27/2020, 8:47 PM
    probably not needed if the CRC changes
  • k

    Kishore G

    03/27/2020, 8:47 PM
    I thought it's needed because the segment name remains the same and servers don't download the new segments
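A rough sketch of the CRC point in the last two messages, assuming the server decides whether to fetch a segment by comparing the CRC of its local copy with the CRC published for that segment name. `SegmentRefreshCheck`, `crcOf`, and `needsDownload` are hypothetical names for illustration, not Pinot code; only `java.util.zip.CRC32` is a real API.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration: a segment keeps its name after regeneration,
// so a content checksum is what tells the old and new copies apart.
class SegmentRefreshCheck {

    // CRC-32 over the raw bytes of a segment file.
    static long crcOf(byte[] segmentBytes) {
        CRC32 crc = new CRC32();
        crc.update(segmentBytes, 0, segmentBytes.length);
        return crc.getValue();
    }

    // Download only when the published CRC differs from the local one.
    static boolean needsDownload(long localCrc, long publishedCrc) {
        return localCrc != publishedCrc;
    }

    public static void main(String[] args) {
        long local = crcOf("segment without star tree".getBytes(StandardCharsets.UTF_8));
        long published = crcOf("segment with star tree".getBytes(StandardCharsets.UTF_8));
        System.out.println("needs download: " + needsDownload(local, published));
    }
}
```

Under this assumption, a name-only comparison would miss the regenerated segment, which is why an explicit reload was listed as the last step.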
  • k

    Kishore G

    03/27/2020, 8:48 PM
    also, why doesn't star tree work like inverted indexes, wherein just updating table config and reloading works?
  • k

    Kishore G

    03/27/2020, 8:49 PM
    It can but it is resource intensive and might impact query performance
  • k

    Kishore G

    03/27/2020, 8:49 PM
    Question, so we don't allow server to create startree index on the fly, it's all coming from controller?
  • b

    Buchi Reddy

    08/31/2020, 11:27 PM
    Huh.. Seems like the message history in this channel is lost. Is this archived somewhere by any chance? I’m interested in knowing more about the minion usage and its stability.
  • b

    Buchi Reddy

    08/31/2020, 11:27 PM
    any design docs or wiki is fine too
  • n

    Neha Pawar

    08/31/2020, 11:33 PM
    @User ^^ any design doc you can share about minion?
  • j

    Jackie

    08/31/2020, 11:59 PM
    @User Are you planning to implement some customized task for minion?
  • j

    Jackie

    09/01/2020, 12:02 AM
    AFAIK it is only used in LinkedIn right now to help purge records for GDPR compliance. It is quite stable and is powering the production use cases.
  • j

    Jackie

    09/01/2020, 12:04 AM
    FYI, that part of the code is not open-sourced because it integrates with the LinkedIn internal services.
  • j

    Jackie

    09/01/2020, 12:07 AM
    You may refer to this test, SimpleMinionClusterIntegrationTest, to understand how to add a new task for minion.
  • b

    Buchi Reddy

    09/01/2020, 12:36 AM
    Yes @User The use case is almost like I want to purge data based on the value of a column.
  • j

    Jackie

    09/01/2020, 12:52 AM
    Oh, actually we have the executor open-sourced: PurgeTaskExecutor
  • j

    Jackie

    09/01/2020, 12:53 AM
    Where you can register your own RecordPurgerFactory
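A minimal sketch of the "purge by column value" idea discussed above. The nested `RecordPurger` and `RecordPurgerFactory` interfaces below are simplified local stand-ins written for this example, with rows modeled as plain maps; the real Pinot interfaces and the way they are registered with PurgeTaskExecutor may differ, so treat every name and signature here as an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model of a purge-by-column-value rule.
class ColumnValuePurgeSketch {

    // Stand-in: decides per record whether it should be dropped.
    interface RecordPurger {
        boolean shouldPurge(Map<String, Object> row);
    }

    // Stand-in: hands out a purger per table, or null if nothing to purge.
    interface RecordPurgerFactory {
        RecordPurger getRecordPurger(String rawTableName);
    }

    // Factory that purges rows of `table` whose `column` equals `value`.
    static RecordPurgerFactory purgeWhereEquals(String table, String column, Object value) {
        return rawTableName -> {
            if (!table.equals(rawTableName)) {
                return null; // other tables are left untouched
            }
            return row -> value.equals(row.get(column));
        };
    }

    // Returns the rows that survive the purge; a null purger keeps everything.
    static List<Map<String, Object>> applyPurger(RecordPurger purger,
                                                 List<Map<String, Object>> rows) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (purger == null || !purger.shouldPurge(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String user : new String[] {"alice", "bob", "alice"}) {
            Map<String, Object> row = new HashMap<>();
            row.put("userId", user);
            rows.add(row);
        }
        RecordPurgerFactory factory = purgeWhereEquals("events", "userId", "alice");
        System.out.println(applyPurger(factory.getRecordPurger("events"), rows).size()); // prints 1
    }
}
```

Keeping the rule behind a per-table factory mirrors the registration step mentioned above: the executor would ask the factory for a purger by table name and skip tables that return none.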
  • j

    Jackie

    09/01/2020, 12:57 AM
    Unfortunately we don't have doc on this part yet. We can try to make it work, then use it as an example to add the doc, or make a post
  • b

    Buchi Reddy

    09/01/2020, 2:03 AM
    okay. thanks. I just got this idea and wanted to explore. Seems like it’s possible. We may or may not implement immediately but will definitely ping here if we’re proceeding with impl. appreciate the help