# minion-improvements
  • l

    Laxman Ch

    05/27/2021, 10:56 PM
    100 to 500 MB for REALTIME right. How about OFFLINE segments?
  • l

    Laxman Ch

    05/27/2021, 11:00 PM
    is it same for both?
  • n

    Neha Pawar

    05/27/2021, 11:08 PM
    yes, same for both
  • l

    Laxman Ch

    05/27/2021, 11:44 PM
    okay. thank you
  • l

    Laxman Ch

    05/28/2021, 8:11 PM
    • All in production now. Conversion is going at a good pace.
    • After several rounds of tuning, 15 days of REALTIME data (of 90 days total) were converted to OFFLINE in the last 24 hours.
    • By the middle or end of next week, all data should be migrated. 🤞
    Early next week I will try my Purger task implementation. Thanks @User @User @User @User for all your help and support.
    👍 2
  • l

    Laxman Ch

    05/28/2021, 8:25 PM
    I have noted a few observations about this task and the task framework in general.
    • Lack of monitoring and alerting. Neither this task nor the task framework exposes any status, failure, or progress metrics, so we have to monitor it manually. If conversion gets stuck for any reason, the user has no direct way of knowing. The task status also says SUCCEEDED even when 1 out of 5 tables failed.
    • The conversion task implementation is heavily memory intensive. A disk+memory based implementation would help. For most of our tables, I had to configure 1 hour as the bucket period due to this limitation.
    • A data gap larger than the bucket completely stalls the task. https://github.com/apache/incubator-pinot/issues/6988
    • Default null value vector information is lost in the conversion. It's a major BLOCKER for tables enabled with nullValueHandling.
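The bucket period mentioned above can be illustrated with a small sketch. This is a simplified model of how a time-bucketed conversion window might be chosen, not Pinot's actual code. The key names `taskTypeConfigsMap`, `bucketTimePeriod`, and `bufferTimePeriod` are written as best recalled from the Pinot documentation and should be checked against your version; the 2-day buffer, the dates, and the helpers `next_window` and `window_is_ready` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed shape of the task section of a REALTIME table config.
# Key names should be verified against the Pinot version in use.
task_config = {
    "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
            "bucketTimePeriod": "1h",  # smaller bucket, less data per run
            "bufferTimePeriod": "2d",  # illustrative value, not from this message
        }
    }
}


def next_window(watermark, bucket):
    """Return the [start, end) window one run would cover (hypothetical helper)."""
    return watermark, watermark + bucket


def window_is_ready(window_end, now, buffer):
    """Simplified rule: a window is eligible only once it is older than the buffer."""
    return window_end <= now - buffer


bucket = timedelta(hours=1)
buffer = timedelta(days=2)
watermark = datetime(2021, 5, 1, tzinfo=timezone.utc)  # made-up watermark
now = datetime(2021, 5, 28, tzinfo=timezone.utc)       # made-up "current" time

start, end = next_window(watermark, bucket)
ready = window_is_ready(end, now, buffer)
```

With a 1-hour bucket, each run covers only one hour of data, which is the trade-off described above: lower memory use per run at the cost of many more runs.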
  • n

    Neha Pawar

    05/28/2021, 8:32 PM
    What's the size of the data per day in the realtime table?
  • j

    Jackie

    05/28/2021, 8:34 PM
    Great feedback! We will track these issues and fix them shortly.
  • d

    Deepak Mishra

    07/16/2021, 6:33 AM
    Hello Everyone
  • d

    Deepak Mishra

    07/16/2021, 6:34 AM
    How do I start an Apache Pinot minion using a config file? It is not mentioned in the docs.
  • x

    Xiang Fu

    07/16/2021, 7:50 PM
    have you checked the minion quickstart
  • x

    Xiang Fu

    07/16/2021, 7:51 PM
    and the sample table config, which contains the sample task config
  • d

    Deepak Mishra

    07/19/2021, 5:28 AM
    I have checked the minion batch quickstart but couldn't find how to start a minion manually.
  • l

    Laxman Ch

    01/20/2022, 7:30 PM
    Hi Folks, we are exploring the upsert feature in Pinot and have a few questions about it. Please help me understand the feature.
    1. We are using the managed offline flow with 2 days as the buffer time, which means segments get converted to OFFLINE after 2 days. However, our REALTIME segments roll up every 1 hour per partition. Can upsert handle any update within this 2-day period?
    2. How is this handled in the managed offline flow? Do multiple update records for the same row get merged into a single row?
    3. I'm going through the design documents available here, but access is closed for one document. Can you please provide access?
  • l

    Lee Wei Hern Jason

    03/08/2023, 8:22 AM
    Hello Team, we are exploring the managed offline flow in Pinot. I was wondering how people monitor the RealtimeToOffline task, e.g.:
    1. How do you ensure that there are no data gaps between the Realtime and Offline tables?
    2. How many segments are dropped if my buffer time is too low?
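One possible way to check for data gaps, assuming the segment time ranges have already been collected from segment metadata (how to fetch them is not shown here), is a simple interval scan. The helper `find_gaps` and the sample ranges are hypothetical; this is a sketch, not an existing Pinot utility.

```python
def find_gaps(segments, max_gap_ms=0):
    """Return (gap_start, gap_end) pairs where the segments leave a hole.

    segments: iterable of (start_ms, end_ms) time ranges, in any order.
    max_gap_ms: holes no larger than this are ignored.
    """
    gaps = []
    covered_until = None
    for start, end in sorted(segments):
        if covered_until is not None and start - covered_until > max_gap_ms:
            gaps.append((covered_until, start))
        # Track the furthest point in time covered so far.
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


HOUR_MS = 3_600_000
# Made-up ranges: two contiguous hours, then a 3-hour hole.
segments = [(0, HOUR_MS), (HOUR_MS, 2 * HOUR_MS), (5 * HOUR_MS, 6 * HOUR_MS)]
gaps = find_gaps(segments, max_gap_ms=HOUR_MS)
```

Passing the combined time ranges of the REALTIME and OFFLINE segments would flag any hole between the two tables; setting `max_gap_ms` to the bucket period would flag only gaps larger than one bucket.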
  • m

    Mayank

    03/08/2023, 2:29 PM
    @Seunghyun @Haitao Zhang @manish
  • m

    Monika reddy

    12/09/2024, 10:01 PM
    Hi all,
  • m

    Monika reddy

    12/09/2024, 10:04 PM
    Hi All, can we upload segments directly to the deep store for an offline table via minions? So far I haven't seen any document that says so. I tried setting "push.mode"="METADATA" on the SegmentGenerationAndPush task, but the logs show it is still picked up as push.mode=TAR. I do see that the StarTree docs have a FileIngestionTask with a config push.mode=METADATA. I haven't tried this yet, but does this config only support offline tables and not minions?
  • m

    Mayank

    12/10/2024, 1:00 AM
    "but does this config only support offline tables and not Minions?"
    Not sure what you mean. Offline ingestion is done via minions in this case, so the config applies.
  • m

    Mayank

    12/10/2024, 2:54 AM
    Also, adding @Manish
  • m

    Monika reddy

    12/10/2024, 2:28 PM
    Yes, in open source, if I set push.mode="METADATA" in the SegmentGenerationAndPush task, the logs show that it didn't change and the mode was still push.mode=TAR. Is there any other type of task I can use?
  • m

    Monika reddy

    12/10/2024, 2:30 PM
    There could be 2 reasons. 1. Either the config is not working for SegmentGenerationAndPush
  • m

    Monika reddy

    12/10/2024, 2:32 PM
    2. The problem is in the logging: the logs report push.mode=TAR, and the request sent to upload the segment includes the controller URL. If it is the 2nd case, is there a way I can confirm that segments are uploaded directly by minions and not via the controller?
  • m

    Manish

    12/10/2024, 3:11 PM
    Monika, did you specify this in the SegmentGenerationAndPush taskConfig: "push.mode": "METADATA"?
  • m

    Manish

    12/10/2024, 3:14 PM
    The task also expects the outputDir to be specified for the push mode to work. The minion will push segments to the output dir in the deep store and send the metadata to the controller.
    "outputDirURI": <Location of outputdir>
  • m

    Monika reddy

    12/10/2024, 3:18 PM
    I tried setting push.mode once in the SegmentGenerationAndPush taskConfig and another time in the batch config maps, and neither worked. However, I didn't set outputDirURI; let me try with that. Thank you!
  • m

    Monika reddy

    12/10/2024, 6:56 PM
    I tried with outputDirUri and it worked. But I have a few doubts, please advise.
    1. In Slack, I read that outputDirURI is only for storing temporary segments, not for the deep store. Is that correct?
    2. Even though the outputDirUri is supposedly temporary, I was still able to see all the segments.
    3. For the deep store, the location would be picked from the pinot-server config, but in this test I noticed that no segments were uploaded to the path given in the pinot-server config file.
    4. In the minion logs, I noticed that:
       a. segments were first created on the pinot-server machines at /pinot/tmp/pinotMinion/data/SegmentGenerationAndPushResult,
       b. copied from this tmp location and pushed to the outputDirURI location,
       c. moved from outputDir to pinot/tmp/segmentMetdata,
       d. untarred from /segmentmetadata and pushed to the controller.
       e. The segment push logs show an API request being sent: <controlleruri>:9000/v2/segments/tableName?tableType. I see the message "successfully uploaded the segment" from SegmentPushUtils.
    Why did it push to the controller again when it's set to METADATA?
  • m

    Manish

    12/12/2024, 4:27 PM
    1. outputDirUri is actually the deep store location (the location you want segments to be uploaded to).
    2. This is a permanent location.
    3. For some reason SegmentGenerationAndPushTask is not using the deep store location from controller.conf and instead expects the user to provide that location as outputDirUri in the taskConfig. I need to check why that's the case.
    4. Thanks for closely following the logs 🙂. Here the minion pushed the segment to the deep store, extracted the metadata, and pushed only the metadata to the controller. It's not pushing the segment itself to the controller.
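The configuration discussed in this thread can be sketched as a small builder. This is a minimal illustration, assuming the task section uses `taskTypeConfigsMap` keyed by `SegmentGenerationAndPushTask` and that an `inputDirURI` key exists alongside `outputDirURI` and `push.mode`; verify these against your Pinot version. The S3 URIs and the helper `build_push_task_config` are hypothetical, and the `ValueError` check is this sketch's own guard, not something Pinot itself does.

```python
import json


def build_push_task_config(input_dir_uri, output_dir_uri, push_mode="METADATA"):
    """Build a taskConfig fragment; refuse METADATA push without an output dir."""
    if push_mode.upper() == "METADATA" and not output_dir_uri:
        # Guard added by this sketch: per the thread, METADATA push only
        # took effect once outputDirURI was set.
        raise ValueError("push.mode=METADATA needs outputDirURI to be set")
    return {
        "taskTypeConfigsMap": {
            "SegmentGenerationAndPushTask": {
                "inputDirURI": input_dir_uri,
                "outputDirURI": output_dir_uri,  # deep store location
                "push.mode": push_mode,
            }
        }
    }


config = build_push_task_config(
    "s3://example-bucket/input/",           # hypothetical input location
    "s3://example-bucket/pinot-segments/",  # hypothetical deep store location
)
print(json.dumps(config, indent=2))
```

Failing fast when `outputDirURI` is missing makes the problem visible at config time rather than leaving it to be discovered in the minion logs.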
  • m

    Manish

    12/12/2024, 4:32 PM
    I do see that other tasks like mergeRollUp and realtimeToOffline pick up the data dir configured on the controller (which is the deep store setup) and do not explicitly ask the user for an outputDir.
  • m

    Monika reddy

    12/23/2024, 7:46 PM
    Thank you so much, just got back from vacation.