# segment-cold-storage
  • n

    Noah Prince

    11/02/2020, 4:55 PM
    So I’d need to “hack” that to only show the ones that have actually materialized.
  • n

    Neha Pawar

    11/02/2020, 4:58 PM
    The unloading could also be done by the same periodic task that is responsible for moving segments from tier to tier (assuming you are planning to use the tiered storage design), based on some flag in the tier config for storageType=deepStore.
  • n

    Neha Pawar

    11/02/2020, 4:59 PM
    That will help by reducing the design by one component.
  • k

    Kishore G

    11/02/2020, 5:08 PM
    Let's do a Zoom call?
  • n

    Noah Prince

    11/02/2020, 5:15 PM
    I’m down. When?
  • k

    Kishore G

    11/02/2020, 5:22 PM
    3 pm pst?
  • n

    Noah Prince

    11/02/2020, 5:23 PM
    Earlier might be better, if you’re available
  • k

    Kishore G

    11/02/2020, 5:25 PM
    1:30 pm pst
  • n

    Noah Prince

    11/02/2020, 5:26 PM
    Works for me
  • n

    Noah Prince

    11/03/2020, 7:19 PM
    Might be a little more difficult than I had originally imagined. There are really two entry points to downloading a segment, SegmentFetcherAndLoader and RealtimeTableDataManager. Unifying those two seems like it may be difficult, as the realtime use case has some backup peer downloading.
  • j

    Jackie

    11/03/2020, 8:05 PM
    The peer downloading should be applicable to both offline and realtime (might not be the case right now)
  • j

    Jackie

    11/03/2020, 8:06 PM
    And all segment downloads should be handled within the same class
  • n

    Noah Prince

    11/03/2020, 8:21 PM
    Yeah, it appears it is not handled that way now. The only way it knows the URI for the deep store download is from a realtime-specific metadata class
  • n

    Noah Prince

    11/03/2020, 8:21 PM
    This bit of the codebase could use a refactor, but I’m not sure I have the time
  • n

    Noah Prince

    11/06/2020, 10:38 PM
    Neat, got the lazy loading working. Going to add some metrics and clean this up a bit.
    🍷 1
    🎉 1
    👍 1
    😲 1
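The lazy loading mentioned above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the PR: `LazySegmentStore`, `fetch`, and the `downloads` counter are hypothetical names, and the real Pinot server classes are structured differently. The idea is that a segment is only downloaded from the deep store the first time a query touches it, and can be unloaded again later.

```python
import threading
from typing import Callable, Dict


class LazySegmentStore:
    """Hypothetical sketch: download a segment only on first access."""

    def __init__(self, fetch: Callable[[str], bytes]):
        # fetch is an assumed callback that downloads a segment by name,
        # e.g. from a deep store such as S3.
        self._fetch = fetch
        self._loaded: Dict[str, bytes] = {}
        self._lock = threading.Lock()
        self.downloads = 0  # simple counter, standing in for a metric

    def get(self, name: str) -> bytes:
        # Holding the lock during the fetch serializes downloads; that keeps
        # the sketch simple but would be too coarse for a real server.
        with self._lock:
            if name not in self._loaded:
                self._loaded[name] = self._fetch(name)
                self.downloads += 1
            return self._loaded[name]

    def unload(self, name: str) -> bool:
        # Drop the local copy; the next get() downloads it again.
        with self._lock:
            return self._loaded.pop(name, None) is not None
```

With a store like this, repeated queries against the same segment trigger one download, and a periodic task could call `unload` to push cold segments back out of local storage.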
  • n

    Noah Prince

    11/06/2020, 11:56 PM
    Ah, also need to make it download directly from S3 instead of through the controller. Otherwise that's going to put load on the controller for a big query
  • k

    Kishore G

    11/07/2020, 12:22 AM
    That’s already supported
  • k

    Kishore G

    11/07/2020, 12:22 AM
    It depends on how the segment is pushed
  • k

    Kishore G

    11/07/2020, 12:23 AM
    If you use URI-based push, it automatically downloads it from S3
  • x

    Xiang Fu

    11/07/2020, 12:41 AM
    @User you can try to use jobType: SegmentCreationAndMetadataPush in the spec.yaml file
  • x

    Xiang Fu

    11/07/2020, 12:43 AM
    if your output directory is already s3
    # outputDirURI: Root directory of output segments, expected to have scheme configured in PinotFS.
    outputDirURI: 's3://<your-bucket>/pinot/<table>/segments'
  • n

    Noah Prince

    11/07/2020, 12:54 AM
    I think server download from S3 is only done for realtime tables.
  • x

    Xiang Fu

    11/07/2020, 1:02 AM
    it’s both 🙂
  • x

    Xiang Fu

    11/07/2020, 1:02 AM
    you also need to configure PinotFS on the server
  • x

    Xiang Fu

    11/07/2020, 1:02 AM
    https://docs.pinot.apache.org/users/tutorials/use-s3-as-deep-store-for-pinot#start-server
  • x

    Xiang Fu

    11/07/2020, 1:04 AM
    this tutorial uses segment URI push; since you are already on the newest version, segment metadata push will be much better
  • n

    Noah Prince

    11/07/2020, 2:06 AM
    Yep, can verify that with SegmentCreationAndMetadataPush no data goes through the controller; it downloads straight from S3. Of course, select * limit 1 causes it to download all of the segments lol. Really going to need that time-based segment pruner. Also will need an option to limit the number of segments a query can hit, made an issue here
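To make the two ideas in this message concrete, here is a minimal sketch of time-based segment pruning combined with a cap on the number of segments a query may touch. It is not Pinot's pruner: `SegmentMeta`, `prune_by_time`, `select_segments`, and `max_segments` are hypothetical names, and it assumes each segment carries an inclusive start and end time in milliseconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentMeta:
    """Hypothetical segment metadata with an inclusive time range."""
    name: str
    start_ms: int
    end_ms: int


class TooManySegmentsError(Exception):
    """Raised when a query would touch more segments than allowed."""


def prune_by_time(segments: List[SegmentMeta],
                  query_start_ms: int,
                  query_end_ms: int) -> List[SegmentMeta]:
    # Keep only segments whose time range overlaps the query's time range.
    return [s for s in segments
            if s.start_ms <= query_end_ms and s.end_ms >= query_start_ms]


def select_segments(segments: List[SegmentMeta],
                    query_start_ms: int,
                    query_end_ms: int,
                    max_segments: int) -> List[SegmentMeta]:
    # Prune first, then refuse the query if too many segments remain, so an
    # unbounded query cannot trigger a download of every cold segment.
    selected = prune_by_time(segments, query_start_ms, query_end_ms)
    if len(selected) > max_segments:
        raise TooManySegmentsError(
            f"query would hit {len(selected)} segments, limit is {max_segments}")
    return selected
```

A query with no time filter corresponds to an unbounded range here, so every segment survives pruning and the cap is what stops the mass download.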
  • n

    Noah Prince

    11/09/2020, 9:58 PM
    https://github.com/apache/incubator-pinot/pull/6250 The PR for this feature. Dunno if it’s just the permissions, or GitHub’s user interface has changed for the worse since I last used it, but I can’t figure out how to add reviewers or labels.
    🎉 1
  • k

    Kishore G

    11/09/2020, 11:00 PM
    I think only committers can do that