# general
  • Young Seok (Tony) Kim
    07/18/2022, 11:14 PM
    [Question] Hi, I’ve configured Apache Pinot with the deep store connected to Google Cloud Storage. Does this mean that some cold (less frequently used) segments will be persisted in GCS while hot segments are served as a sort of “cache” from the Pinot servers? I’m curious whether:
    • all the segments are distributed across the Pinot servers, OR
    • only frequently used segments are cached on the Pinot servers while unused segments are stored only in the deep store (such as GCS / S3 / Azure Data Lake Storage / HDFS).
    I’m asking because, if we keep adding more and more data, I’m concerned that the number of nodes would always have to increase.
  • Eaugene Thomas
    07/19/2022, 6:05 AM
    Hi, I was going through https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing#partitioning and had a doubt. From my understanding, Pinot segments are partitioned based on timestamp, but the doc above mentions that segments can also be partitioned based on a particular dimension. I'm not clear on the difference between timestamp-based partitioning across segments and dimension-based partitioning across segments. Can anyone explain that more clearly? Thanks in advance!
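    For reference, the dimension-based partitioning that doc describes is an explicit table-config setting, separate from the time-column boundaries segments naturally get. A minimal sketch of such a config, using a hypothetical memberId column and an illustrative partition count, roughly following the linked tuning page:
    "tableIndexConfig": {
      "segmentPartitionConfig": {
        "columnPartitionMap": {
          "memberId": {
            "functionName": "Murmur",
            "numPartitions": 4
          }
        }
      }
    },
    "routing": {
      "segmentPrunerTypes": ["partition"]
    }
    With this, the broker can prune segments whose memberId partition cannot match the query filter, on top of the usual time-based pruning.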
  • Yarden Rokach
    07/19/2022, 11:27 AM
    Another Pinot meetup is coming on July 30 in Bangalore 🇮🇳. More info in the #C03N1JNHXLY channel 🔥
    🔥 5
  • Yarden Rokach
    07/20/2022, 3:36 PM
    This is a conference alert 📣 Speaking opportunities are available in the #C03N1JNHXLY channel. I'm here for any questions you have ❤️ Have a lovely day, everyone!
    🚀 1
  • Priyank Bagrecha
    07/20/2022, 8:30 PM
    this link seems to be broken now. it was working until yesterday.
  • Priyank Bagrecha
    07/20/2022, 8:35 PM
    https://docs.pinot.apache.org/v/release-0.9.0/users/tutorials/ingest-parquet-files-from-s3-using-spark works instead
    👀 1
  • Young Seok (Tony) Kim
    07/20/2022, 10:55 PM
    Hi, this might be related to the above issue, but it seems https://docs.pinot.apache.org/ is entirely unavailable. Is it just me?
  • Xiang Fu
    07/20/2022, 10:56 PM
    Yes, there is a DNS issue with the apache.org domain that we are fixing
  • Xiang Fu
    07/20/2022, 10:56 PM
    please use https://apache-pinot.gitbook.io/latest for now
    👍 3
  • Young Seok (Tony) Kim
    07/20/2022, 10:57 PM
    Thanks for providing an alternative! 🙂
  • Sudharsan Kannan
    07/21/2022, 3:58 AM
    Team, https://docs.pinot.apache.org/ is not accessible
  • Mohit S
    07/22/2022, 11:05 AM
    Hey Everyone! Just getting started with Pinot. Is there any example of how to use a custom decoder during stream ingestion? I am following this example: https://docs.pinot.apache.org/basics/getting-started/pushing-your-streaming-data-to-pinot. My data is in a custom binary format. It looks like I have to implement `org.apache.pinot.spi.stream.StreamMessageDecoder`. Any reference code example would be helpful.
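    For reference, a minimal sketch of what such a decoder could look like, assuming the stream payload is a raw byte[] with a hypothetical fixed layout (a 4-byte int id followed by an 8-byte long ts); the exact StreamMessageDecoder method signatures may differ between Pinot versions:
    import java.nio.ByteBuffer;
    import java.util.Map;
    import java.util.Set;
    import org.apache.pinot.spi.data.readers.GenericRow;
    import org.apache.pinot.spi.stream.StreamMessageDecoder;

    // Decodes a custom fixed binary layout into a Pinot GenericRow.
    public class MyBinaryMessageDecoder implements StreamMessageDecoder<byte[]> {

      @Override
      public void init(Map<String, String> props, Set<String> fieldsToRead, String topicName)
          throws Exception {
        // Decoder properties from the table's streamConfig (e.g. stream.kafka.decoder.prop.*)
        // arrive in props; nothing is needed for this fixed layout.
      }

      @Override
      public GenericRow decode(byte[] payload, GenericRow destination) {
        return decode(payload, 0, payload.length, destination);
      }

      @Override
      public GenericRow decode(byte[] payload, int offset, int length, GenericRow destination) {
        ByteBuffer buffer = ByteBuffer.wrap(payload, offset, length);
        destination.putValue("id", buffer.getInt());   // column names here are made up
        destination.putValue("ts", buffer.getLong());
        return destination;
      }
    }
    The decoder class is then referenced from the table's streamConfig, e.g. via stream.kafka.decoder.class.name for a Kafka stream.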
  • Tim Berglund
    07/22/2022, 9:40 PM
    Hey, folks! StarTree just opened up a new Slack workspace today. Here’s the blog post I wrote outlining what we did and why we did it.
  • Tim Berglund
    07/22/2022, 9:40 PM
    Basically, we need a place to talk about StarTree-ish things in addition to Pinot-ish things, and it’s not okay for us, a vendor, to do that in an Apache-branded Slack workspace like this one. We’ll all still be here helping folks new to the Apache Pinot™ community learn and solve problems, but we’ll be intentional about keeping StarTree-specific conversations out of this workspace and in the StarTree one.
  • Tim Berglund
    07/22/2022, 9:40 PM
    That said, ours is still a non-commercial, community space. Please head over and join if it sounds interesting. It will look and feel a lot like this one in terms of the general lack of product pitches, other than our shared enthusiasm for Apache Pinot and the growing category of analytical queries that run really really fast. 🙂
  • Tim Berglund
    07/22/2022, 9:40 PM
    Just go to https://stree.ai/slack to join. StarTree people will occasionally mention this if it’s a more appropriate venue to address a question; otherwise, we won’t be doing a lot more promoting of it here.
  • Tim Berglund
    07/22/2022, 9:40 PM
    Lemme know if you have questions!
    dancingcharmander 6
    🍷 16
    🆒 3
    🔥 8
  • Sukesh Boggavarapu
    07/26/2022, 9:42 PM
    I have a hybrid table. What tasks should I create in order to have both daily and hourly rollups?
  • Sukesh Boggavarapu
    07/26/2022, 9:43 PM
    A `RealtimeToOfflineSegmentsTask` will generate segments from my realtime table and create offline segments.
  • Sukesh Boggavarapu
    07/26/2022, 9:43 PM
    "RealtimeToOfflineSegmentsTask": {
            "bucketTimePeriod": "1h",
            "bufferTimePeriod": "2h",
            "roundBucketTimePeriod": "1m",
            "mergeType": "rollup",
            "revenue.aggregationType": "sum",
            "maxNumRecordsPerSegment": "100000"
          }
  • Sukesh Boggavarapu
    07/26/2022, 9:44 PM
    So that configuration will create an hourly rollup that gets added to the offline table of my hybrid table.
  • Sukesh Boggavarapu
    07/26/2022, 9:44 PM
    Can I also do a daily rollup here in `RealtimeToOfflineSegmentsTask`?
  • Sukesh Boggavarapu
    07/26/2022, 9:45 PM
    Or should I create a `MergeRollupTask` in the offline table?
  • Sukesh Boggavarapu
    07/26/2022, 9:45 PM
    "MergeRollupTask": {
            "1hour.mergeType": "rollup",
            "1hour.bucketTimePeriod": "1h",
            "1hour.bufferTimePeriod": "3h",
            "1hour.maxNumRecordsPerSegment": "1000000",
            "1hour.maxNumRecordsPerTask": "5000000",
            "1hour.maxNumParallelBuckets": "5",
            "1day.mergeType": "rollup",
            "1day.bucketTimePeriod": "1d",
            "1day.bufferTimePeriod": "1d",
            "1day.roundBucketTimePeriod": "1d",
            "1day.maxNumRecordsPerSegment": "1000000",
            "1day.maxNumRecordsPerTask": "5000000",
            "metricColA.aggregationType": "sum",
            "metricColB.aggregationType": "max"
          }
  • Sukesh Boggavarapu
    07/26/2022, 9:46 PM
    What does that do? It creates both hourly and daily segments for the same table?
  • Sukesh Boggavarapu
    07/26/2022, 9:46 PM
    So, in total, would I need both `RealtimeToOfflineSegmentsTask` and `MergeRollupTask`?
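    For reference, in a hybrid table the two task configs normally live in different table configs, both under the "task" / "taskTypeConfigsMap" section. A rough sketch, reusing illustrative values from the snippets above: the realtime table config carries
    "task": {
      "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
          "bucketTimePeriod": "1h",
          "bufferTimePeriod": "2h",
          "mergeType": "rollup",
          "revenue.aggregationType": "sum"
        }
      }
    }
    while the offline table config carries the MergeRollupTask section, e.g.
    "task": {
      "taskTypeConfigsMap": {
        "MergeRollupTask": {
          "1day.mergeType": "rollup",
          "1day.bucketTimePeriod": "1d",
          "1day.bufferTimePeriod": "1d",
          "revenue.aggregationType": "sum"
        }
      }
    }
    so hourly plus daily rollups on the offline side are typically handled by MergeRollupTask there, rather than by a second RealtimeToOfflineSegmentsTask.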
  • Sukesh Boggavarapu
    07/26/2022, 9:46 PM
    Thanks
  • Mugdha Goel
    07/27/2022, 3:18 PM
    Hello, I am using a GCS bucket as my deep store, and I also have a `RealtimeToOfflineSegmentsTask` set up to convert realtime segments to offline segments. I would like to store only the offline segments in GCS and not the realtime segments, because reading realtime segments from GCS is causing an issue for some of my tables. Where could I find the configuration for storing only offline segments in GCS?
  • Sukesh Boggavarapu
    07/28/2022, 12:27 AM
    Hi, how do we replace offline segments created by the `RealtimeToOfflineSegmentsTask` / `MergeRollupTask` if we ever want to?
  • Sukesh Boggavarapu
    07/28/2022, 12:28 AM
    Like, if we want to replace the segments from 30 days ago in the offline tables, how do we go about doing it? I am not sure whether the segment names created by the `RealtimeToOfflineSegmentsTask` / `MergeRollupTask` would match the segment names created by the offline batch ingestion job.