https://pinot.apache.org/
# multiple_streams
  • k

    Kishore G

    08/11/2020, 5:52 PM
    hopefully 😉
  • k

    Kishore G

    08/11/2020, 5:58 PM
    Screen Shot 2020-08-11 at 10.57.56 AM.png
  • a

    Anthony Tran

    08/11/2020, 5:59 PM
    haha thanks kishore. lucidchart?
  • k

    Kishore G

    08/11/2020, 6:00 PM
    draw.io
  • f

    Fra Costa

    08/11/2020, 10:54 PM
    @User is deleting rows (or, more generally, segments) something that Pinot supports? I found a config parameter for the retention of deleted segments, but I can’t work out how you actually go about doing it. In other words, is there a way to delete a set of rows between two points in time?
  • k

    Kishore G

    08/11/2020, 11:04 PM
    We don’t support that
  • k

    Kishore G

    08/11/2020, 11:05 PM
    You can delete at segment granularity
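
Because deletion happens per segment, removing the rows between two points in time only works cleanly for segments whose entire time range falls inside that window. The sketch below is a hypothetical model, not Pinot code: the `Segment` type and both helper functions are illustrative names, segment time ranges are plain integers, and the actual deletion (a controller operation) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    # Hypothetical, simplified segment metadata: a name plus the inclusive
    # [start, end] range of the time column covered by the segment's rows.
    name: str
    start: int
    end: int


def _inside(s, window_start, window_end):
    return window_start <= s.start and s.end <= window_end


def deletable_segments(segments, window_start, window_end):
    """Segments lying entirely inside the window: dropping them removes
    no rows outside [window_start, window_end]."""
    return [s.name for s in segments if _inside(s, window_start, window_end)]


def straddling_segments(segments, window_start, window_end):
    """Segments that overlap the window but also hold rows outside it;
    deleting them would lose data, so (in this model) they would have to
    be regenerated from source data instead."""
    return [
        s.name
        for s in segments
        if s.start <= window_end
        and s.end >= window_start
        and not _inside(s, window_start, window_end)
    ]
```

For example, with segments covering [0, 99], [100, 199] and [200, 299] and a window of [50, 250], only the middle segment can be dropped outright; the other two straddle the window edges.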
  • f

    Fra Costa

    08/12/2020, 12:20 AM
    Anthony pointed out that the offline rows win over the live ones; my question about deletion was really meant to address that at serving time, and it seems there is no need to delete anything
  • f

    Fra Costa

    08/12/2020, 12:21 AM
    thanks for your response!
  • k

    Kishore G

    08/12/2020, 12:30 AM
    ah got it
  • a

    Anthony Tran

    08/12/2020, 12:53 AM
    @User as an FYI ^
  • f

    Fra Costa

    08/18/2020, 1:12 AM
    @User I have a question about the input connector for realtime tables. Kafka has an at-least-once delivery contract, so how does Pinot preserve reporting integrity if Kafka outputs the same element twice into the datastore? Thanks in advance
  • k

    Kishore G

    08/18/2020, 1:29 AM
    Kafka does not output into Pinot
  • k

    Kishore G

    08/18/2020, 1:30 AM
    Pinot pulls from Kafka and we ensure that every event is consumed only once
  • k

    Kishore G

    08/18/2020, 1:30 AM
    @User’s talk at Kafka Summit will cover this in detail
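
The point that Pinot pulls from Kafka can be illustrated with a toy model of offset checkpointing for a single partition. This is a sketch under simplified assumptions, not Pinot’s implementation: the class and method names are hypothetical, and it assumes consumed rows and the next offset to read become durable together. After a crash, consumption restarts from the durable offset, so each offset ends up stored exactly once.

```python
class OffsetTrackingConsumer:
    """Toy pull-based consumer for one partition (hypothetical model)."""

    def __init__(self):
        self.committed_rows = []   # rows made durable (e.g. in a completed segment)
        self.committed_offset = 0  # next offset to read, made durable with those rows
        self.buffer = []           # rows consumed but not yet committed
        self.position = 0          # next offset to read in the current session

    def poll(self, log, max_records):
        # The consumer decides where to read from; nothing is pushed to it.
        batch = log[self.position:self.position + max_records]
        self.buffer.extend(batch)
        self.position += len(batch)

    def commit(self):
        # Assumption: rows and offset become durable atomically.
        self.committed_rows.extend(self.buffer)
        self.committed_offset = self.position
        self.buffer = []

    def crash_and_restart(self):
        # Uncommitted work is lost; resume from the durable offset.
        self.buffer = []
        self.position = self.committed_offset


if __name__ == "__main__":
    log = [f"e{i}" for i in range(10)]
    c = OffsetTrackingConsumer()
    c.poll(log, 4)
    c.commit()             # e0..e3 durable, committed_offset = 4
    c.poll(log, 3)         # e4..e6 buffered only
    c.crash_and_restart()  # buffer lost, resume from offset 4
    c.poll(log, 6)         # e4..e9 read again from the checkpoint
    c.commit()
    print(c.committed_rows == log)  # True
```

Offsets 4 to 6 are read twice but stored once. Note the scope of the guarantee in this model: it is per offset, so an event that a producer wrote twice occupies two offsets and both copies would be kept.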
  • f

    Fra Costa

    08/18/2020, 1:49 AM
    I understand, great to hear, so we have one less thing to worry about 🙂
  • k

    Kishore G

    08/18/2020, 1:53 AM
    yes. we ensure consistency across replicas as well.
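
Consistency across replicas can be pictured with a simplified sketch. It is an assumption loosely based on the idea of replicas agreeing on where a segment ends, not Pinot’s actual segment-completion protocol: each replica of a partition reports how far it has consumed, a single end offset is chosen (here the largest reported one), and replicas that are behind catch up to it, so every replica’s copy of the segment covers the same offsets. `complete_segment` is a hypothetical name.

```python
def complete_segment(log, start, consumed):
    """Agree on one end offset for a segment and bring all replicas to it.

    log:      the partition's messages, indexed by offset
    start:    offset at which the segment began
    consumed: replica name -> rows that replica consumed since `start`
    Returns (end_offset, replica name -> completed rows).
    """
    # Assumption: the agreed end offset is the furthest any replica reached.
    end = start + max(len(rows) for rows in consumed.values())
    completed = {}
    for replica, rows in consumed.items():
        reached = start + len(rows)
        # Replicas behind the agreed offset consume the gap ("catch up").
        completed[replica] = rows + log[reached:end]
    return end, completed
```

Because all replicas read the same partition in offset order, stopping every copy at one agreed offset is enough to make them identical in this model.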
  • f

    Fra Costa

    08/19/2020, 11:33 PM
    @User sorry to bother you again, but I would like to ask for one more clarification on the duality between realtime and offline. As we mentioned, with a hybrid table setup, if an entry appears in both stores at query time, the offline one takes precedence.
    • How is the “collision” determined? Is there some sort of identifier column in the table schema that governs it?
    • When setting up realtime and offline, my understanding is that realtime has some sort of window associated with it:
      1. What happens when that period passes? Are entries purged from the realtime datastore?
      2. If entries are ever consolidated and moved from realtime to offline, how is the same-key aspect dealt with? Are realtime entries dismissed in favor of the existing offline ones (if any)?
    I apologize if some of these questions are already addressed in the documentation; happy to read the relevant sections in case I missed them.
    Thanks,
  • n

    Neha Pawar

    08/19/2020, 11:35 PM
    Regarding hybrid table and which data takes precedence, this might help: https://docs.pinot.apache.org/basics/components/broker (time boundary section)
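
The time-boundary mechanism resolves overlap between the two tables purely by time, not by a key. A minimal sketch, assuming a daily push (so the boundary is the latest offline segment end time minus one day) and using integer day numbers instead of real timestamps: rows at or before the boundary are served from the offline table and rows after it from the realtime table. Function and field names are illustrative only.

```python
def time_boundary(offline_segment_end_days, push_interval_days=1):
    # Assumption: boundary = newest offline end time minus one push interval.
    return max(offline_segment_end_days) - push_interval_days


def hybrid_view(offline_rows, realtime_rows, boundary):
    # Selection is purely by time: no primary key is compared, so a row that
    # exists in both tables is simply taken from whichever side owns its time.
    from_offline = [r for r in offline_rows if r["day"] <= boundary]
    from_realtime = [r for r in realtime_rows if r["day"] > boundary]
    return from_offline + from_realtime
```

For example, with offline data for days 1 to 3 and realtime data for days 2 to 5, the boundary is day 2: days 1 and 2 come from offline and days 3 to 5 from realtime, even though offline also has day 3.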
  • n

    Neha Pawar

    08/19/2020, 11:38 PM
    I’m not sure what you mean by “window” of a realtime table. We have a concept of retention. This can be configured for both realtime and offline tables. If the data in the table becomes older than the retention, it is deleted
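
Retention is configured per table. A hedged sketch of the rule described above, assuming retention is judged by each segment’s end time and reading the `retentionTimeUnit` / `retentionTimeValue` fields of a table config’s `segmentsConfig` section (only `DAYS` and `HOURS` are handled here). `expired_segments` is a hypothetical helper; the real cleanup is done by Pinot itself, not by code like this.

```python
from datetime import datetime, timedelta, timezone

# Only the units needed for this sketch; real configs may use others.
UNITS = {"DAYS": timedelta(days=1), "HOURS": timedelta(hours=1)}


def expired_segments(segment_end_times, segments_config, now):
    """Return names of segments whose data is entirely older than retention.

    segment_end_times: segment name -> end time (latest row timestamp)
    segments_config:   dict shaped like a table config's "segmentsConfig"
    """
    unit = UNITS[segments_config["retentionTimeUnit"]]
    value = int(segments_config["retentionTimeValue"])
    cutoff = now - value * unit
    return sorted(name for name, end in segment_end_times.items() if end < cutoff)


if __name__ == "__main__":
    now = datetime(2020, 8, 19, tzinfo=timezone.utc)
    ends = {
        "seg_a": datetime(2020, 8, 1, tzinfo=timezone.utc),
        "seg_b": datetime(2020, 8, 15, tzinfo=timezone.utc),
    }
    config = {"retentionTimeUnit": "DAYS", "retentionTimeValue": "7"}
    print(expired_segments(ends, config, now))  # ['seg_a']
```

With a 7-day retention evaluated on August 19, the cutoff is August 12, so only the segment ending on August 1 is past retention.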
  • f

    Fra Costa

    08/19/2020, 11:40 PM
    Thanks Neha, I am going to read that and reply after
  • f

    Fra Costa

    08/19/2020, 11:45 PM
    That page perfectly explains the first question: it’s done on the time dimension, with no key on individual objects involved, which makes sense. As for retention, yes, that is what I was referring to, but it seems to be a different concept from the behavior I was worried about. I guess the only question left is whether there are instances in which Pinot independently consolidates realtime segments into offline ones; I have a vague memory of reading something about it, but I’m not 100% sure
  • f

    Fra Costa

    08/19/2020, 11:46 PM
    If that is the case, I am basically trying to understand whether the offline data is replaced by the newly “consolidated” realtime segments
  • n

    Neha Pawar

    08/19/2020, 11:46 PM
    as of now, the offline data needs to be populated by you, using your own offline jobs.
  • n

    Neha Pawar

    08/19/2020, 11:46 PM
    but,
  • n

    Neha Pawar

    08/19/2020, 11:47 PM
    we have a project ongoing, which will move segments from realtime to offline table - https://docs.google.com/document/d/1-e_9aHQB4HXS38ONtofdxNvMsGmAoYfSnc2LP88MbIc/edit#
  • n

    Neha Pawar

    08/19/2020, 11:47 PM
    though I don’t know if that’ll help in your case, since you need the accurate data merged with the realtime data
  • f

    Fra Costa

    08/20/2020, 12:06 AM
    Thanks Neha, in my case it would actually hurt us
  • f

    Fra Costa

    08/20/2020, 12:06 AM
    I was more watching out for that happening under the covers
  • f

    Fra Costa

    08/20/2020, 12:07 AM
    so in that regard we are good, thank you very much
    👍 2