Kishore G
Kishore G
Kishore G
Kishore G
Kishore G
Subbu Subramaniam
05/18/2020, 8:11 PMFlavio Junqueira
06/10/2020, 12:30 PMSegmentIterator
. The idea is the following:
• We obtain all iterators via BatchClientFactory.getSegments(...)
• getSegments
returns a StreamSegmentsIterator
, which iterates over SegmentRange
instances, one for each stream segment
• SegmentRange
contains the information we need to define a split
• A split can be read using BatchClientFactory.readSegment(...)
, which takes a SegmentRange
instance that we can build from the split information
This should be simple to implement as we can distribute the split load by sending the segment range data to workers.
A couple of relevant references:
https://github.com/pravega/pravega/blob/master/client/src/main/java/io/pravega/client/BatchClientFactory.java
https://github.com/pravega/pravega/blob/master/client/src/main/java/io/pravega/client/batch/SegmentRange.javaKishore G
Subbu Subramaniam
06/10/2020, 5:38 PMSubbu Subramaniam
06/10/2020, 5:39 PMFlavio Junqueira
06/11/2020, 3:20 PMFlavio Junqueira
12/17/2020, 10:17 AMPartitionGroupMetadata
in the document, and I thought the description was too long to do it as a comment in the document, but if you prefer like that, then I can raise it in the document directly.
PartitionGroupMetadata
has a groupID
attribute, which I don't understand how it maps to a group of shards/segments. For instance, say that I'm able to list segments via the iterator that getSegments
returns:
StreamSegmentsIterator getSegments(Stream stream, StreamCut fromStreamCut, StreamCut toStreamCut);
https://github.com/pravega/pravega/blob/master/client/src/main/java/io/pravega/client/BatchClientFactory.java#L70
How do Pinot map those segments to a partition group? Also, such a list of segments is flat, meaning that it is not preserving a predecessor-successor order of segments. When you have scaling, there is a natural order of segments that needs to be followed to avoid breaking per-key order. The event stream API in Pravega makes such a guarantee, but the batch API doesn't. How do you do it for Kinesis?Kishore G
Flavio Junqueira
12/17/2020, 8:18 PMits upto the implementation on how to map segments to partition groupwhere would you implement this logic? in the implementation of the
PartitionGroupMetadata
interface?
also, could you please comment on this:
such a list of segments is flat, meaning that it is not preserving a predecessor-successor order of segments. When you have scaling, there is a natural order of segments that needs to be followed to avoid breaking per-key order. The event stream API in Pravega makes such a guarantee, but the batch API doesn't. How do you do it for Kinesis?
Kishore G
PartitionGroupMetadata
interface?
• yesKishore G
such a list of segments is flat, meaning that it is not preserving a predecessor-successor order of segments. When you have scaling, there is a natural order of segments that needs to be followed to avoid breaking per-key order. The event stream API in Pravega makes such a guarantee, but the batch API doesn't. How do you do it for Kinesis?we provide the current segments to the interface, so the implementation should not return the new segments until the old segments have reached EOF
Flavio Junqueira
12/17/2020, 8:23 PMFlavio Junqueira
12/17/2020, 8:24 PMKishore G
interesting, that's essentially the logic that the event stream reader implements, which I can't use because it requires direct access to the segments. it feels like pravega is being punished for doing the right thing because others don't do it...😞 I think there are challenges with both models. Its basically comes down to exposing the right API
Kishore G
Flavio Junqueira
12/17/2020, 8:47 PMSubbu Subramaniam
12/17/2020, 9:02 PMsegment
same as a kinesis shard
(or, kafka partition
)? Just so I get the terms rightFlavio Junqueira
12/17/2020, 9:11 PMFlavio Junqueira
12/17/2020, 10:58 PMKishore G
Kishore G
Kishore G
Kishore G
Kishore G
Flavio Junqueira
12/18/2020, 12:57 PMone consumer consumer from pravega and then writes to PinotI'm not sure which suggestion was that, can you elaborate?
we plan to add a write api soon
may be we should integrate with Pravega once we have write API?How will the write API change the picture? To restate my concerns: • Dealing with scaling segments/shards/partitions is difficult and it is not a difficulty introduced by Pravega specifically, it is inherent to a stream changing its parallelism dynamically. The current API changes proposed seem to expect the plugin implementation to both expose segments individually so that they can be arranged in groups at table creation time and to deal transparently with the order of segments/shards/partitions. • It might add to the end-to-end latency if we process events in batches by requesting data between a start and an end. I wonder if that will affect the latency perceived by real-time queries. At least for Pravega, we would be able to address those concerns in the case we use the event stream API. This API requires the use of either coarse-grained checkpointing or opaque position objects, and the assignment of segments isn't deterministic in the case of rollbacks. I believe these are the main contention points. One concern on the Pinot side is that you need Pinot segments to be deterministically written. I'm wondering if we added a hint to the event saying from which segment it is coming from would help to achieve this goal in the absence of a deterministic schedule. A segment is owned by a single reader at a time and that reader is responsible for adding to that segment. If the segment changes ownership, then the change happens at checkpoint time. At a checkpoint, all readers in the group receive a special event indicating the checkpoint. Would something like this work?