# general
r
Hi! We are checking the realtime ingestion with upsert and we have some questions around it:
• Can we have a retention period of, say, 6 months?
• Does such a long retention have a significant impact on the upsert logic?
• If we add new servers, are the partitions correctly spread to the new servers?
k
Yes, you can have a retention of 6 months. I don't see a significant impact on performance; just make sure you provision the servers accordingly, and possibly over-partition the Kafka topic. And yes, the rebalance command will take care of spreading partitions to new servers.
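For reference, a 6-month retention is typically expressed in days inside the table config's `segmentsConfig` block; a minimal sketch (exact surrounding config depends on your setup):

```json
{
  "segmentsConfig": {
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "180"
  }
}
```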
r
Hi! Thanks a lot for your reply! Are there some guidelines on server provisioning? I'm assuming you are talking mostly about disk space, since the segments will be bound to a server in order to easily update a segment when a new event for a given key arrives. From a test we were running, it seems that Pinot can return more than one row for a given key even with upserts enabled. Is that the right behaviour or is it a bug? Is it expected that the update event is returned until the "merge" takes place, or should that be asynchronous? In the tests, we were sending events with repeated keys, and when doing a
select count(*), count(distinct key)
the values were different, but were converging if we repeated the query
k
That should not happen; most likely the stream is not partitioned properly
Try adding $hostName, $segmentName to the query
And paste the response
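For illustration, the diagnostic query might look something like this. `$hostName` and `$segmentName` are Pinot's built-in virtual columns; the table name `events` and key column `key` are placeholders for whatever the test table actually uses:

```sql
-- Group the duplicated key by server and segment to see where
-- each extra copy of a row is coming from.
SELECT key, $hostName, $segmentName, COUNT(*)
FROM events
GROUP BY key, $hostName, $segmentName
```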
r
I'll check the topic configuration, but regarding this test we were still running on a single server
k
Can you share the query and response
r
It appears there was an error in our table configuration which I was unaware of. All is good after all :)
Regarding this
I'm assuming you are talking mostly about disk space since the segments will be bound to a server in order to easily update that segment if a new event for a given key arrives.
is this assumption correct?
k
Mind sharing what was wrong with the table config?
Disk space, and also memory: I think the key map is stored in memory as of now for performance reasons. @User @User can confirm this
y
Yes, the PK is stored in memory for lookup
j
The map from primary key to record location is stored in heap memory (ConcurrentHashMap)
k
Is there a plan to move this off-heap?
j
Not in the short term. We need the concurrency primitives provided by ConcurrentHashMap. We maintain one map per partition, so it should be fine as long as the cardinality of the primary key is not too high
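To illustrate the structure described above, here is a minimal Java sketch (not Pinot's actual classes; all names are hypothetical): one ConcurrentHashMap per partition, mapping a primary key to the location of its latest record, so a new event for a key simply replaces the previous entry.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of per-partition upsert metadata; not Pinot's real implementation.
public class UpsertMetadataSketch {

    // Location of the latest record for a primary key: segment name + doc id.
    record RecordLocation(String segmentName, int docId) {}

    // One ConcurrentHashMap per stream partition, as described above.
    private final List<ConcurrentHashMap<String, RecordLocation>> partitionMaps;

    UpsertMetadataSketch(int numPartitions) {
        partitionMaps = IntStream.range(0, numPartitions)
                .mapToObj(i -> new ConcurrentHashMap<String, RecordLocation>())
                .collect(Collectors.toList());
    }

    // A newer event for the same key overwrites the old location,
    // which is why queries should see a single row per key.
    void upsert(int partition, String primaryKey, String segmentName, int docId) {
        partitionMaps.get(partition).put(primaryKey, new RecordLocation(segmentName, docId));
    }

    RecordLocation locate(int partition, String primaryKey) {
        return partitionMaps.get(partition).get(primaryKey);
    }

    public static void main(String[] args) {
        UpsertMetadataSketch meta = new UpsertMetadataSketch(2);
        meta.upsert(0, "user-1", "seg_0_0", 10);
        meta.upsert(0, "user-1", "seg_0_1", 3); // later event for the same key wins
        System.out.println(meta.locate(0, "user-1").segmentName()); // prints seg_0_1
    }
}
```

Since each partition has its own map and all events for a given key land in one partition (hence the earlier advice to partition the stream properly), no cross-partition coordination is needed; heap usage grows with primary-key cardinality, which is why high cardinality is the main concern.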