# general
p
Hello team, I am trying to run the Realtime Provisioner for one of my tables with the following config:
RealtimeProvisioningHelper -tableConfigFile /Users/prashant.pandey/table_config.json -numPartitions 4 -pushFrequency null -numHosts 12 -numHours 2 -sampleCompletedSegmentDir /Users/prashant.pandey/segment_dir -ingestionRate 4750 -maxUsableHostMemory 10G -retentionHours 24
The segment is around 426M in size. But this returns the following:
Note:

* Table retention and push frequency ignored for determining retentionHours since it is specified in command
* See <https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime>
2022/02/22 11:41:31.825 INFO [RealtimeProvisioningHelperCommand] [main] 
Memory used per host (Active/Mapped)

numHosts --> 12              |
numHours
 2 --------> NA              |
2022/02/22 11:41:31.826 INFO [RealtimeProvisioningHelperCommand] [main] 
Optimal segment size

numHosts --> 12              |
numHours
 2 --------> NA              |
2022/02/22 11:41:31.826 INFO [RealtimeProvisioningHelperCommand] [main] 
Consuming memory

numHosts --> 12              |
numHours
 2 --------> NA              |
2022/02/22 11:41:31.827 INFO [RealtimeProvisioningHelperCommand] [main] 
Total number of segments queried per host (for all partitions)

numHosts --> 12              |
numHours
 2 --------> NA              |
Class transformation time: 0.271994872s for 4134 classes or 6.579459893565553E-5s per class
Why am I getting N/As? Is the config incorrect?
We found why this happened. Our retention period is 7 days, but we move segments to OFFLINE servers in under 3h. I had configured the retention as 7 days, which caused if (activeMemoryPerHostBytes <= _maxUsableHostMemory) in MemoryEstimator.java to evaluate to false.
m
so did you have to update your table config to get this working?
s
@User this means that you don't have enough active memory to host all your memory requirements for 24h. You can run the command with higher memory and see what it reports. It will also give you a report of mapped vs raw memory (mapped means data is pulled from disk by the OS whenever needed). If you are ok with that, then you may be fine with the existing memory/numHosts. Otherwise, you need to increase something. Just to get an idea, you can always run the command with higher memory and more hosts (you can give multiple values) and see where you stand.
@User not sure why table config needs to change?
m
Ah I dunno, was just asking what Prashant had changed to get it to work.
p
@User Yes, I actually had to reduce retention from 7 days to 3h in our table config. This was done because segments are stored on realtime servers only for some time and are then moved to OFFLINE servers. The program actually uses what's in the supplied config over what's supplied in the program args. So this 24h was actually moot and not used; it was using the full 7 days as the retention period, as present in the config. @User, I think we can document this special case, and also that the retention period specified in the config takes precedence over the one supplied in the program args.
s
@User that's not the case. Look at the code in RTProvHelper Command where it uses the value: https://github.com/apache/pinot/blob/master/pinot-tools/src/main/java/org/apache/pinot/tools/admin/command/RealtimeProvisioningHelperCommand.java#L227 If _retentionHours is provided as a command argument, it ignores the table config retention.
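To make the precedence concrete, here is a minimal sketch of the rule described above. It is not the code at the linked line: the class name, method name, and the "non-positive means not provided" convention are assumptions for illustration, and the sketch leaves out push frequency, which the tool's output note also mentions.

```java
// Hypothetical sketch of the precedence rule described in this thread;
// not the actual RealtimeProvisioningHelperCommand code.
final class RetentionPrecedenceSketch {

    /**
     * Returns the retention the estimate would use.
     * Assumption: a non-positive argument means -retentionHours was not passed.
     */
    static int effectiveRetentionHours(int retentionHoursArg, int tableConfigRetentionHours) {
        if (retentionHoursArg > 0) {
            // The command-line value wins; the table config retention is ignored.
            return retentionHoursArg;
        }
        return tableConfigRetentionHours;
    }

    public static void main(String[] args) {
        int sevenDaysInHours = 7 * 24; // table config retention from this thread
        // -retentionHours 24 was passed, so 24 is used rather than 168.
        System.out.println(effectiveRetentionHours(24, sevenDaysInHours));
        // No argument passed: fall back to the table config value.
        System.out.println(effectiveRetentionHours(0, sevenDaysInHours));
    }
}
```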
And subbu is right. When you get NA, it means the memory is not enough. So use a large number for the maxUsableHostMemory parameter and you'll see how much memory you'll need.
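As a rough illustration of why a cell prints NA, here is a small sketch built around the condition quoted earlier in the thread (activeMemoryPerHostBytes <= _maxUsableHostMemory). The class, the method, and the way the value is formatted are made up for this example and are not how MemoryEstimator.java is actually written.

```java
// Hypothetical sketch; only the comparison itself is taken from the thread.
final class NaCellSketch {

    /** Returns "NA" when the estimated active memory per host exceeds the usable limit. */
    static String cell(long activeMemoryPerHostBytes, long maxUsableHostMemoryBytes) {
        if (activeMemoryPerHostBytes <= maxUsableHostMemoryBytes) {
            // The estimate fits, so a value can be reported for this numHosts/numHours pair.
            return activeMemoryPerHostBytes + "B";
        }
        // The estimate does not fit within the configured limit.
        return "NA";
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long limit = 10 * gib; // corresponds to -maxUsableHostMemory 10G
        System.out.println(cell(12 * gib, limit)); // estimate above the limit
        System.out.println(cell(8 * gib, limit));  // estimate within the limit
    }
}
```

Raising the limit in the second argument is the sketch's equivalent of re-running the tool with a larger maxUsableHostMemory: the comparison starts to pass and a number appears instead of NA.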
p
@User You’re right, I got this totally incorrect. Apologies for the unnecessary noise on this, this works as documented.