One more question, folks: when it comes to segment...
# general
d
One more question, folks: when it comes to segments of ~200M in size, what segment storage technology would you recommend using when running a cluster in AWS? HDFS? S3? EFS mounted?
k
EBS
m
Yes, for local storage attached to serving nodes you can use EBS. For deep store you can use S3.
k
@User - you can also use HDFS for deep store.
@User do you know of any Pinot performance comparisons of EBS vs local SSDs?
k
Nothing in a presentable form.
d
Thanks guys, but which one of those options do you think gives us the best performance, say, in a scenario with up to 10 TB of data?
k
What are your QPS and latency expectations?
The only options are local SSD, EBS, or EFS.
The S3 and HDFS options apply only to the deep store, which is a backup segment store and is not accessed at query time.
d
QPS of up to a few dozen at most; latency can be in the seconds, but preferably under 1 minute. Thanks for the info, man!
m
Yeah, you definitely don’t need local SSD for this. As Kishore mentioned, any of the network-attached disk options on the serving nodes will work.
d
Ah, awesome, thank you guys!
m
Since the latency is not too tight, you might want to pack a lot of data per instance, so EBS for serving nodes seems good. For deep store, S3 or HDFS both work (S3 is more popular in my personal experience).
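To make "pack a lot of data per instance" concrete, here is a rough back-of-the-envelope sketch of the sizing arithmetic. The ~200 MB segment size and ~10 TB total come from this thread; the replication factor of 2 and the 2 TB of EBS per server are purely illustrative assumptions, and `estimate_layout` is a hypothetical helper, not part of Pinot.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def estimate_layout(total_bytes: int, segment_bytes: int,
                    replication: int, bytes_per_server: int) -> dict:
    """Estimate segment and server counts for a given data volume.

    Hypothetical helper: it ignores indexes, headroom, and growth,
    so treat the result as a lower bound rather than a plan.
    """
    segments = ceil_div(total_bytes, segment_bytes)
    replicated_bytes = total_bytes * replication
    servers = ceil_div(replicated_bytes, bytes_per_server)
    segments_per_server = ceil_div(segments * replication, servers)
    return {
        "segments": segments,
        "replicated_bytes": replicated_bytes,
        "servers": servers,
        "segments_per_server": segments_per_server,
    }


TB = 10**12
MB = 10**6

# 10 TB and 200 MB are from the thread; replication=2 and
# 2 TB per server are assumptions for illustration only.
plan = estimate_layout(10 * TB, 200 * MB,
                       replication=2, bytes_per_server=2 * TB)
print(plan)
```

Raising the per-server disk size lowers the server count, which is the trade-off available when latency requirements are loose.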
d
Got it. I'll take that into consideration, and also probably go for S3 for the deep store backups (since we already use it a lot for other things)
a
If you want latency in seconds ideally, have you considered Hudi?
d
Not really; I'm not sure what role that would play when integrated with Pinot, however.