# general
k
Hello All, we need your inputs as we are thinking about the big features to add in Pinot. We have avoided implementing joins in Pinot and have always referred folks to Presto/Spark to achieve joins on top of Pinot. However, we are seeing contributions from Uber on lookup join, and requests from users for native join support in Pinot. Is this something that will benefit existing users of Pinot? How do you handle joins?
• 1️⃣ We don't need it since we pre-join the data before pushing it to Pinot
• 2️⃣ We use Presto/Trino and we are happy with Presto/Trino
• 3️⃣ We would LOVE to see Pinot support JOIN
Please vote
1️⃣ 1
3️⃣ 9
2️⃣ 2
j
It's hard to refuse such a feature (from a user's perspective), but surely if it was not done earlier, there must be strong reasons against it. So I wonder what cost this feature would incur (e.g. in terms of other features being pushed back). Anyway, I get that your question is only about how people are doing it right now 🙂
k
https://poll.ly/#/P7J1obwA We typically follow this list, which we published in Dec 2020. The community has doubled since then; maybe we should do another survey to include the new users of Pinot
j
Ah interesting, thanks for sharing
k
More than other features, we are concerned about designing joins in bits and pieces: lookup joins, subqueries, colocated joins, window functions, equality joins, inner joins only, etc. IMO it's better to design for the next few years, but the implementation can be done in bits and pieces
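(For context, the Uber lookup-join contribution mentioned above is exposed as a LOOKUP transform function rather than SQL JOIN syntax; a rough sketch, with made-up table and column names:)
```
-- Sketch of a lookup join: enrich a fact table with a column from a
-- dimension table. "orders" / "customers" are hypothetical tables here.
-- LOOKUP('dimTableName', 'dimColToLookUp', 'dimJoinKey', factJoinKeyValue);
-- the dimension table must be a replicated dimension table.
SELECT orderId,
       LOOKUP('customers', 'customerName', 'customerId', customerId) AS customerName
FROM orders
```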
j
I think I get what you mean; that would also help in having a more standard and consistent query language, rather than related but separate smaller features. But that would make planning and initial development a lot more challenging, I guess..
k
Joins would be great, but as @Jonathan Meyer said, “versus what else?”. And I’d respond with a 1️⃣, since we denormalize via a Flink workflow, but it would significantly reduce our data footprint if we weren’t replicating lots of data between rows because of denormalization. Also agree that there are lots of possible definitions for what it means to “support joins”.
And what about my personal favorite, removing the Zookeeper requirement? 🙂
k
Pinot depends on Helix, which needs ZK. It would be a big undertaking, and no one has complained about ZK.
Is your concern about ZK specifically, or about using a central config store? In other words, do you prefer etcd over ZK, or are you referring to removing the central metadata store?
s
@Ken Krugler I am also curious to know what your motivation is behind removing the Zookeeper requirement. What about Zookeeper is the issue that you see?
k
@Subbu Subramaniam In my experience it’s been hard for ops teams to run a stable, performant Zookeeper cluster. E.g. we’re wrestling with an issue now where, if we do a metadata push of 1000 segments, the Zookeeper cluster goes down (there’s a weird file permission error when writing to the WAL that shows up in the ZK logs). It looks like one problem was that the Pinot cluster was only configured with one of the three ZK servers, but still…
The odd thing is that over the years I’ve asked ops people I run into at conferences about ZK, and it seems like 50% say it’s no problem, and 50% hate it with a passion
s
Thanks for the clarification. A metadata push of 1000 segments simultaneously is something that we have not attempted; it is usually a few segments at a time. But we have seen Helix's use of ZK become a bottleneck when there are 1000s of tables (each with 1000s or 10s of 1000s of segments), especially during server deployment.
k
I know that Ververica now ships their platform with an HA mode (only for k8s) that removes the need for ZK
s
Not familiar with Ververica. I will look it up
k
Ververica is the main company behind Flink - so it’s their Flink platform, where HA mode means maintaining state across multiple Job Managers
Cassandra uses Paxos (and its Gossip protocol, I guess) instead of Zookeeper
y
Which version of Pinot do you use? I added a patch in 0.6 to cap the throughput of ZK activities, to address issues like this at Uber
k
0.7.1
How can I adjust that cap? Seems interesting…
y
the default is set to a large number
and at Uber, we set a much smaller one
k
OK - any suggestions for what to use, for a small (3-node) ZK cluster with non-SSD drives? This is for our beta cluster. I’m thinking 1K 🙂
y
1k might be too low; you can try 10k
it’s more like a rate limiter
k
I saw the lengthy discussion on your PR. Per “BTW, Helix team does suggest to use throttling to controller the number of messages. For the number of the threshold, it is up to Pinot team to discuss.”, I’m wondering if there are other settings we should also adjust. E.g. jute.maxbuffer, if we’re using a lower value for max messages
y
we set it to a large number at Uber
40MB
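(jute.maxbuffer is a JVM system property, in bytes, and needs to be set on both the ZK servers and the ZK clients, i.e. the Pinot processes; 40MB would be roughly the following (a sketch; exact startup scripts vary):)
```
# jute.maxbuffer is passed as a JVM system property, in bytes,
# to both the ZooKeeper servers and the clients (the Pinot JVMs).
# 40MB = 40 * 1024 * 1024 = 41943040 bytes:
-Djute.maxbuffer=41943040
```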
k
So which of the configs (controller, broker, server) should get the updated pinot.helix.instance.messages.max setting? Oh, wait, the name got changed to pinot.helix.instance.state.maxStateTransitions. Looks like it’s only used in the controller
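(So presumably something like this in the controller config; the 10k value here is just the suggestion from above, not a tested recommendation:)
```
# Sketch: throttle Helix state-transition messages per instance,
# per the suggestion above. Tune the value for your cluster.
pinot.helix.instance.state.maxStateTransitions=10000
```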
y
Yes