# general
k
Hello All, we need your inputs as we are thinking about the big features to add in Pinot. We have avoided implementing joins in Pinot and have always referred folks to Presto/Spark to achieve joins on top of Pinot. However, we are seeing contributions from Uber on lookup join, and requests from users for native join support in Pinot. Is this something that will benefit existing users of Pinot? How do you handle joins?
• 1️⃣ We don't need it since we pre-join the data before pushing it to Pinot
• 2️⃣ We use Presto/Trino and we are happy with Presto/Trino
• 3️⃣ We would LOVE to see Pinot support JOIN
Please vote
1️⃣ 1
3️⃣ 9
2️⃣ 2
j
It's hard to refuse such a feature (from a user's perspective), but surely if it was not done earlier, there must be strong reasons against it. So I wonder what cost this feature would incur (e.g. in terms of other features being pushed back). Anyway, I get that your question is only about how people are doing it right now 🙂
k
https://poll.ly/#/P7J1obwA We typically follow this list, which we published in Dec 2020. The community has doubled since then; maybe we should do another survey to include the new users of Pinot
j
Ah interesting, thanks for sharing
k
More than other features, we are concerned about designing joins in bits and pieces: lookup joins, subqueries, colocated joins, window functions, equality joins, inner joins only, etc. IMO it's better to design for the next few years, but the implementation can be done in bits and pieces
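(For context, the Uber lookup-join contribution mentioned above is exposed as a LOOKUP transform function rather than SQL JOIN syntax; a rough sketch, with made-up table and column names:)
```
-- Sketch of a lookup join: enrich a fact table with a column from a
-- dimension table. "orders" / "customers" are hypothetical tables here.
-- LOOKUP('dimTableName', 'dimColToLookUp', 'dimJoinKey', factJoinKeyValue);
-- the dimension table must be a replicated dimension table.
SELECT orderId,
       LOOKUP('customers', 'customerName', 'customerId', customerId) AS customerName
FROM orders
```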
j
I think I get what you mean; that would also help in having a more standard and consistent query language, rather than related but separate smaller features. But that would make planning and initial development a lot more challenging, I guess..
k
Joins would be great, but as @Jonathan Meyer said, “versus what else?”. And I’d respond with a 1️⃣, since we denormalize via a Flink workflow, but it would significantly reduce our data footprint if we weren’t replicating lots of data between rows because of denormalization. Also agree that there are lots of possible definitions for what it means to “support joins”.
And what about my personal favorite, removing the Zookeeper requirement? 🙂
k
Pinot depends on Helix, which needs ZK. It would be a big undertaking, and no one has complained about ZK.
Is your concern about ZK specifically, or about using a central config store? In other words, do you prefer etcd over ZK, or are you referring to removing the central metadata store?
s
@Ken Krugler I am also curious to know what your motivation is behind removing the Zookeeper requirement. What about Zookeeper is the issue that you see?
k
@Subbu Subramaniam In my experience it’s been hard for ops teams to run a stable, performant Zookeeper cluster. E.g. we’re wrestling with an issue now where, if we do a metadata push of 1000 segments, the Zookeeper cluster goes down (there’s a weird file permission error when writing to the WAL that shows up in the ZK logs). It looks like one problem was that the Pinot cluster was only configured with one of the three ZK servers, but still…
The odd thing is that over the years I’ve asked ops people I run into at conferences about ZK, and it seems like 50% say it’s no problem, and 50% hate it with a passion
s
Thanks for the clarification. A metadata push of 1000 segments simultaneously is something that we have not attempted; it is usually a few segments at a time. But we have seen Helix's use of ZK become a bottleneck when there are 1000s of tables (each with 1000s or 10s of 1000s of segments), especially during server deployment.
k
I know that Ververica now ships their platform with an HA mode (only for k8s) that removes the need for ZK
s
Not familiar with Ververica. I will look it up
k
Ververica is the main company behind Flink - so it’s their Flink platform, where HA mode means maintaining state across multiple Job Managers
Cassandra uses Paxos (and its Gossip protocol, I guess) instead of Zookeeper
y
Which version of Pinot do you use? I added a patch in 0.6 to cap the throughput of ZK activities, to address issues like this at Uber
k
0.7.1
How can I adjust that cap? Seems interesting…
y
the default is set to a large number
and at Uber, we set a much smaller one
k
OK - any suggestions for what to use, for a small (3-node) ZK cluster with non-SSD drives? This is for our beta cluster. I’m thinking 1K 🙂
y
1k might be too low; you can try 10k
it’s more like a rate limiter
k
I saw the lengthy discussion on your PR. Per “BTW, Helix team does suggest to use throttling to controller the number of messages. For the number of the threshold, it is up to Pinot team to discuss.”, I’m wondering if there are other settings we should also adjust. E.g. jute.maxbuffer, if we’re using a lower value for max messages
y
we set it to a large number at Uber
40MB
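(jute.maxbuffer is a JVM system property, in bytes, and needs to be set on both the ZK servers and the ZK clients, i.e. the Pinot processes; 40MB would be roughly the following (a sketch; exact startup scripts vary):)
```
# jute.maxbuffer is passed as a JVM system property, in bytes,
# to both the ZooKeeper servers and the clients (the Pinot JVMs).
# 40MB = 40 * 1024 * 1024 = 41943040 bytes:
-Djute.maxbuffer=41943040
```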
k
So which of the configs (controller, broker, server) should get the updated pinot.helix.instance.messages.max setting? Oh, wait, the name got changed to pinot.helix.instance.state.maxStateTransitions. Looks like it’s only used in the controller
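(So presumably something like this in the controller config; the 10k value here is just the suggestion from above, not a tested recommendation:)
```
# Sketch: throttle Helix state-transition messages per instance,
# per the suggestion above. Tune the value for your cluster.
pinot.helix.instance.state.maxStateTransitions=10000
```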
y
Yes