# general
e
what's a good channel to discuss more system / platform design-oriented questions? here? https://www.startree.ai/blogs/real-time-analytics-at-scale-solving-the-trade-off-problem read this last night, i have questions. i am intrigued but skeptical that things will pan out well with the approach proposed here, where we eschew data modelling / preparation and just rely on indexing in Pinot
also, do people actually do this in practice? how do you get around not having join support? Presto / Trino?
or i could be reading this wrong, in that what the article is saying is that you still need to have all those data models (raw, pre-agg, pre-cube) but just store them all in Pinot.
Or is the right interpretation: ingest raw data, build indexes, and those will serve as your pre-agg / pre-cube layers?
m
Hey @Edwin Law the article gives more of an overall view. It depends on your use case and what you want to optimize for (happy to help in this regard as well).
Pinot has lookup join today (where you want to do dimension lookups). We are working on a multi-stage query execution engine, with which joins will be supported. In the interim, yes, Presto/Trino is the recommendation.
No, you don’t need to have all data models, you pick the one that works for you the best.
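For illustration, here is a minimal sketch of what the lookup-join route can look like from Python, using the pinotdb DB-API client and Pinot's lookUp() transform. The table and column names (orders, merchants, merchantId, etc.) and the broker address are made up for the example, not anything discussed above; the dimension table is assumed to be configured as a Pinot dimension table so it gets replicated to all servers.

```python
# Sketch: query Pinot through the pinotdb DB-API client and use lookUp()
# to enrich fact rows with a dimension-table column.
from pinotdb import connect

# Connect to a Pinot broker (host/port here are placeholders).
conn = connect(host="pinot-broker", port=8099, path="/query/sql", scheme="http")
curs = conn.cursor()

# lookUp(dimTable, dimColToGet, dimJoinKey, factJoinKeyValue) joins each
# fact row against a dimension table replicated on every server.
curs.execute("""
    SELECT orderId,
           amount,
           lookUp('merchants', 'merchantName', 'merchantId', merchantId) AS merchantName
    FROM orders
    WHERE amount > 100
    LIMIT 10
""")

for row in curs:
    print(row)
```

For anything beyond this kind of dimension lookup (fact-to-fact joins, multi-way joins), the Presto/Trino route mentioned above would sit in front of Pinot until the multi-stage engine lands.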
e
So over here at Grab, we have a super extensive data lake with tons of data, and most work is done there. but data is, as usual, not that fresh, even though we're trying to bring in things like Hudi / Delta to help improve that
m
What’s the end use case for which you think you need something like Pinot?
e
We're looking at Pinot to be a place for people to work on calculating / retrieving realtime metrics.
m
By people do you mean your internal BI folks and data scientists?
e
we use flink for this today, but flink is of course not a great place to serve metrics
via dashboards, etc.
yes, internal BI folks / DS, and via dashboards, internal operations teams that want to see the state of the world
m
Ok, for internal dashboards, you will typically have heavy write QPS but light read QPS. Also, you’d be ok with sub-second latency.
e
i'm trying to figure out where to position Pinot and how we should think about it.
m
One approach would be to ingest denormalized data into Pinot, and serve dashboards from it.
e
so we should do the denormalization in stream?
m
If you can’t denormalize, then you could use Pinot + Presto/Trino
Yeah, you could do it in stream.
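As a rough illustration of what "denormalize in stream" could look like with Flink (which is already in the picture here), below is a PyFlink SQL sketch that joins a raw orders stream with a merchant dimension stream and writes the flattened rows to a Kafka topic that a Pinot realtime table could ingest from. Topic names, schemas, and the Kafka address are assumptions for the example, not the actual setup.

```python
# Sketch: stream denormalization with PyFlink SQL, producing a flattened
# Kafka topic suitable for Pinot realtime ingestion.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Raw fact stream from Kafka (schema/topic are hypothetical).
t_env.execute_sql("""
    CREATE TABLE orders (
        orderId STRING,
        merchantId STRING,
        amount DOUBLE,
        eventTime TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# Dimension stream carrying merchant metadata.
t_env.execute_sql("""
    CREATE TABLE merchants (
        merchantId STRING,
        merchantName STRING,
        city STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'merchants',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# Denormalized output topic that the Pinot realtime table would consume.
t_env.execute_sql("""
    CREATE TABLE orders_denormalized (
        orderId STRING,
        amount DOUBLE,
        merchantName STRING,
        city STRING,
        eventTime TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders_denormalized',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json'
    )
""")

# Flatten the fact rows by joining in the dimension attributes.
t_env.execute_sql("""
    INSERT INTO orders_denormalized
    SELECT o.orderId, o.amount, m.merchantName, m.city, o.eventTime
    FROM orders AS o
    JOIN merchants AS m ON o.merchantId = m.merchantId
""")
```

The trade-off is the usual one: denormalizing in stream keeps Pinot queries simple and fast (no join at query time), at the cost of state in the Flink job and wider rows in Pinot.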
e
how much should we invest in data modelling?
or how much do people who use pinot invest in that aspect?
m
It varies. But it is always good to get the modelling right.
How many dashboards are we talking about?
e
like, do we go to the extent of the DWH models where you have fact tables, dim tables, etc., or do people usually just go with denormalization?
we're trying to start a platform here, so initially 1? but we want to support a lot of use cases
m
I see. Let me share my experience there shortly
e
kinda like make this a real-time companion to the datalake?