Apache Pinot #general


Anthony Tran

08/11/2020, 1:57 AM

Hi Team! First post! We’re evaluating Pinot and wanted to get your thoughts on whether it’s a good fit for our use case, and on any best practices to make it work. The main complication is that we may need to mutate our data, which may not be a good fit for Pinot (maybe this can be avoided with some smarter data modeling or some future tech?). We’re attracted to Pinot because of its ability to perform fast aggregation and to reduce eng cost from having to do things like precubing data.
• In particular, we have two streams of order data (e.g. you can imagine booking details like total price in $, an order id, account id, user name, date, etc.) flowing into our system.
• The two streams (let’s call them “Fast Stream” and “Accurate Stream”) may overlap (i.e. the Fast Stream and the Accurate Stream may both have order info for “order 1”, but Fast Stream may be the only one that has “order 2”, or Accurate Stream may be the only one that has “order 3”).
• Ideally we want to merge these streams so that whenever they overlap, we use the data from Accurate Stream, because it has richer user details and more accurate reporting of price.
We want to be able to do things like quickly get time-based aggregate totals by account id. Is there a good way to model this, given that we have two data sources we want to merge? Thanks so much for your help!

Mayank

08/11/2020, 2:05 AM

Hi, the upsert feature allows you to mutate data using the Accurate Stream. That should help your use case work on Pinot.

❤️ 1

Anthony Tran

08/11/2020, 2:06 AM

does the feature already exist? Found this open ticket (https://github.com/apache/incubator-pinot/issues/4261) and the proposal

Anthony Tran

08/11/2020, 2:09 AM

and thanks for taking the time to answer Mayank!

Kishore G

08/11/2020, 2:09 AM

It’s not available as of now. Uber engineers are working on it, and it will be available in Q4.

Anthony Tran

08/11/2020, 2:12 AM

Thanks Kishore! Hmm, in the meantime, are there ways others have dealt with this? We were thinking about modeling our data as append-only, but we’re not sure how well Pinot handles that. E.g. Fast data inserts order 1. Accurate data also inserts order 1, so we now have two rows with order id 1 (we can probably add a “source” field whose value is either fast or accurate). We’re not sure how well Pinot handles aggregate queries that select the accurate data instead of the fast data when it exists.
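
For illustration only, the append-only idea described above can be sketched in plain Python. This is not a Pinot query and says nothing about how Pinot would execute it; the field names (`order_id`, `account_id`, `source`, `price`), the sample rows, and the helper functions are all hypothetical. It only shows the selection rule: keep one row per order, prefer the accurate row when both exist, then aggregate by account.

```python
from collections import defaultdict

# Hypothetical append-only rows: both streams write into one table and
# each row is tagged with a "source" field ("fast" or "accurate").
rows = [
    {"order_id": 1, "account_id": "a1", "source": "fast", "price": 10.0},
    {"order_id": 1, "account_id": "a1", "source": "accurate", "price": 10.5},
    {"order_id": 2, "account_id": "a1", "source": "fast", "price": 20.0},
    {"order_id": 3, "account_id": "a2", "source": "accurate", "price": 7.25},
]


def pick_preferred(rows):
    """Keep one row per order_id, preferring the 'accurate' source."""
    chosen = {}
    for row in rows:
        current = chosen.get(row["order_id"])
        # Take the row if the order is new, or if it upgrades fast -> accurate.
        if current is None or (
            row["source"] == "accurate" and current["source"] != "accurate"
        ):
            chosen[row["order_id"]] = row
    return list(chosen.values())


def total_by_account(rows):
    """Sum price per account_id over the de-duplicated rows."""
    totals = defaultdict(float)
    for row in pick_preferred(rows):
        totals[row["account_id"]] += row["price"]
    return dict(totals)


totals = total_by_account(rows)
```

In this sample, order 1 is counted once using the accurate price, while orders 2 and 3 each appear in only one stream and are kept as they are. Whether an equivalent query-time selection performs well in Pinot depends on ingest rate and table size, which is the open question in this thread.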

Kishore G

08/11/2020, 2:15 AM

Got it, there are some options depending on your ingest rate. Create a channel to discuss further?

Anthony Tran

08/11/2020, 2:16 AM

Awesome. Thanks so much for taking the time, Kishore. I created the channel #C018T5T57KN
