Hi Team! First post! We’re evaluating Pinot and wanted to get your thoughts on whether it’s a good fit for our use case, and on any best practices for making it work.
The main complication is that we think we may need to mutate our data, which may not be a good fit for Pinot (maybe this can be avoided with smarter data modeling or some future tech?). We’re attracted to Pinot because of its ability to perform fast aggregations and to reduce engineering cost by sparing us things like pre-cubing data.
• In particular we have two streams of order data (e.g. you can imagine booking details like total price in $, an order id, account id, user name, date, etc) that are flowing into our system.
• The two streams (let’s call them “Fast Stream” and “Accurate Stream”) of order data may overlap (i.e. the Fast Stream and the Accurate Stream may both have order info for “order 1”, but Fast Stream may be the only one that has “order 2”, or Accurate Stream may be the only one that has “order 3”).
• Ideally we want to merge these streams so that wherever they overlap, we use the data from Accurate Stream, because it has richer user details and more accurate price reporting.
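To make the merge rule concrete, here is a rough Python sketch of the behavior we’re after. It is not Pinot code, and the field names (`order_id`, `account_id`, `price`, `source`) and the source labels are made up for illustration rather than taken from our real schema:

```python
from typing import Dict, Iterable, List

# Hypothetical source labels; our real streams may be tagged differently.
FAST = "fast"
ACCURATE = "accurate"


def merge_orders(records: Iterable[dict]) -> List[dict]:
    """Keep one record per order_id, preferring the Accurate Stream on overlap."""
    merged: Dict[str, dict] = {}
    for rec in records:
        current = merged.get(rec["order_id"])
        # Take the record if the order is new, or if it upgrades fast -> accurate.
        if current is None or (
            current["source"] == FAST and rec["source"] == ACCURATE
        ):
            merged[rec["order_id"]] = rec
    return list(merged.values())


records = [
    {"order_id": "1", "account_id": "a", "price": 10.0, "source": FAST},
    {"order_id": "2", "account_id": "a", "price": 5.0, "source": FAST},
    {"order_id": "1", "account_id": "a", "price": 10.5, "source": ACCURATE},
    {"order_id": "3", "account_id": "b", "price": 7.0, "source": ACCURATE},
]
merged = merge_orders(records)
```

In this example, order 1 ends up with the Accurate Stream values, while orders 2 and 3 keep whatever single record exists for them. The rule holds regardless of which stream delivers an order first.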
We want to be able to quickly get time-based aggregate totals per account id. Is there a good way to model this, given that we have two data sources we want to merge?
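For reference, the kind of aggregation we have in mind looks roughly like the sketch below, written in plain Python just to show the shape of the result. It assumes one record per order and uses illustrative field names (`account_id`, `order_date`, `price`); in practice we would hope to express this as a group-by query:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def daily_totals(orders: Iterable[dict]) -> Dict[Tuple[str, date], float]:
    """Sum order prices per (account_id, order_date)."""
    totals: Dict[Tuple[str, date], float] = defaultdict(float)
    for order in orders:
        totals[(order["account_id"], order["order_date"])] += order["price"]
    return dict(totals)


# Illustrative data only; one record per order is assumed.
orders = [
    {"order_id": "1", "account_id": "a", "order_date": date(2021, 1, 1), "price": 10.5},
    {"order_id": "2", "account_id": "a", "order_date": date(2021, 1, 1), "price": 5.0},
    {"order_id": "3", "account_id": "b", "order_date": date(2021, 1, 2), "price": 7.0},
]
totals = daily_totals(orders)
```

If an order were counted once from each stream, these totals would be inflated, which is why the merge matters to us.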
Thanks so much for your help!