Hi @everyone, i was exploring pinot for one of my ...
# troubleshooting
a
Hi @everyone, i was exploring pinot for one of my use case i want to create segment of users based on their actions eg- users who have done payment using credit card with amount>100 at least 2 times -> segment 1
Copy code
my event payoad would be something like this
{
"eventid": "xyz",
"userId" : "abc",
"amount" : "100",
"paymode" : "CC"
}

table - 
CREATE TABLE pay_events (
	event_id serial PRIMARY KEY,
	user_id VARCHAR (50) NOT NULL,
	amount int not null,
	paymode varchar(4) not null
);

The query for getting the data from pino would be something like this

select pe.user_id from pay_events as pe where pe.paymode = 'CC' and pe.amount > 100 group by pe.user_id having count(pe.event_id) > 1;
A segment can have a million users we need to extract all the users somehow and that is going to be used by downstream services for sending bulk campaigns and notifications My questions are- 1.is pinot the right choice for this use case? 2.is there any scalable way of fetching all the users other than pagination through limit and offset? Do let me know if you guys need any clarification ThanksšŸ˜€
@Neha Pawar @Mayank can you guys answer my queries.It would be really helpful Thanks
m
Is this the only query you will ever run, or would you have more aggregation queries as well? Even if not, I feel you could use Pinot. As for pagination, the existing solution is limited to selection queries (which is your use case), but it is not optimal as each call would compute the entire query
Are you interested in users or distinct count of users?
a
There can be 1k different aggregation queries or maybe more in future , each resulting in a new segment. I'm interested in users as i have to send bulk notifications to them
m
By aggregation queries I mean which so aggregation (eg count, sum, etc), and not query raw columns like userid
a
it depends on the use case for a query like sum of transactions amount > 100 it would have to use sum too but at the end i need users list who satisfy these filter queries
m
My point is as long as you have sum like queries as well, you should use Pinot. For getting userid, perhaps we can use the grpc broker endpoint to stream this out. @Xiang Fu wdyt?
x
This is pretty much a group by query, the users list is anyway computed in broker as a whole during reduce phase, paging is not supported. grpc can help from the client side to not blow up the memory from application
a
@Xiang Fu Any resources for pinot grpc connector i'm not able to find it on docs
x
so far we only enable grpc for pinot server query which are used for presto/trino. Broker grpc endpoint is not yet implemented
a
Thanks @Mayank @Xiang Fu
m
I thought we recently added broker grpc @Rong R ?
r
broker GRPC is not enabled by default. one configuration needs to be added to broker conf
pinot.broker.request.handler.type=grpc
in order to enable grpc broker port
m
But once enabled, is it feature complete?
r
Wasn't sure what feature exactly we talking about. It supports selection / selection filter for sure in streaming fashion. For group by we have to wait till all server return stream before we can return the final result
m
I see
s
hey @Akash Yadav we’re solving this segmentation usecase by storing the user_id list we get from Pinot to a Hive table built on S3. There’s a single ā€œINSERT into … select fromā€ presto query that runs that gets data from the Pinot catalog and inserts into the hive catalog. We then provide the S3 location of the files to the marketing team to run the bulk campaigns