Priyank Bagrecha
06/09/2022, 10:17 PM
Nikhil Varma
06/10/2022, 5:00 AM
Alice
06/10/2022, 11:27 AM
Stuart Millholland
06/10/2022, 5:45 PM
Ali Atıl
06/13/2022, 1:22 PM
AHMEDSHEHATA
06/13/2022, 3:47 PM
Lars-Kristian Svenøy
06/13/2022, 7:31 PM
Nikhil
06/14/2022, 12:39 AM
pinot-admin.sh StartZookeeper. What's the recommended practice for deploying a fault-tolerant ZK setup? How to set up the metadata sync across multiple ZK instances?
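For reference, a minimal zoo.cfg sketch for a three-node ensemble (hostnames and paths here are placeholders). ZooKeeper replicates its data across the ensemble on its own via its atomic broadcast protocol, so no separate metadata sync is needed:

# zoo.cfg, identical on all three nodes
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper   # each node also needs a myid file (1, 2, or 3) in dataDir
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Pinot components would then be pointed at all three nodes, e.g. -zkAddress zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181.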
Shreesh Mansotra
06/14/2022, 7:18 AM
Lovin Singla
06/14/2022, 11:06 AM
Akash Yadav
06/14/2022, 12:31 PM
My event payload would be something like this:
{
  "eventid": "xyz",
  "userId": "abc",
  "amount": "100",
  "paymode": "CC"
}
Table:
CREATE TABLE pay_events (
  event_id serial PRIMARY KEY,
  user_id VARCHAR(50) NOT NULL,
  amount INT NOT NULL,
  paymode VARCHAR(4) NOT NULL
);
The query for getting the data from Pinot would be something like this:
SELECT pe.user_id FROM pay_events AS pe WHERE pe.paymode = 'CC' AND pe.amount > 100 GROUP BY pe.user_id HAVING COUNT(pe.event_id) > 1;
A segment can have a million users.
We need to extract all the users somehow; that list is going to be used by downstream services for sending bulk campaigns and notifications.
My questions are:
1. Is Pinot the right choice for this use case?
2. Is there any scalable way of fetching all the users other than paginating with LIMIT and OFFSET?
Do let me know if you need any clarification.
Thanks 😀
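For context, a minimal sketch of pulling the grouped result through the pinot-java-client (the broker address and LIMIT value are placeholders). One thing worth noting: Pinot caps group-by results at the query LIMIT, which defaults to 10, so an explicit large LIMIT is needed to get all matching users back in one query; a full export at segment scale may still be better served by a batch job than by broker queries:

import org.apache.pinot.client.Connection;
import org.apache.pinot.client.ConnectionFactory;
import org.apache.pinot.client.Request;
import org.apache.pinot.client.ResultSet;
import org.apache.pinot.client.ResultSetGroup;

public class PayEventsExtract {
  public static void main(String[] args) throws Exception {
    // Connect to the broker (host:port is a placeholder).
    Connection conn = ConnectionFactory.fromHostList("localhost:8000");
    String sql = "SELECT user_id FROM pay_events WHERE paymode = 'CC' AND amount > 100 "
        + "GROUP BY user_id HAVING COUNT(*) > 1 LIMIT 1000000";
    ResultSetGroup group = conn.execute(new Request("sql", sql));
    ResultSet rs = group.getResultSet(0);
    // Each row holds one user_id that satisfied the HAVING clause.
    for (int i = 0; i < rs.getRowCount(); i++) {
      System.out.println(rs.getString(i, 0));
    }
    conn.close();
  }
}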
Satyam Raj
06/14/2022, 6:03 PM
// Connection, ConnectionFactory, Request, ResultSetGroup come from org.apache.pinot.client
Connection connection = ConnectionFactory.fromHostList(this.pinotConfig.getBrokerUrl());
ResultSetGroup resultSet = connection.execute(new Request("sql",query));
Priyank Bagrecha
06/15/2022, 12:32 AM
Priyank Bagrecha
06/15/2022, 12:40 AM
Grace Walkuski
06/15/2022, 4:54 PM
Prashant Pandey
06/16/2022, 6:48 AM
Visar Buza
06/16/2022, 7:59 AM
harnoor
06/16/2022, 12:48 PM
numSegmentsQueried=21380, numSegmentsProcessed=90
Hi, we haven’t set any bloom filters and don’t use any partitioning. The query has a time range filter.
I just wanted to confirm that in the above example, time-based pruning of segments happened at the broker level and after the broker layer, only 90 segments were queried in the server, right?
And if we set bloom filters, are the segments pruned at the server or the broker?
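For reference, bloom filters in Pinot are declared per column in the table config and are consulted on the servers when pruning segments; they only help equality-style predicates, so they would not change anything for a pure time-range filter. A minimal sketch, with a placeholder column name:

"tableIndexConfig": {
  "bloomFilterColumns": ["memberId"]
}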
Kevin Peng
06/16/2022, 8:07 PM
INSERT INTO "baseballStats"
FROM FILE 's3://my-bucket/public_data_set/baseballStats/rawdata/'
OPTION(taskName=myTask-s3)
OPTION(input.fs.className=org.apache.pinot.plugin.filesystem.S3PinotFS)
OPTION(input.fs.prop.accessKey=my-key)
OPTION(input.fs.prop.secretKey=my-secret)
OPTION(input.fs.prop.region=us-west-2)
I just changed the table name, location, and AWS credentials, but when I run it I get:
[
{
"message": "QueryExecutionError:\nshaded.org.apache.commons.httpclient.HttpException: Unable to get tasks states map. Error code 400, Error message: {\"code\":400,\"error\":\"No task is generated for table: segments_aggregated, with task type: SegmentGenerationAndPushTask\"}\n\tat org.apache.pinot.common.minion.MinionClient.executeTask(MinionClient.java:123)\n\tat org.apache.pinot.core.query.executor.sql.SqlQueryExecutor.executeDMLStatement(SqlQueryExecutor.java:95)\n\tat org.apache.pinot.controller.api.resources.PinotQueryResource.executeSqlQuery(PinotQueryResource.java:120)\n\tat org.apache.pinot.controller.api.resources.PinotQueryResource.handlePostSql(PinotQueryResource.java:100)",
"errorCode": 200
}
]
I am also seeing this in the terminal:
2022/06/15 02:02:05.604 ERROR [JobDispatcher] [HelixController-pipeline-task-QuickStartCluster-(e322dd58_TASK)] Job configuration is NULL for TaskQueue_SegmentGenerationAndPushTask_Task_SegmentGenerationAndPushTask_cafd03d1-f383-48ba-aea6-bc0a934522db_1655153184751
Any ideas of what I am doing wrong or where I can go to dig in more?
I am running this off the latest Docker image for Pinot.
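One likely cause, judging from the "No task is generated" message: SegmentGenerationAndPushTask has to be enabled in the target table's config (and a Minion must be running) before INSERT INTO ... FROM FILE can schedule any work. A sketch of the table-config fragment:

"task": {
  "taskTypeConfigsMap": {
    "SegmentGenerationAndPushTask": {}
  }
}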
Stuart Millholland
06/17/2022, 2:10 PM
Norman he
06/17/2022, 8:19 PM
Stuart Millholland
06/20/2022, 7:29 PM
abhinav wagle
06/20/2022, 11:06 PM
I'm building with docker-build.sh and running into the following issue. Any pointers on how to get around it:
executor failed running [/bin/sh -c git clone ${PINOT_GIT_URL} ${PINOT_BUILD_DIR} && cd ${PINOT_BUILD_DIR} && git checkout ${PINOT_BRANCH} && mvn install package -DskipTests -Pbin-dist -Pbuild-shaded-jar -Djdk.version=${JDK_VERSION} -T1C && mkdir -p ${PINOT_HOME}/configs && mkdir -p ${PINOT_HOME}/data && cp -r build/* ${PINOT_HOME}/. && chmod +x ${PINOT_HOME}/bin/*.sh]: exit code: 1
Alice
06/20/2022, 11:48 PM
Alice
06/21/2022, 10:21 AM
AHMEDSHEHATA
06/21/2022, 2:58 PM
Priyank Bagrecha
06/21/2022, 11:24 PM
{e1:b1, e2:b2, e3:b3, ...}
An account can be in multiple A/B tests. The query is going to be:
SELECT exp_id, event_ts, bucket_id, DISTINCTCOUNTHLL(account_id) FROM table WHERE exp_id = <exp id> AND event_ts > start_time AND event_ts < end_time GROUP BY event_ts, bucket_id
event_ts has a granularity of some time interval, so high cardinality is not a problem.
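A small nit on the query above: Pinot expects every non-aggregate column in the SELECT list to appear in the GROUP BY, so exp_id should be added (it is constant under the filter, so the results are unchanged). A sketch with placeholder literals:

SELECT exp_id, event_ts, bucket_id, DISTINCTCOUNTHLL(account_id)
FROM table
WHERE exp_id = 'exp1' AND event_ts > 1654000000 AND event_ts < 1656000000
GROUP BY exp_id, event_ts, bucket_id
LIMIT 10000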
Alice
06/22/2022, 1:14 AM
Alice
06/22/2022, 2:40 AM
Only sum, max, and min are supported as of today for RealtimeToOfflineSegmentsTask, right?
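For context, a sketch of where those aggregations are declared in the RealtimeToOfflineSegmentsTask config (the column name and time periods below are placeholders; per the docs, sum, max, and min are the supported values for the per-metric aggregationType as of this writing):

"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "2d",
      "mergeType": "rollup",
      "amount.aggregationType": "sum"
    }
  }
}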
harry singh
06/22/2022, 5:00 AM
created_at stores the epoch timestamp as an integer datatype.
While querying, the BI tool generates a filter on the above field with the syntax
from_unixtime(table1.created_at) AT TIME ZONE 'Asia/Kolkata'
This doesn't get pushed down, and Trino tries to load the entire table.
Any workaround for this?
Also, is there a Looker-Pinot connector coming in the near future?
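On the pushdown question, one common workaround is to apply the time-zone conversion to the literal instead of the column, so the comparison stays on the raw integer column and the connector can push it down (subject to Trino version and connector behavior). A sketch, assuming a catalog named pinot and a hypothetical selected column:

SELECT t.user_id
FROM pinot.default.table1 AS t
WHERE t.created_at >= to_unixtime(TIMESTAMP '2022-06-01 00:00:00 Asia/Kolkata')
  AND t.created_at <  to_unixtime(TIMESTAMP '2022-06-02 00:00:00 Asia/Kolkata')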