David Cromberge
11/01/2022, 9:45 AM
abhinav wagle
11/01/2022, 4:49 PM
Lee Wei Hern Jason
11/02/2022, 7:19 AM
Bobby Richard
11/02/2022, 8:13 PM
Ryhan Sunny
11/02/2022, 8:46 PM
Anita Jas
11/03/2022, 11:56 AM
vishal
11/04/2022, 9:20 AM
Start generating task configs for table: events2_REALTIME for task: RealtimeToOfflineSegmentsTask
No realtime-completed segments found for table: events2_REALTIME, skipping task generation: RealtimeToOfflineSegmentsTask
Finished CronJob: table - events2_REALTIME, task - RealtimeToOfflineSegmentsTask, next runtime is 2022-11-04T07:04:00.000+0000
I've pushed a huge amount of data and it's creating multiple segments, but the realtime segments are not being converted to the offline table.
Thanks
Sonit Rathi
11/05/2022, 11:04 AM
Ashish Kumar
11/07/2022, 5:09 AM
Diogo Baeder
11/07/2022, 11:51 AM
Dhwanil Ditani
11/08/2022, 5:46 AM
vishal
11/08/2022, 9:58 AM
Window data overflows into CONSUMING segments for partition of segment: events16__0__165__20221108T0953Z. Skipping task generation: RealtimeToOfflineSegmentsTask
Finished CronJob: table - events16_REALTIME, task - RealtimeToOfflineSegmentsTask, next runtime is 2022-11-08T09:58:00.000+0000
How can I solve this overflow issue?
Dhwanil Ditani
11/08/2022, 1:20 PM
Abdelhakim Bendjabeur
11/08/2022, 3:25 PM
Raja Kirshnamoorthi
11/08/2022, 8:14 PM
Gaurav Sinha
11/09/2022, 8:41 AM
[
{
"message": "null:\n4 segments [user_impressions_v1_stg__3__0__20221107T1247Z, user_impressions_v1_stg__0__0__20221107T1247Z, user_impressions_v1_stg__1__0__20221107T1247Z, user_impressions_v1_stg__4__0__20221107T1247Z] unavailable",
"errorCode": 305
}
]
Gaurav Sinha
11/09/2022, 8:42 AM
Rebalance Server & Rebalance Brokers without any success
Mamlesh
11/09/2022, 10:08 AM
Dan Caragea
11/09/2022, 10:44 AM
id, type, dimension1, dimension2
1   t1    a
1   t1
1   t2                b
2   t1    a
Rows with the same id are part of the same "session" so in the example above I have 2 sessions: one with 3 events and another with 1 event. If I were to reconstitute the sessions from these individual events, they would look like this:
Sessions:
id, dimension1, dimension2, countT1, countT2
1   a           b           2        1
2   a                       1        0
Questions I have to answer are "how many t1 types in sessions where dimension1 is a?" or "how many sessions had more than one t1 type?".
As far as I can tell, my options are:
1. storing the raw data in pinot as is and figuring out the queries for the above questions. TBH this would be my preferred route but can you help me with a sample SQL for the questions above? Also, can these queries be reasonably fast? The bit I am struggling with (my sql skills are really rusty atm) is that a naive query like select count(type) where dimension1='a' and type='t1' would return 1 for id 1 yet it should be 2 (see the reconstituted session for id 1). So I probably need some sort of joins but I am not sure what's the best way to do it.
2. I could try to use the upsert feature of pinot to reconstitute and store the sessions in pinot instead of the raw data. This could work although I am not sure I can do the counts with upserts (countT1/countT2). Also, since I'll have to reconstitute multiple session types (based on various other ids) and pinot requires the topic in kafka to be keyed, it means I'll have to duplicate topics in kafka just to use a different key. It seems a bit wasteful to me atm.
3. I could try to reconstitute in a job outside of pinot and insert only the final version of the session in pinot. This has the downside that I will have to wait for a session to be complete before inserting it and this means less fresh data and even lost events if they come very late (there's no end-of-session marker and events can come out of order anyway).
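For option 1, here is a minimal sketch of the two questions as plain SQL over the raw events. It runs against an in-memory SQLite database as a stand-in, not against pinot. The table name events, the use of NULL for missing dimensions, and placing b in dimension2 for the t2 row are assumptions taken from the example above. Whether pinot accepts these subqueries as written depends on the query engine and version, which is not verified here.

```python
import sqlite3

# Raw events from the example: (id, type, dimension1, dimension2).
# Missing dimensions are stored as NULL (an assumption about the data).
EVENTS = [
    (1, "t1", "a", None),
    (1, "t1", None, None),
    (1, "t2", None, "b"),
    (2, "t1", "a", None),
]


def load_events(rows):
    """Create an in-memory table named events (hypothetical name) and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER, type TEXT, dimension1 TEXT, dimension2 TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return conn


# Question 1: t1 events per session, restricted to sessions in which
# at least one event carries dimension1 = 'a'.
T1_PER_SESSION_SQL = """
SELECT id, COUNT(*) AS countT1
FROM events
WHERE type = 't1'
  AND id IN (SELECT id FROM events WHERE dimension1 = 'a')
GROUP BY id
ORDER BY id
"""

# Question 2: number of sessions with more than one t1 event.
SESSIONS_WITH_MULTIPLE_T1_SQL = """
SELECT COUNT(*)
FROM (
    SELECT id
    FROM events
    WHERE type = 't1'
    GROUP BY id
    HAVING COUNT(*) > 1
) AS s
"""

conn = load_events(EVENTS)
t1_per_session = conn.execute(T1_PER_SESSION_SQL).fetchall()
multi_t1_sessions = conn.execute(SESSIONS_WITH_MULTIPLE_T1_SQL).fetchone()[0]

print(t1_per_session)     # [(1, 2), (2, 1)]
print(multi_t1_sessions)  # 1
```

The key point is that the dimension filter picks session ids first and the count then runs over every t1 row of those sessions, so id 1 yields 2 rather than the 1 returned by the naive query.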
So what would you recommend and can you help with #1 above?
Lars-Kristian Svenøy
11/09/2022, 10:58 AM
Weixiang Sun
11/09/2022, 9:54 PM
Qiaochu Liu
11/10/2022, 8:54 PM
quick question about distinctCountHLL https://docs.pinot.apache.org/configuration-reference/functions/distinctcounthll DISTINCTCOUNTHLL(colName, log2m): if pinot users want to leverage the log2m parameter at query time, do we need to emit the HLL object with the given precision at ingestion time? Will it work with no changes on the ingestion side?
abhinav wagle
11/10/2022, 11:10 PM
Ashish Kumar
11/11/2022, 4:11 AM
java.lang.IllegalArgumentException: INT96 not implemented and is deprecated error. Does someone know what could be the right way of reading parquet using the pinot-parquet record reader? It seems that by default my batch job is using parquetAvroFormatReader, which doesn't implement INT96. What other ways are there to read parquet files?
stack-trace:
vishal
11/11/2022, 5:30 AM
Abhishek Dubey
11/11/2022, 7:02 AM
Ashish Kumar
11/12/2022, 2:15 PM
Abdelhakim Bendjabeur
11/14/2022, 10:06 AM
Abdelhakim Bendjabeur
11/14/2022, 10:40 AM