Raj
03/10/2023, 2:36 PM
Saubhagya Awaneesh
03/11/2023, 2:31 AM
Ashish Kumar
03/14/2023, 8:23 AM
Yarden Rokach
Weixiang Sun
03/15/2023, 1:39 AM
EXPLAIN PLAN FOR for hybrid table is same as realtime table which is different from offline table. Is it expected?
Rohit Yadav
03/16/2023, 10:16 AM
piby
03/16/2023, 2:34 PM
abhinav wagle
03/16/2023, 9:40 PM
Sameer Awasekar
03/17/2023, 4:20 AM
watermark metadata is atomic? I do see the segment replacement protocol, but I think it doesn't come into the picture for RealtimeToOfflineTask, only for the Merge task.
vishal
03/17/2023, 7:14 AM
Ashish Kumar
03/17/2023, 3:28 PM
LaunchSparkDataIngestionJobCommand & LaunchDataIngestionJobCommand? When using batch ingestion job (https://docs.pinot.apache.org/basics/data-import/batch-ingestion/spark) which one should be the main class?
Nizar Hejazi
03/17/2023, 11:52 PM
{
  "name": "event_time_ms",
  "dataType": "TIMESTAMP",
  "format": "1:MILLISECONDS:TIMESTAMP",
  "granularity": "1:MILLISECONDS"
}
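For readers following along, here is a minimal sketch of how the field spec above could be read. It assumes the `format` string follows a colon-separated `<size>:<unit>:<format>` layout, which is what the value above looks like; this is not Pinot's own parser, and `sample_millis` is a made-up value used only for illustration.

```python
import json
from datetime import datetime, timezone

# The field spec exactly as posted in the message above.
field_spec = json.loads("""
{
  "name": "event_time_ms",
  "dataType": "TIMESTAMP",
  "format": "1:MILLISECONDS:TIMESTAMP",
  "granularity": "1:MILLISECONDS"
}
""")

# Assumption: the format string is "<size>:<unit>:<format>".
size, unit, time_format = field_spec["format"].split(":", 2)

# Hypothetical sample value: milliseconds since the Unix epoch.
sample_millis = 1679097120000
as_utc = datetime.fromtimestamp(sample_millis / 1000, tz=timezone.utc)

print(size, unit, time_format)
print(as_utc.isoformat())
```

A column declared this way would be expected to carry values like `sample_millis`, that is, epoch milliseconds rather than formatted date strings.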
Jason MacLulich
03/18/2023, 6:18 AM
IN operator?
Deena Dhayalan
03/20/2023, 7:48 AM
Pratik Tibrewal
03/20/2023, 5:29 PM
_tmp/tmp-<segment_name>-<timestamp>/tmp-<uuid>
The segment name in this path^ does not exist anymore for that table (deleted by retention). The contents of the directory are of this manner:
0 col1.sv.sorted.fwd
0 col2.mv.fwd
0 col3.sv.sorted.fwd
0 col4.sv.sorted.fwd
0 col5.sv.sorted.fwd
0 col6.sv.sorted.fwd
4.0K col1.dict
4.0K col2.dict
4.0K col3.dict
26G col4.dict
132G col5.dict
148G col6.dict
Any idea what this _tmp folder signifies and why these are getting created?
Andi Miller
03/20/2023, 5:37 PM
SegmentGenerationAndPushTask? Do I need to trigger a MergeRollupTask and hope it does it?
abhinav wagle
03/20/2023, 8:00 PM
ssl.truststore.location as mentioned here part of the pinot Deployment using helm. Is it a local jks file being packaged as part of the docker or being added post cluster deployment? Any ideas/best practices around this? Thanks!
Bobby Richard
03/20/2023, 8:44 PM
Mingmin Xu
03/20/2023, 9:51 PM
Tim Berglund
Tim Berglund
Tim Berglund
Grace Lu
03/21/2023, 9:06 PM
uuid date group metrics_1, metrics_2, … metrics_xxxx
A typical simplified query we want to run on this table selects a bunch of metric aggregations for certain groups of uuids across days and then aggregates them again by group, e.g.:
select
  group,
  avg(m1),
  sum(m2),
  ...
  avg(mxxx)
from
(
  select
    uuid,
    group,
    avg(metrics_1) as m1,
    sum(metrics_2) as m2,
    …
    avg(metrics_xxx) as mxxx
  from metrics_table where group in (xxx) and date between aa and bb
  group by 1, 2
) group by 1
When we did preliminary testing previously, we ran into issues where a simple aggregation query on uuid took very long to return, or the query returned inaccurate approximations due to high cardinality. We want to get some suggestions about whether this is a good use case for Pinot and, if it is, how to model it with proper cluster config and index config. Thank you!
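To make the shape of the query above concrete, here is a small sketch of the same two-level aggregation (first per uuid and group, then per group) run with Python's built-in sqlite3 rather than Pinot. The rows, group names, and dates are invented for illustration, only two metric columns are included, and `group` is quoted because it is a reserved word in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics_table "
    '(uuid TEXT, "group" TEXT, date TEXT, metrics_1 REAL, metrics_2 REAL)'
)

# Hypothetical sample rows: (uuid, group, date, metrics_1, metrics_2).
rows = [
    ("u1", "g1", "2023-03-01", 1.0, 10.0),
    ("u1", "g1", "2023-03-02", 3.0, 20.0),
    ("u2", "g1", "2023-03-01", 5.0, 30.0),
    ("u3", "g2", "2023-03-01", 7.0, 40.0),
]
conn.executemany("INSERT INTO metrics_table VALUES (?, ?, ?, ?, ?)", rows)

# Inner query: aggregate per (uuid, group). Outer query: aggregate per group.
query = """
SELECT "group", AVG(m1) AS avg_m1, SUM(m2) AS sum_m2
FROM (
    SELECT uuid, "group", AVG(metrics_1) AS m1, SUM(metrics_2) AS m2
    FROM metrics_table
    WHERE "group" IN ('g1', 'g2')
      AND date BETWEEN '2023-03-01' AND '2023-03-02'
    GROUP BY uuid, "group"
)
GROUP BY "group"
ORDER BY "group"
"""
result = conn.execute(query).fetchall()
print(result)
```

With these rows, g1 comes out with an average of 3.5 for the first metric, because the per-uuid averages (2.0 and 5.0) are averaged again. A flat average over the three g1 rows would be 3.0, so the nesting changes the result and cannot simply be collapsed into a single group-by.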
cc @Mingmin Xu
Ashish Kumar
03/22/2023, 2:04 PM
-Pbuild-shaded-jar and without it?
2. Is it possible to shade org.apache.hadoop being used in the main pom.xml in pinot-0.12.0? It seems like it's using a different version than the hadoop being used in our team's cluster. I believe that if we can shade it and build pinot from source code, it should be fine.
Yarden Rokach
Yarden Rokach
Ken Krugler
03/22/2023, 9:54 PM
David G. Simmons
03/23/2023, 11:48 AM
David G. Simmons
03/23/2023, 11:51 AM
Tim Berglund