Priyank Bagrecha
06/22/2022, 7:19 AM
valuein and distinctcounthll from Looker to query Pinot data via Trino. I get this error:
Query failed (#20220622_071712_00006_mi58r): line 4:24: Function 'valuein' not registered
Priyank Bagrecha
06/22/2022, 8:30 AM
field = 'Some Value' returns no result because it gets translated to field = 'some value'. Did anyone figure out a way to resolve this issue?
Lars-Kristian Svenøy
06/22/2022, 9:58 AM
Rakesh Bobbala
06/22/2022, 2:45 PM
Rakesh Bobbala
06/22/2022, 2:48 PM
select count(*) from table
Also, the segments are not getting pushed to the S3 bucket after the threshold.
Rakesh Bobbala
06/22/2022, 4:33 PM
mkdir s3://test_bucket/test_key_1/data/rakesh_test
Copying uri s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1609Z.tmp.c2fcb060-bf45-487c-b025-c0b27c601f38 to uri s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1609Z
Deleting uri s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1609Z force true
Caught exception while committing segment file for segment: rakesh_test__0__0__20220622T1609Z
software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403
The controller was able to create the tmp files, but I still see Access Denied.
Rakesh Bobbala
06/22/2022, 4:34 PM
pinot.controller.storage.factory.s3.disableAcl=true and pinot.controller.storage.factory.s3.disableAcl=false
Rakesh Bobbala
06/22/2022, 4:34 PM
Michael Latta
06/22/2022, 4:34 PM
Stuart Millholland
06/22/2022, 5:15 PM
Rakesh Bobbala
06/22/2022, 7:59 PM
Copy /tmp/pinot-tmp-data/fileUploadTemp/rakesh_test__0__0__20220622T1821Z.a804f229-8b20-4b26-be2a-36c97eecd56b from local to s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z.tmp.295786f8-ca70-4929-bacc-8685d7eed4d5
Response to segmentUpload for segment:rakesh_test__0__0__20220622T1821Z is:{"offset":143796,"status":"UPLOAD_SUCCESS","isSplitCommitType":false,"segmentLocation":"s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z.tmp.295786f8-ca70-4929-bacc-8685d7eed4d5","streamPartitionMsgOffset":"143796","buildTimeSec":-1}
Handled request from 10.0.1.187 POST http://pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local:9000/segmentUpload?segmentSizeBytes=151514&buildTimeMillis=119&streamPartitionMsgOffset=143796&instance=Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098&offset=-1&name=rakesh_test__0__0__20220622T1821Z&rowCount=1000&memoryUsedBytes=510612, content-type multipart/form-data; boundary=XMJs_YLCxWLk2ADRtRKgDg5q7PaR5pB4_Bkwm3 status code 200 OK
Processing segmentCommitEndWithMetadata:Offset: -1,Segment name: rakesh_test__0__0__20220622T1821Z,Instance Id: Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098,Reason: null,NumRows: 1000,BuildTimeMillis: 119,WaitTimeMillis: 0,ExtraTimeSec: -1,SegmentLocation: s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z.tmp.295786f8-ca70-4929-bacc-8685d7eed4d5,MemoryUsedBytes: 510612,SegmentSizeBytes: 151514,StreamPartitionMsgOffset: 143796
Processing segmentCommitEnd(Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098, 143796)
Committing segment rakesh_test__0__0__20220622T1821Z at offset 143796 winner Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098
Committing segment file for segment: rakesh_test__0__0__20220622T1821Z
mkdir s3://test_bucket/test_key_1/data/rakesh_test
Copying uri s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z.tmp.295786f8-ca70-4929-bacc-8685d7eed4d5 to uri s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z
Deleting uri <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z> force true
Caught exception while committing segment file for segment: rakesh_test__0__0__20220622T1821Z
software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: TNEVQ80HE8YYE6QV, Extended Request ID: s2urpeZBQG+wFdNvBN/AC57hEzwBOJy4kQJMP/rKOpJZHQLsfTQMV5ghT3bF2XatKwxTqmjP0UQ=)
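The sequence in the log (the upload of the .tmp object succeeds, then the commit step that copies and deletes objects fails with a 403) is consistent with credentials that allow writes but lack some other object-level action under the same prefix. Below is a minimal sketch of an IAM policy document covering the operations the log shows, using the bucket and prefix from the log. The helper name `build_segment_store_policy` is hypothetical, which action was actually missing cannot be told from the log, and `s3:PutObjectAcl` is included only on the assumption that ACLs are being set (disableAcl=false). A 403 can also come from a bucket policy or encryption key settings, which this sketch does not cover.

```python
import json


def build_segment_store_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy dict for a segment store under s3://bucket/prefix.

    Hypothetical helper for illustration only; not part of Pinot or any AWS SDK.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # An S3 copy reads the source (GetObject) and writes the
                # destination (PutObject); the delete step needs DeleteObject.
                # PutObjectAcl is an assumption, only relevant if ACLs are set.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:PutObjectAcl",
                ],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


policy = build_segment_store_policy("test_bucket", "test_key_1/data")
print(json.dumps(policy, indent=2))
```

Comparing a document like this against the role actually attached to the controller is one way to narrow down which call is being rejected.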
Rakesh Bobbala
06/22/2022, 7:59 PM
Tiger Zhao
06/22/2022, 8:40 PM
primaryKeyColumns for an existing upsert table and have it take effect? Or would I need to recreate the table?
Stuart Millholland
06/23/2022, 12:20 AM
ahmed
06/23/2022, 1:20 AM
Alice
06/23/2022, 6:17 AM
Lars-Kristian Svenøy
06/23/2022, 8:44 AM
Mohamed Emad
06/23/2022, 10:36 AM
Alice
06/23/2022, 3:40 PM
Alice
06/24/2022, 2:49 AM
abhinav wagle
06/24/2022, 3:44 AM
export JAVA_OPTS="-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
./bin/pinot-admin.sh StartBroker \
-zkAddress localhost:2191
[0.006s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:gc-pinot-broker.log instead.
2022/06/23 20:41:49.149 INFO [StartBrokerCommand] [main] Executing command: StartBroker -brokerHost null -brokerPort 8099 -zkAddress localhost:2191
2022/06/23 20:41:49.161 INFO [StartServiceManagerCommand] [main] Executing command: StartServiceManager -clusterName PinotCluster -zkAddress localhost:2191 -port -1 -bootstrapServices []
2022/06/23 20:41:49.161 INFO [StartServiceManagerCommand] [main] Starting a Pinot [SERVICE_MANAGER] at 0.341s since launch
2022/06/23 20:41:49.165 INFO [StartServiceManagerCommand] [main] Started Pinot [SERVICE_MANAGER] instance [ServiceManager_192.168.50.11_-1] at 0.346s since launch
2022/06/23 20:41:49.167 INFO [StartServiceManagerCommand] [Start a Pinot [BROKER]] Starting a Pinot [BROKER] at 0.347s since launch
Jun 23, 2022 8:41:53 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8099]
Jun 23, 2022 8:41:53 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
2022/06/23 20:41:57.595 INFO [StartServiceManagerCommand] [Start a Pinot [BROKER]] Started Pinot [BROKER] instance [Broker_192.168.50.11_8099] at 8.775s since launch
Alice
06/24/2022, 4:24 AM
Tommaso Peresson
06/24/2022, 9:47 AM
select date,
fields.column1,
distinctcounthll(hllState)
from EventsHll
where fields.column2 in (1,2,10)
group by date, fields.column1
limit 300
And I get suboptimal performance:
"numServersQueried": 2,
"numServersResponded": 2,
"numSegmentsQueried": 1816,
"numSegmentsProcessed": 1816,
"numSegmentsMatched": 1816,
"numConsumingSegmentsQueried": 0,
"numDocsScanned": 154861922,
"numEntriesScannedInFilter": 214829377,
"numEntriesScannedPostFilter": 464585766,
"numGroupsLimitReached": false,
"totalDocs": 447509450,
"timeUsedMs": 25832,
"offlineThreadCpuTimeNs": 0,
"realtimeThreadCpuTimeNs": 0,
"offlineSystemActivitiesCpuTimeNs": 0,
"realtimeSystemActivitiesCpuTimeNs": 0,
"offlineResponseSerializationCpuTimeNs": 0,
"realtimeResponseSerializationCpuTimeNs": 0,
"offlineTotalCpuTimeNs": 0,
"realtimeTotalCpuTimeNs": 0,
"segmentStatistics": [],
"traceInfo": {},
"minConsumingFreshnessTimeMs": 0,
"numRowsResultSet": 140
So from my understanding, the best combination would be a star-tree index for the aggregation and an inverted index for the filtering. Now, when I look at the query explanation, it seems that it uses the star-tree index for filtering:
Operator                                                   Operator_Id  Parent_Id
---------------------------------------------------------  -----------  ---------
BROKER_REDUCE(limit:300)                                   0            -1
COMBINE_GROUPBY_ORDERBY                                    1            0
AGGREGATE_GROUPBY_ORDERBY                                  2            1
TRANSFORM(fields.column1, date)                            3            2
PROJECT(fields.column1, date, distinctCountHLL__hllState)  4            3
FILTER_STARTREE_INDEX                                      5            4
I'm using Pinot 0.10.0 and I have the star-tree index enabled on date, fields.column1, fields.column2 on distinctcounthll__hllState and the inverted index enabled on fields.column2.
Just as a reference, the same query without the filtering takes 277ms with numEntriesScannedInFilter:0 and numEntriesScannedPostFilter:54480.
My question is, how can I further optimise filtering when grouping by and using a star-tree index? Can the star tree index be used in conjunction with the inverted index?
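A quick sanity check on the numbers in the response above, as a sketch: the ratios below use only the counters from the posted stats. The function name `scan_ratios` and the key names it returns are made up for illustration. The interpretation in the note after the code is a reading of the counters, not something confirmed in the thread.

```python
# Counters copied from the query response posted above.
stats = {
    "numDocsScanned": 154861922,
    "numEntriesScannedInFilter": 214829377,
    "numEntriesScannedPostFilter": 464585766,
    "totalDocs": 447509450,
    "timeUsedMs": 25832,
}


def scan_ratios(s: dict) -> dict:
    """Derive a few ratios that show where the scanning work goes."""
    return {
        # Share of all documents that passed the filter and were scanned.
        "docs_scanned_fraction": s["numDocsScanned"] / s["totalDocs"],
        # Entries read while evaluating the filter, per document in the table.
        "filter_entries_per_total_doc": s["numEntriesScannedInFilter"] / s["totalDocs"],
        # Entries read after filtering, per scanned document.
        "post_filter_entries_per_scanned_doc": (
            s["numEntriesScannedPostFilter"] / s["numDocsScanned"]
        ),
    }


ratios = scan_ratios(stats)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

The post-filter ratio comes out to exactly 3 entries per scanned document, which lines up with the three columns in the PROJECT operator (fields.column1, date, distinctCountHLL__hllState). Roughly a third of the table's documents are scanned, and the non-zero numEntriesScannedInFilter (compared with 0 for the unfiltered query) suggests the filter on fields.column2 is being evaluated by scanning rather than resolved entirely from an index.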
Thanks a lot
kauts shukla
06/24/2022, 10:09 AM
Alice
06/24/2022, 10:47 AM
Michael Latta
06/24/2022, 3:35 PM
abhinav wagle
06/24/2022, 9:18 PM
{"code":400,"error":"Invalid table config for table transcript_OFFLINE: Failed to find instances with tag: DefaultTenant_OFFLINE for table: transcript_OFFLINE"}
Diogo Baeder
06/25/2022, 1:48 AM
1 if it matches a certain number X, and to 0 if it doesn't. More on this thread.
Alice
06/25/2022, 4:31 AM
KY
06/25/2022, 4:55 PM