Apache Pinot #troubleshooting

Priyank Bagrecha

06/22/2022, 7:19 AM

i am trying to use valuein and distinctcounthll from looker to query pinot data via trino. i get this error

Query failed (#20220622_071712_00006_mi58r): line 4:24: Function 'valuein' not registered

Priyank Bagrecha

06/22/2022, 8:30 AM

Trino lowercases the query when passing it to Pinot. As a result, a query with a predicate like field = 'Some Value' returns no result because it gets translated to field = 'some value'. Did anyone figure out a way to resolve this issue?

Lars-Kristian Svenøy

06/22/2022, 9:58 AM

Hello team 👋 Is there any way to remove fields from a schema? I know that this is counted as a breaking change, but it would be nice to be able to do it anyway if it was explicitly intentional. I have a field right now which is set to null, and which we've decided to not include. The only way right now to get rid of that field would be to delete the entire table, recreate the schema and start ingesting the data again, but this isn't a very friendly solution. Any suggestions?

Rakesh Bobbala

06/22/2022, 2:45 PM

Hello Team, my realtime table stops consuming records from the Kafka topic after reaching the "segment.flush.threshold.size": "100000". Am I missing some configuration?

Rakesh Bobbala

06/22/2022, 2:48 PM

So when I run the below query, it keeps returning 100000.

select count(*) from table

Also, the segments are not getting pushed to the s3 bucket after the threshold.

Rakesh Bobbala

06/22/2022, 4:33 PM

Can someone help me with the below error?

mkdir <s3://test_bucket/test_key_1/data/rakesh_test>
Copying uri <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1609Z.tmp.c2fcb060-bf45-487c-b025-c0b27c601f38> to uri <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1609Z>
Deleting uri <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1609Z> force true
Caught exception while committing segment file for segment: rakesh_test__0__0__20220622T1609Z
software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403

The controller was able to create the tmp files. But still I see access denied

Rakesh Bobbala

06/22/2022, 4:34 PM

i tried both

pinot.controller.storage.factory.s3.disableAcl=true

and

pinot.controller.storage.factory.s3.disableAcl=false

Rakesh Bobbala

06/22/2022, 4:34 PM

no luck

Michael Latta

06/22/2022, 4:34 PM

Looks like your s3 credentials need to be looked at

➕ 1

Stuart Millholland

06/22/2022, 5:15 PM

We are running an init shell script that runs a few curl commands to create our tables. We sometimes see the error {"code":409,"error":"Table mutable_events_REALTIME already exists"} failed to create the mutable_events table, even when we know the table doesn't exist. Is this a known issue? Is there some sort of lag after you delete a table before it is recognized as being gone? Using the Swagger API, we verified that there are no tables in the system, but the create still fails with this error.
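One way for an init script to rule out a deletion lag is to poll until the table no longer appears in the table list before re-creating it. The sketch below is a minimal example under assumptions: `wait_until_table_gone` and the injected `list_tables` callable are hypothetical names, and the script is assumed to have some way to fetch the current table names (for instance from the controller's table listing). It does not confirm that such a lag exists in Pinot.

```python
import time
from typing import Callable, Iterable


def wait_until_table_gone(
    list_tables: Callable[[], Iterable[str]],
    table_name: str,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until table_name is no longer listed.

    list_tables is a hypothetical callable returning the current table
    names. Returns True once the table is gone, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if table_name not in set(list_tables()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The create request would then be sent only when the function returns True; a False result suggests leftover state rather than a short delay.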

Rakesh Bobbala

06/22/2022, 7:59 PM

Can someone help with the below S3 access issue

Copy /tmp/pinot-tmp-data/fileUploadTemp/rakesh_test__0__0__20220622T1821Z.a804f229-8b20-4b26-be2a-36c97eecd56b from local to <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z.tmp.295786f8-ca70-4929-bacc-8685d7eed4d5>
Response to segmentUpload for segment:rakesh_test__0__0__20220622T1821Z is:{"offset":143796,"status":"UPLOAD_SUCCESS","isSplitCommitType":false,"segmentLocation":"<s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z.tmp.295786f8-ca70-4929-bacc-8685d7eed4d5>","streamPartitionMsgOffset":"143796","buildTimeSec":-1}
Handled request from 10.0.1.187 POST <http://pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local:9000/segmentUpload?segmentSizeBytes=151514&buildTimeMillis=119&streamPartitionMsgOffset=143796&instance=Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098&offset=-1&name=rakesh_test__0__0__20220622T1821Z&rowCount=1000&memoryUsedBytes=510612>, content-type multipart/form-data; boundary=XMJs_YLCxWLk2ADRtRKgDg5q7PaR5pB4_Bkwm3 status code 200 OK
Processing segmentCommitEndWithMetadata:Offset: -1,Segment name: rakesh_test__0__0__20220622T1821Z,Instance Id: Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098,Reason: null,NumRows: 1000,BuildTimeMillis: 119,WaitTimeMillis: 0,ExtraTimeSec: -1,SegmentLocation: <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z.tmp.295786f8-ca70-4929-bacc-8685d7eed4d5>,MemoryUsedBytes: 510612,SegmentSizeBytes: 151514,StreamPartitionMsgOffset: 143796
Processing segmentCommitEnd(Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098, 143796)
Committing segment rakesh_test__0__0__20220622T1821Z at offset 143796 winner Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098
Committing segment file for segment: rakesh_test__0__0__20220622T1821Z
mkdir <s3://test_bucket/test_key_1/data/rakesh_test>
Copying uri <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z.tmp.295786f8-ca70-4929-bacc-8685d7eed4d5> to uri <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z>
Deleting uri <s3://test_bucket/test_key_1/data/rakesh_test/rakesh_test__0__0__20220622T1821Z> force true
Caught exception while committing segment file for segment: rakesh_test__0__0__20220622T1821Z
software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: TNEVQ80HE8YYE6QV, Extended Request ID: s2urpeZBQG+wFdNvBN/AC57hEzwBOJy4kQJMP/rKOpJZHQLsfTQMV5ghT3bF2XatKwxTqmjP0UQ=)

Rakesh Bobbala

06/22/2022, 7:59 PM

checked all the permissions. But, couldn't resolve this
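The log above shows the commit path writing a temporary object, copying it to the final key, and deleting a key, so a role that allows only uploads could plausibly still fail with a 403 at the copy or delete step. The sketch below builds an IAM policy document covering list, read, write, and delete on the segment prefix, so it can be compared against the policy attached to the controller. The action list is an assumption about what this path needs, not something confirmed in this thread, and `build_segment_store_policy` is a hypothetical helper.

```python
import json


def build_segment_store_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy dict for a deep-store prefix (assumed action set)."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself, not to objects.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level actions apply to keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


if __name__ == "__main__":
    policy = build_segment_store_policy("test_bucket", "test_key_1/data")
    print(json.dumps(policy, indent=2))
```

Bucket policies, object ownership settings, and KMS key permissions can also produce a 403, and this sketch does not cover them.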

Tiger Zhao

06/22/2022, 8:40 PM

Hi, is it possible to update the primaryKeyColumns for an existing upsert table and have it take effect? Or would I need to recreate the table?

Stuart Millholland

06/23/2022, 12:20 AM

Any insight on why, for the same table, some of my deepstore segments are named like this: immutable_events__8__0__20220622T1721Z655cec28-e5c8-4500-b18c-abbbc9d77b47 and some are named like this: immutable_events__5__0__20220622T1721Z?
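The two names differ only in the UUID appended after the creation timestamp. A small parser makes that explicit when scanning a deep store listing. The sketch below assumes the four-part table__partition__sequence__timestamp layout visible in the names above; `parse_segment_name` and `SegmentName` are hypothetical names, and the code only detects the suffix, it does not explain why it is there.

```python
import re
from typing import NamedTuple, Optional

# Creation time as it appears in the names above, e.g. 20220622T1721Z,
# optionally followed by extra characters such as a UUID.
_TIMESTAMP = re.compile(r"^(\d{8}T\d{4}Z)(.*)$")


class SegmentName(NamedTuple):
    table: str
    partition: int
    sequence: int
    created: str
    suffix: Optional[str]


def parse_segment_name(name: str) -> SegmentName:
    """Split a segment name and separate any trailing suffix."""
    parts = name.rsplit("__", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected segment name: {name!r}")
    table, partition, sequence, tail = parts
    match = _TIMESTAMP.match(tail)
    if match is None:
        raise ValueError(f"unexpected timestamp in: {name!r}")
    created, suffix = match.groups()
    return SegmentName(table, int(partition), int(sequence), created, suffix or None)
```

Filtering a listing on `suffix is not None` separates the suffixed objects from those with plain names.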

ahmed

06/23/2022, 1:20 AM

Hi, I started working on Pinot this week and read a couple of articles on how to write a UDF. I created one, but after importing it into plugins and lib, Pinot doesn't recognize the function and I don't know why. Can you help? Here is the UDF code: https://gist.github.com/AhmedElsagher/fd941e7d6d9607167a52825c8e370d03

Alice

06/23/2022, 6:17 AM

Hi team. I noticed the sample config for segmentPartitionConfig in Pinot, and I have a question about it. If the highlevel type is used, is it OK to use a partition config in Pinot that differs from the Kafka partition config, e.g. set numPartitions to a value different from the Kafka topic's partition count? Or, if no partition key is set in Kafka, will it take effect in Pinot if a column is used in columnPartitionMap?

Lars-Kristian Svenøy

06/23/2022, 8:44 AM

Hello team. Quick question, does pool-based tagging also work for brokers?

Mohamed Emad

06/23/2022, 10:36 AM

Hello, we have a Pinot cluster, and we noticed that after adding real-time tables the server status became dead with the following error: ''' "_code": 404, "_error": "ZKPath /pinot-quickstart/LIVEINSTANCES/Server_pinot-release-server-1.pinot-release-server-headless.default.svc.cluster.local_8098 does not exist: ''' When I restart the server pod, the status becomes healthy. Has anyone faced this issue before?

Alice

06/23/2022, 3:40 PM

Hi, how can I tune the performance of a query like 'select count(distinct user_id) from table_name'? When I run such a query, it reports that several servers did not respond. 😂 This table has about 100 million rows, and an inverted index is created on user_id.

Alice

06/24/2022, 2:49 AM

Hi team, can I use upsert in a realtime table and use RealtimeToOfflineSegmentsTask at the same time? Will this task make sure rows in the offline table are upserted?

abhinav wagle

06/24/2022, 3:44 AM

Hi Team, when I start the broker locally, where can I tail the broker logs? This is what I see during startup, after which no activity logs appear. Do they get redirected to some file?

export JAVA_OPTS="-Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
./bin/pinot-admin.sh StartBroker \
    -zkAddress localhost:2191
[0.006s][warning][gc] -Xloggc is deprecated. Will use -Xlog:gc:gc-pinot-broker.log instead.
2022/06/23 20:41:49.149 INFO [StartBrokerCommand] [main] Executing command: StartBroker -brokerHost null -brokerPort 8099 -zkAddress localhost:2191
2022/06/23 20:41:49.161 INFO [StartServiceManagerCommand] [main] Executing command: StartServiceManager -clusterName PinotCluster -zkAddress localhost:2191 -port -1 -bootstrapServices []
2022/06/23 20:41:49.161 INFO [StartServiceManagerCommand] [main] Starting a Pinot [SERVICE_MANAGER] at 0.341s since launch
2022/06/23 20:41:49.165 INFO [StartServiceManagerCommand] [main] Started Pinot [SERVICE_MANAGER] instance [ServiceManager_192.168.50.11_-1] at 0.346s since launch
2022/06/23 20:41:49.167 INFO [StartServiceManagerCommand] [Start a Pinot [BROKER]] Starting a Pinot [BROKER] at 0.347s since launch
Jun 23, 2022 8:41:53 PM org.glassfish.grizzly.http.server.NetworkListener start
INFO: Started listener bound to [0.0.0.0:8099]
Jun 23, 2022 8:41:53 PM org.glassfish.grizzly.http.server.HttpServer start
INFO: [HttpServer] Started.
2022/06/23 20:41:57.595 INFO [StartServiceManagerCommand] [Start a Pinot [BROKER]] Started Pinot [BROKER] instance [Broker_192.168.50.11_8099] at 8.775s since launch

Alice

06/24/2022, 4:24 AM

Hi, is there a possibility that Pinot will create 2 rows for one Kafka message?

Tommaso Peresson

06/24/2022, 9:47 AM

Hi everybody, I have a question for you. I'm trying to figure out the best combination of filters for queries like:

select date, 
         fields.column1, 
         distinctcounthll(hllState)
  from EventsHll 
  where fields.column2 in (1,2,10)
  group by date, fields.column1 
  limit 300

And I get suboptimal performance:

"numServersQueried": 2,
  "numServersResponded": 2,
  "numSegmentsQueried": 1816,
  "numSegmentsProcessed": 1816,
  "numSegmentsMatched": 1816,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 154861922,
  "numEntriesScannedInFilter": 214829377,
  "numEntriesScannedPostFilter": 464585766,
  "numGroupsLimitReached": false,
  "totalDocs": 447509450,
  "timeUsedMs": 25832,
  "offlineThreadCpuTimeNs": 0,
  "realtimeThreadCpuTimeNs": 0,
  "offlineSystemActivitiesCpuTimeNs": 0,
  "realtimeSystemActivitiesCpuTimeNs": 0,
  "offlineResponseSerializationCpuTimeNs": 0,
  "realtimeResponseSerializationCpuTimeNs": 0,
  "offlineTotalCpuTimeNs": 0,
  "realtimeTotalCpuTimeNs": 0,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0,
  "numRowsResultSet": 140

So from my understanding the best combination would be a star-tree index for the aggregation and an inverted index for the filtering. Now when I look at the query explanation it seems that it uses the star-tree index filtering

Operator#$%0                                                       Operator_Id#$%1    Parent_Id#$%2   
 ------------------------------------------------------------------ ------------------ ---------------- 
  "BROKER_REDUCE(limit:300)"                                         "0"                "-1"            
  "COMBINE_GROUPBY_ORDERBY"                                          "1"                "0"             
  "AGGREGATE_GROUPBY_ORDERBY"                                        "2"                "1"             
  "TRANSFORM(fields.column1, date)"                                  "3"                "2"             
  "PROJECT(fields.column1, date, distinctCountHLL__hllState)"        "4"                "3"             
  "FILTER_STARTREE_INDEX"                                            "5"                "4"

I'm using Pinot 0.10.0 and I have the star-tree index enabled on date, fields.column1, fields.column2 and distinctcounthll__hllState, and the inverted index enabled on fields.column2. Just as a reference, the same query without the filtering takes 277ms with numEntriesScannedInFilter:0 and numEntriesScannedPostFilter:54480. My question is, how can I further optimise filtering when grouping by and using a star-tree index? Can the star-tree index be used in conjunction with the inverted index? Thanks a lot
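When comparing runs like the two above, it can help to reduce the broker response to a few ratios instead of reading raw counters. The sketch below is one way to do that from the fields shown in the response; `scan_ratios` is a hypothetical helper, and the ratios are plain arithmetic on the reported numbers, not a diagnosis of which index was used.

```python
def scan_ratios(stats: dict) -> dict:
    """Derive simple ratios from a Pinot broker response's counters."""
    docs = stats["numDocsScanned"]
    total = stats["totalDocs"]
    return {
        # Share of the table's documents that passed the filter.
        "fraction_docs_scanned": docs / total if total else 0.0,
        # Entries read while evaluating the filter, per matched document.
        "filter_entries_per_doc": (
            stats["numEntriesScannedInFilter"] / docs if docs else 0.0
        ),
        # Entries read after filtering, per matched document.
        "post_filter_entries_per_doc": (
            stats["numEntriesScannedPostFilter"] / docs if docs else 0.0
        ),
    }


if __name__ == "__main__":
    filtered_run = {
        "numDocsScanned": 154861922,
        "numEntriesScannedInFilter": 214829377,
        "numEntriesScannedPostFilter": 464585766,
        "totalDocs": 447509450,
    }
    for name, value in scan_ratios(filtered_run).items():
        print(f"{name}: {value:.3f}")
```

For the filtered run, the post-filter ratio comes out at exactly 3 entries per document, which is consistent with the three columns in the PROJECT step of the plan.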

kauts shukla

06/24/2022, 10:09 AM

@All : I have moved to the latest version 0.10.0 and set up on EC2 Graviton machines. There are no errors, but I am not able to consume from Kafka. Strangely, there are no error logs in the server logs.

Alice

06/24/2022, 10:47 AM

Does Pinot 0.11 disable Groovy by default? I tried pinot-admin.sh to upload a table and it returned the following error 😅: {"code":400,"error":"Groovy filter functions are disabled for table config

Michael Latta

06/24/2022, 3:35 PM

When I attempt to create a real time table but get the kafka connect string wrong Pinot is left in an inconsistent state. Attempting to create the table fails with message that the table already exists, but the table is not listed in UI or swagger as an existing table, and an attempt to delete it using swagger fails. I am not sure how to clean this up other than rebuilding the cluster, which is undesirable once we actually start to rely on it.

abhinav wagle

06/24/2022, 9:18 PM

Hi Team, I am trying to load this https://docs.pinot.apache.org/basics/getting-started/pushing-your-data-to-pinot table on a locally running pinot instance and seeing this :

{"code":400,"error":"Invalid table config for table transcript_OFFLINE: Failed to find instances with tag: DefaultTenant_OFFLINE for table: transcript_OFFLINE"}

Diogo Baeder

06/25/2022, 1:48 AM

Hi folks, I need some help to understand how I can do some numeric conversions within queries. I want to be able to convert an integer to

if it matches a certain number X, and to

if it doesn't. More on this thread.

Alice

06/25/2022, 4:31 AM

Hi team, we're using Pinot master and found that the Kafka SSL configuration in the "streamConfigs" section has changed. The previous config looked like "sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule and the latest config needs to be like the following: "sasl.jaas.config": "org.apache.pinot.shaded.org.apache.kafka.common.security.scram.ScramLoginModule So, just to confirm, is this change temporary, or will it be used in future versions?
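Until it is clear which class name a given build expects, table configs can be generated with the login module prefix as a switch rather than hard-coded. The sketch below is a minimal example under assumptions: `scram_jaas_config` is a hypothetical helper, the username and password are placeholders, and the `required username=... password=...;` tail is the standard Kafka JAAS form rather than something quoted in this thread.

```python
KAFKA_SCRAM_MODULE = "org.apache.kafka.common.security.scram.ScramLoginModule"
PINOT_SHADED_PREFIX = "org.apache.pinot.shaded."


def scram_jaas_config(username: str, password: str, shaded: bool = False) -> str:
    """Build a sasl.jaas.config value, optionally with the shaded class name."""
    module = KAFKA_SCRAM_MODULE
    if shaded:
        # Class name reported above for current Pinot master.
        module = PINOT_SHADED_PREFIX + module
    return f'{module} required username="{username}" password="{password}";'


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    stream_configs = {
        "sasl.jaas.config": scram_jaas_config("user", "secret", shaded=True),
    }
    print(stream_configs["sasl.jaas.config"])
```

Keeping the prefix behind one flag means a later change in the expected class name touches a single line.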

06/25/2022, 4:55 PM

We were trying peer download as per the following doc: https://docs.pinot.apache.org/operators/operating-pinot/decoupling-controller-from-the-data-path#overview-of-peer-download-policy We noticed that there is an option to download from the replica's peer using LocalPinotFS. What should the value of pinot.server.instance.segment.store.uri be when used in combination with file://dir? Our goal is to bypass the controller and rely on peer download in case of offline assignments of segments. We want to try out the behavior with and without deep store.