# troubleshooting

    Aaron Weiss

    08/15/2022, 3:40 PM
    Hey, this is more of a question to validate: you cannot have a multi-value / array Metric field in Pinot, correct? i.e. for Dimensions you can specify `"singleValueField": false`. Is there something analogous for Metrics? I ask because we would like to keep our records at a certain granularity with a couple of metrics that are arrays, but we want to keep them as metrics to be able to run aggregates on them.
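For reference, the multi-value knob on the dimension side looks like this in a schema (a sketch; field names and types are made up, and the metric spec shown is the ordinary single-value form the question is about):

```json
{
  "dimensionFieldSpecs": [
    {
      "name": "tags",
      "dataType": "STRING",
      "singleValueField": false
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "clicks",
      "dataType": "LONG"
    }
  ]
}
```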

    Tiger Zhao

    08/15/2022, 9:55 PM
    Hi, looking at https://github.com/apache/pinot/pull/8032 from the 0.10.0 release, my understanding is that we can now batch ingest segments into realtime tables? I just tried doing this with `pinot-admin.sh LaunchDataIngestionJob`, but I get a `Failed to decode table config from JSON` error. Is this expected?
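For context, `LaunchDataIngestionJob` is pointed at a job spec YAML; a minimal sketch of such a spec (paths, table name, and push details are hypothetical and worth checking against the docs for your version):

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://my-bucket/input/'
outputDirURI: 's3://my-bucket/segments/'
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```

A "Failed to decode table config from JSON" error suggests the table config being parsed isn't valid JSON, so validating that file is a cheap first check.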

    Andrew Sunarto

    08/16/2022, 1:09 AM
    Hi guys, I'm new to Pinot. I have a realtime table filled with 2603 documents from a Kafka producer, which continues to produce messages to the topic. The number of documents in Pinot from this topic is no longer increasing, so I'm wondering what could have caused this. Reported/Estimated size for this table is 0 bytes (although there are 2603 valid documents in the table), the status of each of the table's segments is Good, and the server is Consuming in the replica set. Any ideas I can try to get Pinot to continue collecting messages from the Kafka topic?

    Ehsan Irshad

    08/16/2022, 8:15 AM
    Hi. We have Pinot set up on local EC2 instances (no Kubernetes). Just wondering where the ZooKeeper config is stored, or do we have to create zoo.conf manually in the /conf directory? In my ZooKeeper Browser I see the ZooKeeper config is empty. The problem is that ZooKeeper is eating all the disk we assign to it, so it seems some logs are not being purged. Following this thread in particular to debug (@Mayank hope I'm heading in the right direction?) https://apache-pinot.slack.com/archives/C011C9JHN7R/p1650312581517129
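On the disk-usage part: ZooKeeper does not purge old snapshots and transaction logs by default; the `autopurge` settings in `zoo.cfg` control that. A sketch (values illustrative):

```properties
# keep only the 3 most recent snapshots / txn logs...
autopurge.snapRetainCount=3
# ...and run the purge task every hour
autopurge.purgeInterval=1
```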

    Priyank Bagrecha

    08/16/2022, 10:15 PM
    We are deploying Pinot on a k8s cluster using the community-provided helm charts. We are using spot instances for the k8s cluster. Do we have to rebalance the cluster to allow re-assignment whenever a pod gets replaced as the spot instances go away?

    Priyank Bagrecha

    08/16/2022, 11:54 PM
    One more question - we are loading data for an offline table using a Spark job, and we see some segments go into ERROR state. What makes a segment go into ERROR state vs. OFFLINE? Does it mean the download of the segment from S3 to the local server disk failed? If the server retries to grab the segment, how many retries does it do, and is that configurable? Will `Reload Segment` force a download of the segment from S3 to the Pinot server disk?
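For reference, recent Pinot versions expose a `forceDownload` flag on the controller's segment reload endpoint, which is meant to re-fetch the segment from the deep store; a sketch (host, table, and segment names are hypothetical, and the exact path is worth verifying in your controller's Swagger UI):

```shell
curl -X POST "http://localhost:9000/segments/myTable_OFFLINE/myTable_OFFLINE_0/reload?forceDownload=true"
```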

    Tanmesh Mishra

    08/17/2022, 9:11 PM
    Hey team, I am working on this PR and have a question

    Neeraja Sridharan

    08/17/2022, 9:31 PM
    Just wanted to check if there is a recipe for broker side pruning. More info in this thread:

    Scott deRegt

    08/17/2022, 10:52 PM
    Hey team 👋 I'm running into an issue with offline segments in `error` state. When checking the `debug` endpoint and server logs, I am seeing `java.nio.file.NoSuchFileException` on specific segment files' local paths. I've tried rebalancing servers with `downtime` + `bootstrap` toggled, as well as running reload segment with `forceDownload` set to `true`, but still can't seem to clear the `error`s. Any tips on how to repair this error state?

    Ryan Ruane

    08/18/2022, 4:58 PM
    Hi guys, I was wondering whether it would be possible to start an entire cluster in a single Docker image, rather than having to spin up one for each part of the cluster with docker compose. I know that the quick start function will spin up a working cluster, but from the documentation I only see it providing preexisting tables. I would like to create a custom table with all cluster nodes within one Docker image. Do you happen to know whether this would be easily achievable or desirable? The reason I'm asking concerns GitLab CI and issues I was having trying to impose a dependency ordering. From what I understand, GitLab CI lacks some of Docker Compose's features, such as a restart policy, etcetera.
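Worth noting for context: the QuickStart command already runs ZooKeeper, controller, broker, and server inside a single container, so a one-container cluster is possible; a sketch (version tag illustrative, and a custom table would still need to be added afterwards, e.g. via the controller API):

```shell
docker run -p 9000:9000 apachepinot/pinot:0.10.0 QuickStart -type batch
```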

    Ryan Ruane

    08/18/2022, 5:00 PM
    Secondarily, on a completely unrelated note, does anybody know how it would be possible to extract two values from JSON in a query and then compose them into an array? I'm curious to know whether it is possible to extract three values by key and return them as an array of integers.
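I'm not aware of a built-in array constructor in Pinot's query layer, but the per-key extraction part can be sketched with `JSON_EXTRACT_SCALAR` (column name, JSON paths, and default values here are hypothetical); composing the three results into one array would then happen client-side:

```sql
SELECT
  JSON_EXTRACT_SCALAR(payload, '$.a', 'INT', 0) AS a,
  JSON_EXTRACT_SCALAR(payload, '$.b', 'INT', 0) AS b,
  JSON_EXTRACT_SCALAR(payload, '$.c', 'INT', 0) AS c
FROM myTable
```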

    Luis Fernandez

    08/18/2022, 7:17 PM
    question: can I add auth to the controller UI but not necessarily to the cluster as a whole? i.e. can I add auth to the UI without requests to the broker having to be authorized, e.g. via cURL?

    Timothy James

    08/19/2022, 12:32 AM
    Hi Pinot heroes like @Mayank! I'm attempting to use a minion merge rollup task (as @Mayank helpfully suggested), but it seems to do... nothing. The segments aren't getting merged and I'm not seeing any relevant logging in minion or controller logs. Details in thread... but here's what my OFFLINE table's task definition looks like:
    "task": {
          "taskTypeConfigsMap": {
            "MergeRollupTask": {
              "5min.bucketTimePeriod": "5min",
              "5min.bufferTimePeriod": "15min",
              "1hour.bucketTimePeriod": "1h",
              "1hour.bufferTimePeriod": "2h",
              "1day.bucketTimePeriod": "1d",
              "1day.bufferTimePeriod": "1d"
            }
          }
        },
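A common thing to rule out when merge rollup tasks silently do nothing (an assumption, not a diagnosis): minion task generation is driven by the controller's periodic task scheduler, which has to be enabled in the controller config. A sketch (property name from memory; verify against the docs for your version):

```properties
# without this the controller never generates minion tasks
controller.task.scheduler.enabled=true
```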

    Rafael Jeon

    08/19/2022, 9:19 AM
    Hi, I'm trying to run the upsert quickstart example in a Docker environment, but I got the following error:
    docker run \
        -p 9000:9000 \
        apachepinot/pinot:0.9.3 QuickStart \
        -type upsert_json_index
    Unable to find image 'apachepinot/pinot:0.9.3' locally
    0.9.3: Pulling from apachepinot/pinot
    a2abf6c4d29d: Pull complete
    2bbde5250315: Pull complete
    202a34e7968e: Pull complete
    8c484b17211c: Pull complete
    c79d6edef3e3: Pull complete
    9335053f1957: Pull complete
    99fa3710378d: Pull complete
    dd4c492811d1: Pull complete
    1979ceaa5442: Pull complete
    Digest: sha256:fa8e27a6b81732ea238f0c41f85ba2f1f4578e1011c40d007e938f00bb59fb5d
    Status: Downloaded newer image for apachepinot/pinot:0.9.3
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.9.3-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.9.3-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-yammer/pinot-yammer-0.9.3-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-dropwizard/pinot-dropwizard-0.9.3-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-environment/pinot-azure/pinot-azure-0.9.3-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.9.3-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See <http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/opt/pinot/lib/pinot-all-0.9.3-jar-with-dependencies.jar) to method java.lang.Object.finalize()
    WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    ***** Starting Kafka *****
    ***** Starting meetup data stream and publishing to Kafka *****
    javax.websocket.DeploymentException: Handshake error.
    	at org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:656)
    	at org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:694)
    	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    	at org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:848)
    	at java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
    	at org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
    	at org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
    	at org.apache.pinot.tools.streams.MeetupRsvpStream.run(MeetupRsvpStream.java:71)
    	at org.apache.pinot.tools.UpsertJsonQuickStart.execute(UpsertJsonQuickStart.java:86)
    	at org.apache.pinot.tools.admin.command.QuickStartCommand.execute(QuickStartCommand.java:161)
    	at org.apache.pinot.tools.Command.call(Command.java:33)
    	at org.apache.pinot.tools.Command.call(Command.java:29)
    	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    	at picocli.CommandLine.access$1300(CommandLine.java:145)
    	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    	at picocli.CommandLine.execute(CommandLine.java:2078)
    	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:161)
    	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:192)
    Caused by: org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 404.
    	at org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:299)
    	at org.glassfish.tyrus.container.grizzly.client.GrizzlyClientFilter.handleHandshake(GrizzlyClientFilter.java:322)
    	at org.glassfish.tyrus.container.grizzly.client.GrizzlyClientFilter.handleRead(GrizzlyClientFilter.java:291)
    	at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
    	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
    	at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
    	at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
    	at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
    	at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
    	at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:515)
    	at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
    	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
    	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
    	at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
    	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
    	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
    	at java.base/java.lang.Thread.run(Thread.java:829)

    Mark Needham

    08/19/2022, 10:27 AM
    I think that's because meetup disabled the RSVP API

    Mark Needham

    08/19/2022, 10:27 AM
    can you try using 0.10.0?

    Mark Needham

    08/19/2022, 10:27 AM
    I think that quickstart has been fixed in that version
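Putting the suggestion together, the same quickstart with the newer tag would be (a sketch):

```shell
docker run -p 9000:9000 apachepinot/pinot:0.10.0 QuickStart -type upsert_json_index
```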

    Priyank Bagrecha

    08/19/2022, 8:36 PM
    Hello, I am looking for general guidance here. We are loading data for an offline table from AWS S3 using a job spec for a Spark job. The segment size is ~400 MB on disk. I am noticing that the servers run into OOMs while trying to transition segment state after downloading segments to the server disk. We are using 15 servers with 4 CPUs and 32 GB RAM each, with 16 GB for heap, and we are also using off-heap. The servers have 2 TB of disk each, i.e. a total of 30 TB of disk space, and we are loading a total of 2 TB of data. We have also configured an inverted index on some fields in the data.

    Priyank Bagrecha

    08/19/2022, 8:58 PM
    How does Pinot, and especially the server, behave differently between online vs. offline tables?

    Sukesh Boggavarapu

    08/19/2022, 9:03 PM
    I have a realtime table with 3-day retention, and an offline table that gets data loaded daily through an offline ingestion job. So for example, when I query for yesterday's data, does it get queried only from the offline table despite the realtime table also having it? Just wanted to check whether the offline table always takes precedence over the realtime table.

    Jay Bhatt

    08/22/2022, 2:55 PM
    Hello folks 👋, new to the Pinot world, just wanted clarification on one point. We have a `Long` type column in one of the tables. Going ahead, for more precision, we want to maintain a `Double` type column. Would it be possible to directly update the column type in the schema, or should we add a new column and delete the old one? I referred to the schema evolution link, but couldn't determine how to proceed from it.
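For what it's worth, the additive path would be a new `DOUBLE` column alongside the existing one (a sketch; names are hypothetical, and whether an in-place type change is safe is exactly the open question here):

```json
{
  "metricFieldSpecs": [
    { "name": "amount", "dataType": "LONG" },
    { "name": "amount_precise", "dataType": "DOUBLE" }
  ]
}
```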

    Stuart Millholland

    08/22/2022, 4:52 PM
    So we found ourselves in a position where we had to decrease kafka partitions for realtime and ended up deleting our consuming segments. We are looking for how we tell Pinot to start creating consuming segments again.

    Stuart Millholland

    08/22/2022, 7:00 PM
    We've noticed realtime consumption "Seeking to offset" logs are super chatty in the realtime servers. Is there a way to tweak that?

    Luis Fernandez

    08/22/2022, 7:28 PM
    hey friends, i come to you with a query optimization question. we have the following query:
    select sum(impression_count) from metrics where user_id = xxx and product_id = xxxx
    this query, even though it has more selectivity, is slower than this one:
    select sum(impression_count) from metrics where user_id = xxx
    we are currently partitioning on `user_id` and have a bloom filter on `product_id,user_id`. is it because of the partitioning? we also see an elevated `numEntriesScannedInFilter` when we add `product_id` to the query, but without it it is pretty much 0. what do you recommend in this case?
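For context, this is roughly where the partitioning and bloom filter described above live in the table config (a sketch; the function name and `numPartitions` are illustrative):

```json
{
  "tableIndexConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "user_id": {
          "functionName": "Murmur",
          "numPartitions": 8
        }
      }
    },
    "bloomFilterColumns": ["product_id", "user_id"]
  }
}
```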

    Mahesh babu

    08/23/2022, 7:08 AM
    Hi Team, Pinot realtime tables connected to Kafka topics stop updating after some time even though the Kafka topics keep getting populated. This is the debug information:
    [
      {
        "tableName": "DISH12_REALTIME",
        "numSegments": 1,
        "numServers": 1,
        "numBrokers": 1,
        "segmentDebugInfos": [
          {
            "segmentName": "DISH12__0__0__20220817T1159Z",
            "serverState": {
              "Server_10.90.21.84_8098": {
                "idealState": "CONSUMING",
                "externalView": "CONSUMING",
                "segmentSize": "0 bytes",
                "consumerInfo": {
                  "segmentName": "DISH12__0__0__20220817T1159Z",
                  "consumerState": "NOT_CONSUMING",
                  "lastConsumedTimestamp": 1660823957234,
                  "partitionToOffsetMap": {"0": "101151"}
                },
                "errorInfo": {
                  "timestamp": "2022-08-18 11:59:22 UTC",
                  "errorMessage": "Could not build segment",
                  "stackTrace": null
                }
              }
            }
          }
        ],
        "serverDebugInfos": [],
        "brokerDebugInfos": [],
        "tableSize": {
          "reportedSize": "0 bytes",
          "estimatedSize": "0 bytes"
        },
        "ingestionStatus": {
          "ingestionState": "UNHEALTHY",
          "errorMessage": "Segment: DISH12__0__0__20220817T1159Z is not being consumed on server: Server_10.90.21.84_8098"
        }
      }
    ]

    Tony Zhang

    08/23/2022, 7:32 AM
    Hey guys, is tiered storage available for Pinot? If so, how can I configure it with AWS S3? Thanks.

    Kaushik Ranganath

    06/07/2021, 5:19 AM
    When I do a kubectl get all -n pinot-quickstart, I see this has brought up classic load balancers to expose both the broker and the controller on tcp ports, and when I make a curl/browser request to the DNSs, I expect these to show up the UI for the broker and the Swagger UI for the controller, but the request times out eventually without bringing up the UI. I am a beginner in AWS Networking as well, but the security groups created by these setup instructions which I have followed exactly allows TCP requests from 0.0.0.0/0. Any inputs on bringing the UI up for the broker and server are much appreciated!

    Luis Fernandez

    08/23/2022, 7:33 PM
    hello friends, me again, with a different issue: we are executing queries in Pinot and getting the following exception at query time, not for all of them, just a few:
    QueryExecutionError:
    java.lang.IndexOutOfBoundsException
    	at java.base/java.nio.Buffer.checkBounds(Buffer.java:714)
    	at java.base/java.nio.DirectByteBuffer.get(DirectByteBuffer.java:288)
    	at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getStringCompressed(VarByteChunkSVForwardIndexReader.java:81)
    	at org.apache.pinot.segment.local.segment.index.readers.forward.VarByteChunkSVForwardIndexReader.getString(VarByteChunkSVForwardIndexReader.java:61)
    query looks like this:
    SELECT SUM(impression_count) as imp_count, stemmed_query FROM query_metrics WHERE user_id = xxx AND product_id = xxx AND serve_time BETWEEN 1660622400 AND 1661227199 GROUP BY stemmed_query ORDER BY impression_count LIMIT 100000
    stats:
    "numServersQueried": 2,
        "numServersResponded": 2,
        "numSegmentsQueried": 11,
        "numSegmentsProcessed": 10,
        "numSegmentsMatched": 10,
        "numConsumingSegmentsQueried": 1,
        "numDocsScanned": 16241,
        "numEntriesScannedInFilter": 5862,
        "numEntriesScannedPostFilter": 64964,
        "numGroupsLimitReached": false,
        "totalDocs": 77203847,
        "timeUsedMs": 133,
        "offlineThreadCpuTimeNs": 0,
        "realtimeThreadCpuTimeNs": 0,
        "offlineSystemActivitiesCpuTimeNs": 0,
        "realtimeSystemActivitiesCpuTimeNs": 0,
        "offlineResponseSerializationCpuTimeNs": 0,
        "realtimeResponseSerializationCpuTimeNs": 0,
        "offlineTotalCpuTimeNs": 0,
        "realtimeTotalCpuTimeNs": 0,
        "segmentStatistics": [],
        "traceInfo": {},
        "minConsumingFreshnessTimeMs": 1661283161852,
        "numRowsResultSet": 100

    Devang Shah

    08/23/2022, 11:37 PM
    Another question: I searched through the logs for the job that has been CrashLooping. Here's what I found. Any clues on how I can fix this?
    Defaulted container "pinot-add-example-realtime-table-json" out of: pinot-add-example-realtime-table-json, pinot-add-example-realtime-table-avro
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.11.0-SNAPSHOT-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.11.0-SNAPSHOT-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-yammer/pinot-yammer-0.11.0-SNAPSHOT-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-metrics/pinot-dropwizard/pinot-dropwizard-0.11.0-SNAPSHOT-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-environment/pinot-azure/pinot-azure-0.11.0-SNAPSHOT-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-stream-ingestion/pinot-pulsar/pinot-pulsar-0.11.0-SNAPSHOT-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.codehaus.groovy.reflection.CachedClass (file:/opt/pinot/lib/pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar) to method java.lang.Object.finalize()
    WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.reflection.CachedClass
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    2022/08/23 23:29:55.035 INFO [AddTableCommand] [main] Executing command: AddTable -tableConfigFile /var/pinot/examples/airlineStats_realtime_table_config.json -schemaFile /var/pinot/examples/airlineStats_schema.json -controllerProtocol http -controllerHost pinot-controller -controllerPort 9000 -user null -password [hidden] -exec
    2022/08/23 23:29:55.765 INFO [AddTableCommand] [main] {"code":500,"error":"org.apache.pinot.shaded.org.apache.kafka.common.KafkaException: Failed to construct kafka consumer"}

    Devang Shah

    08/23/2022, 11:31 PM
    Hello friends, I have been busy deploying the Pinot Quickstart on Kubernetes in Azure. Everything went well until deploying the pods and services and seeing them running. I was even able to access the controller through my browser. But then the moments of glory did not last long: my controller is not responding anymore, and I can't load the broker web page. See the screenshot below of the current state of the pods and services. Except for the failure of a job, I don't see any other pod or service failing. Has anyone faced this issue? Do you know what could be going wrong? Could it be that my k8s cluster is undersized?