yelim yu
10/21/2021, 3:15 PM
Philippe Dooze
10/22/2021, 1:23 PM
Grant Sherrick
10/22/2021, 3:09 PM
Charles
10/25/2021, 3:03 AM
Sadim Nadeem
10/25/2021, 5:11 AM
Tiger Zhao
10/25/2021, 8:33 PM
java.lang.IllegalStateException: PinotFS for scheme: s3 has not been initialized
Is there a way to configure the minions to be able to read from S3? I couldn't find anything in the docs. Thanks!
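Note: minions should accept the same PinotFS factory keys as the server and controller configs shown later in this log, under the pinot.minion prefix. A minimal sketch (the region value is illustrative):
pinot.minion.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.minion.storage.factory.s3.region=us-east-1
pinot.minion.segment.fetcher.protocols=file,http,s3
pinot.minion.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher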
bc Wong
10/25/2021, 10:23 PM
1. Created table via AddTable. Loaded data from Oct 1 via ImportData.
2. Query select count(1) from tbl where ds = '2021-10-01' ran successfully.
3. Added REALTIME table via web UI. Kafka ingested a bunch of data for ds = '2021-10-03'. Query shows new data.
4. But the query from #2 now returns no rows. I have to query against tbl_OFFLINE to see the offline records.
Many thanks!
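Note: this matches standard hybrid-table routing. The broker computes a time boundary from the OFFLINE table's latest end time (typically minus one push-frequency interval, e.g. one day) and splits every query against tbl into roughly the pair below; with offline data only for Oct 1, the boundary lands before Oct 1, so the ds = '2021-10-01' predicate is routed to the REALTIME side, which has no Oct 1 rows. The rewritten queries illustrate the routing and are not literal broker output:
select count(1) from tbl_OFFLINE where ds = '2021-10-01' AND ds <= <timeBoundary>
select count(1) from tbl_REALTIME where ds = '2021-10-01' AND ds > <timeBoundary>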
Girish Patel
10/26/2021, 10:21 AM
Tamás Nádudvari
10/26/2021, 1:33 PM
Diogo Baeder
10/26/2021, 2:00 PM
suraj kamath
10/27/2021, 6:18 AM
"message": "QueryExecutionError:\norg.apache.pinot.spi.exception.BadQueryRequestException: Caught exception while initializing transform function: lookup\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:207)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:56)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:56)\n\tat org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:83)\n\tat org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:100)\n\tat org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:33)\n\tat org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:296)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:216)\n\tat org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:139)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)",
"errorCode": 200
},
From the error message, it's not very clear to me what the issue is here. Can anyone help?
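Note: lookup requires its first argument to be a dimension table, i.e. an OFFLINE table flagged isDimTable whose schema declares the join keys as primary keys so Pinot can replicate it to every server; if that flag or the primary keys are missing, the transform function fails at initialization, which would match this error. A sketch using the names from the query posted below:
"tableName": "dimTable",
"tableType": "OFFLINE",
"isDimTable": true
...and in the dimTable schema:
"primaryKeyColumns": ["orgId", "userId"]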
Mahesh babu
10/27/2021, 6:31 AM
suraj kamath
10/27/2021, 9:22 AM
SELECT column1
,column2
,lookup('dimTable', 'username', 'orgId', orgId, 'userId', userId) AS username
FROM tableA
WHERE column2 IN ('Good')
AND column1 != 'Unknown'
AND username = 'user'
limit 100
This returns me the result.
However, the moment I add either a count(*) in the select statement or a "group by", the query keeps on loading.
E.g.:
SELECT column1
,column2
,orgId
,lookup('dimTable', 'username', 'orgId', orgId, 'userId', userId) AS username
FROM tableA
WHERE column2 IN ('Good')
AND column1 != 'Unknown'
AND username = 'user'
GROUP BY column1, column2, orgId, username
limit 100
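Note: the grouped query above selects no aggregate; assuming the intent is a grouped count, the aggregation form would look like this (same columns and filters as above):
SELECT column1, column2, orgId,
  lookup('dimTable', 'username', 'orgId', orgId, 'userId', userId) AS username,
  count(*)
FROM tableA
WHERE column2 IN ('Good')
  AND column1 != 'Unknown'
  AND username = 'user'
GROUP BY column1, column2, orgId, username
LIMIT 100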
Lars-Kristian Svenøy
10/27/2021, 10:00 AM
Luis Fernandez
10/27/2021, 7:13 PM
Abhishek Saini
10/28/2021, 5:59 PM
hardik
10/29/2021, 6:36 AM
Yeongju Kang
10/29/2021, 7:31 AM
Sadim Nadeem
10/29/2021, 3:20 PM
Elon
10/29/2021, 9:00 PM
Tony Requist
10/30/2021, 1:08 AM
"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "4h",
"realtime.segment.flush.threshold.segment.size": "40M",
and I am storing segment files in S3. I am seeing a huge number of files in S3 like
TABLE__0__0__20211029T1835Z.tmp.420158c9-1742-4bd2-bbae-5a59d2205cd2
that are all much smaller than 40M. There are several thousand files. What are these?
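Note on how these three settings interact (comments added for context; values from the config above):
"realtime.segment.flush.threshold.rows": "0"            // 0 disables the row-count trigger, so segment size drives the flush
"realtime.segment.flush.threshold.time": "4h"           // commit at the latest after 4h of consuming
"realtime.segment.flush.threshold.segment.size": "40M"  // target size Pinot tunes completed segments toward
The .tmp.<uuid> names suggest intermediate files from in-progress or abandoned split-commit segment uploads rather than completed segments, though that reading is an inference from the naming, not confirmed here.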
Sadim Nadeem
10/31/2021, 4:29 AM
kubectl delete pods <pod>
Sandeep R
10/31/2021, 11:30 PM
), currentOffset=1939499, numRowsConsumedSoFar=1, numRowsIndexedSoFar=1
2021/10/31 23:08:26.026 ERROR [LLRealtimeSegmentDataManager_pnrevents__0__0__20211031T2306Z] [pnrevents__0__0__20211031T2306Z] Caught exception while transforming the record: {
"fieldToValueMap" : {
"pt" : null,
"osia" : null,
"excludePII" : 0,
"pcc" : null,
"lname" : null,
"mel" : null,
"tkt" : null,
"kafkaProps" : null,
"timestamp" : null,
"ver" : null,
"dts" : null,
"proxyUrl" : "<http://xyz.com:8080|xyz.com:8080>",
"docid" : null,
"rlc" : null,
"rcode" : null,
"message" : {
"jsver" : "1",
"core" : "1g",
"pcc" : "ABCD",
"notif" : "abc",
"ver" : "5",
"dts" : "20211022",
"lname" : [ "JONES" ],
"pt" : "2021-10-22T02:22:48.196",
"mel" : [ "MULTIPAX.V6@XYZ.COM" ],
"docid" : "1a0343cdrdc455",
"rlc" : "1234F"
},
"itina" : null,
"gname" : null,
"url" : "<https://xyz.com/pir/abc>",
"agencyName" : "MTT",
"securityToken" : "xxxxxxxxxxxxx",
"jsver" : null,
"core" : null,
"notif" : null,
"nama" : null,
"npnr" : null,
"kafka" : null,
"emd" : null,
"phonea" : null
},
"nullValueFields" : [ ]
}
java.lang.RuntimeException: Caught exception while transforming data type for column: message
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:120) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.segment.local.recordtransformer.CompositeTransformer.transform(CompositeTransformer.java:82) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:510) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:417) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:560) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.IllegalStateException: Cannot read single-value from Collection: [1, 1g, ABCD, abc, 5, 20211022, [Ljava.lang.Object;@4049e608, 2021-10-22T02:22:48.196, [Ljava.lang.Object;@3eb034ea, 1, 1234F] for column: message
at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:721) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardizeCollection(DataTypeTransformer.java:199) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:144) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:90) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
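Note: the record shows message arriving as a nested object while the destination column is single-valued, which is exactly what the "Cannot read single-value from Collection" check rejects. One common workaround, assuming the goal is to keep the whole object, is to store it as a JSON string via an ingestion transform; message_json below is a hypothetical STRING column added for that purpose:
"ingestionConfig": {
  "transformConfigs": [
    { "columnName": "message_json", "transformFunction": "jsonFormat(message)" }
  ]
}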
Kamal Chavda
11/01/2021, 6:38 PM
Map
11/02/2021, 3:59 PM
What is the difference between replication and replicasPerPartition? Currently we only have replicasPerPartition set to 2, but in the segment builds their configs still show the number of replicas as 1.
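Note: replication applies to OFFLINE segments while replicasPerPartition governs REALTIME (consuming) segments, so a hybrid setup usually needs both in segmentsConfig. A sketch with the value from the question:
"segmentsConfig": {
  "replication": "2",
  "replicasPerPartition": "2"
}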
Stuart Coleman
11/02/2021, 8:55 PM
bin/pinot-admin.sh AvroSchemaToPinotSchema -timeColumnName fields.hoursSinceEpoch -avroSchemaFile /tmp/test.avsc -pinotSchemaName myTable -outputDir /tmp/test -fieldsToUnnest entries
With the schema in the PR (https://github.com/yupeng9/incubator-pinot/blob/660a70831cf0f7fc5a63c2f2c902c9c1f9[…]pinot-avro-base/src/test/resources/fake_avro_nested_schema.avsc) I get the exception below - any idea what I am doing wrong?
Exception caught:
java.lang.RuntimeException: Caught exception while extracting data type from field: entries
at org.apache.pinot.plugin.inputformat.avro.AvroUtils.extractFieldDataType(AvroUtils.java:252) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.plugin.inputformat.avro.AvroUtils.getPinotSchemaFromAvroSchema(AvroUtils.java:69) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.plugin.inputformat.avro.AvroUtils.getPinotSchemaFromAvroSchemaFile(AvroUtils.java:148) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.tools.admin.command.AvroSchemaToPinotSchema.execute(AvroSchemaToPinotSchema.java:99) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:166) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:186) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
Caused by: java.lang.IllegalStateException: Not one field in the RECORD schema
at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.plugin.inputformat.avro.AvroUtils.extractSupportedSchema(AvroUtils.java:280) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.plugin.inputformat.avro.AvroUtils.extractFieldDataType(AvroUtils.java:247) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
... 5 more
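Note: "Not one field in the RECORD schema" comes from Pinot's Avro type extraction hitting a nested record whose fields it cannot map. For comparison, a minimal schema of the shape -fieldsToUnnest entries expects (an array of records with primitive fields; all names here are illustrative):
{
  "type": "record",
  "name": "Test",
  "fields": [
    { "name": "fields", "type": { "type": "record", "name": "Fields", "fields": [
      { "name": "hoursSinceEpoch", "type": "long" } ] } },
    { "name": "entries", "type": { "type": "array", "items": {
      "type": "record", "name": "Entry", "fields": [
        { "name": "id", "type": "long" },
        { "name": "description", "type": "string" } ] } } }
  ]
}
If this converts cleanly but the PR's schema does not, the failure is in that schema's nested/union types rather than in the unnest flag.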
Anish Nair
11/03/2021, 11:15 AM
2021/11/03 08:29:22.109 INFO [SegmentFetcherFactory] [HelixTaskExecutor-message_handle_thread] Segment fetcher is not configured for protocol: s3, using default
2021/11/03 08:29:22.109 WARN [PinotFSSegmentFetcher] [HelixTaskExecutor-message_handle_thread] Caught exception while fetching segment from: s3://pinot-db/pinot-ingestion/mytable/mytable_OFFLINE_2021091800_2021091800_0.tar.gz to: /tmp/data/pinotSegments/mytable_OFFLINE/tmp-mytable_OFFLINE_2021091800_2021091800_0-90b8d75e-b2e8-4e4f-b115-36e5528c37cf/mytable_OFFLINE_2021091800_2021091800_0.enc
java.lang.IllegalStateException: PinotFS for scheme: s3 has not been initialized
at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:518) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
at org.apache.pinot.spi.filesystem.PinotFSFactory.create(PinotFSFactory.java:78) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
Following are our configs:
Server conf:
pinot.server.instance.enable.split.commit=true
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-east-1
pinot.server.segment.fetcher.protocols=s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
Controller conf:
controller.data.dir=s3://pinot-db/
controller.local.temp.dir=/tmp/pinot/
controller.enable.split.commit=true
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-east-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
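Note: these keys match the documented S3 setup, so one thing worth ruling out (a guess, not a confirmed diagnosis) is whether the pinot-s3 plugin is actually loaded on the server at startup, e.g. via the plugin system properties; the plugins.dir path here is illustrative:
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-s3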
Luis Fernandez
11/03/2021, 3:36 PM
Map
11/04/2021, 5:54 PM
We have a table table1 with fields X in upsert mode. When a new field Y is added to the schema, a simple query select * from table1 limit 10 in the Pinot explorer will return the following error:
[
{
"message": "MergeResponseError:\nData schema mismatch between merged block: [X(DOUBLE)] and block to merge: [X(DOUBLE),Y(DOUBLE)], drop block to merge",
"errorCode": 500
}
]
However, the following query would work as expected
select * from table1 limit 10 option (skipUpsert=True)
Has anyone seen this before?
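Note: the error reads as some segments still serving the old [X] schema while newer consuming segments have [X, Y]. A standard step after adding a column is to reload the table's segments so existing ones pick up a default value for Y; a sketch, assuming the controller runs at localhost:9000:
curl -X POST "http://localhost:9000/segments/table1/reload?type=REALTIME"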
Yeongju Kang
11/05/2021, 7:23 AM