Apache Pinot #troubleshooting

Nizar Hejazi

05/24/2022, 9:00 PM

Hey team, I deployed two different 0.11.0 nightly builds and found that an equality filter predicate on a field with a sorted column index isn’t working as expected when the segment is in CONSUMING state. Example:

select distinct (company) from role_with_company limit 1000000 -- answer: 51

Queries with less-than or greater-than predicates always return the correct results:

select count(distinct company) from role_with_company where company < '6269223774083d800011fd95' limit 1000000 -- answer: 36
select count(distinct company) from role_with_company where company > '6269223774083d800011fd95' limit 1000000 -- answer: 14

On the other hand, equality predicates do not return the correct results when the segment is in CONSUMING state:

select count(distinct company) from role_with_company where company = '6269223774083d800011fd95' limit 1000000 -- answer: 0, when segment is in CONSUMING state

When the segment is COMMITTED, the query returns the correct results:

select count(distinct company) from role_with_company where company = '6269223774083d800011fd95' limit 1000000 -- answer: 1, when segment is COMMITTED
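
For reference, the counts quoted in the queries above can be cross-checked with a little arithmetic. A minimal sketch in Python (the function name is hypothetical), assuming the <, = and > predicates on the same literal partition the 51 distinct values:

```python
def expected_equality_count(total_distinct: int, less_than: int, greater_than: int) -> int:
    """Distinct values left over for the equality predicate.

    Assumes <, = and > on the same literal partition the distinct values.
    """
    return total_distinct - less_than - greater_than


# Counts quoted in the queries above: 51 distinct, 36 below, 14 above.
print(expected_equality_count(51, 36, 14))  # 1, matching the COMMITTED result
```

So the 0 returned while the segment is CONSUMING leaves one company unaccounted for.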

Anyone aware of a change in behaviour that was introduced recently? @Richard Startin @Jackie
Latest nightly build commit: 0.11.0-SNAPSHOT-438c53b-20220520
Previous nightly build commit: 0.11.0-SNAPSHOT-3403619-20220507

Hello

05/24/2022, 11:35 PM

Hello, how can I round decimals in a Pinot query? For example, if the result is 1.674321, I want it rounded to 1.7.
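
A minimal sketch of rounding on the client side, in Python, for cases where rounding inside the query is not an option (whether a query-side rounding function is available depends on the Pinot version, so check its function reference). The helper name is hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: float, places: int = 1) -> Decimal:
    """Round to `places` decimal places, with ties rounding away from zero."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=1 gives Decimal('0.1')
    # Go through str() so the digits shown to the user are the ones that get rounded.
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_half_up(1.674321))  # 1.7
```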

Lars-Kristian Svenøy

05/25/2022, 10:24 AM

Hey team 👋. I'm currently writing a custom Flink job that can atomically replace the segments of a Pinot refresh table. I've been looking into the segment replacement protocol and wanted to check whether I understand it correctly. More info in thread

Tommaso Peresson

05/25/2022, 2:45 PM

Hi Everyone. I'm currently setting up a table that has an MV column called `items` containing a list of `item_id`. From what I've tried, `distinctcounthllmv()` can't be used as an aggregation function in a star-tree index. Has anyone ever faced a similar problem? If yes, how did you solve it? Is it possible to calculate the raw HLL state at ingestion time and then perform the estimation at query time? Thanks everyone for helping

Anish Nair

05/26/2022, 3:31 AM

Hi Everyone, when grouping on a high-cardinality column, we are observing a discrepancy in the results. We have already disabled server-level group trimming at the query level with OPTION(minServerGroupTrimSize=-1). Also, the result metadata shows numGroupsLimitReached = false, so the threshold is not breached at the segment level. Are there any other options to explore? We are referencing the following doc: https://docs.pinot.apache.org/users/user-guide-query/grouping-algorithm

Atri Sharma

05/26/2022, 7:19 AM

Can you paste the queries here, along with the discrepancy you are seeing?

Tiger Zhao

05/26/2022, 3:01 PM

Hi, I started getting this error when creating a new table and running a SegmentCreationAndMetadataPush job:

ERROR [PinotSegmentUploadDownloadRestletResource] [jersey-server-managed-async-executor-17] Caught internal server exception while uploading segment
java.lang.NullPointerException: Table config is not available for table 'test_table_OFFLINE'

Any ideas as to what is causing this? I can see that the table was created successfully.

Andy Li

05/26/2022, 5:11 PM

Hi, we're seeing that for presto-pinot, queries that go directly to the broker return properly, but queries where Presto masquerades as a Pinot broker return empty results. Is there a way to get insight into what is actually run on the Pinot servers or brokers?

Fernando Barbosa

05/26/2022, 10:25 PM

Hi there, I have a schema in avsc for a Kafka topic. Can anyone help me transform it to JSON so that I can feed it to Pinot?

Fernando Barbosa

05/26/2022, 10:25 PM

I tried it in Python with no luck, and due to security issues I can't do it with an online tool

Fernando Barbosa

05/26/2022, 10:25 PM

any help would be much appreciated!
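
A minimal sketch of such a conversion in Python, assuming a flat Avro record whose fields are primitives or nullable primitives. An `.avsc` file is already JSON, so the work is mapping Avro types onto Pinot field specs. The function name and sample schema are made up, and every column is emitted as a dimension, which is a simplification:

```python
import json

# Avro primitive type -> Pinot data type (only primitives are handled in this sketch).
AVRO_TO_PINOT = {
    "int": "INT", "long": "LONG", "float": "FLOAT", "double": "DOUBLE",
    "string": "STRING", "boolean": "BOOLEAN", "bytes": "BYTES",
}


def avsc_to_pinot_schema(avsc_text: str) -> dict:
    """Build a Pinot schema dict from the text of an Avro record schema."""
    avro = json.loads(avsc_text)
    dimensions = []
    for field in avro["fields"]:
        avro_type = field["type"]
        # Nullable fields are usually unions such as ["null", "string"].
        if isinstance(avro_type, list):
            non_null = [t for t in avro_type if t != "null"]
            avro_type = non_null[0]
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_PINOT:
            raise ValueError(f"unsupported Avro type for {field['name']}: {avro_type!r}")
        dimensions.append({"name": field["name"], "dataType": AVRO_TO_PINOT[avro_type]})
    # Simplification: every column becomes a dimension; move metrics and the
    # time column to metricFieldSpecs / dateTimeFieldSpecs by hand.
    return {"schemaName": avro["name"], "dimensionFieldSpecs": dimensions}


example_avsc = """
{"type": "record", "name": "orders", "fields": [
  {"name": "order_id", "type": "string"},
  {"name": "amount", "type": ["null", "double"]}
]}
"""
print(json.dumps(avsc_to_pinot_schema(example_avsc), indent=2))
```

Nested records, arrays, and logical types would need extra handling that this sketch deliberately rejects with an error.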

abhinav wagle

05/26/2022, 11:25 PM

Hi, I am running into the issue below when I point my Pinot table to a Kafka topic that has data in Avro format. Any pointers on how I can debug/triage which field is throwing the exception? Any help is much appreciated. Thanks!

Caught exception while reading message using schema: <redacted>
java.io.EOFException: null
	at org.apache.avro.io.BinaryDecoder$ByteArrayByteSource.readRaw(BinaryDecoder.java:966) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:372) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:289) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:209) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:469) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:459) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) ~[pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder.decode(SimpleAvroMessageDecoder.java:89) [pinot-avro-0.10.0-SNAPSHOT-shaded.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder.decode(SimpleAvroMessageDecoder.java:43) [pinot-avro-0.10.0-SNAPSHOT-shaded.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:507) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:416) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:576) [pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar:0.10.0-SNAPSHOT-078c711d35769be2dc4e4b7e235e06744cf0bba7]
	at java.lang.Thread.run(Thread.java:829) [?:?]
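
One possible cause of an EOFException like this (an assumption, not something confirmed in the thread) is that the producer wrote messages in the Confluent Schema Registry wire format, which prefixes the Avro body with a zero magic byte and a 4-byte big-endian schema ID, while a plain Avro decoder expects the body alone. A minimal Python sketch for inspecting a raw message payload; the function name and sample bytes are made up:

```python
import struct
from typing import Optional


def confluent_schema_id(payload: bytes) -> Optional[int]:
    """Return the schema ID if the payload looks Confluent-framed, else None.

    Heuristic only: a plain Avro body can also happen to start with a zero byte.
    """
    if len(payload) < 5 or payload[0] != 0:
        return None
    # Bytes 1..4 hold the schema ID as a big-endian unsigned 32-bit integer.
    (schema_id,) = struct.unpack(">I", payload[1:5])
    return schema_id


framed = b"\x00" + struct.pack(">I", 42) + b"\x06abc"  # hypothetical framed message
print(confluent_schema_id(framed))      # 42
print(confluent_schema_id(b"\x06abc"))  # None
```

If sampled messages from the topic do carry this prefix, a schema-registry-aware decoder would be the thing to look at rather than an individual field.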

Zsolt László

05/27/2022, 10:07 AM

Hey Team, How exactly should one interpret this statement (source)?

Pinot currently relies on Pulsar client version 2.7.2. Users should make sure the Pulsar broker is compatible with this client version.

Say I already have a Pulsar cluster of version 2.4.1 in place; does that mean I won't be able to consume its traffic with the most up-to-date Pinot binary? Thanks in advance for any help!

Fernando Barbosa

05/27/2022, 12:25 PM

[Authentication - SASL_SSL] Hi everyone, hope you are all having a good day. I am trying to create a real-time table in Pinot. I have the following:
• `docker-compose.yml`: with ZooKeeper and the Pinot broker, controller and server
• `schema.json`: containing my table schema (after transforming it from avsc, as pointed out here)
• `table.json`: where I am using `streamConfigs` to pass the Confluent authentication keys
Two questions:
1. Is it wrong to pass the credentials in that file? (Please disregard security issues, because this is a very local and small test.)
2. I keep getting a 500 response saying that the consumer couldn't be formed.
I would really appreciate your help. BTW, I am following the StarTree recipes. 🆘
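
A minimal sketch of what the authentication part of `streamConfigs` might look like, written in Python so the result can be pasted into `table.json`. The security keys are standard Kafka client properties; that Pinot forwards them to its consumer unchanged is an assumption here, and the topic, broker address, and credentials are placeholders:

```python
import json


def sasl_stream_configs(topic: str, brokers: str, api_key: str, api_secret: str) -> dict:
    """Build the Kafka-related part of a Pinot streamConfigs block (sketch)."""
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{api_key}" password="{api_secret}";'
    )
    return {
        "streamType": "kafka",
        "stream.kafka.topic.name": topic,
        "stream.kafka.broker.list": brokers,
        # Standard Kafka client security properties.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }


configs = sasl_stream_configs("my-topic", "broker.example.com:9092", "KEY", "SECRET")
print(json.dumps(configs, indent=2))
```

The decoder, consumer factory, and schema registry settings are left out; they would sit alongside these keys in the same block.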

Alice

05/27/2022, 1:28 PM

Hi, can Pinot ingest Kafka stream data in a non-JSON format? 😅

Fernando Barbosa

05/27/2022, 2:26 PM

Is it possible that these are the same?

"stream.kafka.decoder.prop.schema.registry.rest.url": "https://xxxxx.uk-central2.gcp.confluent.cloud",
"stream.kafka.schema.registry.url": "https://xxxxx.uk-central2.gcp.confluent.cloud",

Scott deRegt

05/27/2022, 3:57 PM

Hey folks 👋, I'm having some issues with the `spark` batch ingestion job when moving from `--master local --deploy-mode client` to `--master yarn --deploy-mode cluster` (as suggested here for production environments). I would greatly appreciate some guidance from others who have successfully configured this Spark job. Details in thread 🧵

Stuart Millholland

05/27/2022, 4:43 PM

I'm having trouble with the tiered storage configuration when moving segments from one server to another. Here's the error message I get:

Segment fetcher is not configured for protocol: http, using default
Download and move segment immutable_events__0__0__20220527T1606Z from peer with scheme http failed.
java.lang.IllegalArgumentException: The input uri list is null or empty

Alice

05/28/2022, 6:29 AM

Hi team, I’m using EBS instead of S3 for segment storage, and there are 3 controllers in my cluster. I found that one of the 3 controllers has used 80% of its volume while the other 2 have used less than 10%. It seems segments are not evenly distributed across the controllers, and I’m worried that the first controller will run out of space and the cluster will become unavailable. Is there any way to make the data evenly distributed across all controllers?

Diogo Baeder

05/29/2022, 11:32 AM

Hi guys! Is there a way to define the batch ingestion command to pull a job spec file from S3 instead of the local filesystem?

Diogo Baeder

05/29/2022, 12:52 PM

Another question (related to the previous one): after I trigger a `LaunchDataIngestionJob` job with a file spec, for how long do I need to keep that job spec file around? Can it be deleted right after the job finishes, if I downloaded it from somewhere before I triggered the job?

Vishal Garg

05/30/2022, 5:31 AM

Hi Team, I am using Pinot Java client 0.7.1 to query Pinot. My query looks something like this:

select metric_1, sum(metric_2) from table where some_filter = 'x' group by 1 limit 100

If I run this query through the Pinot portal, I get an integer value for sum(metric_2), but the Pinot Java client returns a double value. I am expecting it to return an integer value. My queries are dynamic in nature, so I can't read type-specific data; I always read columns as strings using `resultSet.getString(row,col)`. Is there any way to configure the Java client to return an integer value instead of a double?
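
One client-side workaround, sketched here in Python rather than Java for brevity, is to drop the fractional part whenever the returned string is a whole number. The helper name is hypothetical, and this post-processes the string rather than changing what the client returns:

```python
from decimal import Decimal, InvalidOperation


def as_integer_string(value: str) -> str:
    """Return '42' for '42.0' or '4.2E1'; leave other values unchanged."""
    try:
        number = Decimal(value)
    except InvalidOperation:  # not a number at all, e.g. a dimension value
        return value
    if not number.is_finite() or number != number.to_integral_value():
        return value
    return str(int(number))


print(as_integer_string("42.0"))  # 42
print(as_integer_string("42.5"))  # 42.5
```

Note that this also rewrites numeric-looking dimension values such as "007", so it should only be applied to aggregation columns.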

Mahesh babu

05/30/2022, 5:57 AM

Hi Team,

Mahesh babu

05/30/2022, 5:57 AM

Facing this issue while running the controller:

2022/05/30 05:20:33.337 ERROR [CompletionServiceHelper] [grizzly-http-server-0] Connection error
java.util.concurrent.ExecutionException: java.net.NoRouteToHostException: No route to host (Host unreachable)
	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
	at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
	at org.apache.pinot.controller.util.CompletionServiceHelper.doMultiGetRequest(CompletionServiceHelper.java:79) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
	at org.apache.pinot.controller.api.resources.ServerTableSizeReader.getSegmentSizeInfoFromServers(ServerTableSizeReader.java:69) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
	at org.apache.pinot.controller.util.TableSizeReader.getTableSubtypeSize(TableSizeReader.java:181) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
	at org.apache.pinot.controller.util.TableSizeReader.getTableSizeDetails(TableSizeReader.java:101) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
	at org.apache.pinot.controller.api.resources.TableSize.getTableSize(TableSize.java:83) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]

Ali Atıl

05/30/2022, 11:14 AM

Hey everyone, can I change the SegmentGenerationAndPushTask.numConcurrentTasksPerInstance setting from a config file, or do I have to use the /cluster/configs REST API for that? I want to parallelize file ingestion tasks. I am using a Kubernetes deployment.
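
A minimal Python sketch of the REST route mentioned in the question: it builds a POST to the controller's /cluster/configs endpoint without sending it. The controller address and the value 4 are placeholders, and the function name is hypothetical:

```python
import json
import urllib.request


def cluster_config_request(controller_url: str, task_type: str, concurrency: int) -> urllib.request.Request:
    """Build (but do not send) a POST to the controller's /cluster/configs endpoint."""
    body = {f"{task_type}.numConcurrentTasksPerInstance": str(concurrency)}
    return urllib.request.Request(
        url=f"{controller_url.rstrip('/')}/cluster/configs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = cluster_config_request("http://localhost:9000", "SegmentGenerationAndPushTask", 4)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To apply it against a running controller: urllib.request.urlopen(request)
```

In a Kubernetes deployment the controller address would be whatever service name or port-forward exposes the controller.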

Mahesh babu

05/30/2022, 1:10 PM

Hi Team, I'm facing an issue while loading data into Pinot through Kafka: I'm getting the value _N in string fields even though there is data for those fields. Can anyone help me with this?

Kevin Peng

05/30/2022, 3:39 PM

Hi all, is there a link anywhere to an M1 version of the Pinot binary? I'm running into a few build errors, and as I fix them I get more, so I was looking for the easy way out, i.e. a pre-built binary.

Kevin Peng

05/30/2022, 4:20 PM

I just ran into this issue while trying to build Pinot for an M1 Mac

] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade (default) on project pinot-kafka-2.0: Error creating shaded jar:

I really don't need Pinot Kafka for my current test install. Is there a way to bypass this or fix the shade issue? Has anyone run into this before? Before that, I ran into an issue with the spotless plugin, which I commented out in the pom.xml in the root folder and in the pinot-common directory.

Sowmya Gowda

05/31/2022, 6:42 AM

Hi Team, I'm facing an issue with Pinot data types. I have a column jobTitle with the value "Staff RN (Med Surg, Ortho/Neuro, GI/GU floor" in my file, and I defined it in the schema as a plain string data type. But I'm getting this error while loading into the table:

Cannot read single-value from Object[]: [Staff RN (Med Surg,  Ortho/Neuro,  GI/GU floor] for column: jobTitle

Luis Fernandez

05/31/2022, 2:13 PM

Hello my friends, my team has been trying to ingest data using the job spec for some weeks now, and it has been quite challenging. We are trying to ingest around 500 GB of data, which is 2 years of data for our system, using Apache Pinot `0.10.0`. We ran into this issue (https://github.com/apache/pinot/pull/8337), so we had to create a script to do the imports day by day. However, for some reason the Pinot servers are exhausting their memory (32 GB), even though they are mostly at half capacity before the job runs. What are some of the reasons our Pinot servers would run out of memory from these ingestion jobs? Also, we are using the standalone job, and our script changes the input directory every time a daily import finishes. Would appreciate any help!