Anish Nair
12/30/2021, 8:00 AM
Syed Akram
01/03/2022, 8:49 AM
eywek
01/03/2022, 2:46 PM
timestamp
and I’m unable to use PrestoDB since double quotes are removed
Yeongju Kang
01/04/2022, 5:43 AM
1. Duplicate results from the hybrid table:
select * from user123 where user_id = '1' -- returned 2 rows
select * from user123_REALTIME where user_id = '1' -- returned 1 row
select * from user123_OFFLINE where user_id = '1' -- returned 1 row
2. Inconsistent results are returned for the realtime table. I ran into this while validating the above. I configured:
replicasPerPartition --> 2
stream.kafka.broker.list --> 3 brokers' endpoints
stream.kafka.consumer.type --> lowLevel
select user_id, count(*) from user123
group by user_id
-- when 4 servers were queried: returned 102 rows
-- when 3 servers were queried: returned 96 rows
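A quick way to narrow down issue 1 is to compare the time column of the duplicated row on each side of the hybrid split; if the same event appears in both the OFFLINE and REALTIME responses, it sits on both sides of the broker's time boundary. A minimal sketch, assuming an epoch-millis time column named event_time (a hypothetical name):
select user_id, event_time from user123_OFFLINE where user_id = '1'
select user_id, event_time from user123_REALTIME where user_id = '1'
-- the same event_time in both results means the row is double-counted in the hybrid view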
Vibhor Jaiswal
01/05/2022, 11:53 PM
pinot.max-rows-per-split-for-segment-queries=2147483647
But Trino says session property pinot.max-rows-per-split-for-segment-queries does not exist. Please suggest if you can. I was following this: https://www.mail-archive.com/dev@pinot.apache.org/msg02773.html
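In case it helps: that property reads like a Trino catalog configuration property rather than a session property, so one sketch (assuming the Trino Pinot connector; the controller address is a placeholder) is to set it in etc/catalog/pinot.properties on the coordinator and workers and restart:
connector.name=pinot
pinot.controller-urls=pinot-controller:9000
pinot.max-rows-per-split-for-segment-queries=2147483647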
xtrntr
01/06/2022, 12:01 PM
# https://docs.pinot.apache.org/operators/tutorials/deployment-pinot-on-kubernetes#jvm-setting
resources:
requests:
cpu: 4
memory: 10G
limits:
cpu: 4
memory: 10G
How do I diagnose the cause of slow queries based on the logs?
Processed requestId=93,table=eventsv2-2021-09_OFFLINE,segments(queried/processed/matched/consuming)=10/10/10/-1,schedulerWaitMs=0,reqDeserMs=1,totalExecMs=4883,resSerMs=1,totalTimeMs=4885,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=1194957,scanInFilter=422644416,scanPostFilter=1194957,sched=fcfs,threadCpuTimeNs=0
...
Processed requestId=93,table=eventsv2-2021-09_OFFLINE,segments(queried/processed/matched/consuming)=10/10/10/-1,schedulerWaitMs=0,reqDeserMs=1,totalExecMs=4414,resSerMs=0,totalTimeMs=4415,minConsumingFreshnessMs=-1,broker=Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=1156729,scanInFilter=422867040,scanPostFilter=1156729,sched=fcfs,threadCpuTimeNs=0
the offending query is:
SELECT user, COUNT(*) FROM "eventsv2-2021-09" WHERE IN_SUBQUERY(cell, 'SELECT ID_SET(location) FROM dimTable WHERE id IN (...)') = 1 GROUP BY user HAVING COUNT(user) >= 20 LIMIT 10000000
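Reading the Processed requestId=93 lines field by field points at the filter stage; annotated values from the first line:
numDocsScanned=1194957    -- docs that survived the filter
scanInFilter=422644416    -- entries scanned while evaluating the filter
totalExecMs=4883          -- almost all of totalTimeMs=4885
scanInFilter being roughly 350x numDocsScanned suggests the IN_SUBQUERY predicate on cell is being evaluated by scanning rather than through an index; whether an inverted index on cell would help is an assumption about this schema, not something the log itself confirms.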
xtrntr
01/06/2022, 12:04 PM
Can I use IN_ID_SET inside IN_SUBQUERY? I’ve tried, but I seem to keep running into SQL quoting issues:
Was expecting one of:\n \")\" ...\n \",\" ...\n <QUOTED_STRING> ...","errorCode":150
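On the quoting error: the usual fix is to double every single quote inside the subquery string so the outer literal still parses. A sketch against the query above, with <base64-id-set> as a placeholder for the serialized IdSet literal:
SELECT user, COUNT(*) FROM "eventsv2-2021-09"
WHERE IN_SUBQUERY(cell, 'SELECT ID_SET(location) FROM dimTable WHERE IN_ID_SET(id, ''<base64-id-set>'') = 1') = 1
GROUP BY user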
Tiger Zhao
01/06/2022, 5:43 PM
Weixiang Sun
01/06/2022, 5:57 PM
Sadim Nadeem
01/07/2022, 5:49 AM
Arpita Bajpai
01/07/2022, 6:09 AM
Elon
01/07/2022, 9:45 PM
2022/01/07 11:39:58.204 WARN [BaseInstanceSelector] [HelixTaskExecutor-message_handle_thread] Failed to find servers hosting segment: MYSECRETTABLE-1641542240329_2022-01-06_2022-01-06_6 for table: MYSECRETTABLE
Is there any way to reduce/eliminate these errors? We're looking at zk client configuration in helix...
Mayank
Lars-Kristian Svenøy
01/10/2022, 11:37 AM
xtrntr
01/10/2022, 12:12 PM
Mark Needham
sortedColumn
in your table config. Or have you already tried setting it?
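For reference, a sketch of where that lands in the table config (the column name here is hypothetical):
"tableIndexConfig": {
  "sortedColumn": [
    "user_id"
  ]
}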
Lars-Kristian Svenøy
01/10/2022, 8:23 PM
abhinav wagle
01/10/2022, 10:37 PM
\t
in my case
Anish Nair
01/11/2022, 7:52 AM
"tableIndexConfig": {
"invertedIndexColumns": [
"advertiser_id",
"partner_id",
"market_place_id"
],
"rangeIndexVersion": 1,
"autoGeneratedInvertedIndex": false,
"createInvertedIndexDuringSegmentGeneration": false,
"loadMode": "MMAP",
"enableDefaultStarTree": false,
"enableDynamicStarTreeCreation": false,
"aggregateMetrics": false,
"nullHandlingEnabled": true
When I check the table metadata via http://serve1:9000/segments/reporting_aggregations/metadata?type=REALTIME, I get:
{
"reporting_aggregations__0__0__20220111T0608Z": {
"segmentName": "reporting_aggregations__0__0__20220111T0608Z",
"schemaName": null,
"crc": 1605865976,
"creationTimeMillis": 1641883074848,
"creationTimeReadable": "2022-01-11T06:37:54:848 UTC",
"timeGranularitySec": 0,
"startTimeMillis": 1641790800000,
"startTimeReadable": "2022-01-10T05:00:00.000Z",
"endTimeMillis": 1641859200000,
"endTimeReadable": "2022-01-11T00:00:00.000Z",
"segmentVersion": "v3",
"creatorName": null,
"custom": {},
"columns": [],
"indexes": {},
"star-tree-index": null
}
}
And when I check /pinot/data/server/index/reporting_aggregations_REALTIME/reporting_aggregations__0__0__20220111T0608Z/v3/metadata.properties, the following is visible for every column in the table:
column.postal_code.hasDictionary = true
column.postal_code.hasInvertedIndex = true
How do I check which is correct? Am I missing something in the config?
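One way to cross-check, assuming your controller build supports the repeatable columns query parameter on the segment metadata endpoint (the summary view above leaves columns/indexes empty unless column metadata is requested):
curl "http://serve1:9000/segments/reporting_aggregations/metadata?type=REALTIME&columns=advertiser_id&columns=partner_id"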
Aditya
01/11/2022, 8:57 AM
"sortedColumn": [
"user_id"
]
Metadata shows the column is sorted only if the row count in the segment is < 2.5M. (Using org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner and jobType: SegmentCreationAndTarPush for creating and pushing the segments.)
The segment creation job logs show it is creating a dictionary (default) for each column.
Is there a limit on the number of rows/segment size if we use a sorted index?
Could this be related to the cardinality of the user_id column?
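One hedged observation: for offline/batch ingestion, Pinot marks a column sorted based on the actual order of the input data (the sortedColumn setting itself only directs realtime consuming segments), so if the larger inputs are not fully pre-sorted on user_id, the flag would flip regardless of row count. A sketch of pre-sorting a CSV input before the job (file name and field position are hypothetical):
sort -t ',' -k 1,1 input.csv > input_sorted.csv   # assumes user_id is the first field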
Lars-Kristian Svenøy
01/11/2022, 3:08 PM
Elon
01/11/2022, 11:51 PM
Sadim Nadeem
01/12/2022, 7:16 AM
Lars-Kristian Svenøy
01/12/2022, 12:56 PM
xtrntr
01/13/2022, 4:41 AM
pinot.server.query.executor.num.groups.limit=-1
Prashant Pandey
01/13/2022, 2:59 PM
Mark Needham
controller.port-9000
change that to:
controller.port=9000
abhinav wagle
01/14/2022, 2:54 AM
java.lang.NoSuchMethodException: org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:669)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:461)
at <http://org.apache.spark.deploy.yarn.ApplicationMaster.org|org.apache.spark.deploy.yarn.ApplicationMaster.org>$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:773)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:772)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:797)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
22/01/13 23:40:15 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.NoSuchMethodException: org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main([Ljava.lang.String;)
... (same stack trace as above)
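The NoSuchMethodException itself says Spark cannot find a static main(String[]) on the class passed via --class, and LaunchDataIngestionJobCommand does not expose one; the follow-up below switches the entry point accordingly:
--class org.apache.pinot.tools.admin.PinotAdministrator   # has a main(); pass LaunchDataIngestionJob as the first program argument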
abhinav wagle
01/14/2022, 6:59 PM
LaunchDataIngestionJobCommand. It keeps picking StartKafkaCommand execution even if I have LaunchDataIngestionJob passed as a parameter. Running on Spark 2.4.0, Pinot 0.10.0:
spark-submit \
--master yarn \
--deploy-mode cluster \
--driver-cores 4 \
--executor-cores 4 \
--driver-memory 4g \
--executor-memory 4g \
--class org.apache.pinot.tools.admin.PinotAdministrator /tmp/awagle/pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar LaunchDataIngestionJob -jobSpecFile /tmp/awagle/sparkIngestionJobSpec.yaml
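For comparison, the invocation shape in the Pinot batch-ingestion-with-Spark docs also puts the plugins directory and the fat jar on the driver via Spark conf, which is worth ruling out before digging into why the subcommand is mis-parsed (paths below are placeholders):
spark-submit \
  --class org.apache.pinot.tools.admin.PinotAdministrator \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DIR}/plugins" \
  --conf "spark.driver.extraClassPath=${PINOT_DIR}/lib/pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar" \
  ${PINOT_DIR}/lib/pinot-all-0.10.0-SNAPSHOT-jar-with-dependencies.jar \
  LaunchDataIngestionJob \
  -jobSpecFile /tmp/awagle/sparkIngestionJobSpec.yaml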