Sadim Nadeem
07/05/2021, 3:53 PM
send API. If the original stream is not partitioned, then a stream processing job (e.g. Flink) is needed to shuffle and repartition the input stream into a partitioned one for Pinot's ingestion.
Sadim Nadeem
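As a rough illustration of what "repartitioning" means in the message above, the sketch below assigns each record to a partition from a key, so that all records sharing a key land in the same partition. The function names, the `tenant_id` key, the partition count, and the CRC32 hash are all assumptions made for illustration. A real pipeline would do this inside the stream processor (e.g. Flink) and would have to use the same partition function that the Pinot table config declares.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: illustrative partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically.

    CRC32 is used only because it is stable across runs; it is NOT
    the hash that Kafka or Pinot uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records, key_field):
    """Group records by the partition derived from records[key_field]."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_for(record[key_field])].append(record)
    return dict(partitions)


# Hypothetical input records
records = [
    {"tenant_id": "tenantA", "duration_millis": 12},
    {"tenant_id": "tenantB", "duration_millis": 7},
    {"tenant_id": "tenantA", "duration_millis": 30},
    {"tenant_id": "tenantC", "duration_millis": 5},
]

partitioned = repartition(records, "tenant_id")
```

The property that matters is that a given key always maps to one partition; which hash achieves that is a detail of the actual producer and table configuration.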
07/05/2021, 3:58 PM
Carlos Domínguez
07/08/2021, 9:40 PM
Carlos Domínguez
07/08/2021, 9:40 PM
Carlos Domínguez
07/08/2021, 9:42 PM
Prashant Pandey
07/13/2021, 5:47 AM
SELECT api_id, service_name, service_id, api_name, COUNT(*)
FROM myTable
WHERE tenant_id = 'someTenantId' AND (api_id IS NOT NULL AND start_time_millis >= 1625039026768 AND start_time_millis < 1625643826768)
GROUP BY api_id, service_name, service_id, api_name
ORDER BY PERCENTILETDIGEST99(duration_millis) DESC
LIMIT 10000
And these are the query stats:
timeUsedMs: 1077
numDocsScanned: 560325713
totalDocs: 3103044892
numServersQueried: 8
numServersResponded: 8
numSegmentsQueried: 623
numSegmentsProcessed: 115
numSegmentsMatched: 115
numConsumingSegmentsQueried: 4
numEntriesScannedInFilter: 25000000
numEntriesScannedPostFilter: 2801628565
numGroupsLimitReached: false
partialResponse: -
minConsumingFreshnessTimeMs: 1626154723247
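For readers following along, a few ratios derived from the stats above make them easier to interpret. This is only arithmetic on the posted numbers; the variable names are made up for the sketch.

```python
# Values copied from the query stats posted above
stats = {
    "numDocsScanned": 560325713,
    "totalDocs": 3103044892,
    "numSegmentsQueried": 623,
    "numSegmentsProcessed": 115,
    "numEntriesScannedInFilter": 25000000,
    "numEntriesScannedPostFilter": 2801628565,
}

# Share of all documents that matched the filter
docs_fraction = stats["numDocsScanned"] / stats["totalDocs"]

# Share of queried segments that were actually processed (not pruned)
segments_fraction = stats["numSegmentsProcessed"] / stats["numSegmentsQueried"]

# Column entries read per matched document after filtering
entries_per_doc = stats["numEntriesScannedPostFilter"] / stats["numDocsScanned"]

print(f"docs matched:        {docs_fraction:.1%}")
print(f"segments processed:  {segments_fraction:.1%}")
print(f"entries per doc:     {entries_per_doc:.1f}")
```

The entries-per-document figure comes out to exactly 5, which lines up with the five columns the query reads after filtering (the four GROUP BY columns plus duration_millis).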
The most conspicuous of these stats is numEntriesScannedInFilter. The troubleshooting guide says that if this number is too high, we should consider adding an index on the column. While we don’t have an index on this, our segment config is:
"segmentsConfig": {
"timeType": "MILLISECONDS",
"segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
"timeColumnName": "start_time_millis",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "7",
"replicasPerPartition": "1",
"schemaName": "rawServiceView"
}
As you can see, the timeColumnName is start_time_millis and therefore, we haven’t added any index on this column (our reasoning is that segments would be pruned on this column anyway, so we don’t need an extra index).
myTable is a real-time table.
If I remove the filter on start_time_millis, then numEntriesScannedInFilter becomes 0.
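To make the pruning assumption concrete, here is a simplified model of how a time-range filter interacts with per-segment time bounds. It is a sketch for reasoning only, not Pinot's actual implementation: the `Segment` class, the `classify` function, and the segment names and bounds are all hypothetical. Only the two range bounds are taken from the query above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    min_time: int  # smallest start_time_millis in the segment
    max_time: int  # largest start_time_millis in the segment


def classify(segment: Segment, start: int, end: int) -> str:
    """Classify a segment against the half-open range [start, end)."""
    if segment.max_time < start or segment.min_time >= end:
        return "pruned"        # no row can match; segment is skipped
    if segment.min_time >= start and segment.max_time < end:
        return "fully_inside"  # every row satisfies the time filter
    return "partial"           # rows must be checked one by one


# Range bounds from the query above
START, END = 1625039026768, 1625643826768

# Hypothetical segments
segments = [
    Segment("old", START - 2_000_000, START - 1_000_000),
    Segment("straddles_start", START - 500_000, START + 500_000),
    Segment("middle", START + 1_000_000, END - 1_000_000),
    Segment("straddles_end", END - 500_000, END + 500_000),
]

result = {s.name: classify(s, START, END) for s in segments}
```

Under this model, pruning removes only the segments that lie wholly outside the window. Segments that straddle either bound still need the time filter evaluated against their rows, which is one plausible source of filter-phase scanning when the time column has no index.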
What are we doing wrong here?
Kishore G
Prashant Pandey
07/13/2021, 6:38 AM
Kishore G
Prashant Pandey
07/13/2021, 6:45 AM
Bruce Ritchie
07/13/2021, 10:01 PM
Caused by: java.lang.NullPointerException
    at org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(SystemUtils.java:1626)
    at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:207)
    at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
Saurabh Dwivedy
07/14/2021, 11:32 AM
Saurabh Dwivedy
07/14/2021, 11:32 AM
Saurabh Dwivedy
07/14/2021, 11:33 AM
Saurabh Dwivedy
07/14/2021, 11:34 AM
Saurabh Dwivedy
07/14/2021, 11:34 AM
Saurabh Dwivedy
07/14/2021, 11:34 AM
Saurabh Dwivedy
07/14/2021, 11:34 AM
Saurabh Dwivedy
07/14/2021, 12:07 PM
Saurabh Dwivedy
07/14/2021, 12:07 PM
Saurabh Dwivedy
07/14/2021, 1:56 PM
Saurabh Dwivedy
07/14/2021, 1:56 PM
Luiz Gabriel Lima Pinheiro
07/14/2021, 2:59 PM
LaunchDataIngestionJob HTTP endpoint to be called? I could not find one in the Swagger interface for uploading the jobSpec YAML file.
Kishore G