Shreeram Goyal
03/17/2023, 11:15 AM
POST /tables/{tableName}/timeBoundary. I tried querying the data residing in the offline servers on both the Pinot query console and Presto. The Pinot query console returns the correct data, but the last row is missing in Presto. Can someone please help me understand and debug this?
himanshu yadav
03/17/2023, 1:56 PM
Jun
03/19/2023, 4:17 PM
Varagini Karthik
03/20/2023, 9:59 AM
TEXT_MATCH from Trino on a Pinot table...
I am getting the following error:
trino error: line 4:10: Function 'text_match' not registered
This is my query:
SELECT *
FROM pinot.default.jobTitles
WHERE TEXT_MATCH(jobTitle, 'Java Developer')
[10:40 AM] Any idea how to resolve this?
[10:40 AM] Trino version 403
Pinot version 0.10.0
Rajat Yadav
03/20/2023, 5:01 PM
Lewis Yobs
03/20/2023, 5:06 PM
https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine#how-to-enable-the-multi-stage-query-engine
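The linked page covers enabling the multi-stage (v2) engine. As a minimal sketch of opting a single query into it over the broker REST API, assuming the cluster is already configured as that page describes, that the `useMultistageEngine=true` query option documented at the time applies to your version, and a hypothetical broker address, a request could be built like this (the example SQL is the subquery from Rajat Yadav's 03/21 message):

```python
import json
import urllib.request

# Hypothetical broker address: 8099 is the standalone broker default,
# quickstart setups often expose the broker on 8000 instead.
BROKER_SQL_URL = "http://localhost:8099/query/sql"


def build_multistage_request(sql: str, url: str = BROKER_SQL_URL) -> urllib.request.Request:
    """Build (without sending) a broker SQL request that opts this single
    query into the multi-stage (v2) engine through a query option."""
    payload = {
        "sql": sql,
        # Option name as documented for the v2 engine in early 2023;
        # check it against the docs for your Pinot version.
        "queryOptions": "useMultistageEngine=true",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_multistage_request(
        "SELECT COUNT(*) FROM (SELECT COUNT(*) FROM users "
        "WHERE country IN ('INDIA')) AS virtual_table"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending it would be: urllib.request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to inspect the exact JSON body before pointing it at a real broker.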
Sid
03/20/2023, 6:43 PM
Rajat Yadav
03/21/2023, 5:54 AM
SELECT COUNT(*)
FROM (
  SELECT COUNT(*)
  FROM users
  WHERE country IN ('INDIA')
) AS virtual_table
LIMIT 1000;
But I am getting the following error:
[
  {
    "message": "TableDoesNotExistError",
    "errorCode": 190
  }
]
The table does exist, though. Does anyone know why this is happening?
arun udaiyar
03/21/2023, 7:53 AM
Rajat Yadav
03/21/2023, 9:49 AM
Shreeram Goyal
03/21/2023, 5:41 PM
io.grpc.StatusRuntimeException: UNKNOWN
at io.grpc.Status.asRuntimeException(Status.java:535)
at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:648)
at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:204)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:295)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:260)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:426)
at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:309)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:730)
at com.facebook.presto.operator.Driver.processFor(Driver.java:302)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:166)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
at com.facebook.presto.$gen.Presto_0_279_686ef1d____20230309_045351_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
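For the missing-row question in Shreeram Goyal's 03/17 message, one place to start is the hybrid table's time boundary, which decides which rows are served by the offline side and which by the realtime side. A minimal sketch that builds (but does not send) two requests: the controller call `POST /tables/{tableName}/timeBoundary` from that message, and a read of the boundary the broker currently uses. The broker path `/debug/timeBoundary/{tableName}`, the hosts, the ports and the table name `myTable` are assumptions to verify against your deployment and Pinot version.

```python
import urllib.request

# Hypothetical addresses and table name; replace with your own.
CONTROLLER = "http://localhost:9000"  # controller default port
BROKER = "http://localhost:8099"      # broker default port
RAW_TABLE = "myTable"                 # hybrid table name without _OFFLINE/_REALTIME


def set_time_boundary_request(table: str) -> urllib.request.Request:
    """Controller call from the original message: recompute the hybrid
    table's time boundary. Assumes the raw table name is expected."""
    return urllib.request.Request(
        f"{CONTROLLER}/tables/{table}/timeBoundary", data=b"", method="POST"
    )


def get_broker_time_boundary_request(table: str) -> urllib.request.Request:
    """Assumed broker debug endpoint that reports the boundary the broker
    applies when splitting a hybrid-table query into OFFLINE and REALTIME."""
    return urllib.request.Request(
        f"{BROKER}/debug/timeBoundary/{table}", method="GET"
    )


if __name__ == "__main__":
    for req in (set_time_boundary_request(RAW_TABLE),
                get_broker_time_boundary_request(RAW_TABLE)):
        # Only prints the requests; use urllib.request.urlopen(req) to send.
        print(req.get_method(), req.full_url)
```

Comparing the boundary the broker reports with the timestamp of the row that goes missing in Presto is one way to check whether the row falls right at the boundary.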
Sid
03/21/2023, 6:19 PM
Jack Luo
03/21/2023, 8:33 PM
Rajat Yadav
03/22/2023, 8:14 AM
Shreeram Goyal
03/22/2023, 12:52 PM
aj
03/22/2023, 7:18 PM
Sid
03/23/2023, 6:14 AM
Bharath
03/23/2023, 9:11 AM
pinot-controller is exposed for accessing the UI from the AWS EKS cluster. Would exposing pinot-zookeeper in the same way as pinot-controller work for this use case? I'm not sure about it, so I wanted to get a clarification.
Apache Pinot is set up on AWS EKS using this guide:
https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart
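Once a ZooKeeper Service is exposed, a plain TCP connect to its client port from the client's network is a quick first check. A minimal sketch, assuming ZooKeeper's default client port 2181 and a hypothetical external hostname; it only shows that the port accepts connections, not that Pinot clients can use it correctly:

```python
import socket

# Hypothetical values: replace with the external address of your
# ZooKeeper Service. 2181 is ZooKeeper's default client port.
ZK_HOST = "pinot-zookeeper.example.internal"
ZK_PORT = 2181


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address it gets.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections and timeouts
        return False
```

Usage: `port_reachable(ZK_HOST, ZK_PORT)` from a machine outside the cluster.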
Tamás Nádudvari
03/23/2023, 12:40 PM
Rajat Yadav
03/23/2023, 1:34 PM
Rajat Yadav
03/23/2023, 3:16 PM
Mark Needham
03/23/2023, 3:50 PM
Zhuangda Z
03/23/2023, 7:42 PM
abhinav wagle
03/24/2023, 3:04 AM
BROKER_SEGMENT_UNAVAILABLE_ERROR_CODE: 305 error https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/exception/QueryException.java#L67
Malte Granderath
03/24/2023, 11:09 AM
Bharath
03/24/2023, 12:00 PM
Sid
03/24/2023, 1:54 PM
Utsav kansara
03/25/2023, 1:28 AM
Mar 25, 2023 12:51:08 AM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: Unknown HK2 failure detected:
MultiException stack 1 of 3
org.glassfish.hk2.api.UnsatisfiedDependencyException: There was no object available for injection at SystemInjecteeImpl(requiredType=LoggerFileServer,parent=PinotControllerLogger,qualifiers={},position=-1,optional=false,self=false,unqualified=null,1825910288)
at org.jvnet.hk2.internal.ThreeThirtyResolver.resolve(ThreeThirtyResolver.java:51)
at org.jvnet.hk2.internal.ClazzCreator.resolve(ClazzCreator.java:188)
at org.jvnet.hk2.internal.ClazzCreator.resolveAllDependencies(ClazzCreator.java:211)
at org.jvnet.hk2.internal.ClazzCreator.create(ClazzCreator.java:334)
at org.jvnet.hk2.internal.SystemDescriptor.create(SystemDescriptor.java:463)
at org.glassfish.jersey.inject.hk2.RequestContext.findOrCreate(RequestContext.java:59)
at org.jvnet.hk2.internal.Utilities.createService(Utilities.java:2102)
at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetService(ServiceLocatorImpl.java:758)
at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetService(ServiceLocatorImpl.java:721)
at org.jvnet.hk2.internal.ServiceLocatorImpl.getService(ServiceLocatorImpl.java:691)
at org.glassfish.jersey.inject.hk2.AbstractHk2InjectionManager.getInstance(AbstractHk2InjectionManager.java:160)
at org.glassfish.jersey.inject.hk2.ImmediateHk2InjectionManager.getInstance(ImmediateHk2InjectionManager.java:30)
at org.glassfish.jersey.internal.inject.Injections.getOrCreate(Injections.java:105)
at org.glassfish.jersey.server.model.MethodHandler$ClassBasedMethodHandler.getInstance(MethodHandler.java:260)
at org.glassfish.jersey.server.internal.routing.PushMethodHandlerRouter.apply(PushMethodHandlerRouter.java:51)
at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:86)
at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:89)
at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:89)
at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:89)
at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:89)
at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:69)
at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:38)
at org.glassfish.jersey.process.internal.Stages.process(Stages.java:173)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:247)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:356)
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
at java.base/java.lang.Thread.run(Thread.java:829)
MultiException stack 2 of 3
java.lang.IllegalArgumentException: While attempting to resolve the dependencies of org.apache.pinot.controller.api.resources.PinotControllerLogger errors were found
Sid
03/25/2023, 6:45 AM
Jack Luo
03/25/2023, 8:49 AM
EXPLAIN PLAN FOR
SELECT
  zone,
  count(*)
FROM
  "table"
WHERE
  (
    _timestampMillis <= 1679691885000
    AND _timestampMillis > 1679432712000
  )
  AND (
    text_match("json_data", '"instance*33554433"')
    AND json_extract_scalar("json_data", '$.instance', 'INT', 0) = 33554433
  )
GROUP BY
  zone
ORDER BY
  count(*) DESC
LIMIT
  10
The goal is to perform an exact match on JSON documents by first running a fuzzy text_match and then running json_extract_scalar only on the matching rows. We search the JSON this way instead of using the JSON index because it needs much less memory and disk; the JSON index is too expensive for us. However, the default query planner's behavior is not ideal for this. Although text_match alone returns results in double-digit milliseconds, text_match + json_extract_scalar is 75-100x slower. I believe the root cause is that Pinot's query planner executes text_match and json_extract_scalar concurrently rather than one after the other. The actual query plan is as follows:
{
  "rows": [
    ["BROKER_REDUCE(sort:[count(*) DESC],limit:10)", 1, 0],
    ["COMBINE_GROUP_BY", 2, 1],
    ["PLAN_START(numSegmentsForThisPlan:52)", -1, -1],
    ["GROUP_BY(groupKeys:zone, aggregations:count(*))", 3, 2],
    ["TRANSFORM_PASSTHROUGH(zone)", 4, 3],
    ["PROJECT(zone)", 5, 4],
    ["DOC_ID_SET", 6, 5],
    ["FILTER_AND", 7, 6],
    ["FILTER_TEXT_INDEX(indexLookUp:text_index,operator:TEXT_MATCH,predicate:text_match(json_data,'\"instance*33554433\"'))", 8, 7],
    ["FILTER_RANGE_INDEX(indexLookUp:range_index,operator:RANGE,predicate:(_timestampMillis > '1679432712000' AND _timestampMillis <= '1679691885000'))", 9, 7],
    ["FILTER_EXPRESSION(operator:EQ,predicate:jsonextractscalar(json_data,'$.instance','INT','0') = '33554433')", 10, 7]
  ]
}
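Each plan row is an (operator, operator id, parent id) triple, which is hard to read as a flat list. A small sketch that rebuilds the tree, assuming only the structure visible in this output (parent id 0 marks the root, PLAN_START carries -1/-1); the predicate text is abbreviated to `(...)` here:

```python
from collections import defaultdict

# Rows from the actual plan above; long predicates abbreviated to "(...)".
ACTUAL_ROWS = [
    ("BROKER_REDUCE(sort:[count(*) DESC],limit:10)", 1, 0),
    ("COMBINE_GROUP_BY", 2, 1),
    ("PLAN_START(numSegmentsForThisPlan:52)", -1, -1),
    ("GROUP_BY(groupKeys:zone, aggregations:count(*))", 3, 2),
    ("TRANSFORM_PASSTHROUGH(zone)", 4, 3),
    ("PROJECT(zone)", 5, 4),
    ("DOC_ID_SET", 6, 5),
    ("FILTER_AND", 7, 6),
    ("FILTER_TEXT_INDEX(...)", 8, 7),
    ("FILTER_RANGE_INDEX(...)", 9, 7),
    ("FILTER_EXPRESSION(...)", 10, 7),
]


def render_plan(rows):
    """Return the plan as indented lines, each child listed under its parent."""
    names = {}
    children = defaultdict(list)
    for name, op_id, parent_id in rows:
        if op_id < 0:  # PLAN_START marker row, not part of the operator tree
            continue
        names[op_id] = name
        children[parent_id].append(op_id)

    lines = []

    def walk(op_id, depth):
        lines.append("  " * depth + names[op_id])
        for child in children[op_id]:
            walk(child, depth + 1)

    for root in children[0]:  # parent id 0 marks the root operator
        walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_plan(ACTUAL_ROWS)))
```

In the rendered tree the three filters sit side by side under one FILTER_AND; feeding the optimized plan's rows through the same function shows FILTER_EXPRESSION with a nested FILTER_AND beneath it.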
The optimized query plan for our use case should be the following:
{
  "rows": [
    ["BROKER_REDUCE(sort:[count(*) DESC],limit:10)", 1, 0],
    ["COMBINE_GROUP_BY", 2, 1],
    ["PLAN_START(numSegmentsForThisPlan:52)", -1, -1],
    ["GROUP_BY(groupKeys:zone, aggregations:count(*))", 3, 2],
    ["TRANSFORM_PASSTHROUGH(zone)", 4, 3],
    ["PROJECT(zone)", 5, 4],
    ["DOC_ID_SET", 6, 5],
    ["FILTER_AND", 7, 6],
    ["FILTER_EXPRESSION(operator:EQ,predicate:jsonextractscalar(json_data,'$.instance','INT','0') = '33554433')", 8, 7],
    ["FILTER_AND", 9, 8],
    ["FILTER_RANGE_INDEX(indexLookUp:range_index,operator:RANGE,predicate:(_timestampMillis > '1679432712000' AND _timestampMillis <= '1679691885000'))", 10, 9],
    ["FILTER_TEXT_INDEX(indexLookUp:text_index,operator:TEXT_MATCH,predicate:text_match(json_data,'\"instance*33554433\"'))", 11, 9]
  ]
}
Does the Pinot team plan to implement this optimization in the near future? If not, would the Pinot team be interested in a pull request for it?
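The reasoning in Jack Luo's message is about evaluation cost: an expensive per-row expression should run only on rows that survive the cheap index filters. The toy model below is not Pinot code and does not show how Pinot's FILTER_AND actually schedules its children; it only contrasts the two strategies the message describes (evaluate the expression everywhere and then intersect, versus intersect the index results first and evaluate the expression on the survivors) by counting expression calls on made-up data.

```python
from typing import Callable, List, Set, Tuple

# Toy model only: index filters are precomputed doc-id sets, and the
# expression filter is a per-document predicate whose calls are counted.


def make_counted(pred: Callable[[int], bool]) -> Tuple[Callable[[int], bool], List[int]]:
    """Wrap a predicate so every call increments calls[0]."""
    calls = [0]

    def wrapped(doc: int) -> bool:
        calls[0] += 1
        return pred(doc)

    return wrapped, calls


def and_flat(index_sets: List[Set[int]], expr: Callable[[int], bool], num_docs: int) -> Set[int]:
    """Sibling filters: evaluate the expression on every doc, then intersect."""
    result = {doc for doc in range(num_docs) if expr(doc)}
    for s in index_sets:
        result &= s
    return result


def and_nested(index_sets: List[Set[int]], expr: Callable[[int], bool], num_docs: int) -> Set[int]:
    """Nested AND: intersect index results first, then evaluate the
    expression only on the surviving docs."""
    candidates = set(range(num_docs))
    for s in index_sets:
        candidates &= s
    return {doc for doc in sorted(candidates) if expr(doc)}


if __name__ == "__main__":
    n = 1000
    text_hits = {d for d in range(n) if d % 100 == 0}  # stand-in for text_match
    range_hits = set(range(500))                        # stand-in for the time range
    expr, calls = make_counted(lambda d: d in {100, 200, 700})  # stand-in for json_extract_scalar
    print(sorted(and_flat([text_hits, range_hits], expr, n)), calls[0])    # [100, 200] 1000
    expr, calls = make_counted(lambda d: d in {100, 200, 700})
    print(sorted(and_nested([text_hits, range_hits], expr, n)), calls[0])  # [100, 200] 5
```

Both strategies return the same rows; only the number of expression evaluations differs, which is the effect the nested plan in the message is meant to achieve.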