# troubleshooting
  • Shreeram Goyal (03/17/2023, 11:15 AM)
    Hi, I am using Pinot release v0.12.0 and have set my time boundary to the max value of the time column of the offline segments using the Swagger API: POST /tables/{tableName}/timeBoundary. I tried querying the data residing on the offline servers from both the Pinot query console and Presto. While I get the correct data in the Pinot query console, the last row is missing in Presto. Can someone please help me understand and debug this?
  • himanshu yadav (03/17/2023, 1:56 PM)
    Hi, has anyone ever tried to bootstrap a realtime upsert table using Flink? Our Pinot version is 0.11.0 and we are facing this issue: https://apache-pinot.slack.com/archives/C01S5EHPS2U/p1679035695669099
  • Jun (03/19/2023, 4:17 PM)
    Hi Team, I found a potential flaky test. Could anyone help me confirm it? https://github.com/apache/pinot/issues/10442 (Sorry, I should not have posted this to #CDRCA57FC.)
  • Varagini Karthik (03/20/2023, 9:59 AM)
    Hi All, I'm trying to execute TEXT_MATCH from Trino on a Pinot table, and I am getting the following error: trino error: line 4:10: Function 'text_match' not registered. This is my query:
    Copy code
    Select *
       from pinot.default.jobTitles
       where TEXT_MATCH(jobTitle, 'Java Developer')
    Any idea how to resolve this? Trino version 403, Pinot version 0.10.0.
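    One workaround worth trying, assuming the connector version in use supports it and can push this function down: Trino's Pinot connector offers pass-through ("dynamic table") queries, where the query in double quotes is executed by Pinot itself, so Pinot-only functions like TEXT_MATCH are evaluated by Pinot rather than by Trino. A sketch using the table and column from the query above:
    Copy code
    -- The quoted "table name" is a query that runs directly on Pinot,
    -- so TEXT_MATCH would be handled by Pinot's text index.
    SELECT *
    FROM pinot.default."SELECT * FROM jobTitles WHERE TEXT_MATCH(jobTitle, 'Java Developer')"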
  • Rajat Yadav (03/20/2023, 5:01 PM)
    How do I enable the V2 multi-stage engine in Pinot? Can anyone please share the steps and where to add the configurations in the helm charts?
  • Lewis Yobs (03/20/2023, 5:06 PM)
    https://docs.pinot.apache.org/developers/advanced/v2-multi-stage-query-engine#how-to-enable-the-multi-stage-query-engine
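    For the helm-chart part of the question: the docs page above names a handful of configs (e.g. pinot.multistage.engine.enabled=true), which can be set as cluster configs or placed in the broker and server configuration files. A minimal sketch of a values.yaml override, assuming the chart version in use exposes extra.configs blocks for each component (key names differ across chart versions, so treat this as illustrative):
    Copy code
    # Hypothetical values.yaml override -- verify the exact keys in your chart.
    broker:
      extra:
        configs: |-
          # Enable the V2 multi-stage query engine (per the docs page above)
          pinot.multistage.engine.enabled=true
          pinot.query.server.port=8421
          pinot.query.runner.port=8442
    server:
      extra:
        configs: |-
          pinot.multistage.engine.enabled=true
          pinot.server.instance.currentDataTableVersion=4
          pinot.query.server.port=8421
          pinot.query.runner.port=8442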
  • Sid (03/20/2023, 6:43 PM)
    Hi team, been exploring Apache Pinot for the first time. I'm unable to make the filter function work on Pinot tables consuming events from Kafka. I wanted to filter events based on the event_names field in each Kafka event. I get the below error, and I tried setting up the Groovy field in the controller.conf file, still no luck: org.apache.pinot.segment.local... java.lang.RuntimeException: Caught exception while executing filter function: Caused by: java.lang.NumberFormatException: For input string: "{event_name}" Any help would be appreciated.
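    For comparison, the documented shape of a Groovy filter function in the table's ingestionConfig is sketched below; the column name and predicate are illustrative, not taken from this thread. Per the ingestion docs, rows for which the function evaluates to true are skipped. The NumberFormatException on the literal string "{event_name}" suggests the function may be receiving placeholder text rather than the column value:
    Copy code
    {
      "ingestionConfig": {
        "filterConfig": {
          "filterFunction": "Groovy({event_name != 'order_created'}, event_name)"
        }
      }
    }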
  • Rajat Yadav (03/21/2023, 5:54 AM)
    Hi team, I am executing this query through V2 multi-stage engine:
    Copy code
    SELECT count(*)
    FROM
      (Select COUNT(*)
       from users where country IN ('INDIA')) AS virtual_table
    LIMIT 1000;
    But i am getting the following error:
    Copy code
    [
      {
        "message": "TableDoesNotExistError",
        "errorCode": 190
      }
    ]
    The table does exist, though. Does anyone know why this is happening?
  • arun udaiyar (03/21/2023, 7:53 AM)
    Hi Team, I am using the helm chart to run Pinot on a Kubernetes cluster. I now have a requirement to add a Java JKS file into the container. What is the best way to do this?
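    A common pattern, sketched below with illustrative names (this is generic Kubernetes, not something specific to the Pinot chart): ship the JKS as a Secret and mount it into the pod, then point the JVM at the mounted path.
    Copy code
    # Create the secret from the local keystore file (names are illustrative):
    #   kubectl create secret generic pinot-jks --from-file=keystore.jks=./keystore.jks
    # Then mount it into the Pinot pods, e.g. via the chart's extra volume
    # overrides if your chart version exposes them, or by patching the StatefulSet:
    volumes:
      - name: pinot-jks
        secret:
          secretName: pinot-jks
    volumeMounts:
      - name: pinot-jks
        mountPath: /etc/pinot/tls
        readOnly: true
    # Finally, reference the mounted file in the JVM options, e.g.
    #   -Djavax.net.ssl.trustStore=/etc/pinot/tls/keystore.jks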
  • Rajat Yadav (03/21/2023, 9:49 AM)
    Hi team, do we have any configuration to enable only the V2 multi-stage engine in Pinot? @Mayank @guru
  • Shreeram Goyal (03/21/2023, 5:41 PM)
    I keep getting this error while running a query via Presto, even though I have the port open for gRPC @Mayank @Xiang Fu:
    Copy code
    io.grpc.StatusRuntimeException: UNKNOWN
    	at io.grpc.Status.asRuntimeException(Status.java:535)
    	at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:648)
    	at com.facebook.presto.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:204)
    	at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:295)
    	at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:260)
    	at com.facebook.presto.operator.Driver.processInternal(Driver.java:426)
    	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:309)
    	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:730)
    	at com.facebook.presto.operator.Driver.processFor(Driver.java:302)
    	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
    	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:166)
    	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
    	at com.facebook.presto.$gen.Presto_0_279_686ef1d____20230309_045351_1.run(Unknown Source)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:750)
  • Sid (03/21/2023, 6:19 PM)
    Hi Team, what sort of time column should be created when the timestamp in Kafka events keeps changing format? A few examples: 2023-03-21T110317.55803331Z, 2023-03-09T101458.656Z, 2023-03-09T101523.137+00:00. How do I standardize this in the schema?
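    One common approach, sketched under assumed column names (event_time_millis is illustrative): store a single canonical epoch-millis time column in the schema, and normalize the varying input strings during ingestion with a transform, for example fromDateTime with a matching pattern, or a Groovy function that handles each observed format.
    Copy code
    {
      "dateTimeFieldSpecs": [
        {
          "name": "event_time_millis",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }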
  • Jack Luo (03/21/2023, 8:33 PM)
    Hi Team. I noticed that the native text index seems to allocate memory on the heap rather than using memory-mapped pages. In our deployment, with the native text index enabled, the entire heap (192GB) is consumed every half hour under our use case. We had to revert to the legacy Lucene-based text index. Does the Pinot team plan to support file-backed memory (MMAP) for the native text index in the future?
  • Rajat Yadav (03/22/2023, 8:14 AM)
    How do I delete old tasks in the minion? I keep seeing those tasks again and again and am not able to push new data. Pinot version: 0.10.
  • Shreeram Goyal (03/22/2023, 12:52 PM)
    Hi, I am facing a few issues querying via Presto, mostly on the offline servers, which hold the major chunk of our data (some tables have 30G of data). I have 6 servers with 32G RAM each and have configured 2 replica groups with 3 servers each. The issues are:
    1. I am running heavy queries via Presto that are routed directly to the servers without involving brokers (checked using explain plan), and I am facing memory issues: memory isn't released after a query completes, which eventually leads to a server going down on the next query. I have tried different configs for heap and direct memory; currently xmx=16G and DirectMemory=12G.
    2. On running multiple queries together, they are all routed to a single replica group via Presto. Ideally this shouldn't be the case, or correct me if I am wrong.
    Would be great to get some insights on the potential causes and workarounds, if any, other than vertical scaling!
  • aj (03/22/2023, 7:18 PM)
    Hi, I asked this in #C016ZKW1EPK earlier and @Mayank suggested I post here instead https://apache-pinot.slack.com/archives/C016ZKW1EPK/p1679508026624729
  • Sid (03/23/2023, 6:14 AM)
    Hi Team, I was discussing with @saurabh dubey, and here are a few suggestions we would appreciate being considered for Pinot:
    • API support for schema generation from a sample JSON file.
    • Logs for Pinot servers in the UI or through an API. Currently, during a PoC of Pinot, I have to check docker logs to see whether stream ingestion has any issue.
  • Bharath (03/23/2023, 9:11 AM)
    Hello, I'm looking for some help related to Apache Pinot's Zookeeper. My dev team is using Pinot for querying data sets, and the team needs a Zookeeper URL to make calls from a Java application. Currently the pinot-controller is exposed for accessing the UI from the AWS EKS cluster. Would exposing pinot-zookeeper similarly to pinot-controller work in this use case? I'm just not sure about it, so I wanted to get clarification. Pinot is set up on AWS EKS using this guide: https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart
  • Tamás Nádudvari (03/23/2023, 12:40 PM)
    Hi, I ran into a problem when I tried to upgrade from 0.11.0 to 0.12.0. Right after the controller restarted with 0.12.0, it started to throw exceptions about being unable to get the consuming segments info for our hybrid table. Did anyone else run into something like this?
  • Rajat Yadav (03/23/2023, 1:34 PM)
    Hi team, while running queries from Superset we are getting an error that only 2 out of 4 servers responded. The dataset is very large, around 700 million rows, and we have 4 servers [1 core, 15G memory]. Does anyone know whether this is an infra issue or a query-processing error?
  • Rajat Yadav (03/23/2023, 3:16 PM)
    Hi team, We have an existing OFFLINE table and we want to load more segments to that table. Is there any way to do that?
  • Mark Needham (03/23/2023, 3:50 PM)
    Yes, you should be able to load more segments the same way you did the initial ones. Just make sure the names of those segments don't clash with the ones you already have.
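    For instance, a minimal batch ingestion job spec sketch that pushes additional segments to an existing OFFLINE table; the paths, table name, and data format here are illustrative, with class names as documented for the standalone runner:
    Copy code
    # job-spec.yaml -- run with: pinot-admin.sh LaunchDataIngestionJob -jobSpecFile job-spec.yaml
    executionFrameworkSpec:
      name: standalone
      segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
      segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
    jobType: SegmentCreationAndTarPush
    inputDirURI: /data/new-batch/            # only the new input files
    outputDirURI: /data/segments/new-batch/  # segment names must not clash with existing ones
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: json
      className: org.apache.pinot.plugin.inputformat.json.JSONRecordReader
    tableSpec:
      tableName: myTable                     # the existing OFFLINE table
    pinotClusterSpecs:
      - controllerURI: http://localhost:9000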
  • Zhuangda Z (03/23/2023, 7:42 PM)
    Hi folks, I ran into a deserialization problem where parsing a JSON column isn't supported.
  • abhinav wagle (03/24/2023, 3:04 AM)
    Hello, any pointers on how to fix a BROKER_SEGMENT_UNAVAILABLE_ERROR_CODE: 305 error? https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/exception/QueryException.java#L67
  • Malte Granderath (03/24/2023, 11:09 AM)
    Hey 👋 Is there any way yet to upload segments from the minion directly to the deep store? I saw this thread but maybe something has changed since then
  • Bharath (03/24/2023, 12:00 PM)
    Hello everyone. Does anyone know how to create the URL for the Zookeeper that is running in AWS EKS? I need this Zookeeper URL string to connect to the Pinot cluster, as described in the documentation page https://docs.pinot.apache.org/users/clients/java. I tried several methods but no luck. In short, I want to expose the URL with a LoadBalancer, but the pod failed health checks repeatedly, so I couldn't get a LoadBalancer URL for Zookeeper. Any help would be great.
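    For reference, a plain LoadBalancer Service sketch for Zookeeper (generic Kubernetes; the namespace and pod labels are illustrative and must match the actual Zookeeper pods from the helm release):
    Copy code
    apiVersion: v1
    kind: Service
    metadata:
      name: zookeeper-external
      namespace: pinot-quickstart   # adjust to your namespace
    spec:
      type: LoadBalancer
      selector:
        app: zookeeper              # must match your Zookeeper pod labels
      ports:
        - port: 2181                # Zookeeper client port
          targetPort: 2181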
  • Sid (03/24/2023, 1:54 PM)
    Hi, does anyone know how to reduce the number of segments on a realtime table? Currently the segment count keeps increasing with the number of Kafka partitions, making queries slow. The merge-rollup task also is not working on the realtime table.
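    One knob worth checking, shown as a partial sketch (threshold values are illustrative, and key names vary slightly across Pinot versions): the segment-flush thresholds in the table's streamConfigs control how often consuming segments are committed, so larger thresholds yield fewer, larger segments per partition:
    Copy code
    {
      "streamConfigs": {
        "realtime.segment.flush.threshold.rows": "0",
        "realtime.segment.flush.threshold.time": "24h",
        "realtime.segment.flush.threshold.segment.size": "300M"
      }
    }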
  • Utsav kansara (03/25/2023, 1:28 AM)
    Hi guys, I am trying to call the controller endpoint to enable logging as per https://docs.pinot.apache.org/operators/operating-pinot/managing-logs, though for some reason it keeps failing with the following exception:
    Copy code
    Mar 25, 2023 12:51:08 AM org.glassfish.jersey.internal.Errors logErrors
    WARNING: The following warnings have been detected: WARNING: Unknown HK2 failure detected:
    MultiException stack 1 of 3
    org.glassfish.hk2.api.UnsatisfiedDependencyException: There was no object available for injection at SystemInjecteeImpl(requiredType=LoggerFileServer,parent=PinotControllerLogger,qualifiers={},position=-1,optional=false,self=false,unqualified=null,1825910288)
    	at org.jvnet.hk2.internal.ThreeThirtyResolver.resolve(ThreeThirtyResolver.java:51)
    	at org.jvnet.hk2.internal.ClazzCreator.resolve(ClazzCreator.java:188)
    	at org.jvnet.hk2.internal.ClazzCreator.resolveAllDependencies(ClazzCreator.java:211)
    	at org.jvnet.hk2.internal.ClazzCreator.create(ClazzCreator.java:334)
    	at org.jvnet.hk2.internal.SystemDescriptor.create(SystemDescriptor.java:463)
    	at org.glassfish.jersey.inject.hk2.RequestContext.findOrCreate(RequestContext.java:59)
    	at org.jvnet.hk2.internal.Utilities.createService(Utilities.java:2102)
    	at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetService(ServiceLocatorImpl.java:758)
    	at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetService(ServiceLocatorImpl.java:721)
    	at org.jvnet.hk2.internal.ServiceLocatorImpl.getService(ServiceLocatorImpl.java:691)
    	at org.glassfish.jersey.inject.hk2.AbstractHk2InjectionManager.getInstance(AbstractHk2InjectionManager.java:160)
    	at org.glassfish.jersey.inject.hk2.ImmediateHk2InjectionManager.getInstance(ImmediateHk2InjectionManager.java:30)
    	at org.glassfish.jersey.internal.inject.Injections.getOrCreate(Injections.java:105)
    	at org.glassfish.jersey.server.model.MethodHandler$ClassBasedMethodHandler.getInstance(MethodHandler.java:260)
    	at org.glassfish.jersey.server.internal.routing.PushMethodHandlerRouter.apply(PushMethodHandlerRouter.java:51)
    	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:86)
    	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:89)
    	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:89)
    	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:89)
    	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:89)
    	at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:69)
    	at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:38)
    	at org.glassfish.jersey.process.internal.Stages.process(Stages.java:173)
    	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:247)
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
    	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
    	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
    	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
    	at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:356)
    	at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200)
    	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
    	at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
    	at java.base/java.lang.Thread.run(Thread.java:829)
    MultiException stack 2 of 3
    java.lang.IllegalArgumentException: While attempting to resolve the dependencies of org.apache.pinot.controller.api.resources.PinotControllerLogger errors were found
  • Sid (03/25/2023, 6:45 AM)
    Hi Team, my segment generation and push task has been in NOT_STARTED state for a while now. Here is the table config; what am I missing here?
    Copy code
    {
      "OFFLINE": {
        "tableName": "fullfillment_created_schema_OFFLINE",
        "tableType": "OFFLINE",
        "segmentsConfig": {
          "schemaName": "fullfillment_created_schema",
          "replication": "1",
          "replicasPerPartition": "1",
          "segmentPushType": "APPEND",
          "timeColumnName": "event_timestamp",
          "minimizeDataMovement": false,
          "segmentPushFrequency": "DAILY"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "invertedIndexColumns": [],
          "noDictionaryColumns": [],
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "sortedColumn": [],
          "bloomFilterColumns": [],
          "loadMode": "MMAP",
          "onHeapDictionaryColumns": [],
          "varLengthDictionaryColumns": [],
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": false,
          "optimizeDictionary": false,
          "optimizeDictionaryForMetrics": false,
          "noDictionarySizeRatioThreshold": 0,
          "rangeIndexColumns": [],
          "rangeIndexVersion": 2
        },
        "metadata": {},
        "quota": {},
        "task": {
          "taskTypeConfigsMap": {
            "SegmentGenerationAndPushTask": {
              "schedule": "0 */10 * * * ?"
            }
          }
        },
        "routing": {},
        "query": {},
        "ingestionConfig": {
          "batchIngestionConfig": {
            "batchConfigMaps": [
              {
                "inputFormat": "json",
                "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
                "fs.prop.region": "ap-south-1",
                "fs.prop.accessKey": "asdasdasd",
                "fs.prop.secretKey": "asdasdas",
                "inputDirURI": "s3://json-prod/FulfillmentCreateFailedEvent/event_date=2023-03-24/",
                "includeFileNamePattern": "glob:**/*.json.gz"
              }
            ],
            "consistentDataPush": false
          },
          "continueOnError": false,
          "rowTimeValueCheck": false,
          "segmentTimeValueCheck": true
        },
        "isDimTable": false
      }
    }
  • Jack Luo (03/25/2023, 8:49 AM)
    Hi Team, I have an aggregation query like the following:
    Copy code
    EXPLAIN PLAN FOR SELECT 
      zone, 
      count(*) 
    FROM 
      "table" 
    WHERE 
      (
        _timestampMillis <= 1679691885000 
        AND _timestampMillis > 1679432712000
      ) 
      AND (
        text_match(
          "json_data", '"instance*33554433"'
        ) 
        AND json_extract_scalar(
          "json_data", '$.instance', 'INT', 
          0
        ) = 33554433
      ) 
    GROUP BY 
      zone 
    ORDER BY 
      count(*) desc 
    LIMIT 
      10
    The goal is to perform an exact match on JSON documents by first performing a fuzzy text_match and then performing json_extract_scalar only on the matching rows. The reason for using this approach to search JSON, rather than leveraging the JSON index, is its much lower memory and disk usage, i.e. the JSON index is too expensive. However, the default query planner's behavior is not ideal: although text_match alone returns results in double-digit milliseconds, text_match + json_extract_scalar returns results 75-100x more slowly. The root cause, I believe, is that Pinot's query planner decides to execute text_match and json_extract_scalar concurrently rather than one after the other. The actual query plan is as follows:
    Copy code
    {
        "rows": [
          [
            "BROKER_REDUCE(sort:[count(*) DESC],limit:10)",
            1,
            0
          ],
          [
            "COMBINE_GROUP_BY",
            2,
            1
          ],
          [
            "PLAN_START(numSegmentsForThisPlan:52)",
            -1,
            -1
          ],
          [
            "GROUP_BY(groupKeys:zone, aggregations:count(*))",
            3,
            2
          ],
          [
            "TRANSFORM_PASSTHROUGH(zone)",
            4,
            3
          ],
          [
            "PROJECT(zone)",
            5,
            4
          ],
          [
            "DOC_ID_SET",
            6,
            5
          ],
          [
            "FILTER_AND",
            7,
            6
          ],
          [
            "FILTER_TEXT_INDEX(indexLookUp:text_index,operator:TEXT_MATCH,predicate:text_match(json_data,'\"instance*33554433\"'))",
            8,
            7
          ],
          [
            "FILTER_RANGE_INDEX(indexLookUp:range_index,operator:RANGE,predicate:(_timestampMillis > '1679432712000' AND _timestampMillis <= '1679691885000'))",
            9,
            7
          ],
          [
            "FILTER_EXPRESSION(operator:EQ,predicate:jsonextractscalar(json_data,'$.instance','INT','0') = '33554433')",
            10,
            7
          ]
        ]
    }
    The optimized query plan for our use case should be the following:
    Copy code
    {
        "rows": [
          [
            "BROKER_REDUCE(sort:[count(*) DESC],limit:10)",
            1,
            0
          ],
          [
            "COMBINE_GROUP_BY",
            2,
            1
          ],
          [
            "PLAN_START(numSegmentsForThisPlan:52)",
            -1,
            -1
          ],
          [
            "GROUP_BY(groupKeys:zone, aggregations:count(*))",
            3,
            2
          ],
          [
            "TRANSFORM_PASSTHROUGH(zone)",
            4,
            3
          ],
          [
            "PROJECT(zone)",
            5,
            4
          ],
          [
            "DOC_ID_SET",
            6,
            5
          ],
          [
            "FILTER_AND",
            7,
            6
          ],
          [
            "FILTER_EXPRESSION(operator:EQ,predicate:jsonextractscalar(json_data,'$.instance','INT','0') = '33554433')",
            8,
            7
          ],
          [
            "FILTER_AND",
            9,
            8
          ],
          [
            "FILTER_RANGE_INDEX(indexLookUp:range_index,operator:RANGE,predicate:(_timestampMillis > '1679432712000' AND _timestampMillis <= '1679691885000'))",
            10,
            9
          ],
          [
            "FILTER_TEXT_INDEX(indexLookUp:text_index,operator:TEXT_MATCH,predicate:text_match(json_data,'\"instance*33554433\"'))",
            11,
            9
          ]
        ]
    }
    Does the Pinot team have any plan to implement this optimization in the near future? If not, would the Pinot team be interested in a pull request that optimizes this query?