# troubleshooting
  • Dan Hill
    06/30/2020, 11:11 PM
    I have a Kubernetes batch job that runs a LaunchDataIngestionJob. If the ingestion job fails, the Kubernetes Job is still marked as succeeded and completed. This seems like a bug; I'd expect it to indicate that the job failed.
    kubectl get pods --namespace $NAMESPACE
    NAME                                                READY   STATUS        RESTARTS   AGE
    ...
    pinot-populate-local-data-hwpdm                     0/1     Completed     0          14s
    kubectl logs --namespace $NAMESPACE pinot-populate-local-data-hwpdm     
    ...
    
    java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    ...
    kubectl describe --namespace $NAMESPACE pod/pinot-populate-local-data-hwpdm 
    ...
    Status:       Succeeded
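    A possible stopgap until the launcher propagates its exit code, sketched under the assumption that the image's entrypoint just wraps bin/pinot-admin.sh: replace the container command with a shell that greps the output for the exception line shown above and fails the pod itself.
        command: [ "/bin/sh", "-c" ]
        args:
          - |
            # Run the launcher, keeping a copy of its output (it currently exits 0 even on failure).
            bin/pinot-admin.sh LaunchDataIngestionJob \
              -jobSpecFile /home/pinot/pinot-config/local_batch_job_spec.yaml 2>&1 | tee /tmp/job.log
            # Fail the pod ourselves if the launcher logged an exception.
            ! grep -q "Caught exception" /tmp/job.log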
  • Dan Hill
    06/30/2020, 11:12 PM
    # TODO - is outputDirURI set correctly?
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: pinot-local-data-config
    data:
      local_batch_job_spec.yaml: |-
        executionFrameworkSpec:
          name: 'standalone'
          segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
          segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
          segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
        jobType: SegmentCreationAndTarPush
        inputDirURI: '/home/pinot/local-raw-data/'
        outputDirURI: '/tmp/metrics/segments/'
        overwriteOutput: true
        pinotFSSpecs:
          - scheme: file
            className: org.apache.pinot.spi.filesystem.LocalPinotFS
        recordReaderSpec:
          dataFormat: 'json'
          className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
        tableSpec:
          tableName: 'metrics'
          schemaURI: 'http://pinot-controller:9000/tables/metrics/schema'
          tableConfigURI: 'http://pinot-controller:9000/tables/metrics'
        pinotClusterSpecs:
          - controllerURI: 'http://pinot-controller:9000'
    
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pinot-populate-local-data
    spec:
      template:
        spec:
          containers:
            - name: pinot-populate-local-data
              image: apachepinot/pinot:0.4.0
              args: [ "LaunchDataIngestionJob", "-jobSpecFile", "/home/pinot/pinot-config/local_batch_job_spec.yaml" ]
              volumeMounts:
                - name: pinot-local-data-config
                  mountPath: /home/pinot/pinot-config
                - name: pinot-local-data
                  mountPath: /home/pinot/local-raw-data
          restartPolicy: OnFailure
          volumes:
            - name: pinot-local-data-config
              configMap:
                name: pinot-local-data-config
            - name: pinot-local-data
              hostPath:
                path: /my/local/path
      backoffLimit: 100
  • Dan Hill
    06/30/2020, 11:12 PM
    This isn't blocking me, but I'd imagine this could lead to quality bugs in production.
  • Xiang Fu
    06/30/2020, 11:26 PM
    I will take a look. It would be helpful if you could paste the stack trace or create an issue
  • Xiang Fu
    06/30/2020, 11:27 PM
    so I can check why the job is not failing
  • Dan Hill
    07/01/2020, 8:12 AM
    I'm having issues with slow queries. I recently started moving away from the built-in time column to my own utc_date (the timestamp floored to a UTC date). Now my queries are taking 5 seconds over 80 million rows (a lot slower than before). I removed some sensitive parts.
    metrics_offline_table_config.json: |-
        {
          "tableName": "metrics",
          "tableType":"OFFLINE",
          "segmentsConfig" : {
            "schemaName" : "metrics",
            "timeColumnName": "timestamp",
            "timeType": "MILLISECONDS",
            "retentionTimeUnit": "DAYS",
            "retentionTimeValue": "1461",
            "segmentPushType": "APPEND",
            "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
            "replication" : "1"
          },
          "tableIndexConfig" : {
            "loadMode"  : "MMAP",
            "noDictionaryColumns": ["impressions"],
            "starTreeIndexConfigs": [
              {
                "dimensionsSplitOrder": [
                  "utc_date",
                  "platform_id",
                  "account_id",
                  "campaign_id"
                ],
                "skipStarNodeCreationForDimensions": [
                ],
                "functionColumnPairs": [
                  "SUM__impressions",
                ]
              }
            ]
          },
          "tenants" : {},
          "metadata": {
            "customConfigs": {}
          }
        }
    The query I'm running looks pretty basic. It's asking for aggregate stats at a high level. In my data, there are 8 unique utc_dates and 1 unique platform.
    select utc_date, sum(impressions) from metrics where platform_id = 13 group by utc_date
    Recent changes:
    • switched from timestamp to my own utc_date (long)
    • added "noDictionaryColumns": ["impressions"]
    This previously was 50ms-100ms. I'm going to bed now. No need to rush an answer.
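    One way to tell whether the star-tree index is actually serving this query is numDocsScanned in the broker response: if it is close to the raw 80 million rows, the star-tree is being bypassed and the aggregation is scanning. A minimal check against the broker's SQL endpoint (host and port assumed from the default Kubernetes setup):
        # Send the query to the broker and inspect numDocsScanned in the returned stats.
        curl -s -X POST http://pinot-broker:8099/query/sql \
          -H 'Content-Type: application/json' \
          -d '{"sql": "select utc_date, sum(impressions) from metrics where platform_id = 13 group by utc_date"}'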
  • Dan Hill
    07/01/2020, 3:51 PM
    I'm guessing my latency issue is related to a lack of disk. The ingestion job was still reported as successful even though I ran into disk issues on my pinot-server.
  • Kishore G
    07/01/2020, 3:53 PM
    The ingestion job will succeed as long as the data gets uploaded via the controller API and stored in the deep store
  • Kishore G
    07/01/2020, 3:53 PM
    servers can pick it up any time
  • Dan Hill
    07/01/2020, 3:54 PM
    Interesting. Is there a way to force the servers to pick it up again after it failed to process internally? I just increased the disk and tried again, and it worked.
  • Kishore G
    07/01/2020, 3:54 PM
    yes, that's the way it's supposed to work
  • Kishore G
    07/01/2020, 3:54 PM
    restart will work
  • Kishore G
    07/01/2020, 3:55 PM
    or a reset command for the segment in ERROR state
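    For reference, a hedged sketch of both options through the controller REST API (the per-segment reset endpoint shipped in releases after 0.4.0 and may not exist here; table and segment names are placeholders):
        # Ask servers to reload all segments of the table.
        curl -X POST "http://pinot-controller:9000/segments/metrics/reload?type=OFFLINE"

        # Newer releases: reset a single segment that is stuck in ERROR state.
        curl -X POST "http://pinot-controller:9000/segments/metrics_OFFLINE/<segmentName>/reset"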
  • Dan Hill
    07/01/2020, 3:55 PM
    Cool, ty
  • Pradeep
    07/01/2020, 11:44 PM
    Hi, “select * from <table> order by <column> limit 10” is timing out. I have ~40M rows and ~44 columns, with data spread across two machines.
    {
      "exceptions": [],
      "numServersQueried": 2,
      "numServersResponded": 0,
      "numSegmentsQueried": 0,
      "numSegmentsProcessed": 0,
      "numSegmentsMatched": 0,
      "numConsumingSegmentsQueried": 0,
      "numDocsScanned": 0,
      "numEntriesScannedInFilter": 0,
      "numEntriesScannedPostFilter": 0,
      "numGroupsLimitReached": false,
      "totalDocs": 0,
      "timeUsedMs": 9999,
      "segmentStatistics": [],
      "traceInfo": {},
      "minConsumingFreshnessTimeMs": 0
    }
    There are ~34 segments, and all of them seem to be in either “ONLINE” or “CONSUMING” state
    I just see a timeout exception on one of the server logs:
    Caught TimeoutException. (brokerRequest = BrokerRequest(querySource:QuerySource(tableName:searchtable_REALTIME), selections:Selection(selectionColumns:[*], selectionSortSequence:[SelectionSort(column:timestampMillis, isAsc:true)], size:10), enableTrace:true, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=10000}, pinotQuery:PinotQuery(dataSource:DataSource(tableName:searchtable), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:*))], orderByList:[Expression(type:FUNCTION, functionCall:Function(operator:ASC, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:timestampMillis))]))], limit:10), orderBy:[SelectionSort(column:timestampMillis, isAsc:true)], limit:10))
    java.util.concurrent.TimeoutException: null
            at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_252]
            at org.apache.pinot.core.operator.CombineOperator.getNextBlock(CombineOperator.java:169) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.operator.CombineOperator.getNextBlock(CombineOperator.java:47) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
    Wondering if there is a way to improve the query latency? (Tried with a small subset of columns; the query returns results.)
  • Kishore G
    07/02/2020, 12:05 AM
    is the timestampMillis column dictionary encoded?
  • Kishore G
    07/02/2020, 12:06 AM
    can you add it to noDictionaryColumns?
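    A minimal sketch of that change, assuming it goes into tableIndexConfig of the REALTIME table config (the same shape Pradeep posts further down; note the end of this thread, where making the time column itself no-dictionary trips an NPE at segment build in 0.4.0):
        "tableIndexConfig": {
          "loadMode": "MMAP",
          "noDictionaryColumns": [
            "timestampMillis"
          ]
        }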
  • Pradeep
    07/02/2020, 12:54 AM
    got it, thanks, let me try that
  • Kishore G
    07/02/2020, 12:55 AM
    there is an optimization that we can do specifically for time column sorting
  • Kishore G
    07/02/2020, 12:55 AM
    I remember Uber folks also suggesting this
  • Jackie
    07/02/2020, 2:02 AM
    @Pradeep What is the total size of your data? In order to solve this query, the servers need to scan the whole table
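    Since an unfiltered select-all order-by forces every segment to be scanned, one hedged mitigation is to bound the sort column so per-segment min/max metadata can prune most segments (column name from the trace above; the cutoff value here is made up):
        select * from searchtable
        where timestampMillis > 1593561600000
        order by timestampMillis
        limit 10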
  • Pradeep
    07/02/2020, 2:09 AM
    It’s close to 4G
  • Pradeep
    07/02/2020, 2:09 AM
    Do indexingConfig changes applied to an existing table update the old segments?
  • Pradeep
    07/02/2020, 2:10 AM
    Also, should keeping a min/max per segment help?
    columnMinMaxValueGeneratorMode: TIME
  • Kishore G
    07/02/2020, 3:21 AM
    Parts of them, such as the inverted index, apply to old segments
  • Kishore G
    07/02/2020, 3:22 AM
    However, the original encoding cannot be changed
  • Kishore G
    07/02/2020, 3:22 AM
    You can use Minion to perform such tasks
  • Pradeep
    07/02/2020, 7:33 AM
    (Not urgent, please take a look when you guys get a chance; sorry for the late-night ping.) Also, when I tried adding “timestampMillis”, my timestamp column, to noDictionaryColumns in the table config below:
    {
      "REALTIME": {
        "tableName": "tablename_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "timeColumnName": "timestampMillis",
          "schemaName": "search",
          "timeType": "MILLISECONDS",
          "replicasPerPartition": "1"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "loadMode": "MMAP",
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.consumer.type": "LowLevel",
            "stream.kafka.topic.name": "INPUT",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.broker.list": "<broker_nodes>:9092",
            "realtime.segment.flush.threshold.size": "0",
            "realtime.segment.flush.threshold.time": "24h",
            "realtime.segment.flush.desired.size": "80M",
            "realtime.segment.flush.autotune.initialRows": "700000",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
          },
          "noDictionaryColumns": [
            "timestampMillis"
          ],
          "enableDefaultStarTree": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": true
        },
        "metadata": {
          "customConfigs": {}
        }
      }
    }
    I am seeing this NullPointerException; it works fine when I choose a different string column. Should noDictionaryColumns only contain string/bytes fields?
    Could not build segment
    java.lang.NullPointerException: null
            at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:393) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:360) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:216) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:199) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:141) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
  • Xiang Fu
    07/02/2020, 8:10 AM
    I think this is because Pinot segment creation uses the timestamp column's min/max values from the dictionary to set the segment name and write the segment metadata (start/end time)
  • Xiang Fu
    07/02/2020, 8:10 AM
    since it's configured as a non-dictionary column, hence the NPE
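    So a hedged workaround for the config above: leave the declared time column out of noDictionaryColumns so segment sealing can read its min/max from the dictionary, i.e.
        "noDictionaryColumns": []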