# troubleshooting
  • Dan Hill
    06/30/2020, 11:11 PM
    I have a Kubernetes batch job that runs a LaunchDataIngestionJob. If the ingestion job fails, the Kubernetes Job is still marked as succeeded and completed. This seems like a bug; I'd expect it to indicate that the job failed.
    kubectl get pods --namespace $NAMESPACE
    NAME                                                READY   STATUS        RESTARTS   AGE
    ...
    pinot-populate-local-data-hwpdm                     0/1     Completed     0          14s
    kubectl logs --namespace $NAMESPACE pinot-populate-local-data-hwpdm     
    ...
    
    java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    ...
    kubectl describe --namespace $NAMESPACE pod/pinot-populate-local-data-hwpdm 
    ...
    Status:       Succeeded
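    A possible stopgap until the launcher propagates its exit code, sketched under the assumption that the image's entrypoint just wraps bin/pinot-admin.sh: replace the container command with a shell that greps the output for the exception line shown above and fails the pod itself.
        command: [ "/bin/sh", "-c" ]
        args:
          - |
            # Run the launcher, keeping a copy of its output (it currently exits 0 even on failure).
            bin/pinot-admin.sh LaunchDataIngestionJob \
              -jobSpecFile /home/pinot/pinot-config/local_batch_job_spec.yaml 2>&1 | tee /tmp/job.log
            # Fail the pod ourselves if the launcher logged an exception.
            ! grep -q "Caught exception" /tmp/job.log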
  • Dan Hill
    06/30/2020, 11:12 PM
    # TODO - is outputDirURI set correctly?
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: pinot-local-data-config
    data:
      local_batch_job_spec.yaml: |-
        executionFrameworkSpec:
          name: 'standalone'
          segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
          segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
          segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
        jobType: SegmentCreationAndTarPush
        inputDirURI: '/home/pinot/local-raw-data/'
        outputDirURI: '/tmp/metrics/segments/'
        overwriteOutput: true
        pinotFSSpecs:
          - scheme: file
            className: org.apache.pinot.spi.filesystem.LocalPinotFS
        recordReaderSpec:
          dataFormat: 'json'
          className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
        tableSpec:
          tableName: 'metrics'
          schemaURI: 'http://pinot-controller:9000/tables/metrics/schema'
          tableConfigURI: 'http://pinot-controller:9000/tables/metrics'
        pinotClusterSpecs:
          - controllerURI: 'http://pinot-controller:9000'
    
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pinot-populate-local-data
    spec:
      template:
        spec:
          containers:
            - name: pinot-populate-local-data
              image: apachepinot/pinot:0.4.0
              args: [ "LaunchDataIngestionJob", "-jobSpecFile", "/home/pinot/pinot-config/local_batch_job_spec.yaml" ]
              volumeMounts:
                - name: pinot-local-data-config
                  mountPath: /home/pinot/pinot-config
                - name: pinot-local-data
                  mountPath: /home/pinot/local-raw-data
          restartPolicy: OnFailure
          volumes:
            - name: pinot-local-data-config
              configMap:
                name: pinot-local-data-config
            - name: pinot-local-data
              hostPath:
                path: /my/local/path
      backoffLimit: 100
  • Dan Hill
    06/30/2020, 11:12 PM
    This isn't blocking me, but I'd imagine this could lead to quality bugs in production.
  • Xiang Fu
    06/30/2020, 11:26 PM
    I will take a look. It would be helpful if you could paste the stack trace or create an issue
  • Xiang Fu
    06/30/2020, 11:27 PM
    so I can check why the job is not failing
  • Dan Hill
    07/01/2020, 8:12 AM
    I'm having issues with slow queries. I recently started moving away from the built-in time column to my own utc_date (the timestamp floored to a UTC date). Now my queries are taking 5 seconds over 80 million rows (a lot slower than before). I removed some sensitive parts.
    metrics_offline_table_config.json: |-
        {
          "tableName": "metrics",
          "tableType":"OFFLINE",
          "segmentsConfig" : {
            "schemaName" : "metrics",
            "timeColumnName": "timestamp",
            "timeType": "MILLISECONDS",
            "retentionTimeUnit": "DAYS",
            "retentionTimeValue": "1461",
            "segmentPushType": "APPEND",
            "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
            "replication" : "1"
          },
          "tableIndexConfig" : {
            "loadMode"  : "MMAP",
            "noDictionaryColumns": ["impressions"],
            "starTreeIndexConfigs": [
              {
                "dimensionsSplitOrder": [
                  "utc_date",
                  "platform_id",
                  "account_id",
                  "campaign_id"
                ],
                "skipStarNodeCreationForDimensions": [
                ],
                "functionColumnPairs": [
                  "SUM__impressions",
                ]
              }
            ]
          },
          "tenants" : {},
          "metadata": {
            "customConfigs": {}
          }
        }
    The query I'm running looks pretty basic. It's asking for aggregate stats at a high level. In my data, there are 8 unique utc_dates and 1 unique platform.
    select utc_date, sum(impressions) from metrics where platform_id = 13 group by utc_date
    Recent changes:
    • switched from timestamp to my own utc_date (long)
    • added "noDictionaryColumns": ["impressions"]
    This previously was 50ms-100ms. I'm going to bed now. No need to rush an answer.
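    One way to tell whether the star-tree index is actually serving this query is numDocsScanned in the broker response: if it is close to the raw 80 million rows, the star-tree is being bypassed and the aggregation is scanning. A minimal check against the broker's SQL endpoint (host and port assumed from the default Kubernetes setup):
        # Send the query to the broker and inspect numDocsScanned in the returned stats.
        curl -s -X POST http://pinot-broker:8099/query/sql \
          -H 'Content-Type: application/json' \
          -d '{"sql": "select utc_date, sum(impressions) from metrics where platform_id = 13 group by utc_date"}'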
  • Dan Hill
    07/01/2020, 3:51 PM
    I'm guessing my latency issue is related to a lack of disk. The ingestion job was still reported as successful even though I ran into disk issues on my pinot-server.
  • Kishore G
    07/01/2020, 3:53 PM
    The ingestion job will succeed as long as the data gets uploaded via the controller API and stored in the deep store
  • Kishore G
    07/01/2020, 3:53 PM
    servers can pick it up any time
  • Dan Hill
    07/01/2020, 3:54 PM
    Interesting. Is there a way to force the servers to pick it up again after it failed to process internally? I just increased the disk and tried again, and it worked.
  • Kishore G
    07/01/2020, 3:54 PM
    yes, that's the way it's supposed to work
  • Kishore G
    07/01/2020, 3:54 PM
    restart will work
  • Kishore G
    07/01/2020, 3:55 PM
    or a reset command for the segment in ERROR state
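    For reference, a hedged sketch of both options through the controller REST API (the per-segment reset endpoint shipped in releases after 0.4.0 and may not exist here; table and segment names are placeholders):
        # Ask servers to reload all segments of the table.
        curl -X POST "http://pinot-controller:9000/segments/metrics/reload?type=OFFLINE"

        # Newer releases: reset a single segment that is stuck in ERROR state.
        curl -X POST "http://pinot-controller:9000/segments/metrics_OFFLINE/<segmentName>/reset"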
  • Dan Hill
    07/01/2020, 3:55 PM
    Cool, ty
  • Pradeep
    07/01/2020, 11:44 PM
    Hi, “select * from <table> order by <column> limit 10” is timing out. I have ~40M rows and ~44 columns, with data spread across two machines.
    {
      "exceptions": [],
      "numServersQueried": 2,
      "numServersResponded": 0,
      "numSegmentsQueried": 0,
      "numSegmentsProcessed": 0,
      "numSegmentsMatched": 0,
      "numConsumingSegmentsQueried": 0,
      "numDocsScanned": 0,
      "numEntriesScannedInFilter": 0,
      "numEntriesScannedPostFilter": 0,
      "numGroupsLimitReached": false,
      "totalDocs": 0,
      "timeUsedMs": 9999,
      "segmentStatistics": [],
      "traceInfo": {},
      "minConsumingFreshnessTimeMs": 0
    }
    There are ~34 segments, and all of them seem to be in either “ONLINE” or “CONSUMING” state
    I just see a timeout exception on one of the server logs:
    Caught TimeoutException. (brokerRequest = BrokerRequest(querySource:QuerySource(tableName:searchtable_REALTIME), selections:Selection(selectionColumns:[*], selectionSortSequence:[SelectionSort(column:timestampMillis, isAsc:true)], size:10), enableTrace:true, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=10000}, pinotQuery:PinotQuery(dataSource:DataSource(tableName:searchtable), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:*))], orderByList:[Expression(type:FUNCTION, functionCall:Function(operator:ASC, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:timestampMillis))]))], limit:10), orderBy:[SelectionSort(column:timestampMillis, isAsc:true)], limit:10))
    java.util.concurrent.TimeoutException: null
            at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_252]
            at org.apache.pinot.core.operator.CombineOperator.getNextBlock(CombineOperator.java:169) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.operator.CombineOperator.getNextBlock(CombineOperator.java:47) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:42) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
    Wondering if there is a way to improve the query latency? (Tried with a small subset of columns; the query returns results.)
  • Kishore G
    07/02/2020, 12:05 AM
    is the timestampMillis column dictionary encoded?
  • Kishore G
    07/02/2020, 12:06 AM
    can you add it to noDictionaryColumns?
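    A minimal sketch of that change, assuming it goes into tableIndexConfig of the REALTIME table config (the same shape Pradeep posts further down; note the end of this thread, where making the time column itself no-dictionary trips an NPE at segment build in 0.4.0):
        "tableIndexConfig": {
          "loadMode": "MMAP",
          "noDictionaryColumns": [
            "timestampMillis"
          ]
        }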
  • Pradeep
    07/02/2020, 12:54 AM
    got it, thanks, let me try that
  • Kishore G
    07/02/2020, 12:55 AM
    there is an optimization that we can do specifically for time column sorting
  • Kishore G
    07/02/2020, 12:55 AM
    I remember Uber folks also suggesting this
  • Jackie
    07/02/2020, 2:02 AM
    @Pradeep What is the total size of your data? In order to solve this query, the servers need to scan the whole table
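    Since an unfiltered select-all order-by forces every segment to be scanned, one hedged mitigation is to bound the sort column so per-segment min/max metadata can prune most segments (column name from the trace above; the cutoff value here is made up):
        select * from searchtable
        where timestampMillis > 1593561600000
        order by timestampMillis
        limit 10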
  • Pradeep
    07/02/2020, 2:09 AM
    It’s close to 4G
  • Pradeep
    07/02/2020, 2:09 AM
    Do indexingConfig changes applied to an existing table update the old segments?
  • Pradeep
    07/02/2020, 2:10 AM
    Also, should keeping a min/max per segment help?
    columnMinMaxValueGeneratorMode: TIME
  • Kishore G
    07/02/2020, 3:21 AM
    Parts of them, such as the inverted index, apply to old segments
  • Kishore G
    07/02/2020, 3:22 AM
    However, the original encoding cannot be changed
  • Kishore G
    07/02/2020, 3:22 AM
    You can use Minion to perform such tasks
  • Pradeep
    07/02/2020, 7:33 AM
    (Not urgent, please take a look when you guys get a chance; sorry for the late-night ping.) Also, when I tried adding “timestampMillis”, my timestamp column, to noDictionaryColumns in the table config below:
    {
      "REALTIME": {
        "tableName": "tablename_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "timeColumnName": "timestampMillis",
          "schemaName": "search",
          "timeType": "MILLISECONDS",
          "replicasPerPartition": "1"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "loadMode": "MMAP",
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.consumer.type": "LowLevel",
            "stream.kafka.topic.name": "INPUT",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.broker.list": "<broker_nodes>:9092",
            "realtime.segment.flush.threshold.size": "0",
            "realtime.segment.flush.threshold.time": "24h",
            "realtime.segment.flush.desired.size": "80M",
            "realtime.segment.flush.autotune.initialRows": "700000",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
          },
          "noDictionaryColumns": [
            "timestampMillis"
          ],
          "enableDefaultStarTree": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": true
        },
        "metadata": {
          "customConfigs": {}
        }
      }
    }
    I am seeing this NullPointerException; it works fine when I choose a different string column. Should noDictionaryColumns only contain string/bytes fields?
    Could not build segment
    java.lang.NullPointerException: null
            at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:393) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:360) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:216) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:199) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
            at org.apache.pinot.core.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:141) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
  • Xiang Fu
    07/02/2020, 8:10 AM
    I think this is because Pinot segment creation uses the timestamp column's min/max values from the dictionary to set the segment name and write the segment metadata (start/end time)
  • Xiang Fu
    07/02/2020, 8:10 AM
    since it's configured as a non-dictionary column, hence the NPE
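    So a hedged workaround for the config above: leave the declared time column out of noDictionaryColumns so segment sealing can read its min/max from the dictionary, i.e.
        "noDictionaryColumns": []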