# troubleshooting
  • d

    Dan Hill

    07/04/2020, 11:26 PM
    Are there guides / advice for how we should iterate on Pinot tables once they are serving traffic?
    - What sort of schema changes are safe to do live?
    - How do teams usually roll out breaking changes? A separate Pinot table? How can we roll this out incrementally? For my case, we're using Presto to join, so we'll probably have to modify our Presto query and do an extra join.
    - Any helpful tools for rolling out changes incrementally? E.g. the system populating Pinot will likely have a canary setup. Is there something that helps verify that the deployed canary events match the non-canary events? Is this something separate from Pinot?
    - How do teams experiment with different Pinot setups to evaluate latencies? A whole separate Pinot stack? How do teams experiment with different indexes?
  • s

    Somanshu Jindal

    07/06/2020, 8:56 AM
    Hi, if I want to use a ZooKeeper cluster for a production setup, can I specify all the ZooKeeper hosts when starting the various Pinot components like the controller, broker, etc.?
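    A minimal sketch, assuming the stock pinot-admin.sh launcher: a ZooKeeper quorum is normally passed as a single comma-separated connect string, so every component can be pointed at all the hosts via -zkAddress (host names here are hypothetical):
    Copy code
    # Each Pinot component takes the same comma-separated ZK quorum string.
    bin/pinot-admin.sh StartController -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster
    bin/pinot-admin.sh StartBroker     -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster
    bin/pinot-admin.sh StartServer     -zkAddress zk1:2181,zk2:2181,zk3:2181 -clusterName PinotCluster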
  • y

    Yash Agarwal

    07/06/2020, 11:14 AM
    Is it possible to use multiple buckets with S3PinotFS? We have limits on the amount of data we can store in a single bucket.
  • p

    Pradeep

    07/06/2020, 7:52 PM
    QQ: wondering how difficult it would be to include timestampNanos as part of the time column in Pinot? (Is it just a matter of Pinot parsing and understanding that the timestamp is in nanos, or are there more assumptions around it?) I believe currently only up to `millis` is supported. Context: we have system-level events (think stream of syscalls) and want to be able to store the nanos timestamp to fix the order among them; it's also used by other systems in our infrastructure. Currently I am storing nanos as a separate column and created a `millis` column to serve as the time column, thinking I could avoid storing the duplicate info if the feature is simple enough to add.
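    A minimal sketch of the duplicate-column workaround described above, assuming hypothetical names (syscall_events, event_time_nanos, event_time_millis) and the legacy timeFieldSpec schema format:
    Copy code
    {
      "schemaName": "syscall_events",
      "dimensionFieldSpecs": [
        {"name": "event_time_nanos", "dataType": "LONG"}
      ],
      "timeFieldSpec": {
        "incomingGranularitySpec": {
          "name": "event_time_millis",
          "dataType": "LONG",
          "timeType": "MILLISECONDS"
        }
      }
    }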
  • k

    Kishore G

    07/06/2020, 7:59 PM
    IMO, nanos cannot be used as a timestamp
  • k

    Kishore G

    07/06/2020, 7:59 PM
    irrespective of Pinot supporting that datatype
  • k

    Kishore G

    07/06/2020, 8:00 PM
    nanos is mainly used to measure relative times
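    Kishore's point in a minimal Java illustration: System.nanoTime() has an arbitrary, per-JVM origin, so it is only meaningful for measuring elapsed time, unlike the epoch-anchored System.currentTimeMillis():
    Copy code
    public class NanoVsMillis {
      public static void main(String[] args) throws InterruptedException {
        long wallClockMillis = System.currentTimeMillis(); // millis since the Unix epoch; comparable across machines
        long start = System.nanoTime();                    // arbitrary origin; the absolute value is meaningless
        Thread.sleep(10);
        long elapsedNanos = System.nanoTime() - start;     // only differences are meaningful
        System.out.println("epoch millis  = " + wallClockMillis);
        System.out.println("elapsed nanos = " + elapsedNanos);
      }
    }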
  • e

    Elon

    07/06/2020, 11:38 PM
    FYI, we have a table which already exists and I wanted to add a sorted-column index, but I'm getting "400 Bad Request". Nothing in the controller logs. Can you see what's wrong with the following?
  • e

    Elon

    07/06/2020, 11:38 PM
    Copy code
    curl -f -k -X POST --header 'Content-Type: application/json' -d '@realtime.json' ${CONTROLLER}/tables
  • e

    Elon

    07/06/2020, 11:39 PM
    Copy code
    {
      "tableName": "oas_integration_operation_event",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "operation_ts",
        "timeType": "SECONDS",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "7",
        "segmentPushType": "APPEND",
        "segmentPushFrequency": "daily",
        "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
        "schemaName": "oas_integration_operation_event",
        "replicasPerPartition": "3"
      },
      "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant"
      },
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "invertedIndexColumns": [
          "service_slug",
          "operation_type",
          "operation_result",
          "store_id"
        ],
        "sortedColumn": [
          "operation_ts"
        ],
        "noDictionaryColumns": [],
        "aggregateMetrics": "false",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "LowLevel",
          "stream.kafka.topic.name": "oas-integration-operation-completion-avro",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.decoder.prop.schema.registry.rest.url": "<http://XXXX:8081>",
          "stream.kafka.zk.broker.url": "XXXX/",
          "stream.kafka.broker.list": "XXXX:9092",
          "realtime.segment.flush.threshold.time": "6h",
          "realtime.segment.flush.threshold.size": "0",
          "realtime.segment.flush.desired.size": "200M",
          "stream.kafka.consumer.prop.auto.isolation.level": "read_committed",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
          "stream.kafka.consumer.prop.group.id": "oas_integration_operation_event-load-pinot-llprb",
          "stream.kafka.consumer.prop.client.id": "XXXX"
        },
        "starTreeIndexConfigs": [
          {
            "dimensionsSplitOrder": [
              "service_slug",
              "store_id",
              "operation_type",
              "operation_result"
            ],
            "functionColumnPairs": [
              "PERCENTILEEST__operation_latency_ms",
              "AVG__operation_latency_ms",
              "DISTINCTCOUNT__store_id",
              "COUNT__store_id",
              "COUNT__operation_type"
            ]
          },
          {
            "dimensionsSplitOrder": [
              "service_slug",
              "store_id"
            ],
            "functionColumnPairs": [
              "COUNT__store_id",
              "COUNT__operation_type"
            ]
          }
        ]
      },
      "metadata": {
        "customConfigs": {}
      }
    }
  • m

    Mayank

    07/06/2020, 11:39 PM
    IIRC, uploading segments to realtime tables was not possible (a while back, but not sure if it continues to be the case).
    👍 1
  • m

    Mayank

    07/06/2020, 11:40 PM
    can you try swagger?
  • e

    Elon

    07/06/2020, 11:41 PM
    Sure
  • e

    Elon

    07/06/2020, 11:42 PM
    Oh, thanks! Looks like I can't change the time type for the time column, i.e. segmentsConfig.timeType
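    For reference, POST on ${CONTROLLER}/tables creates a new table; updating the config of an existing table goes through PUT on the table resource. A sketch using the same variables as above:
    Copy code
    # Update (rather than create) the existing table's config; the timeType itself still cannot be changed.
    curl -f -k -X PUT --header 'Content-Type: application/json' -d '@realtime.json' ${CONTROLLER}/tables/oas_integration_operation_event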
  • p

    Pradeep

    07/08/2020, 10:36 PM
    Hi, I am trying to test the following change (https://github.com/apache/incubator-pinot/pull/5661) on my cluster, so I pulled code from master, but I am seeing the exception below. Wondering if there's any change you know of? I only see this change (https://github.com/apache/incubator-pinot/pull/5608), which says that existing behavior shouldn't change. The exception seems to come from trying to fetch the S3 region from configuration:
    Copy code
    java.lang.IllegalArgumentException: null
            at shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-2ec7dee1597021742f68f0ae8b279f7560e55894]
            at org.apache.pinot.plugin.filesystem.S3PinotFS.init(S3PinotFS.java:80) ~[pinot-s3-0.5.0-SNAPSHOT-shaded.jar:0.5.0-SNAPSHOT-2ec7dee1597021742f68f0ae8b279f7560e55894]
            at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:55) [pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-2ec7dee1597021742f68f0ae8b279f7560e55894]
            at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:75) [pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-2ec7dee1597021742f68f0ae8b279f7560e55894]
  • p

    Pradeep

    07/08/2020, 10:38 PM
    I already have this config:
    Copy code
    pinot.server.storage.factory.s3.region=us-east-2
  • p

    Pradeep

    07/08/2020, 10:39 PM
    and this was working fine with earlier version
  • k

    Kishore G

    07/08/2020, 10:39 PM
    @Daniel Lavoie ^^
  • m

    Mayank

    07/08/2020, 10:40 PM
    Yeah, #5608 seems like one PR that is related.
  • d

    Daniel Lavoie

    07/08/2020, 10:41 PM
    Definitely sounds related, I'll investigate tomorrow morning!
  • m

    Mayank

    07/08/2020, 10:46 PM
    My guess is subsetting of config is broken.
  • m

    Mayank

    07/08/2020, 10:48 PM
    Copy code
    PinotConfiguration schemesConfiguration = fsConfig.subset(CLASS);
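    Roughly, the factory carves per-scheme settings out of the server config by prefix; a sketch of the intended subsetting, using the key names from Pradeep's config (factory internals approximated, not the exact Pinot source):
    Copy code
    // Given: pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    //        pinot.server.storage.factory.s3.region=us-east-2
    PinotConfiguration fsConfig = serverConfig.subset("pinot.server.storage.factory");
    // subset("class") should map scheme -> implementation: {s3 -> ...S3PinotFS}
    PinotConfiguration schemesConfiguration = fsConfig.subset("class");
    // subset("s3") should carry that scheme's settings: {region -> us-east-2, ...}.
    // If this subset comes back empty, S3PinotFS.init() sees a null region and
    // Preconditions.checkArgument(...) throws the IllegalArgumentException above.
    PinotConfiguration s3Config = fsConfig.subset("s3");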
  • d

    Daniel Lavoie

    07/08/2020, 10:48 PM
    Yeah, that would explain the config object being null.
  • m

    Mayank

    07/08/2020, 10:48 PM
    Is `class` new?
  • m

    Mayank

    07/08/2020, 10:48 PM
    If so, this is a backward-incompatible change?
  • d

    Daniel Lavoie

    07/08/2020, 10:50 PM
    We have tests around fsConfig subsetting; if that is broken, it's definitely not intended. I'm not home right now.
  • k

    Kishore G

    07/08/2020, 10:50 PM
    @Pradeep can you paste the configuration
  • k

    Kishore G

    07/08/2020, 10:50 PM
    entire file
  • p

    Pradeep

    07/08/2020, 10:51 PM
    Copy code
    pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.server.storage.factory.s3.accessKey=
    pinot.server.storage.factory.s3.secretKey=
    pinot.server.storage.factory.s3.region=
    pinot.server.segment.fetcher.protocols=file,http,s3
    pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.server.instance.dataDir=/home/ubuntu/pinot/data
    pinot.server.instance.segmentTarDir=/home/ubuntu/pinot/segments
  • p

    Pradeep

    07/08/2020, 10:51 PM
    This is the server config