# general
  • a

    Alex

    11/16/2019, 9:07 AM
    good question 😕
  • k

    Kishore G

    11/16/2019, 4:32 PM
    Bad query from the Presto connector -> GC -> session timeout -> Kafka consumption stopped
  • k

    Kishore G

    11/16/2019, 4:33 PM
    Kafka consumption not restarting after GC is a bug. We need to look into it.
  • e

    Elon

    11/16/2019, 7:06 PM
    When we go to the servers the query runs fine; it's when we hit the broker that it happens
  • a

    Alex

    11/17/2019, 2:30 AM
    Was it the broker? I could not reproduce it btw, even on a much bigger dataset
  • a

    Alex

    11/18/2019, 4:59 PM
    another question -> during the ingestion load test we noticed that the zookeeper log folder grew to 8 GB. We wrote about 300M messages into Pinot (2 tables), at a rate of 30K messages per second.
  • a

    Alex

    11/18/2019, 5:00 PM
    Log files are pretty big as well -> up to 3 GB. Is this normal behavior? In production, what is the typical size of ZooKeeper's data log folder?
  • a

    Alex

    11/18/2019, 5:00 PM
    Copy code
    zookeeper@pinot-zookeeper-0:/data/log/version-2$ ls -lh
    total 8.9G
    -rw-rw-r-- 1 zookeeper zookeeper  65M Nov 16 01:37 log.100000001
    -rw-rw-r-- 1 zookeeper zookeeper 3.1G Nov 16 19:10 log.200000001
    -rw-rw-r-- 1 zookeeper zookeeper 1.8G Nov 16 23:02 log.20000df77
    -rw-r--r-- 1 zookeeper zookeeper 4.1G Nov 17 21:21 log.300000001
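One common reason a ZooKeeper transaction log directory grows like this is that autopurge is disabled by default, so old txn logs and snapshots are never cleaned up. A minimal zoo.cfg sketch; the retain count and interval below are illustrative assumptions, not values from this thread:
    # zoo.cfg - enable automatic purging of old snapshots and txn logs
    # keep only the 3 most recent snapshots (and their txn logs)
    autopurge.snapRetainCount=3
    # run the purge task every 12 hours (0 disables purging, which is the default)
    autopurge.purgeInterval=12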
  • k

    Kishore G

    11/18/2019, 5:09 PM
    What’s the real-time table config? How often are segments getting created?
  • a

    Alex

    11/18/2019, 5:26 PM
    Copy code
    {
      "tableName": "flattened_orders_hours",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "updatedAtHours",
        "timeType": "HOURS",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "365",
        "segmentPushType": "APPEND",
        "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
        "schemaName": "flattened_orders_hours",
        "replication": "1",
        "replicasPerPartition": "1"
      },
      "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant"
      },
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "invertedIndexColumns": [
          "...",
          "...",
          "...",
          "..."
        ],
        "aggregateMetrics": "true",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "highLevel",
          "stream.kafka.topic.name": "flattened-orders-json-seconds",
          "stream.kafka.decoder.class.name": "org.apache.pinot.core.realtime.impl.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.core.realtime.impl.kafka2.KafkaConsumerFactory",
          "stream.kafka.hlc.zk.connect.string": "IP:2181/",
          "stream.kafka.zk.broker.url": "IP:2181/",
          "stream.kafka.broker.list": "IP:9092",
          "stream.kafka.isolation.level": "read_committed",
          "stream.kafka.hlc.bootstrap.server": "IP:9092",
          "realtime.segment.flush.threshold.time": "3600000",
          "realtime.segment.flush.threshold.size": "50000",
          "stream.kafka.consumer.prop.auto.offset.reset": "earliest",
          "stream.kafka.consumer.prop.group.id": "pinot-flattened_orders_hours"
        }
      },
      "metadata": {
        "customConfigs": {}
      }
    }
  • a

    Alex

    11/18/2019, 5:26 PM
    we loaded 90 days of data into a Kafka topic, and then blasted it in a loop into the Pinot cluster.
  • a

    Alex

    11/18/2019, 5:27 PM
    which means the same hour will be written multiple times (is that a good idea for the load test?)
  • k

    Kishore G

    11/18/2019, 5:29 PM
    Copy code
    flush threshold size is 50000, it's too low.
  • k

    Kishore G

    11/18/2019, 5:29 PM
    yes, that's fine
  • a

    Alex

    11/18/2019, 5:30 PM
    Copy code
    50000
    what should it be?
  • k

    Kishore G

    11/18/2019, 5:32 PM
    ideally 150 to 500 MB is the sweet spot.
  • k

    Kishore G

    11/18/2019, 5:32 PM
    what's the current size of the segment?
  • n

    Neha Pawar

    11/18/2019, 5:34 PM
    you could try using segment size threshold instead of rows/time: https://pinot.readthedocs.io/en/latest/tuning_realtime_performance.html#controlling-number-of-rows-in-consuming-segment
  • a

    Alex

    11/18/2019, 5:38 PM
    checked 1 server (running 3 in this setup). segment dir is empty. index dir:
  • a

    Alex

    11/18/2019, 5:38 PM
    Copy code
    4.0K	./consumers
    3.7M	./flattened_orders_hours_REALTIME_1573926928266_0__0__1573996522185/v3
    3.7M	./flattened_orders_hours_REALTIME_1573926928266_0__0__1573996522185
    3.6M	./flattened_orders_hours_REALTIME_1573926928266_0__0__1573939395196/v3
    3.6M	./flattened_orders_hours_REALTIME_1573926928266_0__0__1573939395196
    3.7M	./flattened_orders_hours_REALTIME_1573926928266_0__0__1573993404560/v3
    3.7M	./flattened_orders_hours_REALTIME_1573926928266_0__0__1573993404560
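Besides running du on each server's data directory, segment sizes can usually be read off the controller REST API as well; a hedged sketch, where the host/port and the detailed flag are assumptions for illustration:
    # ask the controller for per-segment and total sizes of the table
    curl "http://CONTROLLER_HOST:9000/tables/flattened_orders_hours/size?detailed=true"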
  • n

    Neha Pawar

    11/18/2019, 5:47 PM
    Copy code
    "realtime.segment.flush.threshold.time": "24h",
    "realtime.segment.flush.threshold.size": "0",
    This should enable the segment size based threshold. By default, the algorithm tries to create segments of about 200 MB.
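Combining Neha's snippet with Kishore's 150 to 500 MB target, the relevant streamConfigs keys could look roughly like this; treat the desired-size property name and the 300M value as assumptions to verify against the docs for your Pinot version, and note Subbu's caveat below that the size-based threshold applies to LLC, not HLC:
    "realtime.segment.flush.threshold.time": "24h",
    "realtime.segment.flush.threshold.size": "0",
    "realtime.segment.flush.desired.size": "300M"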
  • k

    Kishore G

    11/18/2019, 5:48 PM
    Thanks @User. What's the default setting when neither of them is set?
  • s

    Subbu Subramaniam

    11/18/2019, 5:54 PM
    The settings mentioned by @User do not work for HLC. From your segment names, it appears you are using LLC. The only things you can tune there are the number of rows and the time; adjust them according to your use case. It would be interesting to know your reasons for going with HLC, however. We strongly recommend you use LLC, since all the new algorithms etc. have been built for that mode.
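For reference, moving a table from HLC to LLC is mostly a matter of changing the consumer type in streamConfigs and dropping the HLC-only properties; a rough sketch, noting that the exact value accepted for the consumer type varies by Pinot version and should be verified:
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.broker.list": "IP:9092"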
  • a

    Alex

    11/18/2019, 6:08 PM
    @User still exploring different options. LLC has a hard-to-hit requirement:
    Copy code
    Events with higher offsets should be more recent (the offsets of events need not be contiguous)
  • k

    Kishore G

    11/18/2019, 6:13 PM
    @User that’s true with Kafka rt
  • a

    Alex

    11/18/2019, 6:16 PM
    oh, so both HL and LL need this guarantee? What will happen if some events are out of order? Do we need to run some sorting job on the stream before sending to Pinot?
  • k

    Kishore G

    11/18/2019, 6:22 PM
    Kafka guarantees that within a partition offsets are monotonically increasing
  • s

    Subbu Subramaniam

    11/18/2019, 6:30 PM
    @User HL does not know about offsets. It expects the consuming layer to keep track of offsets (or use some other mechanism) to ensure that messages from the stream are consumed exactly once into Pinot. LLC, on the other hand, keeps track of offsets, and ensures that Pinot consumes every message in the partition exactly once. The offset (an int or long) is provided by the underlying stream on a per-partition basis. Kafka provides one.
  • a

    Alex

    11/18/2019, 6:31 PM
    @User got it
  • a

    Alex

    11/18/2019, 6:32 PM
    @User true, but event time is created by message producers, so times can be mixed (messages with higher offsets can have a lower timestamp). Maybe I'm misunderstanding:
    Copy code
    Events with higher offsets should be more recent