Hello I created a default real time table After consuming so Apache Pinot #general

Ambika

05/18/2021, 10:12 AM

Hello, I created a default real-time table. After consuming some 300k events, I wanted to add a sorted index, so I edited the table config to add the sorted column. How can I check whether my query is using the index and whether the index was created successfully?

Mayank

05/18/2021, 1:03 PM

For segments flushed to disk, you can check the metadata.properties file. If the column is marked as sorted, you can assume the sorted index will be used.

Ambika

05/18/2021, 2:21 PM

Hi Mayank, this is the metadata from one of the latest segments, but I don't see any sorted column here.

{
  "segment.realtime.endOffset": "1209967",
  "segment.start.time": "1621347175827",
  "segment.time.unit": "MILLISECONDS",
  "segment.flush.threshold.size": "50000",
  "segment.realtime.startOffset": "1159967",
  "segment.end.time": "1621347449397",
  "segment.total.docs": "50000",
  "segment.table.name": "schd_1",
  "segment.realtime.numReplicas": "1",
  "segment.creation.time": "1621347177149",
  "segment.realtime.download.url": "http://172.18.0.2:9000/segments/schd_1/schd_1__0__34__20210518T1412Z",
  "segment.name": "schd_1__0__34__20210518T1412Z",
  "segment.index.version": "v3",
  "segment.flush.threshold.time": null,
  "segment.type": "REALTIME",
  "segment.crc": "2855665894",
  "segment.realtime.status": "DONE"
}

Ambika

05/18/2021, 2:22 PM

How do I ensure the sorted index is created and put to use?

Ambika

05/18/2021, 2:22 PM

{
  "REALTIME": {
    "tableName": "schd_1_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "timeType": "MILLISECONDS",
      "schemaName": "schd",
      "timeColumnName": "upd_ts",
      "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
      "segmentPushType": "APPEND",
      "replicasPerPartition": "1"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "lowLevel",
        "stream.kafka.topic.name": "schd",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.hlc.zk.connect.string": "localhost:2191/kafka",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.zk.broker.url": "localhost:2191/kafka",
        "stream.kafka.broker.list": "localhost:19092",
        "realtime.segment.flush.threshold.rows": "50000",
        "realtime.segment.flush.threshold.time": "10m"
      },
      "enableDefaultStarTree": false,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "sortedColumn": [
        "post_prd_id"
      ],
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false
    },
    "metadata": {
      "customConfigs": {}
    },
    "routing": {
      "instanceSelectorType": "strictReplicaGroup"
    },
    "upsertConfig": {
      "mode": "FULL"
    },
    "isDimTable": false
  }
}

Ambika

05/18/2021, 2:22 PM

This is my table config.

Neha Pawar

05/18/2021, 2:36 PM

This metadata is from ZooKeeper. You need to check the metadata.properties file, which you will find on the server, inside each segment dir.

Mayank

05/18/2021, 2:37 PM

Yes ^^

Neha Pawar

05/18/2021, 2:41 PM

And for older completed segments, any indexing change in the table config only takes effect after you invoke the segment reload API. However, I think a sorted index cannot be applied to old segments this way.
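
For reference, a minimal sketch of what a reload call could look like from Python. It assumes the controller exposes `POST /segments/{tableName}/reload` with a `type` query parameter (check the Swagger UI on your own controller to confirm), and the controller address `http://localhost:9000` is a placeholder. The helper name `build_reload_request` is made up for this example. As noted above, a reload is not expected to sort already-completed segments.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_reload_request(controller_url, table_name, table_type="REALTIME"):
    """Build (but do not send) a POST request that asks the controller
    to reload all segments of a table.

    The endpoint path and the `type` parameter are assumptions about the
    controller REST API; verify them against your Pinot version.
    """
    path = f"/segments/{quote(table_name)}/reload"
    query = urlencode({"type": table_type})
    return Request(f"{controller_url.rstrip('/')}{path}?{query}", method="POST")


# Placeholder controller address; the table name is taken from the thread.
req = build_reload_request("http://localhost:9000", "schd_1")
print(req.get_method(), req.full_url)

# To actually send it:
#   from urllib.request import urlopen
#   urlopen(req)
```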

Ambika

05/18/2021, 2:49 PM

Where can I find the location of these files? I am using the docker image.

Mayank

05/18/2021, 3:32 PM

The data dir of the server

Ambika

05/18/2021, 3:43 PM

Got it.

column.post_prd_id.columnType = DIMENSION
column.post_prd_id.isSorted = true
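
The same check can be scripted. This is a rough sketch that assumes metadata.properties uses plain `key = value` lines like the two shown above; `read_properties` and `is_sorted` are hypothetical helper names, and the sample text stands in for the real file contents.

```python
def read_properties(text):
    """Parse simple `key = value` lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_sorted(props, column):
    """Return True if the segment metadata marks the column as sorted."""
    return props.get(f"column.{column}.isSorted", "false").lower() == "true"


# Sample content copied from the message above; for a real segment,
# read the text of its metadata.properties file instead.
sample = """column.post_prd_id.columnType = DIMENSION
column.post_prd_id.isSorted = true
"""

props = read_properties(sample)
print(is_sorted(props, "post_prd_id"))
```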

Mayank

05/18/2021, 3:44 PM

Yeah, if the column is sorted then you can assume it is being used.

Ambika

05/18/2021, 3:46 PM

Got it. I assume there is no way to get the query plan since it's dynamic for every segment, right?

Mayank

05/18/2021, 3:59 PM

Yeah, right now there isn’t a way to get the query plan, mostly because Pinot doesn’t support complex joins or nested queries.

Ambika

05/18/2021, 4:09 PM

got it.. makes sense..

Ambika

05/18/2021, 4:19 PM

timeUsedMs                   40
numDocsScanned               31741
totalDocs                    1584379
numServersQueried            1
numServersResponded          1
numSegmentsQueried           43
numSegmentsProcessed         43
numSegmentsMatched           34
numConsumingSegmentsQueried  1
numEntriesScannedInFilter    928729
numEntriesScannedPostFilter  95223
numGroupsLimitReached        false
partialResponse              -
minConsumingFreshnessTimeMs  1621350158012
offlineThreadCpuTimeNs       0
realtimeThreadCpuTimeNs      0

Ambika

05/18/2021, 4:19 PM

Where can I read about what each of these means?

Mayank

05/18/2021, 4:19 PM

one sec

Mayank

05/18/2021, 4:19 PM

https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-format

Mayank

05/18/2021, 4:20 PM

The search in docs.pinot.apache.org is pretty good, and should be able to point you to any docs related to terms you query.

Ambika

05/18/2021, 4:21 PM

Got it, sorry about that. I was wondering what the difference is between Docs and Entries?

Ambika

05/18/2021, 4:21 PM

A doc is the actual record, but what does numEntries mean?

Mayank

05/18/2021, 4:21 PM

Doc represents a record

Mayank

05/18/2021, 4:21 PM

Entry represents a value for a column in the record.
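
To make the doc/entry distinction concrete, here is a small worked example using the numbers from the query stats pasted earlier in the thread. The interpretation in the comments is a reading of those numbers, not something Pinot reports directly.

```python
# Values copied from the query response stats shown earlier in the thread.
stats = {
    "totalDocs": 1584379,
    "numDocsScanned": 31741,
    "numEntriesScannedInFilter": 928729,
    "numEntriesScannedPostFilter": 95223,
}

# Entries read per matched doc after filtering. A whole number here is
# consistent with the query reading that many column values per record.
entries_per_doc = stats["numEntriesScannedPostFilter"] / stats["numDocsScanned"]

# Entries scanned while evaluating the filter, relative to table size.
filter_ratio = stats["numEntriesScannedInFilter"] / stats["totalDocs"]

print(entries_per_doc)         # 3.0
print(round(filter_ratio, 3))  # 0.586
```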

Ambika

05/18/2021, 4:23 PM

OK, so if I see that numEntriesScannedInFilter is really high for a filter on a low-cardinality column, would that mean it's better to have an inverted index on it?

Mayank

05/18/2021, 4:25 PM

If the cardinality is very low (say gender: M/F/U), then adding an inverted index only prunes out 2/3 of the data. Depending on your case and query latency requirement, it might still be a good idea.
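
A back-of-the-envelope version of that 2/3 figure, assuming an equality filter on a single value and values spread uniformly across the column (real data is often skewed, so treat this as a rough guide only). `pruned_fraction` is a made-up helper name.

```python
def pruned_fraction(cardinality):
    """Fraction of rows an equality filter on one value excludes,
    assuming values are uniformly distributed over `cardinality` distinct values."""
    if cardinality < 1:
        raise ValueError("cardinality must be at least 1")
    return 1 - 1 / cardinality


# Cardinality 3 corresponds to the gender (M/F/U) example above.
for k in (3, 100, 10000):
    print(k, round(pruned_fraction(k), 4))
```

The higher the cardinality, the larger the share of rows the index lets the server skip, which is why the benefit is smaller for very low-cardinality columns.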

Ambika

05/18/2021, 4:26 PM

right.. understood..