please help me Apache Pinot #troubleshooting

# troubleshooting

lâm nguyễn hoàng

12/08/2020, 5:46 PM

please help me

Xiang Fu

12/08/2020, 5:47 PM

@Jackie do we have any specific memory requirements for upsert case?

Xiang Fu

12/08/2020, 5:48 PM

@Yupeng Fu ^^

Xiang Fu

12/08/2020, 5:49 PM

we tried turning these on, but that didn't help either:

pinot.server.instance.realtime.alloc.offheap=true
pinot.server.instance.realtime.alloc.offheap.direct=false/true

Yupeng Fu

12/08/2020, 6:13 PM

not sure if this is related to upsert

Yupeng Fu

12/08/2020, 6:13 PM

only upsert metadata is on heap

Yupeng Fu

12/08/2020, 6:13 PM

the rest is the same as normal segments

Yupeng Fu

12/08/2020, 6:14 PM

it would be helpful to use debug endpoint to display the memory use

Xiang Fu

12/08/2020, 6:15 PM

oh ? what's this endpoint

Yupeng Fu

12/08/2020, 6:22 PM

MmapDebugResource

Xiang Fu

12/08/2020, 6:23 PM

@lâm nguyễn hoàng can you try this

lâm nguyễn hoàng

12/08/2020, 6:24 PM

How can I run this "MmapDebugResource"

Xiang Fu

12/08/2020, 6:24 PM

should be on server admin port

Yupeng Fu

12/08/2020, 6:25 PM

debug/memory/offheap/table/{tableName}

lâm nguyễn hoàng

12/08/2020, 6:27 PM

Screen Shot 2020-12-09 at 01.27.14.png

lâm nguyễn hoàng

12/08/2020, 6:31 PM

@Yupeng Fu

Jackie

12/08/2020, 6:32 PM

For upsert, there is a concurrent map storing the mapping from primary key to record location, which is on heap

Jackie

12/08/2020, 6:33 PM

If the cardinality of the primary key is not extremely high, it should be fine
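Editor's sketch of the point above: a toy primary-key to record-location map, showing that an update to an existing key reuses its entry while every new key adds one. This is not Pinot's implementation (the thread only says it is an on-heap concurrent map); `RecordLocation`, `apply_upsert`, and the sample values are hypothetical names chosen for illustration.

```python
from typing import Dict, NamedTuple, Tuple

# Hypothetical stand-in for "record location": which segment and doc hold the latest row.
class RecordLocation(NamedTuple):
    segment: str
    doc_id: int
    timestamp: int

# Composite primary key, modeled on (itemid, storeid, date_key) from this thread.
PrimaryKey = Tuple[str, int, int]

def apply_upsert(key_map: Dict[PrimaryKey, RecordLocation],
                 key: PrimaryKey,
                 location: RecordLocation) -> None:
    """Keep only the newest location per primary key."""
    current = key_map.get(key)
    if current is None or location.timestamp >= current.timestamp:
        key_map[key] = location

key_map: Dict[PrimaryKey, RecordLocation] = {}
apply_upsert(key_map, ("item-1", 10, 20201208), RecordLocation("seg_0", 0, 1000))
# Same key again: the entry is replaced, so the map does not grow.
apply_upsert(key_map, ("item-1", 10, 20201208), RecordLocation("seg_0", 1, 2000))
# A key never seen before: the map grows by one entry.
apply_upsert(key_map, ("item-1", 10, 20201209), RecordLocation("seg_0", 2, 3000))

print(len(key_map))
```

The map size tracks the number of distinct primary keys, not the number of messages, which is why key cardinality is what matters for heap usage.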

Jackie

12/08/2020, 6:33 PM

@lâm nguyễn hoàng Is this the server rest port?

lâm nguyễn hoàng

12/08/2020, 6:34 PM

yes

lâm nguyễn hoàng

12/08/2020, 6:35 PM

Screen Shot 2020-12-09 at 01.35.05.png

lâm nguyễn hoàng

12/08/2020, 6:37 PM

@Jackie help me

Jackie

12/08/2020, 6:42 PM

This seems like the controller rest port

Yupeng Fu

12/08/2020, 6:42 PM

that shows your controller

Yupeng Fu

12/08/2020, 6:42 PM

run it on server

lâm nguyễn hoàng

12/08/2020, 6:49 PM

Screen Shot 2020-12-09 at 01.48.38.png

lâm nguyễn hoàng

12/08/2020, 6:49 PM

Screen Shot 2020-12-09 at 01.49.20.png

lâm nguyễn hoàng

12/08/2020, 6:50 PM

Can't show @Yupeng Fu

lâm nguyễn hoàng

12/08/2020, 6:50 PM

Screen Shot 2020-12-09 at 01.50.27.png

Yupeng Fu

12/08/2020, 6:50 PM

8000 is netty

Yupeng Fu

12/08/2020, 6:51 PM

not rest

Jackie

12/08/2020, 6:52 PM

8030 is the rest port per the config

lâm nguyễn hoàng

12/08/2020, 6:54 PM

Screen Shot 2020-12-09 at 01.54.25.png

lâm nguyễn hoàng

12/08/2020, 6:54 PM

not found @Jackie

Jackie

12/08/2020, 6:59 PM

Is the host a pinot server?

lâm nguyễn hoàng

12/08/2020, 7:01 PM

yes

lâm nguyễn hoàng

12/08/2020, 7:01 PM

Screen Shot 2020-12-09 at 02.01.29.png

lâm nguyễn hoàng

12/08/2020, 7:03 PM

Is the path debug/memory/offheap/table/{tableName} correct?

lâm nguyễn hoàng

12/08/2020, 7:03 PM

@Jackie

Jackie

12/08/2020, 7:03 PM

The path is correct

Jackie

12/08/2020, 7:03 PM

Can you also try

/debug/memory/offheap
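Editor's sketch: one way to build and call the two debug paths mentioned in this thread from Python. The paths and port 8030 come from the conversation; the host `localhost`, the table name `myTable_REALTIME`, and the helper names are assumptions, and whether the endpoint exists depends on the Pinot version running on the server.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

def offheap_debug_url(host: str, port: int, table_name: Optional[str] = None) -> str:
    """Build the off-heap debug URL for the server REST (admin) port."""
    base = f"http://{host}:{port}/debug/memory/offheap"
    if table_name is None:
        return base
    # Percent-encode the table name so it is safe as a path segment.
    return f"{base}/table/{quote(table_name, safe='')}"

def fetch(url: str, timeout: float = 5.0) -> str:
    """Issue a GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")

# Assumed host and placeholder table name; 8030 is the REST port named in the thread.
print(offheap_debug_url("localhost", 8030))
print(offheap_debug_url("localhost", 8030, "myTable_REALTIME"))
```

Calling `fetch(offheap_debug_url("localhost", 8030))` against the server, not the controller and not the netty port, should return the report if the endpoint is available in that build.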

lâm nguyễn hoàng

12/08/2020, 7:04 PM

Screen Shot 2020-12-09 at 02.04.34.png

Jackie

12/08/2020, 7:06 PM

Can you also share the schema of the table?

lâm nguyễn hoàng

12/08/2020, 7:07 PM

Screen Shot 2020-12-09 at 02.07.11.png

lâm nguyễn hoàng

12/08/2020, 7:07 PM

{
  "schemaName": "bhx_bhx_forecast_forecast_item",
  "dimensionFieldSpecs": [
    { "name": "forecastpurchase", "dataType": "DOUBLE" },
    { "name": "createduser", "dataType": "STRING" },
    { "name": "inputquantity", "dataType": "DOUBLE" },
    { "name": "forecast", "dataType": "DOUBLE" },
    { "name": "forecastnopromotion", "dataType": "DOUBLE" },
    { "name": "storeid", "dataType": "LONG" },
    { "name": "storequantity", "dataType": "DOUBLE" },
    { "name": "isdeleted", "dataType": "INT" },
    { "name": "deleteduser", "dataType": "STRING" },
    { "name": "date_key", "dataType": "LONG" },
    { "name": "itemid", "dataType": "STRING" },
    { "name": "sellquantity", "dataType": "DOUBLE" },
    { "name": "forecast15", "dataType": "DOUBLE" },
    { "name": "forecast8", "dataType": "DOUBLE" },
    { "name": "forecastnopromo", "dataType": "DOUBLE" },
    { "name": "branchquantity", "dataType": "DOUBLE" },
    { "name": "updateduser", "dataType": "STRING" },
    { "name": "_DELETED", "dataType": "INT" }
  ],
  "dateTimeFieldSpecs": [
    { "name": "createddate", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" },
    { "name": "deleteddate", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" },
    { "name": "updateddate", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" },
    { "name": "_TIMESTAMP", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" }
  ],
  "primaryKeyColumns": [ "itemid", "storeid", "date_key" ]
}

lâm nguyễn hoàng

12/08/2020, 7:07 PM

{
  "REALTIME": {
    "tableName": "bhx_bhx_forecast_forecast_item_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "timeType": "MILLISECONDS",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "9125",
      "segmentPushFrequency": "DAILY",
      "segmentPushType": "APPEND",
      "replication": "4",
      "replicasPerPartition": "4",
      "timeColumnName": "_TIMESTAMP",
      "schemaName": "bhx_bhx_forecast_forecast_item"
    },
    "tenants": { "broker": "DefaultTenant", "server": "DefaultTenant", "tagOverrideConfig": {} },
    "tableIndexConfig": {
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.topic.name": "PINOT.BHX.bhx_forecast.forecast_item",
        "stream.kafka.table.tablename": "bhx_forecast.forecast_item",
        "stream.kafka.table.part.pattern": "_[0-9]+",
        "stream.kafka.cdc.format": "CDC",
        "stream.kafka.decoder.class.name": "com.mwg.pinot.realtime.KafkaCDCMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "com.mwg.pinot.realtime.KafkaCDCConsumerFactory",
        "stream.kafka.broker.list": "datastore-broker01-kafka-ovm-6-76:9092,datastore-broker02-kafka-ovm-6-77:9093,datastore-broker03-kafka-ovm-6-78:9094",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
        "realtime.segment.flush.threshold.rows": "0",
        "realtime.segment.flush.threshold.time": "60m",
        "realtime.segment.flush.threshold.segment.size": "500M",
        "group.id": "bhx_bhx_forecast.forecast_item-PINOT_INGESTION",
        "max.partition.fetch.bytes": "167772160",
        "receive.buffer.bytes": "67108864",
        "isolation.level": "read_committed",
        "max.poll.records": "5000"
      },
      "noDictionaryColumns": [],
      "onHeapDictionaryColumns": [],
      "varLengthDictionaryColumns": [],
      "enableDefaultStarTree": false,
      "starTreeIndexConfigs": [],
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": false,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "sortedColumn": [],
      "bloomFilterColumns": [],
      "loadMode": "MMAP",
      "rangeIndexColumns": []
    },
    "metadata": { "customConfigs": {} },
    "routing": { "instanceSelectorType": "strictReplicaGroup" },
    "instanceAssignmentConfigMap": {
      "CONSUMING": {
        "tagPoolConfig": { "tag": "inventory_REALTIME", "poolBased": false, "numPools": 0 },
        "replicaGroupPartitionConfig": { "replicaGroupBased": true, "numInstances": 0, "numReplicaGroups": 4, "numInstancesPerReplicaGroup": 5, "numPartitions": 0, "numInstancesPerPartition": 0 }
      }
    },
    "upsertConfig": { "mode": "FULL" }
  }
}

lâm nguyễn hoàng

12/08/2020, 7:09 PM

@Jackie schema and table config above

Jackie

12/08/2020, 7:13 PM

I think the issue is that the primary key (itemid, storeid, date_key) is almost always unique, which will make the key map very big

Jackie

12/08/2020, 7:14 PM

What's the purpose of enabling upsert for this table?

lâm nguyễn hoàng

12/08/2020, 7:15 PM

The data in this table is recalculated daily

Jackie

12/08/2020, 7:18 PM

I don't follow

Jackie

12/08/2020, 7:18 PM

Do you need to replace the data every day, or just append the data for the new day?

Jackie

12/08/2020, 7:23 PM

@Yupeng Fu @lâm nguyễn hoàng This REST endpoint debug/memory/offheap/table/{tableName} was added recently (https://github.com/apache/incubator-pinot/pull/6172) and is not included in the latest release

Xiang Fu

12/08/2020, 7:24 PM

I think all the recomputed data are also pushed to kafka

Xiang Fu

12/08/2020, 7:24 PM

hence the upsert

Yupeng Fu

12/08/2020, 7:29 PM

oh, i see. then use memory/offheap i think

Jackie

12/08/2020, 7:29 PM

The cardinality of the primary key is unbounded, which will make the upsert metadata map size unbounded

Yupeng Fu

12/08/2020, 7:30 PM

it’s bounded by the msgs consumed?

Jackie

12/08/2020, 7:31 PM

"retentionTimeUnit": "DAYS",
"retentionTimeValue": "9125",

About 25 years of data lol
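Editor's sketch of the arithmetic behind this concern. The retention value (9125 days) and the key columns (itemid, storeid, date_key) come from the thread; the item count, store count, and bytes-per-entry figure are made-up inputs for illustration only, not measurements from this cluster or from Pinot.

```python
RETENTION_DAYS = 9125  # "retentionTimeValue" from the table config in this thread

def retention_years(days: int) -> float:
    """Convert a retention period in days to years (365-day years)."""
    return days / 365

def distinct_keys(items: int, stores: int, days: int) -> int:
    """Upper bound on distinct (itemid, storeid, date_key) combinations."""
    return items * stores * days

def estimated_heap_gb(num_keys: int, bytes_per_entry: int) -> float:
    """Rough heap estimate for a key map; bytes_per_entry is an assumed input."""
    return num_keys * bytes_per_entry / 1e9

print(retention_years(RETENTION_DAYS))  # 25.0

# Hypothetical workload: 1,000 items in 100 stores, one row per item, store, and day.
keys = distinct_keys(items=1_000, stores=100, days=RETENTION_DAYS)
print(keys)  # 912500000

# Assumed 100 bytes per map entry, purely for illustration.
print(estimated_heap_gb(keys, bytes_per_entry=100))  # 91.25
```

Because date_key is part of the primary key, each new day adds a fresh batch of keys, and with a 25-year retention none of them age out for a long time.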

Jackie

12/08/2020, 7:33 PM

If we want to re-compute the records for the previous day to fix the data every day, we should use the hybrid table approach, which is designed for this
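Editor's sketch of the hybrid table idea mentioned above, in heavily simplified form: recomputed (corrected) data for past days lives in the offline table, recent data lives in the realtime table, and a time boundary decides which side answers for a given day. This is an illustration of the concept only, not Pinot's broker logic; `hybrid_query`, the row layout, and the values are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, int]

def hybrid_query(offline_rows: List[Row],
                 realtime_rows: List[Row],
                 time_boundary: int) -> List[Row]:
    """Take offline rows up to the boundary and realtime rows after it."""
    older = [r for r in offline_rows if r["date_key"] <= time_boundary]
    newer = [r for r in realtime_rows if r["date_key"] > time_boundary]
    return older + newer

# Offline side: yesterday's data after the daily recomputation (corrected value).
offline_rows = [{"date_key": 20201207, "forecast": 12}]

# Realtime side: still holds the stale value for yesterday plus today's rows.
realtime_rows = [
    {"date_key": 20201207, "forecast": 10},
    {"date_key": 20201208, "forecast": 7},
]

result = hybrid_query(offline_rows, realtime_rows, time_boundary=20201207)
print(result)
```

In this model the stale realtime row for the earlier day is never returned, so the daily fix does not need a per-key upsert map.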

lâm nguyễn hoàng

12/08/2020, 7:41 PM

ok tks everybody @Xiang Fu @Yupeng Fu @Jackie

lâm nguyễn hoàng

12/08/2020, 7:41 PM

got it

Abhijeet Kushe

05/23/2023, 6:28 PM

@lâm nguyễn hoàng We are also using a realtime table in upsert mode with Kinesis. You mentioned you have 450 million records. Is that the total volume?
