# troubleshooting
a
I am seeing a problem with upsert queries. All the primary keys on the record have different values, but I am only able to see those records with
option(skipUpsert=true)
but those records are not returned with
option(skipUpsert=false)
I added additional primary keys 10 days ago and the segment was created 2 days back. Any ideas what the issue is? @Neha Pawar
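For reference, the two queries being compared look roughly like this (a minimal sketch; metricEvents is the table from the config shared later in the thread, and the filter column/value are hypothetical):

```sql
-- Bypasses upsert dedup: returns every ingested version of each primary key
SELECT * FROM metricEvents WHERE accountId = 'acct-001' option(skipUpsert=true)

-- Default upsert behavior: returns only the latest version per primary key
SELECT * FROM metricEvents WHERE accountId = 'acct-001' option(skipUpsert=false)
```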
h
@Kartik Khare @Jackie ^
a
I have been using upsert for 7 months and have not had this issue so far. I am using full upsert mode. But thanks for sharing this
h
then it might be a different problem @Kartik Khare might know more
j
Do you use the time column as the comparison column for upsert? Is it possible that the segment containing the largest comparison value of the primary key just expired and was removed?
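For reference, the comparison column Jackie is asking about is set in upsertConfig; when it is not set, Pinot falls back to the table's time column. A sketch, reusing the field names from the config shared below:

```json
"upsertConfig": {
  "mode": "FULL",
  "comparisonColumn": "eventTimestamp"
}
```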
a
No, I don't have a time column as the comparison column
k
Hi Abhijeet, can you share the tableConfig before and after adding the additional primary keys?
a
```json
{
  "tableName": "metricEvents",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "eventTimestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "workflowEvents",
    "replicasPerPartition": "1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "365",
    "segmentPushType": "APPEND"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kinesis",
      "stream.kinesis.topic.name": "metrics-stream",
      "region": "us-east-1",
      "shardIteratorType": "LATEST",
      "maxRecordsToFetch": 1,
      "stream.kinesis.consumer.type": "lowlevel",
      "stream.kinesis.fetch.timeout.millis": "30000",
      "stream.kinesis.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kinesis.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kinesis.KinesisConsumerFactory",
      "realtime.segment.flush.threshold.size": "5000000",
      "realtime.segment.flush.threshold.time": "1d"
    }
  },
  "upsertConfig": {
    "mode": "FULL"
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "metadata": {
    "customConfigs": {}
  }
}
```
@Kartik Khare I have not changed the config before and after the schema change.
@Kartik Khare I have uploaded query results for both option(skipUpsert=false) and option(skipUpsert=true). Let me know if you need more information
j
@Abhijeet Kushe What is your primary key? Also, which record is missing but supposed to show up?
Have you restarted the servers after updating the config?
a
I didn't restart the servers. I was told in the past that the mapping is applied from the next segment … the records in the above query which have skipUpsert=false are valid records
The primary key is a composite key: ["accountId", "metricDefinitionId", "metricInstanceId", "metricRunningId", "recordType", "taskId", "taskKind", "attributeId", "automationFlowId"]. I only shared attributeId above
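For reference, a composite primary key like this is declared in the schema rather than the table config. A sketch of the relevant part of the workflowEvents schema, assuming the rest is unchanged:

```json
{
  "schemaName": "workflowEvents",
  "primaryKeyColumns": [
    "accountId", "metricDefinitionId", "metricInstanceId",
    "metricRunningId", "recordType", "taskId", "taskKind",
    "attributeId", "automationFlowId"
  ]
}
```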
j
Oh, if you changed the primary key, you have to restart the servers in order to apply the changes, or the server will use the old primary key to do the upsert
You may restart now, and Pinot should rebuild the upsert metadata with the new primary key
a
Will we lose any data in a consuming segment? We use a realtime table
@Jackie this works after restart. But I wanted to know what the recommended way for Pinot server restarts is. We use Kubernetes. I just did a ctrl+k on the server pod container; it came back up, but the controller and broker queries were failing, saying that the server … is not able to connect. Then I did ctrl+k on server-headless. For a while it was giving the same error, but it came back up later on. There was some downtime
j
@Abhijeet Kushe Do you have multiple replicas for this table? In order to perform a no-downtime restart, you need to have multiple replicas, and perform a rolling restart. The already consumed data in the consuming segment should be re-consumed after the restart, so you should not lose any data
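For reference, a rolling restart on Kubernetes can look like the sketch below, assuming the servers run as a StatefulSet named pinot-server (the resource name depends on your deployment):

```bash
# The StatefulSet controller replaces pods one at a time,
# waiting for each pod to become ready before moving on
kubectl rollout restart statefulset/pinot-server
kubectl rollout status statefulset/pinot-server
```

With replicasPerPartition greater than 1, the remaining replicas keep serving queries while each pod restarts.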
a
Our setting is "replicasPerPartition": "1"
h
when "replicasPerPartition" is "1", downtime is unavoidable
a
Thanks @Haitao Zhang, what is the recommended replicasPerPartition? And secondly, when we make a change now, will this again require a restart of the Pinot server for it to be effective?
I have another question. We have to increase our retentionTimeValue to a higher value. The current value is "retentionTimeValue": "365". Does this need a restart?
n
This should not need a restart. The periodic retention task will pick up fresh values every time it runs
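For reference, the change itself is just an edit to segmentsConfig in the table config (e.g. via the controller UI or API); "730" here is a made-up example value:

```json
"segmentsConfig": {
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "730"
}
```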
a
Thanks @Neha Pawar, what about the "replicasPerPartition": "1" change?
n
As for replicas, generally we recommend at least 3 in production. You would only need to run a rebalance. No restart needed
How many servers and how many replicas currently? And what are you planning to increase it to?
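For reference, rebalance is a controller API call; a sketch assuming the default controller port 9000 (the host is a placeholder):

```bash
# Update replicasPerPartition in the table config first, then:
curl -X POST "http://<controller-host>:9000/tables/metricEvents/rebalance?type=REALTIME"
```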
a
We are planning to have 3 servers. So that means 2 replicas, right?
h
if you have 3 servers, you can have 3 replicas
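Concretely, that would be the following edit to segmentsConfig (other fields unchanged), followed by the rebalance call sketched above:

```json
"segmentsConfig": {
  "replicasPerPartition": "3"
}
```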
a
Ok, we have 1 server right now
@Haitao Zhang do we need to just add more replicas to increase servers, or is there some other config? Also, do I need to add more replicas for the minion, controller, and broker as well, or just the server for data?