# troubleshooting
p
Hello, I'm encountering a problem where Pinot is not consuming Kafka events for a realtime table after defining it. What are some quick places to look to understand what might be causing this?
m
Copy code
1. Check the external view / ideal state to see whether the table was created.
2. If the table was created, is the segment in the external view in ERROR state? If so, look at the server log.
3. If the table was not created, look at the controller log for what happened when you issued the create-table command.
👍 1
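A minimal sketch of how these checks might look against the controller REST API and the pods; the controller address, port, and pod names are assumptions based on a typical Helm install:
Copy code
# Assumed controller address/port; adjust to your deployment.
CONTROLLER=http://pinot-controller:9000

# 1. Fetch ideal state and external view for the table
curl -s "$CONTROLLER/tables/ComputedView/idealstate"
curl -s "$CONTROLLER/tables/ComputedView/externalview"

# 3. Inspect the controller log around table creation (pod name and namespace are assumptions)
kubectl logs pinot-controller-0 -n dc-pinot | grep -i ComputedView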
p
This Kafka topic has 13M+ events partitioned over 16 partitions. I can consume these events using Kafka tools. Memory-wise and CPU-wise, the Pinot components seem stable.
The ideal state reports "CUSTOMIZED" mode:
Copy code
{
  "id": "ComputedView_REALTIME",
  "simpleFields": {
    "BATCH_MESSAGE_MODE": "false",
    "BUCKET_SIZE": "0",
    "IDEAL_STATE_MODE": "CUSTOMIZED",
    "INSTANCE_GROUP_TAG": "ComputedView_REALTIME",
    "MAX_PARTITIONS_PER_INSTANCE": "1",
    "NUM_PARTITIONS": "0",
    "REBALANCE_MODE": "CUSTOMIZED",
    "REPLICAS": "1",
    "STATE_MODEL_DEF_REF": "SegmentOnlineOfflineStateModel",
    "STATE_MODEL_FACTORY_NAME": "DEFAULT"
  },
  "mapFields": {
    "ComputedView__0__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__10__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__11__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__12__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__13__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__14__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__15__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__1__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__2__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__3__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__4__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__5__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__6__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__7__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__8__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    },
    "ComputedView__9__0__20210429T1647Z": {
      "Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098": "CONSUMING"
    }
  },
  "listFields": {}
}
m
This is good. Check external view
p
That is the external view from the Zookeeper browser, unless I'm meant to look elsewhere?
m
You mentioned it was Ideal state
Is above that you pasted Ideal State or External View?
p
I meant that the 'IDEAL_STATE_MODE' property on the externalView in the Zookeeper browser is set to "CUSTOMIZED". My apologies for the confusion.
m
Ok, external view says that segments are in consuming state
Which implies servers must be consuming
What's the issue you are seeing?
p
They have not changed from CONSUMING in 24 hours, and yet counting the number of records in this table always returns 0.
m
Check any server to see if it has logs stating it is consuming
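One possible way to check, using the server pod name and namespace visible in the ideal state above (the grep pattern is a loose assumption about what the consumer logs contain):
Copy code
# Look for consumption activity on a server (pod name/namespace taken from the thread)
kubectl logs pinot-server-0 -n dc-pinot | grep -iE "consum|kafka" | tail -n 50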
p
Controller reports not finding segments:
Copy code
2021/04/30 17:01:44.228 WARN [SegmentDeletionManager] [grizzly-http-server-0] Failed to find local segment file for segment file:/var/pinot/controller/data/ComputedView/ComputedView__0__0__20210429T1713Z
m
What is controller data dir?
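One way to check the configured data dir; the config file path is an assumption based on a typical Helm install, and controller.data.dir is the relevant property:
Copy code
# Config file path is an assumption; adjust to where your controller config is mounted
kubectl exec pinot-controller-0 -n dc-pinot -- grep -i "controller.data.dir" /var/pinot/controller/config/pinot-controller.conf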
p
Is there a way to recompute/recreate segments? I think someone deleted the PVCs associated with the k8s deployment.
m
I think if you restart all the servers, they should simply start consuming from the beginning (since nothing was saved)
p
Is there a Pinot API to restart the servers? Or should I simply delete the k8s resources?
m
Probably no API. Not sure what deleting would do. Does k8s not have an option to restart?
p
Only for Deployment resources, which my Pinot installation does not use.
m
If you delete and recreate, will they get the same name? If not, I'm not sure what the behavior would be. In that case, simply nuking everything (delete the table first) and restarting might be cleaner/safer.
p
Names are consistent, yes. I will try that on the pods first
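For reference, a sketch of what that pod-level restart might look like for a StatefulSet-managed server; the pod name and namespace are assumptions taken from earlier in the thread:
Copy code
# Deleting a StatefulSet pod recreates it with the same name
kubectl delete pod pinot-server-0 -n dc-pinot
kubectl get pods -n dc-pinot -w   # watch it come back up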
m
ok
p
Are segments stored in the same place as table & schema definitions?
m
No
Table/Schema is stored in ZK
Segments are backed up in the deep store (which you configure on the controller)
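For example, the ZK-stored definitions can be read back through the controller REST API (controller address is an assumption, and the schema name is assumed to match the table name):
Copy code
CONTROLLER=http://pinot-controller:9000

curl -s "$CONTROLLER/tables/ComputedView"     # table config (stored in ZK)
curl -s "$CONTROLLER/schemas/ComputedView"    # schema (stored in ZK; name assumed to match the table)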
p
Restarting the server pod did not work. I think the issue is that deleting pods does not delete the state of those pods, which is stored in Persistent Volume Claims.
What about segment metadata? Is that in Zookeeper?
m
Whatever you can browse via ZK is stored there (including segmentZKMetadata). There's also segment metadata in the segment file itself
p
I think I need to delete segment metadata, to force Pinot to re-read from kafka.
m
Oh yea
p
So this would be stored in ZK right?
m
Then it's easier to delete and recreate the table
It will be much cleaner that way, I think
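A sketch of that delete/recreate flow via the controller REST API; the controller address and the saved table-config file name are assumptions (the schema can usually stay in place):
Copy code
CONTROLLER=http://pinot-controller:9000

# Drop the realtime table (removes its segments and ZK state)
curl -s -X DELETE "$CONTROLLER/tables/ComputedView?type=realtime"

# Re-create it from a saved table config JSON (file name is hypothetical)
curl -s -X POST -H "Content-Type: application/json" \
  -d @ComputedView_realtime_table_config.json "$CONTROLLER/tables"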
p
Understood, I'm trying to see if there is an alternative in case something like this happens in Prod and deleting + recreating the table is not an option.
m
In prod, you should configure it in a way that restarting servers is possible. That will fix this
And you will also not lose prior segments that have already been committed
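In practice that mostly means making sure the server/controller data dirs survive a restart, e.g. verifying the PVCs are still bound before restarting anything (namespace is an assumption from the thread):
Copy code
# Confirm the Pinot PVCs exist and are Bound before restarting pods
kubectl get pvc -n dc-pinot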
p
Is there documentation on how to configure server restart in K8s?
m
@Xiang Fu ^^
p
Is there a way, from ZK, to know the path of each segment of a table on disk?
And is there a way to edit that information if need be?
m
There are two copies of segments: 1) one backup in the deep store, 2) a local copy on the serving nodes
ZK has the download URL of the segments, which points to the deep store. It is in the segment metadata
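One way to look at that metadata, assuming a reasonably recent Pinot version (the exact REST path has changed across releases, so treat this as a sketch):
Copy code
CONTROLLER=http://pinot-controller:9000

# Segment ZK metadata (includes the download URL) for one segment
curl -s "$CONTROLLER/segments/ComputedView/ComputedView__0__0__20210429T1647Z/metadata"

# The same data is browsable in the ZK browser under PROPERTYSTORE/SEGMENTS/ComputedView_REALTIME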
p
Download URLs are "null" for me, since I don't have a deep store configured yet.
What about the local copies?
m
it should not be null
Oh, it is null because no segment has been committed yet
👍 1
Servers store segments on local disk (whatever data dir you specified) for serving
Also, if you didn't specify a deep store, it defaults to the dataDir on the controller, and the download URL will be a controller URL
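To see the local copies on a server, something like this might work; the data-dir path is an assumption based on common default values:
Copy code
# List locally stored segments for the table on one server (path is an assumption)
kubectl exec pinot-server-0 -n dc-pinot -- ls -la /var/pinot/server/data/index/ComputedView_REALTIME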
x
why do you want to do so in k8s:
Is there documentation on how to configure server restart in K8s?
m
@Xiang Fu Let me summarize:
Copy code
1. The cluster went into a state where all segments are showing as CONSUMING in the EV.
2. However, someone deleted the PVC, so there are no segments.
3. Typically, fixing the PVC + restarting the servers should have fixed the problem - as in, the servers would start consuming again.
🙌 1
My recommendation was to delete/recreate the table. But Pedro is asking: if this happens in prod, what are the ways to fix it?
x
you can try to manually delete the pod
then it will be recreated
and why was the pvc deleted?
p
I tried deleting the pods; it did not work, since the pods themselves don't store state (they are stateless). The information about the table is, I think, in the PVCs.
x
right, I think the PVC will be recreated, and as long as the pod name is the same, the Pinot server should re-download all the data from S3
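Once the servers come back, consumption can be verified with a quick count through the broker and, on recent Pinot versions, the consuming-segments endpoint; the broker/controller addresses and ports are assumptions:
Copy code
BROKER=http://pinot-broker:8099
CONTROLLER=http://pinot-controller:9000

# The count should start growing once consumption resumes
curl -s -X POST "$BROKER/query/sql" -H "Content-Type: application/json" \
  -d '{"sql":"SELECT COUNT(*) FROM ComputedView"}'

# Per-partition consumer status (available on recent Pinot versions)
curl -s "$CONTROLLER/tables/ComputedView/consumingSegmentsInfo"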