# troubleshooting
**p:** One more question: we are loading data for an offline table using a Spark job, and we see some segments go into ERROR state. What makes a segment go into ERROR state vs. OFFLINE? Does it mean the download of the segment from S3 to the local server disk failed? If the server retries to grab the segment, how many retries does it do, and is that configurable? Will "Reload Segment" force a download of the segment from S3 to the Pinot server disk?
**m:** A segment is in ERROR state in the external view (EV) if the server is unable to load it, for whatever reason: a failed download or any other issue.
Can you check the table debug API in Swagger to see if it returns any exceptions? If not, check the server log.
**p:** For the sake of completeness, what does the OFFLINE state signify?
**m:** Is this for realtime ingestion or batch ingestion?
**p:** Batch ingestion.
**m:** For batch ingestion, a segment should be either ONLINE or ERROR.
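The ONLINE/ERROR distinction above comes from Helix: the ideal state records where each segment *should* be ONLINE, while the external view records the state each server actually reports. As a minimal sketch (the `find_bad_segments` helper is hypothetical; it assumes the `{segment: {server: state}}` map shape that the controller's idealstate/externalview endpoints expose):

```python
def find_bad_segments(ideal_state, external_view):
    """Return {segment: [servers]} for segments whose external view
    reports ERROR on some server (map shape: {segment: {server: state}})."""
    bad = {}
    for segment in ideal_state:
        ev_states = external_view.get(segment, {})
        errored = [s for s, state in ev_states.items() if state == "ERROR"]
        if errored:
            bad[segment] = errored
    return bad

# Toy example: one segment expected ONLINE on two servers, one of which errored.
ideal = {"seg_10": {"Server_1": "ONLINE", "Server_2": "ONLINE"}}
ev = {"seg_10": {"Server_1": "ERROR", "Server_2": "ONLINE"}}
print(find_bad_segments(ideal, ev))  # {'seg_10': ['Server_1']}
```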
**p:**
```json
{
  "segmentName": "offlinebookingnarrow_poc_OFFLINE_10",
  "serverState": {
    "Server_pinot-server-7.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-5.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-9.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-13.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-1.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-11.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-6.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-8.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-4.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-12.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-14.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-0.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-2.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    },
    "Server_pinot-server-10.pinot-server-headless.svc.cluster.local_8098": {
      "idealState": null,
      "externalView": null,
      "segmentSize": null,
      "consumerInfo": null,
      "errorInfo": null
    }
  }
}
```
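Every field above, including errorInfo, is null, so the debug API recorded nothing for this segment on any server. Purely as an illustration, a small helper (hypothetical, not part of Pinot) that summarizes a serverState block like the one above:

```python
import json

def summarize_server_state(debug_json):
    """Count servers in a debug-API serverState block and how many
    of them actually reported a non-null errorInfo."""
    resp = json.loads(debug_json)
    servers = resp["serverState"]
    with_errors = [s for s, info in servers.items()
                   if info.get("errorInfo") is not None]
    return {"servers": len(servers), "withErrors": len(with_errors)}

# Toy input: one all-null server, one server with an error.
sample = '{"serverState": {"Server_a": {"errorInfo": null}, "Server_b": {"errorInfo": "failed to load"}}}'
print(summarize_server_state(sample))  # {'servers': 2, 'withErrors': 1}
```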
This is what I see for a segment which is in ERROR state. I am using:

```shell
curl -X GET "https://<controller_endpoint>/debug/tables/<table_name>?type=OFFLINE&verbosity=0" -H "accept: application/json"
```
Using any other value for verbosity comes back with a 500 status code. I'll have a look at the logs to see why.
```shell
C02TW0TMHTDG:~ pbagrecha$ curl -X GET "https://<controller_endpoint>/debug/tables/<table_name>?type=OFFLINE&verbosity=1" -H "accept: application/json"
{"code":500,"error":null}
```
```json
{
  "tableName" : "tablename_OFFLINE",
  "numSegments" : 7000,
  "numServers" : 15,
  "numBrokers" : 3,
  "brokerDebugInfos" : [ ],
  "tableSize" : {
    "reportedSize" : "2 TB",
    "estimatedSize" : "2 TB"
  },
  "ingestionStatus" : {
    "ingestionState" : "UNKNOWN",
    "errorMessage" : "Cannot retrieve ingestion status for Table : offlinebookingnarrow_poc_OFFLINE since it does not use the built-in SegmentGenerationAndPushTask task"
  }
}
```
```
Finished reading information for table: <table_name>_OFFLINE

Server error:

java.lang.NullPointerException: null
at org.apache.pinot.controller.api.resources.TableDebugResource.debugSegments(TableDebugResource.java:266) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.controller.api.resources.TableDebugResource.debugTable(TableDebugResource.java:154) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.controller.api.resources.TableDebugResource.getTableDebugInfo(TableDebugResource.java:132) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:679) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:353) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at java.lang.Thread.run(Thread.java:829) [?:?]
```
```
Reloading single segment: offlinebookingnarrow_poc_OFFLINE_10 in table: offlinebookingnarrow_poc_OFFLINE

Segment metadata is null. Skip reloading segment: offlinebookingnarrow_poc_OFFLINE_10 in table: offlinebookingnarrow_poc_OFFLINE
```
Segment metadata being null: is this an issue from the Spark job, or an issue while registering the segment with ZooKeeper? And how can I go about debugging this?
```shell
curl -X GET "https://<controller_endpoint>/segments/offlinebookingnarrow_poc/offlinebookingnarrow_poc_OFFLINE_10/metadata" -H "accept: application/json"
{"custom.map":"{\"input.data.file.uri\":\"s3 location\"}","segment.crc":"3096624694","segment.creation.time":"1660684167056","segment.download.url":"another s3 location","segment.index.version":"v3","segment.push.time":"1660686001604","segment.total.docs":"63938"}
```
Is this not the same segment metadata?
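One thing worth noting about the response above: `custom.map` is a JSON-encoded string nested inside the JSON document, so it takes a second decode to read. A short sketch using the (redacted) values from the response:

```python
import json

# The segment metadata endpoint returns flat string values; "custom.map"
# is itself JSON-encoded and needs a second json.loads.
metadata = json.loads(
    '{"custom.map":"{\\"input.data.file.uri\\":\\"s3 location\\"}",'
    '"segment.crc":"3096624694","segment.total.docs":"63938"}'
)
custom_map = json.loads(metadata["custom.map"])
print(custom_map["input.data.file.uri"])    # s3 location
print(int(metadata["segment.total.docs"]))  # 63938
```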
**m:** Is there disk space available on the server? Also, can you check the segment directory on the server?
**p:** Lots of disk space:
```shell
root@pinot-server-3:/opt/pinot# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         100G   31G   70G  31% /
tmpfs            64M     0   64M   0% /dev
tmpfs            30G     0   30G   0% /sys/fs/cgroup
/dev/xvda1      100G   31G   70G  31% /etc/hosts
shm              64M     0   64M   0% /dev/shm
/dev/xvdbb      1.9T  216G  1.7T  12% /var/pinot/server/data
tmpfs            30G   12K   30G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            30G  4.0K   30G   1% /run/secrets/eks.amazonaws.com/serviceaccount
tmpfs            30G     0   30G   0% /proc/acpi
tmpfs            30G     0   30G   0% /proc/scsi
tmpfs            30G     0   30G   0% /sys/firmware
```
```shell
root@pinot-server-3:/var/pinot/server/data/index/offlinebookingnarrow_poc_OFFLINE_10# du -sh
412M	.

root@pinot-server-3:/var/pinot/server/data/index/offlinebookingnarrow_poc_OFFLINE_10# ls -lh
total 4.0K
-rw-rw-r-- 1 root 1337    0 Aug 16 21:48 dmp_segments.inv.inprogress
drwxrwsr-x 2 root 1337 4.0K Aug 16 21:48 v3
root@pinot-server-3:/var/pinot/server/data/index/offlinebookingnarrow_poc_OFFLINE_10# ls -lh v3/
total 412M
-rw-rw-r-- 1 root 1337 412M Aug 16 21:48 columns.psf
-rw-rw-r-- 1 root 1337   16 Aug 16 21:48 creation.meta
-rw-rw-r-- 1 root 1337 3.2K Aug 16 21:48 index_map
-rw-rw-r-- 1 root 1337  12K Aug 16 21:46 metadata.properties
```
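For reference, the listing above does show the four files a v3 segment directory normally carries (columns.psf, creation.meta, index_map, metadata.properties); the odd part is the leftover `dmp_segments.inv.inprogress` marker, which looks like the trace of an inverted-index build that was interrupted mid-way (an assumption, not confirmed in this thread). A hypothetical `check_v3_dir` helper for spotting both conditions:

```python
from pathlib import Path

# Files observed in a healthy v3 segment directory (from the listing above).
EXPECTED_V3_FILES = {"columns.psf", "creation.meta", "index_map", "metadata.properties"}

def check_v3_dir(segment_dir):
    """Return (missing_v3_files, leftover_inprogress_markers) for a
    segment directory laid out like the listing above."""
    root = Path(segment_dir)
    v3 = root / "v3"
    present = {p.name for p in v3.iterdir()} if v3.is_dir() else set()
    leftovers = sorted(p.name for p in root.glob("*.inprogress"))
    return sorted(EXPECTED_V3_FILES - present), leftovers
```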
**m:** Hmm, not sure what the `.inprogress` file is. Does reloading fix the issue?
**p:** Nope, the log says it is skipping the reload because the segment metadata is null.
**m:** Can you try deleting and re-pushing the segment?
**p:** Yeah, we have been deleting the entire table, scaling up the cluster, and pushing all the data again. We will try pushing just this particular segment again.