# general
a
The supervisors run on the Overlord. Could you please check from there (assuming it runs on a different host than the Coordinator) and look at its logs as well?
a
Couldn’t find anything in the logs.
@Amatya Avadhanula, could it be a resource issue?
a
I'm not sure. Could you please check the supervisor status from the web console?
Click the magnifying glass icon beside the supervisor in the Actions column.
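The same status report can also be pulled from the Overlord's supervisor API (`GET /druid/indexer/v1/supervisor/{supervisorId}/status`). Below is a minimal sketch, assuming the Overlord is reachable over plain HTTP at an address you supply (the `http://localhost:8090` used in tests is hypothetical) and that the status fields may be nested under a `payload` key, which the helper tolerates either way. The function names are illustrative, not part of any Druid client.

```python
import json
import urllib.parse
import urllib.request


def status_url(overlord: str, supervisor_id: str) -> str:
    """Build the Overlord URL that returns a supervisor's status report."""
    sid = urllib.parse.quote(supervisor_id, safe="")
    return f"{overlord.rstrip('/')}/druid/indexer/v1/supervisor/{sid}/status"


def unwrap(report: dict) -> dict:
    """Return the status fields; assumes they may sit under a "payload" key."""
    return report.get("payload", report)


def fetch_status(overlord: str, supervisor_id: str) -> dict:
    """GET the status report and return the unwrapped status fields."""
    with urllib.request.urlopen(status_url(overlord, supervisor_id), timeout=10) as resp:
        return unwrap(json.load(resp))
```

For example, `fetch_status("http://localhost:8090", "eber_gateways_sensors_data")["detailedState"]` would return the same `detailedState` the console shows, assuming the supervisor ID matches the datasource name (the usual default for Kafka supervisors).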
a
This is what I see:
```json
{
  "dataSource": "eber_gateways_sensors_data",
  "stream": "eber.gw.sensors.data",
  "partitions": 3,
  "replicas": 1,
  "durationSeconds": 604800,
  "activeTasks": [],
  "publishingTasks": [],
  "latestOffsets": {
    "0": 1982944,
    "1": 1982593,
    "2": 1982209
  },
  "minimumLag": {
    "0": 347306,
    "1": 347624,
    "2": 347179
  },
  "aggregateLag": 1042109,
  "offsetsLastUpdated": "2023-06-16T08:56:02.877Z",
  "suspended": false,
  "healthy": false,
  "state": "UNHEALTHY_SUPERVISOR",
  "detailedState": "UNABLE_TO_CONNECT_TO_STREAM",
  "recentErrors": [
    {
      "timestamp": "2023-06-16T08:54:47.582Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    },
    {
      "timestamp": "2023-06-16T08:55:17.585Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    },
    {
      "timestamp": "2023-06-16T08:55:47.587Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    }
  ]
}
```
This is from the Status tab.
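The key detail is in `recentErrors`: the offset Druid wants to resume from is no longer in the topic. A minimal sketch that pulls the partition and stored offset out of those messages, assuming the error text keeps the exact wording shown above (the regex and the `missing_offsets` name are illustrative, not a Druid API):

```python
import re
from typing import Dict

# Matches the ISE wording shown in "recentErrors" above.
_MISSING = re.compile(
    r"Previous sequenceNumber \[(\d+)\] is no longer available for partition \[(\d+)\]"
)


def missing_offsets(status: dict) -> Dict[int, int]:
    """Return {partition: stored offset} for every 'no longer available' error."""
    found: Dict[int, int] = {}
    for err in status.get("recentErrors", []):
        match = _MISSING.search(err.get("message", ""))
        if match:
            found[int(match.group(2))] = int(match.group(1))
    return found
```

Run against the status above, this yields `{0: 1635638}`: partition 0 is stuck on offset 1635638.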
a
It appears that the topic offsets stored in Druid's metadata are no longer available in the Kafka stream.
Was the lag continuously increasing for the supervisor prior to this? Or perhaps the supervisor was left in a suspended state for a long time?
a
The topic is there in Kafka and is receiving data.
Yes, the same thing happened for another datasource too, but I resubmitted the ingestion config and that resolved the issue.
a
Yes, it just means that the stored offset (`1635638`) is no longer within the topic's retention period.
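The numbers in the status output are consistent with this. Assuming `minimumLag` is the latest offset minus the offset Druid has stored for each partition, subtracting it back out recovers the stored offsets: for partition 0, 1982944 - 347306 = 1635638, exactly the offset in the error message. A minimal sketch of that arithmetic (the helper name is illustrative):

```python
def stored_offsets(status: dict) -> dict:
    """Estimate the offsets Druid last stored: latest offset minus reported lag."""
    latest = status["latestOffsets"]
    lag = status["minimumLag"]
    return {p: latest[p] - lag[p] for p in latest}


# Values copied from the status output above.
status = {
    "latestOffsets": {"0": 1982944, "1": 1982593, "2": 1982209},
    "minimumLag": {"0": 347306, "1": 347624, "2": 347179},
    "aggregateLag": 1042109,
}
print(stored_offsets(status))  # {'0': 1635638, '1': 1634969, '2': 1635030}
```

The per-partition lags also add up to `aggregateLag` (347306 + 347624 + 347179 = 1042109).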
a
But it's stuck just for this one.
Is there any way to fix that?
a
You could reset the offsets, but that could lead to data loss.
You can resubmit the supervisor to read from the latest checkpoint if that is OK.
a
How can I resubmit the supervisor?
If I increase the retention period of this topic, I think that will fix the issue.
a
> If I increase the retention period of this topic, I think that will fix the issue.
Yes, that might help prevent this issue.
You could try a hard reset if you want to continue reading from the latest offset.
If you want to read from the earliest available offset, you may have to set `useEarliestOffset` to `true` in the spec as well: https://druid.apache.org/docs/latest/development/extensions-core/kafka-supervisor-reference.html#kafkasupervisorioconfig
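`useEarliestOffset` lives in the supervisor's `ioConfig`. Per the linked reference, it only decides where reading starts when Druid has no stored offsets (first run, or after a reset), which is why it does not help while the stale offset is still stored. A minimal sketch that flips the flag on a copy of a spec, assuming the spec is either the nested form (`{"type": "kafka", "spec": {"ioConfig": ...}}`) or the older flat form with `ioConfig` at the top level; the function name is hypothetical:

```python
import copy


def with_earliest_offset(supervisor_spec: dict) -> dict:
    """Return a copy of a Kafka supervisor spec with useEarliestOffset enabled."""
    spec = copy.deepcopy(supervisor_spec)
    # Nested specs keep ioConfig under "spec"; older flat specs keep it at the top.
    target = spec["spec"] if "spec" in spec else spec
    target.setdefault("ioConfig", {})["useEarliestOffset"] = True
    return spec
```

Copying rather than mutating keeps the original spec available for comparison before resubmitting.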
a
```json
{
  "dataSource": "eber_gateways_sensors_data",
  "stream": "eber.gw.sensors.data",
  "partitions": 3,
  "replicas": 1,
  "durationSeconds": 604800,
  "activeTasks": [],
  "publishingTasks": [],
  "latestOffsets": {
    "0": 1985770,
    "1": 1985330,
    "2": 1984909
  },
  "minimumLag": {
    "0": 350132,
    "1": 350361,
    "2": 349879
  },
  "aggregateLag": 1050372,
  "offsetsLastUpdated": "2023-06-16T11:12:15.269Z",
  "suspended": false,
  "healthy": false,
  "state": "UNHEALTHY_SUPERVISOR",
  "detailedState": "UNABLE_TO_CONNECT_TO_STREAM",
  "recentErrors": [
    {
      "timestamp": "2023-06-16T11:11:00.260Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    },
    {
      "timestamp": "2023-06-16T11:11:30.260Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    },
    {
      "timestamp": "2023-06-16T11:12:00.263Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    }
  ]
}
```
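Comparing this status with the earlier one shows the supervisor is not moving: the latest offsets advanced by roughly 2,800 per partition, and the lag grew by the same amount, so the stored offsets are unchanged. A minimal sketch, assuming (as before) that the stored offset equals `latestOffsets` minus `minimumLag`; the helper names are illustrative:

```python
def stored(snapshot: dict) -> dict:
    """Offset Druid last recorded per partition, assumed to be latest minus lag."""
    latest, lag = snapshot["latestOffsets"], snapshot["minimumLag"]
    return {p: latest[p] - lag[p] for p in latest}


def frozen_partitions(before: dict, after: dict) -> list:
    """Partitions whose recorded offset did not move between two snapshots."""
    a, b = stored(before), stored(after)
    return sorted(p for p in a if p in b and a[p] == b[p])


before = {  # 08:56 snapshot from earlier in the thread
    "latestOffsets": {"0": 1982944, "1": 1982593, "2": 1982209},
    "minimumLag": {"0": 347306, "1": 347624, "2": 347179},
}
after = {  # 11:12 snapshot above
    "latestOffsets": {"0": 1985770, "1": 1985330, "2": 1984909},
    "minimumLag": {"0": 350132, "1": 350361, "2": 349879},
}
print(frozen_partitions(before, after))  # ['0', '1', '2']
```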
I've increased the retention period from 7 days to 30 days.
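In Kafka, topic retention is usually set through the `retention.ms` topic config, so 30 days is 30 × 24 × 60 × 60 × 1000 = 2592000000 ms (7 days is 604800000 ms). A minimal sketch that computes the value and builds a `kafka-configs.sh` command, assuming the stock Kafka CLI is on the path (some distributions ship it as `kafka-configs` without `.sh`) and using a hypothetical bootstrap address:

```python
def retention_ms(days: int) -> int:
    """Kafka's retention.ms value for a retention period given in whole days."""
    return days * 24 * 60 * 60 * 1000


def alter_retention_cmd(bootstrap: str, topic: str, days: int) -> list:
    """Build a kafka-configs.sh invocation that sets retention.ms on one topic."""
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap,
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config", f"retention.ms={retention_ms(days)}",
    ]
```

The list form can be passed straight to `subprocess.run(cmd, check=True)`. Note that longer retention only protects offsets going forward; it does not bring back messages that have already been deleted.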
a
That might help prevent this from happening in the future. Right now, the supervisor offsets need to be reset after suspending the supervisor.
After the hard reset, resubmit the supervisor with `useEarliestOffset` set to `true` if you want to ingest all the available data.
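Resubmitting means POSTing the full supervisor spec to the Overlord's `/druid/indexer/v1/supervisor` endpoint; posting a spec for an existing datasource replaces the running one. A minimal sketch that builds that request, assuming you already have the spec as a dict (for example, exported from the console's Payload view) and a hypothetical Overlord address:

```python
import json
import urllib.request


def submit_request(overlord: str, supervisor_spec: dict) -> urllib.request.Request:
    """Build the POST that (re)submits a supervisor spec to the Overlord."""
    return urllib.request.Request(
        f"{overlord.rstrip('/')}/druid/indexer/v1/supervisor",
        data=json.dumps(supervisor_spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(submit_request(overlord, spec))`. As discussed in this thread, the hard reset should come first; resubmitting on its own does not clear the stale stored offset.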
a
But it says I'll lose all my data. Is there any other way to do this?
`useEarliestOffset` is already set to `true`.
I've suspended it.
a
You won't lose any existing or available data, only the data that was not ingested and is no longer available because its retention period has passed.
You can reingest that data if you have it.
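To see exactly what would need reingesting, compare each partition's stored offset (derived from `latestOffsets` minus `minimumLag` in the status output) with the earliest offset Kafka still retains. Everything in the half-open range between them expired before Druid read it. A minimal sketch; the earliest offsets below are made-up placeholders that you would have to look up in Kafka yourself, and the function name is hypothetical:

```python
def expired_ranges(stored: dict, earliest: dict) -> dict:
    """Per partition, the half-open offset range [stored, earliest) that expired unread."""
    return {
        p: (stored[p], earliest[p])
        for p in stored
        if p in earliest and earliest[p] > stored[p]
    }


# Stored offsets come from the status output; the earliest offsets are placeholders.
stored = {"0": 1635638, "1": 1634969, "2": 1635030}
earliest = {"0": 1700000, "1": 1700000, "2": 1600000}
print(expired_ranges(stored, earliest))  # {'0': (1635638, 1700000), '1': (1634969, 1700000)}
```

`end - start` gives a rough upper bound on the number of unread records per partition.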
a
I've suspended it. Should I hard reset it now?
That's what you're suggesting, if I'm correct? Or should I just suspend it and resume it again?
a
Suspend, hard reset, resume.
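The same three steps map onto the Overlord's supervisor endpoints `suspend`, `reset` (the console's hard reset, as far as I know) and `resume`, each a `POST` under `/druid/indexer/v1/supervisor/{supervisorId}/`. A minimal sketch, assuming the supervisor ID is the datasource name and a hypothetical Overlord address:

```python
import urllib.request


def reset_plan(overlord: str, supervisor_id: str) -> list:
    """Return (method, url) pairs for suspend, hard reset, and resume, in that order."""
    root = f"{overlord.rstrip('/')}/druid/indexer/v1/supervisor/{supervisor_id}"
    return [("POST", f"{root}/{action}") for action in ("suspend", "reset", "resume")]


def run_plan(plan: list) -> None:
    """Send each call in order; urlopen raises HTTPError on an error status, stopping the run."""
    for method, url in plan:
        with urllib.request.urlopen(urllib.request.Request(url, method=method), timeout=30):
            pass
```

Keeping the plan separate from the sender makes it easy to print and check the URLs before anything is actually reset.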
Just want to reiterate that the data that was not ingested and is past the retention period won't be available. You may have to reingest it.