This message was deleted Apache Druid #general

# general

Slackbot

06/16/2023, 7:23 AM

This message was deleted.

Amatya Avadhanula

06/16/2023, 7:24 AM

The supervisors run on the Overlord. Could you please check from there (assuming it runs on a different host than the coordinator) and check its logs as well?

Anant Sharma

06/16/2023, 7:27 AM

Couldn’t find anything in the logs.

Anant Sharma

06/16/2023, 7:29 AM

@Amatya Avadhanula could it be because of resources?

Amatya Avadhanula

06/16/2023, 7:31 AM

I'm not sure. Could you please check the supervisor status from the web-console?

Amatya Avadhanula

06/16/2023, 7:32 AM

The magnifying glass icon beside the supervisor in the actions column

Anant Sharma

06/16/2023, 8:56 AM

This is what I see:

{
  "dataSource": "eber_gateways_sensors_data",
  "stream": "eber.gw.sensors.data",
  "partitions": 3,
  "replicas": 1,
  "durationSeconds": 604800,
  "activeTasks": [],
  "publishingTasks": [],
  "latestOffsets": {
    "0": 1982944,
    "1": 1982593,
    "2": 1982209
  },
  "minimumLag": {
    "0": 347306,
    "1": 347624,
    "2": 347179
  },
  "aggregateLag": 1042109,
  "offsetsLastUpdated": "2023-06-16T08:56:02.877Z",
  "suspended": false,
  "healthy": false,
  "state": "UNHEALTHY_SUPERVISOR",
  "detailedState": "UNABLE_TO_CONNECT_TO_STREAM",
  "recentErrors": [
    {
      "timestamp": "2023-06-16T08:54:47.582Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    },
    {
      "timestamp": "2023-06-16T08:55:17.585Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    },
    {
      "timestamp": "2023-06-16T08:55:47.587Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    }
  ]
}

Anant Sharma

06/16/2023, 8:56 AM

In status

Amatya Avadhanula

06/16/2023, 8:58 AM

It appears that the topic offsets stored in Druid's metadata are no longer available in the Kafka stream.
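A minimal sketch of how the status payload above could be checked for this condition programmatically. The regular expression is copied from the error text in the paste; the function name `stale_offsets` is hypothetical, and the assumption that the Overlord's status endpoint wraps these fields in a `payload` object should be verified against your Druid version.

```python
import re

# Matches the error text that appears in "recentErrors" in the status above.
STALE_OFFSET = re.compile(
    r"Previous sequenceNumber \[(\d+)\] is no longer available "
    r"for partition \[(\d+)\]"
)


def stale_offsets(status: dict) -> dict:
    """Return {partition: stored sequence number} for stale-offset errors."""
    # Assumption: the API response may nest the fields under "payload",
    # while the web console shows the payload itself. Accept both shapes.
    payload = status.get("payload", status)
    found = {}
    for err in payload.get("recentErrors", []):
        match = STALE_OFFSET.search(err.get("message", ""))
        if match:
            found[match.group(2)] = int(match.group(1))
    return found


# Trimmed copy of the status pasted in the thread.
status = {
    "detailedState": "UNABLE_TO_CONNECT_TO_STREAM",
    "recentErrors": [
        {
            "exceptionClass": "org.apache.druid.java.util.common.ISE",
            "message": (
                "org.apache.druid.java.util.common.ISE: Previous "
                "sequenceNumber [1635638] is no longer available for "
                "partition [0]. You can clear the previous sequenceNumber "
                "and start reading from a valid message by using the "
                "supervisor's reset API."
            ),
            "streamException": True,
        }
    ],
}

print(stale_offsets(status))  # {'0': 1635638}
```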

Amatya Avadhanula

06/16/2023, 8:58 AM

Was the lag continuously increasing for the supervisor prior to this? Or perhaps the supervisor was left in a suspended state for a long time?

Anant Sharma

06/16/2023, 9:00 AM

The topic is there in Kafka and is receiving data.

Anant Sharma

06/16/2023, 9:01 AM

Yup, it was, for some other datasource too, but I re-submitted the ingestion conf, which resolved the issue.

Amatya Avadhanula

06/16/2023, 9:01 AM

Yes, it just means that the stored offset is not within the retention period of the topic.
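The numbers in the pasted status are consistent with this explanation. As a rough check, assuming the reported lag is the latest stream offset minus the offset Druid has stored (an assumption, but one that reproduces the sequence number in the error message), the stored offsets can be recovered like this:

```python
# Values copied from the first status paste in the thread.
latest_offsets = {"0": 1982944, "1": 1982593, "2": 1982209}
minimum_lag = {"0": 347306, "1": 347624, "2": 347179}


def implied_stored_offsets(latest: dict, lag: dict) -> dict:
    """Assumes lag = latest offset - stored offset, per partition."""
    return {partition: latest[partition] - lag[partition] for partition in latest}


stored = implied_stored_offsets(latest_offsets, minimum_lag)
print(stored)  # {'0': 1635638, '1': 1634969, '2': 1635030}

# The per-partition lags also add up to the reported aggregateLag.
print(sum(minimum_lag.values()))  # 1042109
```

Partition 0 comes out as 1635638, the same value the error reports as no longer available. The second status paste later in the thread gives the same three stored offsets, so the supervisor made no progress between the two snapshots.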

Anant Sharma

06/16/2023, 9:01 AM

But just for these it's stuck.

Anant Sharma

06/16/2023, 9:01 AM

Any way to fix that?

Amatya Avadhanula

06/16/2023, 9:02 AM

You could reset the offsets, but that could lead to data loss

Amatya Avadhanula

06/16/2023, 9:02 AM

You can resubmit the supervisor to read from the latest checkpoint if that is ok

Anant Sharma

06/16/2023, 9:03 AM

Resubmit the supervisor? How can I do this?

Anant Sharma

06/16/2023, 9:04 AM

If I increase the retention period of this topic, I think that will fix the issue.

Amatya Avadhanula

06/16/2023, 9:57 AM

"If I increase the retention period of this topic, I think that will fix the issue."

Yes, that might help prevent this issue

Amatya Avadhanula

06/16/2023, 9:58 AM

You could try a hard reset if you want to continue reading from the latest offset.

Amatya Avadhanula

06/16/2023, 9:59 AM

If you want to read from the earliest available offset, you may have to set useEarliestOffset to true in the spec as well. https://druid.apache.org/docs/latest/development/extensions-core/kafka-supervisor-reference.html#kafkasupervisorioconfig
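A hedged sketch of what setting this flag and resubmitting could look like. The spec below is a hypothetical, heavily trimmed stand-in (the real spec was not shared in the thread), the Overlord URL is a placeholder, and the helper names are invented for illustration. The sketch only builds the HTTP request for the Overlord's supervisor endpoint; it does not send it.

```python
import copy
import json
import urllib.request


def with_earliest_offset(spec: dict, value: bool = True) -> dict:
    """Return a copy of the spec with ioConfig.useEarliestOffset set."""
    updated = copy.deepcopy(spec)
    # Assumption: ioConfig sits under "spec" in nested specs and at the
    # top level in flat ones. Handle both layouts.
    container = updated["spec"] if "spec" in updated else updated
    container.setdefault("ioConfig", {})["useEarliestOffset"] = value
    return updated


def resubmit_request(overlord_url: str, spec: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that submits a supervisor spec."""
    return urllib.request.Request(
        f"{overlord_url.rstrip('/')}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical, trimmed spec; only the topic name comes from the thread.
example_spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "topic": "eber.gw.sensors.data",
            "consumerProperties": {"bootstrap.servers": "kafka.example.com:9092"},
        }
    },
}

updated_spec = with_earliest_offset(example_spec)
request = resubmit_request("http://overlord.example.com:8090", updated_spec)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually submit the spec, so that step is left out here.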

Anant Sharma

06/16/2023, 11:13 AM

{
  "dataSource": "eber_gateways_sensors_data",
  "stream": "eber.gw.sensors.data",
  "partitions": 3,
  "replicas": 1,
  "durationSeconds": 604800,
  "activeTasks": [],
  "publishingTasks": [],
  "latestOffsets": {
    "0": 1985770,
    "1": 1985330,
    "2": 1984909
  },
  "minimumLag": {
    "0": 350132,
    "1": 350361,
    "2": 349879
  },
  "aggregateLag": 1050372,
  "offsetsLastUpdated": "2023-06-16T11:12:15.269Z",
  "suspended": false,
  "healthy": false,
  "state": "UNHEALTHY_SUPERVISOR",
  "detailedState": "UNABLE_TO_CONNECT_TO_STREAM",
  "recentErrors": [
    {
      "timestamp": "2023-06-16T11:11:00.260Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    },
    {
      "timestamp": "2023-06-16T11:11:30.260Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    },
    {
      "timestamp": "2023-06-16T11:12:00.263Z",
      "exceptionClass": "org.apache.druid.java.util.common.ISE",
      "message": "org.apache.druid.java.util.common.ISE: Previous sequenceNumber [1635638] is no longer available for partition [0]. You can clear the previous sequenceNumber and start reading from a valid message by using the supervisor's reset API.",
      "streamException": true
    }
  ]
}

I've increased the retention from 7 days to 30 days.
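The thread does not say how the retention was changed. One common way is the topic-level `retention.ms` setting applied with Kafka's `kafka-configs.sh` tool; the sketch below only computes the value and assembles the command line under that assumption. The bootstrap server address is a placeholder.

```python
def retention_ms(days: int) -> int:
    """Convert a retention period in days to milliseconds."""
    return days * 24 * 60 * 60 * 1000


def alter_retention_command(topic: str, days: int,
                            bootstrap: str = "localhost:9092") -> list:
    """Assemble (not run) a kafka-configs.sh call that sets retention.ms."""
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap,  # placeholder broker address
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config", f"retention.ms={retention_ms(days)}",
    ]


print(retention_ms(7))   # 604800000
print(retention_ms(30))  # 2592000000
print(" ".join(alter_retention_command("eber.gw.sensors.data", 30)))
```

A longer retention only protects messages that are still in the topic. Offsets that have already been deleted stay unavailable, which is why the status above still shows the same error after the change.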

Amatya Avadhanula

06/16/2023, 11:20 AM

That might help prevent this from happening in the future. Right now, the supervisor offsets need to be reset after suspending the supervisor

Amatya Avadhanula

06/16/2023, 11:22 AM

After the hard reset, resubmit the supervisor with useEarliestOffset set to true if you want to ingest all the available data.

Anant Sharma

06/16/2023, 11:32 AM

But it says I'll lose all my data. Is there any other way to do so?

Anant Sharma

06/16/2023, 11:33 AM

useEarliestOffset is already set to true.

Anant Sharma

06/16/2023, 11:34 AM

I've suspended it.

Amatya Avadhanula

06/16/2023, 11:34 AM

You won't lose any existing or available data, only the data that was not ingested and is no longer available because the retention period has passed.

Amatya Avadhanula

06/16/2023, 11:34 AM

You can reingest that data if you have it

Anant Sharma

06/16/2023, 11:35 AM

I've suspended it. Should I hard reset it now?

Anant Sharma

06/16/2023, 11:36 AM

That's what you are suggesting, if I'm correct? Or should I just suspend it and resume it again?

Amatya Avadhanula

06/16/2023, 11:36 AM

suspend, hard reset, resume
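The same three steps can also be driven through the Overlord's supervisor API instead of the web console. This is a sketch under assumptions: the Overlord address is a placeholder, the supervisor ID is assumed to equal the datasource name from the status paste, and the function names are hypothetical. `recovery_requests` only lists the calls; `run` would send them and is not invoked here.

```python
import urllib.parse
import urllib.request


def recovery_requests(overlord_url: str, supervisor_id: str) -> list:
    """List the POST calls for suspend, hard reset, resume, in order."""
    root = (
        f"{overlord_url.rstrip('/')}/druid/indexer/v1/supervisor/"
        f"{urllib.parse.quote(supervisor_id, safe='')}"
    )
    return [("POST", f"{root}/{step}") for step in ("suspend", "reset", "resume")]


def run(steps: list) -> None:
    """Send each request in order; stops at the first HTTP error."""
    for method, url in steps:
        req = urllib.request.Request(url, method=method)
        with urllib.request.urlopen(req) as resp:
            print(method, url, resp.status)


steps = recovery_requests(
    "http://overlord.example.com:8090",  # placeholder Overlord address
    "eber_gateways_sensors_data",        # assumed supervisor ID
)
for method, url in steps:
    print(method, url)
```

The reset step clears the offsets stored for the supervisor, so anything between the old stored offset and the first offset still retained in Kafka is skipped, as noted in the thread.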

Amatya Avadhanula

06/16/2023, 11:37 AM

Just want to reiterate that the data that was not ingested and is past the retention period won't be available. You may have to reingest it.
