Hi team We < little notebook 46410> use self hosted data pla RudderStack #support

billions-twilight-38411

05/08/2023, 11:46 AM

Hi team! We (@little-notebook-46410) use a self-hosted data plane (Helm installation) and the Cloud control plane. Everything worked fine for weeks or months, but 3 days ago we stopped receiving events in Redshift. The server receives the events fine (code 200), but there seems to be a problem with the PostgreSQL database. We get this error (the source and destination IDs are correct and are the same as always):

2023-05-08T11:27:54.327Z        DEBUG   processor       processor/processor.go:986      [Processor: getFailedEventJobs] Error [404] for source "2IGmf7QY65qZo5M3Fod42UK6PdM" and destination "2IJDJoR7cyhThDNKXS8OUZ2RN3T": Not Found

The database logs are:

PostgreSQL Database directory appears to contain a database; Skipping initialization

2023-05-08 09:45:35.859 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-05-08 09:45:35.859 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-05-08 09:45:35.862 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-05-08 09:45:35.886 UTC [14] LOG:  database system was shut down at 2023-05-08 09:45:31 UTC
2023-05-08 09:45:35.896 UTC [1] LOG:  database system is ready to accept connections
2023-05-08 09:45:44.826 UTC [29] LOG:  incomplete startup packet
2023-05-08 09:46:08.599 UTC [52] LOG:  incomplete startup packet
2023-05-08 09:46:08.704 UTC [53] FATAL:  terminating connection due to administrator command
2023-05-08 09:46:08.707 UTC [54] FATAL:  terminating connection due to administrator command
2023-05-08 09:46:08.713 UTC [56] FATAL:  terminating connection due to administrator command

We can't find a solution. Could you help us, please?
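When triaging a large log, it can help to tally which source/destination pairs produce this `getFailedEventJobs` error. The following is a minimal sketch, not part of rudder-server: the regular expression is modelled only on the single log line quoted above, so other server versions may format the message differently, and the name `count_failed_lookups` is hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this thread (assumption:
# other rudder-server versions may word this message differently).
FAILED_JOBS_RE = re.compile(
    r'Error \[(?P<status>\d+)\] for source "(?P<source>[^"]+)" '
    r'and destination "(?P<destination>[^"]+)"'
)


def count_failed_lookups(lines):
    """Count (status, source, destination) triples found in log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_JOBS_RE.search(line)
        if match:
            key = (int(match["status"]), match["source"], match["destination"])
            counts[key] += 1
    return counts
```

Feeding it the lines of a saved log file (for example `count_failed_lookups(open(path))`, where `path` is wherever the processor log was saved) shows whether the 404 is limited to one connection or affects several.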

gentle-petabyte-80785

05/08/2023, 12:33 PM

Hi Andres, which rudder-server version do you use?

gentle-petabyte-80785

05/08/2023, 12:34 PM

Is it an issue with this destination only or with all the destinations?

billions-twilight-38411

05/08/2023, 1:13 PM

From the kubernetes config:

backend:
  image:
    repository: rudderlabs/rudder-server
    version: 1.3.4

We only have the problem with this connection (source->destination), but it is currently the only one we have active.

gentle-petabyte-80785

05/08/2023, 7:16 PM

Are you using the cloud control plane or the open source control plane (config-generator)? config-generator is not maintained, so move to the cloud control plane if you're using it.

billions-twilight-38411

05/09/2023, 6:35 AM

Yes, I'm using the cloud control plane.

gentle-petabyte-80785

05/09/2023, 8:52 AM

Hi Andres, can you please try upgrading your rudder-server version to v1.7 and then retry?

gentle-petabyte-80785

05/09/2023, 8:54 AM

This issue is unexpected. Upgrading will not only make your system more stable; it is also possible that the bug has already been resolved. If the issue persists, we will raise a bug and fix it soon.

billions-twilight-38411

05/10/2023, 8:54 AM

The upgrade solved the issue! Now we are receiving the events in Redshift again. However, we have lost the events from the 4-day period during which the problem existed. Is there any way to recover these events?

gentle-petabyte-80785

05/12/2023, 5:38 AM

Can you check the logs again? Do you see any errors? Recovery depends on where those events failed. There's an automatic retry mechanism for failed events; all you have to do is wait some time for it to run. Sometimes you need to retry manually, as in this case.

gentle-petabyte-80785

05/12/2023, 9:20 AM

From the previous logs, it seems these events are not recoverable. Can you run this query in your jobsdb?

SELECT COALESCE(job_state,'unprocessed'), COUNT(*) FROM unionjobsdb('batch_rt',20) 
 WHERE custom_val = 'RS'
 AND (job_state IS NULL OR job_state NOT IN ('succeeded', 'aborted')) 
GROUP BY job_state;

little-notebook-46410

05/12/2023, 9:40 AM

This is the result:

  coalesce   | count 
-------------+-------
 unprocessed |   329
(1 row)
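To compare this result across repeated runs of the query, the aligned psql output can be turned into a dictionary. This is a small sketch that assumes the two-column layout shown above (state, then count); `parse_psql_counts` is a hypothetical helper name, not something shipped with RudderStack or PostgreSQL.

```python
def parse_psql_counts(output):
    """Parse psql's aligned two-column output into {job_state: count}."""
    counts = {}
    for line in output.splitlines():
        if "|" not in line:
            continue  # separator ("----+----"), "(1 row)" footer, blank lines
        state, _, count = line.partition("|")
        try:
            counts[state.strip()] = int(count.strip())
        except ValueError:
            continue  # header row: the second column is the word "count"
    return counts
```

Running the query a few minutes apart and diffing the parsed dictionaries gives a rough view of whether the pending count is changing over time.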

little-notebook-46410

05/12/2023, 9:40 AM

How can we process these events?

gentle-petabyte-80785

05/15/2023, 3:06 PM

This is likely your live traffic, not the older events.

gentle-petabyte-80785

05/15/2023, 3:09 PM

I'm afraid those dropped events are not recoverable.

billions-twilight-38411

05/16/2023, 1:29 PM

OK, we'll take this into account in our downstream models. Thanks for trying to help us!
