# support
b
Hi team! We (@little-notebook-46410) use a self-hosted data plane (Helm installation) and the Cloud control plane. Everything worked fine for weeks or months, but 3 days ago we stopped receiving events in Redshift. The server still accepts the events (code 200), but there seems to be a problem with the PostgreSQL database. We get this error (the source and destination IDs are correct and the same as always):
2023-05-08T11:27:54.327Z        DEBUG   processor       processor/processor.go:986      [Processor: getFailedEventJobs] Error [404] for source "2IGmf7QY65qZo5M3Fod42UK6PdM" and destination "2IJDJoR7cyhThDNKXS8OUZ2RN3T": Not Found
The database logs are:
PostgreSQL Database directory appears to contain a database; Skipping initialization

2023-05-08 09:45:35.859 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-05-08 09:45:35.859 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-05-08 09:45:35.862 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-05-08 09:45:35.886 UTC [14] LOG:  database system was shut down at 2023-05-08 09:45:31 UTC
2023-05-08 09:45:35.896 UTC [1] LOG:  database system is ready to accept connections
2023-05-08 09:45:44.826 UTC [29] LOG:  incomplete startup packet
2023-05-08 09:46:08.599 UTC [52] LOG:  incomplete startup packet
2023-05-08 09:46:08.704 UTC [53] FATAL:  terminating connection due to administrator command
2023-05-08 09:46:08.707 UTC [54] FATAL:  terminating connection due to administrator command
2023-05-08 09:46:08.713 UTC [56] FATAL:  terminating connection due to administrator command
We haven't been able to find a solution. Could you help us, please?
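For anyone triaging similar reports, here is a minimal Python sketch that pulls the status code and the source and destination IDs out of a processor error line like the one quoted in the message above. The regular expression is inferred from that single sample line and is an assumption, not a documented rudder-server log format; `parse_failed_jobs_error` is a hypothetical helper name.

```python
import re

# Pattern inferred from the single DEBUG line quoted in this thread.
# Assumption: other rudder-server versions may format this line differently.
LINE_RE = re.compile(
    r'Error \[(?P<status>\d+)\] for source "(?P<source>[^"]+)" '
    r'and destination "(?P<destination>[^"]+)": (?P<reason>.+)$'
)


def parse_failed_jobs_error(line):
    """Return a dict with status, source, destination and reason, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


# The log line from the thread, split only to keep the source readable.
sample = (
    '2023-05-08T11:27:54.327Z        DEBUG   processor       '
    'processor/processor.go:986      [Processor: getFailedEventJobs] '
    'Error [404] for source "2IGmf7QY65qZo5M3Fod42UK6PdM" '
    'and destination "2IJDJoR7cyhThDNKXS8OUZ2RN3T": Not Found'
)

parsed = parse_failed_jobs_error(sample)
print(parsed)
```

Grouping the parsed lines by source and destination makes it easy to confirm whether the 404 affects a single connection or several.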
g
Hi Andres, which rudder-server version do you use?
Is it an issue with this destination only or with all the destinations?
b
From the kubernetes config:
backend:
  image:
    repository: rudderlabs/rudder-server
    version: 1.3.4
We see the problem with this connection (source -> destination) only, but it is currently the only one we have active.
g
Are you using the cloud control plane or the open source control plane (config-generator)? config-generator is no longer maintained, so please move to the cloud control plane if you're still using it.
b
yes, I'm using the cloud control plane
g
Hi Andres, can you please try upgrading your rudder-server version to v1.7 and then retry?
This issue is unexpected. Upgrading will not only make your system more stable; it is also possible that the underlying bug has already been fixed. If the issue persists, we will file a bug and fix it soon.
b
The upgrade solved the issue! We are now receiving events in Redshift again. However, we have lost the events from the 4-day period during which the problem existed. Is there any way to recover these events?
g
Can you check the logs again? Do you see any errors? Whether recovery is possible depends on where those events failed. There's an automatic retry mechanism for failed events; all you have to do is wait for some time to let it run. Sometimes you need to retry manually, as in this case.
From the previous logs, it seems these events are not recoverable. Can you run this query in your jobsdb?
SELECT COALESCE(job_state,'unprocessed'), COUNT(*) FROM unionjobsdb('batch_rt',20) 
 WHERE custom_val = 'RS'
 AND (job_state IS NULL OR job_state NOT IN ('succeeded', 'aborted')) 
GROUP BY job_state;
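To make the intent of the query concrete, here is a minimal Python sketch of the same filter and grouping applied to in-memory rows. Only the column names `custom_val` and `job_state` and the states `succeeded` and `aborted` come from the query; the row shape, the `failed` and `S3` sample values, and the helper name `count_pending_jobs` are hypothetical, and reading `RS` as the Redshift destination type is an assumption.

```python
from collections import Counter


def count_pending_jobs(rows, destination_type="RS"):
    """Count jobs for one destination type that are neither succeeded nor aborted.

    A job_state of None models SQL NULL (no status recorded yet) and is
    reported as 'unprocessed', mirroring COALESCE(job_state, 'unprocessed').
    """
    counts = Counter()
    for row in rows:
        if row["custom_val"] != destination_type:
            continue  # WHERE custom_val = 'RS'
        state = row["job_state"]
        if state is None or state not in ("succeeded", "aborted"):
            counts["unprocessed" if state is None else state] += 1
    return dict(counts)


# Hypothetical sample rows, for illustration only.
rows = [
    {"custom_val": "RS", "job_state": None},
    {"custom_val": "RS", "job_state": None},
    {"custom_val": "RS", "job_state": "succeeded"},
    {"custom_val": "RS", "job_state": "failed"},
    {"custom_val": "S3", "job_state": None},
]

pending = count_pending_jobs(rows)
print(pending)
```

In other words, the query reports jobs still waiting in the batch router queue for that destination type, grouped by their latest state, and says nothing about jobs that have already left the queue.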
l
This is the result:
  coalesce   | count
-------------+-------
 unprocessed |   329
(1 row)
How can we process these events?
g
This is likely your live traffic, not the older events.
I'm afraid those dropped events are not recoverable.
b
OK, we'll take this into account in our downstream models. Thanks for trying to help us!