# support
b
Hi team, we are using RudderStack open source and used the Helm chart (1.7.3) to deploy it. The rudderstack pod is continuously restarting with multiple errors.
Below are the error logs:
2023-05-08 16:12:23.429 /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0xa5
2023-05-08 16:12:23.429 created by golang.org/x/sync/errgroup.(*Group).Go
2023-05-08 16:12:23.429 /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x64
2023-05-08 16:12:23.429 golang.org/x/sync/errgroup.(*Group).Go.func1()
2023-05-08 16:12:23.429 /rudder-server/processor/stash/stash.go:108 +0x25
2023-05-08 16:12:23.429 github.com/rudderlabs/rudder-server/processor/stash.(*HandleT).Start.func2()
2023-05-08 16:12:23.429 /rudder-server/processor/stash/stash.go:326 +0xe05
2023-05-08 16:12:23.429 github.com/rudderlabs/rudder-server/processor/stash.(*HandleT).readErrJobsLoop(0xc001924680, {0x3033ad0, 0xc0019c2c80})
2023-05-08 16:12:23.429 goroutine 1018 [running]:
2023-05-08 16:12:23.429
2023-05-08 16:12:23.429 panic: EOF
2023-05-08 16:12:23.427 2023-05-08T10:42:23.427Z ERROR processor.stash stash/stash.go:325 Error occurred while reading proc error jobs. Err: EOF
2023-05-08 16:12:20.737 2023-05-08T10:42:20.737Z INFO gateway gateway/gateway.go:997 IP: 240149005447b4647433d88fc6b:d37d -- /v1/identify -- Response: 500, EOF
2023-05-08 16:12:20.467 2023-05-08T10:42:20.467Z INFO gateway gateway/gateway.go:997 IP: 203.192.195.168 -- /v1/track -- Response: 500, EOF
2023-05-08 16:12:20.467 2023-05-08T10:42:20.467Z INFO gateway gateway/gateway.go:997 IP: 203.192.195.168 -- /v1/track -- Response: 500, EOF
2023-05-08 16:12:19.913 2023-05-08T10:42:19.913Z INFO gateway gateway/gateway.go:997 IP: 46.235.95.139 -- /v1/track -- Response: 500, EOF
2023-05-08 16:12:19.518 2023-05-08T10:42:19.518Z INFO gateway gateway/gateway.go:997 IP: 66.249.66.68 -- /v1/page -- Response: 500, EOF
2023-05-08 16:12:19.198
2023-05-08 16:12:19.198 /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0xa5
2023-05-08 16:12:19.198 created by golang.org/x/sync/errgroup.(*Group).Go
2023-05-08 16:12:19.198 /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x64
2023-05-08 16:12:19.198 golang.org/x/sync/errgroup.(*Group).Go.func1()
2023-05-08 16:12:19.198 /rudder-server/utils/misc/misc.go:1183 +0x70
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/utils/misc.WithBugsnag.func1()
2023-05-08 16:12:19.198 /rudder-server/jobsdb/jobsdb.go:944 +0x25
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).readerSetup.func1()
2023-05-08 16:12:19.198 /rudder-server/jobsdb/jobsdb.go:2857 +0x20c
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).refreshDSListLoop(0xc000fa4480, {0x3033ad0, 0xc001798280})
2023-05-08 16:12:19.198 /rudder-server/jobsdb/internal/lock/lock.go:64 +0x10a
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb/internal/lock.(*Locker).WithLockInCtx(0xc0012d72c8, {0x3033b40, 0xc004b65c80}, 0xc005d5feb8)
2023-05-08 16:12:19.198 /rudder-server/jobsdb/jobsdb.go:2858 +0x30
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).refreshDSListLoop.func1({0x30142c0?, 0x45eadb0?})
2023-05-08 16:12:19.198 /rudder-server/jobsdb/jobsdb.go:1058 +0x8a
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).refreshDSRangeList(0xc000fa4480, {0x30142c0, 0x45eadb0})
2023-05-08 16:12:19.198 /rudder-server/jobsdb/jobsdb.go:1030 +0x65
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).refreshDSList(0xc000fa4480, {0x30142c0?, 0x45eadb0?})
2023-05-08 16:12:19.198 /rudder-server/jobsdb/jobsdb_utils.go:26 +0x5e
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb.getDSList({0x3028fc8, 0xc000fa4480}, {0x3010980?, 0xc000d8d790?}, {0x29e3c0f, 0x2})
2023-05-08 16:12:19.198 /rudder-server/jobsdb/jobsdb_utils.go:83 +0x58
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb.mustGetAllTableNames({0x3028fc8, 0xc000fa4480}, {0x3010980?, 0xc000d8d790?})
2023-05-08 16:12:19.198 /rudder-server/jobsdb/jobsdb.go:510 +0xff
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).assertError(0xc000fa4480, {0x3010c00, 0xc000050140})
2023-05-08 16:12:19.198 /rudder-server/utils/logger/logger.go:186 +0x92
2023-05-08 16:12:19.198 github.com/rudderlabs/rudder-server/utils/logger.(*logger).Fatal(0xc000f77860, {0xc0048f45a0?, 0x0?, 0x0?})
2023-05-08 16:12:19.198 2023-05-08T10:42:19.198Z ERROR jobsdb.gw jobsdb/jobsdb.go:510 goroutine 976 [running]:
2023-05-08 16:12:19.198 2023-05-08T10:42:19.198Z ERROR jobsdb.gw jobsdb/jobsdb.go:510 gwREADmap[{gw_jobs_1505 gw_job_status_1505 1505}map[ all map[GWmap[map[not_picked_yet:{No Jobs 2023-05-08 104133.262870844 +0000 UTC m=+478.713423387}] GW_GWmap[not picked yet{No Jobs 2023-05-08 104133.262871368 +0000 UTC m=+478.713423911}]]]] {gw_jobs_1506 gw_job_status_1506 1506}map[2E4PhRsZ1XO2IkIQldnT13Q5SGpmap[GWmap[map[executing:{Has Jobs 2023-05-08 104133.27916532 +0000 UTC m=+478.729717862} succeeded:{Has Jobs 2023-05-08 104122.28467654 +0000 UTC m=+467.735229082}] GW_GWmap[executing{Has Jobs 2023-05-08 104133.279165779 +0000 UTC m=+478.729718324} succeeded:{Has Jobs 2023-05-08 104122.284676974 +0000 UTC m=+467.735229517}]]] _all_map[GWmap[map[executing{Has Jobs 2023-05-08 104133.279167564 +0000 UTC m=+478.729720106} not_picked_yet:{Has Jobs 2023-05-08 104133.276127556 +0000 UTC m=+478.726680098} succeeded:{Has Jobs 2023-05-08 104122.284678656 +0000 UTC m=+467.735231198}] GW_GWmap[executing{Has Jobs 2023-05-08 104133.279167902 +0000 UTC m=+478.729720444} not_picked_yet:{Has Jobs 2023-05-08 104133.276128151 +0000 UTC m=+478.726680694} succeeded:{Has Jobs 2023-05-08 104122.284679075 +0000 UTC m=+467.735231618}]]]]]
2023-05-08 16:12:19.198 Ranges: [{29181495 29199709 0 0 {gw_jobs_1505 gw_job_status_1505 1505}}]
2023-05-08 16:12:19.198 List: [{gw_jobs_1505 gw_job_status_1505 1505} {gw_jobs_1506 gw_job_status_1506 1506}]
2023-05-08 16:12:18.358
2023-05-08 16:12:18.358 /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0xa5
2023-05-08 16:12:18.358 created by golang.org/x/sync/errgroup.(*Group).Go
2023-05-08 16:12:18.358 /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x64
2023-05-08 16:12:18.358 golang.org/x/sync/errgroup.(*Group).Go.func1()
2023-05-08 16:12:18.358 /rudder-server/utils/misc/misc.go:1183 +0x70
2023-05-08 16:12:18.358 github.com/rudderlabs/rudder-server/utils/misc.WithBugsnag.func1()
2023-05-08 16:12:18.358 /rudder-server/jobsdb/jobsdb.go:970 +0x25
2023-05-08 16:12:18.358 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).writerSetup.func1()
2023-05-08 16:12:18.358 /rudder-server/jobsdb/jobsdb.go:2821 +0xdf
2023-05-08 16:12:18.358 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).addNewDSLoop(0xc00130dd40, {0x3033ad0, 0xc0015eb4a0})
2023-05-08 16:12:18.358 /rudder-server/jobsdb/jobsdb.go:510 +0xff
2023-05-08 16:12:18.358 github.com/rudderlabs/rudder-server/jobsdb.(*HandleT).assertError(0xc00130dd40, {0x3010c00, 0xc000050140})
2023-05-08 16:12:18.358 /rudder-server/utils/logger/logger.go:186 +0x92
2023-05-08 16:12:18.358 github.com/rudderlabs/rudder-server/utils/logger.(*logger).Fatal(0xc0013870b0, {0xc0057a5aa0?, 0xc00178de30?, 0x11cb585?})
2023-05-08 16:12:18.358 2023-05-08T10:42:18.358Z ERROR jobsdb.proc_error jobsdb/jobsdb.go:510 goroutine 972 [running]:
2023-05-08 16:12:18.358 2023-05-08T10:42:18.357Z ERROR jobsdb.proc_error jobsdb/jobsdb.go:510 proc_errormap[{proc_error_jobs_104 proc_error_job_status_104 104}map[ all map[map[map[failed:{Query in progress 2023-05-08 104206.138014149 +0000 UTC m=+511.588566691} not_picked_yet:{No Jobs 2023-05-08 104136.137904844 +0000 UTC m=+481.588457387}] _map[failed{Query in progress 2023-05-08 104206.138014741 +0000 UTC m=+511.588567272} not_picked_yet:{No Jobs 2023-05-08 104136.137905381 +0000 UTC m=+481.588457924}]]]]]
2023-05-08 16:12:18.358 Ranges: []
2023-05-08 16:12:18.358 List: [{proc_error_jobs_104 proc_error_job_status_104 104}]
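For reference, the deploy was done roughly like this with the RudderStack Helm chart (the repo URL and chart name are assumed; the release name, namespace and values file are placeholders, not our exact setup):

# add the RudderStack chart repo (assumed public repo URL)
helm repo add rudderstack https://rudderstack.github.io/rudderstack-helm
helm repo update
# install/upgrade chart version 1.7.3 with our own overrides (values.yaml is a placeholder)
helm upgrade --install rudderstack rudderstack/rudderstack \
  --namespace rudderstack --create-namespace \
  --version 1.7.3 \
  -f values.yaml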
h
It seems to be an issue with your postgres pods
b
what type of issue? They are up and healthy with enough resources.
h
Read queries seem to be returning EOF according to the logs
b
how can I test this?
h
Try connecting to one of the postgres pods and running a count(*) query on any one of the tables
or an Insert query perhaps
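Something along these lines — the pod name, namespace, database, user and table below are just placeholders, adjust to your setup:

# open a psql shell inside one of the postgres pods
kubectl exec -it postgres-0 -n rudderstack -- psql -U rudder -d jobsdb

-- inside psql: a simple read, e.g. against one of the gw_jobs tables from your logs
SELECT count(*) FROM gw_jobs_1505;

-- and a throwaway write to confirm inserts work as well
CREATE TABLE IF NOT EXISTS conn_check (id int);
INSERT INTO conn_check VALUES (1);
DROP TABLE conn_check;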
b
from the pod?
h
yes from the pod
b
btw I can see events going to the destination... could it still be that issue?
h
are the pods up? or are they still crashing?
b
they are up, but crashing every 10mins approx
h
the event delivery is happening in those 10mins
Can you paste the postgres pod logs?
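e.g. something like this (pod name and namespace are placeholders):

kubectl logs <postgres-pod> -n <namespace> --tail=500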
b
it is an external db
h
Then can you check its health and possibly its metrics (e.g. with the queries below)? From the logs, it doesn’t seem like a problem with the rudder-server.
1.7.3
is a pretty stable version
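A couple of quick checks worth running on the external DB — connection saturation and long-running queries (plain pg_stat_activity queries, nothing RudderStack-specific):

-- how close you are to the connection limit
SELECT count(*) AS connections,
       (SELECT setting FROM pg_settings WHERE name = 'max_connections') AS max_connections
FROM pg_stat_activity;

-- anything long-running or stuck in a transaction
SELECT pid, state, now() - query_start AS runtime, left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;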
b
I just checked it...it is healthy with enough resources.
h
any logs if possible from that external DB?
b
checking...
nothing in the DB logs
could query timeouts be an issue?
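fwiw, the DB-side timeout settings can be checked with these standard Postgres settings:

SHOW statement_timeout;
SHOW idle_in_transaction_session_timeout;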
h
Timeouts would return a different error. Can you send the whole log file instead of the crash log?
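For example (pod name and namespace are placeholders):

# full log of the currently running rudder-server container
kubectl logs <rudderstack-pod> -n <namespace> > rudder-server-current.log
# log of the previously crashed container, which has the lead-up to the panic
kubectl logs <rudderstack-pod> -n <namespace> --previous > rudder-server-previous.log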