
Titas Skrebė

02/14/2022, 7:06 PM
Hi. I have a postgres->bigquery connection, and both of its pods are in the Error state, as you can see below.
k get pods | grep 126044
airbyte-bigquery-sync-126044-0-bghap       0/5     Error       0          20h
source-postgres-sync-126044-0-zgnvm        0/4     Error       0          20h
but the UI shows the sync as running, and it doesn’t restart/retry it even after 20h. Any idea what’s going on? I am on 0.35.10-alpha.

Mohammad Safari

02/14/2022, 7:07 PM
I usually go ahead and cancel the job, but this is happening a few times a week now.
I don’t think this matters, but here are the logs from one of the pods:
k logs source-postgres-sync-126044-0-zgnvm --all-containers
Using existing AIRBYTE_ENTRYPOINT: /airbyte/base.sh
Waiting on CHILD_PID 7
PARENT_PID: 1
Heartbeat to worker failed, exiting...
received ABRT
Both pods are getting the ABRT message.
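The “Heartbeat to worker failed” line suggests the worker pod supervising the sync died mid-run. A rough way to check whether the worker or scheduler was OOM-killed (pod names are placeholders, adjust to your deployment):
# List the Airbyte core pods and look for restarts
kubectl get pods | grep -E 'airbyte-(worker|scheduler|server)'
# Inspect the last termination state; "OOMKilled" / exit code 137 means the kernel killed it
kubectl describe pod <airbyte-worker-pod> | grep -A 5 "Last State"
# Cluster events can also show OOM kills and evictions
kubectl get events --sort-by='.lastTimestamp' | grep -iE 'oom|evict'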

Javier

02/14/2022, 8:51 PM
Is it only happening to this specific connection, or to others as well? Could you check the resources of your k8s cluster?
Multiple connections. It seems like the scheduler or worker is going OOM, which causes this, and they don’t recover on their own.
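A quick way to check whether the cluster itself is short on memory (this assumes metrics-server is installed; the namespace is a placeholder):
# Node-level CPU/memory usage
kubectl top nodes
# Per-pod usage in the Airbyte namespace, heaviest consumers first
kubectl top pods -n <airbyte-namespace> --sort-by=memory
# Pods evicted or stuck Pending are another sign of resource pressure
kubectl get pods -n <airbyte-namespace> | grep -iE 'evicted|pending'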

Sai Sriram

02/15/2022, 10:39 AM
Is it possible to share the scheduler/server logs? Also, is increasing memory an option to explore?

Cayden Brasher

02/16/2022, 1:42 AM
I can confirm it’s an OOM issue. Raising the memory limit helps, but it doesn’t fundamentally fix the problem. What should we do so the scheduler can clean up after itself when it restarts?
I don’t have the scheduler/server logs right now.

gunu

02/16/2022, 3:19 AM
@Mohammad Safari can you share more info about how much memory you are allocating to Airbyte?
Scheduler: 32G, and still many errors (exit code 137).
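Exit code 137 usually means the container hit its memory limit and was SIGKILLed by the OOM killer. One way to confirm (the pod name is a placeholder):
# The container's last termination state shows the reason and exit code
kubectl describe pod <airbyte-scheduler-pod> | grep -B 2 -A 6 "Last State"
# For an OOM kill the output looks roughly like:
#   Last State:  Terminated
#     Reason:    OOMKilled
#     Exit Code: 137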

Justin Reynolds

02/18/2022, 12:00 AM
Can you increase the memory for source/destination pods?
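For context, and hedged since it depends on the Airbyte version: around 0.35.x the resources given to source/destination sync pods can be set through the worker’s environment via the JOB_MAIN_CONTAINER_* variables. The values below are only examples; after changing them the worker needs a restart so new sync pods pick up the limits.
# .env (or the worker deployment's env) - example values, tune for your workload
JOB_MAIN_CONTAINER_MEMORY_REQUEST=1Gi
JOB_MAIN_CONTAINER_MEMORY_LIMIT=2Gi
JOB_MAIN_CONTAINER_CPU_REQUEST=0.5
JOB_MAIN_CONTAINER_CPU_LIMIT=1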