Hi. I have a postgres->bigquery connector both ...
# troubleshooting
t
Hi. I have a postgres->bigquery connector whose pods are both in the Error state, as you can see below.
k get pods | grep 126044
airbyte-bigquery-sync-126044-0-bghap       0/5     Error       0          20h
source-postgres-sync-126044-0-zgnvm        0/4     Error       0          20h
But the UI shows the job as running, and it hasn't been restarted or retried even after 20h. Any idea what's going on? I am on 0.35.10-alpha.
m
I usually go ahead and cancel the job, but this is happening a few times a week now.
I don't think this matters, but here are the logs from one of the pods:
k logs source-postgres-sync-126044-0-zgnvm --all-containers
Using existing AIRBYTE_ENTRYPOINT: /airbyte/base.sh
Waiting on CHILD_PID 7
PARENT_PID: 1
Heartbeat to worker failed, exiting...
received ABRT
Both pods are getting the ABRT message.
j
Is it only happening to this specific connection, or to others too? Could you check the resources of your k8s cluster?
Multiple connections. It seems like the scheduler or worker is going OOM, which causes this, and they don't recover from the situation on their own.
s
Is it possible to share the logs for scheduler/server? Also is increasing memory an option to explore?
c
I can confirm it's an OOM issue. Raising the memory limit helps but doesn't fundamentally fix the issue. What should we do so the scheduler can clean up if it restarts?
I don't have scheduler/server logs right now.
g
@Mohammad Safari can you share more info about how much memory you are allocating to Airbyte?
scheduler: 32G and still many errors (code=137)
j
Can you increase the memory for source/destination pods?
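For anyone reading along, a minimal sketch of how the `code=137` mentioned above can be interpreted. By shell convention, an exit code above 128 usually means the process was terminated by signal number `code - 128`, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. The helper name `describe_exit_code` is hypothetical, and the commented-out `kubectl` line assumes the pod name from the listing earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: decode a container exit code.
# Codes above 128 conventionally mean "terminated by signal (code - 128)".
describe_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "exited with status $code"
  fi
}

describe_exit_code 137

# To see why a container in an Error pod terminated (needs cluster access,
# so it is left commented out here; pod name taken from the listing above):
# kubectl get pod source-postgres-sync-126044-0-zgnvm \
#   -o jsonpath='{.status.containerStatuses[*].state.terminated.reason}'
```

A termination reason of `OOMKilled` would point to the container's own memory limit, while a SIGKILL without that reason may instead indicate node-level memory pressure or the pod being killed from outside.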