# ask-community-for-troubleshooting
s
Hello team, I am running a BigQuery destination with GCS staging. I see the _airbyte_tmp tables show up in the target dataset, and it looks like the connection from the source is running. However, I don't see any .avro files accumulating in GCS, and the _airbyte_tmp tables are empty. I have set the GCS staging to not delete the tmp files. How can I tell if data is being lifted and shifted?
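For context, this is roughly how I have been checking from my side (the bucket, prefix, dataset, and table names below are just placeholders for my own config, not Airbyte defaults):
# List whatever has been staged under the GCS path configured on the destination
gsutil ls -r gs://my-staging-bucket/my-staging-prefix/
# Count rows in one of the _airbyte_tmp tables that appeared in the target dataset
bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM `my_project.my_dataset._airbyte_tmp_example_stream`'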
Same output from the sync history logs.
h
Hey, is the sync done?
s
Looks like the sync is still running; however, both the GCS bucket and the BigQuery tables are empty. In the airbyte server I see logs such as:
Collecting content into /tmp/toBePublished15261227433470886897.tmp before uploading.
Collecting content into /tmp/toBePublished6742600251625951330.tmp before uploading.
Publishing to S3 (bucket=airbyte-dev-logs; key=job-logging/workspace/2/0/logs.log/20220204161420_airbyte-worker-79f96f596f-k2w7w_f41302792c264c988e4b2dda04be8f09):
Publishing /tmp/toBePublished6742600251625951330.tmp to GCS blob (bucket=production_ddp_airbyte_logs; blob=job-logging/workspace/2/0/logs.log/20220204161420_airbyte-worker-79f96f596f-k2w7w_f41302792c264c988e4b2dda04be8f09):
2022-02-04 16:14:27 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$3):251 - Records read: 4874000
I also see logs like this:
2022-02-04 16:46:57 destination > 2022-02-04 16:46:57 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to mix_panel_landing/bq_staging/cohort_members/2022_02_04_1643939039719_0.avro with id ABPnzm4SM...6BefCfix0]: Finished uploading [Part number 205 containing 50.04 MB]
However, I don't see those .avro files in GCS.
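To double check, I have also been listing the exact prefix from that StreamTransferManager line (the bucket name here is a placeholder for my actual staging bucket):
# Prefix copied from the upload log line above; bucket name is a placeholder
gsutil ls gs://my-staging-bucket/mix_panel_landing/bq_staging/cohort_members/
I am guessing the parts of an in-progress multipart upload don't become visible as objects until the upload is completed, which might be why the listing stays empty mid-sync?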
@Harshith (Airbyte) the first attempt of the sync looks like it completed, however I see this in the logs:
2022-02-04 23:56:22 INFO i.a.w.p.KubePodProcess(exitValue):710 - Closed all resources for pod source-zendesk-support-sync-2-0-wenpm
2022-02-04 23:56:22 ERROR i.a.w.DefaultReplicationWorker(run):141 - Sync worker failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: Cannot find pod while trying to retrieve exit code. This probably means the Pod was not correctly created.
It is now running a second attempt; however, I see it has reached the dbt step and completed successfully.
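For what it's worth, this is roughly how I have been checking whether the sync pods are still around when the worker goes looking for their exit code (the airbyte namespace is from my deployment):
# Connector sync pods with their phase and age; names follow the source-<connector>-sync-<job>-<attempt>-xxxxx pattern from the logs
kubectl get pods -n airbyte --sort-by=.status.startTime | grep sync
# Recent events, to see whether something (e.g. the pod sweeper) deleted a pod before the worker read its exit code
kubectl get events -n airbyte --sort-by=.lastTimestamp | grep -iE 'kill|delete'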
Also found this error in a mixpanel sync pod:
kubectl logs pod/source-mixpanel-sync-3-1-dcuis -f -n airbyte main
Using existing AIRBYTE_ENTRYPOINT: python /airbyte/integration_code/main.py
Waiting on CHILD_PID 7
PARENT_PID: 1
Heartbeat to worker failed, exiting...
received ABRT
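I also checked whether the worker itself bounced around that time, since a worker restart would explain the source pod losing its heartbeat (the namespace and pod name are just from my deployment, as an example):
# The RESTARTS column shows whether the worker restarted
kubectl get pods -n airbyte | grep worker
# Recent events for the worker pod seen in the log-publishing lines above
kubectl describe pod -n airbyte airbyte-worker-79f96f596f-k2w7w | tail -n 20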
h
Hey, can you confirm if this is happening frequently? Also, could you try it again? And if you have deployed it over Kubernetes and you have long-running processes, you can edit the sweeper-pod.yml and increase the 2 hours to x hours.
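Roughly, the sweeper script computes a cutoff for succeeded pods and deletes anything older than it, so the change looks something like this (the exact contents of sweeper-pod.yml vary between Airbyte versions, so treat this as a sketch rather than the literal file):
# Sketch only: variable names here are illustrative, not verbatim from sweeper-pod.yml
# Succeeded sync/worker pods older than this cutoff get deleted; widening the
# window from 2 hours to e.g. 48 hours keeps long backfill pods from being swept
SUCCEEDED_CUTOFF=$(date -d 'now - 48 hours' --utc -Ins)   # was 'now - 2 hours'
# Pods older than $SUCCEEDED_CUTOFF are then removed with something like:
#   kubectl -n "$KUBE_NAMESPACE" delete pod <old-succeeded-sync-pod>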
s
Hi Harshith, the sync eventually succeeded after 3 attempts; looks like the initial sync took 21 hours due to the size of the backfill.
I will try to increase the sweeper-pod retention window to hopefully mitigate this.
Hi @Harshith (Airbyte), each backfill process of a sync (scheduled to run hourly) fails with this even after extending the sweeper pod to 48 hours for successful task pods, even though the data is ETL'd.
h
Can you share the log of the sync?
s
Here are the details of the mix_panel sync. It's configured to sync hourly; however, it doesn't seem to mark itself as successful once the backfill is complete.
Here are the details of the zendesk sync. It's also configured to sync hourly and incrementally (EDIT: my bad, it's configured as full refresh instead of incremental sync), and it did mark itself as succeeded (after extending the pod sweeper to clean up successful worker sync pods at 48h instead of 2h). However, each subsequent sync does the entire backfill, which makes sense given that full refresh re-reads everything on each run.
h
Hey, is this resolved?
s
Hi Harshith, yes, extending the pod sweeper to 48h helped.