Andrey Morskoy

09/24/2021, 12:20 PM
Dear Team, I have a question about retry policy. I tried the following in docker-compose dev mode on my laptop:
• created a source: file (https), 68K lines
• attached a destination (local file)
• started a sync
• waited until ~40% was downloaded (based on the log output "Records read: 29000")
• then disabled WiFi
• the job remained in the Running state; I left it for a couple of minutes to simulate a network issue
• then re-enabled WiFi
• nothing changed: the job is still in the same Running state, and the destination file has not been updated since then.
Are there any retry policies for such network/data availability issues? To me it looks like there are a couple of issues:
• I am not notified of the sync error
• there is no job continuation; it seems I need to reset the connection manually?

user

09/24/2021, 2:42 PM
this is probably an issue with the file connector not failing when wifi dies

user

09/24/2021, 2:42 PM
if the connector failed/exited with code >0 then all the “right” things would happen
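A minimal sketch of that point, not taken from Airbyte's code: `run_read` and `stalled_source` are hypothetical names, and `stalled_source` only simulates a source whose connection stalls. The idea is that a network error is mapped to a non-zero exit code instead of leaving the process waiting, so whatever supervises the process can see the failure.

```python
import socket
import sys
from typing import Callable, Iterable


def run_read(read_records: Callable[[], Iterable[str]]) -> int:
    """Run a read loop and map network failures to a non-zero exit code.

    `read_records` is a hypothetical stand-in for a connector's read step.
    """
    try:
        for record in read_records():
            print(record)
    except (socket.timeout, ConnectionError) as exc:
        # A stalled or broken connection becomes a visible failure.
        print(f"read failed: {exc!r}", file=sys.stderr)
        return 1
    return 0


def stalled_source():
    """Simulated source: yields one record, then the read times out."""
    yield "record-1"
    raise socket.timeout("no data received")
```

In a real entry point the return value would be passed to `sys.exit(...)`, so the process ends with code 1 when the read fails.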

user

09/27/2021, 6:42 AM
Thanks. I believe a network partition is not a unique thing to happen. Isn't it expected to fail instead of hanging? I will test the same in a k8s deployment today, let's see if it differs

user

09/27/2021, 6:45 AM
As for the container in the docker deployment, I can confirm it is still alive:
root@amorskyi-le-E5L9:~# docker ps | grep source
feb5280e090f        airbyte/source-file:0.2.6       "/airbyte/base.sh re…"   2 days ago          Up 2 days                                                                                      relaxed_poitras

Artem Astapenko

09/27/2021, 6:46 AM
root@47c2376a4120:/app# ps aux | grep source-file
root       773  0.0  0.0 2088580 46580 ?       Sl   Sep24   0:07 docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/7/0 --network host --log-driver none airbyte/source-file:0.2.6 read --config source_config.json --catalog source_catalog.json
root      1597  0.0  0.0   4832   884 pts/0    S+   06:44   0:00 grep source-file

user

09/27/2021, 6:47 AM
# strace -ff -p 14528   # it is corresponding PID from host

user

09/27/2021, 6:48 AM
[pid 27390] epoll_pwait(4, [], 128, 0, NULL, 824636977728) = 0
[pid 14553] epoll_pwait(4,  <unfinished ...>
[pid 27390] epoll_pwait(4,  <unfinished ...>
[pid 14553] <... epoll_pwait resumed> [], 128, 0, NULL, 0) = 0
[pid 14553] futex(0xc00007f150, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 14548] <... nanosleep resumed> NULL) = 0
[pid 14548] futex(0x5636776c27d8, FUTEX_WAIT_PRIVATE, 0, {tv_sec

user

09/27/2021, 6:48 AM
Seems like an endless wait on a futex for some stale resource (probably a TCP socket?)
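The strace output alone does not confirm the socket theory, but the general mechanism is easy to illustrate: a blocking read with no timeout can wait indefinitely on a peer that stays silent, while a read timeout turns the same situation into an error. The sketch below uses only the Python standard library and a local listener that never sends data; `read_with_timeout` and `demo_silent_peer` are hypothetical names, not part of the file connector.

```python
import socket


def read_with_timeout(host: str, port: int, timeout_s: float) -> bytes:
    """Read from a TCP peer, raising socket.timeout if no data arrives in time."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        # The timeout given to create_connection also applies to recv().
        return conn.recv(1024)


def demo_silent_peer(timeout_s: float = 0.5) -> str:
    """Connect to a local listener that never sends data and report the outcome."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    try:
        read_with_timeout("127.0.0.1", port, timeout_s)
        return "data"
    except socket.timeout:
        # Without a timeout, recv() would have blocked here indefinitely.
        return "timed out"
    finally:
        listener.close()
```

Calling `demo_silent_peer()` gives up after the timeout instead of blocking, which is the behavior one would want from a read over a dead network link.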

user

09/27/2021, 9:47 AM
The K8s deployment shows the same behavior

Sawyer Waugh

09/27/2021, 7:36 PM
yeah I suppose that’s not surprising. The team is pretty subscribed on shipping Airbyte Cloud at the moment, so the next steps here should be:
1. create an issue with repro steps if possible
2. we will get to it in mid/late October, or we are happy to accept a PR at the moment if you are interested in creating one

user

09/28/2021, 7:24 AM
Thanks @s - I will create a GitHub issue today