# contributing-to-airbyte
a
Dear Team. Could you please point me in the right research direction before I go too deep into the wrong one? I have a simple job (read an 18-line CSV from HTTPS -> default normalization -> save to Postgres). I ran it in 2 modes:
• local docker-compose, dev run - took 7 seconds
• local minikube run (a hot one, all images - source, destination, normalization - are already pulled) - took 78 seconds
The hotspots are the source and destination worker pod workloads, ~30 sec each. Both job logs are attached in the thread. Could you please advise where I should look in the K8S deployment to understand/debug this performance issue?
u
Events table for destination pod:
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  5m39s  default-scheduler  Successfully assigned dmesh/destination-postgres-worker-2-0-rpwsf to minikube
  Normal  Pulled     5m38s  kubelet            Container image "busybox:1.28" already present on machine
  Normal  Created    5m38s  kubelet            Created container init
  Normal  Started    5m38s  kubelet            Started container init
  Normal  Started    5m36s  kubelet            Started container remote-stdin
  Normal  Created    5m36s  kubelet            Created container main
  Normal  Started    5m36s  kubelet            Started container main
  Normal  Pulled     5m36s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Created    5m36s  kubelet            Created container remote-stdin
  Normal  Pulled     5m36s  kubelet            Container image "airbyte/destination-postgres:0.3.11" already present on machine
  Normal  Pulled     5m36s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Created    5m36s  kubelet            Created container relay-stdout
  Normal  Started    5m36s  kubelet            Started container relay-stdout
  Normal  Pulled     5m36s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Created    5m36s  kubelet            Created container relay-stderr
  Normal  Started    5m35s  kubelet            Started container relay-stderr
  Normal  Pulled     5m35s  kubelet            Container image "curlimages/curl:7.77.0" already present on machine
  Normal  Created    5m35s  kubelet            Created container call-heartbeat-server
  Normal  Started    5m35s  kubelet            Started container call-heartbeat-server
u
Events for source pod:
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  6m43s  default-scheduler  Successfully assigned dmesh/source-file-worker-2-0-zftxi to minikube
  Normal  Pulled     6m42s  kubelet            Container image "busybox:1.28" already present on machine
  Normal  Created    6m42s  kubelet            Created container init
  Normal  Started    6m42s  kubelet            Started container init
  Normal  Created    6m40s  kubelet            Created container relay-stdout
  Normal  Created    6m40s  kubelet            Created container main
  Normal  Started    6m40s  kubelet            Started container main
  Normal  Pulled     6m40s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Pulled     6m40s  kubelet            Container image "airbyte/source-file:0.2.6" already present on machine
  Normal  Started    6m40s  kubelet            Started container relay-stdout
  Normal  Pulled     6m40s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Created    6m40s  kubelet            Created container relay-stderr
  Normal  Started    6m39s  kubelet            Started container relay-stderr
  Normal  Pulled     6m39s  kubelet            Container image "curlimages/curl:7.77.0" already present on machine
  Normal  Created    6m39s  kubelet            Created container call-heartbeat-server
  Normal  Started    6m39s  kubelet            Started container call-heartbeat-server
u
K8S outbound networking is ok, so not a network issue:
root@airbyte-server-79dfbcf4d6-nw75l:/tmp# time curl https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv
"Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Kate",       "F",   47,       69,      139
"Luke",       "M",   34,       72,      163
"Myra",       "F",   23,       62,       98
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Page",       "F",   31,       67,      135
"Quin",       "M",   29,       71,      176
"Ruth",       "F",   28,       65,      131


real    0m1.209s
user    0m0.021s
sys     0m0.009s
a
hey @Andrey Morskoy we are aware of the performance issues in kube. Take a look at this issue for more details.
u
Thanks @Subodh (Airbyte) - I will inspect the linked issue
u
Actually, based on the fact that in the K8S run the job logs show a stable 30-second interval after the workload, just before the pod dies, I am thinking some timeout, watchdog, or STONITH-like effect is present - let me check it. Example of this: 30 sec of silence after PostgresDestination is done:
2021-09-13 08:48:40 INFO  2021-09-13 08:48:40 INFO i.a.i.d.p.PostgresDestination(main):87 - {} - completed destination: class io.airbyte.integrations.destination.postgres.PostgresDestination
2021-09-13 08:49:10 INFO  Exit code for pod destination-postgres-worker-2-0-rpwsf is 0
u
Yes, we cache a pod's previous status to avoid overwhelming the Kube API server with too many requests
u
We've been conservative about the setting since, realistically, it takes 10-15 secs for everything to wrap up, and we figured an additional 15-second wait doesn't affect much. Is this a big issue for you?
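To illustrate why the exit code shows up late, here is a minimal, self-contained sketch of interval-based status caching (hypothetical names, not the actual Airbyte code): the wrapper only re-queries the API server once the cached value is older than the refresh interval, so a pod that terminates right after a refresh is not observed as terminated until the next refresh, up to a full interval later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical sketch of interval-based status caching, not Airbyte's actual code.
// It only illustrates why a pod that terminates right after a refresh is not seen
// as terminated for up to a full refresh interval.
public class CachedPodStatus {
    private final Supplier<String> queryApiServer;   // e.g. returns "Running", "Succeeded", ...
    private final Duration refreshInterval;          // the conservative ~30s setting discussed above
    private String lastStatus;
    private Instant lastRefresh = Instant.MIN;

    public CachedPodStatus(Supplier<String> queryApiServer, Duration refreshInterval) {
        this.queryApiServer = queryApiServer;
        this.refreshInterval = refreshInterval;
    }

    public String get() {
        // Only hit the API server if the cached value is stale; otherwise reuse it.
        if (Duration.between(lastRefresh, Instant.now()).compareTo(refreshInterval) > 0) {
            lastStatus = queryApiServer.get();
            lastRefresh = Instant.now();
        }
        return lastStatus;
    }
}
```

With a ~30-second interval, the "Exit code for pod ... is 0" line in the log above can appear up to ~30 seconds after the connector itself has finished, which matches the observed gap.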
u
Hi @Davin Chia (Airbyte). For small-data scenarios, which are expected to be fast (the docker-compose run from my example is satisfactory here), it does look like an issue to me. In that case I would prefer the calling process to be able to finish early, with maybe an async channel to capture errors that could happen during those last 30 sec. Or just let the external caller poll for the final status when it's ready, if needed.
u
Please let me know if this looks reasonable. Feel free to request more info if needed - I would be happy to help.
u
@Davin Chia (Airbyte) Also, for my research - could you please point me to the places in the code where I could inspect that delay setup? I believe it is somewhere around `KubePodProcess.java -> getReturnCode(Pod pod)`?
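For experimenting in that area, the raw exit code ultimately lives on the pod's terminated container state. A rough sketch of reading it via the fabric8 Kubernetes model (which the `Pod` parameter suggests, though verify against the actual code) - illustrative only, the class and method here are hypothetical, and the real implementation adds the status caching discussed above:

```java
import io.fabric8.kubernetes.api.model.ContainerStateTerminated;
import io.fabric8.kubernetes.api.model.ContainerStatus;
import io.fabric8.kubernetes.api.model.Pod;

// Illustrative sketch only: reads the exit code of the first terminated container
// from a Pod object. The actual KubePodProcess logic also caches the pod status
// (the ~30s refresh discussed above) before a check like this sees a result.
public class PodExitCodes {
    static Integer firstTerminatedExitCode(Pod pod) {
        if (pod.getStatus() == null || pod.getStatus().getContainerStatuses() == null) {
            return null;
        }
        for (ContainerStatus cs : pod.getStatus().getContainerStatuses()) {
            ContainerStateTerminated terminated =
                cs.getState() == null ? null : cs.getState().getTerminated();
            if (terminated != null) {
                return terminated.getExitCode();
            }
        }
        return null; // still running, or status not yet reported
    }
}
```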
u
Perfect - thanks - will debug there! Btw, @Davin Chia (Airbyte) - do you have a specific manual on how to debug the workers' JVM process (I mean IDE debugging, breakpoints)? I am searching for something like https://github.com/linkedin/datahub/blob/master/docs/docker/development.md#debugging
u
We have https://docs.airbyte.io/contributing-to-airbyte/developing-locally that has instructions on how to compile code locally
m
Nothing specific on how to attach a debugger to a remote process. We use generic Java tech, so googling for that will probably give you results
u
I have done that already. What it's missing is how to use this code in K8S launch mode and debug there
u
I suppose I could use the `dev` tag in the pod apply YAML instead of the release version. But as for debugging - I am not sure
u
You can add your own debug code, build the Docker images locally, and launch them by specifying the `dev` tag in the Kube YAMLs
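For example (using the images from this thread), after building locally the worker pod spec would reference something like `image: airbyte/destination-postgres:dev` in place of `airbyte/destination-postgres:0.3.11` - the exact yaml location depends on your deployment manifests.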
u
The debug code part is clear. What is not clear at this moment is where I pass JVM debug params for the worker process (the aim is to set breakpoints in `KubePodProcess` and others in a worker). Do we support that, or should I add this possibility (e.g. by editing the Dockerfile)?
u
I haven't tried remote debugging with Java on Kube, so you would have to play around with it yourself. Any JVM flags would be passed in inside that module's gradle file, e.g. https://github.com/airbytehq/airbyte/pull/3389/files#diff-faefaa1f21201ee06f9ac02638c54f7a90486957fe34b245c8cbe4575ef0244eR9
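For reference, the standard (not Airbyte-specific) way to make a JVM accept an IDE debugger is the JDWP agent flag, e.g. `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005` (Java 9+ syntax). Whether it goes into the module's Gradle run configuration or into the container environment (e.g. the standard `JAVA_TOOL_OPTIONS` variable) is an assumption to verify against the worker's actual entrypoint, and on Kube the debug port still has to be exposed from the pod running that JVM, e.g. via `kubectl port-forward`, before attaching breakpoints in `KubePodProcess`.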