# contributing-to-airbyte
a
Dear Team. Could you please point me in the right research direction before I go too deep into the wrong one? I have a simple job (read an 18-line CSV from HTTPS -> default normalization -> save to Postgres). I ran it in 2 modes:
• local docker-compose, dev run - took 7 seconds
• local minikube run (a hot one, all images - source, destination, normalization - are already pulled) - took 78 seconds
The hotspots are the source and destination worker pod workloads, ~30 sec each. Both job logs are attached in the thread. Could you please advise where I should look in the K8S deployment to understand/debug this performance issue?
u
Events table for destination pod:
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  5m39s  default-scheduler  Successfully assigned dmesh/destination-postgres-worker-2-0-rpwsf to minikube
  Normal  Pulled     5m38s  kubelet            Container image "busybox:1.28" already present on machine
  Normal  Created    5m38s  kubelet            Created container init
  Normal  Started    5m38s  kubelet            Started container init
  Normal  Started    5m36s  kubelet            Started container remote-stdin
  Normal  Created    5m36s  kubelet            Created container main
  Normal  Started    5m36s  kubelet            Started container main
  Normal  Pulled     5m36s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Created    5m36s  kubelet            Created container remote-stdin
  Normal  Pulled     5m36s  kubelet            Container image "airbyte/destination-postgres:0.3.11" already present on machine
  Normal  Pulled     5m36s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Created    5m36s  kubelet            Created container relay-stdout
  Normal  Started    5m36s  kubelet            Started container relay-stdout
  Normal  Pulled     5m36s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Created    5m36s  kubelet            Created container relay-stderr
  Normal  Started    5m35s  kubelet            Started container relay-stderr
  Normal  Pulled     5m35s  kubelet            Container image "curlimages/curl:7.77.0" already present on machine
  Normal  Created    5m35s  kubelet            Created container call-heartbeat-server
  Normal  Started    5m35s  kubelet            Started container call-heartbeat-server
u
Events for source pod:
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  6m43s  default-scheduler  Successfully assigned dmesh/source-file-worker-2-0-zftxi to minikube
  Normal  Pulled     6m42s  kubelet            Container image "busybox:1.28" already present on machine
  Normal  Created    6m42s  kubelet            Created container init
  Normal  Started    6m42s  kubelet            Started container init
  Normal  Created    6m40s  kubelet            Created container relay-stdout
  Normal  Created    6m40s  kubelet            Created container main
  Normal  Started    6m40s  kubelet            Started container main
  Normal  Pulled     6m40s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Pulled     6m40s  kubelet            Container image "airbyte/source-file:0.2.6" already present on machine
  Normal  Started    6m40s  kubelet            Started container relay-stdout
  Normal  Pulled     6m40s  kubelet            Container image "alpine/socat:1.7.4.1-r1" already present on machine
  Normal  Created    6m40s  kubelet            Created container relay-stderr
  Normal  Started    6m39s  kubelet            Started container relay-stderr
  Normal  Pulled     6m39s  kubelet            Container image "curlimages/curl:7.77.0" already present on machine
  Normal  Created    6m39s  kubelet            Created container call-heartbeat-server
  Normal  Started    6m39s  kubelet            Started container call-heartbeat-server
u
K8S outbound networking is ok, so not a network issue:
root@airbyte-server-79dfbcf4d6-nw75l:/tmp# time curl https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv
"Name",     "Sex", "Age", "Height (in)", "Weight (lbs)"
"Alex",       "M",   41,       74,      170
"Bert",       "M",   42,       68,      166
"Carl",       "M",   32,       70,      155
"Dave",       "M",   39,       72,      167
"Elly",       "F",   30,       66,      124
"Fran",       "F",   33,       66,      115
"Gwen",       "F",   26,       64,      121
"Hank",       "M",   30,       71,      158
"Ivan",       "M",   53,       72,      175
"Jake",       "M",   32,       69,      143
"Kate",       "F",   47,       69,      139
"Luke",       "M",   34,       72,      163
"Myra",       "F",   23,       62,       98
"Neil",       "M",   36,       75,      160
"Omar",       "M",   38,       70,      145
"Page",       "F",   31,       67,      135
"Quin",       "M",   29,       71,      176
"Ruth",       "F",   28,       65,      131


real    0m1.209s
user    0m0.021s
sys     0m0.009s
a
hey @Andrey Morskoy we are aware of the performance issues in kube. Take a look at this issue for more details.
u
Thanks @Subodh (Airbyte) - I will inspect the linked issue
u
Actually, based on the fact that in the K8S run the job logs show a stable 30-second interval after the workload, just before the pod dies, I am thinking some timeout, watchdog, or STONITH-like effect is present - let me check it. Example of this: 30 sec of silence after PostgresDestination is done:
2021-09-13 08:48:40 INFO  2021-09-13 08:48:40 INFO i.a.i.d.p.PostgresDestination(main):87 - {} - completed destination: class io.airbyte.integrations.destination.postgres.PostgresDestination
2021-09-13 08:49:10 INFO  Exit code for pod destination-postgres-worker-2-0-rpwsf is 0
u
Yes, we cache a pod's previous status to avoid overwhelming the Kube API server with too many requests
u
We've been conservative about the setting since, realistically, it takes 10-15 secs for everything to wrap up, and we figured an additional 15-second wait doesn't affect much. Is this a big issue for you?
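To illustrate why the exit code shows up late, here is a minimal, self-contained sketch of interval-based status caching (hypothetical names, not the actual Airbyte code): the wrapper only re-queries the API server once the cached value is older than the refresh interval, so a pod that terminates right after a refresh is not observed as terminated until the next refresh, up to a full interval later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical sketch of interval-based status caching, not Airbyte's actual code.
// It only illustrates why a pod that terminates right after a refresh is not seen
// as terminated for up to a full refresh interval.
public class CachedPodStatus {
    private final Supplier<String> queryApiServer;   // e.g. returns "Running", "Succeeded", ...
    private final Duration refreshInterval;          // the conservative ~30s setting discussed above
    private String lastStatus;
    private Instant lastRefresh = Instant.MIN;

    public CachedPodStatus(Supplier<String> queryApiServer, Duration refreshInterval) {
        this.queryApiServer = queryApiServer;
        this.refreshInterval = refreshInterval;
    }

    public String get() {
        // Only hit the API server if the cached value is stale; otherwise reuse it.
        if (Duration.between(lastRefresh, Instant.now()).compareTo(refreshInterval) > 0) {
            lastStatus = queryApiServer.get();
            lastRefresh = Instant.now();
        }
        return lastStatus;
    }
}
```

With a ~30-second interval, the "Exit code for pod ... is 0" line in the log above can appear up to ~30 seconds after the connector itself has finished, which matches the observed gap.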
u
Hi @Davin Chia (Airbyte). For small-data scenarios, which are expected to be fast (the docker-compose run from my example is satisfactory here), it does look like an issue to me. In that case I would prefer the calling process to be able to finish early, with maybe an async channel to capture errors that could happen during those last 30 sec. Or just let the external caller poll for the final status when it's ready, if needed.
u
Please let me know if this looks reasonable. Feel free to request more info if needed - I would be happy to help.
u
@Davin Chia (Airbyte) Also, for my research - could you please point me to the places in the code where I could inspect that delay setup? I believe it is somewhere around `KubePodProcess.java -> getReturnCode(Pod pod)`?
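For experimenting in that area, the raw exit code ultimately lives on the pod's terminated container state. A rough sketch of reading it via the fabric8 Kubernetes model (which the `Pod` parameter suggests, though verify against the actual code) - illustrative only, the class and method here are hypothetical, and the real implementation adds the status caching discussed above:

```java
import io.fabric8.kubernetes.api.model.ContainerStateTerminated;
import io.fabric8.kubernetes.api.model.ContainerStatus;
import io.fabric8.kubernetes.api.model.Pod;

// Illustrative sketch only: reads the exit code of the first terminated container
// from a Pod object. The actual KubePodProcess logic also caches the pod status
// (the ~30s refresh discussed above) before a check like this sees a result.
public class PodExitCodes {
    static Integer firstTerminatedExitCode(Pod pod) {
        if (pod.getStatus() == null || pod.getStatus().getContainerStatuses() == null) {
            return null;
        }
        for (ContainerStatus cs : pod.getStatus().getContainerStatuses()) {
            ContainerStateTerminated terminated =
                cs.getState() == null ? null : cs.getState().getTerminated();
            if (terminated != null) {
                return terminated.getExitCode();
            }
        }
        return null; // still running, or status not yet reported
    }
}
```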
u
Perfect - thanks - will debug there! Btw, @Davin Chia (Airbyte) - do you have a specific manual on how to debug the workers' JVM process (I mean IDE debugging, breakpoints)? I am searching for something like https://github.com/linkedin/datahub/blob/master/docs/docker/development.md#debugging
u
We have https://docs.airbyte.io/contributing-to-airbyte/developing-locally that has instructions on how to compile code locally
m
Nothing specific on how to attach a debugger to a remote process. We use generic Java tech, so googling for that will probably give you results
u
I have done that already. What it's missing is how to use this code in K8S launch mode and debug there
u
I suppose I could use the `dev` tag in the pod apply YAML instead of the release version. But as for debugging - I am not sure
u
You can add your own debug code, build the Docker images locally, and launch them by specifying the `dev` tag in the Kube YAMLs
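For example (using the images from this thread), after building locally the worker pod spec would reference something like `image: airbyte/destination-postgres:dev` in place of `airbyte/destination-postgres:0.3.11` - the exact yaml location depends on your deployment manifests.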
u
The debug code part is clear. What is not clear at this moment is where I pass JVM debug params for the worker process (the aim is to set breakpoints in `KubePodProcess` and others in a worker). Do we support that, or should I add this possibility (e.g. by editing the Dockerfile)?
u
I haven't tried remote debugging with Java on Kube, so you would have to play around with it yourself. Any JVM flags would be passed in inside that module's gradle file, e.g. https://github.com/airbytehq/airbyte/pull/3389/files#diff-faefaa1f21201ee06f9ac02638c54f7a90486957fe34b245c8cbe4575ef0244eR9
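For reference, the standard (not Airbyte-specific) way to make a JVM accept an IDE debugger is the JDWP agent flag, e.g. `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005` (Java 9+ syntax). Whether it goes into the module's Gradle run configuration or into the container environment (e.g. the standard `JAVA_TOOL_OPTIONS` variable) is an assumption to verify against the worker's actual entrypoint, and on Kube the debug port still has to be exposed from the pod running that JVM, e.g. via `kubectl port-forward`, before attaching breakpoints in `KubePodProcess`.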