Jorin

02/17/2022, 11:59 AM
Hello, does the migration to the mandatory intermediate version v0.32.0-alpha-patch-1 take long (upgrading from v0.29.4)? Doing it in GKE and it's been ~30 mins. CPU/memory are being utilized, but I'm not seeing anything in the logs, so I'm wondering. The data volume is not that high. Counts from tables: configs: 230, jobs: ~8,000, attempts: ~100,000.
Pras

02/17/2022, 12:09 PM
From a Slack search it looks like it might take an hour. Wowza, I will wait and update here.
1hr 20mins and counting. Should I bump up memory and/or drop old data somewhere to make this faster? Not sure where the time is being spent. The pod is running with 1 CPU and 6 GB memory; CPU is barely used and memory is almost fully used.
Ok, doubled the pod memory and running again. 🤞
Hmm, 1hr in and memory keeps creeping up.
Logs, if it's helpful.
The migration dir mentioned in the log, inside the container.
Configs, jobs, attempts sizes from the db ->
560 kB, 74 MB, 262 MB
I have 0 sync_state records in the db; I assume that is the migration step that is taking long.
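For reference, a quick way to pull those same size and row-count numbers out of the Airbyte database is a short script like the one below. This is only a sketch: the connection parameters are placeholders, and the table names (airbyte_configs, jobs, attempts) are assumed from the default Airbyte schema of that era, so verify them against your own instance.

```python
# Hypothetical sketch: report on-disk size and row count of the tables the
# migration touches. Connection details and table names are assumptions.
import psycopg2

TABLES = ["airbyte_configs", "jobs", "attempts"]

conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="airbyte", user="docker", password="docker")
with conn, conn.cursor() as cur:
    for table in TABLES:
        cur.execute("SELECT pg_size_pretty(pg_total_relation_size(%s))", (table,))
        size = cur.fetchone()[0]
        cur.execute(f"SELECT count(*) FROM {table}")  # table comes from the fixed list above
        rows = cur.fetchone()[0]
        print(f"{table:>16}: {size:>10} ({rows} rows)")
conn.close()
```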
Hi @Pras, no, it should not take much time. But I haven't performed it on EKS or GKE. I have sometimes seen that Docker doesn't start properly, so you can delete the node and try recreating it.
Akshay Saini

02/17/2022, 3:38 PM
Ok, that should not be an issue on GKE since it auto-repairs nodes having such problems.
Anyway, it did not do anything for 2hrs, so I rolled it back. I'll have to figure out a better way to upgrade later, I guess.
If there is interest, I captured jstack output before killing it.
@Pras it shouldn't take so long; the patch version performs the migration in smaller batches to avoid throwing OOM errors during the process. If you have a huge sync history and a lot of connections, it can probably take longer 🙂
As you showed in your previous post, you have 100k attempts... if you can delete them, that will help a lot in reducing the migration time.
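Before actually deleting anything, it may be worth doing a dry run to see how many rows that would remove. A rough sketch under the same assumptions as the earlier snippet (jobs.scope identifies the connection, jobs.status uses the lowercase 'succeeded' value, attempts.job_id points at jobs.id; verify against your own schema):

```python
# Hypothetical dry run: count the jobs/attempts that are NOT part of the most
# recent succeeded job per connection ("scope"). Nothing is deleted here.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="airbyte", user="docker", password="docker")
with conn, conn.cursor() as cur:
    cur.execute("""
        WITH keep AS (
            SELECT DISTINCT ON (scope) id
            FROM jobs
            WHERE status = 'succeeded'
            ORDER BY scope, created_at DESC
        )
        SELECT
            (SELECT count(*) FROM jobs     WHERE id     NOT IN (SELECT id FROM keep)),
            (SELECT count(*) FROM attempts WHERE job_id NOT IN (SELECT id FROM keep))
    """)
    jobs_to_drop, attempts_to_drop = cur.fetchone()
    print(f"jobs that would be dropped:     {jobs_to_drop}")
    print(f"attempts that would be dropped: {attempts_to_drop}")
conn.close()
```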
Justin Reynolds

02/18/2022, 1:42 AM
How can I safely delete them when sync_state is not yet created/migrated, given that I don't want to lose the last cursor state and want to avoid a full reset? Any recommendations?
Arasdan

02/18/2022, 11:12 AM
Just as an update: luckily we onboard all connections using a unique namespaceFormat on our side, so we used it to keep only the latest succeeded jobs and drop the others, then dropped all attempts whose job_id no longer exists (a sketch of that cleanup is below). That brought the rows down to ~60, and the migration went very fast, seconds. Then we hit another problem retrieving the spec from a custom connector. It was because of taints/tolerations: the spec job pod for that version was not adding tolerations, I think, so it was unschedulable. Ended up creating a temp node pool with just the labels and no taints, and got through it.
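For anyone landing here later, a minimal sketch of that cleanup step, under the same schema assumptions as the earlier snippets (scope identifies the connection, status values are lowercase, and no foreign key forces a particular delete order); take a backup of the database before running anything like this:

```python
# Hypothetical cleanup: keep only the most recent succeeded job per connection,
# delete every other job, then delete attempts whose job_id no longer exists.
# Note: connections with no succeeded job at all would lose their whole history.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="airbyte", user="docker", password="docker")
with conn:  # commits on success, rolls back on error
    with conn.cursor() as cur:
        cur.execute("""
            DELETE FROM jobs
            WHERE id NOT IN (
                SELECT DISTINCT ON (scope) id
                FROM jobs
                WHERE status = 'succeeded'
                ORDER BY scope, created_at DESC
            )
        """)
        print(f"jobs deleted: {cur.rowcount}")

        cur.execute("DELETE FROM attempts WHERE job_id NOT IN (SELECT id FROM jobs)")
        print(f"orphaned attempts deleted: {cur.rowcount}")
conn.close()
```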