Axel Waserman
03/22/2023, 7:13 AM
2023-03-21 20:07:06 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$7):385 - Records read: 1000 (1 MB)
2023-03-21 20:09:57 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$7):385 - Records read: 2000 (3 MB)
2023-03-21 20:12:48 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$7):385 - Records read: 3000 (5 MB)
2023-03-21 20:15:39 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$7):385 - Records read: 4000 (7 MB)
2023-03-21 20:18:34 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$7):385 - Records read: 5000 (9 MB)
2023-03-21 20:21:30 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$7):385 - Records read: 6000 (11 MB)
Just to make sure it wasn’t a scaling issue, I scaled up my nodegroup to 3 nodes and tried syncing again.
Even with the Lever worker pod being the only pod running on an m5.large instance, the issue was still there.
I think I identified the root cause. I caught this in the logs:
2023-03-22 01:19:40 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$7):385 - Records read: 81000 (86 MB)
2023-03-22 01:20:47 source > Backing off _send(...) for 5.0s (airbyte_cdk.sources.streams.http.exceptions.DefaultBackoffException: Request URL: https://api.lever.co/v1/opportunities/783f1562-1f8a-4d98-96bc-d09a1ef960b3/applications?limit=50, Response Code: 500, Response Text: Internal Server Error)
2023-03-22 01:20:47 source > Caught retryable error 'Request URL: https://api.lever.co/v1/opportunities/783f1562-1f8a-4d98-96bc-d09a1ef960b3/applications?limit=50, Response Code: 500, Response Text: Internal Server Error' after 1 tries. Waiting 5 seconds then retrying...
The URL has limit=50 in it, so the connector seems to be pulling records 50 at a time. Therefore, pulling 1,000 records requires sending 20 HTTP requests.
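To make the pagination concrete, here is a rough plain-requests sketch of what such a loop looks like. This is not the actual connector code; the hasNext/next cursor fields and the basic-auth style are my assumptions about the Lever API, not something taken from the logs.

import time
import requests

LIMIT = 50  # page size seen in the request URLs above

def get_with_retry(url, params, auth, max_retries=5):
    # GET with exponential backoff on HTTP 500, similar to the
    # "Backing off _send(...)" behaviour in the source logs.
    for attempt in range(max_retries):
        resp = requests.get(url, params=params, auth=auth)
        if resp.status_code != 500:
            resp.raise_for_status()
            return resp
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"Still getting 500s after {max_retries} attempts: {url}")

def fetch_applications(opportunity_id, api_key):
    # Pull the applications for one opportunity, 50 records per request,
    # so 1,000 records costs 20 round trips.
    url = f"https://api.lever.co/v1/opportunities/{opportunity_id}/applications"
    params = {"limit": LIMIT}
    while True:
        page = get_with_retry(url, params, auth=(api_key, "")).json()
        yield from page.get("data", [])
        if not page.get("hasNext"):
            break
        params["offset"] = page["next"]  # cursor for the next page of 50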
3 min = 180 s, so one call takes 180/20 = 9 s on average.
That’s still pretty bad performance …
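Sanity-checking those numbers against the log lines above (the 81,000 figure is the last "Records read" count before the 500s started):

# Back-of-the-envelope check of the numbers above.
RECORDS_PER_REQUEST = 50       # limit=50 in the request URL
RECORDS_PER_LOG_LINE = 1000    # the "Records read" counter increments by 1000
SECONDS_PER_LOG_LINE = 3 * 60  # roughly 3 minutes between those log lines

requests_per_log_line = RECORDS_PER_LOG_LINE // RECORDS_PER_REQUEST  # 20
seconds_per_request = SECONDS_PER_LOG_LINE / requests_per_log_line   # 9.0

records_so_far = 81_000  # last count before the 500 errors appeared
hours_so_far = records_so_far / RECORDS_PER_REQUEST * seconds_per_request / 3600
print(seconds_per_request, hours_so_far)  # 9.0 s per call, 4.05 hours of syncing so far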
Has anyone had a similar issue with any source connector?