Hi every one, I'm running Airbyte at my company i...
# ask-community-for-troubleshooting
j
Hi everyone, I'm running Airbyte at my company to sync Salesforce (SF) data into BigQuery (BQ). Airbyte was our first choice because it seems to be the simplest, easiest to get started with, and most cost-effective solution on the market for implementing ELT workflows. However, we ran into trouble synchronizing "large" amounts of data from SF, and we couldn't find out where the issue comes from (SF APIs? Airbyte SF connector? Airbyte itself or one of its components such as Temporal? etc.).

Observed behaviors:
• SF -> BQ sync: works perfectly fine with smaller tables.
• SF(Opportunity) -> BQ: initial sync fails after a few days with the message "job cancelled" (no data committed to BQ).
• SF(Opportunity) -> Google Cloud Storage (Parquet): initial sync failed after a few days with a replication error (data was committed to GCS in Parquet files, but with a lot of duplicates, as the destination doesn't support dbt normalization and deduped history).
• SF(Opportunity, start-date=2022-01-01) -> BQ: initial sync finished successfully this morning (but we only sync a year's worth of data).

My questions:
• Do you have any ideas (or well-documented patterns 😄) on how we could implement batch ELT with Airbyte (i.e. sync multiple time-bound batches of data, either in parallel or sequentially)? It would be very interesting to be able to sync the last month of data to a destination, then keep syncing to that same destination going back in time month by month (or by another time unit).
• I feel like it would be nice to be able to use Cloud Composer (or another workflow orchestration tool) to programmatically create time-bound connections (from start_date to end_date) in Airbyte, but it seems Airbyte enforces the pattern of one full sync followed by smaller incremental syncs.
Our deployment:
• docker-compose deployment on GCP Compute Engine (single VM)
• VM: n2-standard-4 (4 vCPU, 16 GB RAM, 30 GB disk storage)
• SF Connector v1.0.23
• BQ Connector v1.2.5

Some information about our data:
• Table: Opportunity
• Row count: 12.5M
• SF storage size: 23.9 GB (not so much)
• Fields: over 500 (300+ user-defined fields, plus SF hidden system fields)
s
Hey, have you seen the Scaling Airbyte doc? And yes, even incremental syncs will sync all data when they are first run. Some other connectors do have an `end_date` on some streams. Could you submit a request in the airbyte repo so we can look into whether it's possible to add this feature?
j
Thx @Sunny Hashmi (Airbyte) for your response. Yes, I have read this documentation (but I will read it again, I might have missed something ^^). I'll try to submit my request, thx 👍
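
For anyone landing on this thread later: the month-by-month backfill idea from the original question can be sketched in plain Python as a planning step, independent of Airbyte. This is only a minimal sketch. The function name `month_windows_backwards` is hypothetical, and it only computes the (start, end) date windows, newest first. How each window would be handed to a sync is left open, since per the reply above an `end_date` option is not something the Salesforce connector is confirmed to offer here.

```python
from datetime import date
from typing import Iterator, Tuple


def month_windows_backwards(latest: date, earliest: date) -> Iterator[Tuple[date, date]]:
    """Yield (start, end) calendar-month windows, newest first.

    `end` is exclusive. The first window covers the whole month containing
    `latest`; the last window is clipped so it never starts before `earliest`.
    Hypothetical helper, not part of Airbyte.
    """
    if earliest > latest:
        raise ValueError("earliest must not be after latest")
    # Exclusive upper bound: first day of the month following `latest`.
    end = date(latest.year + latest.month // 12, latest.month % 12 + 1, 1)
    while end > earliest:
        # First day of the month that precedes `end`.
        if end.month == 1:
            prev_year, prev_month = end.year - 1, 12
        else:
            prev_year, prev_month = end.year, end.month - 1
        start = max(date(prev_year, prev_month, 1), earliest)
        yield start, end
        end = start


if __name__ == "__main__":
    # Example: plan a backfill from mid-January 2023 back to 10 November 2022.
    for start, end in month_windows_backwards(date(2023, 1, 15), date(2022, 11, 10)):
        print(start.isoformat(), "->", end.isoformat())
```

An orchestrator such as Cloud Composer could iterate over these windows sequentially, running one bounded batch per window, assuming the source connector accepts both bounds.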