Hi Team, a few records present in source are missing...
# troubleshooting
r
Hi Team, a few records present in source are missing in destination.
Is this your first time deploying Airbyte: No
OS Version / Instance: EC2, Ubuntu 20.04
Memory / Disk: 120Gb / 100GB SSD
Deployment: EC2
Airbyte Version: 0.35.12-alpha
Source name/version: Salesforce 0.1.21
Destination name/version: Redshift 0.3.23 (S3 is used for staging data)
Description: Records present in source are missing in destination
h
Hey, can you share more details on which records are missing and whether you see any pattern? Also, can you share the job sync logs?
r
```
## Account table

Redshift:
select count(*) from sf_airbyte_redshift_prod_v1."account" where isdeleted = false;
-- 33511625

Salesforce Bulk API:
-- 33602898

-- 33602898 - 33511625 = 91273 missing records
```
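For reproducibility, the Salesforce-side count can be taken with a SOQL `COUNT()` query. A minimal sketch, assuming the `simple-salesforce` client and placeholder credentials (the `WHERE` clause mirrors the Redshift query above):

```python
# Minimal sketch: cross-check the Salesforce-side count with SOQL.
# Assumes the simple-salesforce client; credentials are placeholders.
from simple_salesforce import Salesforce

sf = Salesforce(
    username="user@example.com",   # placeholder credentials
    password="password",
    security_token="token",
)

# COUNT() queries return no records; the total is exposed as totalSize.
result = sf.query("SELECT COUNT() FROM Account WHERE IsDeleted = false")
sf_count = result["totalSize"]

redshift_count = 33511625  # value from the Redshift query above
print(f"Salesforce: {sf_count}  Redshift: {redshift_count}  missing: {sf_count - redshift_count}")
```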
h
Is this just one table? If not, can we reduce the scope by reducing the number of tables? If it is just one table, can you try some other approaches (e.g. reducing the start_date) so that we can narrow it down and see what the missing piece is?
r
Hi Harshith, we are syncing one table per connector. Even with a small incremental sync data is missing: 50 records out of 20k. We use S3 for data staging and we do not see missing data there either.
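To answer the "any pattern" question, the specific missing records can be identified by diffing primary keys between Salesforce and Redshift for the same window. A rough sketch, assuming `simple-salesforce` and `psycopg2` with placeholder connection details and an illustrative `SystemModstamp` window:

```python
# Rough sketch: diff record Ids between Salesforce and Redshift to find the
# missing rows and see whether their timestamps cluster (which would point at
# a cursor/checkpoint gap). Connection details and the window are placeholders.
from simple_salesforce import Salesforce
import psycopg2

sf = Salesforce(username="user@example.com", password="password", security_token="token")

soql = ("SELECT Id, SystemModstamp FROM Account "
        "WHERE SystemModstamp >= 2022-03-10T00:00:00Z AND IsDeleted = false")
source = {r["Id"]: r["SystemModstamp"] for r in sf.query_all(soql)["records"]}

conn = psycopg2.connect(host="redshift-host", port=5439, dbname="prod",
                        user="airbyte", password="secret")
with conn.cursor() as cur:
    cur.execute('SELECT id FROM sf_airbyte_redshift_prod_v1."account"')
    destination_ids = {row[0] for row in cur.fetchall()}

missing = {rid: ts for rid, ts in source.items() if rid not in destination_ids}
print(f"{len(missing)} missing records")
for rid, ts in sorted(missing.items(), key=lambda kv: kv[1]):
    print(rid, ts)  # clustered timestamps suggest a cursor/checkpoint gap
```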
m
@Roy Peter could you run a test: disable the S3 staging and execute the same sync of the 20k records. Are the numbers equal?
This could help us isolate and understand the problem! Thanks
r
We do not see any data mismatch in low-frequency tables or in the initial load.
We triggered a sync with the changes below to validate our hypothesis (some in-flight data is getting missed at the end of each sync).
m
we do not see any data mismatch in low frequency tables and in the initial load
This is running without S3 staging?
We triggered a sync with below changes to validate our hypothesis (some inflight data is getting missed at the end of each sync)
Any chances your Salesforce had a different Timezone and Airbyte is not recognizing it?
r
Update on our findings: when we looked at the logs, we saw a skip in SystemModstamp.
Logs:
```
2022-03-10 06:21:27 source > Setting state of Opportunity stream to {'SystemModstamp': '2022-03-10T03:51:07.000+0000'}
2022-03-10 06:21:29 source > Setting state of Opportunity stream to {'SystemModstamp': '2022-03-10T06:21:18.000+0000'}
```
In Salesforce, a few records are present with a 2022-03-10T06:21:14.000Z timestamp, which falls inside the window the state jumped over.
@Sumit Mahamuni @Siddarth Ramaswamy @ACHINTA ROY
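The skip above is consistent with the in-flight-data hypothesis: if the connector checkpoints its cursor to the newest SystemModstamp it has emitted, a record that only becomes visible to the API after the query ran, but whose timestamp is below the new checkpoint, is never read by any sync. A simplified, self-contained illustration of that failure mode (not the actual connector code):

```python
# Simplified illustration of the suspected failure mode (not the actual
# Salesforce connector code): a cursor-based incremental sync that checkpoints
# the newest SystemModstamp it has seen can permanently skip a record whose
# timestamp is below the checkpoint but which only became visible after the
# sync's query ran.
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"

def sync(visible_rows, state):
    """Emit rows with cursor > state, then advance state to the max cursor seen."""
    batch = [r for r in visible_rows if r["SystemModstamp"] > state]
    new_state = max((r["SystemModstamp"] for r in batch), default=state)
    return batch, new_state

state = datetime.strptime("2022-03-10T03:51:07", FMT)

# Sync 1: the 06:21:14 record is still in flight, so only 06:21:18 is visible.
visible_1 = [{"Id": "A", "SystemModstamp": datetime.strptime("2022-03-10T06:21:18", FMT)}]
emitted, state = sync(visible_1, state)
print([r["Id"] for r in emitted], state)   # ['A'], state advances to 06:21:18

# Sync 2: the 06:21:14 record is now visible, but the checkpoint is already past it.
visible_2 = visible_1 + [{"Id": "B", "SystemModstamp": datetime.strptime("2022-03-10T06:21:14", FMT)}]
emitted, state = sync(visible_2, state)
print([r["Id"] for r in emitted])          # [] -> record B is never synced
```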
m
@Roy Peter it looks like there is a problem with S3 staging, please check the discussion here: https://airbytehq-team.slack.com/archives/C01MFR03D5W/p1646747130928729 The workaround is to disable the S3 staging.
r
@[DEPRECATED] Marcos Marx we tested it without S3 staging and data is still missing
Btw, the above-mentioned hack is working with a larger time overlap (3 hrs)
m
> Btw, the above-mentioned hack is working with a larger time overlap (3 hrs)

Editing the connector code?
r
yes
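For reference, the "time overlap" hack described above amounts to starting each incremental query a fixed lookback window before the saved cursor, so records that became visible late are re-read (duplicates are re-emitted and have to be deduplicated downstream). A hedged sketch of the idea, not the actual patch applied to the connector:

```python
# Hedged sketch of the lookback/overlap workaround, not the actual connector
# patch: start each incremental query a fixed window (3 hours, per the thread)
# before the saved cursor so late-visible records are re-read.
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(hours=3)  # the overlap reported as working in the thread

def build_soql(stream: str, cursor_field: str, saved_cursor: datetime) -> str:
    start = (saved_cursor - LOOKBACK).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (f"SELECT Id, {cursor_field} FROM {stream} "
            f"WHERE {cursor_field} >= {start} ORDER BY {cursor_field} ASC")

saved = datetime(2022, 3, 10, 6, 21, 18, tzinfo=timezone.utc)
print(build_soql("Opportunity", "SystemModstamp", saved))
```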