Joeri Smits
07/03/2025, 9:32 AM
docker run
I receive a success response using this config json
{
  "provider": {
    "bucket": "************",
    "region": "eu-west-1",
    "access_key_id": "******",
    "secret_access_key": "**********",
    "endpoint": ""
  },
  "path_pattern": "preprocessed/bolcom/product-feed_baby-v2/*.csv",
  "format": {
    "format_type": "csv",
    "delimiter": "|"
  },
  "schema": "s3",
  "dataset": "bolcom"
}
When I configure the same values in the Airbyte UI, however, I receive a 502 error. Is there anything I'm doing wrong here?
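For anyone comparing the two paths: a minimal sketch of creating the same source through the Airbyte API instead of the UI, to narrow down whether the 502 comes from the UI or from the server behind it. The URL, token, workspace id, and masked credential values below are all placeholders, and the endpoint path assumes a self-hosted deployment exposing the public API.

import requests

# All values here are placeholders: substitute your own deployment URL,
# API token, and workspace id.
AIRBYTE_URL = "http://localhost:8000/api/public/v1"
TOKEN = "<api-token>"
WORKSPACE_ID = "<workspace-id>"

# The same configuration that succeeded via docker run (values masked).
config = {
    "provider": {
        "bucket": "<bucket>",
        "region": "eu-west-1",
        "access_key_id": "<access-key-id>",
        "secret_access_key": "<secret-access-key>",
        "endpoint": "",
    },
    "path_pattern": "preprocessed/bolcom/product-feed_baby-v2/*.csv",
    "format": {"format_type": "csv", "delimiter": "|"},
    "schema": "s3",
    "dataset": "bolcom",
}

resp = requests.post(
    f"{AIRBYTE_URL}/sources",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "s3-bolcom",
        "workspaceId": WORKSPACE_ID,
        "configuration": {"sourceType": "s3", **config},
    },
)
# A 502 here as well would implicate the server (or a proxy in front of it)
# rather than the UI form.
print(resp.status_code, resp.text)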
Usman Pasha
07/03/2025, 9:38 AM

Kailash Bisht
07/03/2025, 11:21 AM

Sebastien vaudour
07/03/2025, 12:36 PM

Max
07/03/2025, 4:26 PM

Slackbot
07/03/2025, 7:48 PM

Vít Mrňávek
07/03/2025, 7:52 PM

Euan Blackledge
07/03/2025, 8:04 PM

Ed
07/03/2025, 9:44 PM

Blake
07/03/2025, 10:07 PM
• Checking that the total_items and page_count values returned by every API query are consistent and correct.
• Looking at the id values in the endpoint results, which should be largely contiguous, I notice a large gap after some initial results (a pagination check sketch follows this message).
Could you please let me know:
1. What mechanism does Airbyte use to determine that a stream has synced successfully? Could you point me to any Airbyte source code that is used to determine a sync is complete? The connector builder indicates the Python requests library is used (the request carries a "User-Agent": "python-requests/2.32.3" header) when I'm testing endpoints.
2. How can I increase the amount of info in the Airbyte logs so I can see what query URLs were used against the API? I can't find any URLs or Track PMS API response information in the logs, so it's hard to know exactly what was sent to the Track PMS API.
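A minimal sketch of the consistency check described in the bullets above, run directly against the API and independent of Airbyte. The endpoint URL, auth header, response shape (_embedded.units), and gap threshold are assumptions to adjust to the real Track PMS API.

import requests

BASE_URL = "https://example.trackhs.com/api/pms/units"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}

seen_ids = []
totals = set()
page = 1
while True:
    resp = requests.get(BASE_URL, headers=HEADERS, params={"page": page, "size": 100})
    resp.raise_for_status()
    body = resp.json()
    totals.add(body["total_items"])     # should be a single value across all pages
    items = body["_embedded"]["units"]  # assumed payload shape
    seen_ids.extend(item["id"] for item in items)
    if page >= body["page_count"]:
        break
    page += 1

print("distinct total_items values:", totals)  # more than one => inconsistent API
seen_ids.sort()
# Flag unusually large jumps in a field that should be roughly contiguous
# (the threshold of 1000 is arbitrary).
gaps = [(a, b) for a, b in zip(seen_ids, seen_ids[1:]) if b - a > 1000]
print("suspicious id gaps:", gaps)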
森亮介
07/04/2025, 12:40 AM
Could not infer schema as there are no rows in year=2025/month=01/data.csv. If having an empty CSV file is expected, ignore this. Else, please contact Airbyte.
Problem Description: The sync job fails because it encounters a CSV file with no rows (e.g., year=2025/month=01/data.csv), which prevents schema inference.
Key Context / What's Changed:
• Version Upgrade: This issue began occurring after upgrading Airbyte to version 1.7. Our previous Airbyte version handled empty CSV files in S3 without causing the sync to fail.
• Expected Behavior: We do expect some CSV files for certain periods to be empty (i.e., contain headers but no data rows) as there might be no data for those specific periods. In previous Airbyte versions, this did not lead to a job failure.
• File Verification: We've confirmed the specific file (year=2025/month=01/data.csv) is indeed empty (0 KB).
• Log Snippets: While the UI shows "Failed," the check phase logs still indicate "Check succeeded" and "Connector exited with exit code 0" for the initial check. The specific "Could not infer schema" error arises when the system attempts to process these empty CSVs during the actual sync.
Our Goal: We need to ensure that empty CSV files (which are an expected occurrence in our data source) do not cause the entire sync job to fail in Airbyte 1.7. We require guidance on how to configure Airbyte to handle such files gracefully, ideally by skipping them or creating an empty table/stream, without stopping the sync.
Could you please provide guidance on:
• What changes in Airbyte version 1.7 might have altered the handling of empty CSV files during schema inference or data processing?
• Are there any new or existing configuration options within the S3 source connector (or global settings) that can allow the sync to successfully proceed when encountering empty CSVs?
• What is the recommended approach for handling expected empty files in S3 sources with Airbyte 1.7 to prevent job failures?
We appreciate your prompt assistance in resolving this regression.
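As a stopgap while this is investigated, a minimal pre-flight sketch that lists the 0 KB CSV objects under a prefix so they can be excluded from the source's path pattern (or seeded with a header row). Bucket name and prefix are placeholders; boto3 credentials are assumed to come from the environment.

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

empty = []
for page in paginator.paginate(Bucket="<bucket>", Prefix="year=2025/"):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".csv") and obj["Size"] == 0:
            empty.append(obj["Key"])

# These are the files schema inference will choke on; exclude them from the
# source's path pattern or seed them with a header row.
print(empty)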
Maria Ana Ortiz Botero
07/04/2025, 2:04 AM

Leonardo Muñoz M.
07/04/2025, 5:21 AM

aditya kumar
07/04/2025, 8:27 AM

Lui Pillmann
07/04/2025, 8:46 AM
SYNC_JOB_MAX_TIMEOUT_DAYS in the .env file?

Affan Zafar
07/04/2025, 10:30 AM

Hari Haran R
07/04/2025, 10:36 AM

Don Berkes
07/04/2025, 11:14 AM

Sujeet Yadav
07/04/2025, 11:18 AM

Virginie Desharnais
07/04/2025, 12:29 PM

Konathala Chaitanya
07/04/2025, 12:58 PM
Configuration check failed
Could not connect to the Iceberg catalog with the provided configuration. User: arn:aws:iam::413928405733:user/POS-170-spx is not authorized to perform: glue:GetTable on resource: arn:aws:glue:us-east-1:413928405733:table/testing_airbyte_connection/temp_1751633231304 because no identity-based policy allows the glue:GetTable action (Service: Glue, Status Code: 400, Request ID: fa81dd6f-6c39-43be-beb4-70750c71b5aa), root cause: AccessDeniedException(User: arn:aws:iam::413928405733:user/POS-170-spx is not authorized to perform: glue:GetTable on resource: arn:aws:glue:us-east-1:413928405733:table/testing_airbyte_connection/temp_1751633231304 because no identity-based policy allows the glue:GetTable action (Service: Glue, Status Code: 400, Request ID: fa81dd6f-6c39-43be-beb4-70750c71b5aa))
Konathala Chaitanya
07/04/2025, 1:00 PM
Do I need to create any table in Glue manually?
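The error text points at a missing IAM permission rather than a missing table: the check appears to create and touch a temporary table itself (temp_1751633231304 in the message), so there is nothing to pre-create. A sketch of granting the permission with boto3, assuming IAM is managed that way; the user name and ARNs are taken from the error message, while the action list beyond glue:GetTable is an assumption based on what an Iceberg/Glue catalog typically needs.

import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "glue:GetTable",      # the action named in the error
            "glue:CreateTable",   # the remaining actions are assumptions
            "glue:DeleteTable",
            "glue:UpdateTable",
            "glue:GetDatabase",
        ],
        "Resource": [
            "arn:aws:glue:us-east-1:413928405733:catalog",
            "arn:aws:glue:us-east-1:413928405733:database/testing_airbyte_connection",
            "arn:aws:glue:us-east-1:413928405733:table/testing_airbyte_connection/*",
        ],
    }],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="POS-170-spx",
    PolicyName="airbyte-iceberg-glue",
    PolicyDocument=json.dumps(policy),
)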
Kothapalli Venkata Avinash
07/04/2025, 1:15 PM

Darko Macoritto
07/04/2025, 1:37 PMWarning from destination: com.google.cloud.bigquery.BigQueryException: Incompatible table partitioning specification. Expected partitioning specification interval(type:day,field:_airbyte_extracted_at) clustering(id,_airbyte_extracted_at), but input partitioning specification is interval(type:day,field:_airbyte_extracted_at) clustering(_airbyte_extracted_at)
Why? What can I do?
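A quick way to confirm the mismatch the error describes, and one possible remedy, using the google-cloud-bigquery client. Dataset and table names are placeholders; whether to update the clustering spec in place or drop the table and let a full refresh recreate it depends on your setup.

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my_dataset.my_table")  # placeholder names
print(table.time_partitioning)   # expect day partitioning on _airbyte_extracted_at
print(table.clustering_fields)   # e.g. ['_airbyte_extracted_at']

# One option: align the clustering spec with what the destination expects.
table.clustering_fields = ["id", "_airbyte_extracted_at"]
client.update_table(table, ["clustering_fields"])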
Sree Shanthan Kuthuru
07/04/2025, 2:15 PM
>> response.text
'<!DOCTYPE html>\n<html>\n<head>\n<title>Error</title>\n<style>\nhtml { color-scheme: light dark; }\nbody { width: 35em; margin: 0 auto;\nfont-family: Tahoma, Verdana, Arial, sans-serif; }\n</style>\n</head>\n<body>\n<h1>An error occurred.</h1>\n<p>Sorry, the page you are looking for is currently unavailable.<br/>\nPlease try again later.</p>\n<p>If you are the system administrator of this resource then you should check\nthe error log for details.</p>\n<p><em>Faithfully yours, nginx.</em></p>\n</body>\n</html>\n'
Leonardo Muñoz M.
07/04/2025, 3:06 PM

Hadrien Lepousé
07/04/2025, 3:44 PM

Chris
07/04/2025, 5:31 PM

Sree Shanthan Kuthuru
07/04/2025, 5:40 PM
>> res = requests.delete(cancel_job_url, headers=header)
>> print(res.json())
{'status': 409, 'type': 'https://reference.airbyte.com/reference/errors#409-state-conflict', 'title': 'state-conflict', 'detail': 'State conflict', 'documentationUrl': None, 'data': {'message': 'Job is not currently running'}}
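The 409 body says the job was not running when the DELETE arrived. A small guard that checks job status first, assuming the public API's GET /v1/jobs/{jobId}; the URL, token, and job id are placeholders.

import requests

AIRBYTE_URL = "https://api.airbyte.com/v1"  # placeholder; use your own deployment
header = {"Authorization": "Bearer <token>"}
job_id = 12345                              # placeholder

status = requests.get(f"{AIRBYTE_URL}/jobs/{job_id}", headers=header).json()
if status.get("status") == "running":
    res = requests.delete(f"{AIRBYTE_URL}/jobs/{job_id}", headers=header)
    print(res.json())
else:
    print(f"nothing to cancel; job status is {status.get('status')}")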
Hari Haran R
07/05/2025, 6:07 AM