Avinash Mishra
03/12/2024, 9:15 AM
Rad Extrem
03/13/2024, 8:58 AM
Amazon SQS. I am trying to achieve this by calling the read() function again and again.
• Amazon SQS has the option to delete messages on read, and it's a part of the configuration. It is clearly added in my configuration for source-amazon-sqs.
• But calling read() again and again somehow returns the last-read data, even though it is supposed to have been deleted. I modified PyAirbyte's source code to return the logs from the connector as well, and it says:
Completed `source-amazon-sqs` read operation at 14:19:35.
Started `source-amazon-sqs` read operation at 14:19:37...
[
'Amazon SQS Source Read - stream is: testing_streaming_source',
'Amazon SQS Source Read - Creating SQS connection ---',
'Amazon SQS Source Read - Connected to SQS Queue ---',
'Amazon SQS Source Read - Beginning message poll ---',
'Amazon SQS Source Read - No messages recieved during poll, time out reached ---'
]
• So, I want to understand how caching of records works here, and how I can make this better for my use case.
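(A minimal sketch of one workaround, assuming the connector name above and illustrative config values: by default, read() writes into a persistent local DuckDB cache, so a poll that returns no new messages still exposes the previously cached records. Reading into a fresh throwaway cache on each poll keeps polls isolated; in some versions, force_full_refresh=True on read() is a related lever.)

import airbyte as ab

source = ab.get_source(
    "source-amazon-sqs",
    config={
        "queue_url": "https://sqs.eu-west-1.amazonaws.com/123456789012/example",  # placeholder
        "delete_messages": True,  # illustrative key names; use your real SQS settings
    },
)
source.select_all_streams()

# Each poll writes to, and reads back from, its own empty cache, so records
# from earlier polls cannot reappear:
result = source.read(cache=ab.new_local_cache())
for record in result["testing_streaming_source"]:
    print(record)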
AJ Steers (Airbyte)
03/13/2024, 11:11 PM
Carsten Hohnke
03/15/2024, 2:01 PM
Erick
03/15/2024, 11:47 PM
AJ Steers (Airbyte)
03/19/2024, 3:03 PM
🐛 Fixes
• Resolve Windows compatibility issues (#100)
• Resolve issue where BigQuery caches fail to load on streams without a primary key, or when a table rename is required (#122)
AJ Steers (Airbyte)
03/22/2024, 4:00 AM
> • Add Source.config_spec property and Source.print_config_spec() method to get and/or print any connector's configuration spec (#80) - Thanks, @Tino Merl !
> • Allow incremental refresh in combination with the REPLACE write strategy (#136)
> • Always pre-validate source config, with clearer messaging of validation failures (#134)
Miguel Rodas
04/01/2024, 3:58 PM
Bindi Pankhudi (Airbyte)
04/01/2024, 6:07 PM
Sébastien Haentjens
04/02/2024, 8:03 AM
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/airbyte/sources/base.py", line 713, in read
cache.processor.process_airbyte_messages(
File "/usr/local/lib/python3.10/dist-packages/airbyte/_processors/base.py", line 174, in process_airbyte_messages
self.process_record_message(record_msg)
File "/usr/local/lib/python3.10/dist-packages/airbyte/_processors/sql/base.py", line 252, in process_record_message
self.file_writer.process_record_message(record_msg)
File "/usr/local/lib/python3.10/dist-packages/airbyte/_processors/file/base.py", line 167, in process_record_message
self._write_record_dict(
File "/usr/local/lib/python3.10/dist-packages/airbyte/_processors/file/jsonl.py", line 39, in _write_record_dict
open_file_writer.write(orjson.dumps(record_dict) + b"\n")
TypeError: Recursion limit reached
According to this issue, the weird TypeError is actually a RecursionError, but interestingly, bumping the recursion limit with sys.setrecursionlimit(1000000) didn't change anything. I would be very surprised if the Jira API returned records that are nested that deeply, so I would like to understand what is wrong here 🙏
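(A note on the error above: orjson enforces its own depth limit inside native Rust code, so sys.setrecursionlimit() cannot affect it. A minimal repro sketch follows; a circular reference inside a record is one common way to hit this even when the data itself is shallow.)

import sys
import orjson

sys.setrecursionlimit(1_000_000)  # no effect: the limit is internal to orjson

record = {}
record["self"] = record  # circular reference

try:
    orjson.dumps(record)
except TypeError as exc:
    print(exc)  # "Recursion limit reached"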
jeroen ophetweb
04/02/2024, 10:31 AM
jeroen ophetweb
04/02/2024, 12:10 PM
I use from airbyte.caches import PostgresCacheConfig, PostgresCache, and when I try to run the rest of the code with the correct settings I get an error saying:
ab.caches.PostgresCache.Config(
TypeError: BaseConfig() takes no arguments
The same happens when I try with __config__ instead of Config.
It is clear, I think, that I am very much a beginner with Python, so I might be missing something obvious here. But if somebody can point me in the right direction I would be very thankful.
Regards,
Jeroen
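(For reference, a sketch of the Postgres cache setup that matches recent PyAirbyte releases, where the cache class is configured directly with keyword arguments rather than via a separate PostgresCacheConfig / Config class. Host, credentials, and database name below are placeholders.)

import airbyte as ab
from airbyte.caches import PostgresCache

cache = PostgresCache(
    host="localhost",
    port=5432,
    username="postgres",
    password=ab.get_secret("POSTGRES_PASSWORD"),
    database="pyairbyte_demo",
)
result = source.read(cache=cache)  # assumes a `source` configured earlier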
AJ Steers (Airbyte)
04/04/2024, 12:09 AM
_This release adds new internal Airbyte columns for record ID (_airbyte_raw_id) and timestamp (_airbyte_extracted_at). This release also adds a feature to auto-add new columns if they are missing from cache tables._
> 🚀 New Features
> • Add new internal Airbyte columns, aligned with Airbyte Destinations "v2" (#144)
> • Auto-add missing columns to Cache tables (#144)
> • New StreamRecord class and public records module (#166)
> 📖 Documentation Updates
> • Several new topics added to reference docs (#144, #170):
> ◦ Schema Evolution
> ◦ Table and Field Name Normalization
> ◦ Airbyte-Managed Metadata Columns
jeroen ophetweb
04/05/2024, 2:02 PM
My image is based on FROM python:3.11-slim, and in requirements.txt I added both airbyte and airbyte-source-jira. Then I built the image locally and ran it. During the build I see both airbyte and airbyte-source-jira being installed, but when the Python script runs it installs airbyte-source-jira again, in a separate .venv environment, just as it did locally. The script works correctly and updates the JIRA issues in the database, but I do not want it to install airbyte-source-jira on every run. How can I install airbyte-source-jira in a way that it won't be reinstalled or installed next to the existing version? Thanks for any pointers.
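(One possible approach, assuming a PyAirbyte version whose get_source() accepts a local_executable argument: since airbyte-source-jira is installed via requirements.txt, its source-jira console script is already on the image's PATH, and pointing PyAirbyte at it should skip the per-run .venv install. Config values below are placeholders.)

import airbyte as ab

source = ab.get_source(
    "source-jira",
    local_executable="source-jira",  # name on PATH, or an absolute path to the script
    config={
        "domain": "example.atlassian.net",
        "email": "me@example.com",
        "api_token": "...",
    },
)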
jeroen ophetweb
04/05/2024, 2:43 PM
When I run pip install airbyte in a Python 3.12 environment, I get an error like the following. Just letting you know.
20.21 creating build/temp.linux-aarch64-cpython-312/src/snowflake/connector/nanoarrow_cpp/ArrowIterator/Util
20.21 creating build/temp.linux-aarch64-cpython-312/src/snowflake/connector/nanoarrow_cpp/Logging
20.21 gcc -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -Isrc/snowflake/connector/nanoarrow_cpp/ArrowIterator -Isrc/snowflake/connector/nanoarrow_cpp/ArrowIterator -Isrc/snowflake/connector/nanoarrow_cpp/Logging -I/usr/local/include/python3.12 -c src/snowflake/connector/nanoarrow_cpp/ArrowIterator/BinaryConverter.cpp -o build/temp.linux-aarch64-cpython-312/src/snowflake/connector/nanoarrow_cpp/ArrowIterator/BinaryConverter.o -std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0
20.21 error: command 'gcc' failed: No such file or directory
20.21 [end of output]
20.21
This is while using the Docker Hub python:3.12-slim tag. With python:3.11-slim it works correctly.
Damon Henry
04/11/2024, 1:30 PM
db_cache = DuckDBCache(db_path="geo.db")
source.read(cache=db_cache, streams="*", write_strategy=WriteStrategy.APPEND)
Traceback (most recent call last):
File "/Users/romanumero/project/scratchpad/google-ads-geo/genv/lib/python3.10/site-packages/airbyte/sources/base.py", line 740, in read
cache.processor.process_airbyte_messages(
File "/Users/romanumero/project/scratchpad/google-ads-geo/genv/lib/python3.10/site-packages/airbyte/_processors/base.py", line 200, in process_airbyte_messages
self.write_all_stream_data(
File "/Users/romanumero/project/scratchpad/google-ads-geo/genv/lib/python3.10/site-packages/airbyte/_processors/base.py", line 211, in write_all_stream_data
self.write_stream_data(stream_name, write_strategy=write_strategy)
File "/Users/romanumero/project/scratchpad/google-ads-geo/genv/lib/python3.10/site-packages/airbyte/_processors/sql/base.py", line 561, in write_stream_data
self._write_temp_table_to_final_table(
File "/Users/romanumero/project/scratchpad/google-ads-geo/genv/lib/python3.10/site-packages/airbyte/_processors/sql/base.py", line 727, in _write_temp_table_to_final_table
has_pks: bool = bool(self._get_primary_keys(stream_name))
File "/Users/romanumero/project/scratchpad/google-ads-geo/genv/lib/python3.10/site-packages/airbyte/_processors/sql/base.py", line 824, in _get_primary_keys
raise NotImplementedError(msg)
NotImplementedError: Nested primary keys are not yet supported. Found: {pk}
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/romanumero/project/scratchpad/google-ads-geo/googleads_run.py", line 20, in <module>
source.read(cache=db_cache, streams="*", write_strategy=WriteStrategy.APPEND)
File "/Users/romanumero/project/scratchpad/google-ads-geo/genv/lib/python3.10/site-packages/airbyte/sources/base.py", line 749, in read
raise exc.AirbyteConnectorFailedError(
airbyte.exceptions.AirbyteConnectorFailedError: AirbyteConnectorFailedError: Connector failed.
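(A diagnostic sketch for the NotImplementedError above, assuming source is the already-configured Google Ads source from the failing snippet: list the streams that declare nested, path-style primary keys, then select only the remaining streams before calling read().)

nested_pk_streams = [
    stream.name
    for stream in source.discovered_catalog.streams
    if stream.source_defined_primary_key
    and any(len(path) > 1 for path in stream.source_defined_primary_key)
]
print("Streams with nested primary keys:", nested_pk_streams)

# Read everything except the offending streams:
source.select_streams(
    [name for name in source.get_available_streams() if name not in nested_pk_streams]
)
source.read(cache=db_cache, write_strategy=WriteStrategy.APPEND)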
Sébastien Haentjens
04/11/2024, 4:01 PM
Failed `source-greenhouse` read operation at 17:46:15.
Traceback (most recent call last):
File "/Users/sebastienhaentjens/Documents/pyairbyte/.venv/lib/python3.10/site-packages/airbyte/sources/base.py", line 740, in read
cache.processor.process_airbyte_messages(
File "/Users/sebastienhaentjens/Documents/pyairbyte/.venv/lib/python3.10/site-packages/airbyte/_processors/base.py", line 200, in process_airbyte_messages
self.write_all_stream_data(
File "/Users/sebastienhaentjens/Documents/pyairbyte/.venv/lib/python3.10/site-packages/airbyte/_processors/base.py", line 211, in write_all_stream_data
self.write_stream_data(stream_name, write_strategy=write_strategy)
File "/Users/sebastienhaentjens/Documents/pyairbyte/.venv/lib/python3.10/site-packages/airbyte/_processors/sql/base.py", line 555, in write_stream_data
temp_table_name = self._write_files_to_new_table(
File "/Users/sebastienhaentjens/Documents/pyairbyte/.venv/lib/python3.10/site-packages/airbyte/_processors/sql/snowflake.py", line 75, in
_write_files_to_new_table
self._execute_sql(put_files_statements)
File "/Users/sebastienhaentjens/Documents/pyairbyte/.venv/lib/python3.10/site-packages/airbyte/_processors/sql/base.py", line 629, in _execute_sql
raise SQLRuntimeError(msg) from None # from ex
airbyte._processors.sql.base.SQLRuntimeError: Error when executing SQL:
PUT 'file:///Users/sebastienhaentjens/Documents/pyairbyte/.cache/job_posts_01HV6X9KE5NHAW5R4VPA7N4WMB.jsonl.gz' @%job_posts_01hv6xc3bgypcpsh709wk1d4mf;
PUT 'file:///Users/sebastienhaentjens/Documents/pyairbyte/.cache/job_posts_01HV6XC3BG0Q5NY66CYW4QWSBX.jsonl.gz' @%job_posts_01hv6xc3bgypcpsh709wk1d4mf;
ProgrammingError(snowflake.connector.errors.ProgrammingError) 100132 (P0000): JavaScript execution error: Uncaught Execution of multiple statements failed on
statement "PUT 'file:///Users/sebastienha..." (at line 1, position 0).
Stored procedure execution error: Unsupported statement type 'PUT_FILES'. in SYSTEM$MULTISTMT at ' throw `Execution of multiple statements failed on statement
{0} (at line {1}, position {2}).`.replace('{1}', LINES[i])' position 4
stackstrace:
SYSTEM$MULTISTMT line: 10
[SQL: PUT 'file:///Users/sebastienhaentjens/Documents/pyairbyte/.cache/job_posts_01HV6X9KE5NHAW5R4VPA7N4WMB.jsonl.gz' @%%job_posts_01hv6xc3bgypcpsh709wk1d4mf;
PUT 'file:///Users/sebastienhaentjens/Documents/pyairbyte/.cache/job_posts_01HV6XC3BG0Q5NY66CYW4QWSBX.jsonl.gz' @%%job_posts_01hv6xc3bgypcpsh709wk1d4mf;]
(Background on this error at: <https://sqlalche.me/e/14/f405>)
This seems related to the fact that Snowflake doesn't support running multiple PUT queries at once. I guess the fix would be fairly easy here, since we would only need to change those lines to run the queries sequentially. Does that sound reasonable to you? Should I open a PR for this?
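(For context, a sketch of that proposed change inside the Snowflake processor's _write_files_to_new_table, assuming put_files_statements is the list of individual PUT statements that is currently submitted as one multi-statement batch:)

# Run each PUT on its own round-trip instead of batching, since Snowflake's
# multi-statement execution path rejects PUT ("Unsupported statement type 'PUT_FILES'"):
for statement in put_files_statements:
    self._execute_sql(statement)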
jeroen ophetweb
04/12/2024, 9:23 AM
Traceback (most recent call last):
File "/app/./src/jira.py", line 1, in <module>
import airbyte as ab
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/airbyte/__init__.py", line 11, in <module>
from airbyte import caches, cloud, datasets, documents, exceptions, results, secrets, sources
File "/usr/local/lib/python3.11/site-packages/airbyte/caches/__init__.py", line 5, in <module>
from airbyte.caches import bigquery, duckdb, motherduck, postgres, snowflake, util
File "/usr/local/lib/python3.11/site-packages/airbyte/caches/motherduck.py", line 23, in <module>
from airbyte.secrets import SecretString
File "/usr/local/lib/python3.11/site-packages/airbyte/secrets/__init__.py", line 6, in <module>
from airbyte.secrets import (
File "/usr/local/lib/python3.11/site-packages/airbyte/secrets/google_gsm.py", line 49, in <module>
from google.cloud import secretmanager_v1 as secretmanager
ImportError: cannot import name 'secretmanager_v1' from 'google.cloud' (unknown location)
Even when I revert to my local Postgres install that worked before, it keeps failing with this message. This is all local on a Mac, and I fail to see why it needs Google secrets.
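(One thing worth checking, assuming the cause is a broken google-cloud namespace package rather than anything Postgres-related: PyAirbyte's secrets module imports google-cloud-secret-manager at import time, and a partial or conflicting install of that dependency produces exactly this "unknown location" ImportError. A sketch of the usual fix and a quick verification:)

# pip install --force-reinstall google-cloud-secret-manager
from google.cloud import secretmanager_v1  # should now import cleanly

print(secretmanager_v1.__name__)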
AJ Steers (Airbyte)
04/12/2024, 8:30 PM
AJ Steers (Airbyte)
04/18/2024, 4:00 PM
nyep
04/23/2024, 3:35 AM
In processing pipe GET_LEDGER_DETAIL_VIEW_DATA: extraction of resource GET_LEDGER_DETAIL_VIEW_DATA in generator ledger_report_resource caused an exception: HTTPSConnectionPool(host='connectors.airbyte.com', port=443): Max retries exceeded with url: /files/registries/v0/oss_registry.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7ffa9300a740>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
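(A sketch for machines that cannot reach connectors.airbyte.com, assuming a PyAirbyte version that honors the AIRBYTE_LOCAL_REGISTRY environment variable for loading the connector registry from a local file. Download oss_registry.json ahead of time from a machine with network access; the path and source name below are placeholders.)

import os

os.environ["AIRBYTE_LOCAL_REGISTRY"] = "/path/to/oss_registry.json"

import airbyte as ab  # import after setting the variable so registry lookups see it

source = ab.get_source("source-faker", config={"count": 100})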
PaoloF
04/23/2024, 1:14 PM
PaoloF
04/23/2024, 1:16 PM
Exception ignored in: <generator object VenvExecutor.execute at 0x0000024B7B506340>
Traceback (most recent call last):
File "E:\Python\projects\pyairbyte\.venv\Lib\site-packages\airbyte\_executor.py", line 397, in execute
with _stream_from_subprocess([str(connector_path), *args]) as stream:
File "E:\Python\_versions\3.11.4\Lib\contextlib.py", line 155, in __exit__
self.gen.throw(typ, value, traceback)
File "E:\Python\projects\pyairbyte\.venv\Lib\site-packages\airbyte\_executor.py", line 128, in _stream_from_subprocess
raise exc.AirbyteSubprocessFailedError(
airbyte.exceptions.AirbyteSubprocessFailedError: AirbyteSubprocessFailedError: Subprocess failed.
Run Args: ['E:\\Python\\projects\\pyairbyte\\.venv-source-google-analytics-data-api\\Scripts\\source-google-analytics-data-api.exe', 'discover', '--caolof\\AppData\\Local\\Temp\\tmp1g_idg59']
Exit Code: 1
Exception ignored in: <generator object VenvExecutor.execute at 0x0000024B7BB00E40>
Traceback (most recent call last):
File "E:\Python\projects\pyairbyte\.venv\Lib\site-packages\airbyte\_executor.py", line 397, in execute
with _stream_from_subprocess([str(connector_path), *args]) as stream:
File "E:\Python\_versions\3.11.4\Lib\contextlib.py", line 155, in __exit__
self.gen.throw(typ, value, traceback)
File "E:\Python\projects\pyairbyte\.venv\Lib\site-packages\airbyte\_executor.py", line 128, in _stream_from_subprocess
raise exc.AirbyteSubprocessFailedError(
airbyte.exceptions.AirbyteSubprocessFailedError: AirbyteSubprocessFailedError: Subprocess failed.
Run Args: ['E:\\Python\\projects\\pyairbyte\\.venv-source-google-analytics-data-api\\Scripts\\source-google-analytics-data-api.exe', 'spec']
Exit Code: 1
PaoloF
04/23/2024, 6:37 PM
AJ Steers (Airbyte)
04/23/2024, 7:26 PM
AJ Steers (Airbyte)
04/23/2024, 7:27 PM
PaoloF
04/23/2024, 8:02 PM
nyep
04/24/2024, 2:07 AM
fatima afa
04/25/2024, 3:29 PM
King Ho
04/26/2024, 9:16 AM
airbyte.exceptions.AirbyteConnectionSyncTimeoutError: AirbyteConnectionSyncTimeoutError: An timeout occurred while waiting for the remote Airbyte job to complete.