Mahesh
03/01/2022, 10:22 AM

Mahesh
03/01/2022, 11:37 AM

BERKIN
03/02/2022, 11:48 AM

Kemp Po
03/08/2022, 4:15 PM
s3a:// instead of gs://?
All default settings except compression codec = SNAPPY

konrad schlatte
03/10/2022, 12:08 PM
2022-03-10 08:35:13 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Done retrieving results from 'sent' endpoint
2022-03-10 08:35:13 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Updating state.
2022-03-10 08:35:13 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Fetching sent from 2022-03-09T12:00:00Z to 2022-03-09T12:30:00Z
2022-03-10 08:35:13 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Making RETRIEVE call to 'sent' endpoint with filters '{'Property': 'EventDate', 'SimpleOperator': 'between', 'Value': ['2022-03-09T12:00:00Z', '2022-03-09T12:30:00Z']}'.
2022-03-10 08:35:13 ERROR () DefaultAirbyteStreamFactory(internalLog):88 - Request failed with 'Error: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.'
2022-03-10 08:35:13 ERROR () DefaultAirbyteStreamFactory(internalLog):88 - Traceback (most recent call last):
2022-03-10 08:35:13 ERROR () DefaultAirbyteStreamFactory(internalLog):88 - File "/usr/local/lib/python3.7/site-packages/tap_exacttarget/__init__.py", line 135, in do_sync
I can resolve this by reducing the "pagination window" from 30 minutes to 5 minutes, for example; i.e. it appears that at the 30-minute interval there is too much data to process, hence the timeout. I am wondering whether there is another way to handle this error.
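In case it helps others hitting the same timeout, the workaround amounts to slicing the sync range into smaller windows before each RETRIEVE call. A minimal sketch of the idea in plain Python; window_chunks and chunk_minutes are illustrative names, not part of the tap_exacttarget connector:
```python
from datetime import datetime, timedelta

def window_chunks(start, end, chunk_minutes=5):
    """Yield (chunk_start, chunk_end) pairs covering [start, end) in small slices.

    Smaller slices mean each RETRIEVE call returns less data, which has the same
    effect as shrinking the connector's pagination window.
    """
    step = timedelta(minutes=chunk_minutes)
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end

# Break the failing 30-minute window into 5-minute slices:
for lo, hi in window_chunks(datetime(2022, 3, 9, 12, 0), datetime(2022, 3, 9, 12, 30)):
    print(lo.isoformat() + "Z", "->", hi.isoformat() + "Z")
```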
There is an outstanding PR for this connector as well: https://github.com/airbytehq/airbyte/pull/10026

Oluwapelumi Adeosun
03/11/2022, 6:52 AM
refresh source schema. Is this a bug, or how can I ensure all the tables from the source are loaded into the destination I specified?
The source is a PostgreSQL DB running on Amazon RDS.

Gary K
03/11/2022, 7:29 AM
number -> double precision conversion that appears to be happening with the postgres connector (0.3.15 in airbyte 0.35.42-alpha)? I've got a mysql source bigint column stored with full precision in the _airbyte_data json, but the normalization is converting it to a double and I'm losing precision 😱
(Note, I'd rather not have to do a custom normalisation (from raw) of all the connection streams manually; i.e. no heavy lifting on my part if possible 🏋️)
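A side note on the mechanics, since it explains the symptom: a double has a 53-bit mantissa, so any bigint above 2**53 can silently change when it passes through double precision. A quick illustration in plain Python, nothing Airbyte-specific:
```python
big_id = 9007199254740993        # 2**53 + 1, fits comfortably in a bigint
as_double = float(big_id)        # what a double precision column effectively stores
print(int(as_double))            # 9007199254740992 -> last digit changed
print(big_id == int(as_double))  # False: precision was lost
```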
Connor Francis
03/11/2022, 10:51 PM
moe, larry and curly. All three of these source schemas have the same table called stooges. My destination would only have a single schema called public, and I would like all three sources to dump into the same stooges table in this destination schema; however, I would like to add an additional text column in the destination table called source_schema, which would take on the value of moe, larry or curly.
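To make the target shape concrete, a post-load view that unions the three raw tables and stamps each row with its source schema would look roughly like the sketch below. Schema, table, and column names are taken from the message; how the raw tables land in the destination is an assumption.
```python
# Hypothetical post-load view: one public.stooges table with a source_schema column.
schemas = ["moe", "larry", "curly"]

selects = [
    f"SELECT '{s}' AS source_schema, t.* FROM {s}.stooges AS t"
    for s in schemas
]
view_sql = "CREATE OR REPLACE VIEW public.stooges AS\n" + "\nUNION ALL\n".join(selects) + ";"
print(view_sql)
```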
Adam Schmidt
03/14/2022, 6:53 AM
my-org%2fsome-subgroup as the group ID (which works!)
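For anyone puzzled by that value: it is just the group path with the slash percent-encoded, which you can reproduce with the standard library (the group path below is the placeholder from the message):
```python
from urllib.parse import quote

group_path = "my-org/some-subgroup"
print(quote(group_path, safe=""))  # my-org%2Fsome-subgroup (%2f and %2F are equivalent)
```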
03/14/2022, 10:32 AM-memory
parameter seems a good option but I am not sure if it works well within Airbyte deployments : https://docs.docker.com/config/containers/resource_constraints/#limit-a-containers-access-to-memory
• The Airbyte specific parameter JOB_MAIN_CONTAINER_MEMORY_LIMIT
works at a job level if I am not mistaken ; As I don’t know how many Airbyte jobs can be triggered at the same time, if I have 10 jobs consuming only 1GB of RAM at the same time it will cause the same issue, which is why I would prefer to set a global RAM threshold.
What do you think would be the best option ?Arash Layeghi
03/14/2022, 2:00 PM

Nitin Jain
03/14/2022, 3:17 PM

Robert Andrews
03/14/2022, 4:32 PM

Filipe Araújo
03/14/2022, 5:43 PM

Madhup Sukoon
03/14/2022, 6:04 PM
error validating data: unknown object type "nil" in Secret.data.postgresql-password
I'm trying to get it to run with an external AWS RDS PGSQL DB. I have defined the following params:
postgresql.enabled
externalDatabase.host
externalDatabase.user
externalDatabase.existingSecret
externalDatabase.existingSecretPasswordKey
externalDatabase.database
I have not defined externalDatabase.password (because I want it to take the password from the secret) or the port number (the default should be correct).
Any ideas where I might be going wrong?

William Graham
03/14/2022, 6:44 PM

Owen Kephart
03/14/2022, 8:24 PM
streamName field in the jobs/get response. Intuitively, I expected this name to be the same as the name field for the matching source in the syncCatalog of connections/get, but it seems that streamName actually includes the prefix, while name does not. So for example, if I had a connector with a prefix of foo, streamName would be foo_actions, while name would be just actions.
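A tiny sketch of the mapping this implies when joining the two API responses, assuming streamName is literally prefix + name (the prefix value below is illustrative):
```python
def catalog_name(stream_name: str, prefix: str) -> str:
    """Map a jobs/get streamName back to the connections/get catalog name
    by stripping the connection prefix, since streamName = prefix + name."""
    if prefix and stream_name.startswith(prefix):
        return stream_name[len(prefix):]
    return stream_name

print(catalog_name("foo_actions", "foo_"))  # actions
```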
Aditya Rane
03/15/2022, 1:17 AM

Octavia Squidington III
03/15/2022, 7:03 AM

Octavia Squidington III
03/15/2022, 8:07 AM

Kevin Soenandar
03/15/2022, 9:52 AM
companies table's ticket associations would have the following value:
[ {"company_id": <some_value>, "ticket_id": <some_value>}, {"company_id": <some_value>, "ticket_id": <some_value>} ]
My expectation is it should create a separate table once ingested into my Snowflake warehouse, with company_id and ticket_id as the fields, per this documentation. However, this is not the case. Any idea what I'm missing here?
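For reference, the child table being described is just one row per element of that associations array, something shaped like this (values are placeholders, nothing here is Airbyte's actual normalization code):
```python
# One parent record with an array of associations...
record = {
    "ticket_associations": [
        {"company_id": 101, "ticket_id": 9001},
        {"company_id": 101, "ticket_id": 9002},
    ],
}

# ...is expected to normalize into one row per element,
# with company_id and ticket_id as the columns.
rows = [(a["company_id"], a["ticket_id"]) for a in record["ticket_associations"]]
print(rows)  # [(101, 9001), (101, 9002)]
```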
Keshav Agarwal
03/15/2022, 10:30 AM

Octavia Squidington III
03/15/2022, 11:12 AM

Brian Soares
03/15/2022, 12:18 PM

Nitin Jain
03/15/2022, 12:22 PM
With the INSERT replica strategy, data is being synced but the pipeline is very slow. Looking at the docs, we changed the replica strategy to COPY by giving the S3 credentials in the Redshift destination. With the COPY replica strategy, CSV files are being written to S3, but only some partial data is being inserted into our Redshift DB. In the example below, you can see the pipeline read 39,100 records; I verified 4 different CSVs were written to S3, one having 16252 records, another having somewhere around 22k records, another with 2k records. But the number of records written to the Redshift DB is around 16301. I have seen that if multiple files are written to S3, only one of the files (randomly chosen) is being synced to the DB. I'm using full refresh | append mode for the pipeline. Attaching an image for better understanding.
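One way to confirm the gap is to count the rows staged in the S3 CSVs and compare against the destination table. A rough diagnostic sketch (not Airbyte code) using boto3 and psycopg2; the bucket, prefix, table, and connection details are placeholders, and it assumes a small number of staged files and counts header rows if present:
```python
import csv, io
import boto3
import psycopg2

s3 = boto3.client("s3")
bucket, prefix = "my-staging-bucket", "airbyte-staging/my_stream/"

# Count rows across all staged CSV objects under the prefix.
staged_rows = 0
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read().decode("utf-8")
    staged_rows += sum(1 for _ in csv.reader(io.StringIO(body)))

# Count rows that actually landed in Redshift.
with psycopg2.connect(host="redshift-host", port=5439, dbname="db",
                      user="user", password="***") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM my_schema.my_stream")
        loaded_rows = cur.fetchone()[0]

print(f"staged in S3: {staged_rows}, loaded in Redshift: {loaded_rows}")
```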
Jayesh Patil
03/15/2022, 1:07 PM

Maxime Sabran
03/15/2022, 1:31 PM

Michael Horvath
03/15/2022, 1:39 PM

Drew Fustin
03/15/2022, 1:57 PM

Saman Arefi
03/15/2022, 1:58 PM
t2.large instance and describe, in detail, how Airbyte is mainly memory and disk bound.
I've been testing stuff out now on a t3.xlarge and noticed the following:
Loading one large-ish Oracle table (~9GB, 7M rows) takes me about 30min, which I think is pretty good. Now, loading two at the same time via the same connector (9GB, 7M rows and 13GB, 7M rows) takes an hour in total, with each taking roughly the full hour.
What gives?
Looking at htop, I seem to be running more into a CPU limit as well, so I'm not sure what's causing this. These are my two largest tables, but in production I'd use Airbyte for another 30 or so tables, each between 10k and 1M rows, so this doesn't seem to scale well. Or am I doing something wrong?