# feedback-and-requests
  • a

    Andik Achmad

    02/19/2022, 2:54 AM
    Hi Airbyte Team, hope you are doing well as always. I am currently using Airbyte to sync from one Postgres to another Postgres using FULL REFRESH | OVERWRITE mode. It seems that Airbyte creates a lot of temporary tables for a single table sync (as shown in the pic) and they are filling up my disk space. Please let me know if there is a setting I missed, or how to overcome this. Thank you in advance.
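
    A minimal illustrative sketch (not from the thread) of how one could list suspected leftover Airbyte staging tables and their sizes in the destination Postgres, assuming they follow an `_airbyte_tmp` naming convention; verify the prefix in your own database before dropping anything:
    ```python
    # Illustrative only: list suspected leftover Airbyte staging tables and their sizes.
    # The "_airbyte_tmp" prefix is an assumption; check your own schema before dropping anything.
    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="destination_db",
                            user="airbyte", password="...")  # hypothetical credentials
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT schemaname, tablename,
                   pg_size_pretty(pg_total_relation_size(quote_ident(schemaname) || '.' || quote_ident(tablename)))
            FROM pg_tables
            WHERE tablename LIKE '_airbyte_tmp%'
            ORDER BY schemaname, tablename
        """)
        for schema, table, size in cur.fetchall():
            print(f"{schema}.{table}: {size}")
    conn.close()
    ```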
  • g

    George Xing

    02/19/2022, 6:28 AM
    Hi Airbyte team, are you using Temporal for dbt transformations (as opposed to Dagster, Airflow, etc.)? We were looking into Temporal as well for some of our data pipeline use cases and were wondering about your experience.
  • m

    Muhammad Al Ghifari

    02/19/2022, 6:43 PM
    Hi Airbyte team, I'm trying to connect to Facebook Marketing, but an error occurs. It says: FacebookAPIException('Error: 2635, (#2635) You are calling a deprecated version of the Ads API. Please update to the latest version: v13.0.') What should I do? Thank you.
  • p

    Paul-Matthieu Riolacci

    02/22/2022, 1:51 PM
    Hi Airbyte team, hope you guys are doing well 🙂 We've been using ClickHouse as a destination for our connections for a few weeks now. We wanted to add a custom transformation to the connection, but the option doesn't seem to be available with ClickHouse. Did I miss something, or is the feature not implemented yet? I'm trying to decide whether we should implement our own solution to transform data from Airbyte instead.
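
    If you do end up rolling your own transformation while custom dbt isn't available for the ClickHouse destination, a rough sketch (not Airbyte functionality) is to run SQL against the raw tables after each sync, assuming the usual `_airbyte_raw_<stream>` layout with an `_airbyte_data` JSON column; the table, column, and host names below are hypothetical:
    ```python
    # Illustrative post-sync transform run outside Airbyte (e.g. from cron or an orchestrator).
    # Table and column names are assumptions based on Airbyte's usual raw-table layout.
    from clickhouse_driver import Client

    client = Client(host="clickhouse.internal", user="default", password="...")  # hypothetical

    client.execute("DROP TABLE IF EXISTS default.orders_flat")
    client.execute("""
        CREATE TABLE default.orders_flat
        ENGINE = MergeTree ORDER BY order_id AS
        SELECT
            JSONExtractString(_airbyte_data, 'id')     AS order_id,
            JSONExtractFloat(_airbyte_data, 'amount')  AS amount,
            _airbyte_emitted_at                        AS emitted_at
        FROM default._airbyte_raw_orders
    """)
    ```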
  • m

    matt_innerspaceio

    02/22/2022, 2:57 PM
    Echoing the requests for `timescaledb` support, given their logical replication issue. Strategically, Timescale has a managed service which presumably has trouble onboarding larger databases (mine included), because the onboarding strategy requires a pg_dump/restore or code changes on the customer side. I'm sure there is a way to figure it out with them. For context, I'm running a few 3 TB Postgres/Timescale DBs that I'd love to move over to Timescale's managed service, but there is no way to do it without significant downtime.
  • j

    James Jefferies

    02/22/2022, 3:24 PM
    We’ve been looking at using Airbyte, but we’ve hit a problem using Postgres in a Heroku Private Space as a destination. If the Postgres connector could run with SSL and provide the option to use client-side certificates, it would facilitate this use case. It would be great if this feature could be added!
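
    To make the request concrete, this is the kind of client-certificate SSL connection the destination connector would need to support, sketched here with plain psycopg2 (the host and file paths are hypothetical):
    ```python
    # Illustration of a Postgres connection using SSL with client-side certificates,
    # i.e. the parameters the destination connector would need to expose.
    import psycopg2

    conn = psycopg2.connect(
        host="my-heroku-private-db.example.com",   # hypothetical host
        dbname="analytics",
        user="airbyte",
        password="...",
        sslmode="verify-full",          # verify the server certificate and hostname
        sslrootcert="/certs/ca.crt",    # CA that signed the server cert
        sslcert="/certs/client.crt",    # client certificate presented to the server
        sslkey="/certs/client.key",     # private key for the client certificate
    )
    conn.close()
    ```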
  • a

    Abrar ul farhan Mohammed

    02/22/2022, 11:44 PM
    Hi Airbyte Team, just wanted to confirm: does the Airbyte team have an SSDLC?
  • c

    Clovis Masson

    02/23/2022, 8:28 AM
    Hi everyone 👋! We currently have multiple connections that have been running for a few months. From time to time, we need to update them to add one or more new tables to the integration, and to do so, we export the Airbyte config files and make updates to some `.yaml` files before re-uploading them. The problem is that these YAML files are getting bigger and bigger (especially `JOBS.yaml` and `ATTEMPTS.yaml`), and it's getting complicated to browse them to make my changes. What could be a solution to trim these files safely without causing side effects? For instance, I would delete all inactive connections from the files, as well as old jobs, but I don't want to break or lose my existing data syncs. Is there any existing tool/solution for cleaning/trimming these config files?
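
    One possible direction (untested, and the key names below are guesses rather than the real export schema) is to trim the exported YAML with a small script instead of editing it by hand, keeping a backup of the original archive:
    ```python
    # Rough, untested sketch: drop old entries from an exported Airbyte config file.
    # The record layout and the "created_at" key are hypothetical; inspect your own
    # JOBS.yaml / ATTEMPTS.yaml first and keep a backup before re-importing anything.
    import yaml
    from datetime import datetime, timedelta

    cutoff = datetime.utcnow() - timedelta(days=30)

    with open("JOBS.yaml") as f:
        jobs = yaml.safe_load(f) or []

    def is_recent(job):
        created = job.get("created_at")          # hypothetical field name
        if created is None:
            return True                          # keep anything we cannot date
        return datetime.fromisoformat(str(created)) >= cutoff   # assumes naive ISO timestamps

    trimmed = [job for job in jobs if is_recent(job)]

    with open("JOBS.trimmed.yaml", "w") as f:
        yaml.safe_dump(trimmed, f)

    print(f"kept {len(trimmed)} of {len(jobs)} job records")
    ```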
  • a

    Aakash Kumar

    02/24/2022, 12:23 PM
    Hi Community, for the Google Ads connector, Performance Max campaign rows are not coming through in the data.
  • a

    Andy Yeo (Airbyte)

    02/24/2022, 1:22 PM
    Re-posting here in case we have some contributors eager to share feedback with us! https://airbytehq-team.slack.com/archives/C019WEENQRM/p1645708923462759
  • t

    Tony Hu

    02/24/2022, 10:27 PM
    Hey Airbyte team, about the Kafka-as-source connector, I am curious whether there is any roadmap or timeline for making it work in a more streaming fashion instead of in batches? I am experiencing some insufficient-memory issues when testing it, and I noticed the code uses an infinite loop to persist all the data in memory and then dump it to the destination (https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/s[…]main/java/io/airbyte/integrations/source/kafka/KafkaSource.java). Another question is about the version: {my Airbyte}/settings/source shows the latest as `0.1.4`, but the changelog at https://docs.airbyte.com/integrations/sources/kafka only goes up to 0.1.3. Is that expected? Thanks
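
    Not the connector's actual code, but a small illustration of the pattern being asked for: polling bounded batches and flushing them downstream as you go instead of accumulating the whole topic in memory (topic and broker names are placeholders):
    ```python
    # Illustrative bounded-batch consumption with kafka-python: records are flushed
    # downstream per poll() instead of being accumulated in memory for the whole topic.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "my_topic",                              # placeholder topic
        bootstrap_servers="broker:9092",         # placeholder broker
        auto_offset_reset="earliest",
        enable_auto_commit=False,
    )

    def flush(records):
        # stand-in for "emit records to the destination"
        for record in records:
            print(record.value)

    while True:
        batch = consumer.poll(timeout_ms=5000, max_records=1000)
        if not batch:
            break                                # nothing new within the timeout
        for _, records in batch.items():
            flush(records)
        consumer.commit()                        # only advance offsets once the batch is flushed
    ```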
  • h

    Huib

    02/25/2022, 10:54 AM
    Hey team! Since I’m not a Java developer I won’t submit a PR for this myself, but I noticed something which I think really makes your lives a lot harder than they should be. For a few destinations, we write simple files. All of these destinations currently implement the same code for all the file formats that are supported (see, for instance, https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/d[…]tination/azure_blob_storage/writer/ProductionWriterFactory.java). The downside of this approach is that even though we have (for instance) Parquet output for S3, we don’t have it for Azure Blob Storage, even though all the code to support it is already written. IMO it would save a lot of effort if there were a single “remote file destination” that handles all the type conversions, the schema, etc., and then lets S3, Blob, FTP, … deal with the interface to the storage itself.
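
    A sketch, in Python for brevity rather than Airbyte's actual Java, of the split being proposed: format writers written once and shared, with each destination only implementing the storage interface:
    ```python
    # Illustrative only: separating "how records are serialized" from "where the file goes",
    # so Parquet/CSV/JSONL support is written once and reused by S3, Azure Blob, FTP, ...
    import json
    from typing import Iterable, Protocol


    class FormatWriter(Protocol):
        def write(self, records: Iterable[dict]) -> bytes:
            """Serialize records into one output file (CSV, JSONL, Parquet, ...)."""


    class RemoteStorage(Protocol):
        def upload(self, path: str, data: bytes) -> None:
            """Put a finished file at `path` on the remote storage (S3, Azure Blob, FTP, ...)."""


    class JsonlWriter:
        def write(self, records: Iterable[dict]) -> bytes:
            return "\n".join(json.dumps(r) for r in records).encode("utf-8")


    class RemoteFileDestination:
        """One generic destination: any FormatWriter combined with any RemoteStorage."""

        def __init__(self, writer: FormatWriter, storage: RemoteStorage):
            self.writer = writer
            self.storage = storage

        def sync(self, stream: str, records: Iterable[dict]) -> None:
            self.storage.upload(f"{stream}/part-0.jsonl", self.writer.write(records))
    ```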
  • i

    Imane

    02/25/2022, 1:51 PM
    Hey Airbyte team, about the `google_ads` connector, I found that the `labels` returned in the `campaigns` and `ad_group_ads` streams do not have the right data. Do you know why? I compared them with the values shown in the UI. I created this issue.
  • v

    VMaldonado

    02/25/2022, 10:21 PM
    Hello everyone, I am currently using the Airbyte connector to ingest data into an S3 bucket. I am wondering if it is possible to automatically change the output path at every sync by date, for example to create a new folder with the execution date at every sync, because currently the data for every execution is loaded into the same folder. Thanks in advance.
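
    Until the connector can do this itself, one workaround sketch (bucket and prefix names are hypothetical) is a small post-sync job that copies each sync's output under a date-stamped prefix with boto3:
    ```python
    # Illustrative post-sync step: copy the latest sync output under a date-stamped prefix.
    # Bucket and prefix names are hypothetical; adjust to your S3 destination settings.
    from datetime import date
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-airbyte-bucket"                          # hypothetical bucket
    source_prefix = "airbyte/my_stream/"                  # where the connector writes
    target_prefix = f"airbyte_by_date/{date.today().isoformat()}/my_stream/"

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=source_prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            s3.copy_object(
                Bucket=bucket,
                CopySource={"Bucket": bucket, "Key": key},
                Key=target_prefix + key[len(source_prefix):],
            )
    ```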
  • l

    Lucas Wiley

    02/25/2022, 11:02 PM
    Hi, I don't know if this is the case for all versions, but the docs/tooltip seem to be incorrect about the correct destination connection parameters for Snowflake. I am on _0.35.36-alpha_ on an AWS Linux machine. I created a pull request referencing @ethan's suggestion here.
  • l

    Lucas Wiley

    02/25/2022, 11:03 PM
    Referring to this, which only needs to be account.snowflakecomputing.com
  • v

    Vikram Bhamidipati

    02/26/2022, 2:35 AM
    Hello, I am trying to set up Airbyte on EKS and would like the logging to go to S3. The CloudStorageConfig for S3 has a couple of issues that are making this hard for us. It expects an Access Key and Secret Access Key to be configured, which are required per this, and it uses `AwsBasicCredentials`. This is not allowed per our security policy; we need to use IAM roles or, in the case of EKS, IRSA. Issue # 5282 addresses this in some way. I am wondering if this can be enhanced to use the DefaultCredentialsProvider. That would allow the logging to be more generic and work everywhere, irrespective of how the AWS credentials are configured. Q: can I submit changes as a PR to this core functionality, or does this need to be handled by the Airbyte team?
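
    For reference, the behaviour being requested is the same "default chain" pattern boto3 uses: when no static keys are supplied, credentials are resolved from the environment, shared config, an instance profile, or an IRSA role, which is what DefaultCredentialsProvider does in the Java SDK. A trivial illustration (the bucket name is a placeholder):
    ```python
    # No access key / secret passed: boto3 falls back to its default credential chain
    # (env vars, shared config, EC2 instance profile, or an EKS IRSA web-identity role),
    # which is the behaviour DefaultCredentialsProvider gives in the Java SDK.
    import boto3

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="my-airbyte-logs",          # placeholder bucket
        Key="logs/example.txt",
        Body=b"hello from the default credential chain",
    )
    ```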
  • s

    Saurabh Mathur

    02/28/2022, 6:56 AM
    Hi there, I just started exploring Airbyte and realized that Airbyte depends on a cursor field for incremental updates. My source is MongoDB and I was wondering why it can't use the oplog to determine the delta instead (or at least optionally). Do let me know if this is something that might be pursued over time.
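
    To illustrate the idea of reading deltas from the oplog rather than from a cursor column, a bare-bones pymongo sketch tailing `local.oplog.rs` from a saved timestamp (requires a replica set; the connection string and resume point are placeholders):
    ```python
    # Minimal illustration of tailing MongoDB's oplog for change deltas.
    # Requires a replica set; connection string and resume point are placeholders.
    from pymongo import CursorType, MongoClient
    from bson.timestamp import Timestamp

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    oplog = client.local["oplog.rs"]

    last_seen = Timestamp(0, 0)  # in practice: the last "ts" persisted from the previous run

    cursor = oplog.find(
        {"ts": {"$gt": last_seen}},
        cursor_type=CursorType.TAILABLE_AWAIT,
        oplog_replay=True,
    )
    for entry in cursor:
        # entry["op"] is the operation (i/u/d), entry["ns"] the namespace, entry["ts"] the position
        print(entry["ts"], entry["op"], entry["ns"])
    ```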
  • a

    Anatole Callies

    02/28/2022, 9:21 AM
    Hi, would it be possible to keep the changelogs of connectors up to date? Currently the BigQuery and GCS changelogs don't show the latest version: https://docs.airbyte.com/integrations/destinations/gcs#changelog (latest version should be 0.1.24 instead of 0.1.22) https://docs.airbyte.com/integrations/destinations/bigquery#bigquery (latest version should be 0.6.10 instead of 0.6.8)
  • w

    William Phillips

    02/28/2022, 3:17 PM
    When will Airbyte be HIPAA certified?
  • v

    Vinny Tunnell

    03/01/2022, 1:11 AM
    Hey guys, is there any way to output multiple files with an AWS S3 destination? I notice the docs say, "Currently, each data sync will only create one file per stream. In the future, the output file can be partitioned by size. Each partition is identifiable by the partition ID, which is always 0 for now." My use case is that I am trying to sync a massive Azure Table Storage table (over 1 billion rows) to S3, and a single file will not be efficient to work with once in S3. I'd also like to see the output files come into S3 as the sync runs, so I can make sure the data is coming through correctly. Right now I am just seeing the following in the logs:
    ```
    ...
    2022-03-01 01:07:45 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):300 - Records read: 14455000
    2022-03-01 01:07:46 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):300 - Records read: 14456000
    2022-03-01 01:07:47 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):300 - Records read: 14457000
    2022-03-01 01:07:47 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):300 - Records read: 14458000
    ...
    ```
    If multiple output files are not possible, is there any way I can at least see the staged data that has been processed so far? My desired output format is Parquet with SNAPPY compression.
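
    On the "see the data as it arrives" part: a low-tech option is to poll the destination bucket while the sync runs and watch object counts and sizes under the stream prefix, though whether partial output appears before the sync finishes depends on how the connector stages its uploads. Bucket and prefix below are placeholders:
    ```python
    # Illustrative: poll the S3 destination while a sync runs to see what has landed so far.
    # Bucket and prefix are placeholders; stop with Ctrl+C when the sync completes.
    import time
    import boto3

    s3 = boto3.client("s3")
    bucket, prefix = "my-airbyte-bucket", "airbyte/azure_table_stream/"

    while True:
        total_bytes, count = 0, 0
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                count += 1
                total_bytes += obj["Size"]
        print(f"{count} objects, {total_bytes / 1e9:.2f} GB under {prefix}")
        time.sleep(60)
    ```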
  • a

    Anatole Callies

    03/01/2022, 1:31 PM
    Hi, has anyone set up cron jobs to trigger Airbyte syncs? If so, should I do it on the VM where Airbyte is deployed, or is there any benefit to doing it from outside via the API? If the latter, I guess I need to set up an SSH tunnel to access it?
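
    If you go the API route, the trigger is a single POST to the OSS API's connections/sync endpoint, which a cron job can make from any host that can reach the Airbyte server (the host, port, and connection ID below are placeholders; add an SSH tunnel or auth in front as needed):
    ```python
    # Minimal sketch of triggering a sync via the Airbyte OSS API, e.g. from a cron job.
    # Host/port and connection ID are placeholders.
    import requests

    AIRBYTE_URL = "http://localhost:8000/api/v1"
    CONNECTION_ID = "00000000-0000-0000-0000-000000000000"

    resp = requests.post(
        f"{AIRBYTE_URL}/connections/sync",
        json={"connectionId": CONNECTION_ID},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json().get("job", {}))
    ```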
  • r

    Remi Salmon

    03/01/2022, 4:57 PM
    Hello! The changelog for the Snowflake destination is missing entries for 0.4.15 and 0.4.16: https://github.com/airbytehq/airbyte/blob/master/docs/integrations/destinations/snowflake.md#changelog - any idea where we can find them?
  • p

    Paul Cothenet

    03/02/2022, 4:27 AM
    Is there a plan to allow a timestamp to be set manually for a column?
    • Use case: I have a table that's too big and somehow fails the Postgres to BigQuery replication (a problem I would like to fix)
    • The best way I've found to work around this is to avoid replicating older records. I do this by directly updating the state table in the Airbyte backend
    • Every time I re-sync the full schema, I have to reset that timestamp manually
    • I would love to be able to set it as part of the `streams` definition
  • s

    Sohit Kumar

    03/02/2022, 6:59 AM
    Hi Team, we are exploring the hosted Airbyte solution, and I am trying to do an incremental sync from BigQuery to Snowflake. I understand that we need to set a `cursor` column for this. Is there any workaround if I have tables in BigQuery which do not have a column that is incremented every time we update or create a record?
  • o

    Oliver Franz

    03/02/2022, 7:39 AM
    Hi there! Is there any timeline associated with the logical replication functionality of the Oracle DB connector?
  • k

    Keren Guz

    03/02/2022, 10:45 AM
    Is there anyone from the commercial team who can help with adding credits to our account ASAP?
  • j

    Joaquin Oroño Bugnon

    03/02/2022, 1:00 PM
    Hello guys, I'm new to Airbyte, and congratulations on this product! I want to integrate several sources, but there is one without a source connector (Salesforce Marketing Cloud). I was reading that there is a way to integrate Singer taps in Airbyte; can anyone share some info? Thanks a lot!
  • h

    hendrik

    03/02/2022, 9:38 PM
    Hey all, I have a question that could be a suggestion: does Airbyte have any sort of program to connect Airbyte users with developers? Many companies seem to have some sort of agency partner program to help connect users who need custom work done with agencies who are familiar with the product. Personally, I’m on the side of being a user looking for dev resources for connector development. I posted about my current request here. I see that there’s a top contributors list on the website, but it’s not clear if these are people who would be willing/interested in taking on some freelance work.
  • j

    Jin Gong

    03/02/2022, 10:07 PM
    Hi team, I am using Jitsu to integrate with Airbyte to connect to Zoom. Based on the Zoom documentation, to get `registrant_id` in the `report_meeting_participants` response you need to pass the field name in the `include_fields` query param. I don't think this is supported by Airbyte. Could anyone help with this request? I am happy to contribute but am not sure where to get started 🙏
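
    On where to get started: in Python CDK connectors, per-request query parameters usually come from the stream's `request_params` method, so the change would look roughly like the sketch below (class, stream, and path names are hypothetical; the real Zoom connector's structure may differ):
    ```python
    # Rough sketch of adding a query parameter in an Airbyte CDK (Python) HTTP stream.
    # Class and stream names are hypothetical; the real Zoom connector may be organized differently.
    from typing import Any, Mapping, MutableMapping, Optional

    from airbyte_cdk.sources.streams.http import HttpStream


    class ReportMeetingParticipants(HttpStream):
        url_base = "https://api.zoom.us/v2/"
        primary_key = "id"

        def path(self, **kwargs) -> str:
            # in a real stream the meeting id would come from stream slices
            return "report/meetings/{meeting_id}/participants"

        def next_page_token(self, response) -> Optional[Mapping[str, Any]]:
            return None  # pagination omitted in this sketch

        def request_params(
            self,
            stream_state: Mapping[str, Any],
            stream_slice: Optional[Mapping[str, Any]] = None,
            next_page_token: Optional[Mapping[str, Any]] = None,
        ) -> MutableMapping[str, Any]:
            params = super().request_params(stream_state, stream_slice, next_page_token)
            params["include_fields"] = "registrant_id"  # ask Zoom to include registrant_id
            return params

        def parse_response(self, response, **kwargs):
            yield from response.json().get("participants", [])
    ```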