# feedback-and-requests

    Boggdan Barrientos

    12/20/2021, 9:56 PM
Hello, I would like your recommendation and advice. We move information to our customers with ELT tools such as Airbyte. The information we move is wide-ranging, and in many cases it changes over time, gets corrected, or arrives on no set schedule, because it is sales/purchase data. We have several flows created in Airbyte, and a problem we want to avoid is this: a sync runs, some information is missing, and we need to get the missing information without reprocessing everything. Our flows are configured as incremental, but our cursors are dates like 2021-12-20 (or date-times like 2021-12-20 00:00:00), and there is no order to them: the load date at the source has no order and does not always reflect the amount of new data. For example, we synchronized the data for 2021-12-20, but one location's data was missing; when they upload it later, the cursor value stays the same. How do we fetch only that data if the change is not reflected in the cursor? What options do we have on our side? I thought about modifying the cursor directly in the database and re-extracting, but I want to hear your comments on this. Thanks.
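A minimal sketch of the "modify the cursor directly" idea, for anyone curious what it could look like. It assumes a docker-compose deployment where per-connection state lives as a jsonb blob in Airbyte's internal Postgres database; the table layout, state shape, and connection ID below are assumptions, so inspect your own instance first and pause the connection's syncs before touching anything:

```python
# Hypothetical sketch: rewind an incremental cursor by editing the state
# Airbyte stores for a connection. Table/column names and the state shape
# are assumptions; verify them in your own database before running this.
import json
import psycopg2

CONNECTION_ID = "11111111-2222-3333-4444-555555555555"  # placeholder
REWIND_TO = "2021-12-20"  # re-extract everything from this date onward

conn = psycopg2.connect(host="localhost", dbname="airbyte",
                        user="docker", password="docker")
with conn, conn.cursor() as cur:
    cur.execute("SELECT state FROM state WHERE connection_id = %s",
                (CONNECTION_ID,))
    state = cur.fetchone()[0]  # psycopg2 decodes jsonb to a dict
    # Assumed shape: {"streams": [{"stream_name": ..., "cursor": "2021-12-21"}]}
    for stream in state.get("streams", []):
        if stream.get("cursor", "") > REWIND_TO:
            stream["cursor"] = REWIND_TO  # ISO dates compare as strings
    cur.execute("UPDATE state SET state = %s WHERE connection_id = %s",
                (json.dumps(state), CONNECTION_ID))
```

The trade-off: rewinding the cursor re-extracts the whole window (e.g. all of 2021-12-20), and with append-style incremental syncs that can duplicate rows downstream, so plan a dedup step after the re-extract.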

    Matt Freeman

    12/21/2021, 6:47 AM
I've just been briefly testing Hevo Data's cloud and found some horrors from a privacy/security standpoint, so I've canned that experiment. For example, while you set up a pipeline's source and destination, they send the information as you type to Smartlook and Intercom (although the Intercom AJAX might be timer-based and perhaps not tied to each form input/blur). The implications are concerning: if someone is configuring an endpoint that is publicly accessible with just a set of credentials (not our case, but still), you then have to worry about those credentials leaking elsewhere. I see the dreaded Intercom on the Airbyte homepage and just want to make sure there is none of this silliness here, i.e. that there are currently no third-party watchers or RUM crap listening for each form event on setup pages or elsewhere in the Airbyte admin UX/UI (the homepage I can forgive), or that it can be opted out of if there is. More importantly (since I can verify the first part myself), I want to know that marketing doesn't have GTM access to add their own third parties at a later date. I guess as I write this, I'm thinking maybe the open-source, self-hosted version is better suited to us.

    Roy Peter

    12/21/2021, 11:58 AM
Hi team, we are facing this error with the Salesforce integration: https://github.com/airbytehq/airbyte/issues/6021

    Sudhendu Pandey

    12/21/2021, 5:04 PM
Hi there! Just getting started with Airbyte and loving it already. I work as a consultant and will be recommending Airbyte to one of my clients soon. I was wondering what flexibility, if any, there is to start with the open-source core version and then move to Cloud down the line. Is that even feasible, or does it mean redoing everything rather than a 'lift and shift' from core to cloud? Thanks 🙂 👋
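On the portability question: OSS instances of this era could export their full configuration (sources, destinations, connections) as an archive, which is the piece you would carry forward in any migration. A hedged sketch; verify the endpoint against your own instance's API reference before relying on it:

```python
# Hedged sketch: download an OSS Airbyte configuration archive. The
# endpoint path reflects OSS versions of this era; confirm it against
# your instance's API docs (host/port are docker-compose defaults).
import requests

resp = requests.post("http://localhost:8000/api/v1/deployment/export")
resp.raise_for_status()
with open("airbyte_config_export.tar.gz", "wb") as f:
    f.write(resp.content)
```

Whether Cloud can ingest such an archive is a question for the Airbyte team; worst case, the archive documents every connector config you would need to recreate.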

    Nick Miller

    12/22/2021, 8:19 PM
Hello, all 👋! Loving Airbyte so far. I'm currently prototyping some of our implementations, specifically planning to use the S3 destination with Incremental Sync to hold the cloned data pending some custom additional processing and analytics. Data size is a concern: ideally I'd be able to cap output at X MB or Y records per file, which is currently not supported, though the docs do call it out:
> Currently, each data sync will only create one file per stream. In the future, the output file can be partitioned by size. Each partition is identifiable by the partition ID, which is always 0 for now.
It would be amazing if anyone could point me to updates, roadmaps, the possibility of contributing, or a way to advocate for this feature. We'll be using `Incremental - Append` sync, so any concerns could be mitigated by skipping or manually doing the initial sync and then manually setting the cursor field to start at the current time. I'm curious whether that's possible/appropriate at all given my file-size concern?
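Until size-based partitioning lands, one workaround is to repartition the per-stream file yourself after each sync. A sketch under assumptions: line-delimited JSON output, and placeholder bucket/key names:

```python
# Workaround sketch (not an Airbyte feature): split the single file the
# S3 destination writes per stream into parts of at most MAX_RECORDS
# lines. Bucket, key, and the JSONL format are assumptions.
import boto3

BUCKET = "my-airbyte-landing"         # placeholder
KEY = "stripe/charges/part-0.jsonl"   # placeholder single output file
MAX_RECORDS = 100_000

s3 = boto3.client("s3")
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"]

part, buf = 0, []
for line in body.iter_lines():
    buf.append(line)
    if len(buf) >= MAX_RECORDS:
        s3.put_object(Bucket=BUCKET, Key=f"{KEY}.part{part}",
                      Body=b"\n".join(buf))
        part, buf = part + 1, []
if buf:  # flush the remainder
    s3.put_object(Bucket=BUCKET, Key=f"{KEY}.part{part}",
                  Body=b"\n".join(buf))
```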

    developersteve

    12/22/2021, 11:57 PM
Just wondering if there are plans for a security channel. I did some scans yesterday and found some config issues, particularly with some of the container images.

    Abhijat

    12/23/2021, 11:32 AM
    Hello Team, we are just getting started with Airbyte. My developers are finding it difficult to get tokens for FB Ads and Google Ads. Any help on this would be appreciated.
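For the Google Ads side, the usual route is a one-time OAuth consent flow that yields a refresh token. A hedged sketch using Google's own helper library; the client-secrets file is a placeholder you create in the Google Cloud Console first:

```python
# Hedged sketch: mint a Google Ads refresh token via the standard OAuth
# installed-app flow. "client_secret.json" is a placeholder for OAuth
# client credentials created in the Google Cloud Console.
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/adwords"]

flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)
creds = flow.run_local_server(port=0)  # opens a browser for user consent
print("refresh_token:", creds.refresh_token)
```

Facebook Ads is similar in spirit (an app plus an access token you exchange for a long-lived one), though the console steps differ.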

    Miguel Diaz

    12/23/2021, 8:48 PM
    Hi, would it be possible to update the helm chart version when there are changes?

    Sudhendu Pandey

    12/24/2021, 1:54 PM
Back again 🙂! Busy last few weeks of the year for me, evaluating some of the EL/T tools around. Airbyte is definitely my top favorite now 😍, but it does lack a few success criteria important for my use case. Before I discuss this with my clients, I would like the opinion of the experts here on the points below (source: MySQL, target: Snowflake):
1. Configure a WHERE clause to filter and bring in only required data: Nay!!
2. Exclude certain columns from the source table (for PII compliance): Nay!!
3. Get delta information without any created/updated date columns
4. Incremental CDC for tables without a primary key
5. Support for Full Refresh / Incremental Sync: 👋 Ye!!
6. Tables and views both replicated: 👋 Ye!!
7. Airbyte Cloud availability timeline: Nay!!
🤶🎅🎄 Merry Christmas 🤶🎅🎄

    Arnold Cheong

    12/27/2021, 2:29 AM
    Hello team! Referring to the issue on ignoring records too big for Redshift here, what is the recommended solution to address these ignored records? We’d still want them to end up in Redshift eventually, so wondering whether anybody has ideas on how to address this?

    Titas Skrebė

    12/28/2021, 3:09 PM
Hello, it would be great if the operator guide had a section on rollbacks: when they are possible, limitations, good practices, etc.

    Tyler Buth

    12/28/2021, 6:58 PM
Can we get `Accounts` on Stripe added: https://stripe.com/docs/api/accounts

    Paul Cothenet

    12/29/2021, 9:53 PM
Going through the docs for Postgres and CDC (https://docs.airbyte.io/integrations/sources/postgres#setting-up-cdc-on-aws-postgres-rds-or-aurora): a lot of the links to headers lower in the document are broken (e.g. https://docs.airbyte.io/integrations/sources/postgres#setting-up-cdc-on-aws-postgres-rds-or-aurora)

    Zach Rait

    12/30/2021, 6:20 AM
Hi, is there any update on schema change support? I don't actually even care if it doesn't work properly with basic normalization, but I'm evaluating Airbyte for my company, and having to do a full resync on any schema change makes this a non-starter. I've seen a bunch of GitHub issues and discussions in this Slack about this happening in Q4, but it's difficult for me to see what the exact current state/roadmap is.

    Paul Cothenet

    12/30/2021, 7:30 PM
    I was able to get set-up pretty easily on EC2 and running my first replication tasks. 🎉 I thought the docs were generally helpful. I'm pointing out a few things that I think could be improved. Most might have been mentioned before so feel free to ignore!

    Paul Cothenet

    01/01/2022, 6:27 PM
UX requests for connection setup:
• Connection: Postgres to BigQuery
• Context: I am replicating about 50 tables, with some of them failing for unknown reasons. Therefore, I am adding tables one by one.
• Ask: when refreshing the schema in the UI:
◦ Do not select all new tables by default
◦ Remember which tables were deliberately not selected
◦ Do not force resets if I'm only adding new tables
• Current workaround: I'm using the API to add my tables one by one, with a combination of `/v1/sources/discover_schema` and `/v1/connections/update` (see the sketch below)
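For reference, a hedged sketch of that workaround; IDs are placeholders, and the payload shapes should be double-checked against your instance's `/api/v1` reference, since some versions require extra fields on `connections/update`:

```python
# Hedged sketch: fetch the source catalog, mark only the wanted streams
# as selected, and push the catalog back to the connection. IDs are
# placeholders; verify payload shapes against your instance's API docs.
import requests

API = "http://localhost:8000/api/v1"
SOURCE_ID = "<source-uuid>"           # placeholder
CONNECTION_ID = "<connection-uuid>"   # placeholder
WANTED = {"users", "orders"}          # enable tables one small batch at a time

catalog = requests.post(f"{API}/sources/discover_schema",
                        json={"sourceId": SOURCE_ID}).json()["catalog"]
for entry in catalog["streams"]:
    entry["config"]["selected"] = entry["stream"]["name"] in WANTED

requests.post(f"{API}/connections/update",
              json={"connectionId": CONNECTION_ID,
                    "syncCatalog": catalog}).raise_for_status()
```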

    Jens

    01/04/2022, 8:45 AM
Update to the feedback below: our frustration was homemade. Our data sporadically contained NaN values, which made Postgres fail silently. We propose increasing logging verbosity somehow, to reveal such issues. Original feedback: Hi there, we evaluated Airbyte for our company, developed a custom connector for an initial test, and are now stuck: records are fetched from the source, but only a fraction of them are ingested into Postgres. We are having a hard time debugging what is going on. We were hoping that Airbyte could replace our quite stable and performant Python ETL scripts for the better, but from what I have experienced so far, Airbyte will need to exit Alpha state before we can use it in production. My question: there are so many users here... is Airbyte running stably for those of you with larger setups? I have a feeling that it will be an awesome tool soon, but not at the moment yet.
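For anyone hitting the same silent failure: non-finite floats (NaN, Infinity) are not valid JSON, and a destination may quietly drop rows containing them. A small illustrative guard you could add inside a custom connector before emitting records:

```python
# Illustrative sketch: recursively replace non-finite floats with None
# before emitting records, since NaN/Infinity are not valid JSON.
import math

def scrub(value):
    """Return value with any NaN/Infinity replaced by None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value

print(scrub({"price": float("nan"), "qty": 3}))  # {'price': None, 'qty': 3}
```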

    Deepak Shah

    01/04/2022, 11:13 AM
Does anyone have an estimate of how much data is read when egressing data out of Google BigQuery (BQ)? We want to move data out of BQ to self-hosted ClickHouse (CH). Assuming my BQ data is 1 TB, will Airbyte read 1 TB or marginally more than that? (Assuming I am running an incremental sync.)
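One way to get a concrete number yourself: BigQuery's dry-run mode reports how many bytes a query would process without actually running it, which approximates what a full read of the table costs. A sketch with a placeholder table name:

```python
# Hedged sketch: estimate the bytes BigQuery would scan for a full-table
# read using a dry run (nothing executes, nothing is billed). The table
# name is a placeholder; an incremental sync would add a cursor filter.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query("SELECT * FROM `my_project.my_dataset.my_table`",
                   job_config=job_config)
print(f"Would process {job.total_bytes_processed / 1e12:.3f} TB")
```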

    Jeff Crooks

    01/04/2022, 3:08 PM
    Is there any way to scan more than 10k documents in the Mongo connector?

    Jason Ofua

    01/04/2022, 5:01 PM
Hi all, I deployed Airbyte to GCP and tried using the API instead of the GUI to create a source and destination. Following the API docs was great, but when it came to creating the source and destination, the docs didn't help. For instance, I wanted to use BigQuery as my destination, but there were no docs on the required fields when using the API. I had to run the request knowing it would fail, because the error response shows me the required fields.
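One way around the missing docs, sketched under the assumption that the config API of this era exposes connector specs (the JSON schema that lists required fields); IDs are placeholders:

```python
# Hedged sketch: fetch a destination connector's connection specification
# from the config API instead of guessing required fields. IDs are
# placeholders; list definitions via /destination_definitions/list first.
import requests

API = "http://localhost:8000/api/v1"
payload = {
    "destinationDefinitionId": "<bigquery-definition-uuid>",  # placeholder
    "workspaceId": "<workspace-uuid>",                        # placeholder
}
spec = requests.post(f"{API}/destination_definition_specifications/get",
                     json=payload).json()["connectionSpecification"]
print("required fields:", spec.get("required", []))
```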

    Ayyoub Maulana Hadidy

    01/05/2022, 6:09 AM
Hi, Airbyte is a great tool for ingesting from various sources. Hopefully we can integrate Twitter Marketing into a target database using Airbyte in the next patch.

    Anatole Callies

    01/05/2022, 10:27 AM
Hi, how safe is the upgrade process? Will it automatically revert if it fails? I have Airbyte v0.32-5 deployed with docker-compose on AWS and want to follow these steps to upgrade it: https://docs.airbyte.com/operator-guides/upgrading-airbyte#upgrading-on-docker. But I want to make sure I won't lose my data if it fails. Is it possible to do a dry-run upgrade, or to duplicate my instance to test the upgrade? Thanks
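There is no built-in dry run that I know of, but on docker-compose the pragmatic insurance is to snapshot Airbyte's named volumes before upgrading, so a failed upgrade can be rolled back by restoring them. A sketch; the volume names are assumed docker-compose defaults of this era, so confirm yours with `docker volume ls` and stop Airbyte before copying:

```python
# Hedged sketch: tar up Airbyte's named Docker volumes so an upgrade can
# be rolled back. Volume names are assumed docker-compose defaults;
# confirm with `docker volume ls`, and stop Airbyte before running.
import os
import subprocess

VOLUMES = ["airbyte_db", "airbyte_data", "airbyte_workspace"]  # assumed

for vol in VOLUMES:
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{vol}:/source:ro",
         "-v", f"{os.getcwd()}:/backup",
         "alpine", "tar", "czf", f"/backup/{vol}.tgz", "-C", "/source", "."],
        check=True,
    )
```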

    Jens

    01/05/2022, 5:21 PM
We are starting to really like Airbyte. At the moment we already have 35 connections configured, and there are more to come. So it would be a great feature to equip all list controls in the UI with some filter functionality, so that users can find items quickly.

    Jonah K (Schickler)

    01/05/2022, 5:47 PM
Hi all, I tried to use the new Google Firestore destination today, but I couldn't find it in the list of destinations in the UI. Do I need to add it myself? Many thanks 🙂

    Am

    01/05/2022, 5:51 PM
I'm new here. I work for Avery Dennison IT, and we're considering Airbyte and Airflow for a POC implementation. I'm looking for information or sample code to set up Databricks as a destination, as well as any sample code to set up an MS SQL Server to Parquet extraction pipeline. If anyone can point me to the right place, I'd greatly appreciate it. Thank you

    Brandon Barclay

    01/05/2022, 10:04 PM
I signed up to give feedback, but it didn't save to my calendar and it's not in my inbox. Dynamic emails.

    Sean McKeever

    01/06/2022, 1:12 AM
Hey, anyone have any tips on how to make the data transfer process faster? I just set up my first connection between Stripe and BigQuery, and even hosted on GCP Compute Engine it seems to take about 10 minutes per 10k rows. I have 500k rows in Stripe, so the whole process will take about 8 hours. I'm not sure if this is just to be expected or if there are ways to speed it up. For reference, I am using an e2-standard-4 machine, and CPU utilization has been hovering around 2.5% while the job runs. Thanks!

    Emeka Boris Ama

    01/06/2022, 9:09 AM
Where can I report a bug or error?

    Ethan Veres

    01/06/2022, 5:38 PM
Trying to connect to a large Redshift source with many schemas. Is there a way to limit schema discovery to a single Redshift schema?

    Keshav Agarwal

    01/07/2022, 3:14 PM
Pagination when setting up the connection: having too many tables slows down the website a lot; I think any more would have crashed my browser. A bug: after searching for a table on the same Set up connection page, pressing backspace does not load all the tables again.