# feedback-and-requests

    Marco Fontana

    07/21/2021, 1:08 PM
    Hi all, I have some feedback regarding the connection screen. When you browse the source schema to select the tables you want to replicate, the list sometimes disappears after removing a filter; e.g. I search for tables containing "ML", select that table, and then when I remove the filter the list disappears and I have to wait again for the list to be fetched from the source. This happens because the user has access to some public Oracle schemas (APEX, OLAPSYS) that have a lot of tables. It would be nice to have the option to select the schema, or something similar.

    Alexandru Manolache

    07/21/2021, 1:33 PM
    Hi all! I just installed Airbyte on AWS and I am now extracting data from an on-prem Postgres database over a VPN connection, working smoothly! I am now looking for a way to extract a table from Postgres and also include certain filter options (a WHERE statement), as I am required not to pull all the data from the tables. Is it possible to include a WHERE filter before capturing the data?
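One workaround (not a built-in Airbyte option, as far as I know) is to create a filtered view on the source database and replicate the view instead of the base table; the Postgres source should list views alongside tables during schema discovery. A minimal sketch, assuming psycopg2 and a hypothetical orders table with a 90-day filter:

```python
# Sketch: expose a filtered view on the source Postgres database so Airbyte
# replicates only the rows matching the WHERE clause.
# The connection string, table name, and filter are placeholders.
import psycopg2

conn = psycopg2.connect("host=source-db dbname=app user=airbyte password=***")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE OR REPLACE VIEW airbyte_orders_filtered AS
        SELECT *
        FROM orders
        WHERE created_at >= NOW() - INTERVAL '90 days'  -- the "where" filter
    """)
conn.close()
```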

    Nils de Bruin

    07/21/2021, 3:47 PM
    I see the MongoDB source was updated, which excites me greatly; however, the changelog shows the PR and comments from my Stripe fix 🙂 Any details on what changed in 0.3.3?

    Jeff Crooks

    07/21/2021, 9:12 PM
    Hello guys. We implemented CDC replication from Postgres to Snowflake and it works fine. Do you also have a mode in which the log events are replayed transparently, so that the destination table mirrors the source table? We are aware we can do this further downstream with Snowflake streams and tasks, but maybe you can already do it transparently with your internal dbt?

    Cristian Scutaru

    07/21/2021, 11:29 PM
    Another question please: Is the SSH Tunnel Connection for Postgres source connector implemented today? It says "Coming Soon" in the online doc, but it's good to ask anyway.

    Nils de Bruin

    07/22/2021, 12:05 PM
    I had a chat with @John (Airbyte) and promised him I would share the following two items. It would be nice to be able to start a pipeline with a specific, selected Docker container. In my case I need to set up a VPN connection before extracting the data. I am currently able to do this because the Docker containers make use of the underlying VPN connection of the EC2 instance on which I am running Airbyte with docker-compose up, but I think a better model would be to build a Docker container containing the VPN connection and then trigger a pipeline job with that specific container. At a client of mine we are now running Dagster, and we used the Dagster deployment-to-ECS approach, which worked quite well. The interesting thing is that the ECS deployment makes use of a docker-compose file and starts the different services in different containers, connecting them as needed. I can imagine that this would be a nice approach for you guys as well; see the example deployment: https://github.com/dagster-io/dagster/tree/master/examples/deploy_ecs

    Nils de Bruin

    07/22/2021, 2:44 PM
    @[DEPRECATED] Marcos Marx - Looks like the Stripe schema has been updated to add more fields and tables... Of note, the "updated" column has been added, but I can't select it as the cursor field. Is there any way around this? It's coming across as a string instead of an int.

    Jeff Crooks

    07/22/2021, 4:43 PM
    Is there a plan to change the connection sync frequency to allow specific times? Generally, more flexibility around when these 5 min -> 24 hour schedules are kicked off (e.g. at 4 AM) would help.
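Until something like that exists, one workaround is to set the connection's frequency to manual and trigger syncs from an external scheduler (cron, Airflow, etc.) at the exact time you want. A rough sketch against the Airbyte API; the base URL and connection ID are placeholders to adapt to your deployment:

```python
# Sketch: trigger a manual sync at a precise time (e.g. from a 4 AM cron job)
# instead of relying on the built-in 5 min - 24 h frequency options.
# AIRBYTE_URL and CONNECTION_ID are placeholders.
import requests

AIRBYTE_URL = "http://localhost:8000/api/v1"
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical

resp = requests.post(f"{AIRBYTE_URL}/connections/sync",
                     json={"connectionId": CONNECTION_ID})
resp.raise_for_status()
print("started job", resp.json()["job"]["id"])
```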

    Adhithya Ravichandran

    07/22/2021, 8:31 PM
    I have a few transformation tasks that I'm hoping Airbyte can handle and I'm looking for advice on how to approach things. Our current stack is MySQL, dbt, and Snowflake. (I'm trying to extract WordPress multisite data into a sane format.)
    1. I have 1000s of tables with the same schema, named like wp_10_reg, wp_11_reg, wp_12_reg, that I want to combine into one table 'reg'.
    2. I have some data stored in PHP serialized format; I can use a Python script to unserialize it. Would I leverage dbt for this? How?
    3. I want to join table data together - I see Airbyte has basic transform tools, but then leans on dbt for the harder stuff.
    How would I accomplish all this with Airbyte and dbt? I'm guessing I'll need to hire a consultant to help do this or give advice on how to do this. Any suggestions on how to find one?
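For point 1, one common approach is to load all the shards with Airbyte and then generate a single UNION ALL model rather than writing it by hand. A sketch that writes such a dbt model file; the shard range, schema name, and output path are assumptions (in practice you would list the actual table names from the warehouse):

```python
# Sketch: generate a dbt model that UNION ALLs every wp_<N>_reg shard that
# Airbyte loaded, so dbt can materialize them as a single "reg" table.
# The shard range, "raw" schema name, and output path are placeholders.
shards = [f"wp_{i}_reg" for i in range(10, 2500)]  # hypothetical shard IDs

selects = [
    f"select *, '{name}' as _source_table from raw.{name}"
    for name in shards
]

with open("models/reg.sql", "w") as f:  # assumes a dbt project with models/
    f.write("\nunion all\n".join(selects))
```

For point 2, dbt itself won't run Python, so the unserialization step usually ends up either in a small script that runs between the load and the dbt run, or in a warehouse UDF, if the PHP-serialized payload can be parsed in SQL at all.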

    Deryk Wenaus

    07/22/2021, 9:24 PM
    Does anyone have the super secret decoder ring for figuring out a Salesforce client_id? I'm trying to connect to a sandbox with nate.atkins@company-domain--partibox.my.salesforce.com and I'm getting:
    Response from Salesforce: {"error":"invalid_client_id","error_description":"client identifier invalid"}

    Nathan Atkins

    07/23/2021, 2:23 PM
    Hello, One of my use cases is the following: we're aggregating data for different customers, and a lot of them deliver data daily by files. For now we'll use File sources and update the link manually every time we know we have new data, but I was wondering if that's maybe something that could be improved once Webhooks are implemented, namely: posting to a hook that a new file is available (with data about how to deal with it) on file upload? Or is it a different kind of webhook? Thanks!
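In the meantime, a small webhook receiver in front of the Airbyte API can get close to this: when a customer uploads a file, it points the File source at the new location and kicks off the connection. A sketch only; the endpoint paths, IDs, and the File source's config keys are assumptions to verify against your Airbyte version:

```python
# Sketch: a tiny webhook that, on "new file available", updates the File
# source's URL and triggers the connection, replacing the manual link update.
# AIRBYTE, SOURCE_ID, CONNECTION_ID, and the "url" config key are placeholders.
import requests
from flask import Flask, request

app = Flask(__name__)
AIRBYTE = "http://localhost:8000/api/v1"
SOURCE_ID = "..."       # hypothetical File source id
CONNECTION_ID = "..."   # hypothetical connection id

@app.route("/new-file", methods=["POST"])
def new_file():
    body = request.get_json()  # e.g. {"url": "https://customer.example/daily.csv"}
    src = requests.post(f"{AIRBYTE}/sources/get",
                        json={"sourceId": SOURCE_ID}).json()
    cfg = src["connectionConfiguration"]
    cfg["url"] = body["url"]   # point the File source at the new file
    requests.post(f"{AIRBYTE}/sources/update",
                  json={"sourceId": SOURCE_ID, "name": src["name"],
                        "connectionConfiguration": cfg})
    requests.post(f"{AIRBYTE}/connections/sync",
                  json={"connectionId": CONNECTION_ID})
    return {"status": "sync triggered"}
```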

    Porter Westling

    07/23/2021, 7:36 PM
    This link is broken for me (link was on the source page for the mysql source) https://docs.airbyte.io/integrations/source/mysql

    Boopathy Raja

    07/25/2021, 6:23 AM
    I want to understand something about the normalization and transformation in the MySQL -> BigQuery binlog model. From my exploration, I found that Basic Normalization recreates the tables from the raw tables on every pipeline run, which may incur more analytical cost on BigQuery. Is there any optimization we can do there?

    Boopathy Raja

    07/25/2021, 9:30 PM
    I have Airbyte running my dbt transforms out of a git repository. Very cool! I'm running into a problem: my get_custom_schema.sql Jinja template isn't being run. It works as expected when I run dbt locally. Any thoughts would be helpful.

    Nathan Atkins

    07/27/2021, 7:19 AM
    Hello all, it would be great to get example tables (or just an example JSON) with 100 records or so (maybe the latest day's data) before setting up a historical sync for a connector. It would speed up data modelling since we wouldn't need to wait for a full sync. My current use case is that I want to compare PayPal's connector data with the manual export, to make sure that all the transactions are captured on our analytics DB side. Usually we take one day and check all the records, but it is not trivial to get only one day's data quickly. Cheers, Rytis

    Rytis Zolubas

    07/27/2021, 9:00 AM
    Hi all, I'm having a problem: I want to edit the notification content. How should I handle it? Please help me!

    Lucky Boy

    07/27/2021, 9:56 AM
    👋 Hi everyone. I have a very specific use case for hosting Airbyte. We currently use the Singer taps + targets framework and execute them with custom code via ECS + Fargate (AWS).
    1. I was wondering whether Airbyte could be customised to use the same setup, or whether it will always use a single-node architecture if we are not using Kubernetes.
    2. How well does Airbyte play with AWS EKS (Kubernetes)?
    3. How will Airbyte play with always-running tasks (also known as services in ECS)?
    Looking forward to hearing about your experiences. Open to reading any blog posts as well.

    Sumit (Postman)

    07/27/2021, 5:26 PM
    Hi everyone! I'm trying to evaluate whether Airbyte will work for some of our workflows and was hoping to get clarity on a few things.
    • Can we pass parameters from Apache Airflow to an Airbyte job? For example, if my workflow involves picking up "new" files from GCS, I can have Airflow figure out which files are new, but I'm not seeing a way to pass that filename to the Airbyte job.
    ◦ We also might have situations where the "new" file name/location is delivered by a Pub/Sub message, which we can pick up with Airflow, but we then need to pass the file name/location to Airbyte for processing.
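There isn't a parameter slot on the sync call itself, so the pattern sketched below (with assumed endpoint paths, config keys, and IDs; the operator comes from the apache-airflow-providers-airbyte package) is to have one Airflow task rewrite the source's configuration with the new file path and a second task trigger the sync:

```python
# Sketch of an Airflow DAG: a PythonOperator patches the file path in the
# Airbyte source config, then AirbyteTriggerSyncOperator runs the connection.
# AIRBYTE, SOURCE_ID, CONNECTION_ID, the "url" key, and the GCS path are placeholders.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

AIRBYTE = "http://airbyte-server:8000/api/v1"
SOURCE_ID = "..."        # hypothetical file/GCS source id
CONNECTION_ID = "..."    # hypothetical connection id

def point_source_at_new_file(**_):
    new_path = "gs://my-bucket/incoming/latest.csv"  # e.g. resolved from Pub/Sub
    src = requests.post(f"{AIRBYTE}/sources/get",
                        json={"sourceId": SOURCE_ID}).json()
    cfg = src["connectionConfiguration"]
    cfg["url"] = new_path
    requests.post(f"{AIRBYTE}/sources/update",
                  json={"sourceId": SOURCE_ID, "name": src["name"],
                        "connectionConfiguration": cfg})

with DAG("gcs_to_airbyte", start_date=datetime(2021, 7, 1),
         schedule_interval=None, catchup=False) as dag:
    update_source = PythonOperator(task_id="update_source",
                                   python_callable=point_source_at_new_file)
    trigger_sync = AirbyteTriggerSyncOperator(task_id="trigger_sync",
                                              airbyte_conn_id="airbyte_default",
                                              connection_id=CONNECTION_ID)
    update_source >> trigger_sync
```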

    Nicolas Grenié

    07/28/2021, 8:53 AM
    Hi, I was trying to sync data from my Postgres to a MySQL DB but was not able to complete it. I got this error:
    ERROR () LineGobbler(voidCall):85 - Exception in thread "main" java.lang.RuntimeException: com.mysql.cj.jdbc.exceptions.MysqlDataTruncation: Data truncation: Invalid JSON text: "Invalid escape character in string." at position 49 in value for column '_airbyte_tmp_rvq_user._airbyte_data'.

    Boopathy Raja

    07/28/2021, 10:56 AM
    Hi, Airbyte version 0.28.1-alpha is pretty buggy: it won't let me enable/disable my database syncs, and every time I try to modify my connection I cannot reset my schema, select/deselect tables from my schema, or change my namespace configuration without starting over. I was using version 0.22.0-alpha previously and everything was fine.

    Qira Ahmad

    07/28/2021, 12:28 PM
    Hello everyone 😄 So, I am trying out the custom dbt transform step in Airbyte, and for that I have authenticated my private git repo. But there is one security concern: when entering the GitHub repo URL I have to enter it like
    username:ACCESS_TOKEN@github.com/URL
    and when I make a GET call to fetch the connection-ID details, the ACCESS_TOKEN is visible in the response. Ideally, it should be masked. Maybe if we had separate fields for the ACCESS_TOKEN and the GitHub repo URL, the access token could be masked in the response. Thank you 🙂

    Jeff Crooks

    07/28/2021, 9:56 PM
    Hi guys, I'm following along with the CDK tutorial and I noticed some links are broken. On Step 6: "To do this, we'll need a ConfiguredCatalog" <- this link is broken. On Step 7: "Then, follow the instructions from the building a toy source tutorial" <- this link is broken.

    Marvin

    07/28/2021, 11:35 PM
    Hey guys, my job looks ok in the log file (see the tail below). However, it triggered another attempt. Is this normal? 0.28.2-alpha
    ...
    2021-07-28 23:07:42 INFO () LineGobbler(voidCall):85 - 23:07:42 | Finished running 12 table models in 11.72s.
    2021-07-28 23:07:43 INFO () LineGobbler(voidCall):85 - 
    2021-07-28 23:07:43 INFO () LineGobbler(voidCall):85 - Completed successfully
    2021-07-28 23:07:43 INFO () LineGobbler(voidCall):85 - 
    2021-07-28 23:07:43 INFO () LineGobbler(voidCall):85 - Done. PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12
    2021-07-28 23:07:43 INFO () EnvConfigs(getEnvOrDefault):302 - WORKER_ENVIRONMENT not found or empty, defaulting to DOCKER
    2021-07-28 23:07:43 INFO () DefaultNormalizationWorker(run):77 - Normalization executed in 0.
    2021-07-28 23:07:43 INFO () TemporalAttemptExecution(get):133 - Stopping cancellation check scheduling...

    Matej Hamas

    07/29/2021, 8:56 AM
    This link on the website does not work (https://docs.airbyte.io/contributing-to-airbyte/python/tutorials/cdk-tutorial-python-http/8-test-your-connector) (points to https://docs.airbyte.io/tutorials/tutorials/building-a-python-source#step-8-set-up-standard-tests)

    Matej Hamas

    07/29/2021, 7:17 PM
    May I make a suggestion... The Postgres connector lists a bunch of tables - from INFORMATION_SCHEMA, I suspect - but this includes tables the user has no access to. We get a "Permission denied" runtime error and the job crashes. We then have to read the log and exclude/unselect these tables one by one, which is a tedious process when you have 100+ tables. My suggestion is to also check whether the user can SELECT data from the tables you display in the connection setup, and either completely hide the inaccessible tables, or at least unselect them by default and show them in a different gray color. Most people deny a user access to tables precisely because they don't want that user to transfer the data, so those tables should not be in the list at all. And this may apply to any other database source connector, not just Postgres.
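For reference, the privilege check being suggested is cheap to do on the Postgres side; a sketch of the kind of query discovery could run (or that you can run yourself to find the tables to deselect), assuming psycopg2 and placeholder connection details:

```python
# Sketch: list which tables the connector's user can actually SELECT from,
# so inaccessible tables can be hidden or deselected before a sync fails.
# The connection string is a placeholder.
import psycopg2

conn = psycopg2.connect("host=source-db dbname=app user=airbyte password=***")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT schemaname, tablename,
               has_table_privilege(current_user,
                                   quote_ident(schemaname) || '.' || quote_ident(tablename),
                                   'SELECT') AS can_select
        FROM pg_catalog.pg_tables
        WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
        ORDER BY 1, 2
    """)
    for schema, table, can_select in cur.fetchall():
        print(f"{schema}.{table}: {'ok' if can_select else 'NO SELECT ACCESS'}")
conn.close()
```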

    Alderson

    07/30/2021, 2:31 PM
    Hey guys, wondering what the current status of ClickHouse as a destination is. What's the latest news on it? Is it in active development or far down the backlog? Are we close to something? Would it support incremental syncs? https://github.com/airbytehq/airbyte/issues/1903

    Justin Leung

    07/30/2021, 3:12 PM
    Hi, I was looking into the internal Airbyte DB recently. I noticed that in airbyte_configs.config_blob, the raw credentials of sources/destinations are stored in plaintext. This is problematic as a lot of these sources/destinations require secret keys. Does Airbyte have any way to encrypt these credentials? If not, is there any recommended workaround?

    kz

    07/30/2021, 3:59 PM
    This threshold of "about 100 employees" - what is that based on? (if anyone knows)

    Alistair Henderson

    07/31/2021, 10:34 AM
    Hi, as a brand new user of this we have what could be considered a strange request. Our ERP system, based on Oracle, has the concept of an information schema where the users of the system can create views. The only issue is that this schema user has access to all the objects in the database. The obvious problem is that when discovery happens it fails, and even once you have fixed the issue (which you have already acknowledged) it will still be very slow. Could you adapt the Oracle DB connector to use another way of identifying the views and tables, using Oracle's internals? This would bring the count down to 2,400 in this case. Alistair

    Igor Sechyn

    08/01/2021, 9:17 AM
    Hey folks 👋 not sure if this is feedback, a request, or just a general enquiry, but I was wondering if anyone else has the same use case. We are aiming to use the platform in a "Powered by Airbyte" fashion, where our customers would be able to define multiple integrations for different sources. IIUC, right now, if two customers want to connect 5 different Google Analytics views that are synced into 5 different S3 buckets, I would have to create 10 different Google Analytics sources, a number of different S3 destinations (I suppose I could be clever and reuse some of the destinations by using namespaces and prefixes) and 10 different connections. Was wondering if anyone has experience using Airbyte in such a scenario and whether a huge number of entities can become a problem down the road. If so, what is the max capacity of a single deployment?
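For what it's worth, this kind of per-customer fan-out is usually scripted against the API rather than created by hand in the UI; a rough sketch of the loop, where the workspace/definition/destination IDs, config payloads, and the empty syncCatalog are placeholders to fill in for a real deployment:

```python
# Sketch: create one Google Analytics source per customer view and wire each
# one to a shared S3 destination (distinguished by prefix) via the Airbyte API.
# All IDs, the config payloads, and the syncCatalog below are placeholders.
import requests

AIRBYTE = "http://localhost:8000/api/v1"
WORKSPACE_ID = "..."               # hypothetical workspace
GA_SOURCE_DEFINITION_ID = "..."    # hypothetical, from /source_definitions/list
S3_DESTINATION_ID = "..."          # hypothetical shared destination

customer_views = {"customer-a": "12345678", "customer-b": "87654321"}  # hypothetical

for customer, view_id in customer_views.items():
    source = requests.post(f"{AIRBYTE}/sources/create", json={
        "workspaceId": WORKSPACE_ID,
        "sourceDefinitionId": GA_SOURCE_DEFINITION_ID,
        "name": f"ga-{customer}",
        "connectionConfiguration": {"view_id": view_id},  # plus credentials, dates, ...
    }).json()
    requests.post(f"{AIRBYTE}/connections/create", json={
        "sourceId": source["sourceId"],
        "destinationId": S3_DESTINATION_ID,
        "name": f"{customer}-ga-to-s3",
        "prefix": f"{customer}_",
        "status": "active",
        "syncCatalog": {"streams": []},  # fill from /sources/discover_schema
    })
```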