# ask-community-for-troubleshooting

    Khristina Rustanovich

    10/31/2021, 1:04 PM
    Is this your first time deploying Airbyte: Yes OS Version / Instance: n1-standard-2 Memory / Disk: 30Gb Deployment: GCP (Compute Engine) Airbyte Version: 0.30.19-alpha Source name/version: NA Destination name/version: NA Description: When I deployed the first time on GCP, the link in the GCP terminal redirected me to a URL like https://8000-8d954b67-b5e4-4836-acfa-0327d882a3b4.cs-europe-west4-bhnf.cloudshell.dev/onboarding instead of https://localhost/8000. I set up a connection and it is running. The question: should I use the link in Cloud Shell to edit and set up new pipelines, and forget about https://localhost/8000 from the docs https://docs.airbyte.io/deploying-airbyte/on-gcp-compute-engine? If I lose this link, can I recreate it without losing the current setup (pipelines, sources)?
    👀 1
    ✅ 1

    Thien Pham

    11/01/2021, 8:29 AM
    Hello team, I just heard about Airbyte some days ago, but cannot find anything describing the relationship between Airflow & Airbyte. I'm imagining that we should create an Airflow DAG to trigger an Airbyte job, and this job will be run by the Airbyte Scheduler & Worker. However, if we do something like that, we will have an Airflow Webserver, Scheduler, Worker & an Airbyte Webserver, Scheduler, Worker. Do we end up with a redundant Scheduler & Worker? Or does doing something like that have some pros that I didn't notice?
    👀 1
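The usual pattern here is that Airflow only triggers and monitors Airbyte jobs while Airbyte's own scheduler and workers execute them, so the two schedulers are not redundant as long as the Airbyte connection's sync frequency is set to manual. A minimal sketch using the official Airflow provider; the Airflow connection name and the Airbyte connection UUID are placeholders:

```python
# Sketch only: trigger an existing Airbyte connection from Airflow using the official provider
# (pip install apache-airflow-providers-airbyte). The UUID and conn id below are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

with DAG(
    dag_id="trigger_airbyte_sync",
    start_date=datetime(2021, 11, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    sync_orders = AirbyteTriggerSyncOperator(
        task_id="sync_orders",
        airbyte_conn_id="airbyte_default",          # Airflow HTTP connection pointing at the Airbyte server
        connection_id="<airbyte-connection-uuid>",  # placeholder: the Airbyte connection to run
        asynchronous=False,                         # block until the Airbyte job finishes
        timeout=3600,
    )
```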

    Blake Enyart

    11/01/2021, 1:54 PM
    Is there documentation or a recommendation from Airbyte on how to parallelize a single data source? I have an MS SQL source to Snowflake connection with about 2 billion rows which is unable to complete in under 18 hours, which is about the max time I’m able to stay connected to the MS SQL server to perform the backfill. The whole system is running on AWS EC2 at the moment with a 2-core instance, 16GB memory, and 100GB of EBS storage.
    ✅ 1
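A common workaround (not an official Airbyte feature at this version) is to split the tables across several connections that share the same source and destination, then kick them off together so each sync runs in its own worker. A hedged sketch against the Config API; the API URL and connection UUIDs are placeholders:

```python
# Hedged sketch: split the big MS SQL source into several Airbyte connections (e.g. grouped by table)
# and trigger them concurrently via the Config API. URL and connection UUIDs are placeholders.
from concurrent.futures import ThreadPoolExecutor

import requests

AIRBYTE_API = "http://localhost:8000/api/v1"  # assumption: default docker-compose deployment
CONNECTION_IDS = [
    "<uuid-connection-tables-a-to-h>",  # placeholders: one connection per table group
    "<uuid-connection-tables-i-to-p>",
    "<uuid-connection-tables-q-to-z>",
]

def trigger_sync(connection_id: str) -> dict:
    resp = requests.post(f"{AIRBYTE_API}/connections/sync", json={"connectionId": connection_id})
    resp.raise_for_status()
    return resp.json()["job"]

with ThreadPoolExecutor(max_workers=len(CONNECTION_IDS)) as pool:
    for job in pool.map(trigger_sync, CONNECTION_IDS):
        print(job["id"], job["status"])
```

Whether this helps on a 2-core box is a separate question; parallel syncs compete for CPU and memory, so sizing up the instance may matter as much as splitting the load.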

    Blake Enyart

    11/01/2021, 9:18 PM
    This is potentially a dumb question, but is there a way to add a single new table to a connection without requiring a reset of the data? Am I missing something here about managing connections/schema updates on sources?
    ✅ 1

    Jacob Roe

    11/02/2021, 12:06 AM
    Hi All, I have been setting up Airbyte to test it out. Is there a Terraform provider, or a way to manage the config in git rather than through the UI?
    👀 1
    ✅ 1
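There was no official Terraform provider at this point; one way to keep definitions in git is to store the API payloads as JSON files in the repo and apply them with a small script. A sketch under those assumptions (the repo layout, workspace UUID and API URL are all hypothetical):

```python
# Sketch: keep source definitions as JSON payloads in the repo and apply them with the Config API.
# The repo layout (airbyte/sources/*.json), workspace UUID and API URL are assumptions.
import json
import pathlib

import requests

AIRBYTE_API = "http://localhost:8000/api/v1"
WORKSPACE_ID = "<workspace-uuid>"  # placeholder: from POST /workspaces/list

for path in sorted(pathlib.Path("airbyte/sources").glob("*.json")):
    payload = json.loads(path.read_text())
    payload["workspaceId"] = WORKSPACE_ID
    resp = requests.post(f"{AIRBYTE_API}/sources/create", json=payload)
    resp.raise_for_status()
    print(f"{path.name} -> {resp.json()['sourceId']}")
```

Note the create call is not idempotent; re-running it produces duplicate sources, so a real script would list existing sources first and fall back to the update endpoints.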

    Khristina Rustanovich

    11/03/2021, 6:05 AM
    Is this your first time deploying Airbyte: Yes OS Version / Instance: n1-standard-2 Memory / Disk: 30Gb Deployment: GCP (Compute Engine) Airbyte Version: 0.30.19-alpha Source name/version: Klaviyo, Shopify Destination name/version: BigQuery Description: we set up 3 pipelines, 2 Shopify and 1 Klaviyo. The last log record for Klaviyo is from yesterday:
    2021-11-02 08:53:11 INFO () DefaultAirbyteStreamFactory(internalLog):98 - Backing off _send(...) for 5.0s (requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))
    The sync is running. In BigQuery there are only temp tables. The last log record for a Shopify pipeline is
    2021-11-02 20:17:09 INFO () DefaultAirbyteStreamFactory(internalLog):98 - Backing off _send(...) for 5.0s (requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))
    In BigQuery there are only temp tables. The third pipeline is running and its logs are updated every minute. The source datasets are expected to be one or two million rows for the historical sync. The question is: what should I do with the pipelines showing the errors above? Should I change settings and run them again?
    👀 1

    Ismail

    11/03/2021, 3:12 PM
    If I have multiple connections set up to run every 24 hours on the same Airbyte EC2 instance, will Airbyte try to run them all at the same time, or wait until one is complete before running the next?
    ✅ 1

    Stephan Claus

    11/05/2021, 8:36 AM
    Hi everyone, I have a question and I am not sure what the preferred channel for this is. Please advise where I should post it: We are currently experimenting with the Google Ads connector and we add
    login_customer_ID
    which is our a) master account ID in google ads and
    customer id
    which is b) a specific sub-account id for which we download the reports. As we have a lot of these b) sub-accounts, we are wondering if we can bulk load all sub-accounts with one connection, or do we need to set up a separate connection for every sub-account? Thank you for your support, appreciate any guidance!

    Frank Mayo

    11/05/2021, 4:37 PM
    Hello, my question is about database user permissions. I have an existing Postgres db that I am connecting Airbyte to. Currently Airbyte is using the root db user, and I would like to create a new user with minimal permissions for Airbyte to use. Is there a document somewhere that lists the minimum permissions a Postgres db user would need so Airbyte will run correctly? My first attempt at this wasn’t successful.
    ✅ 1
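For a read-only Postgres source user, the grants that are usually enough are CONNECT on the database, USAGE on the schema, SELECT on its tables, and default privileges so future tables stay readable. A sketch that applies those grants from Python; the database name, schema and credentials are placeholders:

```python
# Sketch: create a read-only Postgres user for Airbyte. Database name, schema and passwords are
# placeholders; adjust the schema list to whatever Airbyte should be able to read.
import psycopg2

GRANTS = """
CREATE USER airbyte_reader WITH PASSWORD 'change-me';
GRANT CONNECT ON DATABASE mydb TO airbyte_reader;
GRANT USAGE ON SCHEMA public TO airbyte_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO airbyte_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO airbyte_reader;
"""

conn = psycopg2.connect("host=localhost dbname=mydb user=postgres password=change-me")
conn.autocommit = True
with conn.cursor() as cur:
    for statement in (s.strip() for s in GRANTS.split(";")):
        if statement:
            cur.execute(statement)
conn.close()
```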

    gunu

    11/07/2021, 10:22 PM
    When using CDC configuration, occasionally the connector needs to be paused and when attempting to continue syncing, the source DB binlogs are no longer available, requiring a
    reset
    and full sync, which is painful for large connectors. How do people get around this? What is the correct configuration? Do people persist binary logs indefinitely?
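If the source is MySQL, the usual answer is not to keep binlogs forever but to extend retention so it comfortably covers the longest pause you expect. A hedged sketch; host and credentials are placeholders, and the RDS variant is left commented out:

```python
# Hedged sketch, assuming the CDC source is MySQL: check how long binlogs are retained and extend it
# so a paused connector can still catch up. Host and credentials are placeholders.
import pymysql

conn = pymysql.connect(host="mysql.example.com", user="admin", password="change-me")
with conn.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'binlog_expire_logs_seconds'")
    print(cur.fetchone())  # MySQL 8.x retention, in seconds

    # Self-managed MySQL 8.x: keep binlogs for 7 days
    cur.execute("SET PERSIST binlog_expire_logs_seconds = 604800")

    # On Amazon RDS / Aurora the equivalent knob is:
    # cur.execute("CALL mysql.rds_set_configuration('binlog retention hours', 168)")
conn.close()
```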

    Tony Mao

    11/08/2021, 4:48 AM
    Hi everyone, interesting problem. I have identical Postgres databases in the US and UK and I want to do multi-master replication across the databases (i.e. both databases sync and both have read-write access). I am thinking of using CDC with Airbyte to create a master database, so the data flow would look something like (updates from US Postgres) -> |master database| <- (updates from UK Postgres). I would then do (US database receives updates from UK) <- |master database| -> (UK database receives updates from US) through CDC. I'm not sure if this would work, since if I configured the master database with CDC it would log updates from the source database and then propagate the updates back to the same database, creating duplicates. Is there any way around this? I think Airbyte might not be the solution for this.

    Preetam Balijepalli

    11/08/2021, 5:26 AM
    Can you point me to the Airbyte REST API documentation?
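As of this version the Config API is described by an OpenAPI spec in the repository (airbyte-api/src/main/openapi/config.yaml, if I remember the path correctly), and in a default docker-compose deployment the API is served behind the same port as the UI. A quick sketch to confirm it is reachable; note that nearly all endpoints are POST and take a JSON body:

```python
# Quick check against the Config API on a default docker-compose deployment (URL is an assumption).
import requests

API = "http://localhost:8000/api/v1"

print(requests.get(f"{API}/health").json())  # liveness check

workspaces = requests.post(f"{API}/workspaces/list", json={}).json()["workspaces"]
print([w["workspaceId"] for w in workspaces])
```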

    Deividas J

    11/08/2021, 4:13 PM
    Hi, is it possible to change time and date format in the UI?
    ✅ 1

    Dawid Karczewski

    11/08/2021, 5:02 PM
    Hi! Do you have any advice on how to set up a dev -> prod pipeline? I have an Airflow + dbt project that I can run locally, develop all the stuff needed, and then commit to GitHub and it will update the production environment on the server. Everything is stored in code and env files. I'd like to add Airbyte to that stack, and ideally I'd like to schedule everything from Airflow, including Airbyte connections. I managed to do that on my local env, but I don't see an easy way to recreate connections on the production server aside from manually dumping the config export and importing it on prod (I've seen connectionIds are part of the backup, so those shouldn't change and DAGs in Airflow won't break). That's not ideal, because I'd like to be able to eventually hand over the project to a different developer (or maybe even develop in parallel) and we would have to sync the config manually each time. Another solution is to use Airbyte separately from Airflow (just like we do now with Segment), but that means we'd have yet another tool to check for errors, and Airflow scheduling wouldn't be as well utilized as I'd like. I could also call the same prod Airbyte instance from both dev and prod Airflow instances, but that could create some conflicts. Is there any way to put source, destination and connection definitions in GitHub alongside the rest of the code? Or maybe there's another way I don't see to build an ELT suite using those tools? Any pointers on how to set up something that is easily redeployable, versionable and can be worked on efficiently would be appreciated.
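One stop-gap until config-as-code is supported natively: script the same export the UI's "Export Configuration" button performs and commit the archive, then import it on prod. This assumes the deployment export endpoint exposed by this Airbyte version; check your instance's API spec before relying on the exact path:

```python
# Hedged sketch: snapshot the whole instance configuration into an archive you can commit.
# Assumption: the endpoint behind the UI's "Export Configuration" button is POST /api/v1/deployment/export
# on this version; verify against your instance's OpenAPI spec.
import requests

resp = requests.post("http://localhost:8000/api/v1/deployment/export")
resp.raise_for_status()

with open("airbyte_config_export.tar.gz", "wb") as f:
    f.write(resp.content)  # commit this archive, then import it on the prod instance
```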

    Prateek Gupta

    11/09/2021, 5:43 AM
    Hey, I am using a psql to psql pipeline with the latest versions and am getting the following error. Can anyone tell me what it means? 2021-11-09 05:42:45 WARN () ActivityExecutionContextImpl(doHeartBeat):153 - Heartbeat failed io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 9.999991794s. [closed=[], open=[[remote_addr=airbyte-temporal/172.20.0.7:7233]]]
    👀 1

    dasol kim

    11/09/2021, 6:45 AM
    Hi all, I got an error while following this tutorial: https://docs.airbyte.io/operator-guides/transformation-and-normalization/transformations-with-dbt, and I modified
    modules-path: "../dbt_modeuls"
    in the dbt_project.yml file, referring to https://github.com/airbytehq/airbyte/issues/5590.
    dbt deps --profiles-dir=$NORMALIZE_DIR --project-dir=$NORMALIZE_DIR
    Running with dbt=1.0.0-b2
    * Deprecation Warning: The `data-paths` config has been deprecated in favor of `seed-paths`. Please update your
    `dbt_project.yml` configuration to reflect this change.
    Encountered an error while reading the project:
      ERROR: Runtime Error
      at path []: Additional properties are not allowed ('modules-path' was unexpected)
    Error encountered in /home/dskim/airbyte/dbt_customizing/normalization-files/normalize/dbt_project.yml
    Encountered an error:
    Runtime Error
      Could not run dbt
    I don't know why this error occurs. Please help.
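The error itself points at the cause: this normalization run is on dbt 1.0 (1.0.0-b2), which no longer accepts the modules-path key at all. As far as I recall from the dbt 1.0 migration notes, the replacement is packages-install-path (and data-paths becomes seed-paths, as the deprecation warning already says). A small helper sketch that rewrites the keys in place; the path is taken from the error output above:

```python
# Sketch: migrate dbt_project.yml keys that dbt 1.0 no longer accepts.
# 'modules-path' -> 'packages-install-path', 'data-paths' -> 'seed-paths' (per the dbt 1.0 migration notes).
import yaml

PATH = "normalization-files/normalize/dbt_project.yml"  # path taken from the error output above

with open(PATH) as f:
    project = yaml.safe_load(f)

if "modules-path" in project:
    project["packages-install-path"] = project.pop("modules-path")
if "data-paths" in project:
    project["seed-paths"] = project.pop("data-paths")

with open(PATH, "w") as f:
    yaml.safe_dump(project, f, sort_keys=False)
```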

    Andreas

    11/09/2021, 8:52 AM
    Hi! We are in the process of setting up Airbyte Docker via Pulumi on a GCP compute engine and there are some open questions: • How would we persist state in case of updating the Docker image or replacing the instance? I saw this ticket (https://github.com/airbytehq/airbyte/issues/3605) for setting up an external config db, but one reply suggests this won't be enough for persisting state. • What would you recommend for setting up the initial config with basic sources and destinations? Are there options to do this via Pulumi as well? Thanks for your help!
    👀 1
    ✅ 1

    Alam

    11/09/2021, 11:36 AM
    Hi, I used Airbyte to copy MongoDB data into Snowflake in RAW format. After a successful copy of the DB, I noticed the records are not in pure JSON format; instead they contain a "Document" keyword for multiple items inside arrays. This keyword is creating issues while transforming the data. Any help would be appreciated on how to properly handle such data and formatting in Snowflake.
    👀 1

    Slackbot

    11/09/2021, 8:02 PM
    This message was deleted.
    ✅ 1

    Srivallabh

    11/10/2021, 11:56 AM
    Hi, I wanted to ask whether it is possible to selectively sync only some of the data from a table in Postgres?
    ✅ 1
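A common workaround at this point, since Airbyte syncs whole tables: expose just the rows and columns you need as a database view and select only that view in the connection's stream list. A sketch; the table, columns and credentials below are hypothetical:

```python
# Sketch of the view workaround for partial syncs. Table, columns and credentials are hypothetical.
import psycopg2

conn = psycopg2.connect("host=localhost dbname=appdb user=postgres password=change-me")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("""
        CREATE OR REPLACE VIEW public.orders_last_90_days AS
        SELECT id, customer_id, total, created_at
        FROM public.orders
        WHERE created_at >= now() - interval '90 days'
    """)
conn.close()
```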

    Kyle Phillips

    11/10/2021, 6:52 PM
    Hi all. My company is looking into creating a common data model off of Parquet files in S3. Can Airbyte output Parquet files into another S3 bucket?
    ✅ 1

    Srivallabh

    11/11/2021, 11:35 AM
    How do I add database connection details while calling the Airbyte APIs?
    👀 1
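When creating a source through the API, the database credentials go inside connectionConfiguration; the exact field names come from the connector's spec (POST /source_definition_specifications/get). A sketch with placeholder IDs and credentials:

```python
# Sketch: database connection details live in connectionConfiguration when creating a source via the API.
# Workspace/definition UUIDs and credentials below are placeholders; field names follow the connector spec.
import requests

API = "http://localhost:8000/api/v1"  # assumption: default local deployment

payload = {
    "workspaceId": "<workspace-uuid>",
    "sourceDefinitionId": "<postgres-source-definition-uuid>",  # from POST /source_definitions/list
    "name": "orders-db",
    "connectionConfiguration": {
        "host": "db.example.com",
        "port": 5432,
        "database": "orders",
        "username": "airbyte_reader",
        "password": "change-me",
        "ssl": False,
    },
}

resp = requests.post(f"{API}/sources/create", json=payload)
resp.raise_for_status()
print(resp.json()["sourceId"])
```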

    Andrew Greenburg

    11/11/2021, 3:45 PM
    We’re looking to use airbyte to aggregate data files of the same type from different sources into the same destination, but we need to be able to tag the data with which source it came from. Is it possible to get source metadata added to the destination? Anyone have a similar workflow to this?
    u
    • 2
    • 3
  • m

    Madhup Sukoon

    11/12/2021, 6:19 AM
    Hey guys, I am using DBT via Airbyte to run some transformations and generate docs. Is it possible to move the generated docs to S3? If not, I intend to extend the DBT Docker image to facilitate S3 upload of docs. Can you explain how the DBT integration works with Airbyte (what parameters are passed to the DBT container, how the repo is made available, etc.)? Might make for a good contribution 🙂
    ✅ 1
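If the goal is just getting the generated site into S3, a post-step that uploads the target/ artifacts after `dbt docs generate` is usually enough. A sketch; the bucket name and prefix are placeholders:

```python
# Sketch: after `dbt docs generate`, push the docs artifacts from target/ to S3. Bucket is a placeholder.
import pathlib

import boto3

TARGET_DIR = pathlib.Path("target")  # dbt writes the docs artifacts here
BUCKET = "my-dbt-docs-bucket"
s3 = boto3.client("s3")

for name in ("index.html", "catalog.json", "manifest.json", "run_results.json"):
    artifact = TARGET_DIR / name
    if artifact.exists():
        s3.upload_file(str(artifact), BUCKET, f"dbt-docs/{name}")
        print(f"uploaded {artifact} -> s3://{BUCKET}/dbt-docs/{name}")
```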

    Laza Nantenaina

    11/12/2021, 1:48 PM
    Hey, I'm trying to run Airbyte locally and got this error `error: FetchError: request to http://localhost:8000/api/v1/source_definition_specifications/get failed, reason: connect ECONNREFUSED 127.0.0.1:8000` while all services are up and I can access Airbyte via the web interface. Can anyone help me please? thx
    ✅ 1
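A frequent cause of ECONNREFUSED when the web UI itself works is that the failing process runs inside another container, where localhost points at that container rather than the host. A quick check you can run from wherever the request originates (host.docker.internal only resolves on Docker Desktop):

```python
# Quick reachability check for the Airbyte API, run from the same place the failing request comes from.
import requests

for url in (
    "http://localhost:8000/api/v1/health",
    "http://host.docker.internal:8000/api/v1/health",  # Docker Desktop (Mac/Windows) only
):
    try:
        print(url, requests.get(url, timeout=5).json())
    except requests.RequestException as exc:
        print(url, "->", exc)
```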

    Ignacio Aranguren Rojas

    11/12/2021, 5:48 PM
    Hey all 🙂 I just joined a company to help build the Data Platform. We are planning to do the data ingestion soon into the Data Warehouse (probably BigQuery or Snowflake). The challenge is that the data is stored in a SQL Server with some tables having >1 billion rows. Just the customer table has 21M rows (I know it's a bit crazy that they did not have a DWH before). Airbyte seems like a really good option in comparison with other competitors that I used in the past (like Fivetran / Stitch). I was wondering if any of you have any guidance on how to approach this special use case with >1TB of data to be ingested at first! Thanksss! 🙂

    Adam Buie

    11/12/2021, 10:50 PM
    Hello all! I am on a journey to see if Airbyte might just be the answer to a challenge that I have encountered in my professional career and have made a personal quest of my own. Large ERP systems today are being designed with API-first approaches, which sometimes do not accommodate the core concepts of Electronic Data Interchange, or the integrations needed to incorporate the interchanged data, very well, if at all! I am hoping that Airbyte will provide a user-friendly interface for the management of ETL (or even ELT) processes for non-tech-savvy clients of mine who understand that managing and owning their data and transformation processes is very important if they want to achieve a fully enabled and highly robust EDI/integrations ecosystem, but who do not have the skills to manage it programmatically using code or workflow abstractions. Can anyone here let me know if they have used Airbyte for similar purposes, like presenting ETL management to non-tech-savvy users, or using it as a low- or no-code alternative to managing customized scripting or workflows?
    ✅ 1

    Sandeep Devarapalli

    11/13/2021, 6:08 PM
    Hi all,
    Error airbyte --discover:
    	W, [2021-11-13T18:05:40.416512 #1]  WARN -- : MONGODB | Error running ismaster on <http://test-cluster-1.khach.mongodb.net:27017|test-cluster-1.khach.mongodb.net:27017>: SocketError: getaddrinfo: Name does not resolve

    Stefan Otte

    11/15/2021, 9:22 AM
    Hey there! I'm doing some research on deploying a small Airbyte setup (just a few GB of database sources). Our normal setup is k8s, but given that Airbyte's k8s deployment is still in beta, there are some known_issues, and how easy it is to
    docker-compose up
    on a VM, I'm leaning towards just using
    docker-compose
    . The biggest advantage of the k8s deployment is the scalability, which in our setup is most likely not needed. So I'm wondering: how stable is the k8s beta deployment? When is the beta
    deployment
    moving to
    stable
    ?
    ✅ 1
    👍 1

    Caio César P. Ricciuti

    11/15/2021, 2:54 PM
    Hello all! I need some help with getting Facebook approval for my limit increase. Has anyone gone through this process? I've made a request but Facebook refused... Any help is most welcome! Thanks in advance! 💪
    🙌 1