# ask-community-for-troubleshooting
  • j

    Jason Rich

    03/31/2022, 1:50 AM
Hey team. I am setting up an Airtable connector for my marketing team, and I am running into an issue when specifying the tables within the base to pull in. When I specify the table name I get the
HTTPSConnectionPool(host='api.airtable.com', port=443): Max retries exceeded with ...
error. However, when I don't specify the table, the source (and ultimately the connector) is created, but no data pulls in. Many thanks for the assist.
    👀 1
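A quick way to separate a table-naming problem from a network problem is to call the Airtable REST API for the same table directly from the machine running Airbyte. Below is a minimal sketch; the base ID, table name, and API key are placeholders, and a "Max retries exceeded" error usually points at connectivity/DNS rather than the table name itself.
```python
# Minimal connectivity check against the Airtable REST API (placeholder credentials).
# If this also fails from the Airbyte host, the problem is network/DNS, not the table name.
import requests

BASE_ID = "appXXXXXXXXXXXXXX"      # hypothetical base ID
TABLE_NAME = "Marketing Contacts"  # hypothetical table name
API_KEY = "keyXXXXXXXXXXXXXX"      # hypothetical API key

resp = requests.get(
    f"https://api.airtable.com/v0/{BASE_ID}/{TABLE_NAME}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"maxRecords": 1},
    timeout=10,
)
print(resp.status_code, resp.json())
```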
  • a

    Anand

    03/31/2022, 4:55 AM
Hi, I am pulling data from Salesforce to Redshift. After the first run, I am adding some new calculated columns to the opportunity table. However, in subsequent runs [full refresh/incremental], those custom columns are getting deleted. Is there any option to retain those two columns? Any help on this is much appreciated. Thanks!
    👀 1
  • m

    Manish Tomar

    03/31/2022, 7:35 AM
How can I sync the data on a weekly basis? There is no option for it.
    ✅ 1
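One workaround when the built-in schedule options don't cover a weekly cadence is to set the connection to manual and trigger it from an external scheduler (cron, Airflow, etc.) through the Airbyte API. A minimal sketch, assuming a default local deployment on port 8000 and a placeholder connection ID; the endpoint shown is the standard manual-sync trigger, but verify it against the API docs for your version.
```python
# Trigger a manual sync for one connection via the Airbyte API.
# Run this from cron (e.g. "0 6 * * 1" for every Monday) with the connection set to manual.
import requests

AIRBYTE_URL = "http://localhost:8000/api/v1"               # assumption: default local deployment
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"     # placeholder connection ID

resp = requests.post(
    f"{AIRBYTE_URL}/connections/sync",
    json={"connectionId": CONNECTION_ID},
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("job", {}))
```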
  • m

    mus mus

    03/31/2022, 8:05 AM
Hi guys, I'm new to using Airbyte and I'm stuck initiating a MySQL source. Airbyte is installed on server A (CentOS) and runs as expected at localhost:8000. I'm now trying to set up a MySQL DB source located on a different server B (running at the VM level). I already tried accessing this MySQL DB via docker run from server A and it works fine. But when setting up the source in Airbyte, I keep receiving "Could not connect with provided configuration. Error: Cannot create PoolableConnectionFactory (Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.)". I already put "enabledTLSProtocols=TLSv1.2" in the JDBC URL params but it still isn't working. Need your help please.
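Since the same credentials work with a manual docker run from server A, it may help to confirm that the Airbyte containers themselves can reach server B before tweaking TLS settings. A minimal sketch with a placeholder host; run it from inside the worker container (or any container on the same Docker network).
```python
# Plain TCP reachability check toward the MySQL host: if this fails from inside the
# Airbyte containers, the "Communications link failure" is a network/firewall problem,
# not a TLS-protocol one.
import socket

MYSQL_HOST = "server-b.example.internal"  # placeholder for server B's address
MYSQL_PORT = 3306

try:
    with socket.create_connection((MYSQL_HOST, MYSQL_PORT), timeout=5):
        print("TCP connection OK - look at TLS settings and user grants next")
except OSError as exc:
    print(f"Cannot reach {MYSQL_HOST}:{MYSQL_PORT} -> {exc}")
```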
  • m

    Mert Karabulut

    03/31/2022, 9:04 AM
Hi all, I am working with Google Analytics data and I was wondering: is it possible to extract multiple views' data with one connector?
    ✅ 1
  • s

    Shanu Sukoor

    03/31/2022, 4:57 PM
Hi all, I am trying to export the configurations as a .gz file from one deployment (0.35.47-alpha) and import it into another server (0.35.44-alpha). But this keeps failing with the error below (I had to use the import API to get the error message, as the UI does not return any message on import). Has anyone experienced this? Am I doing anything wrong here?
    {
      "status": "failed",
      "reason": "io.airbyte.validation.json.JsonValidationException: json schema validation failed when comparing the data to the json schema. \nErrors: $.id: is missing but it is required, $.catalog: is missing but it is required, $.catalogHash: is missing but it is required \nSchema: \n{\n  \"$schema\" : \"<http://json-schema.org/draft-07/schema#>\",\n  \"$id\" : \"<https://github.com/airbytehq/airbyte/blob/master/airbyte-config/models/src/main/resources/types/AttemptFailureSummary.yaml>\",\n  \"title\" : \"ActorCatalog\",\n  \"description\" : \"Catalog of an actor.\",\n  \"type\" : \"object\",\n  \"additionalProperties\" : false,\n  \"required\" : [ \"id\", \"catalog\", \"catalogHash\" ],\n  \"properties\" : {\n    \"id\" : {\n      \"type\" : \"string\",\n      \"format\" : \"uuid\"\n    },\n    \"catalog\" : {\n      \"type\" : \"object\",\n      \"existingJavaType\" : \"com.fasterxml.jackson.databind.JsonNode\"\n    },\n    \"catalogHash\" : {\n      \"type\" : \"string\"\n    }\n  }\n}"
    }
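For reference, the same export/import flow can be scripted instead of driven from the UI, which also surfaces the validation error directly. A rough sketch below; the /deployment/export and /deployment/import paths are an assumption based on the alpha-era config API (they were later deprecated), and the version mismatch between the two servers is itself a likely cause of the schema-validation failure, since the archive schema changed between releases.
```python
# Export the archive from one deployment and import it into another via the config API.
# Endpoint paths and content type are assumptions for the 0.35.x alpha API -- verify
# against your version's API reference before relying on this.
import requests

SRC = "http://source-server:8000/api/v1"   # hypothetical 0.35.47-alpha deployment
DST = "http://target-server:8000/api/v1"   # hypothetical 0.35.44-alpha deployment

archive = requests.post(f"{SRC}/deployment/export", timeout=120)
archive.raise_for_status()

imported = requests.post(
    f"{DST}/deployment/import",
    data=archive.content,
    headers={"Content-Type": "application/x-gzip"},
    timeout=120,
)
print(imported.status_code, imported.text)
```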
  • a

    Allen Cebrian

    03/31/2022, 5:02 PM
Hello. I would like to learn Airbyte. Is there a course available I can take? Also, I've been watching some tutorials on YouTube, but I didn't find any info about versioning of pipelines. How do you do this in Airbyte? What's the best approach for developing in a test environment and then deploying to production?
    ✅ 1
  • s

    Samuel Rodríguez

    03/31/2022, 7:57 PM
I am running into this issue: PROCEDURE AirbyteDev.activity_logs_proc does not exist.
    👀 1
    🙏 1
  • s

    Samuel Rodríguez

    03/31/2022, 7:58 PM
What is the reason for this problem?
  • n

    Nicholas Van Kuren

    03/31/2022, 8:13 PM
Hi, I am exploring Airbyte and like how easy it has been to get things set up so far! My question is about custom Singer taps. I have already developed a tap using Meltano's SDK for Singer tap development, and I'm wondering if it's possible to just point to that, or if I have to migrate/redevelop using Airbyte's CDK?
    ✅ 1
  • w

    William Phillips

    04/01/2022, 1:58 AM
Can someone explain to me the differences between Airbyte Core and Airbyte Cloud?
    ✅ 1
  • r

    Rafael Auyer

    04/01/2022, 2:47 AM
I'm trying to create an S3 destination that writes to the root of a bucket, but since the s3_bucket_path variable is mandatory, I can't. 😢 Did I miss something?
  • s

    Shubham Pinjwani

    04/01/2022, 8:01 AM
Is there a workaround for the multi-tenancy issue? I want to sync many tables from multiple sources and push all the data to BigQuery for analysis in append mode. The problem is that when pushing data to the same destination, the data from different sources will get mixed up. I might need to add a column with something like a sourceID so the data doesn't get mixed up. Is there something like that, or is there any other solution for this multi-tenancy issue?
  • b

    beer

    04/01/2022, 8:06 AM
I installed Airbyte's CDK and after that I created a source:
• cd airbyte-integrations/connectors/source-<name>
• python -m venv .venv # Create a virtual environment in the .venv directory
• source .venv/bin/activate # enable the venv
• pip install -r requirements.txt
When I run pip install -r requirements.txt I get an error:
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
ERROR: Can not execute setup.py since setuptools is not available in the build environment.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Computer: Mac M1
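That particular failure usually means the virtualenv was created without (or with a very old) setuptools, which happens with some Python builds on Mac M1. A minimal sketch of the usual fix, run with the .venv activated: upgrade the build tooling first, then retry the install.
```python
# Upgrade pip/setuptools/wheel inside the active .venv, then retry the requirements install.
# Equivalent to running the two pip commands by hand in the activated virtualenv.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "--upgrade", "pip", "setuptools", "wheel"],
    check=True,
)
subprocess.run(
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    check=True,
)
```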
  • s

    Shubham Pinjwani

    04/01/2022, 8:41 AM
While using the Airbyte API, is there a default value I can add to a column inside jsonSchema?
    ✅ 1
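JSON Schema itself has a `default` keyword, so a stream's jsonSchema can carry one per property; whether a given Airbyte destination or normalization step actually applies it is a separate question worth testing. A minimal sketch of what such a property definition could look like, expressed as a Python dict; the column names are placeholders.
```python
# A stream jsonSchema fragment with a per-column default, shown as a Python dict.
# "default" is a standard JSON Schema annotation; Airbyte may treat it as metadata only.
import json

stream_json_schema = {
    "type": "object",
    "properties": {
        "tenant_id": {             # hypothetical column name
            "type": "string",
            "default": "acme",     # the default value associated with the column
        },
        "amount": {"type": "number"},
    },
}

print(json.dumps(stream_json_schema, indent=2))
```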
  • t

    Timur Amirov

    04/01/2022, 11:47 AM
Hi! First of all, thanks for creating this great tool: I'm loving the documentation, and the learning curve seems perfect for me. I'm reworking our analytics setup a bit, trying to make operational data (from a Django app) mixed with lots of tracking data (we're using Adjust for it) available in an analytics RDS Aurora (Postgres) DB. The trouble is that when trying to import all those xx,xxx CSVs at the same time, even when I provide the schema, the overall process takes a long time. When the schema is not provided, it seems like the S3 source connector tries to iterate over the whole bucket to derive and validate it. Based on these observations I have a few questions:
• Would you recommend using Airbyte for importing lots and lots of CSV files (event logs) into Postgres?
• Would you maybe suggest switching to a proper data warehouse (e.g. Snowflake) instead and doing the import there?
• Are there any settings in the S3 source connector that would help me speed up the process?
• Would having more workers help (as I'm just experimenting, my setup is based on docker-compose up, but that can be changed)?
• Are there any plans on the roadmap for the S3 source connector that would address this or a similar issue?
Cheers ^_^
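On the third bullet: the S3 source has an optional user-provided schema field that takes a JSON mapping of column names to types, which is what lets the connector skip the inference pass over the bucket. A rough sketch of building that value is below; the exact field name and the accepted type strings depend on the connector version, so treat the specifics as assumptions and check the connector docs.
```python
# Build the JSON string for the S3 source's user-provided schema option so the connector
# does not have to infer types by scanning every CSV in the bucket.
# Column names and types here are placeholders.
import json

user_schema = {
    "event_id": "string",
    "event_time": "string",
    "user_id": "string",
    "revenue": "number",
}

print(json.dumps(user_schema))   # paste this string into the source's schema field
```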
  • m

    Muhammed Ali Dilekci

    04/01/2022, 2:20 PM
Hello, I have a question. I've made some changes in airbyte-webapp; when I start the webapp with "npm start" it works just as I want. But when I build the whole application with "docker compose up" I see that the changes are not reflected. I couldn't figure out how to get these changes reflected, so is there anything I can read/watch to learn about this? Thanks for helping.
  • k

    Kevin Phan

    04/01/2022, 3:28 PM
Quick question on Airbyte: is it possible to combine two sources? i.e., use the output from one source to execute a second source? It looks like it's a simple pull -> transform -> store kind of setup, and I'm trying to understand how we might use it.
  • h

    Hicham Zghari

    04/01/2022, 4:28 PM
Hello, I have a question. I'm trying to add a Python connector, but I'm getting this error while executing it in the web UI: source > No module named 'binance' (this library is already installed on my computer).
    👀 2
    🙏 1
    👍 1
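When the connector runs from the web UI it executes inside its own Docker image, so packages installed on the developer machine are not visible to it; the dependency has to be declared in the connector's setup.py (or requirements) and the image rebuilt. A minimal sketch below; the package name "python-binance" is an assumption for whatever distribution provides the `binance` module.
```python
# setup.py for the custom connector: declare the dependency so it gets installed into the
# connector's Docker image during the build, not just on the developer machine.
from setuptools import find_packages, setup

setup(
    name="source_binance",            # hypothetical connector package name
    packages=find_packages(),
    install_requires=[
        "airbyte-cdk~=0.1",           # CDK pin is illustrative
        "python-binance",             # assumption: the PyPI package behind `import binance`
    ],
)
```
After adding it, rebuild the connector's Docker image and point the UI at the new image tag.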
  • l

    Loman Barnett

    04/01/2022, 9:59 PM
Brand new user here. I am watching a sync from Postgres to Postgres, and I am wondering why Airbyte processes rows of views. Am I misunderstanding? (Same thing with foreign tables.)
  • d

    Dimitriy Ni

    04/03/2022, 9:42 PM
Hey everyone, great to be here now too 🙂 What's the right way to deploy and run an Airbyte instance on AWS? Are connectors deployed within a forked Airbyte repository, or how do I add connectors in a programmatic and version-controlled way?
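One version-controlled pattern is to keep connector definitions in code and register them with the instance through the API rather than clicking through the UI; custom connectors are just Docker images, so no fork of the Airbyte repo is required. A rough sketch below; the endpoint and field names follow the config API as I understand it, so double-check them against your version's API docs.
```python
# Register a custom connector (published as a Docker image) with an Airbyte instance.
# Keep this script plus the image tag in git for a version-controlled setup.
import requests

AIRBYTE_URL = "http://airbyte.internal:8000/api/v1"    # hypothetical instance URL

payload = {
    "name": "My Custom Source",
    "dockerRepository": "my-registry/source-custom",   # hypothetical image repository
    "dockerImageTag": "0.1.0",
    "documentationUrl": "https://example.com/docs",
}

resp = requests.post(f"{AIRBYTE_URL}/source_definitions/create", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```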
  • n

    Naphat Theerawat

    04/04/2022, 4:40 AM
Hi. I am new to Airbyte. I have some questions I would like to ask.
Question 1
Let's say I have 2 instances of Airbyte running, prod and dev respectively. In prod, environment variables such as the database config would differ from dev (for obvious reasons 😅). Is there a way I can do CI/CD or automatically port the changes made on dev to prod without having to recreate the Airbyte connection on prod manually?
I found this article, but I am not too sure if it is related: https://airbytehq.slack.com/archives/C01MFR03D5W/p1636143791206800
Let's say I have 1 connection with 1 source (postgres-sandbox database) and 1 destination (bigquery-sandbox project). I would like to create an Airbyte connection which will then sync the source to the destination. After testing that this connection works, I want to be able to export the connection config, but instead of using postgres-sandbox database and bigquery-sandbox project, I want to use postgres-production database and bigquery-production project instead. Is there a proper way I can do this?
Question 2
I have noticed that, for any schema change to reflect on the destination, I need to "Update the source schema". If I click on it, it resets the settings and asks me to reselect all the tables I want and set the sync mode and cursor field again. Is there any other way I could achieve the same result with less hassle?
Let's say I have 40 tables in this connection. Updating the source schema means I would then have to re-select all of these tables, which is not very convenient.
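For Question 1, one common approach is to treat the API as the source of truth: read the connection's configured catalog and sync settings from the dev instance, then recreate the connection on prod pointing at the production source and destination IDs. A very rough sketch; the payload is trimmed and the field names beyond connectionId/sourceId/destinationId/syncCatalog are partly assumed, so check them against the connections API spec for your version.
```python
# Copy a connection's settings from a dev Airbyte instance to prod, swapping in the
# production source/destination IDs. Field names are partly assumptions -- confirm
# them against your version's API reference.
import requests

DEV = "http://dev-airbyte:8000/api/v1"
PROD = "http://prod-airbyte:8000/api/v1"

dev_conn = requests.post(
    f"{DEV}/connections/get",
    json={"connectionId": "DEV-CONNECTION-UUID"},        # placeholder
    timeout=30,
).json()

prod_payload = {
    "sourceId": "PROD-SOURCE-UUID",                      # postgres-production source on prod
    "destinationId": "PROD-DESTINATION-UUID",            # bigquery-production destination on prod
    "syncCatalog": dev_conn["syncCatalog"],              # reuse the table/sync-mode selection
    "status": dev_conn["status"],
    "namespaceDefinition": dev_conn.get("namespaceDefinition", "source"),
}

resp = requests.post(f"{PROD}/connections/create", json=prod_payload, timeout=30)
print(resp.status_code, resp.json())
```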
  • s

    Shubham Kumar

    04/04/2022, 8:22 AM
Hi all, we have set up Airbyte on K8s with logging on S3. I have attached a screenshot of s3://log-bucket/job-logging/workspace/. Can somebody please help me understand on what basis these folders are created in the workspace directory? Thanks
    ✅ 1
  • m

    Mona Makinian

    04/04/2022, 8:55 AM
Hello people, I was wondering whether Airbyte Cloud is hosted in Europe now? Thanks!
    🙏 2
  • a

    Anusha Maddi

    04/04/2022, 5:57 PM
Hi team, I am new to Airbyte. I tried to build it locally and am facing the build failure below. Can you help me solve this issue? The command I am using to build is "gradle build".
    build logs 2.txt
  • t

    Tien Nguyen

    04/04/2022, 11:05 PM
Hello everyone, I am a data engineer at Brighthive and my company wants me to work with Airbyte. I started working with the Airbyte browser UI, but it is of no use since my company wants to run it together with some other scripts. I am trying to set up the CLI environment via bash. When I run the local environment and try to download airbyte-cdk~-0.1.25, it gives me an error that the version is not found. Can someone help me with the correct version for this?
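One thing worth checking before hunting for a different version: pip's compatible-release operator is `~=`, so a specifier written with `~-` (as quoted above) will not resolve to any version. A minimal sketch of the install with the corrected operator, using the same pin mentioned in the message.
```python
# Install the CDK with pip's compatible-release operator ("~=", not "~-").
# Runs pip inside the current interpreter's environment.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "airbyte-cdk~=0.1.25"],
    check=True,
)
```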
  • c

    chris evans

    04/05/2022, 3:21 AM
I am currently exploring data engineering tools. It would be really great if you could help resolve this query (use case given below).
• Does Airbyte support the data ingestion part of ETL/ELT? If yes, how?
1. Let's say we have a MySQL prod DB and I want to run a bunch of Airbyte pipelines (using Spark for larger data processing) and dump the data in partitioned Parquet format into one of the S3 buckets. Can I do this using Airbyte?
2. In my past experience, what we did was basically enable binlogs and, reading the CDC sequence, we read all new or updated rows using NiFi, and Spark applications on EMR read this data to dump into S3. Can Airbyte read all the new/updated data and dump it into S3 while running some Spark applications?
Pls let me know, Thanks
  • k

    Khristina Rustanovich

    04/05/2022, 1:13 PM
I saw this advice in one of the issues on the Airbyte GitHub. I am wondering how I can apply it if I run Airbyte on Google Compute Engine using the standard setup. Is it possible to modify the file via a docker command while upgrading Airbyte in an SSH session in GCP?
  • s

    Shubham Pinjwani

    04/05/2022, 1:28 PM
Can I use one dbt script for all the tables, or will I have to do it separately for each table? I want to include a constant value as a column after the dbt transformation.