# feedback-and-requests
  • Eugene Krall
    01/25/2022, 9:55 AM
    Is the normalisation routine performed on the entire "raw data" table when a new sync happens? I am using my own dbt transformations with MongoDB as a source, and I'm wondering what happens if I remove all the tables except the "raw data" one while changing the schema directly in dbt. Will the final table be fully rebuilt from the raw data table on a subsequent sync?
  • Anatole Callies
    01/26/2022, 8:46 AM
    Hi, is this page discontinued: https://docs.airbyte.com/project-overview/changelog/platform? It was nice to have a summarized view of changes, because it's hard to see what is important on the GitHub releases page: https://github.com/airbytehq/airbyte/releases. Thanks
  • Elias Djurfeldt
    01/26/2022, 8:11 PM
    Hey, not sure anyone saw this yet, but I have 2 bug reports regarding the Google Secret Manager implementation for storing Airbyte secrets: https://airbytehq.slack.com/archives/C019WJFR1JT/p1643104958277900?thread_ts=1642764265.216300&cid=C019WJFR1JT
  • Jens
    01/27/2022, 8:29 AM
    Good morning! Airbyte is great, but it has a major issue with tmp tables being created and not cleaned up after failed syncs (we ingest into the Postgres connector). This constantly causes out-of-disk situations in our setup. The Microsoft SQL Server and Amazon Ads connectors in particular trigger this bug for us. I am thinking about setting up a cron job that drops tmp tables that are not referenced by a running sync. Any ideas how to find those through SQL?
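    A minimal sketch of the catalog lookup, in Python. It assumes Airbyte's Postgres staging tables carry an _airbyte_tmp_ prefix (verify against your own schema first) and that dropping them is only safe while no sync is running:

        # List (and optionally drop) leftover Airbyte staging tables in Postgres.
        # Assumption: staging tables are named with an "_airbyte_tmp_" prefix.
        import psycopg2

        conn = psycopg2.connect("dbname=warehouse user=airbyte host=localhost")  # placeholder DSN
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                SELECT schemaname, tablename
                FROM pg_catalog.pg_tables
                WHERE tablename LIKE '\\_airbyte\\_tmp\\_%'
                """
            )
            for schema, table in cur.fetchall():
                print(f"leftover staging table: {schema}.{table}")
                # Only drop once you've confirmed no running sync references it:
                # cur.execute(f'DROP TABLE IF EXISTS "{schema}"."{table}"')
        conn.close()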
  • Nahid Oulmi
    01/27/2022, 2:17 PM
    Hello there, I was wondering if it is possible to schedule a “Reset your data” programmatically, preferably using an Airbyte/Airflow operator. The use case is a MongoDB collection receiving update operations that we won't be able to replicate unless the Airbyte connection is reset periodically. Thanks!
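    A minimal sketch of triggering a reset through the OSS API, e.g. from an Airflow PythonOperator. The /connections/reset endpoint and payload shape are assumptions to verify against your Airbyte version:

        # Queue a "Reset your data" job for a connection via the Airbyte API.
        # URL and connection id are placeholders.
        import requests

        AIRBYTE_URL = "http://localhost:8000/api/v1"
        CONNECTION_ID = "<your-connection-uuid>"

        resp = requests.post(
            f"{AIRBYTE_URL}/connections/reset",
            json={"connectionId": CONNECTION_ID},
            timeout=30,
        )
        resp.raise_for_status()
        print(resp.json())  # the queued reset job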
  • Daniele
    01/27/2022, 6:05 PM
    Hi, I see in the Airbyte docs that the Oracle LogMiner integration is "coming soon". Any idea when "soon" could be?
  • Ping-Lin Chang
    01/27/2022, 10:12 PM
    Hello folks, I am from Instill AI, a startup doing visual data preparation. We are keen to adopt Airbyte’s standard for the data sources and destinations of our open-source visual data pipeline. However, I just realised that Airbyte currently doesn’t support unstructured data at all (as either source or destination). I am wondering what’s the most efficient way to make this happen? Is there any technical obstacle blocking you from handling unstructured data, or is unstructured data even on the roadmap? We are keen to contribute to this feature too. Thanks.
  • Kyle Mok
    01/28/2022, 4:32 PM
    hi all, wondering why, with an incremental | append sync method in Snowflake, the final destination table is still dropped and refreshed? It seems a bit inefficient and counter-intuitive
  • Armand
    01/30/2022, 1:59 PM
    Hello all, do you have an ETA on when Redshift Source will support incremental extraction?
  • Sadik Bakiu
    01/30/2022, 2:58 PM
    Hey folks, I was trying to use the partners contact form (https://airbyte.com/partners#contact-us) but it seems to be broken. Who would be the right person to contact about partnerships? Cheers
  • Jefferson da Silva Martins
    01/31/2022, 3:23 PM
    Hi there, is there a way I can use a service name instead of an SID for an Oracle connection?
  • Gorkem Yurtseven
    01/31/2022, 5:20 PM
    Hello airbyte team! Any updates on the roadmap regarding the "declarative interface"? https://docs.airbyte.com/project-overview/roadmap#coming-within-a-few-weeks-months Is there a design doc or spec I can have a peek at?
  • thomas holvoet
    01/31/2022, 7:54 PM
    Hey guys! I’m working on a start-up runconverge.com. We are building ML-based automation pipelines for e-commerce businesses. To scale the data infrastructure, we are currently exploring the option of integrating Airbyte into our architecture. However, I have a couple of questions I would love some advice on! I’ll elaborate in the replies 🙂 Thanks!
  • Ben la Grange
    02/01/2022, 6:59 AM
    Has anyone had success with the Freshsales connector? I’m trying Freshsales -> BigQuery and can get it to build empty tables in BigQuery, but haven’t been able to get any data transferred.
  • Bob De Schutter
    02/01/2022, 8:01 AM
    Hi airbyte team 🙂 I'm wondering if it would be possible to let us name connections. It would be helpful to assign custom names to different connections in order to tell them apart more easily in the UI. Right now, when you have multiple connections with the same source and destination, it's kind of hard to see the difference between them in the UI...
  • Andreas
    02/01/2022, 1:12 PM
    Hi! I have some general feedback on connectors I've tested: both Facebook connectors as well as the Google Search Console connector are really difficult to set up, and for Facebook I didn't even manage to create a long-lived page token in a reasonable time (the docs aren't helping here either). Coming from Hevo and Fivetran, the process feels really clunky. There should be an OAuth flow taking care of the login wherever possible instead.
  • Paul Payne
    02/01/2022, 5:44 PM
    Hi all, I am new and want to get onboarded and become a Partner -- it says there is a waitlist? Is anyone from Airbyte able to assist?
  • Jay Bujala
    02/01/2022, 6:06 PM
    Could you please introduce a clean-up mechanism for failed pods? There was an issue in the connector, but the pods were stuck in Pending forever.
  • Martin Prejean
    02/01/2022, 8:00 PM
    Hello Airbyte Team! First things first, thank you for developing Airbyte, it is a fantastic tool! :)) I transferred GA3 (Universal Analytics) data to BigQuery and it worked like a charm. Sadly, I didn't see any tables created for the Ecommerce data. Do you know if that is currently in development or not? (It would be really helpful for transferring historical data!) I searched this channel but didn't find any response on the subject (maybe it's labeled on a roadmap, but I didn't see anything). Thank you for your answer, /Martin
  • juan manuel martinez gonzalez
    02/02/2022, 6:16 PM
    Hi!!! I'm trying to take tables from a Shopify API to Snowflake, and basic normalization fails on the abandoned checkouts endpoint in the customer field. The log says:

        2022-02-01 16:16:47 normalization > Database Error in model ABANDONED_CHECKOUTS_CUSTOMER (models/generated/airbyte_incremental/SHOPIFY/ABANDONED_CHECKOUTS_CUSTOMER.sql)
        2022-02-01 16:16:47 normalization >   100037 (22018): Boolean value '[]' is not recognized
        2022-02-01 16:16:47 normalization >   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_incremental/SHOPIFY/ABANDONED_CHECKOUTS_CUSTOMER.sql
        2022-02-01 16:16:47 normalization > Done. PASS=254 WARN=0 ERROR=1 SKIP=2 TOTAL=257

    If anyone can help, I'm grateful.
  • Collin Scangarella
    02/02/2022, 8:10 PM
    Is there any way to ELT email data?
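    In the absence of a dedicated connector, a minimal sketch of the extract step over IMAP (host, credentials, and mailbox are placeholders):

        # Pull messages over IMAP into flat records that can be loaded anywhere.
        import email
        import imaplib

        with imaplib.IMAP4_SSL("imap.example.com") as imap:  # placeholder host
            imap.login("user@example.com", "app-password")   # placeholder creds
            imap.select("INBOX")
            _, data = imap.search(None, "ALL")
            for num in data[0].split()[:10]:  # first 10 messages as a demo
                _, msg_data = imap.fetch(num, "(RFC822)")
                msg = email.message_from_bytes(msg_data[0][1])
                record = {"subject": msg["Subject"], "from": msg["From"], "date": msg["Date"]}
                print(record)  # replace with a write to your destination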
  • Kyle Cheung
    02/02/2022, 9:55 PM
    anyone have a solution for PagerDuty -> Snowflake?
  • Ameya Bapat
    02/03/2022, 9:50 AM
    Hi, we have the following requirements, listed by the domain they might relate to.
    • S3 source:
        1. Support CSV along with JSON.
    • S3 destination:
        1. Output files should be partitionable by a configured file size (e.g. 10 MB each) instead of dumping one huge file (~GB) every sync.
        2. If a CSV value contains nested data, the output CSV creates multiple rows to represent a single record. This disturbs our CSV consumers' processing and the row/record counts.
    • Snowflake source:
        - There should be a way to filter some columns out of syncing, as it is sometimes not advisable to sync all columns to the destination; some columns could contain irrelevant or sensitive information.
    • Connection:
        - It should take a first-sync start time for the connection.
        - Along with a frequency, it should also support day/time schedules (e.g. every Monday at 3pm, every day at 1pm) or cron strings.
    • Sync jobs:
        - An observer/subscriber callback model to inform external systems about job completion. The callback could carry all the job details (e.g. success/failure, data synced, record count). Currently an external system has to poll the jobs API for status, and since we don't know how long a job could take, this forces periodic calls from the external systems (see the polling sketch below).
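    For contrast, a minimal sketch of the polling loop external systems need today. The /jobs/get endpoint and payload are assumptions based on the OSS config API; verify against your Airbyte version:

        # Poll the Airbyte jobs API until a sync reaches a terminal status.
        import time
        import requests

        AIRBYTE_URL = "http://localhost:8000/api/v1"  # placeholder instance
        JOB_ID = 12345                                # placeholder job id

        while True:
            resp = requests.post(f"{AIRBYTE_URL}/jobs/get", json={"id": JOB_ID}, timeout=30)
            resp.raise_for_status()
            status = resp.json()["job"]["status"]
            if status in ("succeeded", "failed", "cancelled"):
                print(f"job {JOB_ID} finished: {status}")
                break
            time.sleep(60)  # polling interval is a guess; tune to taste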
  • Dan Siegel
    02/03/2022, 8:35 PM
    Is it possible to do string quoting with the S3 CSV destination?
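    For reference, the behaviour being asked about, shown with Python's csv module (whether the S3 destination exposes an equivalent option is the open question):

        # Quote non-numeric fields when writing CSV, so strings with commas
        # or embedded quotes stay intact.
        import csv
        import io

        buf = io.StringIO()
        writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(["id", "note"])
        writer.writerow([1, 'contains a comma, and a "quote"'])
        print(buf.getvalue())
        # "id","note"
        # 1,"contains a comma, and a ""quote"""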
  • Alessandro Pietrobon
    02/07/2022, 10:21 AM
    hi team, first of all, we love our Airbyte tests so far. It took a bit to set up, but we're good now. Quick question: we are looking for a way to scan through a directory (on S3 or Google Drive) which contains a series of CSV files. Instead of connecting to each one, we'd want to scan the directory and combine all the results into the same table (each file has consistent headers; they are versions of the same report). Is this something you think is achievable with a custom connector, or is it outside the scope of Airbyte entirely? Thanks
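    The core of such a connector is just "list matching files, read each, emit rows". A minimal local sketch with pandas, assuming consistent headers (paths and pattern are placeholders; on S3 the same idea works with boto3's list_objects_v2 instead of glob):

        # Combine every CSV in a directory into one table.
        import glob
        import pandas as pd

        frames = [pd.read_csv(path) for path in glob.glob("reports/*.csv")]
        combined = pd.concat(frames, ignore_index=True)
        print(combined.shape)  # one table, all report versions appended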
  • Naveen Sai Patnana
    02/07/2022, 11:24 AM
    Hi Team, we triggered 2 jobs where the source is Google Ads (airbyte/source-google-ads:0.1.20) and the destination is Snowflake (airbyte/destination-snowflake:0.3.14). All of a sudden, the Airbyte server crashed with the error "DEADLINE_EXCEEDED: deadline exceeded after 69.999905719s". We saw the related issues on GitHub and we are running on the suggested configuration. It used to work fine in Airbyte version 0.29.22, but we are facing this issue after upgrading to 0.35.15. Airbyte version: 0.35.15-alpha. Server details: Ubuntu 20.04.3 LTS, 16 GB RAM, 4-core CPU, 200 GB disk space. I'm adding the job logs for better understanding:
    2022-02-07 09:24:27 WARN i.t.i.r.GrpcSyncRetryer(retry):56 - Retrying after failure
    io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 69.999905719s. [closed=[], open=[[remote_addr=airbyte-temporal/192.168.144.6:7233]]]
        at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[grpc-stub-1.42.1.jar:1.42.1]
        at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[grpc-stub-1.42.1.jar:1.42.1]
        at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[grpc-stub-1.42.1.jar:1.42.1]
        at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.getWorkflowExecutionHistory(WorkflowServiceGrpc.java:2642) ~[temporal-serviceclient-1.6.0.jar:?]
        at io.temporal.internal.client.WorkflowClientLongPollHelper.lambda$getInstanceCloseEvent$0(WorkflowClientLongPollHelper.java:143) ~[temporal-sdk-1.6.0.jar:?]
        at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:61) ~[temporal-serviceclient-1.6.0.jar:?]
        at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:51) ~[temporal-serviceclient-1.6.0.jar:?]
        at io.temporal.internal.client.WorkflowClientLongPollHelper.getInstanceCloseEvent(WorkflowClientLongPollHelper.java:131) ~[temporal-sdk-1.6.0.jar:?]
        at io.temporal.internal.client.WorkflowClientLongPollHelper.getWorkflowExecutionResult(WorkflowClientLongPollHelper.java:72) ~[temporal-sdk-1.6.0.jar:?]
        at io.temporal.internal.client.RootWorkflowClientInvoker.getResult(RootWorkflowClientInvoker.java:93) ~[temporal-sdk-1.6.0.jar:?]
        at io.temporal.internal.sync.WorkflowStubImpl.getResult(WorkflowStubImpl.java:243) ~[temporal-sdk-1.6.0.jar:?]
        at io.temporal.internal.sync.WorkflowStubImpl.getResult(WorkflowStubImpl.java:225) ~[temporal-sdk-1.6.0.jar:?]
        at io.temporal.internal.sync.WorkflowInvocationHandler$SyncWorkflowInvocationHandler.startWorkflow(WorkflowInvocationHandler.java:315) ~[temporal-sdk-1.6.0.jar:?]
        at io.temporal.internal.sync.WorkflowInvocationHandler$SyncWorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:270) ~[temporal-sdk-1.6.0.jar:?]
        at io.temporal.internal.sync.WorkflowInvocationHandler.invoke(WorkflowInvocationHandler.java:178) ~[temporal-sdk-1.6.0.jar:?]
        at jdk.proxy2.$Proxy40.run(Unknown Source) ~[?:?]
        at io.airbyte.workers.temporal.TemporalClient.lambda$submitSync$3(TemporalClient.java:148) ~[io.airbyte-airbyte-workers-0.35.15-alpha.jar:?]
        at io.airbyte.workers.temporal.TemporalClient.execute(TemporalClient.java:439) ~[io.airbyte-airbyte-workers-0.35.15-alpha.jar:?]
        at io.airbyte.workers.temporal.TemporalClient.submitSync(TemporalClient.java:147) ~[io.airbyte-airbyte-workers-0.35.15-alpha.jar:?]
        at io.airbyte.workers.worker_run.TemporalWorkerRunFactory.lambda$createSupplier$0(TemporalWorkerRunFactory.java:83) ~[io.airbyte-airbyte-workers-0.35.15-alpha.jar:?]
        at io.airbyte.workers.worker_run.WorkerRun.call(WorkerRun.java:51) [io.airbyte-airbyte-workers-0.35.15-alpha.jar:?]
        at io.airbyte.workers.worker_run.WorkerRun.call(WorkerRun.java:22) [io.airbyte-airbyte-workers-0.35.15-alpha.jar:?]
        at io.airbyte.commons.concurrency.LifecycledCallable.execute(LifecycledCallable.java:94) [io.airbyte-airbyte-commons-0.35.15-alpha.jar:?]
        at io.airbyte.commons.concurrency.LifecycledCallable.call(LifecycledCallable.java:78) [io.airbyte-airbyte-commons-0.35.15-alpha.jar:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
  • Kyle Mok
    02/07/2022, 2:53 PM
    hey all, it would be nice to be able to name our connections. As of right now, I have to click into each of our connections to see which tables the connection houses.
  • Kyle Mok
    02/07/2022, 4:49 PM
    It would also be helpful to have compound cursors for incremental loading (i.e. selecting multiple fields as the cursor). See the sketch below.
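    To illustrate what a compound cursor buys you: comparison happens on a tuple of fields, so ties on the first field are broken by the second. A minimal sketch (field names are illustrative):

        # Tuple comparison is lexicographic, like SQL's (a, b) > (x, y).
        last_cursor = ("2022-02-07T00:00:00Z", 41)  # (updated_at, id) from the last sync

        rows = [
            ("2022-02-07T00:00:00Z", 41, "already synced"),
            ("2022-02-07T00:00:00Z", 42, "same timestamp, higher id -> new"),
            ("2022-02-08T09:00:00Z", 7, "later timestamp -> new"),
        ]

        new_rows = [r for r in rows if (r[0], r[1]) > last_cursor]
        print(new_rows)  # only the two unseen records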
  • Ram
    02/08/2022, 8:29 AM
    Hi Team, I see a lot of source & destination connectors, but on the documentation page I could not find the actual Git page for the sources. For example, if a source is Redshift, where can I find the Git repo of the Redshift source connector?
  • Rachel RIZK
    02/08/2022, 11:14 AM
    Hi Airbyte team, first, thanks a lot for developing this project! I have a question regarding logs. We'd like to create Slack alerting that pings specific data owners (1 connection = 1 owner) when something bad happens during a sync. So far, we:
    • sent our logs to an external monitoring tool ✅
    • trigger a Slack message in a specific channel when a sync has a failed status in the worker logs ✅
    • however, we're missing the connection/config id in the sync summary log, which would let us map the failure to a data owner (stored in a table) ❌
    For now, the only solution I see is to get the job id (which is in the sync summary) and use the API to get the corresponding config id. By any chance, do you see a better way to retrieve the connection/config id in the sync summary logs? Thanks!
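    A minimal sketch of that job-id -> config-id lookup; the /jobs/get endpoint and the configId field are assumptions based on the OSS config API, so verify against your Airbyte version:

        # Resolve the connection (config) id from a job id found in the logs.
        import requests

        AIRBYTE_URL = "http://localhost:8000/api/v1"  # placeholder instance
        job_id = 12345                                # parsed from the sync summary log

        resp = requests.post(f"{AIRBYTE_URL}/jobs/get", json={"id": job_id}, timeout=30)
        resp.raise_for_status()
        connection_id = resp.json()["job"]["configId"]  # for sync jobs: the connection uuid
        print(connection_id)  # key into the owners table to find who to ping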