# advice-data-ingestion

  • Charles VERLEYEN
    05/24/2022, 4:57 PM
    Hello, we have set up a connection from a PostgreSQL database on AWS with a sink to BigQuery. All is working well, except that Airbyte creates one BigQuery dataset per schema we have in PostgreSQL. Is it possible to somehow force Airbyte to write all the data into a single BigQuery dataset rather than creating multiple datasets?

  • Eugene Krall
    05/25/2022, 10:11 AM
    If I update to a newer version of Airbyte, will I have to reset my data storage?

  • Damian Crisafulli
    05/25/2022, 1:08 PM
    Hey everyone, I’ve got a Freshdesk -> Redshift connection failing on the normalization step.
    Airbyte version: 0.39.1-alpha
    Source: Freshdesk (0.2.11)
    Destination: Redshift (0.3.35)
    Environment: k8s
    Here is the error from the logs:
    2022-05-25 12:06:31 normalization > 12:06:31  Finished running 12 incremental models in 205.05s.
    2022-05-25 12:06:31 normalization > 12:06:31
    2022-05-25 12:06:31 normalization > 12:06:31  Completed with 1 error and 0 warnings:
    2022-05-25 12:06:31 normalization > 12:06:31
    2022-05-25 12:06:31 normalization > 12:06:31  Database Error in model freshdesk_tickets (models/generated/airbyte_incremental/airbyte/freshdesk_tickets.sql)
    2022-05-25 12:06:31 normalization > 12:06:31    Invalid input
    2022-05-25 12:06:31 normalization > 12:06:31    DETAIL:
    2022-05-25 12:06:31 normalization > 12:06:31      -----------------------------------------------
    2022-05-25 12:06:31 normalization > 12:06:31      error:  Invalid input
    2022-05-25 12:06:31 normalization > 12:06:31      code:      8001
    2022-05-25 12:06:31 normalization > 12:06:31      context:   CONCAT() result too long for type varchar(65535)
    2022-05-25 12:06:31 normalization > 12:06:31      query:     10971779
    2022-05-25 12:06:31 normalization > 12:06:31      location:  string_ops.cpp:110
    2022-05-25 12:06:31 normalization > 12:06:31      process:   query2_114_10971779 [pid=31540]
    2022-05-25 12:06:31 normalization > 12:06:31      -----------------------------------------------
    2022-05-25 12:06:31 normalization > 12:06:31
    2022-05-25 12:06:31 normalization > 12:06:31  Done. PASS=11 WARN=0 ERROR=1 SKIP=0 TOTAL=12
    Is there a way to configure normalization such that it truncates values that are too long?

  • Ben Jordan
    05/25/2022, 2:25 PM
    Hi, we're using the Google Ads connector, which works well, but the token expired and it does not seem that Airbyte is refreshing it. Does anyone know if this is a bug or simply not yet implemented? Why provide the refresh token if it won't be used?

  • Simon Thelin
    05/25/2022, 4:24 PM
    Hello. I have a quick question (hope it is the right place). Is it possible to set partitioning when loading data as parquet to S3? In my case I have a Postgres -> S3 connection. It feels a bit off that this functionality is not there, since it is quite a natural thing to want to do. I can’t find any setting for it currently. Cheers
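    For illustration, here is a minimal post-processing sketch (not an Airbyte setting) that rewrites the parquet files a sync produced into a Hive-partitioned layout with pyarrow; the "my_stream" paths and the "created_date" partition column are hypothetical placeholders.
    import pyarrow.dataset as ds
    import pyarrow.parquet as pq

    # Read the unpartitioned parquet files written by the S3 destination (local copy for this sketch).
    table = ds.dataset("airbyte_output/my_stream/", format="parquet").to_table()

    # Rewrite them as a Hive-partitioned dataset, e.g. partitioned/my_stream/created_date=2022-05-25/...
    pq.write_to_dataset(table, root_path="partitioned/my_stream/", partition_cols=["created_date"])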

  • Coşkan Selçuk
    05/25/2022, 5:55 PM
    Hi all. I am a Data Solutions Architecture Manager from Turkey. I needed to consume data from an API. I am aware that the HTTP source has been graveyarded. However, it is really high maintenance to develop my own connector using the SDK when I only need to send a simple HTTP request and receive a JSON file as the result. I would appreciate a "Walking Dead" episode regarding the HTTP source.
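    For context, here is a minimal sketch of what such a single-request source can look like using the Python CDK's HttpStream class; the endpoint URL, path and stream name are hypothetical, and the exact base-class interface should be checked against the CDK version in use.
    from typing import Any, Iterable, Mapping, Optional

    import requests
    from airbyte_cdk.sources.streams.http import HttpStream


    class SimpleJsonStream(HttpStream):
        url_base = "https://api.example.com/"  # hypothetical API base URL
        primary_key = None                     # plain JSON dump, no primary key

        def path(self, **kwargs) -> str:
            return "v1/report.json"            # hypothetical endpoint returning a JSON document

        def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
            return None                        # single request, no pagination

        def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
            # Emit the whole JSON payload as one record (iterate instead if it is a list).
            yield response.json()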

  • Yifan Sun
    05/26/2022, 12:50 AM
    Hey guys! I'm building a source-workday connector using Singer. When I run 'pip install -r requirement.txt', it always gives me a ResolutionTooDeep(max_rounds) error. Anyone know why? I didn't modify the setup.py file, but should I?

  • Sergi Gómez
    05/26/2022, 10:24 AM
    Hey Product folks working at Airbyte! When will it be possible to add a new table to a source without having to refresh the whole source schema and therefore having to cherry-pick the tables one by one over and over again? I am a hardcore Airbyte user (I’ve implemented it across many teams and companies), and I beg you for this feature, the cherry-picking is REALLY annoying. Otherwise, please let me know which other channel or platform I should address this to. Thanks for making my voice heard!! 🙏

  • Hawkar Mahmod
    05/26/2022, 3:02 PM
    Hey folks, I’m not sure if this is the best channel; it feels like an ingestion concern, but please let me know if you think it belongs elsewhere. I set up an Airbyte server over a year ago to get data from a production Aurora (MySQL) database. We had some big tables (10M+ rows) and Airbyte really struggled with them. I was told Airbyte wasn’t very suitable for such loads at that time. I wonder if the state of things has changed and we can confidently use Airbyte to replicate data between this database and Snowflake/Redshift on a regular basis (every hour). Most tables don’t grow that fast; it’s just a handful of tables that might gain 100k records per day.

  • Yifan Sun
    05/26/2022, 10:19 PM
    Hey folks, I'm building a source-workday connector with the Airbyte CDK (Python). In the README, the "build via Gradle" step needs a Java JDK. Is this a prerequisite for, or an equivalent to, the later docker-image build step? Thanks!

  • Pascal Cohen
    05/27/2022, 12:29 PM
    Hi, I would like to set up a custom connection to a gRPC service. I did not find any relevant existing source, so I started my own custom source (a good exercise in any case) by using generate.sh to scaffold a Python project. I wonder how to deal with the state passed as a parameter to the read method in order to implement incremental behavior. Typically the signature is:
    def read(
        self, logger: AirbyteLogger, config: json, catalog: ConfiguredAirbyteCatalog, state: Dict[str, any]
    ) -> Generator[AirbyteMessage, None, None]:
    And when I return an AirbyteMessage, there are several places where I could put the state:
    yield AirbyteMessage(
        type=Type.RECORD,
        record=AirbyteRecordMessage(stream=stream_name, data=data, emitted_at=int(datetime.now().timestamp()) * 1000,),
        state=AirbyteStateMessage(data= XXX,
                                  global_=YYY,
                                  streams=[ZZZ])
    )
    I am not sure how to deal with that. Furthermore, the documentation states that I should deal with state on my own, but what is the point of passing a state in that case? I think I missed something. Any advice on best practice to persist and retrieve the state? In my test case, I simply want to use an incremental id to ask for all the ids after it and store it as the cursor for the next read. Thanks for any help.
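    As a point of reference, here is a minimal sketch (not the official pattern; the "items" stream, "last_id" cursor key and _fetch_items_after helper are hypothetical) of how the legacy read() interface can round-trip state: read the last cursor from the state argument, emit records, then emit a STATE message that the platform persists and passes back on the next run.
    from datetime import datetime
    from typing import Dict, Generator

    from airbyte_cdk.logger import AirbyteLogger
    from airbyte_cdk.models import (
        AirbyteMessage,
        AirbyteRecordMessage,
        AirbyteStateMessage,
        ConfiguredAirbyteCatalog,
        Type,
    )


    # Inside the Source class that generate.sh scaffolded:
    def read(
        self, logger: AirbyteLogger, config: dict, catalog: ConfiguredAirbyteCatalog, state: Dict[str, any]
    ) -> Generator[AirbyteMessage, None, None]:
        last_id = (state or {}).get("items", {}).get("last_id", 0)

        # _fetch_items_after is a placeholder for the gRPC call returning records with id > last_id.
        for item in self._fetch_items_after(last_id):
            last_id = max(last_id, item["id"])
            yield AirbyteMessage(
                type=Type.RECORD,
                record=AirbyteRecordMessage(
                    stream="items", data=item, emitted_at=int(datetime.now().timestamp()) * 1000
                ),
            )

        # Emit state after the records; Airbyte hands this dict back as `state` on the next run.
        yield AirbyteMessage(
            type=Type.STATE,
            state=AirbyteStateMessage(data={"items": {"last_id": last_id}}),
        )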

  • Yudian
    05/27/2022, 8:21 PM
    Hi Airbyte team, not sure if this is the right channel. Recently I have been working on a sync job between two Snowflakes (both source and target are Snowflake). However, if the source table is big (e.g., > 200 GB), the job often fails. I pasted the part where the job triggered some errors; it looks like a big chunk of temporary data is stored in some temporary place and later cannot be fetched?
    2022-05-23 03:24:13 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):301 - Records read: 76853000 (223 GB)
    2022-05-23 03:24:13 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):301 - Records read: 76854000 (223 GB)
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.RestRequest execute
    2022-05-23 03:24:14 source > SEVERE: Error response: HTTP Response code: 403, request: GET https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=**** HTTP/1.1
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.DefaultResultStreamProvider getInputStream
    2022-05-23 03:24:14 source > SEVERE: Error fetching chunk from: https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
    2022-05-23 03:24:14 source > SEVERE: Response status line reason: Forbidden
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
    2022-05-23 03:24:14 source > SEVERE: Response content: <?xml version="1.0" encoding="UTF-8"?>
    2022-05-23 03:24:14 source > <Error><Code>AccessDenied</Code><Message>Request has expired</Message><Expires>2022-05-23T03:24:00Z</Expires><ServerTime>2022-05-23T03:24:15Z</ServerTime><RequestId>KVT0V6FQ3SBDN3VR</RequestId><HostId>5Oztd4n8a6vWsAIHnaNKLMkXfmyYdQS9zwGpS1ebyb1E8JWxqZT8FFCwJWltzEm6hHOUsHnvGMg=</HostId></Error>
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.RestRequest execute
    2022-05-23 03:24:14 source > SEVERE: Error response: HTTP Response code: 403, request: GET https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=**** HTTP/1.1
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.DefaultResultStreamProvider getInputStream
    2022-05-23 03:24:14 source > SEVERE: Error fetching chunk from: https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
    2022-05-23 03:24:14 source > SEVERE: Response status line reason: Forbidden
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
    2022-05-23 03:24:14 source > SEVERE: Response content: <?xml version="1.0" encoding="UTF-8"?>
    2022-05-23 03:24:14 source > <Error><Code>AccessDenied</Code><Message>Request has expired</Message><Expires>2022-05-23T03:24:00Z</Expires><ServerTime>2022-05-23T03:24:15Z</ServerTime><RequestId>KVT87B5B2XRVG33J</RequestId><HostId>K7nziICuSHtr4I40+W08RwiAcd2seylrpGlT5gs36PX0DX7tIhZDsFgWcV1MplB+xDtZ93fADns=</HostId></Error>
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.RestRequest execute
    2022-05-23 03:24:14 source > SEVERE: Error response: HTTP Response code: 403, request: GET https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=**** HTTP/1.1
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.DefaultResultStreamProvider getInputStream
    2022-05-23 03:24:14 source > SEVERE: Error fetching chunk from: https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****
    2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
    2022-05-23 03:24:14 source > SEVERE: Response status line reason: Forbidden
    If I reduce the source table size to within 100 GB, there is no problem. I would like to get some feedback / suggestions on this. Thank you!

  • Siddharth Putuvely
    05/28/2022, 9:31 AM
    Hi Airbyte team, I am trying to set up an append + dedup history connection on a table. I see that all records are present in the _airbyte_raw table, but they are not seen in the SCD table or the final table. The cursor field I am using is the "updated_at" column.
    select max(updated_at) from "PROD_DB"."SOURCE_SCHEMA"."CYCLES";
    -> 2022-05-27T05:56:21.565000
    select max(updated_at) from "PROD_DB"."SOURCE_SCHEMA"."CYCLES_SCD";
    ->2022-05-27T05:56:21.565000;
    select max(_AIRBYTE_DATA:updated_at) from "PROD_DB"."SOURCE_SCHEMA"."_AIRBYTE_RAW_CYCLES";
    -> 2022-05-28T08:04:19.061000;
    Can anybody explain what I am missing? Airbyte version: 0.32.5

  • Ben Nicole
    05/29/2022, 6:03 PM
    Hi, if a new column is added to the source table, will it be reflected automatically on the target side without manually changing the Airbyte settings? If not, what is the best way to cater for this DDL update? Kindly advise. Thank you 🙏

  • Magnus Berg Sletfjerding
    05/30/2022, 11:24 AM
    Hey Airbyte team 😄 We’re looking for a solution for ingesting data from our CockroachDB. In the Airbyte CockroachDB docs, I can’t tell whether the CockroachDB cluster needs the CRDB Enterprise license to work with the Airbyte connector. Do you know where I could find this information? 🙏

  • Ramon Vermeulen
    05/30/2022, 12:05 PM
    I have an API with 3 different endpoints for each model:
    • /data.xml: returns all records up to now
    • /updates.xml: returns all updates since a certain point in time
    • /deletes.xml: returns all deletes since a certain point in time
    What are the best practices for setting up an incremental sync in Python given these 3 endpoints? With only updates it was easy: after reading about the sync modes, I suppose I could use the incremental sync - deduped history concept. But how can I implement this if I also want to handle incremental deletes? Is the only way with this set-up to do a full refresh every time, so incremental isn't possible? Or is the idea that I should add another field to the model in the data warehouse, for instance deleted true/false, and handle the "deletes" as an actual update that sets deleted to true on those records? The upside is that you would still have those records in your data warehouse. Does anyone know of connectors with similar behavior in Airbyte (Python)? It would be nice to look at an example implementation.
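    As an illustration of the soft-delete idea, here is a sketch (with hypothetical helper names and an is_deleted field) of how the two incremental endpoints could be combined so the stream stays compatible with incremental - deduped history instead of requiring a full refresh.
    from typing import Any, Callable, Iterable, Mapping


    def read_incremental_records(
        fetch_updates: Callable[[str], Iterable[dict]],
        fetch_deletes: Callable[[str], Iterable[dict]],
        cursor_value: str,
    ) -> Iterable[Mapping[str, Any]]:
        """fetch_updates/fetch_deletes stand in for calls to /updates.xml and /deletes.xml,
        each returning parsed records newer than cursor_value."""
        for record in fetch_updates(cursor_value):
            record["is_deleted"] = False
            yield record

        for record in fetch_deletes(cursor_value):
            # Emit the delete as a regular record; dedup on the primary key keeps only the
            # latest version, so downstream models can simply filter on is_deleted.
            record["is_deleted"] = True
            yield record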

  • Apostol Tegko
    05/31/2022, 8:55 AM
    Hey all 👋, we updated yesterday from 0.35.29 -> 0.39.5. Since then, we’re not seeing any sources/destinations under settings -> sources. Looking at the requests, it seems that this request is not returning any items: http://localhost:8000/api/v1/source_definitions/list_for_workspace. The same goes for the destination endpoint. We can’t see any errors in the server logs either. Do you have any advice?

  • Vytautas Bartkevičius
    06/01/2022, 5:31 AM
    Hello Airbyte community. I have a question: why does Airbyte delete all the data after editing a connection’s streams? For example, if I add a new stream to a connection, I get this message: "WARNING! Updating the schema will delete all the data for this connection in your destination and start syncing from scratch". Why is that? Why is the data deleted from all streams, not only the new one but also the already existing ones? After this I need to collect all the data from scratch again. How can I prevent this?

  • Pranav Hegde
    06/02/2022, 5:33 AM
    Hey all, we are using Airbyte to ingest data from Mixpanel into BigQuery. It was working fine until now; however, for the past few days we have been getting a schema validation error:
    2022-06-02 05:27:08 INFO i.a.v.j.JsonSchemaValidator(test):56 - JSON schema validation failed.
    errors: $: null found, object expected
    2022-06-02 05:27:08 ERROR i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$1):70 - Validation failed: null
    2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.d.b.u.AbstractBigQueryUploader(uploadData):99 - Final state message is accepted.
    2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.d.b.u.AbstractBigQueryUploader(dropTmpTable):111 - Removing tmp tables...
    2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.d.b.u.AbstractBigQueryUploader(dropTmpTable):113 - Finishing destination process...completed
    2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.d.b.u.AbstractBigQueryUploader(close):85 - Closed connector: AbstractBigQueryUploader{table=_airbyte_raw_indodana_mixpanel_export, tmpTable=_airbyte_tmp_mbo_indodana_mixpanel_export, syncMode=WRITE_APPEND, writer=class io.airbyte.integrations.destination.bigquery.writer.BigQueryTableWriter, recordFormatter=class io.airbyte.integrations.destination.bigquery.formatter.DefaultBigQueryRecordFormatter}
    2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.b.IntegrationRunner(runInternal):171 - Completed integration: io.airbyte.integrations.destination.bigquery.BigQueryDestination
    2022-06-02 05:27:08 ERROR i.a.w.DefaultReplicationWorker(run):141 - Sync worker failed.
    java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: Source process exited with non-zero exit code 137
    	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
    	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
    	at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:134) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    	at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:49) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:174) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    	at java.lang.Thread.run(Thread.java:833) [?:?]
    	Suppressed: io.airbyte.workers.WorkerException: Source process exit with code 137. This warning is normal if the job was cancelled.
    		at io.airbyte.workers.protocols.airbyte.DefaultAirbyteSource.close(DefaultAirbyteSource.java:136) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    		at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:118) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    		at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:49) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    		at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:174) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    		at java.lang.Thread.run(Thread.java:833) [?:?]
    Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Source process exited with non-zero exit code 137
    	at io.airbyte.workers.DefaultReplicationWorker.lambda$getReplicationRunnable$2(DefaultReplicationWorker.java:230) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    	... 1 more
    Caused by: java.lang.RuntimeException: Source process exited with non-zero exit code 137
    	at io.airbyte.workers.DefaultReplicationWorker.lambda$getReplicationRunnable$2(DefaultReplicationWorker.java:222) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
    	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    	... 1 more
    Would appreciate any help regarding this issue. We are using the latest version of the Mixpanel and BigQuery connectors.

  • raphaelauv
    06/02/2022, 6:51 PM
    I'm trying to set up Salesforce CDC -> BigQuery. The Salesforce Airbyte connector is not compatible with CDC for Salesforce, is that correct? Thanks

  • ni
    06/02/2022, 7:37 PM
    Hi, I'm attempting to use Airbyte to pull data from a Smartsheet table into a PostgreSQL db. Connectivity appears to work fine; however, when I check the data imported into PostgreSQL, the values are placed under the wrong columns. I searched around on the internet but was unable to find any relevant answers. Can anyone here help me sort this out? It also appears that I'm not the only person who has experienced this, but I'm failing to find any resolutions: https://github.com/airbytehq/airbyte/issues/5520

  • HKR
    06/03/2022, 11:13 AM
    Hey, sorry if this has been asked before, but I can't find an answer to it: is Airbyte able to ingest an unstructured file somehow? My use case would be to download the file from some URL and save it either in the file system or in MongoDB.
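    Outside of Airbyte, the job described above is essentially the following sketch (the URL, database name and connection string are hypothetical placeholders): download the file and store it on disk or in MongoDB via GridFS.
    import gridfs
    import requests
    from pymongo import MongoClient

    url = "https://example.com/reports/latest.pdf"  # hypothetical source URL
    response = requests.get(url, timeout=60)
    response.raise_for_status()

    # Option 1: save to the local file system.
    with open("latest.pdf", "wb") as f:
        f.write(response.content)

    # Option 2: store the raw bytes in MongoDB via GridFS.
    client = MongoClient("mongodb://localhost:27017")
    fs = gridfs.GridFS(client["ingestion"])
    fs.put(response.content, filename="latest.pdf", source_url=url)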

  • Adam Bloom
    06/03/2022, 4:32 PM
    Hi folks, Not sure if this is the right channel, so feel free to redirect me elsewhere. We've been working with airbyte for a few weeks now quite heavily and are preparing to deploy in k8s in our production environment. Due to the type of data we handle, we have very strict security requirements. Even though this lives in our own k8s cluster, we require all traffic between pods to be encrypted - this of course will include the airbyte source and destination pods. We attempted to do this with a proxy/service mesh, but discovered that the airbyte protocol is not very proxy friendly (specifically, airbyte-worker -> destination containers when no records are found in the source). We're willing to implement TLS with a self-signed CA (we've used https://cert-manager.io/ elsewhere) between airbyte-worker and the remote-stdin and relay-stdout/relay-stderr containers. Before doing so, we wanted to make sure that we weren't missing a roadmap item to address this (we noticed that the blog post describing the current architecture refers to a future v2...). We also wanted to ensure that this change is one that airbyte would be willing to accept (presumably needs to be configurable and not on by default).

  • ijac wei
    06/06/2022, 2:53 AM
    Hi, does anyone ingest data from Notion? Does anyone know what the id in “_airbyte_data” is for? I thought it was “_airbyte_ab_id”, but it is not.

  • Bastien Gandouet
    06/06/2022, 3:40 PM
    Hi everyone! My Amplitude source is stuck in an infinite loop as described here. Is anyone else experiencing that? Any workaround?

  • João Pedro Smielevski Gomes
    06/06/2022, 6:45 PM
    Hi everyone. I'm trying to ingest a custom model from Zoho CRM, but it is always raising the IncompleteMetaDataException. The weird thing is that some custom models are being loaded, but the one we need only loads 2 old test records, and the table name comes with the "x" prefix, which I believe has something to do with incremental sync. We are using a super admin account with all the permissions listed in the documentation. We've also tried to add the stream to the configured_catalog.json and test_stream_factory.py (even if, from what I understand, it should build the schemas automatically) and rebuild the image, but it did not work. Am I missing some config or, since the connector is still in alpha, do I have to write new code to pull the custom modules?

  • Prashant Golash
    06/07/2022, 3:03 AM
    Hi, I would like to know if there is a programmatic way to fetch the sync history for a particular connection (context: I am planning to add some monitoring/notifications on a periodic basis). If there are any other suggestions, please let me know them as well.
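    For what it's worth, a sketch along these lines (assuming the internal config API is reachable on localhost:8000 and that the jobs/list endpoint exists in the deployed version; the connection id is a placeholder) would pull the job history for one connection:
    import requests

    AIRBYTE_API = "http://localhost:8000/api/v1"
    CONNECTION_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical connection id

    # Ask for all sync jobs belonging to this connection.
    resp = requests.post(
        f"{AIRBYTE_API}/jobs/list",
        json={"configTypes": ["sync"], "configId": CONNECTION_ID},
    )
    resp.raise_for_status()

    for job in resp.json().get("jobs", []):
        info = job["job"]
        print(info["id"], info["status"], info["createdAt"])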

  • gunu
    06/07/2022, 3:57 AM
    Hey team, can we please sort out the MySQL source CDC issues for large tables? I’m not sure any Airbyte user is able to successfully perform DB replication on a large table using CDC, and the only real benefit of CDC is when migrating large datasets. Is it on the roadmap anywhere? I wasn’t able to see it here, and it feels like a key feature for Airbyte (at least when comparing against competitors). I keep stumbling upon the article "Airbyte Commoditizes Database Replication by Open-Sourcing Log-Based Change Data Capture", which ideally I’d like to champion to other users interested in implementing Airbyte.

  • Kishore Sahoo
    06/07/2022, 6:04 AM
    Has anyone used Airbyte to send data to an API, i.e. an API as a destination?
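    For context, a custom API destination essentially boils down to the following sketch: the destination reads the Airbyte messages the platform pipes to its stdin, forwards RECORD payloads to the target API, and echoes STATE messages back so syncs can checkpoint. The endpoint URL is hypothetical and this is not an existing connector.
    import json
    import sys

    import requests

    API_URL = "https://api.example.com/ingest"  # hypothetical target endpoint

    # Airbyte pipes the source's output to the destination's stdin as JSON lines.
    for line in sys.stdin:
        message = json.loads(line)
        if message.get("type") == "RECORD":
            requests.post(API_URL, json=message["record"]["data"], timeout=30).raise_for_status()
        elif message.get("type") == "STATE":
            # Emit the state message back on stdout so Airbyte can checkpoint the sync.
            print(line, end="", flush=True)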

  • Abhiruchi Shinde
    06/08/2022, 3:09 AM
    Hi team, I have created a connector in Airbyte and am trying to pull data from Zendesk to SQL, but the incremental load is failing with a normalization error. Can someone please help me with the error?