Charles VERLEYEN
05/24/2022, 4:57 PM
Eugene Krall
05/25/2022, 10:11 AM
Damian Crisafulli
05/25/2022, 1:08 PM
2022-05-25 12:06:31 normalization > 12:06:31 Finished running 12 incremental models in 205.05s.
2022-05-25 12:06:31 normalization > 12:06:31
2022-05-25 12:06:31 normalization > 12:06:31 Completed with 1 error and 0 warnings:
2022-05-25 12:06:31 normalization > 12:06:31
2022-05-25 12:06:31 normalization > 12:06:31 Database Error in model freshdesk_tickets (models/generated/airbyte_incremental/airbyte/freshdesk_tickets.sql)
2022-05-25 12:06:31 normalization > 12:06:31 Invalid input
2022-05-25 12:06:31 normalization > 12:06:31 DETAIL:
2022-05-25 12:06:31 normalization > 12:06:31 -----------------------------------------------
2022-05-25 12:06:31 normalization > 12:06:31 error: Invalid input
2022-05-25 12:06:31 normalization > 12:06:31 code: 8001
2022-05-25 12:06:31 normalization > 12:06:31 context: CONCAT() result too long for type varchar(65535)
2022-05-25 12:06:31 normalization > 12:06:31 query: 10971779
2022-05-25 12:06:31 normalization > 12:06:31 location: string_ops.cpp:110
2022-05-25 12:06:31 normalization > 12:06:31 process: query2_114_10971779 [pid=31540]
2022-05-25 12:06:31 normalization > 12:06:31 -----------------------------------------------
2022-05-25 12:06:31 normalization > 12:06:31
2022-05-25 12:06:31 normalization > 12:06:31 Done. PASS=11 WARN=0 ERROR=1 SKIP=0 TOTAL=12
Is there a way to configure normalization such that it truncates values that are too long?
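For what it's worth, a minimal workaround sketch in Python, assuming you clip long string values yourself before normalization ever sees them (I'm not aware of a built-in setting); truncate_record and MAX_VARCHAR are hypothetical names, and 65535 is Redshift's varchar ceiling from the error above:

# Hypothetical helper, not an Airbyte setting: clip string values so that
# CONCAT() in normalization stays under Redshift's varchar(65535) limit.
MAX_VARCHAR = 65535

def truncate_record(record: dict, limit: int = MAX_VARCHAR) -> dict:
    """Return a copy of the record with oversized string values clipped."""
    return {
        key: value[:limit] if isinstance(value, str) else value
        for key, value in record.items()
    }

# Example: a 70k-character value comes back clipped to 65535 characters.
clipped = truncate_record({"id": 1, "description": "x" * 70000})
assert len(clipped["description"]) == MAX_VARCHAR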
Ben Jordan
05/25/2022, 2:25 PM
Simon Thelin
05/25/2022, 4:24 PM
Postgres -> S3
It feels a bit off that this functionality isn't there, since this seems like quite a natural thing to want to do. I can't find any setting for it currently.
Cheers
Coşkan Selçuk
05/25/2022, 5:55 PM
Yifan Sun
05/26/2022, 12:50 AM
Sergi Gómez
05/26/2022, 10:24 AM
Hawkar Mahmod
05/26/2022, 3:02 PM
Yifan Sun
05/26/2022, 10:19 PM
Pascal Cohen
05/27/2022, 12:29 PM
def read(
    self, logger: AirbyteLogger, config: json, catalog: ConfiguredAirbyteCatalog, state: Dict[str, any]
) -> Generator[AirbyteMessage, None, None]:
And when I return the AirbyteMessage, there are several places where I could put the state:
yield AirbyteMessage(
    type=Type.RECORD,
    record=AirbyteRecordMessage(
        stream=stream_name,
        data=data,
        emitted_at=int(datetime.now().timestamp()) * 1000,
    ),
    state=AirbyteStateMessage(
        data=XXX,
        global_=YYY,
        streams=[ZZZ],
    ),
)
I am not sure how to deal with that. Furthermore, the documentation states that I should deal with state on my own, but what is the point of passing a state in that case?
I think I missed something.
Any advice on best practices for persisting and retrieving the state?
In my test case, I simply want to use an incremental id, ask for all the ids after it, and store it as a cursor for the next read.
Thanks for any help
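A minimal sketch of one common pattern, assuming the airbyte_cdk model classes from the snippet above: yield RECORD messages as they come and a single, separate STATE message at the end, rather than packing both into one message; fetch_rows_after is a hypothetical stand-in for the real data access:

from datetime import datetime
from typing import Any, Dict, Generator, List

from airbyte_cdk.models import (
    AirbyteMessage,
    AirbyteRecordMessage,
    AirbyteStateMessage,
    Type,
)

def fetch_rows_after(last_id: int) -> List[Dict[str, Any]]:
    # Hypothetical stand-in for the real source: rows with ids above the cursor.
    return [{"id": i, "value": f"row-{i}"} for i in range(last_id + 1, last_id + 4)]

def read_incremental(stream_name: str, state: Dict[str, Any]) -> Generator[AirbyteMessage, None, None]:
    # Resume from the cursor Airbyte passed back in, or start from 0.
    last_id = state.get(stream_name, {}).get("last_id", 0)

    for row in fetch_rows_after(last_id):
        yield AirbyteMessage(
            type=Type.RECORD,
            record=AirbyteRecordMessage(
                stream=stream_name,
                data=row,
                emitted_at=int(datetime.now().timestamp() * 1000),
            ),
        )
        last_id = max(last_id, row["id"])

    # A dedicated STATE message at the end: the platform persists it and
    # hands it back as the `state` argument of the next read().
    yield AirbyteMessage(
        type=Type.STATE,
        state=AirbyteStateMessage(data={stream_name: {"last_id": last_id}}),
    )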
Yudian
05/27/2022, 8:21 PM
2022-05-23 03:24:13 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):301 - Records read: 76853000 (223 GB)
2022-05-23 03:24:13 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):301 - Records read: 76854000 (223 GB)
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.RestRequest execute
2022-05-23 03:24:14 source > SEVERE: Error response: HTTP Response code: 403, request: GET <https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****> HTTP/1.1
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.DefaultResultStreamProvider getInputStream
2022-05-23 03:24:14 source > SEVERE: Error fetching chunk from: <https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****>
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
2022-05-23 03:24:14 source > SEVERE: Response status line reason: Forbidden
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
2022-05-23 03:24:14 source > SEVERE: Response content: <?xml version="1.0" encoding="UTF-8"?>
2022-05-23 03:24:14 source > <Error><Code>AccessDenied</Code><Message>Request has expired</Message><Expires>2022-05-23T03:24:00Z</Expires><ServerTime>2022-05-23T03:24:15Z</ServerTime><RequestId>KVT0V6FQ3SBDN3VR</RequestId><HostId>5Oztd4n8a6vWsAIHnaNKLMkXfmyYdQS9zwGpS1ebyb1E8JWxqZT8FFCwJWltzEm6hHOUsHnvGMg=</HostId></Error>
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.RestRequest execute
2022-05-23 03:24:14 source > SEVERE: Error response: HTTP Response code: 403, request: GET <https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****> HTTP/1.1
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.DefaultResultStreamProvider getInputStream
2022-05-23 03:24:14 source > SEVERE: Error fetching chunk from: <https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****>
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
2022-05-23 03:24:14 source > SEVERE: Response status line reason: Forbidden
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
2022-05-23 03:24:14 source > SEVERE: Response content: <?xml version="1.0" encoding="UTF-8"?>
2022-05-23 03:24:14 source > <Error><Code>AccessDenied</Code><Message>Request has expired</Message><Expires>2022-05-23T03:24:00Z</Expires><ServerTime>2022-05-23T03:24:15Z</ServerTime><RequestId>KVT87B5B2XRVG33J</RequestId><HostId>K7nziICuSHtr4I40+W08RwiAcd2seylrpGlT5gs36PX0DX7tIhZDsFgWcV1MplB+xDtZ93fADns=</HostId></Error>
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.RestRequest execute
2022-05-23 03:24:14 source > SEVERE: Error response: HTTP Response code: 403, request: GET <https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****> HTTP/1.1
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.DefaultResultStreamProvider getInputStream
2022-05-23 03:24:14 source > SEVERE: Error fetching chunk from: <https://sfc-ds1-customer-stage.s3.us-west-2.amazonaws.com/nwmj-s-HIDDEN/results/01a47188-0604-2855-0004-1504c0d4959b_0/main/data_0_6_54?x-amz-server-side-encryption-customer-algorithm=AES256&response-content-encoding=gzip&AWSAccessKeyId=****&Expires=1653276240&Signature=****>
2022-05-23 03:24:14 source > May 23, 2022 3:24:14 AM net.snowflake.client.jdbc.SnowflakeUtil logResponseDetails
2022-05-23 03:24:14 source > SEVERE: Response status line reason: Forbidden
If I reduce the source table size to within 100 GB, there is no problem. I would like to get some feedback / suggestions on this. Thank you!
Siddharth Putuvely
05/28/2022, 9:31 AM
select max(updated_at) from "PROD_DB"."SOURCE_SCHEMA"."CYCLES";
-> 2022-05-27T05:56:21.565000
select max(updated_at) from "PROD_DB"."SOURCE_SCHEMA"."CYCLES_SCD";
-> 2022-05-27T05:56:21.565000
select max(_AIRBYTE_DATA:updated_at) from "PROD_DB"."SOURCE_SCHEMA"."_AIRBYTE_RAW_CYCLES";
-> 2022-05-28T08:04:19.061000
Can anybody explain what I am missing?
Airbyte version: 0.32.5
Ben Nicole
05/29/2022, 6:03 PM
Magnus Berg Sletfjerding
05/30/2022, 11:24 AM
Ramon Vermeulen
05/30/2022, 12:05 PM
• /data.xml
Giving back all records until now
• /updates.xml
Giving back all updates since a certain point in time
• /deletes.xml
Giving back all deletes since a certain point in time
What are the best practices for setting up an incremental sync in Python, given these 3 endpoints? With only updates it was easy: after reading about the sync modes, I suppose I could use the incremental sync - deduped history concept. But how can I implement this if I also want to handle incremental deletes? Or is the only option with this setup a full refresh every time, with incremental not being possible?
Or is the idea that I should add another field to the model in the data warehouse, for instance deleted true/false, and handle the "deletes" as an actual update to the records, setting deleted to true? The upside is that you still keep those records in your data warehouse.
Does anyone know of any Airbyte (Python) connectors with similar behavior? It would be nice to take a look at an example implementation.
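A minimal sketch of the soft-delete idea above, under the assumption that deletes are emitted as ordinary updates carrying a deleted flag, so incremental - deduped history treats them like any other change; fetch_xml and the endpoint handling are hypothetical, not from a real connector:

from typing import Any, Dict, Iterable, List

def fetch_xml(endpoint: str, since: str) -> List[Dict[str, Any]]:
    # Hypothetical stand-in for calling /updates.xml or /deletes.xml
    # and parsing the XML response into dicts.
    return []

def read_changes(since: str) -> Iterable[Dict[str, Any]]:
    # Updated records pass through with the flag off.
    for record in fetch_xml("/updates.xml", since=since):
        yield {**record, "deleted": False}
    # Deletes become regular updates with the flag on; dedup in the
    # destination then keeps this latest version, marking the row deleted.
    for record in fetch_xml("/deletes.xml", since=since):
        yield {**record, "deleted": True}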
Apostol Tegko
05/31/2022, 8:55 AM
settings -> sources
Looking at requests, it seems that this request is not returning any items:
<http://localhost:8000/api/v1/source_definitions/list_for_workspace>
Same for the destination endpoint as well.
Can’t see any errors in the server logs either. Do you have any advice?
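One way to poke that endpoint outside the UI, a sketch assuming the config API's usual POST-with-JSON-body convention; the workspace id below is a placeholder:

import requests

# Placeholder workspace id; use the one from your Airbyte workspace URL.
resp = requests.post(
    "http://localhost:8000/api/v1/source_definitions/list_for_workspace",
    json={"workspaceId": "00000000-0000-0000-0000-000000000000"},
)
print(resp.status_code)
print(resp.json())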
Vytautas Bartkevičius
06/01/2022, 5:31 AM
WARNING! Updating the schema will delete all the data for this connection in your destination and start syncing from scratch
Why is that? Why is the data deleted for all streams, not only the new one but also the currently existing ones? Do I have to collect all the data from scratch again after this, or is there a way to prevent it?
Pranav Hegde
06/02/2022, 5:33 AM
2022-06-02 05:27:08 INFO i.a.v.j.JsonSchemaValidator(test):56 - JSON schema validation failed.
errors: $: null found, object expected
2022-06-02 05:27:08 ERROR i.a.w.p.a.DefaultAirbyteStreamFactory(lambda$create$1):70 - Validation failed: null
2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.d.b.u.AbstractBigQueryUploader(uploadData):99 - Final state message is accepted.
2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.d.b.u.AbstractBigQueryUploader(dropTmpTable):111 - Removing tmp tables...
2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.d.b.u.AbstractBigQueryUploader(dropTmpTable):113 - Finishing destination process...completed
2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.d.b.u.AbstractBigQueryUploader(close):85 - Closed connector: AbstractBigQueryUploader{table=_airbyte_raw_indodana_mixpanel_export, tmpTable=_airbyte_tmp_mbo_indodana_mixpanel_export, syncMode=WRITE_APPEND, writer=class io.airbyte.integrations.destination.bigquery.writer.BigQueryTableWriter, recordFormatter=class io.airbyte.integrations.destination.bigquery.formatter.DefaultBigQueryRecordFormatter}
2022-06-02 05:27:08 destination > 2022-06-02 05:27:08 INFO i.a.i.b.IntegrationRunner(runInternal):171 - Completed integration: io.airbyte.integrations.destination.bigquery.BigQueryDestination
2022-06-02 05:27:08 ERROR i.a.w.DefaultReplicationWorker(run):141 - Sync worker failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: Source process exited with non-zero exit code 137
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:134) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:49) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:174) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Suppressed: io.airbyte.workers.WorkerException: Source process exit with code 137. This warning is normal if the job was cancelled.
at io.airbyte.workers.protocols.airbyte.DefaultAirbyteSource.close(DefaultAirbyteSource.java:136) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:118) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:49) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:174) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Source process exited with non-zero exit code 137
at io.airbyte.workers.DefaultReplicationWorker.lambda$getReplicationRunnable$2(DefaultReplicationWorker.java:230) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
... 1 more
Caused by: java.lang.RuntimeException: Source process exited with non-zero exit code 137
at io.airbyte.workers.DefaultReplicationWorker.lambda$getReplicationRunnable$2(DefaultReplicationWorker.java:222) ~[io.airbyte-airbyte-workers-0.35.2-alpha.jar:?]
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
... 1 more
Would appreciate any help regarding this issue. We are using the latest versions of the Mixpanel and BigQuery connectors.
raphaelauv
06/02/2022, 6:51 PM
ni
06/02/2022, 7:37 PM
HKR
06/03/2022, 11:13 AM
Adam Bloom
06/03/2022, 4:32 PM
ijac wei
06/06/2022, 2:53 AM
Bastien Gandouet
06/06/2022, 3:40 PM
João Pedro Smielevski Gomes
06/06/2022, 6:45 PM
Prashant Golash
06/07/2022, 3:03 AM
gunu
06/07/2022, 3:57 AM
Kishore Sahoo
06/07/2022, 6:04 AM
Abhiruchi Shinde
06/08/2022, 3:09 AM