# ask-community-for-troubleshooting
l
Hi all, I have a Google Ads -> Redshift connection with 7 streams. I've set the Incremental sync mode (deduped + history), with a sync every 24 hours. I see the Redshift unblended cost is 450€ per day!!!!, which is impossible. Can you set up a meeting with me to go over the best practices for working with the Redshift destination? Because as you can see in the logs below, it doesn't load the data in one batch but runs a query every 5 seconds.
h
Hey, sure - before that, could you share the sync log file here? Also happy to help you out.
l
I cancelled it when we realized it pushes every 5 seconds
This is what the engineer sent me. There is a query that runs every 5 seconds (!!!!!!) and is the main cause of the mess we see:
INSERT INTO indiana._airbyte_tmp_zyc_indiana_clickout (
    _airbyte_ab_id, _airbyte_data, _airbyte_emitted_at
) VALUES
    ($1, JSON_PARSE($2), $3),
    ($4, JSON_PARSE($5), $6),
    ($7, JSON_PARSE($8), $9),
    [... the same three-parameter tuple repeated for every record in the batch, up to ($373, JSON_PARSE($374), ... where the captured statement is cut off]
and this is the Redshift billing
Good Morning!
h
Oh my god. I wish we'd had this information earlier 😅. Sending this thread across to the team to see if someone can take over and help you in more detail.
l
Can we have a meeting? I'm desperate about this, and we need to give a good solution to our client.
e
It looks like this connection is using standard inserts - switching the upload method to S3 staging should reduce this cost by a lot. (our docs have some info about the difference - the COPY strategy is recommended for production workloads)
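to give a rough picture of the difference (the table, bucket and IAM role below are placeholders, not what the connector actually generates):

-- Standard inserts: lots of small multi-row INSERT statements, like the one in your log,
-- each running as its own query against the cluster.
INSERT INTO indiana._airbyte_tmp_example (_airbyte_ab_id, _airbyte_data, _airbyte_emitted_at)
VALUES ('uuid-1', JSON_PARSE('{"clicks": 10}'), GETDATE());

-- S3 staging: records are written to S3 files first, then loaded in one bulk COPY
-- (exact file format and credentials depend on your destination settings).
COPY indiana._airbyte_tmp_example
FROM 's3://your-staging-bucket/airbyte/indiana_clickout/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'
CSV GZIP;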
l
Thanks, I'll try it. But do you know if I should send the data without normalization, and do the normalization and transformation myself in my Redshift destination?
e
are you referring to the transformation setting? (i.e. raw data vs normalized) that's more a question of what you need the data to look like - raw data means you'll just get the *_raw tables, which just have your data as JSON blobs; normalized data will expand that out into normalized tables (and also handles deduping, if you're using the incremental dedup + history sync mode)
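as a sketch of what that looks like in practice (the stream and field names here are invented - only the _airbyte_* columns are real conventions; given the JSON_PARSE in your log, _airbyte_data is a SUPER column):

-- Raw: one JSON blob per record; you pull fields out yourself.
SELECT
    r._airbyte_data.campaign_id AS campaign_id,
    r._airbyte_data.clicks      AS clicks
FROM indiana._airbyte_raw_campaigns r;

-- Normalized: the same fields already expanded into typed columns
-- (and deduped, under incremental dedup + history).
SELECT campaign_id, clicks
FROM indiana.campaigns;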
l
Hi, I would prefer normalized data, with incremental + deduped history, but it consumes A LOT of money on my destination (Redshift).
But as I read in the forum, someone mentioned they think normalization is what consumes the Redshift resources. So I'm thinking of taking normalization out of Airbyte. Ideally I would prefer to do everything through Airbyte, but we received a predicted monthly cost from Redshift of 23,000€, just for loading a few tables..
e
got it - normalization does consume some Redshift compute, but (once you're past the first two syncs) it should run incrementally, i.e. only processing recent data. Have you had a chance to run a sync using S3 staging + normalized tables? Certainly, if your client is cost-sensitive, it's potentially worth disabling normalization though.
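conceptually, "runs incrementally" means something like the following - not the exact SQL the generated dbt models run, and the batch table name is made up, but it shows the idea of only touching rows newer than what's already normalized:

-- Only process raw rows emitted after the newest row already in the normalized table.
INSERT INTO indiana.campaigns_incremental_batch
SELECT r._airbyte_ab_id, r._airbyte_data, r._airbyte_emitted_at
FROM indiana._airbyte_raw_campaigns r
WHERE r._airbyte_emitted_at >
      (SELECT MAX(n._airbyte_emitted_at) FROM indiana.campaigns n);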
l
Hi.. I will try today. I intend to create a data lake on S3 and store the raw data there (I haven't found in the documentation whether it's possible to set incremental + history deduped in that connection). If it's not possible, I don't know where to transform the data.. I wanted to use Glue, but the client insists on dbt
e
hm, are you planning to directly use the destination-s3 connector for this? That connector doesn’t support normalization (since it’s just writing raw blobs to s3)
l
I just realized that.. I can't find a good solution. I just want to have a few sources via Airbyte, transform the data and send it to the Redshift destination.. but as dbt custom transformations and normalization are done at the destination.. not sure what to do
e
mmm. I would recommend giving destination-redshift via S3 staging a try - under this loading method, it'll do pretty much what you want (i.e. push data to S3 and then COPY it into Redshift). And then you'd be able to choose basic normalization or a custom dbt transformation
l
I tried it. It's really good, but it sends 4 tables to Redshift for every table I sync (the SCD, the stg, the raw, and the normalized one). So if a table weighs 30 GB, I'll transfer it 4 times to Redshift. I'm trying the 2-connections option: Google Ads -> S3 works correctly - I see data in Avro format in the bucket. Which output format should I use when setting up the S3 destination? And for S3 -> Redshift, the sync works, but no data gets transferred/normalized. Any ideas why?
e
did your S3 -> Redshift sync have normalization enabled? (it sounds like you got the _raw tables, but not the normalized tables? Or did you mean that no data got transferred at all?) generally this setup feels a little redundant - the S3 -> Redshift sync will do a second copy into S3, and then do a COPY into Redshift. It would be more efficient to do a single GAds -> Redshift sync directly, unless I'm misunderstanding the setup. Creating 4 tables is (currently) baked into how normalization works - if that's problematic cost-wise, then you'd probably need to roll your own custom transformation unfortunately, which would pull data from the _raw table and generate a normalized table. (but certainly this is useful feedback, I'll pass it on to the rest of the team)
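for what it's worth, a hand-rolled transformation like that could be as small as one CTAS over the raw table - the field names and target table below are invented, so adjust to your actual stream:

-- Hypothetical replacement for normalization: build only the one table you need,
-- straight from the raw JSON blobs (SUPER column), casting the fields you care about.
CREATE TABLE indiana.clickout_slim AS
SELECT
    r._airbyte_data.clickout_id::varchar AS clickout_id,
    r._airbyte_data.campaign_id::varchar AS campaign_id,
    r._airbyte_data.cost_micros::bigint  AS cost_micros,
    r._airbyte_emitted_at
FROM indiana._airbyte_raw_indiana_clickout r;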
l
The purpose of the second connection, the S3 -> Redshift one, is for me to be able to select only the data I really need, transform it, and write it to Redshift. The connection succeeded but I got empty tables (raw, SCD and normalized). I'm confident the bucket contains data. Here is the log: https://airbyte.apps.7lds.de/workspaces/9d9aea6f-8f2e-4ef1-9c3e-0bbf57935780/connections/87d5fd47-3c9d-4062-9d17-5acd5b569aec/status#272::0
I must say that I'm thankful for your time! Thanks a lot!
e
select only data I really need
interesting! is this at a greater granularity than what the google ads source provides? I.e. you’re looking for more control than just “sync streams X, Y, Z”?
log
there's an auth layer in front of this (which is good!) but it means I can't access it 🙂 can you download the logs and upload them here?
l
Here is the log. I have multiple sources.. so I'm trying to optimize..
e
hm. you mentioned that the S3 bucket contains Avro data - it looks like source-s3 is trying to read CSV? (at 2022-11-07 18:47:08 - 'filetype': 'csv') I'm not super familiar with source-s3, but IIRC there's a config option to have it read Avro instead. And of course - it's always useful to double-check that the files on S3 match whatever path is configured on source-s3, just in case
l
no.. I switched to CSV, and both are in CSV.. ugh, what a day 😅
e
😕 in the replication tab on your connection, does it at least detect the schema correctly?
l
Hi, good morning! Yes, it detects the schema
Another question.. Let's say I have 4 tables in the Google Ads source, and a Redshift destination (with the S3 staging upload method). I set up the connection (incremental + history deduped). I also have basic normalization, and I set a dbt transformation in the connection that creates a table with fields from the 4 source tables. In Redshift I only need the table I created in dbt, which might be a table of 20 rows. Are the other 4 tables that I synced from the source also going to be written to the Redshift destination?
e
yeah, they’ll need to get written to redshift before the dbt transformation can run
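e.g. a custom dbt model along these lines (all table and column names invented) can only run once the synced tables have already landed in Redshift, because it selects from them:

-- Hypothetical reporting table built by a custom transformation; it reads the
-- synced tables, so they have to exist in Redshift before this can run.
CREATE TABLE indiana.campaign_summary AS
SELECT
    c.campaign_id,
    c.campaign_name,
    SUM(a.clicks)      AS clicks,
    SUM(a.cost_micros) AS cost_micros
FROM indiana.campaigns c
JOIN indiana.ad_group_ad_report a
  ON a.campaign_id = c.campaign_id
GROUP BY c.campaign_id, c.campaign_name;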
lemme reach out to the team about source-s3 - I’m not really sure what’s happening there
the API team (which owns source-s3) suggested opening a GitHub issue with some more details:
• screenshots of your configuration (especially the S3 source config)
• some sample files (if possible, or at least with sensitive info removed)
also, can you expand the gaS3 stream (click the little arrow on the left) to confirm that it has the expected fields?