Andrew Groh
01/07/2022, 4:39 PM
Pablo Tovar
01/07/2022, 10:07 PM
flow
01/08/2022, 6:23 PM
Jeremy Owens
01/10/2022, 3:38 PM
Maxime edfeed
01/10/2022, 7:23 PM
Jove Zhong
01/12/2022, 12:24 AM
Jonas Bolin
01/12/2022, 8:18 AM
Ihor Holoviy
01/12/2022, 10:28 AM
Rajesh Koilpillai
01/12/2022, 10:35 AM
Jason Edwards
01/12/2022, 6:47 PM
_stg tables. After that the _scd tables are created. Then the final tables are created. This means each row is processed at least 4 times (I’m not including indexing those tables). In my case, with about 70 million records, that means roughly 280 million records get processed. Incremental syncs should be better.
But this problem is compounded by Postgres. Quickly, for anyone who is unfamiliar with Postgres: when you do an update, Postgres actually writes a new record to disk and marks the old one as dead. Later a process (autovacuum) scans through the table looking for dead records, catalogs them, then removes them. Deleting a record simply marks it as dead. The way DBT creates/updates/deletes records creates a lot of dead records, so a significant portion of those 280 million records will need to be vacuumed. Vacuuming, of course, takes resources, and the database instance grinds to a crawl under the combined load of the DBT transforms and vacuuming. So far the 50 million record table has never completely synced, even after letting it churn for, literally, days.
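For anyone who wants to put numbers on this, here is a minimal sketch in Python. It assumes every source row is rewritten once per normalization pass, and it reads dead-row counts from Postgres's `pg_stat_user_tables` statistics view. The helper names (`rows_processed`, `dead_ratio`) are hypothetical and not part of Airbyte or DBT; the query would need to be run with whatever Postgres client you already use.

```python
# Hypothetical helpers for illustration; not part of Airbyte or DBT.

# pg_stat_user_tables is a standard Postgres statistics view.
# n_dead_tup is Postgres's estimate of dead rows still waiting to be vacuumed.
DEAD_TUPLE_QUERY = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
"""


def rows_processed(source_rows: int, passes: int = 4) -> int:
    """Rows written if each source row is rewritten once per pass.

    The default of 4 passes is the count given in the message above
    (an assumption about how the normalization steps add up).
    """
    return source_rows * passes


def dead_ratio(n_live_tup: int, n_dead_tup: int) -> float:
    """Fraction of a table's rows that are dead, from pg_stat_user_tables counts."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0


if __name__ == "__main__":
    print(rows_processed(70_000_000))  # 70 million rows x 4 passes
    print(DEAD_TUPLE_QUERY)
```

A high `dead_ratio` on the normalization tables, together with an old or empty `last_autovacuum`, would suggest that vacuuming is not keeping up with the transforms.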
Would a database other than Postgres be a better choice for a warehouse? Probably, but for now it’s what I have to work with. Could tuning Postgres/autovacuum improve performance? Possibly, but that’s a bit beyond my Postgres skill/knowledge/experience. I’ve also wondered whether a different sync mode would work better.
Sorry, that turned into a bit of a rant. But hopefully it gives you a sense of some of the pain points of, at least, an initial sync of a significant dataset. I don’t know if there’s any possibility in the future of cutting down on the amount of processing that happens in the destination database.
Mijbel Alqattan
01/12/2022, 8:56 PM
Titas Skrebė
01/13/2022, 6:03 AM
Angie Marable
01/13/2022, 7:45 PM
Omar Ghalawinji
01/14/2022, 9:59 AM
Joseph Reis
01/15/2022, 12:26 AM
Surya Prakash
01/15/2022, 11:27 PM
Joël Luijmes
01/17/2022, 11:10 AM
Ronny Ritongadi
01/17/2022, 2:55 PM
2022-01-17 14:52:33 ERROR () LineGobbler(voidCall):85 - Exception in thread "main" java.sql.SQLSyntaxErrorException: SELECT command denied to user ''@'%' for column 'organizationId' in table 'IDX_PUTTOLIGHT'
(more complete log attached)
I am very sure the DB user has sufficient permissions. Is it possible that Airbyte has gotten stuck on one particular log, causing every connection to error?
Thank you,
Ronny
Blake Enyart
01/17/2022, 9:07 PM
Elias Djurfeldt
01/18/2022, 5:32 PM
Hitesh Khandelwal
01/19/2022, 4:54 AM
Ronny Ritongadi
01/19/2022, 5:19 AM
Jean-François Paccini
01/19/2022, 9:56 AM
Namer Medina
01/19/2022, 12:19 PM
Lukas Novotny
01/21/2022, 11:24 AM
actor table. Are we missing some trivial config that switches to a different secret store? Can we load secrets from environment variables?
Jens
01/21/2022, 12:58 PM
Anouar Hnini
01/21/2022, 1:46 PM
Jens
01/24/2022, 8:39 AM
Yashasvi Chaudhary
01/24/2022, 1:57 PM
Lihan
01/24/2022, 9:21 PM