# ask-community-for-troubleshooting
j
Hey gang! I’ve been working on an initial import for 2 weeks and haven’t been able to get it to complete. This process dies at the very end of the DBT normalization step while the Postgres target database is doing hours of vacuuming. It eventually results in a timeout. Here’s a screenshot of the output. The table it’s attempting to normalize is 68GB at the source. Any guidance or experience here? Thanks in advance!
u
@Marcos Marx (Airbyte) turned this message into Zendesk ticket 2433 to ensure timely resolution!
u
Do you have any special connection to your Postgres destination (SSH, SSL, or other)? If you sync only one small table, does the sync work?
j
Hey there. Sorry, I was offline for a few days. No, the connection is neither SSH tunneled nor SSL encrypted. Pretty vanilla. Everything’s on AWS. It’s just a couple of big tables that seem to be the problem children.
u
My recommendation is to check your database resources (memory/CPU). Maybe dbt is consuming a lot of the database’s resources and locking it. Also, just to validate: could you try syncing only one small table? That way you can confirm Airbyte can connect to your database both to send data (ingestion) and during transformation (using dbt).
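For anyone following along: one way to check whether dbt is locking things up, as suggested above, is to look for long-running or lock-waiting statements in Postgres’s built-in `pg_stat_activity` view. A diagnostic sketch (the 5-minute threshold is an arbitrary example):

```sql
-- Show non-idle statements running longer than 5 minutes,
-- including whether they are currently waiting (e.g. on a lock).
SELECT pid,
       now() - query_start AS runtime,
       state,
       wait_event_type,
       wait_event,
       left(query, 80) AS query_preview
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;
```

A `wait_event_type` of `Lock` on the dbt sessions would point at blocking rather than slow execution.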
j
Small tables definitely complete. CPU and memory don’t seem to reach their maximums, although they are high.
We’re on a rather large EC2 box.
u
What is the Postgres instance size (memory/CPU/disk)? Maybe take a look at this article: https://www.dbrnd.com/2017/07/postgresql-error-canceling-statement-due-to-statement-timeout/ What’s strange is Postgres taking more than 4 hours to finish running dbt. My recommendation is to increase the database’s resources for the first load.
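The article linked above concerns the `statement_timeout` setting. To see whether such a timeout is in play on the destination, the current value and where it is being set from can be checked directly (a quick sketch; `0` means the timeout is disabled):

```sql
-- Effective value in the current session (0 = disabled)
SHOW statement_timeout;

-- Where the value comes from (postgresql.conf, ALTER DATABASE, client, etc.)
SELECT name, setting, unit, source
FROM pg_settings
WHERE name = 'statement_timeout';
```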
j
We’re on db.r6g.2xlarge for the data warehouse.
I do note that the target table is 4GB bigger than the memory on this instance size.
m
Jonathan, would it be possible to increase it, just as a test? Either way, I’ll raise an issue to investigate the high memory consumption.
j
Hey Marcos. I doubled the RAM to a db.r6g.4xlarge database and let it run for 10 hours. It failed with the same error. As a data point, the Airbyte instance is on an m6a.2xlarge EC2 instance. Any further thoughts? I suppose I could double the database again but that seems excessive.
m
@Edward Gao (Airbyte) do you mind giving your thoughts here? Normalization is overwhelming the database and hitting the timeout (3+ hours running). Any advice?
e
Whoa, that’s a new one to me. Not the greatest idea in the world, but IIRC Postgres has a config for the query timeout - maybe increasing that by a lot would help here? Also tagging @Chris Duong [Airbyte] - it sounds like normalization into dest-postgres is failing because the query times out; do you know if there’s an easier workaround for this?
c
Sorry, I can’t help either here
j
The crazy part is that I think it’s autovacuum statements that are timing out.
m
Can you check the other database config params? This is the first time I’ve seen this happen with Postgres + normalization.
j
Disabling the statement_timeout parameter fixed it. Failed syncs were taking 8+ hours. With the setting disabled, the initial sync completed in just 3 hours.
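For reference, disabling `statement_timeout` can be done at the database level with plain SQL (a sketch; `airbyte_db` is a placeholder name, and on AWS RDS the same parameter can instead be set to `0` in the instance’s parameter group):

```sql
-- Replace airbyte_db with the actual destination database name.
-- 0 disables the timeout entirely; a large finite value such as '12h'
-- is a safer middle ground than disabling it permanently.
ALTER DATABASE airbyte_db SET statement_timeout = 0;

-- New sessions pick the setting up automatically; verify with:
SHOW statement_timeout;
```

Re-enabling a generous timeout after the initial full load completes would restore the protection the setting is there for.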