# troubleshoot
incalculable-branch-51967:
Hi everyone! I'm trying to debug a `redshift_usage` ingestion, but I'm getting hardly any logs. I tried both the Python SDK in Airflow and the CLI. In Airflow I only see this log line, for both successful and failed ingestions:
```
INFO - Setting gms config
```
When I run the CLI I also see this:
```
INFO     {datahub.cli.ingest_cli:91} - Starting metadata ingestion
```
The problem I'm facing is that the process ends because it consumes all the available RAM, so I'd like to find out which tables are causing these memory requirements. Is it possible that this callback isn't working as expected?
lemon-engine-23512:
Hi @incalculable-branch-51967, it is hard to tell without a stack trace. Could you try restricting the ingestion to a single table initially, using the `table_pattern` config param, and setting `start_time` and `end_time` to a small window, to see whether the recipe needs some tuning?
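For reference, here is a minimal sketch of such a restricted run using the Python SDK's `Pipeline` API (the same entry point an Airflow task would use). The connection details, table regex, time window, and sink URL are placeholders, and the `redshift-usage` source options shown are only assumed to match the version in use:

```python
from datahub.ingestion.run.pipeline import Pipeline

# Hypothetical, minimal recipe: restrict usage extraction to a single table
# and a one-day window so the run stays small while tuning.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "redshift-usage",
            "config": {
                "host_port": "my-cluster.example.redshift.amazonaws.com:5439",  # placeholder
                "database": "dev",                      # placeholder
                "username": "datahub_reader",           # placeholder
                "password": "${REDSHIFT_PASSWORD}",
                "email_domain": "example.com",          # placeholder
                # Only look at one table while debugging memory usage.
                "table_pattern": {"allow": ["^public\\.my_small_table$"]},
                # Keep the query-history window tiny.
                "start_time": "2022-05-01T00:00:00Z",
                "end_time": "2022-05-02T00:00:00Z",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder GMS URL
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```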
incalculable-branch-51967:
Hi @lemon-engine-23512, I investigated a bit further. It looks like I'm not getting any logs because the process runs out of memory while generating workunits, so it never reaches the point where the callback should be triggered. I tried a bigger machine: the process takes roughly 12 GB of RAM and has been running for 4 hours now (our Redshift instance is pretty big). Also, note that I'm using an outdated version (v0.8.31). Has memory allocation improved in newer versions? Do you recommend a strategy for making better use of resources, such as partitioning the ingestion somehow or using smaller time deltas?
lemon-engine-23512:
Hi @incalculable-branch-51967, partitioning the ingestion recipe by datasets (via patterns) should definitely help. We will look into the memory consumption at some point.
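For anyone hitting the same wall, one way such a partitioned run could look with the Python SDK is sketched below: a sequence of small runs, one per schema and per day, so each run only generates workunits for a slice of the cluster. This is only a sketch, assuming the `redshift-usage` source accepts the `table_pattern`, `start_time`, and `end_time` options discussed above; the schema names, connection details, and sink URL are placeholders.

```python
from datetime import datetime, timedelta, timezone

from datahub.ingestion.run.pipeline import Pipeline

SCHEMAS = ["analytics", "marts", "staging"]  # placeholder schema names
WINDOW_START = datetime(2022, 5, 1, tzinfo=timezone.utc)
DAYS = 7  # number of one-day slices to ingest

BASE_CONFIG = {
    "host_port": "my-cluster.example.redshift.amazonaws.com:5439",  # placeholder
    "database": "dev",               # placeholder
    "username": "datahub_reader",    # placeholder
    "password": "${REDSHIFT_PASSWORD}",
    "email_domain": "example.com",   # placeholder
}

for schema in SCHEMAS:
    for day in range(DAYS):
        start = WINDOW_START + timedelta(days=day)
        end = start + timedelta(days=1)
        pipeline = Pipeline.create(
            {
                "source": {
                    "type": "redshift-usage",
                    "config": {
                        **BASE_CONFIG,
                        # Restrict each run to one schema's tables ...
                        "table_pattern": {"allow": [f"^{schema}\\..*"]},
                        # ... and a one-day query-history window.
                        "start_time": start.isoformat(),
                        "end_time": end.isoformat(),
                    },
                },
                "sink": {
                    "type": "datahub-rest",
                    "config": {"server": "http://localhost:8080"},  # placeholder
                },
            }
        )
        pipeline.run()
        pipeline.raise_from_status()
```

Each run's peak memory is bounded by the slice it covers, at the cost of more total runs; the same slicing idea applies whether the runs are launched from a loop like this or as separate Airflow tasks.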
incalculable-branch-51967:
Great, thanks!