# troubleshoot
i
Hey folks, I'm trying to ingest some data from my Redshift data source, but the run has been going for about 5 hours (and we have about 24 tables). It seems the ingestion itself finished, but the status shown in the ingestion section is wrong. What can I do to help you debug this?
Could it be the Kafka configuration, or is that unrelated? I was looking at the Kafka pod logs and there's a warning:
i
Did you enable profiling on this ingestion?
i
Yes
I'll try without profiling and see if it works. It probably will, given the amount of data in our tables.
One of them is about 64 GB.
b
Yeah, it's most likely that the container running ingestion is having trouble keeping up if you have profiling enabled. One way to combat this is to increase the resources assigned to the `datahub-actions` container!
i
John, I tried removing the profiling, but it's still taking a long time (more than 24h, because I forgot to turn it off 🤦‍♂️).
I think I found the cause. The problem is the Redshift lineage. Basically, it takes too long to check every file that populates a table through the COPY statement, because there are more than 100 of them.
b
I see. cc @dazzling-judge-80093 due to the lineage scaling issue. If you disable lineage extraction, that should also help reduce the run time.
We generally recommend setting up separate sources with different schedules for these things, because their execution costs differ so much.
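A minimal sketch of that split, written in Python purely for illustration. It builds two recipe dictionaries from one base: a cheap metadata-only run and a heavier lineage and profiling run. The option names `include_table_lineage`, `include_copy_lineage` and `profiling.enabled` are what I understand the Redshift source to use, so check them against the docs for your DataHub version. The host, database, helper name `make_recipe`, and cron strings are all placeholders, not values from this thread.

```python
import copy
import json

# Hypothetical base recipe; connection values are placeholders.
BASE_RECIPE = {
    "source": {
        "type": "redshift",
        "config": {
            "host_port": "example-cluster:5439",  # placeholder
            "database": "dev",                    # placeholder
            # Assumed option names; verify against your DataHub version.
            "include_table_lineage": False,
            "include_copy_lineage": False,
            "profiling": {"enabled": False},
        },
    },
}


def make_recipe(lineage: bool, profiling: bool) -> dict:
    """Return a copy of the base recipe with the expensive steps toggled."""
    recipe = copy.deepcopy(BASE_RECIPE)
    cfg = recipe["source"]["config"]
    cfg["include_table_lineage"] = lineage
    cfg["include_copy_lineage"] = lineage
    cfg["profiling"]["enabled"] = profiling
    return recipe


# Illustrative schedules: the cheap run often, the expensive run rarely.
SCHEDULED_RECIPES = {
    "redshift-metadata": {
        "cron": "0 * * * *",  # hourly (placeholder)
        "recipe": make_recipe(lineage=False, profiling=False),
    },
    "redshift-lineage-profiling": {
        "cron": "0 3 * * 0",  # weekly (placeholder)
        "recipe": make_recipe(lineage=True, profiling=True),
    },
}

if __name__ == "__main__":
    print(json.dumps(SCHEDULED_RECIPES, indent=2))
```

Each recipe would then be registered as its own ingestion source, so a slow lineage or profiling run never blocks the regular metadata refresh.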
i
Cool, @big-carpet-38439! I've disabled it for now. Thanks for helping.