# troubleshoot
m
Hi team! I'm trying to get top_n_queries with source type snowflake-usage, but this query takes too much time: "select min(query_start_time) as min_time, max(query_start_time) as max_time from snowflake.account_usage.access_history".
[2022-06-12 18:31:07,518] INFO {datahub.ingestion.source.usage.snowflake_usage:438} - Checking usage date ranges
Here is my recipe:
top_n_queries: 5
start_time: '2022-06-9T00:00Z'
end_time: '2022-06-12T00:00Z'
database_pattern:
  allow:
It seems that the query for snowflake access_history is time consuming. How can I speed up this process?
d
What do you mean by too much time? Can you try ingesting only 1 day?
m
No matter what start_time and end_time I set, there is always a query running in Snowflake: "select min(query_start_time) as min_time, max(query_start_time) as max_time from snowflake.account_usage.access_history". This query is the time-consuming one.
d
Please, can you quantify it a bit? Is it hours or minutes?
m
At least 20 mins; I don't know how long it will take in total. If I use source.type: snowflake to ingest the same schema, the total time is about 10 mins.
The reason could be that the SNOWFLAKE.ACCESS_HISTORY table is huge. Even when I click "preview data" in Snowflake, it takes a long time (at least 20 mins).
d
Wow, that is huge
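One way to check whether the unbounded scan is the bottleneck would be to run the same min/max query restricted to the recipe's time window, directly in Snowflake, and compare timings. Below is a minimal sketch that only builds such a query string. The helper name `bounded_range_query` is hypothetical, this is not the query DataHub itself issues, and whether the filter actually speeds things up on your account is an assumption that has not been verified here.

```python
from datetime import datetime

# Table and column names are taken from the query quoted in this thread.
ACCESS_HISTORY = "snowflake.account_usage.access_history"


def bounded_range_query(start: datetime, end: datetime) -> str:
    """Build the min/max query limited to [start, end).

    Hypothetical helper for manual testing; not part of DataHub.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        "select min(query_start_time) as min_time, "
        "max(query_start_time) as max_time "
        f"from {ACCESS_HISTORY} "
        f"where query_start_time >= '{start.strftime(fmt)}' "
        f"and query_start_time < '{end.strftime(fmt)}'"
    )


if __name__ == "__main__":
    # Same window as the start_time / end_time in the recipe above.
    print(bounded_range_query(datetime(2022, 6, 9), datetime(2022, 6, 12)))
```

If the bounded version returns quickly while the original still takes 20+ minutes, that would suggest the full-table scan of access_history is the slow part rather than the warehouse or the connection.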