Hi team! I would like some more context on the `pr...
# ingestion
g
Hi team! I would like some more context on the
profiling.limit
configuration for SQL profiling. • Scenario 1: Without this config parameter, the profiling runs successfully. • Scenario 2: However, upon introducing this to say 20 rows, I run into Operational Error:
Copy code
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1044, "Access denied for user '<username>'@'%' to database '<database_name>'")
[SQL: CREATE TEMPORARY TABLE ge_temp_<temp_table> AS SELECT * 
FROM <table_name> 
 LIMIT 20]
(Background on this error at: <http://sqlalche.me/e/13/e3q8>)
My question is more in terms of how this parameter is implemented. Given that in both the scenarios above it runs a
SELECT
query, why does
LIMIT
result in access denied error but without
LIMIT
, there's no error?
d
For limit queries Great Expectation (the profiler which we use under the hood) creates temporary tables and you might not have permission to create tables. Is it possible you don’t have permission to create tables?
I think by default it tries to create tables where your original table is but you set a specific schema with the
profiling.bigquery_temp_table_schema
option and then all these temp tables should be created in the schema you specify here -> https://datahubproject.io/docs/metadata-ingestion/source_docs/sql_profiles/
g
Thanks for a quick response. The MySql tables I am trying to profile are a READ ONLY account. Can the
profiling.bigquery_temp_table_schema
option be used for profiling tables in MySql though?
d
Ahh, sorry, I thought this is bigquery table somehow.
This option only make sense on bigquery
But it is true as well for MySQL if limit or offset parameter is set then it tries to create temporary table.
g
I see. And the only way to be able to do it is have write permissions to create table on MySql instance?
d
yes, currently profiler creates a temporary table in case you use limit or offset 😞
Copy code
REATE TEMPORARY TABLE ge_temp_<temp_table> AS SELECT * 
FROM <table_name> 
 LIMIT 20]
s
@dazzling-judge-80093 I think creating temp tables is optional in GE.
create_temp_table
option is mentioned here https://github.com/great-expectations/great_expectations/blob/develop/docs_rtd/gui[…]ow_to_load_a_database_table_view_or_query_result_as_a_batch.rst Not sure if there are any limitations on that or not in GE that needs to be explored. Creating temp tables will not be possible when people are working with read replicas of databases.
d
Ohh, I did not know about this. We should expose this parameter.
g
Let me see if I got this correct, the
limit
option uses the
CREATE TEMPORARY TABLE
query that ends up needing write permissions, and writing onto disk. However, without enabling the
limit
option, and having only
profiling.enabled
set to
True
, does not create/write temp tables? And this brings me to the next question. With one of the large MySQL tables profiling results in an error:
pymysql.err.InternalError: (3, "Error writing file '/var/lib/mysql/MYLb1KTy' (Errcode: 28 - No space left on device)")
and it is a Read-Only account. Is it not creating files here?
m
@dazzling-judge-80093 Hello, please let me know if this problem has been solved? because i also have only read-only account and faced with same issue.
d
Bigquery source doesn’t create temporary tables anymore
m
how about MySql? :))
maybe there is some extra parameter to avoid creating the table?
@dazzling-judge-80093 oh, as i understand you did it only with Bigquery😥