# ingestion
s
Regarding profiling, what would be a good way to override the behavior for getting the row count of tables? The current
count(*)
that is fired takes time and is not practical for large tables. For large tables it might be much better to use metadata tables when possible.
e.g. in the case of MySQL we can run
SELECT 
  table_name, TABLE_ROWS 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_SCHEMA = 'SCHEMA_NAME'
order by TABLE_ROWS desc;
and this is going to be much faster than what is being done right now.
For large tables, having profiling just report the correct number of rows, instead of doing full profiling for every table, might be a good thing.
The size of the tables is another good thing to log in the stats. Some databases support that; e.g. for MySQL we may use
SELECT 
     table_schema as `Database`, 
     table_name AS `Table`, 
     round(((data_length + index_length) / 1024 / 1024 / 1024), 0) `Size in GB` 
FROM information_schema.TABLES 
ORDER BY (data_length + index_length) DESC;
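One way the override could be made pluggable is a small per-dialect registry of approximate-count queries, with an exact COUNT(*) as the fallback. This is only an illustrative sketch under assumed names (`row_count_query`, the dialect keys, and the Postgres `pg_class.reltuples` shortcut are my additions, not an existing API):

```python
# Sketch: per-dialect row-count strategy with COUNT(*) fallback.
# All names here are hypothetical, for illustration only.

EXACT_COUNT = "SELECT COUNT(*) FROM {schema}.{table}"

# Approximate, metadata-based row counts keyed by dialect.
# Note these are estimates (e.g. MySQL's TABLE_ROWS is approximate
# for InnoDB), which is usually acceptable for profiling stats.
APPROX_COUNT = {
    "mysql": (
        "SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES "
        "WHERE TABLE_SCHEMA = '{schema}' AND TABLE_NAME = '{table}'"
    ),
    "postgresql": (
        "SELECT reltuples::bigint FROM pg_class "
        "WHERE oid = '{schema}.{table}'::regclass"
    ),
}

def row_count_query(dialect: str, schema: str, table: str,
                    approximate: bool = True) -> str:
    """Return the cheapest row-count query available for this dialect."""
    if approximate and dialect in APPROX_COUNT:
        return APPROX_COUNT[dialect].format(schema=schema, table=table)
    # Fall back to an exact (but slow) COUNT(*) for unknown dialects.
    return EXACT_COUNT.format(schema=schema, table=table)
```

Adding another database would then just mean registering one query string, and the size-in-GB lookup above could live in a second registry following the same pattern.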
m
These are great suggestions @square-activity-64562
Let me think about how to make these things pluggable so we can build these shortcuts
Meanwhile, if you have more ideas, keep them coming!
s
Sure @mammoth-bear-12532, if it is implemented for one database (MySQL or Postgres or whatever) so that the API and UI are defined, I can try to send PRs for the others that we are using.