We have a data lake (Cloudera) built on HDFS and Hive (ORC format). Users run Hive SQL queries to compute data quality KPIs, with many data-blending queries across schemas (e.g., Sales, Billing, Finance), perform data cleansing, and build dashboards in Tableau. Some of the tables contain 800+ columns, many of them null-valued fields, and can hold a few million records.