Hi Team I am trying to enable the data profiling o...
# troubleshoot
n
Hi Team, I am trying to enable the data profiling option in DataHub. I am using the following config, but the stats option is not enabled after running the datahub ingest command. I have hidden the port, db, username, and password in the config.
source:
  type: oracle
  config:
    host_port: ''
    database:
    username:
    password:
    profiling:
      enabled: true
      include_field_null_count: true
      include_field_min_value: true
      include_field_max_value: true
      include_field_mean_value: true
      include_field_median_value: true
      include_field_stddev_value: true
      include_field_quantiles: false
      # include_field_distinct_value_frequencies:
      include_field_histogram: true
      include_field_sample_values: false
      # query_combiner_enabled: true
      # max_number_of_fields_to_profile:
      # profile_table_level_only:
      # limit:
      # offset:
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-datahub-gms.test.svc.cluster.local:8080'
    token: null
@careful-pilot-86309 Can you please look into it?
@little-megabyte-1074 @dazzling-judge-80093 I followed your tutorial, but it is still not enabled.
l
Hi @numerous-account-62719 thanks for the heads up - I’m trying to find someone who can help
n
Please resolve it on priority @little-megabyte-1074 @dazzling-judge-80093
d
Do you see any errors in the logs? Can you share your logs? I wonder why the profiling did not start.
n
I did not see any error in the logs
d
What kind of data types do you have in your tables? Profiling starts, but it ignores all or most of the columns because of:
[2022-08-05 07:28:37,303] WARNING  {great_expectations.dataset.sqlalchemy_dataset:1813} - No recognized sqlalchemy types in type_list for current dialect.
n
I have the standard data types that are supported in Oracle. I am not sure why it is ignoring the columns.
Even if we consider the tables that have no data type issue, they also do not have the stats enabled.
[2022-08-05 07:28:28,568] INFO {datahub.ingestion.source.ge_data_profiler:817} - Profiling science.jc_rep_evento_html
[2022-08-05 07:28:28,687] INFO {datahub.ingestion.source.ge_data_profiler:817} - Profiling science.jc_rep_grid_style
[2022-08-05 07:28:30,105] INFO {datahub.ingestion.source.ge_data_profiler:817} - Profiling science.evento_email_uf
[2022-08-05 07:28:30,961] INFO {datahub.ingestion.source.ge_data_profiler:817} - Profiling science.contrato_software_hist
[2022-08-05 07:28:31,346] INFO {datahub.ingestion.source.ge_data_profiler:817} - Profiling science.jc_rep_evento_comp
[2022-08-05 07:28:31,425] INFO {datahub.ingestion.source.ge_data_profiler:817} - Profiling science.jc_rep_grid_header_footer
d
I think stats are not enabled because it ignores all the columns, so in the end it doesn’t produce any profiling data. Stats are only enabled when there is some profiling data.
n
I have tried with other sources as well, but the profiling is not working.
d
This is a different issue, as here it fails with an OOM error:
org.apache.hadoop.ipc.RemoteException(java.lang.OutOfMemoryError): unable to create new native thread
Can you please try setting this property to False?
profiling.query_combiner_enabled
It might help with the OOM.
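For reference, a minimal sketch of where that property would sit in the recipe, assuming the same Oracle source config shared at the top of the thread. Only the profiling block is shown; the connection settings and sink are assumed to stay unchanged.

```yaml
source:
  type: oracle
  config:
    # connection settings (host_port, database, username, password) omitted
    profiling:
      enabled: true
      # Turn off the query combiner, as suggested above
      query_combiner_enabled: false
```

Whether this resolves the OOM depends on the environment, so treat it as something to try rather than a confirmed fix.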
n
I have set it to False, but the error is still the same.
I am not able to enable profiling in Oracle. Can you please help me out there?