dazzling-insurance-83303
07/26/2022, 3:49 AM
allow_deny_pattern signify?
profiling:
  enabled: true
  allow_deny_patterns:
    allow:
      - .*
    deny:
      -
    ignoreCase: True
    alphabet: '[A-Za-z0-9 .-]'
Is that filtering for data within the columns? If so, are there any examples to refer to?
I am interested in knowing if those can be regexes to do Luhn algorithm checks.
dazzling-insurance-83303
07/26/2022, 3:49 AM
better-orange-49102
07/26/2022, 3:51 AM
better-orange-49102
07/26/2022, 3:51 AM
dazzling-insurance-83303
07/26/2022, 4:04 AM
^3[47][0-9]{13}$) and deny every other credit card in a table column from profiling, what would be the syntax for that?
better-orange-49102
07/26/2022, 4:05 AM
dazzling-insurance-83303
07/26/2022, 4:06 AM
profiling:
  enabled: true
  allow_deny_patterns:
    allow:
      - ^3[47][0-9]{13}$
    deny:
      - ^(^3[47][0-9]{13}$)
    ignoreCase: True
    alphabet: '[A-Za-z0-9 .-]'
better-orange-49102
07/26/2022, 4:09 AM
better-orange-49102
07/26/2022, 4:10 AM
mammoth-bear-12532
finance database with the public schema (to keep profiling costs and the load on the operational system low), then you would use the allow_deny patterns. It is not meant to filter the values within the columns themselves.
dazzling-insurance-83303
07/26/2022, 1:11 PM
dazzling-insurance-83303
07/28/2022, 3:41 AM
CREATE TABLE IF NOT EXISTS public.test_to_exclude_table_columns_from_datahub
(
    id integer NOT NULL,
    val character varying(255) COLLATE pg_catalog."default" NOT NULL,
    time_stamp timestamp without time zone NOT NULL DEFAULT clock_timestamp(),
    secret_value character varying(255) COLLATE pg_catalog."default" NOT NULL,
    secret_value2 character varying(255) COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT test_to_exclude_table_columns_from_datahub_pkey PRIMARY KEY (id)
)
Specification for allow_deny
profiling:
  allow_deny_patterns:
    allow:
      - .*
    deny:
      - 'dvdrental.public.test_to_exclude_table_columns_from_datahub.secret_value*'
    ignoreCase: True
    alphabet: '[A-Za-z0-9 .-]'
OR
profiling:
  allow_deny_patterns:
    allow:
      - .*
    deny:
      - 'dvdrental.public.test_to_exclude_table_columns_from_datahub.secret_value'
      - 'dvdrental.public.test_to_exclude_table_columns_from_datahub.secret_value2'
    ignoreCase: True
    alphabet: '[A-Za-z0-9 .-]'
better-orange-49102
07/28/2022, 3:52 AM
dazzling-insurance-83303
07/28/2022, 2:20 PM
dvdrental.public.test_to_exclude_table_from_datahub
dvdrental.public.test_to_exclude_table_from_datahub_2
dvdrental.public.test_to_exclude_table_from_datahub_3
I tried the following configuration/syntax by moving the table deny entry - 'dvdrental.public.test_to_exclude_table_from_datahub*' from above profiling to under it. I noticed that the tables got ingested (which was expected); however, they got profiled as well, as in the images above, which was not expected. Previously the tables did not get ingested.
profiling:
  allow_deny_patterns:
    # allow:
    #   - .*
    deny:
      - 'dvdrental.public.test_to_exclude_table_from_datahub*'
      - 'dvdrental.public.test_to_exclude_table_columns_from_datahub.secret_value'
      - 'dvdrental.public.test_to_exclude_table_columns_from_datahub.secret_value2'
This suggests that the deny under profiling either needs a different specification or is not working as expected, AFAICT.
dazzling-insurance-83303
07/28/2022, 2:27 PM
profiling:
  enabled: true # default false
  limit:
  offset:
  report_dropped_profiles: False
  turn_off_expensive_profiling_metrics: False
  profile_table_level_only: false # default false
  include_field_null_count: True
  include_field_min_value: True
  include_field_max_value: True
  include_field_mean_value: True
  include_field_median_value: True
  include_field_stddev_value: True
  include_field_quantiles: False
  include_field_distinct_value_frequencies: False
  include_field_histogram: True
  include_field_sample_values: True
  allow_deny_patterns:
    # allow:
    #   - .*
    deny:
      - 'dvdrental.public.test_to_exclude_table_from_datahub*'
      - 'dvdrental.public.test_to_exclude_table_columns_from_datahub.secret_value'
      - 'dvdrental.public.test_to_exclude_table_columns_from_datahub.secret_value2'
    ignoreCase: True
    alphabet: '[A-Za-z0-9 .-]'
  max_number_of_fields_to_profile:
  # profile_if_updated_since_days: 1 # BigQuery only
  # profile_table_size_limit: 1 # BigQuery only
  # profile_table_row_limit: 50000 # BigQuery only
  max_workers: 10
  query_combiner_enabled: True
  catch_exceptions: True
  partition_profiling_enabled: True
  # bigquery_temp_table_schema: None # BigQuery only
  # partition_datetime: None # BigQuery only
dazzling-insurance-83303
07/29/2022, 4:22 PM
dazzling-insurance-83303
07/29/2022, 4:26 PM
allow_deny_patterns on profiling.
As per my testing I could not get it to work.
TY
CC @little-megabyte-1074
dazzling-insurance-83303
07/29/2022, 7:02 PM
helpful-optician-78938
08/04/2022, 8:29 PM
* to .* and try e.g.: 'dvdrental.public.test_to_exclude_table_from_datahub.*'?
dazzling-insurance-83303
08/04/2022, 8:43 PM
Database : dvdrental
Schema : public
Tables : test_to_exclude_table_from_datahub, test_to_exclude_table_from_datahub_1, test_to_exclude_table_from_datahub_2
I will give your suggestion a try.
dazzling-insurance-83303
08/05/2022, 7:30 PM
gray-shoe-75895
08/08/2022, 11:24 PM
profile_pattern instead of the profiling.allow_deny_patterns option
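For anyone comparing the two deny spellings discussed above ('...datahub*' versus '...datahub.*'), here is a minimal Python sketch of how they behave as plain regular expressions. It is only an illustration and does not call DataHub. It assumes, without confirmation from this thread, that patterns are applied case-insensitively with a prefix match (Python's re.match); the helper name is_denied and the table dvdrental.public.actor are hypothetical.

```python
import re

# Deny patterns exactly as written in the thread.
GLOB_STYLE = "dvdrental.public.test_to_exclude_table_from_datahub*"
REGEX_STYLE = "dvdrental.public.test_to_exclude_table_from_datahub.*"

# Table names from the thread, plus one hypothetical unrelated table.
NAMES = [
    "dvdrental.public.test_to_exclude_table_from_datahub",
    "dvdrental.public.test_to_exclude_table_from_datahub_2",
    "dvdrental.public.actor",  # hypothetical, for contrast only
]


def is_denied(pattern: str, name: str, full: bool = False) -> bool:
    """Return True if `name` matches `pattern`.

    full=False uses re.match (anchored at the start only; this is the
    assumed behaviour). full=True uses re.fullmatch (whole name must match).
    """
    matcher = re.fullmatch if full else re.match
    return matcher(pattern, name, re.IGNORECASE) is not None


if __name__ == "__main__":
    for name in NAMES:
        for label, pattern in (("*", GLOB_STYLE), (".*", REGEX_STYLE)):
            print(
                f"{name} | trailing {label} | "
                f"prefix match: {is_denied(pattern, name)} | "
                f"full match: {is_denied(pattern, name, full=True)}"
            )
```

In a regex, a trailing * only repeats the preceding character (here the final b), while .* matches any remaining text. Under the prefix-match assumption both spellings would deny the _2 table, so the spelling difference matters only if the whole name has to match.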