incalculable-branch-51967
08/18/2022, 8:11 PM
redshift-usage source. We've set up a pipeline with the following configuration:
...
table_pattern:
  deny:
    - 'analytics.*.*requests*'
    - 'analytics.public.requests_raw_stg'
...
We triggered ingestion, and in the GMS logs we observed entries like the following:
16:07:58.319 [qtp1830908236-16] INFO c.l.m.r.entity.AspectResource:126 - INGEST PROPOSAL proposal: {aspectName=datasetUsageStatistics, systemMetadata={lastObserved=1660832494657, runId=redshift-usage-2022_08_18-14_10_28}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.public.requests_current_year_old,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=1336,bytes=7b227469...205b5d7d)}, changeType=UPSERT}
...
16:08:35.622 [qtp1830908236-1878] INFO c.l.m.r.entity.AspectResource:126 - INGEST PROPOSAL proposal: {aspectName=datasetUsageStatistics, systemMetadata={lastObserved=1660832496738, runId=redshift-usage-2022_08_18-14_10_28}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.analytics_sources.potential_signup_requests,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=1526,bytes=7b227469...205b5d7d)}, changeType=UPSERT}
...
There are no records for analytics.public.requests_raw_stg. Could it be that only the last regex is being considered?
incalculable-branch-51967
08/18/2022, 8:12 PM
AllowDenyPattern and it seems to work fine:
>>> from datahub.configuration.common import AllowDenyPattern
>>> p = AllowDenyPattern(deny=['analytics.*.*requests*', 'analytics.public.requests_raw_stg'], allow=['.*'])
>>> p.allowed('analytics.public.requests_current_year_old')
False
>>> p.allowed('analytics.analytics_sources.potential_signup_requests')
False
>>> p.allowed('analytics.public.requests_raw_stg')
False
>>> p.allowed('analytics.temp.school_teacher_to_be_renamed')
True
This makes me think that the first regex may be ignored when parsing the yml recipe.
gray-shoe-75895
08/19/2022, 4:47 AM
The * operator means “0 or more of the preceding character”, while . means any character. What you probably want is 'analytics\..+\..*requests.*'
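To make the difference between the two patterns concrete, here is a minimal sketch using only Python's standard re module. It assumes the deny patterns are applied with re.match (anchored at the start of the string), which is consistent with the REPL results earlier in the thread. The helper name denied is hypothetical, and "analytics_request" is a made-up name used only for contrast.

```python
import re

# Table names taken from the thread above.
TABLES = [
    "analytics.public.requests_current_year_old",
    "analytics.analytics_sources.potential_signup_requests",
    "analytics.public.requests_raw_stg",
    "analytics.temp.school_teacher_to_be_renamed",
]

ORIGINAL = r"analytics.*.*requests*"        # pattern from the recipe
SUGGESTED = r"analytics\..+\..*requests.*"  # stricter pattern proposed above


def denied(pattern: str, name: str) -> bool:
    """Hypothetical helper: True if `pattern` matches from the start of `name`."""
    return re.match(pattern, name) is not None


# Compare both patterns on every table name from the thread.
for table in TABLES:
    print(f"{table}: original={denied(ORIGINAL, table)} "
          f"suggested={denied(SUGGESTED, table)}")

# The patterns only disagree on names without literal dots, such as this
# made-up one: the unescaped "." in ORIGINAL matches "_" as well.
print(denied(ORIGINAL, "analytics_request"),
      denied(SUGGESTED, "analytics_request"))
```

Under that assumption, both patterns match all three requests tables from the thread and neither matches analytics.temp.school_teacher_to_be_renamed, so the looser regex alone would not explain why two of those tables were still ingested.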
gray-shoe-75895
08/19/2022, 4:48 AM
incalculable-branch-51967
08/19/2022, 7:09 PM
AllowDenyPattern works with the ones I provided. I know it's not a classic regex, but it's aligned with what's expected according to the documentation (see this comment as well). I have reason to believe that there's a problem when creating the AllowDenyPattern from configuration. Also, I tried the regex you suggested and it didn't work either.
gray-shoe-75895
08/22/2022, 8:12 PM