# troubleshoot
i
Hello everyone, my team encountered an issue while working with the redshift-usage source. We've set up a pipeline with the following configuration:
...
table_pattern:
      deny:
        - 'analytics.*.*requests*'
        - 'analytics.public.requests_raw_stg'
...
We triggered ingestion, and in the GMS logs we observed entries like the following:
16:07:58.319 [qtp1830908236-16] INFO  c.l.m.r.entity.AspectResource:126 - INGEST PROPOSAL proposal: {aspectName=datasetUsageStatistics, systemMetadata={lastObserved=1660832494657, runId=redshift-usage-2022_08_18-14_10_28}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.public.requests_current_year_old,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=1336,bytes=7b227469...205b5d7d)}, changeType=UPSERT}
...
16:08:35.622 [qtp1830908236-1878] INFO  c.l.m.r.entity.AspectResource:126 - INGEST PROPOSAL proposal: {aspectName=datasetUsageStatistics, systemMetadata={lastObserved=1660832496738, runId=redshift-usage-2022_08_18-14_10_28}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.analytics_sources.potential_signup_requests,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=1526,bytes=7b227469...205b5d7d)}, changeType=UPSERT}
...
There are no records for analytics.public.requests_raw_stg. Could it be that only the last regex is being considered?
I've done some tests with AllowDenyPattern and it seems to work fine:
>>> p = AllowDenyPattern(deny=['analytics.*.*requests*','analytics.public.requests_raw_stg'],allow=['.*'])
>>> p.allowed('analytics.public.requests_current_year_old')
False
>>> p.allowed('analytics.analytics_sources.potential_signup_requests')
False
>>> p.allowed('analytics.public.requests_raw_stg')
False
>>> p.allowed('analytics.temp.school_teacher_to_be_renamed')
True
This makes me think that the first regex may be ignored when parsing the YAML recipe.
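For anyone who wants to repeat the check above without DataHub installed, here is a minimal sketch of an allow/deny check. It is a simplified stand-in, not DataHub's actual AllowDenyPattern code: the function name is_allowed is hypothetical, and it assumes patterns are applied with re.match (anchored at the start of the name, case-insensitive), which is consistent with the results shown above.

```python
import re
from typing import List


def is_allowed(name: str, allow: List[str], deny: List[str]) -> bool:
    """Simplified stand-in for an allow/deny check (assumption, not DataHub code)."""
    # Any matching deny pattern rejects the name, regardless of its position in the list.
    if any(re.match(pattern, name, re.IGNORECASE) for pattern in deny):
        return False
    # Otherwise the name must match at least one allow pattern.
    return any(re.match(pattern, name, re.IGNORECASE) for pattern in allow)


# Patterns copied from the recipe in this thread.
DENY = ["analytics.*.*requests*", "analytics.public.requests_raw_stg"]
ALLOW = [".*"]

for table in [
    "analytics.public.requests_current_year_old",
    "analytics.analytics_sources.potential_signup_requests",
    "analytics.public.requests_raw_stg",
    "analytics.temp.school_teacher_to_be_renamed",
]:
    print(table, is_allowed(table, ALLOW, DENY))
```

If every deny pattern is honoured, only the last table should come out as allowed, so usage statistics for the first two tables suggest the first pattern never reached the check.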
g
It looks like the first regex isn’t quite right - the * operator means “0 or more of the preceding character”, while . means “any character”. What you probably want is 'analytics\..+\..*requests.*'
Pro-tip: I’m a big fan of using https://regex101.com/ to write and test regular expressions.
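To make the difference between the two patterns concrete, here is a small sketch that compares them using Python's re.match. The helper matches and the table names request_log and analytics_v2 are made up for illustration. Whether the connector applies patterns in exactly this way is an assumption.

```python
import re

ORIGINAL = r"analytics.*.*requests*"        # pattern from the recipe
SUGGESTED = r"analytics\..+\..*requests.*"  # stricter pattern suggested above


def matches(pattern: str, name: str) -> bool:
    # re.match anchors at the start of the string only.
    return re.match(pattern, name) is not None


NAMES = [
    "analytics.public.requests_raw_stg",            # from the thread
    "analytics.temp.school_teacher_to_be_renamed",  # from the thread
    "analytics.public.request_log",                 # hypothetical: no trailing "s"
    "analytics_v2.public.requests",                 # hypothetical: no "." after "analytics"
]

for name in NAMES:
    print(f"{name}: original={matches(ORIGINAL, name)} suggested={matches(SUGGESTED, name)}")
```

For the table names that actually appear in this thread, both patterns give the same answer. They only diverge on edge cases like the two hypothetical names, where the original pattern matches more loosely because "s*" allows zero "s" characters and the unescaped "." matches any character.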
i
Hi, thanks for your answer, Harshal. It's clearly not a regex issue, because AllowDenyPattern works with the patterns I provided. I know it's not a classic regex, but it's aligned with what's expected according to the documentation (see this comment as well). I have reason to believe that there's a problem when creating the AllowDenyPattern from the configuration. Also, I tried the regex you suggested and it didn't work either.
g
Looked into it a bit more - seems like this is a bug in our redshift-usage connector. I have a PR up that should fix the issue https://github.com/datahub-project/datahub/pull/5702