# troubleshoot
c
There is a bug in the Snowflake ingestion with acryl-datahub 0.8.16.1; with a config like:
source:
  type: "snowflake"
  config:
    username: ...
    password: ...
    host_port: ...
    warehouse: ...
    role: "accountadmin"
    database_pattern:
      ignoreCase: true
      allow:
        - "db1"
        - "db2"
    table_pattern:
      ignoreCase: true
      allow:
        - "db1.schema1.table1"
        - "db2.schema2.table2"
    include_tables: true
    include_table_lineage: false
Only db1.schema1.table1 gets ingested, and the log shows db2.schema1.table1 getting filtered out, even though there is no db2.schema1 schema, and db2.schema2.table2 does not show up in the log.
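For reference, the filtering this config is expected to perform can be sketched as below. This is a minimal sketch, assuming each allow entry is treated as a regular expression matched from the start of the fully qualified name (db.schema.table) and that ignoreCase makes the match case-insensitive. The helper `is_allowed` and the candidate names are hypothetical, for illustration only; this is not DataHub's implementation.

```python
import re


def is_allowed(name, allow, ignore_case=True):
    """Hypothetical helper: True if any allow regex matches the start of name."""
    flags = re.IGNORECASE if ignore_case else 0
    return any(re.match(pattern, name, flags) for pattern in allow)


# Allow list taken from the table_pattern in the config above.
table_allow = ["db1.schema1.table1", "db2.schema2.table2"]

# Illustrative fully qualified names (assumed, not from a real instance).
candidates = [
    "db1.schema1.table1",
    "db1.schema1.other",
    "DB2.SCHEMA2.TABLE2",
    "db2.schema1.table1",
]

kept = [name for name in candidates if is_allowed(name, table_allow)]
print(kept)
```

Under these assumptions both db1.schema1.table1 and db2.schema2.table2 (in any letter case) pass the filter, which is the behavior the report says is missing for the second database.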
m
Thanks for reporting Remi.
h
Hi @curved-sandwich-81699, could you double-check the schemas and the tables in your Snowflake instance? I tried to reproduce this issue with similar settings...
source:
  type: "snowflake"
  config:
    username: ...
    password: ...
    host_port: ...
    warehouse: ...
    role: "accountadmin"
    database_pattern:
        ignoreCase: True
        allow:
            - "demo_db"
            - "demo_pipeline"
    table_pattern:
        ignoreCase: True
        allow:
            - "demo_db.test_schema.t1"
            - "demo_pipeline.test_schema.t2"
    include_tables: True
    include_table_lineage: False
    include_views: False
And here is the ingestion summary log. Everything works as expected...
[2021-10-28 09:38:52,300] INFO     {datahub.cli.ingest_cli:57} - Starting metadata ingestion
[2021-10-28 09:38:56,063] INFO     {datahub.ingestion.run.pipeline:61} - sink wrote workunit demo_db.test_schema.t1
[2021-10-28 09:38:57,496] INFO     {datahub.ingestion.run.pipeline:61} - sink wrote workunit demo_pipeline.test_schema.t2
[2021-10-28 09:38:57,902] INFO     {datahub.cli.ingest_cli:59} - Finished metadata ingestion

Source (snowflake) report:
{'failures': {},
 'filtered': ['demo_db.test_schema.t2',
              'demo_db.test_schema.t3',
              'demo_db.test_schema.t4',
              'demo_db.test_schema.t5',
              'demo_db.test_schema.t6',
              'demo_db.test_schema.t7',
              'demo_pipeline.test_schema.t1',
              'demo_pipeline.test_schema.t3',
              'demo_pipeline.test_schema.t4',
              'demo_pipeline.test_schema.t5',
              'demo_pipeline.test_schema.t6',
              'demo_pipeline.test_schema.t7',
              'PROFILING_TESTS',
              'SNOWFLAKE',
              'SNOWFLAKE_SAMPLE_DATA',
              'UTIL_DB'],
 'tables_scanned': 14,
 'views_scanned': 0,
 'warnings': {},
 'workunit_ids': ['demo_db.test_schema.t1', 'demo_pipeline.test_schema.t2'],
 'workunits_produced': 2}
Sink (file) report:
{'downstream_end_time': None,
 'downstream_start_time': None,
 'downstream_total_latency_in_seconds': None,
 'failures': [],
 'records_written': 2,
 'warnings': []}

Pipeline finished successfully
c
@helpful-optician-78938 thanks for looking at this. Could you try with different schema and table names in one of the dbs? My issue is that the schemas and tables get mixed up between the 2 dbs.
Downgrading to acryl-datahub[snowflake]==0.8.14.2 fixes my issue.
The issue happens with acryl-datahub[snowflake]==0.8.15.9 and later versions; earlier versions are not affected.
h
I can repro the issue with different schemas now. Thanks for reporting. Looks like this regression got introduced on 10/19. We'll fast-follow with a fix.
👍 1
c
Sounds good, thank you!
^ confirmed fixed with acryl-datahub[snowflake]==0.8.16.2.
🎉 2