#ingestion

crooked-holiday-47153

09/29/2022, 8:53 AM
Hi all, I am trying to ingest a single table from Snowflake, without success. I tried various regular expressions in the ingestion config, but none of them seems to work. The full name of the table I want to ingest is SANDBOX_DB.P_USERA.LAB_STRUCTURE, and these are the config params I am using (sharing only the relevant part of the config). Am I missing anything?
source:
    type: snowflake
    config:
...
        schema_pattern:
            allow:
                - ^SANDBOX_DB\.P_USERA$
        database_pattern:
            allow:
                - ^SANDBOX_DB$
        table_pattern:
            allow:
                - ^SANDBOX_DB\.P_USERA\.LAB_STRUCTURE$
...
I used the same config for snowflake-usage ingestion as well. Both runs finish successfully, but the table doesn't show up when I search for it in the catalog.

dazzling-judge-80093

09/29/2022, 9:19 AM
@crooked-holiday-47153 I think the schema pattern should be defined without the database name:
        schema_pattern:
            allow:
                - ^P_USERA$
        database_pattern:
            allow:
                - ^SANDBOX_DB$
        table_pattern:
            allow:
                - ^SANDBOX_DB\.P_USERA\.LAB_STRUCTURE$
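The suggestion above can be checked offline with plain Python regexes. This is only a minimal sketch, not DataHub's own code: the helper `is_allowed` is hypothetical, and it assumes that each allow pattern is applied with a case-insensitive `re.match`, that `schema_pattern` is tested against the bare schema name, and that `table_pattern` is tested against the fully qualified `database.schema.table` name.

```python
import re


def is_allowed(name: str, allow: list[str]) -> bool:
    """Hypothetical helper: True if any allow regex matches from the start of `name`."""
    return any(re.match(pattern, name, re.IGNORECASE) for pattern in allow)


# Patterns copied from the two configs in this thread.
original_schema_allow = [r"^SANDBOX_DB\.P_USERA$"]
suggested_schema_allow = [r"^P_USERA$"]
table_allow = [r"^SANDBOX_DB\.P_USERA\.LAB_STRUCTURE$"]

# Assumption: the schema filter only ever sees the bare schema name.
schema_name = "P_USERA"
table_name = "SANDBOX_DB.P_USERA.LAB_STRUCTURE"

print(is_allowed(schema_name, original_schema_allow))   # False: pattern expects a database prefix
print(is_allowed(schema_name, suggested_schema_allow))  # True
print(is_allowed(table_name, table_allow))              # True
```

If the assumption about what each filter receives is right, the original `schema_pattern` would reject every schema, while the suggested one would let `P_USERA` through.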

crooked-holiday-47153

09/29/2022, 9:22 AM
@dazzling-judge-80093 I will try this config and update, thanks.

dazzling-judge-80093

09/29/2022, 9:22 AM
let me know if this works 🤞

crooked-holiday-47153

09/29/2022, 9:33 AM
@dazzling-judge-80093 Ingestion finished successfully, but the table still doesn't appear when searching the catalog. Here is the ingestion run log:
~~~~ Execution Summary ~~~~

RUN_INGEST - {'errors': [],
 'exec_id': '8e47fbef-c675-4517-8f5d-2bb950708ace',
 'infos': ['2022-09-29 09:23:09.634831 [exec_id=8e47fbef-c675-4517-8f5d-2bb950708ace] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-09-29 09:27:26.951685 [exec_id=8e47fbef-c675-4517-8f5d-2bb950708ace] INFO: stdout=venv setup time = 0\n'
           'This version of datahub supports report-to functionality\n'
           'datahub  ingest run -c /tmp/datahub/ingest/8e47fbef-c675-4517-8f5d-2bb950708ace/recipe.yml --report-to '
           '/tmp/datahub/ingest/8e47fbef-c675-4517-8f5d-2bb950708ace/ingestion_report.json\n'
           '[2022-09-29 09:23:11,421] INFO     {datahub.cli.ingest_cli:182} - DataHub CLI version: 0.8.45\n'
           '[2022-09-29 09:23:11,440] INFO     {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured '
           'to talk to <http://datahub-gms:8080> with token: **********\n'
           "[2022-09-29 09:23:13,349] INFO     {datahub.ingestion.source.sql.sql_common:278} - Applying table_pattern {'allow': "
           "['^SANDBOX_DB\\\\.P_USERA\\\\.LAB_STRUCTURE$']} to view_pattern.\n"
           '[2022-09-29 09:23:13,349] INFO     {datahub.ingestion.source_config.sql.snowflake:231} - using authenticator type '
           "'DEFAULT_AUTHENTICATOR'\n"
           "[2022-09-29 09:23:13,349] WARNING  {datahub.ingestion.source_config.sql.snowflake:159} - snowflake's `host_port` option has been "
           'deprecated; use account_id instead\n'
           '[2022-09-29 09:23:13,350] INFO     {datahub.ingestion.run.pipeline:200} - Source configured successfully.\n'
           '/tmp/datahub/ingest/venv-snowflake-0.8.45/lib/python3.10/site-packages/datahub/cli/ingest_cli.py:211: DeprecationWarning: There is no '
           'current event loop\n'
           '  loop = asyncio.get_event_loop()\n'
           '[2022-09-29 09:23:13,351] INFO     {datahub.cli.ingest_cli:129} - Starting metadata ingestion\n'
           '[2022-09-29 09:23:13,870] INFO     {datahub.ingestion.source.snowflake.snowflake_v2:957} - Checking current version\n'
           '[2022-09-29 09:23:14,043] INFO     {datahub.ingestion.source.snowflake.snowflake_v2:963} - Checking current role\n'
           '[2022-09-29 09:23:14,096] INFO     {datahub.ingestion.source.snowflake.snowflake_v2:969} - Checking current warehouse\n'
           '[2022-09-29 09:23:14,468] INFO     {datahub.ingestion.source.snowflake.snowflake_usage_v2:95} - Checking usage date ranges\n'
           '[2022-09-29 09:24:24,630] INFO     {datahub.ingestion.source.snowflake.snowflake_usage_v2:122} - Getting aggregated usage statistics\n'
           '[2022-09-29 09:25:52,389] INFO     {datahub.ingestion.source.snowflake.snowflake_usage_v2:226} - Getting access history\n'
           '[2022-09-29 09:27:25,490] INFO     {datahub.ingestion.reporting.file_reporter:54} - Wrote SUCCESS report successfully to '
           "<_io.TextIOWrapper name='/tmp/datahub/ingest/8e47fbef-c675-4517-8f5d-2bb950708ace/ingestion_report.json' mode='w' encoding='UTF-8'>\n"
           '[2022-09-29 09:27:25,490] INFO     {datahub.cli.ingest_cli:150} - Finished metadata ingestion\n'
           '\n'
           'Cli report:\n'
           "{'cli_version': '0.8.45',\n"
           " 'cli_entry_location': '/tmp/datahub/ingest/venv-snowflake-0.8.45/lib/python3.10/site-packages/datahub/__init__.py',\n"
           " 'py_version': '3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110]',\n"
           " 'py_exec_path': '/tmp/datahub/ingest/venv-snowflake-0.8.45/bin/python3',\n"
           " 'os_details': 'Linux-5.10.112-108.499.amzn2.x86_64-x86_64-with-glibc2.31',\n"
           " 'mem_info': '352.41 MB'}\n"
           'Source (snowflake) report:\n'
           "{'events_produced': '0',\n"
           " 'events_produced_per_sec': '0',\n"
           " 'event_ids': [],\n"
           " 'warnings': {},\n"
           " 'failures': {},\n"
           " 'soft_deleted_stale_entities': [],\n"
           " 'tables_scanned': '0',\n"
           " 'views_scanned': '0',\n"
           " 'entities_profiled': '0',\n"
           " 'filtered': ['PROD.*', 'SNOWFLAKE.*', 'SNOWFLAKE_SAMPLE_DATA.*'],\n"
           " 'window_end_time': '2022-09-29 09:23:13.349285+00:00 (4 minutes and 12.3 seconds ago).',\n"
           " 'window_start_time': '2022-09-28 00:00:00+00:00 (1 day, 9 hours and 27 minutes ago).',\n"
           " 'num_table_to_table_edges_scanned': '0',\n"
           " 'num_table_to_view_edges_scanned': '0',\n"
           " 'num_view_to_table_edges_scanned': '0',\n"
           " 'num_external_table_edges_scanned': '0',\n"
           " 'ignore_start_time_lineage': 'False',\n"
           " 'upstream_lineage_in_report': 'False',\n"
           " 'upstream_lineage': {},\n"
           " 'lineage_start_time': '2022-09-28 00:00:00+00:00 (1 day, 9 hours and 27 minutes ago).',\n"
           " 'lineage_end_time': '2022-09-29 09:23:13.349285+00:00 (4 minutes and 12.3 seconds ago).',\n"
           " 'cleaned_account_id': 'ib91586.us-east-1',\n"
           " 'run_ingestion': 'False',\n"
           " 'provision_role_done': 'False',\n"
           " 'provision_role_success': 'False',\n"
           " 'saas_version': '6.31.1',\n"
           " 'default_warehouse': 'WH_DATAHUB',\n"
           " 'role': 'DATAHUB_PROD_ROLE',\n"
           " 'check_role_grants': 'False',\n"
           " 'role_grants': [],\n"
           " 'profile_candidates': {},\n"
           " 'start_time': '2022-09-29 09:23:13.349997 (4 minutes and 12.3 seconds ago).',\n"
           " 'running_time': '4 minutes and 12.3 seconds',\n"
           " 'include_usage_stats': 'True',\n"
           " 'include_operational_stats': 'True',\n"
           " 'include_technical_schema': 'True',\n"
           " 'databases_scanned': '3',\n"
           " 'min_access_history_time': '2021-09-01 00:00:00.105000+00:00 (1 year, 4 weeks and 1 day ago).',\n"
           " 'max_access_history_time': '2022-09-29 09:21:30.975000+00:00 (5 minutes and 54.68 seconds ago).',\n"
           " 'access_history_range_query_secs': '70.16',\n"
           " 'usage_aggregation_query_secs': '86.94629098905716',\n"
           " 'access_history_query_secs': '90.03',\n"
           " 'rows_processed': '23861',\n"
           " 'rows_zero_base_objects_accessed': '23861',\n"
           " 'rows_zero_direct_objects_accessed': '23861',\n"
           " 'rows_zero_objects_modified': '23861',\n"
           " 'rows_missing_email': '22398'}\n"
           'Sink (datahub-rest) report:\n'
           "{'total_records_written': '0',\n"
           " 'records_written_per_second': '0',\n"
           " 'warnings': [],\n"
           " 'failures': [],\n"
           " 'start_time': '2022-09-29 09:23:11.438141 (4 minutes and 14.22 seconds ago).',\n"
           " 'current_time': '2022-09-29 09:27:25.654616 (now).',\n"
           " 'total_duration_in_seconds': '254.22',\n"
           " 'gms_version': 'v0.8.45',\n"
           " 'pending_requests': '0'}\n"
           '\n'
           ' Pipeline finished successfully ; produced 0 events in 4 minutes and 12.3 seconds.\n',
           "2022-09-29 09:27:26.952006 [exec_id=8e47fbef-c675-4517-8f5d-2bb950708ace] INFO: Successfully executed 'datahub ingest'"],
 'structured_report': '{"source": {"type": "snowflake", "report": {"events_produced": "0", "events_produced_per_sec": "0", "event_ids": [], '
                      '"warnings": {}, "failures": {}, "soft_deleted_stale_entities": [], "tables_scanned": "0", "views_scanned": "0", '
                      '"entities_profiled": "0", "filtered": ["PROD.*", "SNOWFLAKE.*", "SNOWFLAKE_SAMPLE_DATA.*"], "window_end_time": "2022-09-29 '
                      '09:23:13.349285+00:00 (4 minutes and 12.14 seconds ago).", "window_start_time": "2022-09-28 00:00:00+00:00 (1 day, 9 hours '
                      'and 27 minutes ago).", "num_table_to_table_edges_scanned": "0", "num_table_to_view_edges_scanned": "0", '
                      '"num_view_to_table_edges_scanned": "0", "num_external_table_edges_scanned": "0", "ignore_start_time_lineage": "False", '
                      '"upstream_lineage_in_report": "False", "upstream_lineage": {}, "lineage_start_time": "2022-09-28 00:00:00+00:00 (1 day, 9 '
                      'hours and 27 minutes ago).", "lineage_end_time": "2022-09-29 09:23:13.349285+00:00 (4 minutes and 12.14 seconds ago).", '
                      '"cleaned_account_id": "ib91586.us-east-1", "run_ingestion": "False", "provision_role_done": "False", '
                      '"provision_role_success": "False", "saas_version": "6.31.1", "default_warehouse": "WH_DATAHUB", "role": "DATAHUB_PROD_ROLE", '
                      '"check_role_grants": "False", "role_grants": [], "profile_candidates": {}, "start_time": "2022-09-29 09:23:13.349997 (4 '
                      'minutes and 12.14 seconds ago).", "running_time": "4 minutes and 12.14 seconds", "include_usage_stats": "True", '
                      '"include_operational_stats": "True", "include_technical_schema": "True", "databases_scanned": "3", "min_access_history_time": '
                      '"2021-09-01 00:00:00.105000+00:00 (1 year, 4 weeks and 1 day ago).", "max_access_history_time": "2022-09-29 '
                      '09:21:30.975000+00:00 (5 minutes and 54.51 seconds ago).", "access_history_range_query_secs": "70.16", '
                      '"usage_aggregation_query_secs": "86.94629098905716", "access_history_query_secs": "90.03", "rows_processed": "23861", '
                      '"rows_zero_base_objects_accessed": "23861", "rows_zero_direct_objects_accessed": "23861", "rows_zero_objects_modified": '
                      '"23861", "rows_missing_email": "22398"}}, "sink": {"type": "datahub-rest", "report": {"total_records_written": "0", '
                      '"records_written_per_second": "0", "warnings": [], "failures": [], "start_time": "2022-09-29 09:23:11.438141 (4 minutes and '
                      '14.05 seconds ago).", "current_time": "2022-09-29 09:27:25.489356 (now).", "total_duration_in_seconds": "254.05", '
                      '"gms_version": "v0.8.45", "pending_requests": "0"}}}'}
Execution finished successfully!
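A run that "finishes successfully" can still emit nothing, so it may help to pull the key counters out of the `structured_report` JSON rather than reading the whole log. The sketch below is an illustration only: the function `diagnose` is hypothetical, the report is a trimmed copy of the values in the log above, and it assumes that `filtered` entries of the form `NAME.*` stand for whole databases that were filtered out. The thread does not confirm that reading.

```python
import json

# Trimmed copy of the 'structured_report' value from the log above (only the fields used here).
structured_report = json.dumps({
    "source": {"type": "snowflake", "report": {
        "events_produced": "0",
        "tables_scanned": "0",
        "views_scanned": "0",
        "databases_scanned": "3",
        "filtered": ["PROD.*", "SNOWFLAKE.*", "SNOWFLAKE_SAMPLE_DATA.*"],
    }},
    "sink": {"type": "datahub-rest", "report": {"total_records_written": "0"}},
})


def diagnose(report_json: str, target_database: str) -> list[str]:
    """Hypothetical helper: list the warning signs found in a structured report."""
    report = json.loads(report_json)["source"]["report"]
    findings = []
    if int(report.get("events_produced", "0")) == 0:
        findings.append("source produced 0 events")
    if int(report.get("tables_scanned", "0")) == 0:
        findings.append("no tables were scanned")
    # Assumption: 'NAME.*' in 'filtered' means database NAME was filtered out as a whole.
    filtered_dbs = [entry[:-2] for entry in report.get("filtered", []) if entry.endswith(".*")]
    scanned = int(report.get("databases_scanned", "0"))
    if scanned == len(filtered_dbs) and target_database not in filtered_dbs:
        findings.append(
            f"all {scanned} scanned databases were filtered and {target_database} "
            "is not among them; it may not be visible to the ingestion role"
        )
    return findings


for finding in diagnose(structured_report, "SANDBOX_DB"):
    print("-", finding)
```

Under these assumptions, the report suggests that the three databases the run saw (PROD, SNOWFLAKE, SNOWFLAKE_SAMPLE_DATA) were all filtered out and that SANDBOX_DB was never listed, which would point at what the role `DATAHUB_PROD_ROLE` can see rather than at the regexes. This is one possible reading of the log, not a confirmed diagnosis.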