late-ability-59580
11/17/2022, 1:47 PM
s3://bucket/pref/pref/*/* in the source.config.path_specs.include
I understand it expects something like s3://.../*.*, but this won't match the pattern of my files.
Am I missing something?

dazzling-judge-80093
11/17/2022, 1:50 PM

dazzling-judge-80093
11/17/2022, 1:51 PM
path_specs:
  - include: "s3://mypath/{table}/{partition_key[0]}/{partition_key[1]}/{partition_key[2]}/*"
    default_extension: csv

late-ability-59580
11/17/2022, 3:30 PM

dazzling-judge-80093
11/17/2022, 4:16 PM
datahub --debug ingest ...

late-ability-59580
11/20/2022, 8:07 AM
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '554972bc-e60d-4cc2-8210-0efff4115a3c',
'infos': ['2022-11-20 08:04:09.274310 [exec_id=554972bc-e60d-4cc2-8210-0efff4115a3c] INFO: Starting execution for task with name=RUN_INGEST',
'2022-11-20 08:04:13.342947 [exec_id=554972bc-e60d-4cc2-8210-0efff4115a3c] INFO: stdout=venv setup time = 0\n'
'This version of datahub supports report-to functionality\n'
'datahub ingest run -c /tmp/datahub/ingest/554972bc-e60d-4cc2-8210-0efff4115a3c/recipe.yml --report-to '
'/tmp/datahub/ingest/554972bc-e60d-4cc2-8210-0efff4115a3c/ingestion_report.json\n'
'[2022-11-20 08:04:11,178] INFO {datahub.cli.ingest_cli:177} - DataHub CLI version: 0.8.43.5\n'
'[2022-11-20 08:04:11,208] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured '
'to talk to http://datahub-datahub-gms:8080\n'
'[2022-11-20 08:04:11,492] ERROR {logger:26} - Please set env variable SPARK_VERSION\n'
'[2022-11-20 08:04:12,194] INFO {datahub.cli.ingest_cli:127} - Starting metadata ingestion\n'
'[2022-11-20 08:04:12,196] INFO {datahub.ingestion.reporting.file_reporter:54} - Wrote SUCCESS report successfully to '
"<_io.TextIOWrapper name='/tmp/datahub/ingest/554972bc-e60d-4cc2-8210-0efff4115a3c/ingestion_report.json' mode='w' encoding='UTF-8'>\n"
'[2022-11-20 08:04:12,196] INFO {datahub.cli.ingest_cli:145} - Finished metadata ingestion\n'
'\n'
'Cli report:\n'
"{'cli_version': '0.8.43.5',\n"
" 'cli_entry_location': '/usr/local/lib/python3.10/site-packages/datahub/__init__.py',\n"
" 'py_version': '3.10.7 (main, Sep 13 2022, 14:31:33) [GCC 10.2.1 20210110]',\n"
" 'py_exec_path': '/usr/local/bin/python',\n"
" 'os_details': 'Linux-5.4.181-99.354.amzn2.x86_64-x86_64-with-glibc2.31'}\n"
'Source (s3) report:\n'
"{'events_produced': '0',\n"
" 'events_produced_per_sec': '0',\n"
" 'event_ids': [],\n"
" 'warnings': {},\n"
" 'failures': {},\n"
" 'filtered': [],\n"
" 'start_time': '2022-11-20 08:04:11.853852',\n"
" 'running_time_in_seconds': '0',\n"
" 'read_rate': '0'}\n"
'Sink (datahub-rest) report:\n'
"{'total_records_written': '0',\n"
" 'records_written_per_second': '0',\n"
" 'warnings': [],\n"
" 'failures': [],\n"
" 'start_time': '2022-11-20 08:04:10.574181',\n"
" 'current_time': '2022-11-20 08:04:12.262083',\n"
" 'total_duration_in_seconds': '1.69',\n"
" 'gms_version': 'v0.9.2',\n"
" 'pending_requests': '0'}\n"
'\n'
' Pipeline finished successfully ; produced 0 events\n',
"2022-11-20 08:04:13.343114 [exec_id=554972bc-e60d-4cc2-8210-0efff4115a3c] INFO: Successfully executed 'datahub ingest'"],
'structured_report': '{"source": {"type": "s3", "report": {"events_produced": "0", "events_produced_per_sec": "0", "event_ids": [], "warnings": {}, '
'"failures": {}, "filtered": [], "start_time": "2022-11-20 08:04:11.853852", "running_time_in_seconds": "0", "read_rate": '
'"0"}}, "sink": {"type": "datahub-rest", "report": {"total_records_written": "0", "records_written_per_second": "0", '
'"warnings": [], "failures": [], "start_time": "2022-11-20 08:04:10.574181", "current_time": "2022-11-20 08:04:12.195627", '
'"total_duration_in_seconds": "1.62", "gms_version": "v0.9.2", "pending_requests": "0"}}}'}
Execution finished successfully!

late-ability-59580
11/20/2022, 9:27 AM
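For anyone else whose run "finished successfully" but produced 0 events: one rough way to check offline whether a path_specs include can match your object keys is to turn the {table} and {partition_key[i]} placeholders into * wildcards and compare the pattern with a key segment by segment. This is a minimal sketch, not DataHub's own code. It assumes that each placeholder and each * match exactly one path segment (never crossing a /), and the helper names include_to_glob and key_matches, as well as the sample keys, are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def include_to_glob(include: str) -> str:
    """Replace every {placeholder} (e.g. {table}, {partition_key[0]}) with '*'.

    Assumption: each placeholder stands for exactly one path segment.
    """
    return re.sub(r"\{[^}]+\}", "*", include)


def key_matches(glob_pattern: str, key: str) -> bool:
    """Match segment by segment so that '*' never crosses a '/' boundary."""
    pattern_parts = glob_pattern.split("/")
    key_parts = key.split("/")
    if len(pattern_parts) != len(key_parts):
        return False  # different depth: too few or too many partition folders
    return all(fnmatchcase(part, pat) for pat, part in zip(pattern_parts, key_parts))


if __name__ == "__main__":
    include = "s3://mypath/{table}/{partition_key[0]}/{partition_key[1]}/{partition_key[2]}/*"
    glob_pattern = include_to_glob(include)
    print(glob_pattern)  # s3://mypath/*/*/*/*/*

    # Hypothetical object keys, for illustration only
    samples = [
        "s3://mypath/orders/2022/11/17/part-0000",      # True: right depth, no extension
        "s3://mypath/orders/2022/11/part-0000",         # False: one partition folder short
        "s3://mypath/orders/2022/11/17/part-0000.csv",  # True: '*' also matches names with a dot
    ]
    for key in samples:
        print(key, key_matches(glob_pattern, key))

    # A '*.*' ending requires a literal dot in the file name
    print(key_matches("s3://bucket/pref/pref/*/*.*", "s3://bucket/pref/pref/x/part-0000"))  # False
```

The last check shows why a *.* ending cannot match extensionless file names, which appears to be the case the default_extension: csv suggestion is meant to cover. A depth mismatch between the include template and the real keys is another likely way to end up with 0 events.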