Hi, I am evaluating Datahub to be implemented in ...
# troubleshoot
m
Hi, I am evaluating Datahub to be implemented in our company. I am trying to ingest a sample file that you provide in the docs . I am not sure where I should drop the file, I just dropped it in a folder like so /home/miquelp/datahub/file_onboarding/test_containers.json However I am getting an error while executing the recipe, seems like it doesn't find the file (The error message could be improved)? I have looked for a similar error but couldn't find anything. What linux user is the one that executes the ingestion? Can you give us a hand? Thanks
Copy code
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '06b7698c-048e-470e-bf2c-1ff4fca75bd0',
 'infos': ['2023-05-29 10:18:33.415813 INFO: Starting execution for task with name=RUN_INGEST',
           "2023-05-29 10:18:37.476974 INFO: Failed to execute 'datahub ingest'",
           '2023-05-29 10:18:37.477118 INFO: Caught exception EXECUTING task_id=06b7698c-048e-470e-bf2c-1ff4fca75bd0, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Report ~~~~
{
  "cli": {
    "cli_version": "0.10.0.7",
    "cli_entry_location": "/usr/local/lib/python3.10/site-packages/datahub/__init__.py",
    "py_version": "3.10.10 (main, Mar 14 2023, 02:37:11) [GCC 10.2.1 20210110]",
    "py_exec_path": "/usr/local/bin/python",
    "os_details": "Linux-5.15.0-72-generic-x86_64-with-glibc2.31",
    "peak_memory_usage": "57.82 MB",
    "mem_info": "57.82 MB"
  },
  "source": {
    "type": "file",
    "report": {
      "events_produced": 0,
      "events_produced_per_sec": 0,
      "entities": {},
      "aspects": {},
      "warnings": {},
      "failures": {},
      "total_num_files": 0,
      "num_files_completed": 0,
      "files_completed": [],
      "percentage_completion": "0%",
      "estimated_time_to_completion_in_minutes": -1,
      "total_bytes_read_completed_files": 0,
      "total_parse_time_in_seconds": 0,
      "total_count_time_in_seconds": 0,
      "total_deserialize_time_in_seconds": 0,
      "aspect_counts": {},
      "entity_type_counts": {},
      "start_time": "2023-05-29 10:18:35.206188 (now)",
      "running_time": "0 seconds"
    }
  },
  "sink": {
    "type": "datahub-rest",
    "report": {
      "total_records_written": 0,
      "records_written_per_second": 0,
      "warnings": [],
      "failures": [],
      "start_time": "2023-05-29 10:18:35.161225 (now)",
      "current_time": "2023-05-29 10:18:35.208860 (now)",
      "total_duration_in_seconds": 0.05,
      "gms_version": "v0.10.3",
      "pending_requests": 0
    }
  }
}

~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub  ingest run -c /tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/recipe.yml --report-to /tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/ingestion_report.json
[2023-05-29 10:18:35,113] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.0.7
No ~/.datahubenv file found, generating one for you...
[2023-05-29 10:18:35,164] INFO     {datahub.ingestion.run.pipeline:184} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-gms:8080>
[2023-05-29 10:18:35,206] INFO     {datahub.ingestion.run.pipeline:201} - Source configured successfully.
[2023-05-29 10:18:35,207] INFO     {datahub.cli.ingest_cli:129} - Starting metadata ingestion
[2023-05-29 10:18:35,209] INFO     {datahub.ingestion.reporting.file_reporter:52} - Wrote UNKNOWN report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/ingestion_report.json' mode='w' encoding='UTF-8'>
[2023-05-29 10:18:35,209] INFO     {datahub.cli.ingest_cli:134} - Source (file) report:
{'events_produced': 0,
 'events_produced_per_sec': 0,
 'entities': {},
 'aspects': {},
 'warnings': {},
 'failures': {},
 'total_num_files': 0,
 'num_files_completed': 0,
 'files_completed': [],
 'percentage_completion': '0%',
 'estimated_time_to_completion_in_minutes': -1,
 'total_bytes_read_completed_files': 0,
 'total_parse_time_in_seconds': 0,
 'total_count_time_in_seconds': 0,
 'total_deserialize_time_in_seconds': 0,
 'aspect_counts': {},
 'entity_type_counts': {},
 'start_time': '2023-05-29 10:18:35.206188 (now)',
 'running_time': '0 seconds'}
[2023-05-29 10:18:35,210] INFO     {datahub.cli.ingest_cli:137} - Sink (datahub-rest) report:
{'total_records_written': 0,
 'records_written_per_second': 0,
 'warnings': [],
 'failures': [],
 'start_time': '2023-05-29 10:18:35.161225 (now)',
 'current_time': '2023-05-29 10:18:35.210294 (now)',
 'total_duration_in_seconds': 0.05,
 'gms_version': 'v0.10.3',
 'pending_requests': 0}
[2023-05-29 10:18:35,809] ERROR    {datahub.entrypoints:188} - Command failed: Failed to process /home/miquelp/datahub/file_onboarding/test_containers.json
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 175, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
    loop.run_until_complete(run_func_check_upgrade(pipeline))
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
    ret = await the_one_future
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
    return await loop.run_in_executor(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
    pipeline.run()
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 339, in run
    for wu in itertools.islice(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/file.py", line 196, in get_workunits
    for f in self.get_filenames():
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/file.py", line 193, in get_filenames
    raise Exception(f"Failed to process {self.config.path}")
Exception: Failed to process /home/miquelp/datahub/file_onboarding/test_containers.json
1
g
Hi @many-rocket-80549 You can use cli command
datahub ingest run -c recipe.yaml