shy-lion-56425
09/20/2022, 4:30 PM
source:
  type: s3
  config:
    profiling:
      enabled: false
    path_specs:
      - include: 's3://MY_EXAMPLE_BUCKET/AWSLogs/0123456789/CloudTrail/us-east-1/2022/08/23/*.*'
        enable_compression: true
    aws_config:
      aws_access_key_id: '${AWS_ACCESS_KEY_ID_CLOUDTRAIL}'
      aws_region: us-east-1
      aws_secret_access_key: '${AWS_SECRET_ACCESS_KEY_CLOUDTRAIL}'
However, I get the following error:
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '9e0f190f-05fd-407c-bdb9-16cebaed1d0c',
'infos': ['2022-09-20 16:24:16.151405 [exec_id=9e0f190f-05fd-407c-bdb9-16cebaed1d0c] INFO: Starting execution for task with name=RUN_INGEST',
'2022-09-20 16:24:18.213813 [exec_id=9e0f190f-05fd-407c-bdb9-16cebaed1d0c] INFO: stdout=Elapsed seconds = 0\n'
' --report-to TEXT Provide an output file to produce a\n'
'This version of datahub supports report-to functionality\n'
'datahub --debug ingest run -c /tmp/datahub/ingest/9e0f190f-05fd-407c-bdb9-16cebaed1d0c/recipe.yml --report-to '
'/tmp/datahub/ingest/9e0f190f-05fd-407c-bdb9-16cebaed1d0c/ingestion_report.json\n'
'[2022-09-20 16:24:17,736] INFO {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.43.2\n'
'[2022-09-20 16:24:17,769] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured '
'to talk to http://datahub-datahub-gms:8080\n'
"[2022-09-20 16:24:17,770] ERROR {datahub.ingestion.run.pipeline:127} - s3 is disabled; try running: pip install 'acryl-datahub[s3]'\n"
'Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 85, in _ensure_not_lazy\n'
' plugin_class = import_path(path)\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 32, in import_path\n'
' item = importlib.import_module(module_name)\n'
' File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module\n'
' return _bootstrap._gcd_import(name[level:], package, level)\n'
' File "<frozen importlib._bootstrap>", line 1030, in _gcd_import\n'
' File "<frozen importlib._bootstrap>", line 1007, in _find_and_load\n'
' File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked\n'
' File "<frozen importlib._bootstrap>", line 680, in _load_unlocked\n'
' File "<frozen importlib._bootstrap_external>", line 850, in exec_module\n'
' File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/s3/__init__.py", line 1, in <module>\n'
' from datahub.ingestion.source.s3.source import S3Source\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/s3/source.py", line 10, in <module>\n'
' import pydeequ\n'
"ModuleNotFoundError: No module named 'pydeequ'\n"
'\n'
'The above exception was the direct cause of the following exception:\n'
'\n'
'Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 172, in __init__\n'
' source_class = source_registry.get(source_type)\n'
' File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/api/registry.py", line 127, in get\n'
' raise ConfigurationError(\n'
"datahub.configuration.common.ConfigurationError: s3 is disabled; try running: pip install 'acryl-datahub[s3]'\n"
'[2022-09-20 16:24:17,773] INFO {datahub.cli.ingest_cli:119} - Starting metadata ingestion\n'
'[2022-09-20 16:24:17,774] INFO {datahub.cli.ingest_cli:137} - Finished metadata ingestion\n'
"[2022-09-20 16:24:17,919] ERROR {datahub.entrypoints:188} - Command failed with 'Pipeline' object has no attribute 'source'. Run with "
'--debug to get full trace\n'
'[2022-09-20 16:24:17,920] INFO {datahub.entrypoints:191} - DataHub CLI version: 0.8.43.2 at '
'/usr/local/lib/python3.9/site-packages/datahub/__init__.py\n',
"2022-09-20 16:24:18.214118 [exec_id=9e0f190f-05fd-407c-bdb9-16cebaed1d0c] INFO: Failed to execute 'datahub ingest'",
'2022-09-20 16:24:18.214380 [exec_id=9e0f190f-05fd-407c-bdb9-16cebaed1d0c] INFO: Caught exception EXECUTING '
'task_id=9e0f190f-05fd-407c-bdb9-16cebaed1d0c, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 142, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
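The key line in the log is "s3 is disabled; try running: pip install 'acryl-datahub[s3]'", with the root cause being the missing pydeequ module. A minimal sketch of checking and fixing this in whatever environment actually runs the CLI (the command names are standard DataHub CLI; the environment itself is the open question in this thread):

```shell
# List the ingestion plugins visible to this environment's datahub CLI;
# before the fix, the s3 source will be reported as disabled.
datahub check plugins

# Install the s3 extra, which pulls in pydeequ, the module the
# traceback reports as missing.
pip install 'acryl-datahub[s3]'
```

Note that this must run in the same Python environment that executes datahub ingest, which for UI-managed ingestion is not the machine you are typing on.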
helpful-optician-78938
09/20/2022, 8:37 PM
Have you run pip install 'acryl-datahub[s3]' in the venv that you are trying to run the datahub ingest command from?
shy-lion-56425
09/20/2022, 8:39 PM
helpful-optician-78938
09/20/2022, 8:57 PM
modern-artist-55754
09/21/2022, 8:48 AM
The datahub-action container. That's where datahub ingest runs.
modern-artist-55754
09/22/2022, 4:33 AM