astonishing-kite-41577
10/13/2022, 4:25 PM
little-megabyte-1074
bland-orange-13353
10/13/2022, 8:54 PM
astonishing-kite-41577
10/13/2022, 9:36 PM
source:
    type: s3
    config:
        profiling:
            enabled: false
        path_specs:
            -
                include: 's3://dev-presentation/Study/Combined/*.*'
        env: PROD
        aws_config:
            aws_access_key_id: '${AWS_ACCESS_KEY_ID_JR}'
            aws_secret_access_key: '${AWS_SECRET_KEY_JR}'
            aws_session_token: '${AWS_SESSION_TOKEN_JR}'
            aws_region: us-east-1
pipeline_name: 'urn:li:dataHubIngestionSource:61dcc24b-824c-4b60-858b-bd309e51c81a'
transformers:
    -
        type: simple_add_dataset_domain
        config:
            domains:
                - 'urn:li:domain:Domain'
error:
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': 'a3eb1cad-f8f4-4b19-a6c8-9429d46af126',
'infos': ['2022-10-13 15:49:41.037343 [exec_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126] INFO: Starting execution for task with name=RUN_INGEST',
'2022-10-13 15:49:45.079425 [exec_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126] INFO: stdout=venv setup time = 0\n'
'This version of datahub supports report-to functionality\n'
'datahub ingest run -c /tmp/datahub/ingest/a3eb1cad-f8f4-4b19-a6c8-9429d46af126/recipe.yml --report-to '
'/tmp/datahub/ingest/a3eb1cad-f8f4-4b19-a6c8-9429d46af126/ingestion_report.json\n'
'[2022-10-13 15:49:42,639] INFO {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.42\n'
'[2022-10-13 15:49:42,660] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured '
'to talk to http://datahub-gms:8080\n'
'[2022-10-13 15:49:43,027] ERROR {logger:26} - Please set env variable SPARK_VERSION\n'
"[2022-10-13 15:49:43,510] ERROR {datahub.entrypoints:188} - Command failed with 'Did not find a registered class for "
"simple_add_dataset_domain'. Run with --debug to get full trace\n"
'[2022-10-13 15:49:43,510] INFO {datahub.entrypoints:191} - DataHub CLI version: 0.8.42 at '
'/tmp/datahub/ingest/venv-s3-0.8.42/lib/python3.10/site-packages/datahub/__init__.py\n',
"2022-10-13 15:49:45.079651 [exec_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126] INFO: Failed to execute 'datahub ingest'",
'2022-10-13 15:49:45.079831 [exec_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126] INFO: Caught exception EXECUTING '
'task_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 203, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
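The failure in this first run is `Did not find a registered class for simple_add_dataset_domain` under CLI 0.8.42. One possible explanation, not confirmed in this thread, is that the package installed in that venv does not register a transformer by that name. Below is a minimal Python sketch for listing what is registered. It assumes DataHub exposes transformers through the entry-point group `datahub.ingestion.transformer.plugins`. That group name is an assumption, and if this version registers transformers some other way the list will simply come back empty. `registered_plugins` and `is_registered` are hypothetical helper names.

```python
from importlib.metadata import entry_points

# Assumption: transformer plugins are published under this entry-point group.
TRANSFORMER_GROUP = "datahub.ingestion.transformer.plugins"


def registered_plugins(group):
    """Return the sorted plugin names installed under an entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted({ep.name for ep in selected})


def is_registered(name, group=TRANSFORMER_GROUP):
    """True if a plugin called `name` is installed under `group`."""
    return name in registered_plugins(group)


if __name__ == "__main__":
    print("registered transformers:", registered_plugins(TRANSFORMER_GROUP))
    print("simple_add_dataset_domain registered:",
          is_registered("simple_add_dataset_domain"))
```

This only reflects the Python environment it runs in, so it would need to be run with the interpreter from the ingestion venv (the log shows `/tmp/datahub/ingest/venv-s3-0.8.42/`) to say anything about the UI-triggered run.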
simple_add_dataset_ownership:
On this one, I'm not tied into Okta or anything yet since I'm still just testing, so the users and groups were created directly in DataHub. I'm not sure if that will impact the urns.
recipe:
source:
    type: s3
    config:
        profiling:
            enabled: false
        path_specs:
            -
                include: 's3://dev-presentation/Study/Combined/*.*'
        env: PROD
        aws_config:
            aws_access_key_id: '${AWS_ACCESS_KEY_ID_JR}'
            aws_secret_access_key: '${AWS_SECRET_KEY_JR}'
            aws_session_token: '${AWS_SESSION_TOKEN_JR}'
            aws_region: us-east-1
pipeline_name: 'urn:li:dataHubIngestionSource:61dcc24b-824c-4b60-858b-bd309e51c81a'
transformers:
    -
        tyoe: simple_add_dataset_ownership
        config:
            owner_urns:
                - 'urn:li:corpuser:accc8zz'
                - 'urn:li:corpGroup:Admin'
error:
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': 'c2781634-4839-4d2e-a879-43a4350fe512',
'infos': ['2022-10-13 16:23:14.921852 [exec_id=c2781634-4839-4d2e-a879-43a4350fe512] INFO: Starting execution for task with name=RUN_INGEST',
'2022-10-13 16:23:18.959750 [exec_id=c2781634-4839-4d2e-a879-43a4350fe512] INFO: stdout=venv setup time = 0\n'
'This version of datahub supports report-to functionality\n'
'datahub ingest run -c /tmp/datahub/ingest/c2781634-4839-4d2e-a879-43a4350fe512/recipe.yml --report-to '
'/tmp/datahub/ingest/c2781634-4839-4d2e-a879-43a4350fe512/ingestion_report.json\n'
'[2022-10-13 16:23:17,095] INFO {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.42\n'
'2 validation errors for PipelineConfig\n'
'transformers -> 0 -> type\n'
' field required (type=value_error.missing)\n'
'transformers -> 0 -> tyoe\n'
' extra fields not permitted (type=value_error.extra)\n',
"2022-10-13 16:23:18.959978 [exec_id=c2781634-4839-4d2e-a879-43a4350fe512] INFO: Failed to execute 'datahub ingest'",
'2022-10-13 16:23:18.961421 [exec_id=c2781634-4839-4d2e-a879-43a4350fe512] INFO: Caught exception EXECUTING '
'task_id=c2781634-4839-4d2e-a879-43a4350fe512, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 203, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
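The validation output in this second run points at the recipe key rather than at the transformer or the owner urns: `type` is reported as missing and `tyoe` is reported as an extra field, which is what a misspelled `type:` key would produce. Here is a small stand-alone sketch of the same kind of check, in plain Python with no DataHub imports. `check_transformers` and `ALLOWED_KEYS` are hypothetical names for illustration, not part of the DataHub API, and the allowed-key list is assumed from the two recipes in this thread.

```python
import difflib

# Assumed from the recipes in this thread: each transformer entry carries
# a required "type" and an optional "config". Anything else is flagged.
ALLOWED_KEYS = ("type", "config")


def check_transformers(recipe):
    """Return a list of problems found in recipe['transformers']."""
    problems = []
    for index, entry in enumerate(recipe.get("transformers", [])):
        for key in entry:
            if key not in ALLOWED_KEYS:
                # Suggest the closest allowed key, if any is similar enough.
                hint = difflib.get_close_matches(key, ALLOWED_KEYS, n=1)
                suffix = f" (did you mean '{hint[0]}'?)" if hint else ""
                problems.append(
                    f"transformers -> {index} -> {key}: unexpected key{suffix}"
                )
        if "type" not in entry:
            problems.append(f"transformers -> {index} -> type: field required")
    return problems


# The transformers section of the ownership recipe, as posted.
recipe = {
    "transformers": [
        {
            "tyoe": "simple_add_dataset_ownership",
            "config": {
                "owner_urns": [
                    "urn:li:corpuser:accc8zz",
                    "urn:li:corpGroup:Admin",
                ]
            },
        }
    ]
}

for problem in check_transformers(recipe):
    print(problem)
```

With `tyoe` renamed to `type`, the function returns an empty list. Whether the owner urns then resolve for users and groups created directly in DataHub is a separate question that this check does not answer.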
bulky-soccer-26729
10/18/2022, 5:04 PM
datahub --version
what do you get?
bulky-soccer-26729
10/18/2022, 5:20 PM
astonishing-kite-41577
10/18/2022, 5:57 PM
bulky-soccer-26729
10/18/2022, 5:57 PM
astonishing-kite-41577
10/18/2022, 6:35 PM
bulky-soccer-26729
10/19/2022, 3:03 PM