# troubleshoot
a
Hi everyone! I'm getting some errors trying to use a simple add_domain and add_owner transformers. Any help would be appreciated! Please see the code and error messages in the thread.
l
Hi @astonishing-kite-41577! Gentle reminder to please stick to our Slack Guidelines & post large blocks of code/stack trace in threads; it’s a HUGE help for us to keep track of unaddressed questions across our various support channels! :teamwork:
b
You can view our Slack Guidelines here: https://datahubproject.io/docs/slack/
a
Here are my code and error messages for context. simple_add_dataset_domain recipe:
source:
    type: s3
    config:
        profiling:
            enabled: false
        path_specs:
            -
                include: 's3://dev-presentation/Study/Combined/*.*'
        env: PROD
        aws_config:
            aws_access_key_id: '${AWS_ACCESS_KEY_ID_JR}'
            aws_secret_access_key: '${AWS_SECRET_KEY_JR}'
            aws_session_token: '${AWS_SESSION_TOKEN_JR}'
            aws_region: us-east-1
pipeline_name: 'urn:li:dataHubIngestionSource:61dcc24b-824c-4b60-858b-bd309e51c81a'
transformers:
    -
        type: simple_add_dataset_domain
        config:
            domains:
                - 'urn:li:domain:Domain'
error:
~~~~ Execution Summary ~~~~

RUN_INGEST - {'errors': [],
 'exec_id': 'a3eb1cad-f8f4-4b19-a6c8-9429d46af126',
 'infos': ['2022-10-13 15:49:41.037343 [exec_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-10-13 15:49:45.079425 [exec_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126] INFO: stdout=venv setup time = 0\n'
           'This version of datahub supports report-to functionality\n'
           'datahub  ingest run -c /tmp/datahub/ingest/a3eb1cad-f8f4-4b19-a6c8-9429d46af126/recipe.yml --report-to '
           '/tmp/datahub/ingest/a3eb1cad-f8f4-4b19-a6c8-9429d46af126/ingestion_report.json\n'
           '[2022-10-13 15:49:42,639] INFO     {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.42\n'
           '[2022-10-13 15:49:42,660] INFO     {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured '
           'to talk to http://datahub-gms:8080\n'
           '[2022-10-13 15:49:43,027] ERROR    {logger:26} - Please set env variable SPARK_VERSION\n'
           "[2022-10-13 15:49:43,510] ERROR    {datahub.entrypoints:188} - Command failed with 'Did not find a registered class for "
           "simple_add_dataset_domain'. Run with --debug to get full trace\n"
           '[2022-10-13 15:49:43,510] INFO     {datahub.entrypoints:191} - DataHub CLI version: 0.8.42 at '
           '/tmp/datahub/ingest/venv-s3-0.8.42/lib/python3.10/site-packages/datahub/__init__.py\n',
           "2022-10-13 15:49:45.079651 [exec_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126] INFO: Failed to execute 'datahub ingest'",
           '2022-10-13 15:49:45.079831 [exec_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126] INFO: Caught exception EXECUTING '
           'task_id=a3eb1cad-f8f4-4b19-a6c8-9429d46af126, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 203, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
simple_add_dataset_ownership: On this one, I'm not tied into Okta or anything yet since I'm still just testing; the users and groups were created directly in DataHub, so I'm not sure whether that will impact the URNs. recipe:
source:
    type: s3
    config:
        profiling:
            enabled: false
        path_specs:
            -
                include: 's3://dev-presentation/Study/Combined/*.*'
        env: PROD
        aws_config:
            aws_access_key_id: '${AWS_ACCESS_KEY_ID_JR}'
            aws_secret_access_key: '${AWS_SECRET_KEY_JR}'
            aws_session_token: '${AWS_SESSION_TOKEN_JR}'
            aws_region: us-east-1
pipeline_name: 'urn:li:dataHubIngestionSource:61dcc24b-824c-4b60-858b-bd309e51c81a'
transformers:
    -
        tyoe: simple_add_dataset_ownership
        config:
            owner_urns:
                - 'urn:li:corpuser:accc8zz'
                - 'urn:li:corpGroup:Admin'
error:
~~~~ Execution Summary ~~~~

RUN_INGEST - {'errors': [],
'exec_id': 'c2781634-4839-4d2e-a879-43a4350fe512',
'infos': ['2022-10-13 16:23:14.921852 [exec_id=c2781634-4839-4d2e-a879-43a4350fe512] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-10-13 16:23:18.959750 [exec_id=c2781634-4839-4d2e-a879-43a4350fe512] INFO: stdout=venv setup time = 0\n'
           'This version of datahub supports report-to functionality\n'
           'datahub  ingest run -c /tmp/datahub/ingest/c2781634-4839-4d2e-a879-43a4350fe512/recipe.yml --report-to '
           '/tmp/datahub/ingest/c2781634-4839-4d2e-a879-43a4350fe512/ingestion_report.json\n'
           '[2022-10-13 16:23:17,095] INFO     {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.42\n'
           '2 validation errors for PipelineConfig\n'
           'transformers -> 0 -> type\n'
           '  field required (type=value_error.missing)\n'
           'transformers -> 0 -> tyoe\n'
           '  extra fields not permitted (type=value_error.extra)\n',
           "2022-10-13 16:23:18.959978 [exec_id=c2781634-4839-4d2e-a879-43a4350fe512] INFO: Failed to execute 'datahub ingest'",
           '2022-10-13 16:23:18.961421 [exec_id=c2781634-4839-4d2e-a879-43a4350fe512] INFO: Caught exception EXECUTING '
           'task_id=c2781634-4839-4d2e-a879-43a4350fe512, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 203, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
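Side note on this second error: both validation messages point at the misspelled tyoe key in the transformer block, so the fix is just renaming it to type. Assuming everything else in the recipe stays the same, the corrected block would read:
transformers:
    -
        type: simple_add_dataset_ownership
        config:
            owner_urns:
                - 'urn:li:corpuser:accc8zz'
                - 'urn:li:corpGroup:Admin'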
b
hey @astonishing-kite-41577! when you run
datahub --version
what do you get?
ah i'm seeing 0.8.42 in your first error message. that dataset transformer for domains was actually added in the very next version! so if you upgrade your CLI this might fix the issue
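for example, if you manage the CLI with pip, something along these lines should do it (the s3 extra is just a guess to match your recipe's source type):
pip install --upgrade 'acryl-datahub[s3]'
datahub --version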
a
I'll give that a try and see if it works, thanks Chris!
b
of course! let me know how it goes when you get to it
a
That did it, really appreciate your help!
b
glad to hear it!