numerous-account-62719
09/02/2022, 7:41 AM
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '918dc5d5-c95c-4051-ad58-0867a0bc89f8',
'infos': ['2022-09-01 14:06:39.052519 [exec_id=918dc5d5-c95c-4051-ad58-0867a0bc89f8] INFO: Starting execution for task with name=RUN_INGEST',
'2022-09-01 14:07:05.133419 [exec_id=918dc5d5-c95c-4051-ad58-0867a0bc89f8] INFO: stdout=Requirement already satisfied: pip in '
'/tmp/datahub/ingest/venv-918dc5d5-c95c-4051-ad58-0867a0bc89f8/lib/python3.9/site-packages (21.2.4)\n'
'WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cfaf70>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/pip/\n"
'WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cfaeb0>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/pip/\n"
'WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cfadf0>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/pip/\n"
'WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cfad00>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/pip/\n"
'WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28d00a60>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/pip/\n"
'WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cc2e20>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/wheel/\n"
'WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cc6070>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/wheel/\n"
'WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cc6220>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/wheel/\n"
'WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cc63d0>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/wheel/\n"
'WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fce28cc6580>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/wheel/\n"
'ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)\n'
'ERROR: No matching distribution found for wheel\n'
'WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fbd10d26f40>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/acryl-datahub/\n"
'WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fbd10d42190>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/acryl-datahub/\n"
'WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fbd10d42340>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/acryl-datahub/\n"
'WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fbd10d424f0>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/acryl-datahub/\n"
'WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by '
"'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fbd10d261c0>: Failed to establish a new connection: "
"[Errno -3] Temporary failure in name resolution')': /simple/acryl-datahub/\n"
'ERROR: Could not find a version that satisfies the requirement acryl-datahub[datahub-rest,oracle]==0.8.41 (from versions: none)\n'
'ERROR: No matching distribution found for acryl-datahub[datahub-rest,oracle]==0.8.41\n'
'/tmp/datahub/ingest/venv-918dc5d5-c95c-4051-ad58-0867a0bc89f8/bin/python3: No module named datahub\n',
"2022-09-01 14:07:05.133542 [exec_id=918dc5d5-c95c-4051-ad58-0867a0bc89f8] INFO: Failed to execute 'datahub ingest'",
'2022-09-01 14:07:05.140360 [exec_id=918dc5d5-c95c-4051-ad58-0867a0bc89f8] INFO: Caught exception EXECUTING '
'task_id=918dc5d5-c95c-4051-ad58-0867a0bc89f8, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
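The repeated "Temporary failure in name resolution" warnings above mean the ingestion venv could not resolve DNS at all, so pip never reached the package index. A minimal check, as a sketch, that could be run inside the actions container — "pypi.org" here stands in for whichever index host the pip config actually uses:
# Sketch: verify name resolution from inside the container.
# "pypi.org" is an assumption; substitute your configured index host.
import socket

try:
    socket.getaddrinfo("pypi.org", 443)
    print("DNS resolution works")
except socket.gaierror as e:
    # [Errno -3] Temporary failure in name resolution -> cluster DNS/egress issue
    print("DNS lookup failed:", e)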
ripe-tiger-90198
09/02/2022, 12:11 PM
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '5d21d1f2-21ba-48d5-abcb-1099d069f959',
'infos': ['2022-09-02 12:02:55.086878 [exec_id=5d21d1f2-21ba-48d5-abcb-1099d069f959] INFO: Starting execution for task with name=RUN_INGEST',
'2022-09-02 12:03:01.329263 [exec_id=5d21d1f2-21ba-48d5-abcb-1099d069f959] INFO: stdout=Elapsed seconds = 0\n'
' --report-to TEXT Provide an output file to produce a\n'
'This version of datahub supports report-to functionality\n'
'datahub ingest run -c /tmp/datahub/ingest/5d21d1f2-21ba-48d5-abcb-1099d069f959/recipe.yml --report-to '
'/tmp/datahub/ingest/5d21d1f2-21ba-48d5-abcb-1099d069f959/ingestion_report.json\n'
'[2022-09-02 12:02:57,138] INFO {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.42\n'
'[2022-09-02 12:02:57,197] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured '
'to talk to http://datahub-gms:8080\n'
"[2022-09-02 12:02:59,501] INFO {datahub.ingestion.source.sql.sql_common:284} - Applying table_pattern {'allow': ['.*\\\\.tracks']} "
'to view_pattern.\n'
'[2022-09-02 12:02:59,501] ERROR {datahub.ingestion.run.pipeline:127} - 1 validation error for BigQueryConfig\n'
'include_view_lineage\n'
' extra fields not permitted (type=value_error.extra)\n'
'[2022-09-02 12:02:59,502] INFO {datahub.cli.ingest_cli:119} - Starting metadata ingestion\n'
'[2022-09-02 12:02:59,502] INFO {datahub.cli.ingest_cli:137} - Finished metadata ingestion\n'
"[2022-09-02 12:03:00,041] ERROR {datahub.entrypoints:188} - Command failed with 'Pipeline' object has no attribute 'source'. Run with "
'--debug to get full trace\n'
'[2022-09-02 12:03:00,041] INFO {datahub.entrypoints:191} - DataHub CLI version: 0.8.42 at '
'/tmp/datahub/ingest/venv-bigquery-0.8.42/lib/python3.9/site-packages/datahub/__init__.py\n',
"2022-09-02 12:03:01.331049 [exec_id=5d21d1f2-21ba-48d5-abcb-1099d069f959] INFO: Failed to execute 'datahub ingest'",
'2022-09-02 12:03:01.332045 [exec_id=5d21d1f2-21ba-48d5-abcb-1099d069f959] INFO: Caught exception EXECUTING '
'task_id=5d21d1f2-21ba-48d5-abcb-1099d069f959, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 142, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
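The validation error in this run means the recipe passed include_view_lineage to a source whose config class does not declare that field. A self-contained sketch of the failure mode (pydantic v1 with extra fields forbidden; this is not DataHub's actual BigQueryConfig):
# Sketch: pydantic rejects unknown keys when Extra.forbid is set,
# producing exactly "extra fields not permitted (type=value_error.extra)".
from pydantic import BaseModel, Extra, ValidationError

class SketchConfig(BaseModel):
    class Config:
        extra = Extra.forbid

    project_id: str = ""

try:
    SketchConfig.parse_obj({"include_view_lineage": True})
except ValidationError as e:
    print(e)
    # 1 validation error for SketchConfig
    # include_view_lineage
    #   extra fields not permitted (type=value_error.extra)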
millions-sundown-65420
09/03/2022, 7:14 AM
22/09/02 10:16:00 ERROR DatahubSparkListener: java.lang.NullPointerException
at datahub.spark.DatahubSparkListener$3.apply(DatahubSparkListener.java:258)
at datahub.spark.DatahubSparkListener$3.apply(DatahubSparkListener.java:254)
at scala.Option.foreach(Option.scala:407)
at datahub.spark.DatahubSparkListener.processExecutionEnd(DatahubSparkListener.java:254)
at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:241)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1381)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
aloof-oil-31167
09/04/2022, 1:55 PM
If I define inlets or outlets inside a DAG, the datahub-airflow plugin will create these datasets and add lineage for these non-existent datasets. Is there an option to make it skip datasets that don't exist?
better-orange-49102
09/05/2022, 5:41 AM
curr_policy = graph.get_aspect_v2(
    entity_urn=policy_urn,
    aspect="dataHubPolicyInfo",
    aspect_type=DataHubPolicyInfoClass,
)
I always get the error message:
File "/home/*redacted*/datahub/*redacted*/policy.py", line 54, in <module>
curr_policy = graph.get_aspect_v2(
File "/home/*redacted*/datahub/metadata-ingestion/src/datahub/ingestion/graph/client.py", line 171, in get_aspect_v2
return aspect_type.from_obj(post_json_obj)
File "/home/*redacted*/miniconda3/envs/*redacted*/lib/python3.9/site-packages/avrogen/dict_wrapper.py", line 41, in from_obj
return conv.from_json_object(obj, cls.RECORD_SCHEMA)
File "/home/*redacted*/miniconda3/envs/*redacted*/lib/python3.9/site-packages/avrogen/avrojson.py", line 104, in from_json_object
return self._generic_from_json(json_obj, writers_schema, readers_schema)
File "/home/*redacted*/miniconda3/envs/*redacted*/lib/python3.9/site-packages/avrogen/avrojson.py", line 257, in _generic_from_json
result = self._record_from_json(json_obj, writers_schema, readers_schema)
File "/home/*redacted*/miniconda3/envs/*redacted*/lib/python3.9/site-packages/avrogen/avrojson.py", line 345, in _record_from_json
field_value = self._generic_from_json(json_obj[field.name], writers_field.type, field.type)
File "/home/*redacted*/miniconda3/envs/*redacted*/lib/python3.9/site-packages/avrogen/avrojson.py", line 255, in _generic_from_json
result = self._union_from_json(json_obj, writers_schema, readers_schema)
File "/home/*redacted*/miniconda3/envs/*redacted*/lib/python3.9/site-packages/avrogen/avrojson.py", line 314, in _union_from_json
raise schema.AvroException('Datum union type not in schema: %s', value_type)
avro.schema.AvroException: ('Datum union type not in schema: %s', 'filter')
Any idea what causes this?
What is weird is that it can be overcome by going to the policy, adding another entity, saving it, and then undoing it again from the UI. Querying the policy again then no longer raises the error, almost as if something was missing the first time round when the policy was created...
I was trying to query all my policies and store them as a JSON file (backup).
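A minimal backup-loop sketch for that use case, assuming policy_urns is a list of policy URNs collected beforehand (the variable name, the server URL, and the skip-on-error behavior are illustrative, not part of the original code):
import json

from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import DataHubPolicyInfoClass

graph = DataHubGraph(DatahubClientConfig(server="http://datahub-gms:8080"))

backup = {}
for policy_urn in policy_urns:  # assumed to be collected already
    try:
        info = graph.get_aspect_v2(
            entity_urn=policy_urn,
            aspect="dataHubPolicyInfo",
            aspect_type=DataHubPolicyInfoClass,
        )
        backup[policy_urn] = info.to_obj()
    except Exception as e:  # e.g. the AvroException above
        print(f"skipping {policy_urn}: {e}")

with open("policies_backup.json", "w") as f:
    json.dump(backup, f, indent=2)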
flat-painter-78331
09/05/2022, 7:01 AM
# imports assumed for this snippet (not in the original paste):
from airflow.operators.bash import BashOperator
from datahub_provider.entities import Dataset

datahub_lineage_task_1 = BashOperator(
    task_id="extract_data",
    dag=dag,
    inlets=[Dataset("mysql", "extract_sql.dag")],
    outlets=[Dataset("s3", "test-project-100/project_100/table_01")],
    bash_command="echo Dummy Task 1",
)
datahub_lineage_task_2 = BashOperator(
    task_id="load_data",
    dag=dag,
    inlets=[Dataset("s3", "test-project-100/project_100/table_01")],
    outlets=[Dataset("bigquery", "project-test.tb_bq_datahub")],
    bash_command="echo Dummy Task 2",
)
bumpy-journalist-41369
09/05/2022, 8:49 AM
'[2022-09-05 08:32:57,596] ERROR {datahub.entrypoints:188} - Command failed with An error occurred (AccessDenied) when calling the '
'ListObjects operation: Access Denied. Run with --debug to get full trace\n'
I have created an iamserviceaccount called acryl-datahub-actions, associated with the Kubernetes cluster, with the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::cdca-dev-us-east-1-product-metrics",
        "arn:aws:s3:::cdca-dev-us-east-1-product-metrics/*"
      ]
    }
  ]
}
The recipe that I am trying is the following:
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-datahub-gms:8080'
source:
  type: s3
  config:
    profiling:
      enabled: false
    path_spec:
      include: 's3://my-bucket/table/sh_date=2021-06-23/test.parquet'
    env: DEV
    aws_config:
      aws_region: us-east-1
P.S. In the policy I have given all permissions for S3, which I will eventually narrow down.
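One way to narrow down the AccessDenied above is to confirm which AWS identity the actions pod actually assumes before adjusting the bucket policy further. A hedged diagnostic sketch with boto3 (assuming boto3 is importable in the pod; the bucket name is copied from the policy above):
import boto3

# Should print the IRSA role ARN of the acryl-datahub-actions service account;
# if it shows the node role instead, the service-account annotation isn't applied.
print(boto3.client("sts").get_caller_identity())

# Repeat the same ListObjects call the ingestion makes.
s3 = boto3.client("s3", region_name="us-east-1")
resp = s3.list_objects_v2(Bucket="cdca-dev-us-east-1-product-metrics", MaxKeys=5)
print([obj["Key"] for obj in resp.get("Contents", [])])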
09/05/2022, 10:51 AM"AttributeError: 'Insert' object has no attribute 'columns'\n"
'[2022-08-31 09:46:52,291] ERROR {datahub.utilities.sqlalchemy_query_combiner:249} - Failed to execute query normally, using '
'fallback: \n'
'CREATE TABLE "#ge_temp_dbf5dfdd" (\n'
'\tcondition INTEGER NOT NULL\n'
')\n'
'\n'
I found this question in the Slack history, but there is no answer there. Could you help me with this problem? 🙏
adamant-rain-51672
09/05/2022, 7:32 PM
~~~~ Execution Summary ~~~~
RUN_INGEST - {'errors': [],
'exec_id': '894f9189-bbb0-4d44-8dfb-2a7056fd6e65',
'infos': ['2022-09-05 19:28:16.998654 [exec_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65] INFO: Starting execution for task with name=RUN_INGEST',
'2022-09-05 19:28:40.653581 [exec_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65] INFO: stdout=Requirement already satisfied: pip in '
'/tmp/datahub/ingest/venv-894f9189-bbb0-4d44-8dfb-2a7056fd6e65/lib/python3.9/site-packages (21.2.4)\n'
[...PACKAGE INSTALLATION...]
'[2022-09-05 19:28:39,965] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured '
'to talk to http://datahub-datahub-gms:8080\n'
'[2022-09-05 19:28:40,128] INFO {datahub.cli.ingest_cli:119} - Starting metadata ingestion\n'
'[2022-09-05 19:28:40,129] INFO {datahub.cli.ingest_cli:123} - Source (okta) report:\n'
"{'workunits_produced': '0',\n"
" 'workunit_ids': [],\n"
" 'warnings': {},\n"
" 'failures': {},\n"
" 'cli_version': '0.8.43',\n"
" 'cli_entry_location': '/tmp/datahub/ingest/venv-894f9189-bbb0-4d44-8dfb-2a7056fd6e65/lib/python3.9/site-packages/datahub/__init__.py',\n"
" 'py_version': '3.9.9 (main, Dec 21 2021, 10:03:34) \\n[GCC 10.2.1 20210110]',\n"
" 'py_exec_path': '/tmp/datahub/ingest/venv-894f9189-bbb0-4d44-8dfb-2a7056fd6e65/bin/python3',\n"
" 'os_details': 'Linux-5.4.209-116.363.amzn2.x86_64-x86_64-with-glibc2.31',\n"
" 'filtered': []}\n"
'[2022-09-05 19:28:40,130] INFO {datahub.cli.ingest_cli:126} - Sink (datahub-rest) report:\n'
"{'records_written': '0', 'warnings': [], 'failures': [], 'gms_version': 'v0.8.43'}\n"
'[2022-09-05 19:28:40,418] ERROR {datahub.entrypoints:188} - Command failed with There is no current event loop in thread '
"'asyncio_0'.. Run with --debug to get full trace\n"
'[2022-09-05 19:28:40,418] INFO {datahub.entrypoints:191} - DataHub CLI version: 0.8.43 at '
'/tmp/datahub/ingest/venv-894f9189-bbb0-4d44-8dfb-2a7056fd6e65/lib/python3.9/site-packages/datahub/__init__.py\n',
"2022-09-05 19:28:40.654203 [exec_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65] INFO: Failed to execute 'datahub ingest'",
'2022-09-05 19:28:40.654552 [exec_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65] INFO: Caught exception EXECUTING '
'task_id=894f9189-bbb0-4d44-8dfb-2a7056fd6e65, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
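The "There is no current event loop in thread 'asyncio_0'" message is a common asyncio pitfall: asyncio.get_event_loop() raises this RuntimeError when called from a non-main thread that never had a loop registered. A self-contained sketch of the failure and the usual remedy (illustrative only; not the okta source's actual code):
import asyncio
import threading

def worker():
    try:
        asyncio.get_event_loop()  # fails in a thread with no registered loop
    except RuntimeError as e:
        print(e)  # There is no current event loop in thread 'worker'.
        loop = asyncio.new_event_loop()  # the usual remedy:
        asyncio.set_event_loop(loop)     # create and register a loop first
        loop.close()

threading.Thread(target=worker, name="worker").start()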
brainy-intern-50400
09/06/2022, 7:17 AM
react-dom.production.min.js:216 Error: Unrecognized key NOTEBOOK provided in map {}
at Gt (EntityRegistry.tsx:11:11)
at e.value (EntityRegistry.tsx:153:24)
at renderItem (EntityNameList.tsx:118:53)
at index.js:143:12
at index.js:243:14
at Array.map (<anonymous>)
at A (index.js:242:33)
at ai (react-dom.production.min.js:157:137)
at Xc (react-dom.production.min.js:267:460)
at _s (react-dom.production.min.js:250:347)
Here is the data I ingest:
# imports assumed for this snippet (not in the original paste);
# DATAHUB_SERVER and DATAHUB_API_KEY are defined elsewhere:
import time
from typing import List

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    BrowsePathsClass,
    ChangeAuditStampsClass,
    ChangeTypeClass,
    ChartCellClass,
    NotebookCellClass,
    NotebookCellTypeClass,
    NotebookContentClass,
    NotebookInfoClass,
)

# DataHub emitter
emitter: DatahubRestEmitter = DatahubRestEmitter(gms_server=DATAHUB_SERVER, extra_headers={})  # token=DATAHUB_API_KEY
emitter.test_connection()

# milliseconds since epoch
now: int = int(time.time() * 1000)
current_timestamp: AuditStampClass = AuditStampClass(time=now, actor="urn:li:corpuser:ingestion")
last_modified = ChangeAuditStampsClass(current_timestamp)

inputs_notebook: List[NotebookCellClass] = [
    NotebookCellClass(
        type=NotebookCellTypeClass().CHART_CELL,
        chartCell=ChartCellClass(
            cellId="2",
            changeAuditStamps=last_modified,
            cellTitle="second",
        ),
    )
]

properties: dict[str, str] = {
    '..': '..'
}

notebook_info: NotebookInfoClass = NotebookInfoClass(
    title="Janatka Notebook",
    changeAuditStamps=last_modified,
    customProperties=properties,
    externalUrl="",
)
browse_path: BrowsePathsClass = BrowsePathsClass(
    ["/test/notebook/test/querybook"]
)
# notebook_key: NotebookKeyClass = NotebookKeyClass(
#     notebookTool="Zeppelin",
#     notebookId="Janatka_Test"
# )
notebook_urn = "urn:li:notebook:(querybook,1234)"

# Construct MetadataChangeProposalWrapper objects with the Notebook aspects.
notebook_info_mce = MetadataChangeProposalWrapper(
    entityType="notebook",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=notebook_urn,
    aspectName="notebookInfo",
    aspect=notebook_info,
)
notebook_content_mce = MetadataChangeProposalWrapper(
    entityType="notebook",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=notebook_urn,
    aspectName="notebookContent",
    aspect=NotebookContentClass(inputs_notebook),
)
notebook_path_mce = MetadataChangeProposalWrapper(
    entityType="notebook",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=notebook_urn,
    aspectName="browsePaths",
    aspect=browse_path,
)

# Emit metadata!
emitter.emit(notebook_info_mce)
emitter.emit(notebook_content_mce)
emitter.emit(notebook_path_mce)
ancient-apartment-23316
09/06/2022, 7:47 AM
'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class '
'com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" '
'is invalid\n'
'\n'
'\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)',
'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: '
'"Provided urn urn:li:corpuser:" is invalid\n',
'status': '422'}}],
I used the search here and found that I must use the transformers block.
What should I add?
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpuser" # like this?