future-florist-65080
02/21/2023, 12:53 AM
Is it possible to use pattern_add_dataset_schema_terms
to add glossary terms to fields within a specific database schema?
I have tried using the regex .*<schema>.*<field_name>.*, similar to the example for Pattern Add Dataset Domain. However, this is not applying any glossary terms.
It seems like the regex is applied only to the field name, not to the full URN?
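(For reference, a minimal recipe sketch of this transformer; the field regex and term URN below are placeholders, and per the behaviour described above the rules appear to match the schema field path rather than the full dataset URN:)
transformers:
  - type: "pattern_add_dataset_schema_terms"
    config:
      semantics: OVERWRITE
      term_pattern:
        rules:
          ".*customer_id.*": ["urn:li:glossaryTerm:Customer"]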
red-action-68363
02/21/2023, 9:58 AM

numerous-account-62719
02/21/2023, 12:59 PM
(urn:li:dataPlatform:postgres,inventory_data.public._v_router_interface_master,PROD)
[2023-02-21 12:20:33,827] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit inventory_data.public._v_router_interface_master
[2023-02-21 12:20:33,949] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_router_interface_master-subtypes
[2023-02-21 12:20:34,027] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_router_interface_master-viewProperties
[2023-02-21 12:20:34,119] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit container-urn:li:container:3bd5879a590e50509e9cf6b786d6170e-to-urn:li:dataset:(urn:li:dataPlatform:postgres,inventory_data.public._v_sitedetails,PROD)
[2023-02-21 12:20:34,237] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit inventory_data.public._v_sitedetails
[2023-02-21 12:20:34,366] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_sitedetails-subtypes
[2023-02-21 12:20:34,384] INFO {datahub.ingestion.source.ge_data_profiler:930} - Finished profiling inventory_data.public.correlation_groups_master_new; took 66.890 seconds
[2023-02-21 12:20:35,687] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_sitedetails-viewProperties
[2023-02-21 12:20:35,742] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit container-urn:li:container:3bd5879a590e50509e9cf6b786d6170e-to-urn:li:dataset:(urn:li:dataPlatform:postgres,inventory_data.public._v_sitelist1,PROD)
[2023-02-21 12:20:35,824] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit inventory_data.public._v_sitelist1
[2023-02-21 12:20:35,898] INFO {datahub.ingestion.source.ge_data_profiler:930} - Finished profiling inventory_data.public.bbip_route_details; took 63.394 seconds
[2023-02-21 12:20:35,967] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_sitelist1-subtypes
[2023-02-21 12:20:36,087] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_sitelist1-viewProperties
[2023-02-21 12:20:36,154] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit container-urn:li:container:3bd5879a590e50509e9cf6b786d6170e-to-urn:li:dataset:(urn:li:dataPlatform:postgres,inventory_data.public._v_edgelist2,PROD)
[2023-02-21 12:20:36,250] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit inventory_data.public._v_edgelist2
[2023-02-21 12:20:36,377] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_edgelist2-subtypes
[2023-02-21 12:20:36,516] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_edgelist2-viewProperties
[2023-02-21 12:20:36,618] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit container-urn:li:container:3bd5879a590e50509e9cf6b786d6170e-to-urn:li:dataset:(urn:li:dataPlatform:postgres,inventory_data.public._v_sitetopology2,PROD)
[2023-02-21 12:20:36,800] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit inventory_data.public._v_sitetopology2
[2023-02-21 12:20:36,855] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_sitetopology2-subtypes
[2023-02-21 12:20:36,933] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_sitetopology2-viewProperties
[2023-02-21 12:20:37,004] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit container-urn:li:container:3bd5879a590e50509e9cf6b786d6170e-to-urn:li:dataset:(urn:li:dataPlatform:postgres,inventory_data.public._v_topologygraphsites,PROD)
[2023-02-21 12:20:37,101] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit inventory_data.public._v_topologygraphsites
[2023-02-21 12:20:37,226] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_topologygraphsites-subtypes
[2023-02-21 12:20:37,342] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit _v_topologygraphsites-viewProperties
[2023-02-21 12:20:37,487] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit container-urn:li:container:3bd5879a590e50509e9cf6b786d6170e-to-urn:li:dataset:(urn:li:dataPlatform:postgres,inventory_data.public._v_nodetopology2,PROD)
[2023-02-21 12:20:37,633] INFO {datahub.ingestion.run.pipeline:102} - sink wrote workunit inventory_data.public._v_nodetopology2
/usr/local/bin/run_ingest.sh: line 26: 695 Killed ( python3 -m datahub ingest -c "$4/$1.yml" )
2023-02-21 12:20:39.139032 [exec_id=26e8f736-796a-430b-8987-81242e1d53b2] INFO: Failed to execute 'datahub ingest'
2023-02-21 12:20:39.140298 [exec_id=26e8f736-796a-430b-8987-81242e1d53b2] INFO: Caught exception EXECUTING task_id=26e8f736-796a-430b-8987-81242e1d53b2, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
green-lock-62163
02/21/2023, 5:41 PM

red-waitress-53338
02/21/2023, 6:09 PM

acceptable-nest-20465
02/21/2023, 6:17 PM

white-horse-97256
02/21/2023, 6:28 PM

few-library-66655
02/21/2023, 6:54 PM
The aws_access_key_id, aws_secret_access_key and aws_session_token will expire.
ClientError: An error occurred (ExpiredTokenException) when calling the GetDatabases operation: The security token included in the request is expired
So I have to generate a new token every time I run the ingestion manually. I would like to configure an Ingestion Schedule to sync Glue every day. What should I do to automatically pick up the latest AWS tokens, or how can I keep them from expiring?
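(One approach, sketched with a placeholder account ID and role name: point the Glue source at an IAM role via aws_role instead of pasting temporary keys, so that boto3 assumes the role and gets fresh temporary credentials on every run:)
source:
  type: glue
  config:
    aws_region: us-west-2
    # placeholder ARN; whatever credentials the executor runs with must be
    # allowed to sts:AssumeRole into this role
    aws_role: 'arn:aws:iam::123456789012:role/datahub-glue-reader'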
white-horse-97256
02/21/2023, 9:50 PM

few-library-66655
02/22/2023, 12:13 AM
source:
type: glue
config:
aws_region: us-west-2
aws_role: 'arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME'
When I run this in prod it succeeds, and I want to test it locally, but it fails there with the error message
PipelineInitError: Failed to configure the source (glue): 'NoneType' object has no attribute 'access_key'
Can anyone help me with that?
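(A guess at the cause, with a sketch: that error usually means boto3 found no base credentials at all on the local machine, so there is nothing to assume aws_role with. Providing any valid default credentials first should get past it; the profile and recipe names below are hypothetical:)
export AWS_PROFILE=my-dev-profile   # or set AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
aws sts get-caller-identity         # sanity-check that the CLI/boto3 can see credentials
datahub ingest -c glue_recipe.yml   # then the source can assume the configured role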
bright-wall-5515
02/22/2023, 9:18 AM
[2023-02-22 00:26:03,368] {{pod_launcher.py:156}} INFO - b' File "pydantic/validators.py", line 715, in find_validators\n'
[2023-02-22 00:26:03,368] {{pod_launcher.py:156}} INFO - b"RuntimeError: no validator found for <class 're.Pattern'>, see `arbitrary_types_allowed` in Config\n"
[2023-02-22 00:26:04,425] {{pod_launcher.py:171}} INFO - Event: workspaces-pipeline-5357b4ab2cdd4abf933f2d307a107cf2 had an event of type Failed
Can someone help to find the source of this problem, or where to look for the error, please? 🙏
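(For context, a minimal pydantic v1 sketch of what that RuntimeError means; the model and field names here are hypothetical, not DataHub's actual config classes:)
import re
from pydantic import BaseModel

class PatternConfig(BaseModel):
    # pydantic v1 has no built-in validator for re.Pattern; without the
    # flag below, defining this model raises
    # "RuntimeError: no validator found for <class 're.Pattern'>"
    pattern: re.Pattern

    class Config:
        arbitrary_types_allowed = True  # accept the type as-is, without validation
(Seeing this at runtime often points at a pydantic/datahub version mismatch in the image rather than at the recipe itself.)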
ambitious-umbrella-5901
02/22/2023, 1:29 PM

bitter-evening-61050
02/22/2023, 3:03 PM

great-flag-53653
02/22/2023, 5:14 PM

acoustic-quill-54426
02/22/2023, 6:34 PM
We want to emit a MetadataChangeProposal
instead of a MetadataChangeEvent in our transform. Before, we could add aspects to the `proposedSnapshot.aspects` list. Do you have any tip on what we can do now? Maybe yield a new RecordEnvelope with an MCP?
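(Not an authoritative answer, but a minimal sketch of that last idea; the URN and tag are placeholders, and the surrounding transformer class is elided:)
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.api.common import RecordEnvelope
from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass

def extra_records():
    # Build the aspect as an MCP instead of appending to proposedSnapshot.aspects;
    # entityType and aspectName are inferred from the URN and the aspect class.
    mcp = MetadataChangeProposalWrapper(
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:postgres,db.schema.table,PROD)",  # placeholder
        aspect=GlobalTagsClass(tags=[TagAssociationClass(tag="urn:li:tag:Example")]),
    )
    # A transformer's transform() can yield this alongside the envelopes it passes through.
    yield RecordEnvelope(record=mcp, metadata={})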
white-horse-97256
02/22/2023, 8:14 PM

white-horse-97256
02/22/2023, 10:54 PM
MetadataChangeProposalWrapper and UpsertAspectRequest?
silly-intern-25190
02/23/2023, 2:19 AM

rich-daybreak-77194
02/23/2023, 5:17 AM

best-notebook-58252
02/23/2023, 7:36 AM
I'm running lookml ingestion with the CLI, but it's stuck because my SSH key requires a passphrase.
Is it possible to pass it? I was trying something like echo $PASSWORD | datahub ingest -c lookml.yaml, but it doesn't work.
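(One workaround sketch, assuming the prompt comes from cloning the LookML repo over SSH: load the key into ssh-agent once, then run the ingest; the key path is a guess:)
eval "$(ssh-agent -s)"            # start an agent for this shell
ssh-add ~/.ssh/id_rsa             # prompts once for the passphrase
datahub ingest -c lookml.yaml     # git now uses the cached key, no prompt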
quiet-jelly-11365
02/23/2023, 9:17 AM
datahub check plugins
Sources:
[2023-02-22 17:52:56,159] ERROR {datahub.entrypoints:225} - Command failed: code() argument 13 must be str, not int
lemon-scooter-69730
02/23/2023, 10:41 AM

red-easter-85320
02/23/2023, 10:56 AM

creamy-portugal-88620
02/23/2023, 12:33 PM

creamy-portugal-88620
02/23/2023, 12:33 PM
2023-02-23 12:22:16,182 INFO sqlalchemy.engine.Engine [raw sql] {}
[2023-02-23 12:22:16,182] INFO {sqlalchemy.engine.Engine:1858} - [raw sql] {}
[2023-02-23 12:22:17,543] WARNING {datahub.ingestion.source.sql.sql_common:643} - Unable to ingest sampledb.temp_batch_dlq due to an exception.
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-athena-0.10.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/sql_common.py", line 639, in loop_tables
yield from self._process_table(
File "/tmp/datahub/ingest/venv-athena-0.10.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/sql_common.py", line 738, in _process_table
yield from self.add_table_to_schema_container(
File "/tmp/datahub/ingest/venv-athena-0.10.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/athena.py", line 238, in add_table_to_schema_container
parent_container_key=self.get_database_container_key(db_name, schema),
File "/tmp/datahub/ingest/venv-athena-0.10.0/lib/python3.10/site-packages/datahub/ingestion/source/sql/athena.py", line 220, in get_database_container_key
assert db_name == schema
AssertionError
bitter-furniture-95993
02/23/2023, 2:08 PM
[2023-02-23, 13:36:35 UTC] {logging_mixin.py:137} INFO - Exception:
Traceback (most recent call last):
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/connection.py", line 175, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib/python3.7/http/client.py", line 1260, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1306, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1255, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1030, in _send_output
    self.send(msg)
  File "/usr/lib/python3.7/http/client.py", line 970, in send
    self.connect()
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/connection.py", line 187, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fce2cc7e828>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/debian/.local/lib/python3.7/site-packages/requests/adapters.py", line 499, in send
    timeout=timeout,
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 788, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/debian/.local/lib/python3.7/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8081): Max retries exceeded with url: /subjects/MetadataChangeProposal_v1-value/versions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce2cc7e828>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/debian/.local/lib/python3.7/site-packages/confluent_kafka/serializing_producer.py", line 172, in produce
    value = self._value_serializer(value, ctx)
  File "/home/debian/.local/lib/python3.7/site-packages/confluent_kafka/schema_registry/avro.py", line 251, in __call__
    self._schema)
  File "/home/debian/.local/lib/python3.7/site-packages/confluent_kafka/schema_registry/schema_registry_client.py", line 338, in register_schema
    body=request)
  File "/home/debian/.local/lib/python3.7/site-packages/confluent_kafka/schema_registry/schema_registry_client.py", line 127, in post
    return self.send_request(url, method='POST', body=body)
  File "/home/debian/.local/lib/python3.7/site-packages/confluent_kafka/schema_registry/schema_registry_client.py", line 169, in send_request
    headers=headers, data=body, params=query)
  File "/home/debian/.local/lib/python3.7/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/debian/.local/lib/python3.7/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/debian/.local/lib/python3.7/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8081): Max retries exceeded with url: /subjects/MetadataChangeProposal_v1-value/versions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce2cc7e828>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/debian/.local/lib/python3.7/site-packages/datahub_provider/_plugin.py", line 281, in custom_on_success_callback
    datahub_task_status_callback(context, status=InstanceRunResult.SUCCESS)
  File "/home/debian/.local/lib/python3.7/site-packages/datahub_provider/_plugin.py", line 145, in datahub_task_status_callback
    dataflow.emit(emitter, callback=_make_emit_callback(task.log))
  File "/home/debian/.local/lib/python3.7/site-packages/datahub/api/entities/datajob/dataflow.py", line 140, in emit
    emitter.emit(mcp, callback)
  File "/home/debian/.local/lib/python3.7/site-packages/datahub/emitter/kafka_emitter.py", line 119, in emit
    return self.emit_mcp_async(item, callback or _error_reporting_callback)
  File "/home/debian/.local/lib/python3.7/site-packages/datahub/emitter/kafka_emitter.py", line 150, in emit_mcp_async
    on_delivery=callback,
  File "/home/debian/.local/lib/python3.7/site-packages/confluent_kafka/serializing_producer.py", line 174, in produce
    raise ValueSerializationError(se)
confluent_kafka.error.ValueSerializationError: KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="HTTPConnectionPool(host='localhost', port=8081): Max retries exceeded with url: /subjects/MetadataChangeProposal_v1-value/versions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce2cc7e828>: Failed to establish a new connection: [Errno 111] Connection refused'))"}
[2023-02-23, 13:36:35 UTC] {local_task_job.py:159} INFO - Task exited with return code 0
[2023-02-23, 13:36:35 UTC] {taskinstance.py:2582} INFO - 0 downstream tasks scheduled from follow-on schedule check
straight-laptop-6275
02/23/2023, 2:42 PM

straight-laptop-6275
02/23/2023, 2:42 PM

straight-laptop-6275
02/23/2023, 2:52 PM

lemon-scooter-69730
02/23/2023, 5:03 PM