gray-hair-27030 · 07/04/2022, 9:39 PM
Hello, I'm trying to set up the Airflow connection with DataHub, but importing the DAG throws a library error. I already installed acryl-datahub[airflow]==0.8.40.2 in the worker and webserver containers, and it still fails. Am I missing another library?
gray-architect-29447 · 07/06/2022, 1:37 AM
source:
  type: mssql
  config:
    # Coordinates
    host_port: '192.168.1.1:1433'
    database: 'prod-db-2'
    scheme: 'PROD-DB2'
    # Credentials
    username: db2admin
    password: "password*"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:6080"
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:db2"
ArgumentError: Could not parse rfc1738 URL from string 'PROD-DB2://db2admin:password%2A@192.168.1.1:1433/prod-db-2'
[2022-07-06 01:26:39,723] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.8.34.2 at /usr/local/lib/python3.8/dist-packages/datahub/__init__.py
[2022-07-06 01:26:39,723] INFO {datahub.entrypoints:179} - Python version: 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] at /usr/bin/python3 on Linux-5.4.0-104-generic-x86_64-with-glibc2.29
[2022-07-06 01:26:39,723] INFO {datahub.entrypoints:182} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.34', 'commit': 'f847fa31c9010bbb9df0d13ae7660e59083ea03e'}}, 'managedIngestion': {'defaultCliVersion': '0.8.34.1', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'noCode': 'true'}
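For context: the connector builds a SQLAlchemy URL as scheme://user:password@host_port/database, so putting the database name PROD-DB2 into scheme produces exactly this rfc1738 parse error. A hedged sketch of the fix, assuming the default pytds driver is acceptable (the scheme key can then simply be dropped):

    source:
      type: mssql
      config:
        host_port: '192.168.1.1:1433'
        database: 'prod-db-2'
        # omit scheme, or set a real SQLAlchemy dialect such as mssql+pytds;
        # it must be a dialect name, never the database name
        username: db2admin
        password: "password*"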
brash-sundown-77702 · 07/06/2022, 5:51 AM
"Connect to ipv4#127.0.0.1:9092 failed: Connection refused (after 0ms in state CONNECT, 4 identical error(s) suppressed)."
Looks like it is trying to connect to localhost:9092 instead of the remote server's Kafka. Here is my recipe YAML:
source:
  type: "mysql"
  config:
    env: "DEV"
    username: datahub
    password: datahub
    host_port: <RemoteIPAddr>:3306
sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "<RemoteIPAddr>:9092"
      schema_registry_url: "http//<RemoteIPAddr>8081"
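Two hedged observations on this recipe: schema_registry_url is malformed (http//<RemoteIPAddr>8081 is missing the :// and the colon before the port), and a client that is pointed at a remote broker yet ends up on 127.0.0.1:9092 is usually following the broker's advertised.listeners, which must advertise the remote address. A sketch, keeping the recipe's own placeholder:

    sink:
      type: "datahub-kafka"
      config:
        connection:
          bootstrap: "<RemoteIPAddr>:9092"
          schema_registry_url: "http://<RemoteIPAddr>:8081"

    # and on the broker side (server.properties), an assumption about the root cause:
    # advertised.listeners=PLAINTEXT://<RemoteIPAddr>:9092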
lemon-zoo-63387 · 07/06/2022, 6:08 AM
'CRM.crmqas_FE.systemuser': ['unable to map type UNIQUEIDENTIFIER() to metadata schema' (×33),
                             'unable to map type BIT() to metadata schema' (×11)],
'CRM.crmqas_FE.transactioncurrency': ['unable to map type UNIQUEIDENTIFIER() to metadata schema' (×9)]},
'failures': {'DELTA\EDITH.CE.CHANG': ["Tables error: (pytds.tds_base.OperationalError) Database 'DELTA\EDITH.CE' does not exist. Make sure that the name is entered correctly.
                                       [SQL: use [DELTA\EDITH.CE]]
                                       (Background on this error at: http://sqlalche.me/e/13/e3q8)",
                                      "Views error: (pytds.tds_base.OperationalError) Database 'DELTA\EDITH.CE' does not exist. Make sure that the name is entered correctly.
                                       [SQL: use [DELTA\EDITH.CE]]
                                       (Background on this error at: http://sqlalche.me/e/13/e3q8)"]},
'cli_version': '0.8.38',
'cli_entry_location': '/tmp/datahub/ingest/venv-73071ee2-6365-4acf-b3e5-8fcaa08684dd/lib/python3.9/site-packages/datahub/__init__.py',
'py_version': '3.9.9 (main, Dec 21 2021, 10:03:34) [GCC 10.2.1 20210110]',
'py_exec_path': '/tmp/datahub/ingest/venv-73071ee2-6365-4acf-b3e5-8fcaa08684dd/bin/python3',
'os_details': 'Linux-3.10.0-1160.62.1.el7.x86_64-x86_64-with-glibc2.31',
'tables_scanned': 4,
'views_scanned': 167,
'entities_profiled': 0,
'filtered': [],
'soft_deleted_stale_entities': [],
'query_combiner': None}
Sink (datahub-rest) report:
{'records_written': 755,
 'warnings': [],
 'failures': [],
 'downstream_start_time': datetime.datetime(2022, 6, 25, 4, 0, 7, 991097),
 'downstream_end_time': datetime.datetime(2022, 6, 25, 4, 0, 32, 45487),
 'downstream_total_latency_in_seconds': 24.05439,
 'gms_version': 'v0.8.38'}

Pipeline finished with 2 failures in source producing 755 workunits
2022-06-25 04:00:55.833939 [exec_id=73071ee2-6365-4acf-b3e5-8fcaa08684dd] INFO: Failed to execute 'datahub ingest'
2022-06-25 04:00:55.834347 [exec_id=73071ee2-6365-4acf-b3e5-8fcaa08684dd] INFO: Caught exception EXECUTING task_id=73071ee2-6365-4acf-b3e5-8fcaa08684dd, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.
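A hedged reading of this report: the UNIQUEIDENTIFIER/BIT warnings are typically benign (those columns are ingested without a precise type), while the two failures come from the source running use [DELTA\EDITH.CE] against a name that is not a real database. If those names should simply be skipped, a deny pattern is one option (the pattern below is a guess at the offending names):

    source:
      type: mssql
      config:
        database_pattern:
          deny:
            - 'DELTA\\.*'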
magnificent-camera-71872 · 07/06/2022, 6:42 AM
I GET the dataset and dump it to JSON, then use a POST action to recreate it. However, it seems the format of the JSON delivered by GET is considerably different from that required by PUT. Does anyone have a simple method of saving an entity and recreating it?
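One possibility, hedged because flag names vary by CLI version: the datahub CLI can round-trip a single aspect, which sidesteps hand-converting the GET payload (the urn and aspect below are placeholders):

    datahub get --urn '<dataset urn>' --aspect datasetProperties > props.json
    datahub put --urn '<dataset urn>' --aspect datasetProperties --aspect-data props.json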
07/06/2022, 7:49 AMlate-bear-87552
07/06/2022, 9:30 AMbitter-dusk-52400
07/06/2022, 9:36 AM{"tbl_lgcl_name_eng":"Stock Header","tbl_lgcl_name_lcl":"ストックヘッダ","tbl_desc":"entity name:Stock Header\\nlayer:cleansed\\nnote:Cleansed table of loaded.t_r_oiv_stk_hdr_jp\\n\\n#interface_logical_name_english:Stock Information Raw Data"}}
metadatachangeproposalwrapper to string:
MetadataChangeProposalWrapper(entityType=dataset, entityUrn=urn:li:dataset:(urn:li:dataPlatform:bigquery,dataset.bq.datalake.stg_dataset.table,STG), changeType=UPSERT, aspect={externalUrl="", customProperties={created_time=2022-02-24 10:47:08.296, created_by=event_driven}, description={"tbl_lgcl_name_eng":"Stock Header","tbl_lgcl_name_lcl":"ストックヘッダ","tbl_desc":"entity name:Stock Header\\nlayer:cleansed\\nnote:Cleansed table of loaded.t_r_oiv_stk_hdr_jp\\n\\n#interface_logical_name_english:Stock Information Raw Data"}}, aspectName=datasetProperties)
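For comparison, a minimal Python SDK sketch that emits an equivalent datasetProperties proposal (the GMS URL is a placeholder, and the description payload is truncated to the gist of the one above):

    # assumes the acryl-datahub package; URL and payload are illustrative
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder GMS URL
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn="urn:li:dataset:(urn:li:dataPlatform:bigquery,dataset.bq.datalake.stg_dataset.table,STG)",
            aspectName="datasetProperties",
            aspect=DatasetPropertiesClass(
                description='{"tbl_lgcl_name_eng":"Stock Header"}',  # truncated example payload
                customProperties={"created_time": "2022-02-24 10:47:08.296", "created_by": "event_driven"},
            ),
        )
    )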
late-bear-87552 · 07/06/2022, 11:33 AM
source:
  type: mysql
  config:
    host_port: 'X.X.X.X:3306'
    username: x
    password: x
    platform: test-pattern
    include_tables: true
    include_views: true
    schema_pattern:
      ignoreCase: true
      allow:
        - dp_datahub
    table_pattern:
      ignoreCase: true
      allow:
        - 'dp_datahub.stocks_bse*'
    profiling:
      enabled: true
      bigquery_temp_table_schema: abc.datahub
      turn_off_expensive_profiling_metrics: false
      query_combiner_enabled: false
      max_number_of_fields_to_profile: 2
      profile_table_level_only: false
      include_field_null_count: true
      include_field_min_value: true
      include_field_max_value: true
      include_field_mean_value: true
      include_field_median_value: true
      include_field_stddev_value: false
      include_field_quantiles: false
      include_field_distinct_value_frequencies: false
      include_field_histogram: false
      include_field_sample_values: false
      allow_deny_patterns:
        allow:
          - dp_datahub.stocks_bse_ci_test.date
sink:
  type: datahub-rest
  config:
    server: 'http://X.X.X.X:8080'
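Two hedged notes on the profiling block above: bigquery_temp_table_schema only applies to BigQuery sources, so it should be inert for mysql, and allow_deny_patterns does not appear to be a documented profiling key; if the intent is to profile only that one column, the documented knob is profile_pattern at the config level (a sketch, assuming regexes over schema.table.column):

    source:
      type: mysql
      config:
        profile_pattern:
          allow:
            - 'dp_datahub\.stocks_bse_ci_test\.date'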