red-pizza-28006
08/16/2022, 7:15 AM
[2022-08-16, 01:10:09 UTC] {{taskinstance.py:1703}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1509, in _execute_task
    result = execute_callable(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python.py", line 149, in execute
    self.op_kwargs = determine_kwargs(self.python_callable, self.op_args, context)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/operator_helpers.py", line 111, in determine_kwargs
    raise ValueError(f"The key {name} in args is part of kwargs and therefore reserved.")
ValueError: The key conn in args is part of kwargs and therefore reserved.
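For context, a minimal sketch of how this error can arise and be avoided (this is not the poster's DAG; task, connection, and argument names are made up). The traceback shows determine_kwargs rejecting a positional parameter named conn because that name is already a key in the kwargs Airflow builds from the task context, so renaming the callable's parameter (and passing the value under a non-reserved key) sidesteps the ValueError:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_data(db_conn, **context):
    # Renamed from "conn" so it no longer clashes with the key Airflow injects via the context.
    print(f"using connection id: {db_conn}")


with DAG(dag_id="example_dag", start_date=datetime(2022, 8, 1), schedule_interval=None) as dag:
    load = PythonOperator(
        task_id="load_data",
        python_callable=load_data,
        op_kwargs={"db_conn": "my_connection_id"},  # previously passed as "conn", which triggers the error
    )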
brave-tomato-16287
08/16/2022, 9:33 AM
alert-fall-82501
08/16/2022, 1:51 PM
alert-fall-82501
08/16/2022, 1:52 PM
refined-ability-35859
08/16/2022, 3:30 PM
kind-whale-32412
08/16/2022, 5:29 PM
Is there a way to get a REST client in the Java library, similar to DataHubGraph in Python? For instance, in the Python library I can make get requests (like get_aspect_v2), whereas with the Java HTTP emitter I can't. I know I could write my own HTTP client for this, but given that this is already supported in DataHub's own source code in DefaultRestliClientFactory, it shouldn't be much work to expose these operations in the Java library (they could simply be made available in the datahub-client Maven package).
Please do let me know if I missed an easy way to get the REST client in Java, other than writing my own or copy/pasting classes around.
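For reference, a rough sketch of the Python-side read being described here (the server URL and URN are placeholders, and the exact get_aspect_v2 parameter names may differ between CLI versions):

from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Connect to GMS (placeholder URL/token).
graph = DataHubGraph(DatahubClientConfig(server="http://datahub-gms:8080", token=None))

# Fetch a single aspect for an entity; this is the kind of read that has no
# equivalent in the Java datahub-client emitter today.
urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,my_db.my_schema.my_table,PROD)"
props = graph.get_aspect_v2(
    entity_urn=urn,
    aspect="datasetProperties",
    aspect_type=DatasetPropertiesClass,
)
print(props)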
creamy-tent-10151
08/16/2022, 7:29 PM
eager-terabyte-73886
08/16/2022, 8:55 PM
straight-agent-79732
08/17/2022, 2:49 AM
straight-agent-79732
08/17/2022, 7:37 AM
eager-terabyte-73886
08/17/2022, 8:33 AM
square-hair-99480
08/17/2022, 8:38 AM
clean-monkey-7245
08/17/2022, 8:43 AM
clean-monkey-7245
08/17/2022, 8:43 AM
'container-urn:li:container:d0d5d558096e9a8f47ffc324b65cddc0-to-urn:li:dataset:(urn:li:dataPlatform:snowflake,max_dev.identity.hbo_adst_full_analysis,PROD)\n'
'/usr/local/bin/run_ingest.sh: line 33: 583 Killed ( datahub ingest run -c "${recipe_file}" ${report_option} )\n',
sparse-forest-98608
08/17/2022, 8:51 AM
sparse-forest-98608
08/17/2022, 8:52 AM
microscopic-mechanic-13766
08/17/2022, 9:10 AM
fresh-cricket-75926
08/17/2022, 9:25 AM
alert-fall-82501
08/17/2022, 9:57 AM
busy-glass-61431
08/17/2022, 10:23 AM
I'm seeing properties like NumberOfBuckets, StoredAsSubDirectories, and SortColumns in DataHub which show incorrect values, e.g. NumberOfBuckets: 0 and StoredAsSubDirectories: false. Is anyone else seeing the same issue, or could someone point out what I may be missing/doing incorrectly? I have cross-verified the IAM permissions; I have given:
"Action": [
"glue:GetDatabases",
"glue:GetTables",
"glue:GetDataflowGraph",
"glue:GetJobs",
"s3:GetObject"
]
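As a sanity check (a sketch, not from the thread; the database name and region are placeholders), the values DataHub displays can be compared against what the Glue catalog itself returns for these fields, to narrow down whether the zeros/false come from Glue or from the ingestion:

import boto3

# Placeholders: point these at the catalog/database actually being ingested.
glue = boto3.client("glue", region_name="us-east-1")

# glue:GetTables (already in the policy above) is enough for this check.
for table in glue.get_tables(DatabaseName="my_database")["TableList"]:
    sd = table.get("StorageDescriptor", {})
    print(
        table["Name"],
        "NumberOfBuckets:", sd.get("NumberOfBuckets"),
        "StoredAsSubDirectories:", sd.get("StoredAsSubDirectories"),
        "SortColumns:", sd.get("SortColumns"),
    )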
jolly-traffic-67085
08/17/2022, 10:36 AM
microscopic-mechanic-13766
08/17/2022, 11:23 AM
TypeError: __init__() got an unexpected keyword argument 'kerberos_service_name'
my recipe is the following:
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms:8080'
source:
  type: trino
  config:
    database: null
    host_port: 'trino-coordinator:8080'
    username: trino
    options:
      connect_args:
        http_scheme: https
        auth: KERBEROS
        kerberos_service_name: trino-hive
Note: There is no port collision, as the GMS port is not exposed externally.
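For reference, and as an assumption about the underlying driver rather than about the DataHub trino source itself: the trino Python client takes Kerberos settings as a trino.auth.KerberosAuthentication object (with a service_name argument) rather than a kerberos_service_name keyword, which is one way a forwarded connect_args key can produce this kind of TypeError. A minimal sketch of a direct connection, with placeholder host and service names, to confirm which arguments the driver accepts:

from trino import dbapi
from trino.auth import KerberosAuthentication

# Direct DBAPI connection, bypassing the ingestion recipe (values are placeholders).
conn = dbapi.connect(
    host="trino-coordinator",
    port=8080,
    user="trino",
    http_scheme="https",
    auth=KerberosAuthentication(service_name="trino-hive"),
)
cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())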
calm-dinner-63735
08/17/2022, 11:28 AM
creamy-church-10353
08/17/2022, 12:25 PM
The sink report shows 'records_written': 0, whereas I have defined 2 entities in lineage.yaml. Adding more details below FYI. These entities are at root level in the lineage tree; they have no upstream.
- lineage.yaml file (I have tried with a blank upstream: [] as well, but get the same result)
lineage:
  - entity:
      env: dev
      name: self
      platform: airflow
      platform_instance: demo
      type: dataset
  - entity:
      env: dev
      name: etl_clseq_36
      platform: airflow
      platform_instance: demo
      type: dataset
version: 1
- airflow-recipe.yaml file
sink:
  config:
    server: http://datahub-gms:8080
    token: '$DATAHUB_GMS_TOKEN'
  type: datahub-rest
source:
  config:
    file: lineage.yaml
    preserve_upstream: true
  type: datahub-lineage-file
- datahub ingestion command
$ datahub ingest run -c airflow-recipe.yaml
Source (datahub-lineage-file) report:
{'workunits_produced': 0,
'workunit_ids': [],
'warnings': {},
'failures': {},
'cli_version': '0.8.33',
'cli_entry_location': 'python3.8/site-packages/datahub/__init__.py',
'py_version': '3.8.13 (default, May 8 2022, 17:52:27) \n[Clang 13.1.6 (clang-1316.0.21.2)]',
'py_exec_path': 'python3.8',
'os_details': '....'}
Sink (datahub-rest) report:
{'records_written': 0,
'warnings': [],
'failures': [],
'downstream_start_time': None,
'downstream_end_time': None,
'downstream_total_latency_in_seconds': None,
'gms_version': 'v0.8.33'}
...
I'm not getting any error log. Please point out what I am doing wrong here.
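One way to narrow this down (a sketch, not from the thread; the aspect is trivial and the URN helper shown ignores platform_instance) is to emit a single aspect for one of these datasets directly with the Python REST emitter. If that write shows up in the UI, the sink, server, and token are fine and the problem is on the source side of the recipe:

from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

# Placeholders: use the same server/token as in airflow-recipe.yaml.
emitter = DatahubRestEmitter(gms_server="http://datahub-gms:8080", token="<token>")

# Write-path check only; make_dataset_urn does not include the platform_instance.
urn = make_dataset_urn(platform="airflow", name="etl_clseq_36", env="DEV")
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=urn,
    aspectName="datasetProperties",
    aspect=DatasetPropertiesClass(description="write-path check"),
)
emitter.emit_mcp(mcp)
print("emitted", urn)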
few-holiday-55907
08/17/2022, 1:41 PM
bland-teacher-2077
08/17/2022, 3:47 PM
• Is the simple_add_dataset_domain transformer available for the s3 source? I am using CLI version 0.8.43.1 and receiving the error:
"[2022-08-17 15:44:14,225] ERROR {datahub.entrypoints:188} - Command failed with 'Did not find a registered class for simple_add_dataset_domain'. Run with --debug to get full trace\n"
• Is it possible to also upload metadata in addition to the bucket and object tags?
Here's the successful recipe. The error occurs when I add the transformer type simple_add_dataset_domain:
transformers:
  - type: simple_add_dataset_tags
    config:
      tag_urns:
        - 'urn:li:tag:dummytagone'
        - 'urn:li:tag:dummytagtwo'
  - type: simple_add_dataset_terms
    config:
      term_urns:
        - 'urn:li:glossaryTerm:sampleterm1'
        - 'urn:li:glossaryTerm:sampleterm2'
  - type: simple_add_dataset_ownership
    config:
      owner_urns:
        - 'urn:li:corpuser:datahub'
        - 'urn:li:corpGroup:Sample Group'
      ownership_type: PRODUCER
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms:8080'
source:
  type: s3
  config:
    profiling:
      enabled: false
    use_s3_object_tags: true
    use_s3_bucket_tags: true
    path_specs:
      - include: 's3://*****/*****/*.*'
    env: PROD
    aws_config:
      aws_access_key_id: *****
      aws_region: *****
      aws_secret_access_key: *****
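Regarding the "Did not find a registered class" error above, a quick check (an assumption: the CLI registers transformers through the "datahub.ingestion.transformer.plugins" entry-point group, as in its setup.py) is to list which transformer names the installed version actually knows:

# Lists the transformer plugin names registered by the installed datahub package,
# to see whether simple_add_dataset_domain exists in this CLI version at all.
import pkg_resources

names = sorted(
    ep.name for ep in pkg_resources.iter_entry_points("datahub.ingestion.transformer.plugins")
)
print("\n".join(names))
print("simple_add_dataset_domain registered:", "simple_add_dataset_domain" in names)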
alert-fall-82501
08/17/2022, 11:57 AM
dazzling-insurance-83303
08/17/2022, 7:20 PM
brash-airport-6045
08/17/2022, 8:04 PM
breezy-controller-54597
08/18/2022, 5:25 AM