steep-laptop-41463
09/01/2022, 9:19 AM
[2022-09-01 09:16:21,514] ERROR {datahub.ingestion.run.pipeline:54} - failed to write record with workunit lineage-urn:li:dataset:(urn:li:dataPlatform:kafka,topic3,DEV) with Expecting value: line 1 column 1 (char 0) and info {}
[2022-09-01 09:16:21,522] ERROR {datahub.ingestion.run.pipeline:54} - failed to write record with workunit lineage-urn:li:dataset:(urn:li:dataPlatform:kafka,topic2,DEV) with Expecting value: line 1 column 1 (char 0) and info {}
[2022-09-01 09:16:21,528] INFO {datahub.cli.ingest_cli:143} - Finished metadata ingestion
Cli report:
{'cli_entry_location': '/usr/local/lib/python3.7/site-packages/datahub/__init__.py',
'cli_version': '0.8.43.6',
'os_details': 'Linux-5.10.130-118.517.amzn2.x86_64-x86_64-with-glibc2.2.5',
'py_exec_path': '/usr/bin/python3',
'py_version': '3.7.10 (default, Jun 3 2021, 00:02:01) \n[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]'}
Source (datahub-lineage-file) report:
{'event_ids': ['lineage-urn:li:dataset:(urn:li:dataPlatform:kafka,topic3,DEV)', 'lineage-urn:li:dataset:(urn:li:dataPlatform:kafka,topic2,DEV)'],
'events_produced': '2',
'events_produced_per_sec': '0',
'failures': {},
'read_rate': '0',
'running_time_in_seconds': '0',
'start_time': '2022-09-01 09:16:21.486920',
'warnings': {}}
Sink (datahub-rest) report:
{'current_time': '2022-09-01 09:16:21.695982',
'failures': [{'e': 'Expecting value: line 1 column 1 (char 0)'}, {'e': 'Expecting value: line 1 column 1 (char 0)'}],
'gms_version': 'v0.8.43',
'pending_requests': '0',
'records_written_per_second': '0',
'start_time': '2022-09-01 09:16:20.645484',
'total_duration_in_seconds': '1.05',
'total_records_written': '0',
'warnings': []}
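For context, the text `Expecting value: line 1 column 1 (char 0)` is the message Python's standard `json` module produces when the input does not start with a JSON value, for example an empty string or an HTML page. A minimal sketch of that behavior follows; `explain_body` is a hypothetical helper written for illustration, not part of DataHub, and whether the sink here actually received a non-JSON response is an assumption the log alone does not confirm.

```python
import json


def explain_body(body: str) -> str:
    """Describe whether `body` parses as JSON (hypothetical helper, not a DataHub API)."""
    try:
        json.loads(body)
        return "valid JSON"
    except json.JSONDecodeError as exc:
        # str(exc) has the form "<msg>: line L column C (char N)"
        return str(exc)


# An empty body and an HTML page both fail at the very first character,
# so both yield the same message that appears in the sink report.
print(explain_body(""))
print(explain_body("<html>not json</html>"))
print(explain_body('{"ok": true}'))
```

If the response body is the cause, checking what the configured `server` URL returns would be a reasonable next step.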
better-orange-49102
09/01/2022, 9:41 AM
source:
  type: datahub-lineage-file
  config:
    # Coordinates
    file: /path/to/file_lineage.yml
    # Whether we want to query datahub-gms for upstream data
    preserve_upstream: False
sink:
  # sink configs
better-orange-49102
09/01/2022, 9:41 AM
steep-laptop-41463
09/01/2022, 9:47 AM
---
version: 1
lineage:
  - entity:
      name: topic3
      type: dataset
      env: DEV
      platform: kafka
    upstream:
      - entity:
          name: topic2
          type: dataset
          env: DEV
          platform: kafka
      - entity:
          name: topic1
          type: dataset
          env: DEV
          platform: kafka
  - entity:
      name: topic2
      type: dataset
      env: DEV
      platform: kafka
    upstream:
      - entity:
          name: kafka.topic2
          env: PROD
          platform: snowflake
          platform_instance: test
          type: dataset
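The workunit IDs in the log (`lineage-urn:li:dataset:(urn:li:dataPlatform:kafka,topic3,DEV)`) show how each `entity` in this file maps to a dataset URN. A rough sketch of that mapping for the simple case without `platform_instance`; `dataset_urn` is a hypothetical helper written for illustration and the format is inferred from the log lines above, not taken from DataHub's code.

```python
def dataset_urn(platform: str, name: str, env: str) -> str:
    """Build a dataset URN in the shape seen in the log (hypothetical helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# First entity from the lineage file above.
entity = {"name": "topic3", "type": "dataset", "env": "DEV", "platform": "kafka"}

urn = dataset_urn(entity["platform"], entity["name"], entity["env"])

# The source report lists event IDs with a "lineage-" prefix in front of the URN.
print("lineage-" + urn)
```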
Then I create the file example_linage.yml (the config?):
source:
  type: datahub-lineage-file
  config:
    # Coordinates
    file: /opt/datahub/gms_data/file_linage.yml
    # Whether we want to query datahub-gms for upstream data
    preserve_upstream: False
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
And then I launch:
datahub ingest -c example_linage.yml
(using the config file above, as I understood it). What am I doing wrong?
better-orange-49102
09/01/2022, 9:49 AM