gentle-plastic-92802
02/01/2023, 8:25 PM

plain-france-42647
02/01/2023, 9:13 PM
import json

f = open('/bla.json', 'rt')
s = f.read()
f.close()
d = json.loads(s)
The problem is that d is now a list of dictionaries that belong to different types (e.g. ChartSnapshot, etc.) - and I need, for each item in d, to check its type (how do I do that?) and then use the type-specific extractor. Is there any generic code that can easily do this?
(FTR - in my specific case, I have the result of running ingestion for Tableau.)
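One generic approach is a dispatch table keyed on the snapshot type name. A minimal sketch, assuming each item is a serialized MetadataChangeEvent whose proposedSnapshot dict is keyed by the fully-qualified snapshot class name (verify that shape against your actual file; extract_chart is a hypothetical handler):

import json

def extract_chart(snapshot):
    # hypothetical per-type extractor - replace with your own logic
    print("chart:", snapshot.get("urn"))

# map each fully-qualified snapshot type name to its extractor
HANDLERS = {
    "com.linkedin.pegasus2avro.metadata.snapshot.ChartSnapshot": extract_chart,
    # add one entry per snapshot type you care about
}

with open('/bla.json', 'rt') as f:
    d = json.load(f)

for item in d:
    # the single key of proposedSnapshot identifies the item's type
    for type_name, snapshot in item.get("proposedSnapshot", {}).items():
        handler = HANDLERS.get(type_name)
        if handler is not None:
            handler(snapshot)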

magnificent-lock-58916
02/02/2023, 4:33 AM

plain-cricket-83456
02/02/2023, 6:47 AM

fresh-zoo-34934
02/02/2023, 9:15 AM
I know that there is a GraphQL UpdateLineageInput to update the lineage, and we could probably set the Trino tables to archived using BatchDatasetUpdateInput, but I don't know whether this is the best solution because there is no batch UpdateLineageInput.
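For reference, a rough sketch of driving that mutation edge-by-edge from Python to simulate a batch (the edgesToAdd / edgesToRemove field names come from the UpdateLineageInput type and should be checked against your DataHub version's GraphQL schema; the endpoint, token, and URNs are placeholders):

import requests

MUTATION = """
mutation updateLineage($input: UpdateLineageInput!) {
  updateLineage(input: $input)
}
"""

# placeholder (upstream, downstream) URN pairs; one updateLineage call per edge
edges = [
    ("urn:li:dataset:(urn:li:dataPlatform:trino,db.schema.src,PROD)",
     "urn:li:dataset:(urn:li:dataPlatform:trino,db.schema.dst,PROD)"),
]

for upstream, downstream in edges:
    variables = {"input": {
        "edgesToAdd": [{"upstreamUrn": upstream, "downstreamUrn": downstream}],
        "edgesToRemove": [],
    }}
    resp = requests.post(
        "http://localhost:8080/api/graphql",  # your GMS endpoint
        json={"query": MUTATION, "variables": variables},
        headers={"Authorization": "Bearer <token>"},  # if auth is enabled
    )
    resp.raise_for_status()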
best-dawn-94548
02/02/2023, 9:42 AM

aloof-egg-97140
02/02/2023, 10:04 AM

best-umbrella-88325
02/02/2023, 12:43 PM
I ran python -m build followed by python -m pip install . in the metadata-ingestion directory to install the datahub cli locally. I see the 0.0.0.dev0 version of datahub getting installed as well. However, when I run datahub ingest -c file.yaml or datahub version, it fails with the following error:
Traceback (most recent call last):
File "C:\XXX\XXX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\XXX\XXX\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\XXX\XXX\AppData\Local\Programs\Python\Python310\Scripts\datahub.exe\__main__.py", line 4, in <module>
File "C:\XXX\XXX\datahub\datahub\metadata-ingestion\src\datahub\entrypoints.py", line 12, in <module>
from datahub.cli.check_cli import check
File "C:\XXX\XXX\datahub\datahub\metadata-ingestion\src\datahub\cli\check_cli.py", line 7, in <module>
from datahub.cli.json_file import check_mce_file
File "C:\XXX\XXX\datahub\datahub\metadata-ingestion\src\datahub\cli\json_file.py", line 3, in <module>
from datahub.ingestion.source.file import GenericFileSource
File "C:\XXX\XXX\datahub\datahub\metadata-ingestion\src\datahub\ingestion\source\file.py", line 17, in <module>
from datahub.emitter.mcp import MetadataChangeProposalWrapper
File "C:\XXX\XXX\datahub\datahub\metadata-ingestion\src\datahub\emitter\mcp.py", line 5, in <module>
from datahub.emitter.aspect import ASPECT_MAP, TIMESERIES_ASPECT_MAP
File "C:\XXX\XXX\datahub\datahub\metadata-ingestion\src\datahub\emitter\aspect.py", line 1, in <module>
from datahub.metadata.schema_classes import ASPECT_CLASSES
ModuleNotFoundError: No module named 'datahub.metadata'
Can someone help me here as to what is going wrong? Thanks in advance!
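(For context: datahub.metadata.schema_classes is generated code - it is produced by metadata-ingestion's codegen step rather than checked into the source tree - so a from-source install that skipped codegen leaves the module missing. A minimal sketch that just reproduces the failing import in isolation, assuming nothing beyond the stdlib:)

import importlib

# succeeds only if the code-generated metadata classes were built and installed
try:
    importlib.import_module("datahub.metadata.schema_classes")
    print("generated metadata classes are present")
except ModuleNotFoundError:
    print("missing: run metadata-ingestion's codegen step, then reinstall")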

delightful-orange-22738
02/02/2023, 1:49 PM

elegant-salesmen-99143
02/02/2023, 6:43 PM
type: pattern_add_dataset_schema_terms
config:
  semantics: OVERWRITE
  term_pattern:
    rules:
      first_name: ['urn:li:glossaryTerm:XXX']
It adds the term XXX to the field first_name within a table. Now let's say I want to add this term to the whole table using the pattern_add_dataset_terms transformer type - how do I do that? So far the ways I've tried to write it didn't work...
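A sketch of what that could look like, assuming pattern_add_dataset_terms matches its rules against the dataset URN rather than a field name (worth verifying against the transformer docs for your version; the regex is a placeholder):

transformers:
  - type: pattern_add_dataset_terms
    config:
      semantics: OVERWRITE
      term_pattern:
        rules:
          '.*my_table_name.*': ['urn:li:glossaryTerm:XXX']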
02/02/2023, 10:16 PMdocument_missing_exception
from Elasticsearch when checking gms logs.elegant-salesmen-99143
02/03/2023, 8:57 AM

best-wire-59738
02/03/2023, 11:42 AM

microscopic-twilight-7661
02/03/2023, 2:18 PM
I have a *.proto source file that contains multiple non-nested messages. Is there a way to specify which message to emit, or even multiple messages?

brainy-intern-50400
02/03/2023, 4:44 PM
Remove edge not supported by Neo4JGraphService at this time.
great-kangaroo-88413
02/03/2023, 10:21 PM
{
"topic": "data.now",
"partition": 0,
"offset": 2000,
"tstype": "create",
"ts": 1675455920363,
"broker": 1,
"key": null,
"payload": "{\"id\":1,\"first_name\":\"Zachariah\",\"last_name\":\"Wiffield\",\"email\":\"<mailto:zwiffield0@amazonaws.com|zwiffield0@amazonaws.com>\",\"gender\":\"Male\",\"ip_address\":\"176.189.152.5\"}"
}
{
"topic": "data.now",
"partition": 0,
"offset": 2001,
"tstype": "create",
"ts": 1675455920363,
"broker": 1,
"key": null,
"payload": "{\"id\":2,\"first_name\":\"Hilton\",\"last_name\":\"Siverns\",\"email\":\"<mailto:hsiverns1@csmonitor.com|hsiverns1@csmonitor.com>\",\"gender\":\"Male\",\"ip_address\":\"216.205.159.252\"}"
}
This is my source:
source:
  type: kafka
  config:
    connection:
      consumer_config:
        security.protocol: PLAINTEXT
      bootstrap: 'kafka0.com:9094,kafka1.com:9094,kafka2.com:9094'
      schema_registry_url: 'http://dh-cp-schema-registry:8081'
    stateful_ingestion:
      enabled: false
    topic_patterns:
      allow:
        - data.now
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-datahub-gms.mynamespace.svc.cluster.local:8080'
I get this message: "The schema registry subject for the value schema is not found. The topic is either schema-less, or no messages have been written to the topic yet." This is what I end up with. What am I missing?
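The error suggests the source looked up the subject data.now-value in the schema registry and found nothing - the messages above are plain JSON with no registered schema. One hedged workaround is to register a JSON schema for the topic yourself via the registry's REST API (a sketch; the subject name assumes the default TopicNameStrategy, and the schema fields mirror the payload above):

import json
import requests

# JSON Schema describing the message payload shown above
value_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "first_name": {"type": "string"},
        "last_name": {"type": "string"},
        "email": {"type": "string"},
        "gender": {"type": "string"},
        "ip_address": {"type": "string"},
    },
}

resp = requests.post(
    "http://dh-cp-schema-registry:8081/subjects/data.now-value/versions",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    json={"schemaType": "JSON", "schema": json.dumps(value_schema)},
)
resp.raise_for_status()
print(resp.json())  # returns the new schema id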

rhythmic-glass-37647
02/03/2023, 11:04 PM
source:
  type: tableau
  config:
    ingest_owner: true
    connect_uri: 'https://mytableau.mycompany.com'
    ssl_verify: true
    token_name: datahub
    token_value: 'mytoken'
    ingest_tags: true
pipeline_name: 'urn:li:dataHubIngestionSource:ba12380e-7fc1-425e-9783-88ada4ab8b61'
bitter-evening-61050
02/06/2023, 6:43 AM

plain-cricket-83456
02/06/2023, 7:37 AM

fresh-balloon-59613
02/06/2023, 8:03 AM
conn_id='datahub_rest' : generic//*'http//*******'. But from the datahub platform I am not able to see the DAGs lineage in pipelines.

better-state-74960
02/06/2023, 8:23 AM

better-state-74960
02/06/2023, 8:25 AM

steep-fountain-54482
02/06/2023, 10:31 AM

steep-fountain-54482
02/06/2023, 10:31 AM
This entity is not discoverable via search or lineage graph. Contact your DataHub admin for more information.
steep-fountain-54482
02/06/2023, 10:32 AM

steep-fountain-54482
02/06/2023, 10:33 AM

square-yak-42039
02/06/2023, 11:33 AM

elegant-salesmen-99143
02/06/2023, 1:17 PM
source:
  type: presto
  config:
    host_port: 'XXX'
    database: hive
    username: hive
    include_views: false
    include_tables: false
    profiling:
      enabled: true
      profile_table_level_only: true
      include_field_sample_values: true
    schema_pattern:
      allow:
        - sandbox_data
    stateful_ingestion:
      enabled: true
transformers:
  - type: set_dataset_browse_path
    config:
      replace_existing: true
      path_templates:
        - /ENV/PLATFORM/DATASET_PARTS
(The transformer in the recipe helped to start displaying tables in the Dataset view; without it they weren't shown there either, just like in the Platform view.)

crooked-carpet-28986
02/06/2023, 2:18 PM

gentle-plastic-92802
02/06/2023, 7:00 PM