lively-dusk-19162
02/23/2023, 5:50 PM
white-horse-97256
02/23/2023, 7:27 PM
setInputDatajobs() is deprecated, and there is no other function in that class for setting the job values for the variable _inputDatajobsField.
Where and how can I create lineage of dataset -> datajob -> dataset using the Java SDK? I can't find any reference documentation either.
handsome-flag-16272
02/23/2023, 7:33 PM
rich-daybreak-77194
02/24/2023, 1:53 AM
magnificent-lawyer-97772
02/24/2023, 10:00 AM
boundless-nail-65912
02/24/2023, 2:50 PM
dazzling-microphone-98929
02/24/2023, 5:57 PM
gray-ghost-82678
02/24/2023, 7:58 PM
white-horse-97256
02/24/2023, 8:43 PM
MetadataChangeProposalWrapper mcpw = MetadataChangeProposalWrapper.builder()
    .entityType("datajob")
    .entityUrn(Utils.createDataJobUrn("Connectors", "source_connector", "cdc_digital_account_master_account_dbz_source_connector_test", "STG"))
    .upsert()
    .aspect(new DataJobInfo()
        .setName("cdc_digital_account_master_account_dbz_source_connector_test")
        .setFlowUrn(new DataFlowUrn("Connectors", "source_connector", "STG")))
    .build();
breezy-controller-54597
02/25/2023, 6:42 AM
rich-daybreak-77194
02/25/2023, 1:28 PM
best-notebook-58252
02/26/2023, 5:41 PM
connection: "starburst"
include: "/partners_dm/explores/*.explore.lkml"
Checking the source code: when scanning for reachable views, only explores defined in model files are considered, and the ones included from separate files are ignored:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/looker/lookml_source.py#L1632
So every view is marked as unreachable.
Is this expected behavior?
important-afternoon-19755
02/27/2023, 2:30 AM
colossal-easter-99672
02/27/2023, 8:52 AM
salmon-vr-6357
02/27/2023, 4:23 PM
google.auth.exceptions.DefaultCredentialsError: ('Failed to load service account credentials from /tmp/tmplmjtp2eo', ValueError('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [_OpenSSLErrorWithText(code=503841036, lib=60, reason=524556, reason_text=b'error:1E08010C:DECODER routines::unsupported')]))
Has anyone experienced this before, and what could be wrong? BTW, I'm on v0.9.6.1.
ripe-tailor-61058
02/27/2023, 6:10 PM
cold-dress-65039
02/28/2023, 7:31 AM
ParserError: Unknown string format: None error. It doesn't look like this has been flagged before. Any idea on how to triage or fix this?
agreeable-cricket-61480
02/28/2023, 10:53 AM
crooked-rose-22807
02/28/2023, 4:05 PM
entity_type does nothing. Instead I need to delete using the URN one by one, which is impractical, but this method still has more issues:
datahub delete --entity_type glossaryTerm # DOES NOT WORK
datahub delete --urn "urn:li:glossaryTerm:green_transport_revenue" # assigned a custom ID, WORKS
datahub delete --hard --urn "urn:li:glossaryTerm:<randomid>" # enable_auto_id=True WORKS
datahub delete --hard --urn "urn:li:glossaryTerm:Metrics%203.My%20Revenue" # enable_auto_id=False, no custom ID assigned, the urn automatically takes glossary term `name`, DOES NOT WORK
2. Using the contains and/or inherits key in the business glossary YAML works as expected. However, if a term is removed from these keys, the term IS NOT ACTUALLY REMOVED from the UI.
Interchangeably modifying a term from the contains key to the inherits key, and vice versa, surprisingly works.
All in all, this should be fixed ASAP, or if there is already an available solution, kindly let me know!
busy-train-56443
02/28/2023, 7:18 PM
refined-energy-76018
02/28/2023, 11:38 PM
on_success_callback or on_failure_callback for dag_policy where it has any effect upon DAG completion. Would that explain why run_dataflow and complete_dataflow are implemented in airflow_generator.py but not used?
acceptable-nest-20465
03/01/2023, 12:10 AM
elegant-salesmen-99143
03/01/2023, 12:36 PM
quiet-jelly-11365
03/01/2023, 1:37 PM
lively-dusk-19162
03/01/2023, 3:31 PM
handsome-flag-16272
03/01/2023, 8:12 PM
source:
  type: snowflake
  config:
    platform_instance: DEV
    # Coordinates
    account_id: MY_ACCOUNT
    warehouse: MY_WH
    # Credentials
    username: MY_NAME
    password: MY_PASS
    # Options
    include_table_lineage: true
    include_view_lineage: true
    include_operational_stats: false
    include_usage_stats: false
    database_pattern:
      allow:
        - MY_DB
    schema_pattern:
      allow:
        - MY_SCHEMA
    stateful_ingestion:
      enabled: true
datahub_api:
  server: 'http://localhost:8080'
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:8080'
pipeline_name: 'urn:li:dataHubIngestionSource:dev_snowflake_db'
2. Before the 3rd run of stateful ingestion, I dropped the my_test table. This time I see the summary in the CLI as below:
Pipeline finished with at least 2 warnings; produced 168 events in 30.94 seconds.
In the 1st and 2nd runs, the message was “… produced 169 events …”. The issues I’ve found:
• It also indicates this stateful ingestion is a full ingestion rather than a delta ingestion.
• When I log in to the UI, I can still see my_table. However, it is neither marked as soft deleted nor is the “Last synchronized” time updated correctly; “Last synchronized” still shows the 2nd ingestion time.
ripe-tailor-61058
03/01/2023, 9:13 PM
transformers:
  - type: "simple_add_dataset_properties"
    config:
      semantics: PATCH
      properties:
        bucket: djla-dev-tenant-jna
        dataset: dataset2
[2023-03-01 15:59:22,624] DEBUG {datahub.telemetry.telemetry:239} - Sending Telemetry
[2023-03-01 15:59:22,689] DEBUG {datahub.ingestion.run.pipeline:181} - Source type:s3,<class 'datahub.ingestion.source.s3.source.S3Source'> configured
[2023-03-01 15:59:22,689] ERROR {datahub.ingestion.run.pipeline:127} - 1 validation error for SimpleAddDatasetPropertiesConfig
semantics
extra fields not permitted (type=value_error.extra)
Traceback (most recent call last):
  File "/home/jabplana/repos/dpl-scripts/datahub/.venv/lib64/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 197, in __init__
    self._configure_transforms()
  File "/home/jabplana/repos/dpl-scripts/datahub/.venv/lib64/python3.6/site-packages/datahub/ingestion/run/pipeline.py", line 212, in _configure_transforms
    transformer_class.create(transformer_config, self.ctx)
  File "/home/jabplana/repos/dpl-scripts/datahub/.venv/lib64/python3.6/site-packages/datahub/ingestion/transformer/add_dataset_properties.py", line 97, in create
    config = SimpleAddDatasetPropertiesConfig.parse_obj(config_dict)
  File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for SimpleAddDatasetPropertiesConfig
semantics
  extra fields not permitted (type=value_error.extra)
[2023-03-01 15:59:22,691] INFO {datahub.cli.ingest_cli:119} - Starting metadata ingestion
[2023-03-01 15:59:22,692] INFO {datahub.cli.ingest_cli:137} - Finished metadata ingestion
Failed to configure transformers due to 1 validation error for SimpleAddDatasetPropertiesConfig
semantics
extra fields not permitted (type=value_error.extra)
[2023-03-01 15:59:22,703] DEBUG {datahub.telemetry.telemetry:239} - Sending Telemetry
It works fine without the semantics: PATCH line, but I can't get it to work when including it before or after the properties.
white-horse-97256
03/01/2023, 11:20 PM
agreeable-cricket-61480
03/02/2023, 7:21 AM