lively-carpenter-82828
09/01/2023, 11:27 AM
dry-raincoat-85182
09/01/2023, 3:06 PM
Traceback (most recent call last):
File "/data/vdc/conda/condapub/svc_am_cicd/envs/dh-actions/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 373, in run
for record_envelope in self.transform(record_envelopes):
File "/data/vdc/conda/condapub/svc_am_cicd/envs/dh-actions/lib/python3.10/site-packages/datahub/ingestion/extractor/mce_extractor.py", line 77, in get_records
raise ValueError(
ValueError: source produced an invalid metadata work unit: MetadataChangeEventClass(
square-painter-33350
09/01/2023, 5:00 PM
important-autumn-58748
09/01/2023, 6:23 PM
best-monitor-90704
09/02/2023, 1:06 PM
refined-gold-30439
09/04/2023, 8:09 AM
[2023-09-04 06:03:02,906] ERROR {datahub.utilities.sqlalchemy_query_combiner:403} - Failed to execute queue using combiner: (pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'jgtcpmgyvhhzjeoo AS \n(SELECT count(*) AS count_1 \nFROM a.email)\n SELECT ' at line 1")
Also, the tagging syntax doesn't seem to be working properly... which part of this recipe is incorrect?
source:
  type: mysql
  config:
    host_port: 'hostname:3306'
    database: null
    username: user
    include_tables: true
    include_views: true
    profiling:
      enabled: true
      profile_table_level_only: false
      profile_table_row_count_estimate_only: true
      field_sample_values_limit: 0
      include_field_sample_values: false
    stateful_ingestion:
      enabled: true
    password: '${mysql_pw}'
    schema_pattern:
      allow:
        - public
transformers:
  - type: simple_add_dataset_tags
    config:
      tag_urns:
        - 'urn:li:tag:us'
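Two hedged observations. The 1064 error comes from the profiler's query combiner, which wraps the per-column profiling queries in a WITH ... AS (...) CTE; MySQL only gained CTE support in 8.0, so on an older server the combined query is a syntax error. On the tagging side, the transformer block matches the documented simple_add_dataset_tags form, so check instead that transformers sits at the top level of the recipe (a sibling of source, not nested under config). A sketch of the profiling section with the combiner turned off, assuming your DataHub version exposes the query_combiner_enabled option:

source:
  type: mysql
  config:
    # ... connection settings as above ...
    profiling:
      enabled: true
      # Assumption: on MySQL < 8.0 (no CTE support) the combined profiling
      # query fails; plain standalone queries avoid the WITH clause.
      query_combiner_enabled: false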
fierce-monkey-46092
09/04/2023, 8:21 AM
dazzling-stone-78871
09/04/2023, 12:47 PM
I run datahub --debug ingest -c my_file.yaml and get "Finished metadata ingestion [...] Pipeline finished successfully", but my metadata is not updated, and I do not see any new runs in the UI or with datahub ingest list-runs.
Has anyone encountered this kind of issue?
I have deployed DataHub on AWS with Kubernetes, using AWS MSK (Kafka v3.4.0), AWS OpenSearch (Elasticsearch v7.10) and RDS (PostgreSQL v13.8). I am on version 0.10.4 of DataHub.
Thank you in advance 🙂
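One unverified possibility: without an explicit sink section, the CLI sends metadata to whatever server its local configuration points at (set via datahub init, or the default), which may not be the GMS deployed on the cluster. A minimal sketch of an explicit REST sink, with a hypothetical in-cluster GMS address:

sink:
  type: datahub-rest
  config:
    server: 'http://datahub-datahub-gms:8080'  # hypothetical: your GMS service URL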
straight-eve-29501
09/04/2023, 12:53 PM
some-alligator-9844
09/04/2023, 2:18 PM
alert-analyst-73197
09/04/2023, 4:53 PM
acceptable-stone-72571
09/05/2023, 8:06 AM
nutritious-lighter-88459
09/05/2023, 9:16 AMdatahub_action
and is displayed in datahub under Validation
tab as expected. However, I was wondering if there is a way to update the description of the assertion which gets auto generated (PFA) ?
We would like to provide our own custom description may be in the form of some attribute in meta
tag in expectation suite.
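For what it's worth, Great Expectations itself allows arbitrary metadata on an individual expectation via its meta field; a sketch of the shape, where the description key is a hypothetical name (whether the DataHub action surfaces any meta attribute as the assertion description is exactly the open question here):

{
  "expectation_type": "expect_column_values_to_not_be_null",
  "kwargs": { "column": "email" },
  "meta": {
    "description": "Custom, human-written description of this check"
  }
}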
best-kite-4934
09/05/2023, 11:40 AM
bitter-florist-92385
09/05/2023, 12:50 PM
shy-diamond-99510
09/05/2023, 12:58 PM
great-florist-68068
09/05/2023, 10:44 PM
When I run datahub ingest -c ./hive-datahub.yml, I'm getting "Server not found in Kerberos database".
I tried running kinit beforehand but still get the same error.
It is not clear to me how to provide krb5.conf in the hive-datahub.yml.
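A sketch of one common setup, not verified against this environment: krb5.conf is not referenced from the recipe at all; the Kerberos libraries pick it up from the standard KRB5_CONFIG environment variable, while the Hive connection settings go into the source's connect_args. The kerberos_service_name value is an assumption and must match the service part of HiveServer2's principal:

# Before running the CLI (standard MIT Kerberos variable):
#   export KRB5_CONFIG=/path/to/krb5.conf
source:
  type: hive
  config:
    host_port: 'hiveserver.example.com:10000'  # hypothetical host
    options:
      connect_args:
        auth: KERBEROS
        kerberos_service_name: hive  # assumption: matches hive/<host>@REALM

The "Server not found in Kerberos database" error itself usually means the service principal could not be resolved, often because host_port uses an IP or alias rather than the fully qualified hostname in the principal.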
enough-pizza-64105
09/06/2023, 3:08 AM
best-umbrella-88325
09/06/2023, 2:08 PM
great-florist-68068
09/06/2023, 9:15 PM
I created the table with create table hive_example(a string, b int) partitioned by(c int); but the DataHub ingestion doesn't set isPartitioningKey in the payload:
{
  "fieldPath": "c",
  "nullable": true,
  "type": {
    "type": {
      "com.linkedin.schema.NumberType": {}
    }
  },
  "nativeDataType": "int",
  "recursive": false,
  "isPartOfKey": false
}
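For comparison, the SchemaField model does define a separate isPartitioningKey boolean next to isPartOfKey, so the expected payload for the partition column would look roughly like the sketch below; whether the Hive connector populates it in your version is the open question (this is illustrative, not actual connector output):

{
  "fieldPath": "c",
  "nullable": true,
  "nativeDataType": "int",
  "isPartOfKey": false,
  "isPartitioningKey": true
}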
lively-energy-75016
09/07/2023, 3:47 AM
better-orange-49102
09/07/2023, 8:36 AM
mammoth-musician-30735
09/07/2023, 2:37 PM
able-library-93578
09/07/2023, 9:12 PMCertified
. I create another tag (programmatically - python -DataHubGraph ) SourcesSDP
, so now there are 2 tags - Great. When I run the UI ingestion again, it deletes all tags and re-writes the Certified
tag. I verify that GMS log shows an UPSERT for that entity and tag. What is the behavior supposed to be for the UPSERT?
GMS log:
2023-09-07 18:08:32,038 [qtp1577592551-60548] INFO c.l.m.r.entity.AspectResource:180 - INGEST PROPOSAL proposal: {aspectName=globalTags, systemMetadata={lastObserved=1694110111876, runId=2359db8a-8c78-471f-b38a-0e8167a4f431}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:powerbi,SMG_Channel_Reporting_Dataset.DATE_DIM,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=43,bytes=7b227461...227d5d7d)}, changeType=UPSERT}
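This matches how aspects behave: globalTags is a single aspect, and an UPSERT replaces the whole aspect, so the last writer (here the UI ingestion) wins and any tag added out-of-band is dropped. A minimal read-modify-write sketch with the Python SDK, assuming a hypothetical GMS address and reusing the dataset URN from the log:

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # hypothetical GMS
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:powerbi,SMG_Channel_Reporting_Dataset.DATE_DIM,PROD)"

# Read the current globalTags aspect (None if the entity has no tags yet).
existing = graph.get_aspect(entity_urn=dataset_urn, aspect_type=GlobalTagsClass)
tags = existing.tags if existing else []

new_tag = "urn:li:tag:SourcesSDP"
if not any(t.tag == new_tag for t in tags):
    tags.append(TagAssociationClass(tag=new_tag))

# Emit the merged list; this is still an UPSERT of the whole aspect.
graph.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=GlobalTagsClass(tags=tags)))

Note that this still does not survive the next ingestion run; to make the extra tag durable, add it in the recipe itself (e.g. a simple_add_dataset_tags transformer) so every UPSERT includes it.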
eager-monitor-4683
09/08/2023, 1:52 AM
mammoth-musician-30735
09/08/2023, 4:51 AM
mammoth-musician-30735
09/08/2023, 7:35 AM
source:
  type: vertica
  config:
    host_port: 'host:5433'
    database: databse
    schema_pattern:
      allow:
        - '^specific_schema_name*'
    username: '${VERTICA_USERNAME}'
    password: '${VERTICA_PASSWORD}'
    include_tables: true
    include_views: true
    include_projections: false
    include_models: false
    include_view_lineage: false
    include_projection_lineage: false
    profiling:
      enabled: false
      field_sample_values_limit: 10
      max_workers: 1
As per the documentation (https://datahubproject.io/docs/generated/ingestion/sources/vertica/), profiling is disabled. I also tried without the profiling section in the config and the result is the same: more than a day for a small ingestion.
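One thing to double-check, though it concerns filtering rather than speed: schema_pattern entries are regular expressions, and in '^specific_schema_name*' the * quantifies only the final e. DataHub's allow patterns match from the start of the string, so this may still behave as intended, but a prefix match is usually written as below (the schema name is a placeholder):

schema_pattern:
  allow:
    - '^specific_schema_name.*'  # '.' before '*': any characters after the prefix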
bland-barista-59197
09/08/2023, 4:47 PM
melodic-dusk-2080
09/10/2023, 4:32 PM
refined-gold-30439
09/11/2023, 1:48 AM