jolly-traffic-67085
10/20/2022, 10:13 AM
astonishing-dusk-99990
10/20/2022, 10:19 AM
numerous-bird-32188
10/20/2022, 4:31 PM
glamorous-lion-94745
10/20/2022, 5:48 PM
bland-teacher-2077
10/20/2022, 8:23 PM
adamant-telephone-51921
10/21/2022, 4:28 AM
adamant-telephone-51921
10/21/2022, 7:11 AM
gifted-bird-57147
10/21/2022, 9:14 AM
happy-baker-8735
10/21/2022, 11:44 AM
Traceback (most recent call last):
File "/home/moustlant/.local/bin/datahub", line 5, in <module>
from datahub.entrypoints import main
File "/home/moustlant/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 14, in <module>
from datahub.cli.docker_cli import docker
File "/home/moustlant/.local/lib/python3.8/site-packages/datahub/cli/docker_cli.py", line 523, in <module>
def quickstart(
File "/usr/lib/python3/dist-packages/click/decorators.py", line 173, in decorator
_param_memo(f, OptionClass(param_decls, **option_attrs))
File "/usr/lib/python3/dist-packages/click/core.py", line 1601, in __init__
raise TypeError('Got secondary option for non boolean flag.')
TypeError: Got secondary option for non boolean flag.
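Both click frames above resolve to /usr/lib/python3/dist-packages, i.e. the distro's click, and older click releases fail on datahub's newer option declarations with exactly this TypeError. A minimal fix sketch, assuming pip can install a newer click into the user site (which precedes dist-packages on sys.path):

python3 -m pip install --user --upgrade click
python3 -c "import click; print(click.__version__, click.__file__)"   # should now resolve to ~/.local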
lively-dusk-19162
10/21/2022, 1:33 PM
mysterious-advantage-78411
10/21/2022, 1:52 PM
gentle-camera-33498
10/21/2022, 7:20 PM
quiet-wolf-56299
10/22/2022, 1:15 AM
quiet-wolf-56299
10/24/2022, 12:30 AM
quiet-wolf-56299
10/24/2022, 12:43 AM
quiet-ice-47245
10/24/2022, 8:09 AM
addTag
mutation addTags {
  addTags(input: {
    tagUrns: ["urn:li:tag:NEW_TAG"],
    resourceUrn: "urn:li:dataset:(DATASET_URN)",
    subResourceType: DATASET_FIELD,
    subResource: "COLUMN_NAME"
  })
}
but I'm getting this error:
Failed to update urn:li:tag:NEW_TAG does not exist.
How do I add NEW_TAG via GraphQL?
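The error suggests the tag URN must exist before addTags can attach it. A minimal sketch of creating it first, assuming the standard createTag mutation is available in this DataHub version (the id, name, and description values are illustrative):

mutation createTag {
  createTag(input: { id: "NEW_TAG", name: "NEW_TAG", description: "created via GraphQL" })
}

createTag returns the new tag's URN, after which the addTags call above should go through.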
steep-laptop-41463
10/24/2022, 9:52 AM
few-air-56117
10/24/2022, 11:57 AM
gentle-camera-33498
10/24/2022, 1:49 PM
flat-match-62670
10/24/2022, 11:55 PM
2022/10/24 19:37:03 Waiting for: http://datahub-dev-datahub-gms:8080/health
2022/10/24 19:37:03 Received 200 from http://datahub-dev-datahub-gms:8080/health
No user action configurations found. Not starting user actions.
[2022-10-24 19:37:04,202] INFO {datahub_actions.cli.actions:68} - DataHub Actions version: unavailable (installed editable via git)
[2022-10-24 19:37:04,333] INFO {datahub_actions.cli.actions:98} - Action Pipeline with name 'ingestion_executor' is now running.
Exception in thread Thread-1 (run_pipeline):
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 137, in poll
value = self._value_deserializer(value, ctx)
File "/usr/local/lib/python3.10/site-packages/confluent_kafka/schema_registry/avro.py", line 317, in __call__
raise SerializationError("Unknown magic byte. This message was"
confluent_kafka.serialization.SerializationError: Unknown magic byte. This message was not produced with a Confluent Schema Registry serializer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 42, in run_pipeline
pipeline.run()
File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline.py", line 161, in run
for enveloped_event in enveloped_events:
File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 152, in events
msg = self.consumer.poll(timeout=2.0)
File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 139, in poll
raise ValueDeserializationError(exception=se, kafka_message=msg)
confluent_kafka.error.ValueDeserializationError: KafkaError{code=_VALUE_DESERIALIZATION,val=-159,str="Unknown magic byte. This message was not produced with a Confluent Schema Registry serializer"}
%4|1666640260.315|MAXPOLL|rdkafka#consumer-1| [thrd:main]: Application maximum poll interval (10000ms) exceeded by 336ms (adjust max.poll.interval.ms for long-running message processing): leaving group
Has anyone seen this before, or have any advice on what I can troubleshoot to get DataHub ingesting from Snowflake properly? Any help much appreciated!
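"Unknown magic byte" means the Actions consumer tried to Avro-decode messages that were not written by a Confluent Schema Registry serializer, which usually points at a registry mismatch between the producer (GMS) and this consumer. A sketch of the Actions Kafka source settings to compare against the GMS side; the hostnames are placeholders:

source:
  type: kafka
  config:
    connection:
      bootstrap: broker:9092
      schema_registry_url: http://schema-registry:8081   # must be the same registry GMS serializes against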
melodic-printer-96412
10/25/2022, 8:26 AM
microscopic-mechanic-13766
10/25/2022, 3:16 PM
The Stats tab isn't enabled.
I don't know why, as the profiling finished successfully ('entities_profiled': '23').
My PostgreSQL recipe:
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-gms:8080'
source:
  type: postgres
  config:
    database: luca
    password: '${POSTGRES_PASSWORD}'
    profiling:
      enabled: true
    host_port: 'postgresql-luca:5432'
    username: postgres
In previous ingestions with this recipe the table stats were obtained and shown without a problem, but I don't know why they aren't shown now.
I am currently using v0.8.45.
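Since the Stats tab is driven by dataset profiles, one way to narrow this down is to query them directly; a sketch assuming the datasetProfiles field on Dataset in this version, with an illustrative URN. If it returns rows, the profiles landed and the problem is on the UI side; if it comes back empty, they never reached GMS:

query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:postgres,luca.public.my_table,PROD)") {
    datasetProfiles(limit: 1) {
      timestampMillis
      rowCount
      columnCount
    }
  }
}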
astonishing-kite-41577
10/25/2022, 3:51 PM
careful-france-26343
10/25/2022, 5:46 PM
When I run python3 -m datahub version, I get the following error:
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ubuntu/.local/lib/python3.8/site-packages/datahub/__main__.py", line 1, in <module>
from datahub.entrypoints import main
File "/home/ubuntu/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 13, in <module>
from datahub.cli.delete_cli import delete
File "/home/ubuntu/.local/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 123, in <module>
type=click.DateTime(),
AttributeError: module 'click' has no attribute 'DateTime'
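click.DateTime first appeared in click 7.0, so this looks like the same stale system click as the quickstart TypeError earlier in the thread. A quick check-and-upgrade sketch, with the same caveats as above:

python3 -c "import click; print(click.__version__)"
python3 -m pip install --user --upgrade click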
best-pilot-37106
10/25/2022, 6:18 PM
I'm using search, and it returns everything as expected except the glossaryTerms on a field, which come back null. Has anyone seen this issue before? Query and schema pictures in thread:
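One hedged possibility: glossary terms attached through the UI are stored on the editable schema aspect rather than on schemaMetadata, so they surface under editableSchemaMetadata instead. A query sketch, assuming this DataHub version exposes that field:

query {
  dataset(urn: "urn:li:dataset:(DATASET_URN)") {
    editableSchemaMetadata {
      editableSchemaFieldInfo {
        fieldPath
        glossaryTerms { terms { term { urn } } }
      }
    }
  }
}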
jolly-tent-99362
10/27/2022, 4:50 AM
complete_json = {
    "source": {
        "type": "bigquery",
        "config": {
            "project_id": "",
            "credential": cred_json,
            "include_views": "true",
            "include_tables": "true",
            "include_table_lineage": "true",
            "upstream_lineage_in_report": "true",
            "schema_pattern": {
                "ignoreCase": "true",
                "allow": ["^webengage_mum$"]
            },
            "table_pattern": {
                "ignoreCase": "true",
                "deny": ["^.*\\.temp_.*"]
            },
            "profile_pattern": {
                "allow": ["^.*\\.application.*"]
            },
            "stateful_ingestion": {
                "enabled": "true",
                "remove_stale_metadata": "true",
                "state_provider": {
                    "type": "datahub",
                    "config": {
                        "datahub_api": {
                            "server": datahub_gms_url,
                            "token": datahub_gms_token
                        }
                    }
                }
            },
            "profiling": {
                "enabled": "true",
                "bigquery_temp_table_schema": ".datahub",
                "turn_off_expensive_profiling_metrics": "true",
                "query_combiner_enabled": "false",
                "max_number_of_fields_to_profile": 1000,
                "profile_table_level_only": "true",
                "include_field_null_count": "true",
                "include_field_min_value": "true",
                "include_field_max_value": "true",
                "include_field_mean_value": "true",
                "include_field_median_value": "true",
                "include_field_stddev_value": "true",
                "include_field_quantiles": "true",
                "include_field_distinct_value_frequencies": "true",
                "include_field_histogram": "true",
                "include_field_sample_values": "true"
            }
        },
    },
    "pipeline_name": "biquery_profiling_tables",
    "sink": {
        "type": "datahub-kafka",
        "config": {
            "connection": {
                "bootstrap": bootstrap_url,
                "schema_registry_url": schema_registry_url,
            },
        },
    },
}
The job runs for some time and then fails with the following error:
[2022-10-26, 05:26:34 UTC] {ge_data_profiler.py:918} ERROR - Encountered exception while profiling <dataset>.<tableName>
Traceback (most recent call last):
File "/opt/python3.8/lib/python3.8/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 892, in _generate_single_profile
batch = self._get_ge_dataset(
File "/opt/python3.8/lib/python3.8/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 951, in _get_ge_dataset
batch = ge_context.data_context.get_batch(
File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 1642, in get_batch
return self._get_batch_v2(
File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 1336, in _get_batch_v2
datasource = self.get_datasource(batch_kwargs.get("datasource"))
File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 2062, in get_datasource
raise ValueError(
ValueError: Unable to load datasource `my_sqlalchemy_datasource-548b19eb-6db0-4fa2-8673-0e62306a3c7d` -- no configuration found or invalid configuration.
[2022-10-26, 05:26:35 UTC] {ge_data_profiler.py:773} INFO - Profiling 1 table(s) finished in 2.387 seconds
Can someone help please?
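A hedged guess rather than a confirmed diagnosis: the profiler registers a temporary sqlalchemy datasource per profiling batch, and one cheap experiment is to rule out concurrency effects on the shared Great Expectations context by lowering the worker count. max_workers is an existing option of the profiling section; the value here is illustrative:

"profiling": {
    "enabled": "true",
    "max_workers": 1
}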
kind-scientist-44426
10/27/2022, 7:19 AM
daasDf = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load("filename")
ERROR DatasetExtractor: class org.apache.spark.sql.catalyst.plans.logical.GlobalLimit is not supported yet. Please contact datahub team for further support.
salmon-rose-54694
10/27/2022, 8:18 AM
The data parameter does not have a runEvents object (empty list) [see pic 2].
2. But when I run the query in the GraphQL UI, it does return something in runEvents. [see pic 3, pic 4]
My question is: is there a difference between the run on the UI and via GraphQL?
curved-apple-55756
10/27/2022, 8:40 AM
handsome-football-66174
10/27/2022, 2:18 PM