average-rocket-98592
03/23/2023, 3:50 PM
strong-hospital-52301
03/23/2023, 6:53 PM
numerous-byte-87938
03/23/2023, 10:37 PM
document_missing_exception error (logs in 🧵). There's a good chance that we missed something during the upgrade, and it would be super helpful if you could offer some insights. Here are some extra notes:
• We had made sure our branch was on v0.8.45 (i.e., including #5827, as this thread and the v0.8.44 release notes suggested), and our standalone MXE consumers looked functional according to the logs.
• The errors came from our GMS pods, starting right after they were bootstrapped.
• While monitoring the ES indices, we did not see docs.count increase for any index.
• MCPs landed in MySQL without issue.
• The ES cluster we used for testing was restored from a prod snapshot without any options.
full-football-39857
03/24/2023, 1:47 AM
happy-chef-34162
03/24/2023, 5:12 AM
agreeable-cricket-61480
03/24/2023, 11:14 AM
wide-optician-47025
03/24/2023, 8:06 PM
best-planet-6756
03/25/2023, 5:47 AM
ambitious-apple-49350
03/27/2023, 6:21 AM
orange-room-20920
03/27/2023, 7:37 AM
microscopic-room-90690
03/27/2023, 8:38 AM
datahub.metadata.schema_classes to set table schema. It seems the native datatype is not necessary. I'm wondering how to make the target datatype automatically adapt to the native datatype, because it is not easy to configure it manually during automated execution.
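One way to avoid configuring types by hand is to derive the DataHub type from the native type string at ingestion time. The sketch below assumes a hypothetical lookup table (`_NATIVE_TO_DATAHUB`) and a hypothetical helper (`datahub_type_name`) that return the name of a type class from datahub.metadata.schema_classes. The table only covers a few common SQL types and would need extending for your sources; unknown types fall back to `NullTypeClass`.

```python
import re
from typing import Dict

# Hypothetical mapping from lower-cased native SQL base type names to the
# names of DataHub type classes in datahub.metadata.schema_classes.
_NATIVE_TO_DATAHUB: Dict[str, str] = {
    "varchar": "StringTypeClass",
    "char": "StringTypeClass",
    "text": "StringTypeClass",
    "int": "NumberTypeClass",
    "integer": "NumberTypeClass",
    "smallint": "NumberTypeClass",
    "bigint": "NumberTypeClass",
    "decimal": "NumberTypeClass",
    "float": "NumberTypeClass",
    "double": "NumberTypeClass",
    "boolean": "BooleanTypeClass",
    "date": "DateTypeClass",
    "time": "TimeTypeClass",
    "timestamp": "TimeTypeClass",
    "datetime": "TimeTypeClass",
}


def datahub_type_name(native_type: str) -> str:
    """Map a native type such as 'VARCHAR(255)' to a DataHub type class name."""
    # Keep only the base name: drop "(255)" or " with time zone" suffixes.
    base = re.split(r"[\s(]", native_type.strip().lower(), maxsplit=1)[0]
    # Unknown types fall back to NullTypeClass instead of raising.
    return _NATIVE_TO_DATAHUB.get(base, "NullTypeClass")
```

The returned name could then be resolved with `getattr(schema_classes, name)()` and wrapped in `SchemaFieldDataTypeClass(type=...)`, while the original string is kept as the field's nativeDataType. Check the class names against the SDK version you have installed.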
reference:
https://docs-website-ej1aml8mp-acryldata.vercel.app/docs/python-sdk/models#datahub.metadata.schema_classes.TimeTypeClass
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/dataset_schema.py
average-rocket-98592
03/27/2023, 9:46 AM
fresh-balloon-59613
03/27/2023, 11:52 AM
astonishing-dusk-99990
03/27/2023, 12:19 PM
source:
  type: trino
  config:
    host_port: '{host}:{port}'
    database: {db_name}
    username: {username}
    password: {password}
Note:
• I'm deploying with Helm chart version 0.10.0
• Trino runs on top of a Dataproc cluster
modern-france-82371
03/28/2023, 3:39 AM
great-monkey-52307
03/28/2023, 5:38 AM
witty-butcher-82399
03/28/2023, 7:33 AM
charts, dashboards, and containers are missing from this mapping? Could these entities be added safely?
https://github.com/datahub-project/datahub/blob/c7d35ffd6609d0ae79a2b1151a2221086e[…]ingestion/src/datahub/ingestion/transformer/base_transformer.py
self.entity_type_mappings: Dict[str, Type] = {
    "dataset": DatasetSnapshotClass,
    "dataFlow": DataFlowSnapshotClass,
    "dataJob": DataJobSnapshotClass,
}
We have a custom transformer that enriches ownership information for datasets, charts, dashboards, and containers. However, our transformer fails because of the assert here. Also, what's the point of the assert if there is a fallback return False a couple of lines below?
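For comparison, here is the "skip instead of assert" behavior in isolation. This is not DataHub's base_transformer code; it is a standalone sketch that assumes the entity type is read from the URN prefix (urn:li:<entityType>:...) and checked against a hypothetical SUPPORTED_ENTITY_TYPES set that includes chart, dashboard, and container.

```python
from typing import FrozenSet

# Hypothetical set of entity types an ownership transformer wants to handle.
SUPPORTED_ENTITY_TYPES: FrozenSet[str] = frozenset(
    {"dataset", "dataFlow", "dataJob", "chart", "dashboard", "container"}
)


def entity_type_of(urn: str) -> str:
    """Extract <entityType> from a URN of the form urn:li:<entityType>:<key>."""
    # maxsplit=3 keeps nested URNs inside the key intact.
    parts = urn.split(":", 3)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "li":
        raise ValueError(f"not a DataHub URN: {urn!r}")
    return parts[2]


def should_transform(urn: str) -> bool:
    """Return False for unsupported entity types rather than failing an assert."""
    return entity_type_of(urn) in SUPPORTED_ENTITY_TYPES
```

With this design an unsupported type becomes a no-op instead of a crash. Whether the real transformer then processes chart, dashboard, or container records still depends on how those records reach it (as snapshots or as MCPs).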
Thanks!
bitter-evening-61050
03/28/2023, 7:35 AM
cool-tiger-42613
03/28/2023, 9:41 AM
brainy-parrot-75918
03/28/2023, 9:44 AM
lively-raincoat-33818
03/28/2023, 10:42 AM
purple-printer-15193
03/28/2023, 4:23 PM
quaint-football-54639
03/28/2023, 9:00 PM
include_field_distinct_count: false, but when the ingestion starts I still see "SELECT count(distinct" run on all columns.
damp-lighter-99739
03/29/2023, 9:09 AM
wide-ghost-47822
03/29/2023, 12:16 PM
source:
  type: mariadb
  config:
    # Coordinates
    host_port: <host:port>
    database: <db-name>
    include_tables: true
    profiling:
      enabled: true
      profile_table_level_only: true
    stateful_ingestion:
      enabled: true
    table_pattern:
      allow:
        - <table>
    # Credentials
    username: <user>
    password: <pass>
# sink configs
Then I executed the following command: datahub ingest -c datahub/recipes/myfile.dhub.yaml
and got these errors: Failed to connect to DataHub
and requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10c1343d0>: Failed to establish a new connection: [Errno 61] Connection refused'))
I know that datahub-gms is running on port 8080, so I checked the endpoint localhost:8080/config with curl, and it responds with HTTP status code 200.
Here it is:
❯ curl localhost:8080/config -v
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /config HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.86.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Wed, 29 Mar 2023 12:14:47 GMT
< Content-Type: application/json
< Transfer-Encoding: chunked
< Server: Jetty(9.4.46.v20220331)
<
{
  "models" : { },
  "patchCapable" : true,
  "versions" : {
    "linkedin/datahub" : {
      "version" : "v0.10.1",
      "commit" : "d1bab5616cbf19ce22223288feb2b9852ec1fa23"
    }
  },
  "managedIngestion" : {
    "defaultCliVersion" : "0.10.1",
    "enabled" : true
  },
  "statefulIngestionCapable" : true,
  "supportsImpactAnalysis" : true,
  "timeZone" : "GMT",
  "telemetry" : {
    "enabledCli" : true,
    "enabledIngestion" : false
  },
  "datasetUrnNameCasing" : false,
  "retention" : "true",
  "datahub" : {
    "serverType" : "quickstart"
  },
  "noCode" : "true"
}
* Connection #0 to host localhost left intact
I couldn't figure out the problem yet. Any comments on this?
Here is the full output of the error:
[2023-03-29 15:06:05,702] INFO {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.0.7
[2023-03-29 15:06:18,138] ERROR {datahub.entrypoints:188} - Command failed: Failed to set up framework context: Failed to connect to DataHub
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 244, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1285, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1331, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1280, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1040, in _send_output
self.send(msg)
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 980, in send
self.connect()
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x10c1343d0>: Failed to establish a new connection: [Errno 61] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 815, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 815, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 815, in urlopen
return self.urlopen(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10c1343d0>: Failed to establish a new connection: [Errno 61] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/api/common.py", line 61, in __init__
self.graph = DataHubGraph(datahub_api) if datahub_api is not None else None
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/graph/client.py", line 71, in __init__
self.test_connection()
File "/usr/local/lib/python3.9/site-packages/datahub/emitter/rest_emitter.py", line 146, in test_connection
response = self._session.get(f"{self._gms_server}/config")
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10c1343d0>: Failed to establish a new connection: [Errno 61] Connection refused'))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 115, in _add_init_error_context
yield
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 167, in __init__
self.ctx = PipelineContext(
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/api/common.py", line 63, in __init__
raise Exception("Failed to connect to DataHub") from e
Exception: Failed to connect to DataHub
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/datahub/entrypoints.py", line 175, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
raise e
File "/usr/local/lib/python3.9/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
res = func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
return func(ctx, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 187, in run
pipeline = Pipeline.create(
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 308, in create
return cls(
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 167, in __init__
self.ctx = PipelineContext(
File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 117, in _add_init_error_context
raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to set up framework context: Failed to connect to DataHub
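Per the traceback, the CLI's connection test sends GET {gms_server}/config through requests. One way to narrow down why curl succeeds while the CLI is refused is to repeat the same request from the same Python interpreter and to check what localhost resolves to there. The sketch below uses only the standard library; resolve and check_gms_config are hypothetical helper names, and urllib differs from requests in some details, so treat a mismatch as a clue rather than a diagnosis.

```python
import socket
import urllib.error
import urllib.request
from typing import List, Tuple


def resolve(host: str, port: int) -> List[str]:
    """Return the distinct addresses (IPv4 and/or IPv6) that host resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_gms_config(server: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """GET <server>/config, the endpoint the CLI's test_connection calls."""
    url = f"{server.rstrip('/')}/config"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status.
        return False, f"HTTP {e.code}"
    except (urllib.error.URLError, OSError) as e:
        # No HTTP answer at all: refused, timed out, DNS failure, etc.
        return False, str(e)


# Example: compare the name the recipe uses with the explicit IPv4 loopback.
# print(resolve("localhost", 8080))
# for server in ("http://localhost:8080", "http://127.0.0.1:8080"):
#     print(server, check_gms_config(server))
```

If the helper succeeds from this interpreter while the CLI still fails, the difference is more likely in the CLI's configuration (for example, the sink's server value or environment settings). If it fails here too, the problem lies between this Python environment and port 8080 rather than in the recipe.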
quaint-football-54639
03/29/2023, 1:31 PM
include_field_null_count as false, but I can still see the ingestion run null_count. This heavily impacts performance; is there a way to turn it off?
lively-dusk-19162
03/29/2023, 3:47 PM
high-night-94979
03/29/2023, 6:48 PM
calm-dinner-63735
03/29/2023, 8:28 PM
calm-dinner-63735
03/29/2023, 8:28 PM