rich-restaurant-61261
06/28/2023, 8:07 AM
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
from datahub.metadata.schema_classes import DatasetPropertiesClass

dataset_urn = make_dataset_urn(
    platform="trino", name="cass.data_insights.activity", env="PROD"
)

gms_endpoint = "http://localhost:8080"
graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))

# Query multiple aspects from entity
result = graph.get_aspects_for_entity(
    entity_urn=dataset_urn,
    aspects=["datasetProperties"],
    aspect_types=[DatasetPropertiesClass],
)["datasetProperties"]

if result:
    print(result.description)
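If the goal is to write an updated description back rather than just read it, a minimal follow-up sketch (assuming the SDK's MetadataChangeProposalWrapper emit path; not verified against this exact version) could look like:

from datahub.emitter.mcp import MetadataChangeProposalWrapper

if result:
    # modify the fetched aspect and send it back through the same graph client
    result.description = "Updated description"
    graph.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=result))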
wide-energy-52884
06/28/2023, 9:24 AM
Building wheel for backports.zoneinfo (pyproject.toml): finished with status 'error'
airflow-airflow-webserver-1 | error: subprocess-exited-with-error
airflow-airflow-webserver-1 |
airflow-airflow-webserver-1 | × Building wheel for backports.zoneinfo (pyproject.toml) did not run successfully.
airflow-airflow-webserver-1 | │ exit code: 1
airflow-airflow-webserver-1 | ╰─> [49 lines of output]
airflow-airflow-webserver-1 | running bdist_wheel
airflow-airflow-webserver-1 | running build
airflow-airflow-webserver-1 | running build_py
airflow-airflow-webserver-1 | creating build
airflow-airflow-webserver-1 | creating build/lib.linux-aarch64-cpython-37
airflow-airflow-webserver-1 | creating build/lib.linux-aarch64-cpython-37/backports
airflow-airflow-webserver-1 | copying src/backports/__init__.py -> build/lib.linux-aarch64-cpython-37/backports
airflow-airflow-webserver-1 | creating build/lib.linux-aarch64-cpython-37/backports/zoneinfo
airflow-airflow-webserver-1 | copying src/backports/zoneinfo/_zoneinfo.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
airflow-airflow-webserver-1 | copying src/backports/zoneinfo/__init__.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
airflow-airflow-webserver-1 | copying src/backports/zoneinfo/_common.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
airflow-airflow-webserver-1 | copying src/backports/zoneinfo/_tzpath.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
airflow-airflow-webserver-1 | copying src/backports/zoneinfo/_version.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
airflow-airflow-webserver-1 | running egg_info
airflow-airflow-webserver-1 | writing src/backports.zoneinfo.egg-info/PKG-INFO
airflow-airflow-webserver-1 | writing dependency_links to src/backports.zoneinfo.egg-info/dependency_links.txt
airflow-airflow-webserver-1 | writing requirements to src/backports.zoneinfo.egg-info/requires.txt
airflow-airflow-webserver-1 | writing top-level names to src/backports.zoneinfo.egg-info/top_level.txt
airflow-airflow-webserver-1 | reading manifest file 'src/backports.zoneinfo.egg-info/SOURCES.txt'
airflow-airflow-webserver-1 | reading manifest template 'MANIFEST.in'
airflow-airflow-webserver-1 | /tmp/pip-build-env-c1pa5htp/overlay/lib/python3.7/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
airflow-airflow-webserver-1 | !!
airflow-airflow-webserver-1 |
airflow-airflow-webserver-1 | ********************************************************************************
airflow-airflow-webserver-1 | The license_file parameter is deprecated, use license_files instead.
airflow-airflow-webserver-1 |
airflow-airflow-webserver-1 | By 2023-Oct-30, you need to update your project and remove deprecated calls
airflow-airflow-webserver-1 | or your builds will no longer be supported.
airflow-airflow-webserver-1 |
airflow-airflow-webserver-1 | See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
airflow-airflow-webserver-1 | ********************************************************************************
airflow-airflow-webserver-1 |
airflow-airflow-webserver-1 | !!
airflow-airflow-webserver-1 | parsed = self.parsers.get(option_name, lambda x: x)(value)
airflow-airflow-webserver-1 | warning: no files found matching '*.png' under directory 'docs'
airflow-airflow-webserver-1 | warning: no files found matching '*.svg' under directory 'docs'
airflow-airflow-webserver-1 | no previously-included directories found matching 'docs/_build'
airflow-airflow-webserver-1 | no previously-included directories found matching 'docs/_output'
airflow-airflow-webserver-1 | adding license file 'LICENSE'
airflow-airflow-webserver-1 | adding license file 'licenses/LICENSE_APACHE'
airflow-airflow-webserver-1 | writing manifest file 'src/backports.zoneinfo.egg-info/SOURCES.txt'
airflow-airflow-webserver-1 | copying src/backports/zoneinfo/__init__.pyi -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
airflow-airflow-webserver-1 | copying src/backports/zoneinfo/py.typed -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
airflow-airflow-webserver-1 | running build_ext
airflow-airflow-webserver-1 | building 'backports.zoneinfo._czoneinfo' extension
airflow-airflow-webserver-1 | creating build/temp.linux-aarch64-cpython-37
airflow-airflow-webserver-1 | creating build/temp.linux-aarch64-cpython-37/lib
airflow-airflow-webserver-1 | gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.7m -c lib/zoneinfo_module.c -o build/temp.linux-aarch64-cpython-37/lib/zoneinfo_module.o -std=c99
airflow-airflow-webserver-1 | error: command 'gcc' failed: Permission denied
airflow-airflow-webserver-1 | [end of output]
airflow-airflow-webserver-1 |
airflow-airflow-webserver-1 | note: This error originates from a subprocess, and is likely not a problem with pip.
airflow-airflow-webserver-1 | ERROR: Failed building wheel for backports.zoneinfo
airflow-airflow-webserver-1 | Successfully built avro click-default-group
airflow-airflow-webserver-1 | Failed to build backports.zoneinfo
airflow-airflow-webserver-1 | ERROR: Could not build wheels for backports.zoneinfo, which is required to install pyproject.toml-based projects
airflow-airflow-scheduler-1 |
airflow-airflow-scheduler-1 | [notice] A new release of pip available: 22.2.2 -> 23.1.2
airflow-airflow-scheduler-1 | [notice] To update, run: python -m pip install --upgrade pip
airflow-airflow-webserver-1 |
airflow-airflow-webserver-1 | [notice] A new release of pip available: 22.2.2 -> 23.1.2
airflow-airflow-webserver-1 | [notice] To update, run: python -m pip install --upgrade pip
Appreciate your help!
proud-dusk-671
06/28/2023, 9:31 AM
full-shoe-73099
06/28/2023, 11:45 AM
adamant-furniture-37835
06/28/2023, 2:16 PM
Error was An Identifier is expected, got Token[value: (] instead..
Could it be an issue related to dependencies on external tables?
These problematic tables have a dependency on tables from the "stage_external" schema, e.g.
Error parsing query create table "DATABASE"."SCHEMA"."TABLE" as ( with /* Starting points is .... from "DATABASE"."stage_external"."TABLE_X" ....
Error was An Identifier is expected, got Token[value: (] instead..
According to the Redshift admin, the user we were given has access to the stage_external schema, but no datasets are ingested from that schema. When I search for "stage_external" in global search, I can see a couple of containers, but these containers are empty.
Is it that tables in stage_external are external tables, so they aren't ingested into DataHub?
The DataHub user is accessing SVV_TABLE_INFO, and external tables aren't available in this view; instead there is another view containing info about external tables, i.e. SVV_EXTERNAL_TABLES.
Please advise how to fix this issue, or whether I should create a bug report.
Thanks
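As an illustration of the SVV_TABLE_INFO vs SVV_EXTERNAL_TABLES point above, a quick check sketch (hypothetical host and credentials, using psycopg2) would be:

import psycopg2

# hypothetical connection details; reuse whatever the Redshift ingestion user connects with
conn = psycopg2.connect(
    host="redshift-host", port=5439, dbname="DATABASE", user="datahub_user", password="..."
)
with conn.cursor() as cur:
    # external tables are listed here rather than in SVV_TABLE_INFO
    cur.execute(
        "SELECT schemaname, tablename FROM svv_external_tables WHERE schemaname = 'stage_external'"
    )
    print("svv_external_tables:", cur.fetchall())
    # regular (internal) tables visible to the ingestion user
    cur.execute(
        "SELECT \"schema\", \"table\" FROM svv_table_info WHERE \"schema\" = 'stage_external'"
    )
    print("svv_table_info:", cur.fetchall())
conn.close()

If the tables only show up in the first query, that would support the theory that the Redshift source simply never sees them via SVV_TABLE_INFO.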
wide-florist-83539
06/28/2023, 5:02 PM
gray-ocean-32209
06/29/2023, 7:52 AM
many-manchester-24732
06/29/2023, 8:07 AM
I have set createCert: true in the Elasticsearch helm chart, and the Elasticsearch instance is now up with SSL enabled. The problem comes while trying to connect to Elasticsearch from DataHub: the Elasticsearch setup job itself fails. The hostname and port have been specified in the DataHub helm chart, but if I try deploying I get the following error:
Problem with request: Get "https://elasticsearch-master.datahub.svc.cluster.local:9200": tls: failed to verify certificate: x509: certificate is valid for elasticsearch-master, elasticsearch-master.datahub, elasticsearch-master.datahub.svc
If I try to deploy specifying the host as elasticsearch-master, then I get the following error:
Problem with request: Get "https://elasticsearch-master:9200": tls: failed to verify certificate: x509: certificate signed by unknown authority. Sleeping 1s
The TLS material for Elasticsearch seems to be in the secret elasticsearch-master-certs.
Both DataHub and Elasticsearch are in the same namespace.
Is there any configuration to be changed in the DataHub helm chart so that SSL works between DataHub and Elasticsearch?
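For reference, a minimal connectivity sketch from inside the cluster (hypothetical mount path and credentials) that avoids both errors is to trust the CA from the elasticsearch-master-certs secret and use a hostname that is actually in the certificate's SAN list:

import requests

# assumptions: the CA from the elasticsearch-master-certs secret is mounted at
# /certs/ca.crt, and the short service name elasticsearch-master is used because
# it is one of the names the certificate is valid for
resp = requests.get(
    "https://elasticsearch-master:9200",
    verify="/certs/ca.crt",          # trust the self-signed CA instead of the system store
    auth=("elastic", "<password>"),  # placeholder credentials
)
print(resp.status_code, resp.json())

The equivalent in the helm values is pointing DataHub at a host name the certificate covers and giving it that CA to trust, rather than relying on the default system trust store.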
alert-state-4917
06/29/2023, 2:28 PM
most-nightfall-36645
06/29/2023, 2:33 PM
We are upgrading to 0.10.x from 0.9.6.1. We are using elasticsearch v7.10 as our indexing/graph server.
The datahub-system-update container is failing with:
Suppressed: org.elasticsearch.client.ResponseException: method [PUT], host [<elasticsearch-host>], URI [/containerindex_v2_1687869120968/_clone/containerindex_v2_clone_1688044528961?master_timeout=30s&timeout=30s], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"validation_exception","reason":"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [999]/[1000] maximum shards open;"}],"type":"validation_exception","reason":"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [999]/[1000] maximum shards open;"},"status":400}
Whilst the elasticsearch-setup-job
container completes without error but contains the following error message in its logs:
>>> deleting invalid datahub_usage_event ...
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [datahub_usage_event] matches an alias, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [datahub_usage_event] matches an alias, specify the corresponding concrete indices instead."},"status":400}
>>> GET datahub_usage_event-000001 response code is 404
>>> creating datahub_usage_event-000001 because it doesn't exist ...
{
"aliases": {
"datahub_usage_event": {
"is_write_index": true
}
}
}
2023/06/29 14:25:05 Command finished successfully.
{"error":{"root_cause":[{"type":"validation_exception","reason":"Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [999]/[1000] maximum shards open;"}],"type":"validation_exception","reason":"Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [999]/[1000] maximum shards open;"},"status":400}
We deployed datahub via helm. I noticed the helm chart includes the following:
## The following section controls when and how reindexing of elasticsearch indices are performed
index:
  ## Enable reindexing when mappings change based on the data model annotations
  enableMappingsReindex: true
  ## Enable reindexing when static index settings change.
  ## Dynamic settings which do not require reindexing are not affected
  ## Primarily this should be enabled when re-sharding is necessary for scaling/performance.
  enableSettingsReindex: true
My deployed helm values include:
elasticsearch:
  host: <elasticsearch-host>
  index:
    enableMappingsReindex: true
    enableSettingsReindex: true
I tried setting the number of shards for the index to 2000 with
name = "global.elasticsearch.index.entitySettingsOverrides"
value = <<EOF
{"/containerindex_v2_1687869120968/_clone/containerindex_v2_clone_1688044528961": {"number_of_shards": "2000"}}
EOF
However, I get the same errors.
Has anyone experienced this issue?
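The underlying complaint is Elasticsearch's cluster-wide shard limit (999 of 1000 shards already open), so the clone/reindex step cannot allocate new shards. One possible mitigation sketch, independent of DataHub (placeholder host and credentials), is to raise the per-node shard budget, or to clean up unused indices first:

import requests

# temporarily raise the cluster-wide shard budget so the system-update job's
# clone/reindex steps can allocate their shards; <elasticsearch-host> is the
# same placeholder used in the logs above
resp = requests.put(
    "https://<elasticsearch-host>:9200/_cluster/settings",
    json={"persistent": {"cluster.max_shards_per_node": 2000}},
    auth=("elastic", "<password>"),  # placeholder credentials
)
print(resp.status_code, resp.text)

Note this is a different lever from the entitySettingsOverrides attempt above: number_of_shards changes how many shards a new index gets, while cluster.max_shards_per_node changes how many the cluster will allow to be open at all.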
powerful-planet-87080
06/29/2023, 4:41 PM
2023-06-28 11:31:37,071 [pool-11-thread-5] INFO c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 4ms
2023-06-28 11:31:37,319 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:56 - Error feeding bulk request. No retries left. Request: Failed to perform bulk request: index [datahub_usage_event], optype: [CREATE], type [_doc], id [PageViewEvent_urn%3Ali%3Acorpuser%3Adatahub_1687951896555]
java.io.IOException: Unable to parse response body for Response{requestLine=POST /_bulk?timeout=1m HTTP/1.1, host=https://vpc-datahubpoc-2o3qotwticclmfbedwhznvu4om.us-east-1.es.amazonaws.com:443, response=HTTP/1.1 200 OK}
at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1783)
at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:636)
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:376)
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:370)
at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448)
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338)
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:121)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException: null
at java.base/java.util.Objects.requireNonNull(Objects.java:221)
at org.elasticsearch.action.DocWriteResponse.<init>(DocWriteResponse.java:127)
at org.elasticsearch.action.index.IndexResponse.<init>(IndexResponse.java:54)
at org.elasticsearch.action.index.IndexResponse.<init>(IndexResponse.java:39)
at org.elasticsearch.action.index.IndexResponse$Builder.build(IndexResponse.java:107)
at org.elasticsearch.action.index.IndexResponse$Builder.build(IndexResponse.java:104)
at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:159)
at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:188)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1699)
at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1781)
... 18 common frames omitted
broad-parrot-31743
06/30/2023, 3:15 AM
best-wire-59738
06/30/2023, 10:31 AM
ancient-yacht-36269
06/30/2023, 11:12 AM
brief-nail-41206
06/30/2023, 12:05 PM
blue-rainbow-97669
06/30/2023, 1:05 PM
broad-parrot-31743
07/03/2023, 2:23 AM
future-yak-13169
07/03/2023, 6:53 AM
elegant-forest-24965
07/03/2023, 11:37 AM
fierce-restaurant-41034
07/03/2023, 12:11 PM
- ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED=true
- ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE=search_config.yml
Where can I find the location of the search_config.yml in the GMS?
I didn’t find it on the server.
Do I need to create the file?
Thanks a lot
rapid-spoon-94582
07/04/2023, 2:01 AM
rapid-spoon-94582
07/04/2023, 2:06 AM
curved-judge-66735
07/04/2023, 12:23 PM
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 186, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/upgrade/upgrade.py", line 398, in async_wrapper
loop.run_until_complete(run_func_check_upgrade())
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/upgrade/upgrade.py", line 385, in run_func_check_upgrade
ret = await the_one_future
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/upgrade/upgrade.py", line 378, in run_inner_func
return await loop.run_in_executor(
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
raise e
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
res = func(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 297, in by_filter
urns = list(
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 684, in get_urns_by_filter
response = self.execute_graphql(
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 751, in execute_graphql
result = self._post_generic(url, body)
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 160, in _post_generic
return self._send_restli_request("POST", url, json=payload_dict)
File "/home/airflow/.local/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 141, in _send_restli_request
response = self._session.request(method, url, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='mercari-dh-datahub-gms.datahub.svc.cluster.local', port=8080): Max retries exceeded with url: /api/graphql (Caused by ReadTimeoutError("HTTPConnectionPool(host='mercari-dh-datahub-gms.datahub.svc.cluster.local', port=8080): Read timed out. (read timeout=30)"))
Tried with the Python function get_urns_by_filter and got the same error.
We were able to delete an even bigger number of datasets in previous DataHub versions (0.9.x, 0.10.1).
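One thing worth trying (a sketch only, assuming DatahubClientConfig still exposes timeout_sec in this release) is to raise the client-side read timeout that the traceback shows expiring at 30s:

from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

graph = DataHubGraph(
    DatahubClientConfig(
        server="http://mercari-dh-datahub-gms.datahub.svc.cluster.local:8080",
        timeout_sec=120,  # the default 30s read timeout is what the traceback hit
    )
)
# same call that timed out above, just with a more generous timeout;
# the entity_types/platform filter values here are purely illustrative
for urn in graph.get_urns_by_filter(entity_types=["dataset"], platform="snowflake"):
    print(urn)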
bland-orange-13353
07/04/2023, 12:31 PM
fierce-electrician-85924
07/04/2023, 1:01 PM
brave-tomato-16287
07/04/2023, 2:04 PM
File "/tmp/datahub/ingest/venv-redshift-0.8.45/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py", line 328, in inspect_namespace
    raise PydanticUserError(
pydantic.errors.PydanticUserError: A non-annotated attribute was detected: `set_system_metadata = True`. All model fields require a type annotation; if `set_system_metadata` is not meant to be a field, you may be able to resolve this error by annotating it as a `ClassVar` or updating `model_config['ignored_types']`.
For further information visit https://errors.pydantic.dev/2.0/u/model-field-missing-annotation
v0.8.45
CLI version 0.8.42
What is the way to fix it?
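The error URL points at pydantic 2.0, and DataHub 0.8.x predates pydantic 2, so this is most likely the ingestion venv resolving a too-new pydantic. A quick check sketch from inside that venv:

import pydantic

# anything 2.x here would explain the PydanticUserError above; the usual
# remedy is pinning pydantic<2 in that venv or moving to a newer CLI/plugin
print(pydantic.VERSION)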
better-actor-45043
07/04/2023, 7:05 PM
CentralLogoutController.java makes me think it's supposed to log me out with my OpenID provider:
setCentralLogout(true);
brainy-oxygen-20792
07/04/2023, 10:25 PM
The incorrectly grouped Snowflake table is DownstreamOf, but not SiblingOf, the dbt model, while the correctly grouped Snowflake table is both downstream and a sibling.
In both the siblinged and not-siblinged cases the dbt urn has the database capitalised and the Snowflake urn is entirely lower case.
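As an illustration of why the casing matters (hypothetical table names): urns are compared as exact strings, so the snowflake urn the dbt side references never matches the one the snowflake source actually emits when the case differs.

from datahub.emitter.mce_builder import make_dataset_urn

# urn the dbt side would reference, with the database capitalised (hypothetical names)
from_dbt = make_dataset_urn("snowflake", "MYDB.myschema.my_table", "PROD")
# urn the snowflake source actually emits, entirely lower case
from_snowflake = make_dataset_urn("snowflake", "mydb.myschema.my_table", "PROD")
print(from_dbt == from_snowflake)  # False: the sibling edge points at a urn that does not exist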
stocky-morning-47491
07/05/2023, 11:02 AM
stocky-morning-47491
07/05/2023, 11:02 AM