refined-apple-6340
12/08/2021, 3:56 AM
quick-pizza-8906
12/08/2021, 10:58 AM
bumpy-activity-74405
12/08/2021, 12:38 PM
I'm ingesting a com.linkedin.metadata.snapshot.DataJobSnapshot using the REST API, but get an error validating the com.linkedin.datajob.DataJobInfo aspect:
[HTTP Status:400]: Parameters of method 'ingest' failed validation with error 'ERROR :: /entity/value/com.linkedin.metadata.snapshot.DataJobSnapshot/aspects/1/com.linkedin.datajob.DataJobInfo/type :: union type is not backed by a DataMap or null
I can’t exclude type, as it’s not optional. If I exclude the entire aspect altogether, ingestion works, but the task looks ugly in the UI. Running 0.8.17. What am I missing here?
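The validator is complaining because type in DataJobInfo is a Pegasus union, and unions are JSON-encoded as a single-key map of member name to value rather than a bare value. A minimal sketch of an ingest call with the union wrapped; the host, URN, and field values are illustrative assumptions:

import requests

# Sketch: DataJobSnapshot ingest where DataJobInfo's union-typed "type"
# field is encoded as {member name -> value}, which is what the Rest.li
# validator expects ("union type is not backed by a DataMap or null"
# otherwise).
snapshot = {
    "entity": {
        "value": {
            "com.linkedin.metadata.snapshot.DataJobSnapshot": {
                "urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,my_flow,PROD),my_task)",
                "aspects": [
                    {
                        "com.linkedin.datajob.DataJobInfo": {
                            "name": "my_task",
                            # fully qualified member key + value, not "type": "SQL";
                            # model versions with a string member also accept
                            # {"string": "SQL"}
                            "type": {"com.linkedin.datajob.azkaban.AzkabanJobType": "SQL"},
                        }
                    }
                ],
            }
        }
    }
}

resp = requests.post("http://localhost:8080/entities?action=ingest", json=snapshot)
resp.raise_for_status()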
refined-apple-6340
12/08/2021, 1:30 PM
bumpy-activity-74405
12/09/2021, 8:50 AM
delightful-jackal-88844
12/09/2021, 10:06 AM
Running datahub docker quickstart, elasticsearch-setup failed:
2021/12/09 10:00:33 Received 503 from http://elasticsearch:9200. Sleeping 1s
2021/12/09 10:00:34 Timeout after 2m0s waiting on dependencies to become available: [http://elasticsearch:9200]
Then elasticsearch:7.9.3 shows Up 9 minutes (unhealthy).
The frontend didn't work either, with this error:
Caused by: java.lang.RuntimeException: Failed to generate session token for user
VM: 8 CPU, 8 GB RAM, 30 GB disk, Debian 10.
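A quick way to see why the container is unhealthy is to query Elasticsearch's health API directly; a minimal sketch, assuming the quickstart's default mapping of port 9200 to localhost:

import requests

# Sketch: a 503 or a "red" status here matches the elasticsearch-setup
# timeout above; docker logs on the elasticsearch container usually show why.
print(requests.get("http://localhost:9200/_cluster/health").json())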
Help plz :)
bumpy-activity-74405
12/09/2021, 12:59 PM
When running datahub ingest rollback --run-id some-run-id, all the aspects (like editableDatasetProperties, and I assume all the other ones that were entered by users) of the urns from that run get deleted. The way I see it, those aspects were not ingested with any run, and they can also potentially be data that people don’t want to lose.
cool-painting-92220
12/09/2021, 7:17 PM
I set things up with the datahub docker quickstart command, but am now trying to get my Okta OIDC authentication for the datahub-frontend set up so that new logins can create new DataHub user accounts. I've configured everything correctly on Okta's side, but to the best of my understanding, I need to launch DataHub in a different way to get the authentication working properly. After executing the quickstart, I ran docker-compose -p datahub -f docker-compose-without-neo4j.yml -f docker-compose-without-neo4j.override.yml up datahub-frontend-react (the port for Neo4j is currently occupied on the server I'm running on), but upon trying to access DataHub in a browser, I am met with a vague error. Could someone help me out with the steps needed to have a valid auth and new user flow?
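For reference, the frontend picks up OIDC settings from environment variables; a minimal compose-override sketch, assuming Okta placeholders (the AUTH_OIDC_* names match the ones used elsewhere in this thread, everything else is illustrative):

# docker-compose.override.yml sketch; substitute your Okta org and app values
services:
  datahub-frontend-react:
    environment:
      - AUTH_OIDC_ENABLED=true
      - AUTH_OIDC_CLIENT_ID=<okta client id>
      - AUTH_OIDC_CLIENT_SECRET=<okta client secret>
      - AUTH_OIDC_DISCOVERY_URI=https://<okta-org>.okta.com/.well-known/openid-configuration
      - AUTH_OIDC_BASE_URL=http://localhost:9002

salmon-area-51650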
12/10/2021, 11:46 AM
Running datahub ingest -c ./datahub_postgres_local.yml, I get an error:
psycopg2.errors.UndefinedFunction: operator does not exist: json = unknown
LINE 68: ...count(*) AS element_count, sum(CASE WHEN (address IN (NULL) ...
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
Seems like the json type is not supported. Any idea how to skip this? Thanks a lot!
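The error comes from profiling a json column. One possible workaround, sketched under the assumption that the recipe uses the standard SQLAlchemy-source options (profiling / profile_pattern; verify the exact field names against your CLI version):

source:
  type: postgres
  config:
    # ... connection settings ...
    profiling:
      enabled: true
    # skip profiling for the tables with json columns, or disable
    # profiling entirely with enabled: false above
    profile_pattern:
      deny:
        - "myschema.table_with_json_column"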
handsome-football-66174
12/10/2021, 6:21 PM
rich-crayon-97494
12/10/2021, 8:18 PM
On the /browse/dataset endpoint, the page shows 0 entities for the staging environment, while the same endpoint for the quickstart environment lists 7 datasets.
Do you have any pointers on further debugging this issue?
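Browse results are served from the search index, so one thing worth checking is whether the staging Elasticsearch actually holds dataset documents; a sketch, assuming the v0.8.x index name datasetindex_v2 and direct access to the ES host:

import requests

# Sketch: a count of 0 here would explain an empty /browse/dataset page
# even when the metadata exists in MySQL.
print(requests.get("http://elasticsearch:9200/datasetindex_v2/_count").json())

full-area-6720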
12/11/2021, 7:12 AM
polite-flower-25924
12/11/2021, 9:14 PM
When ingesting with the s3 platform, it logs warning messages like the ones below.
..
..
WARNING: improperly formatted data platform: s3
WARNING: improperly formatted data platform: s3
WARNING: improperly formatted data platform: s3
..
..
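The warning suggests a platform value that isn't a full URN (urn:li:dataPlatform:s3). If you are constructing the MCEs yourself, the helpers in datahub.emitter.mce_builder produce correctly prefixed URNs; a minimal sketch (name and env values are illustrative):

from datahub.emitter.mce_builder import make_data_platform_urn, make_dataset_urn

# "s3" -> "urn:li:dataPlatform:s3"
print(make_data_platform_urn("s3"))
# full dataset URN for a path-like name
print(make_dataset_urn("s3", "my-bucket/path/file.parquet", "PROD"))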
full-area-6720
12/13/2021, 10:18 AM
stocky-television-65849
12/13/2021, 2:00 PM
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/emitter/rest_emitter.py", line 107, in test_connection
102 def test_connection(self) -> None:
103 response = self._session.get(f"{self._gms_server}/config")
104 response.raise_for_status()
105 config: dict = response.json()
106 if config.get("noCode") != "true":
--> 107 raise ValueError(
108 f"This version of {__package_name__} requires GMS v0.8.0 or higher"
..................................................
self = <datahub.ingestion.graph.client.DataHubGraph object at 0x7fc6fc416280>
response = <Response [200]>
self._session.get = <method 'Session.get' of <requests.sessions.Session object at 0x7fc6fc4163a0> sessions.py:534>
response.raise_for_status = <method 'Response.raise_for_status' of <Response [200]> models.py:918>
config = {'compatibilityLevel': 'BACKWARD'}
response.json = <method 'Response.json' of <Response [200]> models.py:874>
..................................................
---- (full traceback above) ----
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/entrypoints.py", line 102, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/miniconda3/lib/python3.8/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 141, in wrapper
res = func(*args, **kwargs)
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 76, in run
pipeline = Pipeline.create(pipeline_config, dry_run, preview)
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 143, in create
return cls(config, dry_run=dry_run, preview_mode=preview_mode)
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 103, in __init__
self.ctx = PipelineContext(
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/ingestion/api/common.py", line 38, in __init__
self.graph = DataHubGraph(datahub_api) if datahub_api is not None else None
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 39, in __init__
self.test_connection()
File "/opt/miniconda3/lib/python3.8/site-packages/datahub/emitter/rest_emitter.py", line 107, in test_connection
raise ValueError(
ValueError: This version of acryl-datahub requires GMS v0.8.0 or higher
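The captured locals above point at the cause: test_connection expects GET <gms>/config to return a body containing "noCode": "true", but the response was {'compatibilityLevel': 'BACKWARD'}, which looks like a schema-registry /config answer rather than GMS. A quick sanity check, assuming the default GMS host and port:

import requests

# Sketch: a healthy v0.8+ GMS /config response includes "noCode": "true".
# Anything else suggests the recipe's server URL points at the wrong
# service or port.
print(requests.get("http://localhost:8080/config").json())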
rapid-sundown-8805
12/13/2021, 2:01 PM
lemon-receptionist-90470
12/13/2021, 3:43 PM
extraEnvs:
  - name: AUTH_JAAS_ENABLED
    value: "false"
  - name: AUTH_OIDC_ENABLED
    value: "true"
  - name: AUTH_OIDC_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: datahub
        key: OIDC_CLIENT_ID
  - name: AUTH_OIDC_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: datahub
        key: OIDC_CLIENT_SECRET
  - name: AUTH_OIDC_DISCOVERY_URI
    value: "https://accounts.google.com/.well-known/openid-configuration"
  - name: AUTH_OIDC_BASE_URL
    value: "https://XXXXXXX"
  - name: AUTH_OIDC_SCOPE
    value: "openid email profile"
  - name: AUTH_OIDC_USER_NAME_CLAIM
    value: "email"
  - name: AUTH_OIDC_USER_NAME_CLAIM_REGEX
    value: "([^@]+)"
  - name: AUTH_OIDC_JIT_PROVISIONING_ENABLED
    value: "true"
  - name: AUTH_OIDC_PRE_PROVISIONING_REQUIRED
    value: "false"
  - name: AUTH_OIDC_EXTRACT_GROUPS_ENABLED
    value: "true"
  - name: AUTH_OIDC_GROUPS_CLAIM
    value: "groups"
Note: login using OIDC itself is working as expected.
Any help here? Thanks! 🤶
some-crayon-90964
12/13/2021, 9:24 PM
Running ./gradlew build fails; I have checked that my Python version is 3.8. Any ideas how to fix this? Thanks.
handsome-football-66174
12/13/2021, 9:47 PM
mutation createPolicy {
  createPolicy(
    input: {
      type: PLATFORM
      name: "TestPolicy"
      state: ACTIVE
      description: "Testing Policy via Graphiql"
      # resources: ResourceTypeFilterInput(resources: allResources),
      privileges: [
        "MANAGE_POLICIES"
        "MANAGE_USERS_AND_GROUPS"
        "VIEW_ANALYTICS"
      ]
    }
  )
}
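If the commented-out resources line was an attempt to scope the policy, the input takes a nested object rather than a constructor-style call, and resource filters belong to METADATA policies rather than PLATFORM ones. A hedged sketch; the ResourceFilterInput field names and the privilege value below are assumptions worth verifying against the schema in GraphiQL's docs tab:

mutation createMetadataPolicy {
  createPolicy(
    input: {
      type: METADATA
      name: "TestMetadataPolicy"
      state: ACTIVE
      description: "Scoped policy sketch"
      resources: { type: "dataset", allResources: true }
      privileges: ["EDIT_ENTITY_TAGS"]
    }
  )
}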
cool-painting-92220
12/13/2021, 9:53 PM
Using mysql --host=127.0.0.1 --port=3306 -u datahub -p datahub, I was able to enter the MySQL container for DataHub. It contained two tables, metadata_aspect_v2 and metadata_index. Would the following steps be all that I need in order to back up and restore DataHub completely, in case of dire circumstances where the current volumes are removed or corrupted?
Backup: mysqldump --host=127.0.0.1 --port=3306 -u datahub -p --all-databases --no-tablespaces > metadata.sql
Restore: mysql --host=127.0.0.1 --port=3306 -u datahub -p < metadata.sql
stocky-television-65849
12/14/2021, 12:10 AM
Running datahub docker quickstart, both my Mac and my Linux box return this error:
CalledProcessError: Command '['docker-compose', '-f', '/var/folders/_6/ql7t0n_j2zxd7r_wbrwsgptc0000gq/T/tmphpw6pyai.yml', '-p', 'datahub', 'pull']' returned non-zero exit status 1.
bumpy-activity-74405
12/14/2021, 12:40 PM
12:38:44.067 [qtp1504109395-479] INFO c.l.m.filter.RestliLoggingFilter - GET /aspects/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2Csome_db.some_table%2CPROD%29?aspect=subTypes&version=0 - get - 404 - 1ms
12:38:44.067 [qtp1504109395-479] ERROR c.l.m.filter.RestliLoggingFilter - null
It does not seem to affect anything; everything works fine as far as I can tell. Running 0.8.17. Is this a known issue, or am I doing something wrong?
ambitious-cartoon-15344
12/15/2021, 7:41 AM
calm-sunset-28996
12/15/2021, 8:45 AM
This query:
query getDataset {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:redshift,mydataset,PROD)") {
    downstreamLineage {
      entities {
        entity {
          ... on Dataset {
            downstreamLineage {
              entities {
                entity {
                  urn
                }
              }
            }
            upstreamLineage {
              entities {
                entity {
                  urn
                }
              }
            }
          }
        }
      }
    }
  }
}
is really slow (10+ seconds). Is there something that might improve this? We use Neo4j on the backend, so we were wondering whether switching to the ES backend would be faster. Or do you have any tips for debugging this? (Neo4j is not hitting capacity limits, GMS only very rarely.)
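One way to narrow down where the time goes is to time the raw GraphQL round-trip against GMS and compare it with what the frontend sees; a sketch, assuming the default GMS port and the /api/graphql path:

import time, requests

query = "..."  # the getDataset query above

start = time.time()
resp = requests.post("http://localhost:8080/api/graphql", json={"query": query})
# Response time straight from GMS, without frontend overhead; if this is
# already 10+ seconds, the cost is in the graph/lineage resolvers.
print(f"{time.time() - start:.2f}s, {len(resp.content)} bytes")

bumpy-activity-74405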
12/15/2021, 11:17 AM
I have a com.linkedin.metadata.snapshot.DataJobSnapshot that ties together a bunch of Hive tables. Some of the data jobs have a lot of inputDatasets (> 100). Yet somehow the UI shows only 99, although I can see through dev tools that the GraphQL query returns two arrays.
kind-engineer-69109
12/15/2021, 11:53 AM
WARNING: dev.pg_catalog.stv_wlm_query_state missing table
WARNING: dev.pg_catalog.stv_wlm_classification_config missing table
WARNING: dev.pg_catalog.stv_wlm_service_class_config missing table
WARNING: dev.pg_catalog.stv_wlm_service_class_state missing table
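If the goal is simply to keep these Redshift system tables out of the run, a deny pattern on the schema may help; a sketch, assuming the standard SQLAlchemy-source options (verify the field names for your CLI version):

source:
  type: redshift
  config:
    # ... connection settings ...
    schema_pattern:
      deny:
        - "pg_catalog"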
Thanks in advance!
red-window-75368
12/15/2021, 2:40 PM
stocky-television-65849
12/15/2021, 3:01 PM
modern-monitor-81461
12/15/2021, 4:06 PM
15:57:20 [main] INFO play.core.server.AkkaHttpServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9002
16:00:50 [application-akka.actor.default-dispatcher-16] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
16:00:50 [application-akka.actor.default-dispatcher-9] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
16:00:50 [application-akka.actor.default-dispatcher-30] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
16:00:51 [application-akka.actor.default-dispatcher-20] WARN p.api.mvc.LegacySessionCookieBaker - Cookie failed message authentication check
16:00:54 [application-akka.actor.default-dispatcher-9] WARN o.p.o.profile.creator.TokenValidator - Preferred JWS algorithm: null not available. Using all metadata algorithms: [RS256]
16:00:54 [application-akka.actor.default-dispatcher-9] ERROR auth.sso.oidc.OidcCallbackLogic - Unable to renew the session. The session store may not support this feature
16:00:57 [application-akka.actor.default-dispatcher-22] ERROR auth.sso.oidc.OidcCallbackLogic - Unable to renew the session. The session store may not support this feature
If I start from an incognito window (with no cookies), I don't get the `Cookie failed message authentication check` errors.
I am using the latest helm chart and deploying on Azure AKS. Here are my extraEnvs values:
datahub-frontend:
  ingress:
    enabled: true
    hosts:
      - host: datahub.mydomain.com
        paths:
          - "/"
    tls:
      - secretName: mydomain-tls
        hosts:
          - datahub.mydomain.com
  extraEnvs:
    # Required Configuration Values for OIDC:
    - name: AUTH_OIDC_ENABLED
      value: "true"
    - name: AUTH_OIDC_CLIENT_ID
      value: "..."
    - name: AUTH_OIDC_CLIENT_SECRET
      value: "..."
    - name: AUTH_OIDC_DISCOVERY_URI
      value: "https://login.microsoftonline.com/<tenantID>/v2.0/.well-known/openid-configuration"
    - name: AUTH_OIDC_BASE_URL
      value: "https://datahub.mydomain.com"
and the .well-known/openid-configuration from Azure:
{
  "token_endpoint": "https://login.microsoftonline.com/<tenantID>/oauth2/v2.0/token",
  "token_endpoint_auth_methods_supported": [
    "client_secret_post",
    "private_key_jwt",
    "client_secret_basic"
  ],
  "jwks_uri": "https://login.microsoftonline.com/<tenantID>/discovery/v2.0/keys",
  "response_modes_supported": ["query", "fragment", "form_post"],
  "subject_types_supported": ["pairwise"],
  "id_token_signing_alg_values_supported": ["RS256"],
  "response_types_supported": ["code", "id_token", "code id_token", "id_token token"],
  "scopes_supported": ["openid", "profile", "email", "offline_access"],
  "issuer": "https://login.microsoftonline.com/<tenantID>/v2.0",
  "request_uri_parameter_supported": false,
  "userinfo_endpoint": "https://graph.microsoft.com/oidc/userinfo",
  "authorization_endpoint": "https://login.microsoftonline.com/<tenantID>/oauth2/v2.0/authorize",
  "device_authorization_endpoint": "https://login.microsoftonline.com/<tenantID>/oauth2/v2.0/devicecode",
  "http_logout_supported": true,
  "frontchannel_logout_supported": true,
  "end_session_endpoint": "https://login.microsoftonline.com/<tenantID>/oauth2/v2.0/logout",
  "claims_supported": [
    "sub", "iss", "cloud_instance_name", "cloud_instance_host_name",
    "cloud_graph_host_name", "msgraph_host", "aud", "exp", "iat",
    "auth_time", "acr", "nonce", "preferred_username", "name", "tid",
    "ver", "at_hash", "c_hash", "email"
  ],
  "kerberos_endpoint": "https://login.microsoftonline.com/<tenantID>/kerberos",
  "tenant_region_scope": "NA",
  "cloud_instance_name": "microsoftonline.com",
  "cloud_graph_host_name": "graph.windows.net",
  "msgraph_host": "graph.microsoft.com",
  "rbac_url": "https://pas.windows.net"
}
The error message "Unable to renew the session. The session store may not support this feature" is not helping me in this case... Any idea what is wrong with my setup?
stocky-television-65849
12/15/2021, 8:13 PM