delightful-jelly-56633
08/02/2022, 7:13 PM
datahub docker quickstart
No Datahub Neo4j volume found, starting with elasticsearch as graph service.
To use neo4j as a graph backend, run
`datahub docker quickstart --quickstart-compose-file ./docker/quickstart/docker-compose.quickstart.yml`
from the root of the datahub repo
Fetching docker-compose file https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml from GitHub
[2022-08-02 19:01:23,774] ERROR {datahub.entrypoints:188} - Command failed with [Errno 2] No such file or directory: 'docker-compose'. Run with --debug to get full trace
[2022-08-02 19:01:23,774] INFO {datahub.entrypoints:191} - DataHub CLI version: 0.8.41.2 at /home/ubuntu/.pyenv/versions/3.9.9/lib/python3.9/site-packages/datahub/__init__.py
[2022-08-02 19:01:23,812] ERROR {asyncio:1738} - Task exception was never retrieved
future: <Task finished name='Task-2' coro=<retrieve_version_stats() done, defined at /home/ubuntu/.pyenv/versions/3.9.9/lib/python3.9/site-packages/datahub/upgrade/upgrade.py:159> exception=UnboundLocalError("local variable 'current_server_release_date' referenced before assignment")>
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/3.9.9/lib/python3.9/site-packages/datahub/upgrade/upgrade.py", line 180, in retrieve_version_stats
) = server_config_future.result()
File "/home/ubuntu/.pyenv/versions/3.9.9/lib/python3.9/site-packages/datahub/upgrade/upgrade.py", line 156, in get_server_version_stats
return (server_type, server_version, current_server_release_date)
UnboundLocalError: local variable 'current_server_release_date' referenced before assignment
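The root cause above is that the CLI shells out to a docker-compose binary that is not on the PATH ([Errno 2] No such file or directory: 'docker-compose'); the UnboundLocalError afterwards is just noise from the version check. A minimal sketch of two common ways to install it; the pinned version is an assumption, adjust as needed:

# option 1: install docker-compose v1 via pip
pip install docker-compose

# option 2: fetch the standalone binary from the GitHub releases page
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose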
delightful-jelly-56633
08/02/2022, 7:13 PM
early-student-2446
08/02/2022, 10:19 PM
22:18:50.783 [gmsEbeanServiceConfig.heartBeat] ERROR i.e.datasource.pool.PooledConnection:311 - Error when fully closing connection [name[gmsEbeanServiceConfig17] slot[3] startTime[1659478591534] busySeconds[139] stackTrace[] stmt[select urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ?]]
java.sql.SQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)
at com.mysql.cj.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:1848)
at com.mysql.cj.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:1705)
at com.mysql.cj.jdbc.ConnectionImpl.close(ConnectionImpl.java:721)
at io.ebean.datasource.pool.PooledConnection.closeConnectionFully(PooledConnection.java:308)
at io.ebean.datasource.pool.FreeConnectionBuffer.trim(FreeConnectionBuffer.java:91)
at io.ebean.datasource.pool.PooledConnectionQueue.trimInactiveConnections(PooledConnectionQueue.java:442)
at io.ebean.datasource.pool.PooledConnectionQueue.trim(PooledConnectionQueue.java:422)
at io.ebean.datasource.pool.ConnectionPool.trimIdleConnections(ConnectionPool.java:441)
at io.ebean.datasource.pool.ConnectionPool.checkDataSource(ConnectionPool.java:459)
at io.ebean.datasource.pool.ConnectionPool.access$000(ConnectionPool.java:43)
at io.ebean.datasource.pool.ConnectionPool$HeartBeatRunnable.run(ConnectionPool.java:260)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
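This kind of heartbeat failure ("Communications link failure during rollback") usually means the MySQL server dropped an idle pooled connection before Ebean's pool got around to closing it. A quick diagnostic sketch, run against the MySQL instance backing GMS (a check, not a fix):

-- if wait_timeout is shorter than the pool's idle trim interval,
-- the server kills idle connections before the pool can close them
SHOW VARIABLES LIKE 'wait_timeout';
SHOW STATUS LIKE 'Aborted_clients';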
famous-florist-7218
08/03/2022, 3:31 AM
E0803 10:23:11.850679 89160 portforward.go:406] an error occurred forwarding 9002 -> 9002: error forwarding port 9002 to pod 39e9085d08f2eee680eec4bb5665835613d931532e5ca0688443fa60a75a9d7f, uid : failed to execute portforward in network namespace "/var/run/netns/cni-9494401e-109c-0e8f-c535-11ae11edfdce": read tcp4 127.0.0.1:56872->127.0.0.1:9002: read: connection reset by peer
E0803 10:23:11.852843 89160 portforward.go:234] lost connection to pod
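kubectl port-forward does not reconnect after the tunnel drops; it simply exits. The usual workaround is to restart it in a loop. A minimal sketch (the service name is an assumption; substitute your frontend service or pod):

# re-establish the tunnel whenever it drops
while true; do
  kubectl port-forward svc/datahub-datahub-frontend 9002:9002 || sleep 2
done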
cuddly-butcher-39945
08/02/2022, 3:39 PM
numerous-account-62719
08/03/2022, 6:33 AM
salmon-area-51650
08/03/2022, 7:19 AM
502 Bad Gateway when I try to authenticate with my Google account. It's happening for all users!!
Front-end log:
datahub-datahub-frontend-56784d769d-zjndd datahub-frontend 07:11:19 [application-akka.actor.default-dispatcher-59] ERROR auth.sso.oidc.OidcCallbackLogic - Unable to renew the session. The session store may not support this feature
datahub-datahub-frontend-56784d769d-zjndd datahub-frontend 07:11:31 [application-akka.actor.default-dispatcher-23] ERROR application -
datahub-datahub-frontend-56784d769d-zjndd datahub-frontend ! XXXXXX - Internal server error, for (GET) [/callback/oidc?state=7XXXXXXXXXX&code=4XXXXXXXXX&scope=email%20profile%20openid%20https://www.googleapis.com/auth/userinfo.profile%20https://www.googleapis.com/auth/userinfo.email&authuser=0&hd=company.com&prompt=none] ->
datahub-datahub-frontend-56784d769d-zjndd datahub-frontend
datahub-datahub-frontend-56784d769d-zjndd datahub-frontend play.api.UnexpectedException: Unexpected exception[CompletionException: org.pac4j.core.exception.TechnicalException: Bad token response, error=invalid_grant]
datahub-datahub-frontend-56784d769d-zjndd datahub-frontend at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:247)
GMS log:
datahub-datahub-gms-6d9db764dd-vql2q datahub-gms 07:11:43.009 [qtp544724190-10] INFO c.l.m.r.entity.EntityResource:137 - GET urn:li:corpuser:miguel.sotomayor
datahub-datahub-gms-6d9db764dd-vql2q datahub-gms 07:11:43.022 [pool-10-thread-1] INFO c.l.m.filter.RestliLoggingFilter:55 - GET /entities/urn%3Ali%3Acorpuser%3Amiguel.sotomayor - get - 200 - 13ms
datahub-datahub-gms-6d9db764dd-vql2q datahub-gms 07:11:43.036 [qtp544724190-13] INFO c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=corpUserStatus, entityUrn=urn:li:corpuser:miguel.sotomayor, entityType=corpuser, changeType=UPSERT, aspect={contentType=application/json, value=ByteString(length=100,bytes=7b227374...33337d7d)}}
datahub-datahub-gms-6d9db764dd-vql2q datahub-gms 07:11:43.085 [pool-10-thread-1] INFO c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 49ms
datahub-datahub-gms-6d9db764dd-vql2q datahub-gms 07:11:43.222 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:28 - Successfully fed bulk request. Number of events: 2 Took time ms: -1
Configuration:
extraEnvs:
  - name: AUTH_OIDC_ENABLED
    value: "true"
  - name: AUTH_OIDC_CLIENT_ID
    valueFrom:
      secretKeyRef:
        name: auth-datahub-credentials
        key: OIDC_CLIENT_ID
  - name: AUTH_OIDC_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: auth-datahub-credentials
        key: OIDC_CLIENT_SECRET
  - name: AUTH_OIDC_DISCOVERY_URI
    value: "https://accounts.google.com/.well-known/openid-configuration"
  - name: AUTH_OIDC_SCOPE
    value: "openid profile email"
  - name: AUTH_OIDC_USER_NAME_CLAIM
    value: "email"
  - name: AUTH_OIDC_USER_NAME_CLAIM_REGEX
    value: "([^@]+)"
  - name: AUTH_OIDC_BASE_URL
    value: "https://mnycompany.com"
numerous-account-62719
08/03/2022, 8:33 AM
mammoth-lawyer-49919
08/03/2022, 10:25 AM
gray-agency-10420
08/03/2022, 11:35 AM
astonishing-table-23396
08/03/2022, 2:32 PM
faint-translator-23365
08/03/2022, 4:01 PM
aloof-piano-22267
08/03/2022, 4:47 PM
clever-air-4600
08/03/2022, 5:55 PM
'warnings': {},
'failures': {},
'cli_version': '0.8.41.2',
'cli_entry_location': '***/POC_DataHub/venv/lib/python3.8/site-packages/datahub/__init__.py',
'py_version': '3.8.10 (default, Jun 22 2022, 20:18:18) \n[GCC 9.4.0]',
'py_exec_path': '***/POC_DataHub/venv/bin/python3',
'os_details': 'Linux-5.15.0-41-generic-x86_64-with-glibc2.29',
'filtered': []}
Sink (datahub-rest) report:
{'records_written': '23',
'warnings': [],
'failures': [],
'downstream_start_time': '2022-08-03 14:49:50.681803',
'downstream_end_time': '2022-08-03 14:50:20.485661',
'downstream_total_latency_in_seconds': '29.803858',
'gms_version': 'v0.8.41'}
Pipeline finished successfully producing 22 workunits
So everything seems to be working fine: I checked the database and I can see the info for the bucket, but nothing shows up in DataHub.
This is my recipe:
source:
  type: s3
  config:
    path_specs:
      - include: "s3://info_excluded/2021/11/01/*.*"
    aws_config:
      aws_access_key_id: ***
      aws_secret_access_key: ***
      aws_region: ***
    env: "PROD"
    profiling:
      enabled: True
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:probando_s3"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
Do you know how to solve this? Thanks!
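Since the sink reported records_written: 23, one way to narrow this down is to ask GMS directly whether the datasets are searchable; if the call below returns them but the UI does not, the problem is on the search-index side rather than ingestion. A sketch using DataHub's documented Rest.li search action (server and entity type taken from the recipe above):

curl -s -X POST 'http://localhost:8080/entities?action=search' \
  -H 'X-RestLi-Protocol-Version: 2.0.0' \
  -H 'Content-Type: application/json' \
  -d '{"input": "*", "entity": "dataset", "start": 0, "count": 10}'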
lemon-doctor-75480
08/03/2022, 7:51 PM
lemon-doctor-75480
08/03/2022, 7:51 PM
lemon-doctor-75480
08/03/2022, 7:52 PM
lemon-doctor-75480
08/03/2022, 7:54 PM
/usr/lib/python3/dist-packages/paramiko/transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,
[2022-08-03 15:41:16,888] DEBUG {datahub.telemetry.telemetry:202} - Sending init Telemetry
[2022-08-03 15:41:16,967] DEBUG {datahub.telemetry.telemetry:235} - Sending Telemetry
[2022-08-03 15:41:17,013] INFO {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.41.2
[2022-08-03 15:41:17,027] DEBUG {datahub.cli.ingest_cli:178} - Using config: {'source': {'type': 'delta-lake', 'config': {'base_path': 's3://dt.datalake-dev/eventsData/us-west-1/', 's3': {'aws_config': {'aws_access_key_id': 'foo', 'aws_secret_access_key': 'bar'}}}}, 'sink': {'type': 'datahub-rest', 'config': {'server': 'http://localhost:8080'}}}
[2022-08-03 15:41:17,121] DEBUG {datahub.ingestion.sink.datahub_rest:69} - Setting env variables to override config
[2022-08-03 15:41:17,121] DEBUG {datahub.ingestion.sink.datahub_rest:71} - Setting gms config
[2022-08-03 15:41:17,122] DEBUG {datahub.ingestion.run.pipeline:162} - Sink type:datahub-rest,<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'> configured
[2022-08-03 15:41:17,122] INFO {datahub.ingestion.run.pipeline:163} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://localhost:8080
[2022-08-03 15:41:17,618] ERROR {logger:39} - Deequ is still not supported in spark version: spark-3.0.2
[2022-08-03 15:41:17,618] INFO {logger:40} - Using deequ: com.amazon.deequ:deequ:1.2.2-spark-3.0
[2022-08-03 15:41:17,820] ERROR {datahub.ingestion.run.pipeline:127} - 's3'
[2022-08-03 15:41:17,821] INFO {datahub.cli.ingest_cli:119} - Starting metadata ingestion
[2022-08-03 15:41:17,821] INFO {datahub.cli.ingest_cli:137} - Finished metadata ingestion
Failed to configure source (delta-lake) due to 's3'
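The bare 's3' in the error reads like a KeyError raised while the delta-lake source parses its config, so the shape of the s3 block is the first thing to check (and whether the CLI version in use already understands that key). For reference, here is the same config from the debug line above written out as a recipe; the field names are copied verbatim from that dump, not from a schema:

source:
  type: delta-lake
  config:
    base_path: "s3://dt.datalake-dev/eventsData/us-west-1/"
    s3:
      aws_config:
        aws_access_key_id: foo
        aws_secret_access_key: bar
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"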
kind-dawn-17532
08/03/2022, 7:54 PM
lively-action-8308
08/03/2022, 9:10 PM
The `set-cookie` HTTP response header on the `callback/oidc` endpoint needs the SameSite and Secure attributes, like this:
Set-Cookie: <cookie-name>=<cookie-value>; SameSite=None; Secure
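If the frontend sits behind ingress-nginx, one way to add those attributes without touching the app is to rewrite the cookie at the proxy. A sketch using that controller's configuration-snippet annotation (the annotation and the proxy_cookie_path directive are ingress-nginx/nginx features; adapt for other proxies):

metadata:
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_cookie_path / "/; SameSite=None; Secure";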
ambitious-cartoon-15344
08/04/2022, 3:07 AM
jolly-traffic-67085
08/04/2022, 9:13 AM
query {
  glossaryNode(urn: "urn:li:glossaryNode:184d95df-a2b5-4479-a1b3-2cbd952a4325") {
    children: relationships(
      input: {
        types: ["IsPartOf"]
        direction: INCOMING
        start: 0
        count: 1000
      }
    ) {
      relationships {
        entity {
          urn
          type
          ... on GlossaryTerm {
            properties {
              name
            }
            # I need to show the related entities (datasets and fields) of each glossary term here
          }
        }
      }
    }
  }
}
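For the placeholder in the query above: entities in the GraphQL model expose the same relationships field, so the related datasets/columns can be fetched by nesting another relationships call inside the GlossaryTerm fragment. A sketch; the edge name "TermedWith" is an assumption about how term attachments are modeled, so verify it against your instance:

... on GlossaryTerm {
  properties {
    name
  }
  # entities attached with this term; "TermedWith" is assumed to be the
  # edge written by the glossaryTerms aspect - check your metadata model
  relatedEntities: relationships(
    input: { types: ["TermedWith"], direction: INCOMING, start: 0, count: 100 }
  ) {
    relationships {
      entity {
        urn
        type
      }
    }
  }
}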
early-student-2446
08/04/2022, 9:18 AM
most-nightfall-36645
08/04/2022, 9:59 AM
We are upgrading to v0.8.42 from v0.8.41.
We have deployed datahub using the k8s helm chart.
During the upgrade all jobs succeed and the front-end pod is replaced.
However, the gms pod fails to start with the following error:
ERROR: No such classes directory file:///etc/datahub/plugins/auth/resources
Usage: java [-Djetty.home=dir] -jar jetty-runner.jar [--help|--version] [ server opts] [[ context opts] context ...]
Server opts:
--version - display version and exit
--log file - request log filename (with optional 'yyyy_mm_dd' wildcard
--out file - info/warn/debug log filename (with optional 'yyyy_mm_dd' wildcard
--host name|ip - interface to listen on (default is all interfaces)
--port n - port to listen on (default 8080)
--stop-port n - port to listen for stop command (or -DSTOP.PORT=n)
--stop-key n - security string for stop command (required if --stop-port is present) (or -DSTOP.KEY=n)
[--jar file]*n - each tuple specifies an extra jar to be added to the classloader
[--lib dir]*n - each tuple specifies an extra directory of jars to be added to the classloader
[--classes dir]*n - each tuple specifies an extra directory of classes to be added to the classloader
--stats [unsecure|realm.properties] - enable stats gathering servlet context
[--config file]*n - each tuple specifies the name of a jetty xml config file to apply (in the order defined)
Context opts:
[[--path /path] context]*n - WAR file, web app dir or context xml file, optionally with a context path
2022/08/04 09:53:25 Command exited with error: exit status 1
In short, it seems like a classes directory that jetty-runner expects is missing?
I checked out the pod manifests and we are using the linkedin/datahub-gms:v0.8.42
container image.
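The failing argument is jetty-runner's --classes flag being pointed at /etc/datahub/plugins/auth/resources, which does not exist inside the image, so startup aborts before the webapp loads. Until the chart handles it, one workaround sketch is to mount an empty directory at that path; the extraVolumes/extraVolumeMounts keys are assumed to be supported by your datahub-gms chart values (the same values file already uses extraEnvs):

datahub-gms:
  extraVolumes:
    - name: auth-plugin-resources
      emptyDir: {}
  extraVolumeMounts:
    - name: auth-plugin-resources
      mountPath: /etc/datahub/plugins/auth/resources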
rich-painting-26110
08/04/2022, 10:24 AM
wooden-pencil-40912
08/04/2022, 11:06 AM
wooden-pencil-40912
08/04/2022, 2:33 PM
0.8.40 and 0.8.41, is this a known issue with these versions? Local docker testing works alright for 0.8.42, but since the helm chart is not yet available, I cannot use it.
lemon-doctor-75480
08/04/2022, 4:06 PM
lemon-doctor-75480
08/04/2022, 4:09 PM
brave-tomato-16287
08/05/2022, 7:39 AM
2022-08-05T07:28:52.437028187Z ERROR: No such classes directory file:///etc/datahub/plugins/auth/resources
datahub-gms
Could you help us to resolve this error?