fresh-napkin-5247
04/21/2022, 1:07 PM
UnboundLocalError: local variable 'node_urn' referenced before assignment
I am running the command datahub ingest -c glue.yml, and it runs and writes a lot of datasets to the sink file, but then this error appears and the process exits. Has anyone had a similar issue?
The recipe file is just a regular recipe file like in the demo on the website.
I also hit another error: an exception occurred because DataHub tried to read 'StorageDescriptor' from a dictionary that did not have this key (I assume it comes from the boto3 API). I worked around it by ignoring some tables, but it is odd that DataHub does not handle this exception and instead stops altogether.
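As a side note, a deny pattern in the Glue recipe is one way to skip the tables whose metadata lacks StorageDescriptor without aborting the whole run; the region, table name, and sink path below are placeholders, not taken from the original recipe:

```yaml
# Minimal glue.yml sketch; "problem_db\\.broken_table" is a placeholder.
source:
  type: glue
  config:
    aws_region: "us-east-1"
    table_pattern:
      deny:
        # regex matched against "<database>.<table>"
        - "problem_db\\.broken_table"
sink:
  type: file
  config:
    filename: "./glue_output.json"
```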
Thank you!
full-dentist-68591
04/21/2022, 2:45 PM
quick-student-61408
04/21/2022, 3:01 PM
'dropped_dns': ['cn=charlie,ou=datahubaccounts,dc=datahub,dc=com',
                'cn=anne,ou=datahubaccounts,dc=datahub,dc=com',
                'cn=antoine,ou=datahubaccounts,dc=datahub,dc=com',
                'cn=charlieC,ou=datahubaccounts,dc=datahub,dc=com',
                'cn=charlie C,ou=datahubaccounts,dc=datahub,dc=com',
                'cn=charlie charlie,ou=datahubaccounts,dc=datahub,dc=com',
                'cn=anneD,ou=datahubaccounts,dc=datahub,dc=com']}
When I set the drop_missing_first_last_name option to false, I get an error; see attached.
Can you help me? Thank you
acoustic-quill-54426
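For context, the option in question lives under the LDAP source config; a minimal sketch, with placeholder connection details:

```yaml
source:
  type: ldap
  config:
    ldap_server: "ldap://localhost"
    ldap_user: "cn=admin,dc=datahub,dc=com"
    ldap_password: "admin"
    base_dn: "dc=datahub,dc=com"
    drop_missing_first_last_name: false
sink:
  type: console
```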
04/22/2022, 8:55 AM
bigquery and bigquery-usage have been failing for us since yesterday due to 500 errors at logging.googleapis.com/v2/entries:list. Although Google claims the incident is resolved, I can still reproduce the error from the Google Cloud console 😅
square-solstice-69079
04/22/2022, 9:53 AM
better-spoon-77762
04/22/2022, 5:42 PM
square-solstice-69079
04/23/2022, 7:25 AM
many-pillow-9544
04/24/2022, 7:52 AM
modern-zoo-97059
04/25/2022, 2:39 AM
play.api.UnexpectedException: Unexpected exception[CompletionException: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:247)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:176)
at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:363)
at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:361)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:346)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:345)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:92)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:92)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:92)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:21)
at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:18)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
at scala.concurrent.Promise$class.failure(Promise.scala:104)
at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:157)
at play.libs.ws.ahc.StandaloneAhcWSClient$ResponseAsyncCompletionHandler.onThrowable(StandaloneAhcWSClient.java:227)
at play.shaded.ahc.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:278)
at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:181)
at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyChannelConnector$1.onFailure(NettyChannelConnector.java:108)
at play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:28)
at play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327)
at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343)
at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:632)
at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at play.shaded.ahc.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at play.shaded.ahc.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080
at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:179)
... 17 common frames omitted
Caused by: play.shaded.ahc.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: datahub-gms/172.18.0.5:8080
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at play.shaded.ahc.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
... 7 common frames omitted
Caused by: java.net.ConnectException: Connection refused
... 11 common frames omitted
Hi! I used the ingestion UI; it failed and threw a 500 exception. Then I refreshed the page and am now facing this problem.
full-dentist-68591
04/25/2022, 7:14 AM
DataHubGraph doesn't seem suitable because it requires entity_urn. Any recommendations here?
important-wire-73
04/25/2022, 7:22 AM
kind-psychiatrist-76973
04/25/2022, 9:04 AM
stale-jewelry-2440
04/25/2022, 1:55 PM
outlets={"datasets": [Dataset("file", "AppleSchoolManager.courses_csv")]},
clever-air-4600
04/25/2022, 6:12 PM
microscopic-mechanic-13766
04/26/2022, 8:17 AM
jolly-traffic-67085
04/26/2022, 9:40 AM
ambitious-cartoon-15344
04/27/2022, 7:08 AM
kind-psychiatrist-76973
04/27/2022, 10:29 AM
10:28:53.812 [pool-9-thread-1] INFO c.l.m.filter.RestliLoggingFilter - POST /usageStats?action=queryRange - queryRange - 200 - 356ms
10:29:17.224 [qtp544724190-11718] INFO c.l.m.r.entity.EntityResource - LIST URNS for dataHubPolicy with start 0 and count 30
10:29:27.224 [pool-17-thread-1] ERROR c.d.m.a.AuthorizationManager - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
com.linkedin.r2.RemoteInvocationException: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://localhost:8080/entities>
at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:135)
at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseImpl(ResponseFutureImpl.java:130)
at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponse(ResponseFutureImpl.java:94)
at com.linkedin.common.client.BaseClient.sendClientRequest(BaseClient.java:28)
at com.linkedin.entity.client.RestliEntityClient.listUrns(RestliEntityClient.java:390)
at com.datahub.metadata.authorization.AuthorizationManager$PolicyRefreshRunnable.run(AuthorizationManager.java:186)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://localhost:8080/entities>
at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:67)
at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
... 3 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Exceeded request timeout of 10000ms
at com.linkedin.r2.transport.http.client.TimeoutTransportCallback$1.run(TimeoutTransportCallback.java:69)
at com.linkedin.r2.util.Timeout.lambda$new$0(Timeout.java:77)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
... 3 common frames omitted
10:31:17.225 [qtp544724190-7234] INFO c.l.m.r.enti
Does this affect the UI or any other functionality of DataHub?
brainy-vegetable-68946
04/27/2022, 5:01 PM
better-orange-49102
04/28/2022, 10:30 AM
Task :datahub-graphql-core:compileJava
/datahub/datahub-graphql-core/src/mainGeneratedGraphQL/java/com/linkedin/datahub/graphql/generated/VisualConfiguration.java:7: error: cannot find symbol @javax.annotation.processing.Generated(
symbol: class Generated
location: package javax.annotation.processing
<followed by all the other files in the same folder giving the same annotation error msg>
which I think is due to the presence of JDK 11. Any suggestions on overcoming this?
The commands I used to build were:
./gradlew build -x :metadata-ingestion:build -x :metadata-ingestion:check -x docs-website:build -x datahub-web-react:yarnBuild -x datahub-frontend:unzipAssets
./gradlew build -x :metadata-ingestion:build -x :metadata-ingestion:check -x docs-website:build -x :metadata-integration:java:spark-lineage:test
breezy-portugal-43538
04/28/2022, 1:30 PM
message:"No root resource defined for path '/datasets'","status":404}
appears. Is it possible to update properties of datasets ingested from S3, and if yes, how?
my curl command:
curl --location --request POST '<http://localhost:8080/datasets?action=ingest>' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
"snapshot": {
"aspects": [
{
"com.linkedin.dataset.DatasetProperties": {
"customProperties": {
"SuperProperty": "over 9000"
}
}
}
],
"urn": "urn:li:dataset:(urn:li:dataset:(urn:li:dataPlatform:s3,origin_file_src%2Fdata%2Ftest%2Fother_timeZ%2Ftime%2other_folder%2Fsome_folder%2Fexample.csv,DEV)
}
}'
The issue might be because my urn is incorrect - I had copied it from the webpage URL. I tried to find the correct urn at http://localhost:9200/datasetindex_v2/_search?=pretty, but for some reason dataplatform:s3 is not visible there. Do you know how I can get my s3 urn name, to be sure that I set it up correctly?
Thanks in advance for the help!
*EDIT: changing the urn name to use . instead of %2F did not help
limited-agent-54038
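One thing that stands out in the curl above is the doubled "urn:li:dataset:(" prefix - the URN shape only uses that prefix once. A hedged sketch of building and percent-encoding an S3 dataset URN (the helper function and the example path are hypothetical, not taken from the thread):

```python
from urllib.parse import quote

def make_dataset_urn(platform: str, name: str, env: str = "DEV") -> str:
    # Shape: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    # -- the "urn:li:dataset:(" prefix appears exactly once.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

urn = make_dataset_urn("s3", "src/data/test/example.csv")
print(urn)
# Percent-encode only when embedding the URN in a URL path or query string:
print(quote(urn, safe=""))
```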
04/29/2022, 3:10 AM'[2022-04-29 02:44:40,288] ERROR {logger:26} - Please set env variable SPARK_VERSION\n'
I am just having trouble figuring out where this env variable is or how to change it. Thanks
square-solstice-69079
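For what it's worth, SPARK_VERSION is read from the environment of the process running the ingest, not from the recipe; it can be exported in the shell before running datahub ingest, or set from Python. The value "3.0" below is an assumption and should match the installed pyspark:

```python
import os

# Must be set before the data-lake source starts Spark; "3.0" is an
# assumed value -- use the major.minor of your installed pyspark.
os.environ["SPARK_VERSION"] = "3.0"
print(os.environ["SPARK_VERSION"])
```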
04/29/2022, 11:16 AM
kind-psychiatrist-76973
04/29/2022, 12:49 PM
We were on v0.8.17 and I updated it to v0.8.33. After the deployment the UI crashed, and this is the error I have from the logs:
! @7nf015ap6 - Internal server error, for (GET) [/callback/oidc?state=LqmnUiAvYgUGt98yM69UMRPG24DNJMAazoGGCH66Fkw&code=4/0AX4XfWg4uU9YpUKuVYjja_NgSZ0r7n4HTGM_Gpg87fxx4ODyQDVde1tIC0jPB7nEzaVjSw&scope=email%20profile%20https://www.googleapis.com/auth/userinfo.profile%20openid%20https://www.googleapis.com/auth/userinfo.email&authuser=1&hd=sennder.com&prompt=none] ->
play.api.UnexpectedException: Unexpected exception[CompletionException: org.pac4j.core.exception.TechnicalException: Bad token response, error=invalid_grant]
mammoth-fall-12031
05/02/2022, 8:10 AM
./gradlew build
* What went wrong:
Execution failed for task ':metadata-service:restli-servlet-impl:generateRestModel'.
> Process 'command '/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java'' finished with non-zero exit value 1
I have tried doing ./gradlew clean and ran
./gradlew :metadata-service:restli-servlet-impl:build -Prest.model.compatibility=ignore
but am still getting the same error.
System config: MacOS Monterey 12.1
Java version:
java version "1.8.0_331"
Java(TM) SE Runtime Environment (build 1.8.0_331-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.331-b09, mixed mode)
Any way to resolve this?
fresh-napkin-5247
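One hedged observation: Gradle here is launching Java from /Library/Internet Plug-Ins, the browser-plugin runtime, rather than a full JDK, so pointing JAVA_HOME at a real JDK install may help. A small helper to check what a java -version line actually reports (the parsing rules are assumptions about the two common output formats):

```python
import re

def java_major_version(version_line: str) -> int:
    """Extract the major Java version from a `java -version` output line.
    Handles the legacy "1.8.0_331" scheme and the modern "11.0.14" scheme."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_line)
    if not m:
        raise ValueError(f"unrecognized version line: {version_line!r}")
    major = int(m.group(1))
    # Legacy scheme: "1.x" means Java x (e.g. 1.8 -> Java 8).
    if major == 1 and m.group(2):
        return int(m.group(2))
    return major

print(java_major_version('java version "1.8.0_331"'))   # 8
print(java_major_version('openjdk version "11.0.14"'))  # 11
```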
05/02/2022, 8:57 AM
kind-psychiatrist-76973
05/03/2022, 3:57 PM
# Snowflake to Datahub recipe configuration
# To run an ingestion run: datahub ingest -c ./metadata-ingestion/recipes/snowflake_to_datahub_rest.yml
# pipeline_name: "my_snowflake_pipeline_1"
source:
  type: snowflake
  config:
    # Coordinates
    host_port: ${SNOWFLAKE_ACCOUNT}
    warehouse: 'AGGREGATION_COMPUTE'
    # Credentials
    username: ${SNOWFLAKE_USERNAME}
    password: ${SNOWFLAKE_PASSWORD}
    role: 'XADMIN'
    env: "PROD"
    profiling:
      enabled: False
    database_pattern:
      allow:
        - "DWXX"
        - "VISIBILITY"
        - "STRATEGY_AND_PLANNING"
        - "ABC_SHIPPER_STRATEGY_AND_PLANNING"
        - "XYZ"
        - "MARKETING"
        - "GLOBAL_OPERATIONS"
        - "CENTRAL_STRATEGY_AND_PLANNING"
        - "FINANCE"
      deny:
        - "DEV"
        - "ANALYST_DEV"
    table_pattern:
      ignoreCase: False
    include_tables: True
    include_views: True
    include_table_lineage: False
    stateful_ingestion:
      enabled: True
      remove_stale_metadata: True
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_GMS_HOST}:8080
I get this validation error:
1 validation error for SnowflakeConfig
stateful_ingestion
  extra fields not permitted (type=value_error.extra)
which is really vague; I have no idea what I am doing wrong.
clever-air-4600
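For what it's worth, pydantic's "extra fields not permitted" means the SnowflakeConfig of the installed CLI does not recognize stateful_ingestion where it sits - typically either the CLI version predates the feature, or the key is nested at the wrong level. A sketch of the layout stateful ingestion usually expects (assuming a recent CLI version; note it also requires pipeline_name, which is commented out in the recipe above):

```yaml
pipeline_name: "my_snowflake_pipeline_1"  # required for stateful ingestion
source:
  type: snowflake
  config:
    host_port: ${SNOWFLAKE_ACCOUNT}
    # ... credentials and patterns as before ...
    stateful_ingestion:
      enabled: True
      remove_stale_metadata: True
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_GMS_HOST}:8080
```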
05/03/2022, 7:02 PM
{
  search(
    input: {start: 0, count: 10, query: "*", type: DATASET, filters: {field: "tags", value: "urn:li:tag:Phone"}}
  ) {
    searchResults {
      entity {
        urn
        type
      }
      matchedFields {
        name
        value
      }
    }
  }
}
I'm trying something like this; it works with table tags but not with the column ones.
limited-agent-54038
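If it helps, column-level tags are indexed under a different search facet than entity-level tags. Assuming the facet is named fieldTags (an assumption worth verifying against your GraphQL schema), the filter would look like:

```graphql
{
  search(
    input: {start: 0, count: 10, query: "*", type: DATASET,
            filters: {field: "fieldTags", value: "urn:li:tag:Phone"}}
  ) {
    searchResults {
      entity { urn type }
    }
  }
}
```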
05/04/2022, 5:16 AM
source:
  type: data-lake
  config:
    env: "PROD"
    platform: "local-data-lake"
    base_path: "~/.datahub/data_test2.json"
    profiling:
      enabled: true
sink:
  type: console
and I am getting the following error:
---- (full traceback above) ----
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 82, in run
pipeline = Pipeline.create(pipeline_config, dry_run, preview)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 175, in create
return cls(config, dry_run=dry_run, preview_mode=preview_mode)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 127, in __init__
self.source: Source = source_class.create(
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 248, in create
return cls(config, ctx)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 176, in __init__
self.init_spark()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 242, in init_spark
self.spark = SparkSession.builder.config(conf=conf).getOrCreate()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/sql/session.py", line 186, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 378, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 133, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 327, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/java_gateway.py", line 105, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
[2022-05-03 22:15:55,416] INFO {datahub.entrypoints:161} - DataHub CLI version: 0.8.30.0 at /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/__init__.py
[2022-05-03 22:15:55,416] INFO {datahub.entrypoints:164} - Python version: 3.10.0 (v3.10.0:b494f5935c, Oct 4 2021, 14:59:20) [Clang 12.0.5 (clang-1205.0.22.11)] at /Library/Frameworks/Python.framework/Versions/3.10/bin/python3 on macOS-11.6.5-x86_64-i386-64bit
[2022-05-03 22:15:55,416] INFO {datahub.entrypoints:167} - GMS config {}
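"Java gateway process exited before sending its port number" almost always means pyspark could not launch a JVM at all. A hedged preflight sketch - the JAVA_HOME-then-PATH lookup order mirrors pyspark's usual behavior, but is an assumption:

```python
import os
import shutil
from typing import Optional

def find_java(java_home: Optional[str], path_java: Optional[str]) -> Optional[str]:
    """Return the java executable pyspark would likely launch, or None.
    Checks JAVA_HOME first, then whatever `java` is on PATH."""
    if java_home:
        return os.path.join(java_home, "bin", "java")
    return path_java

java = find_java(os.environ.get("JAVA_HOME"), shutil.which("java"))
print("java executable:", java)  # None means: install a JDK 8/11 and set JAVA_HOME
```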
astonishing-guitar-79208
05/04/2022, 9:10 AM
I'm setting up datahub-frontend JaaS authentication with Kerberos. I'm providing a custom jaas.conf file via a k8s ConfigMap, volume-mounted in the container at the path specified here - https://datahubproject.io/docs/how/auth/jaas#custom-jaas-configuration. But no matter what jaas.conf file I provide (even the default one with PropertyFileLoginModule), the app fails to boot with an error that doesn't help much in debugging the issue. Full error in the thread.
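For comparison, a hypothetical jaas.conf sketch for Kerberos - the WHZ-Authentication section name comes from the linked docs, while the Krb5LoginModule options are generic JAAS settings with placeholder paths, not DataHub-specific values:

```
WHZ-Authentication {
  com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/etc/security/keytabs/datahub-frontend.keytab"
    principal="datahub@EXAMPLE.COM"
    debug=true;
};
```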