better-bird-87143
12/02/2022, 10:31 AM
wooden-mouse-75975
12/02/2022, 10:59 AM
microscopic-mechanic-13766
12/02/2022, 12:40 PM
2022-12-02 12:25:59.457:INFO:oejshC.ROOT:main: 1 Spring WebApplicationInitializers detected on classpath
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2022-12-02 12:25:59.531:INFO:oejs.session:main: DefaultSessionIdManager workerName=node0
2022-12-02 12:25:59.531:INFO:oejs.session:main: No SessionScavenger set, using defaults
2022-12-02 12:25:59.533:INFO:oejs.session:main: node0 Scavenging every 600000ms
2022-12-02 12:25:59.597:INFO:oejshC.ROOT:main: Initializing Spring root WebApplicationContext
SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
SLF4J: Defaulting to no-operation MDCAdapter implementation.
SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
2022-12-02 12:26:53.116:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'apiServlet'
2022-12-02 12:26:53.639:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'authServlet'
2022-12-02 12:26:53.711:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'openapiServlet'
2022-12-02 12:26:56.239:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@6eda5c9{Open source GMS,/,[file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-11220994130116230824/webapp/, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-11220994130116230824/webapp/WEB-INF/lib/swagger-ui-4.10.3.jar!/META-INF/resources],AVAILABLE}{file:///datahub/datahub-gms/bin/war.war}
2022-12-02 12:26:56.261:INFO:oejs.AbstractConnector:main: Started ServerConnector@4387b79e{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
2022-12-02 12:26:56.263:INFO:oejs.Server:main: Started @71782ms
I know those errors don't usually mean the deployment actually failed, and DataHub's performance isn't affected at all, but since I didn't get the message INFO c.d.m.ingestion.IngestionScheduler:251 - Successfully fetched 4 ingestion sources.
I thought maybe something was wrong (although at first glance it doesn't seem like it).
Anyway, I just wanted to let you know about those two things, since they didn't happen in previous versions 🙂
straight-mouse-85445
12/02/2022, 9:57 PM
straight-mouse-85445
12/02/2022, 9:59 PM
straight-mouse-85445
12/02/2022, 10:01 PM
gifted-knife-16120
12/03/2022, 5:05 PM
acoustic-ghost-64885
12/05/2022, 10:17 AM
gentle-portugal-21014
12/05/2022, 10:29 AM
narrow-zebra-66884
12/05/2022, 11:38 AM
narrow-zebra-66884
12/05/2022, 11:41 AM
[2022-12-05 12:32:49,448] WARNING {great_expectations.dataset.sqlalchemy_dataset:1945} - No sqlalchemy dialect found; relying in top-level sqlalchemy types.
microscopic-mechanic-13766
12/05/2022, 1:24 PM
microscopic-mechanic-13766
12/05/2022, 4:27 PM
[2022-12-05 17:09:18,100] INFO {datahub_actions.cli.actions:119} - Action Pipeline with name 'datahub_teams_action' is now running
DEBUG {datahub_actions.pipeline.pipeline_manager:63} - Attempting to start pipeline with name datahub_teams_action...
Exception in thread Thread-2 (run_pipeline):
Traceback (most recent call last):
File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 42, in run_pipeline
pipeline.run()
File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline.py", line 166, in run
for enveloped_event in enveloped_events:
File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 154, in events
msg = self.consumer.poll(timeout=2.0)
File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 131, in poll
raise ConsumeError(msg.error(), kafka_message=msg)
confluent_kafka.error.ConsumeError: KafkaError{code=_TRANSPORT,val=-195,str="FindCoordinator response error: Local: Broker transport failure"}
[2022-12-05 17:09:18,174] ERROR {datahub_actions.entrypoints:122} - File "/usr/local/lib/python3.10/site-packages/datahub_actions/entrypoints.py", line 114, in main
111 def main(**kwargs):
112 # This wrapper prevents click from suppressing errors.
113 try:
--> 114 sys.exit(datahub_actions(standalone_mode=False, **kwargs))
115 except click.exceptions.Abort:
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
1128 def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
(...)
--> 1130 return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub_actions/cli/actions.py", line 118, in run
73 def run(ctx: Any, config: List[str], debug: bool) -> None:
(...)
114 logger.debug("Starting Actions Pipelines")
115
116 # Start each pipeline.
117 for p in pipelines:
--> 118 pipeline_manager.start_pipeline(p.name, p)
119             logger.info(f"Action Pipeline with name '{p.name}' is now running.")
File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 71, in start_pipeline
62 def start_pipeline(self, name: str, pipeline: Pipeline) -> None:
(...)
67 spec = PipelineSpec(name, pipeline, thread)
68 self.pipeline_registry[name] = spec
69 logger.debug(f"Started pipeline with name {name}.")
70 else:
--> 71 raise Exception(f"Pipeline with name {name} is already running.")
Exception: Pipeline with name datahub_teams_action is already running.
[2022-12-05 17:09:18,174] INFO {datahub_actions.entrypoints:131} - DataHub Actions version: 0.0.0.dev0 at /usr/local/lib/python3.10/site-packages/datahub_actions/__init__.py
[2022-12-05 17:09:18,176] INFO {datahub_actions.entrypoints:134} - Python version: 3.10.7 (main, Oct 5 2022, 14:33:54) [GCC 10.2.1 20210110] at /usr/local/bin/python on Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.3
I am also getting the initialization message in Teams, but that's it. I have added some tags to a few datasets and columns and run some ingestions, but I haven't received any messages. I should have received one message per action I performed, right?
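Given the ConsumeError at the top of that trace, a minimal connectivity check I can run from the actions container looks something like this (a sketch, assuming confluent_kafka is available there; broker:9092 is the quickstart broker address and may need adjusting):

from confluent_kafka.admin import AdminClient

# broker:9092 is an assumed address; use whatever KAFKA_BOOTSTRAP_SERVER
# the actions container is actually configured with.
admin = AdminClient({"bootstrap.servers": "broker:9092"})
try:
    metadata = admin.list_topics(timeout=10.0)
    print("Broker reachable; topics:", sorted(metadata.topics))
except Exception as e:
    print("Broker unreachable:", e)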
straight-mouse-85445
12/05/2022, 4:42 PM
straight-mouse-85445
12/05/2022, 5:13 PM
important-night-50346
12/05/2022, 8:12 PM
nutritious-bird-77396
12/05/2022, 10:03 PM
Validation error (FieldUndefined@[listRecommendations/modules/content/params/searchParams/filters/value]) : Field 'value' in type 'Filter' is undefined (code undefined)
Any idea where this could be? Do I need to run any index migrations, since I made a major version jump as well? Any info would be helpful here...
acceptable-terabyte-34789
12/06/2022, 9:11 AM
bright-motherboard-35257
12/06/2022, 6:49 PM
freezing-garage-69869
12/07/2022, 8:50 AM
--packages io.acryl:datahub-spark-lineage:0.9.3-1
--conf spark.extraListeners=datahub.spark.DatahubSparkListener
--conf spark.datahub.rest.server=http://10.5.0.37:8080
My DataHub server is running in Docker (started with datahub docker quickstart) on version 0.9.3-1.
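For reference, the full submit looks something like this (the job script name is a placeholder):

spark-submit \
  --packages io.acryl:datahub-spark-lineage:0.9.3-1 \
  --conf spark.extraListeners=datahub.spark.DatahubSparkListener \
  --conf spark.datahub.rest.server=http://10.5.0.37:8080 \
  my_job.py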
During the execution of my Spark job I get the following error from DatahubSparkListener (I removed the detailed spark plans from the log):
22/12/06 17:31:09 INFO McpEmitter: MetadataWriteResponse(success=true, responseContent={"value":"urn:li:dataFlow:(spark,ModelizeBankAccountEvaluations,yarn)"}, underlyingResponse=HTTP/1.1 200 OK [Date: Tue, 06 Dec 2022 17:31:09 GMT, Content-Type: application/json, X-RestLi-Protocol-Version: 2.0.0, Content-Length: 71, Server: Jetty(9.4.46.v20220331)] [Content-Length: 71,Chunked: false])
22/12/06 17:31:10 ERROR DatahubSparkListener: java.lang.NullPointerException
at datahub.spark.DatasetExtractor.lambda$static$6(DatasetExtractor.java:147)
at datahub.spark.DatasetExtractor.asDataset(DatasetExtractor.java:237)
at datahub.spark.DatahubSparkListener$SqlStartTask.run(DatahubSparkListener.java:114)
at datahub.spark.DatahubSparkListener.processExecution(DatahubSparkListener.java:350)
at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:262)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1447)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
22/12/06 17:31:10 INFO AsyncEventQueue: Process of event SparkListenerSQLExecutionStart(0,save at NativeMethodAccessorImpl.java:0,org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:282)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
py4j.ClientServerConnection.run(ClientServerConnection.java:106)
java.lang.Thread.run(Thread.java:750),== Parsed Logical Plan ==
== Analyzed Logical Plan ==
SaveIntoDataSourceCommand org.apache.hudi.Spark3DefaultSource@6b5aa9cf, ...
== Optimized Logical Plan ==
SaveIntoDataSourceCommand org.apache.hudi.Spark3DefaultSource@6b5aa9cf...
== Physical Plan ==
Execute SaveIntoDataSourceCommand
+- SaveIntoDataSourceCommand org.apache.hudi.Spark3DefaultSource@6b5aa9cf...
by listener DatahubSparkListener took 2.295850812s.
I think it’s worth pointing out that I use the Apache Hudi format to write the data.
Is there something I’m missing here?
Thanks for your help
ripe-eye-60209
12/07/2022, 8:56 AM
worried-branch-76677
12/07/2022, 10:43 AMsearchAcrossEntities
to check if its soft-deleted
from LineageSearchService.java
LineageSearchResult resultForBatch = buildLineageSearchResult(
    _searchService.searchAcrossEntities(entitiesToQuery, input, finalFilter, sortCriterion, queryFrom, querySize, SKIP_CACHE),
    urnToRelationship);
The top UI shows 9, but 2 of them are soft-deleted; that one uses the resolver from EntityLineageResultResolver.java.
The bottom UI shows the correct value, which is 7 (resolver from SearchAcrossLineageResolver.java).
Can you advise how to take it from here?
breezy-portugal-43538
12/07/2022, 11:30 AMMLFeatureTablePropertiesClass
and inside I am passing as a parameter a description and properties but I receive ValueError, does this proposal also need some other values? FYI the urn is already created and the data was uploaded, I only want to update something that already exists. Below is the function to do it:
def update_feature_table(self, urn, properties, urn_description=None):
    properties = {k.capitalize().replace("_", " "): v for k, v in properties.items()}
    # This below produces:
    # ValueError: aspectName MLFeatureTableProperties does not match aspect type <class 'datahub.metadata.schema_classes.MLFeatureTablePropertiesClass'> with name mlFeatureTableProperties
    # Create properties object for change proposal wrapper
    feature_table_properties = MLFeatureTablePropertiesClass(
        customProperties=properties,
        description="Description not provided." if not urn_description else urn_description,
    )
    # MCP creation
    mcp = MetadataChangeProposalWrapper(
        entityType="mlFeatureTable",
        aspectName="MLFeatureTableProperties",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=urn,
        aspect=feature_table_properties,
    )
    # Create an emitter to the GMS REST API.
    emitter = DatahubRestEmitter(self.gms_endpoint)
    # Emit metadata!
    emitter.emit_mcp(mcp)
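Going by the ValueError text itself, the expected aspect name is the lowerCamelCase mlFeatureTableProperties, not MLFeatureTableProperties. A minimal sketch of the same emit with only that change (urn and gms_endpoint stand in for the values the class above already holds; the custom property is an example):

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, MLFeatureTablePropertiesClass

feature_table_properties = MLFeatureTablePropertiesClass(
    customProperties={"Team": "ml-platform"},  # example property
    description="Updated description.",
)
mcp = MetadataChangeProposalWrapper(
    entityType="mlFeatureTable",
    aspectName="mlFeatureTableProperties",  # matches the aspect's registered name
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=urn,  # the existing feature-table urn
    aspect=feature_table_properties,
)
DatahubRestEmitter(gms_endpoint).emit_mcp(mcp)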
bumpy-pharmacist-66525
12/07/2022, 2:02 PM
datahub delete command?
gentle-tailor-78929
12/07/2022, 4:43 PM
datahub password to something more secure than just datahub. This is the approach that I’m using, but it results in several errors with the datahub-frontend deployment. What am I missing?
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: datahub-frontend
...
...
      volumes:
        - name: userprops
          secret:
            secretName: datahub-user-props-secret
      containers:
        - name: datahub-frontend
          image: .../frontend:latest
          volumeMounts:
            - name: userprops
              mountPath: /datahub-frontend/conf/
              subPath: user.props
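One likely culprit, going by standard Kubernetes subPath semantics rather than anything DataHub-specific: when mounting a single file via subPath, mountPath should be the full path of the file, not its directory, otherwise the mount shadows the entire conf/ directory. A sketch of that change, keeping the same names:

          volumeMounts:
            - name: userprops
              mountPath: /datahub-frontend/conf/user.props
              subPath: user.props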
adamant-furniture-37835
12/07/2022, 6:53 PM
damp-greece-27806
12/07/2022, 7:03 PM
NoCodeDataMigrationCleanup and RestoreIndices. The failure for the first shows this in the logs:
Starting upgrade with id NoCodeDataMigrationCleanup...
Executing Step 1/4: UpgradeQualificationStep...
Found qualified upgrade candidate. Proceeding with upgrade...
Completed Step 1/4: UpgradeQualificationStep successfully.
Executing Step 2/4: DeleteLegacyAspectRowsStep...
Completed Step 2/4: DeleteLegacyAspectRowsStep successfully.
Executing Step 3/4: DeleteLegacyGraphRelationshipStep...
Failed to delete legacy data from graph: java.lang.ClassCastException: class com.linkedin.metadata.graph.elastic.ElasticSearchGraphService cannot be cast to class com.linkedin.metadata.graph.neo4j.Neo4jGraphService (com.linkedin.metadata.graph.elastic.ElasticSearchGraphService and com.linkedin.metadata.graph.neo4j.Neo4jGraphService are in unnamed module of loader org.springframework.boot.loader.LaunchedURLClassLoader @b97c004)
Failed to delete legacy data from graph: java.lang.ClassCastException: class com.linkedin.metadata.graph.elastic.ElasticSearchGraphService cannot be cast to class com.linkedin.metadata.graph.neo4j.Neo4jGraphService (com.linkedin.metadata.graph.elastic.ElasticSearchGraphService and com.linkedin.metadata.graph.neo4j.Neo4jGraphService are in unnamed module of loader org.springframework.boot.loader.LaunchedURLClassLoader @b97c004)
Failed Step 3/4: DeleteLegacyGraphRelationshipStep. Failed after 1 retries.
Exiting upgrade NoCodeDataMigrationCleanup with failure.
Upgrade NoCodeDataMigrationCleanup completed with result FAILED. Exiting...
For the second, it seems like it runs successfully for the most part, but has trouble emitting MCL for messages larger than the value of max.request.size, so I guess we can try bumping that up, but if you have any suggestions on this, let us know.
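If bumping it is the route, the relevant knobs are standard Kafka/Spring settings rather than anything DataHub-specific; a sketch in env-var form (the names assume Spring Boot's relaxed binding on the producer side and the Confluent image's mapping on the broker side; the values are examples):

# Producer side (the component emitting MCL); maps to max.request.size:
SPRING_KAFKA_PROPERTIES_MAX_REQUEST_SIZE=5242880
# Broker side must accept at least as much; maps to message.max.bytes:
KAFKA_MESSAGE_MAX_BYTES=5242880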
clever-garden-23538
12/07/2022, 9:24 PM
miniature-ram-76637
12/07/2022, 11:50 PM
error reading /Users/*/Documents/code/datahub/metadata-auth/auth-api/build/libs/auth-api-0.9.4-SNAPSHOT.jar; zip END header not found
dazzling-shampoo-79134
12/08/2022, 4:47 AM
Failed to update Deprecation: Failed to update resource with urn urn:li:dataset (urn:li:dataPlatform:googlesheet,lorem ipsum,PROD). Entity does not exist.
while deprecating a dataset. FYI, I deprecated this dataset once the other day, but for some reason it still exists in the latest lineage, so I tried to deprecate it again, but to no avail.
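A quick way to see what GMS thinks of that urn is to fetch it from the Rest.li entities endpoint; a sketch, where the GMS address and the exact urn (including its punctuation, which may have been mangled in the error text above) are assumptions to adjust:

import urllib.parse
import requests

gms = "http://localhost:8080"  # assumed GMS address
urn = "urn:li:dataset:(urn:li:dataPlatform:googlesheet,lorem ipsum,PROD)"  # assumed exact urn
resp = requests.get(f"{gms}/entities/{urllib.parse.quote(urn, safe='')}")
print(resp.status_code)
print(resp.text[:500])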