# troubleshoot
  • b

    better-bird-87143

    12/02/2022, 10:31 AM
    Hi, we have an issue: when we remove a task from our Airflow DAGs, the task still shows up in DataHub. How can we handle this case? (A possible cleanup sketch follows below.)
    d
    g
    • 3
    • 2
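    One possible cleanup approach (not confirmed in this thread): if the removed task was ingested as a dataJob entity, the stale entity can be soft-deleted so it no longer shows in the UI, either with the datahub delete CLI or by emitting a Status aspect from Python. A minimal sketch, with a hypothetical URN and a local GMS assumed:
    Copy code
    # Soft-delete a stale Airflow task (dataJob) by marking it as removed.
    # The URN below is a made-up example; use the URN shown on the stale task's page.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, StatusClass

    stale_task_urn = "urn:li:dataJob:(urn:li:dataFlow:(airflow,my_dag,prod),my_removed_task)"

    mcp = MetadataChangeProposalWrapper(
        entityType="dataJob",
        aspectName="status",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=stale_task_urn,
        aspect=StatusClass(removed=True),  # hides the entity in the UI (soft delete)
    )

    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)  # assumes local GMS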
  • w

    wooden-mouse-75975

    12/02/2022, 10:59 AM
    Hi everyone, I'm installing the acryl-datahub[hive] plugin in my test environment and got this error while installing it:
    Copy code
    gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Isasl -I/usr/local/include/python3.11 -c sasl/saslwrapper.cpp -o build/temp.linux-x86_64-cpython-311/sasl/saslwrapper.o
    sasl/saslwrapper.cpp:196:12: fatal error: longintrepr.h: No such file or directory
     #include "longintrepr.h"
              ^~~~~~~~~~~~~~~
    compilation terminated.
    error: command '/usr/bin/gcc' failed with exit code 1
    [end of output]
    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure
    × Encountered error while trying to install package.
    ╰─> sasl3
    Please, can someone help me with this? Thanks in advance.
    b
    d
    • 3
    • 5
  • m

    microscopic-mechanic-13766

    12/02/2022, 12:40 PM
    Good Friday everyone, I am trying out the new version (v0.9.3) of both GMS and frontend. During the deployment of GMS I got the following messages:
    Copy code
    2022-12-02 12:25:59.457:INFO:oejshC.ROOT:main: 1 Spring WebApplicationInitializers detected on classpath
     SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
     SLF4J: Defaulting to no-operation (NOP) logger implementation
     SLF4J: See <http://www.slf4j.org/codes.html#StaticLoggerBinder> for further details.
     2022-12-02 12:25:59.531:INFO:oejs.session:main: DefaultSessionIdManager workerName=node0
     2022-12-02 12:25:59.531:INFO:oejs.session:main: No SessionScavenger set, using defaults
     2022-12-02 12:25:59.533:INFO:oejs.session:main: node0 Scavenging every 600000ms
     2022-12-02 12:25:59.597:INFO:oejshC.ROOT:main: Initializing Spring root WebApplicationContext
     SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
     SLF4J: Defaulting to no-operation MDCAdapter implementation.
     SLF4J: See <http://www.slf4j.org/codes.html#no_static_mdc_binder> for further details.
     ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
     ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
     ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
     ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
     ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
     ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
     ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
     ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
     ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
     2022-12-02 12:26:53.116:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'apiServlet'
     2022-12-02 12:26:53.639:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'authServlet'
     2022-12-02 12:26:53.711:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'openapiServlet'
     2022-12-02 12:26:56.239:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@6eda5c9{Open source GMS,/,[file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-11220994130116230824/webapp/, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-11220994130116230824/webapp/WEB-INF/lib/swagger-ui-4.10.3.jar!/META-INF/resources],AVAILABLE}{file:///datahub/datahub-gms/bin/war.war}
     2022-12-02 12:26:56.261:INFO:oejs.AbstractConnector:main: Started ServerConnector@4387b79e{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
     2022-12-02 12:26:56.263:INFO:oejs.Server:main: Started @71782ms
    I know those messages don't usually mean that there has been an error during the deployment, and DataHub's performance isn't affected at all, but since I didn't get the message
    INFO  c.d.m.ingestion.IngestionScheduler:251 - Successfully fetched 4 ingestion sources.
    I thought that maybe something was wrong (which at first it doesn't feel like). Anyway, I just wanted to let you know about those two things, as they didn't happen in previous versions 🙂
    b
    l
    +2
    • 5
    • 10
  • s

    straight-mouse-85445

    12/02/2022, 9:57 PM
    Hello everyone. It seems like DataHub does not do profiling on partitioned datasets; does anyone know a workaround for such datasets?
    ✅ 1
    d
    • 2
    • 3
  • s

    straight-mouse-85445

    12/02/2022, 9:59 PM
    Or a way to fix this?
  • s

    straight-mouse-85445

    12/02/2022, 10:01 PM
    Earlier I was getting issues with the SQL combiner; I turned that setting off and profiling went through, but it failed on partition-enabled datasets. Maybe this is the main reason? It could not combine queries from each partition. (A possible config sketch follows below.)
    d
    • 2
    • 1
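    A hedged sketch of the profiling options being discussed, written as a programmatic recipe. The source type and project are assumptions purely for illustration (the thread doesn't name the source), and options like query_combiner_enabled and partition_profiling_enabled belong to the GE-based profiler, so availability and behaviour may vary by source and DataHub version:
    Copy code
    # Run an ingestion recipe from Python with the query combiner disabled and
    # partition profiling turned on. Replace the source type/config with your own.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "bigquery",  # assumption: substitute your actual source type
                "config": {
                    "project_id": "my-project",  # hypothetical
                    "profiling": {
                        "enabled": True,
                        "query_combiner_enabled": False,      # the setting turned off above
                        "partition_profiling_enabled": True,  # profile partitioned tables
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()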
  • g

    gifted-knife-16120

    12/03/2022, 5:05 PM
    Hello. I tried to make some enhancements to our DataHub policies, but I realized that the 'Edit Tags' privilege is not working properly. When I grant the permission to a specific group, members still cannot add/edit tags on the entity pages. How can I solve this?
    a
    l
    e
    • 4
    • 8
  • a

    acoustic-ghost-64885

    12/05/2022, 10:17 AM
    How can we add custom properties to a business glossary term?
    ✅ 1
    e
    • 2
    • 1
  • g

    gentle-portugal-21014

    12/05/2022, 10:29 AM
    Hello, is your question related to adding them, or to showing/editing them in the UI? As of now, adding them is possible using the API - there's an example of adding glossary terms in the GitHub repository, and you could extend it with custom properties. Displaying them in the UI and/or editing them there is not possible at the moment (there's a feature request for that). (Oops - this was meant as a response to https://datahubspace.slack.com/archives/C029A3M079U/p1670235428220579, but apparently it was added as a standalone post instead 😞 ).
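    For reference, a rough sketch of the API route mentioned above: creating or updating a glossary term with custom properties via the Python emitter. The term name and property keys are made-up examples, and a local GMS is assumed.
    Copy code
    from datahub.emitter.mce_builder import make_term_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, GlossaryTermInfoClass

    term_urn = make_term_urn("rateofreturn")  # hypothetical term id

    term_info = GlossaryTermInfoClass(
        definition="Rate of return (RoR) measures the gain or loss of an investment over time.",
        termSource="INTERNAL",
        customProperties={"steward": "finance-team", "status": "approved"},  # custom properties
    )

    mcp = MetadataChangeProposalWrapper(
        entityType="glossaryTerm",
        aspectName="glossaryTermInfo",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=term_urn,
        aspect=term_info,
    )

    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)  # assumes local GMS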
  • n

    narrow-zebra-66884

    12/05/2022, 11:38 AM
    Hey, I'm trying to see how DataHub works with ClickHouse. I'm having issues getting min/max/mean/median values, though; the rest seems to work as intended. It just displays <unknown> for anything but the timestamp, yet it does display some distinct values. Any ideas?
    m
    h
    • 3
    • 3
  • n

    narrow-zebra-66884

    12/05/2022, 11:41 AM
    I do have a lot of logs about No sqlalchemy dialect found, but I don't understand how it would find the dialect for some stats but not for the rest? (A recipe sketch follows below.)
    [2022-12-05 12:32:49,448] WARNING  {great_expectations.dataset.sqlalchemy_dataset:1945} - No sqlalchemy dialect found; relying in top-level sqlalchemy types.
    g
    • 2
    • 1
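    That warning comes from the profiler falling back to generic SQLAlchemy types, which may be why only some stats come back; checking that the ClickHouse plugin (pip install 'acryl-datahub[clickhouse]') is installed in the same environment that runs the recipe is worth doing. A minimal recipe sketch, with placeholder host/credentials and config fields that may differ by version:
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "clickhouse",
                "config": {
                    "host_port": "clickhouse-host:8123",  # placeholder
                    "username": "default",                # placeholder
                    "password": "",
                    "profiling": {"enabled": True},
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    )
    pipeline.run()
    pipeline.raise_from_status()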
  • m

    microscopic-mechanic-13766

    12/05/2022, 1:24 PM
    Hello, I was trying to create a new policy but I encountered a small problem: when I first clicked on the resources textbox to select which resources I wanted the policy to apply to, no resources appeared (first image). The thing is that when I type part of a resource name (second image) and erase it, the resources do appear (third image). Is this the expected behaviour? I can't see any error logs or anything like that. I am currently using v0.9.3 for both GMS and frontend, and v0.0.8 for actions.
    ✅ 1
    e
    • 2
    • 8
  • m

    microscopic-mechanic-13766

    12/05/2022, 4:27 PM
    Hello again, I am trying out the new Teams integration but I am having some trouble. I have followed this guide. Theoretically it should work, as I am getting the messages that indicate the connection is running, but I am getting the following error:
    Copy code
    [2022-12-05 17:09:18,100] INFO     {datahub_actions.cli.actions:119} - Action Pipeline with name 'datahub_teams_action' is now running
     DEBUG    {datahub_actions.pipeline.pipeline_manager:63} - Attempting to start pipeline with name datahub_teams_action...
     Exception in thread Thread-2 (run_pipeline):
     Traceback (most recent call last):
       File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
         self.run()
       File "/usr/local/lib/python3.10/threading.py", line 953, in run
         self._target(*self._args, **self._kwargs)
       File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 42, in run_pipeline
         pipeline.run()
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline.py", line 166, in run
         for enveloped_event in enveloped_events:
       File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 154, in events
         msg = self.consumer.poll(timeout=2.0)
       File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 131, in poll
         raise ConsumeError(msg.error(), kafka_message=msg)
     confluent_kafka.error.ConsumeError: KafkaError{code=_TRANSPORT,val=-195,str="FindCoordinator response error: Local: Broker transport failure"}
     [2022-12-05 17:09:18,174] ERROR    {datahub_actions.entrypoints:122} - File "/usr/local/lib/python3.10/site-packages/datahub_actions/entrypoints.py", line 114, in main
         111  def main(**kwargs):
         112      # This wrapper prevents click from suppressing errors.
         113      try:
     --> 114          sys.exit(datahub_actions(standalone_mode=False, **kwargs))
         115      except click.exceptions.Abort:
    
     File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
         1128  def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
      (...)
    --> 1130      return self.main(*args, **kwargs)
    
     File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
         rv = self.invoke(ctx)
     File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
         return _process_result(sub_ctx.command.invoke(sub_ctx))
     File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
     File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
         return ctx.invoke(self.callback, **ctx.params)
     File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
         return __callback(*args, **kwargs)
     File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
         return f(get_current_context(), *args, **kwargs)
     File "/usr/local/lib/python3.10/site-packages/datahub_actions/cli/actions.py", line 118, in run
         73   def run(ctx: Any, config: List[str], debug: bool) -> None:
      (...)
         114      logger.debug("Starting Actions Pipelines")
         115
         116      # Start each pipeline.
         117      for p in pipelines:
     --> 118          pipeline_manager.start_pipeline(p.name, p)
         119          logger.info(f"Action Pipeline with name '{p.name}' is now running.")
    
     File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 71, in start_pipeline
         62   def start_pipeline(self, name: str, pipeline: Pipeline) -> None:
      (...)
         67           spec = PipelineSpec(name, pipeline, thread)
         68           self.pipeline_registry[name] = spec
         69           logger.debug(f"Started pipeline with name {name}.")
         70       else:
     --> 71           raise Exception(f"Pipeline with name {name} is already running.")
     Exception: Pipeline with name datahub_teams_action is already running.
     [2022-12-05 17:09:18,174] INFO     {datahub_actions.entrypoints:131} - DataHub Actions version: 0.0.0.dev0 at /usr/local/lib/python3.10/site-packages/datahub_actions/__init__.py
     [2022-12-05 17:09:18,176] INFO     {datahub_actions.entrypoints:134} - Python version: 3.10.7 (main, Oct  5 2022, 14:33:54) [GCC 10.2.1 20210110] at /usr/local/bin/python on Linux-3.10.0-1160.76.1.el7.x86_64-x86_64-with-glibc2.3
    I am also getting the initialization message on Teams, but that is it. I have added some tags to a few datasets and columns and done some ingestions, but I haven't received any message. I should have received one message per action I performed, right?
    a
    m
    • 3
    • 6
  • s

    straight-mouse-85445

    12/05/2022, 4:42 PM
    can anyone check this please?
  • s

    straight-mouse-85445

    12/05/2022, 5:13 PM
    Just to give an exact error message - No partition predicate found for alias <DATASET_NAME>
    i
    a
    • 3
    • 3
  • i

    important-night-50346

    12/05/2022, 8:12 PM
    Hello. We are experiencing some issues with the Graph API. We have around 6500 Redshift entities (datasets + containers) and are having trouble browsing them in the UI or through the Graph API - basically, the API returns HTTP 503. Example: https://<hostname>/search?filter_platform=urn%3Ali%3AdataPlatform%3Aredshift&page=200&query= fails. Requesting the very same data from the GMS Rest.li API works flawlessly. Running the GMS service in debug mode, I can see that all records are successfully retrieved from OpenSearch. This issue is also reproducible on the demo instance (in GraphiQL).
    ✅ 1
    b
    • 2
    • 3
  • n

    nutritious-bird-77396

    12/05/2022, 10:03 PM
    Hello, I upgraded my DataHub installation from 0.8.43 to 0.9.3 (I know, a big jump) and I am getting the below errors in the DataHub frontend:
    Validation error (FieldUndefined@[listRecommendations/modules/content/params/searchParams/filters/value]) : Field 'value' in type 'Filter' is undefined (code undefined)
    Any idea where this could be coming from? Do I need to run any index migrations since I made a major version jump as well? Any info would be helpful here.
    a
    l
    b
    • 4
    • 10
  • a

    acceptable-terabyte-34789

    12/06/2022, 9:11 AM
    Hello, how can I get lineage when reading from an S3 file? E.g. I read an S3 Avro file and write it into an Athena table. I'm able to see the dataset created in DataHub, but only as a downstream; I would need the upstream from that S3 file to show the full lineage. (A manual lineage sketch follows below.)
    ✅ 1
    d
    • 2
    • 1
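    One hedged workaround (not necessarily the only route): emit the upstream edge yourself, from the S3 object to the Athena table, via the Python emitter. Both URNs below are placeholders; adjust paths, platform names, and env to your setup.
    Copy code
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetLineageTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    s3_urn = make_dataset_urn("s3", "my-bucket/path/to/file.avro", "PROD")  # placeholder
    athena_urn = make_dataset_urn("athena", "my_db.my_table", "PROD")       # placeholder

    lineage = UpstreamLineageClass(
        upstreams=[UpstreamClass(dataset=s3_urn, type=DatasetLineageTypeClass.COPY)]
    )

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        aspectName="upstreamLineage",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=athena_urn,  # lineage is written on the downstream dataset
        aspect=lineage,
    )

    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)  # assumes local GMS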
  • b

    bright-motherboard-35257

    12/06/2022, 6:49 PM
    I have several days of ingestion runs that will not run, as they remain in status "pending". Is there a way to clear all of these prior days' pending ingestions?
    b
    • 2
    • 4
  • f

    freezing-garage-69869

    12/07/2022, 8:50 AM
    Hello, I'm having trouble getting lineage data pushed to DataHub from Spark running on EMR. My job runs with Spark 3.3.0, and I have also tried Spark 3.2.0. The pipelines get created in DataHub, but there are no datasets, no lineage, and no tasks (cf. attached screenshot). I use the following configuration in my spark-submit:
    Copy code
    --packages io.acryl:datahub-spark-lineage:0.9.3-1
    --conf spark.extraListeners=datahub.spark.DatahubSparkListener 
    --conf spark.datahub.rest.server=http://10.5.0.37:8080
    My DataHub server is running on docker with the command
    datahub docker quickstart
    on version 0.9.3-1. During the execution of my Spark job I get the following error from DatahubSparkListener (I removed the detailed Spark plans from the log):
    Copy code
    22/12/06 17:31:09 INFO McpEmitter: MetadataWriteResponse(success=true, responseContent={"value":"urn:li:dataFlow:(spark,ModelizeBankAccountEvaluations,yarn)"}, underlyingResponse=HTTP/1.1 200 OK [Date: Tue, 06 Dec 2022 17:31:09 GMT, Content-Type: application/json, X-RestLi-Protocol-Version: 2.0.0, Content-Length: 71, Server: Jetty(9.4.46.v20220331)] [Content-Length: 71,Chunked: false])
    22/12/06 17:31:10 ERROR DatahubSparkListener: java.lang.NullPointerException
    	at datahub.spark.DatasetExtractor.lambda$static$6(DatasetExtractor.java:147)
    	at datahub.spark.DatasetExtractor.asDataset(DatasetExtractor.java:237)
    	at datahub.spark.DatahubSparkListener$SqlStartTask.run(DatahubSparkListener.java:114)
    	at datahub.spark.DatahubSparkListener.processExecution(DatahubSparkListener.java:350)
    	at datahub.spark.DatahubSparkListener.onOtherEvent(DatahubSparkListener.java:262)
    	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
    	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
    	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
    	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
    	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
    	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1447)
    	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
    
    22/12/06 17:31:10 INFO AsyncEventQueue: Process of event SparkListenerSQLExecutionStart(0,save at NativeMethodAccessorImpl.java:0,org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:498)
    py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    py4j.Gateway.invoke(Gateway.java:282)
    py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    py4j.commands.CallCommand.execute(CallCommand.java:79)
    py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    java.lang.Thread.run(Thread.java:750),== Parsed Logical Plan ==
    
    == Analyzed Logical Plan ==
    SaveIntoDataSourceCommand org.apache.hudi.Spark3DefaultSource@6b5aa9cf, ...
    == Optimized Logical Plan ==
    SaveIntoDataSourceCommand org.apache.hudi.Spark3DefaultSource@6b5aa9cf...
    == Physical Plan ==
    Execute SaveIntoDataSourceCommand
       +- SaveIntoDataSourceCommand org.apache.hudi.Spark3DefaultSource@6b5aa9cf...
    by listener DatahubSparkListener took 2.295850812s.
    I think it's worth pointing out that I use the Apache Hudi format to write the data. Is there something I'm missing here? Thanks for your help.
    ✅ 1
    m
    a
    +4
    • 7
    • 12
  • r

    ripe-eye-60209

    12/07/2022, 8:56 AM
    Hello, we noticed that the Kubernetes job datahub-restore-indices-job-template (acryldata/datahub-upgrade:v0.9.1) is taking a very long time. How can we speed it up given a large set of records? Some stats: the job ran for 22 hours and successfully sent MAEs for 1076000/1282980 rows (83.87% of total); 0 rows were ignored (0.00% of total).
    ✅ 1
    i
    a
    • 3
    • 10
  • w

    worried-branch-76677

    12/07/2022, 10:43 AM
    Hi, I found out that the lineage result count returned differs between these two resolvers. Mainly it's because one of them uses searchAcrossEntities to check whether an entity is soft-deleted, from LineageSearchService.java:
    Copy code
    LineageSearchResult resultForBatch = buildLineageSearchResult(
              _searchService.searchAcrossEntities(entitiesToQuery, input, finalFilter, sortCriterion, queryFrom, querySize,
                  SKIP_CACHE), urnToRelationship)
    The top UI shows 9, but 2 of them are soft-deleted; this one uses the resolver from EntityLineageResultResolver.java. The bottom UI shows the correct value, which is 7, using the resolver from SearchAcrossLineageResolver.java. Can you advise how to take it from here?
    ➕ 2
    ✅ 1
    a
    i
    • 3
    • 9
  • b

    breezy-portugal-43538

    12/07/2022, 11:30 AM
    Hello, I have a quick question about MLFeatureTables. In order to update custom_properties within a feature table, I am creating an MLFeatureTablePropertiesClass and passing a description and properties as parameters, but I receive a ValueError. Does this proposal also need some other values? FYI, the urn is already created and the data was uploaded; I only want to update something that already exists. Below is the function that does it (a possible fix is sketched after this message):
    Copy code
    def update_feature_table(self, urn, properties, urn_description=None):
    
            properties = {k.capitalize().replace("_", " "): v for k, v in properties.items()}
    
            # This below produces:
            # ValueError: aspectName MLFeatureTableProperties does not match aspect type <class 'datahub.metadata.schema_classes.MLFeatureTablePropertiesClass'> with name mlFeatureTableProperties
    
            # Create properties object for change proposal wrapper
            feature_table_properties = MLFeatureTablePropertiesClass(
                customProperties=properties,
                description="Description not provided." if not urn_description else urn_description
            )
    
            # MCP creation
            mcp = MetadataChangeProposalWrapper(
                entityType="mlFeatureTable",
                aspectName="MLFeatureTableProperties",
                changeType=ChangeTypeClass.UPSERT,
                entityUrn=urn,
                aspect=feature_table_properties,
            )
    
            # Create an emitter to the GMS REST API.
            emitter = DatahubRestEmitter(self.gms_endpoint)
    
            # Emit metadata!
            emitter.emit_mcp(mcp)
    • 1
    • 1
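    For what it's worth, the ValueError quoted in the snippet above points at the aspectName string: the wrapper expects the aspect's registered name, which per the error message itself is mlFeatureTableProperties (lower-case first letter), not MLFeatureTableProperties. A hedged correction of just the MCP construction (on recent SDK versions entityType/aspectName can often be omitted and inferred from the aspect class, but the explicit form mirrors the original snippet):
    Copy code
    # Same proposal as in the function above, with the aspect name the error asks for.
    mcp = MetadataChangeProposalWrapper(
        entityType="mlFeatureTable",
        aspectName="mlFeatureTableProperties",  # was "MLFeatureTableProperties"
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=urn,
        aspect=feature_table_properties,
    )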
  • b

    bumpy-pharmacist-66525

    12/07/2022, 2:02 PM
    Hello, I have a quick question regarding glossary terms. For each dataset that I have (regardless of platform), I would like to remove all the glossary terms tagged on it (i.e., make every dataset have no glossary terms). However, other than going through the UI and manually clicking the 'X' on each term for each dataset, I am not sure whether this is possible. Is there a way of doing this using the datahub delete command? (An alternative sketch follows below.)
    a
    • 2
    • 1
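    datahub delete operates on entities (or filtered sets of entities) rather than on individual aspect values, so it may not be the right tool here. A hedged alternative sketch: overwrite each dataset's glossaryTerms aspect with an empty list via the Python emitter. The URN list below is a placeholder for however you enumerate your datasets.
    Copy code
    import time

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        ChangeTypeClass,
        GlossaryTermsClass,
    )

    # Placeholder: build this list however you enumerate your datasets.
    dataset_urns = [
        "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",  # hypothetical
    ]

    emitter = DatahubRestEmitter("http://localhost:8080")  # assumes local GMS
    empty_terms = GlossaryTermsClass(
        terms=[],
        auditStamp=AuditStampClass(time=int(time.time() * 1000), actor="urn:li:corpuser:datahub"),
    )

    for urn in dataset_urns:
        emitter.emit_mcp(
            MetadataChangeProposalWrapper(
                entityType="dataset",
                aspectName="glossaryTerms",
                changeType=ChangeTypeClass.UPSERT,
                entityUrn=urn,
                aspect=empty_terms,  # upsert replaces the aspect, clearing all attached terms
            )
        )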
  • g

    gentle-tailor-78929

    12/07/2022, 4:43 PM
    Hi, I am attempting to change the default datahub user's password to something more secure than just datahub. This is the approach that I'm using, but it results in several errors with the datahub-frontend deployment. What am I missing?
    Copy code
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        name: datahub-frontend
    ...
    ...
    volumes:
      - name: userprops
        secret:
          secretName: datahub-user-props-secret
    containers:
     - name: datahub-frontend
       image: .../frontend:latest
       volumeMounts:
         - name: userprops
           mountPath: /datahub-frontend/conf/
           subPath: user.props
    ✅ 1
    i
    b
    b
    • 4
    • 11
  • a

    adamant-furniture-37835

    12/07/2022, 6:53 PM
    Hi, it seems like the Java SDK doesn't have support for Patch functionality. We are using MetadataChangeProposalWrapper.builder() to create Dataset entities. The builder class does have an upsert() method but no patch(), even though the ChangeType enum supports PATCH. I tried with the latest version, 0.9.3-1rc1. Is this issue already known, or are there plans to support this? Thanks.
    o
    • 2
    • 3
  • d

    damp-greece-27806

    12/07/2022, 7:03 PM
    Hi - we keep getting failures on two jobs we run daily via the datahub-upgrade component: NoCodeDataMigrationCleanup and RestoreIndices. The failure for the first shows this in the logs:
    Copy code
    Starting upgrade with id NoCodeDataMigrationCleanup...
    Executing Step 1/4: UpgradeQualificationStep...
    Found qualified upgrade candidate. Proceeding with upgrade...
    Completed Step 1/4: UpgradeQualificationStep successfully.
    Executing Step 2/4: DeleteLegacyAspectRowsStep...
    Completed Step 2/4: DeleteLegacyAspectRowsStep successfully.
    Executing Step 3/4: DeleteLegacyGraphRelationshipStep...
    Failed to delete legacy data from graph: java.lang.ClassCastException: class com.linkedin.metadata.graph.elastic.ElasticSearchGraphService cannot be cast to class com.linkedin.metadata.graph.neo4j.Neo4jGraphService (com.linkedin.metadata.graph.elastic.ElasticSearchGraphService and com.linkedin.metadata.graph.neo4j.Neo4jGraphService are in unnamed module of loader org.springframework.boot.loader.LaunchedURLClassLoader @b97c004)
    Failed to delete legacy data from graph: java.lang.ClassCastException: class com.linkedin.metadata.graph.elastic.ElasticSearchGraphService cannot be cast to class com.linkedin.metadata.graph.neo4j.Neo4jGraphService (com.linkedin.metadata.graph.elastic.ElasticSearchGraphService and com.linkedin.metadata.graph.neo4j.Neo4jGraphService are in unnamed module of loader org.springframework.boot.loader.LaunchedURLClassLoader @b97c004)
    Failed Step 3/4: DeleteLegacyGraphRelationshipStep. Failed after 1 retries.
    Exiting upgrade NoCodeDataMigrationCleanup with failure.
    Upgrade NoCodeDataMigrationCleanup completed with result FAILED. Exiting...
    For the second, it seems like it runs successfully for the most part, but it has trouble emitting MCLs for messages larger than the value of max.request.size, so I guess we can try to bump that up. If you have any suggestions on this, let us know.
    ✅ 1
    e
    • 2
    • 5
  • c

    clever-garden-23538

    12/07/2022, 9:24 PM
    Is there a way to get markdown fragment linking to work in DataHub?
    ✅ 1
    • 1
    • 2
  • m

    miniature-ram-76637

    12/07/2022, 11:50 PM
    Hi folks. I have just cloned a fresh copy of DataHub from master and am trying to get it building locally in IntelliJ, but I'm running into a strange issue I can't seem to figure out:
    error reading /Users/*/Documents/code/datahub/metadata-auth/auth-api/build/libs/auth-api-0.9.4-SNAPSHOT.jar; zip END header not found
    ✅ 1
    a
    b
    • 3
    • 9
  • d

    dazzling-shampoo-79134

    12/08/2022, 4:47 AM
    Hi all, I've been getting
    Copy code
    Failed to update Deprecation: Failed to update resource with urn urn:li:dataset:(urn:li:dataPlatform:googlesheet,lorem ipsum,PROD). Entity does not exist.
    while deprecating a dataset. FYI, I already deprecated this dataset the other day, but for some reason it still exists in the latest lineage, so I tried to deprecate it again, but to no avail. (A diagnostic sketch follows below.)
    ✅ 1
    a
    • 2
    • 3
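    One way to narrow this down (a hedged diagnostic sketch, not from the thread): ask GMS directly whether it still knows the URN from the error message, which would explain the "Entity does not exist" response even though the dataset still appears in lineage. The GMS host below is an assumption.
    Copy code
    import urllib.parse

    import requests

    urn = "urn:li:dataset:(urn:li:dataPlatform:googlesheet,lorem ipsum,PROD)"
    resp = requests.get(
        "http://localhost:8080/entities/" + urllib.parse.quote(urn, safe=""),  # assumes local GMS
        headers={"X-RestLi-Protocol-Version": "2.0.0"},
    )
    print(resp.status_code)  # a 404 here would mean GMS really has no such entity
    print(resp.text)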