# troubleshoot
  • brief-ability-41819 (02/01/2023, 7:33 AM)
    Hello, I’m having an issue upgrading DataHub via Helm from 0.9.1 to 0.9.2 (or newer). It seems that datahub-acryl-datahub-actions is the problem; it throws:
    ```
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
        self.run()
      File "/usr/local/lib/python3.10/threading.py", line 953, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 42, in run_pipeline
        pipeline.run()
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline.py", line 166, in run
        for enveloped_event in enveloped_events:
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 154, in events
        msg = self.consumer.poll(timeout=2.0)
      File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 139, in poll
        raise ValueDeserializationError(exception=se, kafka_message=msg)
    confluent_kafka.error.ValueDeserializationError: KafkaError{code=_VALUE_DESERIALIZATION,val=-159,str="HTTPConnectionPool(host='prerequisites-cp-schema-registry', port=8081): Max retries exceeded with url: /schemas/ids/2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c1ad1e8f0>: Failed to establish a new connection: [Errno 111] Connection refused'))"}
    %4|1675236299.041|MAXPOLL|rdkafka#consumer-1| [thrd:main]: Application maximum poll interval (10000ms) exceeded by 170ms (adjust max.poll.interval.ms for long-running message processing): leaving group
    ```
    Ingestions seem to go into a pending state and nothing happens. I haven’t changed anything apart from the app version.
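    The traceback shows the actions container failing to reach the schema registry rather than a problem in the actions code itself. As a minimal sketch (hostname and port taken from the error above; run it from inside the same pod/network), this probe reproduces the connectivity check:
    ```python
    # Probe the schema registry endpoint named in the traceback above.
    # Host/port come from the error message; adjust for your cluster.
    import urllib.request

    URL = "http://prerequisites-cp-schema-registry:8081/schemas/ids/2"

    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            print(resp.status, resp.read()[:200])
    except OSError as e:
        # ConnectionRefusedError here matches the [Errno 111] in the log.
        print("schema registry unreachable:", e)
    ```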
  • great-computer-16446 (02/01/2023, 10:46 AM)
    Hi team, I’m having an ES-index-related problem: the datasetindex was duplicated, so data is not displayed or searchable normally. Is there a way to fix it? I tried rerunning linkedin/datahub-elasticsearch-setup, but it didn’t solve the problem.
  • gentle-camera-33498 (02/01/2023, 12:18 PM)
    Hello everyone, is there any way I could delete time-series metadata from Elasticsearch? I'm trying to delete assertions, but the metadata still appears in the UI.
  • microscopic-machine-90437 (02/01/2023, 1:26 PM)
    Hi Team, I'm getting the below error while trying to start DataHub. Can someone help me with this?
  • lemon-scooter-69730 (02/01/2023, 2:34 PM)
    This is interesting:
    Your client version 0.8.43.5 is newer than your server version 0.9.6. Downgrading the cli to 0.9.6 is recommended.
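    If it helps, here is a quick way to see both versions side by side; the localhost:8080 GMS address and the shape of the /config response are assumptions to adjust for your deployment:
    ```python
    # Print the local CLI/SDK version and ask GMS what it is running.
    import json
    import urllib.request

    import datahub  # the acryl-datahub package

    print("client:", datahub.__version__)

    with urllib.request.urlopen("http://localhost:8080/config", timeout=5) as resp:
        print("server:", json.load(resp).get("versions"))
    ```
    Pinning the client with pip install acryl-datahub==0.9.6 is the straightforward way to follow the warning's advice.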
  • damp-ambulance-34232 (02/01/2023, 5:12 PM)
    I have 2 datasets with the same fields. How can I clone the field documentation from one dataset to another? Thanks!
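    A possible sketch using the Python SDK, assuming a reasonably recent acryl-datahub and a GMS at localhost:8080; it copies editableSchemaMetadata (where UI-authored field docs live) from a placeholder source URN to a placeholder target URN:
    ```python
    # Copy UI-authored field documentation between two datasets.
    # SOURCE_URN, TARGET_URN and the server address are placeholders.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import EditableSchemaMetadataClass

    SOURCE_URN = "urn:li:dataset:(urn:li:dataPlatform:hive,db.table_a,PROD)"
    TARGET_URN = "urn:li:dataset:(urn:li:dataPlatform:hive,db.table_b,PROD)"

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    docs = graph.get_aspect_v2(
        entity_urn=SOURCE_URN,
        aspect="editableSchemaMetadata",
        aspect_type=EditableSchemaMetadataClass,
    )
    if docs is not None:
        # UPSERT replaces the whole aspect on the target dataset.
        graph.emit_mcp(MetadataChangeProposalWrapper(entityUrn=TARGET_URN, aspect=docs))
    ```
    This only makes sense if the two datasets really share the same field paths.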
  • fierce-garage-74290 (02/01/2023, 7:07 PM)
    DataHub CLI - when ingesting business glossaries, how do I determine how many of them actually got changed (not just overwritten)? When ingesting my business glossaries via the CLI with datahub ingest -c recipes/glossaries/glossary_recipe.yml, I'd like to learn from the output how many definitions actually changed. But I'm afraid that whenever I run this recipe all the terms get overwritten, and total_records_written always shows the number of records in the recipe. Question: how can I determine if any glossary term was modified? I need this to configure notifications for the business team (they would like to be informed whenever any glossary changes). Thanks!
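    Since total_records_written counts emitted records rather than real changes, one workaround is to snapshot the glossaryTermInfo aspects before and after the run and diff them; a sketch with placeholder term URNs and server address:
    ```python
    # Snapshot glossaryTermInfo for known terms; run this before and after
    # `datahub ingest -c recipes/glossaries/glossary_recipe.yml` and diff.
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import GlossaryTermInfoClass

    TERM_URNS = ["urn:li:glossaryTerm:example.term"]  # placeholder list

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    def snapshot() -> dict:
        return {
            urn: graph.get_aspect_v2(
                entity_urn=urn,
                aspect="glossaryTermInfo",
                aspect_type=GlossaryTermInfoClass,
            )
            for urn in TERM_URNS
        }

    before = snapshot()
    # ... run the ingestion recipe here ...
    after = snapshot()
    print("modified terms:", [u for u in TERM_URNS if before[u] != after[u]])
    ```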
  • numerous-ram-92457 (02/01/2023, 9:20 PM)
    Hey all 👋🏼, having issues seeing column lineage flow from Snowflake through to Looker. We can see lineage for tables contained within Snowflake and lineage for Looker views/explores for things contained within Looker; however, there is no connection between the two. For the LookML ingestion, we’re using the UI and have a user account created with admin privileges. Is there anything else that could be impacting the full lineage connection from showing? All of our current ingestions are running successfully (Snowflake, Looker, and LookML).
  • gentle-camera-33498 (02/01/2023, 10:08 PM)
    Hello everyone, I implemented a DataHub integration with Airflow that works for version 2.5+ (the version I use), but I'm having a little problem with my database volume. It seems that the dataProcessInstance entity is created with a mutable URN, as you can see here, and the problem is that every data pipeline run generates lots of new dataProcessInstances with different URNs. I have pipelines that run frequently and have a lot of tasks. Because of that, I get about 1.5 million new entities a month. Am I doing something wrong?
  • refined-energy-76018 (02/01/2023, 10:35 PM)
    Hi, we've recently implemented some pipelines in the Actions Framework using custom actions. On the container I see that all 5 of our pipelines are running under the same process started by this script. However, sometimes 1 of those pipelines will stop while the other 4 continue running. We don't have failure_mode set, which means it should default to CONTINUE. Any ideas how to debug this?
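    One low-tech debugging aid is to make the custom action log loudly before any exception propagates, to distinguish an action crash from the consumer silently leaving its group (as in the Kafka traceback at the top of this page). A sketch, assuming the action follows the usual datahub-actions act(event) shape:
    ```python
    # Mixin for a custom datahub-actions Action: log a full stack trace for
    # any failure, then re-raise so failure_mode semantics stay unchanged.
    import logging

    logger = logging.getLogger(__name__)

    class LoggingActionMixin:
        def act(self, event):
            try:
                super().act(event)
            except Exception:
                logger.exception("action failed on event: %s", event)
                raise
    ```
    Note that failure_mode governs action errors; a source-level error (for example a deserialization failure) can still stop a single pipeline while the others keep running.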
  • wooden-breakfast-17692 (02/02/2023, 8:53 AM)
    Hi all, I’m on a Mac M1 and trying to build with ./gradlew build -x test -x yarnTest -x testQuick. Everything seems to work fine, but at 99% the build fails at task :metadata-ingestion:docGen with a seg fault: ./scripts/docgen.sh: line 10: 51078 Segmentation fault: 11 python scripts/docgen.py --out-dir ${DOCS_OUT_DIR} --extra-docs ${EXTRA_DOCS_DIR} $@. The strange thing is that the script actually succeeds in generating the docs and exits with 0. I’m using my system’s Python, which is 3.9.6. Any suggestions? Cheers!
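    Since the useful work finishes before the crash, faulthandler may show where the interpreter dies on teardown; a sketch (the argv values are placeholders matching the script's flags):
    ```python
    # Re-run docgen with faulthandler enabled so a segfault prints the
    # Python-level stack even though the docs were already generated.
    import faulthandler
    import runpy
    import sys

    faulthandler.enable()
    sys.argv = ["docgen.py", "--out-dir", "genDocs", "--extra-docs", "docs"]
    runpy.run_path("scripts/docgen.py", run_name="__main__")
    ```
    Exporting PYTHONFAULTHANDLER=1 for the gradle task gives the same effect without touching the script.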
  • rich-policeman-92383 (02/02/2023, 1:45 PM)
    Hello, in the default monitoring provided by DataHub we are not getting certain metrics in Prometheus, e.g.:
    ```
    metrics_com_linkedin_metadata_resources_entity_EntityResource_search_Count
    metrics_com_linkedin_metadata_resources_entity_EntityResource_search_failed_Count
    ```
    ...and a few others. Any suggestions on which component we should investigate?
  • rough-car-65301 (02/02/2023, 3:23 PM)
    Hello Team, happy to meet you all. I have a QQ: I'm trying to run the DataHub project on an M1 machine. I already set up my Docker to have 16 GB of memory, 5 CPUs and 2.5 GB of swap, but every time I run it it shows me this:
    ```
    Unable to run quickstart - the following issues were detected:
    - datahub-gms is still starting
    - elasticsearch-setup is still running
    - elasticsearch is running but not healthy
    ```
  • rough-car-65301 (02/02/2023, 3:24 PM)
    Could you please give me some advice on how I can run the DataHub project locally? Thanks in advance 🙂
  • handsome-football-66174 (02/02/2023, 3:38 PM)
    Hi Team, we are facing this error when installing DataHub packages in Airflow instances:
    ```
    File "/root/.venvs/airflow/lib/python3.7/site-packages/airflow/lineage/__init__.py", line 103, in apply_lineage
        _backend = get_backend()
      File "/root/.venvs/airflow/lib/python3.7/site-packages/airflow/lineage/__init__.py", line 52, in get_backend
        clazz = conf.getimport("lineage", "backend", fallback=None)
      File "/root/.venvs/airflow/lib/python3.7/site-packages/airflow/configuration.py", line 675, in getimport
        f'The object could not be loaded. Please check "{key}" key in "{section}" section. '
    airflow.exceptions.AirflowConfigException: The object could not be loaded. Please check "backend" key in "lineage" section. Current value: "datahub_provider.lineage.datahub.DatahubLineageBackend".
    ```
    We are using the following configuration:
    ```
    [lineage]
    backend = datahub_provider.lineage.datahub.DatahubLineageBackend
    datahub_kwargs = {
        "enabled": true,
        "datahub_conn_id": "datahub_rest_default",
        "cluster": "prod",
        "capture_ownership_info": true,
        "capture_tags_info": true,
        "graceful_exceptions": true }
  • wide-afternoon-79955 (02/02/2023, 4:03 PM)
    Hi All, we are working on having a domain-specific role for Editors so that they can add/edit/delete all objects they own and have view access to the rest. But there's an Editor - Metadata policy which is not editable and comes with the default source package. It gives Editors all privileges even on objects they don't own. We have managed to make this policy editable and deactivate it via an update query on the DB (query is in the thread). The problem we are facing is that every time the pod restarts, it re-loads the default policies from policy.json, overwriting our updated value. Is there a trick where I can either 1. deactivate the Editor - Metadata policy by default, or 2. make the Editor - Metadata policy editable? Note: I am trying to avoid forking the project and building a new custom image just for this tiny config change.
  • gentle-portugal-21014 (02/02/2023, 5:35 PM)
    Hi, while looking at the GMS logs, I noticed suspicious messages like "`INFO c.l.m.k.t.DataHubUsageEventTransformer:74 - Invalid event type: CreateGlossaryEntityEvent`" followed by "`WARN c.l.m.k.DataHubUsageEventsProcessor:56 - Failed to apply usage events transform to record: {"type":"CreateGlossaryEntityEvent","entityType":"GLOSSARY_TERM",...`" (further information about the user performing the action and their web browser followed). Somewhat later, there was: "`INFO c.l.m.k.t.DataHubUsageEventTransformer:112 - Unsupported entity type: GLOSSARY_TERM`". Do these messages suggest some limitations related to the Glossary Term entity?
  • miniature-exabyte-80137 (02/02/2023, 8:37 PM)
    Hi all! I am running into this issue when running the quickstart cmd:
    ```
    Unable to run quickstart - the following issues were detected:
    - broker is not running
    - datahub-gms is still starting
    - zookeeper is not running
    If you think something went wrong, please file an issue at https://github.com/datahub-project/datahub/issues
    or send a message in our Slack https://slack.datahubproject.io/
    Be sure to attach the logs from /tmp/tmp3qcg5vb6.log
    ```
    I killed all containers and ran docker system prune but still get this error. Still debugging this, but lmk if you have any ideas, thanks!
    log.txt
  • great-toddler-2251 (02/03/2023, 12:21 AM)
    Hi everyone. I’ve been searching the channel and saw some mention of the SLF4J issue:
    ```
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    ```
    and that it was fixed. Well, not in the Java datahub-client. I am using the latest and greatest
    ```
    implementation 'io.acryl:datahub-client:0.9.6-3'
    ```
    and yet (trivial Boot app from start.spring.io):
    ```
    $ ./gradlew bootRun
    
    > Task :bootRun
    
      .   ____          _            __ _ _
     /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
    ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
     \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
      '  |____| .__|_| |_|_| |_\__, | / / / /
     =========|_|==============|___/=/_/_/_/
     :: Spring Boot ::                (v3.0.2)
    
    2023-02-02T16:11:01.813-08:00  INFO 58440 --- [           main] c.e.demo.DemoLoggingIssueApplication     : Starting DemoLoggingIssueApplication using Java 17.0.1 with PID 58440 (/private/tmp/demo-logging-issue/build/classes/java/main started by raysuliteanu in /private/tmp/demo-logging-issue)
    2023-02-02T16:11:01.815-08:00  INFO 58440 --- [           main] c.e.demo.DemoLoggingIssueApplication     : No active profile set, falling back to 1 default profile: "default"
    2023-02-02T16:11:02.110-08:00  INFO 58440 --- [           main] c.e.demo.DemoLoggingIssueApplication     : let's create a DataHub RestEmitter!
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    2023-02-02T16:11:02.436-08:00  INFO 58440 --- [           main] c.e.demo.DemoLoggingIssueApplication     : Started DemoLoggingIssueApplication in 0.877 seconds (process running for 1.126)
    2023-02-02T16:11:02.437-08:00  INFO 58440 --- [           main] c.e.demo.DemoLoggingIssueApplication     : that was fun
    
    BUILD SUCCESSFUL in 2s
    4 actionable tasks: 4 executed
    ```
    I have attached the example project; just unzip and run gradlew. Needless to say, without creating a RestEmitter there is no SLF4J error. Suggestions?
    demo-logging-issue.zip
  • microscopic-room-90690 (02/03/2023, 6:22 AM)
    Hi guys, I'm running into trouble when I ingest metadata from S3/Hive/DBT. The recipes I use are attached. The CLI version is 0.8.43 and GMS is 0.9.6.1. After the first time I got this error, I updated the CLI to 0.9.6.1, but it still doesn't work. One thing I can confirm is that it worked well when the CLI and GMS versions were both 0.8.43. The error is:
    ```
    ERROR  {datahub.ingestion.run.pipeline:112} - failed to write record with workunit container-urn:li:container:73b796f6a931c3fbf572bf7a011dfca8-to-urn:li:dataset:(urn:li:dataPlatform:database.table,PROD) with Expecting value: line 1 column 1 (char 0) and info {}
    ```
    Any help will be appreciated. Thank you!
  • bland-appointment-45659 (02/03/2023, 7:15 AM)
    Team, has anyone faced an issue where the ingestions are happening but the status of the executions does not reflect on the UI? We are also trying to create new Domains; they are also not reflecting in the UI. Any pointers? Existing thread: https://datahubspace.slack.com/archives/CV2UXSE9L/p1675345635830569. Pasting here to get a wider audience; appreciate your help.
  • rich-pager-68736 (02/03/2023, 8:46 AM)
    Hi Team, I have some issues with the Tableau ingestion. Limiting the extraction to only a couple of (top-level) projects results in 0 assets being ingested. I can extract everything (which is way too much and not wanted) using projects: null in the recipe, though. I guess it has something to do with our nested project structure. I tried using wildcards, like
    ```
    projects:
                - 'Common Analytics Domain/.*'
    ```
    to no avail. Any idea how to narrow the ingestion down to selected top-level projects? Thanks!
  • rhythmic-quill-75064 (02/03/2023, 8:59 AM)
    Hi Team. I have a problem upgrading to version 0.2.115 (from 0.2.114). There is a tag 0.2.115 in the datahub-helm repository, but helm search repo does not show this version:
    ```
    $ helm search repo datahub --versions
    [...]
    datahub/datahub                 0.2.116         0.9.1
    datahub/datahub                 0.2.114         0.9.1
    datahub/datahub                 0.2.113         0.9.1
    [...]
    ```
    Then helm commands fail, for example:
    ```
    $ helm template --debug datahub datahub/datahub -n <NS> --version 0.2.115
    [...]
    install.go:192: [debug] Original chart version: "0.2.115"
    Error: chart "datahub" matching 0.2.115 not found in datahub index. (try 'helm repo update'): no chart version found for datahub-0.2.115
    ```
    The repo is up to date. There are other "holes" in the versions. Is this normal?
  • many-solstice-66904 (02/03/2023, 9:30 AM)
    Good morning everyone. While browsing the documentation about extending the metadata model GraphQL interface on this page: https://datahubproject.io/docs/datahub-graphql-core/, it mentions that there should be a gms.graphql file under resources, but I am unable to locate this file anywhere in the repository. Could it be that this page is out of date?
  • tall-dentist-87295 (02/03/2023, 1:09 PM)
    Is anyone else running into issues with the Demo Datahub instance? I am getting this error when trying to view any page:
    ```
    Validation error (FieldUndefined@[searchResultFields/datasetProfiles/sizeInBytes]) : Field 'sizeInBytes' in type 'DatasetProfile' is undefined (code undefined)
    ```
  • incalculable-manchester-41314 (02/03/2023, 1:27 PM)
    Hi all, getting this error when trying to build the project:
    ```
    FAILURE: Build failed with an exception.
    * What went wrong:
    Unable to start the daemon process.
    This problem might be caused by incorrect configuration of the daemon.
    For example, an unrecognized jvm option is used.
    Please refer to the User Manual chapter on the daemon at https://docs.gradle.org/6.9.2/userguide/gradle_daemon.html
    Process command line: C:\Program Files\Eclipse Adoptium\jdk-17.0.6.10-hotspot\bin\java.exe --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED -Xmx3000m -Dfile.encoding=windows-1250 -Duser.country=US -Duser.language=en -Duser.variant -cp C:\Users\majid.mardanov\.gradle\wrapper\dists\gradle-6.9.2-bin\30myfq8gjgdgqicjitpktoyx1\gradle-6.9.2\lib\gradle-launcher-6.9.2.jar org.gradle.launcher.daemon.bootstrap.GradleDaemon 6.9.2
    Please read the following process output to find out more:
    -----------------------
    FAILURE: Build failed with an exception.
    * What went wrong:
    Could not create service of type ClassLoaderRegistry using GlobalScopeServices.createClassLoaderRegistry().
    * Try:
    Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    * Exception is:
    org.gradle.internal.service.ServiceCreationException: Could not create service of type ClassLoaderRegistry using GlobalScopeServices.createClassLoaderRegistry().
    Caused by: java.util.zip.ZipException: zip END header not found
    ```
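    A "zip END header not found" from the daemon bootstrap usually means the cached Gradle distribution is corrupted; deleting it forces the wrapper to re-download. A cross-platform cleanup sketch, with the dists path taken from the command line above:
    ```python
    # Remove the cached Gradle 6.9.2 distribution so ./gradlew fetches a
    # fresh copy on the next run.
    import shutil
    from pathlib import Path

    dists = Path.home() / ".gradle" / "wrapper" / "dists" / "gradle-6.9.2-bin"
    shutil.rmtree(dists, ignore_errors=True)
    print("removed:", dists)
    ```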
  • calm-balloon-31412 (02/03/2023, 4:49 PM)
    👋 I'm getting the error AvroException: ('Datum union type not in schema: %s', None) when running graph.get_aspect_v2(entity_urn=urn, aspect="dataJobInfo", aspect_type=DataJobInfoClass). I see someone brought this up in the past, but I'm not sure if it was ever resolved. I'm trying to write a job that updates the dataJobInfo aspect of a data job instead of overwriting it, so I need to access this aspect. Any help would be appreciated! cc @big-carpet-38439 who looked at this issue before
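    For the update-without-overwrite part, the usual pattern is read-modify-write, since an UPSERT replaces the whole aspect; a sketch with a placeholder job URN and server address, reusing the same get_aspect_v2 call:
    ```python
    # Fetch dataJobInfo, merge in a change, and emit the combined aspect.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import DataJobInfoClass

    JOB_URN = "urn:li:dataJob:(urn:li:dataFlow:(airflow,my_dag,prod),my_task)"

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    info = graph.get_aspect_v2(
        entity_urn=JOB_URN, aspect="dataJobInfo", aspect_type=DataJobInfoClass
    )
    if info is not None:
        info.customProperties = {**(info.customProperties or {}), "team": "data-eng"}
        graph.emit_mcp(MetadataChangeProposalWrapper(entityUrn=JOB_URN, aspect=info))
    ```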
  • rapid-hamburger-95729 (02/03/2023, 4:50 PM)
    hello! posting this here just in case it's a better place to grab a hand if poss! 🙂 https://datahubspace.slack.com/archives/CV2UVAPPG/p1675351168523089
  • gentle-lifeguard-88494 (02/04/2023, 6:16 PM)
    Tried to run datahub ingest list-runs and got the following error. Any ideas on how to solve this? Thanks
  • cuddly-ram-44320 (02/05/2023, 11:19 AM)
    Hi Team, we're trying to get our Redshift database visible with the following columns (see attached pic from the demo environment). Can you give us a hint on how to make this work?