# troubleshoot
  • j

    jolly-traffic-67085

    10/20/2022, 10:13 AM
    Hi everyone! I have a question: how can I change this env name to a new env? Thank you for your answer.
  • a

    astonishing-dusk-99990

    10/20/2022, 10:19 AM
    Hi everyone! Has anyone tried backup and restore in DataHub? I did the backup and copied it to AWS S3. After that I created a new instance, copied backup.sql to it, and ran datahub docker restore, but it always fails and gives no detailed information. Can someone help me? *Note: the 1st picture is my backup.sql file and the 2nd picture is the failure when running that command.
  • n

    numerous-bird-32188

    10/20/2022, 4:31 PM
    Hi all. I've been searching for ages for a better way to do this, so I hope this is the right place to ask. We have DataHub running on EKS, and every time a pod restarts the local admin password seems to be overwritten with the default value again. I have followed these steps to reset it, but it keeps getting overwritten: https://datahubproject.io/docs/authentication/guides/add-users/#changing-the-default-datahub-user
  • g

    glamorous-lion-94745

    10/20/2022, 5:48 PM
    Hello everyone! I'm trying to deploy DataHub on AWS following the instructions here: https://datahubproject.io/docs/deploy/aws/ -> deployed the cluster and installed everything on my machine; https://datahubproject.io/docs/deploy/kubernetes/ -> created the secrets, added the repo, and tried to install the prerequisites. After trying to install the prerequisites, prerequisites-cp-schema-registry keeps going into CrashLoopBackOff and restarting by itself, and the other prerequisites stay in the "Pending" state. Can someone help me debug this? I'm not very familiar with Kubernetes. kubectl get pods screenshot:
  • b

    bland-teacher-2077

    10/20/2022, 8:23 PM
    Hi everyone, I'm working with the dynamic filtering and noticed that the results are limited to 20 (i.e., 20 tags, 20 glossary terms, 20 domains, etc.). Is this configurable?
  • a

    adamant-telephone-51921

    10/21/2022, 4:28 AM
    Hello everyone, I am new to DataHub and was working on integrating it with an Airflow DAG. The example DAG completed successfully, but I cannot see the metadata in the DataHub UI. Could someone advise why that is? P.S. I did restart the Docker container for DataHub as well, but no luck :(
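For reference, one quick way to rule out connectivity problems between the Airflow worker and GMS is to emit a test event directly with the DataHub Python SDK. This is only a minimal sketch (the GMS URL, platform, and dataset name are placeholders), not the lineage backend setup itself:

```python
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

# hypothetical GMS address; use whatever your Airflow lineage backend points at
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
emitter.test_connection()  # raises if GMS is unreachable or misconfigured

# emit a trivial aspect so you can confirm something shows up in the UI
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=make_dataset_urn(platform="airflow", name="connectivity_check", env="DEV"),
    aspectName="datasetProperties",
    aspect=DatasetPropertiesClass(description="emitted to verify the DataHub connection"),
)
emitter.emit(mcp)
```

If this test dataset appears in the UI but the DAG metadata still does not, the problem is more likely in the lineage backend configuration than in connectivity.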
  • a

    adamant-telephone-51921

    10/21/2022, 7:11 AM
    Could anyone please take a look at the above question and respond?
  • g

    gifted-bird-57147

    10/21/2022, 9:14 AM
    Hi team, I have this Python code that uses the rest_emitter to assign glossary terms to datasets. The code itself works fine and the terms all show up on the dataset pages. However, on the glossary term pages under 'Related Entities', the tables show up when the term is all lowercase, but when the term is CamelCase the 'Related Entities' tab remains empty... (see screenshot: on the left-hand side, relations are created in both directions; on the right-hand side, the relation back to the dataset is empty). Does anybody know what's going on here?
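For reference, the emitter pattern for attaching a glossary term to a dataset looks roughly like the sketch below (server URL, platform, dataset, and term names are placeholders). Term URNs are case-sensitive, so a casing mismatch between the emitted URN and the term's actual URN could plausibly explain an empty 'Related Entities' tab, though that is only a guess:

```python
from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    ChangeTypeClass,
    GlossaryTermAssociationClass,
    GlossaryTermsClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder URL

dataset_urn = make_dataset_urn(platform="hive", name="db.my_table", env="PROD")
term_urn = make_term_urn("MyCamelCaseTerm")  # -> urn:li:glossaryTerm:MyCamelCaseTerm

# glossaryTerms aspect pointing at the term, attached to the dataset
terms = GlossaryTermsClass(
    terms=[GlossaryTermAssociationClass(urn=term_urn)],
    auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:ingestion"),
)

emitter.emit(
    MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="glossaryTerms",
        aspect=terms,
    )
)
```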
  • h

    happy-baker-8735

    10/21/2022, 11:44 AM
    Hi everyone, I've changed my computer and have to deploy DataHub locally on it, but I've run into trouble after installation (following https://datahubproject.io/docs/quickstart). When I run datahub version, I get this:
    ```
    Traceback (most recent call last):
    File "/home/moustlant/.local/bin/datahub", line 5, in <module>
    from datahub.entrypoints import main
    File "/home/moustlant/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 14, in <module>
    from datahub.cli.docker_cli import docker
    File "/home/moustlant/.local/lib/python3.8/site-packages/datahub/cli/docker_cli.py", line 523, in <module>
    def quickstart(
    File "/usr/lib/python3/dist-packages/click/decorators.py", line 173, in decorator
    _param_memo(f, OptionClass(param_decls, **option_attrs))
    File "/usr/lib/python3/dist-packages/click/core.py", line 1601, in __init__
    raise TypeError('Got secondary option for non boolean flag.')
    TypeError: Got secondary option for non boolean flag.
    ```
  • l

    lively-dusk-19162

    10/21/2022, 1:33 PM
    Hello everyone, I have been exploring DataHub for a while and have it running on Docker. I am trying to understand upstream and downstream lineage. Could you please help me understand what exactly those are?
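In short: an upstream dataset is one that your dataset is derived from, and a downstream dataset is one that is derived from your dataset. As an illustration, the sketch below (table names and server URL are made up) emits lineage saying orders_cleaned is built from raw_orders, which makes raw_orders the upstream and orders_cleaned the downstream:

```python
from datahub.emitter.mce_builder import make_dataset_urn, make_lineage_mce
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder URL

# hypothetical tables: orders_cleaned is produced from raw_orders
upstream = make_dataset_urn(platform="postgres", name="public.raw_orders", env="PROD")
downstream = make_dataset_urn(platform="postgres", name="public.orders_cleaned", env="PROD")

# raw_orders will show orders_cleaned as downstream lineage, and
# orders_cleaned will show raw_orders as upstream lineage
emitter.emit(make_lineage_mce(upstream_urns=[upstream], downstream_urn=downstream))
```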
  • m

    mysterious-advantage-78411

    10/21/2022, 1:52 PM
    Hi all, are there any suggestions for working with Tableau projects and sub-projects? It seems the current ingestion cannot work properly with sub-projects if they have the same names... any thoughts? Thanks.
  • g

    gentle-camera-33498

    10/21/2022, 7:20 PM
    Trying to get help here on this channel
  • q

    quiet-wolf-56299

    10/22/2022, 1:15 AM
    So I've made a few updates to the code, and when running ./gradlew build I run into a failed test case for Spark lineage in the metadata-io test cases. I never touched that code; is it possible that slipped through a recent PR?
  • q

    quiet-wolf-56299

    10/24/2022, 12:30 AM
    So I took my code out of the equation and made a fresh clone of the git repo. I built the frontend and GMS with Gradle and tried to launch with the dev script, and the app never fully launches. When trying to access the frontend, it returns a screen saying an error has occurred and logs an exception, which in the Docker logs looks like Play simply failing to route the request. As far as I can tell, broker, zookeeper, schema-registry, and actions are all not marked healthy when running docker ps -a, but they aren't if I run the quickstart script either. Yet quickstart launches fine, the overall Docker configs are basically identical (apart from the fact that the dev image uses environment files), and the logs and ps -a output are identical between quickstart and dev-without-neo4j.
  • q

    quiet-wolf-56299

    10/24/2022, 12:43 AM
    Oh, and the script never exits. It settles into a steady state of the Kafka broker reporting some standard log info about the imbalance ratio being 0.0 and will stay like that as long as you let it run, but the app is still unreachable. I assume this is just because Compose doesn't daemonize the containers by default?
  • q

    quiet-ice-47245

    10/24/2022, 8:09 AM
    Hello all, I'm using GraphQL to add a tag:
    ```graphql
    mutation addTags {
        addTags(input: { tagUrns: ["urn:li:tag:NEW_TAG"], resourceUrn: "urn:li:dataset:(DATASET_URN)",subResourceType: DATASET_FIELD, subResource: "COLUMN_NAME" })
    }
    ```
    but I'm getting this error:
    ```
    Failed to update urn:li:tag:NEW_TAG does not exist.
    ```
    How do I add NEW_TAG from GraphQL?
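The error suggests the tag entity has to exist before addTags can reference it. One way to create it first is to emit a tagProperties aspect via the Python SDK, as in the hedged sketch below (the server URL and description are placeholders); recent DataHub versions should also offer a createTag GraphQL mutation for the same purpose:

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, TagPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder URL

# create urn:li:tag:NEW_TAG so that the addTags mutation can reference it
emitter.emit(
    MetadataChangeProposalWrapper(
        entityType="tag",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:tag:NEW_TAG",
        aspectName="tagProperties",
        aspect=TagPropertiesClass(name="NEW_TAG", description="created via the SDK"),
    )
)
```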
  • s

    steep-laptop-41463

    10/24/2022, 9:52 AM
    Hi! :) Can you help me with validation? I set up Great Expectations following the instructions at https://docs.greatexpectations.io/docs/integrations/integration_datahub/, and in DataHub I see a message that no assertions have run.
  • f

    few-air-56117

    10/24/2022, 11:57 AM
    Hi folks, I tried to delete an entity using the OpenAPI (Swagger) /entities/v1 delete endpoint. The details like schema, documentation, properties, etc. are deleted for that entity, but I can still see the entity in the search bar.
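For reference, if an entity still appears in search, one thing to try is a soft delete, which sets the entity's status aspect to removed and hides it from search and the UI. A minimal sketch with the Python SDK (URN and server URL are placeholders):

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, StatusClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder URL

# mark the entity as removed so it no longer shows up in search results
emitter.emit(
    MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,db.my_table,PROD)",  # placeholder
        aspectName="status",
        aspect=StatusClass(removed=True),
    )
)
```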
  • g

    gentle-camera-33498

    10/24/2022, 1:49 PM
    Hello everybody. I'm having Airflow 2.4+ incompatibility issues with the backend lineage dependencies: Airflow requires SQLAlchemy 1.4+ while the DataHub plugins require SQLAlchemy 1.3.24. Is there any task mapped to solving these dependency issues?
  • f

    flat-match-62670

    10/24/2022, 11:55 PM
    Hi all. I have version 0.9.0 of DataHub deployed on Amazon EKS, but I am having some connection issues. I am attempting to ingest metadata from Snowflake, but when I put in my Snowflake info and hit "Test Connection" in the UI I get an endless loop. I attempted to manually execute an ingestion as well and received N/A instead of the job kicking off. I read in the UI guide that this is often due to the datahub-actions pod being down. I checked the error logs for the datahub-actions pod and am getting the following Kafka error about an "Unknown magic byte":
    ```
    2022/10/24 19:37:03 Waiting for: http://datahub-dev-datahub-gms:8080/health
    2022/10/24 19:37:03 Received 200 from http://datahub-dev-datahub-gms:8080/health
    No user action configurations found. Not starting user actions.
    [2022-10-24 19:37:04,202] INFO     {datahub_actions.cli.actions:68} - DataHub Actions version: unavailable (installed editable via git)
    [2022-10-24 19:37:04,333] INFO     {datahub_actions.cli.actions:98} - Action Pipeline with name 'ingestion_executor' is now running.
    Exception in thread Thread-1 (run_pipeline):
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 137, in poll
        value = self._value_deserializer(value, ctx)
      File "/usr/local/lib/python3.10/site-packages/confluent_kafka/schema_registry/avro.py", line 317, in __call__
        raise SerializationError("Unknown magic byte. This message was"
    confluent_kafka.serialization.SerializationError: Unknown magic byte. This message was not produced with a Confluent Schema Registry serializer
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
        self.run()
      File "/usr/local/lib/python3.10/threading.py", line 953, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 42, in run_pipeline
        pipeline.run()
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/pipeline/pipeline.py", line 161, in run
        for enveloped_event in enveloped_events:
      File "/usr/local/lib/python3.10/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 152, in events
        msg = self.consumer.poll(timeout=2.0)
      File "/usr/local/lib/python3.10/site-packages/confluent_kafka/deserializing_consumer.py", line 139, in poll
        raise ValueDeserializationError(exception=se, kafka_message=msg)
    confluent_kafka.error.ValueDeserializationError: KafkaError{code=_VALUE_DESERIALIZATION,val=-159,str="Unknown magic byte. This message was not produced with a Confluent Schema Registry serializer"}
    %4|1666640260.315|MAXPOLL|rdkafka#consumer-1| [thrd:main]: Application maximum poll interval (10000ms) exceeded by 336ms (adjust max.poll.interval.ms for long-running message processing): leaving group
    ```
    Has anyone seen this before, or does anyone have advice on what I can troubleshoot to get DataHub ingesting from Snowflake properly? Any help is much appreciated!
  • m

    melodic-printer-96412

    10/25/2022, 8:26 AM
    Hi there, I've tried to set up the DataHub lineage backend for my own Airflow cluster running version 2.4.x (Python 3.7). When I run pip install acryl-datahub-airflow-plugin to integrate DataHub with Airflow, there are conflicts between this package's dependencies and the Airflow constraints (https://raw.githubusercontent.com/apache/airflow/constraints-2.4.0/constraints-3.7.txt). Can anyone explain or help me with this issue? Does DataHub currently not support Airflow 2.4.x?
  • m

    microscopic-mechanic-13766

    10/25/2022, 3:16 PM
    Hi, so I have just done an ingestion from PostgreSQL with profiling enabled, but the Stats tab isn't enabled. I don't know why, as the profiling completed successfully ('entities_profiled': '23'). My PostgreSQL recipe:
    ```yaml
    sink:
        type: datahub-rest
        config:
            server: 'http://datahub-gms:8080'
    source:
        type: postgres
        config:
            database: luca
            password: '${POSTGRES_PASSWORD}'
            profiling:
                enabled: true
            host_port: 'postgresql-luca:5432'
            username: postgres
    ```
    I have to say that in previous ingestions done with this recipe the stats of the tables were obtained and shown without a problem, but I don't know why they aren't shown now. I am currently using v0.8.45.
    exec-urn_li_dataHubExecutionRequest_6cf856cb-1515-45e6-a2c3-ad77cab21727.log
  • a

    astonishing-kite-41577

    10/25/2022, 3:51 PM
    Hi, I'm seeing some strange behavior with transformers. Please see recipe in thread. Version: 0.9.0
    • If I try to add all 5 other transformers at one time, it DOES NOT WORK
    • If I add it by itself, it WORKS
    • If I add it first, by itself, then add the other 4 transformers after that and process again, it WORKS
    • If I add all 5 transformers to start, then remove them one-by-one, with just two other transformers remaining, it WORKS
  • c

    careful-france-26343

    10/25/2022, 5:46 PM
    Hi, I'm having some trouble installing DataHub. When I try to run python3 -m datahub version, I get the following error:
    ```
    Traceback (most recent call last):
      File "/usr/lib/python3.8/runpy.py", line 192, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.8/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/ubuntu/.local/lib/python3.8/site-packages/datahub/__main__.py", line 1, in <module>
        from datahub.entrypoints import main
      File "/home/ubuntu/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 13, in <module>
        from datahub.cli.delete_cli import delete
      File "/home/ubuntu/.local/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 123, in <module>
        type=click.DateTime(),
    AttributeError: module 'click' has no attribute 'DateTime'
    ```
  • b

    best-pilot-37106

    10/25/2022, 6:18 PM
    Hello! I have been having issues accessing glossaryTerms on fields using GraphQL queries. I have been using the search query, and it returns everything as expected except the glossaryTerms on a field, which come back null. Has anyone seen this issue before? Query and schema pictures are in the thread:
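One thing that may be worth checking (a guess, not a confirmed cause): terms attached to fields through the UI are stored on the editableSchemaMetadata aspect rather than schemaMetadata, so a query that only requests schemaMetadata can come back with null glossaryTerms. Below is a sketch that asks for both, using a placeholder frontend URL and dataset URN:

```python
import requests

# placeholders: adjust to your deployment; add an "Authorization: Bearer <token>"
# header if metadata service authentication is enabled
GRAPHQL_URL = "http://localhost:9002/api/graphql"
DATASET_URN = "urn:li:dataset:(urn:li:dataPlatform:hive,db.my_table,PROD)"

# request field-level terms from both schemaMetadata (ingested) and
# editableSchemaMetadata (added via the UI)
query = """
query fieldTerms($urn: String!) {
  dataset(urn: $urn) {
    schemaMetadata {
      fields {
        fieldPath
        glossaryTerms { terms { term { urn } } }
      }
    }
    editableSchemaMetadata {
      editableSchemaFieldInfo {
        fieldPath
        glossaryTerms { terms { term { urn } } }
      }
    }
  }
}
"""

resp = requests.post(GRAPHQL_URL, json={"query": query, "variables": {"urn": DATASET_URN}})
resp.raise_for_status()
print(resp.json())
```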
  • j

    jolly-tent-99362

    10/27/2022, 4:50 AM
    Hi all, please help me here. I am trying to profile a BigQuery table from an Airflow Composer environment with the following config:
    ```python
    complete_json = {
            "source": {
                "type": "bigquery",
                "config": {
                    "project_id": "",
                    "credential": cred_json,
                    "include_views": "true",
                    "include_tables": "true",
                    "include_table_lineage": "true",
                    "upstream_lineage_in_report": "true",
                    "schema_pattern": {
                        "ignoreCase": "true",
                        "allow": ["^webengage_mum$"]
                    },
                    "table_pattern": {
                        "ignoreCase": "true",
                        "deny": ["^.*\.temp_.*"]
                    },
                    "profile_pattern": {
                        "allow": ["^.*\.application.*"]
                    },
                    "stateful_ingestion": {
                        "enabled": "true",
                        "remove_stale_metadata": "true",
                        "state_provider": {
                            "type": "datahub",
                            "config": {
                                "datahub_api": {
                                    "server": datahub_gms_url,
                                    "token": datahub_gms_token
                                }
                            }
                        }
                    },
                    "profiling": {
                        "enabled": "true",
                        "bigquery_temp_table_schema": ".datahub",
                        "turn_off_expensive_profiling_metrics": "true",
                        "query_combiner_enabled": "false",
                        "max_number_of_fields_to_profile": 1000,
                        "profile_table_level_only": "true",
                        "include_field_null_count": "true",
                        "include_field_min_value": "true",
                        "include_field_max_value": "true",
                        "include_field_mean_value": "true",
                        "include_field_median_value": "true",
                        "include_field_stddev_value": "true",
                        "include_field_quantiles": "true",
                        "include_field_distinct_value_frequencies": "true",
                        "include_field_histogram": "true",
                        "include_field_sample_values": "true"
                    }
                },
    
            },
            "pipeline_name": "biquery_profiling_tables",
            "sink": {
                "type": "datahub-kafka",
                "config": {
                    "connection": {
                        "bootstrap": bootstrap_url,
                        "schema_registry_url": schema_registry_url,
                    },
                },
            },
        }
    ```
    The job runs for some time and then fails with the following error:
    ```
    [2022-10-26, 05:26:34 UTC] {ge_data_profiler.py:918} ERROR - Encountered exception while profiling <dataset>.<tableName>
    Traceback (most recent call last):
      File "/opt/python3.8/lib/python3.8/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 892, in _generate_single_profile
        batch = self._get_ge_dataset(
      File "/opt/python3.8/lib/python3.8/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 951, in _get_ge_dataset
        batch = ge_context.data_context.get_batch(
      File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 1642, in get_batch
        return self._get_batch_v2(
      File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 1336, in _get_batch_v2
        datasource = self.get_datasource(batch_kwargs.get("datasource"))
      File "/opt/python3.8/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 2062, in get_datasource
        raise ValueError(
    ValueError: Unable to load datasource `my_sqlalchemy_datasource-548b19eb-6db0-4fa2-8673-0e62306a3c7d` -- no configuration found or invalid configuration.
    [2022-10-26, 05:26:35 UTC] {ge_data_profiler.py:773} INFO - Profiling 1 table(s) finished in 2.387 seconds
    ```
    Can someone help please?
  • k

    kind-scientist-44426

    10/27/2022, 7:19 AM
    Hi all, I was trying to integrate Spark with DataHub by creating a Spark session in a notebook. While running a Spark read I'm getting the error below. Can someone help with it?
    ```
    daasDf = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load("filename")
    
    ERROR DatasetExtractor: class org.apache.spark.sql.catalyst.plans.logical.GlobalLimit is not supported yet. Please contact datahub team for further support.
    ```
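For context, a Spark session wired up to the DataHub lineage listener looks roughly like the sketch below. The jar coordinates, listener class, and server URL follow the Spark integration docs, but treat the exact version and addresses as placeholders for your own setup:

```python
from pyspark.sql import SparkSession

# sketch of enabling the DataHub Spark lineage agent in a notebook session;
# package version and GMS URL are placeholders
spark = (
    SparkSession.builder
    .appName("datahub-spark-lineage-demo")
    .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.45")
    .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
    .config("spark.datahub.rest.server", "http://localhost:8080")
    .getOrCreate()
)

# the listener reports lineage for query plans it understands; unsupported plan
# nodes (like the GlobalLimit above) are logged as errors by the extractor
df = spark.read.format("csv").option("header", "true").load("s3://bucket/path/file.csv")
```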
  • s

    salmon-rose-54694

    10/27/2022, 8:18 AM
    Hi all, I upgraded the code to the Oct. 25 version and found that the validation/assertion run events are all gone. [pic 1] So I debugged and found: 1. When I open the Chrome developer tools and debug useGetDatasetAssertionsQuery, the returned data parameter does not have a runEvents object (empty list) [see pic 2]. 2. But when I run the query in the GraphQL UI, it does return something in runEvents. [see pics 3 and 4] My question is: is there a difference between the query run by the UI and the one run in GraphQL?
  • c

    curved-apple-55756

    10/27/2022, 8:40 AM
    Hi all. I have version 0.9.0 of DataHub deployed on RHEL9, but I am having some connection issues. Below are the issues (quickstart was unable to run):
    - kafka-setup is still running
    - schema-registry is not running
    - broker is not running
    - zookeeper is not running
    - datahub-gms is still starting
    - mysql-setup did not exit cleanly
    Version details: DataHub CLI version: 0.9.0.4; Python version: 3.9.10 (main, Feb 9 2022, 00:00:00) [GCC 11.2.1 20220127 (Red Hat 11.2.1-9)]
    It seems that there are some connection errors:
    (...)
    datahub-datahub-actions-1 | 2022/10/27 06:51:00 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.5:8080: connect: connection refused. Sleeping 1s
    datahub-datahub-actions-1 | 2022/10/27 06:51:01 Problem with request: Get "http://datahub-gms:8080/health": dial tcp 172.18.0.5:8080: connect: connection refused. Sleeping 1s
    (...)
    elasticsearch-setup | 2022/10/27 07:05:22 Problem with request: Get http://elasticsearch:9200: dial tcp 172.18.0.2:9200: connect: connection refused. Sleeping 1s
    (...)
    datahub-gms | 2022/10/27 06:49:26 Problem with dial: dial tcp: lookup broker on 127.0.0.11:53: server misbehaving. Sleeping 1s
    (...)
    datahub-gms | 2022/10/27 06:49:27 Problem with dial: dial tcp 172.18.0.9:9092: connect: connection refused. Sleeping 1s
    (...)
    datahub-frontend-react | sasl.kerberos.service.name = null
    datahub-gms | 2022/10/27 06:50:55 Problem with dial: dial tcp 172.18.0.11:9092: connect: connection refused. Sleeping 1s
    (...)
    Has anyone seen this before or does anyone have advice on what I can troubleshoot to get DataHub running properly? Any help is much appreciated!
  • h

    handsome-football-66174

    10/27/2022, 2:18 PM
    Hi everyone, I'm currently using DataHub 0.8.45. Is there a way to search for entities with a specific browse path using GraphQL queries?