# troubleshoot

    best-eve-12546

    03/02/2023, 4:20 PM
Hi! I’ve been working on an OpenAPI integration on v0.9.5. Looking at the raw OpenAPI JSON Schema, I’ve noticed some OneOf schemas which seem to be missing the OneOf:
    "OneOfSchemaMetadataPlatformSchema":
    {
        "required":
        [
            "__type"
        ],
        "type": "object",
        "properties":
        {
            "__type":
            {
                "type": "string"
            }
        },
        "description": "The native schema in the dataset's platform.",
        "discriminator":
        {
            "propertyName": "__type"
        }
    },
It seems like this is missing a OneOf — I see in the Golden Test Data that this is actually a nested structure, with no “__type” field. I see the same for a bunch of other OneOf types like OneOfSchemaFieldDataTypeType. I checked out the YAML schema and it’s the same. Am I supposed to be instantiating these somehow, or should I be sending raw JSON in “__type”, or what?
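If the generated schema behaves like a standard OpenAPI discriminator, then `__type` is what selects the concrete subtype, and the subtype's own fields are inlined next to it rather than nested. A hedged sketch of what a platformSchema value might look like (the subtype name `MySqlDDL` and its `tableSchema` field come from DataHub's PDL models; treat the exact shape as an assumption to verify against the golden test data):

```json
{
  "platformSchema": {
    "__type": "MySqlDDL",
    "tableSchema": "CREATE TABLE foo (id BIGINT, name VARCHAR(50))"
  }
}
```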

    nice-match-35259

    03/02/2023, 4:23 PM
Hello all! I am facing some issues adding a validation layer to DataHub with Great Expectations. I added the following in the checkpoint config:
    yaml_config = f"""
    name: {my_checkpoint_name}
    config_version: 1.0
    class_name: SimpleCheckpoint
    run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
    validations:
      - batch_request:
          datasource_name: bigquery_datasource
          data_connector_name: default_inferred_data_connector_name
          data_asset_name: raw_prod.applicative_database_deposit
          data_connector_query:
              index: -1
        expectation_suite_name: suite_deposit_test2
        action_list:
          - name: datahub_action
            action:
                module_name: datahub.integrations.great_expectations.action
                class_name: DataHubValidationAction
                server_url: {server_url}
                token: {gms_token}
                extra_headers:
                    - Proxy-Authorization: Bearer {iap_token}
    """
The checkpoint runs correctly, but the metadata is not sent to the DataHub GMS. The error I got is in the attached .png. Has anybody faced the same problem?
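One thing worth double-checking (an assumption, since the attached .png isn't visible here): `extra_headers` on `DataHubValidationAction` is a mapping of header name to value, so the YAML list syntax (`- Proxy-Authorization: ...`) may parse into an unexpected structure. A sketch of the action block with `extra_headers` as a plain mapping:

```yaml
action_list:
  - name: datahub_action
    action:
        module_name: datahub.integrations.great_expectations.action
        class_name: DataHubValidationAction
        server_url: {server_url}
        token: {gms_token}
        extra_headers:
            Proxy-Authorization: Bearer {iap_token}
```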

    nutritious-bird-77396

    03/02/2023, 6:02 PM
    Looking to get some thoughts to unit test this PR - https://github.com/datahub-project/datahub/pull/7476

    cuddly-butcher-39945

    03/02/2023, 10:01 PM
    Anyone out there who could tell me what's wrong with this? I am trying to get a list of all the AD groups I ingested into Datahub after setting up SSO, I want to get a list of Groups in a query, then create a mutation to delete them.
    query ListCorpGroups {
      search(input: { type: corpgroup, query: "*"}) {
        total
        count
        searchResults {
          entity {
            urn
            type
            ... on corpgroup {
              properties {
                name
              }
            }
          }
        }
      }
    }
    Getting the following error:
    {
      "errors": [
        {
          "message": "Validation error (WrongType@[search]) : argument 'input.type' with value 'EnumValue{name='corpgroup'}' is not a valid 'EntityType' - Expected enum literal value not in allowable values -  'EnumValue{name='corpgroup'}'.",
          "locations": [
            {
              "line": 2,
              "column": 10
            }
          ],
          "extensions": {
            "classification": "ValidationError"
          }
        },
        {
          "message": "Validation error (UnknownType@[search/searchResults/entity]) : Unknown type 'corpgroup'",
          "locations": [
            {
              "line": 9,
              "column": 16
            }
          ],
          "extensions": {
            "classification": "ValidationError"
          }
        }
      ],
      "data": null,
      "extensions": {}
    }
Thanks in advance!
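Both validation errors point at casing: `EntityType` enum values are upper snake case and GraphQL type names are CamelCase. A hedged rewrite of the query along those lines (assuming `CORP_GROUP` / `CorpGroup` and a `displayName` property, per the DataHub GraphQL schema):

```graphql
query ListCorpGroups {
  search(input: { type: CORP_GROUP, query: "*" }) {
    total
    count
    searchResults {
      entity {
        urn
        type
        ... on CorpGroup {
          properties {
            displayName
          }
        }
      }
    }
  }
}
```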

    numerous-account-62719

    03/03/2023, 4:33 AM
Hi Team, I am working on dataset-to-dataset lineage. It supports one dataset as input and one as output, but I have 2 inputs and 1 output. How do I handle this? Please help me out here.
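The upstreams of a dataset are modeled as a list in the `upstreamLineage` aspect, so two (or more) inputs feeding one output is a single aspect attached to the output dataset. A minimal stdlib-only sketch that builds such a proposal and posts it to GMS (the platform, dataset names, and GMS URL are placeholders; the DataHub Python SDK's `UpstreamLineageClass` plus `MetadataChangeProposalWrapper` is the more idiomatic route):

```python
import json
from urllib import request


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# upstreamLineage is a list-valued aspect on the *output* dataset,
# so two inputs are just two entries in the "upstreams" array.
upstream_lineage = {
    "upstreams": [
        {"dataset": dataset_urn("hive", "db.input_a"), "type": "TRANSFORMED"},
        {"dataset": dataset_urn("hive", "db.input_b"), "type": "TRANSFORMED"},
    ]
}

proposal = {
    "entityType": "dataset",
    "entityUrn": dataset_urn("hive", "db.output"),
    "changeType": "UPSERT",
    "aspectName": "upstreamLineage",
    "aspect": {
        "contentType": "application/json",
        "value": json.dumps(upstream_lineage),
    },
}


def emit(gms_url: str = "http://localhost:8080") -> None:
    """POST the proposal to the GMS ingestProposal endpoint (URL is a placeholder)."""
    req = request.Request(
        f"{gms_url}/aspects?action=ingestProposal",
        data=json.dumps({"proposal": proposal}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-RestLi-Protocol-Version": "2.0.0",
        },
    )
    request.urlopen(req)
```

Note that UPSERT replaces the whole aspect, so include every upstream each time you emit.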

    microscopic-room-90690

    03/03/2023, 8:49 AM
Hello team, I am trying to use classification for Snowflake and get this error:
Failed to classify table columns
cli_version and gms_version are both '0.9.6.1'. It seems everything is OK except classification. What should I do? Any help will be appreciated. Thank you!

    shy-dog-84302

    03/03/2023, 2:02 PM
Hi! I have noticed the following exception in (🧵) DataHub Metadata Service. A little digging reveals a possible bug here with seeking to negative offsets when the current offset on a partition is 0. I have configured all my backend Kafka topics with 3 partitions. Has anyone else experienced a similar error?

    best-umbrella-88325

    03/06/2023, 7:41 AM
Hello community! We've been trying to install the latest 0.10.0 version of DataHub and we think there is a bug. Please correct us if we are wrong. We currently have the metadata_service_authentication flag enabled in values.yaml in the Helm installation. We are now moving to 0.10.0 using chart version 0.2.154. When metadata_service_authentication is true, the system-update job fails with CreateConfigError since the 'datahub-auth-secrets' secret doesn't get created. On the other hand, if we set metadata_service_authentication to false, the system-update job passes and the secret is also created successfully. Maybe an issue with the Helm templates which create the secret. We're unsure about the root cause, but this could be a potential problem. Please let me know if this is the desired behavior or if we are missing something. Thanks in advance.

    gifted-diamond-19544

    03/06/2023, 12:13 PM
Hello all. Sorry for crossposting, not sure whether to post here or in #all-things-deployment. We are having trouble running the update container. More details in the link below: https://datahubspace.slack.com/archives/CV2UVAPPG/p1678104384250809

    future-dog-77968

    03/06/2023, 6:17 PM
hey y’all! We ran into a DataHub issue where we can’t select text from a search result unless you very carefully start from before/after the text itself.
• We’re on DataHub 0.10, and this video recording is from the DataHub demo instance.
• Is it fair to say that this is a bug, or is it by design?
◦ If the former, we couldn’t find any GitHub issues and we’re happy to open one (and even take a stab at fixing it!)
    Screen Recording 2023-03-06 at 1.17.16 PM.mov

    handsome-football-66174

    03/06/2023, 9:25 PM
Hi Team, trying to upgrade to the 0.9.3 version of DataHub, but getting the following error for the datahub-datahub-upgrade-job pod (we use a k8s deployment). Any suggestions?
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See <http://www.slf4j.org/codes.html#StaticLoggerBinder> for further details.
    ERROR SpringApplication Application run failed
     org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'upgradeCli': Unsatisfied dependency expressed through field 'noCodeUpgrade'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ebeanServer' defined in class path resource [com/linkedin/gms/factory/entity/EbeanServerFactory.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.ebean.EbeanServer]: Factory method 'createServer' threw exception; nested exception is java.lang.NullPointerException

    proud-soccer-58887

    03/07/2023, 6:08 AM
Hi Team, I have completed the integration between DataHub and Apache Ranger and am currently testing it. I have confirmed the platform-level privileges, but I'm not sure how to set policies for metadata privileges in Apache Ranger. Any examples?

    rich-pager-68736

    03/07/2023, 6:39 AM
Hi all, while trying to restore our indices from the DB to a fresh OpenSearch cluster, some messages could not be processed due to:
2023-02-27 15:06:41.598 ERROR 1 --- [ool-10-thread-1] c.l.m.dao.producer.KafkaHealthChecker    : Failed to emit MCL for entity urn:li:dataHubExecutionRequest:Snowflake-2023_02_20-09_43_34

org.apache.kafka.common.errors.RecordTooLargeException: The message is 1633361 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
I've already increased the allowed message size for the topic (max.message.bytes) and the Kafka cluster (replica.fetch.max.bytes). However, I cannot find any config parameter to adjust the producer's max.request.size, i.e., for datahub-upgrade. Same for the consumer side - how do I increase max.partition.fetch.bytes for the MCL consumer? Any help here?
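GMS, datahub-upgrade, and the standalone consumers are Spring Boot apps, so one hedged option is passing the Kafka client properties through Spring's relaxed env-var binding, the same pattern DataHub documents for SASL settings (`SPRING_KAFKA_PROPERTIES_...`). The exact variable names below are an assumption to verify against your chart version, and they would need to be set on each producing/consuming component, including the datahub-upgrade job:

```yaml
extraEnvs:
  # maps to spring.kafka.properties.max.request.size (producer side);
  # must stay <= the topic's max.message.bytes
  - name: SPRING_KAFKA_PROPERTIES_MAX_REQUEST_SIZE
    value: "5242880"
  # maps to spring.kafka.properties.max.partition.fetch.bytes (consumer side)
  - name: SPRING_KAFKA_PROPERTIES_MAX_PARTITION_FETCH_BYTES
    value: "5242880"
```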

    great-branch-515

    03/07/2023, 7:36 AM
Hi Team, the GMS service is not getting stable; we are getting repetitive warnings in the logs:
    2023-03-07 07:27:35,738 [ThreadPoolTaskExecutor-1] WARN  o.apache.kafka.clients.NetworkClient:1077 - [Consumer clientId=consumer-generic-duhe-consumer-job-client-1, groupId=generic-duhe-consumer-job-client] Error while fetching metadata with correlation id 2775 : {DataHubUpgradeHistory_v1=UNKNOWN_TOPIC_OR_PARTITION}
    2023-03-07 07:27:35,738 [ThreadPoolTaskExecutor-1] WARN  o.apache.kafka.clients.NetworkClient:1077 - [Consumer clientId=consumer-generic-duhe-consumer-job-client-1, groupId=generic-duhe-consumer-job-client] Error while fetching metadata with correlation id 2775 : {DataHubUpgradeHistory_v1=UNKNOWN_TOPIC_OR_PARTITION}
    2023-03-07 07:27:35,839 [ThreadPoolTaskExecutor-1] WARN  o.apache.kafka.clients.NetworkClient:1077 - [Consumer clientId=consumer-generic-duhe-consumer-job-client-1, groupId=generic-duhe-consumer-job-client] Error while fetching metadata with correlation id 2776 : {DataHubUpgradeHistory_v1=UNKNOWN_TOPIC_OR_PARTITION}
    2023-03-07 07:27:35,839 [ThreadPoolTaskExecutor-1] WARN  o.apache.kafka.clients.NetworkClient:1077 - [Consumer clientId=consumer-generic-duhe-consumer-job-client-1, groupId=generic-duhe-consumer-job-client] Error while fetching metadata with correlation id 2776 : {DataHubUpgradeHistory_v1=UNKNOWN_TOPIC_OR_PARTITION}
    2023-03-07 07:27:35,940 [ThreadPoolTaskExecutor-1] WARN  o.apache.kafka.clients.NetworkClient:1077 - [Consumer clientId=consumer-generic-duhe-consumer-job-client-1, groupId=generic-duhe-consumer-job-client] Error while fetching metadata with correlation id 2777 : {DataHubUpgradeHistory_v1=UNKNOWN_TOPIC_OR_PARTITION}
    2023-03-07 07:27:35,940 [ThreadPoolTaskExecutor-1] WARN  o.apache.kafka.clients.NetworkClient:1077 - [Consumer clientId=consumer-generic-duhe-consumer-job-client-1, groupId=generic-duhe-consumer-job-client] Error while fetching metadata with correlation id 2777 : {DataHubUpgradeHistory_v1=UNKNOWN_TOPIC_OR_PARTITION}
When we try to log in on the frontend we get the error:
    Failed to perform post authentication steps. Error message: Failed to provision user with urn
which is caused by:
    java.lang.RuntimeException: Failed to provision user with urn urn:li:corpuser:atul.atri@chegg.com.
    Any ideas?

    busy-analyst-35820

    03/07/2023, 10:33 AM
Hi, we use v0.9.2 of DataHub. We set the expiry for the Bearer token to "never", but it still expired. The token is no longer working and it gives Forbidden. Is this "never" option for bearer token expiry valid? Even though we opt for the "never" option under expiry, the next screen shows it as in the screenshot given below. cc: @melodic-match-38516

    elegant-salesmen-99143

    03/07/2023, 11:34 AM
Hi. We can't figure out a working CLI command to get all dataset URNs from within a certain container. There is such functionality in the UI - the Download button in a container gives you a CSV file with info for datasets: URNs and metadata for them (like owners, tags, terms, domains). How do I get the same result using the CLI?
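In lieu of a dedicated CLI command, one hedged option is the GraphQL endpoint the UI itself uses: POST a search filtered on the container to `/api/graphql` with your access token. A sketch (the `container` filter field name and the placeholder URN are assumptions to verify against your version):

```graphql
query DatasetsInContainer {
  searchAcrossEntities(
    input: {
      types: [DATASET]
      query: "*"
      start: 0
      count: 1000
      orFilters: [
        { and: [{ field: "container", values: ["urn:li:container:<container-id>"] }] }
      ]
    }
  ) {
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
```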

    delightful-sugar-63810

    03/07/2023, 1:47 PM
Hey team 👋🏻 I am not sure if this is valuable feedback, but we observed that DataHub fails to return the impact analysis (transitive downstream consumers) of an entity if that entity has more than around 3K downstreams. The effect becomes more visible with entities with 5-7K downstream dependencies. I know these numbers seem very high, but I think it makes sense when you want to get the downstreams of a very core table that also feeds Looker (Looker has many entity types, such as dashboards). I don't think the infrastructure we serve DataHub on is the bottleneck here, but that is always a possibility.

    hallowed-shampoo-52722

    03/07/2023, 2:53 PM
Hi Team, I have an issue with an ingestion in the QA instance. It's been pending for 3 days. Other environments are working fine! I don't see any issues with the existing pods. Could you please help with how I can debug this?

    microscopic-application-63745

    03/07/2023, 2:56 PM
Hi team, I hope you are all doing great! I am working on DataHub 0.9.5 and I am trying to run an S3 Data Lake custom recipe. According to the documentation I can use the config property verify_ssl, but whenever I add it I get the following error:
    [2023-03-07 15:10:23,049] ERROR    {logger:26} - Please set env variable SPARK_VERSION
    [2023-03-07 15:10:23,543] ERROR    {datahub.ingestion.run.pipeline:127} - 1 validation error for DataLakeSourceConfig
    verify_ssl
      extra fields not permitted (type=value_error.extra)
Please note that without verify_ssl the recipe ingests just fine.
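For reference, the `extra fields not permitted` pydantic error means this `DataLakeSourceConfig` version simply has no `verify_ssl` field even though newer docs list it, so the fix is likely a CLI upgrade rather than a recipe change. A hedged sketch of the recipe once the field is supported (bucket path and region are placeholders):

```yaml
source:
  type: s3
  config:
    path_specs:
      - include: "s3://my-bucket/data/*/*.parquet"
    aws_config:
      aws_region: us-east-1
    # assumption: top-level verify_ssl per the docs; rejected on 0.9.5,
    # so upgrade acryl-datahub before enabling it
    verify_ssl: false
```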

    green-hamburger-3800

    03/07/2023, 4:32 PM
Hello folks! I wanted to create a policy to allow some users to edit everything about one specific entity type. Is that possible? I tried to use the resource part for it, but it wasn't possible to do it for MLMODEL. This was my query:
    mutation CreatePolicy($input: PolicyUpdateInput!) {
      createPolicy(input: $input)
    }
    This was my payload:
    {
      "input": {
        "type": "METADATA",
        "name": "MLP - Service Account",
        "state": "ACTIVE",
        "description": "Test",
        "privileges": [
          "EDIT_ENTITY_TAGS",
          "EDIT_ENTITY_GLOSSARY_TERMS",
          "EDIT_ENTITY_OWNERS",
          "EDIT_ENTITY_DOCS",
          "EDIT_ENTITY_DOC_LINKS",
          "EDIT_ENTITY_STATUS",
          "EDIT_DOMAINS_PRIVILEGE",
          "EDIT_DEPRECATION_PRIVILEGE",
          "EDIT_ENTITY",
          "EDIT_DATASET_COL_DESCRIPTION",
          "EDIT_DATASET_COL_TAGS",
          "EDIT_DATASET_COL_GLOSSARY_TERMS",
          "EDIT_ENTITY_ASSERTIONS",
          "EDIT_LINEAGE",
          "EDIT_ENTITY_EMBED",
          "EDIT_TAG_COLOR"
        ],
        "actors": {
          "users": [
            "urn:li:corpuser:mlp_user"
          ],
          "allUsers": false,
          "allGroups": false,
          "resourceOwners": false
        },
        "resources": {
          "type": "MLMODEL",
          "allResources": true
        }
      }
    }

    important-processor-44077

    03/07/2023, 11:37 PM
@astonishing-answer-96712 added a workflow to this channel: *Community Support Bot*.

    best-wire-59738

    03/08/2023, 11:44 AM
Hi Team, we are facing a Kafka client re-balancing issue. At this point our UI is also frozen, as the consumer is going in a re-balancing loop and not consuming offsets, and the offset lag keeps increasing as ingestion pulls in more info. Upon debugging we found that DataHub uses the MetadataChangeLog_Versioned_v1 topic both for all the changes made to the metadata graph via the UI and when using the Kafka sink for ingestion. For this reason our UI stays frozen until the consumer (generic-mae-consumer-job-client) reads all the partitions from the topic, as the change made in the UI is also somewhere in the queue in the Kafka topic. 1. Can we use a separate topic for all the changes made via the UI, so that our UI is free from the freezing issue? 2. Also, how can we get out of the group re-balancing issue and speed up our ingestion, since Kafka is asynchronous and the MCE consumers are slow in reading the offsets? We have yet to create standalone MCE and MAE consumers. We hope that increases the speed of ingestion, but we have yet to find a solution for the re-balancing issue.
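On the standalone-consumer point: the Helm chart can split MAE/MCE consumption out of GMS, and within one consumer group parallelism is capped by the topic's partition count, so scaling replicas past the partitions of MetadataChangeLog_Versioned_v1 won't help. A hedged values.yaml sketch (the flag names are assumptions to check against your chart version):

```yaml
global:
  datahub_standalone_consumers_enabled: true

datahub-mae-consumer:
  enabled: true
  replicaCount: 2   # at most the partition count of the MCL topic

datahub-mce-consumer:
  enabled: true
  replicaCount: 2
```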

    gentle-camera-33498

    03/08/2023, 12:47 PM
Hello everyone, I'm having problems with my 0.10.0 deployment. Context: before updating the version, I decided to soft-delete datasets, charts and dashboards. With this, I could delete all entities and force reingestion to ingest new ones. Problem: I'm receiving a lot of exception messages from the BulkListener, like the one below:
    [I/O dispatcher 2] ERROR c.l.m.s.e.update.BulkListener:44 - Failed to feed bulk request. Number of events: 5 Took time ms: -1 Message: failure in bulk execution:
    [1]: index [datasetindex_v2_1678278613797], type [_doc], id [urn...], message [[datasetindex_v2_1678278613797/YrBRraPeT6OLr7JvUNdy6A][[datasetindex_v2_1678278613797][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn...]: document missing]]]
I'm unsure if it is the cause, but I do not see any datasets in the UI. NOTE: yes, I tried to run the restoreIndices job, but nothing changed.

    nice-river-27843

    03/08/2023, 1:47 PM
Hey, I am trying to connect OIDC using Azure and Keycloak. After setting everything up, I am redirected to the Azure login page, which finishes successfully (according to Azure), but when redirecting back to my local frontend it looks like it fails and retries several times, and in the frontend logs I see:
    2023-03-08 13:37:28,465 [application-akka.actor.default-dispatcher-13] ERROR o.p.core.engine.DefaultCallbackLogic - Unable to renew the session. The session store may not support this feature

    acceptable-evening-60358

    03/08/2023, 2:58 PM
    Unable to run quickstart - the following issues were detected: - quickstart.sh or dev.sh is not running If you think something went wrong, please file an issue at https://github.com/datahub-project/datahub/issues or send a message in our Slack https://slack.datahubproject.io/ Be sure to attach the logs from C:\Users\lozza\AppData\Local\Temp\tmpl_pih9ix.log

    acceptable-evening-60358

    03/08/2023, 2:59 PM
Hi all, newbie here. I attempted to deploy my first instance but hit this error; any support would be a great help!

    able-city-76673

    03/09/2023, 6:24 AM
Hello, we have deployed DataHub in Azure Kubernetes Service. We aren't able to configure ingress; we're getting a 404. Is there any document on deploying DataHub on Azure, or help with ingress configuration for Azure Application Gateway?

    agreeable-belgium-70840

    03/09/2023, 9:35 AM
Hello, I am trying to update DataHub from 0.9.5 to 0.10.0. I ran the system upgrade job, and now GMS is giving me this error:
    2023-03-09 09:29:44,122 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
    2023-03-09 09:30:23,729 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
    io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
    Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.lang.Thread.run(Thread.java:829)
    Any ideas?

    freezing-architect-85960

    03/09/2023, 9:58 AM
Hello team, I am trying to emit Airflow data to DataHub using the Kafka-based hook, but the Airflow task reports some errors. It looks like the producer was terminated by the task and did not have enough time to flush messages to Kafka:
%4|1678351593.679|TERMINATE|rdkafka#producer-2| [thrd:app]: Producer terminating with 23 messages (9368 bytes) still in queue or transit: use flush() to wait for outstanding message delivery
Any ideas about this? I didn't find the flush action in the datahub-airflow-plugin emit function.