# troubleshoot
  • salmon-area-51650

    06/06/2023, 9:57 AM
    Hi 👋! I’m running DataHub v0.9.6 and I’m getting the following error in `datahub-gms`:
    Caught exception while executing bootstrap step IngestRolesStep. Continuing...
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms com.amazonaws.services.schemaregistry.exception.AWSSchemaRegistryException: Exception occurred while fetching or registering schema definition = {"type":"record","name":"MetadataChangeLog","namespace":"com.linkedin.pegasus2avro.mxe","doc":"Kafka event for capturing update made to an entity's metadata.","fields":[{"name":"auditHeader","type":["null",{"type":
    ...
    ...
    :"An audit stamp detailing who and when the aspect was changed by. Required for all intents and purposes.","default":null}]}, schema name = MetadataChangeLog_Versioned_v1
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.amazonaws.services.schemaregistry.common.SchemaByDefinitionFetcher.getORRegisterSchemaVersionId(SchemaByDefinitionFetcher.java:99)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.amazonaws.services.schemaregistry.serializers.GlueSchemaRegistrySerializationFacade.getOrRegisterSchemaVersion(GlueSchemaRegistrySerializationFacade.java:86)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.amazonaws.services.schemaregistry.serializers.GlueSchemaRegistryKafkaSerializer.serialize(GlueSchemaRegistryKafkaSerializer.java:113)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:902)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:862)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.linkedin.metadata.dao.producer.KafkaEventProducer.produceMetadataChangeLog(KafkaEventProducer.java:114)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.linkedin.metadata.entity.EntityService.produceMetadataChangeLog(EntityService.java:1286)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.linkedin.metadata.entity.EntityService.produceMetadataChangeLog(EntityService.java:1311)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.linkedin.metadata.boot.steps.IngestRolesStep.ingestRole(IngestRolesStep.java:111)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.linkedin.metadata.boot.steps.IngestRolesStep.execute(IngestRolesStep.java:79)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.linkedin.metadata.boot.BootstrapManager.lambda$start$0(BootstrapManager.java:44)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at java.base/java.lang.Thread.run(Thread.java:829)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms Caused by: com.amazonaws.services.schemaregistry.exception.AWSSchemaRegistryException: Failed to get schemaVersionId by schema definition for schema name = MetadataChangeLog_Versioned_v1
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.amazonaws.services.schemaregistry.common.AWSSchemaRegistryClient.getSchemaVersionIdByDefinition(AWSSchemaRegistryClient.java:148)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.amazonaws.services.schemaregistry.common.SchemaByDefinitionFetcher$SchemaDefinitionToVersionCache.load(SchemaByDefinitionFetcher.java:110)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.amazonaws.services.schemaregistry.common.SchemaByDefinitionFetcher$SchemaDefinitionToVersionCache.load(SchemaByDefinitionFetcher.java:106)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4935)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.amazonaws.services.schemaregistry.common.SchemaByDefinitionFetcher.getORRegisterSchemaVersionId(SchemaByDefinitionFetcher.java:74)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	... 15 common frames omitted
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms Caused by: software.amazon.awssdk.services.glue.model.AccessDeniedException: User: arn:aws:sts::277977467804:assumed-role/eks-node-iam-role/XXXX is not authorized to perform: glue:GetSchemaByDefinition on resource: arn:aws:glue:us-east-1:XXXXXXX:registry/default-registry because no identity-based policy allows the glue:GetSchemaByDefinition action (Service: Glue, Status Code: 400, Request ID: 7df19e37-f1f8-40ab-9032-d24796a3972b)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
    ...
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	at com.amazonaws.services.schemaregistry.common.AWSSchemaRegistryClient.getSchemaVersionIdByDefinition(AWSSchemaRegistryClient.java:144)
    datahub-datahub-gms-5ff98bf459-st6pn datahub-gms 	... 25 common frames omitted
    It looks like it’s trying to access AWS Glue instead of the Schema Registry, even though I have set up the schema registry in the global variables. Any ideas 🙏?
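    The innermost "Caused by" is the real failure: the EKS node role is not allowed to call glue:GetSchemaByDefinition, and the GlueSchemaRegistryKafkaSerializer frames above it show that GMS is in fact configured with the AWS Glue flavor of the schema registry. If Glue is intended, an identity-based policy along these lines on the node role should unblock it. A minimal sketch: only glue:GetSchemaByDefinition is confirmed by the error; the companion actions for registering new schema versions are an assumption.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "glue:GetSchemaByDefinition",
            "glue:GetSchemaVersion",
            "glue:RegisterSchemaVersion",
            "glue:CreateSchema"
          ],
          "Resource": [
            "arn:aws:glue:us-east-1:<account-id>:registry/default-registry",
            "arn:aws:glue:us-east-1:<account-id>:schema/default-registry/*"
          ]
        }
      ]
    }
    If Glue is not intended, the serializer choice itself is the thing to fix: the schema registry settings in the global variables apparently did not reach GMS.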
  • boundless-student-48844

    06/06/2023, 10:12 AM
    Hi team, we encountered a case where a GraphQL request that usually runs very fast took more than 5 min to execute, based on OpenTelemetry traces. The first screenshot shows the usual scenario for this GraphQL request (less than 10 ms to execute); however, for the same GraphQL query, the request can time out at 30 s (refer to the second screenshot), and there’s a long period (> 5 min) of hiatus in the traces. Does anyone have any idea why this can happen? 🙇
  • brash-plumber-28960

    06/06/2023, 10:58 AM
    Hi team,
  • brash-plumber-28960

    06/06/2023, 11:12 AM
    we are using datahub 0.10.3 and deployed it on AWS EKS, but when we try to access it we are getting the error "Validation error of type FieldUndefined: Field 'globalViewsSettings' in type 'Query' is undefined @ 'globalViewsSettings' (code undefined)". I am using the helm values below for deploying DataHub; xxxxxxxxxxxxxxxx has the respective values.
    # Values to start up datahub after starting up the datahub-prerequisites chart with "prerequisites" release name
    # Copy this chart and change configuration as needed.
    datahub-gms:
      enabled: true
      image:
        repository: linkedin/datahub-gms
        tag: "v0.10.3"
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: alb
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/target-type: instance
          alb.ingress.kubernetes.io/subnets: xxxxxxxxxxxxxxxxxxxxxxxxx
          # alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:eu-west-1:xxxxxxxxxx:certificate/873f5724-3572-4b23-87b0-3bafe7cf4d54
          alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80},{"HTTPS": 443}]'
          # alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
        hosts:
          - host: datahub.mcsaatchiperformance.com
            redirectPaths:
              - path: /*
                name: ssl-redirect
                port: use-annotation
            paths:
              - /*
    datahub-frontend:
      enabled: true
      image:
        repository: linkedin/datahub-frontend-react
        tag: "v0.10.3"
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: alb
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/target-type: instance
          alb.ingress.kubernetes.io/subnets: xxxxxxxxxxxxxxxxxxxxxxx
          # alb.ingress.kubernetes.io/certificate-arn: <<certificate-arn>>
          alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
          # alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTP", "Port": "443", "StatusCode": "HTTP_301"}}'
        hosts:
          - host: datahub.mcsaatchiperformance.com
            redirectPaths:
              - path: /*
                name: ssl-redirect
                port: use-annotation
            paths:
              - /*
    acryl-datahub-actions:
      enabled: true
      image:
        repository: acryldata/datahub-actions
        tag: "v0.10.3"
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 300m
          memory: 256Mi
    datahub-mae-consumer:
      image:
        repository: linkedin/datahub-mae-consumer
        tag: "v0.10.3"
    datahub-mce-consumer:
      image:
        repository: linkedin/datahub-mce-consumer
        tag: "v0.10.3"
    datahub-ingestion-cron:
      enabled: true
      image:
        repository: acryldata/datahub-ingestion
        tag: "v0.10.3"
    elasticsearchSetupJob:
      enabled: true
      image:
        repository: linkedin/datahub-elasticsearch-setup
        tag: "v0.10.3"
      podSecurityContext:
        fsGroup: 1000
      securityContext:
        runAsUser: 1000
      podAnnotations: {}
      extraEnvs:
        - name: USE_AWS_ELASTICSEARCH
          value: "true"
    kafkaSetupJob:
      enabled: true
      image:
        repository: linkedin/datahub-kafka-setup
        tag: "v0.10.3"
      podSecurityContext:
        fsGroup: 1000
      securityContext:
        runAsUser: 1000
      podAnnotations: {}
    mysqlSetupJob:
      enabled: false
      image:
        repository: acryldata/datahub-mysql-setup
        tag: "v0.10.3"
      podSecurityContext:
        fsGroup: 1000
      securityContext:
        runAsUser: 1000
      podAnnotations: {}
    postgresqlSetupJob:
      enabled: true
      image:
        repository: acryldata/datahub-postgres-setup
        tag: "v0.10.3"
      podSecurityContext:
        fsGroup: 1000
      securityContext:
        runAsUser: 1000
      podAnnotations: {}
    datahubUpgrade:
      enabled: true
      image:
        repository: acryldata/datahub-upgrade
        tag: "v0.10.3"
      batchSize: 1000
      batchDelayMs: 100
      noCodeDataMigration:
        # sqlDbType: "MYSQL"
        sqlDbType: "POSTGRES"
      podSecurityContext: {} # fsGroup: 1000
      securityContext: {} # runAsUser: 1000
      podAnnotations: {}
      restoreIndices:
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 300m
            memory: 256Mi
    global:
      graph_service_impl: elasticsearch
      datahub_analytics_enabled: true
      datahub_standalone_consumers_enabled: false
      elasticsearch:
        host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        port: "443"
        useSSL: "true"
        auth:
          username: "xxxxxxxx"
          password:
            secretRef: elasticsearch-secrets
            secretKey: elasticsearch-password
      kafka:
        bootstrap:
          server: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        zookeeper:
          server: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        ## For AWS MSK set this to a number larger than 1
        # partitions: 3
        # replicationFactor: 3
        schemaregistry:
          url: "http://prerequisites-cp-schema-registry:8081"
      sql:
        datasource:
          ## Use below for usage of PostgreSQL instead of MySQL
          host: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
          hostForpostgresqlClient: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
          port: "5432"
          url: "jdbc:postgresql://xxxxxxxxxxxxxxxxxxxxxxxxxxx/datahub"
          driver: "org.postgresql.Driver"
          username: "xxxxxxxxxxxxxxxxxxx"
          password:
            secretRef: postgresql-secrets
            secretKey: postgresql-root-password
      datahub:
        gms:
          port: "8080"
          nodePort: "30001"
        monitoring:
          enablePrometheus: true
        mae_consumer:
          port: "9091"
          nodePort: "30002"
        appVersion: "1.0"
        encryptionKey:
          secretRef: "datahub-encryption-secrets"
          secretKey: "encryption_key_secret"
          # Set to false if you'd like to provide your own secret.
          provisionSecret:
            enabled: true
            autoGenerate: true
        managed_ingestion:
          enabled: true
          defaultCliVersion: "0.10.3"
        metadata_service_authentication:
          enabled: true
          systemClientId: "__datahub_system"
          systemClientSecret:
            secretRef: "datahub-auth-secrets"
            secretKey: "token_service_signing_key"
          tokenService:
            signingKey:
              secretRef: "datahub-auth-secrets"
              secretKey: "token_service_signing_key"
            salt:
              secretRef: "datahub-auth-secrets"
              secretKey: "token_service_salt"
          # Set to false if you'd like to provide your own auth secrets
          provisionSecrets:
            enabled: true
            autoGenerate: true
  • important-bear-9390

    06/06/2023, 12:20 PM
    Hello! One of our analysts deleted the most-used tag via the UI 🙃. We restored the RDS backup and re-ran restore-indices for tags. Now we can see the tag and the entities it’s associated with, but if we check the table, we can’t see the tag there. datahub v0.10.1, deployed in k8s via helm chart. Any tips on how to solve this?
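    One way to narrow this down is to check whether the tag’s aspects actually exist in the restored database, since the UI can keep serving the tag from the search index while the primary store lacks the row. A minimal SQL sketch against the default aspect table (the urn is a placeholder):
    SELECT urn, aspect, version, createdon
    FROM metadata_aspect_v2
    WHERE urn = 'urn:li:tag:<tag-name>'   -- placeholder urn
    ORDER BY createdon DESC;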
  • refined-continent-34966

    06/06/2023, 1:28 PM
    Hey all, we recently reinstalled datahub v0.9.6 on Kubernetes using neo4j as the graph service backend and postgres for the database, and we’re experiencing some strange behaviors. When we reinstalled, the PVCs for elasticsearch, postgres, and neo4j still existed, so the services reattached to them. In the UI, everything seemed to work fine with the existing data, but adding new entities/relationships did not work. For example, if I created a domain in the UI, it would "successfully" get created and I could navigate to the entity page and see the urn, but it would disappear from the UI on a page refresh. In the datahub GMS logs I noticed Elasticsearch `document_missing` errors, but if I queried the postgres db directly I could find that domain. If I added a dataset to that domain, I was also able to find it from that dataset profile under domains. Running the reindexing cron job seemed to have no effect, but then after a few days, entities that were added would suddenly appear in the UI. Has anyone experienced anything similar? I’m tempted to remove all data from the neo4j and elasticsearch volumes and then try the reindexing cron job again.
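    If the rows are present in postgres but the index is stale, kicking off restore-indices as an ad-hoc job (rather than waiting on the cron) is worth a try. A sketch assuming a helm release named datahub; the cronjob template name may differ in your install:
    kubectl create job --from=cronjob/datahub-datahub-restore-indices-job-template restore-indices-adhoc
    kubectl logs -f job/restore-indices-adhoc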
  • astonishing-answer-88639

    06/06/2023, 4:40 PM
    Hi, I am trying to install on an M1 Mac, following the instructions on the Quick Start page. I set Docker up with the recommended config. When I run `datahub docker quickstart`, it gives me the error below and then gets stuck. What do I need to do to get to the UI?
  • able-evening-90828

    06/06/2023, 7:23 PM
    During the investigation of a slow ES search query on the `datasetindex_v2` index, we discovered that all the slowness came from the `simple_query_string` with the `query_word_delimited` analyzer. If we removed this from the query, the query returned in 0.5 seconds; otherwise, it took more than 12 seconds. Is there any way to disable this particular `simple_query_string` via some settings? We are on 10.2.2.
    {
      "simple_query_string": {
        "query": "parquet",
        "fields": [
          "id.delimited^4.0",
          "editedFieldDescriptions.delimited^0.040000003",
          "fieldDescriptions.delimited^0.040000003",
          "name.delimited^4.0",
          "description.delimited^0.4",
          "fieldLabels.delimited^0.080000006",
          "urn.delimited^5.0",
          "fieldPaths.delimited^2.0",
          "qualifiedName.delimited^4.0",
          "editedDescription.delimited^0.4"
        ],
        "analyzer": "query_word_delimited",
        "flags": -1,
        "default_operator": "and",
        "analyze_wildcard": false,
        "auto_generate_synonyms_phrase_query": true,
        "fuzzy_prefix_length": 0,
        "fuzzy_max_expansions": 50,
        "fuzzy_transpositions": true,
        "boost": 1
      }
    }
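    One way to confirm that this clause dominates is Elasticsearch’s profile API, run directly against the index. A minimal sketch; the host and the trimmed-down field list are placeholders:
    curl -s -H 'Content-Type: application/json' 'http://<es-host>:9200/datasetindex_v2/_search' -d '
    {
      "profile": true,
      "size": 0,
      "query": {
        "simple_query_string": {
          "query": "parquet",
          "fields": ["name.delimited^4.0", "description.delimited^0.4"],
          "analyzer": "query_word_delimited",
          "default_operator": "and"
        }
      }
    }'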
  • microscopic-country-10588

    06/06/2023, 7:26 PM
    How can I use DataHub with Elasticsearch over plain HTTP with basic auth? I don’t understand: does it always require an SSL keystore/certstore? Why?
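    For reference, GMS reads its Elasticsearch connection from environment variables, and basic auth is configured independently of TLS, so a keystore should not be required for plain HTTP. A minimal sketch, assuming the variable names used in the standard datahub-gms env files (values are placeholders):
    ELASTICSEARCH_HOST=elasticsearch.internal
    ELASTICSEARCH_PORT=9200
    ELASTICSEARCH_USE_SSL=false
    ELASTICSEARCH_USERNAME=datahub
    ELASTICSEARCH_PASSWORD=<password>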
  • early-hydrogen-27542

    06/06/2023, 8:32 PM
    👋 `searchAcrossEntities` appears to slow down heavily past a certain pagination threshold. It gets progressively slower as I increase `start`. For instance, this is fast...
    {
      searchAcrossEntities(
        input: {types: DATASET, start: 0, count: 10, query: "*"}
      ) {
        total
      }
    }
    This is a bit slower...
    {
      searchAcrossEntities(
        input: {types: DATASET, start: 100, count: 10, query: "*"}
      ) {
        total
      }
    }
    This is slower still...
    {
      searchAcrossEntities(
        input: {types: DATASET, start: 250, count: 10, query: "*"}
      ) {
        total
      }
    }
    And this fails with a `503 Service Unavailable`...
    {
      searchAcrossEntities(
        input: {types: DATASET, start: 500, count: 10, query: "*"}
      ) {
        total
      }
    }
    Do you all have any ideas on speeding this up?
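    Deep offsets are expensive because Elasticsearch re-collects start + count hits for every page. If the goal is to walk the whole result set rather than jump to an arbitrary offset, scrollAcrossEntities avoids that cost; a minimal GraphQL sketch (trim the selection to your needs):
    {
      scrollAcrossEntities(
        input: {types: DATASET, count: 10, query: "*"}
      ) {
        nextScrollId
        searchResults {
          entity {
            urn
          }
        }
      }
    }
    Each following page passes the returned nextScrollId back in as input.scrollId.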
  • strong-potato-63475

    06/07/2023, 1:15 AM
    Running into an import error: "urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with OpenSSL 1.1.0h". Attempting to deploy on Docker; Python is up to date, and I checked my OpenSSL version, it is 1.1.1. This is on Windows 10. Any other spots to check?
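    The version that matters is the OpenSSL that the Python ssl module was compiled against, not the one installed on the system. Checking that, and pinning urllib3 below v2 as a workaround, is a reasonable first step; a sketch, not DataHub-specific guidance:
    python -c "import ssl; print(ssl.OPENSSL_VERSION)"
    pip install "urllib3<2"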
  • ripe-eye-60209

    06/07/2023, 9:32 AM
    Hello Team, we have an entity in our DataHub environment that has more than 5K upstream relations in its lineage. Trying to visualize the lineage results in a dramatic increase in database connections (from 15 to around 150), and the operation sometimes results in an HTTP 500 as well. Any thoughts on why this is happening and how we can optimize it?
  • loud-painting-41553

    06/07/2023, 1:12 PM
    How do I solve this?
  • swift-processor-45491

    06/07/2023, 2:25 PM
    Hi, team. I have a question related to the scrollAcrossEntities endpoint. I noticed that DataHub takes a long time to return the next page in some cases, so we need to retry and wait before we can be sure the pagination has finished, and sometimes that wait is really long. I was wondering if you have faced a similar issue in the past. Thanks!
  • victorious-monkey-86128

    06/07/2023, 2:30 PM
    Hi, I’m currently trying to develop on top of DataHub using this guide: https://datahubproject.io/docs/developers/. However, when I run `./gradlew quickstart`, it looks like it freezes at a container instantiation. How could I solve this problem?
  • mammoth-breakfast-21990

    06/07/2023, 5:50 PM
    Hi, I ran into a confluent schema registry <-> datahub data source ingestion error, could someone help? See details in 🧵
  • witty-journalist-16013

    06/07/2023, 8:36 PM
    On a fresh install, the datahub admin doesn't seem to have any privileges?
  • elegant-student-62491

    06/07/2023, 10:35 PM
    Hi, I’ve found some strange behavior involving MsSQL and dbt. I have a table "Inventory" in the MsSQL database, and I ingest it into DataHub successfully. After that I add a view to the database via dbt and then ingest dbt into DataHub. As a result, I see 3 records in DataHub related to MsSQL: one for my view (test.dbo.view_inventory) and two for my table (Inventory and test.dbo.inventory). Moreover, the lineage of "test.dbo.view_inventory" and "test.dbo.inventory" looks great. But from my perspective, the record "Inventory" (from the MsSQL ingest) should be merged with "test.dbo.inventory" (from the dbt ingest), so I should have only 2 records for MsSQL with perfect lineage 🙂 Can somebody explain: is this a bug, or am I doing something wrong?
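    Datasets only merge when both sources emit exactly the same urn (platform, name, and env, including case), so the dbt recipe’s target platform and naming have to line up with what the MsSQL source emits. A minimal sketch of the relevant dbt recipe fields; the paths are hypothetical:
    source:
      type: dbt
      config:
        manifest_path: ./target/manifest.json   # hypothetical path
        catalog_path: ./target/catalog.json     # hypothetical path
        target_platform: mssql                  # must match the warehouse source's platform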
  • brief-advantage-89816

    06/08/2023, 2:51 AM
    I am running the ingestion recipe for redshift in dev, and my source.config.env = dev. But for some reason my urn is getting PROD: urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.salesorder,PROD). I don’t have a clue where this PROD is coming from.
    {process_utils.py:187} INFO - Source (redshift) report:
     {process_utils.py:187} INFO - {'aspects': {'container': {'container': 3, 'containerProperties': 4, 'dataPlatformInstance': 4, 'status': 4, 'subTypes': 4},
     {process_utils.py:187} INFO -              'dataset': {'container': 95, 'datasetProfile': 1, 'datasetProperties': 95, 'schemaMetadata': 95, 'subTypes': 95}},
     {process_utils.py:187} INFO -  'entities': {'container': ['urn:li:container:e46efd1c881f4d6ee511bcb6024fdaf8',
     {process_utils.py:187} INFO -                             'urn:li:container:e99a7636015d37a29ddf5e05efeacf57',
     {process_utils.py:187} INFO -                             'urn:li:container:4ddad5b8ba6c86bf31cb5d757fe631e9',
     {process_utils.py:187} INFO -                             'urn:li:container:94782e3c226027a0cf9a9b12c5eddc1d'],
     {process_utils.py:187} INFO -               'dataset': ['urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.salesorder,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.vicidial_users,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.dim_contact,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.dim_lead_crm,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.envision_ssot,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.fct_lead_activity,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.fct_lead_opps_sale,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.fct_salesorder_activity,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.fct_salesorder_payment,PROD)',
     {process_utils.py:187} INFO -                           'urn:li:dataset:(urn:li:dataPlatform:redshift,dev.<schema_name>.fct_user_login,PROD)',
     {process_utils.py:187} INFO -                           '... sampled of 96 total elements']},
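    The last component of the urn comes from the recipe’s env, which is a FabricType and defaults to PROD when it is missing or not picked up. A minimal sketch of where it has to sit; the host and database are placeholders, and uppercase DEV is the safer spelling:
    source:
      type: redshift
      config:
        host_port: example.redshift.amazonaws.com:5439   # placeholder
        database: dev
        env: DEV   # must live under source.config; defaults to PROD otherwise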
  • strong-potato-63475

    06/08/2023, 3:44 AM
    Have nuked my docker image twice now; the datahub quickstart keeps erroring on the below:
    time="2023-06-07T23:38:18-04:00" level=warning msg="The \"HOME\" variable is not set. Defaulting to a blank string."
    time="2023-06-07T23:38:18-04:00" level=warning msg="The \"HOME\" variable is not set. Defaulting to a blank string."
    [+] Building 0.1s (0/0)
    [+] Running 17/17
     ✔️ Network datahub_network            Created   2.0s
     ✔️ Volume "datahub_broker"            Created   0.5s
     ✔️ Volume "datahub_esdata"            Created   0.3s
     ✔️ Volume "datahub_zkdata"            Created   0.5s
     ✔️ Volume "datahub_mysqldata"         Created   0.3s
     ✘ Container zookeeper                Error     135.4s
     ✘ Container mysql                    Error     126.1s
     ✔️ Container elasticsearch            Healthy   97.0s
     ✔️ Container mysql-setup              Created   15.0s
     ✔️ Container elasticsearch-setup      Started   75.8s
     ✔️ Container broker                   Created   12.7s
     ✔️ Container schema-registry          Created   7.6s
     ✔️ Container kafka-setup              Created   8.7s
     ✔️ Container datahub-upgrade          Created   4.9s
     ✔️ Container datahub-gms              Created   6.6s
     ✔️ Container datahub-frontend-react   Created   6.7s
     ✔️ Container datahub-actions          Created
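    Since zookeeper and mysql are the two containers that errored, their logs are the next place to look; a short sketch using plain docker commands:
    docker ps -a --filter name=zookeeper --filter name=mysql
    docker logs --tail 100 zookeeper
    docker logs --tail 100 mysql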
  • strong-potato-63475

    06/08/2023, 3:46 AM
    Additionally, zookeeper and mysql seem to be running within Docker.
  • gray-airplane-39227

    06/08/2023, 4:18 PM
    Hello, I created a user with the following permissions: `Generate Personal Access Token` and `Edit Entity`. I logged in as this user and confirmed that I’m not authorized to view datasets and their fields, and I also confirmed the GMS REST API is guarded by setting the env variable `REST_API_AUTHORIZATION_ENABLED` to `true`: I get a 401 when I curl a search request to the GMS REST API. However, if I open GraphiQL from the UI and make a search request on datasets, I’m able to view all metadata of any dataset, and similarly I can get results by making GraphQL search queries to the GMS GraphQL endpoint. I checked the code, and it seems `SearchResolver.java` doesn’t have any authentication on it; I would like to confirm whether this is a valid issue, thank you!
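    To reproduce the report outside the UI, the same search can be sent straight to the GMS GraphQL endpoint with the user’s token; a sketch assuming the default GMS port 8080 (host and token are placeholders):
    curl -s -X POST 'http://<gms-host>:8080/api/graphql' \
      -H 'Authorization: Bearer <personal-access-token>' \
      -H 'Content-Type: application/json' \
      -d '{"query": "{ searchAcrossEntities(input: {types: DATASET, query: \"*\", start: 0, count: 1}) { total } }"}'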
  • bland-gigabyte-28270

    06/09/2023, 9:26 AM
    I’m creating a fresh install of our DataHub (version 0.10.3, helm chart 0.2.165); however, our `datahub:datahub` user seems to have no permissions. Note that this worked before in a previous PoC, and somehow it doesn’t work anymore.
  • best-rose-86507

    06/09/2023, 9:52 AM
    Hi engineers! Wanted to ask whether there’s a way to query a dataset’s schema fields (all of them) using GraphQL, assuming the input is the dataset’s URN.
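    For reference, the dataset query exposes the schema through schemaMetadata; a minimal sketch, with a placeholder urn:
    {
      dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)") {
        schemaMetadata {
          fields {
            fieldPath
            nativeDataType
            description
          }
        }
      }
    }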
  • mysterious-advantage-78411

    06/09/2023, 12:11 PM
    Hi All, I could not ingest Tableau dashboards on version 10.3. Everything is fine until datahub starts to ingest charts from Tableau (so the datasources are created fine). It shows the error below; any ideas? https://github.com/datahub-project/datahub/issues/8204
    [2023-06-09 11:46:31,877] ERROR {datahub.entrypoints:199} - Command failed: Query sheets Connection error: [{'message': "Validation error of type FieldUndefined: Field 'documentViewId' in type 'Sheet' is undefined @ 'sheetsConnection/nodes/documentViewId'", 'locations': [{'line': 10, 'column': 5, 'sourceName': None}], 'description': "Field 'documentViewId' in type 'Sheet' is undefined", 'validationErrorType': 'FieldUndefined', 'queryPath': ['sheetsConnection', 'nodes', 'documentViewId'], 'errorType': 'ValidationError', 'path': None, 'extensions': None}]
    Traceback (most recent call last):
  • best-market-29539

    06/09/2023, 2:07 PM
    Hello guys, why does this search from the advanced queries tutorial give so many results when it should return only one exact match due to the double-quote usage? Is it a bug?
  • handsome-park-80602

    06/09/2023, 2:59 PM
    hi all, I have helm-deployed datahub (v0.10.3) to k8s in AWS (using managed Elasticsearch, managed RDS postgres, and Confluent Cloud), and when I log in with the user `datahub` I have no permission to view the policies, and my datahub user doesn’t seem to have the root role, as I don’t have visibility into the ingestion UI tab either. I tried restore-indices as suggested here: https://datahubspace.slack.com/archives/C029A3M079U/p1675057819949539?thread_ts=1674544681.800709&cid=C029A3M079U and that also didn’t work. I was wondering if anyone else has seen this issue before. I am not sure where to look next, as the gms log has no indication that it attempted to ingest the policies.json file during the boot process, despite me mounting policies.json explicitly into datahub-gms:
    datahub-gms:
      enabled: true
      image:
        repository: linkedin/datahub-gms
        # tag: "v0.10.0" # defaults to .global.datahub.version
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 100m
          memory: 1Gi
      extraVolumes:
        - name: datahub-policies-volume
          configMap:
            name: "datahub-policies-cm"
      extraVolumeMounts:
        - name: datahub-policies-volume
          mountPath: /datahub/datahub-gms/resources/policies.json
          subPath: policies.json
      extraEnvs:
        - name: UI_INGESTION_ENABLED
          value: "true"
    Any help would be appreciated.
  • lemon-greece-73651

    06/09/2023, 8:30 PM
    I’m looking to protect tags in DataHub using a custom policy which isn’t working as expected. Is this something anyone has run across and been able to solve? more details in 🧵
  • eager-winter-63685

    06/10/2023, 1:42 AM
    Hello everyone, I tried to install datahub on my Ubuntu machine following the quickstart guide. Things went well until I ran `datahub docker quickstart`, which gives me an error:
    [2023-06-10 09:34:07,069] ERROR    {datahub.entrypoints:189} - Command failed with Unknown color 'bright_red'. Run with --debug to get full trace
    I’m sure that I have datahub installed successfully:
    $ datahub version
    /home/leo/.local/lib/python3.6/site-packages/datahub/__init__.py:23: FutureWarning: DataHub will require Python 3.7 or newer in a future release. Please upgrade your Python version to continue using DataHub.
      FutureWarning,
    DataHub CLI version: 0.8.43
    Python version: 3.6.9 (default, Nov 25 2022, 14:10:45) 
    [GCC 8.4.0]
    Does anyone know why this happens?
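    The CLI in use here (0.8.43 on Python 3.6) is quite old, and the FutureWarning above already flags the interpreter; upgrading both is the usual first step. A sketch, assuming a pip-based install:
    # from a Python 3.7+ environment
    python3 -m pip install --upgrade acryl-datahub
    datahub version
    datahub docker quickstart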