# troubleshoot
  • astonishing-dream-54015 (06/12/2023, 3:00 PM)
    Hi all, I am trying to run quickstart on my Mac M1. Python = 3.8 (conda env), macOS Ventura 13.0. It is stuck starting the broker. In my terminal:
    Starting up DataHub...
    [+] Running 7/8
     ⠿ Container mysql        Healthy                                          0.5s
     ⠿ Container zookeeper      Healthy                                          5.9s
     ⠿ Container mysql-setup     Started                                          0.8s
     ⠿ Container elasticsearch    Healthy                                          1.9s
     ⠿ Container elasticsearch-setup Started                                          2.1s
     ⠿ Container broker        Waiting                                         821.1s
     ⠿ Container schema-registry   Created                                          0.0s
     ⠿ Container kafka-setup     Created                                          0.0s
    In the broker log there is this error:
    [2023-06-12 14:45:40,243] INFO Initiating client connection, connectString=zookeeper:2181 sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@1ecee32c (org.apache.zookeeper.ZooKeeper)
    [2023-06-12 14:45:40,247] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
    [2023-06-12 14:45:40,251] INFO zookeeper.request.timeout value is 0. feature enabled=false (org.apache.zookeeper.ClientCnxn)
    [2023-06-12 14:45:40,255] INFO Opening socket connection to server zookeeper/172.19.0.4:2181. (org.apache.zookeeper.ClientCnxn)
    [2023-06-12 14:45:40,257] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
    [2023-06-12 14:45:40,260] INFO Socket connection established, initiating session, client: /172.19.0.6:50088, server: zookeeper/172.19.0.4:2181 (org.apache.zookeeper.ClientCnxn)
    [2023-06-12 14:45:40,268] INFO Session establishment complete on server zookeeper/172.19.0.4:2181, session id = 0x10000010f140001, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
    [2023-06-12 14:45:40,271] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
    [2023-06-12 14:45:40,343] INFO [feature-zk-node-event-process-thread]: Starting (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
    [2023-06-12 14:45:40,355] INFO Feature ZK node at path: /feature does not exist (kafka.server.FinalizedFeatureChangeListener)
    [2023-06-12 14:45:40,355] INFO Cleared cache (kafka.server.FinalizedFeatureCache)
    [2023-06-12 14:45:40,470] INFO Cluster ID = VBPvRVOsQmK-gMNc1hRWRg (kafka.server.KafkaServer)
    [2023-06-12 14:45:40,476] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
    kafka.common.InconsistentClusterIdException: The Cluster ID VBPvRVOsQmK-gMNc1hRWRg doesn't match stored clusterId Some(5J_OZE3xSnicskM-BRhbyA) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
            at kafka.server.KafkaServer.startup(KafkaServer.scala:230)
            at kafka.Kafka$.main(Kafka.scala:109)
            at kafka.Kafka.main(Kafka.scala)
    [2023-06-12 14:45:40,478] INFO shutting down (kafka.server.KafkaServer)
    [2023-06-12 14:45:40,479] INFO [feature-zk-node-event-process-thread]: Shutting down (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
    [2023-06-12 14:45:40,479] INFO [feature-zk-node-event-process-thread]: Stopped (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
    [2023-06-12 14:45:40,479] INFO [feature-zk-node-event-process-thread]: Shutdown completed (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
    [2023-06-12 14:45:40,480] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
    [2023-06-12 14:45:40,585] INFO Session: 0x10000010f140001 closed (org.apache.zookeeper.ZooKeeper)
    [2023-06-12 14:45:40,585] INFO EventThread shut down for session: 0x10000010f140001 (org.apache.zookeeper.ClientCnxn)
    [2023-06-12 14:45:40,586] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
    [2023-06-12 14:45:40,591] INFO App info kafka.server for 1 unregistered (org.apache.kafka.common.utils.AppInfoParser)
    [2023-06-12 14:45:40,591] INFO shut down completed (kafka.server.KafkaServer)
    [2023-06-12 14:45:40,591] ERROR Exiting Kafka. (kafka.Kafka$)
    [2023-06-12 14:45:40,591] INFO shutting down (kafka.server.KafkaServer)
    Any help would be appreciated!! Thanks in advance~~~~
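    One common cause of `InconsistentClusterIdException` in quickstart is a stale Kafka data volume from an earlier run: the broker's meta.properties still holds the previous cluster id, and wiping the quickstart volumes (e.g. with `datahub docker nuke`, which destroys local data) clears it. A minimal sketch of the check Kafka performs at startup, using the ids from the log above:

    ```python
    # Sketch of the startup check behind InconsistentClusterIdException: Kafka
    # compares the cluster id ZooKeeper reports against the one persisted in the
    # broker's meta.properties. A leftover data volume keeps the old id around.
    def stored_cluster_id(meta_properties_text: str) -> str:
        """Extract cluster.id from a Java-properties-style meta.properties body."""
        for line in meta_properties_text.splitlines():
            if line.startswith("cluster.id="):
                return line.split("=", 1)[1].strip()
        raise ValueError("no cluster.id entry found")

    # Values taken from the log above: the stored id vs. the id ZooKeeper reports.
    meta = "version=0\ncluster.id=5J_OZE3xSnicskM-BRhbyA\n"
    zk_cluster_id = "VBPvRVOsQmK-gMNc1hRWRg"
    assert stored_cluster_id(meta) != zk_cluster_id  # mismatch -> fatal startup error
    ```

    The meta.properties key name and the nuke-and-restart remedy are the usual quickstart story; adjust if your volumes hold data you need.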
  • bland-barista-59197 (06/12/2023, 6:15 PM)
    Hi team, could you please help me make a tag read-only / non-deletable on a dataset table/column? After BigQuery ingestion I saw that the “PARTITION_KEY” tag is not editable or deletable.
  • purple-printer-15193 (06/12/2023, 6:58 PM)
    Heya team, can anyone confirm whether these tests are working in https://demo.datahubproject.io/? I already made some changes based on the rules, and they aren’t working as expected.
  • elegant-guitar-28442 (06/13/2023, 2:35 AM)
    Hello! There was a problem when I evaluated my data using Great Expectations. DataHub version 0.10.2, DataHub CLI version 0.10.2, Great Expectations version 0.15.50, Python version 3.10. The exception occurs in datahub.integrations.great_expectations.action, line 193.
  • adorable-lawyer-88494 (06/13/2023, 7:21 AM)
    Hi all, I am getting some errors while upgrading the DataHub project from Java 11 to Java 17. https://scans.gradle.com/s/zlct6yyhajqnm is my scan report. The error is:
    Failure 1 of 1: The :li-utils:compileMainGeneratedDataTemplateJava task failed.
    Compilation failed; see the compiler error output for details.
    Exception
    org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':li-utils:compileMainGeneratedDataTemplateJava'.	
    at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.lambda$executeIfValid$1(ExecuteActionsTaskExecuter.java:145)	
    •••
    Caused by: org.gradle.api.internal.tasks.compile.CompilationFailedException: Compilation failed; see the compiler error output for details.	
    at org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:56)	
    •••
  • swift-dream-78272 (06/13/2023, 12:04 PM)
    Hey, I’ve got a question regarding the Snowflake classification feature. I see there’s a parameter info_type_to_term; is it possible for the Snowflake source to assign an info type to an already existing glossary term? I’ve been trying it as in the attached picture, but it keeps assigning a glossary term named ‘Email’ that I cannot even find in the Glossary view tab, and when I check my glossary term group ContactInformation and its term Email, there’s nothing inside.
  • billions-baker-82097 (06/13/2023, 1:07 PM)
    Hi team, I am trying to install DataHub by cloning the datahub repo and building each module as per the documentation: https://datahubproject.io/docs/developers. But I am getting this issue when trying to build datahub-frontend: it gets stuck at 87%. Attaching an image of the same below. I am using WSL.
  • mysterious-advantage-78411 (06/13/2023, 2:40 PM)
    Hi guys! Could somebody help with an S3 ingestion timeout error? Is there a way to raise this timeout setting?
  • creamy-battery-20182 (06/13/2023, 5:09 PM)
    Hi! I was running into these exceptions on a dbt ingestion job:
    2023-06-12 22:40:55,684 [qtp944427387-17466] INFO  c.l.m.r.entity.AspectResource:166 - INGEST PROPOSAL proposal: {aspectName=assertionInfo, systemMetadata={lastObserved=1686609651915, runId=dbt-2023_06_12-22_40_42}, entityUrn=urn:li:assertion:d8691f1c759e159221940a3696e48cf8, entityType=assertion, aspect={contentType=application/json, value=ByteString(length=1375,bytes=7b226375...6e227d7d)}, changeType=UPSERT}
    
    2023-06-12 22:40:55,687 [qtp944427387-17421] ERROR c.l.m.filter.RestliLoggingFilter:38 - Rest.li error:
    com.linkedin.restli.server.RestLiServiceException: com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries
    But these are the underlying exceptions (logs are from the GMS pod):
    Caused by: io.ebean.DuplicateKeyException: Error when batch flush on sql: insert into metadata_aspect_v2 (urn, aspect, version, metadata, createdOn, createdBy, createdFor, systemmetadata) values (?,?,?,?,?,?,?,?)
    
    Caused by: java.sql.BatchUpdateException: Duplicate entry 'urn:li:assertion:04063f0fbcbe627b390598a883fb0272-assertionInfo-' for key 'PRIMARY'
    
    Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry 'urn:li:assertion:04063f0fbcbe627b390598a883fb0272-assertionInfo-' for key 'PRIMARY'
    Has anyone seen these before? What could be the underlying issue here? Is there an issue with the data itself?
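    For what it's worth, the duplicate-entry message hints at the table layout: the key string `'urn:...-assertionInfo-...'` suggests metadata_aspect_v2 is keyed by a composite of (urn, aspect, version), so two writers inserting the same assertion aspect at the same version collide. A small illustration (the key shape is inferred from the error string, so treat it as an assumption):

    ```python
    # The error "Duplicate entry '<urn>-assertionInfo-...' for key 'PRIMARY'" suggests
    # a composite primary key of (urn, aspect, version). Two concurrent ingest
    # attempts (note the "Failed to add after 3 retries") writing the same row
    # would trip it.
    def aspect_pk(urn: str, aspect: str, version: int) -> tuple:
        """Model the inferred primary key of metadata_aspect_v2."""
        return (urn, aspect, version)

    first = aspect_pk("urn:li:assertion:04063f0fbcbe627b390598a883fb0272", "assertionInfo", 0)
    retry = aspect_pk("urn:li:assertion:04063f0fbcbe627b390598a883fb0272", "assertionInfo", 0)
    assert first == retry  # identical key -> SQLIntegrityConstraintViolationException
    ```

    If that reading is right, deduplicating assertion URNs on the producer side (or serializing the retries) would avoid the constraint violation.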
  • flat-engineer-75197 (06/13/2023, 5:24 PM)
    👋 Could someone help me understand the difference between groups and native groups? I manually made the cool_kids group in the UI and added my user to it. The group’s URN tells me it’s of type corpGroup, but when I queried the user’s aspects, it instead shows up as a native group.
  • victorious-monkey-86128 (06/13/2023, 8:29 PM)
    Hi, this is a question rather than troubleshoot. How can I create term groups and assign terms to them using the Python SDK?
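    Not an authoritative answer, but for orientation: in DataHub's glossary model a term group is a glossaryNode entity and a term is a glossaryTerm entity that references its group via a parent-node pointer; the Python SDK emits these as glossary aspects (GlossaryNodeInfo / GlossaryTermInfo via MetadataChangeProposalWrapper). The URN shapes are the part sketched here; the emission itself is not shown:

    ```python
    # Hypothetical stdlib-only helpers showing the URN shapes involved; the real
    # write goes through the DataHub SDK's glossary aspects, which may differ.
    def glossary_node_urn(name: str) -> str:
        # term groups are glossaryNode entities
        return f"urn:li:glossaryNode:{name}"

    def glossary_term_urn(name: str) -> str:
        return f"urn:li:glossaryTerm:{name}"

    group_urn = glossary_node_urn("ContactInformation")
    term_urn = glossary_term_urn("ContactInformation.Email")

    # A term joins a group by pointing its parent node at the group's URN.
    term_info = {"name": "Email", "parentNode": group_urn}
    assert term_info["parentNode"] == "urn:li:glossaryNode:ContactInformation"
    ```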
  • incalculable-portugal-45517 (06/14/2023, 3:31 AM)
    Hello, I am trying to upgrade DataHub from 0.9.3 to 0.10.3 on a Kubernetes setup using the datahub-upgrade container, but I am getting "No such file or directory" for /docker/datahub-upgrade/datahub-upgrade.sh and ./datahub-upgrade.sh. (I also can't find any datahub-upgrade.sh in the container when I run it locally with docker run.)
  • bumpy-shoe-90203 (06/14/2023, 6:38 AM)
    #troubleshoot There is an error in the latest Helm chart version:
    helm install datahub datahub/datahub                 
    Error: INSTALLATION FAILED: YAML parse error on datahub/templates/datahub-upgrade/datahub-cleanup-job-template.yml: error converting YAML to JSON: yaml: line 91: did not find expected '-' indicator
  • bland-gigabyte-28270 (06/14/2023, 8:55 AM)
    I recently tried to update from 0.10.2 to 0.10.3, and it seems like a secret set before the update cannot be accessed anymore. Is this expected? Details in 🧵
  • enough-football-92033 (06/14/2023, 12:09 PM)
    Hi team! I need to add a new admin user, so I added new creds to the user.props file and am able to log in with them, but this user can't access the Manage Permissions tab. For now I am able to create policies only with the default datahub user. I guess it's a bug; how can I fix it?
  • proud-dusk-671 (06/14/2023, 12:10 PM)
    FYI, I am stuck here and this should have been a part of this channel https://datahubspace.slack.com/archives/CV2UVAPPG/p1686739512309419
  • numerous-autumn-22862 (06/14/2023, 2:42 PM)
    Hi, we are trying to install DataHub, but we are getting the following error:
    Error while getting broker list.
    java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listNodes
    This error is happening in the kafka-setup-job. We are running Kafka in MSK on AWS.
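    When kafka-setup times out against MSK it is usually a network-path problem (security groups, private subnets, wrong bootstrap address) rather than a DataHub one. A stdlib-only probe run from the same pod or host can confirm basic TCP reachability; `broker.example.com` below is a placeholder for one of your MSK bootstrap brokers:

    ```python
    # Minimal connectivity probe mirroring what kafka-setup fails to do: open a TCP
    # connection to the bootstrap broker. A timeout here points at security groups
    # or routing, not at DataHub.
    import socket

    def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
        """Return True if a TCP connection to host:port succeeds within timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # e.g. can_reach("broker.example.com", 9092)  # placeholder MSK bootstrap broker
    ```

    If the probe fails from inside the kafka-setup pod but works from elsewhere, the fix lives in the VPC/security-group configuration.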
  • late-dawn-4912 (06/14/2023, 11:41 PM)
    Hi! Quick question. I'm using both the Looker and LookML integrations together with BigQuery to load into DataHub. But we are facing a problem: for the "sql_table_name" variable of each view, we use a combination of a variable that identifies the environment and the actual name (like "variable.table_name"), and this variable is a LookML variable. The problem is that DataHub doesn't read that correctly, and because of that it is not matching the actual LookML view to the table in BigQuery. Any ideas would be greatly appreciated! Thanks! 🙏 🙏 🙏
  • faint-oyster-25890 (06/15/2023, 3:56 AM)
    Hello! Does anyone know how to set retention for DataProcessInstance? I set the configuration in the file /etc/datahub/plugins/retention/retention.yaml like this:
    - entity: "dataProcessInstance"
      aspect: "*"
      config:
        retention:
          version:
            maxVersions: 1
          time:
            maxAgeInSeconds: 2592000 # 30 days
    But it did not work. Basically, my goal is to retain every DataProcessInstance's history for 1 month.
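    One thing worth double-checking in the snippet above: maxAgeInSeconds is indeed 30 days, but maxVersions: 1 keeps only the latest version of each aspect, which can work against the stated goal of retaining a month of history. A stdlib sanity check of the numbers, mirroring the YAML as a dict:

    ```python
    # Mirror of the retention YAML above, with the 30-day figure verified.
    THIRTY_DAYS_S = 30 * 24 * 60 * 60

    retention_rule = {
        "entity": "dataProcessInstance",
        "aspect": "*",
        "config": {
            "retention": {
                "version": {"maxVersions": 1},   # NB: keeps only the latest version
                "time": {"maxAgeInSeconds": 2592000},
            }
        },
    }
    assert retention_rule["config"]["retention"]["time"]["maxAgeInSeconds"] == THIRTY_DAYS_S
    ```

    Whether the retention plugin combines the version and time rules with AND or OR semantics is worth confirming against the retention docs before relying on this.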
  • best-wire-59738 (06/15/2023, 4:34 AM)
    Hi team, we are trying to build arm64 images for the frontend on my local Mac M1. The build was successful using the command ./gradlew build. When I try to build the frontend image using the Dockerfile below, the image build gets stuck in the middle at the gradle build command and doesn't move forward. I checked the logs using --debug mode but couldn't figure out the actual issue. I have also attached the logs for your reference. Could you please help me out with this issue?
    docker buildx build . -t datahub --platform=linux/arm64
    arm64_buildLogsdockerfile.rtf
  • brief-nail-41206 (06/15/2023, 5:01 AM)
    Hi, I wanted to get usage statistics on our BigQuery project, like the number of queries, using the get_usage_aspects_from_urn method. I was able to get this at the table level, but not at a project or even dataset level: it gives an empty response when I use a container entity (like a BQ dataset or project). Would you know how I could get these stats (like the most queried tables in my project) at a project level using GraphQL?
  • stocky-guitar-68560 (06/15/2023, 6:41 AM)
    Hi team, I have deployed DataHub version 0.9.5 using Docker containers on an AWS VM and have done all the required setup. When I run the Docker Compose file it prints the logs on stdout/terminal, but when I go inside the gms container I am unable to see the logs at the path /tmp/datahub/logs/gms. Can someone suggest a solution for this?
  • powerful-tent-14193 (06/15/2023, 7:49 AM)
    Hi team, how can I delete resources in DataHub? I did not find a good solution for deleting ingested metadata via the UI; can somebody help me with this? Thanks in advance!
  • icy-flag-80360 (06/15/2023, 8:53 AM)
    Hello! Does DataHub have the ability to disable access tokens with an expiration date of "never"? It would be great for security reasons.
  • powerful-cat-68806 (06/15/2023, 9:10 AM)
    Hi team, I need inputs on this please, as soon as possible 🙏
  • proud-dusk-671 (06/15/2023, 9:49 AM)
    Hi team, I'm facing this issue while running the restore-indices-adhoc job; please help. The job had to restore around 20k rows in MySQL. DataHub version: 0.10.2
    Reading rows 1000 through 2000 from the aspects table completed.
    metrics so far RestoreIndicesResult(ignored=0, rowsMigrated=1000, timeSqlQueryMs=86, timeGetRowMs=0, timeUrnMs=4, timeEntityRegistryCheckMs=1, aspectCheckMs=76, createRecordMs=3814, sendMessageMs=26099)
    Successfully sent MAEs for 1000/12805 rows (7.81% of total). 0 rows ignored (0.00% of total)
    0.64 mins taken. 7.52 est. mins to completion. Total mins est. = 8.15.
    Args are RestoreIndicesArgs(start=2000, batchSize=1000, numThreads=1, batchDelayMs=100, aspectName=null, urn=null, urnLike=null)
    Reading rows 2000 through 3000 from the aspects table started.
    Reading rows 2000 through 3000 from the aspects table completed.
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
    java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    	at com.linkedin.datahub.upgrade.restoreindices.SendMAEStep.iterateFutures(SendMAEStep.java:71)
    	at com.linkedin.datahub.upgrade.restoreindices.SendMAEStep.lambda$executable$0(SendMAEStep.java:138)
    	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(DefaultUpgradeManager.java:106)
    	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:65)
    	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(DefaultUpgradeManager.java:39)
    	at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(DefaultUpgradeManager.java:30)
    	at com.linkedin.datahub.upgrade.UpgradeCli.run(UpgradeCli.java:80)
    	at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:768)
    	at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:752)
    	at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
    	at org.springframework.boot.builder.SpringApplicationBuilder.run(SpringApplicationBuilder.java:164)
    	at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(UpgradeCliApplication.java:23)
    	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
    	at org.springframework.boot.loader.Launcher.launch(Launcher.java:108)
    	at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
    Config in Helm:
    datahubUpgrade:
      enabled: true
      image:
        repository: acryldata/datahub-upgrade
        # tag: "v0.10.0"  # defaults to .global.datahub.version
      batchSize: 1000
      batchDelayMs: 100
      noCodeDataMigration:
        sqlDbType: "MYSQL"
        # sqlDbType: "POSTGRES"
      podSecurityContext: {}
        # fsGroup: 1000
      securityContext: {}
        # runAsUser: 1000
      podAnnotations:
    "sidecar.istio.io/inject": 'false'
      # Add extra sidecar containers to job pod
      extraSidecars: []
        # - name: my-image-name
        #   image: my-image
        #   imagePullPolicy: Always
      cleanupJob:
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 300m
            memory: 256Mi
        # Add extra sidecar containers to job pod
        extraSidecars: []
          # - name: my-image-name
          #   image: my-image
          #   imagePullPolicy: Always
      restoreIndices:
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 300m
            memory: 256Mi
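    One thing that stands out in the config above: the restore-indices container is capped at 512Mi, and a containerized JVM by default takes roughly a quarter of container memory as its max heap (assumption: the default -XX:MaxRAMPercentage=25 applies here). That leaves on the order of 128Mi of heap for batching 1000 rows of MAEs, which lines up with the OutOfMemoryError. Quick arithmetic:

    ```python
    # Back-of-envelope heap estimate under the default MaxRAMPercentage=25
    # assumption; raising the memory limit (or setting -Xmx explicitly) and/or
    # lowering batchSize are the usual levers.
    container_limit_mib = 512
    max_ram_percentage = 25          # JVM default inside containers (assumption)
    est_heap_mib = container_limit_mib * max_ram_percentage / 100
    assert est_heap_mib == 128.0     # little headroom for a 1000-row MAE batch
    ```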
  • flat-engineer-75197 (06/15/2023, 2:22 PM)
    Hi team, I am writing a custom transformer for users. Is corpUser not a valid entity type in a transformer? My transform_aspect method refuses to run when I have this:
    def entity_types(self) -> List[str]:
            return ["corpUser"]
    But it’s fine if I do:
    def entity_types(self) -> List[str]:
            return ["*"]
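    Not a definitive answer, but one case-sensitivity gotcha worth ruling out: user URNs use the lowercase entity type corpuser (urn:li:corpuser:...), so if the transformer matches entity_types() against the URN's type segment, "corpUser" would never match while "*" trivially does. A sketch of that hypothetical matching logic:

    ```python
    # Hypothetical sketch of entity-type filtering; the real transformer plumbing in
    # datahub may differ, but the URN shape is as documented: urn:li:corpuser:<id>.
    from typing import List

    def urn_entity_type(urn: str) -> str:
        """Second segment of an URN of the form urn:li:<entityType>:<id>."""
        return urn.split(":")[2]

    def applies(entity_types: List[str], urn: str) -> bool:
        return "*" in entity_types or urn_entity_type(urn) in entity_types

    user_urn = "urn:li:corpuser:jdoe"
    assert applies(["*"], user_urn)
    assert not applies(["corpUser"], user_urn)   # case mismatch with the URN type
    assert applies(["corpuser"], user_urn)
    ```

    If this is the cause, returning ["corpuser"] from entity_types() would be the fix; worth checking against the transformer docs for your version.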
  • silly-fish-85029 (06/15/2023, 2:37 PM)
    Hi team, Is there an example of assigning custom ownership types to an entity in a recipe? The docs show examples using the UI only. I'm trying to add custom ownership types to dbt entities.
  • abundant-grass-62044 (06/15/2023, 2:53 PM)
    Hello, I managed to ingest many sources using DataHub version 10.1, but I faced a problem with Tableau Server 2022.1.3: I got only an empty project.
  • swift-processor-45491 (06/15/2023, 3:59 PM)
    Hi, team. I'm writing a query that uses the scrollAcrossLineage endpoint and I've noticed some odd behavior. Yesterday, this query, when applied to a URN, returned the whole lineage that was requested; today, the same query returned only the first-degree lineage, and this lasted until my afternoon. I was wondering if there is any kind of indexing or internal behavior in DataHub that we should be aware of when running this query. Is there any internal scheduled job that may affect it?