# all-things-deployment
  • v

    victorious-xylophone-76105

    09/21/2022, 9:23 PM
    We have an issue drilling down into entity metadata from the UI main page: it shows `No entities` in most deployments except the default quickstart (a quickstart with an explicit version also has the problem). I have opened a bug: https://github.com/datahub-project/datahub/issues/6014. If anyone knows what the issue is and how to work around it, please let me know. We really need that feature.
  • t

    tall-butcher-30509

    09/22/2022, 7:02 AM
    Is there any way to change the default scope of the visual lineage?
  • c

    careful-engine-38533

    09/22/2022, 7:04 AM
    Hi, my MongoDB ingestion fails with the following message. Any help?
    '/usr/local/bin/run_ingest.sh: line 40:    79 Killed                  ( datahub ingest run -c "${recipe_file}" ${report_option} )\n',
               "2022-09-22 06:29:49.739560 [exec_id=29430983-bfd2-4551-b153-c869537f5fe5] INFO: Failed to execute 'datahub ingest'",
               '2022-09-22 06:29:49.739831 [exec_id=29430983-bfd2-4551-b153-c869537f5fe5] INFO: Caught exception EXECUTING '
               'task_id=29430983-bfd2-4551-b153-c869537f5fe5, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
               '    self.event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in
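    The `Killed` text printed at line 40 of run_ingest.sh is what a shell prints when its child process (here `datahub ingest run`) is terminated by SIGKILL; in containers that is often, though not always, the kernel OOM killer enforcing a memory limit. The `signal: killed` reported by datahub-gms further down this channel is the same signal. A minimal Python sketch (POSIX only) that runs a command and reports whether it ended that way; the recipe file name in the usage comment is a placeholder:

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd (a list of arguments) and describe how it ended.

    subprocess reports a child killed by a signal as a negative return code
    (-9 for SIGKILL). A shell reports the same event as exit status
    128 + 9 = 137 and prints "Killed".
    """
    rc = subprocess.run(cmd).returncode
    if rc == -signal.SIGKILL:
        return "killed by SIGKILL (check memory limits / OOM killer)"
    if rc == 128 + signal.SIGKILL:
        return "exited with 137 (a child was probably killed by SIGKILL)"
    if rc < 0:
        return f"killed by signal {signal.Signals(-rc).name}"
    return f"exited with status {rc}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python classify.py datahub ingest run -c recipe.yml  (placeholder recipe)
    print(run_and_classify(sys.argv[1:]))
```

If the result points at SIGKILL, raising the memory limit of the container running the ingestion is the usual first thing to try.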
  • c

    cuddly-arm-8412

    09/22/2022, 12:18 PM
    Hi team. I want to confirm whether a DataFlow corresponds to a pipeline, whether a DataJob corresponds to a task, and whether a task can exist independently, without being tied to a DataFlow.
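    For context on the DataFlow/DataJob question: in DataHub's URN scheme a DataJob URN embeds the URN of its parent DataFlow, which is one way to see that a job is modeled as belonging to a flow. A stdlib-only sketch of the two URN shapes, intended to match what the `make_data_flow_urn` and `make_data_job_urn_with_flow` helpers in `datahub.emitter.mce_builder` produce (verify against your installed version); the orchestrator, DAG and task names are hypothetical:

```python
def data_flow_urn(orchestrator, flow_id, cluster="prod"):
    """URN for a DataFlow, i.e. a pipeline such as an Airflow DAG."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def data_job_urn(flow_urn, job_id):
    """URN for a DataJob (a task); the parent DataFlow URN is part of it."""
    return f"urn:li:dataJob:({flow_urn},{job_id})"


flow = data_flow_urn("airflow", "daily_sales")  # hypothetical DAG id
job = data_job_urn(flow, "load_orders")         # hypothetical task id
print(job)  # urn:li:dataJob:(urn:li:dataFlow:(airflow,daily_sales,prod),load_orders)
```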
  • l

    lemon-cat-72045

    09/23/2022, 8:24 AM
    Hi, team. We have deployed Datahub with ES as its graph database backend. I'm wondering if there is guidance to migrate from ES to Neo4j without losing any data. Thanks in advance.
  • r

    rapid-book-98432

    09/23/2022, 12:34 PM
    Hi hi. I'm facing a new problem deploying DataHub with the Helm chart. It seems to be linked to the ES setup job container:
    helm install datahub datahub/datahub -n demo --version 0.2.83 --debug
    install.go:178: [debug] Original chart version: "0.2.83"
    install.go:195: [debug] CHART PATH: /home/cmo/.cache/helm/repository/datahub-0.2.83.tgz
    client.go:299: [debug] Starting delete for "datahub-elasticsearch-setup-job" Job
    client.go:128: [debug] creating 1 resource(s)
    client.go:529: [debug] Watching for changes to Job datahub-elasticsearch-setup-job with timeout of 5m0s
    client.go:557: [debug] Add/Modify event for datahub-elasticsearch-setup-job: ADDED
    client.go:596: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
    client.go:557: [debug] Add/Modify event for datahub-elasticsearch-setup-job: MODIFIED
    client.go:596: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 1, jobs succeeded: 0
    client.go:557: [debug] Add/Modify event for datahub-elasticsearch-setup-job: MODIFIED
    client.go:596: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 2, jobs succeeded: 0
    Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
    helm.go:84: [debug] failed pre-install: timed out waiting for the condition
    INSTALLATION FAILED
    main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:127
    github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.3.0/command.go:856
    github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.3.0/command.go:974
    github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.3.0/command.go:902
    main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
    runtime.main
        runtime/proc.go:255
    runtime.goexit
        runtime/asm_amd64.s:1581
    If you have any ideas, thanks! N.B.: the ES setup job container has this log:
    2022/09/23 12:34:01 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.0.125.20:9200: connect: connection refused. Sleeping 1s
    2022/09/23 12:34:03 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.0.125.20:9200: connect: connection refused. Sleeping 1s
    2022/09/23 12:34:03 Timeout after 2m0s waiting on dependencies to become available: [http://elasticsearch-master:9200]
    Well known error here -_-
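    The setup job's log shows it retrying `http://elasticsearch-master:9200` about once a second and giving up after 2m0s with `connection refused`, which usually means nothing was accepting connections on that address yet. To check the same thing by hand from a pod in the same namespace, here is a minimal Python sketch of a comparable wait loop. It is an assumption-level approximation: unlike the setup job it only tests that the TCP port accepts connections and does not issue an HTTP GET.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=120.0, interval_s=1.0):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True as soon as a connection is accepted, False on timeout.
    DNS failures and refused connections are both retried (both are OSError).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError as exc:
            if time.monotonic() >= deadline:
                return False
            print(f"Problem connecting to {host}:{port}: {exc}. Sleeping {interval_s}s")
            time.sleep(interval_s)
```

    For example, from a debug pod in the `demo` namespace: `wait_for_port("elasticsearch-master", 9200)`. Outside the cluster the Service name will not resolve, so the loop simply runs until it times out.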
  • t

    thousands-solstice-2498

    09/26/2022, 9:12 AM
    Hi team, Please advise here. URI [/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahubpolicyindex_v2]","resource.type":"index_or_alias","resource.id":"datahubpolicyindex_v2","index_uuid":"_na_","index":"datahubpolicyindex_v2"}],"type":"index_not_found_exception","reason":"no such index [datahubpolicyindex_v2]","resource.type":"index_or_alias","resource.id":"datahubpolicyindex_v2","index_uuid":"_na_","index":"datahubpolicyindex_v2"},"status":404}
  • c

    cool-gpu-21169

    09/27/2022, 2:38 PM
    Hi, I have a couple of questions regarding storage requirements: 1. We deployed DataHub to a Kubernetes cluster and are not sure about the storage requirements for the various components, especially MySQL. I understand it is case by case, but what is the minimal storage requirement? 2. Is there a built-in process for backing up and restoring data? Thanks
  • a

    able-evening-90828

    09/27/2022, 10:55 PM
    Has anyone seen the following error in `elasticsearch-setup-job` in `v0.8.45`?
    2022-09-27 15:53:34.835 PDT curl: option -k http://elasticsearch-master:9200/_ilm/policy/datahub_usage_event_policy: is unknown
    2022-09-27 15:53:34.835 PDT curl: try 'curl --help' or 'curl --manual' for more information
    2022-09-27 15:53:34.836 PDT /create-indices.sh: line 41: [: -eq: unary operator expected
    2022-09-27 15:53:34.836 PDT /create-indices.sh: line 45: [: -eq: unary operator expected
    2022-09-27 15:53:34.836 PDT /create-indices.sh: line 47: [: -eq: unary operator expected
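    The curl message suggests curl received `-k` and the URL glued together as a single argument, which is what happens when a flag and a URL end up inside one quoted shell word; the later `[: -eq: unary operator expected` errors are consistent with the HTTP status variable then being empty. This is a guess at the mechanism, not a reading of create-indices.sh. A small Python illustration using `shlex.split`, which approximates POSIX shell word splitting; both command lines are hypothetical:

```python
import shlex

# Hypothetical command lines for illustration; not copied from create-indices.sh.
url = "http://elasticsearch-master:9200/_ilm/policy/datahub_usage_event_policy"
broken = f'curl "-k {url}"'  # flag and URL inside one quoted word
fixed = f"curl -k {url}"     # flag and URL as separate words

# shlex.split approximates POSIX shell word splitting.
print(shlex.split(broken))  # 2 words: curl sees one option "-k http://..." it can't parse
print(shlex.split(fixed))   # 3 words: "-k" (insecure TLS) and the URL are separate
```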
  • b

    bumpy-park-19085

    09/28/2022, 3:17 PM
    Hi all! I've been trying to set up a DataHub project with a manually provisioned (Terraformed) Elasticsearch. The health checks for the frontend come up, but the GMS service keeps returning a 403 (error in thread). I've tried both resource-based and identity-based policies on the ES cluster with no luck. Has anyone else had this issue?
  • n

    nutritious-finland-99092

    09/28/2022, 5:17 PM
    Hi guys, I'm running https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml to test some metadata-ingestion examples (https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml). But every so often my datahub-gms container dies and I can't connect to localhost anymore. On the gms container I receive the following error:
    Command exited with error: signal: killed
    Sometimes it works fine, and other times this happens. Any ideas?
  • b

    brief-ability-41819

    09/29/2022, 7:17 AM
    Hello, is there an example of the datahub-frontend Helm chart configured for Okta? I assume I need to work with the values listed under `oidc` here, right? https://github.com/acryldata/datahub-helm/tree/master/charts/datahub/subcharts/datahub-frontend
  • m

    microscopic-mechanic-13766

    09/29/2022, 9:05 AM
    Good morning, could someone point me to DataHub's API usage guide? (If there is one, of course.) Thanks in advance! Edit: Is OpenAPI the only API for DataHub?
  • b

    better-orange-49102

    09/29/2022, 10:12 AM
    General Helm question: if I want to adjust the liveness probe for datahub-gms, should I edit /charts/datahub-gms/values.yaml or /subcharts/datahub-gms/values.yaml? I don't think it can be adjusted in the top-level values.yaml.
  • g

    gentle-camera-33498

    09/29/2022, 7:07 PM
    Hello everyone! Which version of MySQL is recommended? The Dockerfile defaults to 5.7, but support for that version ends next year:
  • c

    clever-artist-51241

    10/02/2022, 10:19 AM
    Hello, I'm new to DataHub. I'm looking for an example config YAML file to connect to dbt.
  • f

    full-apple-16103

    10/03/2022, 8:39 AM
    Hello, we want to test DataHub using a Docker deployment on EC2 in AWS. I could see the tested and confirmed config: 2 CPUs, 8 GB RAM, 2 GB swap. Is there a specific instance type that is recommended or has been tested before? Also, can we use a spot group, or is a single on-demand instance preferred?
  • m

    microscopic-mechanic-13766

    10/03/2022, 12:31 PM
    Good Monday everyone. I have been looking through the possible actions for both glossary terms and domains, and I have one question: why is it possible to create a hierarchy with glossary terms but not with domains? Thanks in advance!
  • m

    microscopic-mechanic-13766

    10/03/2022, 2:23 PM
    Hello again, is it possible to create policies at the role level? Let's say I want to modify the default policies for readers in my DataHub deployment; is that possible? Thanks in advance!
  • p

    polite-application-51650

    10/04/2022, 6:42 AM
    06:33:37.977 [pool-12-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 7ms
    06:33:38.059 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
    06:33:41.636 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:25 - Failed to feed bulk request. Number of events: 1 Took time ms: -1 Message: failure in bulk execution:
    [0]: index [datahubexecutionrequestindex_v2], type [_doc], id [urn%3Ali%3AdataHubExecutionRequest%3A79ec91eb-af3b-4baa-87b4-ade8f202dfce], message [[datahubexecutionrequestindex_v2/wfhCmJ_jR0e48O5ItrryJA][[datahubexecutionrequestindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubExecutionRequest%3A79ec91eb-af3b-4baa-87b4-ade8f202dfce]: document missing]]]
    06:33:42.648 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:28 - Successfully fed bulk request. Number of events: 4 Took time ms: -1
    06:33:45.035 [Thread-283] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource
    06:33:51.157 [Thread-286] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource
    Hi team, can someone help me with this error I'm getting when running UI ingestion after setting up DataHub in k8s?
  • b

    bland-orange-13353

    10/04/2022, 11:14 AM
    This message was deleted.
  • c

    crooked-rose-22807

    10/05/2022, 10:54 AM
    Hello, I have a couple of questions. 1. Is there any way to clean up unused tags? 2. How can we customize permissions and roles? 3. How can we add Tags, Glossary Terms & Domains NOT via the UI? Maybe JSON?
  • a

    agreeable-belgium-70840

    10/05/2022, 11:23 AM
    Hello all, I am trying to upgrade to v0.8.45. However, the frontend pod keeps restarting with the following error. Any ideas?
    Oops, cannot start the server.
    java.nio.file.AccessDeniedException: /RUNNING_PID
    	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
    	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    	at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
    	at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478)
    	at java.base/java.nio.file.Files.newOutputStream(Files.java:220)
    	at play.core.server.ProdServerStart$.createPidFile(ProdServerStart.scala:162)
    	at play.core.server.ProdServerStart$.start(ProdServerStart.scala:48)
    	at play.core.server.ProdServerStart$.main(ProdServerStart.scala:30)
    	at play.core.server.ProdServerStart.main(ProdServerStart.scala)
  • r

    rich-machine-24265

    10/06/2022, 9:04 AM
    Hi team! I created a PR some time ago and would really appreciate it if somebody could take a look and merge it: https://github.com/datahub-project/datahub/pull/6090. It's related to the frontend deployment. Thanks!
  • f

    fierce-monkey-46092

    10/06/2022, 10:25 AM
    Hi everyone, I've created data lineage with file-based lineage (a .yml file). I created many wrong lineages because I was still learning at that moment. So my question is: is it possible to delete lineage that I created earlier? Any answer will be helpful.
  • e

    early-afternoon-71938

    10/06/2022, 1:00 PM
    Hi, after following the K8s deployment guide, my pods are stuck in Pending and not running in the EKS cluster. Can you please help?
    ~# kubectl get pods
    NAME                                                READY   STATUS             RESTARTS       AGE
    elasticsearch-master-0                              0/1     Pending            0              64m
    elasticsearch-master-1                              0/1     Pending            0              64m
    elasticsearch-master-2                              0/1     Pending            0              64m
    prerequisites-cp-schema-registry-6f4b5b894f-8lzvj   1/2     CrashLoopBackOff   15 (38s ago)   64m
    prerequisites-kafka-0                               0/1     Pending            0              64m
    prerequisites-mysql-0                               0/1     Pending            0              64m
    prerequisites-neo4j-community-0                     0/1     Pending            0              64m
    prerequisites-zookeeper-0                           0/1     Pending            0              64m
    ~# kubectl describe pods prerequisites-cp-schema-registry-6f4b5b894f-8lzvj
    Name:             prerequisites-cp-schema-registry-6f4b5b894f-8lzvj
    Namespace:        default
    Priority:         0
    Service Account:  default
    Node:             ip-10-0-1-247.ec2.internal/10.0.1.247
    Start Time:       Thu, 06 Oct 2022 17:06:53 +0530
    Labels:           app=cp-schema-registry
                      pod-template-hash=6f4b5b894f
                      release=prerequisites
    Annotations:      kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 5556
                      prometheus.io/scrape: true
    Status:           Running
    IP:               10.0.1.33
    IPs:
      IP:           10.0.1.33
    Controlled By:  ReplicaSet/prerequisites-cp-schema-registry-6f4b5b894f
    Containers:
      prometheus-jmx-exporter:
        Container ID:  docker://d106dfe9388bd4e0009227c3d68bb83bc81bcdb530f0d2f3ad4a94dee19df751
        Image:         solsson/kafka-prometheus-jmx-exporter@sha256:6f82e2b0464f50da8104acd7363fb9b995001ddff77d248379f8788e78946143
        Image ID:      docker-pullable://solsson/kafka-prometheus-jmx-exporter@sha256:6f82e2b0464f50da8104acd7363fb9b995001ddff77d248379f8788e78946143
        Port:          5556/TCP
        Host Port:     0/TCP
        Command:
          java
          -XX:+UnlockExperimentalVMOptions
          -XX:+UseCGroupMemoryLimitForHeap
          -XX:MaxRAMFraction=1
          -XshowSettings:vm
          -jar
          jmx_prometheus_httpserver.jar
          5556
          /etc/jmx-schema-registry/jmx-schema-registry-prometheus.yml
        State:          Running
          Started:      Thu, 06 Oct 2022 17:06:54 +0530
        Ready:          True
        Restart Count:  0
        Environment:    <none>
        Mounts:
          /etc/jmx-schema-registry from jmx-config (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xbgmf (ro)
      cp-schema-registry-server:
        Container ID:   docker://9effa12c8c8cd8a6585b56155f72b0a1e51b79e3b3ce31473c5cc3dbf4863bb6
        Image:          confluentinc/cp-schema-registry:6.0.1
        Image ID:       docker-pullable://confluentinc/cp-schema-registry@sha256:b52e16cf232e3c9acd677ae8944de813e16fa541a367d9f805b300c5d2be1a1f
        Ports:          8081/TCP, 5555/TCP
        Host Ports:     0/TCP, 0/TCP
        State:          Running
          Started:      Thu, 06 Oct 2022 18:27:55 +0530
        Last State:     Terminated
          Reason:       Error
          Exit Code:    1
          Started:      Thu, 06 Oct 2022 18:22:00 +0530
          Finished:     Thu, 06 Oct 2022 18:22:45 +0530
        Ready:          True
        Restart Count:  18
        Environment:
          SCHEMA_REGISTRY_HOST_NAME:                      (v1:status.podIP)
          SCHEMA_REGISTRY_LISTENERS:                     http://0.0.0.0:8081
          SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS:  prerequisites-kafka:9092
          SCHEMA_REGISTRY_KAFKASTORE_GROUP_ID:           prerequisites
          SCHEMA_REGISTRY_MASTER_ELIGIBILITY:            true
          SCHEMA_REGISTRY_HEAP_OPTS:                     -Xms512M -Xmx512M
          JMX_PORT:                                      5555
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xbgmf (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      jmx-config:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      prerequisites-cp-schema-registry-jmx-configmap
        Optional:  false
      kube-api-access-xbgmf:
        Type:                    Projected (a volume that contains injected data from multiple sources)
        TokenExpirationSeconds:  3607
        ConfigMapName:           kube-root-ca.crt
        ConfigMapOptional:       <nil>
        DownwardAPI:             true
    QoS Class:                   BestEffort
    Node-Selectors:              <none>
    Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason   Age                   From     Message
      ----     ------   ----                  ----     -------
      Normal   Pulled   6m29s (x18 over 81m)  kubelet  Container image "confluentinc/cp-schema-registry:6.0.1" already present on machine
      Warning  BackOff  91s (x307 over 80m)   kubelet  Back-off restarting failed container
  • b

    better-orange-49102

    10/07/2022, 9:34 AM
    For Helm, has anyone turned on the JMX exporter for the frontend? I'm getting a bunch of error messages in the JMX pod (I only uncommented the relevant lines in the frontend values.yaml). Also, the JMX exporter seems to have disappeared from the GMS values.yaml; what are our options for monitoring the GMS pod, then?
  • f

    full-chef-85630

    10/09/2022, 12:59 AM
    Hi all, I encountered this error after upgrading to version 0.8.45. I downloaded the source code and built the frontend image without changing anything. @dazzling-judge-80093
    Validation error (FieldUndefined@[analyticsChart/rows/cells/linkParams/searchParams/filters/value]) : Field 'value' in type 'Filter' is undefined
    
    Validation error (FieldUndefined@[listRecommendations/modules/content/params/searchParams/filters/value]) : Field 'value' in type 'Filter' is undefined (code undefined)
    
    00:53:21 [application-akka.actor.default-dispatcher-25] ERROR controllers.TrackingController - Failed to emit product analytics event. actor: urn:li:corpuser:datahub, event: {"title":"Conviva Schema Center","url":"http://xxxx/","path":"/","hash":"","search":"","width":656,"height":971,"referrer":"http://xxxx/","prevPathname":"/login","type":"PageViewEvent","actorUrn":"urn:li:corpuser:datahub","timestamp":1665276799567,"date":"Sun Oct 09 2022 08:53:19 GMT+0800 (中国标准时间)","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","browserId":"96726d31-fc24-47fd-98ff-6fa14864c04e"}
  • c

    cuddly-arm-8412

    10/09/2022, 6:02 AM
    Hi team. I want to know which entity model to use for some of our data interfaces: Dataset? Post? For example, we have an internal interface, xxx.com/api/v1/xxxx, that we want to import into DataHub. Is there a corresponding data model for this kind of interface data?
  • t

    tall-butcher-30509

    10/10/2022, 3:08 AM
    Hi all, a question on API queries: we can use the simple query below to search for a specific value of ‘ifMeta_interdomain_id’. Additionally, we would like to get a list of all datasets where the property is defined with any value (i.e. excluding datasets that do not have this property defined). Does anyone know how to write that query?
    http://.../entities?action=search
    {
        "input": "customProperties: ifMeta_interdomain_id=<specific value>",
        "entity": "dataset",
        "start": 0,
        "count": 10
    }
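    For reference, the same search can be sent from Python with only the standard library. A minimal sketch that builds (but does not send) the POST request; the GMS address is a placeholder, and the `X-RestLi-Protocol-Version` header follows the style of DataHub's published curl examples, so treat it as an assumption for your version. It reproduces the exact-value query above and does not answer the "any value" question:

```python
import json
import urllib.request


def build_search_request(gms_url, query, entity="dataset", start=0, count=10):
    """Build (but do not send) a POST to the GMS entities search action.

    The body mirrors the JSON in the question above.
    """
    body = {"input": query, "entity": entity, "start": start, "count": count}
    return urllib.request.Request(
        url=f"{gms_url.rstrip('/')}/entities?action=search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-RestLi-Protocol-Version": "2.0.0",  # assumption, see lead-in
        },
        method="POST",
    )


req = build_search_request(
    "http://localhost:8080",  # placeholder GMS address
    "customProperties: ifMeta_interdomain_id=<specific value>",
)
# To actually send it: json.load(urllib.request.urlopen(req))
```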