DataHub #all-things-deployment

victorious-xylophone-76105

09/21/2022, 9:23 PM

We have an issue when drilling down into entity metadata from the UI main page: it shows `No entities` in most deployment cases except the default quickstart (quickstart with an explicit version also has the problem). I have opened a bug https://github.com/datahub-project/datahub/issues/6014 . If anyone knows what the issue is and how to work around it, please let me know. We really need that feature.

tall-butcher-30509

09/22/2022, 7:02 AM

Is there any way to change the default scope of the visual lineage?

careful-engine-38533

09/22/2022, 7:04 AM

Hi, my mongodb ingestion fails with the following message - any help?

'/usr/local/bin/run_ingest.sh: line 40:    79 Killed                  ( datahub ingest run -c "${recipe_file}" ${report_option} )\n',
           "2022-09-22 06:29:49.739560 [exec_id=29430983-bfd2-4551-b153-c869537f5fe5] INFO: Failed to execute 'datahub ingest'",
           '2022-09-22 06:29:49.739831 [exec_id=29430983-bfd2-4551-b153-c869537f5fe5] INFO: Caught exception EXECUTING '
           'task_id=29430983-bfd2-4551-b153-c869537f5fe5, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
           '    self.event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in

cuddly-arm-8412

09/22/2022, 12:18 PM

Hi team. I want to confirm whether a dataflow corresponds to a pipeline, whether a datajob corresponds to a task, and whether a task can exist independently of any dataflow.

lemon-cat-72045

09/23/2022, 8:24 AM

Hi team. We have deployed DataHub with ES as its graph database backend. I'm wondering if there is guidance on migrating from ES to Neo4j without losing any data. Thanks in advance.

rapid-book-98432

09/23/2022, 12:34 PM

Hi. I'm facing a new problem deploying DataHub with the Helm chart. It seems to be linked to the ES container setup job:

helm install datahub datahub/datahub -n demo --version 0.2.83 --debug

install.go:178: [debug] Original chart version: "0.2.83"
install.go:195: [debug] CHART PATH: /home/cmo/.cache/helm/repository/datahub-0.2.83.tgz
client.go:299: [debug] Starting delete for "datahub-elasticsearch-setup-job" Job
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job datahub-elasticsearch-setup-job with timeout of 5m0s
client.go:557: [debug] Add/Modify event for datahub-elasticsearch-setup-job: ADDED
client.go:596: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-elasticsearch-setup-job: MODIFIED
client.go:596: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 1, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for datahub-elasticsearch-setup-job: MODIFIED
client.go:596: [debug] datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 2, jobs succeeded: 0
Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
helm.go:84: [debug] failed pre-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
	helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.3.0/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.3.0/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.3.0/command.go:902
main.main
	helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
	runtime/proc.go:255
runtime.goexit
	runtime/asm_amd64.s:1581

If you have any ideas, thanks! N.B.: the ES setup job container has this log:

2022/09/23 12:34:01 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.0.125.20:9200: connect: connection refused. Sleeping 1s
2022/09/23 12:34:03 Problem with request: Get http://elasticsearch-master:9200: dial tcp 10.0.125.20:9200: connect: connection refused. Sleeping 1s
2022/09/23 12:34:03 Timeout after 2m0s waiting on dependencies to become available: [http://elasticsearch-master:9200]

Well known error here -_-
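
For anyone reproducing this outside the cluster, the probe, sleep, and timeout pattern visible in the setup job log can be approximated with a short script. This is only a sketch in Python, not the setup job's actual implementation: `http_probe`, `wait_for_dependency`, and their parameters are hypothetical names. Only the URL, the 1s sleep, and the 2m0s timeout are taken from the log.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied (for example 401 or 403), so it is reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, socket timeout, and similar.
        return False


def wait_for_dependency(url, total_timeout=120.0, interval=1.0,
                        probe=http_probe, sleep=time.sleep,
                        clock=time.monotonic):
    """Probe `url` every `interval` seconds until it answers or time runs out.

    Returns True once the probe succeeds and False after `total_timeout`
    seconds. `probe`, `sleep`, and `clock` are injectable for testing.
    """
    deadline = clock() + total_timeout
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        print(f"Problem with request: {url}. Sleeping {interval:g}s")
        sleep(interval)


# Example call (assumes the service name from the log resolves where this runs):
# wait_for_dependency("http://elasticsearch-master:9200")
```

If this returns False from a pod in the same namespace, the Elasticsearch service itself is likely not up or not resolvable, which would point at the prerequisites rather than at the DataHub chart.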

thousands-solstice-2498

09/26/2022, 9:12 AM

Hi team, Please advise here. URI [/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [datahubpolicyindex_v2]","resource.type":"index_or_alias","resource.id":"datahubpolicyindex_v2","index_uuid":"_na_","index":"datahubpolicyindex_v2"}],"type":"index_not_found_exception","reason":"no such index [datahubpolicyindex_v2]","resource.type":"index_or_alias","resource.id":"datahubpolicyindex_v2","index_uuid":"_na_","index":"datahubpolicyindex_v2"},"status":404}

cool-gpu-21169

09/27/2022, 2:38 PM

Hi, I have a couple of questions regarding storage requirements: 1. We deployed DataHub to a Kubernetes cluster and are not sure of the storage requirements for the various components, especially MySQL. I understand it is case by case, but what is the minimal storage requirement? 2. Is there a built-in process that can be used to back up and restore data? Thanks

able-evening-90828

09/27/2022, 10:55 PM

Has anyone seen the following error in `elasticsearch-setup-job` `v0.8.45`?

2022-09-27 15:53:34.835 PDT curl: option -k http://elasticsearch-master:9200/_ilm/policy/datahub_usage_event_policy: is unknown
Error
2022-09-27 15:53:34.835 PDT curl: try 'curl --help' or 'curl --manual' for more information
Error
2022-09-27 15:53:34.836 PDT /create-indices.sh: line 41: [: -eq: unary operator expected
Error
2022-09-27 15:53:34.836 PDT /create-indices.sh: line 45: [: -eq: unary operator expected
Error
2022-09-27 15:53:34.836 PDT /create-indices.sh: line 47: [: -eq: unary operator expected

bumpy-park-19085

09/28/2022, 3:17 PM

Hi all! I've been trying to set up a DataHub project with a manually provisioned (Terraformed) Elasticsearch cluster. The health checks for the frontend pass, but the GMS service keeps returning a 403 (error in thread). I've tried both resource-based and identity-based policies on the ES cluster with no luck. Has anyone else had this issue?

nutritious-finland-99092

09/28/2022, 5:17 PM

Hi guys, I'm running https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml to test some metadata-ingestion examples (https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml). But at seemingly random times my datahub-gms container dies and I can't connect to localhost anymore. On the GMS container I receive the following error:

Command exited with error: signal: killed

Sometimes it works fine, and other times this happens. Any ideas?

brief-ability-41819

09/29/2022, 7:17 AM

Hello, is there an example of the datahub-frontend Helm chart configured for Okta? I assume I need to work with the values listed here as `oidc`: https://github.com/acryldata/datahub-helm/tree/master/charts/datahub/subcharts/datahub-frontend ?

microscopic-mechanic-13766

09/29/2022, 9:05 AM

Good morning, could someone point me to DataHub's API usage guide (if there is one, of course)? Thanks in advance! Edit: Is OpenAPI the only API for DataHub?

better-orange-49102

09/29/2022, 10:12 AM

General Helm question: if I want to adjust the liveness probe for datahub-gms, should I edit the values.yaml at /charts/datahub-gms/values.yaml or /subcharts/datahub-gms/values.yaml? I don't think it can be adjusted in the top-level values.yaml.

gentle-camera-33498

09/29/2022, 7:07 PM

Hello everyone! Which version of MySQL is recommended? In the Dockerfile the default version is 5.7, but support for this version will end next year:

clever-artist-51241

10/02/2022, 10:19 AM

Hello, I'm new to DataHub. I'm looking for an example YAML config file to connect to DBT,

full-apple-16103

10/03/2022, 8:39 AM

Hello, we want to test DataHub using a Docker deployment on EC2 in AWS. I can see the tested and confirmed config is 2 CPUs, 8GB RAM, and 2GB swap area. Is there a specific instance type that is recommended or has been tested before? Also, can we use a spot group, or is it preferable to use a single on-demand instance?

microscopic-mechanic-13766

10/03/2022, 12:31 PM

Good Monday everyone. I have been looking through the actions available for both glossary terms and domains, and I have one question: why is it possible to create a hierarchy with glossary terms but not with domains? Thanks in advance!

microscopic-mechanic-13766

10/03/2022, 2:23 PM

Hello again, is it possible to create policies at role level? Let's say I want to modify the default policies of the readers in my Datahub deployment, is that possible? Thanks in advance!

polite-application-51650

10/04/2022, 6:42 AM

06:33:37.977 [pool-12-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 7ms
06:33:38.059 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector:41 - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
06:33:41.636 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:25 - Failed to feed bulk request. Number of events: 1 Took time ms: -1 Message: failure in bulk execution:
[0]: index [datahubexecutionrequestindex_v2], type [_doc], id [urn%3Ali%3AdataHubExecutionRequest%3A79ec91eb-af3b-4baa-87b4-ade8f202dfce], message [[datahubexecutionrequestindex_v2/wfhCmJ_jR0e48O5ItrryJA][[datahubexecutionrequestindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubExecutionRequest%3A79ec91eb-af3b-4baa-87b4-ade8f202dfce]: document missing]]]
06:33:42.648 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:28 - Successfully fed bulk request. Number of events: 4 Took time ms: -1
06:33:45.035 [Thread-283] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource
06:33:51.157 [Thread-286] WARN  c.l.m.s.e.q.r.SearchRequestHandler:444 - Found invalid filter field for entity search. Invalid or unrecognized facet ingestionSource

Hi team, can someone help me with this error? I'm getting it when running UI ingestion after setting up DataHub in k8s.

bland-orange-13353

10/04/2022, 11:14 AM

This message was deleted.

crooked-rose-22807

10/05/2022, 10:54 AM

Hello, I have a couple of questions. 1. Is there any way we can clean up unused tags? 2. How can we customise permissions and roles? 3. How can we add Tags, Glossary Terms & Domains NOT VIA the UI? Maybe JSON?

agreeable-belgium-70840

10/05/2022, 11:23 AM

Hello all, I am trying to upgrade to v0.8.45. However, the frontend pod keeps restarting with the following error message. Any ideas?

Oops, cannot start the server.
java.nio.file.AccessDeniedException: /RUNNING_PID
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
	at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)
	at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478)
	at java.base/java.nio.file.Files.newOutputStream(Files.java:220)
	at play.core.server.ProdServerStart$.createPidFile(ProdServerStart.scala:162)
	at play.core.server.ProdServerStart$.start(ProdServerStart.scala:48)
	at play.core.server.ProdServerStart$.main(ProdServerStart.scala:30)
	at play.core.server.ProdServerStart.main(ProdServerStart.scala)

rich-machine-24265

10/06/2022, 9:04 AM

Hi team! I created a PR some time ago, and I would really appreciate it if somebody could take a look and merge it: https://github.com/datahub-project/datahub/pull/6090 . It's related to frontend deployment. Thanks!

fierce-monkey-46092

10/06/2022, 10:25 AM

Hi everyone, I've created data lineage with file-based lineage (.yml file). I created many wrong lineages because I was still learning at that moment. So my question is: is it possible to delete the lineage that I created before? Any answer will be helpful.

early-afternoon-71938

10/06/2022, 1:00 PM

Hi, I am facing an issue with pods stuck in Pending and not running in an EKS cluster after following the K8s deployment guide. Can you please help?

~# kubectl get pods
NAME                                                READY   STATUS             RESTARTS       AGE
elasticsearch-master-0                              0/1     Pending            0              64m
elasticsearch-master-1                              0/1     Pending            0              64m
elasticsearch-master-2                              0/1     Pending            0              64m
prerequisites-cp-schema-registry-6f4b5b894f-8lzvj   1/2     CrashLoopBackOff   15 (38s ago)   64m
prerequisites-kafka-0                               0/1     Pending            0              64m
prerequisites-mysql-0                               0/1     Pending            0              64m
prerequisites-neo4j-community-0                     0/1     Pending            0              64m
prerequisites-zookeeper-0                           0/1     Pending            0              64m
:~# kubectl describe pods prerequisites-cp-schema-registry-6f4b5b894f-8lzvj
Name:             prerequisites-cp-schema-registry-6f4b5b894f-8lzvj
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-10-0-1-247.ec2.internal/10.0.1.247
Start Time:       Thu, 06 Oct 2022 17:06:53 +0530
Labels:           app=cp-schema-registry
                  pod-template-hash=6f4b5b894f
                  release=prerequisites
Annotations:      kubernetes.io/psp: eks.privileged
                  prometheus.io/port: 5556
                  prometheus.io/scrape: true
Status:           Running
IP:               10.0.1.33
IPs:
  IP:           10.0.1.33
Controlled By:  ReplicaSet/prerequisites-cp-schema-registry-6f4b5b894f
Containers:
  prometheus-jmx-exporter:
    Container ID:  docker://d106dfe9388bd4e0009227c3d68bb83bc81bcdb530f0d2f3ad4a94dee19df751
    Image:         solsson/kafka-prometheus-jmx-exporter@sha256:6f82e2b0464f50da8104acd7363fb9b995001ddff77d248379f8788e78946143
    Image ID:      docker-pullable://solsson/kafka-prometheus-jmx-exporter@sha256:6f82e2b0464f50da8104acd7363fb9b995001ddff77d248379f8788e78946143
    Port:          5556/TCP
    Host Port:     0/TCP
    Command:
      java
      -XX:+UnlockExperimentalVMOptions
      -XX:+UseCGroupMemoryLimitForHeap
      -XX:MaxRAMFraction=1
      -XshowSettings:vm
      -jar
      jmx_prometheus_httpserver.jar
      5556
      /etc/jmx-schema-registry/jmx-schema-registry-prometheus.yml
    State:          Running
      Started:      Thu, 06 Oct 2022 17:06:54 +0530
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/jmx-schema-registry from jmx-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xbgmf (ro)
  cp-schema-registry-server:
    Container ID:   docker://9effa12c8c8cd8a6585b56155f72b0a1e51b79e3b3ce31473c5cc3dbf4863bb6
    Image:          confluentinc/cp-schema-registry:6.0.1
    Image ID:       docker-pullable://confluentinc/cp-schema-registry@sha256:b52e16cf232e3c9acd677ae8944de813e16fa541a367d9f805b300c5d2be1a1f
    Ports:          8081/TCP, 5555/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 06 Oct 2022 18:27:55 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 06 Oct 2022 18:22:00 +0530
      Finished:     Thu, 06 Oct 2022 18:22:45 +0530
    Ready:          True
    Restart Count:  18
    Environment:
      SCHEMA_REGISTRY_HOST_NAME:                      (v1:status.podIP)
      SCHEMA_REGISTRY_LISTENERS:                     http://0.0.0.0:8081
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS:  prerequisites-kafka:9092
      SCHEMA_REGISTRY_KAFKASTORE_GROUP_ID:           prerequisites
      SCHEMA_REGISTRY_MASTER_ELIGIBILITY:            true
      SCHEMA_REGISTRY_HEAP_OPTS:                     -Xms512M -Xmx512M
      JMX_PORT:                                      5555
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xbgmf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  jmx-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      prerequisites-cp-schema-registry-jmx-configmap
    Optional:  false
  kube-api-access-xbgmf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Normal   Pulled   6m29s (x18 over 81m)  kubelet  Container image "confluentinc/cp-schema-registry:6.0.1" already present on machine
  Warning  BackOff  91s (x307 over 80m)   kubelet  Back-off restarting failed container
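
When pasting output like the above into a thread, it can help to first group the pods by status. The following is a rough sketch in Python that works on the plain text of `kubectl get pods`. `group_unhealthy_pods` is a hypothetical helper name, and treating only `Running` and `Completed` as healthy is an assumption made for illustration.

```python
from collections import defaultdict


def group_unhealthy_pods(kubectl_output: str) -> dict:
    """Group pod names by STATUS, skipping Running and Completed pods.

    Only the first three whitespace-separated columns (NAME, READY, STATUS)
    are read, so a RESTARTS value such as "15 (38s ago)" does not break parsing.
    """
    groups = defaultdict(list)
    for line in kubectl_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # blank line, prompt fragment, or the header row
        name, _ready, status = fields[:3]
        if status not in ("Running", "Completed"):
            groups[status].append(name)
    return dict(groups)
```

Note that this does not look at the READY column, so a pod that is Running but not fully ready would not be flagged.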

better-orange-49102

10/07/2022, 9:34 AM

For Helm, has anyone turned on JMX exporters for the frontend? I'm getting a bunch of error messages in the JMX pod (I only removed the comments in the FE values.yaml). Also, the JMX exporter seems to have disappeared from GMS's values.yaml. What are our options for monitoring the GMS pod then?

full-chef-85630

10/09/2022, 12:59 AM

Hi all, I encountered this error after upgrading to version 0.8.45. I downloaded the source code and built the frontend image; nothing else was changed. @dazzling-judge-80093

Validation error (FieldUndefined@[analyticsChart/rows/cells/linkParams/searchParams/filters/value]) : Field 'value' in type 'Filter' is undefined

Validation error (FieldUndefined@[listRecommendations/modules/content/params/searchParams/filters/value]) : Field 'value' in type 'Filter' is undefined (code undefined)

00:53:21 [application-akka.actor.default-dispatcher-25] ERROR controllers.TrackingController - Failed to emit product analytics event. actor: urn:li:corpuser:datahub, event: {"title":"Conviva Schema Center","url":"http://xxxx/","path":"/","hash":"","search":"","width":656,"height":971,"referrer":"http://xxxx/","prevPathname":"/login","type":"PageViewEvent","actorUrn":"urn:li:corpuser:datahub","timestamp":1665276799567,"date":"Sun Oct 09 2022 08:53:19 GMT+0800 (中国标准时间)","userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","browserId":"96726d31-fc24-47fd-98ff-6fa14864c04e"}

cuddly-arm-8412

10/09/2022, 6:02 AM

Hi team. I want to know which model to use for some of my data interfaces: dataset, Post, or something else? For example, we have an internal interface such as xxx.com/api/v1/xxxx that we want to import into DataHub. Does this interface data have a corresponding data model?

tall-butcher-30509

10/10/2022, 3:08 AM

Hi all, a question on API queries: we can use the simple query below to search for a specific value of ‘ifMeta_interdomain_id’. Additionally, we would like to get a list of all datasets where the property is defined with any value (i.e. exclude datasets that do not have this property defined). Does anyone know how to write that query?

http://.../entities?action=search
{
    "input": "customProperties: ifMeta_interdomain_id=<specific value>",
    "entity": "dataset",
    "start": 0,
    "count": 10
}
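
For scripting the request above, something along these lines might work. It is a sketch under assumptions: `build_search_request` and its `gms_url` argument are hypothetical names, the host is left for the caller to supply because it is elided in the original, and POST with a JSON content type is assumed since the request carries a body. It only reproduces the specific-value query and does not answer the "any value" case.

```python
import json
import urllib.request


def build_search_request(gms_url, property_name, value, start=0, count=10):
    """Build (but do not send) the search request shown in the message above."""
    body = {
        # Same "customProperties: <name>=<value>" input format as in the message.
        "input": f"customProperties: {property_name}={value}",
        "entity": "dataset",
        "start": start,
        "count": count,
    }
    return urllib.request.Request(
        url=f"{gms_url}/entities?action=search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",  # assumed method
    )
```

The returned object can be passed to `urllib.request.urlopen` once the real GMS address is filled in.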