# all-things-deployment

  • bitter-waitress-17567 (01/03/2023, 4:42 PM)
    Hi everyone. We have deployed DataHub on GKE and can see the metrics exposed on port 4318. We want to pull these metrics into Grafana but could not find any Grafana details in the Helm chart. Can someone let us know how to expose the metrics to Grafana?
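
    A minimal sketch of the usual approach: the Helm chart does not bundle Grafana, so the common pattern is to scrape the metrics endpoint with Prometheus and add Prometheus as a data source in Grafana. Assuming metrics are served at :4318/metrics; the target name below is hypothetical and differs per release:

    scrape_configs:
      - job_name: datahub-gms
        metrics_path: /metrics
        static_configs:
          - targets: ['datahub-datahub-gms:4318']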

  • bright-egg-51769 (01/04/2023, 7:27 PM)
    Team - trying to find where DataHub saves the ingested data within the MySQL database. I understand everything is saved under metadata_aspects_v2. Where can I see what data was brought in without accessing the UI? I can see the aspects, but I can't figure out which aspect would relate to, for example, customer data.
    ✅ 2
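
    A minimal sketch for browsing ingested entities directly in MySQL, assuming the standard metadata_aspects_v2 layout (one row per aspect version, with version = 0 being the latest):

    -- List every ingested entity URN:
    SELECT DISTINCT urn FROM metadata_aspects_v2 WHERE version = 0;

    -- Inspect the aspects of one entity (the URN below is hypothetical):
    SELECT aspect, metadata
    FROM metadata_aspects_v2
    WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:mysql,db.customers,PROD)'
      AND version = 0;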

  • great-monkey-52307 (12/28/2022, 9:43 PM)
    Hi team, I'm trying to connect to an Azure MySQL Flexible Server and have made changes to the configuration (attached image) using the connection string below:
    jdbc:mysql://testmysql.mysql.database.azure.com:3306/datahub?verifyServerCertificate=false&useSSL=false&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2
    I'm unable to connect when the MySQL server's require_secure_transport parameter is ON; this is the error:
    ERROR 3159 (HY000): Connections using insecure transport are prohibited while --require_secure_transport=ON.
    Can anyone tell me what changes to make in values.yaml to get the connection working? I cannot turn off require_secure_transport on the server.
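
    A minimal sketch of a likely fix, not verified against this server: with require_secure_transport=ON the client must connect over TLS, and useSSL=false is exactly what the server rejects. In values.yaml (key path follows the datahub-helm layout; confirm against your chart version):

    global:
      sql:
        datasource:
          url: "jdbc:mysql://testmysql.mysql.database.azure.com:3306/datahub?useSSL=true&verifyServerCertificate=false&useUnicode=yes&characterEncoding=UTF-8&enabledTLSProtocols=TLSv1.2"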

  • strong-belgium-32572 (01/05/2023, 4:39 PM)
    Hello, has anyone encountered this issue while upgrading the Helm chart to the latest DataHub version? Any possible fixes for it?
    Error: template: datahub/charts/datahub-ingestion-cron/templates/cron.yaml:38:109: executing "datahub/charts/datahub-ingestion-cron/templates/cron.yaml" at <.Values.image.tag>: nil pointer evaluating interface {}.image
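
    A minimal sketch of a likely workaround, assuming the nil pointer comes from the datahub-ingestion-cron subchart being rendered without its image values: either disable the subchart or set the image explicitly in the top-level values.yaml (the tag below is hypothetical; pin to your DataHub version):

    datahub-ingestion-cron:
      enabled: false
    # or:
    # datahub-ingestion-cron:
    #   image:
    #     repository: acryldata/datahub-ingestion
    #     tag: "v0.9.6"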

  • microscopic-mechanic-13766 (01/09/2023, 12:39 PM)
    Hello, quick question: has anyone tried sending validation tests to DataHub from sources other than GE? I already have my own test results for a few datasets and want to know what would have to be done (add the info to the corresponding indexes, ingest it via the API, ...).
    ✅ 1

  • strong-belgium-32572 (01/09/2023, 1:10 PM)
    Is anyone encountering issues upgrading to the latest DataHub version, specifically an upgrade-job failure? Details in this issue: https://github.com/acryldata/datahub-helm/issues/234#issuecomment-1375410537
    ✅ 1
    👀 1

  • refined-energy-76018 (01/09/2023, 11:51 PM)
    Hi, if I increase the replicaCount of datahub-gms from 1 to 2-3, do the MAE/MCE consumers have to run in standalone mode to avoid issues?
    ✅ 1
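
    A minimal sketch of the standalone-consumer setup, assuming the datahub-helm values layout (verify the exact key names against your chart version); with the flag on, the chart deploys the MAE/MCE consumers as separate pods:

    global:
      datahub_standalone_consumers_enabled: true
    datahub-gms:
      replicaCount: 2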

  • microscopic-mechanic-13766 (01/10/2023, 9:44 AM)
    Good morning, I have a question about Great Expectations; I am not sure whether it concerns DataHub's integration of Great Expectations or GE itself, so please let me know. In the process of creating a validation test you have to create three files: datasource, suite, and checkpoint, and those files have to be created in Jupyter. So my question is the following: is there a way to stop it from using a "volatile" instance of Jupyter Notebook and redirect the creation of those files to an external instance? (For example, I have a Jupyter Notebook container and would like to store and edit the files there, not in the instance "awakened" by GE or DataHub.) Thanks in advance!!
    ✅ 1
    👀 1
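
    A minimal sketch of one option, assuming the GE CLI version in use supports it: the --no-jupyter flag writes the generated notebook to disk without launching an ephemeral Jupyter instance, so the file can be opened and edited from your own Jupyter container instead.

    great_expectations suite new --no-jupyter
    # other notebook-launching subcommands advertise the same flag in their --help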

  • limited-library-89060 (01/11/2023, 10:57 AM)
    Hi, we are resetting our database, so most of the metadata in it is new. However, the UI is still displaying the old data, which we believe comes from the Elasticsearch indices. How do we reset the indices so they reflect the new database? I've read the documentation on index restoration, but we deploy our stack on Nomad, the Elasticsearch cluster is shared with other services, and the HTTP endpoint only restores the indices rather than resetting them.
    👀 2
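
    A minimal sketch of the usual approach, assuming the standard datahub-upgrade image: RestoreIndices with -a clean rebuilds the search indices from the database and deletes index documents that no longer exist there, which should drop the stale entries. The env file and tag below are hypothetical; the env vars must point at your database, Elasticsearch, and Kafka.

    docker run --env-file env.list acryldata/datahub-upgrade:v0.9.6 -u RestoreIndices -a clean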

  • bitter-waitress-17567 (01/11/2023, 3:49 PM)
    Hi everyone. I am new to DataHub. We are running DataHub on GKE along with Prometheus, but somehow Prometheus is not able to capture the metrics.
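
    A minimal sketch of things to check, assuming an annotation-driven Prometheus setup and the datahub-helm values layout (key names should be verified against your chart version): the metrics exporter must be enabled and the pods annotated for scraping.

    global:
      datahub:
        monitoring:
          enablePrometheus: true
    datahub-gms:
      podAnnotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "4318"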

  • red-waitress-53338 (01/11/2023, 6:34 PM)
    Hi, good afternoon. Is there a way to use an Elasticsearch auth key instead of ELASTICSEARCH_USERNAME and ELASTICSEARCH_PASSWORD in GMS?
    👀 1

  • witty-motorcycle-52108 (01/11/2023, 9:43 PM)
    Hi all! We just experienced an error in the datahub-actions container for v0.9.2 stating SSLError(OSError(24, 'Too many open files')). We're using the pre-built container from Docker Hub, so I'm not sure what the ulimit is, but it seems like something in actions may not be releasing open file descriptors. Has anyone seen this before?
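
    A minimal sketch for diagnosing this (the container name below is hypothetical): check the effective limit and the live descriptor count inside the running container, and raise the limit at start-up as a stopgap while the leak is investigated.

    docker exec datahub-actions sh -c 'ulimit -n'               # effective limit
    docker exec datahub-actions sh -c 'ls /proc/1/fd | wc -l'   # descriptors in use
    # stopgap: docker run --ulimit nofile=65536:65536 ...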

  • red-waitress-53338 (01/11/2023, 11:40 PM)
    Hi, I am getting the following error on GMS; can someone please help?

  • late-book-30206 (01/12/2023, 9:28 AM)
    Hello, my infrastructure team and I are trying to update DataHub in our pre-production environment. The details:
    • We have already worked on DataHub in this environment and don't want to lose that work (datasets, descriptions, tags, ...).
    • We would like to update DataHub to the latest version.
    • My infrastructure team told me the Helm release in pre-production is not stable and seems to prevent us from updating the version.
    For information: chart version datahub-0.2.89, app version 0.8.43. Does anyone know how to deal with our problem? If you need more information, let me know. Thank you in advance.
    👀 1
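
    A minimal sketch of the standard upgrade path (the release and namespace names below are hypothetical). Metadata lives in the backing stores (MySQL, Elasticsearch, Kafka), so a chart upgrade by itself does not remove datasets, descriptions, or tags; backing up the database first is still prudent.

    helm repo update
    helm upgrade datahub acryldata/datahub -n datahub -f values.yaml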

  • wide-butcher-58942 (01/12/2023, 3:36 PM)
    Hello folks,

  • wide-butcher-58942 (01/12/2023, 3:48 PM)
    Hello folks, I am building the DataHub frontend using this guide. ./gradlew :datahub-frontend:dist -x yarnTest -x yarnLint is failing on the step Task :datahub-frontend:compileScala FAILED (error in thread). What went wrong:
    Execution failed for task ':datahub-frontend:compileScala'.
    > java.io.IOException: Cannot run program "/usr/lib/jvm/java-11-openjdk-amd64/bin/javac" (in directory "/home/rupesh/.gradle/workers"): error=2, No such file or directory
    Any thoughts on how to resolve this? (I am on OpenJDK 11.) Thanks.
    ✅ 1
    👀 1
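
    A minimal sketch of the likely fix: the path in the error points at a Java home without javac, i.e. a JRE rather than a full JDK (or a stale JAVA_HOME). On Debian/Ubuntu:

    ls /usr/lib/jvm/java-11-openjdk-amd64/bin/javac || sudo apt-get install -y openjdk-11-jdk
    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
    ./gradlew :datahub-frontend:dist -x yarnTest -x yarnLint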

  • red-waitress-53338 (01/13/2023, 2:36 AM)
    Hi all, is there a way to use the GCP Cloud SQL Auth Proxy with datahub-gms? Based on this link it should be possible, but I am confused about how to integrate the Cloud SQL Auth Proxy with the GMS service. Can someone please help? https://cloud.google.com/sql/docs/mysql/connect-instance-private-ip
    ✅ 1
    👀 1
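
    A minimal sketch of the common pattern, not a verified setup: run the proxy as a sidecar container in the GMS pod and point the datasource at localhost (the instance connection name below is hypothetical).

    - name: cloud-sql-proxy
      image: gcr.io/cloudsql-docker/gce-proxy:1.33.2
      command:
        - "/cloud_sql_proxy"
        - "-instances=my-project:us-central1:datahub-mysql=tcp:3306"
        - "-ip_address_types=PRIVATE"
    # then set the GMS datasource host/url to 127.0.0.1:3306 in the Helm values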

  • gentle-portugal-21014 (01/13/2023, 5:04 PM)
    Hi all, I need some advice on building a Docker image for the datahub-actions container so that it contains a modified / extended version of the metadata-ingestion folder (i.e. modified versions of the schema files generated from PDL files, plus extended source-plugin functionality - in particular the openapi module). We tried modifying docker-compose.yml so that the default download of the datahub-actions image is replaced with a reference to the Dockerfile in https://github.com/datahub-project/datahub/tree/master/docker/datahub-ingestion, but a container built this way exits immediately after deployment / start without doing anything. Any ideas, please?
    ✅ 1
    👀 1
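
    A minimal sketch of an alternative, assuming the "exits immediately" symptom comes from the datahub-ingestion image having a different entrypoint than datahub-actions: extend the published actions image and install the modified package into it (the tag and paths below are hypothetical).

    FROM acryldata/datahub-actions:v0.0.11
    USER root
    COPY ./metadata-ingestion /tmp/metadata-ingestion
    RUN pip install -e "/tmp/metadata-ingestion[openapi]"
    # switch back to the image's original user here if it defines one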

  • red-waitress-53338 (01/15/2023, 10:41 PM)
    Hi, I have been stuck on this issue for quite a long time and the logs are not helping much; can someone please help me out? Here is my GMS Docker config (the Kafka part only):
    KAFKA_BOOTSTRAP_SERVER=kafka.xxxxx.com:443
    KAFKA_SCHEMAREGISTRY_URL=https://schemaregistry.xxxxx.com:443
    SPRING_KAFKA_PROPERTIES_SECURITY_PROTOCOL=SASL_SSL
    SPRING_KAFKA_PROPERTIES_SASL_JAAS_CONFIG=org.apache.kafka.common.security.plain.PlainLoginModule   required username='xxxxx' password='xxxxx';
    SPRING_KAFKA_PROPERTIES_SASL_MECHANISM=PLAIN
    SPRING_KAFKA_PROPERTIES_CLIENT_DNS_LOOKUP=use_all_dns_ips
    SPRING_KAFKA_PROPERTIES_SSL_TRUSTSTORE_LOCATION=/src/main/resources/truststore.jks
    SPRING_KAFKA_PROPERTIES_SSL_TRUSTSTORE_PASSWORD=xxxxx
    I am getting the following error when running the GMS Docker image locally. I think the issue is with the Kafka SSL.
    22:30:32.237 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.237 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-mce-consumer-job-client-1, groupId=mce-consumer-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.239 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:149 - Authorization Exception and no authorizationExceptionRetryInterval set
    org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: mce-consumer-job-client
    22:30:32.239 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:140 - Fatal consumer exception; stopping container
    22:30:32.250 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:292 - mce-consumer-job-client: Consumer stopped
    22:30:32.280 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-generic-mce-consumer-job-client-2, groupId=generic-mce-consumer-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.280 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-generic-mce-consumer-job-client-2, groupId=generic-mce-consumer-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.339 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-datahub-usage-event-consumer-job-client-3, groupId=datahub-usage-event-consumer-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.339 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-datahub-usage-event-consumer-job-client-3, groupId=datahub-usage-event-consumer-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.341 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:149 - Authorization Exception and no authorizationExceptionRetryInterval set
    org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: datahub-usage-event-consumer-job-client
    22:30:32.341 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:140 - Fatal consumer exception; stopping container
    22:30:32.344 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:292 - datahub-usage-event-consumer-job-client: Consumer stopped
    22:30:32.387 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-generic-mae-consumer-job-client-4, groupId=generic-mae-consumer-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.387 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-generic-mae-consumer-job-client-4, groupId=generic-mae-consumer-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.388 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:149 - Authorization Exception and no authorizationExceptionRetryInterval set
    org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: generic-mae-consumer-job-client
    22:30:32.388 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:140 - Fatal consumer exception; stopping container
    22:30:32.390 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:292 - generic-mae-consumer-job-client: Consumer stopped
    22:30:32.468 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-generic-platform-event-job-client-5, groupId=generic-platform-event-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.468 [ThreadPoolTaskExecutor-1] INFO  org.apache.kafka.clients.Metadata:277 - [Consumer clientId=consumer-generic-platform-event-job-client-5, groupId=generic-platform-event-job-client] Cluster ID: XpLVkk39TyK_obCIQyz4rA
    22:30:32.497 [main] INFO  c.l.metadata.boot.BootstrapManager:33 - Executing bootstrap step 2/10 with name IngestPoliciesStep...
    22:30:32.498 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:60 - Ingesting default access policies...
    22:30:32.500 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:85 - Ingesting default policy with urn urn:li:dataHubPolicy:0
    22:30:32.581 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:149 - Authorization Exception and no authorizationExceptionRetryInterval set
    org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: generic-mce-consumer-job-client
    22:30:32.582 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:140 - Fatal consumer exception; stopping container
    22:30:32.584 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:292 - generic-mce-consumer-job-client: Consumer stopped
    22:30:32.769 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:149 - Authorization Exception and no authorizationExceptionRetryInterval set
    org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: generic-platform-event-job-client
    22:30:32.769 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:140 - Fatal consumer exception; stopping container
    ✅ 1

  • red-waitress-53338 (01/15/2023, 10:41 PM)
    22:30:32.771 [ThreadPoolTaskExecutor-1] INFO  o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:292 - generic-platform-event-job-client: Consumer stopped
    22:30:33.085 [pool-7-thread-1] WARN  org.elasticsearch.client.RestClient:65 - request [POST <https://99b1f46f43124c9dbfe6a6de8cee78e8.psc.us-central1.gcp.cloud.es.io:9243/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true>] returned 1 warnings: [299 Elasticsearch-7.17.8-120eabe1c8a0cb2ae87cffc109a5b65d213e9df1 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
    22:30:33.170 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:85 - Ingesting default policy with urn urn:li:dataHubPolicy:1
    22:30:34.329 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:93 - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:7
    22:30:34.713 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:93 - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:view-entity-page-all
    22:30:34.869 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:93 - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:view-dataset-sensitive
    22:30:34.870 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:85 - Ingesting default policy with urn urn:li:dataHubPolicy:admin-platform-policy
    22:30:35.494 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:85 - Ingesting default policy with urn urn:li:dataHubPolicy:admin-metadata-policy
    22:30:36.128 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:85 - Ingesting default policy with urn urn:li:dataHubPolicy:editor-platform-policy
    22:30:36.757 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:85 - Ingesting default policy with urn urn:li:dataHubPolicy:editor-metadata-policy
    22:30:37.384 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:85 - Ingesting default policy with urn urn:li:dataHubPolicy:reader-platform-policy
    22:30:38.004 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:85 - Ingesting default policy with urn urn:li:dataHubPolicy:reader-metadata-policy
    22:30:38.792 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:93 - Skipping ingestion of editable policy with urn urn:li:dataHubPolicy:asset-owners-metadata-policy
    22:30:38.835 [main] WARN  org.elasticsearch.client.RestClient:65 - request [POST <https://99b1f46f43124c9dbfe6a6de8cee78e8.psc.us-central1.gcp.cloud.es.io:9243/datahubpolicyindex_v2/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true>] returned 1 warnings: [299 Elasticsearch-7.17.8-120eabe1c8a0cb2ae87cffc109a5b65d213e9df1 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
    22:30:38.838 [main] INFO  c.l.m.boot.steps.IngestPoliciesStep:102 - Successfully ingested default access policies.
    22:30:38.838 [main] INFO  c.l.metadata.boot.BootstrapManager:41 - Starting asynchronous bootstrap step 3/10 with name IngestRolesStep...
    22:30:38.839 [main] INFO  c.l.metadata.boot.BootstrapManager:33 - Executing bootstrap step 4/10 with name IngestDataPlatformsStep...
    22:30:47.135 [main] INFO  c.l.metadata.boot.BootstrapManager:41 - Starting asynchronous bootstrap step 5/10 with name IngestDataPlatformInstancesStep...
    22:30:47.136 [main] INFO  c.l.metadata.boot.BootstrapManager:41 - Starting asynchronous bootstrap step 6/10 with name IngestRetentionPoliciesStep...
    22:30:47.136 [pool-10-thread-2] INFO  c.l.m.b.s.IngestDataPlatformInstancesStep:51 - Checking for DataPlatformInstance
    22:30:47.137 [pool-10-thread-3] INFO  c.l.m.b.s.IngestRetentionPoliciesStep:48 - Ingesting default retention...
    22:30:47.136 [main] INFO  c.l.metadata.boot.BootstrapManager:41 - Starting asynchronous bootstrap step 7/10 with name RestoreGlossaryIndices...
    22:30:47.138 [main] INFO  c.l.metadata.boot.BootstrapManager:41 - Starting asynchronous bootstrap step 8/10 with name RemoveClientIdAspectStep...
    22:30:47.138 [pool-10-thread-4] INFO  c.linkedin.metadata.boot.UpgradeStep:42 - Attempting to run RestoreGlossaryIndices Upgrade Step..
    22:30:47.139 [pool-10-thread-4] INFO  c.linkedin.metadata.boot.UpgradeStep:43 - Waiting 120 seconds..
    22:30:47.139 [main] INFO  c.l.metadata.boot.BootstrapManager:41 - Starting asynchronous bootstrap step 9/10 with name RestoreDbtSiblingsIndices...
    22:30:47.139 [main] INFO  c.l.metadata.boot.BootstrapManager:41 - Starting asynchronous bootstrap step 10/10 with name IndexDataPlatformsStep...
    22:30:47.140 [pool-10-thread-3] INFO  c.l.m.b.s.IngestRetentionPoliciesStep:64 - Setting 2 policies
    22:30:47.143 [main] INFO  o.s.web.context.ContextLoader:307 - Root WebApplicationContext initialized in 32668 ms
    22:30:47.149 [main] INFO  c.d.a.filter.AuthenticationFilter:175 - Auth is disabled. Building no-op authenticator chain...
    2023-01-15 22:30:47.196:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'apiServlet'
    22:30:47.196 [main] INFO  o.s.web.servlet.DispatcherServlet:525 - Initializing Servlet 'apiServlet'
    22:30:47.637 [main] INFO  o.s.web.servlet.DispatcherServlet:547 - Completed initialization in 441 ms
    2023-01-15 22:30:47.638:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'authServlet'
    22:30:47.638 [main] INFO  o.s.web.servlet.DispatcherServlet:525 - Initializing Servlet 'authServlet'
    22:30:47.695 [main] INFO  o.s.web.servlet.DispatcherServlet:547 - Completed initialization in 57 ms
    2023-01-15 22:30:47.695:INFO:oejshC.ROOT:main: Initializing Spring DispatcherServlet 'openapiServlet'
    22:30:47.695 [main] INFO  o.s.web.servlet.DispatcherServlet:525 - Initializing Servlet 'openapiServlet'
    22:30:48.062 [pool-10-thread-5] INFO  c.l.m.b.s.RemoveClientIdAspectStep:43 - Unknown aspects have been removed. Skipping...
    22:30:48.064 [pool-10-thread-5] INFO  c.l.m.b.s.RestoreDbtSiblingsIndices:61 - Attempting to run RestoreDbtSiblingsIndices upgrade..
    22:30:48.065 [pool-10-thread-5] INFO  c.l.m.b.s.RestoreDbtSiblingsIndices:62 - Waiting 120 seconds..
    22:30:48.225 [pool-10-thread-2] INFO  c.l.m.b.s.IngestDataPlatformInstancesStep:61 - Reading urns 0 to 1000 from the aspects table to generate dataplatform instance aspects
    22:30:48.319 [pool-10-thread-2] INFO  c.l.m.b.s.IngestDataPlatformInstancesStep:76 - Finished ingesting DataPlatformInstance for urn 0 to 1000
    22:30:48.319 [pool-10-thread-2] INFO  c.l.m.b.s.IngestDataPlatformInstancesStep:79 - Finished ingesting DataPlatformInstance for all entities
    22:30:48.320 [pool-10-thread-2] INFO  c.linkedin.metadata.boot.UpgradeStep:42 - Attempting to run IndexDataPlatformsStep Upgrade Step..
    22:30:48.320 [pool-10-thread-2] INFO  c.linkedin.metadata.boot.UpgradeStep:43 - Waiting 120 seconds..
    22:30:49.089 [main] INFO  o.s.web.servlet.DispatcherServlet:547 - Completed initialization in 1393 ms
    2023-01-15 22:30:49.089:INFO:oejsh.ContextHandler:main: Started o.e.j.w.WebAppContext@6eda5c9{Open source GMS,/,[file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-8600198876037324782/webapp/, jar:file:///tmp/jetty-0_0_0_0-8080-war_war-_-any-8600198876037324782/webapp/WEB-INF/lib/swagger-ui-4.10.3.jar!/META-INF/resources],AVAILABLE}{file:///datahub/datahub-gms/bin/war.war}
    2023-01-15 22:30:49.106:INFO:oejs.AbstractConnector:main: Started ServerConnector@4387b79e{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
    2023-01-15 22:30:49.107:INFO:oejs.Server:main: Started @61430ms

  • red-waitress-53338 (01/15/2023, 10:41 PM)
    22:30:49.907 [pool-10-thread-3] ERROR i.c.k.s.client.rest.RestService:267 - Failed to send HTTP request to endpoint: <https://schemaregistry.xxxxx.com:443/subjects/a54808-preprod-MetadataChangeLog_Versioned_v1-value/versions>
    javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
            at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:353)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:296)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:291)
            at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654)
            at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473)
            at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369)
            at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
            at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
            at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
            at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183)
            at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
            at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1506)
            at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1416)
            at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456)
            at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:427)
            at java.base/sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:572)
            at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:201)
            at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1367)
            at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1342)
            at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:246)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:263)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:351)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:494)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:485)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:458)
            at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:206)
            at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:268)
            at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:244)
            at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:74)
            at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:59)
            at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
            at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:902)
            at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:862)
            at com.linkedin.metadata.dao.producer.KafkaEventProducer.produceMetadataChangeLog(KafkaEventProducer.java:145)
            at com.linkedin.metadata.entity.EntityService.produceMetadataChangeLog(EntityService.java:1284)
            at com.linkedin.metadata.entity.EntityService.emitChangeLog(EntityService.java:1049)
            at com.linkedin.metadata.entity.EntityService.ingestProposal(EntityService.java:893)
            at com.linkedin.metadata.entity.RetentionService.setRetention(RetentionService.java:113)
            at com.linkedin.metadata.boot.steps.IngestRetentionPoliciesStep.execute(IngestRetentionPoliciesStep.java:67)
            at com.linkedin.metadata.boot.BootstrapManager.lambda$start$0(BootstrapManager.java:44)
            at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
            at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:439)
            at java.base/sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:306)
            at java.base/sun.security.validator.Validator.validate(Validator.java:264)
            at java.base/sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313)
            at java.base/sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:222)
            at java.base/sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:129)
            at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:638)
            ... 40 common frames omitted
    Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
            at java.base/sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
            at java.base/sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
            at java.base/java.security.cert.CertPathBuilder.build(CertPathBuilder.java:297)
            at java.base/sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:434)
            ... 46 common frames omitted
    22:30:49.908 [pool-10-thread-3] ERROR c.l.metadata.boot.BootstrapManager:46 - Caught exception while executing bootstrap step IngestRetentionPoliciesStep. Continuing...
    org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
    Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
            at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:353)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:296)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:291)
            at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654)
            at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473)
            at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369)
            at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392)
            at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443)
            at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421)
            at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:183)
            at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
            at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1506)
            at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1416)
            at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456)
            at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:427)
            at java.base/sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:572)
            at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:201)
            at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1367)
            at java.base/sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1342)
            at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:246)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:263)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:351)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:494)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:485)
            at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:458)
            at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:206)
            at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:268)
            at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:244)
            at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:74)
            at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:59)
            at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:62)
            at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:902)
            at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:862)
            at com.linkedin.metadata.dao.producer.KafkaEventProducer.produceMetadataChangeLog(KafkaEventProducer.java:145)
            at com.linkedin.metadata.entity.EntityService.produceMetadataChangeLog(EntityService.java:1284)
            at com.linkedin.metadata.entity.EntityService.emitChangeLog(EntityService.java:1049)
            at com.linkedin.metadata.entity.EntityService.ingestProposal(EntityService.java:893)
            at com.linkedin.metadata.entity.RetentionService.setRetention(RetentionService.java:113)
            at com.linkedin.metadata.boot.steps.IngestRetentionPoliciesStep.execute(IngestRetentionPoliciesStep.java:67)
            at com.linkedin.metadata.boot.BootstrapManager.lambda$start$0(BootstrapManager.java:44)
            at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:829)
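
    A minimal sketch of fixes for the two distinct errors above, under stated assumptions. The GroupAuthorizationException means the SASL principal lacks READ on the consumer groups GMS uses (mce-consumer-job-client, generic-mce-consumer-job-client, etc.); that is an ACL change on the Kafka cluster itself (the command below assumes a self-managed cluster; managed services have their own tooling):

    kafka-acls --bootstrap-server kafka.xxxxx.com:443 --command-config client.properties \
      --add --allow-principal User:xxxxx --operation Read --group mce-consumer-job-client

    The later PKIX failure comes from the schema registry HTTPS client, which does not read the Kafka SSL truststore settings. Assuming Spring's relaxed binding maps these as usual, the registry client takes its own truststore keys:

    SPRING_KAFKA_PROPERTIES_SCHEMA_REGISTRY_SSL_TRUSTSTORE_LOCATION=/src/main/resources/truststore.jks
    SPRING_KAFKA_PROPERTIES_SCHEMA_REGISTRY_SSL_TRUSTSTORE_PASSWORD=xxxxx
    # alternatively, import the registry's CA into the JVM truststore with keytool -importcert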

  • quick-student-61408 (01/16/2023, 1:53 PM)
    Hi all, the prerequisites for DataHub are 2 CPUs, 8 GB RAM, 2 GB swap, and 10 GB disk space. But what do you advise if I need to capture 1,100 tables with nearly 2,834 partitions, and more later? There will be fewer than 50 users. Tell me if I need to focus on other factors to size my machine. Thank you! 🙂
    ✅ 1
    👀 1

  • microscopic-mechanic-13766 (01/16/2023, 4:16 PM)
    Hello, I have been able to send expectations to Hive using DataHub's great_expectations integration. As some people might be interested in doing the same, and it is quite tricky to get done, I thought I should share how I did it. First of all, note that my Hive instance is a Kerberized Hive with HTTP as the transport mode. Having said that, I had to install a few things in the actions container first:
    pip install kerberos 
    pip install thrift==0.13.0 
    pip install great-expectations==0.15.43
    I don't know if it would work with a lower GE version, but I am sure it has to be a version higher than 0.15.2 (which, if I am not mistaken, is the version installed in the module by default). Then you have to select the "other" option when creating the datasource (see image). Once the datasource was created, the connection string looked like this in my case:
    hive+http://hiveserver1:10001/default?auth=KERBEROS&kerberos_service_name=hive-server
    Before executing the datasource, you have to run kinit as a user in order to obtain the TGT for your Hive service. The rest is really similar to creating any other type of validation test, so I will not go into detail.
    ✅ 1
    👀 1
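
    A minimal sketch of the kinit step mentioned above (the principal and keytab path are hypothetical):

    kinit -kt /etc/security/keytabs/datahub.keytab datahub@EXAMPLE.COM
    klist   # verify the ticket was obtained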

  • polite-actor-701 (01/17/2023, 5:36 AM)
    Hi. I created a source that ingests data selected from Oracle, referring to sql_common. A problem occurred while ingesting data with it. I don't know if the attached picture is legible, but there was a log saying to reduce max.poll.interval.ms, so I added consumerProps.setMaxPollRecords as in the second picture. Did I handle it correctly? When I set it to 200 I got the same error, so I set it to 50 and tested again, but I still get the same error. Please advise if you have any other opinions.
    ✅ 1
    👀 1
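
    For context: the usual remedy when a consumer is kicked from its group is to reduce max.poll.records (as tried above) or to increase max.poll.interval.ms so slow batches still finish in time. A minimal sketch in the env-var style used for GMS earlier in this channel, assuming Spring's relaxed binding; values are illustrative:

    SPRING_KAFKA_PROPERTIES_MAX_POLL_RECORDS=50
    SPRING_KAFKA_PROPERTIES_MAX_POLL_INTERVAL_MS=600000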

  • better-orange-49102 (01/17/2023, 10:21 AM)
    Was wondering if anyone could share a sample snippet for mounting a custom config YAML into the datahub-actions container - I tried to mount the file at /etc/datahub/actions/conf/myfile.yml, but it kept showing up as a directory in the container instead.
    ✅ 1
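
    A minimal sketch of the usual cause and fix: a file appearing as a directory typically means the mount source does not resolve to a file (in docker-compose, a missing host file is created as a directory; in Kubernetes, mounting a volume without subPath shadows the whole path). With a ConfigMap (names below are hypothetical):

    volumeMounts:
      - name: actions-conf
        mountPath: /etc/datahub/actions/conf/myfile.yml
        subPath: myfile.yml
    volumes:
      - name: actions-conf
        configMap:
          name: my-actions-config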

  • refined-tent-35319 (01/17/2023, 11:22 AM)
    I have deployed DataHub on AWS using Amazon EKS, and the URL is plain "http". Can somebody help with how to add an SSL certificate so it can be served over "https"?
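
    A minimal sketch of one approach, assuming the AWS Load Balancer Controller and a certificate in ACM (the ARN below is hypothetical); set under the frontend ingress in the Helm values:

    datahub-frontend:
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: alb
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:111122223333:certificate/abcd-1234
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'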

  • boundless-piano-94348 (01/17/2023, 3:15 PM)
    Hi DataHub team, I am deploying DataHub to production for my organization, and prior to deployment I need to do a security-readiness check. Can you shed some light on whether each of the aspects below is supported by DataHub? If yes, can you provide pointers to the code that handles them? I would love to have a discussion session or explain more if needed. Thank you very much for your help!
    Input and Output
    1. Ensure that input data is strongly typed, validated, range- or length-checked, or at worst sanitized or filtered.
    2. Ensure that validation happens not only on the client side but also on the corresponding server side.
    3. Ensure that output encoding happens in the interpreter. This can be handled by the framework or manually.
    4. Ensure that database queries are protected against SQL injection.

    Cryptography
    1. Ensure that any cryptographic keys used by the system are documented: key purpose, where they are stored, and key specifications.
    2. Ensure that any cryptographic keys are stored in a secured secret manager.
    3. Ensure that database encryption is enabled.
    4. When using any cryptographic algorithm, ensure that a secure and appropriate algorithm and keys are used.

    Errors, Logging, and Auditing
    1. Ensure that the application uses JSON format and sends logs to cloud logging.
    2. Ensure that PII data, authentication data, and sensitive financial data are not logged.
    3. Ensure that authentication events are logged properly (success, failure, and changes of authentication method such as password resets).
    4. Ensure that there is a log event for sensitive transactions and admin actions.
    5. Ensure that each event log records the user who performed the action, the time of the event, the type of event, and the source of the event (client IP).
    6. Ensure that the application returns a useful, generic error message to the user instead of the verbose application error.
    7. Ensure that the application fails securely, with a "last resort" error handler to catch all unhandled exceptions.
    ✅ 1

  • red-waitress-53338 (01/18/2023, 12:01 AM)
    Hi everyone, I am trying to set up an SSL connection between GMS and the PostgreSQL instance DataHub uses as its backend database. I am using the GMS Docker image; are there any env variables I need to set, or something else? Has anyone configured DataHub to connect to PostgreSQL using SSL and certificates? Thanks in advance.
    ✅ 1
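
    A minimal sketch, not a verified setup: the GMS datasource is driven by env vars, and PostgreSQL SSL options go on the JDBC URL (the host and certificate path below are hypothetical; the root certificate must be mounted into the container):

    EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
    EBEAN_DATASOURCE_URL=jdbc:postgresql://mypg.example.com:5432/datahub?sslmode=verify-full&sslrootcert=/etc/datahub/certs/root.crt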

  • great-monkey-52307 (01/18/2023, 5:32 AM)
    Hi team, I'm trying to integrate Azure Kubernetes with Azure Key Vault to access secrets. For instance, I'm trying to read the database secret from the volume-mount location instead of getting it from the secretRef. Can I pass the value so it is read from the volume-mount location?
    password:
      # secretRef: mysql-secrets
      # secretKey: mysql-root-password
      # -------------- OR ----------------
      value: /mnt/secrets-store/mysql-password
    Note: I mounted the volume in both pods (GMS and the SQL setup job), and if I cat the file using the command below, I am able to read the password. Please see the attached screenshot of the volumes on the SQL setup job.
    kubectl exec datahub-datahub-gms-pod -- cat /mnt/secrets-store/mysql-password
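
    A minimal sketch of one way around this, assuming the Secrets Store CSI driver: the chart's value field is a literal, not a file path, but the driver can sync the mounted secret into a Kubernetes Secret via secretObjects, which the chart can then consume through its normal secretRef/secretKey fields (names below are hypothetical):

    apiVersion: secrets-store.csi.x-k8s.io/v1
    kind: SecretProviderClass
    metadata:
      name: azure-kv-mysql
    spec:
      provider: azure
      secretObjects:
        - secretName: mysql-secrets
          type: Opaque
          data:
            - objectName: mysql-password
              key: mysql-root-password
      parameters:
        keyvaultName: my-keyvault
        tenantId: "<tenant-id>"
        objects: |
          array:
            - |
              objectName: mysql-password
              objectType: secret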

  • fast-ice-59096 (01/18/2023, 11:04 AM)
    Hi, everyone,