# all-things-deployment
  • little-waitress-21103 (02/15/2023, 7:38 AM)
    Hi team, I have an issue. To explain: metadata that we push through the OpenAPI endpoint returns a 200 OK response, but it never appears in the DataHub UI. The root cause seems to be that we deleted the Elasticsearch master PVC, and restarting the pods did not help.
  • billions-family-12217 (02/15/2023, 10:10 AM)
    image.png
  • magnificent-engine-69382 (02/15/2023, 11:44 AM)
    Hi team, I hope everything is fine. I'm trying to deploy DataHub on Kubernetes (GKE) with an external PostgreSQL database (Cloud SQL). We are running into issues when enabling SSL for secure connections to the database. Has anyone faced a similar problem or figured out how to set up SSL for the DataHub DB? Any advice or documentation is welcome 😃
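One common approach for Cloud SQL Postgres is to carry the SSL options on the JDBC URL that GMS uses. A minimal values sketch, assuming the datahub chart's `global.sql.datasource` keys; the host, database name, and cert paths are placeholders, and the Postgres JDBC driver expects the client key in PKCS#8 format:

```yaml
# values.yaml sketch: SSL options on the JDBC URL (placeholders throughout;
# verify the key names against your chart version)
global:
  sql:
    datasource:
      host: "10.0.0.5:5432"
      url: "jdbc:postgresql://10.0.0.5:5432/datahub?sslmode=verify-ca&sslrootcert=/certs/server-ca.pem&sslcert=/certs/client-cert.pem&sslkey=/certs/client-key.pk8"
      driver: "org.postgresql.Driver"
```

An alternative worth considering is running the Cloud SQL Auth Proxy as a sidecar, which terminates TLS itself and lets GMS connect over localhost.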
  • most-jackal-69587 (02/15/2023, 12:51 PM)
    Dear all, sorry if this is off topic. I wonder if we can use DataHub to manage research datasets. Most of them are in CSV format, so we would need to read the header to infer the schema. I think this was asked before (issue #1552), but I can't find the link in the issue. It seems that "File" is the way to go, but it's not clear how to build the JSON file, or whether I can provide a sample of the dataset. Thank you.
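For the header-inference part, a minimal sketch of guessing column names and rough types from a CSV header plus a sample of rows. This is independent of DataHub's exact file format; the type names here are illustrative, not DataHub's type system, and you would translate the result into whatever schema structure your ingestion expects:

```python
import csv
import io

def infer_csv_schema(text, sample_rows=50):
    """Guess column names and rough types from a CSV header + sample rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    types = [None] * len(header)
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for j, value in enumerate(row[:len(header)]):
            types[j] = _merge(types[j], _guess(value))
    # columns never seen in the sample default to "string"
    return [{"name": h, "type": t or "string"} for h, t in zip(header, types)]

def _guess(value):
    """Classify a single cell as int, float, or string."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "string"

def _merge(old, new):
    """Widen the running type guess for a column: int+float -> float,
    anything conflicting -> string."""
    if old is None or old == new:
        return new
    if {old, new} == {"int", "float"}:
        return "float"
    return "string"

sample = "id,price,name\n1,9.99,apple\n2,3,banana\n"
print(infer_csv_schema(sample))
# → [{'name': 'id', 'type': 'int'}, {'name': 'price', 'type': 'float'}, {'name': 'name', 'type': 'string'}]
```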
  • white-horse-97256 (02/15/2023, 11:32 PM)
    Hi team, if we have DataHub deployed on Kubernetes, where should we place the recipe files so DataHub can access them?
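One common pattern (a sketch, not the only option) is to ship recipes as a ConfigMap and mount them into a CronJob that runs the CLI. All names, the schedule, and the image tag below are hypothetical; the ingestion image's entrypoint may differ by version:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingestion-recipes
data:
  mysql.yaml: |
    source:
      type: mysql
      config:
        host_port: mysql.internal:3306
    sink:
      type: datahub-rest
      config:
        server: http://datahub-datahub-gms:8080
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-ingestion
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: ingest
              image: acryldata/datahub-ingestion:head
              args: ["ingest", "-c", "/recipes/mysql.yaml"]
              volumeMounts:
                - name: recipes
                  mountPath: /recipes
          volumes:
            - name: recipes
              configMap:
                name: ingestion-recipes
```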
  • stocky-plumber-3084 (02/16/2023, 3:03 AM)
    Question for everyone: to deploy DataHub with Kubernetes, can I use Rancher instead of Minikube in a local environment?
  • flat-painter-78331 (02/16/2023, 6:16 AM)
    Hi guys, good day!! Is there any way to find out the important DataHub metrics for alert creation (for example, metrics to monitor the Elasticsearch service, the datahub-actions pods, etc.)? If anyone knows any useful metrics, please help me with this. Thanks!
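As a starting point, the DataHub containers can expose JMX-backed metrics on a `/metrics` endpoint for Prometheus scraping. A hedged values sketch; the key path follows the chart's monitoring docs, but verify it against your chart version:

```yaml
# values.yaml sketch: enable the Prometheus metrics endpoint
global:
  datahub:
    monitoring:
      enablePrometheus: true
```

The dependency services (Elasticsearch, Kafka, MySQL) are usually monitored with their own exporters rather than through DataHub itself.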
  • square-football-37770 (02/16/2023, 6:55 AM)
    Hi! Is there a way to tell `datahub docker quickstart` NOT to download and overwrite the existing `docker-compose.yaml` file? Alternatively, how would I set/change env vars using Docker Desktop on a Mac? Thanks
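Recent CLI versions let quickstart reuse a local compose file instead of fetching a fresh one; verify the flag with `datahub docker quickstart --help` on your version:

```shell
# reuse the local compose file rather than downloading the default one
datahub docker quickstart --quickstart-compose-file ./docker-compose.yaml
```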
  • square-football-37770 (02/16/2023, 9:17 AM)
    On minikube, I installed `datahub` and it worked fine. Then I deleted the cluster and recreated everything; now I keep getting
    configure-sysctl" in pod "elasticsearch-master-0" not found for default/elasticsearch-master-0 (configure-sysctl)
    when installing the `prerequisites`. Any idea what might be going on?
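If the recreated cluster (or its security settings) is blocking the privileged `configure-sysctl` init container, the Elasticsearch subchart lets you disable it. A sketch for the prerequisites values; the key path follows the elastic Helm chart, so verify it for your chart version:

```yaml
# prerequisites values.yaml sketch
elasticsearch:
  sysctlInitContainer:
    enabled: false
```

Note that with the init container off, the node itself must already satisfy `vm.max_map_count` for Elasticsearch to start.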
  • fierce-baker-1392 (02/16/2023, 3:02 PM)
    Hi, when I deploy kafkaSetup using our own Kafka, the program can create the topics, but the last line in the script fails to execute. What does this function do? Thanks.
  • white-horse-97256 (02/16/2023, 8:03 PM)
    Hi team, we are trying to go to prod with DataHub and have a few questions about deployment and best practices:
    • Which approach is better for prod deployments?
    • If we want to use our in-house Docker images for DataHub dependencies like MySQL, Elasticsearch, etc., what is the approach and where can we configure them, so that DataHub uses our in-house resources?
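For the second question: the dependency images are pulled by the prerequisites chart's subcharts, so image overrides go through each subchart's own image values. A sketch with a hypothetical internal registry; the exact keys depend on the subchart versions bundled with your chart, so check each subchart's values file:

```yaml
# prerequisites values.yaml sketch (registry name is hypothetical)
mysql:
  image:
    repository: registry.internal/mysql
    tag: "8.0"
elasticsearch:
  image: registry.internal/elasticsearch
  imageTag: "7.17.3"
kafka:
  image:
    registry: registry.internal
    repository: kafka
```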
  • astonishing-cartoon-6079 (02/17/2023, 8:26 AM)
    Hi team, we are trying to build the Docker image by running `COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build`, but it fails.
  • fierce-baker-1392 (02/17/2023, 11:02 AM)
    Hi team, we are trying to change the logo (in k8s). I mounted the new logo into the GMS service, but after redeploying the service it returns 404. Does anyone know the reason?
  • victorious-spoon-76468 (02/17/2023, 1:35 PM)
    Hi team! I'm currently testing changing my DataHub database from the default MySQL to RDS for Postgres. Although the `postgresql-setup-job` runs just fine and creates `metadata_aspect_v2` on the database, when the GMS pod runs it crashes with the following error:
    13:23:22.927 [pool-9-thread-1] ERROR c.d.authorization.DataHubAuthorizer:229 - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
    javax.persistence.PersistenceException: Query threw SQLException:ERROR: relation "metadata_aspect_v2" does not exist
      Position: 94 Bind values:[urn:li:dataHubPolicy:7, dataHubPolicyKey, 0, urn:li:dataHubPolicy:7, dataHubPolicyInfo, 0] Query was:select urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ? UNION ALL SELECT urn, aspect, version, metadata, systemMetadata, createdOn, createdBy, createdFor FROM metadata_aspect_v2 WHERE urn = ? AND aspect = ? AND version = ?
    	at io.ebean.config.dbplatform.SqlCodeTranslator.translate(SqlCodeTranslator.java:52)
    	at io.ebean.config.dbplatform.DatabasePlatform.translate(DatabasePlatform.java:219)
    	at io.ebeaninternal.server.query.CQueryEngine.translate(CQueryEngine.java:149)
    	at io.ebeaninternal.server.query.DefaultOrmQueryEngine.translate(DefaultOrmQueryEngine.java:43)
    	at io.ebeaninternal.server.core.OrmQueryRequest.translate(OrmQueryRequest.java:102)
    	at io.ebeaninternal.server.query.CQuery.createPersistenceException(CQuery.java:702)
    	at io.ebeaninternal.server.query.CQueryEngine.findMany(CQueryEngine.java:411)
    	at io.ebeaninternal.server.query.DefaultOrmQueryEngine.findMany(DefaultOrmQueryEngine.java:133)
    	at io.ebeaninternal.server.core.OrmQueryRequest.findList(OrmQueryRequest.java:459)
    	at io.ebeaninternal.server.core.DefaultServer.findList(DefaultServer.java:1596)
    	at io.ebeaninternal.server.core.DefaultServer.findList(DefaultServer.java:1574)
    	at io.ebeaninternal.server.querydefn.DefaultOrmQuery.findList(DefaultOrmQuery.java:1481)
    	at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGetUnion(EbeanAspectDao.java:360)
    	at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGet(EbeanAspectDao.java:280)
    	at com.linkedin.metadata.entity.ebean.EbeanAspectDao.batchGet(EbeanAspectDao.java:261)
    	at com.linkedin.metadata.entity.EntityService.exists(EntityService.java:1624)
    	at com.linkedin.metadata.shared.ValidationUtils.lambda$validateSearchResult$0(ValidationUtils.java:34)
    	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176)
    	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
    	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
    	at com.linkedin.metadata.shared.ValidationUtils.validateSearchResult(ValidationUtils.java:35)
    	at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:300)
    	at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:50)
    	at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:42)
    	at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:222)
    	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    	at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: org.postgresql.util.PSQLException: ERROR: relation "metadata_aspect_v2" does not exist
      Position: 94
    	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
    	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
    	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
    	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
    	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
    	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
    	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118)
    	at io.ebean.datasource.pool.ExtendedPreparedStatement.executeQuery(ExtendedPreparedStatement.java:136)
    	at io.ebeaninternal.server.query.CQuery.prepareResultSet(CQuery.java:376)
    	at io.ebeaninternal.server.query.CQuery.prepareBindExecuteQueryWithOption(CQuery.java:324)
    	at io.ebeaninternal.server.query.CQuery.prepareBindExecuteQuery(CQuery.java:319)
    	at io.ebeaninternal.server.query.CQueryEngine.findMany(CQueryEngine.java:384)
    	... 29 common frames omitted
    Any idea why this might be happening?
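A frequent cause of `relation ... does not exist` right after a successful setup job is GMS connecting to a different database than the one the job wrote to, for example because GMS still carries the MySQL defaults. A hedged checklist of the values to make consistent (hostnames are placeholders; verify the key names against your chart version):

```yaml
# values.yaml sketch: every field must point at the same Postgres database
# that postgresql-setup-job wrote to
global:
  sql:
    datasource:
      host: "my-rds.example.com:5432"
      url: "jdbc:postgresql://my-rds.example.com:5432/datahub"
      driver: "org.postgresql.Driver"
      username: "datahub"
```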
  • miniature-xylophone-2277 (02/17/2023, 6:07 PM)
    Hi team, I deployed DataHub on GCP with Cloud Run. It deployed successfully, and I was able to ingest some BQ and other DB tables and add/remove ownership/tags etc. through the CLI. Now, when I try to ingest from the UI, I see 'N/A'. I followed the steps and checked everything to make sure the `datahub-actions` container is not down. So I was wondering how I can trace back and debug the issue. Thanks in advance for your help.
  • white-horse-97256 (02/17/2023, 11:00 PM)
    Hi team, in the k8s deployment guide (https://datahubproject.io/docs/deploy/kubernetes), instead of using the `helm install prerequisites datahub/datahub-prerequisites` command, can we configure DataHub to use a MySQL server hosted on-prem in our org?
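The prerequisites chart is optional, so you can skip its MySQL and point the main chart at an existing server. A hedged values sketch; host, credentials, and secret names are placeholders, and the key names should be verified against your chart version:

```yaml
# values.yaml sketch for an external on-prem MySQL
global:
  sql:
    datasource:
      host: "mysql.corp.internal:3306"
      hostForMysqlClient: "mysql.corp.internal"
      port: "3306"
      url: "jdbc:mysql://mysql.corp.internal:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8"
      driver: "com.mysql.cj.jdbc.Driver"
      username: "datahub"
      password:
        secretRef: mysql-secrets
        secretKey: mysql-password
```

You would still run the mysql-setup job against that server so the `metadata_aspect_v2` table gets created.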
  • red-waitress-53338 (02/18/2023, 11:15 PM)
    Hi, we have successfully deployed DataHub on GCP Cloud Run: the GMS and frontend Docker images run on Cloud Run, and for the three dependencies (Kafka, MySQL, and Elasticsearch) we use clusters already created and managed by our IT department. We can ingest datasets using the DataHub CLI, but unfortunately nothing from the UI. The documentation says to use the datahub-actions framework for ingesting through the UI; is that correct? If so, how can we point the GMS or frontend container at the datahub-actions framework so that ingestion works? Is there an environment variable on the GMS or frontend container which we can use for the binding?
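For context, UI ingestion is executed by the separate `datahub-actions` container, which subscribes to Kafka and calls GMS; the frontend and GMS do not point at it, it points at them. A sketch of the environment that container typically needs (variable names per the datahub-actions image docs; all values are placeholders):

```yaml
# environment for the datahub-actions container (placeholder values)
DATAHUB_GMS_HOST: "datahub-gms.example.com"
DATAHUB_GMS_PORT: "8080"
KAFKA_BOOTSTRAP_SERVER: "kafka.internal:9092"
SCHEMA_REGISTRY_URL: "http://schema-registry.internal:8081"
```

So the missing piece is usually deploying datahub-actions as a fourth service with access to the same Kafka cluster and to GMS.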
  • bright-receptionist-94235 (02/21/2023, 8:45 AM)
    Hey, is it possible to grant sudo access to the datahub user in the image?
  • gifted-diamond-19544 (02/21/2023, 11:24 AM)
    Hello all. I am currently having a problem with my Athena ingestion. I have DataHub deployed on AWS ECS, and the Athena ingestion (set up via the UI) is failing because of storage space (full trace log in the comments). If I trigger the Athena ingestion manually once after it failed, it runs successfully, but if I trigger it a second time it fails again. I read in other threads that this might be because the container does not have enough storage to write the logs. I suppose I could increase the container storage, but I think that would just delay the problem, since ingestion would fail again once the storage fills up. How can I handle log cleaning? Ideally, I would clear the logs every time the ingestion runs, since the logs are stored in CloudWatch anyway. Thank you!
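A minimal pruning sketch that could run as a sidecar loop or scheduled task. The log path is an assumption; check where your executor actually writes its run logs before wiring this in:

```shell
# Prune ingestion run logs older than a day. LOG_DIR is an assumption:
# verify the real path your ingestion executor writes to.
LOG_DIR="${LOG_DIR:-/tmp/datahub/ingest}"
find "$LOG_DIR" -type f -name '*.log' -mtime +1 -delete 2>/dev/null || true
```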
  • fierce-guitar-16421 (02/21/2023, 11:57 AM)
    Dear community and deployment pros, recently I've been trying to build the datahub-frontend image myself, but the Gradle task `docker` fails and complains that the flag `--load` is unknown. It looks like the palantir docker plugin translates the task option `load(true)` into the `--load` flag for the underlying docker CLI, but then docker does not recognize it. (See the following pic.) Any thoughts on whether this could be fixed, or am I doing something wrong? Thanks! My setup:
    • Machine: 2019 Intel Mac
    • Docker version: 20.10.23
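`--load` is a Docker buildx flag, so one hedged workaround is to make buildx the default builder before running the build (the Gradle task path below is an assumption; use whatever task you were invoking):

```shell
# make `docker build` delegate to buildx, which understands --load
docker buildx install
DOCKER_BUILDKIT=1 ./gradlew :datahub-frontend:docker
```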
  • shy-dog-84302 (02/21/2023, 7:05 PM)
    Hi! Why do we need a 90-day retention period for this backend Kafka topic (`METADATA_CHANGE_LOG_TIMESERIES_TOPIC_NAME=MetadataChangeLog_Timeseries_v1`), referred to here, unlike the other Kafka topics (which use 7 days)? And what would be the consequence of limiting this one to 7 days as well?
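Timeseries aspects live in Elasticsearch rather than the SQL store, so this topic presumably doubles as the replay source when timeseries indexes are restored; shortening retention mainly limits how far back such a restore can reach. If you do decide on 7 days, the stock Kafka tooling can set it (broker address is a placeholder):

```shell
kafka-configs.sh --bootstrap-server kafka:9092 --alter \
  --entity-type topics --entity-name MetadataChangeLog_Timeseries_v1 \
  --add-config retention.ms=604800000  # 7 days
```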
  • eager-electrician-64984 (02/22/2023, 8:16 AM)
    Hi guys! I'm facing a problem. I'm trying to make my datahub-frontend available through a URL (<<domain-name>.com/datahub>) with an ingress (nginx), but unfortunately the app discards anything after the "/" ("/datahub"), so the page renders no content at all (blank). Do you have any idea how I can make it accessible under my domain with the "/" path? To be clear, the page can be accessed when there is nothing after the slash, but leaving it like that is not a solution for me. Thank you for your time :)
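A path-rewrite sketch for ingress-nginx, with hypothetical host, service name, and port. Note the caveat: even with the rewrite, the frontend emits root-relative asset URLs, so a subpath often still renders blank; serving DataHub on its own (sub)domain is the commonly reported workaround:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: datahub-frontend
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
    - host: domain-name.com
      http:
        paths:
          - path: /datahub(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: datahub-datahub-frontend
                port:
                  number: 9002
```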
  • blue-microphone-24514 (02/23/2023, 3:45 PM)
    Hi there. I'm deploying DataHub on AWS EKS through the latest Helm chart and facing two issues. First, SSO via Azure AD is broken: the frontend pod logs say the email claim is missing. I triple-checked against the docs and everything looks right (tenant id / client id / client secret are good; the Azure AD app has Microsoft Graph permissions for email / openid / profile / User.Read; Helm's `datahub-frontend.oidcAuthentification.scope` is explicitly set to the recommended "openid profile email") ...
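For reference, a hedged sketch of the frontend OIDC values. One thing worth checking: the chart documents the key as `oidcAuthentication`, and a differently spelled key would be silently ignored, which could explain the missing scope. All ids and secret names below are placeholders; verify the exact key names against your chart's values file:

```yaml
datahub-frontend:
  oidcAuthentication:
    enabled: true
    provider: azure
    azureTenantId: "<tenant-id>"
    clientId: "<client-id>"
    clientSecretRef:
      secretRef: datahub-oidc-secret
      secretKey: clientSecret
    scope: "openid profile email"
```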
  • blue-microphone-24514 (02/23/2023, 3:48 PM)
    Second issue is with DataHub's GMS: it seems it can't parse a port. I guess it's related to DATAHUB_GMS_PORT=tcp://some_ip:8080, but I'm not sure; I couldn't find an inconsistent setting here (using the defaults, haven't set any port anywhere).
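`tcp://some_ip:8080` is the classic Kubernetes service-link collision: when a Service name matches the env var prefix, kubelet injects `DATAHUB_GMS_PORT` in `tcp://ip:port` form, shadowing the plain port number. Setting the variable explicitly in the pod spec wins over the injected value; a sketch (the `extraEnvs` hook name may vary by chart version):

```yaml
datahub-frontend:
  extraEnvs:
    - name: DATAHUB_GMS_PORT
      value: "8080"
```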
  • blue-microphone-24514 (02/23/2023, 3:49 PM)
    Thanks in advance for any help!
  • brief-oyster-50637 (02/23/2023, 10:55 PM)
    Hi there. We've deployed DataHub to a simple GCE instance (Google Cloud VM) using `quickstart`, in order to evaluate DataHub. Now we want to gradually make this infrastructure more reliable, so we thought of keeping most of the quickstart setup and starting by migrating just the database to a managed MySQL service (Google Cloud SQL). We've configured the docker-compose file to point at this instance instead of running the MySQL container on the same VM. However, we've been experiencing issues with the DB initialization (e.g. it doesn't create the DataHub users, not even the user "datahub"). But instead of asking how to solve this initialization problem, I'd like to take a step back and ask if anyone has tried this type of deployment in production: quickstart + managed MySQL service. Does it make sense to pursue this setup for a light production deployment? Are there any major concerns with it? Thank you!
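On the initialization gap: the quickstart's mysql-setup container creates the database and the `metadata_aspect_v2` table, but on a managed instance you typically create the `datahub` user yourself, and both the setup container and GMS must be pointed at the managed host explicitly. A docker-compose override sketch; host and credentials are placeholders, and the env var names should be checked against the GMS container docs:

```yaml
services:
  datahub-gms:
    environment:
      - EBEAN_DATASOURCE_HOST=10.0.0.5:3306
      - EBEAN_DATASOURCE_URL=jdbc:mysql://10.0.0.5:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
      - EBEAN_DATASOURCE_USERNAME=datahub
      - EBEAN_DATASOURCE_PASSWORD=change-me
```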
  • rough-island-44285 (02/24/2023, 5:59 AM)
    Hi there. I'm new to DataHub and am now running `./gradlew build` to get DataHub deployed locally, but it has been running for more than an hour. Can anyone share your experience of how long it normally takes? Any help would be much appreciated.
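For what it's worth, a cold first build compiles the Java modules, the frontend, and the docs site, and downloads toolchains along the way, so an hour-plus is not unusual. Skipping tests is a common way to shorten iteration:

```shell
# standard Gradle: -x excludes a task and everything only it depends on
./gradlew build -x test
```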
  • limited-forest-73733 (02/24/2023, 6:21 AM)
    Hey team, any ETA for the new 0.10.x release?
  • flat-painter-78331 (02/24/2023, 8:09 AM)
    Hi team 🙂 Has anyone created alerting rules for Datahub on Prometheus?
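As a starting point, once the GMS metrics endpoint is scraped, a standard Prometheus alerting rule works; the `job` label below is whatever your scrape config names the target (a placeholder here):

```yaml
groups:
  - name: datahub
    rules:
      - alert: DatahubGmsDown
        expr: up{job="datahub-gms"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "DataHub GMS scrape target has been down for 5 minutes"
```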
  • gifted-diamond-19544 (02/24/2023, 9:19 AM)
    Hello all! How could I get a list of all the queries made to DataHub via GraphQL, with the corresponding user? Thank you 🙂