# all-things-deployment
  • rapid-house-76230

    08/02/2022, 1:38 AM
    Hey all, I set up a separate Elasticsearch instance on AWS to use with DataHub, but when I use its host:port I get the following error on the GMS pod:
    Copy code
    2022/08/02 01:34:44 Received 400 from http://search-datahub-elasticsearch-foobar.us-west-2.es.amazonaws.com:443. Sleeping 1s
    I used the ELASTICSEARCH_HOST and ELASTICSEARCH_PORT env variables in the GMS deployment (I used Kustomize). Any idea?
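A 400 from an AWS-hosted Elasticsearch endpoint on port 443 is often the client speaking plain HTTP to an HTTPS listener. A minimal sketch of the GMS env block, assuming the stock variable names including `ELASTICSEARCH_USE_SSL` (the hostname echoes the example in the error above):

```yaml
# Kustomize/deployment env sketch -- names from the stock GMS config;
# ELASTICSEARCH_USE_SSL is the setting most often missed for AWS-hosted ES
- name: ELASTICSEARCH_HOST
  value: "search-datahub-elasticsearch-foobar.us-west-2.es.amazonaws.com"
- name: ELASTICSEARCH_PORT
  value: "443"
- name: ELASTICSEARCH_USE_SSL
  value: "true"
```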
  • steep-winter-48926

    07/22/2022, 5:43 AM
    Hi all! I'm doing some experimentation with DataHub. Is there an easy way to have the Helm chart deploy only the DataHub pods? I'm trying to get DataHub to run against hosted Elasticsearch, Postgres (or MySQL), and Kafka. For various reasons, I have to deploy via Terraform, so it might be easiest to just load the required Docker containers into ECS using Terraform, set up an ALB using Terraform, and then point everything at the right place instead of trying to use Helm. This feels a little off the beaten path, so I wanted to get a sense of whether it is sensible before I go down this route.
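For what it's worth, the stock charts already support this split: the bundled Elasticsearch/MySQL/Kafka live in the prerequisites chart and can be toggled off, while the datahub chart is pointed at the hosted services via global values. A hedged sketch (key names as I understand the acryldata/datahub-helm values; all endpoints hypothetical):

```yaml
# prerequisites values.yaml -- disable the bundled services
elasticsearch:
  enabled: false
mysql:
  enabled: false
kafka:
  enabled: false

# datahub values.yaml -- point at the hosted services instead
global:
  elasticsearch:
    host: "my-es.example.com"
    port: "443"
  sql:
    datasource:
      host: "my-rds.example.com:3306"
  kafka:
    bootstrap:
      server: "my-msk.example.com:9096"
```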
  • billions-twilight-48559

    12/23/2021, 7:40 AM
    Hi! Is there an upgrade guide for Helm deployments, assuming I just want to upgrade by one version? Thanks
  • ancient-apartment-23316

    07/29/2022, 5:15 PM
    Hi, I tried to enable the internal load balancer for the datahub-frontend service. I downloaded the acryldata/datahub-helm repository locally, enabled the annotation in the subchart
    aws-load-balancer-internal: "true"
    here https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/subcharts/datahub-frontend/values.yaml#L48 and tried to install from the local folder
    helm install datahub .
    But I am getting an error message
    Error: INSTALLATION FAILED: Chart.yaml file is missing
    The Chart.yaml file is in place; I downloaded the full acryldata/datahub-helm repository locally. Please advise how best to install the datahub-frontend k8s service with this annotation enabled:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
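Two notes that may help here. "Chart.yaml file is missing" usually means `helm install` was run from the repository root rather than from `charts/datahub` (where Chart.yaml lives), and `helm dependency update` is needed before installing from a local folder. Also, the annotation can be supplied from your own values file instead of editing the subchart; a sketch assuming the subchart's `service.annotations` key:

```yaml
# my-values.yaml -- override instead of editing the subchart
datahub-frontend:
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
```

Then, from the repo root: `helm dependency update charts/datahub && helm install datahub ./charts/datahub -f my-values.yaml`.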
  • lemon-terabyte-66903

    07/19/2022, 2:57 AM
    Hello guys, it looks like with the pip version change, some plugins of the PyPI package are taking a lot of time to install. The issue is something like this: https://github.com/pypa/pip/issues/9215 The new pip resolver takes a lot of time to install all deps every time it is installed on a fresh cluster. For example, it took more than 30 minutes to install acryl-datahub[s3] on a Databricks cluster and hence got timed out. Can the devs make a fix for this?
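Until that is addressed upstream, the usual workaround is to stop the resolver from backtracking by pinning the package to an exact version (the version number below is only an example):

```shell
# pinning leaves the resolver one candidate per dependency
# instead of letting it explore the whole version history
pip install "acryl-datahub[s3]==0.8.41"
```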
  • average-vr-23088

    08/02/2022, 4:48 PM
    Hi, question about Kafka topic retention. I noticed that by default we get a 90-day retention for the “audit” log. Is this log required when replaying data into Elasticsearch? In other words, after 90 days have passed, would we be unable to replay some subset of data into Elasticsearch (in the event we lose Elasticsearch)?
  • faint-translator-23365

    08/02/2022, 7:10 PM
    Hi, I want to use LDAP just for authentication, and while doing so I want to retrieve the user attributes (email, username, etc.) from the LDAP server. Which module should I use in my jaas.conf? I used com.sun.security.auth.module.LdapLoginModule and also org.eclipse.jetty.server.server.plus.jaas.spi.LdapLoginModule, but these modules don't have the option to retrieve those user attributes. Can anyone please help and share a sample configuration if possible? Thanks!
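For reference, datahub-frontend reads its JAAS config from the WHZ-Authentication context. A sketch of the authentication-only part with com.sun.security.auth.module.LdapLoginModule (the server, base DN, and filter below are hypothetical placeholders); note that JAAS login modules generally only authenticate, so attributes like email typically have to be fetched in a separate LDAP lookup or supplied via an SSO/OIDC flow instead:

```
// jaas.conf sketch -- all values are placeholders for your directory
WHZ-Authentication {
  com.sun.security.auth.module.LdapLoginModule sufficient
    userProvider="ldap://ldap.example.com:389/ou=people,dc=example,dc=com"
    userFilter="(&(uid={USERNAME})(objectClass=inetOrgPerson))"
    useSSL=false;
};
```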
  • wonderful-author-3020

    08/03/2022, 12:00 PM
    Hello, we're running a somewhat bespoke deployment of DataHub (we unpacked the https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml file as separate containers, managed in Terraform). We're currently at version 0.8.18. Do we have to run the datahub-upgrade container to upgrade to the newest version?
  • quiet-wolf-56299

    08/03/2022, 5:41 PM
    We are looking at bringing on DataHub; however, in our prod environment, at a minimum, we want to be behind our SSO implementation, and that implementation only supports SAML and CAS. Does anyone have experience or advice on where to start thinking about authentication using a SAML-based SSO implementation? We are planning OIDC support eventually, but that go-live date will be well outside the timeframe in which we would need DataHub up and running.
  • rapid-house-76230

    08/03/2022, 6:02 PM
    I’m getting the following errors in my GMS pod logs. Any idea how I can get more info about what these are?
    Copy code
    2022/08/03 21:35:16 Connected to tcp://boot-foobar.c1.kafka-serverless.us-west-2.amazonaws.com:9098
    2022/08/03 21:35:16 Connected to tcp://datahub.cluster-foobar.us-west-2.rds.amazonaws.com:5432
    2022/08/03 21:35:16 Received 200 from https://datahub:Datahub%21123@search-datahub-elasticsearch-foobar.us-west-2.es.amazonaws.com:443
    2022/08/03 21:35:17 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    2022/08/03 21:35:18 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    2022/08/03 21:35:19 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    2022/08/03 21:35:20 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    2022/08/03 21:35:21 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    2022/08/03 21:35:22 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    2022/08/03 21:35:23 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    2022/08/03 21:35:24 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    2022/08/03 21:35:25 Problem with request: Get http:: http: no Host in request URL. Sleeping 1s
    I have the env variables on my GMS deployment as follows:
    Copy code
    - KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
    - KAFKA_BOOTSTRAP_SERVER=boot-foobar.kafka-serverless.us-west-2.amazonaws.com:9098
    - ALLOW_PLAINTEXT_LISTENER=yes
    - ZOOKEEPER_CLIENT_PORT=2181
    - ZOOKEEPER_TICK_TIME=2000
    - KAFKA_BROKER_ID=1
    - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
    - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
    - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
    - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
    - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
    - KAFKA_HEAP_OPTS=-Xms256m -Xmx256m
    Is there any config error or anything that you could think of on my end?
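One observation: most of those variables (KAFKA_BROKER_ID, KAFKA_ADVERTISED_LISTENERS, the ZOOKEEPER_* settings, etc.) are Kafka broker settings rather than GMS settings, and "no Host in request URL" suggests one of the URLs GMS waits on at startup is empty. A sketch of the client-side variables GMS itself typically needs, with names taken from the stock docker env files and values echoing the endpoints in the logs above (treat both as assumptions to verify):

```yaml
- EBEAN_DATASOURCE_HOST=datahub.cluster-foobar.us-west-2.rds.amazonaws.com:5432
- EBEAN_DATASOURCE_URL=jdbc:postgresql://datahub.cluster-foobar.us-west-2.rds.amazonaws.com:5432/datahub
- KAFKA_BOOTSTRAP_SERVER=boot-foobar.kafka-serverless.us-west-2.amazonaws.com:9098
- KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
- ELASTICSEARCH_HOST=search-datahub-elasticsearch-foobar.us-west-2.es.amazonaws.com
- ELASTICSEARCH_PORT=443
- ELASTICSEARCH_USE_SSL=true
```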
  • average-vr-23088

    08/03/2022, 7:13 PM
    Hi, is there an official way to back up and restore data from one DataHub instance to another? The way I’m thinking about it, based on that page:
    • snapshot the datahub.metadata_aspect_v2 table from the existing instance
    • restore the table to the new instance
    • run restore indices on the new instance
    It would be nice if there were a high-level backup/restore command.
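A command sketch of those three steps for a MySQL-backed deployment (hostnames and credentials are placeholders; running the datahub-upgrade image with `-u RestoreIndices` is the standard way to rebuild the search/graph indices from the aspect table):

```shell
# 1. snapshot the aspect table on the old instance
mysqldump -h old-db.example.com -u datahub -p datahub metadata_aspect_v2 > backup.sql

# 2. load it into the new instance's database
mysql -h new-db.example.com -u datahub -p datahub < backup.sql

# 3. rebuild the Elasticsearch/graph indices from the restored table
docker run --env-file gms.env acryldata/datahub-upgrade:head -u RestoreIndices
```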
  • high-summer-78960

    08/03/2022, 8:56 PM
    Is there any guidance on using GCP IAP as a source of AuthN?
  • microscopic-mechanic-13766

    08/04/2022, 8:09 AM
    Good morning team, I am trying to change some files of the datahub-frontend service. My "problem" is that I can't find the directory inside the service's container where those files live. I have been reading its Dockerfile but haven't found much. Could someone please point the directory/directories out? Thanks in advance!
  • jolly-yacht-10587

    08/05/2022, 3:39 AM
    Hi, I’m running DataHub on GKE as a single production instance, and I was wondering if we can separate dataset projects into something like PRD and DEV in the DataHub UI, so that when we test loading metadata into DataHub we can load it into a “dev” dataset. Is this possible? If not, I’ll deploy another instance as a development one and load metadata into that instance instead. Or do you have any suggestions for this? Thanks.
  • average-vr-23088

    08/05/2022, 5:03 PM
    Hi, my ingestion runs which were set up using the UI are failing with the following error:
    Copy code
    Unable to emit metadata to DataHub GMS
    401 Client Error: Unauthorized for url: http://datahub-gms.datahub.local:8080/aspects?action=ingestProposal
    I’m running DataHub v0.8.40 and DataHub Actions v0.0.4. The above log is showing up in the DataHub Actions container. I have configured the DataHub Actions container with the client id and secret for GMS as well. It is worth noting that this setup was working previously, but I migrated data from one DataHub deployment to another. Both had the same container versions, but the client id/secrets and token-generation secrets differed between the two. After the migration, which involved copying the metadata v2 table and reindexing, I’m getting ingestion failures. I’ve also tried deleting the ingestion as well as the secrets it references and recreating them, with no success. I noticed another warning in the logs:
    Copy code
    ❗Client-Server Incompatible❗ Your client version 0.8.38.2 is older than your server version 0.8.40. Upgrading the cli to 0.8.40 is recommended
    I’m guessing the latest release of the actions container doesn’t use the up-to-date CLI version? Would I need to set UI_INGESTION_DEFAULT_CLI_VERSION=0.8.38.2 in GMS?
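For reference, that variable can be set on GMS through the Helm chart; a sketch assuming the chart's `extraEnvs` passthrough (the 401 itself, though, usually points at the actions container's token/secret configuration rather than the CLI version, and the version warning is typically benign):

```yaml
# datahub chart values sketch -- pin the CLI version UI ingestion uses
datahub-gms:
  extraEnvs:
    - name: UI_INGESTION_DEFAULT_CLI_VERSION
      value: "0.8.38.2"
```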
  • microscopic-mechanic-13766

    08/08/2022, 9:50 AM
    Good morning everyone, in an attempt to better understand the guts of DataHub, I have encountered some env variables that I don't know the purpose of, but that have to be created. Some examples of said variables would be DATAHUB_APP_VERSION, DATAHUB_SECRET, DATAHUB_SYSTEM_CLIENT_SECRET, .... I have been looking on the project site, but haven't been lucky. Any help would be appreciated!
  • acceptable-baker-8114

    08/08/2022, 11:11 AM
    Hello all, I’m trying to get token-based authentication enabled. I’ve updated my helm values.yaml file with
    Copy code
    datahub:
      metadata_service_authentication:
        enabled: true
    I can see this when I run helm get values datahub, but it still comes up with the error below when I try to generate a token. Is there something else I need to do?
    Copy code
    Token based authentication is currently disabled. Contact your DataHub administrator to enable this feature.
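For comparison, the chart expects that block at the top level of the datahub chart's values (indentation matters, and the GMS and frontend pods must restart to pick it up). A fuller sketch based on the chart's documented keys (secret names below are placeholders):

```yaml
datahub:
  metadata_service_authentication:
    enabled: true
    systemClientId: "__datahub_system"
    systemClientSecret:
      secretRef: "datahub-auth-secrets"
      secretKey: "system_client_secret"
```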
  • microscopic-mechanic-13766

    08/09/2022, 12:22 PM
    Good afternoon, I have one doubt: is it normal for GMS to have this kind of maximum CPU usage during initialization? The actions service also has high CPU maximums when ingestion runs, but not this extremely high (100-300%). My current GMS version is v0.8.42 and the actions image is acryldata/datahub-actions:v0.0.4. I haven't specified any limit on either the CPU or the memory they can use.
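With no limits set, startup CPU is unbounded by default; if you want to cap it, the chart exposes standard Kubernetes resource settings per component. A sketch (the numbers are placeholders to illustrate the shape, not recommendations):

```yaml
datahub-gms:
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
    limits:
      cpu: "2"
      memory: 3Gi
```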
  • damp-greece-27806

    08/10/2022, 6:34 PM
    Hi! Is there a hook anywhere in the deployment process to install additional pip modules? Like writing a requirements-extra.txt file that any of the bootstrap scripts or application code would look for and inject as part of the installation process?
  • astonishing-lizard-90580

    08/10/2022, 8:20 PM
    Hey folks, I've been working on a way for small teams with small budgets (non-profits, academia, etc.) to deploy DataHub in as simple a way as possible. If you're in that boat, I've set up a guide: https://github.com/languageconvo/datahub-deployed Some quick notes: • Read the readme 🙂 You should probably use Acryl Data's managed product when it's available! • Cost is ~$150-$180/month, although you might be able to get away with ~$100/month • Non-Kubernetes; some devops experience will be needed though. We use AWS RDS, Elasticsearch, and the docker-compose quickstart • Would love feedback from anyone who has time and knows what they're doing (as opposed to us, this being our first time ever deploying DataHub): is what we came up with an OK long-term strategy, or are we going to run into major problems? Thanks!
  • ancient-apartment-23316

    08/11/2022, 12:52 PM
    Hi team! I’m trying to integrate Okta + DataHub. I’ve done all the steps from the documentation https://datahubproject.io/docs/authentication/guides/sso/configure-oidc-react-okta but it doesn’t work for some reason; I can access DataHub just like before, so nothing has changed. The documentation says:
    To do so, you must update the datahub-frontend docker.env file with the values received from your identity provider:
    The thing is, I have installed DataHub on Kubernetes (EKS) using Helm and AWS managed services, so I can’t edit the docker.env file. Instead, I manually edited the k8s deployment.apps/datahub-datahub-frontend and put the envs (AUTH_OIDC_ENABLED, AUTH_OIDC_CLIENT_ID, etc.) there. The new frontend pod is ready and contains all the envs, but the redirection to Okta does not work; I still have access to DataHub without Okta.
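When DataHub is installed with Helm, the usual place for these is the frontend's `extraEnvs` in values.yaml (editing the Deployment directly gets overwritten by the next helm upgrade). A sketch with the documented AUTH_OIDC_* variables; every bracketed value is a placeholder for your Okta app:

```yaml
datahub-frontend:
  extraEnvs:
    - name: AUTH_OIDC_ENABLED
      value: "true"
    - name: AUTH_OIDC_CLIENT_ID
      value: "<okta-client-id>"
    - name: AUTH_OIDC_CLIENT_SECRET
      value: "<okta-client-secret>"
    - name: AUTH_OIDC_DISCOVERY_URI
      value: "https://<okta-domain>/.well-known/openid-configuration"
    - name: AUTH_OIDC_BASE_URL
      value: "https://<your-datahub-url>"
```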
  • fancy-thailand-73281

    08/11/2022, 8:20 PM
    Hi All, we deployed DataHub v0.8.36 in an AWS EKS cluster with the Helm charts. We are using AWS MSK (Kafka) with SASL/SCRAM bootstrap servers and ZooKeeper TLS. Everything works fine up to here, but we are not able to ingest data (Snowflake) from the UI; I see 'N/A' when I try to run ingestion. We found in the DataHub docs (https://datahubproject.io/docs/ui-ingestion/) that we need to enable datahub-actions, so we deployed the public.ecr.aws/datahub/acryl-datahub-actions pods. The pod is not running (CrashLoopBackOff) and we see the error logs below:
    Copy code
    [2022-08-11 19:56:59,004] ERROR    {datahub.entrypoints:138} - File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 77, in run
        67   def run(config: str, dry_run: bool, preview: bool, strict_warnings: bool) -> None:

    KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Failed to create consumer: No provider for SASL mechanism GSSAPI: recompile librdkafka with libsasl2 or openssl support. Current build options: PLAIN SASL_SCRAM OAUTHBEARER"}
    2022/08/11 19:56:59 Command exited with error: exit status 1
    Could someone please help us? Thanks in advance!
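The KafkaError names the problem: the consumer is defaulting to `sasl.mechanism` GSSAPI, while the bundled librdkafka only supports PLAIN/SCRAM/OAUTHBEARER, so the SCRAM mechanism has to be set explicitly for the actions consumer. A sketch of the env vars as I understand the actions image's KAFKA_PROPERTIES_* passthrough; treat the names as an assumption to verify against your chart version, and the bracketed values as placeholders:

```yaml
- name: KAFKA_PROPERTIES_SECURITY_PROTOCOL
  value: "SASL_SSL"
- name: KAFKA_PROPERTIES_SASL_MECHANISM
  value: "SCRAM-SHA-512"
- name: KAFKA_PROPERTIES_SASL_USERNAME
  value: "<msk-scram-user>"
- name: KAFKA_PROPERTIES_SASL_PASSWORD
  value: "<msk-scram-password>"
```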
  • elegant-article-21703

    08/12/2022, 10:04 AM
    Hi everyone, I'm performing an update of the current release (using v0.8.41), but the upgrade fails. How can I extract extra information about what might be triggering the failure? The --debug flag extracts the following:
    Copy code
    wait.go:225: [debug] Deployment is not ready: datahub/datahub-datahub-frontend. 0 out of 1 expected pods are ready
    upgrade.go:360: [debug] warning: Upgrade "datahub" failed: timed out waiting for the condition
    upgrade.go:378: [debug] Upgrade failed and atomic is set, rolling back to last successful release
    history.go:53: [debug] getting history for release datahub
    rollback.go:64: [debug] preparing rollback of datahub
    rollback.go:112: [debug] rolling back datahub (current: v14, target: v13)
    rollback.go:71: [debug] creating rolled back release for datahub
    and the command I'm using is (I'm updating the root password):
    Copy code
    helm upgrade  -n datahub --atomic --debug  datahub  ./helm/datahub --values .\helm\datahub\charts\datahub-frontend\values.yaml
    Thank you in advance!
  • great-account-95406

    08/16/2022, 5:05 AM
    Hi everyone! Is there a way to authenticate DataHub to Kafka Schema Registry using username/password? Can’t find any solution in the docs.
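For ingestion recipes at least, the Kafka source accepts confluent-style client properties for the registry; a sketch assuming the kafka source's `schema_registry_config` passthrough (endpoints and credentials are placeholders):

```yaml
source:
  type: kafka
  config:
    connection:
      bootstrap: "broker.example.com:9092"
      schema_registry_url: "https://schema-registry.example.com:8081"
      schema_registry_config:
        basic.auth.credentials.source: "USER_INFO"
        basic.auth.user.info: "username:password"
```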
  • eager-terabyte-73886

    08/16/2022, 9:00 PM
    Hi, I am completely new to DataHub. I set it up locally and ingested some data (created a .yaml file). Now I want to know how the backend works: where stuff is being stored, the databases, the microservices, etc. But I have no clue about any of this right now. Is there some resource that can help me understand? I really want to know what the code files are doing in my local system. If anyone can help out or guide me, I'd be really grateful. Since I know pretty much nothing about DataHub's workings, is there any resource/doc/video that could help me get an understanding?
  • silly-oil-35180

    08/17/2022, 2:51 AM
    Hello, I set up DataHub and wrote some descriptions for tables and schemas. Markdown is supported when I write descriptions, but I want to change the color, font, and size of the text like in HTML web pages. Is there any way to use HTML, or to change color, font, and size in description text?
  • thousands-solstice-2498

    08/17/2022, 7:22 AM
    kube: Error: parse error in "datahub/templates/datahub-encryption-secrets***redacted*** template: datahub/templates/datahub-encryption-secrets***redacted*** function "lookup" not defined
  • thousands-solstice-2498

    08/17/2022, 7:22 AM
    Could someone please advise on the issue above?