# all-things-deployment
  • gifted-diamond-19544 (10/21/2022, 9:13 AM)
    Good morning all. We have deployed DataHub on ECS plus managed services on AWS. Currently we are running the GMS, React front-end, and Actions containers on ECS, but our security team is raising the following warning:
    [ECS.5] ECS containers should be limited to read-only access to root filesystems
    This control checks if ECS containers are limited to read-only access to mounted root filesystems. This control fails if the ReadonlyRootFilesystem parameter in the container definition of ECS task definitions is set to 'false'.
    We tried enabling read-only access to the root filesystems, but the containers do not run. Is there any way we can fix this? Thank you!
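    Note: a common way to satisfy ECS.5 is to keep ReadonlyRootFilesystem set to true and mount writable volumes only at the specific paths the containers actually write to (for example /tmp). A minimal CloudFormation-style sketch for the GMS container; the image tag, volume name, and the exact set of writable paths are assumptions to verify against your own container logs:

        ContainerDefinitions:
          - Name: datahub-gms
            Image: linkedin/datahub-gms:v0.9.0    # hypothetical tag
            ReadonlyRootFilesystem: true          # keeps the ECS.5 control green
            MountPoints:
              - SourceVolume: tmp                 # writable scratch space only
                ContainerPath: /tmp
                ReadOnly: false
        Volumes:
          - Name: tmp
            Host: {}

    If a container still fails to start, its logs usually name the path it could not write to; that path becomes another MountPoints entry.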
  • microscopic-mechanic-13766 (10/21/2022, 11:43 AM)
    Hello, so I have made a few ingestions into DataHub. When I searched * to list everything that has been created/ingested into DataHub, I saw this thing. Is it supposed to be like this? Thanks in advance!
  • microscopic-mechanic-13766 (10/21/2022, 12:04 PM)
    Hello again, I have another doubt, but this time it is related to permissions. I have disabled all the default policies except for the Asset Owners - Metadata Policy, which grants all metadata privileges ONLY to asset owners. I have also created one group with two users in it: one with the Reader role and the other with the Editor role. After adding the group as the owner of a dataset, the read user is able to add glossary terms, domains, and so on. Is this correct? Shouldn't roles take precedence over policies?
  • best-umbrella-88325 (10/21/2022, 1:36 PM)
    Hi all. We've deployed DataHub on EKS, which has given us two classic load balancers, one for GMS and the other for the frontend. However, ingesting metadata from the UI fails every time, with nothing major in the logs. We provided the host of the LB that fronts GMS, but that didn't help either. It does work from the CLI when I configure the GMS host using datahub init. Are we missing something? Thanks in advance. This is the recipe that works from the CLI but not from the UI; error logs are attached in the thread.
    sink:
        type: datahub-rest
        config:
            server: 'http://a35f8626d7XXXXXbeec24fdaa5720-XXX.us-west-1.elb.amazonaws.com:8080/'
    source:
        type: s3
        config:
            path_spec:
                include: 's3://XX-bkt/*.*'
            platform: s3
            aws_config:
                aws_access_key_id: XXXXXXX
                aws_region: us-west-1
                aws_secret_access_key: XXXXXXXXX
    pipeline_name: 'urn:li:dataHubIngestionSource:f751376f-ec1a-4dee-a71f-7f4f96c3cdda'
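    Note: ingestion triggered from the UI normally runs inside the cluster (in the actions container), not on your workstation, so the sink has to be reachable from within the cluster. A hedged sketch of the sink, assuming a default Helm release named datahub in namespace datahub; verify the real service name with kubectl get svc:

        sink:
            type: datahub-rest
            config:
                # in-cluster service DNS for GMS instead of the external ELB
                server: 'http://datahub-datahub-gms.datahub.svc.cluster.local:8080'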
  • numerous-bird-32188 (10/21/2022, 2:13 PM)
    Hi, are there any recommendations for storage types and sizes for DataHub, both when running SQL, Elasticsearch, and Kafka in the cluster and when using RDS, OpenSearch, and MSK externally? I'm trying to work out how to size the resources.
  • helpful-librarian-40144 (10/24/2022, 3:37 AM)
    How can I deploy just the GMS and front-end services in Kubernetes, using external MySQL, Elasticsearch, Neo4j, etc.?
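    Note: the Helm charts support this pattern: disable the bundled stores in the prerequisites chart and point the datahub chart at the external services. A rough sketch of the kind of values.yaml overrides involved; the hostnames and secret names are placeholders, and the exact keys should be verified against the charts' documented values:

        # prerequisites values.yaml: turn off the bundled stores
        mysql:
          enabled: false
        elasticsearch:
          enabled: false
        neo4j:
          enabled: false

        # datahub values.yaml: point GMS at the external services
        global:
          graph_service_impl: neo4j
          sql:
            datasource:
              host: "my-external-mysql:3306"
              url: "jdbc:mysql://my-external-mysql:3306/datahub"
              username: "datahub"
              password:
                secretRef: mysql-secrets
                secretKey: mysql-password
          elasticsearch:
            host: "my-external-es"
            port: "9200"
          neo4j:
            host: "my-external-neo4j:7474"
            uri: "bolt://my-external-neo4j"
            username: "neo4j"
            password:
              secretRef: neo4j-secrets
              secretKey: neo4j-password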
  • bland-orange-13353 (10/24/2022, 7:24 AM)
    This message was deleted.
  • microscopic-mechanic-13766 (10/24/2022, 10:59 AM)
    Good morning everyone. I integrated Airflow with DataHub a few versions ago (around version 0.8.44) and it was working just fine. Today I tried installing version 0.8.45 of the Airflow plugin, but I have been getting the following error:
    Traceback (most recent call last):
      File "/home/airflow/.local/bin/airflow", line 5, in <module>
        from airflow.__main__ import main
      File "/home/airflow/.local/lib/python3.7/site-packages/airflow/__init__.py", line 35, in <module>
        from airflow import settings
      File "/home/airflow/.local/lib/python3.7/site-packages/airflow/settings.py", line 35, in <module>
        from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
      File "/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py", line 1187, in <module>
        conf.validate()
      File "/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py", line 224, in validate
        self._validate_config_dependencies()
      File "/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py", line 267, in _validate_config_dependencies
        raise AirflowConfigException(f"error: cannot use sqlite with the {self.get('core', 'executor')}")
    airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the CeleryExecutor
    After some testing, I found that the source of the error might be that in this version, SQLAlchemy is downgraded to 1.3.24. Is that done for a particular reason?
    Collecting sqlalchemy==1.3.24
      Downloading SQLAlchemy-1.3.24-cp37-cp37m-manylinux2010_x86_64.whl
    ......
    Attempting uninstall: sqlalchemy
        Found existing installation: SQLAlchemy 1.4.9
        Uninstalling SQLAlchemy-1.4.9:
          Successfully uninstalled SQLAlchemy-1.4.9
    I am using Airflow 2.3.2
  • curved-magazine-23582 (10/24/2022, 6:01 PM)
    Hello DataHub team, great job all along. We are getting more interested in DataHub as a data catalog solution, and I signed up for managed hosting. I'm now wondering if there is a rough timeline for when that option and its pricing will be available.
  • rhythmic-judge-41554 (10/24/2022, 6:17 PM)
    Hello. We are looking very closely at DataHub and I was hoping to evangelize with some show-and-tell using the quickstart. I see Mac M1 arm64 containers aren't built yet:
    no matching manifest for linux/arm64/v8 in the manifest list entries
    Any chance of supporting those in the future? I can work around this myself, so this is just an FYI and a question.
  • late-insurance-69310 (10/25/2022, 2:16 PM)
    Is there any recommendation for running a production instance using EC2 and nginx instead of Kubernetes?
  • microscopic-mechanic-13766 (10/26/2022, 8:37 AM)
    Good morning everyone. I have a deployment of DataHub on v0.8.45. Sometimes it takes too long to load the information shown on pages like the landing page. After some debugging, I found that the main problem is that the GraphQL queries take too long (as shown in the picture). What would be a workaround to make them faster?
  • glamorous-wire-83850 (10/27/2022, 10:58 AM)
    Hello everyone, I am trying to implement management processes for controlling changes to descriptions/tags/etc. Is there any logging I can use to monitor those changes?
  • full-apple-16103 (10/27/2022, 1:46 PM)
    Hey everyone, I am deploying DataHub on AWS EC2 using the Docker quickstart and followed the docs at https://datahubproject.io/docs/quickstart. I'm trying to run the final command:
    datahub docker quickstart
    and I get the following:
    [ec2-user@ip- ~]$ datahub version
    DataHub CLI version: 0.9.0.4
    Python version: 3.7.10 (default, Jun 3 2021, 00:02:01) [GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]
    [ec2-user@ip- ~]$ datahub docker quickstart
    No Datahub Neo4j volume found, starting with elasticsearch as graph service. To use neo4j as a graph backend, run
    datahub docker quickstart --quickstart-compose-file ./docker/quickstart/docker-compose.quickstart.yml
    from the root of the datahub repo
    Fetching docker-compose file https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml from GitHub
    Pulling docker images...
    unknown shorthand flag: 'f' in -f
    See 'docker --help'.
    Error while pulling images. Going to attempt to move on to docker compose up assuming the images have been built locally
  • bitter-dog-24903 (10/27/2022, 6:01 PM)
    Hello @gray-shoe-75895, I included the elasticsearch-setup container, and it deployed successfully. But when I try to log in to DataHub, I again get the same error: Failed to log in!
    SyntaxError: Unexpected token '<', " <!DOCTYPE "... is not valid JSON
  • bitter-dog-24903 (10/27/2022, 6:01 PM)
    The datahub-gms logs are as below:
  • bitter-dog-24903 (10/27/2022, 6:02 PM)
    com.datahub.util.exception.ESQueryException: Search query failed:
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:73)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:100)
    at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:67)
    at com.linkedin.entity.client.JavaEntityClient.search(JavaEntityClient.java:288)
    at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:50)
    at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:42)
    at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:222)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
    at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:857)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:259)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
    at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:60)
    ... 13 common frames omitted
    Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
    at org.apache.http.util.Asserts.check(Asserts.java:46)
    at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
    at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:255)
    ... 19 common frames omitted
  • bitter-dog-24903 (10/27/2022, 6:03 PM)
    The elasticsearch container does not have any logs related to the login. I have deployed the DataHub Docker containers on AWS ECS.
  • many-piano-52097 (10/28/2022, 3:21 AM)
    Hello everyone! I used K8s to deploy DataHub integrated with Keycloak. Why can't I log out after I log in? The datahub-frontend container log shows the error: authentication - the session cannot be renewed. Session storage may not support this feature.
  • high-hospital-85984 (10/28/2022, 11:31 AM)
    The image linkedin/datahub-ingestion:v0.9.0 seems to be missing from Docker Hub. Is this intentional?
  • best-umbrella-88325 (11/01/2022, 5:24 AM)
    Hi community. I've deployed DataHub on EKS with an ALB as ingress, but when I hit the ALB host, it says the website cannot be reached. Is there anything extra that needs to be done? I updated the ingress config in values.yaml as suggested in the documentation. Thanks in advance.
  • witty-motorcycle-52108 (11/02/2022, 5:40 PM)
    Hey all, I did a quick search and didn't see any official guidelines about deployments and replicas, and if/how work is parallelized. Here are a few questions I'm hoping to get a better understanding of:
    • Can frontend and GMS be run with any number of replicas, or is there a limit somewhere? I'm primarily asking for knowledge purposes, not because I plan to deploy double digits of those containers from the start.
    • Can actions be deployed with replicas?
      ◦ If so, how is work parallelized across these replicas (if at all)? For example, are ingestion tasks split across replicas to speed them up?
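    Note: datahub-frontend and datahub-gms are stateless, so under the Helm chart they scale like ordinary Kubernetes Deployments. A minimal sketch of the values.yaml overrides, assuming the chart's standard replicaCount fields; whether and how the actions container parallelizes ingestion across replicas is a separate question for the thread:

        datahub-frontend:
          replicaCount: 3
        datahub-gms:
          replicaCount: 3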
  • cuddly-arm-8412 (11/03/2022, 9:38 AM)
    Hi team. I want to know whether search can be set to match only on the dataset name, or whether specific fields can be targeted?
  • ancient-apartment-23316 (11/03/2022, 6:25 PM)
    Hello, I’m trying to set up Okta. The helm upgrade went through without errors, but I get an error when I try to open the DataHub frontend page:
    Failed to redirect to Single Sign-On provider. Please contact your DataHub Administrator, or refer to server logs for more information.
    Here is my values.yaml:
    datahub-frontend:
      enabled: true
      image:
        repository: linkedin/datahub-frontend-react
        tag: "v0.9.1"
      # Set up ingress to expose react front-end
      ingress:
        enabled: false
      service:
        port: 80 ##################### Not 9002
      oidcAuthentication:
        enabled: true
        provider: okta
        clientId: "q"
        clientSecret: "q"
        oktaDomain: "<https://q.com>"
        baseUrl: "dev-datahub.q.com/sso"
        discoveryUrl: "q.com/.well-known/openid-configuration"
      extraEnvs:
    #    - name: AUTH_OIDC_ENABLED
    #      value: "true"
    #    - name: AUTH_OIDC_CLIENT_ID
    #      value: "q"
    #    - name: AUTH_OIDC_CLIENT_SECRET
    #      value: "q"
    #    - name: AUTH_OIDC_DISCOVERY_URI
    #      value: "<https://qq.com/.well-known/openid-configuration>"
        - name: AUTH_OIDC_BASE_URL
          value: "q.com/sso"
    #    - name: AUTH_OIDC_SCOPE
    #      value: "openid profile email groups"
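    Note: per the OIDC configuration docs, AUTH_OIDC_BASE_URL should be the fully qualified URL of the DataHub frontend itself (scheme included, no /sso suffix), and the Okta domain should be the full issuer URL. A hedged sketch of the same block with placeholder domains; the redirect URI registered in Okta would then be <baseUrl>/callback/oidc:

        datahub-frontend:
          oidcAuthentication:
            enabled: true
            provider: okta
            clientId: "q"
            clientSecret: "q"
            # full Okta org URL; discovery resolves to <oktaDomain>/.well-known/openid-configuration
            oktaDomain: "https://q.okta.com"
          extraEnvs:
            # frontend base URL with scheme and no /sso path
            - name: AUTH_OIDC_BASE_URL
              value: "https://dev-datahub.q.com"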
  • green-intern-1667 (11/03/2022, 6:54 PM)
    Good afternoon. I'm trying to deploy DataHub locally on a Mac and getting the following message:
    Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /Users/my_user/.datahub/mysql/init.sql
    Any clue on that? I'm just following the quickstart but have been facing this for a while.
  • quiet-wolf-56299 (11/03/2022, 10:54 PM)
    Has anyone successfully deployed DataHub under podman-compose rather than docker-compose?
  • creamy-tent-10151 (11/04/2022, 7:24 AM)
    Hi DataHub team, does this update mean we can now add HTTPS/SSL to our frontend directly, or do we still need a proxy for it? Thanks.
  • microscopic-mechanic-13766 (11/04/2022, 12:27 PM)
    Hi, could someone please point me to the files in the source code where the database connections are made, or tell me how the DB connections are handled? I have lately been getting the error below in my PostgreSQL instance, and I want to know whether it could be caused by DataHub or by other services I have deployed.
    FATAL: remaining connection slots are reserved for non-replication superuser connections
    Thanks in advance!
  • most-monkey-10812 (11/04/2022, 1:55 PM)
    Hi! Is column-level lineage information for a datajob (dataset -> datajob -> dataset) somehow reflected in the lineage visualization UI or in the Column-level Impact Analysis screen in versions 0.9.0 or 0.9.1? I am trying to ingest column-level lineage as the dataJobInputOutput aspect of the datajob entity, but I don't see anything in the UI. There is also the possibility of ingesting this information as the upstreamLineage aspect of the dataset. Do these two approaches complement each other, or are they mutually exclusive?
  • lemon-cat-72045 (11/04/2022, 6:35 PM)
    Hi all, I am seeing an "unable to renew session" error when using OIDC authentication. Does anyone know how to fix this issue?