# all-things-deployment
  • **bland-orange-13353** (06/20/2022, 1:17 PM)
    This message was deleted.
  • **helpful-processor-71693** (06/20/2022, 4:32 PM)
    Hi all, my `datahub-datahub-gms` pod is failing with the following error:
    ```
    15:38:00.026 [main] ERROR o.s.web.context.ContextLoader:313 - Context initialization failed
    org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'dataHubAuthorizerFactory': Unsatisfied dependency expressed through field 'entityClient'; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'javaEntityClientFactory': Unsatisfied dependency expressed through field '_entityService'; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'entityAspectDao' defined in com.linkedin.gms.factory.entity.EntityAspectDaoFactory: Unsatisfied dependency expressed through method 'createEbeanInstance' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'ebeanServer' defined in com.linkedin.gms.factory.entity.EbeanServerFactory: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.ebean.EbeanServer]: Factory method 'createServer' threw exception; nested exception is java.lang.NullPointerException
    	at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.resolveFieldValue(AutowiredAnnotationBeanPostProcessor.java:659)
    	at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:639)
    	at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:119)
    	at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:399)
    	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1431)
    ```
    We are using AWS-managed ES, MSK, and an RDS Postgres instance for our DataHub setup, and I see the following connection messages in the GMS pod, so connectivity to RDS looks fine. Also, the Postgres setup job completed without any issue. Can someone help with this?
    ```
    2022/06/20 15:35:40 Waiting for: <https://xxxxxxxxxxxxxxxxx.es.amazonaws.com:443>
    2022/06/20 15:35:40 Waiting for: <tcp://xxxxxxxxxxxxx.rds.amazonaws.com:5432>
    2022/06/20 15:35:40 Waiting for: <tcp://xxxxxxx.amazonaws.com:9096>
    2022/06/20 15:35:40 Waiting for: <tcp://xxxxxxxxx.us-west-2.amazonaws.com:9096>
    2022/06/20 15:35:40 Waiting for: <tcp://xxxxxxxxx.us-west-2.amazonaws.com:9096>
    2022/06/20 15:35:45 Connected to <tcp://xxxxxxxxx.us-west-2.amazonaws.com:9096>
    2022/06/20 15:35:45 Connected to <tcp://xxxxxxxx.us-west-2.amazonaws.com:9096>
    2022/06/20 15:35:45 Connected to <tcp://xxxxxxxxxxxx.us-west-2.amazonaws.com:9096>
    2022/06/20 15:35:45 Connected to <tcp://xxxxxxxxxxxx.rds.amazonaws.com:5432>
    2022/06/20 15:35:45 Received 200 from <https://xxxxxxxxxx.us-west-2.es.amazonaws.com:443>
    ```
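    For the record, an assumption from the stack trace rather than a confirmed diagnosis: an NPE inside `EbeanServerFactory.createServer` usually points at a missing Ebean datasource setting (for example an unset `EBEAN_DATASOURCE_*` environment variable), since network connectivity itself looks fine in the log above. A quick sketch that checks an environment for the variables the Ebean factory needs; the variable names are taken from DataHub's quickstart compose files, so verify them against your chart's values:

    ```python
    import os

    # Env vars the GMS container reads to build its Ebean datasource
    # (names as used in DataHub's quickstart compose files).
    REQUIRED = [
        "EBEAN_DATASOURCE_USERNAME",
        "EBEAN_DATASOURCE_PASSWORD",
        "EBEAN_DATASOURCE_HOST",
        "EBEAN_DATASOURCE_URL",
        "EBEAN_DATASOURCE_DRIVER",
    ]

    def missing_datasource_vars(environ=os.environ) -> list:
        """Return the required datasource variables that are unset or empty."""
        return [name for name in REQUIRED if not environ.get(name)]

    # Example with a deliberately incomplete environment:
    print(missing_datasource_vars({"EBEAN_DATASOURCE_HOST": "db:5432"}))
    ```

    Running this inside the GMS pod (or against the pod's rendered env) quickly shows whether the chart is actually injecting the credentials from your RDS secret.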
  • **sparse-barista-40860** (06/20/2022, 9:54 PM)
    I have some issues here
  • **sparse-barista-40860** (06/20/2022, 9:54 PM)
    https://gist.github.com/johnfelipe/c42939ad27793fbabbd745f636b770b2
  • **sparse-barista-40860** (06/20/2022, 10:30 PM)
    Done, solved.
  • **sparse-barista-40860** (06/20/2022, 10:30 PM)
    ```
    /root/datahub/metadata-ingestion/examples/recipes/mysql_to_datahub.dhub.yaml
    ```
  • **sparse-barista-40860** (06/20/2022, 10:34 PM)
    Is there any repository of good example datasets to upload or ingest into a local instance?
  • **calm-dinner-63735** (06/22/2022, 9:21 AM)
    Hi, I am getting this error: `Error: failed to download "datahub/datahub-prerequisites"`
  • **calm-dinner-63735** (06/22/2022, 9:21 AM)
    I did `helm repo add datahub https://helm.datahubproject.io/`
  • **calm-dinner-63735** (06/22/2022, 10:20 AM)
    Hi, I am getting this error from the datahub-gms pod:
    ```
    2022/06/22 10:08:00 Connected to tcp://********.c6.kafka.eu-central-1.amazonaws.com:9092
    2022/06/22 10:08:00 Connected to tcp://********.eu-central-1.rds.amazonaws.com:3306
    2022/06/22 10:08:00 Connected to tcp://********.c6.kafka.eu-central-1.amazonaws.com:9092
    2022/06/22 10:08:00 Connected to tcp://********.********.c6.kafka.eu-central-1.amazonaws.com:9092
    2022/06/22 10:08:00 Received 200 from https://********.eu-central-1.es.amazonaws.com:443
    2022/06/22 10:08:01 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:02 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:03 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:04 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:05 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    ```
  • **better-orange-49102** (06/22/2022, 10:48 AM)
    Can someone help confirm: the glossaryNode page is not viewable unless "view entity page" is granted for ALL resources in privileges. (Affects view ACLs.)
  • **able-rain-74449** (06/22/2022, 11:22 AM)
    Hi all, is there a way I can deploy a different version of DataHub, e.g.
    `helm install datahub datahub/datahub --version 0.8.38`
  • **calm-dinner-63735** (06/22/2022, 3:58 PM)
    Hi, when I use `global.graph_service_impl = elasticsearch` with Elasticsearch 7.10
  • **calm-dinner-63735** (06/22/2022, 3:58 PM)
    I am getting this error:
  • **calm-dinner-63735** (06/22/2022, 3:58 PM)
    ```
    2022/06/22 10:08:01 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:02 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:03 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:04 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    2022/06/22 10:08:05 Problem with request: Get "http": http: no Host in request URL. Sleeping 1s
    ```
  • **calm-dinner-63735** (06/22/2022, 3:59 PM)
    When I switch to neo4j this is working, so I can tell where the problem is, but could someone please help me with the solution?
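    A likely cause, offered as an assumption rather than a confirmed fix: `Get "http": http: no Host in request URL` means one of the wait-for URLs passed to the container's startup check is the bare string `http`, i.e. a templated env var ended up with an empty host when the neo4j settings were removed. A quick way to sanity-check the URLs you are feeding into the chart, using a hypothetical `is_wait_url_valid` helper:

    ```python
    from urllib.parse import urlparse

    def is_wait_url_valid(url: str) -> bool:
        """Return True only if the URL has both a scheme and a host."""
        parsed = urlparse(url)
        return bool(parsed.scheme) and bool(parsed.netloc)

    # A bare "http" (what the log above complains about) fails the check:
    print(is_wait_url_valid("http"))                             # False
    print(is_wait_url_valid("http://elasticsearch-master:9200"))  # True
    ```

    Checking each rendered wait-for value this way should point at the setting that went empty when switching `graph_service_impl` to elasticsearch.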
  • **microscopic-breakfast-5726** (06/22/2022, 6:06 PM)
    Hi team DH 👋. We are currently using `docker-compose-without-neo4j.quickstart.yml` to deploy DH. If we want to swap out the default MySQL with RDS as the storage layer, is there anything else that needs to be updated in the quickstart config besides these lines?
    ```yaml
    - EBEAN_DATASOURCE_USERNAME=<<rds-username>>
    - EBEAN_DATASOURCE_PASSWORD=<<rds-password>>
    - EBEAN_DATASOURCE_HOST=<<rds-endpoint>>:3306
    - EBEAN_DATASOURCE_URL=jdbc:mysql://<<rds-endpoint>>:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
    - EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
    ```
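    One thing worth verifying against your compose file (an assumption, not confirmed in the thread): the `mysql-setup` container in the same quickstart file has its own `MYSQL_*` connection variables that likely need to point at RDS too. Another common slip when swapping endpoints is `EBEAN_DATASOURCE_HOST` and the host embedded in `EBEAN_DATASOURCE_URL` drifting apart; a tiny consistency check with a hypothetical helper and example values:

    ```python
    import re

    def ebean_env_consistent(env: dict) -> bool:
        """Check that EBEAN_DATASOURCE_HOST matches the host:port embedded in
        EBEAN_DATASOURCE_URL, a common copy-paste mistake when swapping in RDS."""
        match = re.match(r"jdbc:mysql://([^/]+)/", env["EBEAN_DATASOURCE_URL"])
        return match is not None and match.group(1) == env["EBEAN_DATASOURCE_HOST"]

    env = {
        "EBEAN_DATASOURCE_HOST": "my-rds.example.com:3306",
        "EBEAN_DATASOURCE_URL": "jdbc:mysql://my-rds.example.com:3306/datahub?useSSL=true",
    }
    print(ebean_env_consistent(env))  # True
    ```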
  • **great-toddler-2251** (06/22/2022, 11:42 PM)
    Hi. I have sort of a deployment question. We have data and datasets in multiple AWS regions. Currently it is painful to do search & discovery of data & datasets, since we have to log in to each region using our current home-grown approach (don't really want to call it a data catalog, since it isn't). What I want is to provide a global view; eventually consistent is fine. So I'm wondering if anyone has tried deploying DH in multiple regions with some kind of sync set up?
  • **bitter-toddler-42943** (06/23/2022, 2:25 AM)
    Hello, I am trying to delete a DataHub dataset using `datahub delete --env PROD --entity_type dataset`, but when I run the command above and ingest again, the deleted data keeps coming back. Do you know what command or option to run to delete all data? Also, I don't know what a URN is; what should I do to build a command with the hard option that takes a URN?
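    For reference, based on DataHub's documented URN convention rather than anything in this thread: dataset URNs follow the pattern `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`, and `datahub delete --urn <urn> --hard` removes a single entity permanently instead of soft-deleting it. Soft-deleted entities reappear if a recurring ingestion recreates them, which may explain the data "coming back". A small sketch of building such a URN:

    ```python
    def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
        """Build a DataHub dataset URN following the documented convention."""
        return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

    urn = make_dataset_urn("mysql", "mydb.mytable", "PROD")
    print(urn)  # urn:li:dataset:(urn:li:dataPlatform:mysql,mydb.mytable,PROD)
    ```

    The printed string is what you would pass to `datahub delete --urn ... --hard` (platform and table names here are illustrative placeholders).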
  • **lemon-terabyte-66903** (06/23/2022, 11:51 PM)
    Hello, which process/service runs the init queries once the RDS service is set up for DataHub?
  • **lemon-terabyte-66903** (06/24/2022, 2:46 PM)
    Hello, anybody here to address the question 👆?
  • **bland-orange-13353** (06/27/2022, 12:29 PM)
    This message was deleted.
  • **agreeable-belgium-70840** (06/28/2022, 9:40 AM)
    Hello, I am trying to deploy DataHub v0.8.39. All the components are working except the mae-consumer. I am getting this error:
    ```
    org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'siblingAssociationHook' defined in URL [jar:file:/datahub/datahub-mae-consumer/bin/mae-consumer-job.jar!/BOOT-INF/lib/mae-consumer.jar!/com/linkedin/metadata/kafka/hook/siblings/SiblingAssociationHook.class]: Unsatisfied dependency expressed through constructor parameter 1; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'entityService' defined in class path resource [com/linkedin/gms/factory/entity/EntityServiceFactory.class]: 'entityService' depends on missing bean 'entityAspectDao'; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No bean named 'entityAspectDao' available

    -------
    Parameter 1 of constructor in com.linkedin.metadata.kafka.hook.siblings.SiblingAssociationHook required a bean named 'entityAspectDao' that could not be found.
    The injection point has the following annotations:
    	- @javax.annotation.Nonnull(when=ALWAYS)
    ```
    Any ideas?
  • **bulky-jackal-3422** (06/28/2022, 3:25 PM)
    Hey all, we've done all that we can to test out DataHub locally, and think it's time to host it. Currently we use managed Airflow on AWS and want to be able to integrate with that easily. Is there any established best practice for hosting DataHub? ECS or EC2 immediately come to mind.
  • **lemon-terabyte-66903** (06/28/2022, 4:16 PM)
    Hey team, I want to check whether DataHub supports SAML for SSO using AWS.
  • **creamy-van-28626** (06/28/2022, 6:20 PM)
    Is it possible to disable DataHub from Airflow? Meaning, I don't want my DataHub changes to affect the Airflow runs.
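    For what it's worth, and based on the DataHub Airflow plugin docs rather than anything confirmed in this thread: the lineage plugin can be switched off via Airflow configuration so that DataHub outages or changes don't affect task runs. A sketch of the relevant `airflow.cfg` section; the exact keys depend on your plugin version, so treat this as an assumption to verify against the docs:

    ```ini
    [datahub]
    # Disable the DataHub lineage plugin entirely; tasks run as if it were not installed.
    enabled = False
    ```

    The same setting can usually be supplied as the environment variable `AIRFLOW__DATAHUB__ENABLED=false`, which is convenient on managed Airflow where editing `airflow.cfg` directly is awkward.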
  • **nutritious-finland-99092** (06/28/2022, 6:53 PM)
    Hi all, we're trying to deploy an ECS service and we are facing some connection issues with elasticsearch and elasticsearch-setup:
    ```
    Problem with request: Get <http://localhost:9200>: dial tcp [::1]:9200: connect: connection refused. Sleeping 1s
    ```
    Is anyone else facing this issue, or has anyone deployed to ECS with Elasticsearch who can help us?
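    The `localhost:9200` in that log suggests the setup container is falling back to the default `ELASTICSEARCH_HOST`/`ELASTICSEARCH_PORT` instead of your actual Elasticsearch endpoint (an assumption worth checking in the task definition). A small reachability probe, with a hypothetical `can_connect` helper, to verify the endpoint you configure is reachable from the task's network:

    ```python
    import socket

    def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
        """Try a plain TCP connect; True if something is listening there."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Probe whatever you set as ELASTICSEARCH_HOST (placeholder values below).
    print(can_connect("localhost", 9200))
    ```

    Running this from inside the ECS task (e.g. via ECS Exec) distinguishes a wrong env var from a security-group or VPC routing problem.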
  • **lemon-terabyte-66903** (06/28/2022, 8:15 PM)
    Hello, anybody?
  • **green-lion-58215** (06/28/2022, 8:36 PM)
    Hi team, I am trying to set up a pipeline to ingest dbt metadata into DataHub through a Lambda function. However, I am getting the below error. Does anyone know how to resolve it? I am using a layer which has the 'acryl-datahub[dbt]' package installed. Below is the pipeline step:
    ```python
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "dbt",
                "config": {
                    "manifest_path": "/tmp/manifest.json",
                    "catalog_path": "/tmp/catalog.json",
                    "sources_path": "/tmp/sources.json",
                    "target_platform": "databricks",
                    "load_schemas": True,
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://<masked>:8080"},
            },
        }
    )
    ```
  • **some-kangaroo-13734** (06/29/2022, 4:24 PM)
    👋 Hello 🙂 I'm running a PoC with DataHub and I'm super impressed by the project 🚀. I have an outstanding issue that I haven't sorted out yet, and I was hoping to get some ideas on possible solutions. I have a number of Airflow deployments sending metadata via the lineage plugin, and I have the GMS auth service enabled. Everything is working fine, but I'm wondering what's the best way to refresh or rotate the token used by the Airflow integration, which I currently specify in the connection string? Will you ever support PATs without an expiry date? I have a similar problem for the other ingestion sources (like BigQuery or Postgres), which I'm currently running in Docker containers as Kubernetes cronjobs. Thanks!