# troubleshoot
  • b

    better-oyster-93449

    02/23/2022, 2:52 PM
    Hi everyone, I’ve ingested some data and logoUrl has the following shape:
    /assets/platforms/lookerlogo.png
    . However, when I deploy the frontend Docker container, this URL is unreachable. This translates to icons not showing up in my deployments. Has anyone experienced something similar? I'm new to the project, so any help is appreciated!
    b
    • 2
    • 3
  • l

    lively-fall-12210

    02/23/2022, 2:54 PM
    Hi! Is there any way to delete a domain again? It does not seem possible in the UI, and the call to
    datahub delete --urn xyz
    fails, because the status aspect is unknown for domain entities.
    e
    b
    b
    • 4
    • 10
  • n

    nutritious-bird-77396

    02/23/2022, 6:43 PM
    Hi Team, Is there a setting to always
    Show Full Titles
    by default in the Lineage View - https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:kafka,cdc.UserAccount_ChangeEvent,PROD)?is_lineage_mode=true
    e
    • 2
    • 1
  • n

    nutritious-bird-77396

    02/23/2022, 9:34 PM
    In the Kafka Connect ingestion, it's not clear to me what the
    provided_configs
    property is for: https://datahubproject.io/docs/metadata-ingestion/source_docs/kafka-connect. I am having an issue where the Kafka Connect ingestion also ingests the destination topics, but it ingests with
    PROD
    as env instead of the provided
    env
    value in the recipe such as
    STG
    . Just wondering if provided_configs could be a solution here.
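    For reference, a minimal sketch (not from this thread) of where the env value sits in a programmatic kafka-connect recipe; the connect_uri and server below are placeholder assumptions:
    Copy code
    # Hypothetical kafka-connect ingestion with env set to STG; all endpoints
    # are placeholders, not taken from the question above.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "kafka-connect",
                "config": {
                    "connect_uri": "http://localhost:8083",  # placeholder Connect REST endpoint
                    "env": "STG",  # environment expected on the emitted datasets
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},  # placeholder GMS address
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()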
    h
    • 2
    • 7
  • b

    bland-orange-95847

    02/24/2022, 7:17 AM
    Hi, after upgrading to the latest release the catalog is not usable anymore, as I am getting a lot of
    Validation error of type FieldUndefined: Field 'displayName' in type 'CorpUserEditableProperties' is undefined @ 'searchResults/searchResults/entity/editableProperties/displayName' (code undefined
    errors. Anyone else facing this after the upgrade?
    n
    l
    +3
    • 6
    • 31
  • b

    busy-dusk-4970

    02/24/2022, 4:01 PM
    Getting this error when running
    docker-compose -f docker-compose.dev.yml up
    . View error log in thread 🙏
    e
    • 2
    • 18
  • n

    nutritious-machine-80578

    02/24/2022, 7:57 PM
    hi! 👋 Question about the DataHub API: is there a way we can create POST requests to load descriptions to datasets? I couldn't find it in the docs.
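    A minimal sketch of one way descriptions can be pushed over the REST API with the Python emitter; the dataset URN and GMS address below are placeholders, not from this thread:
    Copy code
    # Sketch only: emit a dataset description via the DataHub REST API.
    # URN and server address below are placeholder assumptions.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,example.table,PROD)",
        aspectName="datasetProperties",
        aspect=DatasetPropertiesClass(description="Example description"),
    )
    emitter.emit_mcp(mcp)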
    e
    m
    • 3
    • 8
  • r

    rich-policeman-92383

    02/25/2022, 10:09 AM
    Hi DataHub, I am trying to build the datahub-frontend in an air-gapped environment that can reach the internet only via a proxy. To build the frontend I have specified the proxy, but gradlew is unable to reach registry.npmjs.org. Also, while building GMS, the build similarly fails with the error "could not find tools.jar".
    s
    l
    b
    • 4
    • 11
  • f

    freezing-nightfall-82415

    02/25/2022, 10:16 AM
    Hi guys, I have trouble getting Datahub running on K8s with existing backends. I think I finally got existing Postgres and Kafka DBs working with Datahub, but I'm still getting weird errors with Elasticsearch. elasticsearch-setup-job logs show status 200
    l
    e
    • 3
    • 7
  • w

    witty-painting-90923

    02/25/2022, 4:03 PM
    Hi! 👋 Question about stateful ingestion for Postgres. It should be supported, but it doesn't work for me 😞 I run the pipeline below, then drop the table in Postgres, and run it again. The table metadata in the UI is not removed. The same happens when we test with BigQuery.
    curl http://<datahub-gms-endpoint>/config
    is saying
    statefulIngestionCapable: true
    so it should be fine. We have
    gms v0.8.26
    datahub cli v0.8.26.3
    Any help would be much appreciated, thank you!
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
            # This configuration is analogous to a recipe configuration.
            {
                "source": {
                    "type": "postgres",
                    "config": {
                        "env": ENV,
                        "host_port": sql_host_port,
                        "database": database,
                        "username": sql_login,
                        "password": sql_password,
                        "include_views": False,
                        "profiling": {
                            "enabled": True
                        },
                        "stateful_ingestion": {
                            "enabled": True,
                            "remove_stale_metadata": True,
                            "state_provider": {
                                "type": "datahub",
                                "config": {"datahub_api": {"server": datahub_host}},
                            },
                        },
                    },
                },
                "pipeline_name": "my_postgres_pipeline_1",
                "sink": {
                    "type": "datahub-rest",
                    "config": {"server": datahub_host},
                },
            }
    )
    l
    h
    +5
    • 8
    • 15
  • b

    broad-thailand-41358

    02/25/2022, 5:38 PM
    Hi all, I'm trying to ingest data from a Trino database but need a way to pass SOCKS proxy host/port information to connect due to my company's security policies. In DataGrip, I would do this by editing the database connection and adding the following parameters
    -DsocksProxyHost=127.0.0.1 -DsocksProxyPort=8080
     in 
    VM Options
     under the
    Advanced Settings
    . Is there any way to mimic this in a data ingestion recipe yml file?
    d
    • 2
    • 1
  • r

    red-napkin-59945

    02/25/2022, 9:09 PM
    Hey team, I changed some of the data model and got a
    *:metadata-service:restli-servlet-impl:checkRestModel* FAILED
    issue. Does anyone have any idea how to fix it?
    e
    • 2
    • 22
  • n

    numerous-application-54063

    02/28/2022, 2:03 PM
    Hey guys, we are experiencing issues while upgrading the CLI to version 0.8.27.1 in our Airflow environment. It seems that the CLI requires a version of MarkupSafe that is not compatible with our Airflow version, forcing us to upgrade Airflow, which for now is not an option.
    The conflict is caused by:
    #17 110.4   jinja2 2.11.3 depends on MarkupSafe>=0.23
    #17 110.4   apache-airflow 2.1.2 depends on markupsafe<2.0 and >=1.1.1
    #17 110.4   acryl-datahub 0.8.27.1 depends on markupsafe==2.0.1
    Is the markupsafe==2.0.1 requirement strictly necessary for the DataHub CLI?
    b
    • 2
    • 2
  • n

    numerous-camera-74294

    02/28/2022, 2:43 PM
    hi folks! I have been trying to ingest lineage data using the datahub CLI (awesome tool btw), and I was facing an error when the CLI makes a request to retrieve the upstreamLineage from GMS.
    Copy code
    com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Invalid value type for parameter aspects
    Looking deeper into it, that error is thrown when the header X-RestLi-Protocol-Version is set to 2.0.0; if I remove it or change it to e.g. 1.0.0, the request looks good. The header is added in https://github.com/acryldata/datahub/blob/master/metadata-ingestion/src/datahub/cli/cli_utils.py#L163. Any hint why this is causing the backend to crash?
    b
    m
    • 3
    • 44
  • e

    elegant-article-21703

    02/28/2022, 5:53 PM
    Hello everyone! I'm having an unexpected issue while deploying Datahub using a pipeline in Azure DevOps. I have included the annotation
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    at
    values.yaml
    in the frontend and gms charts. Additionally, I have included the
    LoadBalancerIP
    for each at their corresponding
    service.yaml
    inside templates. Once the pipeline reaches the installation task, I get this message:
    Copy code
    wait.go:225: [debug] Service does not have load balancer ingress IP address: default/datahub-datahub-frontend
    I don't know what I need to include in the files; what am I missing here? Thank you in advance for any help! 🙂
    b
    e
    • 3
    • 14
  • s

    strong-architect-67189

    02/28/2022, 9:13 PM
    Hello! I'm new to DataHub, and I've been very impressed with the product so far! This is honestly exactly what I've been looking for. However, I'm currently having some issues ingesting from BigQuery with my GKE deployment. I've been desperately searching the channel looking for fixes but can't seem to find anything.
    • Deployed to GKE using helm
    • Running acryl-datahub, version 0.8.27
    • Port-forwarding datahub-datahub-frontend --> 9002 and datahub-datahub-gms --> 8080
    • Currently have a working ingress for the frontend using my domain, which was created by following the exact instructions from the docs
    • curl 'http://datahub.xxxx.xxxx/api/gms/config' returns nothing at the moment
    • Trying to ingest from BigQuery located in the same private project as my GKE instance
    Here is the recipe I created in the UI:
    Copy code
    source:
        type: bigquery
        config:
            project_id: xxxxxxxxxxxx
            credential:
                project_id: xxxxxxxxxx
                private_key_id: xxxxxxxxxxx
                private_key: "-----BEGIN PRIVATE KEY-----xxxxxxxxxxxxxx-----END PRIVATE KEY-----\n"
                client_email: xxxxxx@xxxxxxx.iam.gserviceaccount.com
                client_id: 'xxxxxxxxxxx'
    sink:
        type: datahub-rest
        config:
            server: 'http://datahub.xxxx.xxxx/api/gms'
    Keep receiving this error (did not include the entire stack trace)
    Copy code
    'ConfigurationError: Unable to connect to http://datahub.xxxx.xxx/api/gms/config with status_code: 401. Maybe you need to set up 
    authentication? Please check your configuration and make sure you are talking to the DataHub GMS (usually <datahub-gms-host>:8080) or 
    Frontend GMS API (usually <frontend>:9002/api/gms).'
    I've tried including an access token generated from the UI, but that still doesn't seem to give me the right authentication. Is this an authentication issue with GKE? Do I need to create an ingress for the datahub-gms service? I've tried quite a few combinations and I'm still very confused. Any help would be greatly appreciated. Thank you in advance!!!!
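    For reference, a sketch (assuming metadata service authentication is enabled on GMS) of how a UI-generated access token is typically attached to the datahub-rest sink; the server URL and token below are placeholders:
    Copy code
    # Sketch: passing a personal access token to the datahub-rest sink.
    # Server and token are placeholders; whether this resolves the 401 above
    # depends on how authentication is configured for GMS.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "bigquery",
                "config": {"project_id": "my-project"},  # placeholder
            },
            "sink": {
                "type": "datahub-rest",
                "config": {
                    "server": "http://datahub-datahub-gms:8080",  # talk to GMS directly
                    "token": "<personal-access-token>",  # generated from the UI
                },
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()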
    n
    b
    • 3
    • 6
  • c

    cool-painting-92220

    02/28/2022, 10:59 PM
    Hey everyone! I've been struggling with this for a few days and haven't been able to fully figure it out - I was initially trying to get my metadata ingestion working again, because I think I was experiencing some versioning issues upon receiving this message
    Unable to emit metadata to DataHub GMS
    . My new focus was to update my datahub instance, so I updated acryl-datahub to 0.8.27 and then took a look at the
    datahub-upgrade.sh
    script. I attempted to run it and was met with the following message:
    Copy code
    Starting upgrade with id NoCodeDataMigration...
    Cleanup has not been requested.
    Skipping Step 1/6: RemoveAspectV2TableStep...
    Executing Step 2/6: GMSQualificationStep...
    Completed Step 2/6: GMSQualificationStep successfully.
    Executing Step 3/6: UpgradeQualificationStep...
    -- V1 table does not exist
    Any pointers on how I can fix this? I know that versioning and instances can be a bit tricky to debug without the full context, so I'd be happy to hop on a quick call as well to sort out my instance if that's easier.
    b
    e
    • 3
    • 9
  • a

    able-rain-74449

    03/01/2022, 2:42 PM
    I have converted the helm template to YAML
    m
    • 2
    • 4
  • a

    able-rain-74449

    03/01/2022, 2:47 PM
    it's
    CrashLoopBackOff
    Copy code
    NAME                                                        READY   STATUS             RESTARTS   AGE
    datahub-elasticsearch-master-0                              1/1     Running            0          42m
    datahub-elasticsearch-master-1                              1/1     Running            0          42m
    datahub-elasticsearch-master-2                              0/1     Running            0          20m
    datahub-prerequisites-cp-schema-registry-65d8777cc8-m88mn   1/2     CrashLoopBackOff   10         38m
    datahub-prerequisites-kafka-0                               1/1     Running            0          48m
    datahub-prerequisites-mysql-0                               1/1     Running            0          66m
    datahub-prerequisites-neo4j-community-0                     1/1     Running            0          70m
    datahub-prerequisites-zookeeper-0                           1/1     Running            0          45m
    b
    • 2
    • 4
  • m

    miniature-account-72792

    03/01/2022, 2:53 PM
    Hi everyone, I've tried setting up DataHub via Kubernetes over the last two days. For Kafka I use a Strimzi cluster I set up, so I don't use the built-in Bitnami Kafka from the prerequisites. This Strimzi cluster is using mutual TLS, so I need to pass the certificates to every Kafka client. I've managed so far to set up everything and get most of the DataHub components up and running on my Kubernetes cluster. However, the
    datahub-acryl-datahub-actions
    component is constantly logging the following error:
    Copy code
    %3|1646144911.225|FAIL|rdkafka#consumer-1| [thrd:ssl://testing-strimzi-cluster-kafka-bootstrap:9092/bootstrap]: ssl://testing-strimzi-cluster-kafka-bootstrap:9092/bootstrap: SSL handshake failed: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed: broker certificate could not be verified, verify that ssl.ca.location is correctly configured or root CA certificates are installed (install ca-certificates package) (after 5ms in state SSL_HANDSHAKE, 31 identical error(s) suppressed)
    I currently pass the following environment variables via the deployment yaml:
    • KAFKA_PROPERTIES_SSL_KEY_PASSWORD
    • KAFKA_PROPERTIES_KAFKASTORE_SSL_TRUSTSTORE_PASSWORD
    • KAFKA_PROPERTIES_SSL_TRUSTSTORE_PASSWORD
    • KAFKA_PROPERTIES_SSL_KEYSTORE_PASSWORD
    • KAFKA_PROPERTIES_KAFKASTORE_SSL_KEYSTORE_PASSWORD
    • KAFKA_PROPERTIES_SSL_TRUSTSTORE_TYPE
    • KAFKA_PROPERTIES_SSL_KEYSTORE_LOCATION
    • KAFKA_PROPERTIES_SSL_TRUSTSTORE_LOCATION
    • KAFKA_PROPERTIES_KAFKASTORE_SSL_TRUSTSTORE.LOCATION
    • KAFKA_PROPERTIES_SECURITY_PROTOCOL
    • KAFKA_PROPERTIES_KAFKASTORE_SECURITY_PROTOCOL
    • KAFKA_PROPERTIES_SSL_PROTOCOL
    • KAFKA_PROPERTIES_SSL_ENDPOINT_IDENTIFICATION.ALGORITHM
    • KAFKA_PROPERTIES_SSL_CA_LOCATION (= truststore location)
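    For context, a minimal sketch (not from this thread) of the rdkafka-level mutual-TLS settings the error message refers to, written as a Python confluent-kafka consumer config; the bootstrap address, group id, and all file paths are placeholder assumptions:
    Copy code
    # Sketch: librdkafka mTLS settings that the KAFKA_PROPERTIES_* variables
    # would need to translate into; paths, group id, and addresses are placeholders.
    from confluent_kafka import Consumer

    consumer = Consumer(
        {
            "bootstrap.servers": "testing-strimzi-cluster-kafka-bootstrap:9092",
            "group.id": "datahub-actions",  # placeholder consumer group
            "security.protocol": "SSL",
            "ssl.ca.location": "/mnt/certs/ca.crt",  # broker CA (what the error asks for)
            "ssl.certificate.location": "/mnt/certs/user.crt",  # client cert for mutual TLS
            "ssl.key.location": "/mnt/certs/user.key",
            "ssl.key.password": "REDACTED",
        }
    )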
    plus1 1
    b
    • 2
    • 8
  • p

    plain-farmer-27314

    03/01/2022, 3:37 PM
    LookML ingestion does not appear to be correctly parsing table names from all of our view files. Context: datahub-lookml version
    0.8.26.3
    (also tried this on the latest pip version
    0.8.27.1
    ). We use BQ with Looker. There were no errors associated with this view in the logs, and parse_table_names_from_sql is set to
    true
    . I have double-checked the view definition in DataHub and it matches what we have in Looker; there are very clearly 3-4 tables that were not picked up by ingestion, and one that was ingested incorrectly.
    m
    r
    • 3
    • 7
  • h

    handsome-football-66174

    03/01/2022, 5:22 PM
    Hi everyone, I am trying to get those datasets whose description starts with Project, but I do not get any results (even if I give *):
    Copy code
    {
      search(input: {start: 0, count: 100, query: "*", type: DATASET,filters:{field:"description",value:"Project*"}}) {
        searchResults {
          entity {
            urn
            type
          }
          matchedFields {
            name
            value
          }
        }
      }
    }
    b
    b
    • 3
    • 11
  • s

    some-crayon-90964

    03/01/2022, 6:21 PM
    Hi community, I pulled the latest datahub and I got the following issue when building. The error happens in
    testQuick
    . I can build with
    -x testQuick
    , but I would like to make sure the issues go away in the long term. Please advise, thanks in advance!
    b
    • 2
    • 1
  • r

    red-napkin-59945

    03/01/2022, 9:31 PM
    Do we still need to manually modify the graphql schema if we want to add new entities?
    b
    • 2
    • 6
  • r

    red-napkin-59945

    03/02/2022, 1:01 AM
    Hey team, I am testing my new entity type locally. I successfully wrote the entity record, but failed to get the entity with the error:
    java.lang.UnsupportedOperationException: Failed to find Typeref schema associated with Config-based Entity
    b
    h
    • 3
    • 29
  • a

    adorable-flower-19656

    03/02/2022, 5:37 AM
    Hi DataHub, I'm using DataHub via the helm chart. How can I set the log level to DEBUG for the datahub-frontend pod? I'd like to see this log in my pod: https://github.com/linkedin/datahub/blob/master/datahub-frontend/app/auth/sso/oidc/OidcCallbackLogic.java#L241
    r
    b
    • 3
    • 2
  • r

    red-napkin-59945

    03/02/2022, 5:38 AM
    Hey team, I have several other questions from trying to implement the corresponding GraphQL logic in the
    datahub-graphql-core
    module according to the "DataHub GraphQL Core" readme page:
    1. It looks like the README needs some updates? I could not easily find what the doc told me to look for, like resources/gms.graphql, DataLoaders, Mappers and DataFetchers.
    2. It looks like SearchableEntityType is deprecated. Should the new LoadableType extend SearchableEntityType? If not, what's the alternative?
    3. I am a little confused about RestliEntityClient and JavaEntityClient. It looks like they both ultimately call EntityService -> EbeanAspectDao -> DB; the difference is that RestliEntityClient sends a Rest.li request and the Rest.li server calls EntityService, whereas JavaEntityClient calls EntityService directly? If I would like to introduce a new entity, it looks like I do not need to change either of them?
    4. It looks like Mappers and DataFetchers are not needed now, since batchLoad() returns GraphQL objects?
    b
    g
    • 3
    • 6
  • r

    rapid-sundown-8805

    03/02/2022, 1:45 PM
    Hi community, I've found what I believe to be a bug. The datahub actions container is failing with the following error log:
    Copy code
    KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Failed to create consumer: No provider for SASL mechanism GSSAPI: recompile librdkafka with libsasl2 or openssl support. Current build options: PLAIN SASL_SCRAM OAUTHBEARER"}
    2022/03/02 13:30:11 Command exited with error: exit status 1
    We don't use GSSAPI, but PLAIN, so there is some setting that the container does not pick up correctly. However, when I look at the deployment manifest for the actions container, it has these variables set:
    Copy code
    spec:
          containers:
          - env:
            - name: GMS_HOST
              value: dfds-datahub-datahub-gms
            - name: GMS_PORT
              value: "8080"
            - name: KAFKA_BOOTSTRAP_SERVER
              value: REDACTED
            - name: SCHEMA_REGISTRY_URL
              value: http://datadelivery-schema-registry:8081
            - name: KAFKA_AUTO_OFFSET_POLICY
              value: latest
            - name: ACTION_FILE_NAME
              value: executor.yaml
            - name: KAFKA_PROPERTIES_KAFKASTORE_SECURITY_PROTOCOL
              value: SASL_SSL
            - name: KAFKA_PROPERTIES_SASL_JAAS_CONFIG
              value: org.apache.kafka.common.security.plain.PlainLoginModule required
                username="REDACTED" password="REDACTED";
            - name: KAFKA_PROPERTIES_SASL_MECHANISM
              value: PLAIN
            - name: KAFKA_PROPERTIES_SASL_PASSWORD
              value: REDACTED
            - name: KAFKA_PROPERTIES_SASL_USERNAME
              value: REDACTED
            - name: KAFKA_PROPERTIES_SECURITY_PROTOCOL
              value: SASL_SSL
            image: public.ecr.aws/datahub/acryl-datahub-actions:v0.0.1-beta.8
            imagePullPolicy: IfNotPresent
            name: acryl-datahub-actions
            ports:
            - containerPort: 9093
              name: http
              protocol: TCP
            resources:
              limits:
                cpu: 500m
                memory: 512Mi
              requests:
                cpu: 300m
                memory: 256Mi
            securityContext: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
    So the SASL_MECHANISM should be set to PLAIN, no? It is also set to PLAIN in the global values in the helm chart, see our values file here: https://github.com/dfds-data/datahub-infrastructure/blob/master/datahub/dfdsvals.yaml Is it a bug?
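    For context, a minimal sketch (not from this thread) of the SASL settings the error message is complaining about, written as a Python confluent-kafka consumer config; the bootstrap address, group id, and credentials are placeholder assumptions:
    Copy code
    # Sketch: the rdkafka SASL settings the KAFKA_PROPERTIES_* variables above
    # should map to; the error suggests GSSAPI is being used instead of PLAIN.
    from confluent_kafka import Consumer

    consumer = Consumer(
        {
            "bootstrap.servers": "REDACTED:9092",  # placeholder
            "group.id": "datahub-actions",  # placeholder consumer group
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "PLAIN",
            "sasl.username": "REDACTED",
            "sasl.password": "REDACTED",
            "auto.offset.reset": "latest",
        }
    )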
    s
    • 2
    • 1
  • g

    gentle-father-80172

    03/02/2022, 2:20 PM
    Hi Team! 👋 I think I'm missing something.... 🤔 If I want to use GraphQL search to filter datasets on a platform is this the correct way to filter the query? Thanks!
    Copy code
    {
      search(input: {start: 0, count: 100, query: "*", type: DATASET, filters:{field:"Dataset.platform.name",value:"glue"}}) {
        searchResults {
          entity {
            urn
            type
          }
          matchedFields {
            name
            value
          }
        }
      }
    }
    s
    • 2
    • 2
  • s

    salmon-area-51650

    03/02/2022, 3:01 PM
    Hi Team!!! After deploying the DataHub helm chart, I realized that the
    datahub
    user is not an admin and I cannot access
    Ingestion UI
    ,
    Policies
    , …. How can I activate the
    datahub
    user as an administrator?
    b
    • 2
    • 10