# troubleshoot

    broad-crowd-13788

    11/12/2021, 10:35 PM
    Any idea what this error means?
    Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema being registered is incompatible with an earlier schema; error code: 409

    stocky-guitar-68560

    11/13/2021, 4:46 PM
    Hi all, I want to use my own Elasticsearch instance instead of the Elasticsearch image that DataHub spins up. How can I change the configuration?
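
    For anyone in the same situation, a minimal sketch of pointing GMS at an external cluster, assuming a docker-compose based deployment (the hostname below is a placeholder, and the bundled elasticsearch service would be disabled separately; on the Helm chart the equivalent settings live under global.elasticsearch):

    # docker-compose.override.yml (sketch, not verified against a specific compose file)
    services:
      datahub-gms:
        environment:
          - ELASTICSEARCH_HOST=my-external-es.internal   # external cluster instead of the bundled image
          - ELASTICSEARCH_PORT=9200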

    nice-planet-17111

    11/15/2021, 1:17 AM
    Hi, has anyone tried changing the master id/password when deploying via the Helm chart on k8s? 🙂 I checked the docs and they say to restart the container or run docker-compose up after editing user.props... I tried helm upgrade / restarting the deployment, and neither worked. 😞 Any ideas on what I'm missing?
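
    A note for anyone hitting the same thing: on Kubernetes the user.props file lives inside the frontend pod, so editing a copy on the host and restarting changes nothing; the file has to be mounted into the pod. A sketch of one way to do that, assuming the chart exposes extraVolumes/extraVolumeMounts for datahub-frontend (the secret name is a placeholder):

    # values.yaml (sketch)
    datahub-frontend:
      extraVolumes:
        - name: user-props
          secret:
            secretName: datahub-user-props      # secret with a "user.props" key holding the credentials
      extraVolumeMounts:
        - name: user-props
          mountPath: /datahub-frontend/conf/user.props
          subPath: user.props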

    clean-crayon-15379

    11/15/2021, 8:55 AM
    Hi all, do you also get an unhealthy Docker container with the latest datahub-gms version? When logging in before Docker kills it, I get the following error: "JSON.parse: unexpected character at line 1 column 1 of the JSON data". Thank you. Update: as it turned out, the new GMS container had an issue with my stored metadata. After nuking it and re-ingesting, everything works again.

    wooden-arm-26381

    11/15/2021, 1:11 PM
    Hi, after performing a rollback of a glossary term ingestion, the deleted terms can't be removed from datasets anymore. The following message is displayed in the UI:
    Failed to remove term: An unknown error occurred.
    Only a re-ingestion of those terms and then removing them seems to work for me.

    handsome-belgium-11927

    11/15/2021, 2:05 PM
    Hi! Does anybody know what this error means?
    The field at path '/listRecommendations/modules[4]/content[0]/entity/glossaryTermInfo' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'GlossaryTermInfo' within parent type 'GlossaryTerm' (code undefined)
    It appears everywhere after glossary ingestion.

    brief-lizard-77958

    11/15/2021, 2:11 PM
    I have trouble compiling using gradlew build. It completes everything successfully up to the metadata-io tests, then fails in the setup of ElasticSearchSystemMetadataServiceTest. From the report in the metadata-io folder: org.testcontainers.containers.ContainerLaunchException: Container startup failed. The full report for ElasticSearchSystemMetadataServiceTest is in the attached HTML file. I have DataHub running (using quickstart.sh) in the background and everything is working properly. Is it possible that this is related to not having enough resources (CPU) to perform the build?
    report2.html
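
    For anyone debugging the same failure: the metadata-io tests start their own Elasticsearch via Testcontainers, so a resource-starved Docker daemon (especially with the quickstart stack already running) is a plausible cause. One way to iterate on just the failing test with more output, assuming a standard Gradle setup:

    # Rerun only the failing module's tests with verbose logging
    ./gradlew :metadata-io:test --tests "*ElasticSearchSystemMetadataServiceTest*" --info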

    future-hamburger-62563

    11/16/2021, 2:39 AM
    (low priority) I'm having trouble figuring out how to fix this issue with whitespace in environment variables. Anyone have some spare time to take a look at it? Basically, docker compose v2 throws an error when the environment variables have whitespace in them. The command also appears to throw an error when there is a dot (.) in the key of a shell variable. Shirshanka mentioned there was just one file to change, but I can't figure out how to do it. I thought about putting escaped quotes around the values in/around line 50 of generate_docker_quickstart.py, but I don't know how that would affect numerical values like IP addresses or port numbers. Another area I'm unsure of is the Actions build & test: it's failing at quickstart-compose-validation, but I'm not really getting what's happening. Is the script running quickstart_docker_quickstart.sh from docker/quickstart to generate temp.quickstart.yml and comparing it against a fresh copy generated by generate_docker_quickstart.py? Or is something else happening (like temp.quickstart.yml getting pulled from some preconfigured folder out of sight)? PR: https://github.com/linkedin/datahub/pull/3522 Any thoughts would be appreciated. If you want to test it on your system, load Docker, turn on v2 docker compose in the settings, and try to run docker/dev.sh. In my case, I got this error: https://datahubproject.io/docs/docker/development#unexpected-character

    nice-planet-17111

    11/17/2021, 5:55 AM
    Has anyone tried / succeeded in applying database_alias when ingesting from MySQL? (Mine does not work; everything just gets ingested under the original database name, and even in the URN it's still the original name.)
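
    For reference, this is roughly the shape a recipe using database_alias would take; all coordinates below are placeholders, so treat it as a sketch rather than a verified configuration:

    source:
      type: mysql
      config:
        host_port: "mysql.example.internal:3306"
        database: "orders_prod"
        database_alias: "orders"   # datasets should be emitted under this name instead of orders_prod
        username: "user"
        password: "pass"

    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"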

    melodic-helmet-78607

    11/17/2021, 6:51 AM
    Hi, does anyone know if it is possible to filter aspects by aspect name in the searchAcrossEntities API? I can't find documentation about filtering aspects.

    handsome-football-66174

    11/17/2021, 1:37 PM
    Hi, I am running the following command with the 0.8.16 version of the code: helm upgrade --install --namespace default --create-namespace datahub ./datahub -f ./datahub/values.y*ml -f ./datahub/values-dev.y*ml. The frontend UI ends up with only Datasets, Dashboards, and Charts; there is no Pipelines tab. Any direction on what I could be doing wrong?

    better-orange-49102

    11/17/2021, 1:46 PM
    If you use the quickstart command, is it normal to be unable to access the GraphQL Explorer tool? I am getting:
    javax.servlet.ServletException: org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.UnsupportedOperationException: GraphQL gets not supported.

    aloof-london-98698

    11/17/2021, 5:28 PM
    Hiya, I'm trying to set up ingestion for Snowflake and I'm facing issues with the recipe around the database_pattern.allow setting. Here's my recipe with all the sensitive info removed. When I run ingestion, I see the error below. "ignoreCase", which is a nested field, seems to work, but "allow", which sits at the same level, throws an error. Any idea what I'm doing wrong here?
    1 validation error for SnowflakeConfig
    database_pattern -> allow
      value is not a valid list (type=type_error.list)
    source:
      type: "snowflake"
      config:
        # Coordinates
        host_port: "xxxxxx"
        warehouse: "xxxxxxx"
    
        # Credentials
        username: "username"
        password: "password"
        role: "role"
        
        include_table_lineage: "True"
        database_pattern:
         allow: "database_name"
         ignoreCase: "True"
    
    sink:
      type: "datahub-rest"
      config:
        server: xxxxxx
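
    For reference, the allow (and deny) fields under database_pattern expect a list of regex patterns rather than a single string, which matches the "value is not a valid list" error above. A corrected fragment would look roughly like this (the database name is a placeholder):

    database_pattern:
      allow:
        - "database_name"
      ignoreCase: True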

    agreeable-thailand-43234

    11/17/2021, 11:27 PM
    Hi guys!! I just started trying DataHub… I'm using acryl-datahub, version 0.8.16.11, with the docker quickstart (the image tag says head), and I'm trying to ingest data using the linkedin/datahub-ingestion docker image with the following command:

    docker run -v /Desktop/test:/datahub-ingestion linkedin/datahub-ingestion ingest -c ./datahub-ingestion/config.yaml

    The config.yaml looks like this:
    source:
      type: "athena"
      config:
        # Coordinates
        aws_region: "xxx"
        work_group: "xxx"
    
        # Credentials
        username: "xxx"
        password: "xxx"
        database: "xxx"
    
        # Options
        s3_staging_dir: "s3://xxx/"
    
    sink:
      type: "datahub-rest"
      config:
        server: "<http://localhost:8080>". #also tried "<http://datahub-gms:8080>"
    Then I got this error:
    ERROR    {datahub.ingestion.run.pipeline:52} - failed to write record with workunit admincube.cubeprod with ('Unable to emit metadata to DataHub GMS', {'message': "HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /datasets?
    I tried linkedin/datahub-ingestion:latest as well as linkedin/datahub-ingestion:head. Any ideas? Cheers!
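
    A likely culprit in this setup: localhost inside the ingestion container is the container itself, not the host machine, so the emitter can never reach GMS on port 8080. A sketch of the usual workaround, assuming the quickstart compose created the default datahub_network:

    # Attach the ingestion container to the quickstart network and target GMS by service name
    docker run --network datahub_network \
      -v /Desktop/test:/datahub-ingestion \
      linkedin/datahub-ingestion ingest -c ./datahub-ingestion/config.yaml

    # ...with the recipe's sink pointing at http://datahub-gms:8080 instead of localhost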

    handsome-belgium-11927

    11/18/2021, 1:29 PM
    Hi again 😞 In the search panel I can see 22 datasets attached to the glossary term, but when I click on it the Related Entities tab is empty. What may be the problem?

    thousands-intern-95970

    11/18/2021, 2:45 PM
    Hello everyone! While starting DataHub locally, I got an error: 'docker command not found.' And when I tried to start DataHub from the Docker dashboard, the following error occurred: "Cannot start Docker Compose application. Reason: Error invoking remote method 'compose-action': Error: Command failed: docker-compose --file "tmpjnujd2yd.yml" --project-name "datahub" --project-directory "/var/folders/d8/cdx8th3s70b265071wcp64380000gn/T" up -d stat /private/var/folders/d8/cdx8th3s70b265071wcp64380000gn/T/tmpjnujd2yd.yml: no such file or directory" Can someone help with this? It would be really helpful :))

    mysterious-park-53124

    11/19/2021, 4:07 AM
    I get java.net.UnknownHostException: schema-registry after running docker-compose up. I customized the user.props file, then ran docker-compose up and ingested data. What may be the problem?
    datahub-gms               | 04:05:22.429 [qtp544724190-23] ERROR i.c.k.s.client.rest.RestService - Failed to send HTTP request to endpoint: http://schema-registry:8081/subjects/MetadataAuditEvent_v4-value/versions
    datahub-gms               | java.net.UnknownHostException: schema-registry
    datahub-gms               | 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    datahub-gms               | 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    datahub-gms               | 	at java.net.Socket.connect(Socket.java:607)
    datahub-gms               | 	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    datahub-gms               | 	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    datahub-gms               | 	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    datahub-gms               | 	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    datahub-gms               | 	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    datahub-gms               | 	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    datahub-gms               | 	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)

    some-microphone-33485

    11/19/2021, 5:14 PM
    Hello team, good day. Question: has anyone integrated MWAA with DataHub? For us this is not working, as MWAA is not recognizing the lineage backend. Thank you.

    nutritious-bird-77396

    11/19/2021, 11:03 PM
    I deployed datahub-frontend using Docker and I am able to browse through datasets successfully. When clicking on Analytics I get "An unknown error occurred. (code 500)". I have DATAHUB_ANALYTICS_ENABLED=true. To connect to ES I have these env variables:
    ELASTIC_CLIENT_HOST=zzzzzzzzz.us-east-1.es.amazonaws.com
    ELASTIC_CLIENT_PORT=443
    ELASTIC_CLIENT_USERNAME=username
    ELASTIC_CLIENT_PASSWORD=password
    ELASTIC_CLIENT_USE_SSL=true
    USE_AWS_ELASTICSEARCH=true
    Any idea if this is because of the ES/Kafka connection or something else? I don't see much in the logs though...
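
    One thing worth double-checking (an assumption, not a confirmed fix): the Analytics feature both writes usage events to Kafka and reads them back from an Elasticsearch index, so the frontend container typically also needs Kafka connection settings alongside the ELASTIC_CLIENT_* ones, along these lines:

    KAFKA_BOOTSTRAP_SERVER=broker:29092
    DATAHUB_TRACKING_TOPIC=DataHubUsageEvent_v1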

    nutritious-bird-77396

    11/19/2021, 11:06 PM
    I see a difference between GMS and the frontend: the env variables in GMS use ELASTICSEARCH_HOST, whereas the frontend uses ELASTIC_CLIENT_HOST. I think it's not a bad idea to make the env vars have the same name across apps...

    red-pizza-28006

    11/22/2021, 12:27 PM
    I created an empty DataHub instance with 0.8.17, and I don't see the Datasets tab at all. Is this expected?

    red-pizza-28006

    11/22/2021, 7:23 PM
    It looks like I cannot configure the lineage backend on MWAA (Amazon managed Airflow). Has anyone gotten around this? (My last hope is to use the DatahubEmitterOperator, but I'm keeping my fingers crossed that someone has found an easier way.) 😄

    breezy-guitar-97226

    11/23/2021, 11:31 AM
    Hi here, we are experiencing problems with OIDC in the DataHub frontend, more specifically:
    Caused by: org.pac4j.core.exception.TechnicalException: State parameter is different from the one sent in authentication request. Session expired or possible threat of cross-site request forgery

    wonderful-quill-11255

    11/23/2021, 12:59 PM
    Hello. We are in the process of upgrading from 0.8.10 to 0.8.11 (yes... baby steps) and got hit by the limitation that the frontend can only talk http to the GMS, not https. In our setup all DataHub components talk to each other over SSL. However, if I change the scheme here to https, I get a 400 Bad Request response back from the GMS. I was wondering if I'm missing something else that might have to be configured to make the connection work over SSL. I see that, by coincidence, some support for https was committed to master 12 hours ago, but we prefer to stay a few releases behind latest. Perhaps @big-carpet-38439 you have a tip?

    brief-wolf-70822

    11/23/2021, 8:10 PM
    Hey, I'm having an issue with configuring topic names. I have the following env vars set in my GMS containers:
    ❯ kubectl exec -n xxxxxx datahub-datahub-gms-7bfb87d7cd-7sksf -- env | grep METADATA                                                                                
    METADATA_CHANGE_EVENT_NAME=xxx.MetadataChangeEvent_v4
    METADATA_AUDIT_EVENT_NAME=xxx.MetadataAuditEvent_v4
    FAILED_METADATA_CHANGE_EVENT_NAME=xxx.FailedMetadataChangeEvent
    However, GMS startup fails with:
    java.lang.IllegalStateException: Topic(s) [MetadataChangeEvent_v4] is/are not present and missingTopicsFatal is true
    I also tried setting SPRING_KAFKA_LISTENER_MISSING_TOPICS_FATAL=false, but that didn't seem to do anything. Any advice?

    lively-jackal-83760

    11/24/2021, 12:23 PM
    Hey guys, I tried to use the new OpenAPI ingestion feature, but it seems I did something wrong. I set the URL and swagger_file and ran ingestion, but got "Unknown error for reaching endpoint". I see that the ingestor tries to do a GET request without any GET params; as a result my API returns a 400 and the ingestor fails. What am I doing wrong?

    nice-country-99675

    11/24/2021, 11:36 PM
    👋 Hi Team! I'm trying to delete some datasets from my DataHub instance... I ran datahub delete -query AuM; there's one match for the query, but it's not deleted... I was able to delete everything else, but for some reason there are two datasets that refuse to be deleted 🤷 ... Do you want me to provide more debug information before I nuke the DB?
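
    For anyone comparing notes: deleting by the exact URN and forcing a hard delete (instead of the default soft delete) is another avenue, assuming the 0.8.x CLI supports these flags; the URN below is a placeholder:

    # Delete the stubborn entity by its exact URN, as a hard delete
    datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.AuM,PROD)" --hard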

    lemon-receptionist-90470

    11/25/2021, 1:04 PM
    Hello everyone, I'm testing DataHub on K8s and I created the ingress below for the GMS REST API. My configuration:
    datahub-gms:
      enabled: true
      image:
        repository: xxxxx/datahub-gms
        tag: "v0.8.14"
      service:
        type: ClusterIP
      ingress:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: vault
        hosts:
          - host: "datahub-gms-api.xxxx.xxxx"
            paths: ["/"]
        tls:
          - secretName: datahub-gms-tls
            hosts:
              - "datahub-gms-api.xxxx.xxxx"
    My file custom-ingestion.yml:

    source:
      type: file
      config:
        # Coordinates
        filename: output.json

    sink:
      type: "datahub-rest"
      config:
        server: "http://datahub-gms-api.xxxx.xxxx"
    Error: when I execute datahub ingest -c custom-ingestion.yml --dry-run, I get the following error:

    HTTPError: 404 Client Error: Not Found for url: http://datahub-gms-api.xxxx.xxxx/config

    Am I missing something? Thanks!
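
    For anyone debugging a similar 404: the CLI first calls the /config endpoint on the sink server, which GMS serves directly, so a quick way to check whether the ingress really fronts GMS (on its port 8080 service) is to hit that endpoint by hand; the hostname below is the placeholder from the message:

    # Should return a small JSON config document if the ingress routes to GMS
    curl -s http://datahub-gms-api.xxxx.xxxx/config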

    abundant-flag-19546

    11/26/2021, 8:32 AM
    Hello, I'm trying to add/delete lineage with the Python emitter. Adding and modifying lineage works, but I cannot delete the lineage.
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    
    # Construct a lineage object.
    lineage_mce = builder.make_lineage_mce(
        [], # Empty upstream dataset to delete the lineage.
        builder.make_dataset_urn("bigquery", "test.TEST_DATASET.dev", "PROD"),
    )
    
    # Create an emitter to the GMS REST API.
    emitter = DatahubRestEmitter("http://localhost:8080")
    
    # Emit metadata!
    emitter.emit_mce(lineage_mce)
    How can I delete the lineage with the Python REST emitter? I'm using the latest (v0.8.17) version.

    red-pizza-28006

    11/29/2021, 3:44 PM
    After updating to the latest DataHub, 0.8.17.2, I suddenly started seeing this error in Airflow DAGs. The only change in the config is adding a simple transformer that adds dataset owners:
    Traceback (most recent call last):
      File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1869, in _instantiate_datasource_from_config
        ] = self._build_datasource_from_config(name=name, config=config)
      File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 1938, in _build_datasource_from_config
        config_defaults={"module_name": module_name},
      File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/data_context/util.py", line 121, in instantiate_class_from_config
        class_instance = class_(**config_with_defaults)
      File "/usr/local/airflow/.local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 64, in sqlalchemy_datasource_init
        underlying_datasource_init(self, *args, **kwargs, engine=conn)
      File "/usr/local/airflow/.local/lib/python3.7/site-packages/great_expectations/datasource/sqlalchemy_datasource.py", line 217, in __init__
        name, "ModuleNotFoundError: No module named 'sqlalchemy'"
    great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_sqlalchemy_datasource-a18b60ef-52a5-481c-a73f-769ff10a8ffe, error: ModuleNotFoundError: No module named 'sqlalchemy'
    
    During handling of the above exception, another exception occurred:
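
    For context, the transformer change being described is typically a recipe fragment of roughly this shape (the owner URN is a placeholder, so treat this as a sketch rather than the exact config in use):

    transformers:
      - type: "simple_add_dataset_ownership"
        config:
          owner_urns:
            - "urn:li:corpuser:someone"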