# troubleshoot
  • mysterious-advantage-78411 (05/31/2023, 2:25 PM)
    Hi, could somebody help with Vertica ingestion? What setting should be set to enable profiling? I know there is a document about it (https://datahubproject.io/docs/generated/ingestion/sources/vertica), but I just don't understand how to spell it properly in YAML: `profiling.enabled: true` didn't work... Thanks for the answer
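    A minimal recipe sketch, assuming the documented Vertica source options: the key detail is that `profiling` is a block nested under `config`, with `enabled` inside it (connection values below are placeholders):
    source:
      type: vertica
      config:
        host_port: "localhost:5433"   # placeholder connection details
        database: "mydb"
        username: "user"
        password: "pass"
        profiling:
          enabled: true               # turns profiling on for this source
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"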
  • microscopic-lizard-81562 (06/01/2023, 7:20 AM)
    Hello everyone, can you please advise me on the following problem? When I start DataHub via the quickstart command
    datahub docker quickstart
    I can successfully start it on an Ubuntu EC2 instance from AWS. However, when I do it like this, DataHub is started with the root user "datahub datahub" for the frontend. This is not very secure. Therefore I want to change the docker-compose.yml file at datahub/quickstart to add a volume for the datahub-frontend/conf folder in the datahub-frontend-react container. This way I can change the user.props file once and it will be used whenever DataHub is restarted. I successfully changed the .yml file, but when I run
    docker compose up
    the broker container always exits and interrupts the startup:
    dependency failed to start: container broker exited (1)
    I checked the log to see what the issue is:
    kafka.common.InconsistentClusterIdException: The Cluster ID b6cE4L94QtOEZYqg09wdYg doesn't match stored clusterId Some(n9TOunRIRL2gkIzUX9WiCg) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
    Is there a known way to make sure that the Kafka cluster ID matches the stored one?
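    A commonly suggested remedy, assuming the local quickstart data can be discarded, is to remove the stale broker/ZooKeeper volumes so the cluster ID is regenerated on the next startup (volume names below are placeholders; check `docker volume ls` for the real ones):
    # Stop the stack first so nothing holds the volumes.
    docker compose down
    # Find the kafka/zookeeper data volumes holding the stale meta.properties.
    docker volume ls | grep -Ei 'broker|zookeeper'
    # Remove them (placeholder names); the cluster ID is re-created on startup.
    docker volume rm <broker_volume> <zookeeper_volume>
    docker compose up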
  • crooked-state-81977 (06/01/2023, 8:40 AM)
    Hi everyone, I am trying to get an access token from the GMS API. DataHub is configured with Keycloak. When the token request is sent to DataHub, a 401 status is returned. Below is the curl trace.
  • crooked-state-81977 (06/01/2023, 8:41 AM)
    user@:~$ curl --verbose --insecure --location --request POST 'https://<>:9002/api/v2/graphql' \
    --header 'X-DataHub-Actor: urn:li:corpuser:datahub' \
    --header 'Content-Type: application/json' \
    --data-raw '{ "query":"mutation { createAccessToken(input: { type: PERSONAL, actorUrn: \"urn:li:corpuser:datahub\", duration: ONE_HOUR, name: \"my personal token\" } ) { accessToken metadata { id name description} } }", "variables":{}}'
    Note: Unnecessary use of -X or --request, POST is already inferred.
    * Trying ip:9002...
    * TCP_NODELAY set
    * Connected to ip (ip) port 9002 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    * successfully set certificate verify locations:
    *   CAfile: /etc/ssl/certs/ca-certificates.crt
    *   CApath: /etc/ssl/certs
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    * TLSv1.2 (IN), TLS handshake, Certificate (11):
    * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
    * TLSv1.2 (IN), TLS handshake, Server finished (14):
    * TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
    * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
    * TLSv1.2 (OUT), TLS handshake, Finished (20):
    * TLSv1.2 (IN), TLS handshake, Finished (20):
    * SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
    * ALPN, server did not agree to a protocol
    * Server certificate:
    *   subject: <>
    *   start date: Mar 29 07:52:54 2023 GMT
    *   expire date: Jan  7 07:52:54 2025 GMT
    *   issuer: <>
    * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
    > POST /api/v2/graphql HTTP/1.1
    > Host: <>
    > User-Agent: curl/7.68.0
    > Accept: */*
    > X-DataHub-Actor: urn:li:corpuser:datahub
    > Content-Type: application/json
    > Content-Length: 223
    * upload completely sent off: 223 out of 223 bytes
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 401 Unauthorized
    < Date: Thu, 01 Jun 2023 08:29:32 GMT
    < Content-Length: 0
    <
  • crooked-state-81977 (06/01/2023, 8:43 AM)
    However, I can log in to the DataHub web UI and generate an access token. I do not want to use the UI, because we have a backend that will invoke DataHub APIs, so we can't hard-code a token that was generated from the web UI.
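    A sketch of the same mutation carrying credentials: a 401 from GMS typically means the request had no credentials it recognizes, and the X-DataHub-Actor header alone does not authenticate. TOKEN below is a placeholder for a valid DataHub access token; whether your Keycloak setup can mint a token GMS accepts directly is deployment-specific, so treat this as an assumption to verify rather than a confirmed fix:
    curl --insecure --location 'https://<host>:9002/api/v2/graphql' \
      --header 'Authorization: Bearer <TOKEN>' \
      --header 'Content-Type: application/json' \
      --data-raw '{ "query": "mutation { createAccessToken(input: { type: PERSONAL, actorUrn: \"urn:li:corpuser:datahub\", duration: ONE_HOUR, name: \"my personal token\" }) { accessToken metadata { id name description } } }", "variables": {} }'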
  • early-hydrogen-27542 (06/01/2023, 1:32 PM)
    👋 team! There are a lot of threads around this issue for Redshift ingestion:
    Failed to find a registered source for type redshift: 'str' object is not callable
    with advice ranging from upgrading DataHub itself to pinning a sqlparse version to other solutions. Is there a definitive recommendation on how to fix this issue? I was planning on upgrading from 0.10.1 to 0.10.3 to fix it, but it's not clear that it will actually do so.
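    For reference, the pinning variant mentioned in those threads looks like the sketch below; both versions shown are illustrative, not a verified fix:
    # Reinstall the Redshift plugin with a pinned sqlparse (versions illustrative).
    pip install 'acryl-datahub[redshift]==0.10.3' 'sqlparse==0.4.3'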
  • adorable-sugar-76640 (06/01/2023, 7:06 PM)
    Hi Community, I've noticed that my DataHub is currently utilizing the root volume of the node to run its services. As a result, the volume has run out of space. To address this, I set up a new StorageClass and made it the default storage class. However, I'm unsure why DataHub is still using the root volume instead of the new StorageClass. I attempted to add the persistence section to the datahub-gms section, but it still uses the root volume. Interestingly, I created a simple pod and configured it to use the StorageClass I created; the pod successfully ran its service on the specified StorageClass, confirming that the storage class is properly configured. Do you have any suggestions on how to resolve this? I've attached a screenshot of the settings for reference. Thanks
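    One thing worth noting: datahub-gms itself is typically stateless, and the persistent volumes usually belong to the prerequisites (Elasticsearch, MySQL, Kafka). A values sketch for the prerequisites chart, assuming the stock subcharts; the exact key names may differ in your chart version, so check its values.yaml, and "my-storageclass" is a placeholder:
    elasticsearch:
      volumeClaimTemplate:
        storageClassName: my-storageclass
        resources:
          requests:
            storage: 30Gi
    mysql:
      primary:
        persistence:
          storageClass: my-storageclass
          size: 20Gi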
  • early-hydrogen-27542 (06/01/2023, 8:25 PM)
    In upgrading to 0.10.3, I noticed the Redshift logo now looks like the below. Is there a way to get the actual Redshift logo back?
  • fast-vegetable-81275 (06/01/2023, 9:21 PM)
    hi All, I am trying to ingest a CSV file into DataHub
    localhost:8080
    from my local machine. Below is the YAML file I have created, named `csvingestion.dhub.yaml`:
    source:
      type: csv-enricher
      config:
        # relative path to your csv file to ingest
        filename: .\path\to\file\census_income_morethan50K.csv
    sink:
      type: "datahub-rest"
      config:
        server: "http://localhost:8080"
    I get the error below when I run the ingestion using the command
    python3 -m datahub ingest -c .\path\to\file\csvingestion.dhub.yaml
    I have also done the installations using
    python3 -m pip install acryl-datahub[csv]
    and
    python3 -m pip install acryl-datahub[csv-enricher]
    Please advise what should be done. Also, if there is another effective way to ingest a local CSV file, please let me know. Thanks in advance!
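    Without the error text it's hard to be sure, but one plausible culprit is the unquoted Windows-style backslash path in the recipe, since backslashes can be mangled by YAML escaping. A sketch with a quoted forward-slash path (an assumption, not a confirmed diagnosis):
    source:
      type: csv-enricher
      config:
        # forward slashes work on Windows too and avoid YAML escaping surprises
        filename: "C:/path/to/file/census_income_morethan50K.csv"
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"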
  • proud-lamp-13920 (06/02/2023, 8:07 AM)
    Hi all, a new error occurred today. DataHub is on v0.10.2.2. We currently run scheduled ingestion configured in the UI, using BigQuery as the source. If I access the table detail page from the UI, I see an error message like the first attached file. If I look at the network log in the Chrome developer console, it shows what's in the second attached file. I upgraded to v0.10.3 to see if the version upgrade would resolve it and immediately executed ingestion, but the error is not resolved. Could you help me?
  • loud-hospital-37195 (06/02/2023, 11:17 AM)
    Good morning, I am trying to implement SSO on my DataHub instance hosted on AKS. However, for it to work I have to change the DataHub URL from http to https. How can I do this?
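    On AKS this is usually handled at the ingress rather than inside DataHub. A Helm values sketch for TLS on the frontend ingress, assuming an ingress controller and a pre-created TLS secret already exist in the cluster (host and secret names are placeholders):
    datahub-frontend:
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: nginx   # or your controller's class
        hosts:
          - host: datahub.example.com
            paths:
              - /
        tls:
          - hosts:
              - datahub.example.com
            secretName: datahub-tls            # pre-created TLS secret
    If SSO is configured through the OIDC settings, the base/callback URL there must then use the https address as well (an assumption worth checking against your values file).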
  • agreeable-address-71270 (06/02/2023, 10:13 PM)
    Hi folks! I recently ran into an issue of data no longer being shown on the frontend. The issue started when I tried ingesting a dbt source, and I believe that caused my small Elasticsearch instance to crash (ES and MySQL are hosted on AWS; the rest of the containers run on ECS). I scaled up my ES instance and restarted the containers. When I logged back into the DataHub frontend, all the data that was there previously is missing. I am able to use the user I had created previously, but I cannot see the users I created in the admin page, nor my other team members' users. Also, the demo data I ingested does not show up, and when I try re-ingesting it, the CLI reports success but nothing appears in the frontend. My guess is that GMS knows I had ingested previously, but the frontend is not showing any info on those datasets. Any ideas? I am at the early stages of testing out DataHub within the team.
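    If the metadata survived in MySQL but the Elasticsearch indices were lost in the crash, rebuilding the search index from the primary store is the usual remedy. A sketch using the datahub-upgrade image's RestoreIndices job; the environment wiring is deployment-specific, hence the placeholder env file and image tag:
    # Re-derive the ES indices from what's stored in MySQL; the env file is a
    # placeholder holding your DB/ES/Kafka connection settings.
    docker run --env-file ./restore-indices.env acryl/datahub-upgrade:v0.10.3 -u RestoreIndices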
  • powerful-shampoo-81990 (06/03/2023, 3:16 AM)
    Hello DataHub team, how can I delete specific AD groups with a given prefix using the DataHub CLI?
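    A CLI sketch, assuming the `datahub delete` filters can match your group names ("ADGROUP_" is a placeholder prefix; flag spellings vary between CLI versions, so check `datahub delete --help`, and dry-run first):
    # Preview which corpGroup entities match the prefix.
    datahub delete --entity-type corpGroup --query "ADGROUP_" --dry-run
    # Then delete for real; --hard also removes them from the database.
    datahub delete --entity-type corpGroup --query "ADGROUP_" --hard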
  • straight-spoon-27189 (06/03/2023, 5:53 PM)
    hey team, I'm trying to run DataHub locally and see that
    broker
    has not started. I found the following error message there:
    2023-06-03 20:48:41 [2023-06-03 17:48:41,171] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
    2023-06-03 20:48:41 kafka.common.InconsistentClusterIdException: The Cluster ID BMDP-W0MR6aquQ37huXxUw doesn't match stored clusterId Some(AQAbmr5wQuWXANgHiNq9GA) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
    2023-06-03 20:48:41     at kafka.server.KafkaServer.startup(KafkaServer.scala:230)
    2023-06-03 20:48:41     at kafka.Kafka$.main(Kafka.scala:109)
    2023-06-03 20:48:41     at kafka.Kafka.main(Kafka.scala)
    I tried restarting a few times but no luck. I'm using version
    0.10.3.1
    Any suggestions?
  • shy-dog-84302 (06/04/2023, 5:10 AM)
    Hi! I am experiencing the following problem with the DataHub cleanup job (logs in 🧵), which is deployed in k8s through Helm charts on version 0.10.3.
  • straight-spoon-27189 (06/04/2023, 11:34 AM)
    hey team, here is an example of how I create a URN with the Java API:
    new DatasetUrn(new DataPlatformUrn("delta-lake"), "entities/user", FabricType.TEST)
    // or with escape codes
    new DatasetUrn(new DataPlatformUrn("delta-lake"), "entities\u002Fuser", FabricType.TEST)
    As a result, the slash is not honored, whereas if this entity were pulled in directly by DataHub the slash would be honored and we would see one more item in the path.
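    For context, `\u002F` is decoded by the Java compiler before the string ever reaches DataHub, so both constructors produce the identical URN; whatever treats the slash differently has to happen server-side, not in the escape. A small sketch:
    // Both lines yield the same URN string; "\u002F" is just "/" after compilation:
    // urn:li:dataset:(urn:li:dataPlatform:delta-lake,entities/user,TEST)
    DatasetUrn a = new DatasetUrn(new DataPlatformUrn("delta-lake"), "entities/user", FabricType.TEST);
    DatasetUrn b = new DatasetUrn(new DataPlatformUrn("delta-lake"), "entities\u002Fuser", FabricType.TEST);
    System.out.println(a.equals(b)); // true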
  • better-fireman-33387 (06/04/2023, 12:28 PM)
    Hi, how can I view, change, or edit the secrets for ingestion?
  • bland-orange-13353 (06/05/2023, 1:31 AM)
    This message was deleted.
  • millions-cat-71706 (06/05/2023, 6:03 AM)
    I'm having problems setting up a clean instance from scratch following the quickstart instructions. This is on a fresh Ubuntu 22 running in AWS; the only service running is ssh. I installed Docker (24.0.2) and Docker Compose following Docker's instructions. Maybe I'm not being patient enough, but it has been stuck on this for some time now. A couple weeks ago I installed and ran it on my local Ubuntu 18. Checking the mysql container log, it doesn't show any errors. I tried on Amazon Linux as well and see the same result. What am I missing here?
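    While it sits there, the container logs usually say which piece is the holdout. A couple of low-risk checks, with no assumption about the root cause (container names are the quickstart defaults):
    # See which containers are up, restarting, or unhealthy.
    docker ps --format 'table {{.Names}}\t{{.Status}}'
    # The quickstart typically waits on the datahub-upgrade job; its log often
    # names the dependency it is blocked on.
    docker logs datahub-upgrade --tail 100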
  • brainy-jewelry-96288 (06/05/2023, 6:21 AM)
    Hello team, I am facing the below error when I run
    datahub docker quickstart
    My machine is an Apple M1 Pro with macOS 13.2.1 (22D68):
    [+] Running 8/8
     ✔ Container zookeeper            Healthy    0.5s
     ✔ Container mysql                Healthy    0.5s
     ✔ Container elasticsearch       Healthy    0.5s
     ✔ Container broker               Healthy    1.5s
     ✔ Container mysql-setup          Exited     2.4s
     ✔ Container elasticsearch-setup  Exited   121.3s
     ✔ Container schema-registry      Healthy    1.5s
     ✔ Container kafka-setup          Exited     2.3s
    service "elasticsearch-setup" didn't complete successfully: exit 1
    Unable to run quickstart - the following issues were detected:
    - datahub-frontend-react is not running
    - datahub-actions is not running
    - datahub-gms is not running
    - datahub-upgrade is still running
    - elasticsearch-setup exited with an error
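    A sketch of how one might dig further; the only firm step is reading the failed container's log, and since this is Apple Silicon, an image architecture mismatch is one possible (unconfirmed) cause:
    # Read the actual failure from the setup container.
    docker logs elasticsearch-setup
    # If local state is disposable, wipe it and retry from scratch.
    datahub docker nuke
    datahub docker quickstart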
  • square-football-37770 (06/05/2023, 7:34 AM)
    Hi! While ingesting BQ data I keep getting many WARNINGs like the one below, and I'm not sure what they mean:
    '[2023-06-05 07:27:39,530] WARNING  {datahub.ingestion.source.bigquery_v2.usage:854} - Unable to parse <class '
               "'google.cloud.logging_v2.entries.ProtobufEntry'> missing read principalEmail, missing query serviceData missing v2 jobChange for "
               "ProtobufEntry(log_name='projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access', labels=None, insert_id='-nant09dgh1m', "
               "severity='INFO', http_request=None, timestamp=datetime.datetime(2023, 6, 5, 7, 2, 11, 831792, tzinfo=datetime.timezone.utc), "
  • busy-honey-716 (06/05/2023, 7:54 AM)
    Hi there, after installation using `datahub docker quickstart` I cannot start it locally. The following errors occur. Please help with troubleshooting.
  • busy-honey-716 (06/05/2023, 8:01 AM)
    image.png
  • brief-ability-41819 (06/05/2023, 12:33 PM)
    Hello, fresh installation of DH 0.10.0 on EKS with empty storages (RDS, MSK, OpenSearch). Any ideas why the
    datahub-gms
    pod throws:
    2023-06-05 08:07:17,373 [ThreadPoolTaskExecutor-1] ERROR o.s.k.l.KafkaMessageListenerContainer$ListenerConsumer:149 - Consumer exception
    java.lang.IllegalStateException: This error handler cannot process 'SerializationException's directly; please consider configuring an 'ErrorHandlingDeserializer' in the value and/or key deserializer
            at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:194)
    It's constantly at 0/1 readiness status, and without it the
    datahub-actions
    pod cannot start.
  • creamy-ram-28134 (06/05/2023, 1:57 PM)
    Hi team, I was trying to update DataHub and am running into this issue in the update job. Does anyone know how to fix it?
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
  • witty-wall-84488 (06/05/2023, 2:19 PM)
    Has anyone managed to find a workaround to use Great Expectations checks in DataHub on sources that do not work with SqlAlchemyExecutionEngine, for example s3? There is an open feature request, but it is unclear how quickly it will be picked up: https://feature-requests.datahubproject.io/p/great-expectations-support-different-execution-engines
  • adamant-honey-44884 (06/05/2023, 9:52 PM)
    Hello. When using Spark lineage in AWS, if we are using Glue as our Hive Metastore, do we have to use the Glue source for the datasets, or does an Athena source work? I have been trying to get it working with Athena, but the pipeline/tasks are not connected to the Athena-imported datasets. Thanks in advance for the help.
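    For reference, a minimal Spark lineage wiring sketch (listener class and package as in the Spark lineage docs; the version and server URL are placeholders, and whether the emitted lineage attaches to Athena-sourced datasets depends on platform naming, which is the part to experiment with):
    # spark-defaults.conf sketch
    spark.jars.packages        io.acryl:datahub-spark-lineage:0.10.3
    spark.extraListeners       datahub.spark.DatahubSparkListener
    spark.datahub.rest.server  http://localhost:8080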
  • dazzling-rainbow-96194 (06/06/2023, 3:42 AM)
    Hello! Can Snowflake be used as the base storage component in DataHub? If yes, is there documentation on how to use Snowflake instead of MySQL?
  • bland-orange-13353 (06/06/2023, 4:54 AM)
    This message was deleted.
  • brief-afternoon-9651 (06/06/2023, 8:15 AM)
    Hello! While deploying DataHub with Docker, the mysql and broker containers are giving the following error.