# troubleshoot
  • astonishing-dusk-99990 (04/10/2023, 9:13 AM)
    Hi, does anyone know how to assign a static IP to the DataHub frontend through an ingress when deploying with the Helm chart? Currently my YAML for the datahub-frontend looks like this:
    # Set up ingress to expose react front-end
      ingress:
        enabled: true
        podAnnotations:
          kubernetes.io/ingress.class: "gce-internal"
          kubernetes.io/ingress.regional-static-ip-name: "your-domain-name-internal-address"
        hosts:
        - host: your-domain-name
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: datahub-frontend
                    port:
                      name: http
          #path: /
          #redirectPaths: []
    
      service:
        type: NodePort # ClusterIP or NodePort
        port: 9002
        targetPort: http
        protocol: TCP
        name: http
        annotations:
          cloud.google.com/neg: '{"ingress": true}'
        # annotations:
        #   networking.gke.io/load-balancer-type: Internal
    Since we can't use the loadBalancerIP argument in the service section, is there any way to switch the DataHub frontend from a dynamic IP to a static IP when deploying with the Helm chart? Also, when I try to run a helm upgrade it always fails with an error like this:
    Error: UPGRADE FAILED: error validating "": error validating data: ValidationError(Ingress.spec.rules[0].http): missing required field "paths" in io.k8s.api.networking.v1.HTTPIngressRuleValue
    Does anyone know the problem and how to fix it? Note: image datahub v0.10.0.
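    For reference, a hedged sketch of values that fit the datahub-frontend chart's own ingress schema, where each hosts entry takes flat paths/redirectPaths lists and the chart renders the Ingress spec itself (the annotation key is annotations, not podAnnotations). The "missing required field paths" error is what you would expect when a raw Kubernetes Ingress spec is pasted under hosts. The static-IP name assumes an address reserved up front with gcloud compute addresses create:
    # Hedged sketch - assumes the standard datahub-frontend chart values layout
    datahub-frontend:
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: "gce-internal"
          # references a pre-reserved regional static IP, e.g.
          #   gcloud compute addresses create your-domain-name-internal-address --region <region> --subnet <subnet>
          kubernetes.io/ingress.regional-static-ip-name: "your-domain-name-internal-address"
        hosts:
          - host: your-domain-name
            paths:
              - /
    With the static IP bound at the ingress, the Service can stay NodePort and no loadBalancerIP argument is needed.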
    ๐Ÿ” 1
    ๐Ÿ“– 1
    โœ… 1
    l
    a
    +2
    • 5
    • 6
  • best-umbrella-88325 (04/10/2023, 12:03 PM)
    Hello Community! I'm trying to build the docker image for datahub-actions after making a few changes. I've created the image using the command
    docker build -f docker/datahub-actions/Dockerfile . --no-cache
    as mentioned in the documentation. Once I use this in my helm chart, I get the following error from the actions pod:
    2023/04/10 11:59:06 Waiting for: http://datahub-datahub-gms:8080/health
    2023/04/10 11:59:06 Received 200 from http://datahub-datahub-gms:8080/health
    2023/04/10 11:59:06 Error starting command: `/start_datahub_actions.sh` - fork/exec /start_datahub_actions.sh: no such file or directory
    Can someone help me with this? Thanks in advance.
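    A quick way to narrow this down, as a hedged sketch (the image tag is hypothetical): fork/exec ... no such file or directory usually means the file is missing from the image, declares a shebang interpreter that does not exist in the image, or has CRLF line endings, so inspect the script inside the built image directly:
    # Build with a tag, then inspect the script inside the image (tag is hypothetical)
    docker build -f docker/datahub-actions/Dockerfile -t my-datahub-actions:dev .
    docker run --rm --entrypoint sh my-datahub-actions:dev \
      -c 'ls -l /start_datahub_actions.sh && head -n 1 /start_datahub_actions.sh'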
  • victorious-planet-2053 (04/10/2023, 1:19 PM)
    Hi! Please tell me, how do I delete objects that were added by ingestion? On the filter page I see "This action is not supported for the selected types."
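    When the UI bulk action is unsupported for a type, the CLI delete usually still works; a hedged sketch (the platform is a placeholder), previewing with a dry run first:
    # Preview what would be removed, then drop --dry-run to soft-delete
    datahub delete --entity_type dataset --platform <platform> --soft --dry-run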
  • handsome-football-66174 (04/10/2023, 5:21 PM)
    Hi team, I'm trying to use the OpenAPI /entities endpoint to ingest metadata. Going through the documentation, it looks like we can ingest one metadata aspect at a time, like SchemaMetadata. If we need to add tags etc. to the datasets, will those need to be ingested separately?
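    Yes, each aspect is typically its own write. As a hedged sketch of what a separate tags write can look like, using the Python REST emitter rather than raw OpenAPI (the server address, platform, and dataset name are placeholders):
    from datahub.emitter.mce_builder import make_dataset_urn, make_tag_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass

    # Placeholder GMS address; point at your deployment
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # A globalTags aspect emitted on its own, after the SchemaMetadata write
    tags_aspect = GlobalTagsClass(tags=[TagAssociationClass(tag=make_tag_urn("pii"))])
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityUrn=make_dataset_urn("hive", "db.table", "PROD"),
            aspect=tags_aspect,
        )
    )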
  • proud-printer-88070 (04/11/2023, 3:15 AM)
    Hello DataHub, I am getting an error when I try to ingest a file into DataHub GMS via the CLI. It seems that the issue is related to configuration (it's the first time we are trying to do this). The command I am issuing is:
    python3 -m datahub ingest -c source.yml
    The log is attached as cli-error-log.txt.
    my .datahubenv looks something like this:
    gms:
      server: https://<<<gms-host>>>.us-east-1.elb.amazonaws.com:8080
      token: <<<token>>>
    And I can curl the following URL successfully:
    curl http://<<<gms-host>>>.us-east-1.elb.amazonaws.com:8080/config
    {
      "models" : { },
      "patchCapable" : true,
      "versions" : {
        "linkedin/datahub" : {
          "version" : "v0.10.0",
          "commit" : "cf1e627e55431fc69d72918b2bcc3c5f3a1d5002"
        }
      },
      "managedIngestion" : {
        "defaultCliVersion" : "0.10.0",
        "enabled" : true
      },
      "statefulIngestionCapable" : true,
      "supportsImpactAnalysis" : true,
      "telemetry" : {
        "enabledCli" : true,
        "enabledIngestion" : false
      },
      "datasetUrnNameCasing" : false,
      "retention" : "true",
      "datahub" : {
        "serverType" : "prod"
      },
      "noCode" : "true"
    }
    I looked at this post: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#your-proxy-appears-to-only-use-http-and-not-https In my setup, no HTTP_PROXY or HTTPS_PROXY env vars are set. The error happens when trying to access the /config endpoint and says
    try changing your proxy URL to be HTTP
    GMS is installed in a Kubernetes pod in a production environment, and we are on a VPN while running the above commands. Thanks!
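    One thing that stands out: the curl that succeeds uses http://, while the .datahubenv above points https:// at port 8080, and urllib3 raises exactly that proxy-sounding error when an https:// URL reaches a plain-HTTP endpoint. A hedged sketch of the matching config:
    # Hedged sketch - scheme matches the working curl (plain HTTP on 8080)
    gms:
      server: http://<<<gms-host>>>.us-east-1.elb.amazonaws.com:8080
      token: <<<token>>>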
  • mysterious-scooter-52411 (04/11/2023, 7:27 AM)
    ./gradlew quickstart takes more than 30 minutes to execute. Is this normal? Is there a way to make it faster?
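    A cold first build being slow is expected; repeat runs can be trimmed with standard Gradle switches, as a hedged sketch:
    # Skip tests and reuse cached outputs on repeat builds
    ./gradlew quickstart -x test --parallel --build-cache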
  • colossal-waitress-83487 (04/11/2023, 10:51 AM)
    Hello DataHub, how can I query all ingestion sources, using GraphQL or other means?
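    One option is the GraphQL query behind the managed-ingestion UI; a hedged sketch, with field names as in recent GraphQL schemas:
    query listIngestionSources {
      listIngestionSources(input: { start: 0, count: 100 }) {
        start
        count
        total
        ingestionSources {
          urn
          name
          type
          schedule {
            interval
            timezone
          }
        }
      }
    }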
    ๐Ÿ” 1
    ๐Ÿ“– 1
    l
    a
    • 3
    • 3
  • elegant-salesmen-99143 (04/11/2023, 12:12 PM)
    Hi all. We recently upgraded our stage environment from 0.9.6.1 to 0.10.1, and after that it seems like entities that had been soft-deleted are appearing again as if they'd never been deleted. Any idea what might have caused that, and how can we get them back to being soft-deleted? We're using Kubernetes and the DataHub Helm chart, and the restore-indices job has run successfully.
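    If they need to be re-soft-deleted in the meantime, the CLI can do it per urn; a hedged sketch with a placeholder urn:
    # Placeholder urn - repeat (or script) for each entity that should stay hidden
    datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)" --soft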
    ๐Ÿ” 1
    ๐Ÿ“– 1
    l
    a
    +2
    • 5
    • 15
  • eager-animal-48107 (04/11/2023, 4:27 PM)
    Hi team, we are getting the following error when we try to ingest from Iceberg.
  • eager-animal-48107 (04/11/2023, 4:28 PM)
    ERROR: could not serialize access due to concurrent update  Call getNextException to see other errors in the batch.
    	at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:165)
    	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
    	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:559)
    	at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:887)
    	at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:910)
    	at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1649)
    	at io.ebean.datasource.delegate.PreparedStatementDelegator.executeBatch(PreparedStatementDelegator.java:357)
    	at io.ebeaninternal.server.persist.BatchedPstmt.executeAndCheckRowCounts(BatchedPstmt.java:130)
    	at io.ebeaninternal.server.persist.BatchedPstmt.executeBatch(BatchedPstmt.java:97)
    	at io.ebeaninternal.server.persist.BatchedPstmtHolder.flush(BatchedPstmtHolder.java:124)
    	at io.ebeaninternal.server.persist.BatchControl.flushPstmtHolder(BatchControl.java:206)
    	at io.ebeaninternal.server.persist.BatchControl.executeNow(BatchControl.java:220)
    	at io.ebeaninternal.server.persist.BatchedBeanHolder.executeNow(BatchedBeanHolder.java:100)
    	at io.ebeaninternal.server.persist.BatchControl.flush(BatchControl.java:271)
    	at io.ebeaninternal.server.persist.BatchControl.flush(BatchControl.java:227)
    	at io.ebeaninternal.server.transaction.JdbcTransaction.batchFlush(JdbcTransaction.java:678)
    	... 101 common frames omitted
    Caused by: org.postgresql.util.PSQLException: ERROR: could not serialize access due to concurrent update
    	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
    	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
    	... 115 common frames omitted
  • flat-engineer-75197 (04/11/2023, 5:26 PM)
    👋 Is there a way to pull all glossary terms via the Python SDK? The closest thing I've seen is this, but it's entity-specific; I want to grab ALL terms. https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L206
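    A hedged sketch of one way to get them all with the same client, assuming a CLI version that ships DataHubGraph.get_urns_by_filter (a search over all glossary term entities rather than one entity's terms; the server address is a placeholder):
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    # Placeholder GMS address
    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    # Iterate every glossary term urn in the instance
    for urn in graph.get_urns_by_filter(entity_types=["glossaryTerm"]):
        print(urn)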
  • cuddly-butcher-39945 (04/11/2023, 7:10 PM)
    Hi team, I'm experiencing this issue in my Docker Desktop environment on a Mac M1. @brainy-tent-14503 I'm posting the image to see if this can help, based on our conversation this morning. Thanks!!
  • best-eve-12546 (04/11/2023, 9:24 PM)
    Hi y'all, not sure if I missed any documentation, but I'm trying to use datahub delete to delete datasets with a specific schema. Looking at https://datahubproject.io/docs/how/delete-metadata/ it looks like it supports a query operator, but I couldn't figure out exactly how to use it. I.e. I'm trying to do something like:
    datahub delete --entity_type dataset --env PROD --query "thisschema"
    To delete
    urn:li:dataset:(urn:li:dataPlatform:platform,thisschema.table1,PROD)
    urn:li:dataset:(urn:li:dataPlatform:platform,thisschema.table2,PROD)
    but NOT
    urn:li:dataset:(urn:li:dataPlatform:platform,wrong_schema.thisschema,PROD)
    The query operator seems to match all 3, since the target string appears in the table name too. Is this possible?
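    Since --query matches anywhere in the name rather than anchoring on the schema, a safer pattern is to preview with a dry run and then delete the exact urns (a hedged sketch):
    # Show what would match before deleting anything
    datahub delete --entity_type dataset --env PROD --query "thisschema" --dry-run
    # Then delete exact matches individually by urn
    datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:platform,thisschema.table1,PROD)"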
  • incalculable-zebra-69091 (04/12/2023, 3:55 AM)
    Hi team, I'm trying to run datahub docker quickstart --version=v0.10.1 (datahub version 0.10.1), but when I sign in (GUI) I get errors on /track and /login. Checking the logs of the datahub-frontend-react container, I see "[kafka-producer-network-thread | datahub-frontend] WARN o.apache.kafka.clients.NetworkClient - [Producer clientId=datahub-frontend] Connection to node -1 (broker/172.18.0.6:29092) could not be established. Broker may not be available", and datahub-gms has errors too. What do I need to do to be able to sign in?
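    Those /track and /login failures usually trace back to the Kafka broker container in the quickstart stack; a hedged first check before signing in again:
    # Is the broker container up, and what do its recent logs say?
    docker ps --filter name=broker
    docker logs broker 2>&1 | tail -n 50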
    ๐Ÿ” 1
    plus1 1
    ๐Ÿ“– 1
    ๐Ÿค’ 1
    l
    f
    +2
    • 5
    • 35
  • able-city-76673 (04/12/2023, 6:01 AM)
    https://datahubspace.slack.com/archives/CV2UVAPPG/p1681279205072399
  • microscopic-room-90690 (04/12/2023, 6:59 AM)
    Hi team, I use v0.9.6.1 in the dev environment and v0.8.43 in prod, with about 3000 tables in dev and 5000 tables in prod from a Hive source. Ingesting the metadata into DataHub takes about 1h in dev but more than 5 days in prod. I'm wondering what causes the huge difference. Does it have anything to do with the version, and how should I troubleshoot?
    [2023-03-31 14:47:57,830] INFO     {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.43
    [2023-04-04 10:50:44,875] INFO     {datahub.cli.ingest_cli:137} - Finished metadata ingestion
    Command exiting with ret '0'
  • few-carpenter-93837 (04/12/2023, 9:54 AM)
    Hi, can anyone confirm that they have successfully gotten the new project_patterns to work with the DataHub Tableau integration (using the allow/deny configuration in the recipe)?
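    For comparison, a hedged sketch of the shape I'd expect the new pattern to take (an allow/deny list of regexes; connect_uri and project names are placeholders, and the exact key name should be checked against your CLI version's Tableau source docs):
    source:
      type: tableau
      config:
        connect_uri: https://tableau.example.com   # placeholder
        project_pattern:
          allow:
            - "^Analytics$"
          deny:
            - "^Sandbox.*"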
  • steep-fountain-54482 (04/12/2023, 11:04 AM)
    Hi, I'm getting this error when trying to capture lineage on a project ... it fails before my dispatcher is even called.
  • steep-fountain-54482 (04/12/2023, 11:04 AM)
    23/04/12 10:29:22 ERROR SplineAgent: Unexpected error occurred during lineage processing for application: launcher #00f9a8uvf3tjqt09
    java.lang.IllegalStateException: WithField.dataType should not be called.
  • bland-orange-13353 (04/12/2023, 12:16 PM)
    This message was deleted.
  • bland-orange-13353 (04/12/2023, 12:23 PM)
    This message was deleted.
  • wide-afternoon-79955 (04/12/2023, 4:25 PM)
    Hi all, I am trying to push GMS pod logs to a mounted location, hence:
    datahub-gms:
      extraEnvs:
        - name: LOG_DIR
          value: /tmp/datahub-gms/log/
    but the logback config does not seem to pick up the LOG_DIR env var.
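    If the logback config won't honor LOG_DIR, an alternative is to mount the pod's log directory out via the chart's generic volume hooks; a hedged sketch (the hostPath is a placeholder for whatever volume type you use):
    datahub-gms:
      extraEnvs:
        - name: LOG_DIR
          value: /tmp/datahub-gms/log
      extraVolumes:
        - name: gms-logs
          hostPath:
            path: /mnt/datahub-gms-logs   # placeholder mount source
      extraVolumeMounts:
        - name: gms-logs
          mountPath: /tmp/datahub-gms/log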
  • hallowed-lizard-92381 (04/12/2023, 6:20 PM)
    I'm seeing inconsistencies between the results returned by a GraphQL call initiated from the webapp and those returned when executing the same query from igraphql. For example, the web frontend shows 'no role' for these two users, but the GraphQL response shows the 'Admin' role. Anyone have a similar experience or a recommendation?
    ๐Ÿ” 1
    ๐Ÿ“– 1
    l
    a
    • 3
    • 4
  • cuddly-butcher-39945 (04/12/2023, 6:56 PM)
    Hello everyone. I've experienced an issue with Snowflake ingestion failing when it used to work. Here are the details:
    Environment: Kubernetes deployment on AWS; DataHub CLI version 0.9.5; Python 3.7.10 (default, Jun 3 2021) [GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]
    Ingestion method: I am trying both CLI and UI ingestion of my Snowflake environment.
    Error: datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type snowflake: snowflake is disabled
    Debugging step: datahub check plugins --verbose -> snowflake (disabled) ModuleNotFoundError("No module named 'great_expectations.datasource.sqlalchemy_datasource'")
    Debugging step: pip3 list | grep SQLAlchemy -> Flask-SQLAlchemy 2.5.1, SQLAlchemy 1.4.40, SQLAlchemy-JSONField 1.0.0, SQLAlchemy-Utils 0.38.3
    Debugging step: pip3 install 'acryl-datahub[sqlalchemy]' -> Requirement already satisfied: acryl-datahub[sqlalchemy] in /home/joshua.garza/.local/lib/python3.7/site-packages (0.9.5)
    Debugging step: pip3 install --upgrade great_expectations -> Requirement already satisfied: great_expectations in /home/joshua.garza/.local/lib/python3.7/site-packages (0.16.6)
    Debugging step: datahub check plugins --verbose -> snowflake (disabled) ModuleNotFoundError("No module named 'great_expectations.datasource.sqlalchemy_datasource'")
    Not sure what else to do here. Thanks in advance!
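    One hedged avenue, assuming great_expectations 0.16 removed the great_expectations.datasource.sqlalchemy_datasource module path that the 0.9.5-era plugin imports (the version bound below is an assumption, not a confirmed pin):
    # Assumption: the module path still exists below GE 0.16
    pip3 install 'great_expectations<0.16'
    pip3 install 'acryl-datahub[snowflake]==0.9.5'
    datahub check plugins --verbose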
    ๐Ÿ” 1
    ๐Ÿ“– 1
    โœ… 1
    l
    a
    +2
    • 5
    • 18
  • elegant-salesmen-99143 (04/12/2023, 8:06 PM)
    I have a working API query that gets me the name of a container and the number of entities in it. But I also want to get the description of the container (aka its Documentation). How do I get it? I've tried putting description under name in the query, but it returns null, even though the documentation is not empty for this container. Is it called something different? A property named 'documentation' is not found.
    {
      container(urn: "urn:li:container:XXX") {
        properties {
          name
          description
        }
        entities {
          total
          start
        }
      }
    }
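    UI-authored documentation is stored separately from ingested properties, so it is worth also requesting editableProperties; a hedged sketch:
    {
      container(urn: "urn:li:container:XXX") {
        properties {
          name
          description
        }
        editableProperties {
          description
        }
        entities {
          total
          start
        }
      }
    }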
  • microscopic-room-90690 (04/13/2023, 3:37 AM)
    Hi team, when ingesting Hive metadata into DataHub (v0.9.6.1), the execution log confuses me. It shows 42 tables ingested in about 1 min, while it takes 8 min to ingest another 3 tables! Can anyone help?
    source:
      type: hive
      config:
        host_port: localhost:10000
        database_alias: hive
        schema_pattern:
          allow: ["^web_hudi$"]
            
    sink:
      type: "datahub-rest"
      config:
        server: ${datahub_server}
        token: ${token}
  • busy-analyst-35820 (04/13/2023, 3:57 AM)
    Hi team, can anyone help us with this? https://datahubspace.slack.com/archives/C029A3M079U/p1680678159704919
  • better-fireman-33387 (04/13/2023, 8:41 AM)
    Hi all, I am using DataHub with the Helm deployment and was moving it to use our own Elasticsearch instance (ver 7.17.3). Though it's working, I'm getting some errors (inside the thread). Also, I can't see that any index template was created, and my DataHub usage-event index name is datahub_datahub_usage_event (I set the datahub prefix for all indices). Could anyone assist please?
  • bland-orange-13353 (04/13/2023, 10:28 AM)
    This message was deleted.
  • future-holiday-32084 (04/13/2023, 10:30 AM)
    Hi folks, I'm new to DataHub. When using DataHub Spark Lineage (io.acryl:datahub-spark-lineage:0.10.1-1) with a Spark job, it ingests lineage perfectly. However, in the MySQL DataHub database the "createdby" field shows "urn:li:corpuser:__datahub_system", and as a result I cannot remove the lineage manually through the DataHub UI. Could anyone please provide a solution? Additionally, when executing this write command:
    spark.sql("select * from <database>.<table_source>").write.mode("append").format("parquet").saveAsTable("<database>.<table_sink>")
    The lineage, as shown in the image below, has been inferred perfectly for the sink table. However, the source table displays the location on my Hadoop Data Lake, even though I'm reading from a table, not a path.