# troubleshoot
  • loud-camera-71352 (11/29/2021, 3:53 PM)
    Hi guys! Is it possible to add a tag to a column via curl? I tried this but I get a 400 error: “Cannot parse request entity”
    Copy code
    curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
        "entity": {
            "value": {
                "com.linkedin.metadata.snapshot.DatasetSnapshot": {
                    "urn": "urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
                    "aspects": [
                        {
                            "com.linkedin.schema.EditableSchemaMetadata": [
                                "editableSchemaFieldInfo": {
                                    "fieldPath": "member_id",
                                    "globalTags": { "tags": [{ "tag": "urn:li:tag:PII" }] }
                                }
                            ]
                        }
                    ]
                }
            }
        }
    }'
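    For reference, the 400 is most likely the payload itself: the aspect body puts a key/value pair directly inside a JSON array, which is not valid JSON. EditableSchemaMetadata is an object whose editableSchemaFieldInfo field is an array of per-field entries. A hedged sketch of a corrected payload with the same URN and tag (depending on server version, the aspect may also require created/lastModified audit stamps):
    Copy code
    curl 'http://localhost:8080/entities?action=ingest' -X POST \
      -H 'X-RestLi-Protocol-Version: 2.0.0' \
      --data '{
        "entity": {
            "value": {
                "com.linkedin.metadata.snapshot.DatasetSnapshot": {
                    "urn": "urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
                    "aspects": [
                        {
                            "com.linkedin.schema.EditableSchemaMetadata": {
                                "editableSchemaFieldInfo": [
                                    {
                                        "fieldPath": "member_id",
                                        "globalTags": { "tags": [{ "tag": "urn:li:tag:PII" }] }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        }
    }'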
  • cool-painting-92220 (11/29/2021, 7:22 PM)
    Hi everyone! I'm still ramping up on learning DataHub and had a question about the ingested metadata. If I needed to shut down the DataHub server (datahub docker nuke) but wanted to save all the metadata I had previously ingested, so that I don't have to run the ingestion again when I start DataHub back up, what would be the best way to approach this? Additionally, if I had used DataHub's UI to add descriptions and documentation to a few tables, how could this data be saved if the server were to be shut down? Thank you for any guidance that can be provided! 😄
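    A hedged pointer here: recent CLI versions expose a nuke flag that drops the containers but keeps the data volumes, which preserves both ingested metadata and UI edits (descriptions, tags), since they all live in the same metadata store. Assuming your CLI version has it:
    Copy code
    # remove DataHub containers but keep the backing volumes (MySQL, Elasticsearch, Kafka)
    datahub docker nuke --keep-data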
  • polite-flower-25924 (11/29/2021, 9:51 PM)
    Hey folks, I’m not able to see the owned datasets for a specific group. Even though several entities are assigned to this group (event-tracking), they don’t appear in the Ownership part. 😕
  • plain-farmer-27314 (11/30/2021, 3:55 PM)
    Hey all - we are looking into leveraging DataHub's lineage for some backend processes/alerting, and are curious what the most efficient method is to fetch all downstream (or upstream) entities of a given type. Our use case: Table X is experiencing ETL delays, and we want to determine which Looker charts/dashboards are impacted. I'm currently experimenting with the GraphQL endpoint, but it seems like a lot of computation to do for each table.
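    One hedged approach, assuming your GraphQL version exposes the generic relationships field: instead of walking full lineage per table, ask only for the incoming edges of Table X (Consumes covers charts reading a dataset, DownstreamOf covers derived datasets). The URN below is hypothetical:
    Copy code
    {
      dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table_x,PROD)") {
        relationships(input: { types: ["DownstreamOf", "Consumes"], direction: INCOMING, start: 0, count: 100 }) {
          total
          relationships {
            type
            entity { urn type }
          }
        }
      }
    }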
  • cool-painting-92220 (11/30/2021, 9:09 PM)
    Hey everyone! I've been trying to read up on DataHub's architecture and its storage of data, and was wondering: if the server hosting DataHub were to suddenly crash, is there any data (either ingested metadata or information that users have contributed on the platform, like documentation, tags, etc.) that would be at risk of being lost? And what would be the best setup for ensuring that this information isn't lost, if any is at risk? I've come across the following link for restoring search and graph indices, but wasn't sure about the rest of the data in DataHub (https://datahubproject.io/docs/how/restore-indices)
  • numerous-translator-7230 (12/01/2021, 3:26 AM)
    Hi everyone! I've been trying to set up DataHub on AWS ECS. There are two ports that need to be exposed through the load balancer: frontend (9002) and GMS (8080). Has anybody figured out the same issue?
  • red-pizza-28006 (12/01/2021, 10:52 AM)
    Trying to delete a schema with datahub delete -n --query "fivetran_headscarf_hurray_staging", I am getting this error. Any ideas?
    Copy code
    ---- (full traceback above) ----
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/datahub/entrypoints.py", line 95, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 148, in delete
        deletion_result = delete_with_filters(
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 211, in delete_with_filters
        batch_deletion_result.merge(one_result)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 51, in merge
        self.sample_records.extend(another_result.sample_records)
    
    AttributeError: 'FieldInfo' object has no attribute 'extend'
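    The failure is inside the CLI's own dry-run result merging (sample_records ends up as a pydantic FieldInfo instead of a list), so this looks like a client-side bug rather than a problem with your query. Assuming it has since been patched, a low-risk first step is upgrading the CLI and retrying:
    Copy code
    pip install --upgrade acryl-datahub
    datahub version
    datahub delete -n --query "fivetran_headscarf_hurray_staging"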
  • handsome-football-66174 (12/01/2021, 6:07 PM)
    Hi everyone - Trying out the GraphQL API. I am able to use this to get the list of users. How do I also get the relationships?
    {
      listUsers(input: { start: 0, count: 10 }) {
        start
        count
        total
        users {
          urn
          type
          username
          status
          properties {
            displayName
            email
            title
            departmentId
            departmentName
            firstName
            lastName
            fullName
            countryCode
          }
          editableProperties {
            aboutMe
            pictureLink
          }
        }
      }
    }
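    A hedged sketch for the relationships part, assuming your GraphQL schema exposes the generic relationships field on CorpUser; the relationship type name here (IsMemberOfGroup) is the usual user-to-group edge, but adjust to whatever your version reports:
    Copy code
    {
      corpUser(urn: "urn:li:corpuser:some.user") {
        username
        relationships(input: { types: ["IsMemberOfGroup"], direction: OUTGOING, start: 0, count: 10 }) {
          total
          relationships {
            type
            entity { urn type }
          }
        }
      }
    }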
  • refined-branch-44251 (12/02/2021, 1:03 AM)
    Copy code
    curl --location --request POST 'http://localhost:8080/entities?action=search' \
    --header 'X-RestLi-Protocol-Version: 2.0.0' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": "glossaryTerms:Classification.Sensitive",
        "entity": "dataset",
        "start": 0,
        "count": 10
    }'
    This returns all datasets with the glossary term 'Classification.Sensitive'. Is there a way to search for datasets where this glossary term has been applied to fields of the dataset (and not the dataset itself)?
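    A hedged variant, assuming your version indexes schema-field terms under the fieldGlossaryTerms search field (terms applied through the UI may instead land under editedFieldGlossaryTerms):
    Copy code
    curl --location --request POST 'http://localhost:8080/entities?action=search' \
    --header 'X-RestLi-Protocol-Version: 2.0.0' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": "fieldGlossaryTerms:Classification.Sensitive",
        "entity": "dataset",
        "start": 0,
        "count": 10
    }'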
  • abundant-flag-19546 (12/02/2021, 3:56 AM)
    I’m trying to make an MLExperiment entity to ingest from MLflow. (There is already a pull request https://github.com/linkedin/datahub/pull/2725, but I need to make a lineage like ‘dataset -> consumes -> MLExperiment -> trained by -> MLModel’, so I’m trying to implement an MLExperiment entity.) I implemented these files:
    • MLExperimentUrn.java and MLExperimentUrn.pdl in li-utils
    • MLExperimentKey.pdl, MLExperimentSnapshot.pdl, MLExperimentProperties.pdl, MLExperimentAspect.pdl
    • and registered these items in Aspect.pdl and Snapshot.pdl
    When I tried to build with the command
    Copy code
    COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
    I got this error:
    Copy code
    #10 235.9 > Task :metadata-service:restli-impl:checkRestModel FAILED
    #10 235.9 [checker]
    #10 235.9 [checker] idl compatibility report:
    #10 235.9 [checker] Incompatible changes:
    #10 235.9 [checker]   1) /collection/actions/batchIngest/parameters/entities/type: new union added members com.linkedin.metadata.snapshot.MLExperimentSnapshot
    #10 235.9 [checker]   2) com.linkedin.entity.Entity/value/com.linkedin.metadata.snapshot.Snapshot/ref/union: new union added members com.linkedin.metadata.snapshot.MLExperimentSnapshot, breaks old readers
    #10 235.9 [checker]
    #10 235.9 [checker] [RS-COMPAT]: false
    #10 235.9 [checker] [MD-COMPAT]: false
    #10 235.9 [checker] [RS-I]:/collection/actions/batchIngest/parameters/entities/type: new union added members com.linkedin.metadata.snapshot.MLExperimentSnapshot
    #10 235.9 [checker] [MD-I]:com.linkedin.entity.Entity/value/com.linkedin.metadata.snapshot.Snapshot/ref/union: new union added members com.linkedin.metadata.snapshot.MLExperimentSnapshot, breaks old readers
    #10 235.9 [checker]
    #10 235.9
    #10 235.9 FAILURE: Build failed with an exception.
    So I tried this workaround (found in the official docs)
    Copy code
    ./gradlew :gms:impl:build -Prest.model.compatibility=ignore
    but it fails with a ‘project gms not found’ error. 1. Is this the correct way to implement a new entity (creating the custom URN Java code, and registering the items in Aspect.pdl and Snapshot.pdl)? 2. How can I ignore that error while building the Docker images?
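    On the second question: the gms Gradle project was renamed to metadata-service, which is why the command from the older docs no longer resolves. Judging by the failing task in the log, the equivalent invocation would be roughly:
    Copy code
    ./gradlew :metadata-service:restli-impl:build -Prest.model.compatibility=ignore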
  • quaint-branch-37931 (12/02/2021, 10:15 AM)
    Hey all, I'm trying to ingest data using the REST sink. This seems to work fine, but afterwards the UI still shows up empty. There are no errors in the gms or react-webapp logs, and when I check the backing Postgres database, the data does seem to be there. I'm running react-webapp and gms version v0.8.17, backed by Postgres and AWS-managed Elasticsearch and Kafka. Any ideas on how I could track down the issue?
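    When rows reach Postgres but the UI stays empty, the usual suspect is the Kafka-to-Elasticsearch path (the MAE consumer) not indexing. A hedged first check, assuming default index naming and a reachable ES endpoint (placeholder host):
    Copy code
    # should report a non-zero document count if indexing is keeping up
    curl -s 'https://<your-es-endpoint>/datasetindex_v2/_count'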
  • full-area-6720 (12/02/2021, 11:14 AM)
    I am getting this error while trying to ingest a business glossary. This is the sample file provided.
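    For comparison, a minimal recipe and glossary file in the shape the sample uses; file paths and term names here are hypothetical:
    Copy code
    # recipe.yml
    source:
      type: datahub-business-glossary
      config:
        file: ./business_glossary.yml

    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"

    # business_glossary.yml (minimal):
    # version: 1
    # source: DataHub
    # owners:
    #   users:
    #     - datahub
    # nodes:
    #   - name: Classification
    #     description: Classification-related terms
    #     terms:
    #       - name: Sensitive
    #         description: Sensitive data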
  • refined-apple-6340 (12/02/2021, 2:58 PM)
    I have a self-signed cert and am using OpenSearch for Elasticsearch; at startup, the wait check does not use curl -k, so it fails waiting on OpenSearch. Any ideas?
  • aloof-forest-55926 (12/02/2021, 7:31 PM)
    Hi, I'm new to DataHub and I get this error (Failed to log in! SyntaxError: Unexpected token < in JSON at position 0) while trying to sign in to http://localhost:9002 using datahub as both the username and password.
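    That SyntaxError usually means the frontend got an HTML error page back instead of JSON, i.e. GMS isn't up yet or isn't reachable from the frontend container. A quick hedged check with the CLI:
    Copy code
    # verifies that the quickstart containers are up and healthy
    datahub docker check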
  • bulky-controller-34643 (12/03/2021, 2:04 AM)
    Hi all, I'm new to DataHub. I recently installed DataHub for testing using the Helm chart version on local Kubernetes. I followed the install guideline here (link) and didn't change any settings, and the installation looks fine. However, when I log in to the website, I found something different from the demo site (link): on my site, there is no Settings button at the top right. Do I need to enable or install something to make it visible? I would like to use the access token feature in Settings. Thanks for any help.
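    The access-token UI generally only appears once Metadata Service Authentication is enabled. With the Helm chart, a hedged sketch of the values change (key path assumed from the chart's global section):
    Copy code
    # values.yaml
    global:
      datahub:
        metadata_service_authentication:
          enabled: true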
  • ambitious-guitar-89068 (12/03/2021, 6:35 AM)
    Hi folks, trying to follow this document: I set the two environment variables in frontend and gms, and I'm not seeing the Settings menu after the Docker containers restart… https://datahubproject.io/docs/introducing-metadata-service-authentication/
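    For reference, the variable that document revolves around is METADATA_SERVICE_AUTH_ENABLED, set on both containers; a hedged docker-compose sketch:
    Copy code
    services:
      datahub-gms:
        environment:
          - METADATA_SERVICE_AUTH_ENABLED=true
      datahub-frontend-react:
        environment:
          - METADATA_SERVICE_AUTH_ENABLED=true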
  • best-crayon-19865 (12/03/2021, 1:26 PM)
    Hi all. I'm trying to use the lineage backend and get an error when trying to run Airflow tasks. I set the URL in AWS SSM as
    Copy code
    datahub.dwh-stage.corp.loc
    Did someone have a similar error?
    Copy code
    ERROR - ('Unable to emit metadata to DataHub GMS', {'message': "Invalid URL 'datahub.dwh-stage.corp.loc/entities?action=ingest': No schema supplied. Perhaps you meant http://datahub.dwh-stage.corp.loc/entities?action=ingest?"})
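    The emitter is saying the stored value has no scheme. Assuming the SSM parameter feeds the Airflow connection that the lineage backend reads, store the host with an explicit http:// and the GMS port; the CLI equivalent from the plugin docs:
    Copy code
    airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' \
        --conn-host 'http://datahub.dwh-stage.corp.loc:8080'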
  • ambitious-vegetable-3452 (12/03/2021, 2:06 PM)
    Hey folks, I've been trying to install the prerequisites for DataHub on our EKS cluster with the Helm charts. It fails to create the Elasticsearch instances with the following error:
    Copy code
    {
      "type": "server",
      "timestamp": "2021-12-03T14:03:22,365Z",
      "level": "WARN",
      "component": "r.suppressed",
      "cluster.name": "elasticsearch",
      "node.name": "elasticsearch-master-1",
      "message": "path: /_cluster/health, params: {wait_for_status=green, timeout=1s}",
      "stacktrace": [
        "org.elasticsearch.discovery.MasterNotDiscoveredException: null",
        "at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:220) [elasticsearch-7.9.3.jar:7.9.3]",
        "at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [elasticsearch-7.9.3.jar:7.9.3]",
        "at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.9.3.jar:7.9.3]",
        "at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:605) [elasticsearch-7.9.3.jar:7.9.3]",
        "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678) [elasticsearch-7.9.3.jar:7.9.3]",
        "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
        "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
        "at java.lang.Thread.run(Thread.java:832) [?:?]"
      ]
    }
    Does anyone know how I could solve this?
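    MasterNotDiscoveredException on a fresh install usually means the master pods never formed a quorum, often because some replicas are stuck Pending on storage. For a test cluster, one hedged option is shrinking to a single master in the prerequisites values (keys from the upstream Elasticsearch chart):
    Copy code
    elasticsearch:
      replicas: 1
      minimumMasterNodes: 1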
  • refined-apple-6340 (12/03/2021, 7:34 PM)
    2021/12/03 19:17:15 Problem with request: Get "https://opensearch:9200": x509: certificate relies on legacy Common Name field, use SANs instead. Sleeping 1s
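    This comes from Go's TLS client, which no longer falls back to the Common Name field: the self-signed cert needs a subjectAltName. A hedged way to reissue it (OpenSSL 1.1.1+; the hostname opensearch is assumed from the URL):
    Copy code
    openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
      -keyout opensearch.key -out opensearch.crt \
      -subj "/CN=opensearch" \
      -addext "subjectAltName=DNS:opensearch"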
  • plain-farmer-27314 (12/03/2021, 8:06 PM)
    Hey all, wondering if there is an ideal way to query all dataset entities that belong to a certain dataPlatform
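    A hedged sketch via GraphQL search facets; whether the platform filter value should be the short name (looker) or the full urn:li:dataPlatform:looker varies by version, so try both:
    Copy code
    {
      search(input: {
        type: DATASET,
        query: "*",
        start: 0,
        count: 10,
        filters: [{ field: "platform", value: "urn:li:dataPlatform:looker" }]
      }) {
        total
        searchResults {
          entity { urn type }
        }
      }
    }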
  • best-planet-6756 (12/03/2021, 8:11 PM)
    Hi All, looking for some help on an issue I am facing. I pulled the latest and then ran:
    Copy code
    COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
    But I'm getting the following error:
    Copy code
    #9 320.0 > Task :metadata-io:compileJava FAILED
    #9 320.0
    #9 320.0 FAILURE: Build failed with an exception.
    #9 320.0
    #9 320.0 * What went wrong:
    #9 320.0 Execution failed for task ':metadata-io:compileJava'.
    #9 320.0 > Could not resolve all files for configuration ':metadata-io:compileClasspath'.
    #9 320.0    > Could not resolve com.linkedin.datahub-gma:ebean-dao:0.2.81.
    #9 320.0      Required by:
    #9 320.0          project :metadata-io
    #9 320.0       > Could not resolve com.linkedin.datahub-gma:ebean-dao:0.2.81.
    #9 320.0          > Could not get resource 'https://plugins.gradle.org/m2/com/linkedin/datahub-gma/ebean-dao/0.2.81/ebean-dao-0.2.81.pom'.
    #9 320.0             > Could not GET 'https://jcenter.bintray.com/com/linkedin/datahub-gma/ebean-dao/0.2.81/ebean-dao-0.2.81.pom'.
    #9 320.0                > Connect to jcenter.bintray.com:443 [jcenter.bintray.com/34.95.74.180] failed: connect timed out
    #9 320.0    > Could not resolve com.linkedin.datahub-gma:restli-resources:0.2.81.
    #9 320.0      Required by:
    #9 320.0          project :metadata-io
    #9 320.0       > Could not resolve com.linkedin.datahub-gma:restli-resources:0.2.81.
    #9 320.0          > Could not get resource 'https://plugins.gradle.org/m2/com/linkedin/datahub-gma/restli-resources/0.2.81/restli-resources-0.2.81.pom'.
    #9 320.0             > Could not GET 'https://jcenter.bintray.com/com/linkedin/datahub-gma/restli-resources/0.2.81/restli-resources-0.2.81.pom'.
    #9 320.0                > Connect to jcenter.bintray.com:443 [jcenter.bintray.com/34.95.74.180] failed: connect timed out
    #9 320.0    > Could not resolve com.linkedin.datahub-gma:elasticsearch-dao-7:0.2.81.
    #9 320.0      Required by:
    #9 320.0          project :metadata-io
    #9 320.0       > Could not resolve com.linkedin.datahub-gma:elasticsearch-dao-7:0.2.81.
    #9 320.0          > Could not get resource 'https://plugins.gradle.org/m2/com/linkedin/datahub-gma/elasticsearch-dao-7/0.2.81/elasticsearch-dao-7-0.2.81.pom'.
    #9 320.0             > Could not GET 'https://jcenter.bintray.com/com/linkedin/datahub-gma/elasticsearch-dao-7/0.2.81/elasticsearch-dao-7-0.2.81.pom'.
    #9 320.0                > Connect to jcenter.bintray.com:443 [jcenter.bintray.com/34.95.74.180] failed: connect timed out
    #9 320.0
    #9 320.0 * Try:
    #9 320.0 Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    #9 320.0
    #9 320.0 * Get more help at https://help.gradle.org
    #9 320.0
    #9 320.0 Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0.
    #9 320.0 Use '--warning-mode all' to show the individual deprecation warnings.
    #9 320.0 See https://docs.gradle.org/5.6.4/userguide/command_line_interface.html#sec:command_line_warnings
    #9 320.0
    #9 320.0 BUILD FAILED in 5m 19s
    #9 320.0 74 actionable tasks: 74 executed
    #9 320.0
    ------
    failed to solve with frontend dockerfile.v0: failed to build LLB: executor failed running [/bin/sh -c cd /datahub-src && ./gradlew :metadata-service:war:build -x test]: runc did not terminate sucessfully
    ERROR: Service 'datahub-gms' failed to build : Build failed
    Anyone run into this before?
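    The underlying issue is that jcenter.bintray.com was sunset in 2021, so any build still resolving the datahub-gma artifacts from it just times out. Newer revisions of the repo moved these dependencies elsewhere; if pulling a commit with that fix isn't an option, a hedged patch to the repositories block in the root build.gradle:
    Copy code
    // point dependency resolution away from the dead jcenter endpoint
    repositories {
        mavenCentral()
        maven { url 'https://linkedin.jfrog.io/artifactory/open-source' }
    }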
  • lemon-greece-73651 (12/04/2021, 1:32 AM)
    Attempting to run a MySQL-to-DataHub ingestion through DataHub, and getting a strange error when running it via Airflow. Error details attached in the reply. For the record, the metadata ingestion works perfectly through the DataHub ingestion CLI when run independently. Any ideas?
  • full-area-6720 (12/06/2021, 7:05 AM)
    This is ingested here, but isn't reflected in the UI.
  • bulky-controller-34643 (12/07/2021, 3:50 AM)
    Hi, all. I tried to use the ingestion feature of the Helm chart version of DataHub, and I successfully ingested the metadata from Postgres (pic 1). However, I cannot see the datasets on the DataHub website (pic 2). What could be the reason? I checked the gms logs (pic 3) and saw the ingest logs.
  • nice-country-99675 (12/07/2021, 12:56 PM)
    👋 Hi Team! I found this strange behaviour... since I'm still playing with DataHub, a lot of back and forth, create and delete, usually happens during the day... and this is what happened. I have datasets created using a custom platform (QuickSight), and my ingestion process also includes some lineage from these datasets to Postgres tables. At some point I deleted the QuickSight datasets with a datahub delete --platform quicksight, and some of them were not deleted, so I had to delete them by urn. But when I try to re-ingest these datasets, none appear in the UI. The ingestion process didn't fail, and I see no errors in the logs... but nothing shows up. As a matter of fact, when I try a new datahub delete --platform quicksight, it tells me there are 0 records. Then I checked the Postgres tables in DataHub and I still see the upstream lineage to the QuickSight datasets, so the datasets are in some way still in the DB, but not available to the UI. When I remove them one by one using the urn, I'm able to re-ingest them and the UI properly displays them...
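    This matches soft-delete behaviour: by default, datahub delete writes a Status aspect with removed=true, so the rows stay in the DB, the UI hides them, and a later re-ingest may not flip the flag back. Assuming that is what happened, a hard delete removes the rows outright:
    Copy code
    # permanently remove the entities instead of soft-deleting them
    datahub delete --platform quicksight --hard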
  • broad-crowd-13788 (12/07/2021, 9:10 PM)
    I see the following error when trying to run ingestion with Kafka as the sink. Any idea how I can fix this?
    Copy code
    ValueSerializationError: KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="Schema being registered is incompatible with an earlier schema for subject "MetadataChangeEvent_v4-value" (HTTP status code 409, SR code 409)"}
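    The 409 means the MCE schema your client produces fails the compatibility check against the schema already registered for that subject, typically a CLI/server version skew. After aligning versions, a hedged workaround is to relax compatibility on just that subject (registry URL assumed):
    Copy code
    curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
      --data '{"compatibility": "NONE"}' \
      http://localhost:8081/config/MetadataChangeEvent_v4-value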
  • cool-painting-92220 (12/08/2021, 12:57 AM)
    Hey everyone! I'm working through Snowflake metadata ingestion and am trying to prevent a few tables and DBs from being ingested. The code below worked perfectly fine for ingesting all tables earlier, but as soon as I added the sections for database_pattern, view_pattern, and schema_pattern, I got the error message shown below. Any thoughts on why I might be running into these issues? Error:
    Copy code
    3 validation errors for SnowflakeConfig
    schema_pattern -> deny
      value is not a valid list (type=type_error.list)
    view_pattern -> deny
      value is not a valid list (type=type_error.list)
    database_pattern -> deny
      value is not a valid list (type=type_error.list)
    Ingestion File:
    Copy code
    source:
      type: snowflake
      config:
        host_port: ****
        warehouse: ****
        username: ****
        password: ****
        role: ****
        database_pattern:
          deny: ****
        view_pattern:
          deny: ****
        schema_pattern:
          deny: ****
    
    sink:
      type: "datahub-rest"
      config:
    server: "http://localhost:8080"
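    All three validation errors say the same thing: deny must be a YAML list of regex patterns, not a single scalar. A hedged sketch with hypothetical patterns:
    Copy code
        database_pattern:
          deny:
            - "^UTIL_DB$"
            - "^SNOWFLAKE.*"
        view_pattern:
          deny:
            - ".*_TMP$"
        schema_pattern:
          deny:
            - "INFORMATION_SCHEMA"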
  • cool-painting-92220 (12/08/2021, 1:21 AM)
    Hey everyone! I've tried searching around for guides on this but couldn't find any: how do I create other users for DataHub? I want to give my peers accounts to log in with, but wasn't sure how to achieve this.
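    With the default JAAS setup, local users come from the user.props file read by datahub-frontend, one username:password pair per line; a hedged sketch (the mount path depends on your deployment):
    Copy code
    # user.props (mounted into the datahub-frontend container)
    datahub:datahub
    alice:a-strong-password
    bob:another-strong-password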
  • refined-apple-6340 (12/08/2021, 3:56 AM)
    What is the Docker config for the analytics tab to work in DataHub on Docker? (I have DATAHUB_ANALYTICS_ENABLED=true for elasticsearch-setup and datahub-frontend-react.)