# troubleshoot
  • square-solstice-69079

    05/12/2022, 6:10 AM
    When metadata-service authentication is on, how do I run datahub delete commands? I'm getting auth errors similar to the ingestion errors I saw before adding the token to the sink.
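    A common approach (a sketch, assuming a personal access token generated in the UI under Settings > Access Tokens) is to export the token so the CLI presents it on every request:
    export DATAHUB_GMS_HOST=http://<datahub-gms-host>:8080   # placeholder GMS endpoint
    export DATAHUB_GMS_TOKEN=<personal-access-token>         # placeholder token
    datahub delete --env PROD --entity_type dataset --platform bigquery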
  • most-plumber-32123

    05/12/2022, 6:42 AM
    Hi all, I am facing an issue when ingesting the recipe file for Snowflake:
    [2022-05-12 12:11:31,452] INFO     {datahub.cli.ingest_cli:96} - DataHub CLI version: 0.8.34.1
    [2022-05-12 12:11:31,738] ERROR    {datahub.entrypoints:165} - Unable to connect to http://localhost:9002/api/gms/config with status_code: 401. Maybe you need to set up authentication? Please check your configuration and make sure you are talking to the DataHub GMS (usually <datahub-gms-host>:8080) or Frontend GMS API (usually <frontend>:9002/api/gms).
    [2022-05-12 12:11:31,738] INFO     {datahub.entrypoints:176} - DataHub CLI version: 0.8.34.1 at C:\Users\*****\AppData\Local\Programs\Python\Python39\lib\site-packages\datahub\__init__.py
    [2022-05-12 12:11:31,738] INFO     {datahub.entrypoints:179} - Python version: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] at C:\Users\*****\AppData\Local\Programs\Python\Python39\python.exe on Windows-10-10.0.22000-SP0
    [2022-05-12 12:11:31,738] INFO     {datahub.entrypoints:182} - GMS config {}
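    If GMS has metadata-service authentication enabled, the recipe's sink generally needs both the GMS address (port 8080 rather than the frontend's 9002) and a token. A hedged sketch of the sink section (placeholders, not the reporter's values):
    sink:
      type: datahub-rest
      config:
        server: http://<datahub-gms-host>:8080   # GMS itself, not the frontend
        token: <personal-access-token>           # generated under Settings > Access Tokens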
  • handsome-football-66174

    05/12/2022, 6:12 PM
    Hi everyone, we are trying to upgrade the DataHub packages in Airflow, but doing so upgrades the version of Airflow itself. Any suggestions on how to resolve this? Commands used:
    python3 -m pip install 'acryl-datahub==0.8.32'
    python3 -m pip install 'acryl-datahub[airflow,kafka,mysql,athena,glue,hive,kafka-connect,ldap,looker,lookml,okta,postgres,druid,redshift,sagemaker,snowflake,sql-profiles,sqlalchemy,datahub-rest,datahub-kafka,datahub-business-glossary,great-expectations,s3]==0.8.32'
    Attaching the logs for reference:
    datahub_upgrade_log.txt
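    One way to keep pip from moving Airflow is to install with Airflow's published constraints file (a sketch; the versions below are illustrative, not taken from the logs):
    AIRFLOW_VERSION=2.2.4   # whatever version you are pinned to
    PYTHON_VERSION=3.9
    python3 -m pip install 'acryl-datahub[airflow]==0.8.32' \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"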
  • aloof-author-52810

    05/12/2022, 8:08 PM
    Hi, I am using the product but found that tags are now missing from Docker for the following:
    - image: docker.repo1.uhc.com/linkedin/datahub-elasticsearch-setup:debug
    - image: docker.repo1.uhc.com/linkedin/datahub-kafka-setup:debug
    - image: docker.repo1.uhc.com/linkedin/datahub-frontend-react:debug
    These are referenced in docker-compose.dev.yml. The one for GMS is there, but the others are not :(
  • modern-zoo-97059

    05/13/2022, 2:47 AM
    Hello everyone 🙂 When is the datahub_usage_event index created in Elasticsearch? It's missing for me.
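    The index is normally created by the elasticsearch-setup job, and (an assumption worth verifying) only when DATAHUB_ANALYTICS_ENABLED=true. A quick existence check, assuming Elasticsearch is reachable on localhost:9200:
    curl -s http://localhost:9200/_cat/indices | grep -i usage   # is datahub_usage_event listed?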
  • sticky-dawn-95000

    05/13/2022, 5:42 AM
    Hello, everyone. Is it possible to delete all business glossary terms in DataHub? I inserted business glossaries into my DataHub using the datahub CLI. After that, I wanted to reset my business glossaries, so I tried some datahub CLI commands, but they failed. :( How can I delete all business glossary terms with a CLI command, the way datasets can be deleted with "datahub delete --env PROD --entity_type dataset --platform bigquery"?
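    A sketch of what that could look like; support for deleting glossary entities by type varies across CLI versions, so check datahub delete --help first:
    # delete a single term by URN
    datahub delete --urn "urn:li:glossaryTerm:<term-name>" --hard
    # or attempt a delete across the whole entity type (flag support is an assumption)
    datahub delete --entity_type glossaryTerm --hard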
  • miniature-journalist-76345

    05/13/2022, 3:59 PM
    Hi, team. Is anybody ingesting glossary terms with Python? I'm getting an error even on the first ingestion:
    Duplicate entry 'urn:li:glossaryTerm:test_term-glossaryTermKey-0' for key 'PRIMARY'
    I had the same error with platforms a few months ago. More information in the thread.
  • handsome-football-66174

    05/13/2022, 5:13 PM
    Hi everyone - using v0.8.32, I just noticed that while creating Groups we are not able to add certain characters. Is that expected?
  • shy-parrot-64120

    05/14/2022, 8:46 PM
    Hi folks, when migrating from linkedin:0.8.33 -> acryldata:0.8.33.3 (0.8.34.1 behaves the same), the mae/mce consumer containers throw a Cassandra exception:
    Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cassandraSession' defined in class path resource [org/springframework/boot/autoconfigure/cassandra/CassandraAutoConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.datastax.oss.driver.api.core.CqlSession]: Factory method 'cassandraSession' threw exception; nested exception is com.datastax.oss.driver.api.core.AllNodesFailedException: Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=/127.0.0.1:9042, hostId=null, hashCode=41fdd1a):
     [com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s0|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (io.netty.channel.StacklessClosedChannelException)]
    Any suggestions as to why? We are not mentioning Cassandra anywhere in our setup.
  • swift-breakfast-25077

    05/16/2022, 9:58 AM
    Hi all, I'm trying to use Great Expectations for data validation. The checkpoint runs, but the validations are not displayed in DataHub. I added this to the checkpoint configuration:
    - name: datahub_action
        action:
          module_name: datahub.integrations.great_expectations.action
          class_name: DataHubValidationAction
          server_url: http://localhost:8080   # datahub server url
    I get this message when the checkpoint runs:
  • icy-portugal-26250

    05/16/2022, 11:01 AM
    I'm trying to bring up a local datahub cluster using docker/quickstart.sh (with the m1 compose files). The datahub-gms container is unhealthy, and checking the logs, it cannot connect to the elasticsearch service:
    Problem with request: Get "http://elasticsearch:9200": dial tcp 172.30.0.2:9200: connect: connection refused. Sleeping 1s
    Is there a way to troubleshoot this?
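    A few hedged first steps: check whether the elasticsearch container itself ever became healthy, since datahub-gms just keeps retrying the connection:
    docker ps --format '{{.Names}}\t{{.Status}}'   # is elasticsearch up and (healthy)?
    docker logs elasticsearch --tail 100           # look for OOM kills or bootstrap errors
    # quickstart generally needs several GB of memory allocated to the Docker VM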
  • stale-exabyte-65991

    05/16/2022, 2:08 PM
    I'm having some trouble ingesting to a POC DataHub instance we have set up on AWS. I'm getting 401 errors when attempting to ingest, but I have generated a personal access token that I am setting as DATAHUB_GMS_TOKEN. CLI ingestion is enabled in our configuration and we have no problems seeing the frontend. Am I missing a step? EDIT: It looks like those tokens are for the front-end API routes. How do I find / configure a token for CLI ingestion into GMS?
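    For CLI ingestion the token must go to GMS, not the front-end routes. One way to wire it up (a sketch; the host value is a placeholder):
    datahub init                   # prompts for the GMS host (e.g. http://<gms-host>:8080) and the access token
    datahub ingest -c recipe.yml   # subsequent CLI calls reuse the stored host/token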
  • brash-fountain-36115

    05/16/2022, 2:13 PM
    I am getting an error when I run profiling against the Snowflake instance:
    '[2022-05-13 14:18:26,848] WARNING  {datahub.ingestion.source.sql.snowflake:496} - lineage => Extracting lineage from Snowflake '
               'failed.Please check your premissions. Continuing...\n'
               'Error was (snowflake.connector.errors.ProgrammingError) 003030 (02000): SQL compilation error:\n'
               'Shared database is no longer available for use. It will need to be re-created if and when the publisher makes it available again.\n'
               '[SQL: \n'
               'WITH table_lineage_history AS (\n'
               '    SELECT\n'
               '        r.value:"objectName" AS upstream_table_name,\n'
               '        r.value:"objectDomain" AS upstream_table_domain,\n'
               '        r.value:"columns" AS upstream_table_columns,\n'
               '        w.value:"objectName" AS downstream_table_name,\n'
               '        w.value:"objectDomain" AS downstream_table_domain,\n'
               '        w.value:"columns" AS downstream_table_columns,\n'
               '        t.query_start_time AS query_start_time\n'
               '    FROM\n'
               '        (SELECT * from snowflake.account_usage.access_history) t,\n'
               '        lateral flatten(input => t.DIRECT_OBJECTS_ACCESSED) r,\n'
               '        lateral flatten(input => t.OBJECTS_MODIFIED) w\n'
               '    WHERE r.value:"objectId" IS NOT NULL\n'
               '    AND w.value:"objectId" IS NOT NULL\n'
               '    AND w.value:"objectName" NOT LIKE \'%.GE_TMP_%\'\n'
               '    AND w.value:"objectName" NOT LIKE \'%.GE_TEMP_%\'\n'
               '    AND t.query_start_time >= to_timestamp_ltz(0, 3)\n'
               '    AND t.query_start_time < to_timestamp_ltz(1652486400000, 3))\n'
               'SELECT upstream_table_name, downstream_table_name, upstream_table_columns, downstream_table_columns\n'
               'FROM table_lineage_history\n'
               "WHERE upstream_table_domain in ('Table', 'External table') and downstream_table_domain = 'Table'\n"
               'QUALIFY ROW_NUMBER() OVER (PARTITION BY downstream_table_name, upstream_table_name ORDER BY query_start_time DESC) = 1        ]\n'
           '(Background on this error at: http://sqlalche.me/e/13/f405).\n'
               '[2022-05-13 14:18:26,848] INFO     {datahub.ingestion.source.sql.snowflake:449} - A total of 0 Table->Table edges found for 0 downstream '
               'tables.\n'
    Any hints what I can check to identify the root cause?
  • red-pizza-28006

    05/16/2022, 3:28 PM
    Hello, I am trying to use the Glue source to ingest data from the Glue catalog, and started seeing this error:
    self = <botocore.client.S3 object at 0x7fb305a355b0>
         operation_name = 'GetObject'
         api_params = {'Bucket': 'datalake-prod-bqtos3',
                       'Key': 'script.py'}
         http.status_code = 403
         error_code = 'AccessDenied'
         error_class = <class 'botocore.exceptions.ClientError'>
         self.exceptions.from_code = <method 'BaseClientExceptions.from_code' of <botocore.errorfactory.S3Exceptions object at 0x7fb305b1f580> errorfactory.p
                                      y:30>
         parsed_response = {'Error': {'Code': 'AccessDenied',
                                      'Message': 'Access Denied'},
                            'ResponseMetadata': {'RequestId': 'BNJ2Z2VWHBE64YYV',
                                                 'HostId': 'fwy92BEUGB+HlSJufrCUKRxa2WZ877BXQNdqWxX5Tx7WR7Br+6bCy16bv7GFTU1ICR0oJh4ingg=',
                                                 'HTTPStatusCode': 403,
                                                 'HTTPHeaders': {...},
                                                 'RetryAttempts': 0}}
    My AWS user has full access to Glue; what additional access do I need here to be able to read the Glue catalog?
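    The traceback shows the connector fetching the Glue job script itself from S3 (GetObject), so Glue permissions alone are not enough. A sketch of the extra IAM statement, using the bucket name from the error above:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject"],
          "Resource": "arn:aws:s3:::datalake-prod-bqtos3/*"
        }
      ]
    }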
  • late-country-26504

    05/16/2022, 6:26 PM
    Hey there, I'm having an issue viewing metadata via the frontend. I can confirm that ingestion was successful and I can retrieve the metadata via API from the metadata service, but the frontend isn't showing anything. I've even confirmed the metadata service is reachable from inside the frontend container. Any ideas why I can't see anything in the UI?
  • millions-waiter-49836

    05/16/2022, 10:31 PM
    Hey guys, I am developing the Glue profiling feature. When I emitted two partitions using this code, I could see two hits in ES (via Kibana) (see screenshot). However, when I tried to query the partitions, it returned null (see the query in the thread).
  • bland-morning-36590

    05/17/2022, 3:55 AM
    Hi all, I am trying to ingest metadata from Teradata. I am using the attached SQLAlchemy recipe. I am hitting "NoSuchModuleError". Is there something wrong with the connection string? Thanks for the help.
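    NoSuchModuleError usually means SQLAlchemy cannot find a dialect for the URI scheme. Assuming the Teradata dialect is the missing piece, a sketch:
    pip install teradatasqlalchemy   # registers the teradatasql:// dialect
    # recipe snippet (placeholders, not the attached recipe):
    source:
      type: sqlalchemy
      config:
        connect_uri: "teradatasql://<user>:<password>@<host>"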
  • creamy-smartphone-10810

    05/17/2022, 1:14 PM
    Hello, I'm trying to execute the cleanup job on my DataHub k8s deployment, but I found the following error:
    Executing Step 3/4: DeleteLegacyGraphRelationshipStep...
    Failed to delete legacy data from graph: java.lang.ClassCastException: com.linkedin.metadata.graph.elastic.ElasticSearchGraphService cannot be cast to com.linkedin.metadata.graph.neo4j.Neo4jGraphService
    Failed to delete legacy data from graph: java.lang.ClassCastException: com.linkedin.metadata.graph.elastic.ElasticSearchGraphService cannot be cast to com.linkedin.metadata.graph.neo4j.Neo4jGraphService
    Failed Step 3/4: DeleteLegacyGraphRelationshipStep. Failed after 1 retries.
    I'm using elasticsearch as graph_service_impl; any idea what could be happening?
  • red-napkin-59945

    05/17/2022, 4:13 PM
    Hey team, I ingested the dataset usage aspect, but on the UI it shows the top users as gray circles, like:
  • prehistoric-room-17640

    05/17/2022, 9:04 PM
    Hi team. Are there rules around versioning between the DataHub client and server? Yesterday I was trying to use the latest client version 0.8.34.2 with our current server version, and the database was corrupted (in our dev env). I'm trying to understand what the compatibility expectations are between the acryl-datahub client version and the server version. The error on the client was related to: "unrecognized field found but not allowed"
  • prehistoric-room-17640

    05/17/2022, 9:04 PM
    When I switched back to using the 0.8.28.1 client, all was good with ingestion; however, this morning I found the UI had crashed.
  • rich-policeman-92383

    05/18/2022, 7:05 PM
    Hi team, while trying to do a performance benchmark of DataHub, we are getting the below exception along with a few others. Are there any performance tuning parameters that need to be adjusted for production use? We are trying to simulate 100 users all fetching lineage from DataHub.
  • great-cpu-72376

    05/19/2022, 9:59 AM
    Hi, I am trying to get all data platforms in DataHub. I am trying with GraphQL using:
    query {
      search(input: {type: DATA_PLATFORM, query: "*"}){
        total
        searchResults{
          entity{
            urn
          }
        }
      }
    }
    What is wrong? I always receive 0, but there is at least one data platform: PostgreSQL. If I write something else in the query, the result is the same. I am very new to DataHub and GraphQL.
  • great-cpu-72376

    05/19/2022, 3:35 PM
    I am sorry, but I am having a lot of problems with the docs. I am trying to understand how to create an entity through the REST API and through the Python emitter. I want to create a new corpUser. I created this JSON following, I hope, what is reported in the OpenAPI spec and in the data model:
    [
        {
            "aspect": {
    			"corpUserKey":{
    				"username": "xxx"
    			},
                "corpUserInfo":{
    				"firstName": "XXX First",
    				"lastName": "XXXX",
    				"countryCode": "IT"
    				
    			},
    			"corpUserEditableInfo":{
    				"displayName": "Giorgio",
    				"aboutMe": "I am trying to add this user",
    				"teams": ["it-svc-app"],
    				"skills": ["sql"],
    				"title": "Data Architect",
    				"email": "<mailto:giorgio@giorgi.net|giorgio@giorgi.net>"
    			},
    			"corpUserStatus":{
    				"status": "ACTIVE",
    				
    			}
            },
            "entityType": "CorpUser",
            "entityUrn": "urn:li:CorpUser:xxx"
        }
    ]
    There is the aspect dict with the aspects corpUserKey, corpUserInfo, and corpUserEditableInfo, plus the entity type and the entityUrn. I execute a POST to gms/openapi/entities/v1 but receive this error:
    15:34:14.631 [qtp544724190-21] WARN  o.s.w.s.m.s.DefaultHandlerExceptionResolver:208 - Resolved [org.springframework.http.converter.HttpMessageNotReadableException: JSON parse error: Unexpected character ('}' (code 125)): was expecting double-quote to start field name; nested exception is com.fasterxml.jackson.databind.JsonMappingException: Unexpected character ('}' (code 125)): was expecting double-quote to start field name<EOL> at [Source: (org.springframework.util.StreamUtils$NonClosingInputStream); line: 24, column: 5] (through reference chain: java.util.ArrayList[0]->io.datahubproject.openapi.dto.UpsertAspectRequest$UpsertAspectRequestBuilder["aspect"])]
    I validated the JSON with a parser; what is the problem?
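    The parse error actually pinpoints the problem: line 24, column 5 of the payload is the closing brace of corpUserStatus, which is preceded by a trailing comma after "status": "ACTIVE", and JSON forbids trailing commas. A corrected sketch of that aspect (note also that each UpsertAspectRequest appears to carry a single aspect, so multiple aspects likely need separate list entries):
    "corpUserStatus": {
        "status": "ACTIVE"
    }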
  • mysterious-butcher-86719

    05/19/2022, 3:40 PM
    Hi team, we are seeing the below issues while using the GraphQL API:
    1. We are only able to pull at most 1000 records at a time. When we tried to fetch 10000, we got an error from:
    { searchAcrossEntities(input: {types: [DATASET], query: "*", count: 10000}) { searchResults { entity { ... on Dataset { urn name } } } } }
    2. We always see total=10000 in the search result (when we query with count <= 1000) even though we have more in DataHub; we do not see the actual total value.
    3. We also tried to loop through the records, fetching 1000 records each time; however, we were still only able to pull 10000 records.
    Could you please let us know if there are any limitations on this? Also, please share if there is any solution available.
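    The 10000 ceiling matches Elasticsearch's default index.max_result_window, which caps start + count for ordinary paging, so results beyond it are likely unreachable without raising that index setting. Paging below the cap uses start/count (a sketch):
    {
      searchAcrossEntities(input: {types: [DATASET], query: "*", start: 1000, count: 1000}) {
        start
        count
        total
        searchResults {
          entity {
            ... on Dataset { urn name }
          }
        }
      }
    }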
  • victorious-pager-14424

    05/19/2022, 6:48 PM
    Hi everyone! I'm having issues when trying to build DataHub locally. When running ./gradlew :datahub-frontend:dist -x yarnTest -x yarnLint, it fails during the :datahub-web-react:yarnGenerate step. More info in 🧵
  • microscopic-mechanic-13766

    05/20/2022, 10:56 AM
    Hi, so I am now trying to deploy DataHub with a Kerberized Kafka. So far, I have managed to make the frontend and GMS connect to it, but not the actions framework. The actions container prints the following error:
    Failed to create consumer: No provider for SASL mechanism GSSAPI: recompile librdkafka with libsasl2 or openssl support. Current build options: PLAIN SASL_SCRAM OAUTHBEARER"}
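    The stock confluent-kafka wheel bundles a librdkafka built without GSSAPI, so one workaround is a custom image that rebuilds the Python client against a SASL-enabled librdkafka. A rough sketch, assuming a Debian-based datahub-actions image (the tag and package names are placeholders; adjust for your base image):
    FROM acryldata/datahub-actions:<tag>
    USER root
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
          gcc librdkafka-dev libsasl2-dev libsasl2-modules-gssapi-mit krb5-user && \
        pip install --force-reinstall --no-binary confluent-kafka confluent-kafka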
  • gentle-camera-33498

    05/20/2022, 3:11 PM
    Hello everyone, I'm having some problems with the home page. Does anyone know what the cause could be?
  • gifted-bird-57147

    05/21/2022, 9:54 AM
    Hi, I'm trying to refresh my local deployment as per the instructions:
    datahub docker nuke --keep-data
    datahub docker quickstart
    But doing so results in the following error:
    ERROR: for datahub-frontend-react  Cannot start service datahub-frontend-react: driver failed programming external connectivity on endpoint datahub-frontend-react (e75695d412987a9e3b70806a9905d0798a64bf6d52c1e0afd9cd626c1895a5bf): Error starting userland proxy: listen tcp4 0.0.0.0:9002: bind: address already in use
    Any help solving this would be appreciated! (I'm not aware of anything else running on port 9002...)
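    Something is already bound to 9002, often a leftover container or an old frontend process. A quick way to find it (macOS/Linux):
    sudo lsof -iTCP:9002 -sTCP:LISTEN   # which process owns the port?
    docker ps --filter "publish=9002"   # any container still publishing it?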
  • straight-wire-61463

    05/23/2022, 12:50 AM
    Hi all, I have deployed DataHub on a Kubernetes instance following the instructions in the documentation. What I'm trying to understand is how I can install additional plugins for data ingestion from the UI. Specifically, I'm trying to set up pyodbc and acryl-datahub[mssql] for Microsoft SQL Server. I believe the relevant Docker image is acryl-data-actions, and I can create a new image based on the public one and install the relevant packages, but it appears that the ingestion process creates its own virtual environment with venv, and hence ignores any globally installed packages on the image. I can run the ingestion job from the command line in this container, but triggering it from the UI always fails with a Python import error for pyodbc. So any direction on how additional libraries should be added for UI ingestion would be appreciated!