# troubleshoot
  • gray-ghost-82678 (02/09/2023, 2:54 AM)
    Hello, I am currently trying to connect a Microsoft SQL Server to DataHub, but I am getting “No module named 'pyodbc'”. I have ODBC Driver 17 installed and have reinstalled pyodbc, but that does not help. I am using Windows. Thanks!
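    A quick sanity check is to confirm that pyodbc and the ODBC driver are visible from the same Python environment that runs the datahub CLI. A minimal sketch (the driver display name for ODBC Driver 17 may differ on your machine):
    ```python
    # Minimal sketch: verify pyodbc is importable from the interpreter that runs
    # the `datahub` CLI, and that ODBC Driver 17 is registered with it.
    import sys

    print(sys.executable)    # should be the same interpreter the datahub CLI uses

    import pyodbc            # ModuleNotFoundError here means it's a different env

    print(pyodbc.drivers())  # expect something like "ODBC Driver 17 for SQL Server"
    ```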
  • thousands-bird-50049 (02/09/2023, 7:19 AM)
    Where can I get a list of fields that can be used in an OrFilter on the GraphQL API?
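    For reference, a hedged sketch of the shape such a filter takes when sent to the GraphQL endpoint from Python; the endpoint, token, and facet field names ("platform", "tags") are placeholders, and the authoritative list of filterable fields is whatever is marked searchable on each entity type in your version:
    ```python
    # Sketch (not authoritative): searchAcrossEntities with orFilters via /api/graphql.
    import requests

    URL = "http://localhost:8080/api/graphql"  # placeholder GMS endpoint
    TOKEN = "<personal-access-token>"          # placeholder

    QUERY = """
    query ($q: String!) {
      searchAcrossEntities(input: {
        query: $q, start: 0, count: 10,
        orFilters: [
          { and: [{ field: "platform", values: ["urn:li:dataPlatform:snowflake"] }] },
          { and: [{ field: "tags", values: ["urn:li:tag:PII"] }] }
        ]
      }) {
        total
        searchResults { entity { urn type } }
      }
    }
    """

    resp = requests.post(
        URL,
        json={"query": QUERY, "variables": {"q": "*"}},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    print(resp.json())
    ```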
  • best-wire-59738 (02/09/2023, 9:16 AM)
    Hello Team, is there any way we could restrict a custom Authenticator plugin to only the GraphQL API? The REST API would still use token authentication.
  • magnificent-lock-58916 (02/09/2023, 9:32 AM)
    For some reason, the DataHub frontend displays as if it still has Tableau entities that were hard deleted from the backend. It seems to concern only visual navigation (counts of platform entities, filters, folders) and such; once you actually open a list, there are no entities there. It also doesn't display recent ingestion requests (the most recent one shown is at least Feb 8th).
  • rhythmic-quill-75064 (02/09/2023, 10:10 AM)
    Hi Team. I have a problem when switching from helm chart version 0.2.128 to 0.2.129, and the same problem on 0.2.130 (DataHub 0.9.5 for these versions). Logs of datahub-datahub-upgrade-job:
    Copy code
    APPLICATION FAILED TO START

    Description:
    Field kafkaHealthChecker in com.linkedin.gms.factory.kafka.DataHubKafkaEventProducerFactory required a bean of type 'com.linkedin.metadata.dao.producer.KafkaHealthChecker' that could not be found.
    The injection point has the following annotations:
    @javax.inject.Inject() @javax.inject.Named(value="noCodeUpgrade")

    Action:
    Consider defining a bean of type 'com.linkedin.metadata.dao.producer.KafkaHealthChecker' in your configuration.
  • chilly-potato-57465 (02/09/2023, 10:36 AM)
    Hello everyone! I am trying out the GraphQL API on the public demo instance to query for some data. I use the following table as an example (see attached screenshot) and query the table's owners, tags, terms and domain. Although the table has owners, tags, terms and a domain, I receive empty arrays. I am wondering what I am doing wrong. Thank you!
    Copy code
    {
           dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.active_customer_ltv,PROD)") {
             properties{name},
             editableProperties{description},
             type,
             platform{name},
             ownership{owners{associatedUrn}},
             tags{tags{tag{urn}}},
             glossaryTerms{terms{term{urn}}},
             domain{domain{urn}}
           }
         }
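    If it helps to take the demo UI out of the loop, here is a minimal sketch of running the same query through the Python client with an explicit token (server URL and token are placeholders; assumes a recent acryl-datahub where DataHubGraph.execute_graphql is available):
    ```python
    # Sketch: run the query with an authenticated client instead of the web GraphiQL.
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(
        server="https://your-datahub-gms-host",  # placeholder
        token="<personal-access-token>",         # placeholder
    ))

    QUERY = """
    {
      dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.active_customer_ltv,PROD)") {
        ownership { owners { associatedUrn } }
        tags { tags { tag { urn } } }
        glossaryTerms { terms { term { urn } } }
        domain { domain { urn } }
      }
    }
    """
    print(graph.execute_graphql(QUERY))
    ```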
  • strong-kite-83354 (02/09/2023, 11:33 AM)
    Has something changed with GraphQL between versions v0.9.6 and v0.10.0? I have queries that ran under the old version but fail under the new one. The changelog suggests there have been changes in the GraphQL system, but not ones I'd expect to break queries. I used to be able to run a query like the one below with a variable like {"query_string": "customProperties: file_hash=046ca7acf03915c611818e20837c2e4b8756e885"}, where file_hash is a customProperty that I can see in the GUI.
    Copy code
    query my_query($query_string: String!) {
      search(input: { type: DATASET, query: $query_string, start: 0, count: 100 }) {
        start
        count
        total
        searchResults {
          entity {
            urn
            type
            ... on Dataset {
              name
              properties {
                customProperties {
                  key
                  value
                }
              }
            }
          }
        }
      }
    }
  • bumpy-pharmacist-66525 (02/09/2023, 2:58 PM)
    Hi everyone! I recently upgraded to v0.9.6.1 (from v0.9.2) and am encountering a strange bug. When I am on the explore page (the Explore All button from the homepage), if I try to use the advanced filter, some of the options, like filtering on tags, do not work. The basic filtering does seem to work; the issue is that the API call which is supposed to populate the dropdown menus is not returning anything. Does anyone have an idea of what is happening and/or how to fix it? Here is a step-by-step set of instructions to reproduce the issue (start on the DataHub homepage):
    1. Select Explore All
    2. Under the 'Filter' column, select Advanced
    3. Select Add Filter
    4. Select Tag
    5. Search for a tag which exists in your DataHub instance (in my case, I always get 'no data')
  • faint-actor-78390 (02/09/2023, 3:32 PM)
    Hi all, I upgraded to v0.10 with the command:
    docker pull acryldata/datahub-upgrade:head && docker run acryldata/datahub-upgrade:head -u NoCodeDataMigration
    and got this error:
    Status: Image is up to date for acryldata/datahub-upgrade:head docker.io/acryldata/datahub-upgrade:head
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
  • stocky-apple-7404 (02/09/2023, 6:29 PM)
    Hello Team... We are trying to configure SSO with Azure AD. We added the environment variables below to frontend-react:
    - AUTH_OIDC_ENABLED=true
    - AUTH_OIDC_CLIENT_ID=xxxxx
    - AUTH_OIDC_CLIENT_SECRET=xxxx
    - AUTH_OIDC_DISCOVERY_URI=https://login.microsoftonline.com/xxxx/v2.0/.well-known/openid-configuration
    - AUTH_OIDC_BASE_URL=https://xxxxx
    - AUTH_OIDC_SCOPE="openid profile email"
    We configured the application in Azure AD as documented in https://datahubproject.io/docs/authentication/guides/sso/configure-oidc-react-azure, but we are receiving the error below. What are we missing? Any lead to troubleshoot this issue is highly appreciated.
    datahub-frontend-react | 2023-02-09 18:18:54,210 [application-akka.actor.default-dispatcher-19] ERROR controllers.AuthenticationController - Caught exception while attempting to redirect to SSO identity provider! It's likely that SSO integration is mis-configured
    datahub-frontend-react | org.pac4j.core.exception.TechnicalException: com.nimbusds.oauth2.sdk.ParseException: The scope must include an "openid" value
  • mysterious-motorcycle-80650 (02/09/2023, 6:52 PM)
    @here Hi, is there a way to export all the data inside DataHub in one shot, so it can be imported into another DataHub environment?
  • wooden-hamburger-59537 (02/09/2023, 7:04 PM)
    Hello team, I have a DataHub setup on a K8s cluster running on AWS, using some AWS managed services like RDS, MSK, OpenSearch, etc. It ran fine at first, but after a while (a couple of days) it stopped working. When I look at the logs, the gms pod is failing with this weird log:
    Copy code
    kubectl get pods -n my-datahub                                          
    NAME                                             READY   STATUS             RESTARTS          AGE
    my-acryl-datahub-actions-7f7dbcb7cb-jwlm7   0/1     CrashLoopBackOff   126 (2m39s ago)   18h
    my-cp-schema-registry-5cbf4478f-2xgnt       2/2     Running            0                 51m
    my-datahub-frontend-795fb7dd7d-qj7p9        1/1     Running            0                 52m
    my-datahub-gms-7466d54b7-5hwxz              0/1     CrashLoopBackOff   194 (4m55s ago)   17h
    
    kubectl logs my-datahub-gms-7466d54b7-5hwxz  -n my-datahub  
    + echo
    + grep -q ://
    + NEO4J_HOST=http://
    + [[ ! -z datahubes ]]
    + [[ -z '' ]]
    ++ base64 --wrap 0
    ++ echo -ne 'username:password'
    + AUTH_TOKEN=username:password
    + ELASTICSEARCH_AUTH_HEADER='Authorization:Basic username:password'
    + [[ -z Authorization:Basic username:password ]]
    + [[ true == true ]]
    + ELASTICSEARCH_PROTOCOL=https
    + WAIT_FOR_EBEAN=
    + [[ '' != true ]]
    + [[ '' == ebean ]]
    + [[ -z '' ]]
    + WAIT_FOR_EBEAN=' -wait <tcp://my-datahub-dev-ue1.cluster-xxxxxxxxx.us-east-1.rds.amazonaws.com:3306> '
    + WAIT_FOR_CASSANDRA=
    + [[ '' == cassandra ]]
    + WAIT_FOR_KAFKA=
    + [[ '' != true ]]
    ++ echo <http://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096|b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
    ++ sed 's/,/ -wait tcp:\/\//g'
    + WAIT_FOR_KAFKA=' -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> '
    + WAIT_FOR_NEO4J=
    + [[ elasticsearch != elasticsearch ]]
    + OTEL_AGENT=
    + [[ '' == true ]]
    + PROMETHEUS_AGENT=
    + [[ true == true ]]
    + PROMETHEUS_AGENT='-javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml '
    + auth_resource_dir=/etc/datahub/plugins/auth/resources
    + CLASSES_DIR=
    + [[ '' == true ]]
    + COMMON='
         -wait <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306>            -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>           -timeout 240s     java            -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml      -jar /jetty-runner.jar     --jar jetty-util.jar     --jar jetty-jmx.jar      --config /datahub/datahub-gms/scripts/jetty.xml     /datahub/datahub-gms/bin/war.war'
    + [[ false != true ]]
    + exec dockerize -wait <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443> -wait-http-header 'Authorization:Basic ZGF0YWh1YmVzOlhjTyEyNlNPKmI6VThQOmxyLTJnTjZOZDYwNXQ3PU0rK2l7PA==' -wait <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306> -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -timeout 240s java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
    2023/02/09 18:31:19 Waiting for: <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443>
    2023/02/09 18:31:19 Waiting for: <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306>
    2023/02/09 18:31:19 Waiting for: <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
    2023/02/09 18:31:19 Waiting for: <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
    2023/02/09 18:31:19 Waiting for: <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
    2023/02/09 18:31:19 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:31:19 Connected to <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
    2023/02/09 18:31:19 Connected to <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306>
    2023/02/09 18:31:19 Connected to <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
    2023/02/09 18:31:20 Received 200 from <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443>
    2023/02/09 18:31:20 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:31:21 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    ....... lots of log look like this .................
    2023/02/09 18:33:27 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:33:28 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:33:29 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:33:30 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:33:31 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:33:32 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:35:18 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:35:19 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
    2023/02/09 18:35:19 Timeout after 4m0s waiting on dependencies to become available: [<https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443> <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306> <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>]
    so what is wrong with it?
  • glamorous-elephant-17130 (02/09/2023, 7:27 PM)
    Hey guys, I updated datahub and now I am getting this error when trying to use profiling with Athena. Any clue?
  • glamorous-elephant-17130 (02/09/2023, 7:34 PM)
    Based on the documentation, profiling is supposed to be a valid key for Athena.
  • glamorous-elephant-17130 (02/09/2023, 7:34 PM)
    Facing the same issue with AWS Glue as well.
  • quaint-barista-82836 (02/09/2023, 7:34 PM)
    Hi Team, I am having issues retrieving data via the advanced filter in the DataHub UI: I want to search across all platforms with specific tags. For now I can only see the list in the basic filter, and can only select one platform at a time. Any suggestions?
  • quaint-barista-82836 (02/09/2023, 11:03 PM)
    Hi Team, I have enabled column-level profiling for BigQuery, but I am not getting any profiles for string fields. Any suggestions? Error log:
    Failed to get unique count for column project_nameFBT.FBT_Diff.Anchor_Item
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 203, in _execute
        self._query_job.result()
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 1499, in result
        do_get_result()
      File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 354, in retry_wrapped_func
        on_error=on_error,
      File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 191, in retry_target
        return target()
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job/query.py", line 1489, in do_get_result
        super(QueryJob, self).result(retry=retry, timeout=timeout)
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job/base.py", line 728, in result
        return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py", line 261, in result
        raise self._exception
    google.api_core.exceptions.BadRequest: 400 Unrecognized name: `"Anchor_Item"`; Did you mean Anchor_Item? at [1:30] Location: US Job ID: ad9f0d21-f927-4974-a896-8d54b0f8a655

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
        cursor, statement, parameters, context
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
        cursor.execute(statement, parameters)
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/_helpers.py", line 494, in with_closed_check
        return method(self, *args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 167, in execute
        formatted_operation, parameters, job_id, job_config, parameter_types
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 205, in _execute
        raise exceptions.DatabaseError(exc)
    google.cloud.bigquery.dbapi.exceptions.DatabaseError: 400 Unrecognized name: `"Anchor_Item"`; Did you mean Anchor_Item? at [1:30] Location: US Job ID: ad9f0d21-f927-4974-a896-8d54b0f8a655

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 331, in _get_column_cardinality
        unique_count = self.dataset.get_column_unique_count(column)
      File "/usr/local/lib/python3.7/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 130, in get_column_unique_count_patch
        ).select_from(self._table)
      File "/usr/local/lib/python3.7/site-packages/datahub/utilities/sqlalchemy_query_combiner.py", line 272, in _sa_execute_fake
        return _sa_execute_underlying_method(conn, query, *args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
        return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 335, in _execute_on_connection
        self, multiparams, params, execution_options
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1582, in _execute_clauseelement
        cache_hit=cache_hit,
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1944, in _execute_context
        e, statement, parameters, cursor, context
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2125, in _handle_dbapi_exception
        sqlalchemy_exception, with_traceback=exc_info[2], from_=e
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
        raise exception
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
        cursor, statement, parameters, context
      File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
        cursor.execute(statement, parameters)
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/_helpers.py", line 494, in with_closed_check
        return method(self, *args, **kwargs)
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 167, in execute
        formatted_operation, parameters, job_id, job_config, parameter_types
      File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/dbapi/cursor.py", line 205, in _execute
        raise exceptions.DatabaseError(exc)
    sqlalchemy.exc.DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 Unrecognized name: `"Anchor_Item"`; Did you mean Anchor_Item? at [1:30] Location: US Job ID: ad9f0d21-f927-4974-a896-8d54b0f8a655
    [SQL: SELECT APPROX_COUNT_DISTINCT("Anchor_Item") FROM project_name-thd.FBT.FBT_Diff]
    (Background on this error at: https://sqlalche.me/e/14/4xp6)
  • average-dinner-25106 (02/10/2023, 4:34 AM)
    Hi, I called the timeline API on table 'A' several days ago with the 'DOCUMENTATION' category and got the history of additions/modifications (as the first figure shows). But today, doing the same for 'A', I got a response with an empty dictionary, like the second figure shows. What's strange is that all tables from Postgres give the same result, while tables from other sources, such as Hive, show the history fine. Why does this happen?
  • astonishing-cartoon-6079 (02/10/2023, 5:59 AM)
    #troubleshoot An error occurs in the frontend when I view the datasets belonging to a tag. It works well when the page number is small, but it always fails when the page number is more than 309. My colleague and I found it may be a timeout when accessing the searchAcrossEntities GraphQL API. We looked through the related code and found that com.linkedin.metadata.search.cache.CacheableSearcher#getSearchResults is the root cause. As its comment says, it walks through the whole index from the beginning even for a large page number. We are confused by this logic; I don't know what the side effect would be of using the Elasticsearch paging interface directly.
    Copy code
    /**
       * Get search results corresponding to the input "from" and "size"
       * It goes through batches, starting from the beginning, until we get enough results to return
       * This let's us have batches that return a variable number of results (we have no idea which batch the "from" "size" page corresponds to)
       */
    public SearchResult getSearchResults(int from, int size) {
      try (Timer.Context ignored = MetricUtils.timer(this.getClass(), "getSearchResults").time()) {
        int resultsSoFar = 0;
        int batchId = 0;
        boolean foundStart = false;
        List<SearchEntity> resultEntities = new ArrayList<>();
        SearchResult batchedResult;
        // Use do-while to make sure we run at least one batch to fetch metadata
        do {
          batchedResult = getBatch(batchId);
          int currentBatchSize = batchedResult.getEntities().size();
          // If the number of results in this batch is 0, no need to continue
          if (currentBatchSize == 0) {
            break;
          }
          if (resultsSoFar + currentBatchSize > from) {
            int startInBatch = foundStart ? 0 : from - resultsSoFar;
            int endInBatch = Math.min(currentBatchSize, startInBatch + size - resultEntities.size());
            resultEntities.addAll(batchedResult.getEntities().subList(startInBatch, endInBatch));
            foundStart = true;
          }
          // If current batch is smaller than the requested batch size, the next batch will return empty.
          if (currentBatchSize < batchSize) {
            break;
          }
          resultsSoFar += currentBatchSize;
          batchId++;
        } while (resultsSoFar < from + size);
        return new SearchResult().setEntities(new SearchEntityArray(resultEntities))
            .setMetadata(batchedResult.getMetadata())
            .setFrom(from)
            .setPageSize(size)
            .setNumEntities(batchedResult.getNumEntities());
      }
    }
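    To make the cost concrete, a rough back-of-the-envelope model of the loop above (batch_size is a placeholder; the real value comes from the searcher's configuration):
    ```python
    # Rough cost model for CacheableSearcher#getSearchResults: it walks batches from
    # offset 0 until it has covered `from`, so the number of backend queries grows
    # linearly with the requested page instead of being a single paged ES call.
    def batches_fetched(from_, size, batch_size):
        return -(-(from_ + size) // batch_size)  # ceiling division

    print(batches_fetched(from_=0, size=10, batch_size=10))     # page 1   -> 1 batch
    print(batches_fetched(from_=3090, size=10, batch_size=10))  # page 310 -> 310 batches
    ```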
  • bland-orange-13353 (02/10/2023, 7:14 AM)
    This message was deleted.
  • elegant-article-21703 (02/10/2023, 11:25 AM)
    Hi everyone! I'm not sure if this is an issue with the newest v0.10.0 or something else, but when I updated the version using the repo I realised that DataHub is not loading the glossary terms: only the root nodes are shown, and the children property of the glossary nodes is always empty.
    Copy code
    {
      "status": "DB Updated. 1290 Glossary Terms created or updated. 2 Dashboard created or updated. 1 IA Models created or updated."
    }
    Has someone had the same issue? Additionally, I downgraded to the previous version (v0.9.6.1) but I am still facing the same problem. Thank you all in advance!!
  • best-wire-59738 (02/10/2023, 1:19 PM)
    Hello Team, I am facing the issue below with Lineage. When we click the Lineage tab for a dataset, it takes some time, pops up the message “Failed to load results! An unexpected error occurred”, and stays on the same page with no data in the lineage. If we click Visualize Lineage, then it shows the lineage. Using the browser dev tools, I can see that a searchAcrossLineage GraphQL query was made and it is returning a 503 error. I also found the same issue reported a few days back in the channel: https://datahubspace.slack.com/archives/C029A3M079U/p1673535713529589 We are currently on DataHub v0.9.6.1.
  • green-hamburger-3800 (02/10/2023, 2:49 PM)
    Hello folks, reporting some inconsistency between the documentation and the behaviour of Glossary Entities. Within the GraphQL documentation we have the following:
    Copy code
    Create a new GlossaryNode. Returns the urn of the newly created GlossaryNode. If a node with the provided ID already exists, it will be overwritten.
    That is stated for both the createGlossaryNode and createGlossaryTerm mutations. But when I actually try to overwrite, I get the following error:
    14:42:31.364 [ForkJoinPool.commonPool-worker-49] ERROR c.l.d.g.r.g.CreateGlossaryNodeResolver:71 - Failed to create GlossaryNode with id: b4940ce5-ef8b-409f-a9c8-00588fda73a8, name: Status: This Glossary Node already exists!
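    For anyone reproducing this, a hedged sketch of the call in question (endpoint, token, and node id are placeholders; the input shape follows CreateGlossaryEntityInput as far as I can tell). Running it twice with the same id triggers the error above rather than the overwrite the docs describe:
    ```python
    # Sketch: call createGlossaryNode twice with the same id to reproduce the error.
    import requests

    URL = "http://localhost:8080/api/graphql"  # placeholder
    TOKEN = "<personal-access-token>"          # placeholder

    MUTATION = """
    mutation ($input: CreateGlossaryEntityInput!) {
      createGlossaryNode(input: $input)
    }
    """
    payload = {"query": MUTATION, "variables": {"input": {"id": "my-test-node", "name": "Status"}}}

    for attempt in (1, 2):
        resp = requests.post(URL, json=payload, headers={"Authorization": f"Bearer {TOKEN}"})
        # The second attempt reports "This Glossary Node already exists!" instead of overwriting.
        print(attempt, resp.json())
    ```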
  • silly-dog-87292 (02/10/2023, 8:08 PM)
    Team, I am moving data within Snowflake using tasks. I am not able to get the lineage between my raw table and my final table. Attached is the log.
    exec-urn_li_dataHubExecutionRequest_e1c54149-19ce-4369-9ff8-299e7e235664.log
  • powerful-memory-77948 (02/10/2023, 8:33 PM)
    Hi Everyone, looking for help/suggestions to tackle an error that I'm encountering. I'm new to DataHub. I deployed DataHub on my Mac M1 using the Docker instructions. I was able to ingest sample data and also run a Kafka ingestion job. Now I'm looking to explore the Metadata Audit Event Consumer, following the instructions documented here. I cloned the datahub project on my Mac and, after fixing all the basic environment-related errors, I've hit the exception below. I looked around in the Slack channels to see if this problem has been discussed before, but could not find anything close to this one.
    Command: ./gradlew :metadata-jobs:mae-consumer-job:bootRun
    Error:
    Copy code
    > Task :metadata-jobs:mae-consumer-job:bootRun FAILED
    ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
    
      .   ____          _            __ _ _
     /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
    ( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
     \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
      '  |____| .__|_| |_|_| |_\__, | / / / /
     =========|_|==============|___/=/_/_/_/
     :: Spring Boot ::               (v2.5.12)
    
    2023-02-10 12:15:53,225 [main] INFO  org.eclipse.jetty.util.log - Logging initialized @1412ms to org.eclipse.jetty.util.log.Slf4jLog
    2023-02-10 12:15:53,306 [main] INFO  org.eclipse.jetty.server.Server - jetty-9.4.45.v20220203; built: 2022-02-03T09:14:34.105Z; git: 4a0c91c0be53805e3fcffdcdcc9587d5301863db; jvm 11.0.18+0
    2023-02-10 12:15:53,319 [main] INFO  o.e.j.s.h.ContextHandler.application - Initializing Spring embedded WebApplicationContext
    2023-02-10 12:15:53,411 [main] INFO  org.eclipse.jetty.server.session - DefaultSessionIdManager workerName=node0
    2023-02-10 12:15:53,411 [main] INFO  org.eclipse.jetty.server.session - No SessionScavenger set, using defaults
    2023-02-10 12:15:53,412 [main] INFO  org.eclipse.jetty.server.session - node0 Scavenging every 660000ms
    2023-02-10 12:15:53,415 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.s.b.w.e.j.JettyEmbeddedWebAppContext@16c8e9b8{application,/,[file:///private/var/folders/gm/8g6pqmz169j1p9mzzfkz9k1w0000gn/T/jetty-docbase.9091.3795745778604343479/],AVAILABLE}
    2023-02-10 12:15:53,415 [main] INFO  org.eclipse.jetty.server.Server - Started @1603ms
    2023-02-10 12:15:53,482 [main] INFO  org.eclipse.jetty.server.session - node0 Stopped scavenging
    2023-02-10 12:15:53,483 [main] INFO  o.e.j.server.handler.ContextHandler - Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@16c8e9b8{application,/,[file:///private/var/folders/gm/8g6pqmz169j1p9mzzfkz9k1w0000gn/T/jetty-docbase.9091.3795745778604343479/],STOPPED}
    ERROR LoggingFailureAnalysisReporter 
    
    ***************************
    APPLICATION FAILED TO START
    ***************************
    
    Description:
    
    Parameter 0 of constructor in com.linkedin.metadata.kafka.boot.DataHubUpgradeKafkaListener required a bean of type 'org.springframework.kafka.core.DefaultKafkaConsumerFactory' that could not be found.
    
    
    Action:
    
    Consider defining a bean of type 'org.springframework.kafka.core.DefaultKafkaConsumerFactory' in your configuration.
    
    
    FAILURE: Build failed with an exception.
    
    * What went wrong:
    Execution failed for task ':metadata-jobs:mae-consumer-job:bootRun'.
  • silly-angle-91497 (02/10/2023, 10:02 PM)
    So we had Datahub running in AWS using v0.8.45 and just ran the updated helm charts to update to v0.10.0 and I'm seeing a lot of warnings in the gms pod:
    Copy code
    2023-02-10 21:53:46,052 [qtp447981768-17] WARN  c.d.a.a.AuthenticatorChain:80 - Authentication chain failed to resolve a valid authentication. Errors: [(com.datahub.authentication.authenticator.DataHubSystemAuthenticator,Failed to authenticate inbound request: Authorization header is missing Authorization header.), (com.datahub.authentication.authenticator.DataHubTokenAuthenticator,Failed to authenticate inbound request: Request is missing 'Authorization' header.)]
    The DataHub site seems to be up and running, but when trying to search for entities I keep seeing the notification "An unknown error occurred. (Code 500)". All the pods seem to be running, and this warning is all I keep seeing.
  • chilly-ability-77706 (02/11/2023, 1:38 AM)
    Hi, I am getting an Elasticsearch exception. How do I resolve this error?
  • chilly-ability-77706 (02/11/2023, 1:38 AM)
    Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [510422244/486.7mb], which is larger than the limit of [510027366/486.3mb], real usage: [510414672/486.7mb], new bytes reserved: [7572/7.3kb], usages [request=16440/16kb, fielddata=44528/43.4kb, in_flight_requests=7572/7.3kb, model_inference=0/0b, accounting=617200/602.7kb]]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888)
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645)
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
        at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088)
        at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:57)
        ... 17 common frames omitted
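    The numbers in that message appear to line up with the parent circuit breaker's default limit of 95% of the JVM heap (510027366 bytes is 95% of a 512 MB heap), which suggests the Elasticsearch nodes behind DataHub are running with a very small heap. A hedged sketch for inspecting the breaker (host and auth are placeholders):
    ```python
    # Sketch: check the parent circuit breaker on the ES cluster behind DataHub.
    import requests

    ES = "http://localhost:9200"  # placeholder; add auth if your cluster requires it

    stats = requests.get(f"{ES}/_nodes/stats/breaker").json()
    for node in stats["nodes"].values():
        parent = node["breakers"]["parent"]
        print(node["name"], parent["estimated_size"], "of", parent["limit_size"])
    ```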
  • powerful-cat-68806 (02/12/2023, 10:34 AM)
    Hi team ☀️ I'm unable to connect to my DB (pgSQL) from the DH deployment. I'm able to connect from my local machine and from a connector (EC2) in the same VPC, and I've also double-checked the password in the k8s secret I'm using. Following is the error from the pod:
    Copy code
    Failed to pull image "acryldata/datahub-postgres-setup:v0.9.6.1": rpc error: code = Unknown desc = Error response from daemon: manifest for acryldata/datahub-postgres-setup:v0.9.6.1 not found: manifest unknown: manifest unknown
    I've tried to set the version to v0.9.6.1rc4 as described here. No change. Any idea?
  • kind-sunset-55628 (02/12/2023, 2:03 PM)
    Hi Team, we have installed DataHub on Kubernetes and our prod cluster is down with some issue. After a few Snowflake metadata ingestions, no new data is getting reflected in the UI. I even tried to add a Domain and Secrets; the message says they were successfully added, but they don't reflect in the UI.