gray-ghost-82678
02/09/2023, 2:54 AM
thousands-bird-50049
02/09/2023, 7:19 AM
best-wire-59738
02/09/2023, 9:16 AM
magnificent-lock-58916
02/09/2023, 9:32 AM
rhythmic-quill-75064
02/09/2023, 10:10 AM
datahub-datahub-upgrade-job
:
APPLICATION FAILED TO START

Description:
Field kafkaHealthChecker in com.linkedin.gms.factory.kafka.DataHubKafkaEventProducerFactory required a bean of type 'com.linkedin.metadata.dao.producer.KafkaHealthChecker' that could not be found.
The injection point has the following annotations:
@javax.inject.Inject() @javax.inject.Named(value="noCodeUpgrade")

Action:
Consider defining a bean of type 'com.linkedin.metadata.dao.producer.KafkaHealthChecker' in your configuration.
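A minimal sketch of what the suggested Action could look like, assuming the upgrade job's Spring context is simply missing the bean; the wiring below is illustrative only (DataHub's own factories normally provide this when the right components are scanned), not the project's actual configuration code:

// Hypothetical configuration class, for illustration of the error's "Action" hint.
// Assumes KafkaHealthChecker has a no-argument constructor; check the class in your
// DataHub version before relying on this.
import com.linkedin.metadata.dao.producer.KafkaHealthChecker;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class KafkaHealthCheckerConfig {

  // Expose a KafkaHealthChecker so DataHubKafkaEventProducerFactory can inject it.
  @Bean
  public KafkaHealthChecker kafkaHealthChecker() {
    return new KafkaHealthChecker();
  }
}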
chilly-potato-57465
02/09/2023, 10:36 AM
{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.active_customer_ltv,PROD)") {
    properties { name }
    editableProperties { description }
    type
    platform { name }
    ownership { owners { associatedUrn } }
    tags { tags { tag { urn } } }
    glossaryTerms { terms { term { urn } } }
    domain { domain { urn } }
  }
}
strong-kite-83354
02/09/2023, 11:33 AM
query my_query($query_string: String!) {
  search(input: { type: DATASET, query: $query_string, start: 0, count: 100 }) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
        ... on Dataset {
          name
          properties {
            customProperties {
              key
              value
            }
          }
        }
      }
    }
  }
}
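For reference, a query like this is POSTed to the GraphQL endpoint with the variable supplied alongside it in the request body. A minimal sketch follows; the endpoint URL and the access token are assumptions for a typical deployment, not values from this thread:

// Minimal sketch: POST the search query above to DataHub's GraphQL endpoint.
// Endpoint URL and token are placeholders, not taken from the thread.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GraphQlSearchExample {
  public static void main(String[] args) throws Exception {
    String query = "query my_query($query_string: String!) { "
        + "search(input: { type: DATASET, query: $query_string, start: 0, count: 100 }) "
        + "{ start count total searchResults { entity { urn type } } } }";
    // GraphQL variables travel in the same JSON body as the query text.
    String body = "{\"query\": \"" + query.replace("\"", "\\\"") + "\", "
        + "\"variables\": {\"query_string\": \"*\"}}";

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/api/graphql")) // assumed GMS address
        .header("Content-Type", "application/json")
        .header("Authorization", "Bearer <personal-access-token>") // placeholder
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}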
bumpy-pharmacist-66525
02/09/2023, 2:58 PM
On the search results page (reached via the Explore All button on the homepage), if I try to use the advanced filter, some of the options, such as filtering on tags, do not work. It seems that the API call which is supposed to populate the dropdown is not returning anything. However, basic filtering does seem to work (the issue is related to the API call that populates the dropdown menus).
Does anyone have an idea of what is happening and/or how to fix it?
Here is a step-by-step set of instructions to reproduce the issue (start on the DataHub homepage):
Here is a step by step set of instructions on how to reproduce the issue (start on the homepage of DataHub):
1. Select Explore All
2. Under the 'Filter' column, select Advanced
3. Select Add Filter
4. Select Tag
5. Search for a tag which exists in your DataHub instance (in my case, I always get 'no data')
faint-actor-78390
02/09/2023, 3:32 PM
stocky-apple-7404
02/09/2023, 6:29 PM
mysterious-motorcycle-80650
02/09/2023, 6:52 PM
wooden-hamburger-59537
02/09/2023, 7:04 PM
kubectl get pods -n my-datahub
NAME READY STATUS RESTARTS AGE
my-acryl-datahub-actions-7f7dbcb7cb-jwlm7 0/1 CrashLoopBackOff 126 (2m39s ago) 18h
my-cp-schema-registry-5cbf4478f-2xgnt 2/2 Running 0 51m
my-datahub-frontend-795fb7dd7d-qj7p9 1/1 Running 0 52m
my-datahub-gms-7466d54b7-5hwxz 0/1 CrashLoopBackOff 194 (4m55s ago) 17h
kubectl logs my-datahub-gms-7466d54b7-5hwxz -n my-datahub
+ echo
+ grep -q ://
+ NEO4J_HOST=http://
+ [[ ! -z datahubes ]]
+ [[ -z '' ]]
++ base64 --wrap 0
++ echo -ne 'username:password'
+ AUTH_TOKEN=username:password
+ ELASTICSEARCH_AUTH_HEADER='Authorization:Basic username:password'
+ [[ -z Authorization:Basic username:password ]]
+ [[ true == true ]]
+ ELASTICSEARCH_PROTOCOL=https
+ WAIT_FOR_EBEAN=
+ [[ '' != true ]]
+ [[ '' == ebean ]]
+ [[ -z '' ]]
+ WAIT_FOR_EBEAN=' -wait <tcp://my-datahub-dev-ue1.cluster-xxxxxxxxx.us-east-1.rds.amazonaws.com:3306> '
+ WAIT_FOR_CASSANDRA=
+ [[ '' == cassandra ]]
+ WAIT_FOR_KAFKA=
+ [[ '' != true ]]
++ echo <http://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096|b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096,b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
++ sed 's/,/ -wait tcp:\/\//g'
+ WAIT_FOR_KAFKA=' -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> '
+ WAIT_FOR_NEO4J=
+ [[ elasticsearch != elasticsearch ]]
+ OTEL_AGENT=
+ [[ '' == true ]]
+ PROMETHEUS_AGENT=
+ [[ true == true ]]
+ PROMETHEUS_AGENT='-javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml '
+ auth_resource_dir=/etc/datahub/plugins/auth/resources
+ CLASSES_DIR=
+ [[ '' == true ]]
+ COMMON='
-wait <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306> -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -timeout 240s java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war'
+ [[ false != true ]]
+ exec dockerize -wait <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443> -wait-http-header 'Authorization:Basic ZGF0YWh1YmVzOlhjTyEyNlNPKmI6VThQOmxyLTJnTjZOZDYwNXQ3PU0rK2l7PA==' -wait <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306> -wait <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -wait <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> -timeout 240s java -javaagent:jmx_prometheus_javaagent.jar=4318:/datahub/datahub-gms/scripts/prometheus-config.yaml -jar /jetty-runner.jar --jar jetty-util.jar --jar jetty-jmx.jar --config /datahub/datahub-gms/scripts/jetty.xml /datahub/datahub-gms/bin/war.war
2023/02/09 18:31:19 Waiting for: <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443>
2023/02/09 18:31:19 Waiting for: <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306>
2023/02/09 18:31:19 Waiting for: <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:19 Waiting for: <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:19 Waiting for: <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:19 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:31:19 Connected to <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:19 Connected to <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306>
2023/02/09 18:31:19 Connected to <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>
2023/02/09 18:31:20 Received 200 from <https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443>
2023/02/09 18:31:20 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:31:21 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
....... many more log lines like this .................
2023/02/09 18:33:27 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:28 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:29 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:30 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:31 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:33:32 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:35:18 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:35:19 Problem with dial: dial tcp 10.194.49.47:9096: connect: connection refused. Sleeping 1s
2023/02/09 18:35:19 Timeout after 4m0s waiting on dependencies to become available: [<https://vpc-my-datahub-xxxxxx.us-east-1.es.amazonaws.com:443> <tcp://my-datahub-dev-ue1.cluster-xxxxxx.us-east-1.rds.amazonaws.com:3306> <tcp://b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> <tcp://b-2.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096> <tcp://b-3.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com:9096>]
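The repeated "connection refused" lines and the final timeout all point at broker b-1 (10.194.49.47:9096), while b-2, b-3, RDS and OpenSearch connect fine. A minimal sketch of the same plain TCP dial that dockerize's -wait tcp:// check performs, useful for probing b-1 from another pod or host in the VPC, could look like this (host and port copied from the log; the class itself is purely illustrative):

// Illustrative only: roughly what dockerize's "-wait tcp://..." check does.
import java.net.InetSocketAddress;
import java.net.Socket;

public class BrokerReachabilityCheck {
  public static void main(String[] args) {
    String host = "b-1.mydatahub.xxxxx.c5.kafka.us-east-1.amazonaws.com";
    int port = 9096;
    try (Socket socket = new Socket()) {
      // Same kind of plain TCP dial that keeps failing with "connection refused" above.
      socket.connect(new InetSocketAddress(host, port), 5_000);
      System.out.println("Connected to " + host + ":" + port);
    } catch (Exception e) {
      System.out.println("Failed to reach " + host + ":" + port + ": " + e.getMessage());
    }
  }
}

Since the other two brokers accept connections on 9096, a problem specific to that broker instance (listener or security-group configuration, or the broker being down) seems more likely than a general networking issue, though that is only a guess from the log.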
So what is wrong with it?
glamorous-elephant-17130
02/09/2023, 7:27 PM
glamorous-elephant-17130
02/09/2023, 7:34 PM
glamorous-elephant-17130
02/09/2023, 7:34 PM
quaint-barista-82836
02/09/2023, 7:34 PM
quaint-barista-82836
02/09/2023, 11:03 PM
"Anchor_Item"
) FROM project_name-thd
.FBT.FBT_Diff
] (Background on this error at: https://sqlalche.me/e/14/4xp6)
average-dinner-25106
02/10/2023, 4:34 AM
astonishing-cartoon-6079
02/10/2023, 5:59 AM
searchAcrossEntities
We looked through the related code and found that com.linkedin.metadata.search.cache.CacheableSearcher#getSearchResults
is the root cause.
As its comment says, it walks through the index in batches starting from the beginning, even when the request asks for a large page number. We are confused about this logic and are not sure what the side effects would be of using Elasticsearch's paging interface directly.
/**
 * Get search results corresponding to the input "from" and "size"
 * It goes through batches, starting from the beginning, until we get enough results to return
 * This let's us have batches that return a variable number of results (we have no idea which batch the "from" "size" page corresponds to)
 */
public SearchResult getSearchResults(int from, int size) {
  try (Timer.Context ignored = MetricUtils.timer(this.getClass(), "getSearchResults").time()) {
    int resultsSoFar = 0;
    int batchId = 0;
    boolean foundStart = false;
    List<SearchEntity> resultEntities = new ArrayList<>();
    SearchResult batchedResult;
    // Use do-while to make sure we run at least one batch to fetch metadata
    do {
      batchedResult = getBatch(batchId);
      int currentBatchSize = batchedResult.getEntities().size();
      // If the number of results in this batch is 0, no need to continue
      if (currentBatchSize == 0) {
        break;
      }
      if (resultsSoFar + currentBatchSize > from) {
        int startInBatch = foundStart ? 0 : from - resultsSoFar;
        int endInBatch = Math.min(currentBatchSize, startInBatch + size - resultEntities.size());
        resultEntities.addAll(batchedResult.getEntities().subList(startInBatch, endInBatch));
        foundStart = true;
      }
      // If current batch is smaller than the requested batch size, the next batch will return empty.
      if (currentBatchSize < batchSize) {
        break;
      }
      resultsSoFar += currentBatchSize;
      batchId++;
    } while (resultsSoFar < from + size);
    return new SearchResult().setEntities(new SearchEntityArray(resultEntities))
        .setMetadata(batchedResult.getMetadata())
        .setFrom(from)
        .setPageSize(size)
        .setNumEntities(batchedResult.getNumEntities());
  }
}
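For comparison, "using the Elasticsearch paging interface directly" could look roughly like the sketch below, where from/size are handed straight to the index rather than walking batches from the beginning. The index name, client wiring and query type are assumptions for illustration, not DataHub's actual search code:

// Illustrative only: direct from/size paging against an Elasticsearch index.
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class DirectPagingSketch {
  public static SearchResponse search(RestHighLevelClient client, String query,
                                      int from, int size) throws Exception {
    SearchSourceBuilder source = new SearchSourceBuilder()
        .query(QueryBuilders.queryStringQuery(query))
        .from(from)   // offset handled by Elasticsearch itself
        .size(size);  // page size handled by Elasticsearch itself
    SearchRequest request = new SearchRequest("datasetindex_v2").source(source); // assumed index name
    return client.search(request, RequestOptions.DEFAULT);
  }
}

One caveat: Elasticsearch rejects requests where from + size exceeds index.max_result_window (10,000 by default), so very deep pages still need a different mechanism such as search_after or scroll, which may be part of why the batch-walking approach exists.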
bland-orange-13353
02/10/2023, 7:14 AM
elegant-article-21703
02/10/2023, 11:25 AM
I don't know if it is because of v0.10.0 or something else, but when I updated the version using the repo I realised that DataHub is not loading the glossary terms; only the root nodes are shown. The children property of the glossary nodes is always empty.
{
  "status": "DB Updated. 1290 Glossary Terms created or updated. 2 Dashboard created or updated. 1 IA Models created or updated."
}
Has someone had the same issue? Additionally, I downgraded to the previous version (v0.9.6.1) but I am still facing the same problem.
Thank you all in advance!!
best-wire-59738
02/10/2023, 1:19 PM
A searchAcrossLineage
GraphQL query was made and it is giving me a 503 error.
Also found the same issue reported a few days back in the channel: https://datahubspace.slack.com/archives/C029A3M079U/p1673535713529589
We are currently on datahub v0.9.6.1
green-hamburger-3800
02/10/2023, 2:49 PM
The documentation says: "Create a new GlossaryNode. Returns the urn of the newly created GlossaryNode. If a node with the provided ID already exists, it will be overwritten."
That is stated for both the createGlossaryNode
and createGlossaryTerm
mutations.
But when I actually try to overwrite one, I get the following error:
14:42:31.364 [ForkJoinPool.commonPool-worker-49] ERROR c.l.d.g.r.g.CreateGlossaryNodeResolver:71 - Failed to create GlossaryNode with id: b4940ce5-ef8b-409f-a9c8-00588fda73a8, name: Status: This Glossary Node already exists!
silly-dog-87292
02/10/2023, 8:08 PM
powerful-memory-77948
02/10/2023, 8:33 PM
./gradlew :metadata-jobs:mae-consumer-job:bootRun
Error:
> Task :metadata-jobs:mae-consumer-job:bootRun FAILED
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.5.12)
2023-02-10 12:15:53,225 [main] INFO org.eclipse.jetty.util.log - Logging initialized @1412ms to org.eclipse.jetty.util.log.Slf4jLog
2023-02-10 12:15:53,306 [main] INFO org.eclipse.jetty.server.Server - jetty-9.4.45.v20220203; built: 2022-02-03T09:14:34.105Z; git: 4a0c91c0be53805e3fcffdcdcc9587d5301863db; jvm 11.0.18+0
2023-02-10 12:15:53,319 [main] INFO o.e.j.s.h.ContextHandler.application - Initializing Spring embedded WebApplicationContext
2023-02-10 12:15:53,411 [main] INFO org.eclipse.jetty.server.session - DefaultSessionIdManager workerName=node0
2023-02-10 12:15:53,411 [main] INFO org.eclipse.jetty.server.session - No SessionScavenger set, using defaults
2023-02-10 12:15:53,412 [main] INFO org.eclipse.jetty.server.session - node0 Scavenging every 660000ms
2023-02-10 12:15:53,415 [main] INFO o.e.j.server.handler.ContextHandler - Started o.s.b.w.e.j.JettyEmbeddedWebAppContext@16c8e9b8{application,/,[file:///private/var/folders/gm/8g6pqmz169j1p9mzzfkz9k1w0000gn/T/jetty-docbase.9091.3795745778604343479/],AVAILABLE}
2023-02-10 12:15:53,415 [main] INFO org.eclipse.jetty.server.Server - Started @1603ms
2023-02-10 12:15:53,482 [main] INFO org.eclipse.jetty.server.session - node0 Stopped scavenging
2023-02-10 12:15:53,483 [main] INFO o.e.j.server.handler.ContextHandler - Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext@16c8e9b8{application,/,[file:///private/var/folders/gm/8g6pqmz169j1p9mzzfkz9k1w0000gn/T/jetty-docbase.9091.3795745778604343479/],STOPPED}
ERROR LoggingFailureAnalysisReporter
***************************
APPLICATION FAILED TO START
***************************
Description:
Parameter 0 of constructor in com.linkedin.metadata.kafka.boot.DataHubUpgradeKafkaListener required a bean of type 'org.springframework.kafka.core.DefaultKafkaConsumerFactory' that could not be found.
Action:
Consider defining a bean of type 'org.springframework.kafka.core.DefaultKafkaConsumerFactory' in your configuration.
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':metadata-jobs:mae-consumer-job:bootRun'.
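If the consumer factory really is missing from the standalone bootRun context, a minimal sketch of the bean Spring is asking for might look like the following. Bootstrap servers and deserializers are placeholders, and DataHub's own Kafka factory classes normally provide this from configuration, so this only illustrates what the error message means:

// Hypothetical configuration, for illustration of the missing-bean error only.
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class MaeConsumerKafkaConfig {

  // Expose a DefaultKafkaConsumerFactory so DataHubUpgradeKafkaListener can be constructed.
  @Bean
  public DefaultKafkaConsumerFactory<String, String> kafkaConsumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props);
  }
}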
silly-angle-91497
02/10/2023, 10:02 PM
2023-02-10 21:53:46,052 [qtp447981768-17] WARN c.d.a.a.AuthenticatorChain:80 - Authentication chain failed to resolve a valid authentication. Errors: [(com.datahub.authentication.authenticator.DataHubSystemAuthenticator,Failed to authenticate inbound request: Authorization header is missing Authorization header.), (com.datahub.authentication.authenticator.DataHubTokenAuthenticator,Failed to authenticate inbound request: Request is missing 'Authorization' header.)]
The DataHub site seems to be up and running, but when trying to search for entities I keep seeing the notification "An unknown error occurred. (Code 500)". All the pods seem to be running and this warning is all I keep seeing.
chilly-ability-77706
02/11/2023, 1:38 AM
chilly-ability-77706
02/11/2023, 1:38 AM
powerful-cat-68806
02/12/2023, 10:34 AM
I'm having an issue connecting to my database (pgSQL) from the DH deployment.
I'm able to connect from my local machine & from a connector (EC2) in the same VPC. I've also double-checked the password from the k8s secret I'm using.
Following is the error from the pod:
Failed to pull image "acryldata/datahub-postgres-setup:v0.9.6.1": rpc error: code = Unknown desc = Error response from daemon: manifest for acryldata/datahub-postgres-setup:v0.9.6.1 not found: manifest unknown: manifest unknown
I've tried to set the version to v0.9.6.1rc4
as described here. No change.
Any idea?
kind-sunset-55628
02/12/2023, 2:03 PM