# troubleshoot
  • big-animal-76099
    08/29/2022, 4:02 PM
    I am not able to ingest metadata after I added OIDC auth.
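    A hedged note on this symptom: if metadata service authentication was also enabled, or if the recipe's sink points at the OIDC-protected frontend proxy rather than at GMS directly, REST ingestion will start failing. A minimal sink sketch (server URL and token are placeholders):
    sink:
      type: datahub-rest
      config:
        server: "http://datahub-gms:8080"
        token: "<personal-access-token>"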
  • rich-machine-24265
    08/29/2022, 5:46 PM
    Hi! I have a question about the "Compress lineage" feature. The lineage of parent is attached. Two datasets, test-primary and test-secondary, have upstreams to it. test-primary and test-secondary are siblings; should they be collapsed into one node if compress lineage is enabled? Entity info in thread.
  • adamant-rain-51672
    08/29/2022, 5:54 PM
    Hey, I have a question about DataHub deployment on EKS. I followed the instructions in the official tutorial: https://datahubproject.io/docs/deploy/aws. All pods have status "Running". However, when I log into the UI I get a blank screen, and the logs show the following:
    Caused by: com.linkedin.r2.message.rest.RestException: Received error 404 from server for URI http://datahub-datahub-gms:8080/dashboards
            at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:76)
            ... 4 common frames omitted
    17:38:19 [Thread-256] WARN  n.g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/search) : java.lang.RuntimeException: Failed to execute search: entity type CHART, query *, filters: [], start: 0, count: 20
    java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to execute search: entity type CHART, query *, filters: [], start: 0, count: 20
            at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
            at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
            at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1592)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.RuntimeException: Failed to execute search: entity type CHART, query *, filters: [], start: 0, count: 20
            at com.linkedin.datahub.graphql.resolvers.search.SearchResolver.lambda$get$1(SearchResolver.java:62)
            at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
            ... 1 common frames omitted
    Caused by: com.linkedin.restli.client.RestLiResponseException: com.linkedin.restli.client.RestLiResponseException: Response status 404, serviceErrorMessage: No root resource defined for path '/charts'
            at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:130)
            at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseImpl(ResponseFutureImpl.java:130)
            at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponse(ResponseFutureImpl.java:94)
            at com.linkedin.chart.client.Charts.search(Charts.java:104)
            at com.linkedin.datahub.graphql.types.chart.ChartType.search(ChartType.java:98)
            at com.linkedin.datahub.graphql.resolvers.search.SearchResolver.lambda$get$1(SearchResolver.java:53)
            ... 2 common frames omitted
    Caused by: com.linkedin.restli.client.RestLiResponseException: RestException{_response=RestResponse[headers={Content-Length=7258, Date=Mon, 29 Aug 2022 17:38:19 GMT, Server=Jetty(9.4.46.v20220331), X-RestLi-Error-Response=true, X-RestLi-Protocol-Version=2.0.0},cookies=[],status=404,entityLength=7258]}
            at com.linkedin.restli.internal.client.ExceptionUtil.exceptionForThrowable(ExceptionUtil.java:102)
            at com.linkedin.restli.client.RestLiCallbackAdapter.convertError(RestLiCallbackAdapter.java:66)
            at com.linkedin.common.callback.CallbackAdapter.onError(CallbackAdapter.java:86)
            at com.linkedin.r2.message.timing.TimingCallback.onError(TimingCallback.java:81)
            at com.linkedin.r2.transport.common.bridge.client.TransportCallbackAdapter.onResponse(TransportCallbackAdapter.java:47)
            at com.linkedin.r2.filter.transport.FilterChainClient.lambda$createWrappedClientTimingCallback$0(FilterChainClient.java:113)
            at com.linkedin.r2.filter.transport.ResponseFilter.onRestError(ResponseFilter.java:79)
            at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
            at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
            at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
            at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
            at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
            at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
            at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
            at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
            at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
            at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
            at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
            at com.linkedin.r2.filter.message.rest.RestFilter.onRestError(RestFilter.java:84)
            at com.linkedin.r2.filter.TimedRestFilter.onRestError(TimedRestFilter.java:92)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:166)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnError(FilterChainIterator.java:132)
            at com.linkedin.r2.filter.FilterChainIterator.onError(FilterChainIterator.java:101)
            at com.linkedin.r2.filter.TimedNextFilter.onError(TimedNextFilter.java:48)
            at com.linkedin.r2.filter.transport.ClientRequestFilter.lambda$createCallback$0(ClientRequestFilter.java:102)
            at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:82)
            at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            ... 1 common frames omitted
    Caused by: com.linkedin.r2.message.rest.RestException: Received error 404 from server for URI http://datahub-datahub-gms:8080/charts
            at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:76)
            ... 4 common frames omitted
    Do you know what might have caused this issue?
  • busy-airport-23391
    08/29/2022, 7:57 PM
    Hi all! I have a quick question regarding emitting validation results. I currently have a loop set up to push assertionInfo, assertionPlatform, and assertionResults to DataHub via a Python Kafka emitter. Everything is working properly, but of the three test assertions I've written, only the latest is showing up in the Validator tab. I'm having trouble finding an example of multiple assertions being pushed to DataHub -- do you have any pointers?
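    A hedged sketch of the likely cause (urns, Kafka connection, and assertion parameters below are all placeholders): each assertion entity is keyed by its own assertion urn, so emitting three assertionInfo aspects against a single urn overwrites it and only the last one survives. Giving every assertion a distinct urn should make all three appear:
    from datahub.emitter.kafka_emitter import DatahubKafkaEmitter, KafkaEmitterConfig
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import (
        AssertionInfoClass,
        AssertionStdAggregationClass,
        AssertionStdOperatorClass,
        AssertionTypeClass,
        ChangeTypeClass,
        DatasetAssertionInfoClass,
        DatasetAssertionScopeClass,
    )

    emitter = DatahubKafkaEmitter(
        KafkaEmitterConfig.parse_obj({"connection": {"bootstrap": "localhost:9092"}})
    )
    dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:mssql,db.schema.table,PROD)"

    for name in ["row_count", "col_a_not_null", "col_b_unique"]:
        # The distinct urn per assertion is the crucial part: re-using one urn
        # makes each emit overwrite the previous assertionInfo aspect.
        assertion_urn = f"urn:li:assertion:my_suite.{name}"
        emitter.emit(
            MetadataChangeProposalWrapper(
                entityType="assertion",
                changeType=ChangeTypeClass.UPSERT,
                entityUrn=assertion_urn,
                aspectName="assertionInfo",
                aspect=AssertionInfoClass(
                    type=AssertionTypeClass.DATASET,
                    datasetAssertion=DatasetAssertionInfoClass(
                        dataset=dataset_urn,
                        scope=DatasetAssertionScopeClass.DATASET_ROWS,
                        operator=AssertionStdOperatorClass.GREATER_THAN,
                        aggregation=AssertionStdAggregationClass.ROW_COUNT,
                    ),
                ),
            )
        )
    emitter.flush()
    The same applies to the assertionRunEvent results: each run event's assertionUrn field should reference the matching assertion's own urn.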
  • brave-businessperson-3969
    08/29/2022, 9:09 PM
    Is there an easy (= user-friendly) way to undo a soft delete via the datahub tool? We currently have the problem that a table was soft-deleted a few days ago (because it had been removed from the DB). Now the table is back in the DB but won't show up in DataHub after scanning/ingesting this DB, probably because the soft-delete flag is still set and the ingestion process does not clear it? (DataHub 0.8.43, stateful ingestion)
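    A minimal sketch of one way to clear the flag (server URL and dataset urn are placeholders): re-emit the status aspect with removed=False.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, StatusClass

    emitter = DatahubRestEmitter("http://localhost:8080")
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn="urn:li:dataset:(urn:li:dataPlatform:mysql,mydb.mytable,PROD)",
            aspectName="status",
            # removed=False reverses the soft delete, making the entity visible again
            aspect=StatusClass(removed=False),
        )
    )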
  • cool-translator-98249
    08/29/2022, 11:46 PM
    Hi, I just got the install done and am trying a first few ingestions. When I do a dry run of our first CLI ingestion, I'm getting an error on the sink:
    [2022-08-29 22:53:33,805] ERROR    {datahub.entrypoints:195} - Command failed: 
    	Tree is empty.
    The status check shows the sink running; does anyone know what might be happening?
  • cool-kitchen-48091
    08/30/2022, 5:50 AM
    Hey, I'm setting up LDAP with jaas.conf and it doesn't seem to work properly. This is my config file:
    WHZ-Authentication {
      com.sun.security.auth.module.LdapLoginModule sufficient
      userProvider="ldap://****(no port provided?)/dc=office,dc=****,dc=com"
      authIdentity="{USERNAME}"
      userFilter="(|(sAMAccountName={USERNAME})(cn=*{USERNAME}*))"
      java.naming.security.authentication="simple"
      bindDn="CN=****,OU=****,OU=****,DC=office,DC=****,DC=com"
      bindCredential="****"
      debug="true"
      useSSL="false";
    };
    Do I need to provide a port? userFilter is as I need it to be. Am I missing any params? Is there a way to debug the lookup of the LDAP groups? I would appreciate a hint.
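    A hedged note on the port question: with the JNDI LDAP provider, an ldap:// URL without a port defaults to 389 (ldaps:// defaults to 636), so an explicit port is only needed for non-standard listeners, e.g.:
    userProvider="ldap://ldap.example.com:389/dc=office,dc=example,dc=com"
    With debug="true" already set, the LdapLoginModule should print its bind and search attempts to the frontend logs, which is usually the quickest way to see how far the group lookup gets.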
  • bright-monitor-64617
    08/30/2022, 8:30 AM
    Hi! I have a problem when I try to deploy DataHub on local Kubernetes using Minikube. When I install the prerequisites helm chart, all the pods start working except elasticsearch-master, as you can see in the attached photo. I am using the default configuration values and was following this guide. Thanks!!
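    A hedged first check (the pod name assumes the default prerequisites release): elasticsearch-master on Minikube usually stalls for resource or volume reasons, which the pod events will show:
    kubectl describe pod elasticsearch-master-0
    kubectl get pvc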
  • big-animal-76099
    08/30/2022, 11:28 AM
    The whole server goes down when I ingest metadata into DataHub.
  • brave-zebra-97479
    08/30/2022, 5:25 PM
    Hey everyone, I have a question about extending the metadata model. On the extending-the-metadata-model page, the flowchart says that if you need to create a new entity, you should fork the datahub repo rather than create your own model extension repository. However, the page about custom metamodels says "`entities: A list of entities with aspects attached to them that you are creating additional aspects for as well as any new entities you wish to define.`". This implies to me that using the custom repository would allow you to define new entities as well. Can someone please clarify this for me?
  • adamant-rain-51672
    08/30/2022, 7:03 PM
    Hey, following the guide about deploying on EKS (https://datahubproject.io/docs/deploy/aws/), I added my subdomain and replaced the values for datahub-frontend:
    datahub-frontend:
      enabled: true
      image:
        repository: linkedin/datahub-frontend-react
        tag: "latest"
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: alb
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/target-type: instance
          alb.ingress.kubernetes.io/certificate-arn: <<my-certificate-arn>>
          alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
          alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
        hosts:
          - host: <<my-host-name>>
            redirectPaths:
              - path: /*
                name: ssl-redirect
                port: use-annotation
            paths:
              - /*
    Ofc, I replaced my ARN and host. When I added this config and tried to reach the app through my subdomain, I got a blank page after providing credentials. The subdomain loads fine, which means that traffic is correctly routed, but for some reason I see a blank page. Did you experience a similar issue? If not, are these the values that should be used when exposing the frontend on EKS?
  • victorious-spoon-76468
    08/30/2022, 6:59 PM
    Hi! I am working on data profiling for an MSSQL data source. I can run it locally with profiling.enabled: True in my recipe. I can see datasetProfile aspect names in the output, along with the values in the respective aspects being produced, in terms of rowCount, columnCount, fieldProfiles, etc. However, when I ingest to the UI, with the sink type set to datahub-rest, I don't see the data. The Stats and Validation tabs remain disabled, even though I can see the curl call being made to GMS in the ingestion logs with the exact same datasetProfile values. Any idea why this might be happening?
  • clever-garden-23538
    08/31/2022, 12:55 AM
    Running a recipe with the file source and getting this error:
    Failed to configure source (file) due to 2 validation errors for FileSourceConfig
    filename
      field required (type=value_error.missing)
    path
      extra fields not permitted (type=value_error.extra)
    But the docs say that filename is optional and path is required. I'm using the datahub-ingestion image at tag 0.8.43; has something changed since then? It seems to be the latest stable release on Docker Hub.
  • adamant-rain-51672
    08/31/2022, 9:43 AM
    Hey, trying to run a Tableau ingestion on a server deployed on EKS. Getting the following error:
    'Source (tableau) report:\n'
               "{'workunits_produced': '0',\n"
               " 'workunit_ids': [],\n"
               " 'warnings': {},\n"
               " 'failures': {'tableau-login': ['Unable to Login with credentials providedReason: \\n'\n"
               "                                '\\n'\n"
               "                                '\\t401001: Signin Error\\n'\n"
               "                                '\\t\\tError signing in to Tableau Server']},\n"
               " 'cli_version': '0.8.43',\n"
               " 'cli_entry_location': '/tmp/datahub/ingest/venv-7b77d206-7f5b-4a63-9969-96e360d4e070/lib/python3.9/site-packages/datahub/__init__.py',\n"
               " 'py_version': '3.9.9 (main, Dec 21 2021, 10:03:34) \\n[GCC 10.2.1 20210110]',\n"
               " 'py_exec_path': '/tmp/datahub/ingest/venv-7b77d206-7f5b-4a63-9969-96e360d4e070/bin/python3',\n"
               " 'os_details': 'Linux-5.4.209-116.363.amzn2.x86_64-x86_64-with-glibc2.31'}\n"
               'Sink (datahub-rest) report:\n'
               "{'records_written': '0', 'warnings': [], 'failures': [], 'gms_version': 'v0.8.43'}\n"
               '\n'
               'Pipeline finished with 1 failures in source producing 0 workunits\n',
               "2022-08-31 09:22:05.119103 [exec_id=7b77d206-7f5b-4a63-9969-96e360d4e070] INFO: Failed to execute 'datahub ingest'",
               '2022-08-31 09:22:05.119610 [exec_id=7b77d206-7f5b-4a63-9969-96e360d4e070] INFO: Caught exception EXECUTING '
               'task_id=7b77d206-7f5b-4a63-9969-96e360d4e070, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
               '    self.event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
               '    return f.result()\n'
               '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
               '    raise self._exception\n'
               '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
               '    result = coro.send(None)\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
    false
    It complains about login credentials; the problem is that I'm using exactly the same credentials locally with no problem. Have you experienced a similar problem? Thanks! :)
  • great-motherboard-71467
    08/31/2022, 9:44 AM
    Hi guys, I'm trying to debug an issue with Elasticsearch. I'm trying to use an external Elasticsearch instance for which a user/password was provided to me; the Elasticsearch server uses SSL. When I try to force datahub-gms to use this Elasticsearch instance, I get the following errors. When SKIP_ELASTICSEARCH_CHECK is set to false:
    datahub-gms               | 2022/08/31 09:17:40 Problem with request: Get "https://some.external.elasticsearch.eu:11920": x509: certificate signed by unknown authority. Sleeping 1s
    When ELASTICSEARCH_SSL_PROTOCOL, ELASTICSEARCH_SSL_TRUSTSTORE_FILE/TYPE, and ELASTICSEARCH_SSL_KEYSTORE_FILE/TYPE are undefined and only basic authorization is used:
    datahub-gms               | 2022/08/31 09:20:07 Problem with request: Get "https://some.external.elasticsearch.eu:11920": x509: certificate signed by unknown authority. Sleeping 1s
    When SKIP_ELASTICSEARCH_CHECK is set to true and the rest of the ELASTICSEARCH_SSL_* options are not enabled, during creation of the indexes:
    datahub-gms               | Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.kafka.hook.UpdateIndicesHook]: Constructor threw exception; nested exception is java.lang.RuntimeException: Could not configure system metadata index
    datahub-gms               |     at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:224)
    datahub-gms               |     at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:117)
    datahub-gms               |     at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:311)
    datahub-gms               |     ... 42 common frames omitted
    datahub-gms               | Caused by: java.lang.RuntimeException: Could not configure system metadata index
    datahub-gms               |     at com.linkedin.metadata.systemmetadata.ElasticSearchSystemMetadataService.configure(ElasticSearchSystemMetadataService.java:203)
    datahub-gms               |     at com.linkedin.metadata.kafka.hook.UpdateIndicesHook.<init>(UpdateIndicesHook.java:83)
    datahub-gms               |     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    datahub-gms               |     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    datahub-gms               |     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    datahub-gms               |     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    datahub-gms               |     at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:211)
    datahub-gms               |     ... 44 common frames omitted
    datahub-gms               | Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    datahub-gms               |     at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:844)
    datahub-gms               |     at org.elasticsearch.client.RestClient.performRequest(RestClient.java:259)
    datahub-gms               |     at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
    datahub-gms               |     at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
    datahub-gms               |     at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1598)
    datahub-gms               |     at org.elasticsearch.client.IndicesClient.exists(IndicesClient.java:974)
    datahub-gms               |     at com.linkedin.metadata.search.elasticsearch.indexbuilder.ESIndexBuilder.buildIndex(ESIndexBuilder.java:51)
    datahub-gms               |     at com.linkedin.metadata.systemmetadata.ElasticSearchSystemMetadataService.configure(ElasticSearchSystemMetadataService.java:200)
    datahub-gms               |     ... 50 common frames omitted
    datahub-gms               | Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    datahub-gms               |     at sun.security.ssl.Alert.createSSLException(Alert.java:131)
    datahub-gms               |     at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
    datahub-gms               |     at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
    datahub-gms               |     at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
    datahub-gms               |     at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654)
    datahub-gms               |     at sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473)
    datahub-gms               |     at sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369)
    datahub-gms               |     at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377)
    datahub-gms               |     at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444)
    datahub-gms               |     at sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:968)
    datahub-gms               |     at sun.security.ssl.SSLEngineImpl$DelegatedTask$DelegatedAction.run(SSLEngineImpl.java:955)
    datahub-gms               |     at java.security.AccessController.doPrivileged(Native Method)
    datahub-gms               |     at sun.security.ssl.SSLEngineImpl$DelegatedTask.run(SSLEngineImpl.java:902)
    datahub-gms               |     at org.apache.http.nio.reactor.ssl.SSLIOSession.doRunTask(SSLIOSession.java:285)
    datahub-gms               |     at org.apache.http.nio.reactor.ssl.SSLIOSession.doHandshake(SSLIOSession.java:345)
    datahub-gms               |     at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady(SSLIOSession.java:523)
    datahub-gms               |     at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:120)
    datahub-gms               |     at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
    datahub-gms               |     at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
    datahub-gms               |     at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
    datahub-gms               |     at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
    datahub-gms               |     at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    datahub-gms               |     at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
    datahub-gms               |     at java.lang.Thread.run(Thread.java:748)
    With my own truststore and keystore provided, the same PKIX path building error as above appears. With certs provided but SKIP_ELASTICSEARCH_CHECK = false, the same error again:
    datahub-gms               | 2022/08/31 09:17:40 Problem with request: Get "https://some.external.elasticsearch.eu:11920": x509: certificate signed by unknown authority. Sleeping 1s
    Any idea what could be wrong? I have tried generating the truststore and keystore in multiple ways, and even included the cacerts that were provided to me, but every option fails to enable secure Elasticsearch communication. @blue-megabyte-68048 you might be able to look into it, as you had a similar problem. Thanks in advance for any hints.
  • hallowed-dog-79615
    08/31/2022, 9:53 AM
    Greetings! I am not sure whether I found a bug or this is intended behavior. I am designing the privileges in our DataHub instance in order to give different permissions to different user groups. That is how I realized that there is no mention of "GlossaryTerm Group" objects in the metadata privileges dropdown. This means that you cannot specifically define privileges for them, so I found this weird situation: a user has rights to "see entity page" for every object type in the dropdown menu, but cannot access Glossary Term folders; she reaches the "Unauthorized" page when trying to access them. If I send her a link to a specific Term inside a folder, she can access it with no problem, but she cannot access the entity page of the folder itself. So, in order to grant her privileges, I had to create a rule which allows her to "See Entity Page" and leave the object type field empty, so it applies to every object within DataHub. As I said, I made sure that this was not fixed by selecting ALL the possible object types in the dropdown, and I also tried them one by one; no single option fixed it, only leaving the field blank restored access to the Term folders. Is this intended? I think this is not desirable: when creating Policies, you should be able to refer to each specific urn/object type and design specific privileges, rather than relying on an "all types by default" rule. Thanks!
  • stocky-minister-77341
    08/31/2022, 1:20 PM
    Hi, I tried to run a MySQL ingestion from the CLI while connecting to DataHub deployed on k8s. It got stuck with the following error:
    Cli report:
    {'cli_version': '0.8.43.6',
     'cli_entry_location': '/Users/liza.raskin/datahub-env/lib/python3.9/site-packages/datahub/__init__.py',
     'py_version': '3.9.7 (default, Sep 16 2021, 08:50:36) \n[Clang 10.0.0 ]',
     'py_exec_path': '/Users/liza.raskin/datahub-env/bin/python3',
     'os_details': 'macOS-10.16-x86_64-i386-64bit'}
    Source (mysql) report:
    {'events_produced': '1001',
     'events_produced_per_sec': '13',
     'event_ids': ['container-info-trc-urn:li:container:abb952b2a1b5dfedb83e8e726f23be70',
                   'container-platforminstance-trc-urn:li:container:abb952b2a1b5dfedb83e8e726f23be70',
                   'container-subtypes-trc-urn:li:container:abb952b2a1b5dfedb83e8e726f23be70',
                   'container-urn:li:container:abb952b2a1b5dfedb83e8e726f23be70-to-urn:li:dataset:(urn:li:dataPlatform:mysql,trc._account_history_tmp,PROD)',
                   'trc._account_history_tmp',
                   'trc._account_history_tmp-subtypes',
                   '... 990 more elements',
                   'container-urn:li:container:abb952b2a1b5dfedb83e8e726f23be70-to-urn:li:dataset:(urn:li:dataPlatform:mysql,trc.fpp_client_placement_identifier_config,PROD)',
                   'trc.fpp_client_placement_identifier_config',
                   'trc.fpp_client_placement_identifier_config-subtypes',
                   'container-urn:li:container:abb952b2a1b5dfedb83e8e726f23be70-to-urn:li:dataset:(urn:li:dataPlatform:mysql,trc.fpp_experiment_variant_rules,PROD)',
                   'trc.fpp_experiment_variant_rules'],
     'warnings': {'trc.cra_predictions_task': ['unable to map type BIT(length=1) to metadata schema'],
                  'trc.dco_advertisers': ['unable to map type BIT(length=1) to metadata schema',
                                          'unable to map type BIT(length=1) to metadata schema',
                                          'unable to map type BIT(length=1) to metadata schema']},
     'failures': {},
     'tables_scanned': '333',
     'views_scanned': '0',
     'entities_profiled': '0',
     'filtered': [],
     'soft_deleted_stale_entities': [],
     'start_time': '2022-08-31 15:19:18.625657',
     'running_time_in_seconds': '72'}
    Sink (datahub-rest) report:
    {'total_records_written': '0',
     'records_written_per_second': '0',
     'warnings': [],
     'failures': [],
     'start_time': '2022-08-31 15:17:38.528119',
     'current_time': '2022-08-31 15:20:30.860222',
     'total_duration_in_seconds': '172.33',
     'gms_version': 'v0.8.43',
     'pending_requests': '1000'}
    
    :hourglass_flowing_sand: Pipeline running with 4 warnings so far; produced 1001 events
    Looking at the DataHub logs I see this error:
    org.apache.kafka.common.errors.TimeoutException: Topic MetadataChangeLog_Versioned_v1 not present in metadata after 60000 ms.
    The Kafka pod is up and running. Any ideas what could be causing this?
  • refined-barista-17110
    08/31/2022, 3:08 PM
    Hi all, I created new user groups and changed some privileges on the existing defaults, and now all of my permissions have changed 😞 We have Google auth set up and don't have the default password for datahub_user. Any suggestions on how to fix this issue?
  • kind-whale-32412
    08/31/2022, 8:01 PM
    Hi, I am trying to deserialize the /aspects/ endpoint response for the ?aspect=editableSchemaMetadata&version=0 format in Java. I noticed that for this we have to use LinkedIn's version of EditableSchemaMetadata (com.linkedin.schema.EditableSchemaMetadata), which does not work well with DataHub-GMS's response type, because LinkedIn's version of this class eventually uses the TagUrn class, whose constructor explicitly makes this call:
    super("tag", TupleKey.create(new Object[]{name}));
    This causes tags to get a duplicate namespace: if a tag is called urn:li:tag:test, after deserializing with LinkedIn's EditableSchemaMetadata class I get urn:li:tag:urn:li:tag:test. I have tried to use DataHub's version (io.datahubproject.openapi.generated.EditableSchemaMetadata); however, MetadataChangeProposalWrapper's aspect method only accepts LinkedIn's EditableSchemaMetadata. I'm wondering if I'm the only one that's ever hit this problem? Is there a way around this issue? The only thing that comes to my mind is to write a very complicated deserialization rule.
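    A hedged illustration of the constructor pitfall described above (assuming com.linkedin.common.urn.TagUrn and its usual createFromString factory):
    import com.linkedin.common.urn.TagUrn;
    import java.net.URISyntaxException;

    public class TagUrnDemo {
        public static void main(String[] args) throws URISyntaxException {
            // The constructor treats its argument as the bare tag name...
            TagUrn doubled = new TagUrn("urn:li:tag:test");
            System.out.println(doubled); // urn:li:tag:urn:li:tag:test

            // ...whereas createFromString parses a fully qualified urn.
            TagUrn parsed = TagUrn.createFromString("urn:li:tag:test");
            System.out.println(parsed);  // urn:li:tag:test
        }
    }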
  • clever-garden-23538
    08/31/2022, 8:59 PM
    Hey, I'm setting custom browse paths and they're working for the most part, except that the last segment in the browse paths I'm supplying doesn't appear when exploring through the UI. For example, if I had the browse path /PLATFORM/foo/bar/baz, I would expect to be able to click through the UI, drilling down into each folder. However, what I'm seeing is that when I click on "bar", I just see the full list of datasets under "bar"; the "baz" level of the folder hierarchy doesn't appear. When I click on one of the datasets, though, I do see "baz" in the path under the search bar.
  • steep-laptop-41463
    09/01/2022, 9:19 AM
    Hello, I'm having trouble with datahub-lineage-file. I tried to add the sample data from https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/file_lineage.yml and I'm getting an error (in thread).
  • great-motherboard-71467
    09/01/2022, 11:10 AM
    Hi community, as you know from my previous post, I was able to connect to an external Elasticsearch and create indices. We manually created templates and policies to fit our security requirements. And here is the point where I'm stuck: I'm not able to find where the application creates data streams. Can someone point me to the functions/classes responsible for data stream creation? Unfortunately I'm getting the following error:
    datahub-gms               |     Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [https://some.server.eu:11920], URI [/datahub_usage_event/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
    datahub-gms               | Warnings: [[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices.]
    datahub-gms               | {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [browserId] in order to load field data by uninverting the inverted index. Note that this can use significant memory."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"datahub_usage_event","node":"xn4qYxnUQh6UvcZIJ0AGSQ","reason":{"type":"illegal_argument_exception","reason":"Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so t
    In the logs I'm not seeing anything related to the PUT that would create the datahub_usage_event stream properly; all other index creation succeeds. I also looked at https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/elastic_search.py, but the methods there are for reading specific data streams rather than creating them. Any ideas or suggestions?
  • delightful-barista-90363
    09/01/2022, 1:22 PM
    Hey, I was wondering when the next release will be; I'm waiting on some bug fixes for the Spark lineage. Nvm, spoke too soon.
  • ripe-alarm-85320
    09/01/2022, 6:54 PM
    What's the cadence for the helm chart to be updated to point at the new release?
  • delightful-barista-90363
    09/01/2022, 11:21 PM
    Hey, wondering when datahub-spark-lineage will get updated to 0.8.44. Noticed that the check-jars step failed in CI/CD.
  • numerous-account-62719
    09/02/2022, 6:40 AM
    @dazzling-judge-80093 @little-megabyte-1074 please look into this as a priority.
  • great-branch-515
    09/02/2022, 2:23 PM
    @here I am trying to integrate Okta. Login is working, but groups are not getting created. Any pointers to debug this further?
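    A hedged pointer (flag names as I recall them from the frontend's OIDC docs; verify against your DataHub version): group extraction is opt-in, gated by env vars on datahub-frontend such as:
    AUTH_OIDC_EXTRACT_GROUPS_ENABLED=true
    AUTH_OIDC_GROUPS_CLAIM=groups
    The groups claim also has to be present in the Okta token itself, which typically means adding a groups claim in the Okta app's OIDC settings.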
  • adamant-rain-51672
    09/02/2022, 3:35 PM
    Hey, is anyone experiencing problems with running Tableau ingestion?