# troubleshoot
  • a

    adamant-rain-51672

    09/23/2022, 12:40 PM
Is there a guide on DataHub version upgrades on EKS/Kubernetes? I upgraded, but there seems to be an error with the defined ingestions.
    b
    c
    • 3
    • 3
  • e

    early-oil-62555

    09/21/2022, 12:47 PM
Hi Team, we are noticing the below error while doing a search on DataHub. In the logs it says:
    Copy code
    2:33:26.645 [pool-10-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /aspects/urn%3Ali%3Atelemetry%3AclientId?aspect=telemetryClientId&version=0 - get - 200 - 5ms
    12:33:27.303 [ForkJoinPool.commonPool-worker-11] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:21 - Failed to execute DataFetcher
    java.lang.NullPointerException: null
    	at com.linkedin.datahub.graphql.resolvers.load.EntityTypeBatchResolver.lambda$get$0(EntityTypeBatchResolver.java:46)
    	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
    	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
Can anyone please help with this? The DataHub version is 0.8.38.4.
    m
    • 2
    • 1
  • s

    silly-oil-35180

    09/26/2022, 6:33 AM
Hello team. I'm having trouble using the GMS API. I read the guide about getting an aspect from a URN (https://github.com/datahub-project/datahub/blob/master/metadata-service/README.md?plain=1#L1372). However, when I sent the API request to GMS, I got this error.
    Copy code
    'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat com.linkedin.metadata.restli.RestliUtil.resourceNotFoundException(RestliUtil.java:79)\n\tat com.linkedin.metadata.restli.RestliUtil.resourceNotFoundException(RestliUtil.java:74)\n\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$get$0(AspectResource.java:81)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)\n\tat com.linkedin.metadata.resources.entity.AspectResource.get(AspectResource.java:78)\n\tat sun.reflect.GeneratedMethodAccessor344.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:177)\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:333)\n\tat com.linkedin.restli.internal.server.filter.FilterChainDispatcherImpl.onRequestSuccess(FilterChainDispatcherImpl.java:47)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:86)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.lambda$onRequest$0(RestLiFilterChainIterator.java:73)\n\tat java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)\n\tat java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683)\n\tat java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:72)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChain.onRequest(RestLiFilterChain.java:55)\n\tat com.linkedin.restli.server.BaseRestLiServer.handleResourceRequest(BaseRestLiServer.java:262)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequestWithRestLiResponse(RestRestLiServer.java:294)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:262)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:232)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:130)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat 
com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:63)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)\n\tat org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)\n\tat com.datahub.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:88)\n\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)\n\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat java.lang.Thread.run(Thread.java:748)\n',
Is it impossible to get aspect data using the GMS API?
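For reference, the endpoint shape I'm following from the README is roughly this (a sketch only; the host, dataset URN, and aspect name are placeholders rather than my actual request, and the colons in the URN are URL-encoded):
Copy code
# hypothetical example of fetching a single aspect from GMS; values are placeholders
curl -H 'X-RestLi-Protocol-Version: 2.0.0' \
  'http://localhost:8080/aspects/urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Ahive,fct_users_created,PROD)?aspect=schemaMetadata&version=0'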
    b
    • 2
    • 9
  • m

    microscopic-mechanic-13766

    09/26/2022, 7:30 AM
Hello everyone, I am trying to integrate DataHub (v0.8.44) with Apache Ranger (2.2.0). I have managed to connect DataHub to Ranger, although I am getting some errors and I don't know why they are occurring (I think the errors obtained in Ranger are the source of the errors obtained in GMS). Errors obtained in the GMS container:
    Copy code
    WARN  o.a.r.a.client.RangerAdminRESTClient:1228 - Received 404 error code with body:[null], Ignoring
    WARN  o.a.r.a.client.RangerAdminRESTClient:868 - Error getting policies. secureMode=false, user=datahub (auth:SIMPLE), response={"httpStatusCode":400,"statusCode":0}, serviceName=ranger_datahub
    WARN  o.a.r.plugin.util.PolicyRefresher:393 - cache file does not exist or not readable '/tmp/datahub_ranger_datahub.json'
    ERROR o.a.r.a.client.RangerAdminRESTClient:1220 - Error getting Roles; service not found. secureMode=false, user=datahub (auth:SIMPLE), response=404, serviceName=ranger_datahub, lastKnownRoleVersion=-1, lastActivationTimeInMillis=1663933107990
    Errors obtained in Ranger trying to create the service named "ranger_datahub":
    Copy code
    Error! Datahub failed to find service class com.datahub.authorizer.plugin.ranger.DataHubRangerAuthPlugin. Resource lookup will not be available. Please make sure plugin jar is in the correct place.
I don't know why I am getting the latter error, as I have downloaded the DataHub plugin into the path
/opt/ranger-2.2.0-admin/ews/webapp/WEB-INF/claases/ranger-plugins/datahub
and executed the curl command to define the service inside Ranger. Any help would be appreciated.
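For context, the kind of curl I mean is roughly the following (a sketch from memory, not my exact command; the host, credentials, and servicedef.json payload are placeholders, and the exact endpoint may differ):
Copy code
# hypothetical sketch: registering the DataHub service definition with Ranger Admin
curl -u admin:<password> -X POST \
  -H 'Accept: application/json' -H 'Content-Type: application/json' \
  --data @servicedef.json \
  http://<ranger-admin-host>:6080/service/public/v2/api/servicedef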
    c
    • 2
    • 2
  • n

    numerous-account-62719

    09/26/2022, 8:14 AM
Hi Team, I just created a new user in DataHub. The login was successful, but I noticed that any user can change the policies in DataHub. Is there a feature where only an administrator can assign roles and policies to users, and users cannot change them themselves? Please help me out; this is a priority.
    m
    • 2
    • 6
  • w

    wonderful-egg-79350

    09/26/2022, 9:00 AM
Hello all. I have a question about the DataHub library, like the example at the URL below: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/dataset_schema.py How can I find the library source code for modules like 'datahub.metadata.schema_classes'?
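For example, I guess something like this would show where that module lives on disk in my environment (just a generic Python trick, not DataHub-specific):
Copy code
# prints the file path of the installed datahub.metadata.schema_classes module
python -c "import datahub.metadata.schema_classes as sc; print(sc.__file__)"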
    b
    • 2
    • 1
  • f

    fancy-alligator-33404

    09/26/2022, 9:12 AM
Hello. An error occurred while deleting metadata from DataHub GMS.
    datahub delete --urn "urn:li:dataset:(dataPlatform:hive,gsc_ods.sstp_stp_item,PROD)" --soft
    When I run this command, I get the following error message:
    Copy code
    [2022-09-26 09:00:06,643] INFO     {datahub.cli.delete_cli:142} - DataHub configured with <http://datahub-datahub-gms:8080>
    [2022-09-26 09:00:07,586] ERROR    {datahub.entrypoints:188} - Command failed with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: INTERNAL SERVER ERROR\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:210)\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:333)\n\tat com.linkedin.restli.internal.server.filter.FilterChainDispatcherImpl.onRequestSuccess(FilterChainDispatcherImpl.java:47)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:86)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.lambda$onRequest$0(RestLiFilterChainIterator.java:73)\n\tat java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)\n\tat java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:683)\n\tat java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2010)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:72)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChain.onRequest(RestLiFilterChain.java:55)\n\tat com.linkedin.restli.server.BaseRestLiServer.handleResourceRequest(BaseRestLiServer.java:262)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequestWithRestLiResponse(RestRestLiServer.java:294)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:262)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:232)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:130)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat 
com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:63)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)\n\tat org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)\n\tat com.datahub.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:88)\n\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)\n\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: com.linkedin.data.template.TemplateOutputCastException: Invalid URN syntax: Urn doesn't start with 'urn:'. Urn: dataPlatform:hive at index 0: dataPlatform:hive\n\tat com.linkedin.common.urn.UrnCoercer.coerceOutput(UrnCoercer.java:27)\n\tat com.linkedin.common.urn.UrnCoercer.coerceOutput(UrnCoercer.java:12)\n\tat com.linkedin.data.template.DataTemplateUtil.coerceCustomOutput(DataTemplateUtil.java:1126)\n\tat com.linkedin.metadata.key.DatasetKey.getPlatform(DatasetKey.java:126)\n\tat com.linkedin.metadata.search.utils.BrowsePathUtils.getDefaultBrowsePath(BrowsePathUtils.java:42)\n\tat com.linkedin.metadata.search.utils.BrowsePathUtils.buildBrowsePath(BrowsePathUtils.java:29)\n\tat com.linkedin.metadata.entity.EntityService.generateDefaultAspectsIfMissing(EntityService.java:1087)\n\tat com.linkedin.metadata.resources.entity.AspectUtils.getAdditionalChanges(AspectUtils.java:36)\n\tat com.linkedin.metadata.resources.entity.AspectResource.ingestProposal(AspectResource.java:139)\n\tat sun.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:177)\n\t... 81 more\nCaused by: java.net.URISyntaxException: Urn doesn't start with 'urn:'. Urn: dataPlatform:hive at index 0: dataPlatform:hive\n\tat com.linkedin.common.urn.Urn.<init>(Urn.java:80)\n\tat com.linkedin.common.urn.Urn.createFromString(Urn.java:231)\n\tat com.linkedin.common.urn.UrnCoercer.coerceOutput(UrnCoercer.java:25)\n\t... 93 more\n", 'message': 'INTERNAL SERVER ERROR', 'status': 500}). Run with --debug to get full trace
    [2022-09-26 09:00:07,586] INFO     {datahub.entrypoints:191} - DataHub CLI version: 0.8.43.2 at /usr/local/lib/python3.9/site-packages/datahub/__init__.py
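Based on the "Urn doesn't start with 'urn:'" message, I suspect the nested platform reference needs the full urn:li: prefix, so the command would look something like this (just my guess, a sketch only):
Copy code
# hypothetical: the same soft delete with the platform written as a full URN
datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:hive,gsc_ods.sstp_stp_item,PROD)" --soft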
I am running DataHub in a Kubernetes environment, and one of the DataHub containers is in 'ContainerStatusUnknown' status. Could this be the reason? Please help me... T^T
    b
    • 2
    • 3
  • f

    few-carpenter-93837

    09/26/2022, 9:48 AM
Hello, Vertica isn't recognized as a valid platform in DataHub, i.e. it's not visible under Platforms on the main page (with the entity counts), and in Analytics it's shown as urnlidataPlatform:vertica.
    h
    • 2
    • 1
  • f

    fast-oyster-93603

    09/26/2022, 2:34 PM
    Hi, after upgrading to v0.8.45 (using datahub-helm v0.2.106), elasticsearch-setup-job is failing with the following error:
    Copy code
    2022/09/26 14:28:36 Waiting for: <http://elasticsearch-master:9200>
    2022/09/26 14:28:41 Received 200 from <http://elasticsearch-master:9200>
    Going to use protocol: http
    Going to use default elastic headers
    Create datahub_usage_event if needed against Elasticsearch at elasticsearch-master:9200
    Going to use index prefix::
    curl: option -k <http://elasticsearch-master:9200/_ilm/policy/datahub_usage_event_policy>: is unknown
    curl: try 'curl --help' or 'curl --manual' for more information
    Policy GET response code is
    /create-indices.sh: line 41: [: -eq: unary operator expected
    /create-indices.sh: line 45: [: -eq: unary operator expected
    /create-indices.sh: line 47: [: -eq: unary operator expected
    Got response code while creating policy so exiting.
    2022/09/26 14:28:41 Command exited with error: exit status 1
    Any idea how to solve it?
    plus1 2
    i
    l
    • 3
    • 9
  • h

    helpful-carpet-81510

    09/26/2022, 4:11 PM
Hi, could someone help me? I had to encrypt Elasticsearch and it recreated my service. As I understood it, I should have created the indexes one more time, so I did it as recommended here: 1. shut down the GMS service, 2. delete some existing indexes, 3. rerun elasticsearchSetupJob, 4. bring GMS back up. From the beginning it looked fine (I didn't see any error logs), but I have realised that all my data is gone (at least from the UI). I didn't touch MySQL, so everything there should be safe. I have tried to ingest data from Airflow and the pipelines worked without errors, but the data is still not visualised. Now I am getting this error:
    error_logs.txt
  • g

    gentle-camera-33498

    09/26/2022, 4:57 PM
Hello everyone, I'm constantly receiving this error message from GraphQL. Does anyone know what this could be? (This message always appears when I click on the DataHub icon to go back to the home page.)
    Copy code
    "errors": [
        {
          "message": "An unknown error occurred.",
          "locations": [
            {
              "line": 546,
              "column": 5
            }
          ],
          "path": [
            "searchAcrossEntities",
            "searchResults",
            3,
            "entity",
            "entities"
          ],
          "extensions": {
            "code": 500,
            "type": "SERVER_ERROR",
            "classification": "DataFetchingException"
          }
        }
      ]
    g
    • 2
    • 14
  • h

    helpful-carpet-81510

    09/27/2022, 11:16 AM
Hi everyone, can I configure datahub-datahub-restore-indices-job-template to work with AWS RDS and AWS OpenSearch, or is it only for local use?
    • 1
    • 1
  • b

    breezy-portugal-43538

    09/27/2022, 12:41 PM
Hello everyone, I wanted to ingest some data, but Elasticsearch is logging an error and nothing can be uploaded to DataHub. When ingestion happens I get the info that records and URNs were created/written, but when accessing the DataHub page I see nothing. I am pasting the error from the datahub-gms logs below. Thank you deeply for the help.
    Copy code
    12:13:42.624 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:25 - Failed to feed bulk request. Number of events: 18 Took time ms: -1 Message: failure in bulk execution:
    [0]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataChangeLog_Timeseries_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [1]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataChangeLog_Timeseries_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [2]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataAuditEvent_v4%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [3]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataAuditEvent_v4%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [4]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataChangeEvent_v4%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [5]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataChangeEvent_v4%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [6]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataChangeProposal_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [7]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataChangeProposal_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [8]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.FailedMetadataChangeEvent_v4%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [9]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.FailedMetadataChangeEvent_v4%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [10]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.PlatformEvent_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [11]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.PlatformEvent_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [12]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.FailedMetadataChangeProposal_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [13]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.FailedMetadataChangeProposal_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [14]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataChangeLog_Versioned_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [15]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.MetadataChangeLog_Versioned_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [16]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.DataHubUsageEvent_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
    [17]: index [datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Akafka%2C0.DataHubUsageEvent_v1%2CPROD%29], message [ElasticsearchException[Elasticsearch exception [type=cluster_block_exception, reason=index [datasetindex_v2] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];]]]
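From what I've read, this looks like Elasticsearch's disk flood-stage watermark putting the index into read-only mode; once I free up disk space, I assume something like this should clear the block (a sketch only; the host and index names are taken from the logs above):
Copy code
# remove the read-only-allow-delete block from the index after freeing disk space
curl -X PUT 'http://elasticsearch-master:9200/datasetindex_v2/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'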
    b
    • 2
    • 5
  • l

    little-breakfast-38102

    09/27/2022, 2:06 AM
Hello Team, I am testing adding DatahubSparkListener as an extra listener to my Spark job and need some assistance troubleshooting my errors. Here are some details about my execution and environment (a rough sketch of the spark-submit is below): 1) running a spark-submit job locally in a Python virtual environment; 2) passing the jars path as part of the spark-submit command; 3) attached screenshots of my Spark config and logs, which suggest DatahubSparkListener is initialized; 4) attached additional screenshots of the error message, "java.lang.NullPointerException"; 5) when I tried to connect from a browser to the GMS endpoint running in K8s, I got a 404. Appreciate any help. @careful-pilot-86309 @dazzling-judge-80093
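Roughly the kind of spark-submit flags I am passing (a sketch, not my exact command; the jar path, version, GMS host, and script name are placeholders):
Copy code
# hypothetical sketch of wiring up the DataHub lineage listener in spark-submit
spark-submit \
  --jars /path/to/datahub-spark-lineage-<version>.jar \
  --conf spark.extraListeners=datahub.spark.DatahubSparkListener \
  --conf spark.datahub.rest.server=http://<gms-host>:8080 \
  my_job.py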
    d
    • 2
    • 4
  • s

    stocky-truck-96371

    09/27/2022, 1:32 PM
Hi team, which platform privilege is required to delete tags? I could not delete tags even after enabling the 'Create Tags' and 'Manage Tags' platform privileges.
  • c

    cuddly-butcher-39945

    09/27/2022, 3:28 PM
Hey team, I am getting this error on my personal Quickstart environment this morning:
Copy code
Error while pulling images. Going to attempt to move on to docker compose up assuming the images have been built locally
FileSystem looks good:
Copy code
[joshua.garza@ip-10-4-64-11 quickstart]$ df -kh
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G   40K  3.9G   1% /dev/shm
tmpfs           3.9G  106M  3.8G   3% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1   50G   19G   32G  37% /
tmpfs           788M     0  788M   0% /run/user/1004
Containers appear to be fine, although not started (due to datahub docker quickstart not finishing):
    Copy code
    [joshua.garza@ip-10-4-64-11 quickstart]$ docker container ls -a
    CONTAINER ID   IMAGE                                       COMMAND                   CREATED        STATUS                        PORTS     NAMES
    f48fe54c489b   linkedin/datahub-kafka-setup:head           "/bin/sh -c ./kafka-…"    25 hours ago   Exited (0) 25 hours ago                 kafka-setup
    616bd5de4d46   confluentinc/cp-schema-registry:5.4.0       "/etc/confluent/dock…"    25 hours ago   Exited (143) 54 minutes ago             schema-registry
    d07ed16939af   linkedin/datahub-frontend-react:head        "/bin/sh -c ./start.…"    25 hours ago   Exited (143) 54 minutes ago             datahub-frontend-react
    cd35b6db56ee   acryldata/datahub-actions:head              "/bin/sh -c 'dockeri…"    25 hours ago   Exited (137) 54 minutes ago             datahub_datahub-actions_1
    1201a0d08572   acryldata/datahub-mysql-setup:head          "/bin/sh -c 'dockeri…"    25 hours ago   Exited (0) 25 hours ago                 mysql-setup
    25b16a41ed67   confluentinc/cp-kafka:5.4.0                 "/etc/confluent/dock…"    25 hours ago   Exited (143) 54 minutes ago             broker
    beb7110f863b   linkedin/datahub-gms:head                   "/bin/sh -c /datahub…"    25 hours ago   Exited (143) 54 minutes ago             datahub-gms
    ad08ddb3ddcc   linkedin/datahub-elasticsearch-setup:head   "/bin/sh -c 'if [ \"$…"   25 hours ago   Exited (0) 25 hours ago                 elasticsearch-setup
    ff8a398dd9b7   mysql:5.7                                   "docker-entrypoint.s…"    25 hours ago   Exited (0) 54 minutes ago               mysql
    6cb9bc6a950c   confluentinc/cp-zookeeper:5.4.0             "/etc/confluent/dock…"    25 hours ago   Exited (143) 54 minutes ago             zookeeper
    5548f9c0d78e   elasticsearch:7.9.3                         "/tini -- /usr/local…"    2 months ago   Exited (143) 54 minutes ago             elasticsearch
  • c

    cuddly-butcher-39945

    09/27/2022, 3:49 PM
    A little more debugging info...
    Copy code
    [2022-09-27 15:45:47,778] DEBUG    {datahub.telemetry.telemetry:210} - Sending init Telemetry
    --- Logging error ---
    Traceback (most recent call last):
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 986, in _wrap_create_connection
        return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
      File "/usr/lib64/python3.7/asyncio/base_events.py", line 962, in create_connection
        raise exceptions[0]
      File "/usr/lib64/python3.7/asyncio/base_events.py", line 949, in create_connection
        await self.sock_connect(sock, address)
      File "/usr/lib64/python3.7/asyncio/selector_events.py", line 473, in sock_connect
        return await fut
      File "/usr/lib64/python3.7/asyncio/selector_events.py", line 503, in _sock_connect_cb
        raise OSError(err, f'Connect call failed {address}')
    ConnectionRefusedError: [Errno 111] Connect call failed ('127.0.0.1', 8080)
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/datahub/upgrade/upgrade.py", line 123, in get_server_version_stats
        server_config = await get_server_config(host, token)
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/datahub/upgrade/upgrade.py", line 110, in get_server_config
        async with session.get(config_endpoint) as dh_response:
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/aiohttp/client.py", line 1138, in __aenter__
        self._resp = await self._coro
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/aiohttp/client.py", line 536, in _request
        req, traces=traces, timeout=real_timeout
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 542, in connect
        proto = await self._create_connection(req, traces, timeout)
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 907, in _create_connection
        _, proto = await self._create_direct_connection(req, traces, timeout)
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
        raise last_exc
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 1187, in _create_direct_connection
        client_error=client_error,
      File "/home/joshua.garza/.local/lib/python3.7/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
        raise client_error(req.connection_key, exc) from exc
    aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host localhost:8080 ssl:default [Connect call failed ('127.0.0.1', 8080)]
    
    During handling of the above exception, another exception occurred:
    m
    • 2
    • 2
  • b

    blue-crowd-84759

    09/27/2022, 5:03 PM
Hey all, I'm running into the following issue in UI-triggered ingestion runs on a Docker setup. The error seems pretty self-explanatory, tbh, but I can't seem to resolve it:
    Copy code
    "[2022-09-27 15:41:58,839] ERROR    {datahub.ingestion.run.pipeline:127} - 'Did not find a registered class for bigquery-beta'\n"
               '[2022-09-27 15:41:58,840] INFO     {datahub.cli.ingest_cli:119} - Starting metadata ingestion\n'
               '[2022-09-27 15:41:58,845] INFO     {datahub.cli.ingest_cli:137} - Finished metadata ingestion\n'
               "[2022-09-27 15:41:59,052] ERROR    {datahub.entrypoints:188} - Command failed with 'Pipeline' object has no attribute 'source'. Run with "
               '--debug to get full trace\n'
               '[2022-09-27 15:41:59,053] INFO     {datahub.entrypoints:191} - DataHub CLI version: 0.8.42 at '
               '/tmp/datahub/ingest/venv-bigquery-beta-0.8.42/lib/python3.10/site-packages/datahub/__init__.py\n',
               "2022-09-27 15:41:59.221819 [exec_id=a037e7da-0b12-4056-aba4-d83b2915f3cb] INFO: Failed to execute 'datahub ingest'",
               '2022-09-27 15:41:59.222128 [exec_id=a037e7da-0b12-4056-aba4-d83b2915f3cb] INFO: Caught exception EXECUTING '
               'task_id=a037e7da-0b12-4056-aba4-d83b2915f3cb, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 203, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
It seems that the DataHub CLI version is pinned to 0.8.42 somehow, but I installed my CLI with the version specification >= 0.8.45 and can confirm this in my command line. I've run datahub docker quickstart a few times, even purging images/containers, and I'm still seeing this. What am I missing?
    m
    m
    • 3
    • 4
  • a

    ambitious-guitar-89068

    09/28/2022, 7:43 AM
Is the DataHub Delta Lake connector tested with Databricks Delta Lake? If yes: I am getting the error below while trying to load Delta tables from an S3 base path. Can someone help if they have seen this?
    Copy code
    [2022-09-28 07:36:12,222] ERROR    {datahub.entrypoints:192} -
    Traceback (most recent call last):
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/entrypoints.py", line 149, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
        raise e
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper
        res = func(*args, **kwargs)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/utilities/memory_leak_detector.py", line 91, in wrapper
        return func(*args, **kwargs)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/cli/ingest_cli.py", line 212, in run
        loop.run_until_complete(run_func_check_upgrade(pipeline))
      File "/usr/lib64/python3.7/asyncio/base_events.py", line 587, in run_until_complete
        return future.result()
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/cli/ingest_cli.py", line 166, in run_func_check_upgrade
        ret = await the_one_future
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/cli/ingest_cli.py", line 158, in run_pipeline_async
        None, functools.partial(run_pipeline_to_completion, pipeline)
      File "/usr/lib64/python3.7/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/cli/ingest_cli.py", line 148, in run_pipeline_to_completion
        raise e
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/cli/ingest_cli.py", line 134, in run_pipeline_to_completion
        pipeline.run()
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/run/pipeline.py", line 350, in run
        self.preview_workunits if self.preview_mode else None,
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/delta_lake/source.py", line 329, in get_workunits
        for wu in self.process_folder(self.source_config.complete_path, get_folders):
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/delta_lake/source.py", line 296, in process_folder
        delta_table = read_delta_table(path, self.source_config)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/delta_lake/delta_lake_utils.py", line 32, in read_delta_table
        raise e
      File "/home/ec2-user/.local/lib/python3.7/site-packages/datahub/ingestion/source/delta_lake/delta_lake_utils.py", line 28, in read_delta_table
        delta_table = DeltaTable(path, storage_options=opts)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/deltalake/table.py", line 92, in __init__
        table_uri, version=version, storage_options=storage_options
    deltalake.PyDeltaTableError: Failed to load checkpoint: Invalid JSON in checkpoint: expected value at line 1 column 1
    [2022-09-28 07:36:12,223] ERROR    {datahub.entrypoints:196} - Command failed:
    	Failed to load checkpoint: Invalid JSON in checkpoint: expected value at line 1 column 1.
    s
    • 2
    • 3
  • f

    fresh-cricket-75926

    09/28/2022, 9:09 AM
Hi all, we have changed the default mysql-prerequisites password and now can't install DataHub. It would be helpful if anyone could help us fix the issue.
  • c

    crooked-holiday-47153

    09/28/2022, 9:42 AM
Hi all, I am trying to upgrade my local DataHub getting-started setup, as I have already done successfully in the past, but now it doesn't work. This is the command I execute:
    Copy code
    datahub docker quickstart --quickstart-compose-file docker/quickstart/docker-compose-without-neo4j.quickstart.yml
    and this is the output I am getting:
    Copy code
    Pulling docker images...
    unknown shorthand flag: 'f' in -f
    See 'docker --help'.
    
    Usage:  docker [OPTIONS] COMMAND
    
    A self-sufficient runtime for containers
    
    Options:
          --config string      Location of client config files (default "/home/ssm-user/.docker")
      -c, --context string     Name of the context to use to connect to the daemon (overrides DOCKER_HOST env var and default context set with "docker context use")
      -D, --debug              Enable debug mode
      -H, --host list          Daemon socket(s) to connect to
      -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
          --tls                Use TLS; implied by --tlsverify
          --tlscacert string   Trust certs signed only by this CA (default "/home/ssm-user/.docker/ca.pem")
          --tlscert string     Path to TLS certificate file (default "/home/ssm-user/.docker/cert.pem")
          --tlskey string      Path to TLS key file (default "/home/ssm-user/.docker/key.pem")
          --tlsverify          Use TLS and verify the remote
      -v, --version            Print version information and quit
    
    Management Commands:
      builder     Manage builds
      config      Manage Docker configs
      container   Manage containers
      context     Manage contexts
      image       Manage images
      manifest    Manage Docker image manifests and manifest lists
      network     Manage networks
      node        Manage Swarm nodes
      plugin      Manage plugins
      secret      Manage Docker secrets
      service     Manage services
      stack       Manage Docker stacks
      swarm       Manage Swarm
      system      Manage Docker
      trust       Manage trust on Docker images
      volume      Manage volumes
    
    Commands:
      attach      Attach local standard input, output, and error streams to a running container
      build       Build an image from a Dockerfile
      commit      Create a new image from a container's changes
      cp          Copy files/folders between a container and the local filesystem
      create      Create a new container
      diff        Inspect changes to files or directories on a container's filesystem
      events      Get real time events from the server
      exec        Run a command in a running container
      export      Export a container's filesystem as a tar archive
      history     Show the history of an image
      images      List images
      import      Import the contents from a tarball to create a filesystem image
      info        Display system-wide information
      inspect     Return low-level information on Docker objects
      kill        Kill one or more running containers
      load        Load an image from a tar archive or STDIN
      login       Log in to a Docker registry
      logout      Log out from a Docker registry
      logs        Fetch the logs of a container
      pause       Pause all processes within one or more containers
      port        List port mappings or a specific mapping for the container
      ps          List containers
      pull        Pull an image or a repository from a registry
      push        Push an image or a repository to a registry
      rename      Rename a container
      restart     Restart one or more containers
      rm          Remove one or more containers
      rmi         Remove one or more images
      run         Run a command in a new container
      save        Save one or more images to a tar archive (streamed to STDOUT by default)
      search      Search the Docker Hub for images
      start       Start one or more stopped containers
      stats       Display a live stream of container(s) resource usage statistics
      stop        Stop one or more running containers
      tag         Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
      top         Display the running processes of a container
      unpause     Unpause all processes within one or more containers
      update      Update configuration of one or more containers
      version     Show the Docker version information
      wait        Block until one or more containers stop, then print their exit codes
    
    Run 'docker COMMAND --help' for more information on a command.
    
    To get more help with docker, check out our guides at <https://docs.docker.com/go/guides/>
    
    Error while pulling images. Going to attempt to move on to docker compose up assuming the images have been built locally
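I wonder if the "unknown shorthand flag: 'f'" means the docker compose v2 plugin isn't available, so plain docker is parsing the -f itself; something like this should show whether the plugin or the standalone binary is installed (just a guess on my side):
Copy code
# check for the compose v2 plugin and the standalone docker-compose binary
docker compose version
docker-compose --version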
    Any help will be appreciated. 10x, Eyal
    plus1 1
    d
    b
    +3
    • 6
    • 29
  • n

    numerous-account-62719

    09/28/2022, 1:28 PM
Hi Team, where can I get a resource URN? For example, if I want the URN for a dataset, where do I get that?
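For example, I can see dataset URNs like the one below elsewhere in this channel, and I assume the same URN also shows up in the dataset page URL in the UI, so the general shape seems to be platform, name, and environment (the snowflake example is copied from another thread, not mine):
Copy code
# general shape: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
urn:li:dataset:(urn:li:dataPlatform:snowflake,kafka_dev_db.shared_dimensions.dim_region,DEV)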
    b
    • 2
    • 3
  • b

    bumpy-daybreak-85714

    09/28/2022, 1:58 PM
Hello Team. I am trying to ingest the dbt test assertion results into DataHub. Everything seems to work; with the debug flag I am getting this message:
    Copy code
    DEBUG:datahub.emitter.rest_emitter:Attempting to emit to DataHub GMS; using curl equivalent to:
    curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "assertion", "entityUrn": "urn:li:assertion:40a3eff29425c75b6b631333dc2f6c7e", "changeType": "UPSERT", "aspectName": "assertionRunEvent", "aspect": {"value": "{\"timestampMillis\": 1664365729476, \"partitionSpec\": {\"type\": \"FULL_TABLE\", \"partition\": \"FULL_TABLE_SNAPSHOT\"}, \"runId\": \"57a18a92-a9f8-47e2-92b6-c3f2aae7c675\", \"assertionUrn\": \"urn:li:assertion:40a3eff29425c75b6b631333dc2f6c7e\", \"asserteeUrn\": \"urn:li:dataset:(urn:li:dataPlatform:snowflake,kafka_dev_db.shared_dimensions.dim_region,DEV)\", \"status\": \"COMPLETE\", \"result\": {\"type\": \"SUCCESS\", \"nativeResults\": {}}}", "contentType": "application/json"}, "systemMetadata": {"lastObserved": 1664372986001, "runId": "dbt-2022_09_28-15_49_43"}}}' 'http://****/aspects?action=ingestProposal'
    DEBUG:urllib3.connectionpool:http://**** */:80 "POST /aspects?action=ingestProposal HTTP/1.1" 200 61
    DEBUG:datahub.ingestion.run.pipeline: sink wrote workunit urn:li:assertion:40a3eff29425c75b6b631333dc2f6c7e-assertionRunEvent-urn:li:dataset:(urn:li:dataPlatform:snowflake,kafka_dev_db.shared_dimensions.dim_region,DEV)
    However, if I look into the gms logs I can see the error:
    Copy code
    13:56:15.814 [qtp1630521067-906] INFO  c.l.m.r.entity.AspectResource:138 - INGEST PROPOSAL proposal: {aspectName=assertionRunEvent, systemMetadata={lastObserved=1664372986001, runId=dbt-2022_09_28-15_49_43}, entityUrn=urn:li:assertion:40a3eff29425c75b6b631333dc2f6c7e, entityType=assertion, aspect={contentType=application/json, value=ByteString(length=414,bytes=7b227469...7b7d7d7d)}, changeType=UPSERT}
    13:56:15.861 [pool-12-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 47ms
    13:56:16.768 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:25 - Failed to feed bulk request. Number of events: 2 Took time ms: -1 Message: failure in bulk execution:
    [0]: index [assertionindex_v2], type [_doc], id [urn%3Ali%3Aassertion%3A40a3eff29425c75b6b631333dc2f6c7e], message [[assertionindex_v2/rSUsoXQjSwCCAoTpK7f2gQ][[assertionindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3Aassertion%3A40a3eff29425c75b6b631333dc2f6c7e]: document missing]]]
    And no assertions (test results) are shown in the UI. Any ideas?
    m
    • 2
    • 3
  • e

    eager-oil-39220

    09/28/2022, 1:33 PM
    Hi Team, I was trying to run ./gradlew build on the latest datahub but I got this error. Any ideas about why this error occurs? Thanks
    m
    • 2
    • 4
  • g

    gorgeous-library-38151

    09/28/2022, 4:04 PM
I'm trying to bring up DataHub with quickstart.sh, but I need to change the port of datahub-gms from 8080 to 8082 (since another service has taken that port). How should I modify
    Copy code
    docker-compose-without-neo4j.quickstart.yml
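I assume the change would be something like this in the datahub-gms service block of that file (a sketch of the host-side port mapping only; I'm not sure whether anything else needs to change):
Copy code
datahub-gms:
  ports:
    - "8082:8080"   # host port 8082 -> container port 8080 (GMS still listens on 8080 inside)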
    g
    m
    • 3
    • 4
  • l

    limited-forest-73733

    09/28/2022, 5:14 PM
Hey, with the latest DataHub release (0.8.45) we are still getting the same incompatibility issue between Airflow 2.3.1 and SQLAlchemy.
    d
    f
    • 3
    • 8
  • l

    limited-forest-73733

    09/28/2022, 5:14 PM
Is there any fix for this? Thanks in advance!
  • r

    ripe-tailor-61058

    09/28/2022, 7:30 PM
Do you know how I can search for customProperties by key=value? We could see customProperties: key, but couldn't figure out how to search for the values we care about. Thanks in advance.
    g
    • 2
    • 1
  • g

    gentle-camera-33498

    09/28/2022, 9:46 PM
    Hello! Does someone know what can cause this error on GMS instances?
    Copy code
    [gmsEbeanServiceConfig.heartBeat] ERROR i.e.datasource.pool.PooledConnection:311 - Error when fully closing connection [name[gmsEbeanServiceConfig16] slot[7] startTime[1664401306308] busySeconds[150] stackTrace[] stmt[null]]
    java.sql.SQLNonTransientConnectionException: Communications link failure during rollback(). Transaction resolution unknown.
            at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110)
            at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
            at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)
            at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)
            at com.mysql.cj.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:1848)
            at com.mysql.cj.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:1705)
            at com.mysql.cj.jdbc.ConnectionImpl.close(ConnectionImpl.java:721)
            at io.ebean.datasource.pool.PooledConnection.closeConnectionFully(PooledConnection.java:308)
            at io.ebean.datasource.pool.FreeConnectionBuffer.trim(FreeConnectionBuffer.java:91)
            at io.ebean.datasource.pool.PooledConnectionQueue.trimInactiveConnections(PooledConnectionQueue.java:442)
            at io.ebean.datasource.pool.PooledConnectionQueue.trim(PooledConnectionQueue.java:422)
            at io.ebean.datasource.pool.ConnectionPool.trimIdleConnections(ConnectionPool.java:441)
            at io.ebean.datasource.pool.ConnectionPool.checkDataSource(ConnectionPool.java:459)
            at io.ebean.datasource.pool.ConnectionPool.access$000(ConnectionPool.java:43)
            at io.ebean.datasource.pool.ConnectionPool$HeartBeatRunnable.run(ConnectionPool.java:260)
            at java.util.TimerThread.mainLoop(Timer.java:555)
            at java.util.TimerThread.run(Timer.java:505)
    b
    • 2
    • 24
  • w

    witty-butcher-82399

    09/29/2022, 8:12 AM
Hi datahubers! I have just enabled stateful ingestion in the DBT connector, and the process failed with the following exception:
    Copy code
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/state/sql_common_state.py", line 35, in _get_lightweight_repr
    31   def _get_lightweight_repr(dataset_urn: str) -> str:
    32       """Reduces the amount of text in the URNs for smaller state footprint."""
    33       SEP = BaseSQLAlchemyCheckpointState._get_separator()
    34       key = dataset_urn_to_key(dataset_urn)
--> 35       assert key is not None
    36       return f"{key.platform}{SEP}{key.name}{SEP}{key.origin}"
    ..................................................
     dataset_urn = 'urn:li:assertion:98375e72b6e0e0303961b3ac35fa3559'
     SEP = '||'
     key = None
    ..................................................
Having a look at the code in dataset_urn_to_key, it requires a dataset URN, and the assertion URN definitely does not match the pattern:
    Copy code
    def dataset_urn_to_key(dataset_urn: str) -> Optional[DatasetKeyClass]:
        pattern = r"urn:li:dataset:\(urn:li:dataPlatform:(.*),(.*),(.*)\)"
        results = re.search(pattern, dataset_urn)
        if results is not None:
            return DatasetKeyClass(platform=results[1], name=results[2], origin=results[3])
        return None
This is with DataHub v0.8.40 and sounds like a bug when committing the checkpoint. Has this been fixed in later versions, or do you want me to create an issue in GitHub? Thank you!
    m
    • 2
    • 1