# troubleshoot
  • stocky-plumber-3084 (05/26/2023, 5:45 AM)
    I logged an issue on GitHub (https://github.com/datahub-project/datahub/issues/8133): I cannot start up/bring up the broker (Kafka) service container with v0.10.3, but it works with older versions.
    thank you 1
    ✅ 1
  • white-knife-12883 (05/26/2023, 6:18 PM)
    I'm a little confused about the max_workers setting for profiling. https://datahubproject.io/docs/generated/ingestion/sources/postgres/ says the default is 10, but here <https://github.com/datahub-project/datahub/blob/e14d7edd8aaa742b484fae04f37488799c[…]a-ingestion/src/datahub/ingestion/source/ge_profiling_config.py> it is a dynamic value.
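    If the linked config still computes it the way it appears to, the dynamic default scales with the CPU count of the machine running the ingestion. This is a rough illustration only; the exact multiplier is an assumption and may differ between releases:
    import os

    # Hedged sketch: illustrates a CPU-count-based default for max_workers;
    # check ge_profiling_config.py in your installed version for the real formula.
    max_workers = 5 * (os.cpu_count() or 4)
    print(f"effective max_workers on this machine: {max_workers}")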
    ✅ 1
  • bland-barista-59197 (05/26/2023, 10:45 PM)
    Hi Team, kindly help me with GraphQL. The following search query returns tags that contain PII in either the name or the description; I want to search based on the name only.
    {
      search(
        input: {type: TAG, query: "PII", start: 0, count: 1000}
      ) {
        start
        count
        total
        searchResults {
          entity {
            urn
          }
        }
      }
    }
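    A minimal sketch of one way to scope the search to the name field with a structured filter, assuming your GMS version exposes orFilters on the search input (searchAcrossEntities definitely accepts it) and that a personal access token is available:
    import requests

    # Assumptions: GMS reachable at this URL and a valid personal access token.
    GMS_GRAPHQL = "http://localhost:8080/api/graphql"
    TOKEN = "<personal-access-token>"

    # Use a wildcard text query and push the PII match into a filter on "name",
    # so description-only matches are excluded.
    query = """
    query {
      search(input: {
        type: TAG, query: "*", start: 0, count: 1000,
        orFilters: [{and: [{field: "name", values: ["PII"], condition: CONTAIN}]}]
      }) {
        total
        searchResults { entity { urn } }
      }
    }
    """

    resp = requests.post(
        GMS_GRAPHQL,
        json={"query": query},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    print(resp.json())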
  • brief-advantage-89816 (05/26/2023, 10:52 PM)
    Hi Team, I have deployed DataHub on Kubernetes using the Helm chart for the latest version, 10.2, with the default configuration. MySQL, however, is constantly restarting: in the last 13 days it has restarted 44 times. Does anyone have any idea why this is happening?
    NAME                                                READY   STATUS      RESTARTS        AGE
    datahub-acryl-datahub-actions-867f6bb5d4-kpq57      1/1     Running     0               13d
    datahub-cli-65548b99c7-bx7v9                        1/1     Running     0               30h
    datahub-datahub-frontend-7c5bf5654b-xbhmq           1/1     Running     0               13d
    datahub-datahub-gms-56f8848747-t2gwp                1/1     Running     1 (4h42m ago)   13d
    datahub-datahub-system-update-job-clcgs             0/1     Completed   0               13d
    datahub-kafka-setup-job-b5rxq                       0/1     Completed   0               13d
    datahub-mysql-setup-job-cg4gb                       0/1     Completed   0               13d
    datahub-nocode-migration-job-rgsbn                  0/1     Completed   0               13d
    elasticsearch-master-0                              1/1     Running     4 (174m ago)    30h
    prerequisites-cp-schema-registry-6f4b5b894f-kvh5p   2/2     Running     0               13d
    prerequisites-kafka-0                               1/1     Running     0               30h
    prerequisites-mysql-0                               1/1     Running     43 (11m ago)    30h
    prerequisites-zookeeper-0                           1/1     Running     0               30h
    prerequisites-mysql-0                               0/1     Running     44 (3s ago)     30h
    Describing the pod:
    Type     Reason     Age                 From     Message
      ----     ------     ----                ----     -------
      Warning  Unhealthy  47m (x39 over 8h)   kubelet  Startup probe failed: command "/bin/bash -ec password_aux=\"${MYSQL_ROOT_PASSWORD:-}\"\nif [[ -f \"${MYSQL_ROOT_PASSWORD_FILE:-}\" ]]; then\n    password_aux=$(cat \"$MYSQL_ROOT_PASSWORD_FILE\")\nfi\nmysqladmin status -uroot -p\"${password_aux}\"\n" timed out
      Warning  Unhealthy  32m (x123 over 9h)  kubelet  Startup probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
    mysqladmin: connect to server at 'localhost' failed
    error: 'Can't connect to local MySQL server through socket '/opt/bitnami/mysql/tmp/mysql.sock' (2)'
    Check that mysqld is running and that the socket: '/opt/bitnami/mysql/tmp/mysql.sock' exists!
      Warning  Unhealthy  6m43s (x436 over 30h)  kubelet  Readiness probe failed: command "/bin/bash -ec password_aux=\"${MYSQL_ROOT_PASSWORD:-}\"\nif [[ -f \"${MYSQL_ROOT_PASSWORD_FILE:-}\" ]]; then\n    password_aux=$(cat \"$MYSQL_ROOT_PASSWORD_FILE\")\nfi\nmysqladmin status -uroot -p\"${password_aux}\"\n" timed out
      Warning  Unhealthy  2m43s (x428 over 30h)  kubelet  Liveness probe failed: command "/bin/bash -ec password_aux=\"${MYSQL_ROOT_PASSWORD:-}\"\nif [[ -f \"${MYSQL_ROOT_PASSWORD_FILE:-}\" ]]; then\n    password_aux=$(cat \"$MYSQL_ROOT_PASSWORD_FILE\")\nfi\nmysqladmin status -uroot -p\"${password_aux}\"\n" timed out
    There is no meaningful information in the logs:
    2023-05-26T22:21:42.762925Z 0 [System] [MY-010931] [Server] /opt/bitnami/mysql/bin/mysqld: ready for connections. Version: '8.0.29'  socket: '/opt/bitnami/mysql/tmp/mysql.sock'  port: 3306  Source distribution.
    2023-05-26T22:38:42.603109Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.29).
    2023-05-26T22:38:44.996744Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 56  user: 'root'.
    2023-05-26T22:38:44.996858Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 57  user: 'root'.
    2023-05-26T22:38:44.996887Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 58  user: 'root'.
    2023-05-26T22:38:44.996913Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 59  user: 'root'.
    2023-05-26T22:38:44.996936Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 124  user: 'root'.
    2023-05-26T22:38:44.996962Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 60  user: 'root'.
    2023-05-26T22:38:44.996999Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 61  user: 'root'.
    2023-05-26T22:38:44.997025Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 62  user: 'root'.
    2023-05-26T22:38:44.997048Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 54  user: 'root'.
    2023-05-26T22:38:44.997079Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 23  user: 'root'.
    2023-05-26T22:38:44.997101Z 0 [Warning] [MY-010909] [Server] /opt/bitnami/mysql/bin/mysqld: Forcing close of thread 55  user: 'root'.
    2023-05-26T22:38:48.845805Z 0 [System] [MY-010910] [Server] /opt/bitnami/mysql/bin/mysqld: Shutdown complete (mysqld 8.0.29)  Source distribution.
  • most-room-32003 (05/27/2023, 7:41 PM)
    I upgraded to 0.10.3 and am trying to emit a validation assertion, but I'm hitting this error. I searched Slack and the internet and got 0 results. Help? I see that URN_NUM_BYTES_LIMIT is set to 512 in your codebase; how do I modify it?
    ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Error: cannot provide an URN longer than 512 bytes (when URL encoded)\n\tat com.linkedin.metadata.restli.RestliUtil.badRequestException(RestliUtil.java:84)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:35)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)\n\tat com.linkedin.metadata.resources.entity.AspectResource.ingestProposal(AspectResource.java:191)\n\tat jdk.internal.reflect.GeneratedMethodAccessor223.invoke(Unknown Source)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:177)\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:333)\n\tat com.linkedin.restli.internal.server.filter.FilterChainDispatcherImpl.onRequestSuccess(FilterChainDispatcherImpl.java:47)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:86)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.lambda$onRequest$0(RestLiFilterChainIterator.java:73)\n\tat java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753)\n\tat java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731)\n\tat java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:72)\n\tat com.linkedin.restli.internal.server.filter.RestLiFilterChain.onRequest(RestLiFilterChain.java:55)\n\tat com.linkedin.restli.server.BaseRestLiServer.handleResourceRequest(BaseRestLiServer.java:262)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequestWithRestLiResponse(RestRestLiServer.java:294)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:262)\n\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:232)\n\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)\n\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)\n\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:130)\n\tat com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat 
com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n\tat com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106)\n\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n\tat com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat com.linkedin.restli.server.RestliHandlerServlet.service(RestliHandlerServlet.java:21)\n\tat com.linkedin.restli.server.RestliHandlerServlet.handleRequest(RestliHandlerServlet.java:26)\n\tat org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)\n\tat org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)\n\tat com.datahub.auth.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:102)\n\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)\n\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n', 'message': 'Error: cannot provide an URN longer than 512 bytes (when URL encoded)', 'status': 400})
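    A hedged, client-side sketch: the 512-byte limit in the error is enforced by GMS on the URL-encoded URN, so a practical mitigation is to validate (and shorten) the assertion URN before emitting rather than trying to raise the limit. The helper below is illustrative only:
    from urllib.parse import quote

    URN_NUM_BYTES_LIMIT = 512  # value taken from the GMS error message


    def check_urn_length(urn: str) -> None:
        # GMS measures the URL-encoded form, so encode before counting bytes.
        encoded_len = len(quote(urn, safe="").encode("utf-8"))
        if encoded_len > URN_NUM_BYTES_LIMIT:
            raise ValueError(
                f"URN is {encoded_len} bytes when URL encoded; "
                f"GMS rejects anything over {URN_NUM_BYTES_LIMIT} bytes"
            )


    check_urn_length("urn:li:assertion:my-very-long-assertion-id")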
    ✅ 1
    plus1 2
  • powerful-cat-68806 (05/28/2023, 9:51 AM)
    Hi team, I'm facing the same issue as this when executing helm on my namespace, but I'm not sure whether an upgrade is relevant here. GMS helm chart:
    apiVersion: v2
    appVersion: v0.9.3
    description: A Helm chart for LinkedIn DataHub's datahub-gms component
    name: datahub-gms
    type: application
    version: 0.2.121
    Also, how can I find out, from the namespace, what the GMS version is?
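    One hedged way to check: GMS reports its version through its /config endpoint (this is where the ingestion sink report gets its gms_version from). Assuming you port-forward the GMS service from the namespace, e.g. kubectl -n <namespace> port-forward svc/datahub-datahub-gms 8080:8080, a quick check looks like:
    import requests

    # Placeholder URL: the port-forwarded (or otherwise exposed) GMS endpoint.
    config = requests.get("http://localhost:8080/config", timeout=10).json()
    # The server version is reported under "versions"; fall back to the full payload.
    print(config.get("versions", config))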
  • many-rocket-80549 (05/29/2023, 10:33 AM)
    Hi, I am evaluating DataHub for implementation in our company. I am trying to ingest a sample file that you provide in the docs, but I am not sure where I should drop the file; I just dropped it in a folder like so: /home/miquelp/datahub/file_onboarding/test_containers.json. However, I am getting an error while executing the recipe; it seems like it doesn't find the file (the error message could be improved?). I have looked for a similar error but couldn't find anything. Which Linux user is the one that executes the ingestion? Can you give us a hand? Thanks.
    ~~~~ Execution Summary - RUN_INGEST ~~~~
    Execution finished with errors.
    {'exec_id': '06b7698c-048e-470e-bf2c-1ff4fca75bd0',
     'infos': ['2023-05-29 10:18:33.415813 INFO: Starting execution for task with name=RUN_INGEST',
               "2023-05-29 10:18:37.476974 INFO: Failed to execute 'datahub ingest'",
               '2023-05-29 10:18:37.477118 INFO: Caught exception EXECUTING task_id=06b7698c-048e-470e-bf2c-1ff4fca75bd0, name=RUN_INGEST, '
               'stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
     'errors': []}
    
    ~~~~ Ingestion Report ~~~~
    {
      "cli": {
        "cli_version": "0.10.0.7",
        "cli_entry_location": "/usr/local/lib/python3.10/site-packages/datahub/__init__.py",
        "py_version": "3.10.10 (main, Mar 14 2023, 02:37:11) [GCC 10.2.1 20210110]",
        "py_exec_path": "/usr/local/bin/python",
        "os_details": "Linux-5.15.0-72-generic-x86_64-with-glibc2.31",
        "peak_memory_usage": "57.82 MB",
        "mem_info": "57.82 MB"
      },
      "source": {
        "type": "file",
        "report": {
          "events_produced": 0,
          "events_produced_per_sec": 0,
          "entities": {},
          "aspects": {},
          "warnings": {},
          "failures": {},
          "total_num_files": 0,
          "num_files_completed": 0,
          "files_completed": [],
          "percentage_completion": "0%",
          "estimated_time_to_completion_in_minutes": -1,
          "total_bytes_read_completed_files": 0,
          "total_parse_time_in_seconds": 0,
          "total_count_time_in_seconds": 0,
          "total_deserialize_time_in_seconds": 0,
          "aspect_counts": {},
          "entity_type_counts": {},
          "start_time": "2023-05-29 10:18:35.206188 (now)",
          "running_time": "0 seconds"
        }
      },
      "sink": {
        "type": "datahub-rest",
        "report": {
          "total_records_written": 0,
          "records_written_per_second": 0,
          "warnings": [],
          "failures": [],
          "start_time": "2023-05-29 10:18:35.161225 (now)",
          "current_time": "2023-05-29 10:18:35.208860 (now)",
          "total_duration_in_seconds": 0.05,
          "gms_version": "v0.10.3",
          "pending_requests": 0
        }
      }
    }
    
    ~~~~ Ingestion Logs ~~~~
    Obtaining venv creation lock...
    Acquired venv creation lock
    venv setup time = 0
    This version of datahub supports report-to functionality
    datahub  ingest run -c /tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/recipe.yml --report-to /tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/ingestion_report.json
    [2023-05-29 10:18:35,113] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.0.7
    No ~/.datahubenv file found, generating one for you...
    [2023-05-29 10:18:35,164] INFO     {datahub.ingestion.run.pipeline:184} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-gms:8080>
    [2023-05-29 10:18:35,206] INFO     {datahub.ingestion.run.pipeline:201} - Source configured successfully.
    [2023-05-29 10:18:35,207] INFO     {datahub.cli.ingest_cli:129} - Starting metadata ingestion
    [2023-05-29 10:18:35,209] INFO     {datahub.ingestion.reporting.file_reporter:52} - Wrote UNKNOWN report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/ingestion_report.json' mode='w' encoding='UTF-8'>
    [2023-05-29 10:18:35,209] INFO     {datahub.cli.ingest_cli:134} - Source (file) report:
    {'events_produced': 0,
     'events_produced_per_sec': 0,
     'entities': {},
     'aspects': {},
     'warnings': {},
     'failures': {},
     'total_num_files': 0,
     'num_files_completed': 0,
     'files_completed': [],
     'percentage_completion': '0%',
     'estimated_time_to_completion_in_minutes': -1,
     'total_bytes_read_completed_files': 0,
     'total_parse_time_in_seconds': 0,
     'total_count_time_in_seconds': 0,
     'total_deserialize_time_in_seconds': 0,
     'aspect_counts': {},
     'entity_type_counts': {},
     'start_time': '2023-05-29 10:18:35.206188 (now)',
     'running_time': '0 seconds'}
    [2023-05-29 10:18:35,210] INFO     {datahub.cli.ingest_cli:137} - Sink (datahub-rest) report:
    {'total_records_written': 0,
     'records_written_per_second': 0,
     'warnings': [],
     'failures': [],
     'start_time': '2023-05-29 10:18:35.161225 (now)',
     'current_time': '2023-05-29 10:18:35.210294 (now)',
     'total_duration_in_seconds': 0.05,
     'gms_version': 'v0.10.3',
     'pending_requests': 0}
    [2023-05-29 10:18:35,809] ERROR    {datahub.entrypoints:188} - Command failed: Failed to process /home/miquelp/datahub/file_onboarding/test_containers.json
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 175, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
        raise e
      File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
        return func(ctx, *args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
        loop.run_until_complete(run_func_check_upgrade(pipeline))
      File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
        ret = await the_one_future
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
        return await loop.run_in_executor(
      File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
        raise e
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
        pipeline.run()
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 339, in run
        for wu in itertools.islice(
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/file.py", line 196, in get_workunits
        for f in self.get_filenames():
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/file.py", line 193, in get_filenames
        raise Exception(f"Failed to process {self.config.path}")
    Exception: Failed to process /home/miquelp/datahub/file_onboarding/test_containers.json
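    A hedged note on what is probably happening: ingestion triggered from the UI runs inside the datahub-actions container, so a path like /home/miquelp/... on the host is not visible to it. Running the same file-source recipe from a machine where the path actually exists (via the CLI or the Python API) is a quick way to confirm the file itself parses. A minimal sketch, with the GMS URL as a placeholder:
    from datahub.ingestion.run.pipeline import Pipeline

    # Same recipe as in the UI, but executed where the JSON file really lives.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "file",
                "config": {"path": "/home/miquelp/datahub/file_onboarding/test_containers.json"},
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://datahub-gms:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()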
    ✅ 1
  • rhythmic-stone-77840 (05/29/2023, 2:55 PM)
    Hey All, I'm hitting a weird issue with GraphQL. I'm doing a searchAcrossLineage for two different starting tables, but the search criteria are the same (same direction, type, text query, etc.). The first table returns matches without issue, but when I go to search the second table I'm hit with:
    {
      "errors": [
        {
          "message": "Failed to execute 'text' on 'Response': body stream already read",
          "stack": "TypeError: Failed to execute 'text' on 'Response': body stream already read\n    at https://<domain>/api/graphiql:57:33\n    at async <https://unpkg.com/graphiql/graphiql.min.js:2:568905>"
        }
      ]
    }
    And now when I re-run the first table (which was successful before), I hit the same error. Hoping someone can help me understand what's going on in the GraphQL backend that would produce this error.
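    A hedged observation: the "body stream already read" TypeError is raised by the GraphiQL page in the browser, not by GMS, so calling the API directly shows whether the backend query itself is failing. A sketch, with the domain, token, and URN as placeholders:
    import requests

    query = """
    query ($urn: String!) {
      searchAcrossLineage(input: {
        urn: $urn, direction: DOWNSTREAM, query: "*", start: 0, count: 100
      }) {
        total
        searchResults { entity { urn } }
      }
    }
    """

    resp = requests.post(
        "https://<domain>/api/graphql",
        json={
            "query": query,
            "variables": {"urn": "urn:li:dataset:(urn:li:dataPlatform:<platform>,<table>,PROD)"},
        },
        headers={"Authorization": "Bearer <personal-access-token>"},
    )
    print(resp.status_code, resp.json())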
  • better-fireman-33387 (05/29/2023, 4:45 PM)
    Hi all, are there any health checks I can use to verify that GMS is running correctly (connecting to Elasticsearch/MySQL/Kafka) and that can be checked from outside the cluster, like this one: http://datahub-frontend.taboolasyndication.com:9002/health? It only returns "Good" without any details. Does it also check GMS and Elasticsearch connectivity, or just that the frontend is up?
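    A hedged sketch: the frontend's "Good" mostly reflects the frontend itself. GMS has its own endpoints (the Helm chart's probes hit /health on port 8080), so checking GMS directly gives a better signal; exposing GMS outside the cluster via an ingress or port-forward is an assumption here:
    import requests

    # Placeholder: however you expose GMS (ingress, LoadBalancer, or port-forward).
    GMS = "http://datahub-gms.example.com:8080"

    # 200 from /health means the GMS process is up; /config returns version and
    # feature details, which also implies the servlet has fully started.
    print("health:", requests.get(f"{GMS}/health", timeout=10).status_code)
    print("config:", requests.get(f"{GMS}/config", timeout=10).json())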
    ✅ 1
  • fast-vegetable-81275 (05/30/2023, 12:47 AM)
    Hi All, I recently installed DataHub on my Windows machine using Docker. Today when I went to http://localhost:9002/ I was not able to log in with the default credentials; it kept saying "Failed to login. An error occured" (Docker was already running all this time). I then ran the quickstart command from cmd, i.e. "datahub docker quickstart". At the end, cmd said "DataHub is now running." That is when I was able to log in. So my question is: do I have to run the quickstart command from the CLI every time before accessing localhost? Any help would be appreciated. Thanks in advance!
    ✅ 1
  • future-yak-13169 (05/30/2023, 4:46 AM)
    Hi Guys, this is regarding Oracle column-level lineage. I have used sqllineage to emit table- and column-level lineage from an Oracle source. Table-level lineage is fine, but for column-level lineage I see some discrepancies between the functionality in version 0.9.6.3 and the latest 10.2. In the previous version, the column dropdown under the Column Lineage tab used to show the different columns to view lineage for, but it doesn't in 10.2. However, I now see a column lineage option if I right-click the actual column under the Schema tab; clicking it takes me back to the Lineage tab with the column-lineage option selected and the column populated, and it shows the lineage. My question is: why are there two places to see column-level lineage, and why do they show different results? I would expect the columns to show in the dropdown under the Lineage tab when selecting Column Lineage.
    ✅ 1
  • microscopic-elephant-47912 (05/30/2023, 7:17 AM)
    Hi team, after upgrading to 0.10.2, Looker ingestion fails with an error like "'Dashboard' object has no attribute 'updated_at'". Has anyone faced this error before?
    Traceback (most recent call last):
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/entrypoints.py", line 186, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/opt/anaconda3/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
        raise e
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
        return func(ctx, *args, **kwargs)
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 198, in run
        loop.run_until_complete(run_func_check_upgrade(pipeline))
      File "/opt/anaconda3/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
        return future.result()
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
        ret = await the_one_future
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
        return await loop.run_in_executor(
      File "/opt/anaconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
        raise e
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
        pipeline.run()
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 365, in run
        for wu in itertools.islice(
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/api/source_helpers.py", line 109, in auto_stale_entity_removal
        for wu in stream:
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/api/source_helpers.py", line 133, in auto_workunit_reporter
        for wu in stream:
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/api/source_helpers.py", line 146, in auto_materialize_referenced_tags
        for wu in stream:
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/api/source_helpers.py", line 60, in auto_status_aspect
        for wu in stream:
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/source/looker/looker_source.py", line 1254, in get_workunits_internal
        ) = job.result()
      File "/opt/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 437, in result
        return self.__get_result()
      File "/opt/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
        raise self._exception
      File "/opt/anaconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/source/looker/looker_source.py", line 1120, in process_dashboard
        looker_dashboard = self._get_looker_dashboard(dashboard_object, self.looker_api)
      File "/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/source/looker/looker_source.py", line 939, in _get_looker_dashboard
        last_updated_at=dashboard.updated_at,
    AttributeError: 'Dashboard' object has no attribute 'updated_at'
    ✅ 1
  • powerful-cat-68806 (05/30/2023, 8:45 AM)
    Hi team, Trying again - can someone help with this?
  • bland-orange-13353 (05/30/2023, 8:48 AM)
    This message was deleted.
    ✅ 1
  • adorable-megabyte-63781 (05/30/2023, 10:49 AM)
    Hi All, when building the DataHub project for development as per https://datahubproject.io/docs/developers/, it seems to refer to Gradle 6.x and the build eventually fails. I wanted to see if we can upgrade Gradle to 8.1.1 and try the build. Is that possible? If so, how can we change or upgrade Gradle when building the project? Any help or suggestion is appreciated.
    ✅ 1
  • wonderful-quill-11255 (05/30/2023, 12:06 PM)
    Hello. I'm encountering a permission-denied problem after enabling REST API authorization (and authentication). We are on version v0.10.1. I'm using the python-sdk and talking to GMS via the frontend proxy, authenticating via the frontend's /logIn endpoint and setting the Cookie header on my DataHubGraphConfig before making further API calls. An example API call I'm making is client.get_aspect(entity_urn=entity_urn, aspect_type=DatasetProperties). I have a policy that says all users have all privileges. When I run the get_aspect call, I get a 401 Unauthorized on the client side and a "User is unauthorized to get aspect for ..." on the GMS side. This flow worked before I enabled the auth+auth feature. However, if I log in to the UI as the same user, I have no problem viewing the dataset page for my example entity. I'm not sure where to look for the problem here; I'd be grateful if someone could give me a pointer or two.
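    A hedged sketch: with REST API authorization enabled, GMS expects a bearer token, so authenticating with a personal access token (instead of the frontend session cookie) is usually the simpler path for the Python SDK. Class names follow the acryl-datahub SDK; the server URL and token are placeholders:
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    # Token generated in the UI under Settings > Access Tokens.
    graph = DataHubGraph(
        DatahubClientConfig(
            server="http://datahub-gms:8080",
            token="<personal-access-token>",
        )
    )

    props = graph.get_aspect(
        entity_urn="urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",
        aspect_type=DatasetPropertiesClass,
    )
    print(props)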
    ✅ 1
  • crooked-match-16163 (05/30/2023, 12:10 PM)
    Hi, I'm trying to add multiple column descriptions to a table that already has some columns with descriptions. I'm using the Python code from https://datahubproject.io/docs/api/tutorials/descriptions#add-description-on-column. Basically, I'm running a loop where each iteration changes the "documentation_to_add" and "column". But it seems that the if/else overwrites the description of the first column that already has a description, and after that it gets stuck on the else clause of
    else:
        log.info("Documentation already exists and is identical, omitting write")
    I tried to debug but got stuck myself. It seems that if a column with a description already exists, it can't change any other columns. Or if it does, it deletes all the other columns' descriptions. Help? 🙂
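    A hedged sketch of why the loop clobbers things: the tutorial builds a fresh editableSchemaMetadata aspect on each iteration, so each emit replaces whatever the previous iteration wrote. Fetching the existing aspect once, updating all columns in it, and emitting a single aspect avoids that. URNs and column names below are placeholders:
    import time

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        EditableSchemaFieldInfoClass,
        EditableSchemaMetadataClass,
    )

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
    dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users,PROD)"
    docs_to_add = {"user_id": "Primary key", "user_name": "Display name"}

    # Read the aspect that already exists so prior descriptions are preserved.
    current = graph.get_aspect(dataset_urn, EditableSchemaMetadataClass)
    now = AuditStampClass(time=int(time.time() * 1000), actor="urn:li:corpuser:ingestion")
    if current is None:
        current = EditableSchemaMetadataClass(
            created=now, lastModified=now, editableSchemaFieldInfo=[]
        )

    by_path = {f.fieldPath: f for f in current.editableSchemaFieldInfo}
    for column, doc in docs_to_add.items():
        field = by_path.get(column)
        if field is None:
            field = EditableSchemaFieldInfoClass(fieldPath=column)
            current.editableSchemaFieldInfo.append(field)
        field.description = doc  # update in place instead of rebuilding the aspect

    # One emit carrying every column keeps earlier edits intact.
    graph.emit_mcp(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=current))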
  • faint-hair-91313 (05/30/2023, 2:55 PM)
    Hi everyone, I am trying to delete some URNs from our Kubernetes-deployed DataHub, but I am not sure how. I can't find information on how to configure the DataHub endpoint so I can delete from my local Linux machine. https://datahubproject.io/docs/how/delete-metadata
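    A hedged sketch: the CLI picks up the endpoint from datahub init or from the DATAHUB_GMS_URL / DATAHUB_GMS_TOKEN environment variables, so exposing GMS (ingress or kubectl port-forward svc/datahub-datahub-gms 8080:8080) and pointing those at it is normally enough for datahub delete --urn ... to work from a local machine. The same soft delete can also be done from Python by writing a Status aspect; URL, token, and URN below are placeholders:
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import StatusClass

    emitter = DatahubRestEmitter(
        gms_server="http://localhost:8080",   # your exposed or port-forwarded GMS endpoint
        token="<personal-access-token>",      # only needed if metadata auth is enabled
    )
    urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.old_table,PROD)"
    # removed=True soft-deletes the entity (hides it in the UI without purging data).
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(entityUrn=urn, aspect=StatusClass(removed=True))
    )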
    ✅ 1
  • average-dinner-25106 (05/31/2023, 1:26 AM)
    Hello. When I log in to DataHub, the error message 'An unknown error occured (code 500)' appears. To check what's wrong, I looked and found that Elasticsearch is not running, as the second figure shows. How do I solve this?
    ✅ 1
  • bland-gigabyte-28270 (05/31/2023, 1:54 AM)
    Hi, I'm trying out the 0.10.3 version using the updated Helm chart, with the INTERNAL schema registry type. However, I'm getting the following error in gms; can someone take a look?
    2023-05-31 01:51:55,223 [pool-15-thread-1] INFO  c.l.m.boot.OnBootApplicationListener:68 - Failed to connect to open servlet: prerequisites-cp-schema-registry
    2023-05-31 01:51:55,223 [pool-15-thread-1] INFO  c.l.m.boot.OnBootApplicationListener:60 - Sleeping for 1 second
    2023-05-31 01:51:56,224 [pool-15-thread-1] INFO  c.l.m.boot.OnBootApplicationListener:68 - Failed to connect to open servlet: prerequisites-cp-schema-registry
    2023-05-31 01:51:56,224 [pool-15-thread-1] INFO  c.l.m.boot.OnBootApplicationListener:60 - Sleeping for 1 second
    2023-05-31 01:51:57,303 [pool-15-thread-1] INFO  c.l.m.boot.OnBootApplicationListener:68 - Failed to connect to open servlet: prerequisites-cp-schema-registry
    2023-05-31 01:51:57,303 [pool-15-thread-1] INFO  c.l.m.boot.OnBootApplicationListener:60 - Sleeping for 1 second
    2023-05-31 01:51:58,304 [pool-15-thread-1] INFO  c.l.m.boot.OnBootApplicationListener:68 - Failed to connect to open servlet: prerequisites-cp-schema-registry
    2023-05-31 01:51:58,306 [pool-15-thread-1] INFO  c.l.m.boot.OnBootApplicationListener:60 - Sleeping for 1 second
    ✅ 1
  • adventurous-apple-52621 (05/31/2023, 2:55 AM)
    Hi, guys. When I click the lineage detail, after 30s the web server throws a 503 exception. I wonder what may cause this problem?
  • rich-policeman-92383 (05/31/2023, 6:36 AM)
    Hello Team, we are using Postgres as the datastore for DataHub. Recently we have started seeing the error below. From searching online, it looks like we either need to reindex the pg_toast table or delete the corrupted rows. We tried reindexing pg_toast but it did not help. Is there a way to restore the Postgres data from the ES data? DataHub version: v0.9.6.1
    ERROR: missing chunk number 0 for toast value 734921 in pg_toast_83651
    ✅ 1
  • orange-painter-32802 (05/31/2023, 7:44 AM)
    Hi, team! I'm trying to integrate our Python (Django, Faust) applications with DataHub using your git examples, but it isn't working out for me. I first create a flow and register tasks in it, then I pass the lineage and link the datasets, but as soon as I send information about the beginning/end of a task run, the connections disappear; right after the start everything displays correctly, but then there is no data on inputs/outputs. I used these code examples: for emitting job runs --> link, for lineage --> link. Do I need to send the lineage again after every run-event emit, or how does it work? How do I add inputs/outputs information to the job data?
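    A hedged sketch: inputs/outputs live on the DataJob itself as the dataJobInputOutput aspect, while run start/end events are separate entities, so emitting this aspect once per job should keep the edges visible regardless of run events. URNs below are placeholders:
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DataJobInputOutputClass

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    datajob_urn = "urn:li:dataJob:(urn:li:dataFlow:(faust,my_flow,PROD),my_task)"
    io = DataJobInputOutputClass(
        inputDatasets=["urn:li:dataset:(urn:li:dataPlatform:postgres,app.source_table,PROD)"],
        outputDatasets=["urn:li:dataset:(urn:li:dataPlatform:postgres,app.target_table,PROD)"],
    )
    # Re-emitting this aspect is idempotent, so it is safe to send it alongside run events.
    emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=datajob_urn, aspect=io))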
  • limited-train-99757 (05/31/2023, 7:48 AM)
    Hi, I encountered a puzzling issue while importing a Hive database. As shown in the screenshot below, there are 12.5k tables listed, but it only shows 10k assets. So, how many tables were actually imported?
    plus1 1
  • polite-rainbow-40375 (05/31/2023, 12:09 PM)
    Hi Team! I'm installing the latest Helm chart versions, datahub v0.2.165 and prerequisites v0.0.16. The prerequisites installed successfully, but the datahub release fails on datahub-system-update-job, and this is the error I got:
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.
    Any help you can give would be greatly appreciated 🙏🏻
    ✅ 1
  • little-park-33017 (05/31/2023, 12:10 PM)
    Hi Team, I have 2 workspaces in Power BI Service: one workspace contains datasets and the other reports. When I try to ingest them into DataHub using the powerbi recipe, I have a problem: no 'dataset' entities come from the workspace that contains the datasets; it only creates the container (workspace) in DataHub. I checked in debug mode but couldn't find anything that would help me find the source of the problem. Any idea what the issue could be?
    ✅ 1
  • white-guitar-82227 (05/31/2023, 12:30 PM)
    Hi Team! We are repeatedly losing the secret values that we enter via the frontend. We suspect a container restart is the cause, but we can't tell for sure. It is hard to imagine that this is common behavior given the static nature of secrets. The version we use is v0.10.2 and we use AWS OpenSearch/RDS/MSK as prerequisites.
  • loud-hospital-37195 (05/31/2023, 1:27 PM)
    Hi, we have DataHub deployed in a Kubernetes service in Azure; how can I activate SSO?
  • faint-hair-91313 (05/31/2023, 1:51 PM)
    Hi everyone, has anyone experienced missing Data Products after they are created in the UI? You create one, and after refreshing the page it disappears. I am on the latest version on Kubernetes.
  • ancient-policeman-73437 (05/31/2023, 1:57 PM)
    Dear DataHub, I have a question regarding the DataHub Chrome plugin. I get an error as in the picture. As I understand it, this happens because SSO (Azure) doesn't work in an iframe. Is there any way to make it work, or only with hardcoded credentials?