# ingestion
  • n

    numerous-address-22061

    05/25/2023, 5:23 PM
    Hello, I am noticing buggy behavior with the browse path of my ingested Kafka topics. Some are getting a nice, fully qualified browse path, and some are not. I am not explicitly defining the browse path in my ingestion; here is an example...
    Ingestion
    Copy code
    pipeline_name: ${PIPELINE_NAME}
    source:
      type: "kafka"
      config:
        platform_instance: ${CLUSTER_NAME}
        connection:
          bootstrap: ${BOOTSTRAP_BROKERS}
          consumer_config:
            security.protocol: "SASL_SSL"
            sasl.mechanism: "SCRAM-SHA-512"
            sasl.username: "${KAFKA_USERNAME}"
            sasl.password: "${KAFKA_PASSWORD}"
          schema_registry_url: ${SCHEMA_REGISTRY_URL}
    sink:
      type: "datahub-rest"
      config:
        server: ${DATAHUB_GMS_ENDPOINT}
    First topic
    (queried using GraphQL)
    Copy code
    {
      "data": {
        "dataset": {
          "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,platform-instance.org.db.app.topic_name,PROD)",
          "platform": {
            "name": "kafka"
          },
          "browsePaths": [
            {
              "path": [
                "prod",
                "kafka",
                "platform-instance",
                "org",
                "db",
                "app"
              ]
            }
          ],
          "properties": {
            "name": "org.db.app.topic_name"
          }
        }
      }
    }
    Second Topic
    (note this is undesired, and I can't figure out why it is getting a different browse path than the topic above)
    Copy code
    {
      "data": {
        "dataset": {
          "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,platform-instance.org.db.app.topic_name_2,PROD)",
          "platform": {
            "name": "kafka"
          },
          "browsePaths": [
            {
              "path": [
                "prod",
                "kafka",
                "platform-instance"
              ]
            }
          ],
          "properties": {
            "name": "org.db.app.topic_name_2"
          }
        }
      }
    }
    Why is the second browse path so short? It is very unfortunate for discovery in the UI.
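    If an explicit, fully qualified browse path is wanted regardless of what the source computes, one option is the set_dataset_browse_path transformer in the recipe. A minimal sketch, assuming that transformer is available in your CLI version; the instance segment in the template is a placeholder:
    Copy code
    transformers:
      - type: "set_dataset_browse_path"
        config:
          replace_existing: true  # overwrite whatever browse path the source produced
          path_templates:
            - /ENV/PLATFORM/platform-instance/DATASET_PARTS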
  • c

    creamy-ram-28134

    05/25/2023, 7:56 PM
    Hey all, I am having trouble executing ingestion. Can someone share examples for CSV and file ingestion?
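    For reference, minimal recipes for the two sources look roughly like this; paths and the server URL are placeholders, and the csv-enricher fields shown are just the common ones:
    Copy code
    # File source: replays metadata events from a local JSON file
    source:
      type: file
      config:
        path: /path/to/metadata_events.json
    sink:
      type: datahub-rest
      config:
        server: http://localhost:8080
    Copy code
    # CSV enricher: applies owners, tags, terms and descriptions from a CSV file
    source:
      type: csv-enricher
      config:
        filename: /path/to/enrichment.csv
        write_semantics: PATCH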
  • b

    brainy-balloon-97302

    05/25/2023, 9:38 PM
    Hi all! I have a Glue ingestion job that constantly fails. It's failing with this error, and I was wondering if anyone has come across it before and was able to fix it?
    Copy code
    'failures': {'<s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled> job.py': ['Unable to download DAG for Glue job from <s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled> job.py, so job subtasks and lineage will be missing: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.', 'Unable to download DAG for Glue job from <s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled> job.py, so job subtasks and lineage will be missing: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.']}
    I don't have that file in S3, nor a Glue job called Untitled job.py, so I am trying to see what I can do to resolve this. The rest of the metadata is being pulled over, but it's annoying that it's being marked as a failure.
    ✅ 1
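    If the missing script only matters for job/DAG extraction, one possible workaround is to turn that part of the Glue source off. A sketch, assuming the extract_transforms flag (the region is a placeholder):
    Copy code
    source:
      type: glue
      config:
        aws_region: us-west-2
        extract_transforms: false  # skip downloading job scripts, so script-derived subtasks/lineage are not attempted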
  • h

    hundreds-airline-29192

    05/26/2023, 5:18 AM
    Hey, I am facing this error when ingesting from GCS. Please help me!
    ✅ 1
  • h

    hundreds-airline-29192

    05/26/2023, 7:55 AM
    Copy code
    botocore.exceptions.PaginationError: Error during pagination: The same next token was received twice: {'Marker': 'dwh/dev/fact/fact_gross_profit/order_date_key_07%3D20230109/part-00018-f1470254-2c8b-4a23-aaad-0260cdca7054.c000.snappy.parquet'}
  • h

    hundreds-airline-29192

    05/26/2023, 7:55 AM
    Can anyone who knows this error help me?
  • g

    gifted-bird-57147

    05/26/2023, 10:28 AM
    Hi Team, I receive the following warning in my ingestion recipe for our Athena source: '''Global Warnings: ['env is deprecated and will be removed in a future release. Please use platform_instance instead.']''' However, I think this warning is misleading: env points to the 'general' environment the data source is part of (so: urn:li:dataset:(urn:li:dataPlatform:athena,mytablename,PROD)), whereas platform_instance refers to a subset within the platform (so: urn:li:dataset:(urn:li:dataPlatform:athena,PROD.mytablename,PROD)). Is this a bug or a misunderstanding on my side?
    ✅ 1
  • f

    freezing-fall-69290

    05/26/2023, 10:32 AM
    Hi guys, can I ingest "dataset-notebook-dataset" lineage using the Python API?
    ✅ 1
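    Dataset-to-dataset lineage can be emitted with the Python SDK; a minimal sketch (the GMS URL, platform, and dataset names are placeholders, and a notebook entity in the middle would need its own aspect rather than this dataset-only helper):
    Copy code
    from datahub.emitter.mce_builder import make_dataset_urn, make_lineage_mce
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter("http://localhost:8080")

    upstream = make_dataset_urn(platform="gcs", name="bucket/input_table", env="PROD")
    downstream = make_dataset_urn(platform="gcs", name="bucket/output_table", env="PROD")

    # Build an upstream-lineage MCE for the downstream dataset and send it to GMS
    emitter.emit(make_lineage_mce(upstream_urns=[upstream], downstream_urn=downstream))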
  • b

    brainy-needle-61527

    05/26/2023, 12:46 PM
    Has anyone been able to visualize the lineage between AWS Glue and AWS Redshift?
  • b

    brainy-intern-50400

    05/26/2023, 4:34 PM
    Hi community, I am using the Python API a lot. Now we encounter the problem that we want to emit a lot of events, but with the DataHub emitter that takes time. Does somebody have a solution for emitting a list of MCP events to DataHub, or something similar? I thought about an MCP stack, which could be emitted in parallel with creating the events.
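    One workaround, assuming a REST sink and a reasonably recent SDK, is to keep producing MCPs while a thread pool fans out the emit() calls instead of emitting them one by one; a rough sketch (the GMS URL and the example aspect are placeholders):
    Copy code
    from concurrent.futures import ThreadPoolExecutor

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import StatusClass

    emitter = DatahubRestEmitter("http://localhost:8080")

    def build_mcps():
        # stand-in for whatever actually produces the MCP events
        for i in range(1000):
            yield MetadataChangeProposalWrapper(
                entityUrn=f"urn:li:dataset:(urn:li:dataPlatform:kafka,example.topic_{i},PROD)",
                aspect=StatusClass(removed=False),
            )

    # Emit concurrently; depending on the SDK version it may be safer to give each worker its own emitter
    with ThreadPoolExecutor(max_workers=10) as pool:
        for _ in pool.map(emitter.emit, build_mcps()):
            pass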
  • l

    late-addition-48515

    05/26/2023, 5:07 PM
    Hi everyone, I have ingested parquet files into DH from GCS. Is there a cleaner way of ingesting lineage data for parquet files that have been partitioned than my example in the comments?
  • r

    rapid-controller-60841

    05/29/2023, 8:29 AM
    Hi community! I would like to know whether the connection timeout can be set through configuration. If anyone knows, please tell me how to set it. Thank you very much!
    source:
      type: hive
      config:
        env: PROD
        platform: databricks
        host_port: 'http://JD-in-us.cloud.databricks.com/published'
        username: token
        password: '${databricks_token}'
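    There is no dedicated timeout field in a recipe like the one above, but SQL-based sources accept an options block that is passed through to SQLAlchemy's create_engine, so driver-level connection arguments can in principle go there. A hedged sketch; the connect_args keys depend entirely on the PyHive/Databricks driver in use and are only illustrative:
    Copy code
    source:
      type: hive
      config:
        # ...same connection settings as above...
        options:
          connect_args:
            # illustrative key only; check which arguments your Hive/Databricks driver accepts
            timeout: 600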
  • m

    many-rocket-80549

    05/29/2023, 10:36 AM
    Hi, I am evaluating DataHub to be implemented in our company. I am trying to ingest a sample file that you provide in the docs. I am not sure where I should drop the file; I just dropped it in a folder like so: /home/miquelp/datahub/file_onboarding/test_containers.json. However, I am getting an error while executing the recipe; it seems like it doesn't find the file (the error message could be improved?). I have looked for a similar error but couldn't find anything. Which Linux user is the one that executes the ingestion? Can you give us a hand? Thanks
    Copy code
    ~~~~ Execution Summary - RUN_INGEST ~~~~
    Execution finished with errors.
    {'exec_id': '06b7698c-048e-470e-bf2c-1ff4fca75bd0',
     'infos': ['2023-05-29 10:18:33.415813 INFO: Starting execution for task with name=RUN_INGEST',
               "2023-05-29 10:18:37.476974 INFO: Failed to execute 'datahub ingest'",
               '2023-05-29 10:18:37.477118 INFO: Caught exception EXECUTING task_id=06b7698c-048e-470e-bf2c-1ff4fca75bd0, name=RUN_INGEST, '
               'stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
     'errors': []}
    
    ~~~~ Ingestion Report ~~~~
    {
      "cli": {
        "cli_version": "0.10.0.7",
        "cli_entry_location": "/usr/local/lib/python3.10/site-packages/datahub/__init__.py",
        "py_version": "3.10.10 (main, Mar 14 2023, 02:37:11) [GCC 10.2.1 20210110]",
        "py_exec_path": "/usr/local/bin/python",
        "os_details": "Linux-5.15.0-72-generic-x86_64-with-glibc2.31",
        "peak_memory_usage": "57.82 MB",
        "mem_info": "57.82 MB"
      },
      "source": {
        "type": "file",
        "report": {
          "events_produced": 0,
          "events_produced_per_sec": 0,
          "entities": {},
          "aspects": {},
          "warnings": {},
          "failures": {},
          "total_num_files": 0,
          "num_files_completed": 0,
          "files_completed": [],
          "percentage_completion": "0%",
          "estimated_time_to_completion_in_minutes": -1,
          "total_bytes_read_completed_files": 0,
          "total_parse_time_in_seconds": 0,
          "total_count_time_in_seconds": 0,
          "total_deserialize_time_in_seconds": 0,
          "aspect_counts": {},
          "entity_type_counts": {},
          "start_time": "2023-05-29 10:18:35.206188 (now)",
          "running_time": "0 seconds"
        }
      },
      "sink": {
        "type": "datahub-rest",
        "report": {
          "total_records_written": 0,
          "records_written_per_second": 0,
          "warnings": [],
          "failures": [],
          "start_time": "2023-05-29 10:18:35.161225 (now)",
          "current_time": "2023-05-29 10:18:35.208860 (now)",
          "total_duration_in_seconds": 0.05,
          "gms_version": "v0.10.3",
          "pending_requests": 0
        }
      }
    }
    
    ~~~~ Ingestion Logs ~~~~
    Obtaining venv creation lock...
    Acquired venv creation lock
    venv setup time = 0
    This version of datahub supports report-to functionality
    datahub  ingest run -c /tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/recipe.yml --report-to /tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/ingestion_report.json
    [2023-05-29 10:18:35,113] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.0.7
    No ~/.datahubenv file found, generating one for you...
    [2023-05-29 10:18:35,164] INFO     {datahub.ingestion.run.pipeline:184} - Sink configured successfully. DataHubRestEmitter: configured to talk to <http://datahub-gms:8080>
    [2023-05-29 10:18:35,206] INFO     {datahub.ingestion.run.pipeline:201} - Source configured successfully.
    [2023-05-29 10:18:35,207] INFO     {datahub.cli.ingest_cli:129} - Starting metadata ingestion
    [2023-05-29 10:18:35,209] INFO     {datahub.ingestion.reporting.file_reporter:52} - Wrote UNKNOWN report successfully to <_io.TextIOWrapper name='/tmp/datahub/ingest/06b7698c-048e-470e-bf2c-1ff4fca75bd0/ingestion_report.json' mode='w' encoding='UTF-8'>
    [2023-05-29 10:18:35,209] INFO     {datahub.cli.ingest_cli:134} - Source (file) report:
    {'events_produced': 0,
     'events_produced_per_sec': 0,
     'entities': {},
     'aspects': {},
     'warnings': {},
     'failures': {},
     'total_num_files': 0,
     'num_files_completed': 0,
     'files_completed': [],
     'percentage_completion': '0%',
     'estimated_time_to_completion_in_minutes': -1,
     'total_bytes_read_completed_files': 0,
     'total_parse_time_in_seconds': 0,
     'total_count_time_in_seconds': 0,
     'total_deserialize_time_in_seconds': 0,
     'aspect_counts': {},
     'entity_type_counts': {},
     'start_time': '2023-05-29 10:18:35.206188 (now)',
     'running_time': '0 seconds'}
    [2023-05-29 10:18:35,210] INFO     {datahub.cli.ingest_cli:137} - Sink (datahub-rest) report:
    {'total_records_written': 0,
     'records_written_per_second': 0,
     'warnings': [],
     'failures': [],
     'start_time': '2023-05-29 10:18:35.161225 (now)',
     'current_time': '2023-05-29 10:18:35.210294 (now)',
     'total_duration_in_seconds': 0.05,
     'gms_version': 'v0.10.3',
     'pending_requests': 0}
    [2023-05-29 10:18:35,809] ERROR    {datahub.entrypoints:188} - Command failed: Failed to process /home/miquelp/datahub/file_onboarding/test_containers.json
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 175, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
        raise e
      File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
        return func(ctx, *args, **kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
        loop.run_until_complete(run_func_check_upgrade(pipeline))
      File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
        ret = await the_one_future
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
        return await loop.run_in_executor(
      File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
        raise e
      File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
        pipeline.run()
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 339, in run
        for wu in itertools.islice(
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/file.py", line 196, in get_workunits
        for f in self.get_filenames():
      File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/file.py", line 193, in get_filenames
        raise Exception(f"Failed to process {self.config.path}")
    Exception: Failed to process /home/miquelp/datahub/file_onboarding/test_containers.json
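    For context, the file source resolves the path on whatever machine actually runs the ingestion; with UI-based ingestion that is typically the executor container rather than the host, so a path on the host filesystem is usually not visible to it. A minimal sketch, assuming the file has been copied somewhere the ingestion process can read (the path is a placeholder):
    Copy code
    source:
      type: file
      config:
        path: /tmp/datahub/ingest/test_containers.json  # must be readable by the process running 'datahub ingest'
    sink:
      type: datahub-rest
      config:
        server: http://datahub-gms:8080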
  • a

    acceptable-helmet-19082

    05/29/2023, 10:53 AM
    Hello, I am using DataHub to ingest the table metadata of Databricks, with Hive selected as the source. A data table with 200 million rows is currently stuck in the analysis step of the ingestion process, and the log reports that no new logs have been generated for many seconds (WARNING: These logs appear to be stale. No new logs have been received since 2023-05-26 102225.389811 (53443 seconds ago). However, the ingestion process still appears to be running and may complete normally.). I guess this may be caused by the execution time of the analysis SQL exceeding the Databricks connection timeout; the analysis SQL takes about two minutes to execute. So I want to know how to configure the timeout for the Databricks connection. Can you help me?
    ✅ 1
  • a

    astonishing-father-13229

    05/29/2023, 5:15 PM
    Hi Team, I'm facing a build issue for datahub/metadata-ingestion. Steps to reproduce: clone the datahub repository, cd metadata-ingestion, ../gradlew build. Screenshots are attached in the thread for reference. Could you please advise me? Thanks in advance 🙏
    ✅ 1
  • h

    hundreds-airline-29192

    05/30/2023, 2:21 AM
    Why can't my DataHub load data from Elasticsearch?
    ✅ 1
  • h

    hundreds-airline-29192

    05/30/2023, 2:24 AM
    Copy code
    com.linkedin.restli.server.RestLiServiceException: com.datahub.util.exception.ESQueryException: Search query failed:
            at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)
            at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)
            at com.linkedin.metadata.resources.usage.UsageStats.query(UsageStats.java:320)
            at com.linkedin.metadata.resources.usage.UsageStats.queryRange(UsageStats.java:386)
            at jdk.internal.reflect.GeneratedMethodAccessor375.invoke(Unknown Source)
            at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.base/java.lang.reflect.Method.invoke(Method.java:566)
            at com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:177)
            at com.linkedin.restli.internal.server.RestLiMethodInvoker.invoke(RestLiMethodInvoker.java:333)
            at com.linkedin.restli.internal.server.filter.FilterChainDispatcherImpl.onRequestSuccess(FilterChainDispatcherImpl.java:47)
            at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:86)
            at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.lambda$onRequest$0(RestLiFilterChainIterator.java:73)
            at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753)
            at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731)
            at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108)
            at com.linkedin.restli.internal.server.filter.RestLiFilterChainIterator.onRequest(RestLiFilterChainIterator.java:72)
            at com.linkedin.restli.internal.server.filter.RestLiFilterChain.onRequest(RestLiFilterChain.java:55)
            at com.linkedin.restli.server.BaseRestLiServer.handleResourceRequest(BaseRestLiServer.java:262)
            at com.linkedin.restli.server.RestRestLiServer.handleResourceRequestWithRestLiResponse(RestRestLiServer.java:294)
            at com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:262)
            at com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:232)
            at com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:215)
            at com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:171)
            at com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:130)
            at com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)
            at com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)
            at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)
            at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)
            at com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)
            at com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)
            at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)
            at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)
            at com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)
            at com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)
            at com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:76)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)
            at com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)
            at com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)
            at com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:106)
            at com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)
            at com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)
            at com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)
            at com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)
            at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
            at com.linkedin.restli.server.RestliHandlerServlet.service(RestliHandlerServlet.java:21)
            at com.linkedin.restli.server.RestliHandlerServlet.handleRequest(RestliHandlerServlet.java:26)
            at org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)
            at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
            at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
            at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)
            at com.datahub.auth.authentication.filter.AuthenticationFilter.doFilter(AuthenticationFilter.java:102)
            at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
            at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
            at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
            at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
            at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
            at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
            at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
            at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
            at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
            at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
            at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
            at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
            at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
            at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
            at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
            at org.eclipse.jetty.server.Server.handle(Server.java:516)
            at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
            at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
            at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
            at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
            at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
            at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
            at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
            at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
            at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
            at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
            at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
            at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
            at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
            at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
            at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: com.datahub.util.exception.ESQueryException: Search query failed:
            at com.linkedin.metadata.timeseries.elastic.query.ESAggregatedStatsDAO.getAggregatedStats(ESAggregatedStatsDAO.java:375)
            at com.linkedin.metadata.timeseries.elastic.ElasticSearchTimeseriesAspectService.getAggregatedStats(ElasticSearchTimeseriesAspectService.java:216)
            at com.linkedin.metadata.resources.usage.UsageStats.getBuckets(UsageStats.java:182)
            at com.linkedin.metadata.resources.usage.UsageStats.lambda$query$1(UsageStats.java:348)
            at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)
            ... 89 common frames omitted
    Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
            at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
            at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
            at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1888)
            at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1645)
            at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1602)
            at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1572)
            at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1088)
            at com.linkedin.metadata.timeseries.elastic.query.ESAggregatedStatsDAO.getAggregatedStats(ESAggregatedStatsDAO.java:371)
            ... 93 common frames omitted
            Suppressed: org.elasticsearch.client.ResponseException: method [POST], host
  • b

    bland-orange-13353

    05/30/2023, 2:24 AM
    This message was deleted.
  • h

    hundreds-airline-29192

    05/30/2023, 2:27 AM
    I know this is open source, but why does it have so many bugs and no stable version? Besides, the support team does not provide timely support.
  • h

    hundreds-airline-29192

    05/30/2023, 2:53 AM
    Copy code
    think about it: you are demoing datahub to the company and boom! Unable to load the description of tables
    ✅ 1
  • b

    bitter-evening-61050

    05/30/2023, 5:50 AM
    Hi Team, I have DataHub running in Kubernetes. We have created one user from Kubernetes and are able to log in to DataHub with it, but using this user we are not able to add tokens, glossary terms, owners, etc. The Permissions and Policy tabs are missing. Error: "Failed to add: Unauthorized to perform this action. Please contact your DataHub administrator." Can anyone please help me resolve this issue?
    ✅ 1
  • m

    microscopic-room-90690

    05/30/2023, 6:06 AM
    Hi team, I'm wondering how to exclude specific paths using a regex pattern for the S3 source. "**/*test*/**" works, while "**/(^|_)(tmp|temp|test)(_|$)/**" does not. Can anyone help?
    ✅ 1
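    For reference, the exclude entries in the S3 source's path_specs are wildcard globs rather than full regular expressions, so alternation like (tmp|temp|test) is not expected to match; listing the variants as separate globs is the safer pattern. A sketch with a placeholder bucket and prefixes:
    Copy code
    source:
      type: s3
      config:
        path_specs:
          - include: 's3://my-bucket/data/*/*.parquet'
            exclude:
              - '**/tmp/**'
              - '**/temp/**'
              - '**/*test*/**'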
  • l

    lemon-scooter-69730

    05/30/2023, 11:06 AM
    Hello, while trying to ingest with the DataHub Kafka sink I keep getting this error:
    Copy code
    datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (bigquery): Missing provider configuration.
    This is what the recipe looks like
    Copy code
    pipeline_name: analytics
    source:
        type: bigquery
        config:
            env: DEV
            include_table_lineage: true
            include_usage_statistics: true
            include_tables: true
            include_views: true
            profiling:
                enabled: true
                profile_table_level_only: false
            stateful_ingestion:
                enabled: true
            credential:
                project_id: <redacted>
                private_key: <redacted>
                private_key_id: <redacted>
                client_email: <redacted>
                client_id: <redacted>
    sink:
        type: datahub-kafka
        config:
            connection:
                bootstrap: 'datahub-prerequisites-kafka:9092'
                schema_registry_url: '<http://datahub-prerequisites-cp-schema-registry:8081>'
    ✅ 1
  • m

    microscopic-elephant-47912

    05/30/2023, 12:08 PM
    Untitled.txt
    ✅ 1
  • m

    microscopic-elephant-47912

    05/30/2023, 12:09 PM
    Hi team, I'm using the quickstart on Docker, and after ingesting Looker metadata I cannot see any of it in the UI. I checked Kafka and can see the events in the topics, and when I checked the GMS Docker logs I saw many consumer errors like the ones above.
  • n

    narrow-bear-42430

    05/30/2023, 2:57 PM
    Hi DataHub folks - this was asked a while ago, but I was wondering if anyone in the community has done any work to integrate Thoughtspot with DataHub? Any pointers/ info would be gratefully received! Thank you
    ✅ 1
  • a

    astonishing-father-13229

    05/30/2023, 8:37 PM
    Hi Team, I'm facing a build issue for datahub. Steps to reproduce: clone the datahub repository, cd datahub, ../gradlew build. Screenshots are attached in the thread for reference. Could you please advise me? Thanks in advance 🙏
    ✅ 1
  • g

    great-rainbow-70545

    05/30/2023, 9:52 PM
    I've been working through a bunch of permutations trying to get Hive ingestion working. There is no auth, access is controlled via security groups, and I have verified that I can connect from the container. The error is always: Command failed: TSocket read 0 bytes. Not finding much via Google other than a possibly wrong Thrift version, but I figured that would be coming up for other people as well. Ring any bells?
  • f

    few-air-34037

    05/31/2023, 4:44 AM
    The new Power BI ingestion uses platform_instance tags for lineage. We haven't used platform_instance yet, but we have added a lot of metadata to objects... If we now add platform_instance, we get a new hierarchy and new objects. What would be the easiest way to migrate metadata from the old objects without platform_instance to the new ones with it?
    ✅ 1
  • c

    cool-architect-34612

    05/31/2023, 5:01 AM
    Hi, I want to ingest Presto datasets but something is slow: the platform was ingested first and the datasets were ingested afterwards. Why does it work like this?
    ✅ 1