# troubleshoot
  • q

    quaint-barista-82836

    01/23/2023, 5:52 PM
    Hi Team, I got the below message for the BQ ingestion pipeline run; I ran this with standard parameters with table profiling enabled:
    Copy code
    '[2023-01-23 17:42:42,108] WARNING  {py.warnings:109} - '
               '/tmp/datahub/ingest/venv-bigquery-0.9.6/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py:937: '
               'DeprecationWarning: Call to deprecated function (or staticmethod) wrap_aspect_as_workunit. (use '
               'MetadataChangeProposalWrapper(...).as_workunit() instead)\n'
               '  wu = wrap_aspect_as_workunit(\n'
               '\n'
               '[2023-01-23 17:42:42,110] WARNING  {py.warnings:109} - '
               '/tmp/datahub/ingest/venv-bigquery-0.9.6/lib/python3.10/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py:957: '
               'DeprecationWarning: Call to deprecated function (or staticmethod) wrap_aspect_as_workunit. (use '
               'MetadataChangeProposalWrapper(...).as_workunit() instead)\n'
               '  wu = wrap_aspect_as_workunit("dataset", dataset_urn, "subTypes", subTypes)\n'
               '\n'
               '[2023-01-23 17:42:42,190] DEBUG    {datahub.emitter.rest_emitter:250} - Attempting to emit to DataHub GMS; using curl equivalent to:\n',
               '2023-01-23 17:42:42.336687 [exec_id=96401624-f6b0-46e7-98c9-836345181165] INFO: Caught exception EXECUTING '
               'task_id=96401624-f6b0-46e7-98c9-836345181165, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/asyncio/streams.py", line 525, in readline\n'
               '    line = await self.readuntil(sep)\n'
               '  File "/usr/local/lib/python3.10/asyncio/streams.py", line 620, in readuntil\n'
               '    raise exceptions.LimitOverrunError(\n'
               'asyncio.exceptions.LimitOverrunError: Separator is found, but chunk is longer than limit\n'
               '\n'
               'During handling of the above exception, another exception occurred:\n'
               '\n'
               'Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 147, in execute\n'
               '    await tasks.gather(_read_output_lines(), _report_progress(), _process_waiter())\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 99, in _read_output_lines\n'
               '    line_bytes = await ingest_process.stdout.readline()\n'
               '  File "/usr/local/lib/python3.10/asyncio/streams.py", line 534, in readline\n'
               '    raise ValueError(e.args[0])\n'
               'ValueError: Separator is found, but chunk is longer than limit\n']}
    Execution finished with errors.
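    For context on the error above: asyncio's `StreamReader.readline()` raises `LimitOverrunError` (re-raised as `ValueError`) whenever a single line from the subprocess is longer than the reader's buffer limit (64 KiB by default), which can happen when the ingestion subprocess emits a very large single-line log entry. A minimal stdlib-only reproduction, unrelated to any DataHub code, just to illustrate the mechanism:
    ```python
    import asyncio


    async def main() -> None:
        # Spawn a child process that prints one very long line (~100 KB).
        proc = await asyncio.create_subprocess_exec(
            "python3", "-c", "print('x' * 100_000)",
            stdout=asyncio.subprocess.PIPE,
            # limit=1_000_000,  # a larger StreamReader limit would avoid the error
        )
        assert proc.stdout is not None
        try:
            line = await proc.stdout.readline()
            print(f"read {len(line)} bytes")
        except ValueError as exc:
            # With the default 64 KiB limit this typically fails with a message like
            # "Separator is found, but chunk is longer than limit".
            print(f"readline failed: {exc}")
        finally:
            await proc.wait()


    asyncio.run(main())
    ```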
  • c

    cool-fireman-87485

    01/23/2023, 5:58 PM
    Hi all! Using the UI I tried to create some lineage through assets and it works perfectly. Now that I want to modify the lineage I created, I realize that it is impossible to delete the relationships. I think it is a real bug... in fact, when I remove the upstream/downstream a pop-up "Lineage updated!" appears, but after reloading the UI page the relation is still there... Has anyone experienced this?
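    If the UI won't drop the edge, one possible workaround (a sketch, assuming the UI edit ends up in the dataset's upstreamLineage aspect; the URNs and the GMS address are placeholders) is to overwrite that aspect programmatically so it only contains the upstreams you want to keep:
    ```python
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetLineageTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    # Placeholder URNs: the downstream dataset and the upstreams that should remain.
    downstream = make_dataset_urn(platform="hive", name="db.downstream_table", env="PROD")
    keep = [
        UpstreamClass(
            dataset=make_dataset_urn(platform="hive", name="db.upstream_table", env="PROD"),
            type=DatasetLineageTypeClass.TRANSFORMED,
        )
    ]  # an empty list removes all upstream edges

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=downstream,
        aspectName="upstreamLineage",
        aspect=UpstreamLineageClass(upstreams=keep),
    )

    # Placeholder GMS address; an upsert replaces the aspect as a whole.
    DatahubRestEmitter(gms_server="http://localhost:8080").emit_mcp(mcp)
    ```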
  • q

    quaint-barista-82836

    01/23/2023, 10:00 PM
    Hi Team, at multiple stages I am getting the below error when ingesting BigQuery metadata from the CLI:
    Copy code
    Does your service account has bigquery.tables.list, bigquery.routines.get, bigquery.routines.list permission, bigquery.tables.getData permission? The error was: 'type'
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO - Traceback (most recent call last):
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -   File "/tmp/venv45wzxte5/lib/python3.8/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 587, in _process_project
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -     yield from self._process_schema(conn, project_id, bigquery_dataset)
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -   File "/tmp/venv45wzxte5/lib/python3.8/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 702, in _process_schema
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -     yield from self._process_table(conn, table, project_id, dataset_name)
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -   File "/tmp/venv45wzxte5/lib/python3.8/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 735, in _process_table
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -     yield from self.gen_table_dataset_workunits(table, project_id, schema_name)
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -   File "/tmp/venv45wzxte5/lib/python3.8/site-packages/datahub/ingestion/source/bigquery_v2/bigquery.py", line 774, in gen_table_dataset_workunits
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -     custom_properties["time_partitioning"] = str(table.time_partitioning)
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -   File "/tmp/venv45wzxte5/lib/python3.8/site-packages/google/cloud/bigquery/table.py", line 2689, in __repr__
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -     key_vals = ["{}={}".format(key, val) for key, val in self._key()]
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -   File "/tmp/venv45wzxte5/lib/python3.8/site-packages/google/cloud/bigquery/table.py", line 2665, in _key
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO -     properties["type_"] = repr(properties.pop("type"))
    [2023-01-23, 21:55:49 UTC] {process_utils.py:168} INFO - KeyError: 'type'
    The service account has access granted as per https://datahubproject.io/docs/quick-ingestion-guides/bigquery/setup/ and I am on v0.9.6.1.
  • l

    limited-library-89060

    01/24/2023, 2:26 AM
    Hi team, we want to integrate our Great Expectations results into DataHub. Previously we got these errors:
    Copy code
    Datasource test_datasource is not present in platform_instance_map
    argument of type 'NoneType' is not iterable
    After we added the datasource to the platform_instance_map in the payload, the first error no longer appears, but the second one is still there. We are using custom queries to create a dataset test, and use
    expect_table_row_count_to_equal
    to check whether it passes. Any help would be appreciated.
  • f

    flat-table-17463

    01/24/2023, 6:54 AM
    Hi all, we want to set table descriptions when importing metadata by using transformers. However, we could not get the table descriptions to work using custom transformers as mentioned in the documentation. How can we do this?
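    For what it's worth, here is a rough outline of a description-setting transformer, following the custom-transformer pattern from the docs. The base-class hooks used here (`entity_types`, `aspect_name`, `transform_aspect`, `create`) and the config shape are assumptions to verify against the transformer docs for your acryl-datahub version, and the urn-to-description map is purely hypothetical:
    ```python
    from typing import Dict, List, Optional

    from datahub.configuration.common import ConfigModel
    from datahub.emitter.mce_builder import Aspect
    from datahub.ingestion.api.common import PipelineContext
    from datahub.ingestion.transformer.base_transformer import (
        BaseTransformer,
        SingleAspectTransformer,
    )
    from datahub.metadata.schema_classes import DatasetPropertiesClass


    class SetDescriptionConfig(ConfigModel):
        # Hypothetical config: map of dataset urn -> description text.
        descriptions: Dict[str, str] = {}


    class SetDatasetDescription(BaseTransformer, SingleAspectTransformer):
        """Sketch: writes a description into each dataset's datasetProperties aspect."""

        def __init__(self, config: SetDescriptionConfig, ctx: PipelineContext):
            super().__init__()
            self.config = config
            self.ctx = ctx

        @classmethod
        def create(cls, config_dict: dict, ctx: PipelineContext) -> "SetDatasetDescription":
            return cls(SetDescriptionConfig.parse_obj(config_dict), ctx)

        def entity_types(self) -> List[str]:
            return ["dataset"]

        def aspect_name(self) -> str:
            return "datasetProperties"

        def transform_aspect(
            self, entity_urn: str, aspect_name: str, aspect: Optional[Aspect]
        ) -> Optional[Aspect]:
            description = self.config.descriptions.get(entity_urn)
            if description is None:
                return aspect  # leave datasets we have no description for untouched
            properties = aspect or DatasetPropertiesClass()
            properties.description = description
            return properties
    ```
    The class would then be referenced from the recipe's transformers section by its fully qualified module path, as described in the custom transformer guide.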
  • g

    gray-ocean-32209

    01/24/2023, 7:18 AM
    We are seeing ‘Unauthorized’
    Sorry, you are not authorized to access this page.
    on all assets after upgrading to 0.9.5. All content appears to be inaccessible with an “Unauthorized” message; even the admin user is not able to access any entities. We use OIDC for authentication. When we try to look at the policies at
    <datahub-url>/policies
    we only get a
    Copy code
    Unauthorized to perform this action. Please contact your DataHub administrator. (code 403)
    It was all working fine before the upgrade
  • b

    bland-balloon-48379

    01/24/2023, 4:53 PM
    Hey everyone! Lately my team has been seeing some issues in one of our datahub environments: it appears data is not being pushed to our graph database (neo4j community edition) when new items are ingested. The main example I have is the UpstreamLineage aspect. When ingesting a set of these aspects, we're seeing the data show up in mysql, but not in neo4j. Additionally, when we hard delete the entity from datahub using the CLI, it is removed from mysql but not from neo4j. However, the connection between the gms service and neo4j seems to be working fine for standard queries, because whatever data is present in neo4j is visible in the frontend UI. The following are the steps and results from identifying and debugging this issue, to give you all a timeline:
    1. Ingested new dataset entities. They appeared in mysql, neo4j, and the UI.
    2. Ingested lineage data for these new datasets. All of the lineage appeared in mysql, but only a subset of the lineage appeared in neo4j & the UI (seemingly all oracle tables).
    3. Reindexed a single urn for a downstream dataset. The DownstreamOf relationship now appears in neo4j for the reindexed dataset, and the correct lineage is shown in the UI.
    4. Ran the RestoreIndices kubernetes job for all aspects. The job ran for ~9 hours and completed successfully, however no new relationships appeared in neo4j or the UI.
    5. Restarted neo4j, no effect.
    6. Manually added one of the missing edges to neo4j. The correct lineage then appeared in the UI.
    7. Did a hard delete on one of the dataset entities. The dataset was deleted from mysql and elasticsearch and was no longer present in the UI, however the node and relationships were still present in neo4j.
    8. From this point on we switched over to the kafka emitter, as the rest emitter was seemingly related to similar problems in the past.
    9. Reingested the deleted dataset. It reappeared in the UI with the partial lineage info it had before being deleted.
    10. Manually deleted the lineage relationships from neo4j for that dataset and reingested the UpstreamLineage aspect. The aspect appeared in mysql, but the relationships were not recreated in neo4j or the UI.
    11. Tried several combinations of restarting datahub, restarting neo4j, reindexing, and reingesting. No effect.
    12. We've also seen some validation aspects be created in mysql after ingestion but not appear in the UI.
    We've seen an issue like this pop up in the past that appeared to be related to the REST sink. The REST sink was used for the first seven steps of this timeline, but we have switched to the kafka emitter now. When similar issues occurred in the past we were able to resolve them by reindexing the database and restarting our graph db a few times, but that does not appear to be working here. If anyone has any thoughts or ideas regarding directions to move in on this issue, I'd love to hear them. Thanks in advance!
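    One way to narrow down whether the write path (GMS to neo4j) or the read path is at fault is to query neo4j directly for the edges that should exist. Below is a small sketch with the official neo4j Python driver; the bolt URL, credentials, and URN are placeholders, and the assumption that nodes carry a `urn` property with `DownstreamOf` edges pointing from the downstream dataset to its upstream should be verified against your own graph:
    ```python
    from neo4j import GraphDatabase

    URI = "bolt://localhost:7687"   # placeholder
    AUTH = ("neo4j", "datahub")     # placeholder credentials
    DATASET_URN = "urn:li:dataset:(urn:li:dataPlatform:oracle,my_db.my_table,PROD)"  # placeholder

    # Assumed model: (downstream dataset)-[:DownstreamOf]->(upstream dataset),
    # with each node keyed by a `urn` property.
    CYPHER = """
    MATCH (d {urn: $urn})-[:DownstreamOf]->(u)
    RETURN u.urn AS upstream_urn
    """

    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            upstreams = [record["upstream_urn"] for record in session.run(CYPHER, urn=DATASET_URN)]

    # An empty list here while the upstreamLineage aspect exists in mysql points
    # at the GMS -> neo4j write path rather than the UI/read path.
    print(upstreams)
    ```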
  • a

    able-evening-90828

    01/24/2023, 10:43 PM
    What is the best way to retrieve a list of child glossary terms under a glossary node using GraphQL? The following query didn't work:
    Copy code
    query childGlossaryTerms {
      searchAcrossEntities(input: {
        types: [GLOSSARY_TERM], 
        query: "",
        orFilters: {
          and: {
            field: "parentNodes",
            values: ["urn:li:glossaryNode:data-type"],
          }
        }
      }) {
        searchResults {
          entity {
            urn
            type
          }
        }
      }
    }
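    In case the search-filter route keeps returning nothing, here is a hedged alternative sketch: walk the node's incoming relationships instead. This assumes the `IsPartOf` relationship name used by the glossary model and a placeholder GMS address, and goes through the Python graph client:
    ```python
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # placeholder

    QUERY = """
    query childTerms($urn: String!) {
      glossaryNode(urn: $urn) {
        children: relationships(
          input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
        ) {
          total
          relationships {
            entity {
              urn
              type
            }
          }
        }
      }
    }
    """

    result = graph.execute_graphql(QUERY, variables={"urn": "urn:li:glossaryNode:data-type"})
    for rel in result["glossaryNode"]["children"]["relationships"]:
        print(rel["entity"]["type"], rel["entity"]["urn"])
    ```
    Note this may return child glossary nodes as well as terms, so filter on the returned type if only terms are needed.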
  • b

    best-wire-59738

    01/25/2023, 6:00 AM
    Hello Team, I have a small doubt. We have implemented a custom authenticator plugin in such a way that it returns a different user URN for users belonging to different domains after authentication, so that a user cannot change datasets that belong to another domain. This works fine for the GraphQL API, but when a user hits OpenAPI to add or delete a dataset, they are able to do it without any domain restriction. I would like to know: are policies not considered when we use OpenAPI?
  • a

    average-dinner-25106

    01/25/2023, 7:07 AM
    Hi, I am trying to upload images to the documentation. However, as the screenshot shows, the image stored in DataHub does not appear. What's the problem? FYI, I ran datahub quickstart.
  • b

    brief-ability-41819

    01/25/2023, 10:31 AM
    Hello, Is it possible that DataHub uses two versions of entities in API calls? When I run commands via CURL it works properly:
    Copy code
    curl -X 'GET' 'https://DATAHUB_URL/openapi/entities/v1/latest?urns=MY_URN' -H 'accept: application/json' --header 'Authorization: Bearer MY_TOKEN' | jq
    but when I’m trying to access the same data with:
    Copy code
    datahub --debug get --urn "urn:li:dataset:(MY_URN)" --aspect ownership
    it throws 404:
    404 Client Error: Not Found for url: <https://DATAHUB_URL/openapi/entitiesV2/MY_URN?aspects=List(ownership)>
    SwaggerUI shows only
    /entities/v1
    and my suspicion is that it tries to reach
    /entities/v2
    via CLI - is there any flag to set it?
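    While the CLI question gets sorted out, one alternative sketch for reading the ownership aspect is the Python graph client pointed at GMS; the server, token, and URN below are placeholders mirroring the ones above:
    ```python
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    # Point this at GMS, not the frontend; values are placeholders.
    graph = DataHubGraph(DatahubClientConfig(server="https://DATAHUB_URL", token="MY_TOKEN"))

    ownership = graph.get_ownership(entity_urn="urn:li:dataset:(MY_URN)")
    if ownership is None:
        print("no ownership aspect found")
    else:
        for owner in ownership.owners:
            print(owner.owner, owner.type)
    ```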
  • e

    elegant-salesmen-99143

    01/25/2023, 3:37 PM
    I have a problem with stateful ingestion. It wasn’t enabled when we initially ingested the datasource (Hive). I enabled it now, but DataHub still displays tables that are long gone. It says ‘Last synchronized 4 months ago’ next to them, so we know that’s when they last existed, but it still doesn’t soft-delete them :( What can I do to clean up all the old deleted tables? I’m on 0.9.6.1 and my ingest recipe looks like this:
    Copy code
    sink:
        type: datahub-rest
        config:
            server: '***'
    source:
        type: hive
        config:
            host_port: '***:10000'
            env: PROD
            username: ***
            include_tables: true
            include_views: true
            stateful_ingestion:
                enabled: true
                remove_stale_metadata: true
    transformers:
        -
            type: set_dataset_browse_path
            config:
                replace_existing: true
                path_templates:
                    - /ENV/PLATFORM/DATASET_PARTS
    pipeline_name: 'urn:li:dataHubIngestionSource:***'
  • a

    acceptable-restaurant-2734

    01/25/2023, 7:51 PM
    Silly question, but if I'm running ingestion through the CLI using Docker with localhost:8080 as the sink, why can I not see the metadata I ingested from BQ in the UI?
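    One thing worth checking, stated as a hedge since the setup isn't visible here: in the quickstart compose file, localhost:8080 is GMS (the REST sink target), while the browsable UI is served by datahub-frontend on localhost:9002, so the ingested BigQuery metadata would be searched for at http://localhost:9002. A quick sanity check:
    ```python
    import requests

    # GMS (the REST sink target in the recipe) answers on 8080 in quickstart.
    gms = requests.get("http://localhost:8080/config", timeout=10)
    print("GMS:", gms.status_code, gms.headers.get("content-type"))

    # The UI itself is served by the datahub-frontend container on 9002.
    ui = requests.get("http://localhost:9002", timeout=10)
    print("Frontend:", ui.status_code)
    ```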
  • h

    helpful-fish-88957

    01/25/2023, 8:22 PM
    Hi all, quickstart started failing for me yesterday, with the following error:
    Copy code
    Unable to run quickstart - the following issues were detected:
    - kafka-setup container is not present
    I suspect it's related to the changes in this PR: https://github.com/datahub-project/datahub/pull/7073 based on the timing and the fact that it has to do with kafka/quickstart -- but I'm pretty new to DataHub, so advice on how to proceed would be appreciated. Thanks!
  • f

    faint-hair-91313

    01/26/2023, 8:17 AM
    Dear all, sometimes we see slight delays (up to 5 seconds) in getting everything to load in the UI, or when navigating through Datasets. It does not always happen; sometimes it is instant. Is there a way to improve performance by allocating more resources to the containers, etc.?
  • e

    early-student-2446

    01/26/2023, 10:28 AM
    Hi all, I would like to test my DataHub SQL backup. Prior to starting a restore process I was trying to follow this, but I’m getting:
    Copy code
    error: unknown object type *v1beta1.CronJob
    I’m currently using k8s version:
    Copy code
    Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.14", GitCommit:"89182bdd065fbcaffefec691908a739d161efc03", GitTreeState:"clean", BuildDate:"2020-12-18T12:02:35Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
    Are you familiar with this?
  • e

    echoing-needle-51090

    01/26/2023, 1:48 PM
    Hi all, I would like to know if there is any way to reduce RAM usage. I have just run a single ingestion pipeline and it consumed around 300 MiB, which I consider too much.
  • a

    ancient-kite-60433

    01/26/2023, 2:12 PM
    Hi all, we've been running DataHub for 14 days using docker quickstart, but today our DataHub front end home page started showing a big red error message:
    Oops, an error occurred. This exception has been logged with id xxxxxxxx
    (No login page is shown, only the error message.) We have restarted the quickstart container, have also rebooted the VM, and have followed the advice in https://datahubproject.io/docs/debugging/#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart
    • datahub docker check
    returned that everything was OK
    • docker logs datahub-frontend-react
    returned the following errors:
    Copy code
    play.api.UnexpectedException: Unexpected exception[ServerResultException: HTTP 1.0 client does not support chunked response]
            at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:358)
            at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:264)
            at play.core.server.common.ServerResultUtils.validateResult(ServerResultUtils.scala:69)
            at play.core.server.akkahttp.AkkaModelConversion.$anonfun$convertResult$1(AkkaModelConversion.scala:193)
            at play.core.server.common.ServerResultUtils.resultConversionWithErrorHandling(ServerResultUtils.scala:195)
            at play.core.server.akkahttp.AkkaModelConversion.convertResult(AkkaModelConversion.scala:215)
            at play.core.server.AkkaHttpServer.$anonfun$runAction$5(AkkaHttpServer.scala:440)
            at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
            at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
            at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
            at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
            at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
            at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
            at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
            at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
            at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
            at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
            at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
            at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
            at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
            at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
            at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
    Caused by: play.core.server.common.ServerResultException: HTTP 1.0 client does not support chunked response
            at play.core.server.common.ServerResultUtils.validateResult(ServerResultUtils.scala:68)
            ... 19 common frames omitted
    2023-01-26 13:44:29,799 [application-akka.actor.default-dispatcher-19] ERROR p.api.http.DefaultHttpErrorHandler -
    ! @80d92mm8g - Internal server error, for (GET) [/] ->
    
    play.api.UnexpectedException: Unexpected exception[ServerResultException: HTTP 1.0 client does not support chunked response]
            at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:358)
            at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:264)
            at play.core.server.common.ServerResultUtils.validateResult(ServerResultUtils.scala:69)
            at play.core.server.akkahttp.AkkaModelConversion.$anonfun$convertResult$1(AkkaModelConversion.scala:193)
            at play.core.server.common.ServerResultUtils.resultConversionWithErrorHandling(ServerResultUtils.scala:195)
            at play.core.server.akkahttp.AkkaModelConversion.convertResult(AkkaModelConversion.scala:215)
            at play.core.server.AkkaHttpServer.$anonfun$runAction$5(AkkaHttpServer.scala:440)
            at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
            at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
            at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
            at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
            at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
            at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
            at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
            at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
            at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
            at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
            at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
            at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
            at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
            at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
            at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
    Caused by: play.core.server.common.ServerResultException: HTTP 1.0 client does not support chunked response
            at play.core.server.common.ServerResultUtils.validateResult(ServerResultUtils.scala:68)
            ... 19 common frames omitted
    2023-01-26 13:44:29,863 [application-akka.actor.default-dispatcher-19] ERROR p.api.http.DefaultHttpErrorHandler -
    ! @80d92mmb1 - Internal server error, for (GET) [/favicon.ico] ->
    Would greatly appreciate any suggestions. Thanks!
  • b

    bland-orange-13353

    01/26/2023, 2:13 PM
    If you’re having trouble with quickstart, please make sure you’re using the most up-to-date version of DataHub by following the steps in the quickstart deployment guide: https://datahubproject.io/docs/quickstart/#deploying-datahub. Specifically, ensure you’re up to date with the DataHub CLI:
    Copy code
    python3 -m pip install --upgrade pip wheel setuptools
    python3 -m pip install --upgrade acryl-datahub
    datahub version
  • r

    rhythmic-quill-75064

    01/26/2023, 2:33 PM
    Hello team. The transition from version 0.2.105 to version 0.2.106 fails. The datahub-elasticsearch-setup-job is failing, here is the log:
    Copy code
    2023/01/26 14:22:48 Waiting for: http://elasticsearch-master:9200
    Going to use protocol: http
    Going to use default elastic headers
    Create datahub_usage_event if needed against Elasticsearch at elasticsearch-master:9200
    Going to use index prefix::
    2023/01/26 14:22:48 Received 200 from http://elasticsearch-master:9200
    Policy GET response code is
    Got response code  while creating policy so exiting.
    curl: option -k http://elasticsearch-master:9200/_ilm/policy/datahub_usage_event_policy: is unknown
    curl: try 'curl --help' or 'curl --manual' for more information
    /create-indices.sh: line 41: [: -eq: unary operator expected
    /create-indices.sh: line 45: [: -eq: unary operator expected
    /create-indices.sh: line 47: [: -eq: unary operator expected
    2023/01/26 14:22:48 Command exited with error: exit status 1
    Any ideas?
  • a

    aloof-father-61672

    01/26/2023, 2:47 PM
    Hello everyone. I'm attempting to generate a list of "pipeline" URNs, but I receive no results. My script works fine with
    dataset
    entities but not with `dataflow`/`datajob` entities. Is this a bug? I even tried making use of
    datahub.cli.cli_utils.get_urns_by_filter
    (see https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/cli/cli_utils.py) with the same output. I also tried the entity types dataFlow/dataJob; the number of entities returned is zero. URL: DataHub GMS host +
    /entities?action=search
    Payload:
    Copy code
    {
      "input": "*",
      "entity": "dataflow",
      "start": 0,
      "count": 100,
      "filter": {
        "or": [
          {
            "and": [
              {
                "field": "origin",
                "value": "DEV",
                "condition": "EQUAL"
              },
              {
                "field": "platform",
                "value": "urn:li:dataPlatform:my-platform",
                "condition": "EQUAL"
              }
            ]
          }
        ]
      }
    }
    Response
    Copy code
    {
      "value": {
        "numEntities": 0,
        "pageSize": 100,
        "from": 0,
        "metadata": {
          "aggregations": [
            {
              "name": "origin",
              "filterValues": [],
              "aggregations": {},
              "displayName": "origin"
            },
            {
              "name": "platform",
              "filterValues": [],
              "aggregations": {},
              "displayName": "Platform"
            }
          ]
        },
        "entities": []
      }
    }
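    As a cross-check that the dataFlow/dataJob entities are searchable at all, here is a sketch that lists them through the GraphQL `searchAcrossEntities` endpoint instead of the Rest.li `/entities?action=search` call (the server URL is a placeholder); if this returns results, the filter fields in the Rest.li payload are the likely culprit:
    ```python
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    # Placeholder GMS address; use the same server the script already talks to.
    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    QUERY = """
    query listFlowsAndJobs {
      searchAcrossEntities(
        input: { types: [DATA_FLOW, DATA_JOB], query: "*", start: 0, count: 100 }
      ) {
        total
        searchResults {
          entity {
            urn
            type
          }
        }
      }
    }
    """

    result = graph.execute_graphql(QUERY)
    print("total:", result["searchAcrossEntities"]["total"])
    for hit in result["searchAcrossEntities"]["searchResults"]:
        print(hit["entity"]["type"], hit["entity"]["urn"])
    ```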
  • q

    quick-pizza-8906

    01/26/2023, 5:29 PM
    Hello, after upgrading my deployment to version 0.9.6.1 (from 0.9.1), my Tableau ingestor stopped working: at the end of ingestion it produces the error
    Remote end closed connection without response
    (see attached log). I noticed that my deployment versioned 0.9.1 uses
    tableauserverclient
    version
    0.19.0
    while the newer one used
    0.23.4
    - I downgraded it on my newer deployment to
    0.19.0
    only to see the same exception... Note that my existing 0.9.1 deployment connects to the Tableau server just fine, so it's not a matter of networking or the server being down. Was there any significant change applied to the Tableau connector which could have caused this? Is anybody else seeing similar problems?
    tableau_problem.log
  • n

    nutritious-bird-77396

    01/26/2023, 5:42 PM
    Hi Team, After upgrading my datahub version from
    0.8.43
    to
    0.9.6.1
    I am facing errors with reindexing...
    Copy code
    17:30:57 [main] INFO  c.l.m.s.e.i.ESIndexBuilder - Reindexing dataset_operationaspect_v1 to dataset_operationaspect_v1_1674751305780 task has completed, will now check if reindex was successful
    17:31:00 [main] INFO  c.l.m.s.e.i.ESIndexBuilder - Post-reindex document count is different, source_doc_count: 34822915 reindex_doc_count: 15463000
    17:31:00 [main] WARN  o.s.w.c.s.XmlWebApplicationContext - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'metadataChangeLogProcessor' defined in URL [jar:file:/tmp/jetty-0_0_0_0-8080-war_war-_-any-3785592998662924994/webapp/WEB-INF/lib/mae-consumer.jar!/com/linkedin/metadata/kafka/MetadataChangeLogProcessor.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'updateIndicesHook' defined in URL [jar:file:/tmp/jetty-0_0_0_0-8080-war_war-_-any-3785592998662924994/webapp/WEB-INF/lib/mae-consumer.jar!/com/linkedin/metadata/kafka/hook/UpdateIndicesHook.class]: Bean instantiation via constructor failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.kafka.hook.UpdateIndicesHook]: Constructor threw exception; nested exception is java.lang.RuntimeException: Reindex from dataset_operationaspect_v1 to dataset_operationaspect_v1_1674751305780 failed
    17:31:00 [main] INFO  c.l.r.t.h.c.c.AbstractNettyClient - Shutdown requested
    17:31:00 [main] INFO  c.l.r.t.h.c.c.AbstractNettyClient - Shutting down
    Has anybody else faced this issue? Any tips would help.
  • a

    able-evening-90828

    01/26/2023, 11:27 PM
    The
    andFilter
    in the
    orFilters
    in
    SearchInput
    seems to require all fields of a dataset to match the `andFilter`'s condition. Otherwise, the dataset won't be returned. For example, say we have a dataset that has the following columns and tags defined
    Copy code
    col1: [tagA, tagB]
    col2: [tagA]
    If I run the GraphQL query below, the dataset is not returned, even though
    col2
    satisfies the filter condition.
    Copy code
    query searchDataset {
      search(input: {
        type: DATASET, 
        query: "", 
        start: 0, 
        count: 1000,
        orFilters: [
          {
            and: [
              {
                field: "fieldTags",
                values: ["urn:li:tag:tagA"]
                condition: CONTAIN
              }
              {
                field: "fieldTags",
                values: ["urn:li:tag:tagB"]
                condition: CONTAIN
                negated: true
              }
            ]
          }
        ]
      }) {
        start
        count
        total
        searchResults {
          entity {
            urn
            type
          }
        }
      
    }
    }
    What I want is if at least one column satisfies the tag filter condition, then the dataset should be returned. How can I achieve this?
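    Since the behaviour above suggests the search document carries field tags at the dataset level rather than per column, one hedged way to get per-column semantics is to post-filter client-side: search for datasets containing tagA at all, then keep only those with at least one column that has tagA and not tagB. A sketch (the server URL is a placeholder; note that tags added to columns through the UI may live in editableSchemaMetadata rather than schemaMetadata):
    ```python
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # placeholder

    SEARCH = """
    query datasetsWithTagA {
      search(input: { type: DATASET, query: "", start: 0, count: 1000,
                      orFilters: [{ and: [{ field: "fieldTags",
                                            values: ["urn:li:tag:tagA"],
                                            condition: CONTAIN }] }] }) {
        searchResults { entity { urn } }
      }
    }
    """

    FIELDS = """
    query fieldTags($urn: String!) {
      dataset(urn: $urn) {
        schemaMetadata {
          fields {
            fieldPath
            tags { tags { tag { urn } } }
          }
        }
      }
    }
    """

    matches = []
    for hit in graph.execute_graphql(SEARCH)["search"]["searchResults"]:
        urn = hit["entity"]["urn"]
        dataset = graph.execute_graphql(FIELDS, variables={"urn": urn}).get("dataset") or {}
        fields = (dataset.get("schemaMetadata") or {}).get("fields") or []
        for field in fields:
            tag_urns = {
                assoc["tag"]["urn"]
                for assoc in ((field.get("tags") or {}).get("tags") or [])
            }
            # Keep the dataset if any single column has tagA but not tagB.
            if "urn:li:tag:tagA" in tag_urns and "urn:li:tag:tagB" not in tag_urns:
                matches.append(urn)
                break

    print(matches)
    ```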
  • b

    bland-orange-13353

    01/27/2023, 12:57 AM
    If you’re having trouble with quickstart, please make sure you’re using the most up-to-date version of DataHub by following the steps in the quickstart deployment guide: https://datahubproject.io/docs/quickstart/#deploying-datahub. Specifically, ensure you’re up to date with the DataHub CLI:
    Copy code
    python3 -m pip install --upgrade pip wheel setuptools
    python3 -m pip install --upgrade acryl-datahub
    datahub version
  • r

    rhythmic-glass-37647

    01/27/2023, 1:28 AM
    Hi, I'm trying to set up ingestion from the CLI. I'm using a very simple YAML file but I keep getting
    PipelineInitError
    Any help would be appreciated!
  • b

    brief-ability-41819

    01/27/2023, 6:50 AM
    Hello, Is there a way of changing ClusterIP to LoadBalancer in this subchart: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/subcharts/acryl-datahub-actions/values.yaml ? I tried to apply it (of course running
    helm dep update
    before the upgrade itself) and it still shows the service as ClusterIP. I have a feeling that I’m missing something. FYI, we’re running DataHub 0.9.1 on EKS.
  • b

    best-wire-59738

    01/27/2023, 7:04 AM
    Hello Team, I noticed that datahub-frontend is not getting updated from GMS. For example, when I run ingestion from the UI I get a pop-up that the run was triggered, and in the actions logs I can see it is being ingested, but the UI is not updated with the latest run details. Also, I invited a new user using an invite link and the user does not show up in the Users tab in the UI. Could you please help debug the issue? We are running DataHub 0.9.6 on EKS.
  • a

    acceptable-terabyte-34789

    01/27/2023, 7:13 AM
    How can I delete a dataset from the CLI? I'm trying to use: datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:athena,xxx_exception,PROD)" --dry-run but it throws:
    Copy code
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 971, in json
        return complexjson.loads(self.text, **kwargs)
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 346, in loads
        return _default_decoder.decode(s)
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
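    For what it's worth, the `JSONDecodeError` means the server the CLI contacted returned a body that isn't JSON (for example an HTML error page or a proxy response). Besides the URN formatting, it may be worth checking what the configured GMS endpoint actually returns; a small sketch, assuming a default local GMS address (replace it with the server from `~/.datahubenv`):
    ```python
    import requests

    GMS = "http://localhost:8080"  # replace with the server configured in ~/.datahubenv

    resp = requests.get(f"{GMS}/config", timeout=10)
    print(resp.status_code, resp.headers.get("content-type"))
    # A healthy GMS returns a JSON body here; an HTML or empty response would
    # explain the "Expecting value: line 1 column 1 (char 0)" from the delete command.
    print(resp.text[:200])
    ```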
  • g

    gray-ocean-32209

    01/27/2023, 1:27 PM
    Hello Team, we are experimenting with the Airflow DataHub integration (Airflow lineage backend) using datahub quickstart and the datahub-airflow Docker setup: https://datahubproject.io/docs/docker/airflow/local_airflow
    Copy code
    [lineage]
    backend = datahub_provider.lineage.datahub.DatahubLineageBackend
    datahub_kwargs = {
        "datahub_conn_id": "datahub_rest_default",
        "cluster": "local_airflow",
        "capture_ownership_info": true,
        "capture_tags_info": true,
        "capture_executions": true,
        "graceful_exceptions": true }
    To see the run history of Airflow tasks in DataHub, we added
    "capture_executions": true
    Whenever we add this option and try to initialize Airflow with the command
    docker-compose up airflow-init
    it fails with
    Copy code
    ....
    datahub-airflow-airflow-init-1  |     _backend = get_backend()
    datahub-airflow-airflow-init-1  |   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/lineage/__init__.py", line 61, in get_backend
    datahub-airflow-airflow-init-1  |     return clazz()
    datahub-airflow-airflow-init-1  |   File "/home/airflow/.local/lib/python3.9/site-packages/datahub_provider/lineage/datahub.py", line 64, in __init__
    datahub-airflow-airflow-init-1  |     _ = get_lineage_config()
    datahub-airflow-airflow-init-1  |   File "/home/airflow/.local/lib/python3.9/site-packages/datahub_provider/lineage/datahub.py", line 35, in get_lineage_config
    datahub-airflow-airflow-init-1  |     return DatahubLineageConfig.parse_obj(kwargs)
    datahub-airflow-airflow-init-1  |   File "pydantic/main.py", line 511, in pydantic.main.BaseModel.parse_obj
    datahub-airflow-airflow-init-1  |   File "pydantic/main.py", line 331, in pydantic.main.BaseModel.__init__
    datahub-airflow-airflow-init-1  | pydantic.error_wrappers.ValidationError: 1 validation error for DatahubLineageConfig
    datahub-airflow-airflow-init-1  | capture_executions
    I’m running
    acryldata/airflow-datahub:latest
    image. Is `capture_executions` not supported?