# troubleshoot
  • rich-restaurant-61261 (06/28/2023, 8:07 AM)
    Hi Team, I am trying to get the table description using the Python SDK, but the output returns null for the table. Does anyone know why that is?
    ```
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    dataset_urn = make_dataset_urn(
        platform="trino", name="cass.data_insights.activity", env="PROD"
    )

    gms_endpoint = "http://localhost:8080"
    graph = DataHubGraph(DatahubClientConfig(server=gms_endpoint))

    # Query multiple aspects from entity
    result = graph.get_aspects_for_entity(
        entity_urn=dataset_urn,
        aspects=["datasetProperties"],
        aspect_types=[DatasetPropertiesClass],
    )["datasetProperties"]

    if result:
        print(result.description)
    ```
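    If this prints nothing, the URN may simply not resolve (platform, env, and name casing have to match the ingested URN exactly), or the description may live in a different aspect. A minimal sketch to narrow it down, assuming a recent acryl-datahub SDK (graph.exists and graph.get_aspect are available in recent versions):
    ```
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import (
        DatasetPropertiesClass,
        EditableDatasetPropertiesClass,
    )

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
    dataset_urn = make_dataset_urn(
        platform="trino", name="cass.data_insights.activity", env="PROD"
    )

    # 1. Check that the URN resolves at all; False points to a name/env/case mismatch.
    print(graph.exists(dataset_urn))

    # 2. Fetch the aspect directly; None means the aspect was never written.
    print(graph.get_aspect(entity_urn=dataset_urn, aspect_type=DatasetPropertiesClass))

    # 3. Descriptions edited in the UI land in editableDatasetProperties instead.
    print(graph.get_aspect(entity_urn=dataset_urn, aspect_type=EditableDatasetPropertiesClass))
    ```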
  • wide-energy-52884 (06/28/2023, 9:24 AM)
    Hi Team, I am trying to add the acryl-datahub-airflow-plugin plugin to the airflow:2.4.0 docker-compose file, but I get the issue below every time I run it. Page I am following: https://datahubproject.io/docs/lineage/airflow/
    ```
    Building wheel for backports.zoneinfo (pyproject.toml): finished with status 'error'
    airflow-airflow-webserver-1  |   error: subprocess-exited-with-error
    airflow-airflow-webserver-1  |
    airflow-airflow-webserver-1  |   × Building wheel for backports.zoneinfo (pyproject.toml) did not run successfully.
    airflow-airflow-webserver-1  |   │ exit code: 1
    airflow-airflow-webserver-1  |   ╰─> [49 lines of output]
    airflow-airflow-webserver-1  |       running bdist_wheel
    airflow-airflow-webserver-1  |       running build
    airflow-airflow-webserver-1  |       running build_py
    airflow-airflow-webserver-1  |       creating build
    airflow-airflow-webserver-1  |       creating build/lib.linux-aarch64-cpython-37
    airflow-airflow-webserver-1  |       creating build/lib.linux-aarch64-cpython-37/backports
    airflow-airflow-webserver-1  |       copying src/backports/__init__.py -> build/lib.linux-aarch64-cpython-37/backports
    airflow-airflow-webserver-1  |       creating build/lib.linux-aarch64-cpython-37/backports/zoneinfo
    airflow-airflow-webserver-1  |       copying src/backports/zoneinfo/_zoneinfo.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
    airflow-airflow-webserver-1  |       copying src/backports/zoneinfo/__init__.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
    airflow-airflow-webserver-1  |       copying src/backports/zoneinfo/_common.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
    airflow-airflow-webserver-1  |       copying src/backports/zoneinfo/_tzpath.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
    airflow-airflow-webserver-1  |       copying src/backports/zoneinfo/_version.py -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
    airflow-airflow-webserver-1  |       running egg_info
    airflow-airflow-webserver-1  |       writing src/backports.zoneinfo.egg-info/PKG-INFO
    airflow-airflow-webserver-1  |       writing dependency_links to src/backports.zoneinfo.egg-info/dependency_links.txt
    airflow-airflow-webserver-1  |       writing requirements to src/backports.zoneinfo.egg-info/requires.txt
    airflow-airflow-webserver-1  |       writing top-level names to src/backports.zoneinfo.egg-info/top_level.txt
    airflow-airflow-webserver-1  |       reading manifest file 'src/backports.zoneinfo.egg-info/SOURCES.txt'
    airflow-airflow-webserver-1  |       reading manifest template 'MANIFEST.in'
    airflow-airflow-webserver-1  |       /tmp/pip-build-env-c1pa5htp/overlay/lib/python3.7/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
    airflow-airflow-webserver-1  |       !!
    airflow-airflow-webserver-1  |
    airflow-airflow-webserver-1  |               ********************************************************************************
    airflow-airflow-webserver-1  |               The license_file parameter is deprecated, use license_files instead.
    airflow-airflow-webserver-1  |
    airflow-airflow-webserver-1  |               By 2023-Oct-30, you need to update your project and remove deprecated calls
    airflow-airflow-webserver-1  |               or your builds will no longer be supported.
    airflow-airflow-webserver-1  |
    airflow-airflow-webserver-1  |               See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
    airflow-airflow-webserver-1  |               ********************************************************************************
    airflow-airflow-webserver-1  |
    airflow-airflow-webserver-1  |       !!
    airflow-airflow-webserver-1  |         parsed = self.parsers.get(option_name, lambda x: x)(value)
    airflow-airflow-webserver-1  |       warning: no files found matching '*.png' under directory 'docs'
    airflow-airflow-webserver-1  |       warning: no files found matching '*.svg' under directory 'docs'
    airflow-airflow-webserver-1  |       no previously-included directories found matching 'docs/_build'
    airflow-airflow-webserver-1  |       no previously-included directories found matching 'docs/_output'
    airflow-airflow-webserver-1  |       adding license file 'LICENSE'
    airflow-airflow-webserver-1  |       adding license file 'licenses/LICENSE_APACHE'
    airflow-airflow-webserver-1  |       writing manifest file 'src/backports.zoneinfo.egg-info/SOURCES.txt'
    airflow-airflow-webserver-1  |       copying src/backports/zoneinfo/__init__.pyi -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
    airflow-airflow-webserver-1  |       copying src/backports/zoneinfo/py.typed -> build/lib.linux-aarch64-cpython-37/backports/zoneinfo
    airflow-airflow-webserver-1  |       running build_ext
    airflow-airflow-webserver-1  |       building 'backports.zoneinfo._czoneinfo' extension
    airflow-airflow-webserver-1  |       creating build/temp.linux-aarch64-cpython-37
    airflow-airflow-webserver-1  |       creating build/temp.linux-aarch64-cpython-37/lib
    airflow-airflow-webserver-1  |       gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.7m -c lib/zoneinfo_module.c -o build/temp.linux-aarch64-cpython-37/lib/zoneinfo_module.o -std=c99
    airflow-airflow-webserver-1  |       error: command 'gcc' failed: Permission denied
    airflow-airflow-webserver-1  |       [end of output]
    airflow-airflow-webserver-1  |
    airflow-airflow-webserver-1  |   note: This error originates from a subprocess, and is likely not a problem with pip.
    airflow-airflow-webserver-1  |   ERROR: Failed building wheel for backports.zoneinfo
    airflow-airflow-webserver-1  | Successfully built avro click-default-group
    airflow-airflow-webserver-1  | Failed to build backports.zoneinfo
    airflow-airflow-webserver-1  | ERROR: Could not build wheels for backports.zoneinfo, which is required to install pyproject.toml-based projects
    airflow-airflow-scheduler-1  |
    airflow-airflow-scheduler-1  | [notice] A new release of pip available: 22.2.2 -> 23.1.2
    airflow-airflow-scheduler-1  | [notice] To update, run: python -m pip install --upgrade pip
    airflow-airflow-webserver-1  |
    airflow-airflow-webserver-1  | [notice] A new release of pip available: 22.2.2 -> 23.1.2
    airflow-airflow-webserver-1  | [notice] To update, run: python -m pip install --upgrade pip
    ```
    Appreciate your help!
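    The key line is error: command 'gcc' failed: Permission denied while building backports.zoneinfo, a package that only has to compile from source on Python < 3.9 (the default airflow:2.4.0 image ships Python 3.7, as the cpython-37 paths show). One hedged workaround is switching to an Airflow image variant with a newer Python so the package is never needed; the tag below is an assumption, so check Docker Hub for the variants available:
    ```
    # docker-compose.override.yml (sketch): Airflow publishes per-Python image
    # variants; on Python >= 3.9 the stdlib zoneinfo module is used and the
    # backports.zoneinfo wheel never has to be built.
    services:
      airflow-webserver:
        image: apache/airflow:2.4.0-python3.10
      airflow-scheduler:
        image: apache/airflow:2.4.0-python3.10
    ```
    Alternatively, installing gcc and build tools in a custom image would let the wheel build succeed.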
  • proud-dusk-671 (06/28/2023, 9:31 AM)
    Hi team, running a simple query from the terminal is returning a 401. Any clues? Query: https://datahubproject.io/docs/api/graphql/token-management/#generating-access-tokens:~:text=descriptio[…]rl%20%2D%2Dlocation%20%2D%2Drequest%20POST
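    A 401 on that endpoint usually means the Authorization header is missing or malformed, or the token is expired, assuming token authentication is enabled (METADATA_SERVICE_AUTH_ENABLED). A minimal sketch of the call shape; the endpoint and token are placeholders:
    ```
    # <your-token> is a personal access token generated under Settings > Access Tokens.
    curl --location --request POST 'http://localhost:8080/api/graphql' \
      --header 'Authorization: Bearer <your-token>' \
      --header 'Content-Type: application/json' \
      --data-raw '{"query": "{ me { corpUser { username } } }"}'
    ```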
  • full-shoe-73099 (06/28/2023, 11:45 AM)
    Can you please tell me, in version 0.10.4, are the default role types (Technical Owner, Business Owner, ...) preserved? Or do they need to be recreated? Because right now I see this (empty menu):
  • adamant-furniture-37835 (06/28/2023, 2:16 PM)
    Hi, we are observing some issues with Redshift lineage; for many queries, this error is thrown:
    Error was An Identifier is expected, got Token[value: (] instead..
    Could it be an issue related to dependencies on external tables? These problematic tables depend on tables from the "stage_external" schema, e.g.
    ```
    Error parsing query create table "DATABASE"."SCHEMA"."TABLE" as ( with /* Starting points is .... from "DATABASE"."stage_external"."TABLE_X" ....
    Error was An Identifier is expected, got Token[value: (] instead..
    ```
    According to the Redshift admin, the user we have has access to the stage_external schema, but no datasets are ingested from it. When I search for "stage_external" in global search, I can see a couple of containers, but these containers are empty. Could it be that tables in stage_external are external tables, so they aren't ingested into DataHub? The DataHub user is reading SVV_TABLE_INFO, and external tables aren't available in this view; a different view, SVV_EXTERNAL_TABLES, holds info about external tables. Please advise how to fix this issue, or whether I should create a bug report. Thanks
  • wide-florist-83539 (06/28/2023, 5:02 PM)
    For Airflow emitting metadata, is there a way to get the DAG/DataHub plugin to ignore a self-signed cert / SSL?
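    The underlying REST emitter has knobs for this; whether the Airflow plugin's connection passes them through depends on the version, so treat this as a sketch of the emitter-level options rather than the plugin's documented configuration:
    ```
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    # Hypothetical endpoint; disable_ssl_verification / ca_certificate_path are
    # emitter parameters in recent acryl-datahub versions, verify against yours.
    emitter = DatahubRestEmitter(
        gms_server="https://datahub.example.com:8080",
        disable_ssl_verification=True,  # or: ca_certificate_path="/path/to/ca.pem"
    )
    emitter.test_connection()
    ```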
  • gray-ocean-32209 (06/29/2023, 7:52 AM)
    Hi Team, is there a way to mark a data asset as deleted or stale (like deprecated) without actually removing it from the DataHub UI? These deleted assets might be referenced as part of lineage.
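    DataHub's deprecation flag does exactly this: the asset stays in the catalog (and in lineage) but is visually marked. A sketch using the updateDeprecation GraphQL mutation via the Python client, with a hypothetical URN:
    ```
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

    # updateDeprecation keeps the asset (and its lineage edges) but flags it in the UI.
    graph.execute_graphql(
        """
        mutation($urn: String!) {
          updateDeprecation(input: { urn: $urn, deprecated: true, note: "Dropped upstream" })
        }
        """,
        variables={"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,my_db.my_table,PROD)"},
    )
    ```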
  • many-manchester-24732 (06/29/2023, 8:07 AM)
    Hi all, I am trying to set up DataHub in Kubernetes. I have a custom installation of Elasticsearch with self-signed certificates, created by setting createCert: true in the Elasticsearch helm chart, and the Elasticsearch instance is up with SSL enabled. The problem comes when connecting to Elasticsearch from DataHub: the elasticsearch-setup job itself fails. The hostname and port are specified in the DataHub helm chart, but if I deploy I get the following error:
    Problem with request: Get "https://elasticsearch-master.datahub.svc.cluster.local:9200": tls: failed to verify certificate: x509: certificate is valid for elasticsearch-master, elasticsearch-master.datahub, elasticsearch-master.datahub.svc
    If I deploy specifying the host as elasticsearch-master instead, I get the following error:
    Problem with request: Get "https://elasticsearch-master:9200": tls: failed to verify certificate: x509: certificate signed by unknown authority. Sleeping 1s
    The TLS material for Elasticsearch is in the secret elasticsearch-master-certs. Both DataHub and Elasticsearch are in the same namespace. Is there any configuration to change in the DataHub helm chart so that SSL works between DataHub and Elasticsearch?
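    The two errors differ: the first host name is not in the certificate's SAN list, while the short name matches a SAN but the CA is untrusted. A heavily hedged sketch of the relevant values; key names vary across chart versions, so check the datahub helm chart's values.yaml:
    ```
    # values.yaml sketch: use a host name covered by the cert's SANs and switch
    # DataHub's ES clients to https. Trusting the self-signed CA additionally
    # requires getting the CA from the elasticsearch-master-certs secret into
    # the containers (e.g. a mounted truststore plus the ELASTICSEARCH_SSL_*
    # env vars on GMS); the exact mechanism depends on your chart version.
    global:
      elasticsearch:
        host: elasticsearch-master   # matches a SAN, unlike the .svc.cluster.local form
        port: "9200"
        useSSL: "true"
    ```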
  • alert-state-4917 (06/29/2023, 2:28 PM)
    Hi all, I ran "datahub docker quickstart" and it each time it says one of the container has an error - broker or mysql or zookeeper. I've tried the following 1. datahub docker nuke 2. docker system prune --volumes 3. setting my resources correctly on docker as per the documentation. PFA the screenshot . Does anyone have troubleshooting suggestions ?
  • most-nightfall-36645 (06/29/2023, 2:33 PM)
    I am experiencing a sharding issue when upgrading to DataHub 0.10.x from 0.9.6.1. We are using Elasticsearch v7.10 as our indexing/graph server. The datahub-system-update container is failing with:
    ```
    Suppressed: org.elasticsearch.client.ResponseException: method [PUT], host [<elasticsearch-host>], URI [/containerindex_v2_1687869120968/_clone/containerindex_v2_clone_1688044528961?master_timeout=30s&timeout=30s], status line [HTTP/1.1 400 Bad Request]
    {"error":{"root_cause":[{"type":"validation_exception","reason":"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [999]/[1000] maximum shards open;"}],"type":"validation_exception","reason":"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [999]/[1000] maximum shards open;"},"status":400}
    ```
    Meanwhile the elasticsearch-setup-job container completes without error but contains the following error message in its logs:
    ```
    >>> deleting invalid datahub_usage_event ...
    {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [datahub_usage_event] matches an alias, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [datahub_usage_event] matches an alias, specify the corresponding concrete indices instead."},"status":400}
    >>> GET datahub_usage_event-000001 response code is 404
    >>> creating datahub_usage_event-000001 because it doesn't exist ...
    {
      "aliases": {
        "datahub_usage_event": {
          "is_write_index": true
        }
      }
    }
    2023/06/29 14:25:05 Command finished successfully.
    {"error":{"root_cause":[{"type":"validation_exception","reason":"Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [999]/[1000] maximum shards open;"}],"type":"validation_exception","reason":"Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [999]/[1000] maximum shards open;"},"status":400}
    We deployed datahub via helm. I noticed the helm chart includes the following:
    ```
    ## The following section controls when and how reindexing of elasticsearch indices are performed
    index:
      ## Enable reindexing when mappings change based on the data model annotations
      enableMappingsReindex: true

      ## Enable reindexing when static index settings change.
      ## Dynamic settings which do not require reindexing are not affected
      ## Primarily this should be enabled when re-sharding is necessary for scaling/performance.
      enableSettingsReindex: true
    ```
    My deployed helm values include:
    ```
    elasticsearch:
      host: <elasticsearch-host>
      index:
        enableMappingsReindex: true
        enableSettingsReindex: true
    ```
    I tried setting the number of shards for the index to 2000 with:
    ```
    name  = "global.elasticsearch.index.entitySettingsOverrides"
    value = <<EOF
    {"/containerindex_v2_1687869120968/_clone/containerindex_v2_clone_1688044528961": {"number_of_shards": "2000"}}
    EOF
    ```
    However, I get the same errors. Has anyone experienced this issue?
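    The 400 is an Elasticsearch cluster-wide limit (cluster.max_shards_per_node, default 1000) rather than a per-index DataHub setting, which is why the helm overrides don't change the outcome. One workaround, independent of DataHub and best combined with deleting unused indices, is raising the cap:
    ```
    curl -X PUT "http://<elasticsearch-host>:9200/_cluster/settings" \
      -H 'Content-Type: application/json' \
      -d '{"persistent": {"cluster.max_shards_per_node": 2000}}'
    ```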
  • powerful-planet-87080 (06/29/2023, 4:41 PM)
    I am trying to ingest dbt cloud and the run results are not showing up. On the backend, this error shows up. Appreciate any pointers for troubleshooting!
    ```
    2023-06-28 11:31:37,071 [pool-11-thread-5] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 4ms
    2023-06-28 11:31:37,319 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:56 - Error feeding bulk request. No retries left. Request: Failed to perform bulk request: index [datahub_usage_event], optype: [CREATE], type [_doc], id [PageViewEvent_urn%3Ali%3Acorpuser%3Adatahub_1687951896555]
    java.io.IOException: Unable to parse response body for Response{requestLine=POST /_bulk?timeout=1m HTTP/1.1, host=https://vpc-datahubpoc-2o3qotwticclmfbedwhznvu4om.us-east-1.es.amazonaws.com:443, response=HTTP/1.1 200 OK}
            at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1783)
            at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:636)
            at org.elasticsearch.client.RestClient$1.completed(RestClient.java:376)
            at org.elasticsearch.client.RestClient$1.completed(RestClient.java:370)
            at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
            at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181)
            at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448)
            at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338)
            at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
            at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
            at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
            at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:121)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
            at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
            at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
            at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
            at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
            at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
            at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.lang.NullPointerException: null
            at java.base/java.util.Objects.requireNonNull(Objects.java:221)
            at org.elasticsearch.action.DocWriteResponse.<init>(DocWriteResponse.java:127)
            at org.elasticsearch.action.index.IndexResponse.<init>(IndexResponse.java:54)
            at org.elasticsearch.action.index.IndexResponse.<init>(IndexResponse.java:39)
            at org.elasticsearch.action.index.IndexResponse$Builder.build(IndexResponse.java:107)
            at org.elasticsearch.action.index.IndexResponse$Builder.build(IndexResponse.java:104)
            at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:159)
            at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:188)
            at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1911)
            at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1699)
            at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1781)
            ... 18 common frames omitted
    ```
  • broad-parrot-31743 (06/30/2023, 3:15 AM)
    Hi team, does SSO support CAS?
  • best-wire-59738 (06/30/2023, 10:31 AM)
    Hi Team, we are unable to see all the charts provided by DataHub under the Analytics tab; only the select-domain option is visible. When we open the Analytics tab we see Elasticsearch error logs in GMS, attached below. Please help us resolve this issue.
    error_logs.txt
  • ancient-yacht-36269 (06/30/2023, 11:12 AM)
    👋 Hello, team.
  • brief-nail-41206 (06/30/2023, 12:05 PM)
    Hi, I’m unable to create lineage between a dataset (like a Kafka topic or BQ table) and a datajob (like a Spark job). I’ve tried doing it in the UI and using the Python lineage methods, but it throws a 500 error. Is there another way to establish this lineage?
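    When the UI throws a 500, emitting the dataJobInputOutput aspect directly is another route. A minimal sketch with hypothetical URNs, assuming a recent acryl-datahub SDK:
    ```
    from datahub.emitter.mce_builder import make_data_job_urn, make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DataJobInputOutputClass

    input_ds = make_dataset_urn(platform="kafka", name="events_topic", env="PROD")
    output_ds = make_dataset_urn(platform="bigquery", name="proj.dataset.table", env="PROD")
    job_urn = make_data_job_urn(orchestrator="spark", flow_id="my_flow", job_id="my_job")

    # dataJobInputOutput links datasets to the job; the UI renders it as lineage.
    mcp = MetadataChangeProposalWrapper(
        entityUrn=job_urn,
        aspect=DataJobInputOutputClass(inputDatasets=[input_ds], outputDatasets=[output_ds]),
    )
    DatahubRestEmitter(gms_server="http://localhost:8080").emit(mcp)
    ```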
  • blue-rainbow-97669 (06/30/2023, 1:05 PM)
    Hello all, any suggestions on the attached thread? https://datahubspace.slack.com/archives/C029A3M079U/p1687763167808329
  • broad-parrot-31743 (07/03/2023, 2:23 AM)
    Hi team! How can we solve this problem in DataHub: an Elasticsearch unauthorized-access vulnerability?
  • future-yak-13169 (07/03/2023, 6:53 AM)
    Hi Community, we are on version 10.3 and have been running DataHub for a year now. We have deployed on a Kubernetes cluster and have our Elasticsearch in the same cluster, but the MySQL DB is outside the cluster. Whenever we ingest new data (a new platform, an existing platform with new data, or even access tokens sometimes), the metadata doesn't show up on the UI even though it has been registered in the MySQL DB. The restore-indices job is deployed as a cronjob, but that also doesn't help. Only by deleting our storage PVCs and redeploying the prerequisites and components does the new data start to show up on the UI. Can someone guide me on where the problem could be? Happy to provide more info if required.
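    Metadata present in MySQL but absent from the UI typically means the events that feed Elasticsearch are not being consumed; before rebuilding PVCs it is worth checking Kafka consumer lag for the MAE/MCL consumer. A sketch; the group id below is a common default but may differ in your deployment:
    ```
    kafka-consumer-groups.sh --bootstrap-server <broker:9092> \
      --describe --group generic-mae-consumer-job-client
    # Large, growing LAG values point at the indexing consumer being stuck,
    # which would also explain why restore-indices output never becomes visible.
    ```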
  • elegant-forest-24965 (07/03/2023, 11:37 AM)
    Hi DataHub, we’re trying to deploy DataHub 10.4 on AWS using AWS OpenSearch/Elasticsearch. How can I make sure the ES indices get created properly? When calling /_cat/indices I’m only getting these:
  • fierce-restaurant-41034 (07/03/2023, 12:11 PM)
    Hi, does anyone use Customizing Search? I am trying to use it on my local machine. I have used the docker-compose file and added these variables:
    ```
    - ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED=true
    - ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE=search_config.yml
    ```
    Where can I find the location of search_config.yml in GMS? I didn’t find it on the server. Do I need to create the file? Thanks a lot.
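    The file does not ship with GMS; you create it yourself and mount it into the datahub-gms container, then point the env var at the mounted path. A sketch (the container path is arbitrary):
    ```
    services:
      datahub-gms:
        environment:
          - ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED=true
          - ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE=/etc/datahub/search_config.yml
        volumes:
          - ./search_config.yml:/etc/datahub/search_config.yml
    ```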
  • rapid-spoon-94582 (07/04/2023, 2:01 AM)
    Hello everyone, I am new to DataHub and am installing it on my PC (Windows 10) to start my exploration. While installing DataHub through Docker I hit an issue launching Kafka; I see the below error from the broker:
    ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) 2023-07-03 212148
    kafka.common.InconsistentClusterIdException: The Cluster ID n9vgzZeMQTySJURLg8BrFg doesn't match stored clusterId Some(RQpQ1BfsSJm3IyB3vSpW-A) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
    Installation log is attached.
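    InconsistentClusterIdException usually means a stale Kafka or ZooKeeper volume is left over from an earlier quickstart attempt, so the broker's stored cluster id no longer matches. Wiping the quickstart state (this destroys locally ingested metadata) typically clears it:
    ```
    datahub docker nuke        # removes quickstart containers and volumes
    datahub docker quickstart  # fresh start with a consistent cluster id
    ```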
  • rapid-spoon-94582 (07/04/2023, 2:06 AM)
    tmp64ks_vwi.log
  • curved-judge-66735 (07/04/2023, 12:23 PM)
    Hi Team, we are facing an issue when using the DataHub CLI to delete a large number of datasets (v0.10.3.1). When deleting a small number of datasets (around 1,000), things work well. However, when trying a bigger number (around 200k), we get the following errors from the CLI, and no error log can be observed in GMS.
    ```
    Traceback (most recent call last):
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 186, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/airflow/.local/lib/python3.8/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/upgrade/upgrade.py", line 398, in async_wrapper
        loop.run_until_complete(run_func_check_upgrade())
      File "/usr/local/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
        return future.result()
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/upgrade/upgrade.py", line 385, in run_func_check_upgrade
        ret = await the_one_future
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/upgrade/upgrade.py", line 378, in run_inner_func
        return await loop.run_in_executor(
      File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
        raise e
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 297, in by_filter
        urns = list(
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 684, in get_urns_by_filter
        response = self.execute_graphql(
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 751, in execute_graphql
        result = self._post_generic(url, body)
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 160, in _post_generic
        return self._send_restli_request("POST", url, json=payload_dict)
      File "/home/airflow/.local/lib/python3.8/site-packages/datahub/ingestion/graph/client.py", line 141, in _send_restli_request
        response = self._session.request(method, url, **kwargs)
      File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
        resp = self.send(prep, **send_kwargs)
      File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
        r = adapter.send(request, **kwargs)
      File "/home/airflow/.local/lib/python3.8/site-packages/requests/adapters.py", line 519, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPConnectionPool(host='mercari-dh-datahub-gms.datahub.svc.cluster.local', port=8080): Max retries exceeded with url: /api/graphql (Caused by ReadTimeoutError("HTTPConnectionPool(host='mercari-dh-datahub-gms.datahub.svc.cluster.local', port=8080): Read timed out. (read timeout=30)"))
    ```
    Tried with the Python function get_urns_by_filter and got the same error. We were able to delete an even bigger number of datasets in previous DataHub versions (0.9.x, 0.10.1).
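    The traceback ends in a client-side read timeout (read timeout=30), so GMS may still be processing the large filter query while the client gives up. One hedged mitigation is raising the client timeout; DatahubClientConfig accepts a timeout_sec override in recent versions:
    ```
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

    graph = DataHubGraph(
        DatahubClientConfig(
            server="http://mercari-dh-datahub-gms.datahub.svc.cluster.local:8080",
            timeout_sec=300,  # lift the 30s default that the traceback shows
        )
    )
    ```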
  • bland-orange-13353 (07/04/2023, 12:31 PM)
    This message was deleted.
  • fierce-electrician-85924 (07/04/2023, 1:01 PM)
    Hi Team, is there a way to get when a dataset was last updated in DataHub?
  • brave-tomato-16287 (07/04/2023, 2:04 PM)
    Hello all. I've started to get errors when trying to ingest the Redshift database:
    ```
    File "/tmp/datahub/ingest/venv-redshift-0.8.45/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py", line 328, in '
               'inspect_namespace\n'
               '    raise PydanticUserError(\n'
               'pydantic.errors.PydanticUserError: A non-annotated attribute was detected: `set_system_metadata = True`. All model fields require a type '
               'annotation; if `set_system_metadata` is not meant to be a field, you may be able to resolve this error by annotating it as a `ClassVar` '
               "or updating `model_config['ignored_types']`.\n"
               '\n'
    'For further information visit https://errors.pydantic.dev/2.0/u/model-field-missing-annotation\n'
    ```
    v0.8.45, CLI version 0.8.42. What is the way to fix it?
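    This is the pydantic 2.0 incompatibility: pydantic 2.x (released at the end of June 2023) removed behavior that acryl-datahub 0.8.x relies on, and a freshly built ingestion venv picks up the newest pydantic. A hedged workaround when running via CLI is pinning pydantic below 2 (upgrading to a recent CLI version also resolves it):
    ```
    pip install "acryl-datahub[redshift]==0.8.45" "pydantic<2"
    ```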
  • better-actor-45043 (07/04/2023, 7:05 PM)
    Hey everyone, how is logout with OIDC supposed to work? I’m having some trouble, but I’m not sure if it is working as designed or if I have configured something wrong. When I log in via SSO and click on logout, I’m redirected to a logout page, but it doesn’t seem to log me out with my OpenID provider; it only removes the browser session. So when I go back to DataHub I’m automatically logged in again. This snippet of code in CentralLogoutController.java makes me think it’s supposed to log me out with my OpenID provider:
    ```
    setCentralLogout(true);
    ```
  • brainy-oxygen-20792 (07/04/2023, 10:25 PM)
    Good evening. We've ingested Snowflake and dbt on our DataHub instance (0.10.1, via CLI), but the sibling management is very inconsistent; the dbt and Snowflake assets are only grouped together sometimes, which makes for a very complex lineage graph. Tested locally on 0.10.4, the issue persists (strangely with different assets; e.g. modelX is affected on the hosted instance but not on my local, and the opposite for modelY). There were no ingestion errors, and ingestion has run several times for both platforms. I've read a suggestion that ingesting dbt first can help, but I'd rather not nuke my instance if I can help it, and I wouldn't be able to guarantee that future ingestions always run in this order. Can I resolve this with configuration or through the API? I don't see any reference in the API for updating sibling information. I can see that the Snowflake table that isn't grouped with dbt is DownstreamOf, but not SiblingOf, the dbt model, while the correctly grouped Snowflake table is both downstream and sibling. In both the siblinged and not-siblinged cases the dbt urn has the database capitalised and the Snowflake urn is entirely lower case.
  • stocky-morning-47491 (07/05/2023, 11:02 AM)
    Hello. Installing plugins ended with these messages:
    pip install 'acryl-datahub[mssql]'
    pip install 'acryl-datahub[oracle]'
    pip install 'acryl-datahub[mongodb]'
    Building wheels for collected packages: python-tds
    Building wheel for python-tds (setup.py) ... done
    Created wheel for python-tds: filename=python_tds-1.12.0-py3-none-any.whl size=69852 sha256=79afbcd90b7888ef4e46cf2bf512faca6dc9d5ea67c90803252e0a17dae938ba
    Stored in directory: /root/.cache/pip/wheels/54/67/38/18a1227a331dc01283fc668c363ecbc62d97267cd89bd2c99c
    Successfully built python-tds
    ERROR: Exception:
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/pip/_internal/cli/base_command.py", line 165, in exc_logging_wrapper
        status = run_func(*args)
      File "/usr/lib/python3/dist-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
        return func(self, options, args)
      File "/usr/lib/python3/dist-packages/pip/_internal/commands/install.py", line 389, in run
        to_install = resolver.get_installation_order(requirement_set)
      File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/resolver.py", line 188, in get_installation_order
        weights = get_topological_weights(
      File "/usr/lib/python3/dist-packages/pip/_internal/resolution/resolvelib/resolver.py", line 276, in get_topological_weights
        assert len(weights) == expected_node_count
    AssertionError
    Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
    ERROR: Exception: (the same AssertionError traceback repeats)
  • stocky-morning-47491 (07/05/2023, 11:02 AM)
    Python is updated