# troubleshoot
  • p

    proud-baker-56489

    07/14/2022, 9:48 AM
    hi team, for a normal dag in airflow I did not import any datahub package, but the task log shows errors like this. Is there any update for the new version of datahub?
    c
    m
    • 3
    • 3
  • i

    icy-portugal-26250

    07/14/2022, 10:44 AM
    I’m trying to query some metadata from the
    /api/graphiql
    endpoint. A query returned a response about an hour ago, but now when rerunning the query I get a
    Copy code
    {
      "errors": {
        "message": "Response.text: Body has already been consumed.",
        "stack": "graphQLFetcher/</</<@https://datahub.wolt.com/api/graphiql:57:33\n"
      }
    }
    Is there a way to fetch this response again?
    b
    • 2
    • 2
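    For the GraphiQL question above: the "Body has already been consumed" error appears to come from the browser-side GraphiQL fetcher (per the JS stack trace), so one option is simply to re-issue the same query programmatically. A minimal sketch using Python requests against the GraphQL endpoint; the host, token, and example query below are placeholders, not taken from the thread:
```python
import requests

# Placeholders: adjust the host, and pass a personal access token only if
# Metadata Service Authentication is enabled.
GRAPHQL_URL = "https://datahub.example.com/api/graphql"
TOKEN = "<personal-access-token>"

# Example query; substitute the original query text here.
QUERY = """
{
  search(input: {type: DATASET, query: "*", start: 0, count: 10}) {
    total
  }
}
"""

resp = requests.post(
    GRAPHQL_URL,
    json={"query": QUERY},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```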
  • q

    quick-pizza-8906

    07/14/2022, 2:06 PM
    Hello, I found some issues when running the 0.8.40 version of the dbt connector. To give some context: we have dbt workflows for Snowflake tables. Snowflake tables are ingested independently by the Snowflake connector. We use only the catalog and manifest yaml files for the dbt connector. The issues: 1. If I run with
    disable_dbt_node_creation
    set to True, I can see nice lineage between the pre-ingested Snowflake tables, but on the main page where all platforms are shown I can see a DBT platform with a count of several thousand elements. If I click on this platform to see its entities, I get an exception. After some examination of the MySQL database I could see there are objects with URNs like
    urn:li:assertion:2c8a2605354d9b924c0f1b5d9f0dffd5
    with a dataPlatformInstance aspect having
    dbt
    as the platform but nothing as an instance (I believe the exception was coming from that aspect missing the platform instance). 2. If I run with
    disable_dbt_node_creation
    set to False, I can see lineage and dbt objects combined with the Snowflake tables (very cool). It seems I still have the above assertions, but they don't cause problems in the platform search anymore. In either case, if I run the connector with
    stateful_ingestion
    enabled, I end up with the connector ingesting data but then throwing an exception ending with the code below:
    Copy code
    File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/state/sql_common_state.py", line 35, in _get_lightweight_repr
        31   def _get_lightweight_repr(dataset_urn: str) -> str:
        32       """Reduces the amount of text in the URNs for smaller state footprint."""
        33       SEP = BaseSQLAlchemyCheckpointState._get_separator()
        34       key = dataset_urn_to_key(dataset_urn)
    --> 35       assert key is not None
        36       return f"{key.platform}{SEP}{key.name}{SEP}{key.origin}"
        ..................................................
         dataset_urn = 'urn:li:assertion:2c8aaaa5354d9b924c0f1b5c9f09bf75'
         SEP = '||'
         key = None
    Which makes me think the URN representation function fails for assertion objects, which are somehow being treated as datasets? Is anyone else having similar problems?
    g
    m
    +2
    • 5
    • 11
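    A small sketch of why the stateful-ingestion assert above trips, assuming dataset_urn_to_key (importable from datahub.emitter.mce_builder in recent versions, and referenced in the traceback) only parses dataset URNs and returns None for anything else, such as the assertion URNs emitted by the dbt source:
```python
from datahub.emitter.mce_builder import dataset_urn_to_key

# A dataset URN parses into a key object...
print(dataset_urn_to_key(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)"
))

# ...but an assertion URN does not, so checkpoint code that assumes every
# checkpointed URN is a dataset fails at `assert key is not None`.
print(dataset_urn_to_key("urn:li:assertion:2c8aaaa5354d9b924c0f1b5c9f09bf75"))  # None
```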
  • b

    best-lamp-53937

    07/14/2022, 2:22 PM
    Is there a query that would return the entire schema of the GraphQL API? Or one that would return all entities in DataHub? Perhaps one that would return all entities for a given Domain?
    b
    p
    • 3
    • 8
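    For the three questions above, a rough sketch (Python requests against the GraphQL endpoint; host and token are placeholders): the full schema can be pulled with a standard GraphQL introspection query, and a wildcard searchAcrossEntities paged via start/count approximates "all entities". Narrowing to a Domain is shown with an orFilters entry on a "domains" field, but that field name and the URN placeholder are assumptions to verify against your DataHub version:
```python
import requests

GMS = "https://datahub.example.com/api/graphql"  # hypothetical host
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# 1) Standard GraphQL introspection returns the schema (types, fields).
introspection = "{ __schema { types { name kind } } }"

# 2) A wildcard search paged through start/count approximates "all entities";
#    the 'domains' filter field is an assumption to verify.
by_domain = """
{
  searchAcrossEntities(input: {
    query: "*", start: 0, count: 100,
    orFilters: [{and: [{field: "domains", values: ["urn:li:domain:<id>"]}]}]
  }) {
    total
    searchResults { entity { urn type } }
  }
}
"""

for q in (introspection, by_domain):
    r = requests.post(GMS, json={"query": q}, headers=HEADERS, timeout=30)
    r.raise_for_status()
    print(r.json())
```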
  • p

    prehistoric-yak-55672

    07/14/2022, 8:41 PM
    Hello everyone, first time here! I'm trying to initialize datahub locally on a Windows machine, but when I run
    datahub docker quickstart
    it returns the following error:
    Copy code
    ---- (full traceback above) ----
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\entrypoints.py", line 149, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1055, in main
        rv = self.invoke(ctx)
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 760, in invoke
        return __callback(*args, **kwargs)
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\upgrade\upgrade.py", line 322, in wrapper
        res = func(*args, **kwargs)
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\telemetry\telemetry.py", line 338, in wrapper
        raise e
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\telemetry\telemetry.py", line 290, in wrapper
        res = func(*args, **kwargs)
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\cli\docker.py", line 322, in quickstart
        default_quickstart_compose_file = _get_default_quickstart_compose_file()
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\cli\docker.py", line 162, in _get_default_quickstart_compose_file
        home = os.environ["HOME"]
    File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\os.py", line 681, in __getitem__
        raise KeyError(key) from None
    
    KeyError: 'HOME'
    [2022-07-14 17:36:08,451] INFO     {datahub.entrypoints:188} - DataHub CLI version: 0.8.40.3 at c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\__init__.py
    [2022-07-14 17:36:08,451] INFO     {datahub.entrypoints:191} - Python version: 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:01:55) [MSC v.1900 32 bit (Intel)] at c:\users\wohar\appdata\local\programs\python\python37-32\python.exe on Windows-10-10.0.22000-SP0
    [2022-07-14 17:36:08,451] INFO     {datahub.entrypoints:193} - GMS config {}
    Does anyone know what might be happening?
    b
    s
    • 3
    • 2
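    The traceback above ends in the CLI reading os.environ["HOME"], which is typically not set on Windows. A possible workaround sketch (an assumption on my part, not an official fix) is to point HOME at the user profile directory before invoking the CLI:
```python
import os
import subprocess

# Assumption: setting HOME to the user profile directory lets the quickstart
# code path that reads os.environ["HOME"] resolve a home path on Windows.
env = dict(os.environ, HOME=os.path.expanduser("~"))
subprocess.run(["datahub", "docker", "quickstart"], env=env, check=True)
```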
  • f

    flat-window-44654

    07/14/2022, 10:51 PM
    Hi there, I'm querying the
    SearchAcrossEntities
    endpoint, trying to return only results for
    DASHBOARDS
    and
    DATASETS
    . However, when I submit the following query (see 🧵) with both types, I only get back
    DATASETS
    , even though I know there are
    DASHBOARDS
    that match my search query. Could there be a bug in the API or am I missing something? 🤔
    m
    • 2
    • 5
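    For reference, a minimal sketch of how multiple entity types are usually passed to searchAcrossEntities (the host, token, and search text are placeholders; the actual query is in the thread). Comparing this against a single-type query with types: [DASHBOARD] can help isolate whether the filter or the index is the problem:
```python
import requests

GMS = "https://datahub.example.com/api/graphql"  # hypothetical host
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# Both types are passed as a single list in the input.
query = """
{
  searchAcrossEntities(input: {types: [DATASET, DASHBOARD], query: "orders", start: 0, count: 20}) {
    total
    searchResults { entity { urn type } }
  }
}
"""

r = requests.post(GMS, json={"query": query}, headers=HEADERS, timeout=30)
r.raise_for_status()
print(r.json())
```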
  • a

    adamant-van-21355

    07/15/2022, 7:48 AM
    https://datahubspace.slack.com/archives/C029A3M079U/p1657698691218499
  • b

    better-spoon-77762

    07/15/2022, 8:05 PM
    Hello, can someone please share some examples of paginating through GraphQL results for a search query?
    • 1
    • 1
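    A minimal pagination sketch for the question above: searchAcrossEntities takes start and count, and the response reports total, so a loop can advance start until it reaches total (host and token are placeholders; field names should be checked against your DataHub version):
```python
import requests

GMS = "https://datahub.example.com/api/graphql"  # hypothetical host
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

QUERY = """
query paged($start: Int!, $count: Int!) {
  searchAcrossEntities(input: {query: "*", start: $start, count: $count}) {
    total
    searchResults { entity { urn type } }
  }
}
"""

start, count = 0, 100
while True:
    r = requests.post(
        GMS,
        json={"query": QUERY, "variables": {"start": start, "count": count}},
        headers=HEADERS,
        timeout=30,
    )
    r.raise_for_status()
    page = r.json()["data"]["searchAcrossEntities"]
    for hit in page["searchResults"]:
        print(hit["entity"]["urn"])
    start += count
    if start >= page["total"]:
        break
```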
  • d

    delightful-barista-90363

    07/15/2022, 10:30 PM
    Hello, apologies for asking late on a Friday (and the answer can wait), but I am getting this error on a Spark job when trying to use the DatahubSparkListener:
    Copy code
    DatahubSparkListener: java.lang.NullPointerException: Cannot invoke "java.util.Map.put(Object, Object)" because the return value of "java.util.Map.get(Object)" is null
    Was wondering if I could get some assistance? Stacktrace(s) in thread. Thanks for the help in advance.
    c
    • 2
    • 13
  • m

    most-nightfall-36645

    07/18/2022, 8:51 AM
    Hi, when I try to upgrade to datahub
    v0.8.41
    my frontend and gms containers error with:
    Copy code
    Error: secret "datahub-auth-secrets" not found
    How do I create this secret from the datahub helm chart (e.g. which pod/container creates the secret)?
    i
    • 2
    • 6
  • p

    purple-analyst-83660

    07/18/2022, 10:21 AM
    Hi all, I am trying to ingest metadata corresponding to a project. I get a _NODE_LIMIT_EXCEEDED_ error at first; when I try to include _page_size: 5_, I get this error instead. Can anybody help? (I have attached the config yaml that I am using.)
    a
    m
    • 3
    • 4
  • a

    agreeable-belgium-70840

    07/18/2022, 10:27 AM
    hello, I recently updated to v0.8.40. The problem I am facing is that I can't create new groups and I can't add new users to existing groups. I get a message that the group was created, and the GraphQL call in the developer tools responds with 200. This is what I am getting:
    Copy code
    {data: {createGroup: "urn:li:corpGroup:4404d005-a2f6-491f-8b4d-931c7063ea0a"}, extensions: {}}
    data: {createGroup: "urn:li:corpGroup:4404d005-a2f6-491f-8b4d-931c7063ea0a"}
    createGroup: "urn:li:corpGroup:4404d005-a2f6-491f-8b4d-931c7063ea0a"
    extensions: {}
    Any ideas?
    b
    • 2
    • 3
  • s

    square-hair-99480

    07/18/2022, 4:08 PM
    Hello dear friends, I am ingesting data from Snowflake and my boss asked me if it was possible to use MFA with our datahub user. If I activate MFA, it keeps prompting me multiple times during the ingestion. I took a look here https://datahubproject.io/docs/generated/ingestion/sources/snowflake/#prerequisites and it seems the most reasonable solution would actually be to use key pair authentication. Nevertheless, if there is a way with MFA and you know it, could you please share?
  • f

    faint-translator-23365

    07/18/2022, 8:01 PM
    When I try to configure OIDC in datahub-frontend, I'm getting this error. Can someone please help? Slack Conversation
    b
    • 2
    • 2
  • r

    rhythmic-stone-77840

    07/19/2022, 12:42 AM
    Hey all - I'm using GraphQL and am having trouble setting up a filter for downstream/upstream lineage. I'd like to pull out all datasets that have an upstream lineage of 0, but I don't understand how to get the filter to work for this. Current query in 🧵
    b
    • 2
    • 4
  • c

    clean-tomato-22549

    07/19/2022, 4:59 AM
    Error in using lookml connector
    plus1 1
    m
    n
    • 3
    • 11
  • i

    icy-portugal-26250

    07/19/2022, 7:19 AM
    Validation errors pop up in DataHub's UI following the update to
    v0.8.41
    m
    b
    • 3
    • 8
  • w

    witty-butcher-82399

    07/19/2022, 12:52 PM
    I have a bigquery connector instance failing with the following error:
    Copy code
    │ PermissionDenied: 403 request failed: the user does not have 'bigquery.readsessions.create' permission for 'projects/XXXXXXXX'
    According to the docs, that permission is required only for lineage, so I tried disabling table lineage with:
    include_table_lineage: False
    However, I'm still getting the same error. Is there any other config setting for disabling table lineage? Or is this a bug in the config field? 🧵
    s
    • 2
    • 14
  • b

    bland-orange-13353

    07/19/2022, 4:04 PM
    This message was deleted.
  • d

    delightful-barista-90363

    07/19/2022, 10:12 PM
    gonna bump this for help https://datahubspace.slack.com/archives/C029A3M079U/p1657924216360099
  • h

    hallowed-dog-79615

    07/20/2022, 8:00 AM
    Greetings Team: We have been testing Glossary Terms ingestion a bit and have found some unexpected behavior. Let's go through the steps:
    1. It does not matter if we create a Glossary Term in the UI before adding it massively through a CSV ingestion, but let's say we create it. We create a term called "Active_users".
    2. We add some documentation to our just-created term. Again, this is not mandatory, but it helps identify the issue later.
    3. We proceed to add the term to several dataset objects. For this we leverage the CSV ingestion feature. We prepare our CSV following the guidelines in the documentation and ingest it. The term "Active_users" is added to our datasets! It seems it worked.
    4. But then we go to a dataset's entity page and click on the "Active_users" term badge to access its own entity page. There we see that the documentation we added is missing.
    5. Then we start playing around and realize that the term "Active_users" is duplicated. There are two different entity pages: the one for the term we created manually (urn:li:glossaryTerm:82a86728-087a-4232-bfbe-5a9a2790f6ce), and the one for the term we added through CSV (urn:li:glossaryTerm:Active_users). As you see, their ids are quite different.
    6. Not only that: in the Glossary Terms menu, the ingested term is not even visible; we can only access its page through other entities' badges. The manually created one is of course in the list, but nothing appears under "Related entities".
    7. Even more, we realized we are not able to delete the ingested term. We cannot even remove it from datasets. If we try to remove it, it says "Successfully removed", but the term is still there when you refresh.
    We understand this is a bug; even if we were missing some step in which we had to associate the ingested term with an already existing one with the same name, not being able to delete or access a term does not seem like desired behavior. I apologize if this has been reported elsewhere; I found Glossary Term bugs, but they didn't reach the "not being able to delete" part. Thanks!! Dani
    b
    e
    f
    • 4
    • 19
  • m

    microscopic-mechanic-13766

    07/20/2022, 8:17 AM
    Hi, I am deploying datahub v0.8.41 in docker 20.10.17 and have found one thing that I don't know if it is intended, but it doesn't make much sense as far as I can tell. In the 3 basic services needed for the deployment (gms, frontend and actions), the user the container runs as is not the same. For example, in datahub-gms it is
    uid=101(datahub) gid=101(datahub) groups=101(datahub)
    but in datahub-frontend it is
    uid=100(datahub) gid=101(datahub) groups=101(datahub)
    . Is this done on purpose, or is it just a mistake? Thanks in advance for the help!
  • s

    steep-soccer-91284

    07/20/2022, 9:24 AM
    datahub-gms is not running
  • b

    best-leather-7441

    07/20/2022, 1:03 PM
    hi! I hope this is the right thread for this question. After I ingest my metadata everything is fine, but if I shut down my console and restart datahub, all my data, ingestions, groups etc. vanish... am I missing something? Thank you for your time
    b
    • 2
    • 2
  • l

    lemon-engine-23512

    07/20/2022, 2:00 PM
    Hello team. I am trying to deploy datahub to AWS; in my org we cannot use Helm as we have no access to a k8s cluster. We can only build images and push them. Can anyone assist me with this?
  • p

    prehistoric-yak-55672

    07/20/2022, 4:49 PM
    Hello everyone. Is there a way to create a backup of all the progress I made on DataHub? As an example, I would like to back up all the documentation I wrote for each dataset I have, in case something happens
    b
    b
    • 3
    • 2
  • s

    shy-parrot-64120

    07/20/2022, 5:55 PM
    Hi all, we are trying to migrate the backend database from MySQL to Postgres. Are there any ways to preserve all data?
    l
    • 2
    • 12
  • b

    bland-orange-13353

    07/20/2022, 6:06 PM
    This message was deleted.
  • b

    big-ocean-9800

    07/20/2022, 6:12 PM
    Hey folks! We are currently running datahub @
    v0.8.38
    and we have about 7k data assets loaded. We are seeing a pattern where loading the home page is extremely slow (on the order of 5-10 seconds). I checked metrics around our datahub infrastructure and everything was running at about 10-20% utilization. Our Elasticsearch cluster is at low utilization, its disks are less than 10% utilized, and I don't see any IO throttling from our cloud provider. Same story with our Postgres instance. I took a look at the calls that hang the longest on the home page, and the consistently slow call is the graphql call
    searchAcrossEntities
    . From a cursory look through the code, it seems to interact with just Elasticsearch. I'm wondering if anyone has experienced similar behavior, has any troubleshooting tips, etc. Is this expected performance with the number of assets we have? Are there any changes we can make to our Elasticsearch cluster to help alleviate these problems? I looked through the Slack history of this channel and couldn't quite find any messages that seem similar (same with GitHub issues, both open and closed). Please let me know if any more information would be helpful. Cheers!
    o
    b
    • 3
    • 7
  • a

    ambitious-cartoon-15344

    07/21/2022, 8:16 AM
    Hi, I use Metadata Service Authentication. Does that mean the Airflow lineage plugin cannot be used? I don't see anything about a token in the Airflow plugin.
    d
    • 2
    • 2
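    Regarding the token question above: with Metadata Service Authentication enabled, REST clients authenticate with a personal access token, and for the Airflow integration the token is typically supplied on the DataHub REST connection (e.g. in its password field) rather than in the plugin config; treat those details as assumptions to verify against the docs. A minimal sketch showing the Python REST emitter accepting a token:
```python
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Placeholders: GMS address and a personal access token generated in the UI.
emitter = DatahubRestEmitter(
    gms_server="http://datahub-gms:8080",
    token="<personal-access-token>",
)
emitter.test_connection()  # verifies the endpoint and credentials
```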