# troubleshoot
  • handsome-football-66174 (11/03/2022, 3:23 PM)
    Hi everyone, I notice that GraphQL search results include datasets which have been deleted. Is there a way to exclude them?
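    A client-side sketch for excluding them, assuming a default GMS endpoint and an illustrative token: soft-deleted entities carry status.removed = true, so request status in the projection and drop those rows.

    # Hedged sketch: filter soft-deleted datasets out of GraphQL search results.
    import requests

    GRAPHQL = "http://localhost:8080/api/graphql"  # illustrative endpoint
    QUERY = """
    {
      search(input: {type: DATASET, query: "*", start: 0, count: 100}) {
        searchResults {
          entity { urn ... on Dataset { status { removed } } }
        }
      }
    }
    """

    resp = requests.post(GRAPHQL, json={"query": QUERY},
                         headers={"Authorization": "Bearer <your-token>"})
    live = [
        r["entity"]["urn"]
        for r in resp.json()["data"]["search"]["searchResults"]
        # status can be null if the aspect was never written
        if not (r["entity"].get("status") or {}).get("removed")
    ]
    print(live)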
  • lively-dusk-19162 (11/03/2022, 6:05 PM)
    Hi everyone, is there any way to ingest column-level lineage into DataHub?
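    One route is the Python REST emitter's fine-grained lineage support. A minimal sketch against a recent acryl-datahub SDK (platform, dataset and column names are illustrative):

    # Hedged sketch: emit column-level (fine-grained) lineage via the REST emitter.
    from datahub.emitter.mce_builder import make_dataset_urn, make_schema_field_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.com.linkedin.pegasus2avro.dataset import (
        DatasetLineageType,
        FineGrainedLineage,
        FineGrainedLineageDownstreamType,
        FineGrainedLineageUpstreamType,
        Upstream,
        UpstreamLineage,
    )

    src = make_dataset_urn("postgres", "db.schema.src_table", "PROD")
    dst = make_dataset_urn("postgres", "db.schema.dst_table", "PROD")

    lineage = UpstreamLineage(
        upstreams=[Upstream(dataset=src, type=DatasetLineageType.TRANSFORMED)],
        fineGrainedLineages=[
            FineGrainedLineage(
                upstreamType=FineGrainedLineageUpstreamType.FIELD_SET,
                upstreams=[make_schema_field_urn(src, "id")],
                downstreamType=FineGrainedLineageDownstreamType.FIELD,
                downstreams=[make_schema_field_urn(dst, "id")],
            )
        ],
    )

    emitter = DatahubRestEmitter("http://localhost:8080")  # illustrative GMS url
    emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=dst, aspect=lineage))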
  • silly-lock-22001 (11/03/2022, 6:30 PM)
    Hello! I am looking to do an impact analysis (specifically, get all downstream tasks for a given task). Typically I can do that via the Download Lineage tool; however, the download tool seems to be bonking (maybe the lineage is too large? it works for smaller trees).
    • Is this a known problem, and are there any workarounds?
    • Is there a way to get this data via the API?
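    On the API question: the searchAcrossLineage GraphQL field can return the downstream graph, sketched below (endpoint, token and the dataJob urn are illustrative, and the exact field shape can vary by version):

    # Hedged sketch: fetch all downstream entities of a task via GraphQL.
    import requests

    QUERY = """
    query ($urn: String!) {
      searchAcrossLineage(input: {urn: $urn, direction: DOWNSTREAM,
                                  query: "*", start: 0, count: 1000}) {
        searchResults { degree entity { urn type } }
      }
    }
    """
    resp = requests.post(
        "http://localhost:8080/api/graphql",
        headers={"Authorization": "Bearer <your-token>"},
        json={"query": QUERY,
              "variables": {"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,my_dag,prod),my_task)"}},
    )
    for r in resp.json()["data"]["searchAcrossLineage"]["searchResults"]:
        print(r["degree"], r["entity"]["urn"])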
  • fierce-garage-74290 (11/04/2022, 11:06 AM)
    Manage Users & Groups - sync with IdP (LDAP/AD). Maybe it's a bit of a naive idea, but it would be great if the groups (and their members) could be synced regularly with the company's IdP. All I'd have to care about in DataHub is defining permissions and mapping them to groups; who's in each group would be up to LDAP/AD. Has anyone tried something like this before?
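    DataHub does ship an ldap ingestion source that can be run on a schedule for exactly this. A sketch of driving it from Python (connection values are illustrative; check the ldap source docs for the full config surface):

    # Hedged sketch: periodically sync users/groups from LDAP into DataHub.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create({
        "source": {
            "type": "ldap",
            "config": {
                "ldap_server": "ldap://ldap.example.com",
                "ldap_user": "cn=admin,dc=example,dc=com",
                "ldap_password": "admin-password",
                "base_dn": "dc=example,dc=com",
            },
        },
        "sink": {"type": "datahub-rest",
                 "config": {"server": "http://localhost:8080"}},
    })
    pipeline.run()
    pipeline.raise_from_status()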
  • chilly-elephant-51826 (11/04/2022, 11:37 AM)
    Hi, I am facing an issue while running the restore-indices job. I am trying to stand up a new DataHub instance from an existing database, but it is unable to fetch policy URNs. Can somebody help me figure out what the issue is and why the Elasticsearch query is failing?
    2022-11-04 11:27:15.085 ERROR 1 --- [pool-7-thread-1] c.d.authorization.DataHubAuthorizer : Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
    com.datahub.util.exception.ESQueryException: Search query failed:
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:73) ~[metadata-io.jar!/:na]
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:100) ~[metadata-io.jar!/:na]
    at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:67) ~[metadata-io.jar!/:na]
    at com.linkedin.entity.client.JavaEntityClient.search(JavaEntityClient.java:280) ~[restli-client.jar!/:na]
    at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:50) ~[auth-impl.jar!/:na]
    at com.datahub.authorization.PolicyFetcher.fetchPolicies(PolicyFetcher.java:42) ~[auth-impl.jar!/:na]
    at com.datahub.authorization.DataHubAuthorizer$PolicyRefreshRunnable.run(DataHubAuthorizer.java:229) ~[auth-impl.jar!/:na]
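    That stack trace usually means the search layer has no policy index to query, which can happen when a fresh Elasticsearch is pointed at an existing database before restore-indices has completed. A quick diagnostic sketch (host and the datahubpolicyindex_v2 index name are assumptions based on a default deployment):

    # Hedged diagnostic: confirm DataHub's indices exist in Elasticsearch.
    import requests

    resp = requests.get("http://localhost:9200/_cat/indices/datahub*?v")
    print(resp.text)  # look for datahubpolicyindex_v2 among the rows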
  • witty-television-74309 (11/04/2022, 2:21 PM)
    Enabling lineage on Snowflake tables generates SQL queries against the snowflake.account_usage.access_history table, which is populated only in Snowflake Enterprise Edition. What are the options, if any, to enable lineage for non-Enterprise Snowflake tables? IMHO this is a big limitation, since many Snowflake customers are on non-Enterprise editions due to the exorbitant cost of Enterprise Edition.
  • mysterious-hamburger-65313 (11/04/2022, 4:29 PM)
    Hi, I am facing a similar problem to another complaint in this Slack. I followed the steps for self-hosted DataHub and can't proceed past the datahub docker quickstart step. I run into the following issue: see the image below and the log file, as stated in the cmd. I'm on a Windows computer with 4 CPUs and 16GB RAM, and I believe I satisfy the 2GB swap area and 10GB disk space requirements. Thanks for any help!
    tmptk72vr8f.log
  • bland-orange-13353 (11/04/2022, 4:29 PM)
    Acryl Data delivers an easy to consume DataHub platform for the enterprise - sign up here: https://www.acryldata.io/sign-up
  • green-intern-1667 (11/04/2022, 4:37 PM)
    Trying to ingest Snowflake data via UI ingestion, but after hitting the Test Connection button I only see "Testing your connection" for several minutes. Any clue on that?
  • billowy-pilot-93812 (11/04/2022, 7:09 PM)
    Hi all, I'm trying to connect to Superset but receiving an access token error. Any clue on that? Thanks, guys.
  • little-spring-72943 (11/05/2022, 10:34 PM)
    I am in the same situation...
  • few-sunset-43876 (11/06/2022, 3:56 PM)
    Hi all, I'm trying to locally upgrade DataHub from v0.8.42 to v0.9.1 using the nocode upgrade script docker/datahub-upgrade/nocode/run_upgrade.sh, but I got the following error and the process was aborted.
    Starting upgrade with id NoCodeDataMigration...
    Cleanup has not been requested.
    Skipping Step 1/6: RemoveAspectV2TableStep...
    Executing Step 2/6: GMSQualificationStep...
    Completed Step 2/6: GMSQualificationStep successfully.
    Executing Step 3/6: UpgradeQualificationStep...
    -- V1 table does not exist
    Failed to qualify upgrade candidate. Aborting the upgrade...
    Step with id UpgradeQualificationStep requested an abort of the in-progress update. Aborting the upgrade...
    Upgrade NoCodeDataMigration completed with result ABORTED. Exiting...
    It seems that a table is missing? What should I do in this case? Thanks!
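    UpgradeQualificationStep is looking for the legacy V1 table, usually named metadata_aspect (V2 is metadata_aspect_v2). A check sketch, assuming a MySQL backend with illustrative credentials; note that an instance created on v0.8.42 should already be on V2 storage, in which case the nocode migration has nothing to migrate:

    # Hedged check: which aspect tables exist in the DataHub database?
    import pymysql  # pip install pymysql

    conn = pymysql.connect(host="localhost", user="datahub",
                           password="datahub", database="datahub")
    with conn.cursor() as cur:
        cur.execute("SHOW TABLES LIKE 'metadata_aspect%'")
        print(cur.fetchall())  # only metadata_aspect_v2 -> already migrated
    conn.close()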
  • billowy-pilot-93812 (11/07/2022, 3:43 AM)
    ~~~~ Execution Summary ~~~~
    
    RUN_INGEST - {'errors': [],
     'exec_id': 'e84cc000-058f-4fa0-a2e1-6ad532762012',
     'infos': ['2022-11-07 03:41:59.324788 [exec_id=e84cc000-058f-4fa0-a2e1-6ad532762012] INFO: Starting execution for task with name=RUN_INGEST',
               '2022-11-07 03:42:05.614780 [exec_id=e84cc000-058f-4fa0-a2e1-6ad532762012] INFO: stdout=venv setup time = 0\n'
               'This version of datahub supports report-to functionality\n'
               'datahub  ingest run -c /tmp/datahub/ingest/e84cc000-058f-4fa0-a2e1-6ad532762012/recipe.yml --report-to '
               '/tmp/datahub/ingest/e84cc000-058f-4fa0-a2e1-6ad532762012/ingestion_report.json\n'
               '[2022-11-07 03:42:03,375] INFO     {datahub.cli.ingest_cli:182} - DataHub CLI version: 0.9.1\n'
               '[2022-11-07 03:42:03,432] INFO     {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured '
               'to talk to http://datahub-gms:8080\n'
               '[2022-11-07 03:42:04,848] ERROR    {datahub.entrypoints:192} - \n'
               'Traceback (most recent call last):\n'
               '  File "/tmp/datahub/ingest/venv-superset-0.9.1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 196, in __init__\n'
               '    self.source: Source = source_class.create(\n'
               '  File "/tmp/datahub/ingest/venv-superset-0.9.1/lib/python3.10/site-packages/datahub/ingestion/source/superset.py", line 168, in create\n'
               '    return cls(ctx, config)\n'
               '  File "/tmp/datahub/ingest/venv-superset-0.9.1/lib/python3.10/site-packages/datahub/ingestion/source/superset.py", line 148, in '
               '__init__\n'
               '    self.access_token = login_response.json()["access_token"]\n'
               "KeyError: 'access_token'\n"
               '\n'
               'The above exception was the direct cause of the following exception:\n'
               '\n'
               'Traceback (most recent call last):\n'
               '  File "/tmp/datahub/ingest/venv-superset-0.9.1/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 197, in run\n'
               '    pipeline = Pipeline.create(\n'
               '  File "/tmp/datahub/ingest/venv-superset-0.9.1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create\n'
               '    return cls(\n'
               '  File "/tmp/datahub/ingest/venv-superset-0.9.1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__\n'
               '    self._record_initialization_failure(\n'
               '  File "/tmp/datahub/ingest/venv-superset-0.9.1/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 129, in '
               '_record_initialization_failure\n'
               '    raise PipelineInitError(msg) from e\n'
               'datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure source (superset)\n'
               '[2022-11-07 03:42:04,848] ERROR    {datahub.entrypoints:195} - Command failed: \n'
               '\tFailed to configure source (superset) due to \n'
               "\t\t''access_token''.\n"
               '\tRun with --debug to get full stacktrace.\n'
               "\te.g. 'datahub --debug ingest run -c /tmp/datahub/ingest/e84cc000-058f-4fa0-a2e1-6ad532762012/recipe.yml --report-to "
               "/tmp/datahub/ingest/e84cc000-058f-4fa0-a2e1-6ad532762012/ingestion_report.json'\n",
               "2022-11-07 03:42:05.615068 [exec_id=e84cc000-058f-4fa0-a2e1-6ad532762012] INFO: Failed to execute 'datahub ingest'",
               '2022-11-07 03:42:05.616189 [exec_id=e84cc000-058f-4fa0-a2e1-6ad532762012] INFO: Caught exception EXECUTING '
               'task_id=e84cc000-058f-4fa0-a2e1-6ad532762012, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task\n'
               '    task_event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete\n'
               '    return future.result()\n'
               '  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 227, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
    Hi all, I'm ingesting data from Superset and got this issue. Any clue on that? Thank you.
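    The KeyError: 'access_token' means Superset's login endpoint answered without a token, typically wrong credentials or the wrong auth provider in the recipe. A sketch reproducing the login call the superset source makes at startup (URL and credentials are illustrative; provider is "db" or "ldap" depending on your Superset setup):

    # Hedged sketch: verify Superset credentials outside of DataHub.
    import requests

    resp = requests.post(
        "https://superset.example.com/api/v1/security/login",
        json={"username": "admin", "password": "admin",
              "provider": "db", "refresh": True},
    )
    print(resp.status_code, resp.json())  # expect an "access_token" key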
  • colossal-laptop-87082 (11/07/2022, 6:10 AM)
    Hello team!! I'm new to DataHub. I want to ingest a CSV and build these observability checkpoints with the help of DataHub. Is this possible?
    • Freshness
    • Volume
    • Schema
  • brave-zebra-97479 (11/07/2022, 7:10 AM)
    Is it possible to disable writes/ingestion to the GMS API while still allowing reads?
  • gray-telephone-67568 (11/07/2022, 7:39 AM)
    Hi, I have a question regarding lineage. We are ingesting lineage (both table-level and column-level/fine-grained) via Python REST and we encounter an issue: somehow the lineage is gone a few seconds after ingesting, even though it shows up at first in the UI.
  • microscopic-mechanic-13766 (11/07/2022, 11:45 AM)
    Good morning, I have a deployment of DataHub with the Airflow connection enabled. The connection had been working until today, when for an unknown reason it started printing the following error:
    {plugins_manager.py:235} ERROR - Failed to import plugin acryl-datahub-airflow-plugin
     Traceback (most recent call last):
       File "/home/airflow/.local/lib/python3.7/site-packages/airflow/plugins_manager.py", line 227, in load_entrypoint_plugins
         plugin_class = entry_point.load()
       File "/home/airflow/.local/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 203, in load
         module = import_module(match.group('module'))
       File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
         return _bootstrap._gcd_import(name[level:], package, level)
       File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
       File "<frozen importlib._bootstrap>", line 983, in _find_and_load
       File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
       File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
       File "<frozen importlib._bootstrap_external>", line 728, in exec_module
       File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/airflow/.local/lib/python3.7/site-packages/datahub_airflow_plugin/datahub_plugin.py", line 14, in <module>
         from datahub_provider.hooks.datahub import AIRFLOW_1, DatahubGenericHook
     ImportError: cannot import name 'AIRFLOW_1' from 'datahub_provider.hooks.datahub' (/home/airflow/.local/lib/python3.7/site-packages/datahub_provider/hooks/datahub.py)
    I don't know the source of this error, as it has never appeared before. I am using v0.8.44 of DataHub's Airflow plugin, because with later versions I have a version conflict with sqlalchemy that causes the following error:
    File "/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py", line 267, in _validate_config_dependencies
         raise AirflowConfigException(f"error: cannot use sqlite with the {self.get('core', 'executor')}")
     airflow.exceptions.AirflowConfigException: error: cannot use sqlite with the CeleryExecutor
    Thanks in advance!!
  • salmon-angle-92685 (11/07/2022, 2:15 PM)
    Hello guys, I am running metadata ingestion via a YAML file. However, when it fails it doesn't exit with code 1 or raise any error. Since I am using Dagster to run this ingestion pipeline, I have trouble tracking which ingestions have failed. Is there a way to force the script to exit with code 1 when it fails? Thank you, guys!
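    One approach, sketched here, is to drive the recipe through the Python API and let raise_from_status turn failures into a nonzero exit code (the recipe path is illustrative):

    # Hedged sketch: run a YAML recipe programmatically and exit 1 on failure.
    import sys

    import yaml
    from datahub.ingestion.run.pipeline import Pipeline

    with open("recipe.yml") as f:
        config = yaml.safe_load(f)

    pipeline = Pipeline.create(config)
    pipeline.run()
    try:
        pipeline.raise_from_status()
    except Exception:
        pipeline.pretty_print_summary()
        sys.exit(1)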
  • microscopic-mechanic-13766 (11/07/2022, 3:45 PM)
    Hello again. I have just upgraded my DataHub deployment to the newest version (v0.9.2) and found behaviour that I don't know is expected. I logged in with my OIDC provider (Keycloak in my case) using its default page, but when I logged out of DataHub I was redirected to the page shown in the picture instead of my Keycloak login page. This behaviour must have been added either in this new version or in 0.9.1, as it didn't happen in my previous deployment (0.9.0). If this behaviour is expected, is there any way to restore the behaviour of previous releases, as I would like to keep both login pages separate? Thanks in advance! (I don't know for sure, but it could be a problem of not notifying the OIDC provider of the logout, as when I click Sign in with SSO I don't have to enter my credentials again.)
  • clever-garden-23538 (11/07/2022, 4:19 PM)
    Hey, in the OIDC flow, the page the user was initially trying to reach is lost in the redirect flow, i.e. you get dumped back at the homepage no matter what. Is this on the radar at the moment? The most obvious solution would be to encode the original request URL in the "state" param.
  • witty-television-74309 (11/07/2022, 4:20 PM)
    Hello, I am trying to ingest Parquet on S3 with profiling on, and I am seeing the following error. I'd appreciate any pointers to investigate this issue. BTW, it works fine when profiling is turned off.
  • busy-eye-72759 (11/07/2022, 4:21 PM)
    Hi good folks! I need some help regarding lineage. I am parsing some stored procedures in an MS SQL database and using the REST emitter to emit the stored procedures as Tasks with input and output datasets/tasks. This works fine. However, when I come across a stored procedure with the same dataset as both input AND output, the lineage seems to crash. Is there a way to make this work? If not, I'll be happy to contribute to resolving the issue if someone can point me in the right direction 🙂
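    Until that is resolved server-side, one workaround sketch is to drop self-edges before emitting, so a procedure that reads and writes the same table keeps it on one side of the lineage only (urns are illustrative):

    # Hedged sketch: filter dataset urns that appear as both input and output
    # before building the task's input/output lineage aspect.
    from datahub.emitter.mce_builder import make_dataset_urn

    inputs = [
        make_dataset_urn("mssql", "db.dbo.orders", "PROD"),
        make_dataset_urn("mssql", "db.dbo.staging", "PROD"),
    ]
    outputs = [make_dataset_urn("mssql", "db.dbo.orders", "PROD")]

    safe_inputs = [urn for urn in inputs if urn not in set(outputs)]
    print(safe_inputs)  # db.dbo.orders stays only on the output side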
  • swift-farmer-36942 (11/07/2022, 6:22 PM)
    👋 Hello, team!
  • swift-farmer-36942 (11/07/2022, 6:24 PM)
    Greetings all! I recently joined a new job, and part of my responsibilities is reviewing security audit results from our DataHub instance. My manager said we recently had a web application penetration audit done, and they're claiming they found sensitive information in our DataHub instance. I thought we had things locked down pretty well; any ideas where they could have found it? This is all kind of new to me, so I am trying to learn!
  • swift-farmer-36942 (11/07/2022, 6:24 PM)
    (if there is a better place to ask, please let me know)
  • witty-television-74309 (11/07/2022, 7:32 PM)
    Folks, any suggestions on this Spark issue?
  • handsome-football-66174 (11/07/2022, 8:26 PM)
    Hi Team, trying to use search in GraphQL. How do I express an OR condition in the query filters?
    {
      search(input: {type: DATASET, start: 0, count: 1000, query: "*",
                     filters: [{field: "tags", value: "urn:li:tag:testtag"}]}) {
        searchResults {
          entity {
            urn
            ... on Dataset {
              urn
              status { removed }
              editableProperties { description }
              schemaMetadata { fields { description type fieldPath } }
              domain { domain { properties { name } } }
            }
          }
        }
      }
    }
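    The filters list in a search input is ANDed together, so there is no direct OR there (newer releases add an orFilters input, but that is version-dependent). A version-safe sketch is to run one query per value and union the results client-side (endpoint, token and tag urns are illustrative):

    # Hedged sketch: emulate OR across tag filters by unioning two searches.
    import requests

    GRAPHQL = "http://localhost:8080/api/graphql"
    HEADERS = {"Authorization": "Bearer <your-token>"}
    QUERY = """
    query ($tag: String!) {
      search(input: {type: DATASET, start: 0, count: 1000, query: "*",
                     filters: [{field: "tags", value: $tag}]}) {
        searchResults { entity { urn } }
      }
    }
    """

    urns = set()
    for tag in ["urn:li:tag:testtag", "urn:li:tag:othertag"]:
        resp = requests.post(GRAPHQL, headers=HEADERS,
                             json={"query": QUERY, "variables": {"tag": tag}})
        urns |= {r["entity"]["urn"]
                 for r in resp.json()["data"]["search"]["searchResults"]}
    print(urns)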
  • salmon-jackal-36326 (11/07/2022, 9:16 PM)
    @witty-plumber-82249 I'm receiving this error. Can someone help?
    [application-akka.actor.default-dispatcher-25] WARN  application - The submitted callback is unsupported! 
    
    ERROR controllers.AuthenticationController - Caught exception while attempting to redirect to SSO identity provider! It's likely that SSO integration is mis-configured.
    
    org.pac4j.core.exception.TechnicalException: com.nimbusds.oauth2.sdk.ParseException: The scope must include an "openid" value
  • able-evening-90828 (11/08/2022, 1:53 AM)
    How should I configure the ObjectMapper so that it can deserialize Entity in the SearchQueryResponse of a GraphQL search request? I got the following exception when trying to run the code below it. I used the auto-generated GraphQL client code, and I confirmed that the raw response string the client received is correct.
    com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of com.linkedin.datahub.graphql.generated.Entity (no Creators, like default constructor, exist): abstract types either need to be mapped to concrete types, have custom deserializer, or contain additional type information
    final SearchInput searchInput =
            SearchInput.builder()
                .setType(EntityType.DATASET)
                .setQuery("")
                .setStart(0)
                .setCount(10)
                .build();
        final SearchQueryRequest searchQueryRequest =
            SearchQueryRequest.builder().setInput(searchInput).build();
    
        GraphQLResponseProjection graphQLResponseProjection =
            new SearchResultsResponseProjection()
                .start()
                .count()
                .total()
                .searchResults(
                    new SearchResultResponseProjection()
                        .entity(new EntityResponseProjection().urn().type()));
    
        GraphQLRequest graphQLRequest =
            new GraphQLRequest(searchQueryRequest, graphQLResponseProjection);
        final SearchQueryResponse searchQueryResponse =
            getRestTemplate()
                .exchange(
                    URI.create(GRAPHQL_ENDPOINT),
                    HttpMethod.POST,
                    createHttpEntity(graphQLRequest),
                    SearchQueryResponse.class)
                .getBody();
    
        System.out.println(searchQueryResponse.search());
    @green-football-43791 @bulky-soccer-26729
  • billowy-pilot-93812 (11/08/2022, 4:25 AM)
    Hi all, I'm ingesting data from PostgreSQL. The first run was successful, but it has been failing since the second run with this "unable to map type" error. Any clue on this? Thank you.
    "'container-urn:li:container:f4f87664cea5ae66d80ae56c7893eef5-to-urn:li:dataset:(urn:li:dataPlatform:postgres,tech.tech_sch.ga2_metrics_hour,PROD)',\n"
               "               'tech.tech_sch.landers_category',\n"
               "               'tech.tech_sch.search_query_category_rnk1-subtypes',\n"
               "               '... sampled of 2271 total elements'],\n"
               ' \'warnings\': {\'tech.location_tree._locations_vn_geo\': ["unable to map type Geometry(from_text=\'ST_GeomFromEWKT\', '
               'name=\'geometry\') to metadata schema"],\n'
               '              \'tech.location_tree.locations_vn\': ["unable to map type Geometry(from_text=\'ST_GeomFromEWKT\', name=\'geometry\') to '
               'metadata schema"],\n'
               '              \'tech.location_tree.locations_ph\': ["unable to map type Geometry(from_text=\'ST_GeomFromEWKT\', name=\'geometry\') to '
               'metadata schema"]},\n'
               " 'failures': {'Stateful Ingestion': ['Fail safe mode triggered, entity difference percent:97.61904761904762 > "
               "fail_safe_threshold:{self.stateful_ingestion_config.fail_safe_threshold}']},\n"
               " 'soft_deleted_stale_entities': [],\n"
               " 'tables_scanned': '656',\n"
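    The "Fail safe mode triggered" entry is the stateful-ingestion guard: the second run saw a very different entity set than the first, so it refused to soft-delete anything. A recipe-fragment sketch that raises the threshold named in the log (connection values are illustrative; the default threshold varies by version):

    # Hedged sketch: relax the stateful-ingestion fail-safe for a postgres recipe.
    from datahub.ingestion.run.pipeline import Pipeline

    recipe = {
        "pipeline_name": "postgres-tech",  # required for stateful ingestion
        "source": {
            "type": "postgres",
            "config": {
                "host_port": "localhost:5432",
                "database": "tech",
                "username": "datahub",
                "password": "datahub",
                "stateful_ingestion": {
                    "enabled": True,
                    # percent of entities allowed to differ between runs
                    "fail_safe_threshold": 100.0,
                },
            },
        },
        "sink": {"type": "datahub-rest",
                 "config": {"server": "http://datahub-gms:8080"}},
    }
    Pipeline.create(recipe).run()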