# troubleshoot

    prehistoric-optician-40107

    03/22/2022, 11:10 AM
    Hi all. I'm trying to learn DataHub and I'm having trouble ingesting metadata via the UI. I was able to ingest my metadata via a YAML file but not via the UI. These are my execution details. How can I fix this?
    Copy code
    "ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by "
               "NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb58e5d14f0>: Failed to establish a new connection: [Errno 111] "
               "Connection refused'))\n",
               "2022-03-16 11:30:30.325290 [exec_id=d287226a-592b-4029-879a-583a3cfa64eb] INFO: Failed to execute 'datahub ingest'",
               '2022-03-16 11:30:30.325765 [exec_id=d287226a-592b-4029-879a-583a3cfa64eb] INFO: Caught exception EXECUTING '
               'task_id=d287226a-592b-4029-879a-583a3cfa64eb, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
               '    self.event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
               '    return f.result()\n'
               '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
               '    raise self._exception\n'
               '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
               '    result = coro.send(None)\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
               '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
               "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
    Execution finished with errors.
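    For context, UI-based ingestion runs inside the ingestion executor container, so localhost there does not refer to the host machine running GMS. A minimal sketch of a sink section pointing at the GMS service name instead (the host name datahub-gms is an assumption based on the default quickstart compose network):
    Copy code
    sink:
        type: datahub-rest
        config:
            server: 'http://datahub-gms:8080'   # assumed service name; use an address reachable from the executor container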

    white-postman-45591

    03/22/2022, 12:42 PM
    Hi Team, question about ingestion from BigQuery: how can I change the default jobs location? Currently the BQ profiling jobs are running in a different location from the datasets' location (US / us-central1), so I get an error: Dataset XXXXX was not found in location us-central1

    bland-crowd-77263

    03/23/2022, 2:47 AM
    Hi team, the link for “local dao” in this documentation is dead. Where can I get the latest one? Thanks

    red-napkin-59945

    03/23/2022, 5:55 AM
    corpGroupInfo is deprecated in entity.graphql, and the comment recommends using properties instead; however, the frontend code (group.graphql) does not request the properties field. Is this a bug?
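    For illustration, a sketch of what requesting the newer field could look like (field names assumed from entity.graphql; verify against your schema version):
    Copy code
    query getGroup($urn: String!) {
        corpGroup(urn: $urn) {
            urn
            properties {
                displayName
                description
            }
        }
    }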

    full-dentist-68591

    03/23/2022, 8:36 AM
    Hi everyone, is there some sort of mapping function to select SchemaFieldDataTypeClass in SchemaFieldClass when creating a dataset? I have an XML export of an ETL job and am trying to ingest table definitions into DataHub from this file. In order to select the right data types for the columns I need some sort of mapping (e.g. VARCHAR -> StringTypeClass).
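    For illustration, a minimal sketch of such a mapping using the classes from datahub.metadata.schema_classes (the native type strings are assumptions about what the XML export contains):
    Copy code
    from datahub.metadata.schema_classes import (
        BooleanTypeClass,
        DateTypeClass,
        NumberTypeClass,
        SchemaFieldClass,
        SchemaFieldDataTypeClass,
        StringTypeClass,
    )

    # Map native type names from the ETL export to DataHub type classes;
    # anything unknown falls back to a string type.
    TYPE_MAPPING = {
        "VARCHAR": StringTypeClass,
        "CHAR": StringTypeClass,
        "INTEGER": NumberTypeClass,
        "DECIMAL": NumberTypeClass,
        "DATE": DateTypeClass,
        "BOOLEAN": BooleanTypeClass,
    }

    def make_field(name: str, native_type: str) -> SchemaFieldClass:
        type_cls = TYPE_MAPPING.get(native_type.upper(), StringTypeClass)
        return SchemaFieldClass(
            fieldPath=name,
            type=SchemaFieldDataTypeClass(type=type_cls()),
            nativeDataType=native_type,
        )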

    polite-orange-57255

    03/23/2022, 9:54 AM
    Hi team, I have deployed DataHub and it seems the UI is not able to hit the GraphQL API. Can someone help me figure this out?

    rich-policeman-92383

    03/23/2022, 10:47 AM
    What ES indexes are created/required by DataHub, and is there any ILM policy set on these indexes? If no ILM policy is set, what should the ILM policies for these indexes be?

    modern-artist-55754

    03/23/2022, 1:01 PM
    Hi there, what would be the cause for
    Copy code
    source produced an invalid metadata work unit: MetadataChangeEventClass
    I'm trying to ingest some workbooks from Tableau, and there's one particular workbook that keeps failing, but I'm not sure why.

    brave-secretary-27487

    03/23/2022, 2:53 PM
    Hey all, I'm having issues with a BigQuery sink.
    Copy code
    [2022-03-23, 14:43:03 UTC] {pipeline.py:84} INFO - sink wrote workunit container-urn:li:container:714a46eb68a1eb8ba6308cf73b33190a-to-urn:li:dataset:(urn:li:dataPlatform:bigquery,dw.analytics_245627.page_views,PROD)
    [2022-03-23, 14:43:03 UTC] {pipeline.py:92} ERROR - failed to write record with workunit dw.analytics_245627.page_views with Expecting value: line 1 column 1 (char 0) and info {}
    [2022-03-23, 14:43:03 UTC] {pipeline.py:84} INFO - sink wrote workunit container-urn:li:container:714a46eb68a1eb8ba6308cf73b33190a-to-urn:li:dataset:(urn:li:dataPlatform:bigquery,dw.analytics_245627038.sessions,PROD)
    [2022-03-23, 14:43:03 UTC] {pipeline.py:92} ERROR - failed to write record with workunit dw.analytics_245627038.sessions with Expecting value: line 1 column 1 (char 0) and info {}
    [2022-03-23, 14:43:04 UTC] {pipeline.py:84} INFO - sink wrote workunit container-urn:li:container:714a46eb68a1eb8ba6308cf73b33190a-to-urn:li:dataset:(urn:li:dataPlatform:bigquery,dw.analytics_245627038.user_detail_events,PROD)
    [2022-03-23, 14:43:04 UTC] {pipeline.py:84} INFO - sink wrote workunit dw.analytics_245627038.user_detail_events
    In this log you can see that it sometimes errors out. How would I best approach debugging this issue, and what could be the reason that some of the sink writes fail?

    red-napkin-59945

    03/23/2022, 7:08 PM
    Hey team, we got the following error when clicking the Analytics button. Do we need to do some special configuration in order to use this feature?
    AnalyticsService:264 - Search query failed: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]

    chilly-oil-22683

    03/23/2022, 8:22 PM
    Hi DataHub people! While setting up DataHub in our AWS EKS cluster, we're running into an issue we don't really understand. Following this guide: https://datahubproject.io/docs/deploy/aws, we got the prerequisites running. However, the setup of the datahub Helm chart itself gives an issue:
    Copy code
    Error: UPGRADE FAILED: failed to create resource: Deployment.apps "datahub-datahub-frontend" is invalid: spec.template.spec.containers[0].env[11].valueFrom.secretKeyRef.key: Required value helm.go:84: [debug] Deployment.apps "datahub-datahub-frontend" is invalid: spec.template.spec.containers[0].env[11].valueFrom.secretKeyRef.key: Required value
    What kind of value is it looking for? Is it some Helm setting? I can't seem to find it in the Helm chart settings: https://artifacthub.io/packages/helm/datahub/datahub Is it looking for EKS cluster settings? Does anyone have a pointer for me on where I should set this and what value it is looking for? Thanks! Dennis

    breezy-portugal-43538

    03/24/2022, 9:27 AM
    Hello, I am trying to run the DataHub integration with Great Expectations and I get a really strange error, despite following the tutorial and installing the latest version of:
    pip install 'acryl-datahub[great-expectations]'
    When running the checkpoint YAML file, an error is raised about a missing module:
    FileNotFoundError: No module named "datahub.integrations.great_expectations.action" could be found in the repository. Please make sure that the file, corresponding to this package and module, exists and that dynamic loading of code modules, templates, and assets is supported in your execution environment. This error is unrecoverable.
    In my IDE I can see that the integrations module is not present during the import. Is this a bug on Ubuntu? Could you help resolve the issue? I am posting pictures below from Windows and Ubuntu; if any more information is required, please let me know.
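    For reference, the checkpoint action that the integration expects looks roughly like this (a sketch following the DataHub Great Expectations integration docs; the GMS URL is an assumption):
    Copy code
    action_list:
        - name: datahub_action
          action:
              module_name: datahub.integrations.great_expectations.action
              class_name: DataHubValidationAction
              server_url: http://localhost:8080   # assumed GMS address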

    gentle-father-80172

    03/24/2022, 2:15 PM
    Hey Team good morning! 👋 Lineage question -> What could cause lineage to be set for some datasets but not others if they come from the same source/schema? Details in 🧵

    quick-student-61408

    03/24/2022, 2:34 PM
    Hello everyone, I'm a DataHub beginner and I want to try to ingest a business glossary with the CLI, but I get this error message:
    apache@apache-VirtualBox:~$ python3.9 -m datahub ingest -c business_glossary.yml
    [2022-03-24 15:32:13,017] INFO     {datahub.cli.ingest_cli:75} - DataHub CLI version: 0.8.31.2
    [2022-03-24 15:32:13,164] ERROR    {datahub.entrypoints:152} - File "/home/apache/.local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 82, in run
    70   def run(
    71       ctx: click.Context, config: str, dry_run: bool, preview: bool, strict_warnings: bool
    72   ) -> None:
    (...)
    78       pipeline_config = load_config_file(config_file)
    79
    80       try:
    81           logger.debug(f"Using config: {pipeline_config}")
    --> 82           pipeline = Pipeline.create(pipeline_config, dry_run, preview)
    83       except ValidationError as e:
    File "/home/apache/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 174, in create
    170  @classmethod
    171  def create(
    172      cls, config_dict: dict, dry_run: bool = False, preview_mode: bool = False
    173  ) -> "Pipeline":
    --> 174      config = PipelineConfig.parse_obj(config_dict)
    175      return cls(config, dry_run=dry_run, preview_mode=preview_mode)
    File "pydantic/main.py", line 511, in pydantic.main.BaseModel.parse_obj
    File "pydantic/main.py", line 329, in pydantic.main.BaseModel.__init__
    File "pydantic/main.py", line 1022, in pydantic.main.validate_model
    File "pydantic/fields.py", line 837, in pydantic.fields.ModelField.validate
    File "pydantic/fields.py", line 1118, in pydantic.fields.ModelField._apply_validators
    File "pydantic/class_validators.py", line 278, in pydantic.class_validators._generic_validator_cls.lambda2
    File "/home/apache/.local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 56, in run_id_should_be_semantic
    52   def run_id_should_be_semantic(
    53       cls, v: Optional[str], values: Dict[str, Any], **kwargs: Any
    54   ) -> str:
    55       if v == "__DEFAULT_RUN_ID":
    --> 56           if values["source"] is not None:
    57               if values["source"].type is not None:
    KeyError: 'source'
    [2022-03-24 15:32:13,165] INFO     {datahub.entrypoints:161} - DataHub CLI version: 0.8.31.2 at /home/apache/.local/lib/python3.9/site-packages/datahub/__init__.py
    [2022-03-24 15:32:13,165] INFO     {datahub.entrypoints:164} - Python version: 3.9.11 (main, Mar 16 2022, 17:19:28)
    [GCC 9.4.0] at /usr/bin/python3.9 on Linux-5.13.0-35-generic-x86_64-with-glibc2.31
    [2022-03-24 15:32:13,165] INFO     {datahub.entrypoints:167} - GMS config {}
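    For context, the KeyError: 'source' means the recipe's source section could not be found, which can happen if, for example, the glossary definition file itself is passed as the recipe instead of a recipe that points to it. A minimal business glossary recipe normally looks roughly like this (file path and server address are assumptions):
    Copy code
    source:
        type: datahub-business-glossary
        config:
            file: ./business_glossary_definitions.yml   # assumed path to the glossary definition file
    sink:
        type: datahub-rest
        config:
            server: 'http://localhost:8080'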

    quick-student-61408

    03/24/2022, 2:35 PM
    Does anyone have a solution?

    calm-television-89033

    03/24/2022, 3:56 PM
    Hi everybody! I'm getting an error when trying to use secrets for UI ingestion. I configured the recipe for BigQuery using the following structure:
    Copy code
    source:
        type: bigquery
        config:
            project_id: '${DATAPLATFORM_PROJECT_ID}'
            credential:
                project_id: '${DATAPLATFORM_PROJECT_ID}'
                private_key_id: '${BIGQUERY_PRIVATE_KEY_ID}'
                private_key: '${BIGQUERY_PRIVATE_KEY}'
                client_email: '${BIGQUERY_CLIENT_EMAIL}'
                client_id: '${BIGQUERY_CLIENT_ID}'
    sink:
        type: datahub-rest
        config:
            server: 'http://30.222.164.39:8080'
    And I'm getting the following error:
    Copy code
    "Failed to resolve secret with name DATAPLATFORM_PROJECT_ID. Aborting recipe execution."
    I double-checked the secret names as suggested by the UI Ingestion Guide and they are correct. Have you gone through this, or could you give me any tips on how to proceed? Thanks in advance for your attention! 🙂

    gentle-camera-33498

    03/24/2022, 5:35 PM
    Hello, everyone! I'm new to DataHub. I made a setup for a POC a few days ago and today I'm trying to ingest metadata from Metabase. Unfortunately, I'm getting some errors. Everything is OK regarding permissions and access to the Metabase API (I checked it myself with a Python script). But before the metadata ingestion ends, I get the error below:
    Copy code
    ---- (full traceback above) ----
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/entrypoints.py", line 138, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
    File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
    File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/home/pbraz/.local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
    File "/home/pbraz/.local/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 202, in wrapper
        raise e
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 194, in wrapper
        res = func(*args, **kwargs)
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/utilities/memory_leak_detector.py", line 102, in wrapper
        res = func(*args, **kwargs)
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 92, in run
        pipeline.run()
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 181, in run
        for wu in itertools.islice(
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/ingestion/source/metabase.py", line 541, in get_workunits
        yield from self.emit_card_mces()
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/ingestion/source/metabase.py", line 240, in emit_card_mces
        chart_snapshot = self.construct_card_from_api_data(card_info)
    File "/home/pbraz/.local/lib/python3.8/site-packages/datahub/ingestion/source/metabase.py", line 258, in construct_card_from_api_data
        card_response = self.session.get(card_url)
    File "/usr/lib/python3/dist-packages/requests/sessions.py", line 546, in get
        return self.request('GET', url, **kwargs)
    File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
        resp = self.send(prep, **send_kwargs)
    File "/usr/lib/python3/dist-packages/requests/sessions.py", line 646, in send
        r = adapter.send(request, **kwargs)
    File "/usr/lib/python3/dist-packages/requests/adapters.py", line 498, in send
        raise ConnectionError(err, request=request)
    
    ConnectionError: ('Connection aborted.', OSError("(104, 'ECONNRESET')"))
    [2022-03-24 17:02:52,053] INFO     {datahub.entrypoints:161} - DataHub CLI version: 0.8.31.1 at /home/pbraz/.local/lib/python3.8/site-packages/datahub/__init__.py
    [2022-03-24 17:02:52,053] INFO     {datahub.entrypoints:164} - Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) 
    [GCC 9.3.0] at /usr/bin/python3 on Linux-5.13.0-1019-gcp-x86_64-with-glibc2.29
    [2022-03-24 17:02:52,053] INFO     {datahub.entrypoints:167} - GMS config {}
    I did some searching to discover the reason for these errors. My first guess was the API request rate limit, but I found in the documentation that only login requests have rate limits (see here). My second try was to search for the error on the internet, and I found a not-so-similar situation but with the same error (see here). Could it be that Metabase has a security control on User-Agent headers? The user created for this POC receives this email every time I try to ingest the metadata from Metabase. Does someone have an idea of what I could be doing wrong? Thanks for your attention!

    red-napkin-59945

    03/24/2022, 8:59 PM
    Hey team, after setting DATAHUB_ANALYTICS_ENABLED to true, the UI page load time is extremely long (10s+). Any idea what could cause this? I did not find any error logs in either datahub-frontend or datahub-gms.

    mysterious-portugal-30527

    03/24/2022, 9:51 PM
    OK. Why is this happening: I go to my home page and choose Snowflake under Platforms, which claims six topics, but instead I get a page with no entries! Whiskey-Tango-Foxtrot-Over! 😁

    numerous-table-92385

    03/25/2022, 5:50 AM
    Hi Team, I'm trying to get the hello-world instance up on an AWS AMI. I installed Docker and Python per the quickstart and have a hello world working on Docker; however, the quickstart guide isn't working.

    modern-monitor-81461

    03/25/2022, 2:17 PM
    View in Airflow is broken on my deployment. When I click the View in Airflow button in my deployed DataHub, it opens something else (it seems to reload the current page I'm on). When I look at the navigation URL of the button (href), I see something like:
    https://datahub.mydomain.com/tasks/urn:li:dataJob:(urn:li:dataFlow:(airflow,xxxxx,prod),xxxxx)/airflow.mydomain.com/taskinstance/list/?flt1_dag_id_equals=xxxxx&_flt_3_task_id=xxxxx
    where airflow.mydomain.com/taskinstance/list/?flt1_dag_id_equals=xxxxx&_flt_3_task_id=xxxxx is a valid URL (I can open it in my browser and I get what I expected). So the href looks like https://<datahub domain>/<params and urn of the airflow task>/<airflow domain>/<params of the airflow task>. I also have a Superset integration, and when I look at the href of the View in Superset button, I see a real Superset URL without the DataHub URL prepended. That button works well. Why is the Airflow button href different?
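    One possible explanation, assuming the external URL is built from Airflow's webserver base_url: if base_url is configured without a scheme, the resulting href is treated as relative and is resolved against the DataHub domain. A sketch of the corresponding airflow.cfg setting:
    Copy code
    [webserver]
    # Include the scheme so the generated link is absolute (assumption: the
    # DataHub Airflow integration derives the task's external URL from this value).
    base_url = https://airflow.mydomain.com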

    red-napkin-59945

    03/25/2022, 8:10 PM
    Hey team, regarding TrackingController in datahub-frontend, is there any reason we want to flush here?

    most-room-32003

    03/26/2022, 10:13 PM
    Hi all - I hope this is the right place to post issues. I'm trying to work with assertions/"validation" in the table UI. My end goal is to populate the "Validation" tab in the table UI with some external checks I'm doing on the table, very similar to the Great Expectations integration. I reviewed the Python emitter and also found the data_quality_mcpw_rest.py example; however, the example doesn't work as-is (obviously after changing the data source, table name, etc.). Is it supposed to? After I run the script and open the "Validation" tab in the UI, an error appears:
    Copy code
    The field at path '/dataset/assertions/assertions[0]' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Assertion' within parent type '[Assertion!]' (code undefined)

    wooden-football-7175

    03/28/2022, 1:08 PM
    Hello channel, I have a warning with airflow and GreatExpectationsOperator that may make sense to bring up here. When trying to install the acryl dependencies to run the GE action that sends validations to DataHub, this “compatibility issue” appears.

    numerous-camera-74294

    03/28/2022, 1:38 PM
    hello! is there any way of listing all glossary terms under a given path?

    bitter-toddler-42943

    03/29/2022, 2:41 AM
    Hello, asking here too. Is Elasticsearch essential to DataHub? (Surely YES.) Does anybody know how I can set up a security option to provide a password for Elasticsearch in DataHub? After I changed some options in Elasticsearch, I cannot connect for DataHub ingestion.
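    For reference, a sketch of how Elasticsearch credentials are typically passed to GMS via environment variables (variable names assumed from the deployment docs; verify against your DataHub version, and the setup/consumer containers need the same values):
    Copy code
    ELASTICSEARCH_HOST=elasticsearch.internal.example   # hypothetical host
    ELASTICSEARCH_PORT=9200
    ELASTICSEARCH_USE_SSL=true
    ELASTICSEARCH_USERNAME=datahub
    ELASTICSEARCH_PASSWORD=<password>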

    gorgeous-dinner-4055

    03/29/2022, 6:01 AM
    An FYI in case anyone else runs into issues with setting up analytics in a DataHub deployment without the standard Helm charts, using AWS MSK. If you notice every third click in the DataHub UI hanging, look into the networking and check whether the trace call is hanging (it should time out in 60 seconds, after which you can make one more call). If it is, you may also notice the following stack trace if you add a callback to the event-tracking Kafka emit:
    Copy code
    org.apache.kafka.common.errors.TimeoutException: Topic DataHubUsageEvent_v1 not present in metadata after 60000 ms.
    To fix this, the following configs need to be set to talk to Kafka correctly: https://github.com/datahub-project/datahub/blob/34b36c0fe17f6ed6195ba5a0b57f41853fc60532/datahub-frontend/conf/application.conf#L158 I will update the docs tomorrow to add this info, but hopefully someone will find this useful when searching through Slack one day 🙂
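    For illustration, a hedged sketch of the datahub-frontend environment for MSK TLS listeners (property names assumed from the application.conf linked above; adjust for your authentication mode):
    Copy code
    DATAHUB_ANALYTICS_ENABLED=true
    KAFKA_BOOTSTRAP_SERVER=b-1.my-msk-cluster.abc123.kafka.us-east-1.amazonaws.com:9094   # hypothetical broker address
    KAFKA_PROPERTIES_SECURITY_PROTOCOL=SSL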

    boundless-student-48844

    03/29/2022, 1:57 PM
    Hi team, I saw this weird behavior in my local DataHub. Not sure if anyone has encountered this before or could share some ideas. 🙇 After starting my local DataHub with ./docker/dev-without-neo4j.sh, I do a first ingestion using datahub ingest -c ~/Desktop/recipe1.yml (it ingests from a JSON file with MCEs for Dataset), and then a second ingestion using datahub ingest -c ~/Desktop/recipe2.yml (it ingests from a JSON file with MCEs for Dashboard). The getSearchResultsForMultiple GraphQL query with the query variable types set to the first entity type (in this case, Dataset) returns the expected results. However, it returns no results when types is set to the second entity type (in this case, Dashboard). Supposedly, some entities should be returned, as Dashboard entities were ingested in the second run. I checked ES, and the entities exist in the dashboardindex_v2 index. If I do the reverse, ingesting Dashboards first and then Datasets, I am not able to get Datasets from getSearchResultsForMultiple. Does this have anything to do with the Elasticsearch cache?

    polite-orange-57255

    03/30/2022, 9:07 AM
    Hey Team, on our deployed DataHub we are unable to see the most popular and recently viewed data or domains on the homepage, and weekly active users are not updating on the analytics page. What can be the reason (or are there any extra configs to enable this)? cc @gifted-kite-59905

    gray-agency-10420

    03/30/2022, 9:28 AM
    Hello, we are trying to use the Data Lake source. It works, but it ingests the data without handling partitions, and instead of one dataset we get a thousand. Is there some configuration for this? For example, instead of two datasets:
    Copy code
    dim_geo_location_processed/version=20220312T000000/dim_geo_location_csv
    dim_geo_location_processed/version=20220313T000000/dim_geo_location_csv
    we expect to have one
    dim_geo_location_processed/dim_geo_location_csv
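    For context, a hedged sketch of the kind of path configuration the S3/data-lake source supports to group partitioned files into one table (option names taken from the newer s3 source docs and may differ in older releases; bucket name is hypothetical):
    Copy code
    source:
        type: s3
        config:
            path_specs:
                - include: 's3://my-bucket/{table}/version={partition[0]}/*.csv'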