# troubleshoot
  • agreeable-address-71270
    05/05/2023, 5:03 PM
    Hello! I am looking for clarification on the documentation for setting up Okta SSO: https://datahubproject.io/docs/authentication/guides/sso/configure-oidc-react-okta#4-configure-datahub-frontend-to-enable-oidc-authentication
    a. Open the file docker/datahub-frontend/env/docker.env
    b. Add the following configuration values to the file:
    My question is: can the env variables in docker.env instead be set as docker-compose environment variables on the datahub-frontend-react container?
    Copy code
    AUTH_OIDC_ENABLED=true
    AUTH_OIDC_CLIENT_ID=your-client-id
    AUTH_OIDC_CLIENT_SECRET=your-client-secret
    AUTH_OIDC_DISCOVERY_URI=https://your-okta-domain.com/.well-known/openid-configuration
    AUTH_OIDC_BASE_URL=your-datahub-url
    AUTH_OIDC_SCOPE="openid profile email groups"
    I ask this because I am running the frontend container as an ECS service.
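    For reference, a minimal sketch (not from the docs) of the same values supplied as docker-compose environment variables on the datahub-frontend-react service; an ECS task definition would carry the same variables in its container environment. All values are placeholders:
    Copy code
    services:
      datahub-frontend-react:
        image: acryldata/datahub-frontend-react:head   # illustrative tag
        environment:
          - AUTH_OIDC_ENABLED=true
          - AUTH_OIDC_CLIENT_ID=your-client-id
          - AUTH_OIDC_CLIENT_SECRET=your-client-secret
          - AUTH_OIDC_DISCOVERY_URI=https://your-okta-domain.com/.well-known/openid-configuration
          - AUTH_OIDC_BASE_URL=your-datahub-url
          - AUTH_OIDC_SCOPE=openid profile email groups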
  • early-hydrogen-27542
    05/05/2023, 8:23 PM
    👋 folks - a couple of questions around dbt meta mapping:
    1. Is there a way to make model-level meta mapping stateful? For instance, if I add a Tier 1 term to a model via mapping, then swap it out for a Tier 3 term, both terms remain on the model. We have stateful ingestion enabled, and it works at the model level (e.g. it soft-deletes deleted models). Column meta mapping appears to be stateful.
    2. How would I update my recipe to have column-level terms prefaced with a term node (e.g. PII.)? I would like to avoid the user having to type that prefix, but I am unsure how to add it when using add_terms instead of just add_term.
    Version: 0.10.1. Recipe config:
    Copy code
    meta_mapping:
          datahub.owner:
            match: ".*"
            operation: "add_owner"
            config:
              owner_type: "group"
          contains_pii:
            match: True
            operation: "add_term"
            config:
              term: "PersonalInformation.PII"
          contains_pii:
            match: False
            operation: "add_term"
            config:
              term: "<http://PersonalInformation.No|PersonalInformation.No> PII"
          tier:
            match: "Tier 1|Tier 2|Tier 3"
            operation: "add_term"
            config:
              term: "Tier.{{ $match }}"
        column_meta_mapping:
          glossary_terms:
            match: ".*"
            operation: "add_terms"
            config:
              separator: ","
  • nutritious-musician-70978
    05/05/2023, 8:41 PM
    I am getting an error saying "cannot read file, no connection adapters were found" for a JSON file when trying to ingest sample metadata per the quickstart guide ("datahub docker ingest-sample-data"). Would appreciate any help.
  • adamant-engine-29309
    05/06/2023, 5:25 AM
    On my Mac M1 laptop I am trying to install the project, but while running ./gradlew quickstart I get the error below.
    Copy code
    > Task :docker:elasticsearch-setup:docker
    #12 10.24 go: github.com/jwilder/dockerize@v0.6.1: github.com/jwilder/dockerize@v0.6.1: Get "https://proxy.golang.org/github.com/jwilder/dockerize/@v/v0.6.1.info": dial tcp: lookup proxy.golang.org on 192.168.65.5:53: read udp 172.17.0.2:59270->192.168.65.5:53: i/o timeout
    #12 ERROR: executor failed running [/bin/sh -c go install github.com/jwilder/dockerize@$DOCKERIZE_VERSION]: exit code: 1
    ------
     > [binary 5/5] RUN go install github.com/jwilder/dockerize@v0.6.1:
    #12 10.24 go: github.com/jwilder/dockerize@v0.6.1: github.com/jwilder/dockerize@v0.6.1: Get "https://proxy.golang.org/github.com/jwilder/dockerize/@v/v0.6.1.info": dial tcp: lookup proxy.golang.org on 192.168.65.5:53: read udp 172.17.0.2:59270->192.168.65.5:53: i/o timeout
    ------
    ERROR: failed to solve: executor failed running [/bin/sh -c go install github.com/jwilder/dockerize@$DOCKERIZE_VERSION]: exit code: 1
    
    > Task :docker:elasticsearch-setup:docker FAILED
    
    > Task :docker:mysql-setup:docker FAILED
    #11 10.24 go: github.com/jwilder/dockerize@v0.6.1: github.com/jwilder/dockerize@v0.6.1: Get "https://proxy.golang.org/github.com/jwilder/dockerize/@v/v0.6.1.info": dial tcp: lookup proxy.golang.org on 192.168.65.5:53: read udp 172.17.0.2:59270->192.168.65.5:53: i/o timeout
    #11 ERROR: executor failed running [/bin/sh -c go install github.com/jwilder/dockerize@$DOCKERIZE_VERSION]: exit code: 1
    ------
     > [binary 5/5] RUN go install github.com/jwilder/dockerize@v0.6.1:
    #11 10.24 go: github.com/jwilder/dockerize@v0.6.1: github.com/jwilder/dockerize@v0.6.1: Get "https://proxy.golang.org/github.com/jwilder/dockerize/@v/v0.6.1.info": dial tcp: lookup proxy.golang.org on 192.168.65.5:53: read udp 172.17.0.2:59270->192.168.65.5:53: i/o timeout
    ------
    ERROR: failed to solve: executor failed running [/bin/sh -c go install github.com/jwilder/dockerize@$DOCKERIZE_VERSION]: exit code: 1
    
    > Task :datahub-web-react:yarnInstall
    Done in 15.09s.
    
    FAILURE: Build completed with 2 failures.
    
    1: Task failed with an exception.
    -----------
    * What went wrong:
    Execution failed for task ':docker:elasticsearch-setup:docker'.
    > Process 'command 'docker'' finished with non-zero exit value 1
    
    * Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    ==============================================================================
    
    2: Task failed with an exception.
    -----------
    * What went wrong:
    Execution failed for task ':docker:mysql-setup:docker'.
    > Process 'command 'docker'' finished with non-zero exit value 1
    
    * Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    ==============================================================================
    
    * Get more help at https://help.gradle.org
    
    Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
    Use '--warning-mode all' to show the individual deprecation warnings.
    See https://docs.gradle.org/6.9.2/userguide/command_line_interface.html#sec:command_line_warnings
    
    BUILD FAILED in 1m 8s
    209 actionable tasks: 82 executed, 127 up-to-date
    nsimadas@bcd0746626a5 datahub %
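    The failure above is a DNS timeout inside the Docker build: the builder cannot resolve proxy.golang.org. A rough way to confirm and work around it, assuming Docker Desktop on the Mac (the daemon.json "dns" entry is an assumption about your setup, not an official fix):
    Copy code
    # Check whether containers can resolve the Go module proxy at all
    docker run --rm alpine nslookup proxy.golang.org

    # If resolution fails, pin explicit DNS servers for the Docker daemon
    # (Docker Desktop: Settings > Docker Engine), restart Docker, then retry:
    # {
    #   "dns": ["8.8.8.8", "1.1.1.1"]
    # }

    ./gradlew quickstart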
  • powerful-answer-39247
    05/06/2023, 10:06 AM
    Hello good people, I have the docker version running and want to test connecting to MongoDB and Postgres databases running on my localhost (not the datahub docker localhost). It seems I cannot simply use localhost:5432 as the host & port because both DBs are not running inside datahub docker; any suggestions?
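    One common pattern, sketched under the assumption of Docker Desktop: from inside the DataHub containers the host machine is reachable as host.docker.internal (on Linux you may need an extra_hosts: "host.docker.internal:host-gateway" mapping), so the recipe points there instead of localhost. Database name and credentials below are placeholders:
    Copy code
    source:
      type: postgres
      config:
        host_port: "host.docker.internal:5432"   # instead of localhost:5432
        database: mydb
        username: myuser
        password: "${POSTGRES_PASSWORD}"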
  • rich-policeman-92383
    05/07/2023, 2:06 AM
    Datahub version: v0.9.6.1 CLI: 0.9.6.4 Problem: When we ingest metadata from an Oracle source using the datahub CLI, the GMS service throws a "document missing exception". Because of this, the metadata is never ingested into DataHub. This error comes up randomly, but it persists for this particular ingestion. We are able to modify descriptions and create domains from the DataHub UI. Error:
    Copy code
    [62]: index [datahubprod_datasetindex_v2], type [_doc], id [urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cbtvl.item_master%2CPROD%29], message [[datahubprod_datasetindex_v2/zU2-TAicR3aYyFw_r6niQg][[datahubprod_datasetindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Aoracle%2Cbtvl.item_master%2CPROD%29]: document missing]]]
  • bulky-vr-54429
    05/08/2023, 8:04 AM
    Hello, I am Doron Podoleanu, VP R&D at Velotix. We started using DataHub, but it ships vulnerable packages (scanned by GCR and Snyk). This de facto prevents us from onboarding onto the Google Marketplace (and will certainly be an issue for any other audited production environment).
    Copy code
    "Packages below are known to contain vulnerabilities. Please update the affected packages and resubmit the solution
    Image: elasticsearch:7.17.9
      Note: CVE-2022-1471
      Package: org.yaml:snakeyaml
      Package Type: MAVEN
      Affected Version: 1.33
      Fixed Version: 2.0
    Image: acryldata/datahub-actions:v0.0.12
      Note: CVE-2023-24538
      Package: go
      Package Type: GO_STDLIB
      Affected Version: 1.20.2
      Fixed Version: 1.20.3
      Note: CVE-2021-33036
      Package: org.apache.hadoop:hadoop-yarn-server-common
      Package Type: MAVEN
      Affected Version: 3.2.0
      Fixed Version: 3.2.3
      Note: CVE-2022-37865
      Package: org.apache.ivy:ivy
      Package Type: MAVEN
      Affected Version: 2.4.0
      Fixed Version: 2.5.1
      Note: CVE-2022-25168
      Package: org.apache.hadoop:hadoop-common
      Package Type: MAVEN
      Affected Version: 3.2.0
      Fixed Version: 3.2.4
      Note: CVE-2023-22946
      Package: org.apache.spark:spark-core_2.12
      Package Type: MAVEN
      Affected Version: 3.0.3
      Fixed Version: 3.4.0
      Note: CVE-2022-26612
      Package: org.apache.hadoop:hadoop-common
      Package Type: MAVEN
      Affected Version: 3.2.0
      Fixed Version: 3.2.3
      Note: CVE-2019-0204
      Package: org.apache.mesos:mesos
      Package Type: MAVEN
      Affected Version: 1.4.0
      Fixed Version: 1.4.3
    Image: acryldata/datahub-kafka-setup:v0.10.2.2
      Note: CVE-2022-1471
      Package: org.yaml:snakeyaml
      Package Type: MAVEN
      Affected Version: 1.32
      Fixed Version: 2.0"
    1 - I understand that it uses ES 7 and does not really run the vulnerable execution path with snakeyaml etc. It matters not; these are severe CVEs, and this software will be blocked from any audited place (such as the marketplace, etc.). 2 - Is there a commitment to fix anything in that list (I know about best efforts)? 3 - I am trying to run datahub without ES right now; we do not really use search. When trying to build, I get the following error:
    Copy code
    2023-05-08T11:01:24.654+0300 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationRunner] Completing Build operation 'Configure build'
    2023-05-08T11:01:24.654+0300 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationRunner] Build operation 'Configure build' completed
    2023-05-08T11:01:24.660+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
    2023-05-08T11:01:24.662+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] FAILURE: Build failed with an exception.
    2023-05-08T11:01:24.663+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
    2023-05-08T11:01:24.663+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * Where:
    2023-05-08T11:01:24.663+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Build file '/Users/podoleanu/work/datahub/buildSrc/build.gradle' line: 8
    2023-05-08T11:01:24.663+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
    2023-05-08T11:01:24.663+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * What went wrong:
    2023-05-08T11:01:24.663+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] A problem occurred evaluating project ':buildSrc'.
    2023-05-08T11:01:24.663+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] > Could not find method compile() for arguments [io.acryl:json-schema-avro:0.1.5, build_72c16s6ya15s0l3jdky658gr3$_run_closure1$_closure2@7c1b50a8] on object of type org.gradle.api.internal.artifacts.dsl.dependencies.DefaultDependencyHandler.
    2023-05-08T11:01:24.664+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
    2023-05-08T11:01:24.664+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * Exception is:
    2023-05-08T11:01:24.664+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] org.gradle.api.GradleScriptException: A problem occurred evaluating project ':buildSrc'.
    2023-05-08T11:01:24.664+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.groovy.scripts.internal.DefaultScriptRunnerFactory$ScriptRunnerImpl.run(DefaultScriptRunnerFactory.java:93)
    2023-05-08T11:01:24.664+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.configuration.DefaultScriptPluginFactory$ScriptPluginImpl.lambda$apply$0(DefaultScriptPluginFactory.java:135)
    2023-05-08T11:01:24.669+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.configuration.ProjectScriptTarget.addConfiguration(ProjectScriptTarget.java:79)
    2023-05-08T11:01:24.669+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.configuration.DefaultScriptPluginFactory$ScriptPluginImpl.apply(DefaultScriptPluginFactory.java:138)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.configuration.BuildOperationScriptPlugin$1.run(BuildOperationScriptPlugin.java:65)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.internal.operations.DefaultBuildOperationRunner$1.execute(DefaultBuildOperationRunner.java:29)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.internal.operations.DefaultBuildOperationRunner$1.execute(DefaultBuildOperationRunner.java:26)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:66)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.internal.operations.DefaultBuildOperationRunner$2.execute(DefaultBuildOperationRunner.java:59)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:157)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.internal.operations.DefaultBuildOperationRunner.execute(DefaultBuildOperationRunner.java:59)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.internal.operations.DefaultBuildOperationRunner.run(DefaultBuildOperationRunner.java:47)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:68)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.configuration.BuildOperationScriptPlugin.lambda$apply$0(BuildOperationScriptPlugin.java:62)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.configuration.internal.DefaultUserCodeApplicationContext.apply(DefaultUserCodeApplicationContext.java:44)
    2023-05-08T11:01:24.670+0300 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]   at org.gradle.configuration.BuildOperationScriptPlugin.apply(BuildOperationScriptPlugin.java:62)
    Can I get help/answers please? Thanks!
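    On the build error specifically: "Could not find method compile()" usually means the build is running under a newer standalone Gradle (7+, where the compile configuration was removed) rather than the repo's bundled wrapper, which at this point was Gradle 6.9.2 per the other logs in this channel. A quick check, as a sketch:
    Copy code
    # Use the wrapper that ships with the repo rather than a locally installed Gradle
    ./gradlew -v

    # Then retry the build through the wrapper
    ./gradlew build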
  • acoustic-kite-241
    05/08/2023, 8:36 AM
    Hi everyone, I'm trying to use datahub ingest with Tableau, and I'm running into trouble:
    Internal Server Error(s) while executing query
    Ingest log:
    Copy code
    [2023-05-08 08:17:12,202] INFO     {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.2.2
    [2023-05-08 08:17:12,256] INFO     {datahub.ingestion.run.pipeline:204} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms.com with token: eyJh**********SOTI
    /usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py:2271: ConfigurationWarning: projects is deprecated and will be removed in a future release. Please removeit from your config.
      config = TableauConfig.parse_obj(config_dict)
    [2023-05-08 08:17:12,511] WARNING  {datahub.ingestion.source.tableau:342} - project_pattern is not set but projects is set. projects is deprecated, please use project_pattern instead.
    [2023-05-08 08:17:12,511] INFO     {datahub.ingestion.source.tableau:345} - Initializing project_pattern from projects
    [2023-05-08 08:17:12,842] INFO     {tableau.endpoint.auth:50} - Signed into https://my-tableau.org as user with id d6948785-5cc9-4c58-8d7f-675a4e4f168b
    [2023-05-08 08:17:12,842] INFO     {datahub.ingestion.source.tableau:616} - Authenticated to Tableau server
    [2023-05-08 08:17:12,842] INFO     {datahub.ingestion.run.pipeline:221} - Source configured successfully.
    [2023-05-08 08:17:12,843] INFO     {datahub.cli.ingest_cli:129} - Starting metadata ingestion
    [2023-05-08 08:17:12,864] INFO     {datahub.ingestion.source.tableau:596} - Initializing site project registry
    [2023-05-08 08:17:12,865] INFO     {tableau.endpoint.projects:31} - Querying all projects on site
    [2023-05-08 08:17:13,188] INFO     {datahub.ingestion.source.tableau:517} - project(xxxx) is not allowed as per project_pattern
    [2023-05-08 08:17:13,188] INFO     {datahub.ingestion.source.tableau:517} - project(xxxx) is not allowed as per project_pattern
    [2023-05-08 08:17:13,188] INFO     {datahub.ingestion.source.tableau:517} - project(xxxx) is not allowed as per project_pattern
    ......
    [2023-05-08 08:17:13,199] INFO     {datahub.ingestion.source.tableau:517} - project(Paid Search) is not allowed as per project_pattern
    [2023-05-08 08:17:13,200] INFO     {tableau.endpoint.datasources:84} - Querying all datasources on site
    [2023-05-08 08:17:13,306] INFO     {tableau.endpoint.datasources:84} - Querying all datasources on site
    [2023-05-08 08:17:13,416] INFO     {tableau.endpoint.workbooks:74} - Querying all workbooks on site
    [2023-05-08 08:17:13,562] INFO     {tableau.endpoint.workbooks:74} - Querying all workbooks on site
    [2023-05-08 08:17:13,694] INFO     {tableau.endpoint.workbooks:74} - Querying all workbooks on site
    [2023-05-08 08:17:13,807] INFO     {tableau.endpoint.metadata:61} - Querying Metadata API
    [2023-05-08 08:17:13,877] ERROR    {datahub.ingestion.run.pipeline:409} - Caught error
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/run/pipeline.py", line 361, in run
        self.preview_workunits if self.preview_mode else None,
      File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 91, in auto_stale_entity_removal
        for wu in stream:
      File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 42, in auto_status_aspect
        for wu in stream:
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 2305, in get_workunits_internal
        yield from self.emit_workbooks()
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 738, in emit_workbooks
        page_size_override=self.config.workbook_page_size,
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 718, in get_connection_objects
        offset,
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 676, in get_connection_object_page
        raise RuntimeError(f"Query {connection_type} error: {errors}")
    RuntimeError: Query workbooksConnection error: [{'message': 'Internal Server Error(s) while executing query', 'extensions': None, 'path': None}]
    [2023-05-08 08:17:13,895] INFO     {datahub.cli.ingest_cli:135} - Source (tableau) report:
    {'aspects': {'container': {'containerProperties': 1, 'dataPlatformInstance': 1, 'status': 1, 'subTypes': 1}},
     'entities': {'container': ['urn:li:container:c6e27b6a2acce0003bc944ba693553f5']},
     'events_produced': 4,
     'events_produced_per_sec': 2,
     'failures': {},
     'running_time': '1.36 seconds',
     'soft_deleted_stale_entities': [],
     'start_time': '2023-05-08 08:17:12.531954 (1.36 seconds ago)',
     'warnings': {}}
    [2023-05-08 08:17:13,895] INFO     {datahub.cli.ingest_cli:138} - Sink (datahub-rest) report:
    {'current_time': '2023-05-08 08:17:13.895275 (now)',
     'failures': [],
     'gms_version': 'v0.9.5',
     'pending_requests': 0,
     'records_written_per_second': 2,
     'start_time': '2023-05-08 08:17:12.249157 (1.65 seconds ago)',
     'total_duration_in_seconds': 1.65,
     'total_records_written': 4,
     'warnings': []}
    [2023-05-08 08:17:14,269] ERROR    {datahub.entrypoints:195} - Command failed: Query workbooksConnection error: [{'message': 'Internal Server Error(s) while executing query', 'extensions': None, 'path': None}]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/datahub/entrypoints.py", line 182, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1657, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/click/decorators.py", line 26, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
        raise e
      File "/usr/local/lib/python3.7/dist-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
        res = func(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
        return func(ctx, *args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 198, in run
        loop.run_until_complete(run_func_check_upgrade(pipeline))
      File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
        return future.result()
      File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
        ret = await the_one_future
      File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 150, in run_pipeline_async
        None, functools.partial(run_pipeline_to_completion, pipeline)
      File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
        raise e
      File "/usr/local/lib/python3.7/dist-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
        pipeline.run()
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/run/pipeline.py", line 361, in run
        self.preview_workunits if self.preview_mode else None,
      File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 91, in auto_stale_entity_removal
        for wu in stream:
      File "/usr/local/lib/python3.7/dist-packages/datahub/utilities/source_helpers.py", line 42, in auto_status_aspect
        for wu in stream:
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 2305, in get_workunits_internal
        yield from self.emit_workbooks()
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 738, in emit_workbooks
        page_size_override=self.config.workbook_page_size,
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 718, in get_connection_objects
        offset,
      File "/usr/local/lib/python3.7/dist-packages/datahub/ingestion/source/tableau.py", line 676, in get_connection_object_page
        raise RuntimeError(f"Query {connection_type} error: {errors}")
    RuntimeError: Query workbooksConnection error: [{'message': 'Internal Server Error(s) while executing query', 'extensions': None, 'path': None}]
    tableau ingest yaml
    Copy code
    # tableau
    source:
        type: tableau
        config:
            connect_uri: '${TABLEAU_ADDRESS}'
            # site:
            platform_instance: acryl_instance
            # project_pattern:
            project_pattern: ["^default$", "^Project 2$", "^/Project A/Nested Project B$"]
            # projects: ["^default$", "^Project 2$", "^/Project A/Nested Project B$"]
    
            username: '${TABLEAU_USER}'
            password: '${TABLEAU_PASSWD}'
    
            page_size: 10
    
            ingest_tags: True
            ingest_owner: True
            stateful_ingestion:
                enabled: True
                remove_stale_metadata: true
    and my DataHub version: 0.10.0.7, Tableau version: 2022.3.1. I want to figure out why this problem occurred and how to solve it. In fact, my Tableau service went through a version upgrade and was ingesting normally before the upgrade. Thank you very much!
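    One thing that may be worth checking, based on the pattern syntax shown elsewhere in this channel (so treat it as an assumption, not a confirmed fix): project_pattern is an allow/deny filter object rather than a plain list, so the fragment above may need to look more like:
    Copy code
    project_pattern:
        allow:
            - "^default$"
            - "^Project 2$"
            - "^/Project A/Nested Project B$"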
  • powerful-cat-68806
    05/08/2023, 12:29 PM
    Hi team, I’m trying to ingest a Redshift cluster & facing the following error:
    Copy code
    error was '(psycopg2.errors.FeatureNotSupported) Specified types or functions (one per INFO message) not supported on Redshift tables.'
    From some investigation we did, the recommendation is to install the psycopg2 package. Not sure it's relevant here, because I'm only configuring my YAML to ingest the data. I'm able to connect to the cluster from my local prompt. Please advise. cc: @modern-garden-35830 @icy-controller-68116
  • billowy-lock-72499
    05/08/2023, 1:19 PM
    fetch("http://localhost:9002/api/graphql", { method: "POST", mode: 'no-cors', headers: { Accept: "application/json", "Accept-Language": "en-US,en-IN;q=0.9,en;q=0.8", Connection: "keep-alive", "Content-Type": "application/json", Cookie: "bid=2b4f6ed8-9f12-430b-baed-7116c1493c2e; PLAY_SESSION=aad7da57762b3c827347ee24fad95efda4a03127-actor=urn%3Ali%3Acorpuser%3Adatahub&token=token; actor=urnlicorpuser:datahub", DNT: "1", Origin: "http://localhost:9002", Referer: "http://localhost:9002/api/graphiql", "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36", }, body: JSON.stringify({ query: "query getDatasetDetails($urn: String!) {\n dataset(urn: $urn) {\n ...getDatasetDetailsNonRecursiveDatasetFields\n }\n}\n\nfragment getDatasetDetailsNonRecursiveDatasetFields on Dataset {\n \n\n externalRoles {\n ...getExternalRoles\n }\n}\n\nfragment getExternalRoles on ExtRoles {\n extRoles {\n extRole {\n urn\n type\n id\n properties {\n name\n type\n requestLink\n description\n }\n provisionedUsers {\n urn\n\n }\n }\n associatedUrn\n }\n}\n\n", variables: { urn: "urnlidataset:(urnlidataPlatform:oracle,oracle.foobar,PROD)", }, operationName: "getDatasetDetails", }), }) hi i am trying to run fetch command and getting 401 unauthorized
  • gentle-camera-33498
    05/08/2023, 3:03 PM
    Hello everyone, I have a custom build process for the GMS and Frontend images. I don't use the same release tags as the official DataHub project. Because of that, my GMS waits forever for the system update job (I'm using the community system update job image, which has a different deployment tag). Besides, even after setting the 'BOOTSTRAP_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE' environment variable to 'false', the GMS application still waits for the system update job. Could someone please tell me how to fix this problem without rebuilding the system update image on my side?
  • late-smartphone-6255
    05/08/2023, 4:27 PM
    Hi team, I am trying to deploy DataHub on GCP Kubernetes by running helm install datahub datahub/datahub, but I get this error. Could anyone help?
    Copy code
    UPGRADE FAILED: pre-upgrade hooks failed: 1 error occurred:
    	* timed out waiting for the condition
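    That pre-upgrade hook is typically the datahub-system-update job timing out. A sketch of the usual diagnostics (job and pod names are illustrative and depend on the release name/namespace); raising helm's default 5-minute wait can also help on small clusters:
    Copy code
    kubectl get pods
    kubectl get jobs
    kubectl logs job/datahub-system-update-job
    kubectl describe pod <stuck-pod>

    # retry with a longer timeout
    helm upgrade --install datahub datahub/datahub --timeout 10m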
  • handsome-football-66174
    05/08/2023, 6:07 PM
    Hi team, facing an issue with publishing lineage. When adding multiple lineage entries, the first set of lineage is not getting published (the first input and output are not getting published):
    Copy code
    - input-datasets:
        - s3://enterprise/Atemp
        output-datasets:
        - s3://enterprise/A
      - input-datasets:
        - s3://enterprise/A
        output-datasets:
        - s3://enterprise/B
      - input-datasets:
        - s3://enterprise/B
        output-datasets:
        - hdfs://enterprise/C
      - input-datasets:
        - hdfs://enterprise/C
        output-datasets:
        - s3://enterprise/D
      - input-datasets:
        - s3://enterprise/D
        output-datasets:
        - glue://db.table1
    GMS Logs:
    Copy code
    2023-05-08 19:04:06,204 [qtp944427387-20] INFO  c.l.m.r.entity.AspectResource:166 - INGEST PROPOSAL proposal: {aspectName=upstreamLineage, entityUrn=urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newA,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=181,bytes=7b227570...227d5d7d)}, changeType=UPSERT}
    [enterprise, newA]
    2023-05-08 19:04:06,315 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.h.s.SiblingAssociationHook:104 - Urn urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newA,PROD) received by Sibling Hook.
    2023-05-08 19:04:06,346 [qtp944427387-97] INFO  c.l.m.r.platform.PlatformResource:61 - Emitting platform event. name: entityChangeEvent, key: entityChangeEvent-urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newA,PROD)
    2023-05-08 19:04:06,422 [pool-13-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 218ms
    2023-05-08 19:04:06,427 [pool-13-thread-2] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /platform?action=producePlatformEvent - producePlatformEvent - 200 - 81ms
    2023-05-08 19:04:06,429 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.h.s.SiblingAssociationHook:104 - Urn urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newA,PROD) received by Sibling Hook.
    2023-05-08 19:04:06,449 [qtp944427387-18] INFO  c.l.m.r.entity.AspectResource:166 - INGEST PROPOSAL proposal: {aspectName=upstreamLineage, entityUrn=urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newB,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=178,bytes=7b227570...227d5d7d)}, changeType=UPSERT}
    2023-05-08 19:04:06,452 [ThreadPoolTaskExecutor-1] INFO  c.d.event.PlatformEventProcessor:47 - Consuming a Platform Event
    2023-05-08 19:04:06,495 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.h.s.SiblingAssociationHook:104 - Urn urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newB,PROD) received by Sibling Hook.
    [enterprise, newB]
    2023-05-08 19:04:06,566 [qtp944427387-140] INFO  c.l.m.r.platform.PlatformResource:61 - Emitting platform event. name: entityChangeEvent, key: entityChangeEvent-urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newB,PROD)
    2023-05-08 19:04:06,568 [pool-13-thread-3] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /platform?action=producePlatformEvent - producePlatformEvent - 200 - 2ms
    2023-05-08 19:04:06,569 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.h.s.SiblingAssociationHook:104 - Urn urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newB,PROD) received by Sibling Hook.
    2023-05-08 19:04:06,578 [ThreadPoolTaskExecutor-1] INFO  c.d.event.PlatformEventProcessor:47 - Consuming a Platform Event
    2023-05-08 19:04:06,617 [pool-13-thread-4] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 168ms
    2023-05-08 19:04:07,001 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 20 Took time ms: -1
    2023-05-08 19:04:32,386 [qtp944427387-18] INFO  c.l.m.r.entity.AspectResource:166 - INGEST PROPOSAL proposal: {aspectName=upstreamLineage, entityUrn=urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newA,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=181,bytes=7b227570...227d5d7d)}, changeType=UPSERT}
    2023-05-08 19:04:32,420 [pool-13-thread-5] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 34ms
    2023-05-08 19:04:32,421 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.h.s.SiblingAssociationHook:104 - Urn urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newA,PROD) received by Sibling Hook.
    2023-05-08 19:04:32,443 [qtp944427387-97] INFO  c.l.m.r.entity.AspectResource:166 - INGEST PROPOSAL proposal: {aspectName=upstreamLineage, entityUrn=urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newB,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=178,bytes=7b227570...227d5d7d)}, changeType=UPSERT}
    2023-05-08 19:04:32,479 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.h.s.SiblingAssociationHook:104 - Urn urn:li:dataset:(urn:li:dataPlatform:s3,enterprise/newB,PROD) received by Sibling Hook.
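    One detail that stands out in the lineage YAML above (just an observation, not a confirmed root cause): the first input-datasets entry is indented differently from the rest, so a YAML parser will not read all five entries as items of the same list, which would match the symptom of only the first pair being dropped. Written with every entry at the same level it would look roughly like:
    Copy code
    - input-datasets:
        - s3://enterprise/Atemp
      output-datasets:
        - s3://enterprise/A
    - input-datasets:
        - s3://enterprise/A
      output-datasets:
        - s3://enterprise/B
    # ...remaining entries at the same level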
  • mysterious-scooter-52411
    05/08/2023, 8:56 PM
    Let's say I (A) have to provide DataHub's locally built docker images (local development) to someone (B); what would be the best strategy for that? If I put all the individual images into a tar, what exact commands would we need to run on B's machine to deploy these images? Also, which docker compose YAML files do I need to ship along with them?
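    A rough sketch using standard Docker commands (the image names and compose file names are illustrative; list whatever your local build produced and ship the compose files you actually use from the repo's docker/ folder):
    Copy code
    # On machine A: bundle the locally built images into one archive
    docker save \
      acryldata/datahub-gms:debug \
      acryldata/datahub-frontend-react:debug \
      -o datahub-local-images.tar

    # On machine B: load the archive, then start the stack with the same compose files
    docker load -i datahub-local-images.tar
    docker compose -f docker-compose.yml up -d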
  • eager-river-28849
    05/09/2023, 6:16 AM
    Hi team, good morning. My name is Milan and I am currently exploring the Actions framework of DataHub. I have a couple of doubts regarding this. 1. The Entity Change Event v1 provides the facility for detecting changes in the schema of datasets in DataHub, right? Is this limited only to add/remove of schema fields (i.e. any new column inserted/removed will be captured), or can it also be extended to other kinds of modification of the fields? 2. We are also planning to upload the stats for each of our datasets. Is there a way to have customised Slack alerts for this, i.e. alerts sent when there is some deviation in the stats or at a certain threshold? It would be very helpful if you could furnish me with these two details.
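    On (2), the Actions framework can subscribe to EntityChangeEvent_v1 and hand the event to whatever alerting you like; a minimal sketch in the documented hello_world shape (the action type at the bottom would be replaced by your own custom action that checks the stats and posts to Slack):
    Copy code
    # stats_alert.yaml - run with: datahub actions -c stats_alert.yaml
    name: "stats_alert"
    source:
      type: "kafka"
      config:
        connection:
          bootstrap: ${KAFKA_BOOTSTRAP_SERVER:-localhost:9092}
          schema_registry_url: ${SCHEMA_REGISTRY_URL:-http://localhost:8081}
    filter:
      event_type: "EntityChangeEvent_v1"
    action:
      type: "hello_world"   # placeholder; swap in a custom action with your threshold logic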
  • rich-pager-68736
    05/09/2023, 7:17 AM
    Hi there, I am currently exploring the Tableau ingestion using the newly supported nested project structure (datahub v0.10.2 using CLI v0.10.2.2). I think I found an issue when trying to ingest a child project only, without its parent; see the attached screenshot. It seems like it is not correctly displaying the parent project in the breadcrumbs. Also, the link of this 'urn' leads to a 'not found' page. In my recipe, the relevant part looks like this:
    Copy code
    project_pattern:
                allow:
                    - '^Common Analytics Domain/Production$'
    I did not configure the field extract_project_hierarchy, so that's supposed to be true by default.
  • wide-afternoon-79955
    05/09/2023, 10:26 AM
    Hi everyone, I have 2 queries (DataHub & client version 10.0.1): 1. Has anyone been able to automatically update lineage between Kafka topics and their upstream & downstream datasets? a. If yes, did you manage to do it without Kafka connectors? 2. I am also trying to ingest Kafka connectors, and the only data it ingests is the connector name; it does not even import the Kafka connector properties. a. The message I get is "Detected undefined connector <Kafka Connector Name>, which is not in the customized connector list". The DataHub documentation does not explain how I can make the connector ingest the properties. b. The connectors I am trying to import are Snowflake-Sink and generic connectors. Thank you in advance.
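    On (2), the "not in the customized connector list" warning suggests the connector class is not one the source maps automatically. There is a generic_connectors section for such cases, but the key names below are written from memory and should be double-checked against the kafka-connect source docs before relying on them:
    Copy code
    source:
      type: kafka-connect
      config:
        connect_uri: "http://localhost:8083"        # placeholder
        generic_connectors:
          - connector_name: my-snowflake-sink       # name as registered in Kafka Connect
            source_dataset: my_source_table         # dataset the connector reads from
            source_platform: postgres               # platform of that dataset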
  • adorable-lawyer-88494
    05/09/2023, 12:32 PM
    Hi everyone, I have one query. While creating the docker image for GMS, after running this command
    Copy code
    ./gradlew metadata-service:war:docker
    I am getting an error like:
    Copy code
    FAILURE: Build completed with 2 failures.
    
    1: Task failed with an exception.
    -----------
    * What went wrong:
    Execution failed for task ':metadata-models:processResources'.
    > Cannot convert URL 'entity-registry.yml:Zone.Identifier' to a file.
    
    * Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    ==============================================================================
    
    2: Task failed with an exception.
    -----------
    * What went wrong:
    Execution failed for task ':metadata-service:factories:processResources'.
    Can anyone help me out with this? Thanks.
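    The ":Zone.Identifier" in that path is the Windows "downloaded file" marker; when a checkout is copied into WSL those markers show up as extra files (e.g. entity-registry.yml:Zone.Identifier) that Gradle then fails on. A cleanup sketch, assuming a WSL checkout:
    Copy code
    # List the stray marker files first
    find . -name "*:Zone.Identifier" -type f

    # Remove them, then re-run the build
    find . -name "*:Zone.Identifier" -type f -delete
    ./gradlew metadata-service:war:docker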
  • purple-forest-88570
    05/09/2023, 4:29 PM
    Hi everyone, I am struggling with search performance and found something weird. When I call the GraphQL API searchAcrossEntities with input parameters "start: 1200, count: 100" just one time, the search request is sent 91 times from GMS to Elasticsearch. I expected the number of search requests to be 13, not 91, because the batch size is 100 and the total number of search results is 1230. Am I missing any configuration needed for the right behaviour? The tested DataHub version was the latest, v0.10.2. GMS log:
    Copy code
    2023-05-09 15:13:26,155 [ForkJoinPool.commonPool-worker-201] DEBUG c.l.metadata.search.SearchService - Searching Search documents entities: [dataset], input: 2nd, postFilters: null, sortCriterion: null, from: 1200, size: 100
    2023-05-09 15:13:26,156 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
    2023-05-09 15:13:27,191 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
    2023-05-09 15:13:28,225 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 100, size: 100
    2023-05-09 15:13:29,010 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
     2023-05-09 15:13:29,764 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 100, size: 100
    2023-05-09 15:13:30,549 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 200, size: 100
    2023-05-09 15:13:31,417 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
    2023-05-09 15:13:32,378 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 100, size: 100
    2023-05-09 15:13:33,462 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 200, size: 100
    2023-05-09 15:13:34,105 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 300, size: 100
    ---------------<snip>---------------
    2023-05-09 15:13:58,505 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 1100, size: 100
    2023-05-09 15:13:58,586 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
    2023-05-09 15:13:59,617 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 100, size: 100
    2023-05-09 15:14:00,402 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 200, size: 100
    2023-05-09 15:14:01,266 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 300, size: 100
    2023-05-09 15:14:01,430 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 400, size: 100
    2023-05-09 15:14:01,515 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 500, size: 100
    2023-05-09 15:14:01,585 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 600, size: 100
    2023-05-09 15:14:01,664 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 700, size: 100
    2023-05-09 15:14:01,736 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 800, size: 100
    2023-05-09 15:14:01,814 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 900, size: 100
    2023-05-09 15:14:01,883 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 1000, size: 100
    2023-05-09 15:14:01,973 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 1100, size: 100
    2023-05-09 15:14:02,056 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 1200, size: 100
    GraphQL parameter
    Copy code
    {
      "input": {
        "types": [
          "DATASET"
        ],
        "query": "2nd",
        "start": 1200,
        "count": 100,
        "orFilters": []
      }
    }
    GraphQL query
    Copy code
    query getSearch($input:SearchAcrossEntitiesInput!){
      searchAcrossEntities(input:$input){
        total
        count
        searchResults{
          entity{
            urn
            type
            ... on Dataset{
              urn
              name
            }
          }
          matchedFields {
            name
          }
        }
      }
    }
  • bland-orange-13353
    05/09/2023, 6:55 PM
    This message was deleted.
  • rich-state-73859
    05/09/2023, 7:37 PM
    Hi all, I got this error in the frontend when setting PAC4J_SESSIONSTORE_PROVIDER=PlayCacheSessionStore. To reproduce, add PAC4J_SESSIONSTORE_PROVIDER=PlayCacheSessionStore to docker-compose.yaml and run quickstart.
    Copy code
    Oops, cannot start the server.
    com.google.inject.CreationException: Unable to create injector, see the following errors:
    
    1) No implementation for play.cache.SyncCacheApi was bound.
      at auth.AuthModule.configure(AuthModule.java:81) (via modules: com.google.inject.util.Modules$OverrideModule -> auth.AuthModule)
    
    1 error
            at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:554)
            at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:161)
            at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:108)
            at com.google.inject.Guice.createInjector(Guice.java:87)
            at com.google.inject.Guice.createInjector(Guice.java:78)
            at play.api.inject.guice.GuiceBuilder.injector(GuiceInjectorBuilder.scala:200)
            at play.inject.guice.GuiceBuilder.injector(GuiceBuilder.java:211)
            at play.inject.guice.GuiceApplicationBuilder.build(GuiceApplicationBuilder.java:121)
            at play.inject.guice.GuiceApplicationLoader.load(GuiceApplicationLoader.java:32)
            at play.api.ApplicationLoader$JavaApplicationLoaderAdapter$1.load(ApplicationLoader.scala:181)
            at play.core.server.ProdServerStart$.start(ProdServerStart.scala:53)
            at play.core.server.ProdServerStart$.main(ProdServerStart.scala:29)
            at play.core.server.ProdServerStart.main(ProdServerStart.scala)
  • rapid-crowd-46218
    05/10/2023, 6:48 AM
    Hi, I know DataHub supports Elasticsearch version 7.17.3. Is it possible to downgrade Elasticsearch from 7.17.3 to 7.17.0? I tried to edit the prerequisites values file and deployed, but this error occurred. How can I downgrade the Elasticsearch version?
    Copy code
    "stacktrace": ["org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: cannot downgrade a node from version [7.17.3] to version [7.17.0]",
    "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "Caused by: java.lang.IllegalStateException: cannot downgrade a node from version [7.17.3] to version [7.17.0]",
    "at org.elasticsearch.env.NodeMetadata.upgradeToCurrentVersion(NodeMetadata.java:95) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.env.NodeEnvironment.loadNodeMetadata(NodeEnvironment.java:484) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:356) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.node.Node.<init>(Node.java:429) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.node.Node.<init>(Node.java:309) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434) ~[elasticsearch-7.17.0.jar:7.17.0]",
    "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166) ~[elasticsearch-7.17.0.jar:7.17.0]",
    uncaught exception in thread [main]
    "... 6 more"] }
    java.lang.IllegalStateException: cannot downgrade a node from version [7.17.3] to version [7.17.0]
            at org.elasticsearch.env.NodeMetadata.upgradeToCurrentVersion(NodeMetadata.java:95)
            at org.elasticsearch.env.NodeEnvironment.loadNodeMetadata(NodeEnvironment.java:484)
            at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:356)
            at org.elasticsearch.node.Node.<init>(Node.java:429)
            at org.elasticsearch.node.Node.<init>(Node.java:309)
            at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234)
            at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234)
            at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434)
            at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166)
            at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157)
            at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77)
            at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112)
            at org.elasticsearch.cli.Command.main(Command.java:77)
            at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122)
            at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80)
    For complete error details, refer to the log at /usr/share/elasticsearch/logs/elasticsearch.log
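    As the exception says, a data directory written by 7.17.3 cannot be reused by a 7.17.0 node, so a downgrade only works against a fresh data volume. In a Helm deployment that roughly means removing the Elasticsearch StatefulSet and its PVC before redeploying with the older image (this wipes the search indexes, which DataHub can rebuild with its restore-indices job). Resource names below are guesses based on the default prerequisites chart:
    Copy code
    kubectl get pvc
    kubectl delete statefulset elasticsearch-master
    kubectl delete pvc elasticsearch-master-elasticsearch-master-0   # illustrative name

    # redeploy prerequisites pinned to 7.17.0, then re-run DataHub's restore-indices job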
  • strong-parrot-78481
    05/10/2023, 12:51 PM
    @here I am getting this error when adding project_pattern to the Tableau recipe as "project_pattern": ["^datahub$"]:
    Copy code
    File "/usr/local/google/home/mardanov/.local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 121, in _add_init_error_context
        raise PipelineInitError(f"Failed to {step}: {e}") from e
    datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure the source (tableau): 1 validation error for TableauConfig
    project_pattern
      value is not a valid dict (type=type_error.dict)
    It works with the projects attribute, but that is reported as deprecated.
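    The validation message is saying that project_pattern must be a dict (an allow/deny pattern object), not a bare list. A sketch of the shape used by the recipe posted earlier in this channel:
    Copy code
    source:
        type: tableau
        config:
            project_pattern:
                allow:
                    - "^datahub$"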
  • powerful-shampoo-81990
    05/10/2023, 10:35 PM
    Hi DataHub folks, I'm new to DataHub. Is it possible to bring in the relationships between database tables for sources such as Oracle, PostgreSQL, and SQL Server?
  • astonishing-dusk-99990
    05/11/2023, 4:21 AM
    Hi all, has anyone experienced CrashLoopBackOff on the prerequisites-kafka pod? I recently installed DataHub using the Helm deployment on Kubernetes, and after 3 days the pod named prerequisites-kafka got status CrashLoopBackOff; it keeps restarting but keeps failing. Does anyone know how to fix it? Currently I just redeploy, but I don't think that is the best solution. Here I attach some screenshots from the Kubernetes UI, OpenLens and kubectl. I'm using version v0.10.0. Also, I'm using the default Kafka installation, not a customized one.
  • silly-ability-65278
    05/11/2023, 8:17 AM
    Hi, I need help: when I try quickstart, my datahub-frontend-react container can't start. This is the message when I follow the log:
    Copy code
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7.2
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7.2
  • helpful-doctor-67337
    05/11/2023, 8:56 AM
    Hello, I wanted to integrate a DataHub system with Databricks and used the GUI for the connection. But sadly the ingestion fails with this error:
    Copy code
    [2023-05-10 14:02:14,659] ERROR    {datahub.entrypoints:213} - Command failed: Failed to configure the source (unity-catalog): type object 'Retry' has no attribute 'DEFAULT_METHOD_WHITELIST'
    Traceback (most recent call last):
      File "/tmp/datahub/ingest/venv-unity-catalog-0.9.6/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 114, in _add_init_error_context
        yield
      File "/tmp/datahub/ingest/venv-unity-catalog-0.9.6/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 192, in __init__
        self.source = source_class.create(
      File "/tmp/datahub/ingest/venv-unity-catalog-0.9.6/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 158, in create
        return cls(ctx=ctx, config=config)
      File "/tmp/datahub/ingest/venv-unity-catalog-0.9.6/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 110, in __init__
        self.unity_catalog_api_proxy = proxy.UnityCatalogApiProxy(
      File "/tmp/datahub/ingest/venv-unity-catalog-0.9.6/lib/python3.10/site-packages/datahub/ingestion/source/unity/proxy.py", line 125, in __init__
        ApiClient(
      File "/tmp/datahub/ingest/venv-unity-catalog-0.9.6/lib/python3.10/site-packages/databricks_cli/sdk/api_client.py", line 106, in __init__
        method_whitelist=set({'POST'}) | set(Retry.DEFAULT_METHOD_WHITELIST),
    AttributeError: type object 'Retry' has no attribute 'DEFAULT_METHOD_WHITELIST'
    A Google search revealed that urllib3 had an update which changed some things: https://stackoverflow.com/questions/76183443/azure-devops-release-pipeline-attributeerror-type-object-retry-has-no-attribu So I guess that pinning the urllib3 requirement to a 1.x version could solve this.
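    If pinning is the route you take, a sketch of forcing a 1.x urllib3 into the ingestion venv that the UI executor created for this source (the venv path is the one from the traceback above), or alternatively upgrading the plugin so a fixed databricks-cli gets pulled in:
    Copy code
    /tmp/datahub/ingest/venv-unity-catalog-0.9.6/bin/pip install 'urllib3<2'

    # or, for CLI-based ingestion, refresh the plugin
    pip install --upgrade 'acryl-datahub[unity-catalog]'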
  • mysterious-table-75773
    05/11/2023, 9:03 AM
    Is there a way to run DataHub without datahub-actions? It contains tens of critical vulnerabilities.
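    If you are on the Helm chart, the actions deployment can be switched off in your values file; the key below is how the subchart is named in the chart's default values.yaml, but double-check it against your chart version:
    Copy code
    acryl-datahub-actions:
      enabled: false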
  • ancient-queen-15575
    05/11/2023, 2:34 PM
    I'm having issues deleting entities with the CLI, either with a specific URN or with a filter. Can anyone help? If I use a URN like:
    Copy code
    datahub delete --urn "urn:li:dataPlatform:awsdms_apply_exceptions"
    datahub delete --urn "urn:li:dataPlatform:mongodb"
    I get an error about json decoding:
    Copy code
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    If I try with a filter like
    Copy code
    datahub delete --entity_type dataset --platform mongodb
    I get an error about the client not being authorised
    Copy code
    requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://3.72.60.6:8080/entities?action=search
    I am using an API key and my user is an Admin in Datahub.
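    Two things worth checking, sketched below: those URNs are platform URNs rather than dataset URNs (dataset URNs look like urn:li:dataset:(urn:li:dataPlatform:mongodb,<name>,PROD)), and the 401 suggests the CLI is not sending your token, which it picks up from datahub init or from environment variables. The dataset name in the last command is a placeholder:
    Copy code
    # Point the CLI at GMS with the access token (or run `datahub init` interactively)
    export DATAHUB_GMS_URL="http://3.72.60.6:8080"
    export DATAHUB_GMS_TOKEN="<your-access-token>"

    # Delete everything for a platform via the filter form
    datahub delete --entity_type dataset --platform mongodb

    # Or delete one dataset by its full dataset URN
    datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:mongodb,mydb.mycollection,PROD)"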
  • average-nail-72662
    05/11/2023, 4:24 PM
    Hello everyone! I'm trying to run the DataHub containers, but I'm getting an error that the broker is unhealthy. Can anybody help me?
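    A sketch of the usual quickstart triage for an unhealthy broker container (container names are the quickstart defaults; the last two commands wipe all local DataHub state):
    Copy code
    docker logs broker --tail 100        # why the Kafka broker is failing its health check
    docker logs zookeeper --tail 100     # broker health usually depends on zookeeper
    docker system df                     # low disk space is a frequent cause

    # if the local state is corrupted, reset the quickstart (destroys local data)
    datahub docker nuke
    datahub docker quickstart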