# troubleshoot
  • g

    gray-gold-85760

    03/04/2024, 12:21 PM
    Is search a primary offering of Acryl / DataHub, or is it primarily meant for some other use? Because IMO the search API could improve a lot.
  • a

    abundant-garage-20831

    03/04/2024, 1:11 PM
    Hi all, I'm trying to create multiple lineages using Python's datahub.emitter.mce_builder as builder. I have the following structure: multiple MySQL sources ---> multiple raw sources/targets ---> one refined target. I'm running a loop for each set, but it is updating (overwriting) instead of appending for my refined target. Can someone please help me with this?
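    If each loop iteration emits a lineage MCE with only one upstream, the likely cause is that the upstreamLineage aspect is replaced on every emit rather than merged. A minimal sketch (placeholder URNs and a REST emitter at localhost:8080 assumed) that collects all upstreams for the refined target and emits them in a single call:

    ```python
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS address

    refined_target = builder.make_dataset_urn("mysql", "refined_db.refined_table")  # placeholder
    raw_sources = [
        builder.make_dataset_urn("mysql", "raw_db.table_a"),  # placeholder
        builder.make_dataset_urn("mysql", "raw_db.table_b"),  # placeholder
    ]

    # Emit once with the full upstream list: emitting one pair at a time
    # overwrites the previously written upstreamLineage aspect instead of appending.
    emitter.emit_mce(builder.make_lineage_mce(raw_sources, refined_target))
    ```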
  • m

    mammoth-apple-56011

    03/04/2024, 7:56 PM
    Hello everybody. After a redeploy of DataHub 0.12.0, it suddenly lost the ability to show the "Ingestion" page. I redeployed several times again, but the problem persists. What can I do to solve this?
  • i

    icy-airplane-5350

    03/04/2024, 7:59 PM
    Hi, I'm trying to integrate SSO with Ping and running into a similar error to https://datahubspace.slack.com/archives/C029A3M079U/p1709066394210229. I'm deploying onto a GKE cluster and have checked that my datahub-frontend pod has the right configs and secrets (kubectl exec -it frontend pod name -- env). I'm getting this error in the logs as well.
  • a

    astonishing-kite-41577

    03/04/2024, 10:44 PM
    Hey team, just noticed an error with the latest pip package (acryl-datahub-0.13.0). I'm getting the following error when trying to add an owner with the Python emitter: ('Unable to emit metadata to DataHub GMS: Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\\n', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'Failed to validate record with class com.linkedin.common.Ownership: ERROR :: /ownerTypes :: unrecognized field found but not allowed\\n', 'status': 422}) I found a link to a similar post from about a year ago that I thought might be related: https://datahubspace.slack.com/archives/C029A3M079U/p1662968709729969 I already resolved it for myself by switching back to my previous installation (acryl-datahub-0.12.1.5), but just thought I'd notify the channel.
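    For comparison, a minimal ownership emit that validates against a 0.12.x server looks roughly like the sketch below (placeholder URNs, REST emitter assumed). The /ownerTypes field is a newer addition to the Ownership aspect, so a 0.13.0 client talking to an older GMS could plausibly trigger exactly this validation error.

    ```python
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import OwnerClass, OwnershipClass, OwnershipTypeClass

    emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS address
    dataset_urn = builder.make_dataset_urn("mysql", "db.table")  # placeholder

    ownership = OwnershipClass(
        owners=[
            OwnerClass(
                owner=builder.make_user_urn("jdoe"),  # placeholder user
                type=OwnershipTypeClass.TECHNICAL_OWNER,
            )
        ]
    )

    emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=ownership))
    ```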
  • a

    abundant-minister-71466

    03/05/2024, 6:07 AM
    Hi folks! I'm trying to add a nested schema with the DataHub 0.10.3 OpenAPI, but the nested fields aren't being added. How should I change my payload? Everything except the nested field works fine. Here is my payload:
    payload = [
            {
                "entityType": "dataset",
                "entityUrn": f"urn:li:dataset:(urn:li:dataPlatform:{platform_name},{dataset_name},PROD)",
                "aspect": {
                    "__type": "SchemaMetadata", 
                    "schemaName": f"{dataset_name}",
                    "platform": f"urn:li:dataPlatform:{platform_name}",
                    "platformSchema": {
                        "__type": "OtherSchema",
                        "rawSchema": ""
                    },
                    "version": 0,
                    "hash": "", 
                    "fields": [ 
                        {
                            "fieldPath": "nested_field", 
                            "nativeDataType": "",
                            "recursive": True, 
                            "type": {
                                "type": {
                                    "__type": "RecordType",
                                    "fields": [
                                        {
                                            "fieldPath": "nested_field_2", 
                                            "nativeDataType": "", 
                                            "recursive": True, 
                                            "type": {
                                                "__type": "StringType"
                                            }
                                        }
                                    ]
                                }
                            },
                        }
                    ]
                }
            }
        ]
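    One thing worth checking: SchemaMetadata usually models nesting as a flat fields list with dotted fieldPaths rather than fields embedded inside the parent's RecordType. A sketch of the fields block in that style (same field names as above; worth validating against the OpenAPI schema for your version):

    ```python
    "fields": [
        {
            "fieldPath": "nested_field",
            "nativeDataType": "record",
            "recursive": False,
            "type": {"type": {"__type": "RecordType"}},
        },
        {
            # nesting is expressed in the path, not inside the parent's type
            "fieldPath": "nested_field.nested_field_2",
            "nativeDataType": "string",
            "recursive": False,
            "type": {"type": {"__type": "StringType"}},
        },
    ]
    ```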
  • b

    bland-orange-13353

    03/05/2024, 6:10 AM
    This message was deleted.
  • n

    numerous-judge-13576

    03/05/2024, 9:14 AM
    Hello team, I don't know if this is a troubleshoot or a feature request, but I guess searching for a better workaround makes it a troubleshoot. The background is that our Astro image build fails due to a conflict on the pydantic requirements. The current version of acryl-datahub[dbt] (0.13.0) requires pydantic<2:
    acryl-datahub[dbt] 0.13.0 depends on pydantic<2; extra == "dbt"
    apache-airflow 2.8.1+astro.2 depends on pydantic>=2.3.0
    I use the following requirements within Airflow, using lineage inlets/outlets for Airflow lineage and linking up data lake files with dbt tables via the REST emitter:
    acryl-datahub-airflow-plugin[plugin-v2]==0.13.0
    acryl-datahub[dbt]==0.13.0
    (The current workaround is to stay on Astro 9 / Airflow 2.7.3.) Is there a good reason why the <2 pin is there for the dbt integration, and can I do anything to alleviate it?
  • a

    adventurous-dawn-19232

    03/05/2024, 9:15 AM
    Hello everyone, I am trying to run DataHub without Docker and I am getting a build failure. Can anyone help me with this?
    sivasankar@w1:~$ git clone --depth 1 https://github.com/datahub-project/datahub.git
    Cloning into 'datahub'...
    remote: Enumerating objects: 8965, done.
    remote: Counting objects: 100% (8965/8965), done.
    remote: Compressing objects: 100% (6935/6935), done.
    remote: Total 8965 (delta 1709), reused 5870 (delta 1241), pack-reused 0
    Receiving objects: 100% (8965/8965), 25.62 MiB | 925.00 KiB/s, done.
    Resolving deltas: 100% (1709/1709), done.
    sivasankar@w1:~$ cd datahub
    sivasankar@w1:~/datahub$ ls
    build.gradle  datahub-graphql-core  docs  gradlew  lombok.config  metadata-ingestion-modules  metadata-models-custom  mock-entity-registry  SECURITY.md  buildSrc  datahub-kubernetes  docs-website  gradlew.bat  metadata-auth  metadata-integration  metadata-models-validator  NOTICE  settings.gradle  CODEOWNERS  datahub-upgrade  entity-registry  ingestion-scheduler  metadata-dao-impl  metadata-io  metadata-operation-context  perf-test  smoke-test  contrib  datahub-web-react  gradle  LICENSE  metadata-events  metadata-jobs  metadata-service  README.md  test-models  datahub-frontend  docker  gradle.properties  li-utils  metadata-ingestion  metadata-models  metadata-utils  repositories.gradle  vercel.json
    sivasankar@w1:~/datahub$ ./gradlew build
    To honour the JVM settings for this build a single-use Daemon process will be forked. See https://docs.gradle.org/8.0.2/userguide/gradle_daemon.html#sec:disabling_the_daemon.
    Daemon will be stopped at the end of the build
    Configuration on demand is an incubating feature.
    > Task :buildSrc:compileJava
    /home/sivasankar/datahub/buildSrc/src/main/java/io/datahubproject/OpenApiEntities.java:294: error: cannot find symbol
        .flatMap(entity -> generateEntityParameters(entity, definitions).stream())
        ^
      symbol:   method stream()
      location: class Optional<Pair<String,ObjectNode>>
    /home/sivasankar/datahub/buildSrc/src/main/java/io/datahubproject/OpenApiEntities.java:296: error: cannot find symbol
        parametersNode.setAll(entityNode.right());
        ^
      symbol:   method right()
      location: variable entityNode of type Object
    /home/sivasankar/datahub/buildSrc/src/main/java/io/datahubproject/OpenApiEntities.java:297: error: cannot find symbol
        return entityNode.left();
        ^
      symbol:   method left()
      location: variable entityNode of type Object
    /home/sivasankar/datahub/buildSrc/src/main/java/io/datahubproject/OpenApiEntities.java:299: error: incompatible types: inference variable T has incompatible bounds
        .collect(Collectors.toSet());
        ^
      equality constraints: String
      lower bounds: Object
      where T is a type-variable:
        T extends Object declared in method <T>toSet()
    4 errors
    > Task :buildSrc:compileJava FAILED
    FAILURE: Build failed with an exception.
    * What went wrong:
    Execution failed for task ':buildSrc:compileJava'.
    > Compilation failed; see the compiler error output for details.
    * Try:
    > Run with --stacktrace option to get the stack trace.
    > Run with --info or --debug option to get more log output.
    > Run with --scan to get full insights.
    * Get more help at https://help.gradle.org
    BUILD FAILED in 9s
    1 actionable task: 1 executed
    I also opened datahub in Visual Studio and get this issue:
    [{ "resource": "/home/sivasankar/datahub", "owner": "_generated_diagnostic_collection_name_#3", "code": "0", "severity": 8, "message": "Could not run phased build action using connection to Gradle distribution 'https://services.gradle.org/distributions/gradle-8.0.2-bin.zip'.\norg.gradle.internal.exceptions.LocationAwareException: Execution failed for task ':buildSrc:compileJava'.\nExecution failed for task ':buildSrc:compileJava'.\nCompilation failed; see the compiler error output for details.", "source": "Java", "startLineNumber": 1, "startColumn": 1, "endLineNumber": 1, "endColumn": 1 }]
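    One thing to rule out first: Optional.stream() has existed since Java 9, so "cannot find symbol ... method stream() ... class Optional" usually means the build is running on an old JDK (e.g. Java 8). Recent DataHub builds expect JDK 17, so it's worth confirming which JDK Gradle picks up (the JAVA_HOME path below is just an example):

    ```bash
    # Confirm which JDK Gradle is actually using
    java -version
    ./gradlew -version

    # Point the build at a JDK 17 installation (example path)
    export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
    export PATH="$JAVA_HOME/bin:$PATH"

    ./gradlew build
    ```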
  • l

    little-scooter-91144

    03/05/2024, 9:23 AM
    Hello everyone. May I ask which Elasticsearch index holds all the documents for my datasets? I tried to query all documents in the 'datasetindex_v2' index, but only some datasets were found in the result. It seems that not all datasets were returned. This is what I want to find in Elasticsearch. My DataHub version is 0.12.1.
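    Worth noting: datasetindex_v2 is the right index for dataset documents, but a plain _search only returns 10 hits by default, which can look like missing datasets. A quick way to check the real count and page through results (host and port are placeholders):

    ```bash
    # Total number of dataset documents in the index
    curl -s 'http://localhost:9200/datasetindex_v2/_count?pretty'

    # Page through documents explicitly instead of relying on the default size of 10
    curl -s -H 'Content-Type: application/json' \
      'http://localhost:9200/datasetindex_v2/_search?pretty' \
      -d '{"query": {"match_all": {}}, "size": 100, "from": 0}'
    ```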
  • r

    rhythmic-ram-87007

    03/05/2024, 10:20 PM
    Greetings all. 👋 I'm trying to get the Microsoft SQL Server ingestion plugin set up on a Windows-based machine. When doing a pip install, I'm not able to get past the wheel build for the required pyzmq package. It appears that version 17.9.2 of Visual Studio Build Tools 2022 is not able to complete the build and errors out. Has anyone else encountered this issue? I've tried to separately install pyzmq, but that doesn't seem to work either. Any clues? I'm on DataHub version 0.12.0.
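    A common workaround for a failing source build on Windows is to make pip use a prebuilt wheel for pyzmq before installing the DataHub plugin; a sketch (assuming the SQL Server extra is mssql):

    ```bash
    # Install a prebuilt pyzmq wheel instead of compiling it with MSVC
    python -m pip install --upgrade pip
    python -m pip install --only-binary=:all: pyzmq

    # Then install the SQL Server ingestion plugin
    python -m pip install "acryl-datahub[mssql]==0.12.0"
    ```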
  • i

    incalculable-sundown-8765

    03/06/2024, 7:53 AM
    Hi, DataHub version: v0.12.1. I'm trying to implement SSO login using the Google provider. The URL should only be accessible via our VPN. I believe all of my deployment is correct, but I'm receiving the following error when I open the DataHub URL (refer to the images). Upon opening the redirect URI (in the second image: https://datahub.<stuffs>.local/callback/oidc), I was redirected to the DataHub login page instead of the DataHub front page, meaning I still need to log in with a username & password, which is wrong. In the datahub-datahub-frontend pod, I get the following error. Kindly help:
    2024-03-05 18:26:18,208 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka version: 2.3.0
    2024-03-05 18:26:18,208 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka commitId: fc1aaa116b661c8a
    2024-03-05 18:26:18,208 [main] INFO o.a.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1709663178206
    2024-03-05 18:26:18,214 [main] INFO play.api.Play - Application started (Prod) (no global state)
    2024-03-05 18:26:18,483 [kafka-producer-network-thread | datahub-frontend] INFO org.apache.kafka.clients.Metadata - [Producer clientId=datahub-frontend] Cluster ID: IVXu8THVSkKiX6PN7UCIBw
    2024-03-05 18:26:18,509 [main] INFO server.CustomAkkaHttpServer - Setting max header count to: 64
    2024-03-05 18:26:18,919 [main] INFO play.core.server.AkkaHttpServer - Listening for HTTP on /0:0:0:0:0:0:0:0:9002
    2024-03-05 18:27:13,527 [proxyClient-akka.actor.default-dispatcher-7] INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
    2024-03-05 18:57:42,859 [application-akka.actor.default-dispatcher-9] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method from 10.207.193.12:17455. Perhaps this was an HTTPS request sent to an HTTP endpoint?
    2024-03-05 18:57:42,924 [application-akka.actor.default-dispatcher-9] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method from 10.207.192.195:25792. Perhaps this was an HTTPS request sent to an HTTP endpoint?
    2024-03-06 02:02:26,194 [application-akka.actor.default-dispatcher-7] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': CONNECT requests are not supported: Rejecting CONNECT request to 'example.com:80'
    2024-03-06 02:02:26,198 [application-akka.actor.default-dispatcher-7] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: HTTP method too long (started with 'Pexample') from 10.207.214.60:57467. Increase akka.http.server.parsing.max-method-length to support HTTP methods with more characters.
    2024-03-06 02:02:41,113 [application-akka.actor.default-dispatcher-8] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Illegal HTTP message start
    2024-03-06 04:12:47,424 [application-akka.actor.default-dispatcher-9] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method from 10.207.206.240:17502. Perhaps this was an HTTPS request sent to an HTTP endpoint?
    2024-03-06 04:12:47,643 [application-akka.actor.default-dispatcher-9] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method from 10.207.200.153:38444. Perhaps this was an HTTPS request sent to an HTTP endpoint?
    2024-03-06 05:04:15,813 [application-akka.actor.default-dispatcher-12] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method from 10.207.210.102:11699. Perhaps this was an HTTPS request sent to an HTTP endpoint?
    2024-03-06 05:37:37,222 [application-akka.actor.default-dispatcher-11] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method from 10.207.197.70:21229. Perhaps this was an HTTPS request sent to an HTTP endpoint?
    2024-03-06 06:20:58,212 [application-akka.actor.default-dispatcher-5] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method from 10.207.212.162:28342. Perhaps this was an HTTPS request sent to an HTTP endpoint?
    2024-03-06 06:20:58,386 [application-akka.actor.default-dispatcher-9] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method from 10.207.212.22:27133. Perhaps this was an HTTPS request sent to an HTTP endpoint?
    2024-03-06 07:21:48,417 [application-akka.actor.default-dispatcher-9] ERROR controllers.SsoCallbackController - Caught exception while attempting to handle SSO callback! It's likely that SSO integration is mis-configured.
    java.util.concurrent.CompletionException: org.pac4j.core.exception.TechnicalException: State cannot be determined
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
    at play.core.j.HttpExecutionContext.$anonfun$execute$1(HttpExecutionContext.scala:64)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
    Caused by: org.pac4j.core.exception.TechnicalException: State cannot be determined
    at org.pac4j.oidc.credentials.extractor.OidcExtractor.lambda$extract$0(OidcExtractor.java:100)
    at java.base/java.util.Optional.orElseThrow(Optional.java:408)
    at org.pac4j.oidc.credentials.extractor.OidcExtractor.extract(OidcExtractor.java:100)
    at org.pac4j.core.client.BaseClient.retrieveCredentials(BaseClient.java:66)
    at org.pac4j.core.client.IndirectClient.getCredentials(IndirectClient.java:143)
    at org.pac4j.core.engine.DefaultCallbackLogic.perform(DefaultCallbackLogic.java:85)
    at auth.sso.oidc.OidcCallbackLogic.perform(OidcCallbackLogic.java:116)
    at controllers.SsoCallbackController$SsoCallbackLogic.perform(SsoCallbackController.java:112)
    at controllers.SsoCallbackController$SsoCallbackLogic.perform(SsoCallbackController.java:86)
    at org.pac4j.play.CallbackController.lambda$callback$0(CallbackController.java:54)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    ... 8 common frames omitted
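    For what it's worth, the pac4j "State cannot be determined" error usually means the state written to the session cookie before the redirect to the IdP cannot be found when the browser returns to /callback/oidc - often a base-URL or cookie problem behind a proxy/VPN hostname. A sketch of the datahub-frontend settings worth double-checking (variable names per the DataHub OIDC docs; values are placeholders):

    ```bash
    # datahub-frontend environment (placeholder values)
    AUTH_OIDC_ENABLED=true
    AUTH_OIDC_CLIENT_ID=<google-client-id>
    AUTH_OIDC_CLIENT_SECRET=<google-client-secret>
    AUTH_OIDC_DISCOVERY_URI=https://accounts.google.com/.well-known/openid-configuration
    # Must match the scheme + host users actually hit, since the
    # redirect URI is derived from it as <base-url>/callback/oidc
    AUTH_OIDC_BASE_URL=https://datahub.<stuffs>.local
    ```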
  • f

    fresh-river-19527

    03/06/2024, 3:56 PM
    Hi, is there any way to filter out some of the notifications being published to Slack by the Slack action? For example, when I set up the Airflow connector, either with a personal token or with the DataHub user token, we receive a notification in the Slack channel for every metadata event that the Airflow plugin publishes, which can become very spammy when there are lots of DAGs running.
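    If the notifications come from a datahub-actions pipeline, the actions framework supports a filter block in the pipeline config that can restrict which event categories reach the Slack action. A rough sketch (field names should be checked against the actions docs for your version; source and action configs are abbreviated):

    ```yaml
    name: "slack_notifications"
    source:
      type: "kafka"
      config: {}   # connection details omitted
    filter:
      event_type: "EntityChangeEvent_v1"
      event:
        # e.g. only forward tag additions and drop high-volume run events
        category: "TAG"
        operation: "ADD"
    action:
      type: "slack"
      config: {}   # bot token / channel omitted
    ```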
  • d

    dry-insurance-87668

    03/06/2024, 4:00 PM
    Hi team. I have an issue with Looker in DataHub. After ingestion, we notice that some of the dashboards are placed under a "Default" folder. Their actual folders are present, with dashboards in them, but a few dashboards are not in those folders and are placed in the Default folder instead. Does anyone know why this is happening?
  • f

    few-piano-98292

    03/06/2024, 6:07 PM
    Hello, can someone please help with this?
  • g

    gray-gold-85760

    03/07/2024, 10:16 AM
    Can someone share an OR + AND filter example for the search functionality?
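    In case it helps, GraphQL search accepts OR-of-AND filters via orFilters. A sketch of the query shape, run through the Python graph client (server, facet fields, and URNs are examples; check them against your GraphiQL schema):

    ```python
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # placeholder

    SEARCH = """
    query search($input: SearchAcrossEntitiesInput!) {
      searchAcrossEntities(input: $input) {
        total
        searchResults { entity { urn } }
      }
    }
    """

    variables = {
        "input": {
            "types": ["DATASET"],
            "query": "*",
            "start": 0,
            "count": 10,
            # OR of AND-groups:
            # (platform = snowflake AND tag = pii) OR (platform = mysql)
            "orFilters": [
                {"and": [
                    {"field": "platform", "values": ["urn:li:dataPlatform:snowflake"]},
                    {"field": "tags", "values": ["urn:li:tag:pii"]},
                ]},
                {"and": [
                    {"field": "platform", "values": ["urn:li:dataPlatform:mysql"]},
                ]},
            ],
        }
    }

    print(graph.execute_graphql(SEARCH, variables))
    ```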
  • b

    billions-yacht-53533

    03/07/2024, 12:18 PM
    Good morning team. I've been searching and reading the DataHub documentation. So far I've been able to create and emit DataPlatforms, DataFlows, and DataJobs, associate them by URN, and make lineages between them. But I've been trying to find out whether I can associate a container with a DataJob, and I could not find a way to do it. I am using Python 3.10 and acryl-datahub v0.12.0 and v0.13.0, with the Kafka emitter. Maybe I am wrong, but could it be that containers are only available for data assets? Can you help me?
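    Containers are indeed mostly attached to datasets/dashboards/charts by the ingestion sources, but the container aspect itself just points at a container URN, so emitting it against a dataJob URN is at least expressible in the SDK; whether GMS accepts it and the UI renders it for dataJobs on 0.12/0.13 is the open question. A sketch of what the emit would look like (REST emitter used for brevity; URNs are placeholders):

    ```python
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ContainerClass

    emitter = DatahubRestEmitter("http://localhost:8080")  # placeholder GMS address

    datajob_urn = builder.make_data_job_urn(
        orchestrator="airflow", flow_id="my_flow", job_id="my_job"  # placeholders
    )

    # Point the job at an existing container URN (placeholder guid)
    container_aspect = ContainerClass(container="urn:li:container:my-container-guid")

    emitter.emit_mcp(MetadataChangeProposalWrapper(entityUrn=datajob_urn, aspect=container_aspect))
    ```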
  • a

    able-carpenter-97384

    03/07/2024, 3:28 PM
    Hello - we are using DataHub 0.12.1. I need to pull lineage (upstream and downstream) information for a data job and show it on a separate UI (preferably in a tree graph). Any thoughts on how I can pull upstream and downstream lineage for a URN?
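    One way to pull this programmatically is the searchAcrossLineage GraphQL query, run once per direction; a sketch via the Python graph client (server and URN are placeholders, and fields like degree can be used to build the tree levels):

    ```python
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # placeholder

    LINEAGE = """
    query lineage($input: SearchAcrossLineageInput!) {
      searchAcrossLineage(input: $input) {
        total
        searchResults {
          degree
          entity { urn type }
        }
      }
    }
    """

    datajob_urn = "urn:li:dataJob:(urn:li:dataFlow:(airflow,my_flow,prod),my_job)"  # placeholder

    for direction in ("UPSTREAM", "DOWNSTREAM"):
        result = graph.execute_graphql(
            LINEAGE,
            {"input": {"urn": datajob_urn, "direction": direction, "query": "*", "start": 0, "count": 100}},
        )
        print(direction, result["searchAcrossLineage"]["total"])
    ```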
  • d

    dazzling-solstice-43361

    03/07/2024, 3:46 PM
    Greetings, we recently performed an in-place upgrade from 0.12.1 to 0.13.0 - we love the new features like incidents and the detail sidebar for datasets. However, we are noticing that when we push queries (using the createQuery mutation), they are not being saved. There is no obvious error, but the queries don't seem to exist after adding them. I noticed that I can reproduce this in the UI as well: if I navigate to a dataset and add a query via the UI, the query example does get created and displayed, but after refreshing the page or navigating away and coming back, the query is no longer there.
  • a

    abundant-garage-20831

    03/07/2024, 6:24 PM
    Hi everyone, I'm trying to create lineages across different databases using datahub.emitter.rest_emitter. I have a list of upstream and downstream tables, some of which don't exist. Is there a way to check which tables are present in DataHub, like a GET API or something?
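    The graph client can answer this before you emit. A sketch assuming a recent SDK where DataHubGraph.exists() is available (on older versions, fetching any aspect via get_aspect and checking for None works the same way):

    ```python
    import datahub.emitter.mce_builder as builder
    from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

    graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))  # placeholder

    tables = ["raw_db.table_a", "raw_db.table_b"]  # placeholder table names

    for table in tables:
        urn = builder.make_dataset_urn("mysql", table)
        if graph.exists(urn):  # assumes exists() is available in your SDK version
            print(f"{urn} is present in DataHub")
        else:
            print(f"{urn} is missing - skip it or create it before emitting lineage")
    ```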
  • g

    gray-bear-33275

    03/07/2024, 7:26 PM
    Hi, is it possible to configure Datahub SSO to map certain users to certain groups based on attributes from the identity provider (Okta)? Looking at https://datahubproject.io/docs/managed-datahub/integrations/oidc-sso-integration/
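    For the group part specifically, the frontend has documented settings for extracting group membership from a claim in the ID token. A sketch of the relevant env vars (values are placeholders, and the claim name must match what Okta actually sends):

    ```bash
    # datahub-frontend environment
    AUTH_OIDC_EXTRACT_GROUPS_ENABLED=true
    # Name of the claim in the ID token that carries group names
    AUTH_OIDC_GROUPS_CLAIM=groups
    ```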
  • h

    hallowed-cricket-7436

    03/08/2024, 12:59 AM
    Hi, I'm trying to remove a dataset with the command "datahub delete --platform mssql --env PROD --hard", but the dataset structure in the UI looks like the picture below. I tried to execute the delete command again, but it says "Found no urns to delete". I searched for the URN in MySQL but cannot find it. How can I remove it completely? (Version: 0.12.1)
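    If the platform-wide delete reports nothing left, deleting the remaining entities individually by URN sometimes catches what's left behind. Also note that the mssql source creates container entities for the database/schema, which a dataset-scoped delete may not remove and which can leave empty folders in the UI. A sketch (URNs/queries are placeholders; check datahub delete --help on 0.12.1 for the exact flags):

    ```bash
    # Hard-delete a single dataset by URN
    datahub delete --urn "urn:li:dataset:(urn:li:dataPlatform:mssql,MyDatabase.dbo.MyTable,PROD)" --hard

    # Check whether the entity still resolves
    datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:mssql,MyDatabase.dbo.MyTable,PROD)"

    # Remove leftover container (database/schema folder) entities, if any
    datahub delete --entity-type container --query "MyDatabase" --hard
    ```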
  • b

    best-wire-59738

    03/08/2024, 6:34 AM
    Hi team, just wanted to check regarding the policy cache maintained by the DataHubAuthorizer. While performing our Snowflake ingestions using transformers and using PATCH to add tags, we saw the ingestion fail with an unauthorized error. Upon checking, we found it's because the policy cache gets refreshed at the 120s refresh interval; while that happens it is unavailable to authorize requests and returns 401 errors for 5 consecutive calls, after which it works fine again. We see this issue roughly once in every 10k API calls to the /aspect/urn?aspect=<> endpoint. Could you please help us resolve this issue? DataHub version v0.12.0.
  • h

    hundreds-arm-67649

    03/08/2024, 10:17 AM
    Hi! I'm blocked by the lack of an ability to disable Avro name validation for Avro schemas retrieved from the Schema Registry. I added a very small feature request for that: https://feature-requests.datahubproject.io/p/add-disabling-avro-schema-validations-retrieved-from-schema-registry It would be great if you could fix that. Thank you!
  • g

    gifted-coat-97302

    03/08/2024, 1:22 PM
    Hello team, posting here based on the suggestion from the ingestion channel - would appreciate it if someone could help, please. We are getting document_missing_exception on bulk upserts to Elasticsearch during ingestion. FYI, this is the first ingestion we have attempted. Additional details (missed in the original post): Elasticsearch: AWS OpenSearch Service with Elasticsearch 7.10 compatibility; Database: RDS Aurora Postgres 15.4. https://datahubspace.slack.com/archives/CUMUWQU66/p1709900473103329
  • l

    little-musician-10851

    03/08/2024, 2:36 PM
    Hello! I would like to evaluate and experiment with the DataHub (0.13.0) integration with Neo4j. I know Neo4j acts as the graph database, but how would I see that reflected in my local Neo4j instance? Any thoughts or additional documentation would be helpful. Thanks!
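    Assuming GMS is configured with GRAPH_SERVICE_IMPL=neo4j (so edges are actually written there), the quickest way to see DataHub's footprint is to browse labels and relationship types in the Neo4j browser. A sketch (relationship names such as DownstreamOf come from DataHub's relationship annotations, so adjust to what you actually see):

    ```cypher
    // Which node labels (entity types) has DataHub created, and how many of each?
    MATCH (n) RETURN labels(n) AS label, count(*) AS nodes ORDER BY nodes DESC;

    // Sample lineage edges between datasets
    MATCH (a)-[r:DownstreamOf]->(b) RETURN a.urn, type(r), b.urn LIMIT 25;
    ```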
  • a

    able-carpenter-97384

    03/08/2024, 5:28 PM
    How do I use get related entities? I need both upstream and downstream - version 0.12.1.
  • a

    able-carpenter-97384

    03/08/2024, 6:01 PM
    Is there really a way to get / read both upstream and downstream data for any URN? Sorry, but I keep asking the same question.
  • c

    calm-lighter-1490

    03/09/2024, 2:02 PM
    Team, is there a way to disable login with username/password after we have enabled OIDC auth?
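    The frontend has documented toggles for this; a sketch of the env vars (names per the DataHub authentication docs - worth verifying against your version, and make sure OIDC login works first so you don't lock yourself out):

    ```bash
    # datahub-frontend environment
    AUTH_OIDC_ENABLED=true
    # Disable the built-in JAAS (datahub/datahub) username/password login
    AUTH_JAAS_ENABLED=false
    # Disable native username/password users
    AUTH_NATIVE_ENABLED=false
    ```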
  • w

    wonderful-bear-5842

    03/10/2024, 9:14 AM
    Hi DataHub team: I wonder what kind of environment you use to build the software, especially GMS and the frontend? I'm trying to use an Amazon Linux 2023 docker image to establish my standard build environment. So far I've been able to build GMS (war.war), but I've had various challenges building the frontend. I wonder if you can share some relevant info? Thanks!
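    For reference, the documented local build entry points are the Gradle targets below; the main prerequisites are a JDK 17 and sufficient memory, and the React build pulls its own Node/Yarn via the Gradle node plugin, so a distro-provided Node shouldn't be strictly required. A sketch:

    ```bash
    # Should report a JDK 17
    java -version

    # GMS (metadata service) war
    ./gradlew :metadata-service:war:build

    # Frontend (Play server + React UI bundle)
    ./gradlew :datahub-frontend:dist
    ```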