# troubleshoot
  • billowy-truck-48700 (05/23/2022, 10:29 AM)
    Hi, could anyone help me figure out where metadata ingestion via the UI is executed? It needs the dependencies acryl-datahub[datahub-rest,mssql] for MSSQL ingestion.
    ```
    [Errno 111] Connection refused')': /simple/acryl-datahub/
    ERROR: Could not find a version that satisfies the requirement acryl-datahub[datahub-rest,mssql]==0.8.35 (from versions: none)
    ERROR: No matching distribution found for acryl-datahub[datahub-rest,mssql]==0.8.35
    ```
    I'm using the quickstart DataHub setup for POC purposes on an air-gapped server. Is there any way to install the required libraries manually? Or to point our internal Artifactory PyPI repository at the relevant Docker Compose service, or wherever UI ingestion is executed? I have tried the container acryldata/datahub-actions:head, but it didn't help. Ingestion via the CLI works fine.
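    For later readers: UI ingestion runs inside the datahub-actions container, which pip-installs acryl-datahub and its plugins at execution time, so on an air-gapped host that container needs to be pointed at an internal index. A sketch of a docker-compose override, assuming a hypothetical internal Artifactory mirror URL (adjust to your own):

```yaml
# docker-compose.override.yml -- the index URL below is a placeholder
services:
  datahub-actions:
    environment:
      # pip inside the container honors these variables when it installs
      # acryl-datahub[datahub-rest,mssql] for a UI ingestion run
      - PIP_INDEX_URL=https://artifactory.internal.example/api/pypi/pypi-remote/simple
      - PIP_TRUSTED_HOST=artifactory.internal.example
```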

  • numerous-account-62719 (05/24/2022, 5:37 AM)
    Hi Team, can anyone please tell me how to configure the Hive source? Does it require the metastore URL or the HiveServer URL?

  • numerous-account-62719 (05/24/2022, 7:42 AM)
    I am not able to add users in DataHub. I have followed all the steps mentioned at the given link: https://datahubproject.io/docs/how/auth/add-users/

  • numerous-account-62719 (05/24/2022, 8:09 AM)
    Hi, I am not able to execute the UI ingestion pipeline. It seems like there is some managed-ingestion issue. Can someone help me with it? Below are the logs:
    ```
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
        self.run()
      File "/usr/local/lib/python3.9/threading.py", line 910, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.9/site-packages/datahub_actions/pipeline/pipeline_manager.py", line 42, in run_pipeline
        pipeline.run()
      File "/usr/local/lib/python3.9/site-packages/datahub_actions/pipeline/pipeline.py", line 161, in run
        for enveloped_event in enveloped_events:
      File "/usr/local/lib/python3.9/site-packages/datahub_actions/plugin/source/kafka/kafka_event_source.py", line 149, in events
        msg = self.consumer.poll(timeout=2.0)
      File "/usr/local/lib/python3.9/site-packages/confluent_kafka/deserializing_consumer.py", line 131, in poll
        raise ConsumeError(msg.error(), kafka_message=msg)
    confluent_kafka.error.ConsumeError: KafkaError{code=UNKNOWN_TOPIC_OR_PART,val=3,str="Subscribed topic not available: PlatformEvent_v1: Broker: Unknown topic or partition"}
    ```

  • bright-receptionist-94235 (05/24/2022, 11:25 AM)
    Hi, I installed DataHub, and under Access Tokens I get: “Token based authentication is currently disabled. Contact your DataHub administrator to enable this feature.”

  • cool-painting-92220 (05/24/2022, 5:22 PM)
    Hi everyone! I'm looking to obtain all the entity URNs that exist in my DataHub through the GraphQL API, so I can set up a monitoring system for entity documentation and ownership coverage. I see the general tutorial for searching for a specific entity (tutorial), but is there any way to open the search to all entities (i.e. pull all entities under the main directory folder)? I've seen this example for the REST side (example), which could be used to recursively navigate the folders until you reach all the entity nodes in the tree, but I'm not sure how to do the same thing with GraphQL.
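    One way this can be done without recursion is to page through the GraphQL `search` query with the wildcard query `*`. A minimal sketch, assuming a local quickstart endpoint at http://localhost:8080/api/graphql; the entity type, page size, and endpoint are assumptions to adjust:

```python
"""Sketch: page through DataHub's GraphQL `search` to collect entity URNs."""
import json
import urllib.request

SEARCH_QUERY = """
query search($input: SearchInput!) {
  search(input: $input) {
    start
    count
    total
    searchResults { entity { urn } }
  }
}
"""

def build_payload(entity_type: str, start: int, count: int) -> dict:
    """Build one paginated search request; the query '*' matches everything."""
    return {
        "query": SEARCH_QUERY,
        "variables": {"input": {"type": entity_type, "query": "*",
                                "start": start, "count": count}},
    }

def fetch_all_urns(endpoint: str, entity_type: str, page: int = 100):
    """Yield every URN of one entity type, page by page."""
    start = 0
    while True:
        req = urllib.request.Request(
            endpoint,
            data=json.dumps(build_payload(entity_type, start, page)).encode(),
            headers={"Content-Type": "application/json"},
        )
        body = json.loads(urllib.request.urlopen(req).read())
        results = body["data"]["search"]["searchResults"]
        for r in results:
            yield r["entity"]["urn"]
        start += page
        if start >= body["data"]["search"]["total"]:
            break

if __name__ == "__main__":
    for urn in fetch_all_urns("http://localhost:8080/api/graphql", "DATASET"):
        print(urn)
```

    Running the same loop once per entity type (DATASET, DASHBOARD, CHART, ...) would cover the whole catalog.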

  • calm-waitress-61333 (05/24/2022, 5:59 PM)
    The Kafka Connect cluster is in the same k8s environment.

  • rich-policeman-92383 (05/24/2022, 8:08 PM)
    Hello, I'm getting the below error when running more than one instance of GMS.

  • rich-policeman-92383 (05/24/2022, 8:15 PM)
    Hello, I am creating a Python GraphQL script that does the following:
    • create domains
    • add datasets to domains
    • add owners to domains
    Problem: the GraphQL API allows creating multiple domains with the same name. Is that expected?
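    DataHub does not enforce unique domain names server-side, so a script can guard against duplicates client-side by listing existing domains first. A sketch, assuming the `listDomains` GraphQL query available in recent 0.8.x releases; the endpoint, page size, and domain name are placeholders:

```python
"""Sketch of a client-side guard against duplicate domain names."""
import json
import urllib.request

LIST_DOMAINS = """
query listDomains($input: ListDomainsInput!) {
  listDomains(input: $input) {
    total
    domains { urn properties { name } }
  }
}
"""

def find_domain_by_name(domains: list, name: str):
    """Return the first existing domain dict whose name matches, else None."""
    for d in domains:
        props = d.get("properties") or {}
        if props.get("name") == name:
            return d
    return None

def existing_domains(endpoint: str, count: int = 1000) -> list:
    """Fetch the current domains from GMS (one page, assumed large enough)."""
    payload = {"query": LIST_DOMAINS,
               "variables": {"input": {"start": 0, "count": count}}}
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    body = json.loads(urllib.request.urlopen(req).read())
    return body["data"]["listDomains"]["domains"]

if __name__ == "__main__":
    doms = existing_domains("http://localhost:8080/api/graphql")
    if find_domain_by_name(doms, "Marketing") is None:
        print("no duplicate -- safe to call createDomain")
```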

  • mysterious-butcher-86719 (05/24/2022, 8:59 PM)
    Hi Team, I am unable to retrieve the tags associated at the field level. Below is the query snippet for the schemaMetadata of the Dataset entity: schemaMetadata { fields { fieldPath description type isPartOfKey tags { tags { tag { urn properties { name } } } } } }. I get the rest of the information, but the tags are not retrieved. Could you please tell me if I am missing anything?
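    One common cause here is that tags added through the UI are stored on the editableSchemaMetadata aspect rather than on schemaMetadata, so querying both may help. A sketch, with the endpoint and dataset URN as placeholders:

```python
"""Sketch: query field-level tags from both schemaMetadata and
editableSchemaMetadata (where UI-applied tags usually live)."""
import json
import urllib.request

DATASET_FIELD_TAGS = """
query dataset($urn: String!) {
  dataset(urn: $urn) {
    schemaMetadata {
      fields {
        fieldPath
        tags { tags { tag { urn properties { name } } } }
      }
    }
    editableSchemaMetadata {
      editableSchemaFieldInfo {
        fieldPath
        tags { tags { tag { urn properties { name } } } }
      }
    }
  }
}
"""

def fetch_field_tags(endpoint: str, urn: str) -> dict:
    """POST the query to GMS and return the dataset payload."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps({"query": DATASET_FIELD_TAGS,
                         "variables": {"urn": urn}}).encode(),
        headers={"Content-Type": "application/json"},
    )
    return json.loads(urllib.request.urlopen(req).read())["data"]["dataset"]
```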

  • shy-ability-24875 (05/25/2022, 1:46 AM)
    Hi all, I can ingest the Hive metadata successfully, but I can't get the Lineage, Queries, or Stats tabs to populate. Is there a setting or config required to extract lineage? Any hints would be appreciated.

  • great-cpu-72376 (05/25/2022, 7:33 AM)
    Hi, I am trying to integrate DataHub 0.8.35 with Airflow. In Airflow 2.2.4, I installed the package acryl-datahub[airflow], configured the connection, set the lineage, and set the datahub_kwargs configuration. I see the following error in the DAG logs:
    ```
    [2022-05-24 21:00:35,142] {datahub.py:122} ERROR - Failed to serialize DAG 'create_partitions_ubxpt_ninaw1_dag': 'str' object has no attribute '__module__'
    [2022-05-24 21:00:35,142] {datahub.py:123} INFO - Supressing error because graceful_exceptions is set
    ```
    Have you ever encountered this error?

  • great-cpu-72376 (05/25/2022, 9:33 AM)
    Hi, I am using DataHub 0.8.35, and the copy-URN button does not seem to be working. I don't know if this is only my problem.

  • clean-piano-28976 (05/25/2022, 10:20 AM)
    Hi 👋, I’m trying to delete data from DataHub using a curl request, and although my request runs, I can still see the metadata when I search the UI. Does anyone know what I might be missing here?

  • brave-businessperson-3969 (05/25/2022, 11:57 AM)
    Hello, I just upgraded one of our environments from v0.8.31 to the latest version, v0.8.35. For some reason, logging in via LDAP stopped working after the update. In the log files of the frontend container a new error message is showing up (compared to v0.8.31):
    ```
    11:22:15 [application-akka.actor.default-dispatcher-71] ERROR application - The submitted callback is of type: class javax.security.auth.callback.NameCallback : javax.security.auth.callback.NameCallback@7be1fcbf
    11:22:15 [application-akka.actor.default-dispatcher-71] ERROR application - The submitted callback is of type: class javax.security.auth.callback.PasswordCallback : javax.security.auth.callback.PasswordCallback@7d265380
    ```
    The line seems to be printed by the code at https://github.com/datahub-project/datahub/blob/5cce3acddcb46443c748bf2eb0b1e5e539[…]94d936/datahub-frontend/app/security/AuthenticationManager.java. But I'm a bit confused about why this is an error message; to me it looks more like a debug or info log message, not an error. Any idea whether this is related to the LDAP login no longer working?

  • high-toothbrush-90528 (05/25/2022, 12:57 PM)
    Hi everybody! I am building and running gms locally and I receive this error:
    ```
    14:46:10.886 [ForkJoinPool.commonPool-worker-9] ERROR c.l.metadata.boot.BootstrapManager:41 - Caught exception while executing bootstrap step IngestDataPlatformInstancesStep. Exiting...
    java.lang.IllegalArgumentException: Failed to find entity with name telemetry in EntityRegistry
            at com.linkedin.metadata.models.registry.MergedEntityRegistry.getEntitySpec(MergedEntityRegistry.java:113)
            at com.linkedin.metadata.entity.EntityService.getKeyAspectSpec(EntityService.java:817)
            at com.linkedin.metadata.entity.EntityService.getKeyAspectSpec(EntityService.java:813)
            at com.linkedin.metadata.boot.steps.IngestDataPlatformInstancesStep.getDataPlatformInstance(IngestDataPlatformInstancesStep.java:54)
            at com.linkedin.metadata.boot.steps.IngestDataPlatformInstancesStep.execute(IngestDataPlatformInstancesStep.java:80)
            at com.linkedin.metadata.boot.BootstrapManager.lambda$start$0(BootstrapManager.java:39)
            at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
            at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632)
            at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
            at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
            at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
            at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
    ```
    Any idea? Thanks!

  • adamant-furniture-37835 (05/25/2022, 1:57 PM)
    Hi! We need some help enabling SSL on the datahub-frontend service. We have deployed the DataHub services on a Kubernetes cluster by following the deployment guidelines for Kubernetes using Helm charts. Currently both datahub-frontend and datahub-gms are served via HTTP. To comply with the security protocols in our environment, we want to enable SSL on both services, but especially datahub-frontend. Our understanding is that we need to customize and build the frontend image for this purpose by following the guidelines for enabling HTTPS for a Play service. Is that understanding right, or is there some prebuilt functionality to achieve this? When it comes to datahub-gms, would we need a similar process, since it is also a Java application? Thanks in advance :)

  • lemon-hydrogen-83671 (05/25/2022, 3:47 PM)
    Hey! Does anyone know how to enable some additional transformer logs when running datahub ingest? I’m trying to get a summary of what the transformer will do after transformations, kind of like the work-unit summary that’s provided in a regular sink/source recipe.

  • gentle-oil-54863 (05/26/2022, 6:52 AM)
    Hey guys, sorry for the basic question, but I just did a crash course in Docker today just so I could install DataHub on Windows. I have no idea where any files are!
    1. I ran the pip install command from my user directory.
    2. It appears that Docker installed everything to my system temp directory, which probably isn't what I want.
    3. The ingestion guide refers to ./examples/recipes ... am I supposed to see this?
    4. Basically I imagine I would want to create a directory somewhere (e.g. d:\datahub) and have my containers(?) and source files(?) in there. Is that right?
    5. I did notice references to "HOME not set" during the install. Do I just need to uninstall all of this and ensure I have a HOME environment variable set to d:\datahub before running docker quickstart again? Do I do that on the command line, or just delete DataHub from Docker Desktop?
    6. It seems I am going to be writing ingestion YAML files. How does one typically locate or manage (a) DataHub installation files, (b) my custom DataHub config files, and (c) the DataHub git repository (for reference)?
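    On point 3 and 6: ./examples/recipes exists only in a checkout of the DataHub git repository; a recipe is otherwise just a YAML file you create anywhere (e.g. under d:\datahub\recipes) and pass to datahub ingest -c <path>. A minimal sketch, with all connection details as placeholders:

```yaml
# d:\datahub\recipes\mssql.yml -- location is your choice; run with
# `datahub ingest -c d:\datahub\recipes\mssql.yml`
# (source details below are placeholders)
source:
  type: mssql
  config:
    host_port: "localhost:1433"
    database: mydb
    username: datahub
    password: "..."
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```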

  • great-cpu-72376 (05/26/2022, 10:07 AM)
    Hi, I am trying to integrate DataHub with Airflow, and I have a problem with a DAG that uses params. So far I have tried installing pip install acryl-datahub[airflow] in the Airflow image. Now I would like to try the plugin pip install acryl-datahub-airflow-plugin, but I am not able to install it:
    ```
    pip install acryl-datahub-airflow-plugin
    ERROR: Could not find a version that satisfies the requirement acryl-datahub-airflow-plugin (from versions: none)
    ERROR: No matching distribution found for acryl-datahub-airflow-plugin
    ```
    What should I do?

  • brainy-wall-41694 (05/26/2022, 11:21 AM)
    Hi! I'm following the manual at https://datahubproject.io/docs/how/auth/sso/configure-oidc-react-azure. When I use the datahub docker quickstart command, the lineage is displayed correctly. However, when I use docker-compose -p datahub -f docker-compose.yml -f docker-compose.override.yml up datahub-frontend-react, the lineage is not enabled. Do you know if there is something that needs to be configured?

  • nutritious-bird-77396 (05/26/2022, 4:02 PM)
    I am ingesting protobuf schemas for the Kafka ingestion now that this PR is merged. I am facing issues ingesting proto schemas that have a field meta option added by the Confluent Schema Registry protobuf converter. Here is the flow:
    1. A Postgres table has a column of type int2.
    2. Kafka Connect, using Debezium and the Confluent Schema Registry, creates a protobuf schema in the Schema Registry and creates topics.
    3. The Kafka topics are ingested into DataHub. Error:
    Option "(confluent.field_meta)" unknown. Ensure that your proto definition file imports the proto which defines the option
    More details of the schema in 🧵

  • rich-policeman-92383 (05/27/2022, 8:50 AM)
    Hello, after setting up DataHub monitoring using the quickstart, a few of the Grafana panels show "no data". Commands used for setup: https://datahubproject.io/docs/advanced/monitoring/#enable-monitoring-through-docker-compose. Panels with no data: Get, Ingest Steps, Search Qps, Search Latency, and others. @magnificent-notebook-88304

  • great-beard-50720 (05/27/2022, 9:43 AM)
    Hi there! I am trying to use Great Expectations to get evaluation data ("assertionInfo") into DataHub. I have added a datahub_action to my Great Expectations action_list. When I run it I get the following:
    ERROR: ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class com.linkedin.assertion.AssertionInfo: ERROR :: /datasetAssertion/nativeParameters :: unrecognized field found but not allowed\nERROR :: /datasetAssertion/nativeType :: unrecognized field found but not allowed\nERROR :: /datasetAssertion/aggregation :: unrecognized field found but not allowed\nERROR :: /datasetAssertion/parameters :: unrecognized field found but not allowed\nERROR :: /datasetAssertion/dataset :: unrecognized field found but not allowed\nERROR  ...
    So there is some disagreement between DataHub and Great Expectations about the model structure there. Is this the result of incompatible versions? Something else? Any help would be very welcome. Thanks.

  • breezy-portugal-43538 (05/27/2022, 11:03 AM)
    Hello, I wanted to ask a quick question regarding the DataHub setup. If I run DataHub on a server, and I need a proxy to connect to that server, does DataHub require any proxy settings as well? I'm asking because I randomly receive a 504 error: RetryError: HTTPConnectionPool(host='<my_proxy_address>', port=8080): Max retries exceeded with url: http://<server ip>:8080/config (Caused by ResponseError('too many 504 error responses')). Could this 504 be connected to a lack of proxy settings somewhere in some DataHub Docker file? For ingestion we use the CLI "datahub ingest -c <yml file>", where our source is S3. The ingestion command is run from within a Docker container on the same server where DataHub is running.
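    DataHub itself reads the standard proxy environment variables of the container it runs in, so one thing worth trying is setting them explicitly for the container that runs the ingestion. A sketch of a docker-compose fragment, with the proxy address as a placeholder and the service name hypothetical:

```yaml
# Hypothetical override for the container that runs `datahub ingest`
services:
  ingestion:
    environment:
      - HTTP_PROXY=http://<my_proxy_address>:8080
      - HTTPS_PROXY=http://<my_proxy_address>:8080
      # keep traffic to the local GMS off the proxy, since it is on the
      # same server -- otherwise /config calls may bounce off the proxy
      - NO_PROXY=localhost,127.0.0.1,datahub-gms
```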

  • numerous-application-54063 (05/27/2022, 11:04 AM)
    Hello, I'm testing out the DataHub Actions framework with v0.8.35. I started the hello-world consumer, ran some ingestions, and performed actions in the UI. I cannot figure out why I'm receiving only MetadataChangeLog_v1 events and no EntityChangeEvent_v1 at all. The setup jobs for Kafka ran correctly on our Helm deployment. Any idea?

  • bland-smartphone-67838 (05/27/2022, 12:16 PM)
    Hello! Does anyone know how to enable a source? I've created a source, then run datahub check plugins, and I see that my source is disabled. I can't find any information about it.

  • billowy-jewelry-4209 (05/27/2022, 4:31 PM)
    Hello team! I have run into an error during ingestion from Vertica: 'KeyError: 'Did not find a registered class for vertica''. The full log is attached in the thread. Thanks for the help!
    vertica_ing_log.txt

  • damp-greece-27806 (05/27/2022, 5:11 PM)
    Hi! We’ve started to see that our redshift and redshift-usage jobs just hang and don’t make any progress. We don’t see any queries hanging in redshift. We were wondering if there’s any way to turn on more debugging output. We are using this via Airflow 2.0.2 and are importing Pipeline to run the recipes in our DAG.
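    Since the recipes here run programmatically via Pipeline, one way to get more output is to raise the log level on the "datahub" logger namespace before the pipeline is created. A sketch, assuming a programmatic run like the one described; the recipe dict is a placeholder for the real redshift one:

```python
"""Sketch: enable DEBUG logging around a programmatic Pipeline run."""
import logging

# DataHub's sources and sinks log under the "datahub" logger namespace
logging.basicConfig(level=logging.INFO)
logging.getLogger("datahub").setLevel(logging.DEBUG)

# placeholder recipe -- substitute your real redshift config
recipe = {
    "source": {"type": "redshift", "config": {"host_port": "...", "database": "..."}},
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}

def run(recipe_dict: dict) -> None:
    """Run one ingestion recipe with the debug logging configured above."""
    # imported lazily so the logging configuration is in place first
    from datahub.ingestion.run.pipeline import Pipeline
    pipeline = Pipeline.create(recipe_dict)
    pipeline.run()
    pipeline.raise_from_status()
```

    The equivalent from the CLI would be the global debug flag (datahub --debug ingest -c recipe.yml).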

  • gentle-camera-33498 (05/27/2022, 6:28 PM)
    Hello guys, I'm having problems with restoring the search and graph indices. Can someone help me? Sample:
    ```
    Reading rows 57000 through 58000 from the aspects table.
    Caught exception during attempt 0 of Step with id SendMAEStep: java.lang.IllegalStateException: Aspect clientId could not be found
    Retrying 0 more times...
    Failed Step 3/3: SendMAEStep. Failed after 0 retries.
    Exiting upgrade RestoreIndices with failure.
    ```