# troubleshoot
  • n

    numerous-application-54063

    05/19/2022, 4:41 PM
    Hello, I'm having an issue when using stateful ingestion and the add-tag transformer together. Digging into the code, I figured out that the issue was introduced in DataHub CLI 0.8.28.0 and above and is related to the code added here: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/run/pipeline.py#L235 I was able to see in debug mode that an entity that is not in the current commit gets soft-deleted correctly and emits an MCP for the soft deletion. But once the pipeline reaches the code at line 235, the entity is re-ingested. Basically, it seems to me that the add-tag transformer consumes the soft-delete MCP and adds the tag to the entity, but in doing so it undoes the soft deletion; I'm not sure why. Any idea?
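    A minimal sketch of the kind of guard a transformer could apply, assuming the soft delete arrives as a status aspect with removed=True; the helper names below are hypothetical and are not the actual add_tags transformer API:
    Copy code
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import StatusClass

    def is_soft_delete(record: MetadataChangeProposalWrapper) -> bool:
        # True when the record marks the entity as removed (the stateful-ingestion soft delete).
        return isinstance(record.aspect, StatusClass) and bool(record.aspect.removed)

    def transform_record(record: MetadataChangeProposalWrapper) -> MetadataChangeProposalWrapper:
        # Pass soft-delete records through untouched so adding a tag cannot resurrect the entity.
        if is_soft_delete(record):
            return record
        # ... apply the add-tag logic to other records here ...
        return record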
  • g

    gifted-bird-57147

    05/21/2022, 1:33 PM
    I think I'm currently running into the same issue as discussed here, with v0.8.35, while trying to get the field list for my Postgres dataset. The dataset is ingested via a DataHub ingest recipe.
    Copy code
    graph.get_aspect_v2(entity_urn=ds_urn, aspect_type=SchemaMetadataClass, aspect='schemaMetadata')
    throws:
    Copy code
    ValueError: com.linkedin.pegasus2avro.schema.Schemaless contains extra fields: {'com.linkedin.schema.MySqlDDL'}
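    The ValueError appears to be raised client-side while decoding the stored platformSchema union. A minimal sketch for inspecting the raw aspect JSON via the same GMS endpoint that get_aspect_v2 calls; the server URL and URN below are assumptions:
    Copy code
    import json
    import urllib.parse

    import requests

    gms_server = "http://localhost:8080"  # assumption: your GMS endpoint (add auth headers if required)
    ds_urn = "urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.public.mytable,PROD)"  # hypothetical URN

    # Fetch the raw schemaMetadata payload to see which platformSchema variant
    # (e.g. com.linkedin.schema.MySqlDDL) the server actually stored.
    url = f"{gms_server}/aspects/{urllib.parse.quote(ds_urn, safe='')}?aspect=schemaMetadata&version=0"
    response = requests.get(url)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))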
  • a

    astonishing-dusk-99990

    05/23/2022, 4:21 AM
    Hi @powerful-telephone-71997, I'm having the same trouble. Do you know how to fix it?
  • c

    chilly-elephant-51826

    05/23/2022, 5:51 AM
    Hi, I'm trying to set up a Superset connection, but it isn't getting any data. I'm using admin credentials to log in, and I'm not sure if this issue is from Superset or DataHub. Superset is configured with Okta, and I'm using the admin credentials created during deployment. I'm setting up the connection via the DataHub UI, and Test Connection passed. I'm using 'DB' as the provider; if I try to choose ldap I get unauthorized access. Ingestion completes successfully but gives an empty result. Logs from the execution (a recipe sketch follows the log below):
    Copy code
    'Source (superset) report:\n'
               "{'workunits_produced': 0,\n"
               " 'workunit_ids': [],\n"
               " 'warnings': {},\n"
               " 'failures': {},\n"
               " 'cli_version': '0.8.34.1',\n"
               " 'cli_entry_location': '/tmp/datahub/ingest/venv-79a6e9d4-1370-4f11-a3cf-3b1ad0466ef9/lib/python3.9/site-packages/datahub/__init__.py',\n"
               " 'py_version': '3.9.9 (main, Dec 21 2021, 10:03:34) \\n[GCC 10.2.1 20210110]',\n"
               " 'py_exec_path': '/tmp/datahub/ingest/venv-79a6e9d4-1370-4f11-a3cf-3b1ad0466ef9/bin/python3',\n"
               " 'os_details': 'Linux-5.10.109-104.*****'}\n"
               'Sink (datahub-rest) report:\n'
               "{'records_written': 0,\n"
               " 'warnings': [],\n"
               " 'failures': [],\n"
               " 'downstream_start_time': None,\n"
               " 'downstream_end_time': None,\n"
               " 'downstream_total_latency_in_seconds': None,\n"
               " 'gms_version': 'v0.8.33'}\n"
               '\n'
               'Pipeline finished successfully\n',
               "2022-05-23 05:30:56.587754 [exec_id=79a6e9d4-1370-4f11-a3cf-3b1ad0466ef9] INFO: Successfully executed 'datahub ingest'"]}
    Execution finished successfully!
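    For reference, a minimal sketch of running an equivalent recipe from the CLI, which makes it easier to turn on debug logging when zero workunits are produced; the connection values are placeholders and assume the standard Superset source options (connect_uri, username, password, provider):
    Copy code
    from datahub.ingestion.run.pipeline import Pipeline

    # Placeholder recipe mirroring the UI-configured ingestion (values are assumptions).
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "superset",
                "config": {
                    "connect_uri": "https://superset.example.com",  # your Superset URL
                    "username": "admin",       # local DB admin, not the Okta SSO user
                    "password": "<password>",
                    "provider": "db",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()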
  • q

    quick-pizza-8906

    05/23/2022, 10:02 AM
    Hello, I am experiencing problems using GMS deployed in Kubernetes (2 replicas): when a high volume of ingestion requests hits the API, we see some connections time out with an empty response, as observed by the load balancer placed in front of the service. At the same time, CPU/memory consumption is not hitting the pods' resource limits. Has anyone experienced similar problems? Will GMS work fine after simply scaling the deployment to more than 2 replicas, or does it require some special parameters? We are using the official Helm charts (slightly modified) to deploy.
  • b

    bumpy-activity-74405

    05/23/2022, 11:01 AM
    Is there a way in the UI to filter entities that don’t have ownership and/or domain assigned?
  • b

    bumpy-activity-74405

    05/23/2022, 11:31 AM
    Also, is there any way to change the number of items per page displayed in the UI? 10 is a very small number…
  • a

    acceptable-judge-21659

    05/23/2022, 1:03 PM
    Hi, I am working on the internationalization of DataHub in the datahub-web-react module. I'm having trouble translating the subtypes. I saw the possible subtypes in the Python ingestion files, but just copying the subtypes from there is not maintainable, since there is a generated enum for the EntityTypes (cf. types.generated.ts). Do you plan anything for a generated enum of the entity subtypes?
  • f

    flaky-market-12551

    05/24/2022, 1:56 AM
    Hi guys, I'd appreciate it a lot if someone could help guide me through this. Thanks.
    Copy code
    datahub docker quickstart
    I am getting this error 😞
    Copy code
    ERROR: for mysql  Cannot start service mysql: b'Mounts denied: sxfs/#namespaces for more info.\r\n.\ny7f7690s324xf7k2_1ypvm0000gn/mysql/init.sql\r\nis not shared from OS X and is not known to Docker.\r\nYou can configure shared paths from Docker -> Preferences... -> File Sharing.\r\nSee <https://docs.docker.com/docker-for-mac/o>'
    ERROR: Encountered errors while bringing up the project.
    By the way, I've also attached the logs: tmpw99ops7j.log
  • h

    high-family-71209

    05/24/2022, 6:48 AM
    I am having this problem again. 😞 I've ingested a dataset via the command line, and it is accessible when I access the URL providing the URN, but it isn't shown anywhere in the UI... This time, my "trick" of re-ingesting random other things doesn't work. Has anyone found a permanent remedy?
  • e

    echoing-farmer-38304

    05/24/2022, 6:54 AM
    Hi, I'm testing my custom metadata ingestion source for Power BI Report Server and getting this error; what am I missing? I used the existing Power BI ingestion in DataHub as an example and mapped report -> dashboard. I can't get tiles for it, so ds_mcps, chart_mcps = [], []. I ran the ingest command with --dry-run:
    Copy code
    INFO     {datahub.cli.ingest_cli:130} - Finished metadata pipeline
    
    Source (powerbireportserver.report_server.PowerBiReportServerDashboardSource) report:
    {'workunits_produced': 560,
     'workunit_ids': ['powerbi-urn:li:corpuser:a-a.user-corpUserInfo',
                      'powerbi-urn:li:corpuser:a-a.user-status',
                      'powerbi-urn:li:corpuser:a-a.user-corpUserKey',
                      'powerbi-urn:li:dashboard:(powerbi,reports.38437a3f-9818-43e4-ad0f-be0b4aa2868d)-browsePaths',
                      'powerbi-urn:li:dashboard:(powerbi,reports.38437a3f-9818-43e4-ad0f-be0b4aa2868d)-dashboardInfo',
                      'powerbi-urn:li:dashboard:(powerbi,reports.38437a3f-9818-43e4-ad0f-be0b4aa2868d)-status',
                      'powerbi-urn:li:dashboard:(powerbi,reports.38437a3f-9818-43e4-ad0f-be0b4aa2868d)-dashboardKey',
                      'powerbi-urn:li:dashboard:(powerbi,reports.38437a3f-9818-43e4-ad0f-be0b4aa2868d)-ownership',
    ...
    Copy code
    'warnings': {},
     'failures': {},
     'cli_version': '0.8.34.2',
     'scanned_report': 70,
     'filtered_reports': []}
    Sink (datahub-rest) report:
    {'records_written': 0,
     'warnings': [],
     'failures': [],
     'downstream_start_time': None,
     'downstream_end_time': None,
     'downstream_total_latency_in_seconds': None,
     'gms_version': 'v0.8.34'}
    
    Pipeline finished successfully
    But I'm getting this error (see the sketch after the output below):
    Copy code
    ERROR    {datahub.ingestion.run.pipeline:229} - Failed to extract some records due to: source produced an invalid metadata work unit: MetadataChangeProposalWrapper(
        entityType="dashboard",
        changeType="UPSERT",
        entityUrn="urn:li:dashboard:(powerbi,reports.8371ebd6-0385-4a70-a286-6ac20ee69f74)",
        entityKeyAspect=None,
        auditHeader=None,
        aspectName="dashboardInfo",
        aspect=DashboardInfoClass(
            {
                "customProperties": {
                    "chartCount": 0,
                    "workspaceName": "PowerBI Report Server",
                    "workspaceId": "8371ebd6-0385-4a70-a286-6ac20ee69f74",
                },
                "externalUrl": None,
                "title": "Staff_Tea",
                "description": "",
                "charts": [],
                "lastModified": ChangeAuditStampsClass(
                    {
                        "created": AuditStampClass(
                            {
                                "time": 0,
                                "actor": "urn:li:corpuser:unknown",
                                "impersonator": None,
                            }
                        ),
                        "lastModified": AuditStampClass(
                            {
                                "time": 0,
                                "actor": "urn:li:corpuser:unknown",
                                "impersonator": None,
                            }
                        ),
                        "deleted": None,
                    }
                ),
                "dashboardUrl": "<myurl>",
                "access": None,
                "lastRefreshed": None,
            }
        ),
        systemMetadata=SystemMetadataClass(
            {
                "lastObserved": 1653374365648,
                "runId": "powerbireportserver.report_server.PowerBiReportServerDashboardSource-2022_05_24-09_36_35",
                "registryName": None,
                "registryVersion": None,
                "properties": None,
            }
        ),
    )
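    The validation failure above is most likely caused by customProperties: the metadata model declares it as a map of string to string, while chartCount is emitted as the integer 0. A minimal sketch of building the same aspect with stringified values, using the field values from the dump (this is an assumption about the cause, not a confirmed fix):
    Copy code
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        ChangeAuditStampsClass,
        DashboardInfoClass,
    )

    audit_stamp = AuditStampClass(time=0, actor="urn:li:corpuser:unknown")
    dashboard_info = DashboardInfoClass(
        title="Staff_Tea",
        description="",
        charts=[],
        lastModified=ChangeAuditStampsClass(created=audit_stamp, lastModified=audit_stamp),
        dashboardUrl="<myurl>",
        customProperties={
            "chartCount": str(0),  # every value in customProperties must be a string
            "workspaceName": "PowerBI Report Server",
            "workspaceId": "8371ebd6-0385-4a70-a286-6ac20ee69f74",
        },
    )
    mcp = MetadataChangeProposalWrapper(
        entityType="dashboard",
        changeType="UPSERT",
        entityUrn="urn:li:dashboard:(powerbi,reports.8371ebd6-0385-4a70-a286-6ac20ee69f74)",
        aspectName="dashboardInfo",
        aspect=dashboard_info,
    )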
  • a

    able-rain-74449

    05/24/2022, 9:00 AM
    Hi all, could anyone confirm what the bare minimum components required for DataHub to run are?
  • c

    calm-waitress-61333

    05/24/2022, 5:52 PM
    If I need to set the truststore for Kafka Connect exports, where would I do that with a k8s deploy?
  • c

    calm-waitress-61333

    05/24/2022, 5:53 PM
    I exported the trust certificate and loaded it as a JKS into the deployments, and set the parameter from the docs to point at it, like this:
  • c

    calm-waitress-61333

    05/24/2022, 5:53 PM
    Copy code
    # container-level fragments (the env entry belongs under the container's env: key)
    volumeMounts:
      - mountPath: /tmp/jks/
        name: connect-devn2-trust-jks
    env:
      - name: SPRING_KAFKA_PROPERTIES_SSL_TRUSTSTORE_LOCATION
        value: /tmp/jks/connect-devn2-trust.jks

    # pod-spec-level fragments
    volumes:
      - configMap:
          defaultMode: 420
          items:
            - key: connect-devn2-trust.jks
              path: connect-devn2-trust.jks
          name: connect-devn2-trust-jks
        name: connect-devn2-trust-jks
    hostAliases:
      - hostnames:
          - connect-devn2
        ip: 10.18.0.16
  • c

    calm-waitress-61333

    05/24/2022, 5:53 PM
    I even forced the hostname...
  • c

    calm-waitress-61333

    05/24/2022, 5:53 PM
    but I'm still getting:
  • c

    calm-waitress-61333

    05/24/2022, 5:53 PM
    Copy code
    ConnectionError: HTTPSConnectionPool(host='connect-devn2', port=8083): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe81fa8e490>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
  • c

    calm-waitress-61333

    05/24/2022, 5:54 PM
    Is there a different pod I am supposed to be editing? The only ones I have left are the datastores, i.e. Kafka, Schema Registry, Elasticsearch, MySQL, Neo4j, ZooKeeper.
  • c

    calm-waitress-61333

    05/24/2022, 5:54 PM
    I shelled into the pods and verified the JKS file contents.
  • c

    calm-waitress-61333

    05/24/2022, 5:54 PM
    I can't figure out where the export is running, though, or how to make it use that truststore.
  • c

    calm-waitress-61333

    05/24/2022, 5:55 PM
    Copy code
    $ k get po
    NAME                                               READY   STATUS      RESTARTS      AGE
    datahub-acryl-datahub-actions-f94557c78-pw6s9      1/1     Running     0             11m
    datahub-datahub-frontend-6548f8bb45-9sbtk          1/1     Running     0             6m31s
    datahub-datahub-gms-5c5c5d4f5b-rkhf4               1/1     Running     0             17m
    datahub-datahub-upgrade-job--1-p7z6p               0/1     Completed   0             3h31m
    datahub-elasticsearch-setup-job--1-r57m2           0/1     Completed   0             3h34m
    datahub-kafka-setup-job--1-wtqn7                   0/1     Completed   0             3h34m
    datahub-mysql-setup-job--1-ht4lx                   0/1     Completed   0             3h31m
    elasticsearch-master-0                             1/1     Running     0             19d
    elasticsearch-master-1                             1/1     Running     0             19d
    elasticsearch-master-2                             1/1     Running     0             19d
    prerequisites-cp-schema-registry-cf79bfccf-5pk27   2/2     Running     6 (19d ago)   19d
    prerequisites-kafka-0                              1/1     Running     4 (19d ago)   19d
    prerequisites-mysql-0                              1/1     Running     0             19d
    prerequisites-neo4j-community-0                    1/1     Running     0             19d
    prerequisites-zookeeper-0                          1/1     Running     0             19d
  • c

    calm-waitress-61333

    05/24/2022, 5:59 PM
    For some reason I'm having issues resolving the internal cluster DNS as well, so I added that hostAliases entry.
  • f

    fresh-garage-83780

    05/24/2022, 7:03 PM
    I think I found a typo in the Confluent Cloud setup page too. I believe the correct value for security.protocol is SASL_SSL, but the docs show it as just SASL.
    Copy code
    springKafkaConfigurationOverrides: 
      security.protocol: SASL_SSL
    Still investigating that, as SASL_SSL causes issues for acryl-datahub-actions. Will report back
    Copy code
    KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Failed to create consumer: No provider for SASL mechanism GSSAPI: recompile librdkafka with libsasl2 or openssl support. Current build options: PLAIN SASL_SCRAM OAUTHBEARER"}
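    For what it's worth, this error usually means no SASL mechanism was configured, since librdkafka defaults to GSSAPI. A minimal sketch of the client settings a Confluent Cloud style SASL_SSL/PLAIN setup needs (placeholder values; these are generic librdkafka keys, not the actions container's own config format):
    Copy code
    from confluent_kafka import Consumer

    # Placeholder values; the key point is setting sasl.mechanism explicitly,
    # otherwise librdkafka falls back to GSSAPI and raises the error above.
    consumer = Consumer(
        {
            "bootstrap.servers": "<bootstrap-server>:9092",
            "group.id": "datahub-actions-test",   # hypothetical group id
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "PLAIN",
            "sasl.username": "<api-key>",
            "sasl.password": "<api-secret>",
        }
    )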
  • n

    numerous-account-62719

    05/25/2022, 4:41 AM
    Please help me with this issue
  • s

    shy-ability-24875

    05/25/2022, 7:50 AM
    As shown in the picture below, we hope to see all of the table lineage relationships of Hive. Is this achievable, like the Atlas Hive Hook?
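    If automatic Hive lineage extraction isn't available in your setup, one alternative worth sketching is emitting table-level lineage yourself with the Python emitter; the URNs and server below are placeholders:
    Copy code
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    emitter = DatahubRestEmitter("http://localhost:8080")  # assumption: local GMS

    # Declare db.source_table as an upstream of db.derived_table (hypothetical tables).
    upstream = UpstreamClass(
        dataset=make_dataset_urn("hive", "db.source_table", "PROD"),
        type=DatasetLineageTypeClass.TRANSFORMED,
    )
    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType="UPSERT",
        entityUrn=make_dataset_urn("hive", "db.derived_table", "PROD"),
        aspectName="upstreamLineage",
        aspect=UpstreamLineageClass(upstreams=[upstream]),
    )
    emitter.emit_mcp(mcp)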
  • a

    adamant-furniture-37835

    05/25/2022, 1:58 PM
    @adventurous-dream-16099
  • k

    kind-dawn-17532

    05/25/2022, 7:33 PM
    Hi All, I had a few questions -
  • k

    kind-dawn-17532

    05/25/2022, 7:33 PM
    Is there a special config that needs to be enabled for the blame view for entities?
  • k

    kind-dawn-17532

    05/25/2022, 7:34 PM
    We want to keep track of the age of database objects. Assuming our ingestion runs daily, we want to be able to track when an entity was created. There are fields like createdon and lastObserved in the underlying MySQL table. Should we use those, or is there another recommendation?