# troubleshoot
  • loud-camera-71352 (11/29/2021, 3:53 PM)
    Hi guys! Is it possible to add a tag to a column via curl? I tried this but I get a 400 error: “Cannot parse request entity”
    Copy code
    curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
        "entity": {
            "value": {
                "com.linkedin.metadata.snapshot.DatasetSnapshot": {
                    "urn": "urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
                    "aspects": [
                        {
                            "com.linkedin.schema.EditableSchemaMetadata": [
                                "editableSchemaFieldInfo": {
                                    "fieldPath": "member_id",
                                    "globalTags": { "tags": [{ "tag": "urn:li:tag:PII" }] }
                                }
                            ]
                        }
                    ]
                }
            }
        }
    }'
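    For reference, the 400 is most likely the payload itself: the aspect body puts a key/value pair directly inside a JSON array, which is not valid JSON. EditableSchemaMetadata is an object whose editableSchemaFieldInfo field is an array of per-field entries. A hedged sketch of a corrected payload with the same URN and tag (depending on server version, the aspect may also require created/lastModified audit stamps):
    Copy code
    curl 'http://localhost:8080/entities?action=ingest' -X POST \
      -H 'X-RestLi-Protocol-Version: 2.0.0' \
      --data '{
        "entity": {
            "value": {
                "com.linkedin.metadata.snapshot.DatasetSnapshot": {
                    "urn": "urn:li:dataset:(urn:li:dataPlatform:exasol,main.dds.test2,PROD)",
                    "aspects": [
                        {
                            "com.linkedin.schema.EditableSchemaMetadata": {
                                "editableSchemaFieldInfo": [
                                    {
                                        "fieldPath": "member_id",
                                        "globalTags": { "tags": [{ "tag": "urn:li:tag:PII" }] }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        }
    }'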
  • cool-painting-92220 (11/29/2021, 7:22 PM)
    Hi everyone! I'm still ramping up on learning DataHub and had a question about the ingested metadata. If I needed to shut down the DataHub server (datahub docker nuke) but wanted to save all the metadata I had previously ingested, so that I don't have to run the ingestion again when I start DataHub back up, what would be the best way to approach this? Additionally, if I had used DataHub's UI to add descriptions and documentation to a few tables, how could this data be saved if the server were to be shut down? Thank you for any guidance that can be provided! 😄
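    A hedged pointer here: recent CLI versions expose a nuke flag that drops the containers but keeps the data volumes, which preserves both ingested metadata and UI edits (descriptions, tags), since they all live in the same metadata store. Assuming your CLI version has it:
    Copy code
    # remove DataHub containers but keep the backing volumes (MySQL, Elasticsearch, Kafka)
    datahub docker nuke --keep-data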
  • polite-flower-25924 (11/29/2021, 9:51 PM)
    Hey folks, I’m not able to see the owned datasets for a specific group. Even though several entities are assigned to this group (event-tracking), they don’t appear in the Ownership part. 😕
  • plain-farmer-27314 (11/30/2021, 3:55 PM)
    Hey all - we are looking into leveraging DataHub's lineage for some backend processes/alerting, and are curious what the most efficient method is to fetch all downstream (or upstream) entities of a given type. Our use case: Table X is experiencing ETL delays, and we want to determine which Looker charts/dashboards are impacted. I'm currently experimenting with the GraphQL endpoint, but it seems like a lot of computation to do for each table.
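    One hedged approach, assuming your GraphQL version exposes the generic relationships field: instead of walking full lineage per table, ask only for the incoming edges of Table X (Consumes covers charts reading a dataset, DownstreamOf covers derived datasets). The URN below is hypothetical:
    Copy code
    {
      dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table_x,PROD)") {
        relationships(input: { types: ["DownstreamOf", "Consumes"], direction: INCOMING, start: 0, count: 100 }) {
          total
          relationships {
            type
            entity { urn type }
          }
        }
      }
    }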
  • cool-painting-92220 (11/30/2021, 9:09 PM)
    Hey everyone! I've been trying to read up on DataHub's architecture and its storage of data, and was wondering: if the server hosting DataHub were to suddenly crash, is there any data (either ingested metadata or information that users have contributed on the platform, like documentation, tags, etc.) that would be at risk of being lost? And what would be the best setup for ensuring that this information isn't lost, if any is at risk? I've come across the following link for restoring search and graph indices, but wasn't sure about the rest of the data in DataHub (https://datahubproject.io/docs/how/restore-indices)
  • numerous-translator-7230 (12/01/2021, 3:26 AM)
    Hi everyone! I've been trying to set up DataHub on AWS ECS. There are two ports that need to be exposed through the load balancer: frontend (9002) and GMS (8080). Has anybody figured out the same issue?
  • red-pizza-28006 (12/01/2021, 10:52 AM)
    Trying to delete a schema with datahub delete -n --query "fivetran_headscarf_hurray_staging", I am getting this error. Any ideas?
    Copy code
    ---- (full traceback above) ----
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/datahub/entrypoints.py", line 95, in main
        sys.exit(datahub(standalone_mode=False, **kwargs))
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 148, in delete
        deletion_result = delete_with_filters(
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 211, in delete_with_filters
        batch_deletion_result.merge(one_result)
    File "/Users/ajaykumarmuppuri/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/delete_cli.py", line 51, in merge
        self.sample_records.extend(another_result.sample_records)
    
    AttributeError: 'FieldInfo' object has no attribute 'extend'
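    The failure is inside the CLI's own dry-run result merging (sample_records ends up as a pydantic FieldInfo instead of a list), so this looks like a client-side bug rather than a problem with your query. Assuming it has since been patched, a low-risk first step is upgrading the CLI and retrying:
    Copy code
    pip install --upgrade acryl-datahub
    datahub version
    datahub delete -n --query "fivetran_headscarf_hurray_staging"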
  • handsome-football-66174 (12/01/2021, 6:07 PM)
    Hi everyone - Trying out the GraphQL API. I am able to use this to get the list of users. How do I also get the relationships?
    {
      listUsers(input: { start: 0, count: 10 }) {
        start
        count
        total
        users {
          urn
          type
          username
          status
          properties {
            displayName
            email
            title
            departmentId
            departmentName
            firstName
            lastName
            fullName
            countryCode
          }
          editableProperties {
            aboutMe
            pictureLink
          }
        }
      }
    }
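    A hedged sketch for the relationships part, assuming your GraphQL schema exposes the generic relationships field on CorpUser; the relationship type name here (IsMemberOfGroup) is the usual user-to-group edge, but adjust to whatever your version reports:
    Copy code
    {
      corpUser(urn: "urn:li:corpuser:some.user") {
        username
        relationships(input: { types: ["IsMemberOfGroup"], direction: OUTGOING, start: 0, count: 10 }) {
          total
          relationships {
            type
            entity { urn type }
          }
        }
      }
    }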
  • refined-branch-44251 (12/02/2021, 1:03 AM)
    Copy code
    curl --location --request POST 'http://localhost:8080/entities?action=search' \
    --header 'X-RestLi-Protocol-Version: 2.0.0' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": "glossaryTerms:Classification.Sensitive",
        "entity": "dataset",
        "start": 0,
        "count": 10
    }'
    This returns all datasets with the glossary term 'Classification.Sensitive'. Is there a way to search for datasets where this glossary term has been applied to fields of the dataset (and not the dataset itself)?
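    A hedged variant, assuming your version indexes schema-field terms under the fieldGlossaryTerms search field (terms applied through the UI may instead land under editedFieldGlossaryTerms):
    Copy code
    curl --location --request POST 'http://localhost:8080/entities?action=search' \
    --header 'X-RestLi-Protocol-Version: 2.0.0' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "input": "fieldGlossaryTerms:Classification.Sensitive",
        "entity": "dataset",
        "start": 0,
        "count": 10
    }'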
  • abundant-flag-19546 (12/02/2021, 3:56 AM)
    I’m trying to make an MLExperiment entity to ingest from MLflow. (There is already a pull request https://github.com/linkedin/datahub/pull/2725, but I need to make a lineage like ‘dataset -> consumes -> MLExperiment -> trained by -> MLModel’, so I’m trying to implement an MLExperiment entity.) I implemented these files:
    • MLExperimentUrn.java and MLExperimentUrn.pdl in li-utils
    • MLExperimentKey.pdl, MLExperimentSnapshot.pdl, MLExperimentProperties.pdl, MLExperimentAspect.pdl
    • and registered these items in Aspect.pdl and Snapshot.pdl
    When I tried to build with the command
    Copy code
    COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
    I got this error:
    Copy code
    #10 235.9 > Task :metadata-service:restli-impl:checkRestModel FAILED
    #10 235.9 [checker]
    #10 235.9 [checker] idl compatibility report:
    #10 235.9 [checker] Incompatible changes:
    #10 235.9 [checker]   1) /collection/actions/batchIngest/parameters/entities/type: new union added members com.linkedin.metadata.snapshot.MLExperimentSnapshot
    #10 235.9 [checker]   2) com.linkedin.entity.Entity/value/com.linkedin.metadata.snapshot.Snapshot/ref/union: new union added members com.linkedin.metadata.snapshot.MLExperimentSnapshot, breaks old readers
    #10 235.9 [checker]
    #10 235.9 [checker] [RS-COMPAT]: false
    #10 235.9 [checker] [MD-COMPAT]: false
    #10 235.9 [checker] [RS-I]:/collection/actions/batchIngest/parameters/entities/type: new union added members com.linkedin.metadata.snapshot.MLExperimentSnapshot
    #10 235.9 [checker] [MD-I]:com.linkedin.entity.Entity/value/com.linkedin.metadata.snapshot.Snapshot/ref/union: new union added members com.linkedin.metadata.snapshot.MLExperimentSnapshot, breaks old readers
    #10 235.9 [checker]
    #10 235.9
    #10 235.9 FAILURE: Build failed with an exception.
    So I tried this workaround (found in the official docs)
    Copy code
    ./gradlew :gms:impl:build -Prest.model.compatibility=ignore
    but it fails with a ‘project gms not found’ error. 1. Is this the correct way to implement a new entity (creating the custom URN Java code, and registering the items in Aspect.pdl and Snapshot.pdl)? 2. How can I ignore that error while building the Docker images?
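    On the second question: the gms Gradle project was renamed to metadata-service, which is why the command from the older docs no longer resolves. Judging by the failing task in the log, the equivalent invocation would be roughly:
    Copy code
    ./gradlew :metadata-service:restli-impl:build -Prest.model.compatibility=ignore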
  • quaint-branch-37931 (12/02/2021, 10:15 AM)
    Hey all, I'm trying to ingest data using the REST sink. This seems to work fine, but afterwards the UI still shows up empty. There are no errors in the gms or react-webapp logs, and when I check the backing Postgres database, the data does seem to be there. I'm running react-webapp and gms version v0.8.17, backed by Postgres and AWS-managed Elasticsearch and Kafka. Any ideas on how I could track down the issue?
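    When rows reach Postgres but the UI stays empty, the usual suspect is the Kafka-to-Elasticsearch path (the MAE consumer) not indexing. A hedged first check, assuming default index naming and a reachable ES endpoint (placeholder host):
    Copy code
    # should report a non-zero document count if indexing is keeping up
    curl -s 'https://<your-es-endpoint>/datasetindex_v2/_count'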
  • full-area-6720 (12/02/2021, 11:14 AM)
    I am getting this error while trying to ingest a business glossary. This is the sample file provided.
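    For comparison, a minimal recipe and glossary file in the shape the sample uses; file paths and term names here are hypothetical:
    Copy code
    # recipe.yml
    source:
      type: datahub-business-glossary
      config:
        file: ./business_glossary.yml

    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"

    # business_glossary.yml (minimal):
    # version: 1
    # source: DataHub
    # owners:
    #   users:
    #     - datahub
    # nodes:
    #   - name: Classification
    #     description: Classification-related terms
    #     terms:
    #       - name: Sensitive
    #         description: Sensitive data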
  • refined-apple-6340 (12/02/2021, 2:58 PM)
    I have a self-signed cert and am using OpenSearch for Elasticsearch; at startup, the wait check does not use curl -k, so it fails waiting on OpenSearch. Any ideas?
  • aloof-forest-55926 (12/02/2021, 7:31 PM)
    Hi, I'm new to DataHub and I get this error (Failed to log in! SyntaxError: Unexpected token < in JSON at position 0) while trying to sign in to http://localhost:9002 using datahub as both the username and password.
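    That SyntaxError usually means the frontend got an HTML error page back instead of JSON, i.e. GMS isn't up yet or isn't reachable from the frontend container. A quick hedged check with the CLI:
    Copy code
    # verifies that the quickstart containers are up and healthy
    datahub docker check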
  • bulky-controller-34643 (12/03/2021, 2:04 AM)
    Hi all, I'm new to DataHub. I recently installed DataHub for testing using the Helm chart version on local Kubernetes. I followed the install guideline here (link) and didn't change any settings, and the installation looks fine. However, when I log in to the website, I found something different from the demo site (link): on my site, there is no Settings button at the top right. Do I need to enable or install something to make it visible? I would like to use the access token feature in Settings. Thanks for any help.
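    The access-token UI generally only appears once Metadata Service Authentication is enabled. With the Helm chart, a hedged sketch of the values change (key path assumed from the chart's global section):
    Copy code
    # values.yaml
    global:
      datahub:
        metadata_service_authentication:
          enabled: true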
  • ambitious-guitar-89068 (12/03/2021, 6:35 AM)
    Hi folks, trying to follow this document: I set the two environment variables in frontend and gms, and I'm not seeing the Settings menu after the Docker containers restart… https://datahubproject.io/docs/introducing-metadata-service-authentication/
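    For reference, the variable that document revolves around is METADATA_SERVICE_AUTH_ENABLED, set on both containers; a hedged docker-compose sketch:
    Copy code
    services:
      datahub-gms:
        environment:
          - METADATA_SERVICE_AUTH_ENABLED=true
      datahub-frontend-react:
        environment:
          - METADATA_SERVICE_AUTH_ENABLED=true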
  • best-crayon-19865 (12/03/2021, 1:26 PM)
    Hi all. I'm trying to use the lineage backend and get an error when trying to run Airflow tasks. I set the URL in AWS SSM as
    Copy code
    datahub.dwh-stage.corp.loc
    Did someone have a similar error?
    Copy code
    ERROR - ('Unable to emit metadata to DataHub GMS', {'message': "Invalid URL 'datahub.dwh-stage.corp.loc/entities?action=ingest': No schema supplied. Perhaps you meant http://datahub.dwh-stage.corp.loc/entities?action=ingest?"})
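    The emitter is saying the stored value has no scheme. Assuming the SSM parameter feeds the Airflow connection that the lineage backend reads, store the host with an explicit http:// and the GMS port; the CLI equivalent from the plugin docs:
    Copy code
    airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' \
        --conn-host 'http://datahub.dwh-stage.corp.loc:8080'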
  • ambitious-vegetable-3452 (12/03/2021, 2:06 PM)
    Hey folks, I've been trying to install the prerequisites for DataHub on our EKS cluster with the Helm charts. It fails to create the Elasticsearch instances with the following error:
    Copy code
    {
      "type": "server",
      "timestamp": "2021-12-03T14:03:22,365Z",
      "level": "WARN",
      "component": "r.suppressed",
      "cluster.name": "elasticsearch",
      "node.name": "elasticsearch-master-1",
      "message": "path: /_cluster/health, params: {wait_for_status=green, timeout=1s}",
      "stacktrace": [
        "org.elasticsearch.discovery.MasterNotDiscoveredException: null",
        "at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.onTimeout(TransportMasterNodeAction.java:220) [elasticsearch-7.9.3.jar:7.9.3]",
        "at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:325) [elasticsearch-7.9.3.jar:7.9.3]",
        "at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:252) [elasticsearch-7.9.3.jar:7.9.3]",
        "at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:605) [elasticsearch-7.9.3.jar:7.9.3]",
        "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678) [elasticsearch-7.9.3.jar:7.9.3]",
        "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
        "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
        "at java.lang.Thread.run(Thread.java:832) [?:?]"
      ]
    }
    Does anyone know how I could solve this?
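    MasterNotDiscoveredException on a fresh install usually means the master pods never formed a quorum, often because some replicas are stuck Pending on storage. For a test cluster, one hedged option is shrinking to a single master in the prerequisites values (keys from the upstream Elasticsearch chart):
    Copy code
    elasticsearch:
      replicas: 1
      minimumMasterNodes: 1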
  • refined-apple-6340 (12/03/2021, 7:34 PM)
    2021/12/03 19:17:15 Problem with request: Get "https://opensearch:9200": x509: certificate relies on legacy Common Name field, use SANs instead. Sleeping 1s
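    This comes from Go's TLS client, which no longer falls back to the Common Name field: the self-signed cert needs a subjectAltName. A hedged way to reissue it (OpenSSL 1.1.1+; the hostname opensearch is assumed from the URL):
    Copy code
    openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
      -keyout opensearch.key -out opensearch.crt \
      -subj "/CN=opensearch" \
      -addext "subjectAltName=DNS:opensearch"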
  • plain-farmer-27314 (12/03/2021, 8:06 PM)
    Hey all, wondering if there is an ideal way to query all dataset entities that belong to a certain dataPlatform
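    A hedged sketch via GraphQL search facets; whether the platform filter value should be the short name (looker) or the full urn:li:dataPlatform:looker varies by version, so try both:
    Copy code
    {
      search(input: {
        type: DATASET,
        query: "*",
        start: 0,
        count: 10,
        filters: [{ field: "platform", value: "urn:li:dataPlatform:looker" }]
      }) {
        total
        searchResults {
          entity { urn type }
        }
      }
    }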
  • best-planet-6756 (12/03/2021, 8:11 PM)
    Hi All, looking for some help on an issue I am facing. I pulled the latest and then ran:
    Copy code
    COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
    But I'm getting the following error:
    Copy code
    #9 320.0 > Task :metadata-io:compileJava FAILED
    #9 320.0
    #9 320.0 FAILURE: Build failed with an exception.
    #9 320.0
    #9 320.0 * What went wrong:
    #9 320.0 Execution failed for task ':metadata-io:compileJava'.
    #9 320.0 > Could not resolve all files for configuration ':metadata-io:compileClasspath'.
    #9 320.0    > Could not resolve com.linkedin.datahub-gma:ebean-dao:0.2.81.
    #9 320.0      Required by:
    #9 320.0          project :metadata-io
    #9 320.0       > Could not resolve com.linkedin.datahub-gma:ebean-dao:0.2.81.
    #9 320.0          > Could not get resource 'https://plugins.gradle.org/m2/com/linkedin/datahub-gma/ebean-dao/0.2.81/ebean-dao-0.2.81.pom'.
    #9 320.0             > Could not GET 'https://jcenter.bintray.com/com/linkedin/datahub-gma/ebean-dao/0.2.81/ebean-dao-0.2.81.pom'.
    #9 320.0                > Connect to jcenter.bintray.com:443 [jcenter.bintray.com/34.95.74.180] failed: connect timed out
    #9 320.0    > Could not resolve com.linkedin.datahub-gma:restli-resources:0.2.81.
    #9 320.0      Required by:
    #9 320.0          project :metadata-io
    #9 320.0       > Could not resolve com.linkedin.datahub-gma:restli-resources:0.2.81.
    #9 320.0          > Could not get resource 'https://plugins.gradle.org/m2/com/linkedin/datahub-gma/restli-resources/0.2.81/restli-resources-0.2.81.pom'.
    #9 320.0             > Could not GET 'https://jcenter.bintray.com/com/linkedin/datahub-gma/restli-resources/0.2.81/restli-resources-0.2.81.pom'.
    #9 320.0                > Connect to jcenter.bintray.com:443 [jcenter.bintray.com/34.95.74.180] failed: connect timed out
    #9 320.0    > Could not resolve com.linkedin.datahub-gma:elasticsearch-dao-7:0.2.81.
    #9 320.0      Required by:
    #9 320.0          project :metadata-io
    #9 320.0       > Could not resolve com.linkedin.datahub-gma:elasticsearch-dao-7:0.2.81.
    #9 320.0          > Could not get resource 'https://plugins.gradle.org/m2/com/linkedin/datahub-gma/elasticsearch-dao-7/0.2.81/elasticsearch-dao-7-0.2.81.pom'.
    #9 320.0             > Could not GET 'https://jcenter.bintray.com/com/linkedin/datahub-gma/elasticsearch-dao-7/0.2.81/elasticsearch-dao-7-0.2.81.pom'.
    #9 320.0                > Connect to jcenter.bintray.com:443 [jcenter.bintray.com/34.95.74.180] failed: connect timed out
    #9 320.0
    #9 320.0 * Try:
    #9 320.0 Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    #9 320.0
    #9 320.0 * Get more help at https://help.gradle.org
    #9 320.0
    #9 320.0 Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0.
    #9 320.0 Use '--warning-mode all' to show the individual deprecation warnings.
    #9 320.0 See https://docs.gradle.org/5.6.4/userguide/command_line_interface.html#sec:command_line_warnings
    #9 320.0
    #9 320.0 BUILD FAILED in 5m 19s
    #9 320.0 74 actionable tasks: 74 executed
    #9 320.0
    ------
    failed to solve with frontend dockerfile.v0: failed to build LLB: executor failed running [/bin/sh -c cd /datahub-src && ./gradlew :metadata-service:war:build -x test]: runc did not terminate sucessfully
    ERROR: Service 'datahub-gms' failed to build : Build failed
    Anyone run into this before?
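    The underlying issue is that jcenter.bintray.com was sunset in 2021, so any build still resolving the datahub-gma artifacts from it just times out. Newer revisions of the repo moved these dependencies elsewhere; if pulling a commit with that fix isn't an option, a hedged patch to the repositories block in the root build.gradle:
    Copy code
    // point dependency resolution away from the dead jcenter endpoint
    repositories {
        mavenCentral()
        maven { url 'https://linkedin.jfrog.io/artifactory/open-source' }
    }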
  • lemon-greece-73651 (12/04/2021, 1:32 AM)
    Attempting to run a MySQL-to-DataHub ingestion through DataHub, and getting a strange error when running it via Airflow. Error details attached in the reply. For the record, the metadata ingestion works perfectly through the DataHub ingestion CLI when run independently. Any ideas?
  • full-area-6720 (12/06/2021, 7:05 AM)
    This is ingested here, but isn't reflected in the UI.
  • bulky-controller-34643 (12/07/2021, 3:50 AM)
    Hi, all. I tried to use the ingestion feature of the Helm chart version of DataHub, and I successfully ingested the metadata from Postgres (pic 1). However, I cannot see the datasets on the DataHub website (pic 2). What could be the reason? I checked the gms logs (pic 3) and saw the ingest logs.
  • nice-country-99675 (12/07/2021, 12:56 PM)
    👋 Hi Team! I found this strange behaviour... since I'm still playing with DataHub, a lot of back and forth, create and delete, usually happens during the day... and this is what happened. I have datasets created using a custom platform (QuickSight), and my ingestion process also includes some lineage from these datasets to Postgres tables. At some point I deleted the QuickSight datasets with a datahub delete --platform quicksight, and some of them were not deleted, so I had to delete them by urn. But when I try to re-ingest these datasets, none appear in the UI. The ingestion process didn't fail, and I see no errors in the logs... but nothing shows up. As a matter of fact, when I try a new datahub delete --platform quicksight, it tells me there are 0 records. Then I checked the Postgres tables in DataHub and I still see the upstream lineage to the QuickSight datasets, so the datasets are in some way still in the DB, but not available to the UI. When I remove them one by one using the urn, I'm able to re-ingest them and the UI properly displays them...
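    This matches soft-delete behaviour: by default, datahub delete writes a Status aspect with removed=true, so the rows stay in the DB, the UI hides them, and a later re-ingest may not flip the flag back. Assuming that is what happened, a hard delete removes the rows outright:
    Copy code
    # permanently remove the entities instead of soft-deleting them
    datahub delete --platform quicksight --hard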
  • broad-crowd-13788 (12/07/2021, 9:10 PM)
    I see the following error when trying to run ingestion with Kafka as the sink. Any idea how I can fix this?
    Copy code
    ValueSerializationError: KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="Schema being registered is incompatible with an earlier schema for subject "MetadataChangeEvent_v4-value" (HTTP status code 409, SR code 409)"}
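    The 409 means the MCE schema your client produces fails the compatibility check against the schema already registered for that subject, typically a CLI/server version skew. After aligning versions, a hedged workaround is to relax compatibility on just that subject (registry URL assumed):
    Copy code
    curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
      --data '{"compatibility": "NONE"}' \
      http://localhost:8081/config/MetadataChangeEvent_v4-value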
  • cool-painting-92220 (12/08/2021, 12:57 AM)
    Hey everyone! I'm working through Snowflake metadata ingestion and am trying to prevent a few tables and DBs from being ingested. The code below worked perfectly fine for ingesting all tables earlier, but as soon as I added the sections for database_pattern, view_pattern, and schema_pattern, I got the error message shown below. Any thoughts on why I might be running into these issues? Error:
    Copy code
    3 validation errors for SnowflakeConfig
    schema_pattern -> deny
      value is not a valid list (type=type_error.list)
    view_pattern -> deny
      value is not a valid list (type=type_error.list)
    database_pattern -> deny
      value is not a valid list (type=type_error.list)
    Ingestion File:
    Copy code
    source:
      type: snowflake
      config:
        host_port: ****
        warehouse: ****
        username: ****
        password: ****
        role: ****
        database_pattern:
          deny: ****
        view_pattern:
          deny: ****
        schema_pattern:
          deny: ****
    
    sink:
      type: "datahub-rest"
      config:
    server: "http://localhost:8080"
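    All three validation errors say the same thing: deny must be a YAML list of regex patterns, not a single scalar. A hedged sketch with hypothetical patterns:
    Copy code
        database_pattern:
          deny:
            - "^UTIL_DB$"
            - "^SNOWFLAKE.*"
        view_pattern:
          deny:
            - ".*_TMP$"
        schema_pattern:
          deny:
            - "INFORMATION_SCHEMA"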
  • cool-painting-92220 (12/08/2021, 1:21 AM)
    Hey everyone! I've tried searching around for guides on this but couldn't find any: how do I create other users for DataHub? I want to give my peers accounts to log in with, but wasn't sure how to achieve this.
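    With the default JAAS setup, local users come from the user.props file read by datahub-frontend, one username:password pair per line; a hedged sketch (the mount path depends on your deployment):
    Copy code
    # user.props (mounted into the datahub-frontend container)
    datahub:datahub
    alice:a-strong-password
    bob:another-strong-password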
  • refined-apple-6340 (12/08/2021, 3:56 AM)
    What is the Docker config for the analytics tab to work in DataHub on Docker? (I have DATAHUB_ANALYTICS_ENABLED=true for elasticsearch-setup and datahub-frontend-react.)