red-pizza-28006
10/26/2021, 11:51 AM

loud-camera-71352
10/26/2021, 12:21 PM
set_dataset_browse_path, but no dataset is visible.

red-pizza-28006
10/26/2021, 6:48 PM

bland-teacher-17190
10/27/2021, 1:18 PM

witty-butcher-82399
10/27/2021, 2:42 PM
We upgraded from 0.8.15 to 0.8.16 and GMS is not able to start. Found a couple of exceptions suggesting there is an issue with the ES indexes. Any idea how we can overcome this?

victorious-dream-46349
10/27/2021, 4:37 PM
```
{
  "com.linkedin.dataset.UpstreamLineage": {
    "upstreams": []
  }
},
```
and it was 200 OK.
And when I GET https://BASE_URL/entities/urn-of-dataset, it returns an empty com.linkedin.dataset.UpstreamLineage aspect, as expected.
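For reference, the empty-aspect write described above can be sketched with stdlib Python. The URN and base URL are placeholders, and the request-body shape is my assumption from the snapshot-style ingest endpoint:

```python
import json

# Placeholder URN and base URL -- substitute your own.
DATASET_URN = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db1.schema1.table1,PROD)"
BASE_URL = "https://BASE_URL"

# Assumed body for POST {BASE_URL}/entities?action=ingest that writes
# an empty UpstreamLineage aspect onto the dataset.
payload = {
    "entity": {
        "value": {
            "com.linkedin.metadata.snapshot.DatasetSnapshot": {
                "urn": DATASET_URN,
                "aspects": [
                    {"com.linkedin.dataset.UpstreamLineage": {"upstreams": []}}
                ],
            }
        }
    }
}

print(json.dumps(payload, indent=2))
```

If the UI keeps showing lineage even after this returns 200, my guess is that the UI reads lineage edges from the graph index rather than from this aspect, so the index may also need to catch up; I'm not sure of the exact mechanism.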
But in the UI, I am still seeing the lineage.
Question: how do I delete the lineage permanently in DataHub (preferably via the REST API)?

curved-sandwich-81699
10/27/2021, 8:23 PM
source:
  type: "snowflake"
  config:
    username: ...
    password: ...
    host_port: ...
    warehouse: ...
    role: "accountadmin"
    database_pattern:
      ignoreCase: true
      allow:
        - "db1"
        - "db2"
    table_pattern:
      ignoreCase: true
      allow:
        - "db1.schema1.table1"
        - "db2.schema2.table2"
    include_tables: true
    include_table_lineage: false
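If it helps with debugging: I believe the allow entries are treated as anchored regular expressions, so the unescaped dots in "db1.schema1.table1" match any character. A minimal sketch of that assumed matching logic:

```python
import re

def allowed(name, allow_patterns, ignore_case=True):
    """True if name matches any allow pattern; patterns are regexes matched from the start."""
    flags = re.IGNORECASE if ignore_case else 0
    return any(re.match(p, name, flags) is not None for p in allow_patterns)

table_allow = ["db1.schema1.table1", "db2.schema2.table2"]

print(allowed("db1.schema1.table1", table_allow))  # True
print(allowed("db2.schema1.table1", table_allow))  # False: neither pattern matches
print(allowed("db1Xschema1Ytable1", table_allow))  # True: unescaped dots are wildcards
```

Under these semantics db2.schema1.table1 really should be filtered out, which matches the log, so the puzzle is only why db2.schema2.table2 never appears in the log at all.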
Only db1.schema1.table1 gets ingested, and the log shows db2.schema1.table1 being filtered out, even though there is no db2.schema1 schema; db2.schema2.table2 does not show up in the log at all.

adamant-van-40260
10/28/2021, 3:55 AM
10:06:22.490 [ForkJoinPool.commonPool-worker-15] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "query getDataset($urn: String!) {\n dataset(urn: $urn
....
result: {errors=[{message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 0, entity], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}, {message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 1, entity], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}, {message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 2, entity], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}, {message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 3, entity], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}, {message=An unknown error occurred., locations=[{line=400, column=5}], path=[dataset, downstreamLineage, entities, 4, entity], extensions={code=500, type=SERVER_ERROR,
handsome-football-66174
10/28/2021, 7:24 PM
java.util.concurrent.CompletionException: com.linkedin.datahub.graphql.exception.AuthorizationException: Unauthorized to perform this action. Please contact your DataHub administrator.
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.linkedin.datahub.graphql.exception.AuthorizationException: Unauthorized to perform this action. Please contact your DataHub administrator.
at com.linkedin.datahub.graphql.types.tag.TagType.update(TagType.java:140)
at com.linkedin.datahub.graphql.types.tag.TagType.update(TagType.java:46)
at com.linkedin.datahub.graphql.resolvers.mutate.MutableTypeResolver.lambda$get$0(MutableTypeResolver.java:35)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 1 common frames omitted
19:20:52.101 [Thread-11216] ERROR c.d.m.graphql.GraphQLController - Errors while executing graphQL query: "mutation updateTag($input: TagUpdate!) {\n updateTag(input: $input) {\n urn\n name\n description\n ownership {\n ...ownershipFields\n __typename\n }\n __typename\n }\n}\n\nfragment ownershipFields on Ownership {\n owners {\n owner {\n ... on CorpUser {\n urn\n type\n username\n info {\n active\n displayName\n title\n email\n firstName\n lastName\n fullName\n __typename\n }\n editableInfo {\n pictureLink\n __typename\n }\n __typename\n }\n ... on CorpGroup {\n urn\n type\n name\n info {\n email\n admins {\n urn\n username\n info {\n active\n displayName\n title\n email\n firstName\n lastName\n fullName\n __typename\n }\n editableInfo {\n pictureLink\n teams\n skills\n __typename\n }\n __typename\n }\n members {\n urn\n username\n info {\n active\n displayName\n title\n email\n firstName\n lastName\n fullName\n __typename\n }\n editableInfo {\n pictureLink\n teams\n skills\n __typename\n }\n __typename\n }\n groups\n __typename\n }\n __typename\n }\n __typename\n }\n type\n __typename\n }\n lastModified {\n time\n __typename\n }\n __typename\n}\n", result: {errors=[{message=Unauthorized to perform this action. Please contact your DataHub administrator., locations=[{line=2, column=3}], path=[updateTag], extensions={code=403, classification=DataFetchingException}}], data={updateTag=null}}, errors: [DataHubGraphQLError{path=[updateTag], code=UNAUTHORIZED, locations=[SourceLocation{line=2, column=3}]}]
nice-country-99675
10/28/2021, 8:01 PM

chilly-spring-43918
10/29/2021, 2:19 PM
datahub-gms
METADATA_CHANGE_EVENT_NAME: The name of the metadata change event topic.
METADATA_AUDIT_EVENT_NAME: The name of the metadata audit event topic.
FAILED_METADATA_CHANGE_EVENT_NAME: The name of the failed metadata change event topic.
datahub-mce-consumer
KAFKA_MCE_TOPIC_NAME: The name of the metadata change event topic.
KAFKA_FMCE_TOPIC_NAME: The name of the failed metadata change event topic.
datahub-mae-consumer
KAFKA_TOPIC_NAME: The name of the metadata audit event topic.
It's causing an error on these topic names. Or maybe there is a way to ignore the error?
Caused by: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [MetadataChangeProposal_v1]
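A sketch of how such topic-name overrides are typically consumed, using the variable names from the list above (the fallback defaults here are my assumptions, not confirmed):

```python
import os

def topic_from_env(var_name, default):
    """Resolve a Kafka topic name from an environment variable, with a fallback default."""
    return os.environ.get(var_name, default)

# Simulate an override being set for the MCE topic only.
os.environ["METADATA_CHANGE_EVENT_NAME"] = "MyCompany.MetadataChangeEvent"

mce_topic = topic_from_env("METADATA_CHANGE_EVENT_NAME", "MetadataChangeEvent_v4")
mae_topic = topic_from_env("METADATA_AUDIT_EVENT_NAME", "MetadataAuditEvent_v4")

print(mce_topic)  # MyCompany.MetadataChangeEvent
print(mae_topic)  # MetadataAuditEvent_v4 (no override set)
```

Note that the topic in the error, MetadataChangeProposal_v1, is not in the override list above; newer server versions also consume MetadataChangeProposal/MetadataChangeLog topics, which presumably need their own overrides or broker ACLs.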
little-address-54150
10/29/2021, 2:50 PM

brief-wolf-70822
10/29/2021, 7:59 PM
table_pattern.allow. I notice that when I enable profiling, it seems to ignore this and profile all of the tables. Do I need to also set profile_pattern.allow to the same list? I had assumed it would only profile tables I had allowed with the table allow pattern, but maybe I'm wrong there.

future-hamburger-62563
10/30/2021, 3:27 AM
./gradlew build
and I had significantly more success. So the steps I followed were:
1. Installed Docker and forked/git cloned the repo.
2. Ran ./gradlew build, got an error for: No Java. Installed Java 8 because I previously had an issue with Java 11. Added my JAVA_PATH to .bashrc and retried.
3. Got an error for no venv, so I ran `sudo apt install python3.8-venv` and tried again.
4. Got an error for no jq, so I ran `sudo apt install jq` and tried again.
5. Got the :metadata-io:test failure and re-found this thread from my last attempts. I used ./gradlew build -x check and I'm happy to say that the build was successful.
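For anyone following the same path, steps 2-4 boil down to checking that a few tools are on PATH before building; a tiny (hypothetical) pre-flight helper:

```python
import shutil

def missing_tools(tools):
    """Return the subset of the given tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

# Tools the DataHub build tripped over for me: java, python3 (with venv), jq.
print(missing_tools(["java", "python3", "jq"]))

# A name that certainly does not exist is reported as missing.
print(missing_tools(["definitely-not-a-real-tool"]))  # ['definitely-not-a-real-tool']
```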
So, the questions I have now are:
1. Did I miss a step with Docker? Do I need to run the quickstart so that :metadata-io:test will work correctly?
2. I'm interested in baby-stepping into making code changes, but I'm new and inexperienced. Would my next steps be to make the changes, rebuild, and then run docker/dev.sh to launch a container and inspect the changes?
3. Is it necessary to always rebuild after changes are made and redeploy to docker? Or if I wanted to focus on the React part could I just run it locally? If so, how might I do this?
4. Finally, would it be helpful at all for me to add slightly more detailed instructions to the developer's setup guide?
Thanks all. Have a nice weekend, try and catch that fall weather! 🍂🍁

bland-orange-13353
11/01/2021, 6:48 AM

red-pizza-28006
11/01/2021, 9:44 AM
[2021-11-01 10:36:26,936] ERROR {datahub.ingestion.run.pipeline:69} - failed to write record with workunit urn:li:corpGroup:Catch%20up with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Conversion = 'u'\n\tat com.linkedin.metadata.restli.RestliUtil.badRequestException(RestliUtil.java:84)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:35)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)\n\tat com.linkedin.metadata.resources.entity.EntityResource.ingest(EntityResource.java:182)\n\tat sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat com.linkedin.restli.internal.server.RestLiMethodInvoker.doInvoke(RestLiMethodInvoker.java:172)\n\tat
Any ideas? Looking at the exception, it seems we are not able to process spaces in the name? Also, it looks like the recipe only ingested groups and not users, with this error:
ValueError: Unable to find the key mail in Group. Is it wrong?
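On the space issue: the group name has been percent-encoded into the URN ("Catch up" becomes "Catch%20up"), which is where I'd guess the server-side parser chokes. Easy to reproduce with the stdlib:

```python
from urllib.parse import quote, unquote

group_name = "Catch up"
encoded = quote(group_name)  # the space becomes %20
print(encoded)               # Catch%20up

urn = f"urn:li:corpGroup:{encoded}"
print(urn)                   # urn:li:corpGroup:Catch%20up
print(unquote(encoded))      # Catch up
```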
handsome-belgium-11927
11/01/2021, 11:56 AM

hallowed-article-64840
11/02/2021, 8:49 AM

ripe-sunset-20897
11/02/2021, 9:22 AM
When I call <http://localhost:8080/api/graphql>, it gives me
"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/api'\n\tat
Can anyone help me with this error? Thanks.

red-pizza-28006
11/02/2021, 12:34 PM
[2021-11-02 13:27:38,054] WARNING {datahub.ingestion.source.sql.snowflake:183} - Extracting lineage from Snowflake failed.Please check your premissions. Continuing...
Error was (snowflake.connector.errors.ProgrammingError) 000904 (42000): SQL compilation error: error line 13 at position 33
invalid identifier 'T.OBJECTS_MODIFIED'
[SQL:
WITH table_lineage_history AS (
SELECT
r.value:"objectName" AS upstream_table_name,
r.value:"objectDomain" AS upstream_table_domain,
r.value:"columns" AS upstream_table_columns,
w.value:"objectName" AS downstream_table_name,
w.value:"objectDomain" AS downstream_table_domain,
w.value:"columns" AS downstream_table_columns,
t.query_start_time AS query_start_time
FROM
(SELECT * from snowflake.account_usage.access_history) t,
lateral flatten(input => t.BASE_OBJECTS_ACCESSED) r,
lateral flatten(input => t.OBJECTS_MODIFIED) w
WHERE r.value:"objectId" IS NOT NULL
AND w.value:"objectId" IS NOT NULL
AND w.value:"objectName" NOT LIKE '%.GE_TMP_%'
AND t.query_start_time >= to_timestamp_ltz(1635724800000, 3)
AND t.query_start_time < to_timestamp_ltz(1635811200000, 3))
SELECT upstream_table_name, downstream_table_name, upstream_table_columns, downstream_table_columns
FROM table_lineage_history
WHERE upstream_table_domain = 'Table' and downstream_table_domain = 'Table'
QUALIFY ROW_NUMBER() OVER (PARTITION BY downstream_table_name, upstream_table_name ORDER BY query_start_time DESC) = 1 ]
(Background on this error at: <http://sqlalche.me/e/13/f405>).
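Side note: the to_timestamp_ltz bounds in that generated SQL are epoch milliseconds, and they decode to exactly a one-day UTC window, so the time filter looks fine; the failure is on the column reference itself:

```python
from datetime import datetime, timezone

# The two bounds from the generated SQL, in epoch milliseconds.
start = datetime.fromtimestamp(1635724800000 / 1000, tz=timezone.utc)
end = datetime.fromtimestamp(1635811200000 / 1000, tz=timezone.utc)

print(start.isoformat())  # 2021-11-01T00:00:00+00:00
print(end.isoformat())    # 2021-11-02T00:00:00+00:00
```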
Based on Snowflake's docs, I don't see an OBJECTS_MODIFIED field in snowflake.account_usage.access_history. Can this be a bug?

red-pizza-28006
11/02/2021, 3:05 PM
profiling.limit -> I started getting this error:
[SQL: CREATE OR REPLACE TEMPORARY TABLE ge_tmp_eee423e9 AS SELECT *
FROM src_salesforce."ORDER"
LIMIT 2000
Without this, it seems to work fine. I also tried adding profile.turn_off_expensive_profiling_metrics, but for some reason it does not accept the profile param. I am on the latest DataHub version, 0.8.16.2.

handsome-football-66174
11/02/2021, 6:46 PM

bland-orange-13353
11/02/2021, 11:45 PM

icy-scooter-74959
11/03/2021, 1:04 AM
datahub ingest. However, when I try to do it in Airflow using the example, I am getting
"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '/datasets'
Can someone point me in the right direction? Is this a mismatched version problem?

adamant-van-40260
11/03/2021, 5:07 AM
POST /entities?action=searchAcrossEntities - searchAcrossEntities - 200 - 0ms
04:56:50.672 [ForkJoinPool.commonPool-worker-3] ERROR c.l.datahub.graphql.GmsGraphQLEngine - Failed to load Entities of type: DataJob, keys: [urn:li:dataJob:(urn:li:dataFlow:(airflow,ETL_30M,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),CORE_TRANS), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_30m,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_6hour,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,ETL_30M,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),MERGE_CORE_TRANS), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_30m,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),merge_core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_6hour,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),merge_core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,ETL_30M,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),MERGE_USERPAYMENT_SUBTRANSTYPE_CORE_TRANS), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_30m,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),merge_userpayment_subtranstype_core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,etl_6hour,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),merge_userpayment_subtranstype_core_trans), urn:li:dataJob:(urn:li:dataFlow:(airflow,growth_daily_crm_5h_10h_12h_15h_shared,<http://analytics.airflows.mservice.io|analytics.airflows.mservice.io>),CRM_RAW_DATA_CORE_TRANS)] Failed to batch load DataJobs
04:56:50.673 [ForkJoinPool.commonPool-worker-3] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler - Failed to execute DataFetcher
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1596)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.RuntimeException: Failed to retrieve entities of type DataJob
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$104(GmsGraphQLEngine.java:862)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
... 5 common frames omitted
Caused by: java.lang.RuntimeException: Failed to batch load DataJobs
at com.linkedin.datahub.graphql.types.datajob.DataJobType.batchLoad(DataJobType.java:106)
at com.linkedin.datahub.graphql.GmsGraphQLEngine.lambda$null$104(GmsGraphQLEngine.java:859)
... 6 common frames omitted
Caused by: java.lang.IllegalStateException: Duplicate key com.linkedin.metadata.entity.ebean.EbeanAspectV2@5a1a7708
lively-jackal-83760
11/03/2021, 9:44 AM

lively-jackal-83760
11/03/2021, 1:26 PM

faint-hair-91313
11/03/2021, 4:22 PM

high-hospital-85984
11/03/2021, 5:27 PM
I see c.l.m.r.entity.EntityResource - INGEST urn urn:li:corpuser:test-user with system metadata {lastObserved=1635959902784}
in the GMS log, and it shows up in search and in the UI, but I don't see it in the GMS database (select * from metadata_aspect_v2). The DB only contains the default urn:li:corpuser:datahub.
Am I missing something? 😅

plain-farmer-27314
11/04/2021, 7:30 PM