gentle-portugal-21014
01/20/2023, 5:04 PM
@Searchable = {
"/*": {
"fieldType": "KEYWORD",
"addToFilters": true,
"filterNameOverride": "Term Category",
"queryByDefault": true
}
}
termCategory: optional GlossaryTermCategoryEnum
Elsewhere, I did something similar for a boolean attribute (with "fieldType": "BOOLEAN"). Nevertheless, this attribute is not offered for filtering, even though my search finds glossary terms that have these attributes populated. I know that the general mechanism works, because the following change to the Upstream and Downstream aspects of the Dataset entity makes it possible to filter on the related attribute:
/**
* The type of the lineage
*/
@Searchable = {
"fieldType": "KEYWORD",
"addToFilters": true,
"filterNameOverride": "Lineage Type"
}
type: DatasetLineageType
Is the "addToFilters" support restricted to the Dataset entity (and thus not working for Glossary Term)? If so, which file do we need to modify to remove this limitation? Or is it restricted to some predefined list of supported aspects (again, where are those aspects listed in that case)? @bulky-soccer-26729, do you see an obvious issue in the PDL file modification outlined above? Or what else might be the reason?
echoing-airport-49548
01/20/2023, 7:57 PM
echoing-airport-49548
01/20/2023, 7:57 PM
gentle-portugal-21014
01/20/2023, 8:35 PM
echoing-airport-49548
01/23/2023, 8:01 PM
echoing-airport-49548
01/23/2023, 8:01 PM
echoing-airport-49548
01/23/2023, 8:01 PM
gentle-portugal-21014
01/24/2023, 10:05 AM
gentle-portugal-21014
01/30/2023, 9:07 AM
echoing-airport-49548
01/30/2023, 6:38 PM
bulky-soccer-26729
01/31/2023, 4:56 PM
@Searchable = {
"/*": {
...
but the "/*" syntax is used when you're adding an annotation to a list field. Here you have a regular enum value field, so I would make your PDL look like this instead:
@Searchable = {
"fieldType": "KEYWORD",
"addToFilters": true,
"filterNameOverride": "Term Category",
"queryByDefault": true
}
termCategory: optional GlossaryTermCategoryEnum
without the extra syntax for list fields.
gentle-portugal-21014
01/31/2023, 6:45 PM
echoing-airport-49548
01/31/2023, 6:45 PM
gentle-portugal-21014
02/01/2023, 2:51 PM
enum GlossaryTermCategoryEnum {
@stringFormat = "Document"
DOCUMENT
@stringFormat = "ICT"
ICT
@stringFormat = "Role"
ROLE
@stringFormat = "Institution"
INSTITUTION
@stringFormat = "Other"
OTHER
}
Before that change, searching for "ICT" found glossary terms with the "ICT" value in the termCategory field defined above; after the change, the same search no longer works. Maybe the "/*": {} construct isn't appropriate for boolean fields (I had it there as well before the change), but according to my testing it seems to be necessary for enum / KEYWORD attributes...
Moreover, the change did not make "Term Category" available as a filter after a search returning terms with the termCategory attribute populated. Any further ideas? Maybe there really is some limitation on filtering by attributes defined for the Glossary Term entity and/or defined in newly added aspects?
bulky-soccer-26729
02/01/2023, 2:57 PM
"/*" syntax was for array fields. After you made your change and rebuilt GMS, did you ingest new data to test this out on? Any change to Searchable annotations in PDL files is applied only to new data and is not retroactive for existing data.
bulky-soccer-26729
02/01/2023, 2:58 PM
bulky-soccer-26729
02/01/2023, 2:59 PM
gentle-portugal-21014
02/01/2023, 3:40 PM
bulky-soccer-26729
02/01/2023, 3:53 PM
gentle-portugal-21014
02/01/2023, 3:55 PM
gentle-portugal-21014
02/01/2023, 3:56 PM
bulky-soccer-26729
02/01/2023, 3:56 PM
gentle-portugal-21014
02/01/2023, 5:34 PM
bulky-soccer-26729
02/01/2023, 5:35 PM
gentle-portugal-21014
02/01/2023, 5:38 PM
gentle-portugal-21014
02/01/2023, 5:40 PM
17:15:56.801 [ForkJoinPool.commonPool-worker-9] WARN c.l.m.s.e.query.ESSearchDAO:68 - Received 400 from Elasticsearch. Returning empty search response
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:187)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1892)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1869)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1626)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1583)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1553)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:1069)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:60)
at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:100)
at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:97)
at com.linkedin.metadata.search.client.CachingEntitySearchService.getRawSearchResults(CachingEntitySearchService.java:196)
at com.linkedin.metadata.search.client.CachingEntitySearchService.lambda$getCachedSearchResults$0(CachingEntitySearchService.java:117)
at com.linkedin.metadata.search.cache.CacheableSearcher.getBatch(CacheableSearcher.java:103)
at com.linkedin.metadata.search.cache.CacheableSearcher.getSearchResults(CacheableSearcher.java:55)
at com.linkedin.metadata.search.client.CachingEntitySearchService.getCachedSearchResults(CachingEntitySearchService.java:118)
at com.linkedin.metadata.search.client.CachingEntitySearchService.search(CachingEntitySearchService.java:54)
at com.linkedin.metadata.search.aggregator.AllEntitiesSearchAggregator.lambda$getSearchResultsForEachEntity$2(AllEntitiesSearchAggregator.java:161)
at com.linkedin.metadata.utils.ConcurrencyUtils.lambda$transformAndCollectAsync$0(ConcurrencyUtils.java:24)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1692)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://elasticsearch:9200], URI [/glossarytermindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
error=[object Object] error=[object Object] status=400
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:302)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:272)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:246)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1613)
... 21 common frames omitted
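The 400 in the trace above is logged only as `error=[object Object]`, so the underlying Elasticsearch reason is lost. One way to recover it is to query the index directly. The following is a minimal sketch and not part of the DataHub codebase: it builds the URL for Elasticsearch's get-field-mapping API and a search body with a `terms` aggregation on `termCategory`. The host and index name are taken from the trace; the helper names (`mapping_url`, `facet_query`, `call`, `run_checks`) are hypothetical, and it is an assumption here that a filter facet needs the field to be present in the index with a mapping that supports aggregation.

```python
import json
import urllib.error
import urllib.request

ES_URL = "http://elasticsearch:9200"  # host as it appears in the stack trace
INDEX = "glossarytermindex_v2"        # index named in the failing request
FIELD = "termCategory"                # field under discussion in this thread


def mapping_url(base=ES_URL, index=INDEX, field=FIELD):
    """URL of the Elasticsearch get-field-mapping API for a single field."""
    return f"{base}/{index}/_mapping/field/{field}"


def facet_query(field=FIELD, size=20):
    """Search body that counts documents per value of `field` and returns no hits."""
    return {
        "size": 0,
        "query": {"exists": {"field": field}},
        "aggs": {"values": {"terms": {"field": field, "size": size}}},
    }


def call(url, body=None):
    """Send GET (no body) or POST (JSON body); return (status, response text)."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # On 4xx/5xx the response body carries Elasticsearch's own error details.
        return err.code, err.read().decode("utf-8", "replace")


def run_checks():
    """Print the field mapping and the per-value counts (needs network access)."""
    print(call(mapping_url()))
    print(call(f"{ES_URL}/{INDEX}/_search", facet_query()))
```

Running `run_checks()` from a container that can reach `elasticsearch:9200` prints both responses. An empty mapping, or a 400 on the aggregation, would suggest that the field was not indexed the way the annotation intends; the error body returned by `call` should then contain the reason that the GMS log omits.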
gentle-portugal-21014
02/01/2023, 5:43 PM
gentle-portugal-21014
02/01/2023, 5:46 PM
gentle-portugal-21014
02/02/2023, 1:47 PM
"/*" syntax was not necessary.
2. Unfortunately, this doesn't solve the original issue: filtering on those attributes is still not possible despite the "addToFilters": true annotation. I still need to get this problem resolved and have no clue what to do there...
3. On top of that, it seems that reindexing the whole database using the datahub-upgrade Docker image is not possible for forked repositories containing metamodel changes (extensions). As part of my testing, I could reindex the individual glossary term records using the API approach (/aspects?action=restoreIndices on GMS), but the datahub-upgrade Docker image complains about unknown aspects, etc. Please let me know if I should raise this last point in a different channel; it's related to metadata modeling but also to other areas, and I can imagine that other members of your team might need to be involved.
bulky-soccer-26729
02/02/2023, 4:05 PM
termCategory column in your Elasticsearch index? And do the glossary terms you're creating have that field filled out in your database?
3. Okay, yes, this is definitely interesting and something that we should bring up as a separate issue! I would suggest posting in #advice-metadata-modeling, I believe, since, as you said, it's due to modeling changes.
gentle-portugal-21014
02/02/2023, 4:13 PM
gentle-portugal-21014
02/02/2023, 4:14 PM
bulky-soccer-26729
02/02/2023, 4:16 PM
gentle-portugal-21014
02/02/2023, 4:22 PM
bulky-soccer-26729
02/02/2023, 4:36 PM
gentle-portugal-21014
02/02/2023, 5:10 PM
gentle-portugal-21014
02/02/2023, 5:36 PM
gentle-portugal-21014
02/02/2023, 5:41 PM
INFO c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 3ms
and there were just a few other similar lines. I understand that those "omitted frames" might have been interesting, but those weren't written to that log.
gentle-portugal-21014
02/03/2023, 3:39 PM