# troubleshoot
p
Hi everyone, I am struggling with search performance and found something weird. When I call the GraphQL API searchAcrossEntities once with the input parameters "start: 1200, count: 100", GMS sends 91 search requests to Elasticsearch. I expected 13 search requests, not 91, because the batch size is 100 and there are 1230 total search results. Am I missing some configuration needed for this to work correctly? The DataHub version tested was the latest, v0.10.2.
GMS log
2023-05-09 15:13:26,155 [ForkJoinPool.commonPool-worker-201] DEBUG c.l.metadata.search.SearchService - Searching Search documents entities: [dataset], input: 2nd, postFilters: null, sortCriterion: null, from: 1200, size: 100
2023-05-09 15:13:26,156 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
2023-05-09 15:13:27,191 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
2023-05-09 15:13:28,225 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 100, size: 100
2023-05-09 15:13:29,010 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
2023-05-09 15:13:29,764 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 100, size: 100
2023-05-09 15:13:30,549 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 200, size: 100
2023-05-09 15:13:31,417 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
2023-05-09 15:13:32,378 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 100, size: 100
2023-05-09 15:13:33,462 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 200, size: 100
2023-05-09 15:13:34,105 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 300, size: 100
---------------<snip>---------------
2023-05-09 15:13:58,505 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 1100, size: 100
2023-05-09 15:13:58,586 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 0, size: 100
2023-05-09 15:13:59,617 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 100, size: 100
2023-05-09 15:14:00,402 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 200, size: 100
2023-05-09 15:14:01,266 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 300, size: 100
2023-05-09 15:14:01,430 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 400, size: 100
2023-05-09 15:14:01,515 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 500, size: 100
2023-05-09 15:14:01,585 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 600, size: 100
2023-05-09 15:14:01,664 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 700, size: 100
2023-05-09 15:14:01,736 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 800, size: 100
2023-05-09 15:14:01,814 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 900, size: 100
2023-05-09 15:14:01,883 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 1000, size: 100
2023-05-09 15:14:01,973 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 1100, size: 100
2023-05-09 15:14:02,056 [ForkJoinPool.commonPool-worker-87] DEBUG c.l.m.s.e.ElasticSearchService - Searching FullText Search documents entityName: dataset, input: 2nd, postFilters: null, sortCriterion: null, from: 1200, size: 100
GraphQL parameter
{
  "input": {
    "types": [
      "DATASET"
    ],
    "query": "2nd",
    "start": 1200,
    "count": 100,
    "orFilters": []
  }
}
GraphQL query
query getSearch($input:SearchAcrossEntitiesInput!){
  searchAcrossEntities(input:$input){
    total
    count
    searchResults{
      entity{
        urn
        type
        ... on Dataset{
          urn
          name
        }
      }
      matchedFields {
        name
      }
    }
  }
}
šŸ” 1
šŸ“– 1
l
Hey there 👋 I'm The DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double check a few things first:
✅ There's a lot of good information on our docs site: www.datahubproject.io/docs. Have you searched there for a solution? ✅
✅ It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues? ✅
Did you find a solution to your issue? ❌
Sorry you weren't able to find a solution. I'm sending you some tips on info you can provide to help the community troubleshoot. Whenever you feel your issue is solved, please react ✅ to your original message to let us know!
d
@brainy-tent-14503 might be able to speak to this 🙂
a
Each entity is a separate index, and currently a separate search is done on each index, so 7 entities are being searched. The plan is to eventually execute a single search against all indices at once.
p
Thanks for the answer 🙂 We have decided that 'query search' is more appropriate for us than 'query searchAcrossEntities', since we will provide search for only one entity type. Even when 'query searchAcrossEntities' is asked to search only one entity type, it still behaves as if it were combining the results of multiple entity types to fill the requested page, which makes it inefficient for a single entity type.
Here is the number of times the 'searchAcrossEntities' API makes a search request to Elasticsearch internally:
• Requested pagination size: p
• Batch size: b (default 100)
• Expected search request count: m = p / b
• Actual search request count: s = m(1 + m) / 2
=> I think that when combining search results from multiple entity types, if there are not enough results or no results at all, the search starts again from the beginning, resulting in a considerable number of search requests.
Thank you again for your response.
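For anyone who wants to check the arithmetic, here is a minimal Python sketch of the request-count formula from this thread. It is only a replay of the numbers in the GMS log, not DataHub code. It assumes that p means start + count (1200 + 100 = 1300 here) and that m is rounded up to a whole number of batches. The function names are made up for illustration.

```python
import math
from typing import List


def expected_requests(p: int, b: int = 100) -> int:
    """Batches needed to cover p results: m = p / b (rounded up, an assumption)."""
    return math.ceil(p / b)


def actual_requests(p: int, b: int = 100) -> int:
    """Triangular count from the thread: s = m(1 + m) / 2."""
    m = expected_requests(p, b)
    return m * (1 + m) // 2


def simulated_offsets(p: int, b: int = 100) -> List[int]:
    """Replay the 'from' offsets seen in the log: each pass restarts at 0
    and reads one batch further than the previous pass."""
    m = expected_requests(p, b)
    return [i * b for k in range(1, m + 1) for i in range(k)]


p = 1200 + 100  # start + count from the GraphQL input (assumed meaning of p)
offsets = simulated_offsets(p)

print(expected_requests(p))  # 13
print(actual_requests(p))    # 91
print(offsets[:6])           # [0, 0, 100, 0, 100, 200]
print(offsets[-1])           # 1200
```

The simulated offsets follow the same 0 / 0, 100 / 0, 100, 200 / ... pattern as the "from:" values in the log above, ending at 1200.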