# all-things-deployment
b
A short question: is it correct that there is a hard limit of 10 000 on the number of results you can get from a search query? I've run into strange errors when sending search requests via the GraphQL API programmatically: code 500 SERVER_ERROR, classification "DataFetchingException", as soon as I set the "start" parameter in GraphQL search queries to values >= 10 000.
b
cc @early-lamp-41924 Is this part of those elastic limits?
e
yes. elasticsearch puts a limit on number of search results returned
it just doesn’t handle it
b
Ah, OK. Thanks for the info. So this is not a DataHub setting/issue but something I have to configure in elasticsearch (or find another approach, because the elastic documentation discourages increasing the limit, as I have just read ...).
b
Yes 😞 What are you hoping to achieve, Uwe?
b
Well, the use case was quite simple: iterate over all datasets of a DataHub installation and perform a set of actions (add links, remove tags, etc.). The logic to perform the actions was quite complex, so I wrote a small Python script which essentially performed a "search *, limited to datasets" search via the GraphQL interface and looped over the results. The catalog contains some quite large DBs, and therefore the search query ran into the 10 000 results limit.
At the moment I've solved the problem by creating smaller artificial "groups" (adding additional constraints to the search query to limit the expected number of results). But as soon as one of my groups grows beyond 10 000 results the problem will appear again. So, not an ideal and especially not a long-term solution.
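For reference, the paging loop described above could look roughly like the sketch below. It is not the original script: `run_search` is a hypothetical callable that is expected to POST the GraphQL document to DataHub and return the `search` object from the response, and the query shape (`search(input: ...)` with `start`, `count`, `total`, `searchResults`) is an assumption about the DataHub GraphQL schema. The 10 000 window matches the limit discussed in this thread. The loop fails loudly before it would request past the window, instead of hitting the 500 error.

```python
from typing import Any, Callable, Dict, Iterator

# Window size discussed in this thread (Elasticsearch result window).
ES_MAX_RESULT_WINDOW = 10_000

# GraphQL document; the field names are an assumption about DataHub's schema.
SEARCH_QUERY = """
query search($input: SearchInput!) {
  search(input: $input) {
    start
    count
    total
    searchResults { entity { urn } }
  }
}
"""


def search_variables(start: int, count: int, query: str = "*") -> Dict[str, Any]:
    """Build the variables for one page of a dataset-only search."""
    return {"input": {"type": "DATASET", "query": query, "start": start, "count": count}}


def iter_urns(
    run_search: Callable[[int, int], Dict[str, Any]],  # hypothetical transport
    page_size: int = 100,
) -> Iterator[str]:
    """Yield URNs page by page; raise before paging past the result window."""
    start = 0
    while True:
        if start + page_size > ES_MAX_RESULT_WINDOW:
            raise RuntimeError(
                f"start={start} would exceed the {ES_MAX_RESULT_WINDOW} result window; "
                "narrow the query (e.g. split it into smaller groups)"
            )
        page = run_search(start, page_size)
        results = page["searchResults"]
        for result in results:
            yield result["entity"]["urn"]
        start += len(results)
        # Stop on an empty page or once every reported result has been seen.
        if not results or start >= page["total"]:
            return
```

Running one such loop per "group" keeps the workaround explicit: a group that grows past the window raises a clear error rather than a `DataFetchingException`.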
e
Right. Seems like we should have the ability to scroll through the whole list. We could use the elasticsearch scroll api for this but that also has some limitations like there can only be 500 concurrent scrolls at any time
The other way would be to do a full scan of the mysql table
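A minimal sketch of what the scroll-based option might look like against Elasticsearch directly, using the REST scroll endpoints (`POST /<index>/_search?scroll=...`, `POST /_search/scroll`, `DELETE /_search/scroll`). This is not something DataHub exposes in this form: `request` is a hypothetical transport callable (method, path, JSON body) and the index name is whatever the caller passes in. Clearing the scroll in `finally` matters here because open scroll contexts count against the concurrent-scroll limit mentioned above.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Request = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_hits(
    request: Request,
    index: str,
    page_size: int = 1000,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit in `index` via the scroll API, then free the scroll context."""
    body = {"size": page_size, "query": {"match_all": {}}, "sort": ["_doc"]}
    resp = request("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            # Fetch the next batch; the scroll id may change between calls.
            resp = request(
                "POST", "/_search/scroll", {"scroll": keep_alive, "scroll_id": scroll_id}
            )
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the context so it does not count against the open-scroll limit.
        if scroll_id:
            request("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Passing the transport in keeps the sketch independent of any particular HTTP or Elasticsearch client library.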
b
You can currently do that with the "entities" "listUrns" entity, but we don't publicize this as a public API.
e
But that uses elasticsearch, right? So it has the same limitation