bitter-lizard-32293  09/09/2022, 8:00 PM
We're using the searchAcrossLineage graphQL query and we've been seeing super high latencies (> 10s) when executing it. We spent some time digging into things and it looks like we're spending the bulk of our time in the getLineage call in the ESGraphQueryDao class (as we use ES as our graph store too). I did find one minor bug in that the search lineage results were meant to be cached but that is actually not being done - https://github.com/datahub-project/datahub/pull/5892. This does help with repeated calls for the same URN, but first-time calls are still taking a while. Does anyone have any recommendations on how we could tune / speed things up here?
Ballpark-wise, our graph_service_v1 index has around 36M docs (4.8GB on disk) and is currently running 1 shard and 1 replica (I wonder if this is too low).
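(A minimal sketch, assuming the Elasticsearch 7.x high-level REST client, of how the replica count on graph_service_v1 could be raised. The value 2 is only an illustrative choice; extra replicas mainly help concurrent query throughput, while the shard count is fixed at index creation and can only be changed via the split API or a reindex.)

import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

public class GraphIndexSettingsSketch {
  // Raise the replica count on the graph index. "index.number_of_replicas" is a dynamic
  // setting, so it can be applied to a live index without a reindex.
  static void increaseReplicas(RestHighLevelClient client) throws Exception {
    UpdateSettingsRequest request = new UpdateSettingsRequest("graph_service_v1");
    request.settings(Settings.builder().put("index.number_of_replicas", 2)); // 2 is an arbitrary example
    client.indices().putSettings(request, RequestOptions.DEFAULT);
  }
}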
little-megabyte-1074
orange-night-91387  09/12/2022, 7:52 PM
bitter-lizard-32293  09/12/2022, 8:24 PM
query getDatasetUpstreams($urn: String!) {
  searchAcrossLineage(input: { urn: $urn, direction: UPSTREAM, count: 1000, types: DATA_JOB }) {
    total
    searchResults {
      degree
      entity {
        type
        urn
      }
    }
  }
}
Running that query for one of our entities, I see degree going up to 5. The total number of results is 17:
{
  "data": {
    "searchAcrossLineage": {
      "total": 17,
      "searchResults": [
        {
          "degree": 3,
          ...
bitter-lizard-32293  09/12/2022, 8:25 PM
bitter-lizard-32293  09/12/2022, 8:38 PM
orange-night-91387  09/12/2022, 9:10 PM
orange-night-91387  09/12/2022, 9:11 PM
bitter-lizard-32293  09/12/2022, 9:11 PM
bitter-lizard-32293  09/12/2022, 9:12 PM
> Neo4J vs Elastic also has significant implications on ingestion time from my experience
Interesting, is neo a lot slower than ES?
bitter-lizard-32293  09/12/2022, 9:16 PM
> We're generally seeing a few seconds as the max
So it's likely the graphs we're hitting have a larger / different shape that is making things a little slower. But given you have seen O(seconds), it doesn't seem like we're too far off. Are there any plans to try to optimize this a bit? I'm waiting on one of our infra teams to give us the go-ahead to gather OpenTelemetry traces in Honeycomb in prod, but in QA I did see that the BFS across n hops was a bit slow (partly because we can't push down entity-type predicates to ES as we go hop by hop).
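(A rough illustrative sketch of the hop-by-hop pattern being described, not DataHub's actual ESGraphQueryDao code: each hop is a separate ES round trip over the edge index, and the requested entity-type filter can only be applied to the final result set, since intermediate hops may traverse entities of other types. The index and field names (graph_service_v1, source.urn, destination.urn) and the edge direction are assumptions for illustration.)

import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class LineageBfsSketch {
  // Breadth-first walk upstream: the URNs discovered at hop N become the filter for hop N+1,
  // assuming edge documents point from a downstream "source" to its upstream "destination".
  static Set<String> upstreamUrns(RestHighLevelClient client, String startUrn, int maxHops) throws Exception {
    Set<String> visited = new HashSet<>(Set.of(startUrn));
    Set<String> frontier = Set.of(startUrn);
    Set<String> collected = new LinkedHashSet<>();
    for (int hop = 1; hop <= maxHops && !frontier.isEmpty(); hop++) {
      SearchRequest request = new SearchRequest("graph_service_v1");
      request.source(new SearchSourceBuilder()
          .query(QueryBuilders.termsQuery("source.urn", frontier)) // one ES query per hop
          .size(10_000));
      SearchResponse response = client.search(request, RequestOptions.DEFAULT);
      Set<String> next = new HashSet<>();
      for (SearchHit hit : response.getHits()) {
        @SuppressWarnings("unchecked")
        Map<String, Object> destination = (Map<String, Object>) hit.getSourceAsMap().get("destination");
        String upstreamUrn = (String) destination.get("urn");
        if (visited.add(upstreamUrn)) {
          next.add(upstreamUrn);
          collected.add(upstreamUrn); // entity-type filtering (e.g. DATA_JOB) happens on this set afterwards
        }
      }
      frontier = next;
    }
    return collected;
  }
}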
orange-night-91387  09/12/2022, 9:21 PM
bitter-lizard-32293  09/12/2022, 9:31 PM
bitter-lizard-32293  09/12/2022, 9:31 PM
bitter-lizard-32293  09/13/2022, 5:47 PM
orange-night-91387  09/13/2022, 5:53 PM
bitter-lizard-32293  09/13/2022, 6:18 PM