Hi everyone. I see performance problem trying to e...
# troubleshoot
m
Hi everyone. I'm seeing a performance problem when expanding nodes in the lineage graph. We have more than 300 data sources, and each one has a schema with 20 to 3000 fields. Upstream and downstream job nodes are linked to the data sources, and all job nodes are part of the same data flow. When I expand a node, the GraphQL API responses are 20 MB to 30 MB each. Is this a problem with the GraphQL fragment design, or with incorrect modeling (all jobs in one flow, or loops in the graph)? DataHub version 0.8.40.
b
@most-monkey-10812 Which version? We recently did some work to reduce the cost of the lineage-related queries!
m
Hi. It's 0.8.40. Could you please send a reference to the branch or revision number?
@big-carpet-38439 Update: in our DataHub DB we have datasets with up to 3000 fields. The schemaMetadata volume is doubled by the previousSchemaMetadata property. Moreover, the lineage graph may include the same large data source several times. Would it be a good solution to create a nonRecursiveDatasetFieldsWithoutSchemaMetadata fragment and use it in the dataJobFields fragment?
operationName:"getEntityLineage"
fragment nonRecursiveDatasetFields on Dataset {
    schemaMetadata(version: 0) {
        ...schemaMetadataFields
        __typename
    }
    previousSchemaMetadata: schemaMetadata(version: -1) {
        ...schemaMetadataFields
        __typename
    }
}
fragment dataJobFields on DataJob {
    inputOutput {
        inputDatasets {
            ...nonRecursiveDatasetFields
            __typename
        }
        outputDatasets {
            ...nonRecursiveDatasetFields
            __typename
        }
        inputDatajobs {
            ...nonRecursiveDataJobFields
            __typename
        }
        __typename
    }
}
It is referenced from dataJob.graphql and from lineage.graphql.
fragment lineageNodeProperties on EntityWithRelationships {
    urn
    type
    ... on DataJob {
        ...dataJobFields
        editableProperties {
            description
            __typename
        }
        ….
    }
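The proposal above could look roughly like the sketch below. It is only an illustration under stated assumptions: the fragment name comes from the message, but the selections shown (`urn`, `type`, `name`) are placeholders for whatever non-schema fields the real nonRecursiveDatasetFields fragment selects, which the snippets above do not show. The idea is that lineage nodes keep enough dataset identity to be rendered, while the schemaMetadata and previousSchemaMetadata selections, which dominate the payload for datasets with thousands of fields, are left out.

```graphql
# Hypothetical fragment: same intent as nonRecursiveDatasetFields,
# but without the two schemaMetadata selections.
fragment nonRecursiveDatasetFieldsWithoutSchemaMetadata on Dataset {
    # Placeholder selections (assumption); copy the non-schema fields
    # from the real nonRecursiveDatasetFields fragment here.
    urn
    type
    name
    __typename
}

# dataJobFields as quoted above, with only the dataset spreads swapped.
fragment dataJobFields on DataJob {
    inputOutput {
        inputDatasets {
            ...nonRecursiveDatasetFieldsWithoutSchemaMetadata
            __typename
        }
        outputDatasets {
            ...nonRecursiveDatasetFieldsWithoutSchemaMetadata
            __typename
        }
        inputDatajobs {
            ...nonRecursiveDataJobFields
            __typename
        }
        __typename
    }
}
```

With this change, any UI code that reads schemaMetadata from a job's input or output datasets would have to fetch it separately, so it is worth checking where dataJobFields is consumed before swapping the spread.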
b
Let me send a few commits where we improved this.