# troubleshoot
r
Hi! I have a question about the "Compress lineage" feature. The lineage of parent is attached. Two datasets, test-primary and test-secondary, have parent as their upstream. test-primary and test-secondary are siblings. Should they be collapsed into one node when compress lineage is enabled? Entity info is in the thread.
parent
{
  "value": {
    "com.linkedin.metadata.snapshot.DatasetSnapshot": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:mssql,parent,PROD)",
      "aspects": [
        {
          "com.linkedin.metadata.key.DatasetKey": {
            "origin": "PROD",
            "name": "parent",
            "platform": "urn:li:dataPlatform:mssql"
          }
        },
        {
          "com.linkedin.common.DataPlatformInstance": {
            "platform": "urn:li:dataPlatform:mssql"
          }
        },
        {
          "com.linkedin.common.BrowsePaths": {
            "paths": [
              "/prod/mssql/parent"
            ]
          }
        }
      ]
    }
  }
}
test-primary
{
  "value": {
    "com.linkedin.metadata.snapshot.DatasetSnapshot": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:mssql,test-primary,PROD)",
      "aspects": [
        {
          "com.linkedin.metadata.key.DatasetKey": {
            "origin": "PROD",
            "name": "test-primary",
            "platform": "urn:li:dataPlatform:mssql"
          }
        },
        {
          "com.linkedin.dataset.UpstreamLineage": {
            "upstreams": [
              {
                "type": "COPY",
                "dataset": "urn:li:dataset:(urn:li:dataPlatform:mssql,parent,PROD)"
              }
            ]
          }
        },
        {
          "com.linkedin.common.DataPlatformInstance": {
            "platform": "urn:li:dataPlatform:mssql"
          }
        },
        {
          "com.linkedin.common.Siblings": {
            "siblings": [
              "urn:li:dataset:(urn:li:dataPlatform:mssql,test-primary,PROD)",
              "urn:li:dataset:(urn:li:dataPlatform:mssql,test-secondary,PROD)"
            ],
            "primary": true
          }
        },
        {
          "com.linkedin.common.BrowsePaths": {
            "paths": [
              "/prod/mssql/test-primary"
            ]
          }
        }
      ]
    }
  }
}
test-secondary
{
  "value": {
    "com.linkedin.metadata.snapshot.DatasetSnapshot": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:mssql,test-secondary,PROD)",
      "aspects": [
        {
          "com.linkedin.metadata.key.DatasetKey": {
            "origin": "PROD",
            "name": "test-secondary",
            "platform": "urn:li:dataPlatform:mssql"
          }
        },
        {
          "com.linkedin.common.BrowsePaths": {
            "paths": [
              "/prod/mssql/test-secondary"
            ]
          }
        },
        {
          "com.linkedin.dataset.UpstreamLineage": {
            "upstreams": [
              {
                "type": "COPY",
                "dataset": "urn:li:dataset:(urn:li:dataPlatform:mssql,parent,PROD)"
              }
            ]
          }
        },
        {
          "com.linkedin.common.Siblings": {
            "siblings": [
              "urn:li:dataset:(urn:li:dataPlatform:mssql,test-primary,PROD)",
              "urn:li:dataset:(urn:li:dataPlatform:mssql,test-secondary,PROD)"
            ],
            "primary": false
          }
        },
        {
          "com.linkedin.common.DataPlatformInstance": {
            "platform": "urn:li:dataPlatform:mssql"
          }
        }
      ]
    }
  }
}
g
Hey there, the siblings urn array should contain only your sibling, not the entity itself.
That may be causing the issue?
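A minimal sketch of that suggestion, assuming the Siblings aspect is handled as a plain Python dict shaped like the JSON above. The helper name strip_self_from_siblings is hypothetical and not part of DataHub:

```python
def strip_self_from_siblings(entity_urn, siblings_aspect):
    """Return a copy of a Siblings aspect dict without the entity's own urn.

    Hypothetical helper for illustration; the input dict is left unchanged.
    """
    cleaned = dict(siblings_aspect)
    cleaned["siblings"] = [
        urn for urn in siblings_aspect["siblings"] if urn != entity_urn
    ]
    return cleaned


PRIMARY = "urn:li:dataset:(urn:li:dataPlatform:mssql,test-primary,PROD)"
SECONDARY = "urn:li:dataset:(urn:li:dataPlatform:mssql,test-secondary,PROD)"

# Aspect as posted for test-primary: it lists itself among its siblings.
aspect = {"siblings": [PRIMARY, SECONDARY], "primary": True}
cleaned = strip_self_from_siblings(PRIMARY, aspect)
```

Here cleaned keeps only the test-secondary urn in its siblings list, and the "primary" flag is carried over untouched.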
r
Hi Gabe! Thank you for the response. I tried removing each entity from its own siblings list, with the same result. I found a possible reason. Here is a lineage GraphQL query like the one the UI sends to GMS (I've omitted most of the informational fields):
{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:mssql,parent,PROD)") {
    name
    downstream: lineage(
      input: {direction: DOWNSTREAM, start: 0, count: 100, separateSiblings: false}
    ) {
      relationships {
        type
        entity {
          urn
        }
      }
    }
  }
}
Please note that the urn is parent, which has no siblings. The request is handled in SiblingGraphService#getLineage. Thus, at L56, siblingAspectOfEntity appears to be null, and no lineage nodes (test-primary and test-secondary) are merged. I suppose siblings should be fetched not only for the requested entity, but for its lineage as well.
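The idea of looking up siblings for every node in the lineage result, rather than only for the requested entity, could be sketched roughly as follows. This is an illustration in Python, not the actual SiblingGraphService code: the function name merge_sibling_nodes, the siblings_by_urn lookup, and the extra "other" dataset are all assumptions made for the example.

```python
def merge_sibling_nodes(lineage_urns, siblings_by_urn):
    """Collapse lineage nodes that are siblings of one another into one node.

    lineage_urns: urns returned by the lineage query, in order.
    siblings_by_urn: maps a urn to its Siblings aspect dict
        ({"siblings": [...], "primary": bool}); entities without
        the aspect are simply absent from the mapping.
    """
    in_lineage = set(lineage_urns)
    merged, seen = [], set()
    for urn in lineage_urns:
        if urn in seen:
            continue
        aspect = siblings_by_urn.get(urn) or {}
        # Only siblings that also appear in this lineage result are folded in.
        group = ({urn} | set(aspect.get("siblings", []))) & in_lineage
        # Prefer the sibling marked primary as the node that represents the group.
        primaries = [
            member for member in sorted(group)
            if siblings_by_urn.get(member, {}).get("primary")
        ]
        merged.append(primaries[0] if primaries else urn)
        seen |= group
    return merged


PRIMARY = "urn:li:dataset:(urn:li:dataPlatform:mssql,test-primary,PROD)"
SECONDARY = "urn:li:dataset:(urn:li:dataPlatform:mssql,test-secondary,PROD)"
OTHER = "urn:li:dataset:(urn:li:dataPlatform:mssql,other,PROD)"  # hypothetical

siblings_by_urn = {
    PRIMARY: {"siblings": [SECONDARY], "primary": True},
    SECONDARY: {"siblings": [PRIMARY], "primary": False},
}
downstream = merge_sibling_nodes([SECONDARY, OTHER, PRIMARY], siblings_by_urn)
```

With this input, test-primary and test-secondary collapse into a single node represented by the primary, while the unrelated dataset stays separate.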
g
I see, I will look into this.
Generally, the siblings lineage logic was designed for siblings that are upstream/downstream of one another
rather than side by side,
which may explain why you are seeing this strange behavior.
out of curiosity, what is the reason you have two sibling datasets that both have the same parent but do not have lineage to one another? why do you consider these entities siblings?
r
Actually, we are modeling data flow with lineage. Here a service reads data from parent and writes to test-primary and test-secondary, but there is no data flow between primary and secondary. These tables are an HA cluster.
Hi @green-football-43791! Just wondering if somebody is working on this issue? If not, I can try to fix it and contribute
r
Hey, I have the same use case. Is there any solution?
g
What’s your use case exactly?
r
To combine multiple datasets as a node in the lineage visualization. I also tried adding the datasets as siblings, but they are displayed as separate nodes in the visualization.
g
Which version are you on?
r
v0.10.2 @green-football-43791