Is there a way to go about deleting all lineage as...
# troubleshoot
p
Is there a way to go about deleting all lineage aspects that exist?
g
At the moment, our delete API doesn’t have an option to filter by aspect. The best suggestion I can give you is to publish a blank lineage aspect to all entities. This works well for datasets but is a bit trickier for charts, since their upstream lineage is embedded inside the ChartInfo aspect.
Are you just interested in deleting dataset lineage?
here’s an example script that can reset the lineage of a set of dataset urns from a file:
import concurrent.futures
import os
import sys
from typing import Dict, Tuple

from datahub.cli.cli_utils import post_entity

# Requests are I/O-bound, so use more threads than CPU cores.
MAX_WORKERS: int = (os.cpu_count() or 8) * 3


def reset_aspect(urn: str) -> Tuple[str, Dict]:
    """Overwrite the dataset's upstreamLineage aspect with an empty list."""
    status = post_entity(
        urn=urn,
        aspect_name="upstreamLineage",
        entity_type="dataset",
        aspect_value={"upstreams": []},
    )
    return (urn, status)


def reset_upstream_aspect(file_name: str) -> None:
    """Reset lineage for every dataset urn listed (one per line) in file_name."""
    print(f"Using {MAX_WORKERS} workers.")
    with open(file_name, "r") as f:
        # Strip whitespace and surrounding quotes; skip blank lines.
        urns = [line.strip().strip('"') for line in f if line.strip()]
    with concurrent.futures.ThreadPoolExecutor(
        max_workers=MAX_WORKERS
    ) as async_executor:
        reset_futures = [async_executor.submit(reset_aspect, urn) for urn in urns]
        for future in concurrent.futures.as_completed(reset_futures):
            # result() re-raises any exception raised inside reset_aspect.
            urn, status = future.result()
            print(f"Update succeeded for {urn} with status {status}")


if __name__ == "__main__":
    reset_upstream_aspect(sys.argv[1])
which I have used before 😊
p
Appreciate it! Not sure why, but there were serious lineage issues in our prod env that weren't present in staging or local. Even when I ran the same ingestion job from my local env against prod, lineage still wasn't showing where it should. I ended up just reverting our runs and rerunning from local, and now I'm seeing the proper lineage.
I'll keep this script handy for future cases. Thanks!
g
Glad to hear you got things resolved, sorry you got in that state in the first place though 🥴
p
Lol no worries. Just wanted to make 100% sure: there should be no difference between using the datahub CLI tool and the datahub Pipeline class, right?
g
There should be no difference: the CLI uses the Pipeline class under the hood.
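For reference, a minimal sketch of running a recipe through the Pipeline class directly, which should behave like `datahub ingest -c <recipe file>`. The helper names `build_recipe` and `run_recipe`, the folder path, and the server address are all assumptions for illustration; a real LookML recipe will likely need more source options than shown here.

```python
import sys
from typing import Any, Dict


def build_recipe(base_folder: str, server: str) -> Dict[str, Any]:
    """Build a recipe dict with the same shape as a recipe YAML file.

    Both arguments are placeholders; fill in the remaining source options
    your ingestion actually needs.
    """
    return {
        "source": {"type": "lookml", "config": {"base_folder": base_folder}},
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe programmatically instead of through the CLI."""
    # Imported lazily so the recipe can be built and inspected
    # even where the datahub package is not installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    # Raise if the source or sink reported failures.
    pipeline.raise_from_status()


if __name__ == "__main__" and len(sys.argv) > 2:
    # Example: python run_recipe.py ./lookml-repo http://localhost:8080
    run_recipe(build_recipe(sys.argv[1], sys.argv[2]))
```

Since both paths build the same pipeline from the same recipe, a difference in the resulting lineage is more likely to come from the recipe or the environment than from the CLI versus the class.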
p
Still seeing a discrepancy 😕 Ran from my local CLI, once against a local version of datahub (running 0.8.27) and once against prod (running 0.8.26), with the same version of the view file.
g
Hmmmmmm
which is local and which is prod?
is it possible the edge to BQ was there before?
and that's why it's still around, because it was from an earlier run?
p
The one with the edge to BQ is local, the other is prod. I just nuked and reran local this morning, so that edge shouldn't have existed beforehand.
I also ran with the older version of the sql parser just to check, and that didn't add the edge either.
g
understood… so fwiw, the server version shouldn’t be affecting this at all
since the python client is the one producing the edges
the idea that the same client is giving different answers to different servers is very confusing
can you try setting the sink to be a file, and looking through that file to verify the edge is there?
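One way to do that check, sketched under assumptions: the recipe's sink is switched to `{"type": "file", "config": {"filename": "./lineage_check.json"}}` (the filename is a placeholder), and the output is a JSON array of metadata events. The exact event layout varies between versions, so this hypothetical helper just walks each event looking for any `upstreams` list rather than relying on a fixed schema.

```python
import json
import sys
from typing import Any, Iterator, List, Optional, Tuple


def _walk(node: Any) -> Iterator[dict]:
    """Yield every dict nested inside node, also parsing JSON-encoded strings."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)
    elif isinstance(node, str) and node.lstrip().startswith("{"):
        try:
            parsed = json.loads(node)
        except ValueError:
            return
        yield from _walk(parsed)


def find_lineage_edges(events: List[Any]) -> List[Tuple[Optional[str], str]]:
    """Return (downstream_urn, upstream_urn) pairs found in the events."""
    edges: List[Tuple[Optional[str], str]] = []
    for event in events:
        downstream: Optional[str] = None
        upstreams: List[str] = []
        for d in _walk(event):
            # The first urn seen in an event is taken as the entity it describes.
            if downstream is None:
                downstream = d.get("entityUrn") or d.get("urn")
            ups = d.get("upstreams")
            if isinstance(ups, list):
                upstreams.extend(
                    u["dataset"] for u in ups if isinstance(u, dict) and "dataset" in u
                )
        edges.extend((downstream, up) for up in upstreams)
    return edges


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for down, up in find_lineage_edges(json.load(f)):
            print(f"{up} -> {down}")
```

If the expected edge shows up in the file, the client is producing it and the problem lies on the way to, or inside, the server; if it is missing, the difference is on the ingestion side.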
p
let me take a look - im gonna nuke local one more time for good measure
okay nuked and reran just the lookml ingestion, and am seeing the edge as expected in local
@green-football-43791 for the above script you sent, is there an easy way to get the list of every urn in the database?
g
Yes, if you have access to your db you can issue a sql query directly:
select distinct urn from metadata_aspect_v2 where urn like '%your-pattern-here%'
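A rough sketch of turning that query into the one-urn-per-line file the reset script reads. The helper names are hypothetical, `conn` is assumed to be any DB-API connection to the DataHub database, and `placeholder` must match your driver's parameter style (`%s` for common MySQL drivers, `?` for sqlite3).

```python
import sys
from typing import Any, Iterable, List

URN_QUERY = "select distinct urn from metadata_aspect_v2 where urn like {placeholder}"


def fetch_urns(conn: Any, pattern: str, placeholder: str = "%s") -> List[str]:
    """Run the distinct-urn query and return the matching urns, sorted."""
    cur = conn.cursor()
    # The pattern is passed as a bound parameter, not formatted into the SQL.
    cur.execute(URN_QUERY.format(placeholder=placeholder), (pattern,))
    return sorted(row[0] for row in cur.fetchall())


def write_urns(urns: Iterable[str], out_path: str) -> None:
    """Write one urn per line, the format the reset script expects."""
    with open(out_path, "w") as f:
        for urn in urns:
            f.write(urn + "\n")
```

Because the reset script posts with `entity_type="dataset"`, a pattern such as `urn:li:dataset:%` keeps chart and dashboard urns out of the file.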