Powered by Linen
column-level-lineage
  • a

    acceptable-architect-70237

    08/10/2020, 6:05 PM
    for this point
    SchemaField  -> SchemaField to describe metadata related to dataset schema.  This can be expanded to a new entity DatasetField which represents a field in the schema of the dataset.
    - what would it look like?
    a
    • 2
    • 26
  • b

    brave-appointment-76997

    03/31/2021, 11:05 AM
    hello there, is this feature already implemented?
    m
    • 2
    • 1
  • m

    miniature-eye-9764

    08/24/2021, 10:43 PM
    We at Datafold just opened up a GraphQL API that enables exporting the column-level lineage graph, which Datafold builds by analyzing SQL query logs, into DataHub! https://www.datafold.com/column-level-lineage
    👍 1
    a
    • 2
    • 1
  • m

    millions-engineer-56536

    09/16/2021, 7:46 PM
    We are interested in feeding DataHub with column level lineage... Most of our processing is done by Spark and we can "intercept" execution plans and obtain lineage data from there. Would be nice to get some direction for this
    👍 1
    l
    b
    +8
    • 11
    • 22
  • n

    numerous-camera-74294

    03/22/2022, 2:55 PM
    Hi folks! So excited about this feature being already implemented, love it! Is there a screenshot of how it is displayed on the frontend? Or is it just a metadata implementation for the moment?
    m
    • 2
    • 4
  • r

    rhythmic-stone-77840

    04/08/2022, 9:08 PM
    Super excited to see the Column-Level Lineage feature be released!! I was following up on the roadmap task and noticed that BigQuery had a note saying it was removed from scope due to limited native support. Wondering if there's any other information around what exactly was the issue (couldn't find anything through slack searching)? Also wondering if we need to re-request BigQuery Column Lineage as a feature now that the broader task is closed. (@little-megabyte-1074)
    l
    l
    • 3
    • 5
  • g

    gentle-father-80172

    04/13/2022, 7:38 PM
    Hey quick question! Are there any plans for automatic column level lineage association during ingestion?
    l
    f
    • 3
    • 3
  • l

    little-megabyte-1074

    05/04/2022, 9:29 PM
    Hi folks! 👋🏻 tl;dr: The Core DataHub Team is indefinitely pausing work on automatically extracting column-level lineage, but we eagerly welcome Community-Led development in this area!!
    In early Q2 we had tentative plans to attempt to automatically extract column-level lineage from dbt and Looker sources, based on an assumption that it would reliably yield high-accuracy results. After digging deeper into these tools & all of the various ways that teams within the DataHub Community implement them, we are not confident that we can meaningfully and reliably generate column-level lineage without deeply investing in solving SQL parsing. The Core DataHub Team is going to hold off on this investment for now; we’re a small team and want to stay on track with our aggressive roadmap!
    Are you passionate about solving column-level lineage?? We are always eager to welcome and support Community-led contributions. If you’re interested in contributing back, let me know & we will do everything we can to support you!! :teamwork:
    s
    r
    • 3
    • 4
  • m

    miniature-television-17996

    05/17/2022, 3:31 PM
    Hello @little-megabyte-1074! Our team is looking for a tool for column-level data lineage. We see it as a new entity, "column". We are ready to work on your tickets in this direction (contribute). Are you interested?
    l
    • 2
    • 1
  • h

    handsome-football-66174

    06/07/2022, 2:50 PM
    @little-megabyte-1074 - Maggie, I wanted to check which sources are currently supported for column-level lineage. The roadmap item at https://feature-requests.datahubproject.io/roadmap/541 mentions dbt, Looker and BigQuery.
    l
    • 2
    • 1
  • w

    wooden-chef-22394

    06/24/2022, 1:48 AM
    Hi, when will the column-level lineage UI be released? Looking forward to this feature 😀
    l
    • 2
    • 1
  • a

    average-rocket-98592

    08/11/2022, 8:28 AM
    Hi! I’m adding fine-grained lineage to a dataset through the Python SDK. Is there a way to emit to environments other than PROD? Thanks in advance for your help!
    g
    • 2
    • 1
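    Editor's sketch for the question above, under stated assumptions: the environment is the third component of a dataset URN, so lineage for a non-PROD dataset is addressed by building the URNs with a different environment value. The two functions below are local stand-ins that only reproduce the URN layout; in a real script one would use `make_dataset_urn` from `datahub.emitter.mce_builder`, which accepts an `env` argument. The platform `postgres` and table `bar` are placeholders.

```python
# Local stand-ins that reproduce the DataHub URN layout (assumption: these
# mirror the SDK helpers of the same name; they are not the SDK itself).

def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN; the environment is the last component."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def make_schema_field_urn(parent_urn: str, field_path: str) -> str:
    """Build a schemaField URN for a column of the given dataset."""
    return f"urn:li:schemaField:({parent_urn},{field_path})"


# Placeholder platform and table names, chosen only for illustration.
dev_table = make_dataset_urn("postgres", "bar", env="DEV")
dev_field = make_schema_field_urn(dev_table, "c1")

print(dev_table)
print(dev_field)
```

    Field URNs embed the dataset URN, so the environment chosen for the dataset carries over to every column referenced in the fine-grained lineage.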
  • r

    rhythmic-stone-77840

    09/06/2022, 3:01 PM
    Hi! Wondering when we'll be able to see column-level lineage within the DataHub UI? I can't find it on the roadmap anymore.
    l
    m
    +3
    • 6
    • 8
  • r

    rhythmic-stone-77840

    09/06/2022, 3:02 PM
    Also wondering if the current fine-grained info can be searched through graphQL right now?
  • h

    high-hospital-85984

    10/14/2022, 7:55 AM
    Congrats on shipping 0.9! Really exciting to see the column level lineage in the UI! Quick question about Looker: will we be able to trace columns in Views back to, say, tables in Snowflake, or does the Looker source only create the relationship between Views and Explores? Also, has the Looker ingestion job feeding the demo site enabled this feature? I was not able to find an example for it.
    l
    m
    • 3
    • 4
  • d

    damp-school-97151

    10/18/2022, 11:40 PM
    hi! Wondering if there's an example to ingest column-level lineage by file like this https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/file_lineage.yml? Is it supported yet?
  • s

    some-chef-85850

    11/15/2022, 9:49 AM
    Hi everyone! I have been trying out the column level lineage and while doing so I have encountered a few unexpected behaviours that I hope someone could help me understand. The behaviours are:
    1. When trying to create a lineage between the FineGrainedLineageUpstreamTypeClass "DATASET" and the FineGrainedLineageDownstreamTypeClass "FIELD", as seen in the fourth and fifth examples here, the lineage won't show up in the UI, whether it's in the lineage graph or the lineage tab. When I recreate the example exactly, rather than using my own code, I get the same result.
    2. I can partially get around the problem in 1. by adding a single schema field to the upstream dataset and creating a lineage between that and the corresponding field in the downstream table. However, that only makes the lineage visible in the lineage tab and not in the lineage graph with "Show Columns" activated. To make it visible there I seem to have to also add a "DATASET" to "DATASET" lineage. Then both that and the "FIELD" to "FIELD" lineage show up.
    3. The table I am interested in adding column level lineage to has over 3000 columns, and as such I deemed it acceptable to not show the "FIELD" to "FIELD" lineage in the lineage graph, since that would probably be very messy. So I wrote a script and used the Python Emitter to emit the MCPW with the upstreamLineage aspect containing some regular upstreams as well as the list of 3000 or so fineGrainedLineages. However, none of it shows up in the UI.
    4. Related to 3, I have now deleted the column level lineages and re-emitted a lower number of column lineages, and deleted the table as a whole and reingested it, back and forth. As of now I can verify that the table's upstreamLineage aspect is empty before running the script again, but when I do so, the UI seems to somehow retain some older ingestions, since column lineages that should not be there suddenly show up in the lineage tab.
    Has anyone else encountered similar behaviours? Or even better, does anyone have any insight into the reason for these behaviours?
    b
    • 2
    • 13
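    Editor's sketch relating to point 2 above, under stated assumptions: the poster observed that field-level lineage only appeared in the graph once a matching dataset-to-dataset upstream was also present. Whether the UI strictly requires this is not confirmed in the thread. The hypothetical helpers below (`dataset_of`, `missing_dataset_upstreams`, `ds`, `fld`) use plain strings to list the datasets referenced by field-level upstreams that have no dataset-level upstream entry; platform and table names are placeholders.

```python
def dataset_of(field_urn: str) -> str:
    """Extract the parent dataset URN from a schemaField URN string."""
    prefix = "urn:li:schemaField:("
    if not field_urn.startswith(prefix) or not field_urn.endswith(")"):
        raise ValueError(f"not a schemaField URN: {field_urn}")
    inner = field_urn[len(prefix):-1]
    # The dataset URN ends at its closing parenthesis; the field path follows.
    head, sep, _field_path = inner.partition("),")
    if not sep:
        raise ValueError(f"no dataset URN found in: {field_urn}")
    return head + ")"


def missing_dataset_upstreams(field_upstreams, dataset_upstreams):
    """Datasets used by field-level upstreams but absent at dataset level."""
    needed = {dataset_of(urn) for urn in field_upstreams}
    return sorted(needed - set(dataset_upstreams))


def ds(table: str) -> str:
    # Placeholder platform and environment, for illustration only.
    return f"urn:li:dataset:(urn:li:dataPlatform:postgres,{table},PROD)"


def fld(table: str, column: str) -> str:
    return f"urn:li:schemaField:({ds(table)},{column})"


field_upstreams = [fld("bar2", "c1"), fld("bar4", "c1"), fld("bar3", "c2")]
declared = [ds("bar2")]
missing = missing_dataset_upstreams(field_upstreams, declared)
print(missing)
```

    Running a check like this before emitting makes it easy to see which dataset-level upstreams one might add while testing the workaround described in point 2.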
  • a

    acceptable-terabyte-34789

    11/22/2022, 8:35 AM
    If we can't see the transformOperation in the UI yet, is there any place to check whether we're emitting transformOperation correctly? For example via GraphQL?
    a
    b
    • 3
    • 2
  • f

    full-helicopter-95955

    11/30/2022, 6:14 PM
    Hello! I am trying to create column level lineage based on this example, and I have been unable to see the column level lineage when I click on the 'Show Column' toggle, as seen in the screenshot. For reference, below is the script I used while trying to generate column level lineage. Can someone please help me figure out what I am missing here?
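    import datahub.emitter.mce_builder as builder
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.com.linkedin.pegasus2avro.dataset import (
        DatasetLineageType,
        FineGrainedLineage,
        FineGrainedLineageDownstreamType,
        FineGrainedLineageUpstreamType,
        Upstream,
        UpstreamLineage,
    )
    from datahub.metadata.schema_classes import ChangeTypeClass

    # Assumption: the post did not show these two helpers. They are sketched here
    # after the DataHub fine-grained lineage example; "postgres" is a placeholder platform.
    def datasetUrn(tbl):
        return builder.make_dataset_urn("postgres", tbl)

    def fldUrn(tbl, fld):
        return builder.make_schema_field_urn(datasetUrn(tbl), fld)

    def test_Lineage():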
            fineGrainedLineages = [
                FineGrainedLineage(
                    upstreamType=FineGrainedLineageUpstreamType.FIELD_SET,
                    upstreams=[fldUrn("bar2", "c1"), fldUrn("bar4", "c1")],
                    downstreamType=FineGrainedLineageDownstreamType.FIELD,
                    downstreams=[fldUrn("bar", "c1")],
                ),
                FineGrainedLineage(
                    upstreamType=FineGrainedLineageUpstreamType.FIELD_SET,
                    upstreams=[fldUrn("bar3", "c2")],
                    downstreamType=FineGrainedLineageDownstreamType.FIELD,
                    downstreams=[fldUrn("bar", "c2")],
                    confidenceScore=0.8,
                    transformOperation="myfunc",
                ),
                FineGrainedLineage(
                    upstreamType=FineGrainedLineageUpstreamType.FIELD_SET,
                    upstreams=[fldUrn("bar2", "c2"), fldUrn("bar2", "c3"), fldUrn("bar3", "c1")],
                    downstreamType=FineGrainedLineageDownstreamType.FIELD_SET,
                    downstreams=[fldUrn("bar", "c3"), fldUrn("bar", "c4")],
                    confidenceScore=0.7,
                ),
                FineGrainedLineage(
                    upstreamType=FineGrainedLineageUpstreamType.DATASET,
                    upstreams=[datasetUrn("bar3")],
                    downstreamType=FineGrainedLineageDownstreamType.FIELD,
                    downstreams=[fldUrn("bar", "c5")],
                ),
                FineGrainedLineage(
                    upstreamType=FineGrainedLineageUpstreamType.DATASET,
                    upstreams=[datasetUrn("bar4")],
                    downstreamType=FineGrainedLineageDownstreamType.FIELD_SET,
                    downstreams=[fldUrn("bar", "c6"), fldUrn("bar", "c7")],
                )]
            upstream = Upstream(dataset=datasetUrn("bar2"), type=DatasetLineageType.TRANSFORMED)
    
            fieldLineages = UpstreamLineage(
                upstreams=[upstream], fineGrainedLineages=fineGrainedLineages
            )
    
            lineageMcp = MetadataChangeProposalWrapper(
                entityType="dataset",
                changeType=ChangeTypeClass.UPSERT,
                entityUrn=datasetUrn("bar"),
                aspectName="upstreamLineage",
                aspect=fieldLineages,
            )
    
            emitter = DatahubRestEmitter("http://localhost:8080")
    
            # Emit metadata!
            emitter.emit_mcp(lineageMcp)
    a
    b
    • 3
    • 4
  • f

    future-iron-16086

    12/21/2022, 6:27 PM
    Hello. What method can I use to delete/remove lineage through the Python emitter?
    h
    • 2
    • 15
  • c

    crooked-coat-48406

    01/31/2023, 5:24 PM
    Hello all. I just started exploring DataHub and one of the key features I am looking for is column level lineage. We are using BigQuery. Does DataHub support this feature on BigQuery yet? When I ingested metadata from BigQuery I did not see any column level lineage.
    d
    • 2
    • 2
  • w

    white-horse-97256

    02/10/2023, 10:28 PM
    Hi, is column-level lineage available for MySQL databases?
    b
    • 2
    • 2
  • g

    green-lock-62163

    02/15/2023, 1:42 PM
    Hi, I am searching for an example of Datastore & Column Lineage by file ingestion. Is this an existing feature, or do we have to stick to programmatic expression of column level lineage?
  • m

    miniature-xylophone-2277

    03/01/2023, 7:29 PM
    Hi, is column-level lineage available for BQ (BigQuery) databases?
    d
    • 2
    • 1
  • m

    melodic-guitar-21477

    03/07/2023, 11:42 PM
    @astonishing-answer-96712 added a workflow to this channel: Community Support Bot.
  • g

    green-lock-62163

    03/16/2023, 4:30 PM
    When we display column level lineage on a Dataset -> DataJob -> Dataset lineage, the displayed lineage directly connects the (in) Dataset columns with the (out) Dataset columns. In some cases this display is a simplification. Sometimes it would be nice to have the column lineage pass through the DataJob, with a display of the column transformations. Your thoughts?
    l
    • 2
    • 5
  • r

    rhythmic-stone-77840

    03/22/2023, 5:34 PM
    Cross posting this here: https://datahubspace.slack.com/archives/C029A3M079U/p1679499149891749