Thread
#secoda-feature-requests

    happy-scooter-645

    7 months ago
    Hi there 👋 Are there plans to support OpenLineage? I think that could help bring the recent Lineage feature to the next level. If I am not mistaken, Amundsen integrates with it. Thanks!

    limited-solstice-12595

    7 months ago
    What is better about OpenLineage? I haven't explored it in much detail, so it would help to understand why supporting it would be useful.

    happy-scooter-645

    7 months ago
    Hi Etai, different components (typically dbt and Airflow) can publish metadata to a centralized lineage repository, which readers like Amundsen can then read from. That allows the lineage to span multiple systems instead of just one. (At the moment, Secoda inspects query history to see the data flow from one relation to another within a single data warehouse (DWH); correct me if I am wrong.) A complete lineage could go from the source (e.g. an operational database) to a transformation step (e.g. Spark), to a raw table in a DWH, through some intermediate dbt models, all the way to a final dbt model.
    Implementation-wise: you let us users connect our Secoda workspace to a lineage repository to retrieve the lineage information. We users take care of running the lineage repository and feeding metadata into it.
    I am also just getting started with lineage and OpenLineage, so I might be a bit off, but this should be more or less what it's all about.
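    To make that concrete, here is a rough sketch of the idea, not the official OpenLineage client. It builds minimal OpenLineage-style run events as plain dicts (one per job) and stitches their inputs and outputs into cross-system lineage edges. The job names, dataset names, namespace, and producer URL are all made up for illustration, and `build_run_event` and `lineage_edges` are hypothetical helpers. In a real setup, the dbt/Airflow/Spark integrations would emit the events and a repository such as Marquez would store them.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_name, inputs, outputs, namespace="example-namespace"):
    """Build a minimal OpenLineage-style run event as a plain dict.

    Hypothetical helper: only a small subset of the event fields is filled in.
    """
    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        # Placeholder producer URI, assumed for this example.
        "producer": "https://example.com/hypothetical-producer",
    }


def lineage_edges(events):
    """Derive (upstream, downstream) dataset pairs from a list of run events."""
    edges = set()
    for event in events:
        for upstream in event["inputs"]:
            for downstream in event["outputs"]:
                edges.add((upstream["name"], downstream["name"]))
    return sorted(edges)


# Three jobs in three different systems (names are illustrative only).
events = [
    build_run_event("spark_ingest_orders", ["operational_db.orders"], ["dwh.raw_orders"]),
    build_run_event("dbt_stg_orders", ["dwh.raw_orders"], ["dwh.stg_orders"]),
    build_run_event("dbt_fct_orders", ["dwh.stg_orders"], ["dwh.fct_orders"]),
]

edges = lineage_edges(events)
print(json.dumps(events[0], indent=2))
for upstream, downstream in edges:
    print(f"{upstream} -> {downstream}")
```

    The point is that each tool only reports its own inputs and outputs. The end-to-end path from the operational database to the final dbt model only appears once the events are collected in one place.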

    elegant-house-93198

    7 months ago
    Hey Boris, thanks for providing the detailed perspective here. I have a couple of thoughts based on what you’ve said. We’re trying to achieve a similar goal to OpenLineage with Secoda’s lineage feature. Right now, you’re correct that we try to automatically parse lineage from sources such as Redshift by looking at query history. We also stitch in lineage information from other tools, such as dbt and BI tools (e.g. Tableau). This works well, but doesn’t capture all the lineage from the source to the final dbt model. Now that we have released the Secoda Metadata API, it seems like a good use case to extend it to also let users push lineage information to the Secoda API, similar to OpenLineage. That would allow users to capture lineage information that we miss in Secoda. Alternatively, if there are many people who already use OpenLineage, we could treat it similar to dbt’s lineage info where we make an integration for it.

    happy-scooter-645

    7 months ago
    "push lineage information to the Secoda API, similar to OpenLineage"
    Hey Andrew, OpenLineage already seems to be becoming the standard for data lineage collection, and Marquez, the leading implementation of the lineage repository, offers out-of-the-box integrations with popular tools (Spark, dbt, Airflow). That makes it very tempting for data teams to just use this toolset. I don't want to speak for my fellow Secoda customers, but I'd rather use the existing integrations that OpenLineage + Marquez offer than write custom code to push metadata from e.g. Airflow to the Secoda API.
    "Alternatively, if there are many people who already use OpenLineage, we could treat it similar to dbt’s lineage info where we make an integration for it."
    So clearly, I am more interested in this approach. Even if not many people use OpenLineage yet, having an OpenLineage->Secoda integration could encourage many to look into it.