b

    big-carpet-38439

    1 year ago
    @ambitious-battery-33996 @microscopic-receptionist-23548 Has the team considered introducing fine-grained access control for the Metadata graph? I.e., dictating which identities can read / write which entities + aspects?
    m

    microscopic-receptionist-23548

    1 year ago
    Not so far as I am aware, but I don't have much context / history with the graph. @cool-river-24902 @steep-airplane-62865 might know more
    s

    steep-airplane-62865

    1 year ago
    @big-carpet-38439 At LinkedIn, we have different service accounts defined with different access privileges, such as read-only, admin, etc.
    But these are not fine-grained, and they allow access to the entire metadata graph
    b

    big-carpet-38439

    1 year ago
    Got it
    I think it's difficult to control writes coming from MCE because we would need to trust the upstream systems to validate that each identity is who it claims to be
    s

    steep-airplane-62865

    1 year ago
    We didn't explore fine-grained access to different entities etc., nor have I explored whether Neo4j provides something like that.
    b

    big-carpet-38439

    1 year ago
    Not just Neo4j specifically I mean more broadly. It'd be nice to say "only X can create Datasets"
    "only Y identities can create Dataset Ownership" , etc
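A minimal sketch of the kind of rule being asked for here, assuming a deny-by-default table keyed by (entity, aspect). The `MetadataAcl` class, its method names, and the identities `X`, `Y`, `Z` are hypothetical illustrations, not part of DataHub or GMS:

```python
from typing import Dict, Optional, Set, Tuple


class MetadataAcl:
    """Hypothetical deny-by-default write rules keyed by (entity, aspect)."""

    def __init__(self) -> None:
        # aspect=None stands for "the entity itself", e.g. creating a Dataset.
        self._writers: Dict[Tuple[str, Optional[str]], Set[str]] = {}

    def allow_write(self, identity: str, entity: str,
                    aspect: Optional[str] = None) -> None:
        self._writers.setdefault((entity, aspect), set()).add(identity)

    def can_write(self, identity: str, entity: str,
                  aspect: Optional[str] = None) -> bool:
        # Anything without an explicit rule is rejected.
        return identity in self._writers.get((entity, aspect), set())


acl = MetadataAcl()
acl.allow_write("X", "Dataset")               # "only X can create Datasets"
acl.allow_write("Y", "Dataset", "Ownership")  # "only Y can create Dataset Ownership"

print(acl.can_write("X", "Dataset"))               # allowed by the first rule
print(acl.can_write("X", "Dataset", "Ownership"))  # no rule for X on this aspect
```

The check only helps if the identity passed in can be trusted, which is the upstream-validation concern raised earlier in the thread.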
    s

    steep-airplane-62865

    1 year ago
    Oh now I see what you mean
    We have some in-house ACLs on the Kafka topics as well. So who can emit to a specific Kafka topic is pretty much controlled. (This is not open source)
    b

    big-carpet-38439

    1 year ago
    So that a larger organization can sort of safeguard the system against internal actors.. this type of role-based curation may help guarantee higher quality metadata
    Got it that makes sense. But today anyone who can access that topic can change any part of the graph
    I guess with the introduction of finer-grained topics this may be more viable
    m

    microscopic-receptionist-23548

    1 year ago
    @steep-airplane-62865 that will only work with MXE v5 anyway; for now MCE v4 is a single topic ("monotopic"), so everyone basically needs ACL access to it anyway
    s

    steep-airplane-62865

    1 year ago
    Exactly. I guess MXE v5 will solve that. cc @worried-nightfall-77549
    m

    microscopic-receptionist-23548

    1 year ago
    everyone* being anyone emitting any MCE
    b

    big-carpet-38439

    1 year ago
    This makes sense. Is the idea of MXE5 to have a topic per entity, or even finer grained?
    a

    ambitious-battery-33996

    1 year ago
    per entity aspect
    s

    steep-airplane-62865

    1 year ago
    Agreed @microscopic-receptionist-23548. MCE v4 is like an open gate right now. Once you get access, you can change any part of the graph. We don't have role-based access. Hopefully, MXE v5 will help with that going forward.
    m

    microscopic-receptionist-23548

    1 year ago
    @ambitious-battery-33996 not quite 🙂 it's per aspect of each entity
    this distinction matters for things like ownership, which can be an aspect of multiple entities
    e.g. dataset ownership and metric ownership would be separate topics
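To make the "per aspect of each entity" point concrete, here is a rough sketch of how one topic per (entity, aspect) pair could be derived. The registry contents and the `Entity_Aspect` naming pattern are assumptions for illustration only; the actual MXE v5 topic names are not given in this thread:

```python
from typing import Dict, List

# Hypothetical registry: which entities each aspect is attached to.
ASPECT_ENTITIES: Dict[str, List[str]] = {
    "Ownership": ["Dataset", "Metric"],  # shared aspect, two entities
    "SchemaMetadata": ["Dataset"],       # aspect used by a single entity
}


def topic_name(entity: str, aspect: str) -> str:
    # Assumed naming pattern, not the real MXE v5 convention.
    return f"{entity}_{aspect}"


def all_topics(registry: Dict[str, List[str]]) -> List[str]:
    # One topic per (entity, aspect) pair, so a shared aspect fans out.
    return sorted(
        topic_name(entity, aspect)
        for aspect, entities in registry.items()
        for entity in entities
    )


topics = all_topics(ASPECT_ENTITIES)
print(topics)
```

With a single topic (as in MCE v4) an ACL can only say "may emit" or "may not emit"; with the fan-out above, a Kafka ACL could in principle be attached to each topic separately.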
    a

    ambitious-battery-33996

    1 year ago
    You are correct ... I meant per entity aspect 😛
    b

    big-carpet-38439

    1 year ago
    Ah
    So you could have Dataset_ownership topic and Schema_ownership topics?
    m

    microscopic-receptionist-23548

    1 year ago
    still, in most cases things aren't shared, so it doesn't matter 😉
    something like that, @big-carpet-38439
    though it's more complicated than that even... 😕
    for shared stuff
    a

    ambitious-battery-33996

    1 year ago
    On a different thought ... actually we could think of going even finer than that. For instance, all Espresso ownership can be determined only by Espresso systems, Kafka ownership by Kafka systems, etc.
    m

    microscopic-receptionist-23548

    1 year ago
    PDL doesn't have proper inheritance which causes issues. let me open source a doc on it
    b

    big-carpet-38439

    1 year ago
    Does Avro?
    m

    microscopic-receptionist-23548

    1 year ago
    @ambitious-battery-33996 we didn't just think that, we're going to do that. there's a catch here with legacy things, at least within LI
    externally it may not matter
    (as in we will have a DatasetOwnership_Dataset topic with a specific DatasetOwnership model; Ownership can't really be reused)
    b

    big-carpet-38439

    1 year ago
    That's no fun
    m

    microscopic-receptionist-23548

    1 year ago
    I agree, but it's a PDL issue 😐
    b

    big-carpet-38439

    1 year ago
    Why can't it be reused here? It has specific dataset related attributes?
    m

    microscopic-receptionist-23548

    1 year ago
    it's twofold I guess, and it is quasi-reused
    b

    big-carpet-38439

    1 year ago
    It looks like what is persisted today (mysql) would be just Ownership, not DatasetOwnership
    m

    microscopic-receptionist-23548

    1 year ago
    let me just write a doc on it and you can review it
    b

    big-carpet-38439

    1 year ago
    👍
    m

    microscopic-receptionist-23548

    1 year ago
    basically DatasetOwnership would look like
    record DatasetOwnership extends Ownership {}
    and that's it
    b

    big-carpet-38439

    1 year ago
    Got it... But what benefit does having a separate DatasetOwnership bring in practice? Is this to support evolvability in the case that DatasetOwnership has additional specific attributes?
    m

    microscopic-receptionist-23548

    1 year ago
    really its because we annotate the aspect with the entity and then that ends up generating the events
    so if you do that for Ownership, you have a huge list of entities. and what happens if company Y also wants their custom entities in there? they just fork it and rebuild everything?
    w

    worried-nightfall-77549

    1 year ago
    That will involve the concept of federated GMS?
    m

    microscopic-receptionist-23548

    1 year ago
    or they can make a subrecord and annotate that instead
    nope, unrelated chris
    also @big-carpet-38439 here's documentation on v5 basics
    b

    big-carpet-38439

    1 year ago
    Okay I see
    So this is related to some automation to facilitate generating the topic schemas more easily
    But I think also as you mentioned related to extensibility
    m

    microscopic-receptionist-23548

    1 year ago
    it's been a while. that was an issue, but maybe only with the solution
    ah yeah there's two solutions
    the above, "extending" it, or
    record DatasetOwnership { ownership: Ownership; }
    e.g. composition
    I'll write the doc
    b

    big-carpet-38439

    1 year ago
    A wrapper.. yes.... Why does "include" directive not suffice here?
    record DatasetOwnership includes Ownership { ..extra fields here }
    m

    microscopic-receptionist-23548

    1 year ago
    "includes" doesn't actually generate Java with any actual inheritance
    in Java, DatasetOwnership will not inherit from Ownership
    you can't reuse code
    😕
    described in detail
    basically PDL doesn't have inheritance... at all
    which is not great when it generates definitions to an object oriented language
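A loose analogy for the point above, written in Python rather than generated Java: an "includes"-style record ends up with the same fields but no subclass relationship, while the composition variant wraps a real `Ownership` that shared code can accept. All class and function names here are illustrative, and since Python is duck-typed, the `isinstance` check stands in for what Java's static type check would reject:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ownership:
    owners: List[str] = field(default_factory=list)


# "includes" analogy: the fields are copied in, but the class is unrelated
# to Ownership as far as the type hierarchy is concerned.
@dataclass
class DatasetOwnershipIncluded:
    owners: List[str] = field(default_factory=list)


# Composition analogy: a wrapper holding an actual Ownership value.
@dataclass
class DatasetOwnershipComposed:
    ownership: Ownership = field(default_factory=Ownership)


def owner_count(ownership: Ownership) -> int:
    # Shared helper written once against the base Ownership type.
    return len(ownership.owners)


included = DatasetOwnershipIncluded(owners=["alice"])
composed = DatasetOwnershipComposed(ownership=Ownership(owners=["alice"]))

print(isinstance(included, Ownership))            # not an Ownership
print(isinstance(composed.ownership, Ownership))  # wrapped value is one
print(owner_count(composed.ownership))
```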
    b

    big-carpet-38439

    1 year ago
    Hmm I see
    So bad for code reuse
    But other than that it would solve everything else
    Anyways we will need some level of divergent code to create those entity-specific aspects
    m

    microscopic-receptionist-23548

    1 year ago
    we do also have a secret, as-yet-undocumented @gma.aspect.entities annotation. Really, what we've talked about so far applies to new things
    if we forced existing things (dataset ownership) to move to new models, it'd require a data migration
    we want to avoid that, so in some cases we are going to allow a limited annotation that associates one aspect with multiple entities
    so don't worry about it for existing things 🙂
    we'll document it in markdown eventually. MXEv5 is still not ready 😛
    at least, externally
    c

    cool-river-24902

    1 year ago
    @big-carpet-38439 This is a great question. For reading metadata, we have ACLs on RestLi APIs per aspect. The RestLi APIs that query the graph lack fine-grained access control, though; we will need to leverage the graph DB's access control. Neo4j example: https://neo4j.com/docs/cypher-manual/current/administration/security/subgraph/
    c

    chilly-barista-6524

    1 year ago
    https://github.com/linkedin/datahub/issues/1983 We have an open issue for this ^