# getting-started
b
@ambitious-battery-33996 @microscopic-receptionist-23548 Has the team considered introducing fine-grained access control against the Metadata graph? i.e., dictating which identities can read or write which entities and aspects?
m
Not so far as I am aware, but I don't have much context / history with the graph. @cool-river-24902 @steep-airplane-62865 might know more
👍 1
s
@big-carpet-38439 At LinkedIn, we have different service accounts defined with different access privileges, such as read-only, admin, etc.
But these are not fine-grained, and they allow access to the entire metadata graph
b
Got it
I think it's difficult to control writes coming from MCE because we would need to trust the upstream systems to validate that each identity is who it claims to be
s
We didn't explore fine-grained access to different entities etc., nor have I explored whether Neo4j provides something like that.
b
Not just Neo4j specifically I mean more broadly. It'd be nice to say "only X can create Datasets"
"only Y identities can create Dataset Ownership" , etc
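A minimal sketch of the kind of rule described here, assuming a hypothetical in-memory policy table keyed by (entity, aspect). None of these names (`WRITE_POLICY`, `can_write`, the identity strings) come from DataHub; they are illustrative only.

```python
# Hypothetical policy table: which identities may write which (entity, aspect).
# An aspect of None stands for creating the entity itself.
WRITE_POLICY = {
    ("Dataset", None): {"ingestion-service"},
    ("Dataset", "Ownership"): {"ownership-service"},
}


def can_write(identity, entity, aspect=None):
    """Return True only if the identity is explicitly allowed (deny by default)."""
    return identity in WRITE_POLICY.get((entity, aspect), set())
```

Denying by default means an entity or aspect with no entry in the table cannot be written by anyone until a rule is added.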
s
Oh now I see what you mean
We have some in-house ACL on the Kafka topics as well. So, it's pretty much controlled who can emit to specific Kafka topic. (This is not open source)
b
So that a larger organization can sort of safeguard the system against internal actors.. this type of role-based curation may help guarantee higher quality metadata
Got it that makes sense. But today anyone who can access that topic can change any part of the graph
I guess with the introduction of finer-grained topics this may be more viable
m
@steep-airplane-62865 that will only work with MXE v5 anyway; for now MCE v4 is a single topic, so everyone basically needs ACLs to it anyway
s
Exactly. I guess MXE v5 will solve that. cc @worried-nightfall-77549
m
everyone* being anyone emitting any MCE
b
This makes sense. Is the idea of MXE5 to have a topic per entity, or even finer grained?
a
per entity aspect
s
Agreed @microscopic-receptionist-23548. MCE v4 is like an open gate right now. Once you get access, you can change any part of the graph. We don't have role-based access. Hopefully, MXE v5 will help with that going forward.
m
@ambitious-battery-33996 not quite 🙂 it's per aspect of each entity
this distinction matters for things like ownership, which can be an aspect of multiple entities
e.g. dataset ownership and metric ownership would be separate topics
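To make the distinction concrete, here is a small sketch assuming one topic per aspect of each entity. The `topic_for` helper and the `<Entity>_<Aspect>` format are assumptions for illustration, not the actual MXE v5 naming.

```python
def topic_for(entity, aspect):
    """Hypothetical naming: one topic per aspect of each entity."""
    return f"{entity}_{aspect}"


# Ownership is shared by two entities, so it maps to two separate topics.
topics = {topic_for(entity, "Ownership") for entity in ("Dataset", "Metric")}
```

With separate topics, emit permissions could be granted per topic instead of on one shared topic.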
a
You are correct ... I meant per entity aspect 😛
b
Ah
So you could have Dataset_ownership topic and Schema_ownership topics?
m
still, in most cases things aren't shared, so it doesn't matter 😉
something like that, @big-carpet-38439
👌 1
though it's more complicated than that even... 😕
for shared stuff
a
On a different thought ... actually we could think of going even finer than that. For instance, all Espresso ownership could be determined only by Espresso systems, Kafka ownership by Kafka systems, etc.
m
PDL doesn't have proper inheritance, which causes issues. let me open source a doc on it
b
Does Avro?
m
@ambitious-battery-33996 we didn't just think that, we're going to do that. there's a catch here with legacy things, at least within LI
externally it may not matter
(as in we will have a DatasetOwnership_Dataset topic with a specific DatasetOwnership model; Ownership can't really be reused)
b
That's no fun
m
I agree, but it's a PDL issue 😐
b
Why can't it be reused here? It has specific dataset related attributes?
m
it's twofold I guess, and it is quasi reused
b
It looks like what is persisted today (mysql) would be just Ownership, not DatasetOwnership
m
let me just write a doc on it and you can review it
b
👍
m
basically DatasetOwnership would look like
record DatasetOwnership extends Ownership {}
and that's it
b
Got it...But what benefit in practice does it bring having separate DatasetOwnership? Is this to support evolvability in the case that DatasetOwnership has additional specific attributes?
m
really it's because we annotate the aspect with the entity and then that ends up generating the events
so if you do that for Ownership, you have a huge list of entities. and what happens if company Y also wants their custom entities in there? they just fork it and rebuild everything?
w
That will involve the concept of federated GMS?
m
or they can make a subrecord and annotate that instead
nope, unrelated chris
also @big-carpet-38439 here's documentation on v5 basics
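The annotation-driven generation described above can be sketched roughly as follows. This is a Python analogy under stated assumptions: the `aspect_of` decorator, the `ASPECT_ENTITIES` registry, and `MetricOwnership` are hypothetical stand-ins, and the `<Aspect>_<Entity>` event naming simply mirrors the DatasetOwnership_Dataset example mentioned earlier in the thread.

```python
ASPECT_ENTITIES = {}  # aspect class name -> entity name (hypothetical registry)


def aspect_of(entity):
    """Hypothetical stand-in for annotating an aspect model with its entity."""
    def register(cls):
        ASPECT_ENTITIES[cls.__name__] = entity
        return cls
    return register


class Ownership:
    """Shared model; deliberately left unannotated."""


@aspect_of("Dataset")
class DatasetOwnership(Ownership):
    """Entity-specific subrecord that carries the annotation."""


@aspect_of("Metric")
class MetricOwnership(Ownership):
    """A second entity adds its own subrecord instead of editing Ownership."""


def generated_events():
    """One event name per annotated aspect; the naming here is illustrative."""
    return sorted(f"{aspect}_{entity}" for aspect, entity in ASPECT_ENTITIES.items())
```

Because the shared `Ownership` model carries no annotation, another company could register its own subrecord without forking the shared definition.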
b
Okay I see
So this is related to some automation to facilitate generating the topic schemas more easily
But I think also as you mentioned related to extensibility
m
it's been a while. that was an issue, but maybe only with the solution
ah yeah there's two solutions
the above, "extending" it, or
record DatasetOwnership { ownership: Ownership; }
i.e. composition
I'll write the doc
b
A wrapper.. yes.... Why does "include" directive not suffice here?
record DatasetOwnership includes Ownership { ..extra fields here }
m
"includes" doesn't actually generate Java with any actual inheritance
in java DatasetOwnership will not inherit from Ownership
you can't reuse code
😕
described in detail
basically PDL doesn't have inheritance... at all
which is not great when it generates definitions to an object oriented language
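A rough Python analogy of the difference, with hypothetical class names and a made-up `owners` field. Python's duck typing hides the problem that Java's static types expose, so read the `isinstance` check as standing in for what a Java compiler would reject.

```python
from dataclasses import dataclass, field


@dataclass
class Ownership:
    owners: list = field(default_factory=list)


@dataclass
class IncludedDatasetOwnership:
    """Like `includes`: the fields are copied, but there is no subtype link."""
    owners: list = field(default_factory=list)


@dataclass
class WrappedDatasetOwnership:
    """Composition: the wrapper holds a real Ownership value."""
    ownership: Ownership = field(default_factory=Ownership)


def owner_count(ownership: Ownership) -> int:
    """Helper written once against the shared Ownership type."""
    return len(ownership.owners)


included = IncludedDatasetOwnership(owners=["alice"])
wrapped = WrappedDatasetOwnership(Ownership(owners=["alice"]))
```

Here `included` is not an `Ownership`, while `wrapped.ownership` is, so code written against `Ownership` can be reused through the wrapper.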
b
Hmm I see
So bad for code reuse
But other than that it would solve everything else
Anyways we will need some level of divergent code to create those entity-specific aspects
m
we do also have a secret, as-yet-undocumented @gma.aspect.entities annotation. Really, what we've talked about so far applies to new things
🤐 1
if we forced existing things (dataset ownership) to move to new models, it'd require a data migration
we want to avoid that, so in some cases we are going to allow a limited annotation that associates one aspect with multiple entities
so don't worry about it for existing things 🙂
we'll document it in markdown eventually. MXEv5 is still not ready 😛
at least, externally
c
@big-carpet-38439 This is a great question. For reading metadata, we have ACLs on the RestLi APIs per aspect. The RestLi APIs that query the graph lack fine-grained access control, though; we will need to leverage the graph DB's access control. Neo4j example: https://neo4j.com/docs/cypher-manual/current/administration/security/subgraph/
thankyou 1
👀 1
c
https://github.com/linkedin/datahub/issues/1983 We have an open issue for this ^