DataHub #getting-started

big-carpet-38439

12/16/2020, 6:02 PM

@ambitious-battery-33996 @microscopic-receptionist-23548 Has the team considered introducing fine-grained access control on the metadata graph? i.e., dictating which identities can read/write which entities and aspects?

microscopic-receptionist-23548

12/16/2020, 6:03 PM

Not so far as I am aware, but I don't have much context / history with the graph. @cool-river-24902 @steep-airplane-62865 might know more

👍 1

steep-airplane-62865

12/16/2020, 6:06 PM

@big-carpet-38439 At LinkedIn, we have different service accounts with different access privileges, such as read-only, admin, etc.

steep-airplane-62865

12/16/2020, 6:07 PM

But these are not fine-grained; they allow access to the entire metadata graph

big-carpet-38439

12/16/2020, 6:07 PM

Got it

big-carpet-38439

12/16/2020, 6:07 PM

I think it's difficult to control writes coming from MCE, because we would need to trust the upstream systems to validate that the identity is who it claims to be

steep-airplane-62865

12/16/2020, 6:07 PM

We didn't explore fine-grained access to different entities, etc., nor have I explored whether Neo4j provides something like that.

big-carpet-38439

12/16/2020, 6:08 PM

Not just Neo4j specifically; I mean more broadly. It'd be nice to say "only X can create Datasets"

big-carpet-38439

12/16/2020, 6:08 PM

"only Y identities can create Dataset Ownership", etc.

steep-airplane-62865

12/16/2020, 6:08 PM

Oh now I see what you mean

steep-airplane-62865

12/16/2020, 6:10 PM

We also have some in-house ACLs on the Kafka topics, so who can emit to a specific Kafka topic is pretty much controlled. (This is not open source.)

big-carpet-38439

12/16/2020, 6:10 PM

So that a larger organization can safeguard the system against internal actors... this type of role-based curation may help guarantee higher-quality metadata
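The kind of rule suggested in this exchange ("only X can create Datasets", "only Y identities can create Dataset Ownership") can be sketched as a deny-by-default write policy keyed by (entity type, aspect). This is a minimal illustration under stated assumptions: the `WritePolicy` class, the identity names, and the aspect names are hypothetical and do not correspond to any DataHub API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WritePolicy:
    # Hypothetical: (entity type, aspect) -> identities allowed to write that aspect
    rules: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def allow(self, identity: str, entity: str, aspect: str) -> None:
        self.rules.setdefault((entity, aspect), set()).add(identity)

    def can_write(self, identity: str, entity: str, aspect: str) -> bool:
        # Deny by default: an (entity, aspect) pair with no rule is writable by nobody.
        return identity in self.rules.get((entity, aspect), set())


policy = WritePolicy()
policy.allow("ownership-service", "dataset", "ownership")
print(policy.can_write("ownership-service", "dataset", "ownership"))  # True
print(policy.can_write("ingestion-job", "dataset", "ownership"))      # False
```

Deny-by-default is a design choice here: a new aspect stays locked until someone explicitly grants writers, which matches the goal of safeguarding curated metadata.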

big-carpet-38439

12/16/2020, 6:10 PM

Got it, that makes sense. But today anyone who can access that topic can change any part of the graph

big-carpet-38439

12/16/2020, 6:10 PM

I guess with the introduction of finer-grained topics this may be more viable

microscopic-receptionist-23548

12/16/2020, 6:11 PM

@steep-airplane-62865 that will only work with MXE v5 anyway; for now MCE v4 is a single topic (monotopic), so everyone basically needs ACLs to it anyway

➕ 1

steep-airplane-62865

12/16/2020, 6:11 PM

Exactly. I guess MXE v5 will solve that. cc @worried-nightfall-77549

microscopic-receptionist-23548

12/16/2020, 6:11 PM

everyone* being anyone emitting any MCE

big-carpet-38439

12/16/2020, 6:13 PM

This makes sense. Is the idea of MXE v5 to have a topic per entity, or even finer-grained?

ambitious-battery-33996

12/16/2020, 6:13 PM

per entity aspect

steep-airplane-62865

12/16/2020, 6:13 PM

Agreed @microscopic-receptionist-23548. MCE v4 is like an open gate right now. Once you get access, you can change any part of the graph. We don't have role-based access. Hopefully, MXE v5 will help with that going forward.

microscopic-receptionist-23548

12/16/2020, 6:13 PM

@ambitious-battery-33996 not quite 🙂 it's per aspect of each entity

microscopic-receptionist-23548

12/16/2020, 6:14 PM

this distinction matters for things like ownership, which can be an aspect of multiple entities

microscopic-receptionist-23548

12/16/2020, 6:14 PM

e.g. dataset ownership and metric ownership would be separate topics
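A rough sketch of why topic granularity matters for access control, assuming producer ACLs are granted per Kafka topic. The `<Aspect>_<Entity>` naming and the principals are hypothetical, and `MetadataChangeEvent_v4` is used only as an illustrative name for the single v4 topic.

```python
from __future__ import annotations

V4_TOPIC = "MetadataChangeEvent_v4"  # illustrative name for the single MCE v4 topic


def topic_for(entity: str, aspect: str, per_aspect_topics: bool) -> str:
    """Return the topic a change to (entity, aspect) is produced to."""
    if not per_aspect_topics:
        return V4_TOPIC  # v4-style: every change goes to one topic
    return f"{aspect}_{entity}"  # hypothetical per-entity-aspect naming


def may_produce(acls: dict[str, set[str]], principal: str, entity: str,
                aspect: str, per_aspect_topics: bool) -> bool:
    """ACLs are per topic, so topic granularity bounds ACL granularity."""
    return principal in acls.get(topic_for(entity, aspect, per_aspect_topics), set())


# One grant on the single topic lets a producer change any part of the graph.
v4_acls = {V4_TOPIC: {"ingest"}}
print(may_produce(v4_acls, "ingest", "Metric", "MetricOwnership", False))  # True

# With per-entity-aspect topics, a grant covers only that one topic.
v5_acls = {"DatasetOwnership_Dataset": {"ownership-svc"}}
print(may_produce(v5_acls, "ownership-svc", "Dataset", "DatasetOwnership", True))  # True
print(may_produce(v5_acls, "ownership-svc", "Metric", "MetricOwnership", True))    # False
```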

ambitious-battery-33996

12/16/2020, 6:14 PM

You are correct ... I meant per entity aspect 😛

big-carpet-38439

12/16/2020, 6:14 PM

So you could have a Dataset_ownership topic and a Schema_ownership topic?

microscopic-receptionist-23548

12/16/2020, 6:14 PM

still, in most cases things aren't shared, so it doesn't matter 😉

microscopic-receptionist-23548

12/16/2020, 6:15 PM

something like that, @big-carpet-38439

👌 1

microscopic-receptionist-23548

12/16/2020, 6:16 PM

though it's even more complicated than that... 😕

microscopic-receptionist-23548

12/16/2020, 6:17 PM

for shared stuff

ambitious-battery-33996

12/16/2020, 6:17 PM

On a different note... we could actually go even finer than that. For instance, Espresso ownership could be determined only by Espresso systems, Kafka ownership only by Kafka systems, etc.
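A minimal sketch of the platform-scoped rule suggested here, assuming dataset URNs of the form `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. The emitting-system names and the `AUTHORITATIVE_PLATFORM` mapping are hypothetical.

```python
from __future__ import annotations

import re

# Captures <platform> from urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
_DATASET_URN = re.compile(r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),")

# Hypothetical: the one platform each emitting system is authoritative for.
AUTHORITATIVE_PLATFORM = {
    "espresso-ownership-sync": "espresso",
    "kafka-ownership-sync": "kafka",
}


def platform_of(dataset_urn: str) -> str | None:
    match = _DATASET_URN.match(dataset_urn)
    return match.group(1) if match else None


def may_write_ownership(system: str, dataset_urn: str) -> bool:
    """Only the system owning a dataset's platform may write its ownership."""
    platform = platform_of(dataset_urn)
    return platform is not None and AUTHORITATIVE_PLATFORM.get(system) == platform


urn = "urn:li:dataset:(urn:li:dataPlatform:espresso,MyDb.MyTable,PROD)"
print(may_write_ownership("espresso-ownership-sync", urn))  # True
print(may_write_ownership("kafka-ownership-sync", urn))     # False
```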

➕ 2

microscopic-receptionist-23548

12/16/2020, 6:17 PM

PDL doesn't have proper inheritance, which causes issues. let me open-source a doc on it

big-carpet-38439

12/16/2020, 6:17 PM

Does Avro?

microscopic-receptionist-23548

12/16/2020, 6:18 PM

@ambitious-battery-33996 we didn't just think that, we're going to do that. there's a catch here with legacy things, at least within LI

microscopic-receptionist-23548

12/16/2020, 6:18 PM

externally it may not matter

microscopic-receptionist-23548

12/16/2020, 6:18 PM

(as in we will have a DatasetOwnership_Dataset topic with a specific DatasetOwnership model; Ownership can't really be reused)

big-carpet-38439

12/16/2020, 6:18 PM

That's no fun

microscopic-receptionist-23548

12/16/2020, 6:19 PM

I agree, but it's a PDL issue 😐

big-carpet-38439

12/16/2020, 6:19 PM

Why can't it be reused here? Does it have dataset-specific attributes?

microscopic-receptionist-23548

12/16/2020, 6:19 PM

it's twofold, I guess, and it is quasi-reused

big-carpet-38439

12/16/2020, 6:19 PM

It looks like what is persisted today (MySQL) would be just Ownership, not DatasetOwnership

microscopic-receptionist-23548

12/16/2020, 6:19 PM

let me just write a doc on it and you can review it

big-carpet-38439

12/16/2020, 6:19 PM

👍

microscopic-receptionist-23548

12/16/2020, 6:20 PM

basically DatasetOwnership would look like:

record DatasetOwnership extends Ownership {}

and that's it

big-carpet-38439

12/16/2020, 6:20 PM

Got it... But what practical benefit does a separate DatasetOwnership bring? Is this to support evolvability in case DatasetOwnership gains additional specific attributes?

microscopic-receptionist-23548

12/16/2020, 6:21 PM

really it's because we annotate the aspect with the entity, and that ends up generating the events

✅ 1

microscopic-receptionist-23548

12/16/2020, 6:22 PM

so if you do that for Ownership, you have a huge list of entities. and what happens if company Y also wants their custom entities in there? they just fork it and rebuild everything?
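To illustrate the scaling problem, a sketch of annotation-driven generation in which each aspect lists the entities it is annotated with and one topic is produced per (aspect, entity) pair. The function name and the `<Aspect>_<Entity>` naming are assumptions for illustration, not the actual GMA code generator.

```python
from __future__ import annotations


def generate_topics(aspect_entities: dict[str, list[str]]) -> list[str]:
    """One topic per (aspect, entity) pair, named <Aspect>_<Entity> (hypothetical)."""
    return sorted(
        f"{aspect}_{entity}"
        for aspect, entities in aspect_entities.items()
        for entity in entities
    )


# Shared aspect: one annotation must enumerate every entity that uses Ownership,
# so adding a custom entity means editing (or forking) the shared model.
shared = {"Ownership": ["Dataset", "Metric", "Dashboard"]}

# Entity-specific subrecords: each is annotated with its own entity, so a company
# can add one for a custom entity without touching anyone else's annotation.
split = {"DatasetOwnership": ["Dataset"], "MetricOwnership": ["Metric"]}
extended = {**split, "CustomThingOwnership": ["CustomThing"]}

print(generate_topics(shared))
print(generate_topics(extended))
```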

worried-nightfall-77549

12/16/2020, 6:22 PM

Would that involve the concept of federated GMS?

microscopic-receptionist-23548

12/16/2020, 6:22 PM

or they can make a subrecord and annotate that instead

microscopic-receptionist-23548

12/16/2020, 6:22 PM

nope, unrelated chris

microscopic-receptionist-23548

12/16/2020, 6:22 PM

also @big-carpet-38439 here's documentation on v5 basics

microscopic-receptionist-23548

12/16/2020, 6:22 PM

https://github.com/linkedin/datahub-gma/blob/master/docs/what/mxev5.md

thankyou 1

big-carpet-38439

12/16/2020, 6:23 PM

Okay I see

big-carpet-38439

12/16/2020, 6:23 PM

So this is related to some automation to facilitate generating the topic schemas more easily

big-carpet-38439

12/16/2020, 6:23 PM

But I think also as you mentioned related to extensibility

microscopic-receptionist-23548

12/16/2020, 6:24 PM

it's been a while. that was an issue, but maybe only with the solution

microscopic-receptionist-23548

12/16/2020, 6:24 PM

ah yeah, there are two solutions

microscopic-receptionist-23548

12/16/2020, 6:25 PM

the above ("extending" it), or

record DatasetOwnership { ownership: Ownership }

i.e. composition
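The two options can be compared with Python dataclasses standing in for the generated Java classes; this is an assumption for illustration, since PDL itself has no `extends`. The question is whether shared logic written once against `Ownership` (here a hypothetical `primary_owner` helper) can be reused.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ownership:
    owners: list[str] = field(default_factory=list)


def primary_owner(ownership: Ownership) -> str | None:
    """Shared logic written once against the generic Ownership type."""
    return ownership.owners[0] if ownership.owners else None


# Option 1, extension: what `record DatasetOwnership extends Ownership {}` would ideally yield.
@dataclass
class DatasetOwnershipExtended(Ownership):
    pass


# Option 2, composition: `record DatasetOwnership { ownership: Ownership }`.
@dataclass
class DatasetOwnershipComposed:
    ownership: Ownership = field(default_factory=Ownership)


print(primary_owner(DatasetOwnershipExtended(owners=["alice"])))               # alice
print(primary_owner(DatasetOwnershipComposed(Ownership(["bob"])).ownership))   # bob
```

With extension the subtype is passed straight to the shared helper; with composition callers unwrap `.ownership` first, but the helper is still reused unchanged.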

microscopic-receptionist-23548

12/16/2020, 6:25 PM

I'll write the doc

big-carpet-38439

12/16/2020, 6:26 PM

A wrapper... yes. Why doesn't the "includes" directive suffice here?

record DatasetOwnership includes Ownership { /* extra fields here */ }

microscopic-receptionist-23548

12/16/2020, 6:37 PM

"includes" doesn't actually generate Java with any real inheritance

microscopic-receptionist-23548

12/16/2020, 6:37 PM

in Java, DatasetOwnership will not inherit from Ownership

microscopic-receptionist-23548

12/16/2020, 6:38 PM

you can't reuse code
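A sketch of what "includes" amounts to, with Python standing in for the generated Java: the included fields are copied, but no subtype relationship is created. Because Python is duck-typed, the hypothetical `owner_count` helper uses an explicit `isinstance` check to mimic Java's nominal typing; all class and function names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class Ownership:
    owners: list[str] = field(default_factory=list)


# Roughly what `record DatasetOwnership includes Ownership { ... }` produces:
# Ownership's fields are copied in, but the class does not derive from Ownership.
@dataclass
class DatasetOwnership:
    owners: list[str] = field(default_factory=list)  # copied from Ownership
    notes: str = ""  # hypothetical dataset-specific extra field


def owner_count(ownership: Ownership) -> int:
    # Explicit check mimicking Java's nominal typing (plain Python would duck-type).
    if not isinstance(ownership, Ownership):
        raise TypeError(f"expected Ownership, got {type(ownership).__name__}")
    return len(ownership.owners)


print({f.name for f in fields(Ownership)} <= {f.name for f in fields(DatasetOwnership)})  # True
print(issubclass(DatasetOwnership, Ownership))  # False
```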

microscopic-receptionist-23548

12/16/2020, 6:38 PM

😕

microscopic-receptionist-23548

12/16/2020, 6:38 PM

described in detail

microscopic-receptionist-23548

12/16/2020, 6:38 PM

https://github.com/linkedin/datahub-gma/pull/67

microscopic-receptionist-23548

12/16/2020, 6:39 PM

basically PDL doesn't have inheritance... at all

microscopic-receptionist-23548

12/16/2020, 6:40 PM

which is not great when it generates definitions for an object-oriented language

big-carpet-38439

12/16/2020, 6:42 PM

Hmm I see

big-carpet-38439

12/16/2020, 6:42 PM

So bad for code reuse

big-carpet-38439

12/16/2020, 6:43 PM

But other than that it would solve everything else

big-carpet-38439

12/16/2020, 6:43 PM

Anyway, we will need some level of divergent code to create those entity-specific aspects

microscopic-receptionist-23548

12/16/2020, 6:44 PM

we do also have a secret, not-yet-documented @gma.aspect.entities annotation. Really, what we've talked about so far applies to new things

🤐 1

microscopic-receptionist-23548

12/16/2020, 6:44 PM

if we forced existing things (dataset ownership) to move to new models, it'd require a data migration

microscopic-receptionist-23548

12/16/2020, 6:44 PM

we want to avoid that, so in some cases we are going to allow a limited annotation that associates one aspect with multiple entities
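A minimal sketch of how such a limited, allow-listed annotation might be validated at build time. The allow-list contents and the `validate_aspect_entities` function are hypothetical; they do not reproduce the logic of the GMA allow-list implementation linked in this thread.

```python
from __future__ import annotations

# Hypothetical allow-list of legacy aspects that may stay shared across entities,
# so existing data (e.g. dataset ownership) does not need to be migrated.
MULTI_ENTITY_ALLOW_LIST = {"com.example.Ownership"}


def validate_aspect_entities(aspect: str, entities: list[str]) -> None:
    """Fail the build if a non-allow-listed aspect is annotated with several entities."""
    if not entities:
        raise ValueError(f"{aspect}: must be associated with at least one entity")
    if len(entities) > 1 and aspect not in MULTI_ENTITY_ALLOW_LIST:
        raise ValueError(
            f"{aspect}: only allow-listed aspects may be associated with multiple entities"
        )


validate_aspect_entities("com.example.DatasetOwnership", ["Dataset"])     # new, single entity: OK
validate_aspect_entities("com.example.Ownership", ["Dataset", "Metric"])  # legacy, allow-listed: OK
```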

microscopic-receptionist-23548

12/16/2020, 6:45 PM

so don't worry about it for existing things 🙂

microscopic-receptionist-23548

12/16/2020, 6:46 PM

https://github.com/linkedin/datahub-gma/blob/master/gradle-plugins/metadata-annota[…]in/metadata/annotations/GmaEntitiesAnnotationAllowListImpl.java

microscopic-receptionist-23548

12/16/2020, 6:47 PM

we'll document it in markdown eventually. MXEv5 is still not ready 😛

microscopic-receptionist-23548

12/16/2020, 6:47 PM

at least, externally

cool-river-24902

12/16/2020, 7:58 PM

@big-carpet-38439 This is a great question. For reading metadata, we have ACLs on the RestLi APIs per aspect; the RestLi APIs that query the graph lack fine-grained access control, though. For those we will need to leverage the graph DB's own access control; Neo4j examples: https://neo4j.com/docs/cypher-manual/current/administration/security/subgraph/
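A sketch of per-aspect read filtering in the spirit of the RestLi ACLs described here, assuming an entity is returned as a map of aspect name to value. The ACL table and identities are hypothetical; graph queries that traverse many entities would still need the database's own access control, as noted above.

```python
from __future__ import annotations

# Hypothetical per-aspect read ACLs: aspect name -> identities allowed to read it.
READ_ACLS: dict[str, set[str]] = {
    "Ownership": {"alice", "bob"},
    "SchemaMetadata": {"alice"},
}


def filter_readable(aspects: dict[str, object], identity: str) -> dict[str, object]:
    """Keep only aspects the caller may read; aspects with no ACL entry are denied."""
    return {
        name: value
        for name, value in aspects.items()
        if identity in READ_ACLS.get(name, set())
    }


dataset = {"Ownership": {"owners": ["alice"]}, "SchemaMetadata": {"fields": []}}
print(sorted(filter_readable(dataset, "alice")))  # ['Ownership', 'SchemaMetadata']
print(sorted(filter_readable(dataset, "bob")))    # ['Ownership']
```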

thankyou 1

👀 1

chilly-barista-6524

12/17/2020, 6:08 AM

https://github.com/linkedin/datahub/issues/1983 We have an open issue for this ^
