• bumpy-keyboard-50565

    2 years ago
    Welcome @dazzling-judge-80093 ! Since most people are signing up using their personal email addresses, you might wanna share your company name as well.
    2 replies
  • magnificent-exabyte-60779

    2 years ago
    Ah, is that the line saying ‘Dataset & field-level commenting’ (6m to 1y)? Are there already some ideas forming around this topic that we can learn from / contribute to? Or others that are interested in this part? I saw the previous town hall some questions were asked about this by Jordan Preston and Anand Mehrotra.
    2 replies
  • agreeable-boots-73250

    2 years ago
    hi
    1 reply
  • nutritious-ghost-21337

    2 years ago
    Hello #general, I'd like to raise a question: can we use DataHub to store metadata beyond just the metadata structure? Some context to better understand my question: my application processes a high volume of inbound feeds daily for several customers. Each feed is transformed into a data model, processed to clean/compute/enrich some information, and then stored as a Parquet file in a given location (we currently run on Hadoop, but this may change at some point). For each of those feeds, I'd be interested in storing information such as:
    - the version of the parser/processor/etc. that generated it
    - the version of the data model used (and whether it is deprecated)
    - the inbound feed that generated it
    - the date it was generated
    - the location where I can find the input/output feed
    - the customer owning the inbound feed
    Those are just a few examples of metadata I would like to attach to a given dataset, mainly so that I can search through them later. I would also need to implement an ACL to restrict access to that metadata. I'm currently analyzing DataHub to assess whether it can satisfy these requirements. I cloned the repository, tried some data ingestion, and played with Rest.li to create some datasets. My first impression is that DataHub is mainly meant to store, manage, and search through metadata structure only, but maybe I'm approaching this tool from the wrong point of view. Given the use case described above, can you tell me whether DataHub could fit my needs once I've implemented the appropriate extensions to the current model?
    6 replies
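The usual DataHub answer to questions like the one above is to model such feed attributes as a custom aspect attached to the dataset's URN. The sketch below is purely illustrative: the `FeedProvenance` name and all of its fields are made up for this question, not part of DataHub's shipped model; it only shows the JSON shape such a custom aspect document might take before ingestion.

```python
import json

# Hypothetical custom aspect (NOT part of DataHub's shipped model) capturing
# the feed-provenance attributes listed in the question above.
def make_feed_provenance(parser_version, model_version, model_deprecated,
                         source_feed_urn, generated_at, output_location,
                         customer):
    return {
        "parserVersion": parser_version,
        "dataModelVersion": model_version,
        "dataModelDeprecated": model_deprecated,
        "sourceFeed": source_feed_urn,
        "generatedAt": generated_at,        # epoch millis
        "outputLocation": output_location,
        "customer": customer,               # could later drive ACL checks
    }

aspect = make_feed_provenance(
    parser_version="1.4.2",
    model_version="v7",
    model_deprecated=False,
    source_feed_urn="urn:li:dataset:(urn:li:dataPlatform:hdfs,/in/feedA,PROD)",
    generated_at=1577836800000,
    output_location="hdfs:///out/feedA/2020-01-01.parquet",
    customer="urn:li:corpuser:acme",
)
print(json.dumps(aspect, indent=2))
```

Fields like these would need to be declared searchable in the model extension for the "search through them later" requirement to work.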
  • mammoth-whale-58647

    2 years ago
    I am a bit confused as to whether the GMS API should expose the aspect model or not. I see, for instance, that there are two Ownerships: one that seems to be a generic aspect, assignable to any URN, and the other dedicated to dataset ownership. There is a lot of repetition there, for instance between the generic OwnershipType and the dataset-specific OwnershipCategory. I initially assumed this was because the aspect model should not be directly exposed through GMS but rather abstracted away into payloads designed specifically for the GMS API. However, I then noticed that GMS's dataset ownership resource actually exposes the generic Ownership aspect, and not the dedicated dataset ownership payload from the API module. So now I don't fully understand whether we should abstract away the aspect-driven model in GMS or not.
    5 replies
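For reference, the "generic" Ownership aspect discussed above is just a list of (owner URN, type) pairs that can hang off any entity URN. A minimal sketch of its JSON shape (field names follow the `com.linkedin.common.Ownership` model; the specific URNs here are made up):

```python
import json

# Shape of the generic com.linkedin.common.Ownership aspect: a list of
# owners, each an owner URN plus an ownership type.
ownership = {
    "owners": [
        {"owner": "urn:li:corpuser:jdoe", "type": "DATAOWNER"},
        {"owner": "urn:li:corpuser:asmith", "type": "DEVELOPER"},
    ],
}

# The same document can be attached to a dataset URN or any other entity
# URN, which is what makes it "generic" compared with a payload designed
# only for dataset ownership.
print(json.dumps(ownership))
```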
  • mammoth-whale-58647

    2 years ago
    Hi everyone, I have another question. I noticed the GMS uses ComplexKeyResourceTask template underneath, yet I did not find any asynchronous implementations. Is there a particular reason why the async resource template was chosen over the synchronous one? Are there any plans for an async DAO for instance?
    1 reply
  • plain-arm-6774

    2 years ago
    Hello all, I came across this blog post on how LinkedIn does authorizations. I was wondering whether DataHub also follows that model. A quick github repo search didn't find obvious ways to configure authorizations. Could someone link me to docs or code that may point me in the right direction? Thanks!
    2 replies
  • brash-lock-91510

    2 years ago
    Hello all, here to learn more about how to use DataHub.
    1 reply
  • wide-teacher-69432

    2 years ago
    Each dataset is associated with a platform through its URN definition. There is also a model definition (com.linkedin.dataplatform.DataPlatformInfo.pdsc), but I believe this data platform model is currently unused. Was the idea to have data platform as an entity alongside the other three entities (dataset, user, userGroup)? I can imagine that information about the platform where datasets are stored might be interesting…
    1 reply
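As background for the question above: a dataset URN already embeds its platform as a nested dataPlatform URN, e.g. `urn:li:dataset:(urn:li:dataPlatform:hdfs,/data/tracking/events,PROD)`. A small sketch (not DataHub code) pulling the platform back out of that key format:

```python
def platform_of(dataset_urn: str) -> str:
    """Extract the nested dataPlatform URN from a dataset URN of the form
    urn:li:dataset:(<platform urn>,<dataset name>,<fabric>)."""
    prefix = "urn:li:dataset:("
    assert dataset_urn.startswith(prefix) and dataset_urn.endswith(")")
    inner = dataset_urn[len(prefix):-1]          # strip wrapper parens
    platform_urn, _name, _fabric = inner.split(",", 2)
    return platform_urn

urn = "urn:li:dataset:(urn:li:dataPlatform:hdfs,/data/tracking/events,PROD)"
print(platform_of(urn))  # urn:li:dataPlatform:hdfs
```

So the platform is already addressable today; the question is whether it should additionally become a first-class entity carrying aspects of its own (which is what DataPlatformInfo.pdsc hints at).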
  • mammoth-whale-58647

    2 years ago
    Hello everyone, I have another modeling question. I have noticed that for most "standard" resource operations a complex key (e.g. DatasetKey) is used, whereas for the actions (e.g. ingest, backfill) the URN seems to be the preferred identifier. Is there a particular reason for this?

    I ask because I would like to model "sub-entities" where both the root and the sub-entity can have aspects. I decided against associations because the sub-resource really cannot exist without the parent, and I didn't want to embed the sub-resource because it is often accessed and modified independently of the parent resource. Sub-resources generally work quite well in Rest.li, and they do here for the "standard" resource operations, where I can access and combine parent and child keys as I see fit.

    It is slightly different for the actions, though. To give you some context: my parent resource has a composite key of 3 fields, and my sub-entity adds a fourth field to that key. Just to reach the actions on the sub-resource, I already need to provide those 3 fields, e.g. /parent/field1,field2,field3/child?action=myaction. But then I would also have to provide the full URN to the sub-resource, which repeats those 3 fields and adds the fourth. That seems redundant, and it made me wonder why the ingest and backfill actions aren't simply defined at the entity resource level, leveraging the same key documents that GET functions use. Before making such modifications, I wanted to better grasp why URNs are sometimes preferred and document keys at other times.
    4 replies
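The redundancy described above can be made concrete with a toy sketch; the parent/child resources, the `field*` key parts, and the URN shape are all hypothetical, not DataHub's actual model. The point is only that the action URL already spells out the parent's three key fields, and the URN passed to the action repeats them:

```python
# Hypothetical composite keys for a parent resource and its sub-entity.
parent_key = ("field1", "field2", "field3")
child_key = parent_key + ("field4",)

# Path to reach an action on the sub-resource: the parent key is already
# spelled out in the URL.
action_path = "/parent/{},{},{}/child?action=myaction".format(*parent_key)

# URN-style identifier the action body would additionally require:
child_urn = "urn:li:child:({},{},{},{})".format(*child_key)

# The three parent fields appear in both, which is the redundancy in question.
assert all(f in action_path and f in child_urn for f in parent_key)
print(action_path)
print(child_urn)
```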