# ingestion
w
Hi! Having a look at the docs for the dataset ownership transformers, I see they mention:
If you’d like to add more complex logic for assigning ownership, you can use the more generic `add_dataset_ownership` transformer, which calls a user-provided function to determine the ownership of each dataset.
Is there any example of this? I’m not sure how to set such a function in the yaml. Also, does the function need to be registered somewhere? Thanks!
b
Just taking a crack at it, since I'm also looking at adapting transformers right now, for browse paths:
source:
  type: mssql
  config:
    username: sa
    password: ${MSSQL_PASSWORD}
    database: DemoData
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpuser:username1"
        - "urn:li:corpuser:username2"
        - "urn:li:corpGroup:groupname"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
source: https://github.com/linkedin/datahub/tree/master/metadata-ingestion
w
This ☝️ is the example for `simple_add_dataset_ownership`. However, I'm looking for an example of the `add_dataset_ownership` transform. What's particular about this transform is that its config is a callback function, so I want to understand, e.g., whether the function needs to be registered somewhere.
s
I would also be interested in an example of how to add a custom transformer. I guess this PR can serve as an example: https://github.com/linkedin/datahub/pull/2580/files. Being able to do it without opening a PR to the main repository would be a plus, in case the transformer is business specific or we want a feature that might take the team some time to implement due to other priorities.
l
@gray-shoe-75895 ^ can you please point to an example? also, you don't need to contribute back the custom transformer to OSS
g
Yep I’m writing something up here
@witty-butcher-82399 @better-orange-49102 @square-activity-64562 Here’s how you can use a full custom transformer:
transformers:
  # Assuming `from import.path.to import MyTransformer` works and MyTransformer is derived from the Transformer base class:
  - type: "import.path.to.MyTransformer"
    config:
      some_property: "some.value"
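For anyone wondering why no registration step is needed: a dotted `type` string like the one above can be turned into a class with a plain import. The snippet below is only a minimal sketch of that general technique using the standard library. `resolve_dotted_path` is a hypothetical helper written for illustration, not DataHub's actual loader, and the stdlib targets stand in for `import.path.to.MyTransformer`.

```python
import importlib
from typing import Any


def resolve_dotted_path(path: str) -> Any:
    """Import `pkg.module.attr` and return `attr` (hypothetical helper)."""
    # Split "pkg.module.attr" into the module part and the final attribute name.
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path like 'pkg.module.name', got {path!r}")
    # This works for any module importable from the current Python environment.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in for "import.path.to.MyTransformer": resolve a class from the stdlib.
ordered_dict_cls = resolve_dotted_path("collections.OrderedDict")
# The same lookup works for functions.
dumps = resolve_dotted_path("json.dumps")
```

The practical consequence is that the module holding the custom class has to be importable (installed, or on `PYTHONPATH`) in the environment where the ingestion recipe runs.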
To define the function for `add_dataset_ownership` or `add_dataset_tags`, once this PR is merged (https://github.com/linkedin/datahub/pull/2858), you can do something along these lines:
transformers:
  - type: "add_dataset_tags"
    config:
      get_tags_to_add: "import.path.to.myfunction" # assuming `from import.path.to import myfunction` works
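As a rough idea of what such a `myfunction` could look like, here is a minimal sketch. It rests on assumptions: the argument and return types the transformer really expects depend on the DataHub version (check the transformer source or the PR above), so plain strings are used here as stand-ins for DataHub's own dataset and tag classes. The tag names and the naive URN parsing are hypothetical as well.

```python
from typing import List


def myfunction(dataset_urn: str) -> List[str]:
    """Hypothetical callback: pick tag URNs based on the dataset URN.

    Assumes a URN shaped like
    urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    with no commas inside the dataset name.
    """
    # Take the text between the outer parentheses and split it into its parts.
    inner = dataset_urn[dataset_urn.index("(") + 1 : dataset_urn.rindex(")")]
    platform_urn, name, env = inner.split(",")
    platform = platform_urn.rsplit(":", 1)[-1]

    # Always tag with the platform; add an extra tag for production datasets.
    tags = [f"urn:li:tag:{platform}"]
    if env == "PROD":
        tags.append("urn:li:tag:Production")
    return tags


example_tags = myfunction(
    "urn:li:dataset:(urn:li:dataPlatform:mssql,DemoData.dbo.Orders,PROD)"
)
```

The function lives in an ordinary Python module. The recipe only needs its import path, so nothing has to be registered beyond making that module importable.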
w
Nice! That’s exactly what I was looking for 👌, thanks!