# troubleshoot
h
Hi team, I have two questions about the transformers feature.
1. 'Add dataset browse paths': I use this transformer to classify datasets, but I can't change the path for each dataset. For example, I use `path_templates: 'Test/Test route'`, but I want to set a path for a specific table, e.g. `Test/Test/[Test_table]`. How can I set the path for each dataset?
2. 'Add a set of properties': same question as No. 1. How can I set properties for a specific table?
Thx 🙂
b
can you share how you write the configuration for the transformer? (1) is definitely possible
If you don't want the environment but want to add something static in the browse path, like the database instance name, you can use this:
```yaml
transformers:
  - type: "set_dataset_browse_path"
    config:
      path_templates:
        - /PLATFORM/marketing_db/DATASET_PARTS
```
It will create a browse path like `/mysql/marketing_db/sales/orders` for a table `sales.orders` in a `mysql` database instance.
for (2), are you trying to add the same properties to each dataset or different properties?
h
@better-orange-49102 Hi, my recipe is below:
```yaml
transformers:
  type: set_dataset_browse_path
  config:
    path_templates: test/test_1/test_2/DATASET_PARTS
```
When I use this recipe, all datasets are placed under one path, but I want to separate the tables for classification. For example, with database name `DB` and tables T1, T2, T3, T4..., I would like:
• Animal/Land/T1
• Animal/Ocean/T2, T3
• Plant/Land/T4
But the current path state is 'Animal/Land/T1, T2, T3, T4...'.
(2) I'm trying to add different properties to each dataset (table). Thank you
b
It seems to me that you want a dynamic browse path, but that transformer does not have that ability; it only allows a single, common rule when rewriting the original browse path. You probably want to implement your own, based on the metadata-ingestion/src/datahub/ingestion/transformer/add_dataset_browse_path.py transformer.
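For illustration, the per-table rule in such a custom transformer could be as simple as a lookup from table name to category. The sketch below is a minimal, hypothetical example of that logic only; the `TABLE_CATEGORIES` mapping and `browse_paths_for` helper are made-up names for this thread, and the surrounding transformer plumbing would still come from a copy of add_dataset_browse_path.py:
```python
from datahub.metadata.schema_classes import BrowsePathsClass

# Hypothetical mapping from table name to the category prefix described in the
# question (Animal/Land, Animal/Ocean, Plant/Land); not part of DataHub itself.
TABLE_CATEGORIES = {
    "T1": "/Animal/Land",
    "T2": "/Animal/Ocean",
    "T3": "/Animal/Ocean",
    "T4": "/Plant/Land",
}

def browse_paths_for(table_name: str) -> BrowsePathsClass:
    """Build a BrowsePaths aspect whose path depends on the table name."""
    prefix = TABLE_CATEGORIES.get(table_name, "/Uncategorized")
    return BrowsePathsClass(paths=[f"{prefix}/{table_name}"])
```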
And for (2), you should be looking at this:
### Adding a set of properties

If you'd like to add more complex logic for assigning properties, you can use the `add_dataset_properties` transformer, which calls a user-provided class (that extends from `AddDatasetPropertiesResolverBase` class) to determine the properties for each dataset.

The config, which we'd append to our ingestion recipe YAML, would look like this:

```yaml
transformers:
  - type: "add_dataset_properties"
    config:
      add_properties_resolver_class: "<your_module>.<your_class>"
```
Then define your class to return a list of custom properties, for example:
```python
import logging
from typing import Dict
from datahub.ingestion.transformer.add_dataset_properties import AddDatasetPropertiesResolverBase
from datahub.metadata.schema_classes import DatasetSnapshotClass

class MyPropertiesResolver(AddDatasetPropertiesResolverBase):
    def get_properties_to_add(self, current: DatasetSnapshotClass) -> Dict[str, str]:
        ### Add custom logic here        
        properties = {'my_custom_property': 'property value'}
        logging.info(f"Adding properties: {properties} to dataset: {current.urn}.")
        return properties
```
There is also a `simple_add_dataset_properties` transformer for directly assigning properties from the configuration. Its `properties` field is a dictionary of string values. Note that in case of a key collision, the value in the config will overwrite the previous value.
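To connect this back to question (2) (different properties per table): the resolver pattern above can branch on the dataset urn. A minimal sketch along those lines, with a made-up `PER_TABLE_PROPERTIES` mapping and class name, might look like the following; it would be wired in via `add_properties_resolver_class` exactly as in the config snippet above:
```python
import logging
from typing import Dict

from datahub.ingestion.transformer.add_dataset_properties import AddDatasetPropertiesResolverBase
from datahub.metadata.schema_classes import DatasetSnapshotClass

# Illustrative only: keys are table names matched against the dataset urn.
PER_TABLE_PROPERTIES: Dict[str, Dict[str, str]] = {
    "T1": {"category": "Animal", "habitat": "Land"},
    "T2": {"category": "Animal", "habitat": "Ocean"},
    "T3": {"category": "Animal", "habitat": "Ocean"},
    "T4": {"category": "Plant", "habitat": "Land"},
}

class PerTablePropertiesResolver(AddDatasetPropertiesResolverBase):
    def get_properties_to_add(self, current: DatasetSnapshotClass) -> Dict[str, str]:
        # Pick the property set for whichever table name appears in the urn.
        for table_name, props in PER_TABLE_PROPERTIES.items():
            if table_name in current.urn:
                logging.info(f"Adding properties {props} to dataset {current.urn}.")
                return props
        return {}
```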