# ingestion
Hi, just testing out the meta_mapping with the DBT ingestion. What if we have a meta key that can contain different values, each of which should map to a different term? Let's say we have this in different models:
meta:
  some_key: S1
meta:
  some_key: S2
Is this possible? Currently we do the mapping ourselves, but we wanted to test this out to see if we could avoid adding our own logic and complexity. I can't find anything in the documentation saying we can reuse the actual value of the meta key. Or is it possible to use a regexp match in the "match" field?
Hi! It is possible to use regex in the match field, but I don't think you can reuse the keys, as they get pulled into a Python dict. There's some extra documentation tucked away in a method docstring here: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/utilities/mapping.py#L31. We should definitely make this more visible!
@orange-night-91387 @green-football-43791 Because the `meta_mapping` is mapped to `operation_defs: Dict[str, Dict] = {}`, we cannot have something like this:
meta_mapping:
      data_tier:
        match: "Bronze"
        operation: "add_term"
        config:
          term: "Bronze"
      data_tier:
        match: "Gold"
        operation: "add_term"
        config:
          term: "Gold"
      data_tier:
        match: "Silver"
        operation: "add_term"
        config:
          term: "Silver"
Only the last "block" is loaded in the config. What I would like to do is apply a different term depending on the value of `data_tier`. Is this a limitation of the current codebase, or have I overlooked something? I think the `operation_key` should map to an array of dicts rather than a single dict... Thoughts?
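The "only the last block survives" behaviour can be reproduced with a plain Python dict. This is a minimal sketch, not DataHub code: it writes the three `data_tier` blocks as JSON so only the standard library is needed, and it assumes the YAML loader used for the recipe behaves the same way for repeated keys (the last one wins). The names `RAW_CONFIG` and `raw_pairs` are illustrative only.

```python
import json

# The three data_tier blocks from the config above, written as JSON so this
# sketch needs only the standard library. Assumption: a YAML loader that builds
# plain dicts handles repeated keys the same way (the last one wins).
RAW_CONFIG = """
{
  "data_tier": {"match": "Bronze", "operation": "add_term", "config": {"term": "Bronze"}},
  "data_tier": {"match": "Gold", "operation": "add_term", "config": {"term": "Gold"}},
  "data_tier": {"match": "Silver", "operation": "add_term", "config": {"term": "Silver"}}
}
"""

# Parsed into a plain dict (like operation_defs: Dict[str, Dict]):
# each repeated "data_tier" key overwrites the previous one.
operation_defs = json.loads(RAW_CONFIG)

# Keeping the raw (key, value) pairs instead shows that all three blocks were
# present in the source before the dict collapsed them.
raw_pairs = json.loads(RAW_CONFIG, object_pairs_hook=list)

if __name__ == "__main__":
    print(len(operation_defs))                   # 1
    print(operation_defs["data_tier"]["match"])  # Silver
    print(len(raw_pairs))                        # 3
```

Mapping each key to a list, as suggested, is what would keep all three entries.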
Interesting, yeah, I don't think this is currently supported. Like you said, this would require an `operation_defs` array, which would need a few changes here.
Is this the way you (the DataHub core team) would like to see this done in the config? If so, I can try to find some time to create a PR.
meta_mapping:
      data_tier:
        - match: "Bronze"
          operation: "add_term"
          config:
            term: "Bronze"
        - match: "Gold"
          operation: "add_term"
          config:
            term: "Gold"
        - match: "Silver"
          operation: "add_term"
          config:
            term: "Silver"
Here, the first entry that matches would have its operation executed and the others would be ignored. There are many ways to achieve the same thing, which is why I think it would be worthwhile for the core team to spend some time evaluating whether this is in line with DataHub's vision. I can probably implement the array solution if needed.
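The proposed "first match wins" semantics can be sketched in a few lines of Python. This is a hypothetical illustration of the proposal, not an existing DataHub API: `first_matching_rule` and `DATA_TIER_RULES` are made-up names, the sketch assumes `match` is a regular expression that must match the whole value, and the trailing `.*` rule with its `Unclassified` term is invented purely to show how list order acts as a priority.

```python
import re
from typing import Any, Dict, List, Optional

# Proposed layout: each meta key maps to an ordered list of rules.
# Assumption: "match" is a regex that must match the whole value.
# The trailing ".*" rule and its "Unclassified" term are hypothetical,
# added only to show that list order acts as a priority.
DATA_TIER_RULES: List[Dict[str, Any]] = [
    {"match": "Bronze", "operation": "add_term", "config": {"term": "Bronze"}},
    {"match": "Gold", "operation": "add_term", "config": {"term": "Gold"}},
    {"match": "Silver", "operation": "add_term", "config": {"term": "Silver"}},
    {"match": ".*", "operation": "add_term", "config": {"term": "Unclassified"}},
]


def first_matching_rule(
    value: str, rules: List[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return the first rule whose pattern matches value; later rules are ignored."""
    for rule in rules:
        if re.fullmatch(rule["match"], value):
            return rule
    return None


if __name__ == "__main__":
    for tier in ["Gold", "Silver", "Copper"]:
        rule = first_matching_rule(tier, DATA_TIER_RULES)
        print(tier, "->", rule["operation"], rule["config"]["term"])
```

Because evaluation stops at the first hit, "Copper" only reaches the catch-all rule after the three specific tiers have been tried.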
This makes sense to me; that way, the order of the matches can also serve as a priority list. cc: @big-carpet-38439
Hey @modern-monitor-81461: this can already be done as follows. It looks like our docs on this got lost in the shuffle.
meta_mapping:
  data_tier:
    match: "Bronze|Silver|Gold"
    operation: "add_term"
    config:
      term: "{{ $match }}"
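The idea behind the single-rule config is that the regex alternation picks out the tier and the `{{ $match }}` placeholder is filled with the matched text. The following is a minimal Python sketch of that idea, not DataHub's implementation: `resolve_term` is a hypothetical helper, and it assumes the whole value must match the pattern (`re.fullmatch`); DataHub's own matching rules may differ (for example, prefix matching).

```python
import re
from typing import Optional

# Placeholder used in the config above; in this sketch it is replaced with
# the text the regex matched.
MATCH_PLACEHOLDER = "{{ $match }}"


def resolve_term(value: str, pattern: str, term_template: str) -> Optional[str]:
    """Return the term for a meta value, or None if the pattern does not match.

    Assumption: the whole value must match the pattern (re.fullmatch).
    """
    match = re.fullmatch(pattern, value)
    if match is None:
        return None
    return term_template.replace(MATCH_PLACEHOLDER, match.group(0))


if __name__ == "__main__":
    for tier in ["Bronze", "Silver", "Gold", "Platinum"]:
        print(tier, "->", resolve_term(tier, "Bronze|Silver|Gold", "{{ $match }}"))
```

One rule thus covers all three tiers, and a value outside the alternation (here "Platinum") produces no term at all.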