Slackbot
08/09/2023, 11:22 PM

Elijah Ben Izzy
08/09/2023, 11:24 PM
`resolve`
So, specifically:
1. Pandera can create a dataframe schema covering multiple columns.
2. If you do want `extract_columns`, you can use the `target_` parameter to point the check at the dataframe.
3. `resolve` will make it config-driven. That said, if you're validating a superset, you may just want optional columns and to validate them all.
4. If you do use `resolve`, I'd consider wrapping it in your own decorator that delegates to `resolve` (really just a function that calls `resolve`).
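As a plain-pandas sketch of point 3 (validating a superset with optional columns): in practice you'd express this as a Pandera schema, but the idea is just "required columns must be present, optional ones may be absent, extras are tolerated". The function name here is hypothetical.

```python
import pandas as pd

def validate_columns(df: pd.DataFrame, required, optional=()) -> pd.DataFrame:
    # Hypothetical stand-in for a Pandera schema check: required columns must
    # exist; optional ones may be missing; extra columns pass through untouched.
    missing = set(required) - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    return df

df = pd.DataFrame({"a": [1], "b": [2], "extra": [3]})
validated = validate_columns(df, required=["a", "b"], optional=["c"])  # 'c' may be absent
```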
OK, (2):
```python
@check_output(schema=..., target_="foo")  # tells it to check the dataframe
@extract_columns(...)
def foo() -> pd.DataFrame:
    ...
```
Elijah Ben Izzy
08/09/2023, 11:26 PM
With `resolve`, you would do something like:
```python
@resolve(
    when=ResolveAt.CONFIG_AVAILABLE,
    resolve=lambda columns_to_resolve: check_output(..., target_="foo"),
)
def foo() -> pd.DataFrame:
    ...
```
Then, if you wanted to, you could define a custom decorator:
```python
@check_columns(columns=..., target_="foo")
def foo() -> pd.DataFrame:
    ...
```
This would just delegate to `resolve`.
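The delegation pattern itself is just a function that builds and returns the inner decorator. A minimal sketch, where `my_resolve` is a stand-in for Hamilton's `resolve` (nothing below is Hamilton's actual API):

```python
# Stand-in for hamilton's resolve(...): records how to build the real
# decorator later, once config is available.
def my_resolve(*, when, resolve):
    def decorator(fn):
        fn._resolver = resolve
        return fn
    return decorator

def check_columns(columns, target_):
    # The custom decorator: delegates straight to the resolve-style factory,
    # closing over the columns it was given.
    return my_resolve(
        when="config_available",
        resolve=lambda: ("check_output", tuple(columns), target_),
    )

@check_columns(columns=["a", "b"], target_="foo")
def foo():
    ...

built = foo._resolver()  # later, this would produce the real check_output decorator
```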
Amos
08/09/2023, 11:53 PM
I use `strict="filter"`, so anything not recognised (required or optional) gets dropped at that point (easy to rejoin the cruft later if necessary). The issue is that the names of a subset of required columns can't be known in advance, so I need to pass them in and adjust the schema to pick them up. I think the solution is something like `resolve` with a custom decorator, as you suggest. Just have to wrap my head around how to do that. Maybe a coffee first.

Elijah Ben Izzy
08/09/2023, 11:56 PM
```python
def check_columns_with_my_schema(columns: List[str], df_name: str):  # maybe take in other params?
    schema = _build_schema_from_columns_and_other_params(...)  # implement this
    return resolve(
        when=ResolveAt.CONFIG_AVAILABLE,
        resolve=lambda columns: check_output(..., target_="..."),
    )
```
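One hypothetical way to flesh out the `_build_schema_from_columns_and_other_params` helper, using plain pandas rather than Pandera: the validator keeps only recognised columns (the effect of `strict="filter"`) and fails if a required one is missing. Names here are illustrative, not from the library.

```python
import pandas as pd

def build_validator(columns):
    # Hypothetical schema builder: the column names are only known at
    # configuration time, so we close over them here.
    def validate(df: pd.DataFrame) -> pd.DataFrame:
        missing = set(columns) - set(df.columns)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return df[list(columns)]  # anything not recognised gets dropped
    return validate

validate = build_validator(["a", "b"])
clean = validate(pd.DataFrame({"a": [1], "b": [2], "cruft": [3]}))
```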
Then you just have to pass in the `columns` param at configuration time.

Amos
08/09/2023, 11:59 PM

Elijah Ben Izzy
08/10/2023, 12:00 AM