average-finland-92144
02/27/2025, 6:24 PM
broad-monitor-993
02/28/2025, 1:48 AM
broad-monitor-993
03/06/2025, 1:44 PM
[…] pandas and numpy dependency from pandera so that folks who want to use it with polars or pyspark don’t have to install pandas as well. It does introduce a breaking change in the way some users might install pandera: users who counted on pandera to also install pandas (which is not recommended anyway) will have to explicitly install pandas in their environment.
Would appreciate any thoughts/concerns here. The plan is to add this to the 0.24.0 release and just make sure that the docs and changelog provide guidance on the breaking change.
broad-monitor-993
03/13/2025, 2:24 AM
gentle-toddler-32466
03/14/2025, 6:15 PM
Is there a developer’s guide that covers testing, other than the contributing section of the docs?
• Is running make nox-tests locally intended to be equivalent to running the ci-tests on github?
  ◦ It looks like this is the case, but testing matrices in noxfile.py and tests.yaml need to be kept in sync manually.
• doc-string tests...
  ◦ what is the proper way to run them locally? pytest --doctest-modules pandera?
  ◦ It looks like they are currently disabled in CI, -name: Check Docs is commented out.
  ◦ Is there already a mechanism in place to skip doc tests for optional-extras which are not installed?
• running make nox-tests failed when I created a new pandera-dev environment in conda with error ValueError("No backends present, looked for ('uv',)."). Installing UV with pip install uv fixed the problem. Does it need to be added to the dev or test dependencies?
broad-monitor-993
03/14/2025, 9:01 PM
> Is there a developer’s guide that covers testing, other than the contributing section of the docs?
That’s the only guide, can you add improvements to that same doc?
> It looks like this is the case, but testing matrices in noxfile.py and tests.yaml need to be kept in sync manually.
correct. Any improvements to this welcome!
> what is the proper way to run them locally? pytest --doctest-modules pandera
I typically do make docs
> It looks like they are currently disabled in CI, -name: Check Docs is commented out.
Yeah I forget what exact errors I saw on CI, need to uncomment those and see what happens
> Is there already a mechanism in place to skip doc tests for optional-extras which are not installed?
nope, improvements there welcome!
> installing UV with pip install uv fixed the problem. Does it need to be added to the dev or test dependencies?
yeah let’s add it
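On skipping doc tests for optional extras that are not installed: a minimal sketch, not pandera's actual setup, of how a conftest.py might do it. The mapping OPTIONAL_EXTRA_GLOBS, its file globs, and the helper names are hypothetical; the only real pieces assumed here are importlib.util.find_spec and pytest's collect_ignore_glob conftest variable.

```python
# conftest.py (sketch): skip doctest collection for modules whose optional
# dependency is not installed. The globs below are hypothetical examples.
from importlib.util import find_spec

# Hypothetical mapping: optional package -> file globs that need it.
OPTIONAL_EXTRA_GLOBS = {
    "polars": ["pandera/*polars*"],
    "pyspark": ["pandera/*pyspark*"],
}


def is_installed(package: str) -> bool:
    """Return True if the top-level package can be found for import."""
    return find_spec(package) is not None


def globs_to_ignore(extra_globs, installed=is_installed):
    """Collect the globs of every optional package that is missing."""
    ignored = []
    for package, globs in extra_globs.items():
        if not installed(package):
            ignored.extend(globs)
    return ignored


# pytest reads this module-level name from conftest.py and skips collecting
# any path that matches one of the globs (relative to this file's directory).
collect_ignore_glob = globs_to_ignore(OPTIONAL_EXTRA_GLOBS)
```

With a file like this at the repository root, pytest --doctest-modules pandera would leave out the polars modules in an environment where polars is missing, instead of failing at import time.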
quick-bird-504
03/18/2025, 1:22 PM
import datetime

import numpy as np
import pandas as pd
import pandera as pa
from pandera import Column, Index, dtypes
from pandera.engines import pandas_engine


@pandas_engine.Engine.register_dtype
@dtypes.immutable
class PythonDatetime(pandas_engine.DateTime):
    def coerce(self, series):
        # Parse to datetime64 (unparseable values become NaT),
        # then convert to Python datetime objects.
        return pd.to_datetime(series, errors='coerce').dt.to_pydatetime()


COERSE_DO_DATETIME = pa.DataFrameSchema(
    {
        'date': Column(PythonDatetime(), nullable=True),
    },
    index=Index(int),
    strict=True,
    coerce=True
)


def validate_dates(data):
    # Validate inside a function so the result can be returned to the caller.
    try:
        print('Pandera validation started')
        validated_data = COERSE_DO_DATETIME.validate(data)
    except pa.errors.SchemaErrors as e:
        print(e)
        return None
    return validated_data


data = pd.DataFrame({'date': [np.datetime64('2025-01-01'), np.datetime64('NaT')]})
validate_dates(data)
broad-monitor-993
03/20/2025, 1:22 PM
wonderful-piano-5966
03/21/2025, 8:22 PM
import pandera.polars as pla
import polars as pl
from typing import Optional
from pandera.engines.polars_engine import Struct

nested_struct = {
    "col1": pl.Utf8,
    "col2": Optional[pl.Utf8]
}
top_level_struct = {
    "column": pl.Struct(nested_struct)
}

class ReproModel(pla.DataFrameModel):
    column: Optional[Struct] = pla.Field(
        nullable=True,
        dtype_kwargs={ "fields": nested_struct }
    )

df = (
    pl.DataFrame()
    .with_columns(
        pl.struct(
            pl.lit('some string').alias("col1"),
            pl.lit('some other string').alias("col2")
        ).alias("column")
    )
)
df2 = (
    pl.DataFrame()
    .with_columns(
        pl.struct(
            pl.lit('some string').alias("col1")
        ).alias("column")
    )
)

# these both print correctly
print(df)
print(df2)
print(top_level_struct)

ReproModel.validate(df)   # ok
ReproModel.validate(df2)  # should be ok, but throws exception
agreeable-school-21279
04/25/2025, 8:20 PM
from pandera import Field
from pandera.typing import Series
import pandera as pn
import pyarrow as pa

class Position(pn.DataFrameModel):
    x: Series[pa.float32] = Field(default=0.0)  # m
    y: Series[pa.float32] = Field(default=0.0)  # m
    z: Series[pa.float32] = Field(default=0.0)  # m

Position.example(size=10)
I am getting the following error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/anaconda3/lib/python3.11/site-packages/pandera/engines/pandas_engine.py:288, in Engine.numpy_dtype(cls, pandera_dtype)
287 try:
--> 288 return np.dtype(alias)
289 except TypeError as err:
TypeError: data type 'float[pyarrow]' not understood
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
File ~/anaconda3/lib/python3.11/site-packages/pandera/strategies/pandas_strategies.py:350, in to_numpy_dtype(pandera_dtype)
349 try:
--> 350 np_dtype = pandas_engine.Engine.numpy_dtype(pandera_dtype)
351 except TypeError as err:
File ~/anaconda3/lib/python3.11/site-packages/pandera/engines/pandas_engine.py:290, in Engine.numpy_dtype(cls, pandera_dtype)
289 except TypeError as err:
--> 290 raise TypeError(
291 f"Data type '{pandera_dtype}' cannot be cast to a numpy dtype."
292 ) from err
TypeError: Data type 'float[pyarrow]' cannot be cast to a numpy dtype.
...
358 ) from err
360 if np_dtype == np.dtype("object") or str(pandera_dtype) == "str":
361 np_dtype = np.dtype(str)
TypeError: Data generation for the 'float[pyarrow]' data type is currently unsupported.
rhythmic-boots-31361
05/01/2025, 3:32 PM
rhythmic-boots-31361
05/01/2025, 3:46 PM
powerful-horse-58724
05/28/2025, 9:19 PM
powerful-horse-58724
05/28/2025, 9:19 PM
cool-nest-98527
06/25/2025, 8:00 PM
broad-monitor-993
07/08/2025, 8:28 PM
nutritious-piano-11388
07/14/2025, 3:47 PM
average-finland-92144
08/01/2025, 3:07 PM
few-electrician-9464
08/01/2025, 5:02 PM
victorious-cpu-10033
08/05/2025, 6:34 PM
broad-monitor-993
08/05/2025, 7:06 PM
victorious-cpu-10033
08/05/2025, 7:26 PM
victorious-cpu-10033
08/05/2025, 7:27 PM
broad-monitor-993
08/06/2025, 5:23 PM