@Andy Dang we’re shooting for a Hamilton DQ release next week. Would love any feedback from you or @Jamie Broomall on what’s proposed in https://github.com/stitchfix/hamilton/pull/147. We don’t need a full whylabs implementation for feedback — more just someone taking an hour or so to see if they could write something like what we did for pandera in https://github.com/stitchfix/hamilton/pull/147. Wouldn’t want to push something we need to walk back at some point 🙂
I'm looking to optimize performance on Postgres.
I've just joined a team that's doing ETL with pandas into Postgres. We pull from a series of internal APIs and deploy a simple Docker image that runs all of our Python pipeline files.
The design is: pandas builds the initial de-normalized table in Postgres, and this serves as the base table. Aggregates are then built on top of it via Postgres materialized views, written in SQL for specific dashboarding use cases.
One py file pulls the data, applies transforms with pandas, and then refreshes the relevant materialized views.
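For anyone skimming, here's a minimal sketch of what that one py file looks like — the connection string, API endpoint, and table/view names are all hypothetical placeholders, not our actual code:
```python
import pandas as pd
import requests
from sqlalchemy import create_engine, text

# Placeholder connection string.
engine = create_engine("postgresql+psycopg2://user:pass@host:5432/warehouse")

def run_pipeline() -> None:
    # 1. Pull from an internal API (placeholder endpoint).
    records = requests.get("https://internal.api/orders", timeout=30).json()

    # 2. Apply transforms with pandas.
    df = pd.DataFrame(records)
    df["order_date"] = pd.to_datetime(df["order_date"])

    # 3. Load the de-normalized base table, replacing it wholesale.
    df.to_sql("orders_denormalized", engine, if_exists="replace", index=False)

    # 4. Refresh the materialized views that sit on top of the base table.
    with engine.begin() as conn:
        conn.execute(text("REFRESH MATERIALIZED VIEW daily_orders_mv"))

if __name__ == "__main__":
    run_pipeline()
```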
I am curious to hear what you think of this setup. First, why do we need the de-normalized table at all? Second, what are we gaining by using materialized views rather than plain tables in Postgres? I come from the Snowflake world, where MVs are consistently auto-refreshed, but where querying a view instead of a table carries a significant performance penalty.
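To make the comparison concrete, here's a sketch of the two options in Postgres as I understand them (names are hypothetical, carried over from the sketch above). The key difference from Snowflake: a Postgres MV is a stored snapshot that is never auto-refreshed, so it queries like a table but goes stale until the pipeline explicitly refreshes it.
```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@host:5432/warehouse")

MV_DDL = """
-- Option A: a materialized view. Stored like a table, so queries are fast,
-- but Postgres never refreshes it on its own.
CREATE MATERIALIZED VIEW IF NOT EXISTS daily_orders_mv AS
SELECT order_date::date AS day, sum(amount) AS revenue
FROM orders_denormalized
GROUP BY 1;
"""

TABLE_DDL = """
-- Option B: the equivalent plain table. Same read performance once
-- populated, but you own the truncate/reload (or upsert) logic yourself.
CREATE TABLE IF NOT EXISTS daily_orders AS
SELECT order_date::date AS day, sum(amount) AS revenue
FROM orders_denormalized
GROUP BY 1;
"""

with engine.begin() as conn:
    conn.execute(text(MV_DDL))
    # The pipeline must trigger the refresh explicitly. Adding CONCURRENTLY
    # avoids blocking readers during the refresh, but requires a unique
    -- index on the MV.
    conn.execute(text("REFRESH MATERIALIZED VIEW daily_orders_mv"))
```
So with an MV the refresh logic is one statement, while a table buys you incremental-update flexibility at the cost of writing that logic yourself — curious whether that matches how you'd weigh it.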