# hamilton-help
e
Yoinks! Checking -- did you copy the code from the post or fork/clone the repo? I have not seen that error, but we have seen folks succeed when they start with this: https://github.com/stitchfix/hamilton/tree/main/examples/dbt
s
Yup, I'm using that repo using the example DBT data. Using these instructions. I figure I've got something amiss in my Python install. I'll come at it from that angle.
e
Hmm strange -- yeah I'd try a fresh python env, then we can ask on the dbt slack --- the FAL people have been particularly helpful
gratitude thank you 1
s
The people on DBT's Slack channel are awesome indeed. They've provided great help to me over several months. Reinstalled Python from scratch to 3.11.0. Then uninstalled & reinstalled everything in `Requirements.txt` (I added some stuff):
dbt-core
dbt-snowflake
dbt-fal
dbt-python
scikit-learn
sf-hamilton
dataclasses
numpy
pandas
typing_inspect
Pandas is installed but `dbt run` is giving an error "No module named 'pandas'". The error now:
18:41:15  Running with dbt=1.3.1
18:41:15  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 303 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
18:41:15
18:41:17  Concurrency: 4 threads (target='dev')
18:41:17
18:41:17  1 of 2 START sql table model HAMILTON.raw_passengers ........................... [RUN]
18:41:19  1 of 2 OK created sql table model HAMILTON.raw_passengers ...................... [SUCCESS 1 in 1.95s]
18:41:19  2 of 2 START python table model HAMILTON.train_and_infer ....................... [RUN]
18:41:21  2 of 2 ERROR creating python table model HAMILTON.train_and_infer .............. [ERROR in 1.26s]
18:41:21
18:41:21  Finished running 2 table models in 0 hours 0 minutes and 5.23 seconds (5.23s).
18:41:21
18:41:21  Completed with 1 error and 0 warnings:
18:41:21
18:41:21  Database Error in model train_and_infer (models\train_and_infer.py)
18:41:21    100357 (P0000): Python Interpreter Error:
18:41:21    Traceback (most recent call last):
18:41:21      File "_udf_code.py", line 5, in <module>
18:41:21    ModuleNotFoundError: No module named 'pandas'
18:41:21     in function TRAIN_AND_INFER__DBT_SP with handler main
18:41:21    compiled Code at target\run\cq4ds\models\train_and_infer.py
18:41:21
18:41:21  Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2
e
yeah, agreed! And oi -- pandas should really be there 😕 Running locally to see if I can reproduce...
s
There's got to be something funky in my environment. I appreciate you trying but I don't expect this will repro for you. I'm just wondering what I'm missing: is it the dependencies for Hamilton? For DBT? If for DBT, I've not installed any additional packages in its `packages.yml`; I'm wondering if I need to have pandas installed into DBT itself?
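One quick way to rule out the local install is to check which of the expected modules the active interpreter can actually import. A minimal sketch, assuming the import names below (`hamilton` for `sf-hamilton`, `sklearn` for `scikit-learn`); the helper name `find_missing_modules` is made up for illustration. Note this only inspects the local environment, not the Python runtime Snowflake uses for the model.

```python
import importlib.util
import sys


def find_missing_modules(module_names):
    """Return the module names that the current interpreter cannot import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip names: sf-hamilton -> hamilton, scikit-learn -> sklearn.
    expected = ["pandas", "numpy", "sklearn", "hamilton", "dbt"]
    print("Interpreter:", sys.executable)
    print("Missing locally:", find_missing_modules(expected) or "none")
```

If nothing is reported missing locally and `dbt run` still fails, the gap is most likely on the warehouse side.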
e
Either something with your environment or we missed a file, which would be weird as we've had people run this successfully...
Also I'm seeing an error where the duckdb version changed and the data we have is not supported
s
part of my issue is i uploaded the hamilton test data to snowflake and am using that. i think what's breaking is DBT doesn't know what pandas is -- I think I need to configure DBT for Python. Let me tinker and when I find the issue I'll let you know what it was. Please don't waste your time on this; it's likely just my install. Thanks!
e
Ahh nice -- and I want the example to work!
Although I agree it's likely snowflake in this case messing with the environments
s
If there exist any "official" install instructions for Hamilton beyond the link I posted above, that might be helpful.
e
OK, so I think this is what you're running into: https://github.com/dbt-labs/dbt-snowflake/issues/228
And re: official documentation -- you mean for hamilton in general, right?
๐Ÿ‘ 1
s
i saw that github page too 🙂
😆 1
This is because in materialization code we use pandas to determine whether the returned dataframe is a snowpark dataframe or a pandas dataframe, and do slightly different things based on that
e
Feels like you're one step ahead of me here 🙂 Gitbooks is a good place to get started -- otherwise we have tons of documentation in README + examples: https://github.com/stitchfix/hamilton. And if you feel like we're missing something, we're all ears! Would love to add more -- we've been meaning to improve it for a while.
But otherwise, I have to hop into a meeting soon, and can look more at this afterwards!
s
Thanks for your help! I'll post back when I discover anything interesting.
e
Awesome! You got it.
s
gratitude thank you 1
s
Right now, I'm thinking this may be related to having Anaconda "accepted" by snowflake, and looking into that. unfortunately the one page with my current issue doesn't go into detail on what exactly this means:
Always add pandas as the required dependencies.
One doesn't need to install Python/Pandas with DBT directly in `packages.yml`.
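In dbt Python models, packages for the warehouse-side runtime are declared in the model itself through `dbt.config(packages=[...])`, not in `packages.yml`. A minimal sketch, assuming a model that simply passes through the upstream `raw_passengers` table; the real `train_and_infer` model in the example does more than this.

```python
def model(dbt, session):
    # Ask Snowflake to provide these packages (from its Anaconda channel)
    # when the generated stored procedure runs.
    dbt.config(materialized="table", packages=["pandas"])

    # dbt.ref returns a DataFrame for the upstream model
    # (a Snowpark DataFrame when running on Snowflake).
    passengers = dbt.ref("raw_passengers")
    return passengers
```

Only packages that Snowflake can resolve on its side can be listed this way; anything else has to reach the warehouse by another route.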
s
ok cool -- yeah Hamilton is available on conda if that helps.
s
I was able to get `dbt run` working with the sample data from Hamilton with DuckDB, instead of with Snowflake. (I have accepted the terms for Anaconda in Snowflake.) This leads me to believe the issue is the way I have Snowflake configured and will try to work through that.
There are several Snowflake-specific steps and it requires Python 3.8; I'm running 3.11.0.
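Since the Snowflake steps call for Python 3.8, a small guard can flag a mismatched local interpreter early. This is only a sketch; the function name `is_supported_python` and the default `(3, 8)` target are assumptions taken from the version mentioned above.

```python
import sys


def is_supported_python(version_info=None, required=(3, 8)):
    """Return True if the interpreter's major.minor version equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {current} matches the required 3.8")
    else:
        print(f"Python {current} does not match the required 3.8")
```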
e
Nice! OK, so the issue is with snowflake -- would love to hear what you end up doing. And we can add it to the examples!
💯 1
s
Once I get this working, I'll provide a procedure. Really appreciate y'all's help!
gratitude thank you 1
e
Awesome! Appreciate you digging in -- good learning here!
💯 1
s
The issue I'm having now is Snowflake is not recognizing all of the packages (e.g., `pandas`, `snowflake-snowpark-python`, `hamilton`) needed to run the query. `dbt run` compiles the `.py` script and sends it to Snowflake where it will fail with errors like:
Traceback (most recent call last):
  File "_udf_code.py", line 6, in <module>
ModuleNotFoundError: No module named 'python_transforms'
 in function TRAIN_AND_INFER__DBT_SP with handler main
According to various documentation, Snowflake should install the packages at run-time, but no matter what I try in the Snowflake UI, it will not recognize some of the packages, like `hamilton` or `sf-hamilton`. I've accepted the Anaconda stuff in Snowflake. I've installed the required packages locally with `pip install`. This is very new functionality so there's not much out there on troubleshooting running Python on Snowflake. Ideas? The first part of the script that gets run in Snowflake:
e
Ok, so it appears that the problem is that it can't find the local packages... But it doesn't look like it has problems finding pandas?
s
Snowflake can detect `pandas` and `snowflake-snowpark-python` but other packages like `hamilton` it cannot see.
e
OK -- just checking, are you marking `hamilton` as `hamilton` or `sf-hamilton` (the pypi name)?
I think this thread: https://getdbt.slack.com/archives/C03QUA7DWCW/p1663699555520659 shares something similar if you haven't seen it
s
adding `hamilton` as a "package" in the above code yields:
100357 (P0000): Cannot create a Python function with the specified packages. Please check your packages specification and try again.
I've tried both `hamilton` and `sf-hamilton`
thanks, looking thru that dbt slack thread
s
@Seth Terrell it should be `sf-hamilton` (pypi) and https://anaconda.org/conda-forge/sf-hamilton
✅ 1
e
s
i'm aware of it but haven't explored it relative to this.
e
I found it clicking through links in the help threads
s
Actually you may want this one -- https://anaconda.org/hamilton-opensource/sf-hamilton since it includes different architectures… 🤔 it's also on my TODO to get an example running on snowpark this month…
👍 1
s
this is very close to working, i'm just missing something. i'm learning a lot in this process for sure 🙂
e
Awesome! Yeah, if you're still stuck in a bit we can start a thread in the dbt slack to go over this -- they'll probably have a little more insight. You need both custom packages (code) and python packages. That said, you can define your functions inline, but it will be really ugly...
s
i'll start with that thread you linked, @Elijah Ben Izzy, thank you.
๐Ÿ‘ 1
I bet that's it, per that thread: Snowflake doesn't automatically recognize the third-party
sf-hamilton
package and it'll likely need to be installed manually. I'd suspected as much but I'll need to work through uploading and installing.
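Snowflake lists the third-party Python packages it can provide in the `INFORMATION_SCHEMA.PACKAGES` view, so one way to confirm this is to compare the model's requirements against that list. A rough sketch, assuming the rows have already been fetched as `(package_name, version)` tuples by whatever client is in use; `missing_from_snowflake` is an illustrative helper, not part of any library, and the sample rows and versions are placeholders.

```python
# Query to run in Snowflake (worksheet or connector) to list available Python packages.
PACKAGES_QUERY = (
    "select package_name, version "
    "from information_schema.packages "
    "where language = 'python'"
)


def missing_from_snowflake(required, available_rows):
    """Return required package names that do not appear in the fetched rows."""
    available = {name.lower() for name, _version in available_rows}
    return [pkg for pkg in required if pkg.lower() not in available]


if __name__ == "__main__":
    # Illustrative rows only; real rows come from running PACKAGES_QUERY.
    sample_rows = [("pandas", "1.4.4"), ("snowflake-snowpark-python", "1.0.0")]
    required = ["pandas", "snowflake-snowpark-python", "sf-hamilton"]
    print(missing_from_snowflake(required, sample_rows))
```

Any name this reports as missing cannot be requested through the model's `packages` list and would need to be supplied some other way.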
e
Also, sorry, I just looked you up on LI (creepily) and see that you're working at Literati! Worked closely with Doug and Daragh in the past.
๐Ÿ‘ 1
s
Yup, Doug is the one who suggested Hamilton to me. Both he and Daragh started several months ago: cool, very smart guys. 🙂 I'm their Data Engineer that built an EDW in DBT on Snowflake to support some of their DS needs. (I am not a Data Scientist! 😄)
e
Oh awesome! Yeah, loved working with them. Always looking for feedback -- would love to have "used by Literati" on the OS page 😆 We'll roll out the red carpet for y'all and your Hamilton needs. Happy to meet up and discuss approaches if you're ever interested -- getting Hamilton + snowflake to behave happily together is definitely high value for OS adoption.
💯 1
s
Happy to help out as we can! It'll be cool to get this working.
🔥 1
Status report: I was able to get Hamilton's demo data working with DuckDB but cannot get it working in Snowflake directly. I think this is due to the `sf-hamilton` package not being part of the supported Anaconda packages, and it may not be readily possible to add additional third-party packages in Snowflake (see comment here). It may be possible to import the package as `.tar.gz` but I've been unable to get that working in a Python function created in Snowflake. I've tried various paths to get this working but haven't been successful. At this point we're looking into DBT's out-of-the-box Python functionality. If we can get that functionality working, I think it'll become apparent to us how Hamilton could take us to the next step. But right now, I think we need to get something simple working with respect to getting Python executable in Snowflake, with the models managed in our DBT project. I greatly appreciate everyone here's great help and attitude! Hopefully we'll be circling back to Hamilton in the near future: it's really pretty cool (Snowflake packages notwithstanding). Thank you!
e
Understood! We'll look into getting it into snowflake's anaconda packages. As @Stefan Krawczyk said, Hamilton should be able to fit nicely with this, it's just a matter of getting the packages supported. Happy to help, looking forward to hearing more from you!
✅ 1