# ask-anything
e
what is triggering this error? (what command are you running?)
try this:
```shell
conda update ploomber-core -c conda-forge
```
m
I tried to build an SQL pipeline, but even when I type `ploomber scaffold` at the command prompt I get the error
I have done that update command, even with `ploomber>=0.22`
It stays at 0.21
e
the error is coming from another package, `ploomber-core`, so `conda update ploomber-core -c conda-forge` should fix it
m
Ah… I missed the core
e
no worries - it should update automatically when upgrading ploomber, so I'm unsure what happened
the other thing is that upgrading ploomber should take you to 0.22, did you add the `-c conda-forge` part?
m
Yes…
e
alright, let me run a few commands to see if I can reproduce the issue
ok, so if you wanna stick to ploomber 0.21, you can downgrade ploomber-core:
```shell
conda install ploomber-core=0.1.2 -c conda-forge
```
m
```
ploomber           0.21.9   pypi_0        pypi
ploomber-core      0.1.2    pypi_0        pypi
ploomber-engine    0.0.19   pyhd8ed1ab_0  conda-forge
ploomber-scaffold  0.3.1    pyhd8ed1ab_0  conda-forge
```
after running:
```shell
mamba update -n base "ploomber>=0.22" -c conda-forge
```
e
ok, so that combination (ploomber 0.21.9 with ploomber-core 0.1.2) should work. what's weird is that it's not getting you to ploomber 0.22. I ran it with conda and it worked, let me try with mamba
yeah, I'm able to get 0.22:
```shell
conda create --name test3 python=3.10 -c conda-forge
conda activate test3
mamba update 'ploomber>=0.22' -c conda-forge
ploomber --version

ploomber, version 0.22.0
```
do you still have the error? using ploomber-core 0.1.2 should fix it. you're not missing anything in 0.22 anyway, we just deprecated some unused APIs
m
let me check
I will report back tomorrow. I was in meetings all day
e
No problem!
m
Okay, I installed ploomber-core=0.1.2 and was able to scaffold a project. Thx
e
great. glad it's working now! still unsure why you're unable to get ploomber 0.22 but feel free to post any other questions if you have issues!
m
What does `fatal: bad revision 'HEAD'` mean when I run `ploomber status`?
e
Are you using git tags in your pipeline.yaml? Like `{{git}}`
m
No…
I do have an env.yaml file though
e
Which version are you running? This might be a bug
m
I have never built this pipeline
```
ploomber           0.22.0   pyhd8ed1ab_0  conda-forge
ploomber-core      0.1.2    pyhd8ed1ab_0  conda-forge
ploomber-engine    0.0.19   pyhd8ed1ab_0  conda-forge
ploomber-scaffold  0.3.1    pyhd8ed1ab_0  conda-forge
```
e
Yeah. I suspect this is a bug. Try downgrading to 0.21, I’ll try to reproduce it later today
m
Okay…
So `ploomber=0.21.0`?
e
Any 0.21.x, I think the latest one was 0.21.9. another option would be to delete your env.yaml temporarily, I think that's what's causing the issue
m
Let me try the downgrade
Should I still force `ploomber-core=0.1.2` with the downgrade?
e
Yeah
m
The downgrade did not fix it…
e
Ok, I’ll try to reproduce it and follow up.
m
Thanks…
Eliminating the env.yaml did not help…
e
Last quick fix I can think of is creating a git repo: run `git init` in the same folder as your pipeline.yaml and then commit. You can delete it afterwards, but I suspect the bug is because there isn't one
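That quick fix can be sketched as follows (a temp directory and an empty file stand in for your project folder and your actual pipeline.yaml; the user name, email, and commit message are placeholders):

```shell
cd "$(mktemp -d)"      # stand-in for your project folder
touch pipeline.yaml    # stand-in for your actual pipeline.yaml
git init -q
git add -A
git -c user.name=you -c user.email=you@example.com commit -qm "temp commit"
git rev-parse HEAD     # HEAD now resolves to a commit hash
# delete the repo afterwards if you don't want to keep it: rm -rf .git
```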
actually, I should've asked. do you have a git repository in that project?
m
Yes…
e
alright. reproduced the issue! the status table is still visible right? I'm seeing this:
```
Loading pipeline...
fatal: bad revision 'HEAD'
100%|█████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 15101.00it/s]
name      Last run          Outdated?               Product                          Doc (short)                 Location
--------  ----------------  ----------------------  -------------------------------  --------------------------  ------------------------------------------
get       Has not been run  Source code             File('output/get.parquet')       Get data                    /Users/eduardo/Desktop/test/ml/tasks.py:6
features  Has not been run  Source code & Upstream  File('output/features.parquet')  Generate new features from  /Users/eduardo/Desktop/test/ml/tasks.py:20
                                                                                     existing columns
join      Has not been run  Source code & Upstream  File('output/join.parquet')      Join raw data with          /Users/eduardo/Desktop/test/ml/tasks.py:29
                                                                                     generated features
fit       Has not been run  Source code & Upstream  MetaProduct({'model':            Train a model               /Users/eduardo/Desktop/test/ml/fit.py
                                                    File('output/model.pickle'),
                                                    'nb': File('output/nb.html')})
```
m
Yes… It seems to run even after the error report
e
looks like the error isn't causing any issues (apart from the error message). the problem happens if you have uncommitted changes or if you move into an older commit. I'll fix it and push a new update but it shouldn't cause any issues
thanks for reporting this!
m
I do have uncommitted changes
e
ok, so yeah, this is a weird interaction between git and ploomber, but it won't cause any issues
m
Thanks…
to confirm, I committed my changes and the error went away
e
cool, thanks for confirming
m
What does this error mean: `teradatasql.OperationalError: 1 is not a valid connection pool handle`?
The query did run…
e
looks like a sqlalchemy issue (a third-party package we use for sql connections). i think sqlalchemy 2 broke a bunch of stuff, so try downgrading. otherwise, there might be a problem in the way you're establishing the connection to the db
m
```
sqlalchemy          1.4.32
teradatasqlalchemy  17.0.0.3
```
connection string: `teradatasql://<user>:***@<server_name>/?logmech=LDAP`
e
can you show your clients.py file? (you can delete any sensitive info)
m
```python
"""Ploomber helper"""
from ploomber.clients import SQLAlchemyClient
from data_mngt_tools.context import Context


def get_client(params=None):
    """
    Get the SQLAlchemy URI for a database.

    Parameters
    ----------
    context_root_path : str
        The path to the main context file.
    db_connection : str
        The name of the connection to get the url for.
    """
    ctx = Context(root_path=params['ctx_root_path'], init_db_engines=False)
    conns = ctx.connections.to_dict()
    return SQLAlchemyClient(conns[params['db_connection']]['url'])
```
e
based on this, it seems like the teradata db driver complains when calling `.close()`. you said the query ran, right? if so, I think this is happening at the end of your pipeline execution, so it shouldn't cause issues. if you want to know for sure whether that's the case, run in the terminal:
```shell
ploomber interact
```
this will load a Python session, then:
```python
dag.build(close_clients=False)
```
and see if the problem disappears. you might need to pass `force=True` to `dag.build` if the pipeline is up-to-date
m
Okay… That worked. What is the syntax in the pipeline.yaml for `force=True`? I assume `dag.build` is for the Python interface?
e
you can do `ploomber build --force`; however, the `close_clients` flag is currently not exposed in the CLI, so the only way to change it is via the Python API
m
Did not work
Interactive worked, ploomber build --force did not…
e
I'm assuming that what fixed the issue was really the `close_clients` flag, but there is no way to set that one from the CLI; that's why it fails. you could get around it by doing something like this:
```python
class MySQLAlchemyClient(SQLAlchemyClient):
    def close(self):
        pass
```
then use `MySQLAlchemyClient`, this will prevent ploomber from closing the connection
m
Hmm… I am not sure where I would put this to be called.
e
in your clients.py
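To make the shape of that concrete, here's a runnable sketch of the override pattern. `DummyClient` stands in for ploomber's `SQLAlchemyClient` so the snippet runs without ploomber installed; in your real clients.py you'd subclass `SQLAlchemyClient` and have `get_client` return the subclass:

```python
class DummyClient:
    """Stand-in for ploomber.clients.SQLAlchemyClient."""

    def __init__(self):
        self.closed = False

    def close(self):
        # the real client closes the db connection here; the teradata
        # driver raises "1 is not a valid connection pool handle"
        self.closed = True


class NoCloseClient(DummyClient):
    def close(self):
        # no-op: prevents ploomber from closing the connection
        pass


client = NoCloseClient()
client.close()
print(client.closed)  # False: close() was skipped
```

The point of the override is that ploomber calls `close()` on every client at the end of a build; making it a no-op sidesteps the driver error without touching the rest of the pipeline.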
m
Ah… thanks
@Eduardo, why would Ploomber keep running a task even without changes? I have looked at the metadata file, and the only thing that changes is the timestamp. The product is a CSV file from a `pandas.read_sql` saved with `pandas.to_csv`…
Figured it out! I needed to add an `__init__.py` to the dir that has the *.py tasks…
e
i don't think the `__init__.py` would make a difference. is this a Python notebook or a Python function? maybe an upstream dependency changed? the rule to decide whether to run a task or not is: either the code changed or an upstream dependency changed
m
No upstream… The product is from a `pandas.read_sql` to csv…
I have an initial task that deletes the product file and its metadata for subsequent tasks that extract data from a database. Because there is no easy way to tell whether an SQL source has changed, the initial task has a params entry for the data sources that have changed. But Ploomber does not run the tasks that have their product and metadata files removed. I am guessing Ploomber decides upfront which tasks to rerun before any task runs. Is there a way around this?
e
i think the simplest way is to write a factory function. in the function's body you can run the logic to determine if the SQL source has changed; if so, delete the metadata/products, then initialize the spec and return the dag object
m
When I run with this pipeline.py:
```python
# imports reconstructed for completeness
from typing import Dict

import yaml
from ploomber import with_env
from ploomber.spec import DAGSpec


@with_env('env.yaml')
def reset_tasks(env):
    """
    Reset tasks and return DAG.

    Examples
    --------
    The env['tracking_file'] file is a list of task_name: bool entries.
    If True, the task will be reset. Any entry that is True is set to
    False and the file is rewritten.

    Execute in the terminal:

        ploomber build -e pipeline.reset_tasks
    """
    dag = DAGSpec('pipeline.yaml', env=dict(env)).to_dag()

    with open(env['tracking_file'], encoding='UTF-8') as file:
        tracking: Dict[str, str] = yaml.safe_load(file)

    for task_name, reset in tracking.items():
        if reset:
            print('*** resetting:', task_name)
            dag[task_name].product.delete()
            tracking[task_name] = False

    with open(env['tracking_file'], 'w', encoding='UTF-8') as file:
        yaml.dump(tracking, file)

    return DAGSpec('pipeline.yaml', env=dict(env)).to_dag()
```
I get:
```
UserWarning: The following placeholders are declared in the environment but unused in the spec: {'cwd', 'root', 'now', 'here', 'git_hash', 'git', 'tracking_file', 'user'}
```
Also, I tried the example and found I had to delete the file first, then just return a new DAG
e
are you getting any issues? because you can ignore the warning
m
No issues, just the warning… But, the example did not work. That took a bit to figure out
e
ah i see. if you can open an issue and comment on your solution to help other members of our community, that'd be great!
m
Open the issue in github? And I'm not sure where you want me to comment.
e
yep, issue on github!