# ask-anything
e
what is triggering this error? (what command are you running?)
try this:
```shell
conda update ploomber-core -c conda-forge
```
m
I tried to build an SQL pipeline, but even when I type `ploomber scaffold` at the command prompt I get the error
I have done that update command, even with `ploomber>=0.22`
It stays at 0.21
e
the error is coming from another package, `ploomber-core`, so `conda update ploomber-core -c conda-forge` should fix it
m
Ah… I missed the core
e
no worries - it should update automatically when upgrading ploomber, so I'm unsure what happened
the other thing is that upgrading ploomber should take you to 0.22, did you add the `-c conda-forge` part?
m
Yes…
e
alright, let me run a few commands to see if I can reproduce the issue
ok, so if you wanna stick to ploomber 0.21, you can downgrade ploomber-core:
```shell
conda install ploomber-core=0.1.2 -c conda-forge
```
m
```
ploomber           0.21.9   pypi_0        pypi
ploomber-core      0.1.2    pypi_0        pypi
ploomber-engine    0.0.19   pyhd8ed1ab_0  conda-forge
ploomber-scaffold  0.3.1    pyhd8ed1ab_0  conda-forge
```
after running:
```shell
mamba update -n base "ploomber>=0.22" -c conda-forge
```
e
ok, so that combination (ploomber 0.21.9 with ploomber-core 0.1.2) should work. what's weird is that it's not getting you to ploomber 0.22. I ran it with conda and it worked, let me try with mamba
yeah, I'm able to get 0.22:
```shell
conda create --name test3 python=3.10 -c conda-forge
conda activate test3
mamba update 'ploomber>=0.22' -c conda-forge
ploomber --version

ploomber, version 0.22.0
```
do you still have the error? using ploomber-core 0.1.2 should fix it. you're not missing anything in 0.22 anyway, we just deprecated some unused APIs
m
let me check
I will report back tomorrow. I was in meetings all day
e
No problem!
m
Okay, I installed ploomber-core=0.1.2 and was able to scaffold a project. Thx
e
great. glad it's working now! still unsure why you're unable to get ploomber 0.22 but feel free to post any other questions if you have issues!
m
What does `fatal: bad revision 'HEAD'` mean when I run `ploomber status`?
e
Are you using git tags in your pipeline.yaml? Like `{{git}}`
m
No…
I do have an env.yaml file though
e
Which version are you running? This might be a bug
m
I have never built this pipeline
```
ploomber           0.22.0   pyhd8ed1ab_0  conda-forge
ploomber-core      0.1.2    pyhd8ed1ab_0  conda-forge
ploomber-engine    0.0.19   pyhd8ed1ab_0  conda-forge
ploomber-scaffold  0.3.1    pyhd8ed1ab_0  conda-forge
```
e
Yeah. I suspect this is a bug. Try downgrading to 0.21, I’ll try to reproduce it later today
m
Okay…
So `ploomber=0.21.0`?
e
Any 0.21.x, I think the latest one was 0.21.9. another option would be to delete your env.yaml temporarily, I think that's what's causing the issue
m
Let me try the downgrade
Should I still force `ploomber-core=0.1.2` with the downgrade?
e
Yeah
m
The downgrade did not fix it…
e
Ok, I’ll try to reproduce it and follow up.
m
Thanks…
Eliminating the env.yaml did not help…
e
Last quick fix I can think of is creating a git repo: run `git init` in the same folder as your pipeline.yaml and then commit. You can delete it afterwards, but I suspect the bug is because there isn't one
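That quick fix can be sketched as follows (a temp directory and an empty file stand in for your project folder and your actual pipeline.yaml; the user name, email, and commit message are placeholders):

```shell
cd "$(mktemp -d)"      # stand-in for your project folder
touch pipeline.yaml    # stand-in for your actual pipeline.yaml
git init -q
git add -A
git -c user.name=you -c user.email=you@example.com commit -qm "temp commit"
git rev-parse HEAD     # HEAD now resolves to a commit hash
# delete the repo afterwards if you don't want to keep it: rm -rf .git
```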
actually, I should've asked. do you have a git repository in that project?
m
Yes…
e
alright. reproduced the issue! the status table is still visible right? I'm seeing this:
```
Loading pipeline...
fatal: bad revision 'HEAD'
100%|█████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 15101.00it/s]
name      Last run          Outdated?               Product                          Doc (short)                 Location
--------  ----------------  ----------------------  -------------------------------  --------------------------  ------------------------------------------
get       Has not been run  Source code             File('output/get.parquet')       Get data                    /Users/eduardo/Desktop/test/ml/tasks.py:6
features  Has not been run  Source code & Upstream  File('output/features.parquet')  Generate new features from  /Users/eduardo/Desktop/test/ml/tasks.py:20
                                                                                     existing columns
join      Has not been run  Source code & Upstream  File('output/join.parquet')      Join raw data with          /Users/eduardo/Desktop/test/ml/tasks.py:29
                                                                                     generated features
fit       Has not been run  Source code & Upstream  MetaProduct({'model':            Train a model               /Users/eduardo/Desktop/test/ml/fit.py
                                                    File('output/model.pickle'),
                                                    'nb': File('output/nb.html')})
```
m
Yes… It seems to run even after the error report
e
looks like the error isn't causing any issues (apart from the error message). the problem happens if you have uncommitted changes or if you move into an older commit. I'll fix it and push a new update but it shouldn't cause any issues
thanks for reporting this!
m
I do have uncommitted changes
e
ok, so yeah, this is a weird interaction between git and ploomber, but it won't cause any issues
m
Thanks…
to confirm, I committed my changes and the error went away
e
cool, thanks for confirming
m
What does this error mean: `teradatasql.OperationalError: 1 is not a valid connection pool handle`?
The query did run…
e
looks like a sqlalchemy issue (a third-party package we use for sql connections). i think sqlalchemy 2 broke a bunch of stuff, so try downgrading. otherwise, there might be a problem in the way you're establishing the connection to the db
m
```
sqlalchemy          1.4.32
teradatasqlalchemy  17.0.0.3
```
connection string: `teradatasql://<user>:***@<server_name>/?logmech=LDAP`
e
can you show your clients.py file? (you can delete any sensitive info)
m
```python
"""Ploomber helper"""
from ploomber.clients import SQLAlchemyClient
from data_mngt_tools.context import Context


def get_client(params=None):
    """
    Get the SQLAlchemy URI for a database.

    Parameters
    ----------
    context_root_path : str
        The path to the main context file.
    db_connection : str
        The name of the connection to get the url for.
    """
    ctx = Context(root_path=params['ctx_root_path'], init_db_engines=False)
    conns = ctx.connections.to_dict()
    return SQLAlchemyClient(conns[params['db_connection']]['url'])
```
e
based on this, it seems like the teradata db driver complains when calling `.close()`. you said the query ran, right? if so, I think this is happening at the end of your pipeline execution, so it shouldn't cause issues. if you want to know for sure whether that's the case, run in the terminal:
```shell
ploomber interact
```
this will load a Python session, then:
```python
dag.build(close_clients=False)
```
and see if the problem disappears. you might need to pass `force=True` to `dag.build` if the pipeline is up-to-date
m
Okay… That worked. What is the syntax in the pipeline.yaml for `force=True`? I assume `dag.build` is for the Python interface?
e
you can do `ploomber build --force`; however, the `close_clients` flag is currently not exposed in the CLI, so the only way to change it is via the Python API
m
Did not work
Interactive worked, ploomber build --force did not…
e
I'm assuming that what fixed the issue was really the `close_clients` flag, but there is no way to set that one from the CLI; that's why it fails. you could get around it by doing something like this:
```python
class MySQLAlchemyClient(SQLAlchemyClient):
    def close(self):
        pass
```
then use `MySQLAlchemyClient`, this will prevent ploomber from closing the connection
m
Hmm… I am not sure where I would put this to be called.
e
in your clients.py
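To make the shape of that concrete, here's a runnable sketch of the override pattern. `DummyClient` stands in for ploomber's `SQLAlchemyClient` so the snippet runs without ploomber installed; in your real clients.py you'd subclass `SQLAlchemyClient` and have `get_client` return the subclass:

```python
class DummyClient:
    """Stand-in for ploomber.clients.SQLAlchemyClient."""

    def __init__(self):
        self.closed = False

    def close(self):
        # the real client closes the db connection here; the teradata
        # driver raises "1 is not a valid connection pool handle"
        self.closed = True


class NoCloseClient(DummyClient):
    def close(self):
        # no-op: prevents ploomber from closing the connection
        pass


client = NoCloseClient()
client.close()
print(client.closed)  # False: close() was skipped
```

The point of the override is that ploomber calls `close()` on every client at the end of a build; making it a no-op sidesteps the driver error without touching the rest of the pipeline.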
m
Ah… thanks
@Eduardo, why would Ploomber keep running a task even without changes? I have looked at the metadata file, and the only thing that changes is the timestamp. The product is a CSV file from a `pandas.read_sql` saved with `pandas.to_csv`…
Figured it out! I needed to add an `__init__.py` to the dir that has the *.py tasks…
e
i don't think the `__init__.py` would make a difference. is this a Python notebook or a Python function? maybe an upstream dependency changed? the rule to decide whether to run a task or not is: either the code changed or an upstream dependency changed
m
No upstream… The product is from a `pandas.read_sql` to csv…
I have an initial task that deletes the product file and its metadata for subsequent tasks that extract data from a database. Because there is no easy way to tell whether an SQL source has changed, the initial task has a params entry for the data sources that have changed. But Ploomber does not run the tasks that have their product and metadata files removed. I am guessing Ploomber decides upfront which tasks to rerun before any task runs. Is there a way around this?
e
i think the simplest way is to write a factory function. in the function's body you can run the logic to determine if the SQL source has changed; if so, delete the metadata/products, then initialize the spec and return the dag object
m
When I run with this pipeline.py:
```python
# imports reconstructed for completeness
from typing import Dict

import yaml
from ploomber import with_env
from ploomber.spec import DAGSpec


@with_env('env.yaml')
def reset_tasks(env):
    """
    Reset tasks and return DAG.

    Examples
    --------
    The env['tracking_file'] file is a list of task_name: bool entries.
    If True, the task will be reset. Any entry that is True is set to
    False and the file is rewritten.

    Execute in the terminal:

        ploomber build -e pipeline.reset_tasks
    """
    dag = DAGSpec('pipeline.yaml', env=dict(env)).to_dag()

    with open(env['tracking_file'], encoding='UTF-8') as file:
        tracking: Dict[str, str] = yaml.safe_load(file)

    for task_name, reset in tracking.items():
        if reset:
            print('*** resetting:', task_name)
            dag[task_name].product.delete()
            tracking[task_name] = False

    with open(env['tracking_file'], 'w', encoding='UTF-8') as file:
        yaml.dump(tracking, file)

    return DAGSpec('pipeline.yaml', env=dict(env)).to_dag()
```
I get:
```
UserWarning: The following placeholders are declared in the environment but unused in the spec: {'cwd', 'root', 'now', 'here', 'git_hash', 'git', 'tracking_file', 'user'}
```
Also, I tried the example and found I had to delete the file first, then just return a new DAG
e
are you getting any issues? because you can ignore the warning
m
No issues, just the warning… But, the example did not work. That took a bit to figure out
e
ah i see. if you can open an issue and comment on your solution to help other members of our community, that'd be great!
m
Open the issue in github? And I'm not sure where you want me to comment.
e
yep, issue on github!