# ask-anything
e
ploomber is agnostic to what the source code contains, so tasks can execute spark, dask or anything else. is this what you're thinking?
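e.g. a function task's body can build a SparkSession and do its work on the cluster, ploomber only tracks the product (rough sketch, the names and paths are just illustrative and it assumes a reachable spark setup):
from pyspark.sql import SparkSession

# a regular ploomber function task: ploomber only cares about the product,
# the body is free to call into Spark (or dask, or anything else)
def clean(product):
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet('data/raw.parquet')  # illustrative input path
    df.dropna().write.parquet(str(product), mode='overwrite')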
l
Right, one could define spark operations in a task and, although the task code would be interpreted locally, the computations would be delegated to the configured spark instance. I'm interested in something more similar to joblib-spark.
e
ah interesting. so this is using the spark cluster to run the code. can it run arbitrary python code?
l
I believe it's dataframe operations only. So even if an integration with ploomber were possible, it wouldn't fit ploomber's conception of an executor.
I'll look into some way of parameterizing a notebook or script to either run in pandas locally or distribute jobs to a configured spark cluster.
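probably something like a single parameter at the top of the script that ploomber/papermill would inject per run, e.g. (hypothetical sketch, the parameter name is made up):
import pandas as pd

# parameters cell: ploomber/papermill would inject this value per run
use_spark = False  # made-up parameter: False -> local pandas, True -> spark cluster

if use_spark:
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.csv('data.csv', header=True).toPandas()
else:
    df = pd.read_csv('data.csv')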
🙌 1
e
we're working with the team at Fugue for an integration. their backend supports Spark, so I think once that's ready, then ploomber will support Spark 🙂 https://github.com/fugue-project/fugue
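the idea is you write the transformation once and pick the engine when you run it, roughly like this (just a sketch of fugue's transform, not the eventual ploomber integration):
import pandas as pd
from fugue import transform

def add_flag(df: pd.DataFrame) -> pd.DataFrame:
    # plain pandas logic, engine-agnostic
    return df.assign(flag=df['n'] > 0)

df = pd.DataFrame({'n': [-1, 0, 1]})

# runs locally with pandas
local = transform(df, add_flag, schema='*,flag:bool')

# same code on spark (needs pyspark installed / a cluster configured)
# distributed = transform(df, add_flag, schema='*,flag:bool', engine='spark')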
🙌 1
l
that's awesome! thanks!
🙌 1
a similar thread: Can we specify the interpreter?
# src/ploomber/tasks/notebook.py
if self.source.language == 'python':
    interpreter = _python_bin()
use case: connect to the local interpreter (default) or a remote one. Databricks has a remote kernel integration that allows local notebooks to be executed on Databricks. This could probably also be parameterized at the task level, but it'd be nice to toggle it for an entire DAG. note: I'm (probably incorrectly) conflating interpreter and kernel here, but I hope the intent is clear.
e
short answer: yes! we execute the notebooks using papermill and it's possible to choose a different kernel. let me dig a little bit to provide a more detailed answer
ok, so I read the article. looks like it's possible. you can customize this:
from pathlib import Path
from ploomber import DAG
from ploomber.tasks import NotebookRunner
from ploomber.products import File

dag = DAG()

# papermill_params is forwarded to papermill, so kernel_name selects
# the kernel that runs the notebook
NotebookRunner(Path('nb.ipynb'), File('report.html'), dag=dag,
               papermill_params=dict(kernel_name='kernel-name'))

dag.build()
to list the available kernels, you can run jupyter kernelspec list, then substitute kernel-name with the name of the kernel you want to use. I don't have access to a databricks cluster, but let me know if this works
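and for the "toggle for an entire DAG" part: since the DAG is built in python, you can pick the kernel once and pass it to every NotebookRunner, e.g. (sketch, the env var and notebook names are made up):
import os
from pathlib import Path
from ploomber import DAG
from ploomber.tasks import NotebookRunner
from ploomber.products import File

# made-up env var: choose the kernel once for the whole DAG
kernel = os.environ.get('NB_KERNEL', 'python3')

dag = DAG()

for name in ['clean', 'report']:  # illustrative notebooks
    NotebookRunner(Path(f'{name}.ipynb'), File(f'{name}.html'), dag=dag,
                   name=name, papermill_params=dict(kernel_name=kernel))

dag.build()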