# ask-anything
e
ploomber is agnostic to what the source code contains, so tasks can execute spark, dask or anything else. is this what you're thinking?
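e.g. a function task's body can build a SparkSession and do its work on the cluster, ploomber only tracks the product (rough sketch, the names and paths are just illustrative and it assumes a reachable spark setup):
from pyspark.sql import SparkSession

# a regular ploomber function task: ploomber only cares about the product,
# the body is free to call into Spark (or dask, or anything else)
def clean(product):
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet('data/raw.parquet')  # illustrative input path
    df.dropna().write.parquet(str(product), mode='overwrite')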
l
Right, one could define spark operations in a task and, although the task code would be interpreted locally, the computations would be delegated to the configured spark instance. I'm interested in something more similar to joblib-spark.
e
ah interesting. so this is using the spark cluster to run the code. can it run arbitrary python code?
l
I believe it's dataframe operations only. So even if an integration with ploomber were possible, it wouldn't fit ploomber's conception of an executor.
I'll look into some way of parameterizing a notebook or script to either run in pandas locally or distribute jobs to a configured spark cluster.
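probably something like a single parameter at the top of the script that ploomber/papermill would inject per run, e.g. (hypothetical sketch, the parameter name is made up):
import pandas as pd

# parameters cell: ploomber/papermill would inject this value per run
use_spark = False  # made-up parameter: False -> local pandas, True -> spark cluster

if use_spark:
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.csv('data.csv', header=True).toPandas()
else:
    df = pd.read_csv('data.csv')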
🙌 1
e
we're working with the team at Fugue for an integration. their backend supports Spark, so I think once that's ready, then ploomber will support Spark 🙂 https://github.com/fugue-project/fugue
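the idea is you write the transformation once and pick the engine when you run it, roughly like this (just a sketch of fugue's transform, not the eventual ploomber integration):
import pandas as pd
from fugue import transform

def add_flag(df: pd.DataFrame) -> pd.DataFrame:
    # plain pandas logic, engine-agnostic
    return df.assign(flag=df['n'] > 0)

df = pd.DataFrame({'n': [-1, 0, 1]})

# runs locally with pandas
local = transform(df, add_flag, schema='*,flag:bool')

# same code on spark (needs pyspark installed / a cluster configured)
# distributed = transform(df, add_flag, schema='*,flag:bool', engine='spark')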
🙌 1
l
that's awesome! thanks!
🙌 1
a similar thread: Can we specify the interpreter?
# src/ploomber/tasks/notebook.py
if self.source.language == 'python':
    interpreter = _python_bin()
use case: connect to the local interpreter (default) or a remote one. Databricks has a remote kernel integration that allows local notebooks to be executed on Databricks. This could probably also be parameterized at the task level, but it'd be nice to toggle it for an entire DAG. note: I'm (probably incorrectly) conflating interpreter and kernel here, but I hope the intent is clear.
e
short answer: yes! we execute the notebooks using papermill and it's possible to choose a different kernel. let me dig a little bit to provide a more detailed answer
ok, so I read the article. looks like it's possible. you can customize this:
from pathlib import Path
from ploomber import DAG
from ploomber.tasks import NotebookRunner
from ploomber.products import File

dag = DAG()

# papermill_params is forwarded to papermill, so kernel_name selects
# the kernel that runs the notebook
NotebookRunner(Path('nb.ipynb'), File('report.html'), dag=dag,
               papermill_params=dict(kernel_name='kernel-name'))

dag.build()
to list the available kernels, you can run jupyter kernelspec list, then substitute kernel-name with the name of the kernel you want to use. I don't have access to a databricks cluster, but let me know if this works
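and for the "toggle for an entire DAG" part: since the DAG is built in python, you can pick the kernel once and pass it to every NotebookRunner, e.g. (sketch, the env var and notebook names are made up):
import os
from pathlib import Path
from ploomber import DAG
from ploomber.tasks import NotebookRunner
from ploomber.products import File

# made-up env var: choose the kernel once for the whole DAG
kernel = os.environ.get('NB_KERNEL', 'python3')

dag = DAG()

for name in ['clean', 'report']:  # illustrative notebooks
    NotebookRunner(Path(f'{name}.ipynb'), File(f'{name}.html'), dag=dag,
                   name=name, papermill_params=dict(kernel_name=kernel))

dag.build()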