polite-gold-10582
08/22/2025, 12:59 PM
mammoth-mouse-1111
08/22/2025, 5:35 PM
Ray tasks.
Do I need to configure something separately from the template URIs in the propeller configmap for Ray tasks to give me a link? I couldn't find anything clear in the docs or in previous Slack messages, unfortunately.
Thanks, and appreciate your help!
polite-gold-10582
08/25/2025, 8:39 PM
Message:
MaxRetryError: HTTPConnectionPool(host='localhost', port=9000): Max retries exceeded with url: /flyte?location= (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x79443cd44d70>: Failed to establish a new connection: [Errno 111] Connection refused'))
@task(container_image=image_spec)
def download_wod_ds() -> FlyteDirectory:
    # Set up MinIO client
    client = Minio(
        "localhost:9000",
        access_key="BBsho4hEODTw0J2sv6uk",
        secret_key="rQn3WKDAhci6EnfwJf48XjpIpkbXInx0R1ciMNG7",
        secure=False,
    )
    # Specify the bucket and folder to download
    bucket_name = "flyte"
    folder_name = "ns-wod-ds"
    # Create the download directory
    base_path = os.getcwd()
    new_directory_path = os.path.join(base_path, "wod-ds")
    os.makedirs(new_directory_path, exist_ok=True)
    print("made bucket")
    # List all objects in the folder
    objects = client.list_objects(bucket_name, prefix="flytesnacks/development/ns-wod-ds/training", recursive=True)
    print("ran list_objects")
    # Download each object to the local directory. Failed from here when run remotely.
    for obj in objects:
        print("going through object list")
        object_name = obj.object_name
        download_path = os.path.join(new_directory_path, os.path.relpath(object_name, folder_name))
        download_dir_path = os.path.dirname(download_path)
        os.makedirs(download_dir_path, exist_ok=True)
        print("before downloading")
        try:
            client.fget_object(bucket_name, object_name, download_path)
            print(f"Downloaded {object_name} to {download_path}")
        except S3Error as exc:
            print(f"Error downloading {object_name}: {exc}")
    return FlyteDirectory(path=new_directory_path)
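The connection-refused error is consistent with `localhost:9000` resolving to the task pod itself once the task runs remotely, rather than to the MinIO service. A minimal sketch of reading the endpoint from configuration instead of hardcoding it; the env var name and the in-cluster service DNS name below are illustrative assumptions, not values from this thread:

```python
import os

def resolve_minio_endpoint() -> str:
    """Pick the MinIO endpoint at runtime instead of hardcoding localhost.

    Inside a remote task pod, "localhost" is the pod itself, so the endpoint
    must point at the service. Both the env var name (MINIO_ENDPOINT) and the
    fallback service DNS name are hypothetical; substitute your cluster's.
    """
    return os.environ.get("MINIO_ENDPOINT", "minio.flyte.svc.cluster.local:9000")

# Usage inside the task would then be:
# client = Minio(resolve_minio_endpoint(), access_key=..., secret_key=..., secure=False)
```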
polite-gold-10582
08/25/2025, 8:41 PM
rapid-artist-48509
08/26/2025, 12:09 AM
my_image = ImageSpec(
    name='my_custom_image',
    builder='noop',
    base_image='my-custom/base-image:abc',
)

@task(container_image=my_image)
def create_dataframe() -> str:
    return "DataFrame created successfully!"

@dynamic
def my_dynamic_workflow() -> str:
    """
    This dynamic workflow calls the task.
    """
    # The dynamic workflow context automatically runs this task.
    result = create_dataframe()
    return result
When the @dynamic workflow runs, it tries to kick off the task with the default Flyte image instead of my my-custom/base-image:abc image. Now, clearly the image builder is not supposed to run during a Flyte execution itself (at least I see an if-guard to that effect)... but this is the no-op builder, which doesn't actually need to build. So what inside the dynamic workflow is preventing my `ImageSpec`'s image name from getting reported properly?
creamy-whale-12531
08/26/2025, 11:51 AM
TypeTransformerFailedError: Error converting input 'df' at position 0:
Literal value: Flyte Serialized object (Literal):
scalar:
structured_dataset:
uri: <s3://my-s3-bucket/data/cn/arnr4n65vn287zp446f5-n2-> [...]
metadata:
structured_dataset_type:
format: parquet
Expected Python type: <class 'pandas.core.frame.DataFrame'>
Exception: Expected checksum ++koqw== did not match calculated checksum: hniEMQ==
It seems to be happening when the pandas dataframe gets transferred between tasks. Here is the simplified workflow:
import os
import flytekit as fl
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from src.util.logging import get_logger, setup_logging

logger = get_logger(__name__)
CACHE_VERSION = "0.1"
image_spec = fl.ImageSpec(
    name="weather-app",
    requirements="uv.lock",
    registry=os.environ["FLYTE_IMAGE_REGISTRY"],
)
request_resources = fl.Resources(cpu="1", mem="1000Mi", ephemeral_storage="500Mi")
limit_resources = fl.Resources(cpu="2", mem="2000Mi", ephemeral_storage="1000Mi")

@fl.task(container_image=image_spec)
def round_datetime_to_hour(dt: datetime) -> datetime:
    return datetime(year=dt.year, month=dt.month, day=dt.day, hour=dt.hour, tzinfo=None)

@fl.task(container_image=image_spec)
def get_data(start: datetime, end: datetime) -> pd.DataFrame:
    n_rows = 1_000_000
    # n_rows = 1_000
    logger.info(f"running get_data with start={start}, end={end}")
    np.random.seed(42)
    start_date = datetime(2020, 1, 1)
    end_date = datetime(2024, 12, 31)
    dates = pd.date_range(start=start_date, end=end_date, periods=n_rows)
    df = pd.DataFrame(
        {
            "date": dates.astype("datetime64[s]"),
            "price": np.random.uniform(10.0, 1000.0, n_rows),
            "quantity": np.random.randint(1, 1000, n_rows, int),
        }
    )
    logger.info(f"df dtypes={df.dtypes}")
    return df

@fl.task(container_image=image_spec)
def log_data(df: pd.DataFrame):
    logger.info(f"data={df.head()}")

@fl.workflow
def experiment(
    start: datetime = datetime.now().replace(
        tzinfo=None, hour=0, minute=0, second=0, microsecond=0
    )
    - timedelta(days=3),
    end: datetime = datetime.now().replace(
        tzinfo=None, hour=0, minute=0, second=0, microsecond=0
    ),
):
    start_date = round_datetime_to_hour(start)
    end_date = round_datetime_to_hour(end)
    df = get_data(start_date, end_date)
    log_data(df)

setup_logging(level="INFO")
I noticed that if the number of rows is 1000 (the n_rows variable), the error does not happen. It looks like something related to serialisation, so I tried storing it as a StructuredDataset, but that did not help. Any pointers on what the problem can be and how to fix it?
millions-plastic-44322
08/26/2025, 1:56 PM
kubectl port-forward -n flyte svc/flyteadmin 30080:81
, now pyflyte info
actually shows some information, so it seems like it connects to something. But when I run the code example from https://github.com/unionai-oss/deploy-flyte/blob/main/environments/aws/flyte-core/README.md using pyflyte run --remote hello_world.py my_wf
it tells me
Running Execution on Remote.
00000 Running execution on remote.
[✔️] Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/afr2zbgj2bkp5lfrv9sr to see execution in the console.
Of course there's nothing on that page, and my localhost:8080 (where I have svc/flyteconsole: kubectl port-forward -n flyte svc/flyteconsole 8080:80
) does not show any project info either.
gentle-scientist-22504
08/27/2025, 7:12 PMdef *train_aggregator*(_fold_: str) -> str:
in workflow: map_task(train_aggregator)(_fold_=['fold1', 'fold2'])
but error is complaining about mismatch between string and list of strings
he output variable 'main.map_train_aggregator_6b3bd0353da5de6e84d7982921ead2b3-arraynode.o0' has type [collection_type:{simple:STRING}], but it's assigned to the input variable 'o0' which has type type [simple:STRING].
From my understanding, and confirmed by an LLM, map_task should handle this.
nutritious-rocket-28038
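For context on the error above: mapping a single-item task over a list yields a collection, so the mapped node's output is a `List[str]`, and whatever consumes it (including the workflow's declared output) must be typed `List[str]`, not `str`. A plain-Python sketch of the semantics, with the flytekit decorators omitted and the names purely illustrative:

```python
from typing import List

def train_aggregator(fold: str) -> str:
    # Single-item task: one fold in, one result out.
    return f"trained {fold}"

def map_train_aggregator(folds: List[str]) -> List[str]:
    # A map applies the task element-wise, so the aggregate output is a
    # List[str]. Declaring the consuming output as plain `str` produces
    # exactly the collection-vs-scalar mismatch in the error above.
    return [train_aggregator(f) for f in folds]

print(map_train_aggregator(["fold1", "fold2"]))  # ['trained fold1', 'trained fold2']
```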
08/28/2025, 4:05 AM
from flytekit import ContainerTask, kwtypes, FlyteDirectory

produce_data = ContainerTask(
    name="produce_data",
    image="alpine:3.18",
    command=["sh", "-c"],
    arguments=[
        # Important: write to the output path that Flyte provides
        "mkdir -p /var/flyte/output/results && echo 'hello' > /var/flyte/output/results/hello.txt"
    ],
    output_data_dir="/var/flyte/output",  # :white_check_mark: ensures Flyte watches this dir
    outputs=kwtypes(results=FlyteDirectory),
)
Do you guys have a working example of a ContainerTask returning a FlyteDirectory?
gentle-night-59824
08/29/2025, 7:40 PM
retries parameter? We'd like to modify an internal method to use FlyteRecoverableException, but we need a way to bump the default retries on all tasks. I noticed a default-max-attempts parameter on flytepropeller, but I'm not entirely sure whether it translates to a task's retries parameter.
worried-airplane-87065
09/03/2025, 3:13 AM
enableArtifacts feature in Flyte? Curious if folks have had any trouble using it?
https://www.union.ai/docs/v1/flyte/deployment/configuration-reference/flyteadmin-config/#featuregates-interfacesfeaturegates
gentle-tomato-480
09/03/2025, 10:30 AM
torch.Tensor that is an input to another task.
I notice that regardless of whether the tensor is on the GPU or CPU when it's output, it's put back on the GPU in the next task when running locally.
This is a bit unexpected behaviour, as I would expect it to deserialize/load in the next step on the device it was serialized on (i.e. if the tensor was on the CPU when output, I expect it to be loaded as a CPU tensor; if the tensor was on the GPU, I expect it to be loaded as a GPU tensor).
Running flytekit==1.15.3 (locally)
gentle-tomato-480
09/03/2025, 10:31 AM
loud-midnight-18633
09/03/2025, 2:32 PM
ModuleNotFoundError: No module named 'site-packages'
This is the complete traceback:
/opt/venv/bin/pyflyte-execute:8 in <module> │
│ │
│ ❱ 8 │ sys.exit(execute_task_cmd()) │
│ │
│ /opt/venv/lib/python3.10/site-packages/click/core.py:1161 in __call__ │
│ │
│ ❱ 1161 │ │ return self.main(*args, **kwargs) │
│ │
│ /opt/venv/lib/python3.10/site-packages/click/core.py:1082 in main │
│ │
│ ❱ 1082 │ │ │ │ │ rv = self.invoke(ctx) │
│ │
│ /opt/venv/lib/python3.10/site-packages/click/core.py:1443 in invoke │
│ │
│ ❱ 1443 │ │ │ return ctx.invoke(self.callback, **ctx.params) │
│ │
│ /opt/venv/lib/python3.10/site-packages/click/core.py:788 in invoke │
│ │
│ ❱ 788 │ │ │ │ return __callback(*args, **kwargs) │
│ │
│ /opt/venv/lib/python3.10/site-packages/flytekit/bin/entrypoint.py:715 in │
│ execute_task_cmd │
│ │
│ ❱ 715 │ _execute_task( │
│ │
│ /opt/venv/lib/python3.10/site-packages/flytekit/bin/entrypoint.py:579 in │
│ _execute_task │
│ │
│ ❱ 579 │ │ resolver_obj = load_object_from_module(resolver) │
│ │
│ /opt/venv/lib/python3.10/site-packages/flytekit/tools/module_loader.py:51 in │
│ load_object_from_module │
│ │
│ ❱ 51 │ class_obj_mod = importlib.import_module(".".join(class_obj_mod)) │
│ │
│ /usr/lib/python3.10/importlib/__init__.py:126 in import_module │
│ │
│ ❱ 126 │ return _bootstrap._gcd_import(name[level:], package, level) │
│ in _gcd_import:1050 │
│ in _find_and_load:1027 │
│ in _find_and_load_unlocked:992 │
│ in _call_with_frames_removed:241 │
│ in _gcd_import:1050 │
│ in _find_and_load:1027 │
│ in _find_and_load_unlocked:992 │
│ in _call_with_frames_removed:241 │
│ in _gcd_import:1050 │
│ in _find_and_load:1027 │
│ in _find_and_load_unlocked:992 │
│ in _call_with_frames_removed:241 │
│ in _gcd_import:1050 │
│ in _find_and_load:1027 │
│ in _find_and_load_unlocked:1004 │
╰──────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: No module named 'site-packages'
I tried asking @ancient-wolf-19325, and it says to use a Python project structure, which I did, but I still keep getting either the same ModuleNotFoundError
or Empty module name
freezing-tailor-85994
09/03/2025, 6:55 PM
pytorchjobs.kubeflow.org is forbidden: User "system:serviceaccount:flyte:flyte-backend-flyte-binary" cannot create resource "pytorchjobs" in API group "kubeflow.org" in the namespace "inference-staging"
Relevant section of the helm chart in thread
rapid-artist-48509
09/05/2025, 1:33 AM
gentle-tomato-480
09/05/2025, 11:15 AM
millions-plastic-44322
09/05/2025, 12:12 PM
green-engineer-57198
09/08/2025, 3:08 PM
@task
) but I can't find a good way to rename them. We use the task name for some of our metrics, but they are too long and I would like to shorten them.
proud-answer-87162
09/09/2025, 3:07 PM
proud-answer-87162
09/09/2025, 4:35 PM
WebhookTask. I got it to work, but I need to explicitly add httpx to the imagespec. Is that expected? I assumed a feature like this would be encapsulated in the base image (@task(container_image="ghcr.io/flyteorg/flytekit:py3.12-latest")). I'm kind of a Python noob, so apologies if this is an obvious package-management question.
little-cricket-84530
09/09/2025, 9:18 PM
gentle-tomato-480
09/11/2025, 8:29 AM
adventurous-ability-21671
09/12/2025, 8:58 AM
average-secretary-61436
09/12/2025, 6:33 PM
famous-flag-22960
09/15/2025, 7:55 PM
docker run --rm -it cr.flyte.org/flyteorg/datacatalog-release:v1.16.0 /bin/sh
/ $ vi /tmp/otel.yaml
/ $ cat /tmp/otel.yaml
otel:
  file: /tmp/trace.txt
  jaeger:
    endpoint: http://localhost:14268/api/traces
  otlpgrpc:
    endpoint: http://localhost:4317
  otlphttp:
    endpoint: http://localhost:4318/v1/traces
  sampler:
    parentSampler: always
  type: noop
/ $ datacatalog --config /tmp/otel.yaml migrate run
INFO[0000] Using config file: [/tmp/otel.yaml]
Error:
1 error(s) decoding:

* 'file' expected a map, got 'string'
Looks like, at a glance, the new otel config in the helm chart doesn't match what the various components are expecting.
average-secretary-61436
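Reading the decoder error above, `file` is expected to be a map rather than a string, which suggests the file exporter takes a nested key. A guess at the shape the component may accept; the `filename` key name is an assumption to verify against the datacatalog/flytestdlib configuration reference, not something confirmed in this thread:

```yaml
otel:
  type: noop
  file:
    filename: /tmp/trace.txt   # nested key assumed; 'file: <string>' triggers the decode error
  sampler:
    parentSampler: always
```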
09/16/2025, 5:29 PM
.
├── local_scripts
│ ├── program1_input.csv
│ └── program1_quickrun.py
├── pyproject.toml
├── README.md
└── src
└── internal_flytekit
├── __init__.py
├── tasks
│ ├── __init__.py
│ └── program1.py
├── utils.py
└── workflows
├── __init__.py
└── program1.py
I'm running short test scripts in local_scripts
via python local_scripts/program1_quickrun.py
The contents of local_scripts/program1_quickrun.py
are very much like:
#!/usr/bin/env python
import os

FLYTE_TASK_VERSION = "17.4.0-alpha.2.program1"
os.environ["JNJ_FLYTE_IMAGE_VERSION"] = FLYTE_TASK_VERSION

from flytekit import Config, FlyteFile, FlyteRemote
from internal_flytekit.workflows.program1 import (
    run_program1_wf,
)

def main() -> None:
    remote = FlyteRemote(
        config=Config.for_endpoint(endpoint="flyteserver.internal.com"),
        default_project="flyteproject",
        default_domain="predev",
        interactive_mode_enabled=True,
    )
    exe = remote.execute(
        run_program1_wf,
        inputs={
            "input_csv": FlyteFile("local_scripts/program1_input.csv"),
        },
    )
    print("job", exe.execution_url)

if __name__ == "__main__":
    main()
When I run this workflow directly via pyflyte run, it works fine because it packages the relevant parts of internal_flytekit in a tar.gz file and serves that to the worker.
But when I just run python local_scripts/program1_quickrun.py
it only packages a pkl file, and the task fails with the error:
Exception when loading_task, reason USER:RuntimeError: error=No module named 'internal_flytekit', cause=No module named 'internal_flytekit'
Is there something that I'm doing wrong here? Or some way to convince the Python script to package the whole repo, or the relevant parts of the repo, with the task?
little-cricket-84530
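For what it's worth, the failure mode above is consistent with how pickling works: interactive mode ships a pickle, and functions imported from a module are pickled by reference, so the worker must be able to import that module to unpickle them. A minimal local demonstration of the by-reference behavior (plain pickle, not flytekit itself; `my_task` is a stand-in name):

```python
import pickle

def my_task() -> str:
    # Stand-in for a function imported from a project module
    # such as internal_flytekit.workflows.program1.
    return "ok"

# pickle.dumps stores a module-qualified *reference* to the function, not its
# source code. Whoever calls pickle.loads must be able to import the defining
# module; a worker that never received the internal_flytekit sources can't,
# which matches the "No module named 'internal_flytekit'" error above.
payload = pickle.dumps(my_task)
restored = pickle.loads(payload)
print(restored())  # prints "ok" here, because the defining module is importable
```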
09/16/2025, 5:42 PM
gifted-minister-25110
09/18/2025, 11:30 AM
billions-hairdresser-78656
09/18/2025, 5:47 PM
pyflyte --pkgs my_workflows package -f
Everything works fine, then we register it:
flytectl register files \
--project $PROJECT \
--domain $DOMAIN \
--archive $ARCHIVE \
--version "$VERSION"
and when we run it we see the following error:
Trace:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/flytekit/bin/entrypoint.py", line 182, in _dispatch_execute
task_def = load_task()
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/flytekit/bin/entrypoint.py", line 604, in load_task
return resolver_obj.load_task(loader_args=resolver_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/flytekit/core/utils.py", line 309, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/flytekit/core/python_auto_container.py", line 302, in load_task
task_module = importlib.import_module(name=task_module) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'my_workflows'
Message:
ModuleNotFoundError: No module named 'my_workflows'