# flyte-support
  • g

    gentle-tomato-480

    11/03/2025, 12:03 PM
    Btw, the links in the https://www.union.ai/docs/v1/flyte/deployment/configuration-reference/ subpages (scheduler, datacatalog, flyteadmin, propeller) don't point to the page sections but instead refer back to https://www.union.ai/docs/v1/flyte/deployment/configuration-reference/
  • a

    abundant-laptop-47033

    11/04/2025, 9:33 PM
    Hello! Is there a plan to release a 1.16 patch with this fix? We would love to try it out when it's available!
    c
    • 2
    • 4
  • g

    gentle-tomato-480

    11/05/2025, 2:23 PM
    Did flytectl
    v0.9.0
    get removed/deprecated for the
    flytectl-setup-action
    ? I was using that in my CICD and it was still working last week. Today I'm getting:
    Error: Unable to find flytectl version "v0.9.0" for platform "Linux" and architecture "x86_64".
    in my GHA workflow when running this action.
    a
    • 2
    • 2
  • h

    high-autumn-89220

    11/10/2025, 5:08 PM
    hey all, I'm trying to get Flyte working with Okta for user + machine-to-machine auth. Has anyone been able to make Okta work with the Client Credential (
    ClientSecret
    ) auth type? does anyone know if it will work without custom auth servers on our plan? been struggling with this for a few weeks to no avail
    c
    • 2
    • 5
  • w

    wonderful-continent-24967

    11/12/2025, 12:01 AM
    What could be potential reasons for Cache write error in a Flyte task? I am seeing this error in flyte console -
    Failed to write output for this execution to cache.
    . I looked into the datacatalog logs for the corresponding Flyte task and found nothing unusual there. Datacatalog created, updated & deleted reservations for that task as it did for other tasks. We are using flyte
    1.15.3
    a
    • 2
    • 2
  • f

    fancy-hamburger-89099

    11/12/2025, 10:10 AM
    Hi, I am facing a very strange issue, and I am out of ideas. We have 4 instances of Flyte, all of which are configured the same and run the latest version. Each of them is running on a different cluster, and we route traffic using Ingress Nginx Controller, which is configured in exactly the same way on all clusters. All instances use Azure AD SSO, and all use the same App Registration/credentials. However, for some reason, one of these 4 instances does not work. The issue is that when I access the URL, I get to the login page and successfully log in using the Azure AD SSO, but after that every request fails with a 400 error:
    400 Bad Request
    Request Header Or Cookie Too Large
    nginx
    I tried different browsers, incognito mode, wiping cookies, everything. This only happens on that one instance, and it works without any issues on the other 3. Any ideas?
    a
    c
    • 3
    • 7
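One way to narrow down the 400 above is to measure what the browser actually sends to the failing instance. A rough sketch, assuming the `Cookie` header value is copied from the browser's network tab. The 8 KiB limit is nginx's default buffer size for `large_client_header_buffers`, and the ingress in question may be configured differently. The cookie names in the sample are made up.

```python
def cookie_sizes(cookie_header: str) -> dict[str, int]:
    """Return the byte size of each `name=value` pair in a Cookie header value."""
    sizes = {}
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name = pair.split("=", 1)[0]
        sizes[name] = sizes.get(name, 0) + len(pair.encode("utf-8"))
    return sizes


def exceeds_buffer(cookie_header: str, buffer_bytes: int = 8 * 1024) -> bool:
    """True if the whole `Cookie: ...` header line is larger than one buffer."""
    line = "Cookie: " + cookie_header
    return len(line.encode("utf-8")) > buffer_bytes


# Made-up header: one small cookie and one oversized token cookie.
sample = "session=abc123; hypothetical_token=" + "x" * 9000

# Print the largest cookies first.
for name, size in sorted(cookie_sizes(sample).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {size} bytes")
print("exceeds one 8 KiB buffer:", exceeds_buffer(sample))
```

Comparing the per-cookie sizes between a working instance and the failing one may show which cookie differs.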
  • a

    abundant-judge-84756

    11/12/2025, 11:34 AM
    Hi! 👋 We're running into an issue where executions are stuck in an
    ABORTING
    state and can't be fully terminated. The executions include a dynamic workflow step, and these dynamic workflows show 2 x tasks as
    RUNNING
    - the task descriptions specify they are
    initializing
    . I think these initializing dynamic tasks are somehow blocking the workflows from resolving the abort request. Any suggestions for ways we can trigger these workflows to transition to
    ABORTED
    ? We're currently running flyte
    1.15.3
    .
    c
    • 2
    • 2
  • c

    cool-waitress-85601

    11/12/2025, 3:40 PM
    Hi, is there a way to use podman instead of docker to build images when running
    pyflyte run --remote
    ?
    a
    • 2
    • 8
  • f

    fierce-monitor-77717

    11/13/2025, 12:20 PM
    Hi, is there any plan to support python3.13/14 in flytekit anytime soon?
    e
    a
    • 3
    • 10
  • c

    cool-waitress-85601

    11/17/2025, 5:41 PM
    Hi everyone, I'm desperately trying to set up flyte-core with an s3 bucket and provide my access key and secret key via a secret. I can't find how to do that: the documentation isn't clear on what form that secret should take, and the AI bot isn't helping, giving contradictory and false information. Can someone please provide an example? Thanks a lot
    c
    • 2
    • 17
  • m

    mysterious-painter-66441

    11/17/2025, 9:57 PM
    Hi Flyte Team, I noticed that in Flyte UI, workflow inputs defined as structured types (e.g.,
    dataclass
    ) are displayed as a single opaque field rather than expanding into individual attributes. This makes it unclear to users what values are expected for each field. Could you advise if there’s a recommended approach to make structured inputs more user-friendly in the UI? For example, is there a way to automatically expand fields or provide schema hints for structured types? Thanks for your help!
    • 1
    • 1
  • b

    brash-ram-89454

    11/18/2025, 1:23 PM
    Just a heads up that the Flyte v1 docs are down at the moment: https://www.union.ai/docs/v1/flyte/user-guide/
    b
    a
    f
    • 4
    • 3
  • c

    cool-waitress-85601

    11/18/2025, 8:49 PM
    Hi! I'm trying to figure out if/how it's possible to set up Flyte for multi-tenancy, i.e. isolate tenants' workloads in separate namespaces, without sharing/mounting any global secret, thus relying only on tenant-scoped secrets. Ideally tenant workloads would run under the tenant namespace. While there seems to be a way to have propellers per tenant, thus enabling true parallelism, IIUC there doesn't seem to be any way to isolate metadata per tenant, since there's a single s3 configuration shared by admin and all propellers/task executions. That means sharing the bucket secret with all tenants, which wouldn't fit our requirements. Does anybody have any experience/recommendations to share? Thanks a lot
    c
    • 2
    • 21
  • g

    gray-ocean-43286

    11/19/2025, 4:33 PM
    Hello Gents, I am currently working on Flyte to AWS Sagemaker integration and facing problems with the idempotence_token in the create_sagemaker_deployment method in the flytekitplugin-awssagemaker_inference plugin version 1.16.1. I am currently testing model, endpoint config and endpoint deployment using the Flyte Sagemaker plugin and passing idempotence_token=False in the create_sagemaker_deployment method. But the endpoint_config deployment task still keeps expecting the idempotence_token field in its input (which is the model_creation task's output). Copilot keeps saying this is a known bug and I need to set it to True in order to resolve it. But when I set it to True, the model_creation task itself fails in Flyte and gives me an error like so - failed to do boto task with error: Could not find the key model_name}-{idempotence_token in {'model_path': 's3://s3-bucket/models/xgboost-model.tar.gz', 'execution_role_arn': 'arn:aws:iam::account-id:role/app-flyte-sagemaker-executor-role', 'model_name': 'xgboost-diabetes-endpoint-model'}. Having a tough time figuring this one out. I have tried multiple approaches but all in vain. Anyone who knows what this is all about?
    f
    t
    • 3
    • 3
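The key in the error above, `model_name}-{idempotence_token`, looks like a compound template that had only its outermost braces removed before being looked up as a single key. The following is not the plugin's code. It is a hypothetical illustration of how that symptom can arise, and of how it differs from resolving each placeholder separately. The function names and the token value `abc123` are invented for the example.

```python
import re


def naive_lookup(template: str, inputs: dict) -> str:
    """Hypothetical resolver: treats the whole string as ONE `{key}` placeholder."""
    if template.startswith("{") and template.endswith("}"):
        key = template[1:-1]  # strips only the outermost braces
        if key not in inputs:
            raise LookupError(f"Could not find the key {key} in {inputs}")
        return str(inputs[key])
    return template


def per_placeholder_lookup(template: str, inputs: dict) -> str:
    """Resolves each `{key}` separately, so compound templates work."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(inputs[m.group(1)]), template)


# Hypothetical inputs; the token value is made up.
inputs = {"model_name": "xgboost-diabetes-endpoint-model", "idempotence_token": "abc123"}
template = "{model_name}-{idempotence_token}"

try:
    naive_lookup(template, inputs)
except LookupError as err:
    print("naive:", err)

print("per-placeholder:", per_placeholder_lookup(template, inputs))
```

If the real cause is similar, the failure would come from how the name template is parsed, not from the value of idempotence_token itself. That is only a guess from the error text.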
  • c

    cool-waitress-85601

    11/19/2025, 4:45 PM
    Hello folks, do you know what metadata goes into the project/domain-specific bucket vs. the global bucket when you use
    raw_output_data_config
    ? For instance, will the user's local code be uploaded to the global or the project-scoped bucket when using fast registration? More generally, what data would go in the global bucket vs. the project-scoped bucket? Thanks
  • e

    early-addition-41415

    11/20/2025, 10:21 PM
    In flyte-binary, if you are not on AWS, is there a way to provide access keys using secrets in the Helm values, so that AWS can be accessed from somewhere else?
  • e

    early-addition-41415

    11/20/2025, 10:22 PM
    specifically here https://github.com/flyteorg/flyte/blob/master/charts/flyte-binary/values.yaml#L85-L87
  • e

    early-addition-41415

    11/20/2025, 10:23 PM
    We need to use authtype as accesskey
  • f

    fancy-twilight-30247

    11/21/2025, 10:12 AM
    Hey everyone- I have a question about running multi-node pytorch workflows and error/exception handling. We're currently defining our training task as something like this:
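    from flytekit import task

    # task_config, container_image, pod_template, timeout and max_retries are defined elsewhere.
    @task(
        task_config=task_config,
        cache=False,
        container_image=container_image,
        pod_template=pod_template,
        timeout=timeout,
        retries=max_retries,
    )
    def flyte_training_main_task():
        ...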
    with the task_config being (note that we don't really need the elastic part of things - we just need to launch a multi-node pytorch task):
    from flytekitplugins.kfpytorch import Elastic

    # num_nodes is defined elsewhere.
    task_config = Elastic(
        nnodes=num_nodes,
        nproc_per_node=8,  # worker processes per node
    )
    Now imagine that a rank in the distributed training has an error of some sort - is there a way for us to configure our task so that the whole task/workflow is terminated (including all the pods corresponding to it) as soon as a single rank errors? Currently it seems like all the ranks have to exit/error before the task/workflow is terminated, which we often don't want (because other ranks might be stuck until NCCL timeout or might be stuck for other reasons). I've tried raising special exception types like
    SignalException
    or
    ChildFailedError
    , but it seems like it always waits until all the ranks exit. One hacky workaround I could think of is to manually terminate the workflow, but that also does not seem ideal. Thanks!!
    👀 1
    f
    t
    • 3
    • 10
  • n

    numerous-hamburger-7178

    11/25/2025, 11:45 PM
    Do newer versions of Flyte show pydantic inputs to workflows/tasks as something other than structs? I've been using dataclassjsonmixin to get well-formatted input in the UI and want to try switching over to pydantic, but on a Flyte 1.16.2 deployment an example wf shows up as a struct.
    g
    f
    l
    • 4
    • 8
  • c

    cool-waitress-85601

    11/26/2025, 1:16 PM
    Hi folks, has anybody tried using Dex as the external authorization server? I'd be interested to hear about it. Thanks
    f
    • 2
    • 1
  • a

    aloof-magazine-44547

    12/01/2025, 10:39 AM
    Hi, can I get some help to merge https://github.com/flyteorg/flytekit/pull/3339? It's about serialising and deserialising models with FlyteFile/FlyteDirectory in them, which causes an attribute error. cc @swift-oil-78197
  • t

    thankful-lighter-72752

    12/01/2025, 11:01 PM
    Hello. Is there a recommended approach for removing older intermediate values from s3 that aren't required anymore? I have some large values returned from tasks that are taking up a lot of space. I can use PutBucketLifecycleConfiguration on the s3 side, but currently I don't want the workflow results to expire
    👍 1
    f
    • 2
    • 1
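For reference, the lifecycle-rule route mentioned above can be scoped to a key prefix so that only objects under that prefix expire. The sketch below only builds the rule document. The `intermediate/` prefix, the 30-day window and the bucket name in the comment are hypothetical. It also assumes intermediate values and final results live under different prefixes, which may not match how a given Flyte deployment lays out its bucket.

```python
import json


def build_lifecycle_config(prefix: str, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under one prefix only."""
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/').replace('/', '-')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical prefix and retention window; adjust to the real bucket layout.
config = build_lifecycle_config("intermediate/", 30)
print(json.dumps(config, indent=2))

# Applying it would go through boto3's put_bucket_lifecycle_configuration, e.g.:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-flyte-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```

Note that PutBucketLifecycleConfiguration replaces the bucket's existing lifecycle configuration, so any rules already in place need to be included in the same call.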
  • p

    proud-napkin-10936

    12/03/2025, 11:16 AM
    Hey everyone. I'm preparing a large batch job (256 parallel tasks), I noticed this "max parallelism" under domain settings in flyte console.
    • What is this limit exactly?
    • How can I adjust it?
    Can't really find anything on it in the (legacy) docs.
    f
    e
    • 3
    • 12
  • w

    wooden-scooter-1097

    12/03/2025, 9:50 PM
    Hi folks, I'm starting my Flyte journey, going through the local install docs. At the point of doing
    flytectl demo start
    , the
    flyte-sandbox-xxx
    and
    flyteconnector-xxx
    services never get out of Pending state. The Docker (Rancher) logs show a few x509 CA cert issues as well as some "back-off" entries. Not sure what's going on. I am on the company VPN which I'm not allowed to disable, so if it's a cert issue, not sure how to get around it.
    --admin.insecure
    and
    --admin.insecureSkipVerify
    don't help. Ideas? Sample log entries...
    2025-12-03T21:43:57.491204261Z E1203 21:43:57.491163      68 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"local-path-provisioner\" with ErrImagePull: \"failed to pull and unpack image \\\"docker.io/rancher/local-path-provisioner:v0.0.24\\\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \\\"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/10/10ada9a7f8ab578464314da2df287d1d384c6ef9f474d00dc73bf232599df55f/data?expires=1764801238&signature=KC81Pwa1VNzUPyOJ089%2BQZbYlH4%3D&version=2\\\": tls: failed to verify certificate: x509: certificate signed by unknown authority\"" pod="kube-system/local-path-provisioner-84db5d44d9-q2chh" podUID="fad13c92-96bd-4cec-b19f-0e9ade5ffb19"
    
    ...
    
    2025-12-03T21:44:05.221227848Z E1203 21:44:05.220969      68 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"coredns\" with ImagePullBackOff: \"Back-off pulling image \\\"rancher/mirrored-coredns-coredns:1.10.1\\\"\"" pod="kube-system/coredns-6799fbcd5-27h25" podUID="1fa7b663-8c6b-492e-a816-d35a29e56e30"
    a
    • 2
    • 1
  • f

    fierce-oil-47448

    12/03/2025, 11:54 PM
    Hello. The
    flytectl
    install instructions mention: •
    curl -sL https://ctl.flyte.org/install | bash
    This errors out on Ubuntu:
    flyteorg/flyte info checking GitHub for latest tag
    flyteorg/flyte crit unable to find '' - use 'latest' or see https://github.com/flyteorg/flyte/releases for details
    a
    f
    • 3
    • 6
  • h

    handsome-lock-30336

    12/04/2025, 5:00 PM
    Hi! How would Flyte most easily support multi-cluster/multi-cloud and compute allocation? I see discussion about the Volcano plugin here. What's the general direction? cc @freezing-airport-6809 @ancient-apple-95774
    a
    c
    s
    • 4
    • 9
  • n

    nice-hairdresser-45030

    12/05/2025, 2:31 PM
    I have a question about the expected flytepropeller performance with a large number of pods in combination with array node/map task:
    • Workflow with 4 to 5 map tasks, between 5 to 15k pods existing at the same time.
    • I'm seeing that propeller sometimes doesn't look at the status of some completed pods for hours (have seen up to 10h)
      ◦ (I put print statements into plugin manager to see in which phase which resource is evaluated, they are not evaluated despite having completed so this is not related to errors sending update events to admin)
    • Sometimes the succeeded pods have been garbage collected and propeller treats the "missing" pod as a failure
    I'm aware that I can prevent the last point with
    inject-finalizer
    to at least get eventual consistency. But my question is whether propeller not evaluating pods for hours in such a scenario is expected or unexpected. I know that I can shard propeller, but would this only help me if I break this down into multiple workflows? Are there any other parameters I can tune to have propeller evaluate the pods earlier? Thank you!
    👀 1
    a
    • 2
    • 17
  • m

    melodic-mechanic-59879

    12/06/2025, 10:45 PM
    Hi! How can I load data from a service bus directly and use it in Flyte, please?
  • f

    fierce-farmer-40956

    12/08/2025, 12:29 PM
    Hello, we are getting lots of these error logs:
    duplicate key value violates unique constraint "tasks_pkey" (SQLSTATE 23505)
    and our executions are all in UNKNOWN state. Would you have a pointer?
    a
    • 2
    • 1