# flyte-support
  • billions-hairdresser-78656

    04/24/2025, 9:52 PM
    Hi guys, I'm trying to understand how to configure resource limits in flyte tasks. I ran into the error:
    ```
    <jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
    <jemalloc>: (This is the expected behavior if you are running under QEMU)
    Running Execution on Remote.
    Request rejected by the API, due to Invalid input.
    RPC Failed, with Status: StatusCode.INVALID_ARGUMENT
    Details: Requested MEMORY default [2Gi] is greater than current limit set in the platform configuration [1Gi]. Please contact Flyte Admins to change these limits or consult the configuration
    ```
    While researching, I see that there are several ConfigMaps that can be modified:
    • flyte-admin-base-config
    • flyte-admin-clusters-config
    • flyte-clusterresourcesync-config
    Could you point me to docs on how to configure these parameters correctly?
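    (For reference, the limit in that error comes from the platform-level task resource defaults rather than from those ConfigMaps' names themselves. In the flyte-core Helm chart they live under `configmap.task_resource_defaults` and are rendered into flyteadmin's config; a sketch with illustrative values:)
    ```yaml
    configmap:
      task_resource_defaults:
        task_resources:
          defaults:      # applied when a task requests nothing
            cpu: 500m
            memory: 500Mi
          limits:        # hard ceiling; requests above this are rejected
            cpu: 2
            memory: 4Gi  # raise above the 2Gi default your tasks request
    ```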
  • curved-whale-1505

    04/25/2025, 1:21 AM
    At a high level, I'm curious why we need to use the Kubeflow Training Operator to start a PyTorchJob for multi-node multi-GPU support in Flyte, rather than supporting this directly as a multi-node "Python Task" managed by Flyte. Is this something that could be handled by the new JobSet API? https://kubernetes.io/blog/2025/03/23/introducing-jobset/
  • clean-glass-36808

    04/25/2025, 3:07 AM
    Hello. I am enabling propeller replicas for the first time with leader election, and I noticed that followers seem to enqueue workflows without processing them. This set off some alerts we had set up, and it looks like a memory leak, even though it is bounded by the configured max queue size. Is this intentional?
  • cuddly-engine-34540

    04/25/2025, 12:31 PM
    I have two tasks in my workflow:
    ```python
    @workflow
    def data_processing_workflow(
        trigger_file_s3_uri: str,
        triggering_timestamp: str,
        hub_name: str,
        environment: str,
        poll_interval: int = 60,
        poll_timeout: int = 7200,
        sleep_time: int = 0
    ) -> None:

        wait_until_timestamp_task(
            triggering_timestamp=triggering_timestamp,
            sleep_time=sleep_time
        )

        should_launch, processing_params = validate_and_get_params_task(
            trigger_file_s3_uri=trigger_file_s3_uri,
            hub_name=hub_name,
            environment=environment
        )

    ...
    ```
    Notice that `validate_and_get_params_task` does not receive inputs from `wait_until_timestamp_task`, which returns None.
    ```python
    def wait_until_timestamp_task(
        triggering_timestamp: str,
        sleep_time: int = 120
    ) -> None:
    ...
    ```
    Currently `validate_and_get_params_task` and `wait_until_timestamp_task` run in parallel inside `data_processing_workflow`. How can I make sure `validate_and_get_params_task` runs only after `wait_until_timestamp_task`, even though the former does not receive inputs from the latter? It worked to alter `wait_until_timestamp_task` to output a dummy string and have `validate_and_get_params_task` receive that dummy string as input, but it seems hacky. Is there another way?
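    (For what it's worth, flytekit has an explicit chaining mechanism for exactly this case; a minimal sketch using `create_node` and the `>>` operator, assuming the task signatures above:)
    ```python
    from flytekit import workflow
    from flytekit.core.node_creation import create_node

    @workflow
    def data_processing_workflow(
        trigger_file_s3_uri: str,
        triggering_timestamp: str,
        hub_name: str,
        environment: str,
        sleep_time: int = 0,
    ) -> None:
        wait_node = create_node(
            wait_until_timestamp_task,
            triggering_timestamp=triggering_timestamp,
            sleep_time=sleep_time,
        )
        validate_node = create_node(
            validate_and_get_params_task,
            trigger_file_s3_uri=trigger_file_s3_uri,
            hub_name=hub_name,
            environment=environment,
        )
        # Explicit ordering with no data dependency between the two tasks.
        wait_node >> validate_node
    ```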
  • freezing-tailor-85994

    04/25/2025, 6:03 PM
    Is there a way to do multiple map tasks in a row without doing a coalesce/resplit in the middle? My thought is to construct a dataset, chunk it up, build data modules out of each chunk (map task 1), and then run ML inference on each data module (map task 2).
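    (A minimal sketch of feeding one map task's list output straight into another; the placeholder `str` payload types for chunks and data modules are illustrative:)
    ```python
    from flytekit import map_task, task, workflow

    @task
    def build_data_module(chunk: str) -> str:  # map task 1 (placeholder types)
        ...

    @task
    def run_inference(module: str) -> str:  # map task 2 (placeholder types)
        ...

    @workflow
    def pipeline(chunks: list[str]) -> list[str]:
        # The list output of the first map task feeds the second map task
        # directly; no explicit coalesce/resplit step is needed in between.
        modules = map_task(build_data_module)(chunk=chunks)
        return map_task(run_inference)(module=modules)
    ```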
  • echoing-park-83350

    04/25/2025, 9:23 PM
    Hello all! I was hoping to get some help on what I believe to be a trivial issue. I currently have a workflow in my local repo that I can successfully run against the remote cluster using the following `pyflyte` command from the project root in my terminal.
    ```
    pyflyte run --remote --project some-project --domain development tests/regression/workflows/test_deidentify_workflow.py test_deidentify_clinical_note_file --storage_account 'abc12345' --base_path 'container_name/validation/clinical_note/'
    ```
    However, when I try to run this same workflow against the remote cluster from a local Jupyter notebook using `flytekit.FlyteRemote` (the notebook file is located in the `root/tests/` directory of my project):
    ```python
    from flytekit import Config, FlyteRemote
    from tests.regression.workflows.test_deidentify_workflow import test_deidentify_clinical_note_file

    remote = FlyteRemote(
        config=Config.auto(),
        default_project="some-project",
        default_domain="development",
        interactive_mode_enabled=True,
    )

    remote.fast_register_workflow(entity=test_deidentify_clinical_note_file)

    execution = remote.execute(test_deidentify_clinical_note_file, inputs={"storage_account": "abc12345", "base_path": "container-name/validation/clinical_note/"}, wait=True)
    print(execution.outputs)
    ```
    I can see the attempted workflow execution in the UI, but it results in the following error:
    ```
    FlyteAssertion: USER:AssertionError: error=Outputs could not be found because the execution ended in failure. Error
    message: Trace:

        Traceback (most recent call last):
          File "/usr/local/lib/python3.11/site-packages/flytekit/bin/entrypoint.py", line 163, in _dispatch_execute
            task_def = load_task()
                       ^^^^^^^^^^^
          File "/usr/local/lib/python3.11/site-packages/flytekit/bin/entrypoint.py", line 578, in load_task
            return resolver_obj.load_task(loader_args=resolver_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          File "/usr/local/lib/python3.11/site-packages/flytekit/core/utils.py", line 312, in wrapper
            return func(*args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^
          File "/usr/local/lib/python3.11/site-packages/flytekit/core/python_auto_container.py", line 332, in load_task
            loaded_data = cloudpickle.load(f)
                          ^^^^^^^^^^^^^^^^^^^
        ModuleNotFoundError: No module named 'tests'

    Message:

        ModuleNotFoundError: No module named 'tests'
    ```
    The `tests` module it refers to is a local directory immediately under the project root that houses test code and contains the workflow I am trying to run (albeit a few directories down). It seems like the project code isn't being packaged and registered correctly before the workflow executes. I have tried manually setting the sys path to the project root in the notebook before registering and executing the workflow, but that makes no difference. I suspect I am misconfiguring `FlyteRemote` or need to further configure Jupyter for Flyte usage in some way. Does anyone have any insight or could help me solve this problem?
  • curved-whale-1505

    04/26/2025, 5:27 PM
    Does anyone know if Flyte supports the ACK service controller for SageMaker in the SageMaker plugin? The plugin docs currently link to the deprecated SageMaker Operator for Kubernetes: https://www.union.ai/docs/flyte/deployment/flyte-plugins/sagemaker/
  • curved-whale-1505

    04/27/2025, 8:13 PM
    Quick question before I spend too much time on this: is there a way to use `pyflyte` or `flytectl` with the HTTP REST API instead of the gRPC API? I haven't been able to figure out how to set `admin.endpoint` to use HTTP REST successfully. gRPC works fine when I use `kubectl` to forward port 8089 and set it to `dns:///localhost:8089`.
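    (For reference, the working gRPC setup described above corresponds to something like this in `~/.flyte/config.yaml`; a sketch assuming a plain-text port-forward:)
    ```yaml
    admin:
      # flytekit/flytectl speak gRPC to flyteadmin; note the dns:/// scheme
      endpoint: dns:///localhost:8089
      insecure: true  # no TLS over the kubectl port-forward
    ```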
  • wonderful-continent-24967

    04/29/2025, 10:32 PM
    Hi Flyte team, I need help debugging a stuck workflow. This workflow has been stuck for 30+ hours and isn't executing, but the Python task has status `UNKNOWN` and the sub-workflow that contains that task has status `RUNNING`. Seems like a similar issue was reported here: https://github.com/flyteorg/flyte/issues/3536 . Any pointers on what could be wrong here?
  • curved-whale-1505

    04/30/2025, 7:19 AM
    Has anyone seen this issue when connecting to Flyte gRPC over TLS (i.e. insecure: false)? I get this error just from running `pyflyte info`. I verified I am able to hit the server successfully with grpcurl, but with pyflyte I get:
    `Cannot check peer: missing selected ALPN property.`
  • bland-dress-83134

    04/30/2025, 9:08 AM
    If any flyte contributors/reviewers are awake: I've submitted a small PR that would have avoided an issue we ran into recently https://github.com/flyteorg/flyte/pull/6433
  • clean-glass-36808

    04/30/2025, 4:47 PM
    Has anyone seen Flyte Propeller fail workflows with this failure?
    `Last known status message: AlreadyExists: Event Already Exists, caused by [event has already been sent]`
    I can't tell if this indicates a larger issue or if Flyte Propeller should just be updated to handle `AlreadyExists` more gracefully. Going to dig deeper into this to understand whether the DB state was updated in Flyte Admin but the gRPC call failed the first time the event was sent.
  • helpful-jelly-64228

    05/01/2025, 11:11 AM
    Hi all, has anyone encountered issues with dict[str, FlyteDirectory]? I have a task that passes a dictionary of FlyteDirectories initiated with FlyteDirectory(path="s3://...") that doesn't work, while another input passed as a plain FlyteDirectory works fine. When I check the inputs from the console, the dictionary doesn't seem to be properly serialized.
    ```json
    {
      "cpus": 16,
      "mesh_result": {
        "instance_info": {
          "instance_id": "...",
          "public_ip": "..."
        },
        "mesh_log": {
          "path": "s3://flytecfd-bucket/task-data/mesh-dir-6d95f0d05e21457aa117451cbf4ffdfe/mesh/mesh.log"
        },
        "mesh_dir": {
          "path": "s3://flytecfd-bucket/task-data/mesh-dir-6d95f0d05e21457aa117451cbf4ffdfe/mesh/constant/polyMesh/"
        }
      },
      "cases": {
        "case_aoa_01.00": {
          "type": "multi-part blob",
          "uri": "s3://flytecfd-bucket/task-data/cases-dir-29b204be507a48e79a657657beb1e1f3/case_aoa_01.00/"
        }
      }
    }
    ```
    Here mesh_log and mesh_dir are part of a dataclass and work as expected; the FlyteDirectory and FlyteFile are initialized the same way.
    ```python
    @dataclasses.dataclass
    class MeshResult:
        instance_info: InstanceInfo
        mesh_log: FlyteFile
        mesh_dir: FlyteDirectory
    ```
    ```python
    def start_solvers_wf(
        cases: dict[str, FlyteDirectory], mesh_result: MeshResult, cpus: int
    ):
    ```
    Any idea what could be causing this?
  • brief-egg-3683

    05/01/2025, 1:52 PM
    Hey all, I want to emit model performance metrics like accuracy, precision, and recall to Prometheus to track model performance over time. We have Prometheus set up for basic Flyte metrics, but emitting custom stats at task runtime seems like a heavy lift: using the ExecutionParameters statsd client, setting up the Prometheus statsd exporter, customizing the statsd config. Is there any way to return metrics to propeller and have them emitted from the Flyte backend, instead of using statsd from the task pods?
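    (For the statsd route mentioned above, a minimal sketch of the task-side client; the metric name and value are illustrative:)
    ```python
    from flytekit import current_context, task

    @task
    def evaluate_model() -> None:
        # flytekit exposes a statsd-style client on the execution context;
        # metrics go to whatever statsd endpoint flytekit is configured with.
        stats = current_context().stats
        stats.gauge("model.accuracy", 0.93)  # illustrative metric/value
    ```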
  • clean-glass-36808

    05/01/2025, 11:25 PM
    Does anyone know which PR fixed this issue? https://github.com/flyteorg/flyte/issues/5273 We're seeing it in v1.14, but it's not clear to me whether upgrading to v1.15 will solve it.
  • echoing-account-76888

    05/02/2025, 12:41 AM
    Hi all, I noticed that a large paragraph is missing from the "Contributing to Code" section in the new Flyte docs. There used to be a detailed guide on how to set up the development environment. Does anyone know where to find it? https://www.union.ai/docs/flyte/community/contribute/contribute-code/ Thanks!
  • bland-dress-83134

    05/02/2025, 9:44 AM
    I ran into a task being scheduled but never starting due to a system problem in k8s blocking it (a failing mount). I can see there's the `timeout` arg for `@task` decorators, but just making sure: is there no system-wide default that can be configured?
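    (A minimal sketch of the per-task guard mentioned above; the two-hour value is illustrative:)
    ```python
    from datetime import timedelta

    from flytekit import task

    @task(timeout=timedelta(hours=2))  # fail the task if it runs longer than 2h
    def train_model() -> None:
        ...
    ```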
  • abundant-judge-84756

    05/02/2025, 10:26 AM
    Is there anywhere we can find more information on the `webApi` settings listed in the `connector.Config` on this docs page? There's a small amount of info on the page, but not a lot. We're still trying to understand why we're unable to use connectors/agents at scale: as soon as we try to send 1000+ tasks to our connectors, flytepropeller starts to significantly slow down - we see the unprocessed queue depth grow, flytepropeller CPU usage spikes, and task throughput becomes very slow. It's not clear whether this is an issue with the connector setup (e.g. the number of gRPC worker threads?), something to do with the propeller web API, or something else. We're trying to identify which specific settings we need to modify to improve propeller 🤝 connector throughput - any advice would be greatly appreciated 🙏
  • busy-lawyer-8908

    05/02/2025, 6:02 PM
    Are `flytekit.Artifact` entities visible/browsable anywhere in the OSS Flyte UI?
  • bored-laptop-29637

    05/02/2025, 8:17 PM
    I was wondering if there is a way to make sure that tasks run as part of an imperative workflow can be named in a way that denotes what they do. I currently add tasks like this:
    ```python
    task_node = wf.add_task(
        my_task,
        ...
    ).with_overrides(name="staging_model_calculation")
    ```
    But when I go to the actual Flyte execution, every task is named just `my_task`. Should I be applying this override in a different spot?
  • curved-whale-1505

    05/03/2025, 3:00 PM
    How do folks handle multiple sets of conflicting dependencies in a monorepo of Flyte workflows and tasks? I want a shared set of reusable building blocks that relies on a shared set of dependencies (say it relies on boto3 and nothing else), and then some way to allow a given workflow to have its own additional dependencies (say a specific version of numpy); other workflows may conflict and want a different version of numpy. When fast registration occurs, I want both the code of the shared reusable building blocks and the target workflow to end up in S3.
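    (One common pattern for the dependency side of this is a per-workflow `ImageSpec` layered on a shared base; a minimal sketch where the image names are hypothetical:)
    ```python
    from flytekit import ImageSpec, task

    # Hypothetical shared base image that already contains boto3.
    SHARED_BASE = "ghcr.io/acme/flyte-shared:latest"

    # Workflow-specific image: the shared base plus this workflow's pinned numpy.
    wf_image = ImageSpec(
        base_image=SHARED_BASE,
        packages=["numpy==1.26.4"],
    )

    @task(container_image=wf_image)
    def my_task() -> None:
        ...
    ```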
  • sparse-carpenter-66912

    05/05/2025, 7:31 AM
    Is anyone able to download data from S3 with a container specified with `ImageSpec` in a `--remote` execution? Seems like a standard thing, but I can't get it to work. I described it here in more detail.
  • rapid-artist-48509

    05/05/2025, 4:18 PM
    Stumbled upon another docs bug: this page seems to have broken formatting: https://www.union.ai/docs/flyte/deployment/flyte-configuration/monitoring/#monitoring-a-flyte-deployment
  • brave-nail-30599

    05/05/2025, 7:09 PM
    Hi team! We have multiple data processing pipelines currently deployed as separate services on EKS using Helm charts. We're exploring Flyte to potentially:
    1. Connect these pipelines into cohesive workflows
    2. Better manage their execution lifecycle
    3. Handle dependencies between pipelines
    What's the recommended approach? Should we keep Helm for deployments and use Flyte for orchestration, or migrate everything to be Flyte-native? Looking for best practices for integrating Flyte with existing EKS-deployed services 🤔
  • nice-kangaroo-62690

    05/06/2025, 9:43 AM
    Hi there 👋 I'm looking into Flyte as the potential backbone for an ML delivery infrastructure, and one of the initial appeals was its multi-language support. We need to combine a variety of processing steps in our pipelines, with only a handful of them required to be in Python. As much as possible we'd like to confine Python to where it's unavoidable and rely on other tech stacks for the rest. However, it appears the Java/Scala SDK hasn't been updated since July 2024, and the README states that it is still an Alpha/MVP and unstable. Should we conclude that these SDKs are a bit of a dead end at this point and basically "not really usable", being neither integrated with the wider Flyte ecosystem nor documented? Or is there a plan for these SDKs going forward? Are there documentation/testimonies of companies where the multi-language nature of Flyte was leveraged in some way, and why/how? I've looked a bit into the `ContainerTask`, but it looks like that would require a lot of contortions and stack juggling.
  • gentle-night-59824

    05/07/2025, 12:40 AM
    👋 After upgrading the Flyte backend and flytekit to 1.15, we've recently started seeing unexpected cache misses for tasks where the signature should be the same. When I dove deeper, I noticed this error emitted from datacatalog, which seems to be the cause:
    ```
    err missing entity of type Tag with identifier
    ```
    So I queried our DB for the tag name and dataset fields from the logs, and I do see a row in the `tags` table, so I'm unsure why datacatalog reports this - has anyone seen this or have any ideas? I also looked at our DB metrics and there don't seem to have been any latency spikes. It's consistent for particular tasks, whereas other tasks can query the cache fine. I wasn't able to identify anything unique about the problematic tasks either; they all use `cache_serializable`.
  • helpful-church-28990

    05/07/2025, 11:36 AM
    Hi team, we have been using Flyte for the past year to train and build our models, and we've realized that its S3 storage consumption has grown extensively, with a direct impact on cost. I would like to know if there is a simpler way to manage data in the S3 bucket. I have set up some lifecycle rules, but we have to be very careful when deleting data from S3, since Flyte stores metadata in S3 as well and deleting some data might corrupt the Flyte system. Has anyone found a solution for how to do Flyte cleanup?
  • nutritious-cat-43409

    05/07/2025, 12:18 PM
    Hello! At Factorial, we're expanding our Flyte use cases to include scheduled dbt runs. We understand that the dbt plugin requires each model's SQL and YAML files to be present in the local environment. What's the best way to package or provision these files for Flyte executions?
  • crooked-holiday-38139

    05/07/2025, 1:53 PM
    I've been testing the ContainerTask task type, and I would like to be able to pass a file into a task that runs in a Docker container (separate from the main one) and get a file in return. We have lots of models we'd like to coordinate; each model is dockerised so we can encapsulate it, and all of them expect to take a file as input and write their output to a file. However, with a minimal example — https://pastebin.com/PPMbs1uL — reading in a text file containing the string "Hello World!" and replacing "Hello" with "Goodbye", I end up with an empty file. To debug this, I set the "remove" kwarg to False in the run call in container_task.py#L280. In the docker logs I get:
    ```
    cat: /var/inputs/input_file: No such file or directory
    ```
    ... which makes sense: the file isn't getting into the docker container. Looking through the container_task.py source, I can see that we bind a mount for the output, but I can't see how the inputs get into the container; I had assumed we'd mount two directories, one for inputs and one for outputs. How do inputs get into the ContainerTask? Can a FlyteFile be given as an input to a container?
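    (For reference, a sketch of the ContainerTask shape being described, with an illustrative image and command; whether FlyteFile inputs actually materialize under `input_data_dir` during local executions is exactly the behavior in question:)
    ```python
    from flytekit import ContainerTask, kwtypes
    from flytekit.types.file import FlyteFile

    # Raw container task: Flyte stages declared inputs under input_data_dir
    # and collects declared outputs from output_data_dir.
    replace_greeting = ContainerTask(
        name="replace_greeting",
        image="alpine:3.19",  # illustrative image
        input_data_dir="/var/inputs",
        output_data_dir="/var/outputs",
        inputs=kwtypes(input_file=FlyteFile),
        outputs=kwtypes(output_file=FlyteFile),
        command=[
            "sh", "-c",
            "sed 's/Hello/Goodbye/' /var/inputs/input_file > /var/outputs/output_file",
        ],
    )
    ```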
  • brainy-carpenter-31280

    05/07/2025, 4:26 PM
    I'm currently running into a lot of issues with Spark on Flyte. Which integration works best for big data processing on Flyte: Ray, Dask, ...?